5 interesting things (04/12/2020)

How to set compensation using commonsense principles – yet another artwork by Erik Bernhardsson. I like his analytics approach and the way he models his ideas. His manifest regarding compensation systems (Good/bad compensation systems) is brilliant. I believe most of us agree with him while he put it into words. His modeling has some drawbacks that he is aware of. For example, assuming certainty in employee productivity, almost perfect knowledge of the market. Yet, it is totally worth your time.

https://erikbern.com/2020/06/08/how-to-set-compensation-using-commonsense-principles.html

7 Over Sampling techniques to handle Imbalanced Data – imbalanced data is a common real world scenario, specifically in healthcare where most of the patients don’t have a certain condition one is looking for. Over-sampling is a method to handle imbalanced data, this post describes several techniques to handle it. Interestingly, at least in this specific example, most of the techniques do not bring significant improvement. I would therefore compare several techniques and won’t just try one of them. 
https://towardsdatascience.com/7-over-sampling-techniques-to-handle-imbalanced-data-ec51c8db349f

This post uses a the following package which I didn’t know before (it would be great if it could become part of scikit-learn) – https://imbalanced-learn.readthedocs.io/en/stable/index.html

It would be nice to see a similar post fo downsampling techniques.


Python’s do’s and don’t do – very nicely and written with good examples – 
https://towardsdatascience.com/10-quick-and-clean-coding-hacks-in-python-1ccb16aa571b

Every Complex DataFrame Manipulation, Explained & Visualized Intuitively – can’t remember how pandas function work? great, you are not alone. You can use this guide to quickly remind you how melt, explode, pivot and others work.
https://medium.com/analytics-vidhya/every-dataframe-manipulation-explained-visualized-intuitively-dbeea7a5529e

Causal Inference that’s not A/B Testing: Theory & Practical Guide – Causality is often overlooked in the industry. Many times you developed a model that is “good enough” and move on. However, this might increase bias and lead to unfavourable results. This post suggests a hands-on approach to causality accompanied by code samples.

https://towardsdatascience.com/causal-inference-thats-not-a-b-testing-theory-practical-guide-f3c824ac9ed2

5 interesting things (11/3/2020)

Never attribute to stupidity that which is adequately explained by opportunity cost – if you have a capacity for only 10 words, those are the 10 words you should take – “prioritization is the most value creating activity in any company”. One general, I really like Erik Bernhardsson writing and ideas, I find his posts insightful.

Causal inference tutorials thread – check this thread if you are interested in causal inference.
The (Real) 11 Reasons I Don’t Hire you – I follow Charity Major’s blog since I heard here in ״Distributed Matters” in 2015. Besides being a very talented engineer, she also writes about being a founder and manager. Hiring and is hard for both sides and although it is hard to believe it is not always personal and a candidate has to be a good match for the presented company, for the future company, to the team. The company would like to have exciting and challenging tasks for the candidate so she will be happy and grow in the company, And of course, we are all human and make mistakes from time to time. Read Charity’s post in order to examine the not hiring decision from the employer’s point of view.

The 22 Most-Used Python Packages in the World – an analysis of the most downloaded Python packages on PyPI over the past 365 days. Some of the results are surprising – I expected pip to be the most used package and it is only the fourth after urllib3, six and boto core, and requests to be ranked a bit higher. Some of the packages are internals we are not even aware of such as idna and pyasn1. Interesting reflection.

Time-based cross-validation – it might seem obvious when reading but there are few things to notice when training a model that has time dependency. This post also includes Python code to support the suggested solutions.