5 interesting things (04/12/2020)

How to set compensation using commonsense principles – yet another work of art by Erik Bernhardsson. I like his analytical approach and the way he models his ideas. His manifesto on compensation systems (good/bad compensation systems) is brilliant. I believe most of us would agree with it, but he is the one who put it into words. His modeling has some drawbacks that he is aware of, for example assuming certainty about employee productivity and almost perfect knowledge of the market. Yet, it is totally worth your time.


7 Over Sampling techniques to handle Imbalanced Data – imbalanced data is a common real-world scenario, specifically in healthcare, where most patients don’t have the condition one is looking for. Over-sampling is one way to handle imbalanced data, and this post describes several techniques for doing it. Interestingly, at least in this specific example, most of the techniques do not bring significant improvement. I would therefore compare several techniques rather than try just one of them.

This post uses the following package, which I didn’t know before (it would be great if it could become part of scikit-learn) – https://imbalanced-learn.readthedocs.io/en/stable/index.html

It would be nice to see a similar post for downsampling techniques.
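
To make the idea concrete, here is a minimal sketch of the simplest over-sampling technique, random over-sampling: duplicate minority-class rows at random until all classes are the same size. It uses only the standard library and is not the code from the post or from imbalanced-learn; the function name `random_oversample` and the toy data are my own assumptions for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    is as large as the biggest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        # Sample with replacement; k is 0 for the majority class.
        extra = rng.choices(rows, k=target - count)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy data: 8 patients without the condition (0) and 2 with it (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y), "->", Counter(y_res))
```

Only the training split should be resampled this way; duplicating rows before the train/test split leaks copies of the same row into the test set.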

Python’s do’s and don’t do – very nicely written, with good examples.

Every Complex DataFrame Manipulation, Explained & Visualized Intuitively – can’t remember how pandas functions work? Great, you are not alone. You can use this guide to quickly remind yourself how melt, explode, pivot and others work.
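
As a quick reminder of three of those functions, here is a small sketch on made-up data (the column names `name`, `q1`, `q2` and `tags` are assumptions of mine, not taken from the guide). It shows melt going from wide to long, pivot undoing that, and explode unpacking a list-valued column.

```python
import pandas as pd

# Hypothetical wide table: one row per name, one column per quarter.
wide = pd.DataFrame({"name": ["a", "b"], "q1": [1, 3], "q2": [2, 4]})

# melt: wide -> long, one row per (name, quarter) pair.
long = wide.melt(id_vars="name", var_name="quarter", value_name="sales")

# pivot: long -> wide again, the inverse of the melt above.
back = long.pivot(index="name", columns="quarter", values="sales")

# explode: one row per element of a list-valued column.
tags = pd.DataFrame({"name": ["a", "b"], "tags": [["x", "y"], ["z"]]})
exploded = tags.explode("tags")

print(long, back, exploded, sep="\n\n")
```

Note that explode keeps the original index, so the two rows that came from the first list both carry index 0; call `reset_index(drop=True)` if that gets in the way.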

Causal Inference that’s not A/B Testing: Theory & Practical Guide – Causality is often overlooked in the industry. Many times you develop a model that is “good enough” and move on. However, this might increase bias and lead to unfavourable results. This post suggests a hands-on approach to causality, accompanied by code samples.