5 interesting things (8/6/2020)

DBSCAN practitioners guide – DBSCAN is a density-based clustering method. One important advantage compared to K-means is DBSCAN’s ability to identify noise and outliers. I feel that DBSCAN is often underestimated. See this guide to learn more about how DBSCAN works, how to choose its hyperparameters, and more.
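To make the noise-handling point concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is my own illustration, not code from the linked guide. The toy points and the `eps` and `min_samples` values are assumptions chosen for the example, and the `-1` noise label follows scikit-learn's convention. In practice you would more likely reach for `sklearn.cluster.DBSCAN`.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Toy data (assumed): two tight groups and one far-away point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 50),
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

The far-away point ends up labeled as noise instead of being forced into a cluster, which is what K-means would do with it.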


Fourier Transform for Data Science – When I was an undergrad I learned FFT out of context. It was just an algorithm in the textbook, and I didn’t understand what it was good for. Later I was asked about it in an oral exam in grad school and was able to mumble something. Much later I tried to run some analysis on ECG waves, and then I finally understood what it was about. Read this post if you want to demystify the Fourier transform.


Bonus – OpenCV tutorial on Fourier Transform
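For a feel of what the Fourier transform is good for, here is a small sketch of my own (not taken from either link). It builds a synthetic signal from two sine waves and recovers their frequencies with a naive discrete Fourier transform. The sample rate and the 5 Hz and 12 Hz components are assumptions made up for the example. The FFT computes the same result much faster, and in practice you would likely use something like `numpy.fft.fft`.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


sample_rate = 64  # samples per second (assumed)
n = 64            # one second of data, so bin k corresponds to k Hz

# Synthetic signal: a 5 Hz sine plus a weaker 12 Hz sine.
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(n)
]

spectrum = dft(signal)

# For a real signal the second half of the spectrum mirrors the first,
# so only the first n // 2 bins are inspected.
magnitudes = [abs(c) for c in spectrum[: n // 2]]
top_two = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
dominant_hz = sorted(k * sample_rate / n for k in top_two)
print(dominant_hz)
```

The two strongest bins land on the frequencies that were mixed into the signal, which is the same kind of question one asks of an ECG trace: which rhythms are hiding in this wave?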


Dataset shift – dataset shift happens when the test set and the train set come from different distributions. This phenomenon takes several forms, such as covariate shift, concept shift, and prior distribution shift. I believe that every data scientist working in the industry has come across at least one of those manifestations. This post provides a very good introduction to the topic and useful links if you want to delve deeper.
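As a rough illustration of covariate shift, the sketch below compares one feature's distribution in a train set against two test sets using the two-sample Kolmogorov-Smirnov statistic. This is one common heuristic that I picked for the example, not necessarily what the linked post recommends. The data is simulated, and the sample sizes and the 1.5 shift in the mean are assumptions. `scipy.stats.ks_2samp` offers the same statistic together with a p-value.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


random.seed(0)  # fixed seed so the simulation is repeatable

# One numeric feature, simulated (assumed distributions).
train = [random.gauss(0.0, 1.0) for _ in range(500)]
test_same = [random.gauss(0.0, 1.0) for _ in range(500)]     # same distribution
test_shifted = [random.gauss(1.5, 1.0) for _ in range(500)]  # mean has drifted

ks_same = ks_statistic(train, test_same)
ks_shifted = ks_statistic(train, test_shifted)

print(f"train vs. same-distribution test: {ks_same:.3f}")
print(f"train vs. shifted test:           {ks_shifted:.3f}")
```

A statistic close to 0 suggests the feature looks alike in both sets, while a large value hints that the inputs have drifted. It only checks one feature at a time and says nothing about concept shift, where the relation between inputs and labels changes.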


A Practical Framework for AI Adoption: A Five-Step Process – having worked for several years as a data scientist, I have noticed that data products are often not deployed, do not meet stakeholders’ expectations, are not used as the data scientist intended, etc. This post introduces a framework that tries to remedy some of those problems.


Diagrams as code – a code-based tool to draw system diagrams, possibly easier than fighting draw.io. It contains many icons, including several cloud providers (AWS, GCP, Azure, etc.) and common services (K8s, Elastic, Spark). All in all, it seems very promising.