TL;DR – Yet another clustering evaluation metric. The Davies-Bouldin index was suggested by David L. Davies and Donald W. Bouldin in “A Cluster Separation Measure” (IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1 (2): 224–227, doi:10.1109/TPAMI.1979.4766909, full pdf). Just like the Silhouette score, the Calinski-Harabasz index and the Dunn index, the Davies-Bouldin index provides an internal evaluation scheme. I.e. the […] Read more "Davies-Bouldin Index"
Super Mario from Microsoft (Daniel Molnar) – Data Janitor 101, one of the best-reasoned talks I have heard in a long time. Andrew Clegg, data scientist @ Etsy, gave a historical review of Semantic Similarity and Taxonomic Distance and how they are used at Etsy. Slides are here. Topic Modeling on Github repositories – presented […] Read more "5 Berlin Data Native 2016 Highlights"
From the talk “Databases – The Choice is Yours” by Philipp Krenn, Developer Advocate at Elastic. Read more "Data Natives Berlin 2016 (1st day)"
My summary and notes for “Detecting Data Errors: Where are we and what needs to be done?” by Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker and Nan Tang, Proceedings of the VLDB Endowment 9.12 (2016): 993-1004. The paper can be found here. In this paper […] Read more "Detecting Data Errors: Where are we and what needs to be done?"
Joel test for Data Science – the inspiration and its adjustment to data science done at Domino were both interesting reads for me. https://blog.dominodatalab.com/joel-test-data-science/ What I Wish I Knew About Data For Startups – I especially related to documenting and testing event tracking. I was surprised by how little this topic and its best practices are discussed. http://www.jeannicholashould.com/what-I-wish-I-knew-about-data-for-startups.html […] Read more "5 interesting things (28/08/2016)"
NLP is a broad term that covers many types of questions and challenges, such as language detection, Part-of-Speech tagging, relation extraction, named entity recognition, OCR, speech recognition, sentiment extraction and many more. There are, of course, several Python libraries which try to tackle some of those problems. This post aims to provide a short […] Read more "5 Python NLP packages"
I got a diversity scholarship from NumFOCUS to attend the PyData Berlin event. NumFOCUS is an NGO that supports open source data science projects, among them Jupyter, matplotlib, NumPy, pandas, etc. This post is not a summary of the event or of the talks that I attended but rather hints to a […] Read more "PyData Berlin 2016 #pydatabln"