Detecting Data Errors: Where are we and what needs to be done?
My summary and notes for “Detecting Data Errors: Where are we and what needs to be done?” by Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker and Nan Tang, Proceedings of the VLDB Endowment 9.12 (2016): 993-1004. The paper can be found here. In this paper […]
5 interesting things (28/08/2016)
Joel test for Data Science – both the original Joel Test that inspired it and Domino's adaptation of it to data science were interesting reads for me. https://blog.dominodatalab.com/joel-test-data-science/ What I Wish I Knew About Data For Startups – I especially related to the part about documenting and testing event tracking. I was surprised by how little this topic and its best practices are discussed. http://www.jeannicholashould.com/what-I-wish-I-knew-about-data-for-startups.html […]
5 Python NLP packages
NLP is a broad term that covers many types of questions and challenges, such as language detection, part-of-speech tagging, relation extraction, named entity recognition, OCR, speech recognition, sentiment extraction and many more. There are, of course, several Python libraries that try to tackle some of these problems. This post aims to provide a short […]
PyData Berlin 2016 #pydatabln
I got a diversity scholarship from NumFOCUS to attend the PyData Berlin event. NumFOCUS is an NGO that supports open-source data science projects, among them Jupyter, matplotlib, NumPy, pandas, etc. This post is not a summary of the event or of the talks I attended, but rather hints to a […]
5 interesting things (16/05/2016)
Similar Wikipedia Pages – This post presents a Chrome extension that shows similar Wikipedia pages; in other words, a recommendation system for Wikipedia pages. They are not the first to do this: Wikiwand, as well as Wikipedia itself (in beta), has created such a feature. They compare the results a bit, and it would have […]
5 interesting things (20/04/2016)
Spoiler detector – Get your annotated data set for free! A nice way to get an annotated (though not perfect) data set and create some social good. It would be interesting to expand it to more movies and television shows, and to other social networks. It would also be interesting to see how predictive a model trained on one […]
Python’s Guide to the Galaxy – SPS16
My talk from the Swiss Python Summit is online.