5 Berlin Data Native 2016 Highlights
Super Mario from Microsoft (Daniel Molnar) – Data Janitor 101, one of the best-reasoned talks I have heard in a long time. Andrew Clegg, data scientist at Etsy, gave a historical review of Semantic Similarity and Taxonomic Distance and how they are used at Etsy. Slides are here. Topic Modeling on GitHub repositories – presented […]

Data Natives Berlin 2016 (1st day)
From the talk “Databases – The Choice is Yours” by Philipp Krenn, Developer Advocate at Elastic.

Detecting Data Errors: Where are we and what needs to be done?
My summary and notes for “Detecting Data Errors: Where are we and what needs to be done?” by Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, and Nan Tang, Proceedings of the VLDB Endowment 9.12 (2016): 993–1004. The paper can be found here. In this paper […]