Authors: David Tsurel, Dan Pelleg, Ido Guy, Dafna Shahaf. The article can be found here. Trivia facts can drive user engagement, but what is a trivia fact? Is the fact “Barack Obama is part of the Obama family” a trivia fact? Is the fact “Barack Obama is a Grammy Award winner” a trivia fact? This paper tackles the […] Read more "Fun Facts: Automatic Trivia Fact Extraction from Wikipedia"
I run Spark code in Java. I had data with the following schema – and I wanted to get a single record per user with the following schema – Attached is the user-defined aggregation function (UDAF) I wrote to achieve it. Before that – Read more "Map Spark UDAF (Java)"
Code challenges are a common tool for evaluating a candidate's ability to develop software. Of course, there are other indicators, such as blog posts, open source involvement, GitHub repositories, personal recommendations, etc. Yet code challenges are frequently used. I recently reviewed some code challenges and was surprised by some of the things I found […] Read more "Code Challenges Anti-Patterns"
Recently Stack Overflow published a few posts comparing Stack Overflow usage across different segments and scenarios: "How Do Students Use Stack Overflow?", "What Programming Languages Are Used Most on Weekends?" and "Women in the 2016 Stack Overflow Survey". A few comments regarding those posts – How Do Students Use Stack Overflow? “R and MATLAB are pretty […] Read more "SO end of year surveys"
TL;DR – Yet another clustering evaluation metric. The Davies-Bouldin index was suggested by David L. Davies and Donald W. Bouldin in “A Cluster Separation Measure” (IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1 (2): 224–227, doi:10.1109/TPAMI.1979.4766909, full pdf). Just like the Silhouette score, the Calinski-Harabasz index and the Dunn index, the Davies-Bouldin index provides an internal evaluation scheme, i.e. the […] Read more "Davies-Bouldin Index"
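For readers who want a feel for the Davies-Bouldin index before opening the post, here is a minimal sketch in plain Python. It assumes the usual formulation: each cluster's scatter is the mean Euclidean distance of its points to the centroid, and the index is the average, over clusters, of the worst-case ratio (scatter_i + scatter_j) / distance(centroid_i, centroid_j). The function name `davies_bouldin` and the toy data are illustrative only and are not taken from the post; lower values indicate tighter, better separated clusters.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def davies_bouldin(points, labels):
    """Davies-Bouldin index for points (tuples of floats) and their cluster labels.

    Assumes Euclidean distance and at least two clusters with distinct centroids.
    """
    # Group points by cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    centroids = {}
    scatter = {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        centroids[label] = centroid
        # Scatter: mean distance of the cluster's points to its centroid.
        scatter[label] = sum(dist(m, centroid) for m in members) / len(members)

    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (hypothetical): two compact clusters that are far apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_score = davies_bouldin(toy_points, toy_labels)
```

In the toy data each cluster has scatter 1 and the centroids are 10 apart, so the ratio for either cluster is (1 + 1) / 10. Being an internal measure, the index needs only the points and their assigned labels, not ground-truth classes.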
Super Mario from Microsoft (Daniel Molnar) – Data Janitor 101, one of the best-reasoned talks I have heard in a long time. Andrew Clegg, data scientist @ Etsy, gave a historical review of Semantic Similarity and Taxonomic Distance and how they are used at Etsy. Slides are here. Topic Modeling on GitHub repositories – presented […] Read more "5 Berlin Data Native 2016 Highlights"