Map Spark UDAF (Java)

I run Spark code in Java. I had data with the following schema – and I wanted to get a single record per user with the following schema – Attached is the user-defined aggregation function (UDAF) I wrote to achieve it. Before that –

Code Challenges Anti-Patterns

Code challenges are a common tool for evaluating a candidate's ability to develop software. Of course, there are other indicators, such as blog posts, open source involvement, GitHub repositories, personal recommendations, etc. Yet code challenges are frequently used. I recently got to review some code challenges and was surprised by some of the things I found […]

SO end of year surveys

Recently Stack Overflow published a few posts comparing the usage of Stack Overflow across different segments and scenarios: “How Do Students Use Stack Overflow?”, “What Programming Languages Are Used Most on Weekends?” and “Women in the 2016 Stack Overflow Survey”. A few comments regarding those posts – How Do Students Use Stack Overflow? “R and MATLAB are pretty […]

Davies-Bouldin Index

TL;DR – Yet another clustering evaluation metric. The Davies-Bouldin index was suggested by David L. Davies and Donald W. Bouldin in “A Cluster Separation Measure” (IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (2): 224–227. doi:10.1109/TPAMI.1979.4766909, full pdf). Just like the Silhouette score, the Calinski-Harabasz index and the Dunn index, the Davies-Bouldin index provides an internal evaluation scheme. I.e. the […]
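
To give a feel for what the index measures, here is a minimal Python sketch of the commonly used centroid-based form: for each cluster, take the worst-case ratio of combined within-cluster scatter to the distance between centroids, then average over clusters (lower is better). The function name `davies_bouldin` and the toy data are hypothetical, and Euclidean distance is an assumption; this is not taken from the post itself.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Centroid-based Davies-Bouldin index (illustrative sketch, Euclidean distance assumed)."""
    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in clusters.items()
    }
    # Scatter of each cluster: mean distance of its points to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }

    # For each cluster, keep the worst (largest) similarity ratio to any other
    # cluster, then average those worst cases.
    # Note: this sketch does not guard against two identical centroids.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
print(davies_bouldin(toy_points, toy_labels))
```

On the toy data each cluster has a scatter of 1 and the centroids are 10 apart, so compact, distant clusters give a small value; overlapping clusters would push it up.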
