In memory of Udy Brill
pypi analysis – this post starts with some technical issues, e.g. how to scrape package dependencies from PyPI packages, and goes on to show a graph of dependencies between Python packages as well as an analysis of the graph structure. What we learn from this analysis – requests, django and six are among the most important packages, both PageRank-wise and connectivity-degree-wise. requests also leads in the “betweenness centrality” metric, while both django and six do not appear in the top 10 results of this metric. I believe the reason is that both django and six are used for more specific use cases and applications, while requests is more general. One can see that the top 10 packages by “betweenness centrality” are quite general ones (testing and setup) as well as OpenStack clients.
The post includes two more visualizations – an adjacency matrix, which exposes the existence of cliques in the graph along with some details about them, and the degree distribution of the graph. The distribution is long-tailed, as one can easily expect: most packages are not imported by anyone else and a few packages are imported by many other packages.
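To make the degree and PageRank measures a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the linked post: the graph is a toy one, the app_* package names are hypothetical, and the edges are made up for illustration. PageRank is computed with simple power iteration, assuming a damping factor of 0.85.

```python
# Toy dependency graph: package -> packages it depends on.
# The "app_*" names are hypothetical and the edges are invented for illustration.
DEPENDENCIES = {
    "app_a": ["requests", "six"],
    "app_b": ["requests", "django"],
    "app_c": ["django", "six"],
    "app_d": ["requests"],
    "requests": [],
    "django": [],
    "six": [],
}


def in_degree(graph):
    """Count how many packages depend on each package."""
    counts = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes without outgoing edges spread their rank uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no dependencies) is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for dep in graph[node]:
                # Each package passes an equal share of its rank to every dependency.
                new_rank[dep] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


degrees = in_degree(DEPENDENCIES)
ranks = pagerank(DEPENDENCIES)
top_by_rank = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph requests has the highest in-degree and also comes out first in top_by_rank, while the app_* packages, which nobody depends on, sit at the bottom.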
Estimate user locations in social media – in the world of targeting and online advertising, one of the challenges is to learn as much as possible about the user for better targeting. Such data includes gender, age, marital status, fields of interest and of course location. There is no use in advertising a shop which is 500km from the user. But no one is perfect, and neither is data. We don’t always know the user’s location, and this post describes two approaches for estimating user location from social media.
10 Lessons from 10 Years of Amazon Web Services – a post by Werner Vogels, Amazon CTO, summing up 10 years of AWS. Although I sometimes have criticism of AWS, they (together with other players and technology advancements) truly changed the world. A very interesting read for AWS users and non-users alike.
Introduction to Boosting – TL;DR – think iterative – at each step you improve the model so that it predicts well the samples which were not predicted correctly so far. A really good overview of the concept of boosting; now I want to use this knowledge and play with it.
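As a way of playing with the idea, here is a minimal sketch of one boosting variant, AdaBoost with one-threshold "decision stumps", in plain Python. It is my own illustration and not code from the linked post; the data set is made up, and the function names (best_stump, adaboost, accuracy) are just names I picked.

```python
import math

# Toy 1-D data set; the values and labels are invented for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [1, 1, -1, -1, 1, 1, -1, -1]


def stump_predict(threshold, polarity, x):
    """Predict `polarity` for x below the threshold and `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best single stump."""
    best = None
    for threshold in sorted(set(xs)) + [max(xs) + 1.0]:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(threshold, polarity, x) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, re-weighting the samples after each one."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples get heavier, correctly classified ones get lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
```

On this toy data no single stump can separate the labels, so accuracy(adaboost(X, Y, 1), X, Y) is 0.75, while three rounds are enough to classify every sample correctly. Each new stump focuses on the samples the previous ones got wrong, which is the "think iterative" part.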
curl vs wget – a comparison of the two tools by a contributor to both projects. While (for me) there is no immediate day-to-day implication, it is good to look a bit under the hood and see the pros and cons of each of these common tools. For me it would be easier to read as a table.
Udy was a friend and a colleague of mine. He died on a track in New Zealand during his honeymoon. For me he had the perfect mix of curiosity, professionalism, team spirit and positivity.
I’ll remember Udy in everyday moments like eating food with a lot of sauce, reading something about sorting algorithms and listening to Tracy Chapman.
This is a video we made at a company hackathon where we worked together.