5 interesting things (13/12/2015)

SystemML – a distributed and declarative machine learning platform. It looks like a promising project; it was initially developed by IBM and has now joined the Apache Software Foundation. I wonder how it will influence the development of Spark MLlib.


Monopoly as Markov Chain – I guess I am developing a fetish for Markov chains. Although this model is not always realistic (as in this case), it is amazing what we can get out of it and the approximate simulations we can create. This post simulates a Monopoly game using Markov chains and does a very interesting job. Monopoly is linear in the sense that you must move according to the dice and the choices you make are limited (buy or don’t buy). Backgammon, in contrast, also involves strategy, which is why Monopoly is the better candidate for modeling with Markov chains.
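
To make the idea concrete, here is a minimal sketch of a board as a Markov chain. It is not the linked post's model: it assumes a 40-square board, two fair dice, and a single special rule (landing on "Go to Jail" moves you to "Jail"), and it ignores cards, doubles and everything else. The function names are made up for this illustration.

```python
from itertools import product

N_SQUARES = 40
JAIL, GO_TO_JAIL = 10, 30  # assumed square indices, GO is square 0


def build_transition_matrix():
    """Return a 40x40 matrix where entry [i][j] is P(next square = j | current = i)."""
    # Probability of each two-dice total (2..12).
    roll_prob = {}
    for a, b in product(range(1, 7), repeat=2):
        roll_prob[a + b] = roll_prob.get(a + b, 0.0) + 1 / 36

    matrix = [[0.0] * N_SQUARES for _ in range(N_SQUARES)]
    for start in range(N_SQUARES):
        for total, p in roll_prob.items():
            end = (start + total) % N_SQUARES
            if end == GO_TO_JAIL:  # the only special rule in this sketch
                end = JAIL
            matrix[start][end] += p
    return matrix


def long_run_distribution(matrix, steps=200):
    """Approximate where a player ends up after many rolls, starting on GO."""
    dist = [1.0] + [0.0] * (N_SQUARES - 1)
    for _ in range(steps):
        dist = [
            sum(dist[i] * matrix[i][j] for i in range(N_SQUARES))
            for j in range(N_SQUARES)
        ]
    return dist


if __name__ == "__main__":
    dist = long_run_distribution(build_transition_matrix())
    most_visited = max(range(N_SQUARES), key=lambda s: dist[s])
    print("most visited square:", most_visited)
```

Even with this single rule the distribution stops being uniform, which is the kind of insight the Markov chain view gives you for free.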


Vocabulary – “Python Module to get Meanings, Synonyms and what not for a given word”. This module brands itself as an alternative to NLTK, presenting data about meanings, synonyms, antonyms, part of speech, pronunciation, etc. with a leaner and more Pythonic approach. I don’t know if it is as good as NLTK or whether it will get there, but it sure looks like an alternative worth checking.


Probability recap – If you have forgotten your university probability class, this will probably be a good recap. If you work as a data scientist, you probably use this material daily. Still, it has code visualization and good examples, and it is a good way to share your knowledge with colleagues.


What we talk about when we talk about distributed systems – for once, a non-misleading title. Thinking about the distributed systems course I took in grad school, this would have been a great introduction.



5 interesting things (1/12/2015)

Improving My CLI’s Autocomplete with Markov Chains – Markov chains are the basis for many auto-complete algorithms we know and use on a daily basis, e.g. keyboards on mobile devices. In this case it is a developer hack to improve auto-complete in a development tool. It is always nice when theory comes to life.
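
A rough sketch of the idea, not the code from the linked post: treat each command as a state, count which command followed which in the history, and suggest the most frequent followers. The function names and the sample history are made up for this illustration.

```python
from collections import Counter, defaultdict


def train(history):
    """Count, for every command, which commands followed it in the history."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    return transitions


def suggest(transitions, last_command, k=3):
    """Return up to k most likely next commands after last_command."""
    followers = transitions.get(last_command, Counter())
    return [cmd for cmd, _ in followers.most_common(k)]


if __name__ == "__main__":
    # Hypothetical shell history, oldest command first.
    history = [
        "git add", "git commit", "git push",
        "git add", "git commit",
        "git add", "git status",
    ]
    model = train(history)
    print(suggest(model, "git add"))
```

This is a first-order chain, so a suggestion depends only on the previous command; a longer context would need tuples of commands as states.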


10 more lessons learned from building Machine Learning systems – slides of a presentation by Xavier Amatriain, VP Engineering at Quora (previously Director Algorithms Engineering @Netflix). Very insightful presentation (I would have loved to hear the full talk). The name refers to a lecture by Amatriain called “10 lessons learned from building Machine Learning systems”, given exactly a year before.



Beyond One-Hot: an exploration of categorical variables – It is not all about numbers; in many cases features are not numeric. If we are lucky, features will be binary (has / does not have a symptom, spam / not spam) or ordinal (the amount of pain a patient experiences, etc.). But sometimes it is neither, e.g. a state, a color, etc. What then? This post compares several techniques for dealing with categorical variables. While it is very basic, it is well explained (although examples would have helped) and it can give a great intuition to someone who faces those problems for the first time.
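
Since I complained about the lack of examples, here is a small sketch of the two simplest cases: one-hot encoding for a feature with no order (a color) and integer ranks for an ordinal one (pain level). The helper names and the sample values are made up for this illustration, and the post itself covers more techniques than these two.

```python
def one_hot(values):
    """Encode unordered categories as 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def ordinal(values, order):
    """Encode ordered categories as integer ranks, following the given order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[value] for value in values]


if __name__ == "__main__":
    columns, encoded = one_hot(["red", "green", "red", "blue"])
    print(columns)
    print(encoded)
    print(ordinal(["mild", "severe", "none"], order=["none", "mild", "severe"]))
```

One-hot keeps the categories unordered at the cost of one column per category, while the ordinal encoding is compact but only makes sense when the order is real.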


Bandit Algorithms for Bullying – Getting More Lunch Money – this post explains bandit algorithms using a very common, easy-to-understand example. However, I would structure the post a bit differently. There is a lot of text telling the story compared to the scientific parts. In my opinion the scientific parts should be emphasized a bit more (bold text, bullets, etc.).
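
For readers who want the scientific part up front, here is a minimal sketch of one common bandit strategy, epsilon-greedy. I am not claiming it is the algorithm the post uses; the reward probabilities, the function name and the parameters are all assumptions made for this illustration.

```python
import random


def epsilon_greedy(reward_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on arms that pay 1 with the given probabilities.

    Returns how often each arm was pulled and its estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running average reward per arm

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


if __name__ == "__main__":
    # Hypothetical payout rates of three "victims".
    counts, values = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated rewards:", [round(v, 2) for v in values])
```

The epsilon parameter is the whole trade-off: a higher value means more exploration of arms that currently look worse, a lower value means more exploitation of the current best guess.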


C.H.I.P vs Pi zero – The new Pi Zero created a lot of buzz in the last week, offering a computer for $5. However, the Pi Zero is not the only player in the field of sub-$10 computers. This post compares the specs and abilities of the Pi Zero and C.H.I.P. There are many comments saying the comparison is biased towards C.H.I.P (ignoring shipping costs, unfair comparison of cable costs, reputation, etc.), but overall I think it is worth reading.