The Shogun Machine Learning Toolbox
Building, bug tracing and deployment are done using – http://buildbot.net/.
Learning Chess from Data
EuroPython talk I gave today. Joint work with N.M.
Talk is here – https://ep2014.europython.eu/en/schedule/sessions/18/
Slides and code – https://github.com/nivm/learningchess
Log everything with logstash and elasticsearch
EuroPython talk by: Peter Hoffmann, @peterhoffmann
Another very good talk by an experienced speaker. However, the title is somewhat misleading. Yes, logstash and elasticsearch were mentioned, but the main concept of the talk was really the logging chain and centralized logging. Logstash and elasticsearch are two tools along this chain, and there are alternatives at every step of it.
This talk gave me a lot to think about regarding logging at the company level and how we should run it, monitor it, etc. There is also some tension to resolve and define about what is logged and what error messages and outputs an app should provide (“logging best practices”).
The video is already available on the EuroPython site and I hope that the slides will be available soon too.
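To make the "first step of the chain" concrete, here is a minimal sketch of an app that emits one JSON object per log line, which is a format a shipper further down the chain can usually parse without extra rules. It is not taken from the talk; it uses only the Python standard library, and the field names and the `build_logger` helper are my own assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    """Hypothetical helper: return a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


log = build_logger("checkout")
log.info("order accepted")
```

Whatever collects the output (a file tailer, a shipper, a central index) is a separate decision; the app only has to agree on the line format.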
Full Stack Python
EuroPython talk by: Matt Makai, @mattmakai
5 interesting things (18/7/2014)
A/B Testing – No need to say that A/B testing is a very hot buzzword in the industry. A/B tests were used long ago in psychology, but today they are much more accessible and easier to set up. The following series of posts (one is still not published) describes 5 simple guidelines for A/B testing.
http://sl8r000.github.io/ab_testing_statistics/
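As a small illustration of the statistics behind an A/B test, here is a sketch of a two-sided two-proportion z-test, one common way to compare conversion rates. It is not taken from the linked posts; the function name and the numbers are made up, and it assumes Python 3 division.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value), using the pooled-proportion standard error.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (e.g. zero conversions in both groups).
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up data: 100/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(z, p)
```

A small p-value only says that the difference is unlikely to be noise under the test's assumptions; it says nothing about whether the test was stopped too early or run on too many variants.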
Shellify – a decorator which turns Python models into shell. I don’t see an everyday use for it, maybe during development, but it has a high coolness factor, which is important as well.
https://bitbucket.org/johannestaas/shellify
Code review without your eyes – I like this post because, in my humble opinion, it is very creative and outside the box. It is also a good reminder of what to notice in code review.
http://robertheaton.com/2014/06/20/code-review-without-your-eyes/
Side kick – the last link reminded me of rubber duck debugging and on the way I found this – http://rubberduckreview.com/
Deployment academy – a series of posts by Rainforest on their blog. This time a post about “zero downtime database migrations”. The post is simple and easy to understand and answers a common issue in the life of a developer. Of course, practices and specific problems differ from one organization to another, but the core ideas stay the same.
https://blog.rainforestqa.com/2014-06-27-zero-downtime-database-migrations/
Awesome *
Are we going back?
In the past week I have bumped into two GitHub repositories, awesome-python and awesome-sysadmin. Both repositories do a great job and compose an interesting list / index of relevant tools.
However, this made me feel that we are going back. Those indices reminded me of the pre-search-engine days, or shall I say the BME days (before modern era ;). Specifically, this reminded me of Alta Vista (but also other index sites – do you recall Lycos?) where all the links were indexed under some category and sub-categories, and one had to dig through those categories to find what one was looking for.
Aren’t the search engines today strong enough to answer the query “python machine learning package”? Is human indexing really our last resort? I don’t really think so. I believe in the power of the humans behind the machine and their ability to build good enough search engines. Those indices can be very useful, but I would rather have them built automatically and not manually.
5 interesting things (23/6/2014)
Celery best practices – the Python Celery package is on my “todo list” for when I have a relevant task. However, some of the thoughts and ideas discussed in this post apply more generally to software engineering.
https://denibertovic.com/posts/celery-best-practices/
Keep calm and learn d3.js – d3.js is also something which is on my “todo list”. I have used it here and there but would like to do it better (and also to gain some deeper experience with JS). This is a very nice tutorial to start with d3.js.
http://slides.com/kentenglish/getting-started-with-d3-js-using-the-toronto-parking-ticket-data
http://cjauvin.blogspot.ca/2014/06/dbscan-blues.html
Machine learning at Airbnb – This post is a case study of machine learning as it is used at Airbnb. I like reading such posts because it is interesting to learn about the challenges other organizations face, their solutions, and how they use existing tools and packages and adjust and optimize them to their needs (in this post – scikit-learn and re-writing an R export function in C++).
http://nerds.airbnb.com/architecting-machine-learning-system-risk/
Current events – world cup predictions – I love sports as well as machine learning, statistics and so forth, and there are several posts which try to employ ML techniques to predict World Cup results. The most interesting post I have read so far is Nate Silver’s. It takes many features into account and explains them clearly.
http://fivethirtyeight.com/features/its-brazils-world-cup-to-lose/
Andrew Ng and PyStruct meetup
Yesterday I attended the “Andrew Ng and pyStruct” meetup.
http://www.meetup.com/berlin-machine-learning/events/179264562/
I was lucky enough to get a place at the meetup thanks to the Germany-Portugal game that took place at the same time 🙂
The first part, by Andrew Ng, was a video meetup shared between 3 locations – Paris, Zurich and Berlin. Andrew Ng is a co-founder of Coursera and a Machine Learning guru. He teaches the ML course on Coursera, which is one of the most popular courses there (I took it myself and it is a very good and structured introduction to machine learning; a new session started yesterday). He teaches at Stanford and will soon be leaving for Baidu research.
The talk included 15-20 minutes of introduction to deep learning, recent results, applications and challenges. He mainly focused on scaling up deep learning algorithms to use billions of features \ properties. The rest of the talk was questions and answers, mostly regarding the theoretical aspects of deep learning, future challenges, etc. For me, one of the most important things he said was “innovation is a result of team work”.
Some known applications of deep learning are speech recognition, image processing, etc.
In the end he suggested taking the Stanford deep learning tutorial – http://deeplearning.stanford.edu/tutorial/
T.R – there are currently 2 python packages I know which deal with deep learning –
The next talk was given by Andreas Mueller. You can find his slides here.
Mueller introduced structured prediction, which is a natural extension or generalization of regression problems to a structured output rather than just a number \ class. Structured learning has an advantage over other supervised learning algorithms as it can learn several properties at once and use the correlations between those properties.
Example – customer data with several properties per customer – gender, marital status, has children, owns a car, etc. One can guess that the married and has-children properties are highly correlated and that learning those properties together gives a better chance of good results. It is better than LDA in the sense that it has fewer classes (not every combination of the variables is a class) and it requires less training data.
Other examples include pixel classification (classifying each pixel as belonging to an object in the image), OCR, etc.
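To illustrate why predicting the properties together can help, here is a toy sketch of the customer example. It does not use PyStruct and it does not learn anything: the per-property scores and the single "these two properties tend to agree" bonus are hand-picked assumptions, and the joint prediction is a brute-force search over all combinations.

```python
from itertools import product

LABELS = ("married", "has_children", "owns_car")


def predict_independent(unary):
    """Decide each property on its own: 1 if its score is positive."""
    return tuple(int(unary[name] > 0) for name in LABELS)


def joint_score(assignment, unary, pairwise):
    """Score a full assignment: per-property scores plus agreement bonuses."""
    values = dict(zip(LABELS, assignment))
    score = sum(unary[name] * values[name] for name in LABELS)
    for (first, second), bonus in pairwise.items():
        if values[first] == values[second]:
            score += bonus
    return score


def predict_joint(unary, pairwise):
    """Brute-force search over all 2^n assignments for the best joint score."""
    candidates = product((0, 1), repeat=len(LABELS))
    return max(candidates, key=lambda y: joint_score(y, unary, pairwise))


# Made-up scores: strong evidence for "married", weak evidence against
# "has_children", and a bonus when those two properties agree.
unary = {"married": 2.0, "has_children": -0.5, "owns_car": -1.0}
pairwise = {("married", "has_children"): 1.0}

print(predict_independent(unary))      # (1, 0, 0)
print(predict_joint(unary, pairwise))  # (1, 1, 0)
```

The independent prediction drops "has children" because its own evidence is weakly negative, while the joint prediction keeps it because it goes together with the strong "married" evidence. A real structured model would learn both kinds of scores from data and use a smarter search than trying every combination.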
He then talked about PyStruct – a Python package for structured prediction. There is actually not much to add beyond what is written in the documentation.
5 interesting things (14/6/2014)
My little helper – search engine for code examples and use cases. I haven’t yet tried that “live” when I needed something, but I hope I will remember it next time.
Httpie and Percol – two unrelated tools, but I see a lot of similarity between them as they try to change the common way we do things on the command line. HTTPie tries to make curl-style requests more human-understandable, and percol tries to make filtering through pipes more interactive. Reminds me a bit of editing in Sublime.
https://github.com/jakubroztocil/httpie
https://github.com/mooz/percol
Python 3 is good for you – A bit long but very interesting. An overview of 10 features which are new in Python 3. There have recently been a lot of posts around the web discussing whether Python 3 is better than Python 2.x, whether Python 3 should be rolled back and buried forever, etc. This is one of the most informative posts I have read (although it could be summarized and made shorter). I think that one of the main reasons organizations don’t currently move to Python 3, besides the fact that people and organizations don’t love change, is that it is an expensive process (mainly compatibility), and even this post does not succeed in convincing them of its added value.
http://asmeurer.github.io/python3-presentation/slides.html#1
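For a quick taste, here is a short sketch of three things that work in Python 3 but are a syntax error in Python 2. I picked them myself as examples I am sure about, so they are not necessarily the ones the slides focus on, and the function names are made up.

```python
def connect(host, *, timeout=10, retries=3):
    """Arguments after the bare * can only be passed by keyword."""
    return (host, timeout, retries)


def parse_port(text):
    """Raise a higher-level error while keeping the original one as its cause."""
    try:
        return int(text)
    except ValueError as exc:
        raise RuntimeError("bad port: %r" % text) from exc


# Extended iterable unpacking: the starred name collects the leftovers.
first, *middle, last = [1, 2, 3, 4, 5]

print(connect("db", timeout=5))
print(first, middle, last)
```

Calling `connect("db", 5)` raises a `TypeError`, which is the point of keyword-only arguments: the call site has to say what the 5 means.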
Are we humans or are we dancers, sorry computers –
http://wired.com/2014/01/how-to-hack-okcupid/all
Kernel tricks – a very clear post about the kernel trick which also clarifies additional machine learning terminology, and the examples are very good. I would say that this is a very good post for beginner to intermediate level in Machine Learning. Going the extra mile would be writing something similar about PCA, as it has a lot of similar ideas.
http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html
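The core idea can be checked in a few lines. This is a minimal sketch, not taken from the linked post: for 2-D inputs, the degree-2 polynomial kernel gives the same number as a dot product in an explicit 3-D feature space, without ever building that space. The function names and the sample points are my own.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit feature map for 2-D inputs matching poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z))                    # 121.0
print(dot(feature_map(x), feature_map(z)))  # the same value, up to floating point rounding
```

The kernel needs one 2-D dot product, while the explicit route needs the mapped vectors first. For higher degrees and dimensions the mapped space grows quickly, which is why working with the kernel alone is attractive.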