Graph Databases, a little connected tour

EuroPython talk by: Francisco Fernández Castaño
 
 
The talk was classified as novice, and so it was: very basic graph database ideas.
 
Castaño started by presenting the general idea of graph databases and showing some use cases.
 
One of the best-known use cases is social media data, e.g. friends-of-friends queries.
 
Then he presented Neo4j, a graph database written in Java (though it was originally written in Python). Neo4j is known to be very scalable and to support ACID transactions. Another nice property of Neo4j is the ability to extend the given REST API to your own needs.
 
The next part of the talk focused on the Cypher language, which is the way to query a Neo4j database. Neo4j has some nice UI features, and I really missed a reference or example of those in this talk.
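Cypher itself stayed fairly abstract in the talk, so here is a minimal friends-of-friends sketch of my own (not from the talk), run from Python with the official neo4j driver; the Person label, FRIEND relationship type, and credentials are all assumptions:

    from neo4j import GraphDatabase

    # assumed setup: a local Neo4j instance with Person nodes and FRIEND relationships
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    FRIENDS_OF_FRIENDS = """
    MATCH (me:Person {name: $name})-[:FRIEND]-()-[:FRIEND]-(fof)
    WHERE fof <> me
    RETURN DISTINCT fof.name AS name
    """

    with driver.session() as session:
        for record in session.run(FRIENDS_OF_FRIENDS, name="Alice"):
            print(record["name"])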
 
Last but not least, a tip given in the talk: you can try Neo4j without having to install it on your machine at http://www.graphenedb.com/ (free sandbox of up to 1k nodes and 10k edges).

The Shogun Machine Learning Toolbox

EuroPython talk by: Heiko Strathmann, herrstrathmann.de
 
 
Strathmann presented the Shogun toolbox. Shogun is a machine learning toolbox which implements most of the common supervised and unsupervised algorithms. Shogun is implemented in C++ and interfaces with Python as well as with Java, Ruby, Matlab, Lua, etc. Shogun is meant to run on a single machine and is not distributed.
 
Shogun is an open-source project started in 2004. It has 8 core developers plus around 20 contributors and is fairly active. It also collaborates with Google Summer of Code: 29 projects so far, 8 ongoing.
 
The C++ core may be the most significant advantage of Shogun over SciPy: it is simply faster and enables more efficient memory allocation and runtime optimization, as well as multi-language support. This allows training and classifying on a larger number of samples and in higher dimensions.
 
The bindings/interfaces to other languages are generated using SWIG – http://www.swig.org
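As a taste of what the SWIG-generated Python interface looks like, here is a minimal sketch of my own for training a kernel SVM; the class names follow the 2014-era modular interface (modshogun) and may differ in other releases:

    import numpy as np
    from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM

    X = np.random.randn(2, 100)                         # one example per column
    y = np.where(np.random.randn(100) > 0, 1.0, -1.0)   # labels in {-1, +1}

    features = RealFeatures(X)
    labels = BinaryLabels(y)
    kernel = GaussianKernel(features, features, 1.0)    # Gaussian kernel, width 1.0

    svm = LibSVM(1.0, kernel, labels)                   # regularization C, kernel, labels
    svm.train()
    predictions = svm.apply(features).get_labels()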

Building, bug tracing, and deployment are done using http://buildbot.net/.

 
Summarizing – especially when scaling up, Shogun seems like a good alternative to SciPy (competition makes both products better). The Shogun site offers some tutorials, including notebooks and demos.

Log everything with logstash and elasticsearch

EuroPython talk by: Peter Hoffmann, @peterhoffmann

 
A talk by Peter Hoffmann from Blue Yonder, one of the sponsors of the conference.

 

Another very good talk by an experienced speaker. However, the name is kind of misleading. Yes, logstash and elasticsearch were mentioned, but the main concept of the talk was really the logging chain and centralized logging; logstash and elasticsearch are just two tools along this chain, and there are alternatives at every step of it.

This talk gave me a lot to think about regarding logging at the company level and how logs should be run, monitored, etc. There is also some tension to resolve/define around what is logged and what error messages/outputs an app should provide (“logging best practices”).
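To make the "chain" idea concrete, here is a small sketch of my own (not from the talk) of its first link: emitting structured JSON logs with the stdlib, one object per line, which a shipper such as logstash can then collect:

    import json
    import logging

    class JsonFormatter(logging.Formatter):
        """Emit one JSON object per line -- easy for a log shipper to parse."""
        def format(self, record):
            return json.dumps({
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("user %s logged in", "alice")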

The video is already available on the EuroPython site, and I hope the slides will be available soon too.

Full Stack Python

EuroPython talk by: Matt Makai, @mattmakai 

 
I expected this talk to be full of buzzwords and it was… but in a good sense.

 
Makai is the builder of the Full Stack Python site, and as such he spoke about what you need from the moment you have an idea for a Python web app until you deploy it, including all the essential steps – WSGI server, hosting, logging, etc.
 
Every layer in the site includes relevant links and tutorials. A good starting point for a Python web-app developer.
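As a taste of the lowest layer covered, here is the canonical minimal WSGI application (my example, not from the talk); any WSGI server can host it, e.g. gunicorn hello:app:

    # hello.py -- serve with e.g.: gunicorn hello:app
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from a bare WSGI app\n"]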
 
The talk was interesting due to Makai's enthusiasm for teaching and sharing his knowledge (and of course for promoting his site, which is legit as well) and his professional expertise. There are really few people who can speak so fluently for 25 minutes.
 

5 interesting things (18/7/2014)

A/B testing – Needless to say, A/B testing is a very hot buzzword in the industry. A/B tests were used long ago in psychology, but today they are much more accessible and easier to set up. The following series of posts (one is still not published) describes 5 simple guidelines for A/B testing (a toy significance-test sketch follows the link).

http://sl8r000.github.io/ab_testing_statistics/
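At the heart of most A/B write-ups is a simple significance test; here is a toy two-proportion z-test of my own (not from the posts), using only the stdlib:

    from math import erf, sqrt

    def two_proportion_ztest(conversions_a, n_a, conversions_b, n_b):
        """Two-sided z-test for the difference between two conversion rates."""
        p_a, p_b = conversions_a / n_a, conversions_b / n_b
        p_pool = (conversions_a + conversions_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tail, two-sided
        return z, p_value

    # variant B converts 24% vs. 20% for A over 1000 users each -> p ~ 0.03
    print(two_proportion_ztest(200, 1000, 240, 1000))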

Shellify – a decorator which turns Python modules into shells. I don't see everyday usage for it, maybe during development, but it has a high coolness factor, which is important as well (a stdlib sketch of the general idea follows the link).

https://bitbucket.org/johannestaas/shellify
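I haven't dug into shellify's actual API, so this is not it; the stdlib cmd module gives a feel for the general idea of exposing Python functions as an interactive shell:

    import cmd

    class ToolShell(cmd.Cmd):
        """An interactive shell exposing a couple of functions as commands."""
        prompt = "(tool) "

        def do_greet(self, arg):
            """greet NAME -- say hello to NAME."""
            print("hello, %s" % (arg or "world"))

        def do_exit(self, arg):
            """exit -- leave the shell."""
            return True

    if __name__ == "__main__":
        ToolShell().cmdloop()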

Code review without your glasses – I like this post because, in my humble opinion, it is very creative, out-of-the-box thinking. It is also a good reminder of what to notice in code review.

http://robertheaton.com/2014/06/20/code-review-without-your-eyes/

Sidekick – the last link reminded me of rubber duck debugging, and along the way I found this – http://rubberduckreview.com/

Deployment academy – a series of posts on the Rainforest blog. This time, a post about “zero downtime database migrations”. The post is simple and easy to understand and addresses a common issue in the life of a developer. Of course, practices and specific problems differ from one organization to another, but the core ideas hold as-is (see the sketch after the link).

https://blog.rainforestqa.com/2014-06-27-zero-downtime-database-migrations/
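The gist, as I read it, is the expand/backfill/contract pattern; the sketch below is my own summary, with a hypothetical db connection object and MySQL-flavored SQL (UPDATE ... LIMIT), so adapt it to your driver and database:

    # `db` is a hypothetical DB-API-style connection object (assumption, not real API)

    # 1. Expand: add the new column as nullable, so currently-running code keeps working.
    db.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")

    # 2. Backfill in small batches to avoid long-held locks.
    while db.execute(
        "UPDATE users SET email_normalized = LOWER(email) "
        "WHERE email_normalized IS NULL LIMIT 1000"
    ).rowcount:
        pass

    # 3. Deploy application code that reads and writes only the new column.
    # 4. Contract: once nothing touches the old shape, enforce constraints / drop leftovers.
    db.execute("ALTER TABLE users MODIFY email_normalized TEXT NOT NULL")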

Awesome *

Are we going back?

In the past week I have bumped into two GitHub repositories, awesome-python and awesome-sysadmin. Both repositories do a great job of composing an interesting list/index of relevant tools.

However, this made me feel that we are going backwards. Those indices reminded me of the pre-search-engine days, or shall I say the BME days (before the modern era ;). Specifically, this reminded me of AltaVista (but also other index sites – do you recall Lycos?) where all the links were indexed under categories and subcategories, and one had to dig through those categories to find what one was looking for.

Aren’t today's search engines strong enough to answer the query “python machine learning package”? Is human indexing really our last resort? I don't really think so. I believe in the power of the humans behind the machines and their ability to build good enough search engines. Those indices can be very useful, but I would rather have them built automatically and not manually.

5 interesting things (23/6/2014)

Celery best practices – the Python celery package is on my “todo list” for when I have a relevant task. However, some of the thoughts and ideas discussed in this post apply to software engineering more generally (a small sketch follows the link).

https://denibertovic.com/posts/celery-best-practices/
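One practice worth making concrete is retrying flaky tasks with a delay rather than failing outright; a minimal sketch of my own (not the post's; deliver_report is a hypothetical helper):

    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task(bind=True, max_retries=3)
    def send_report(self, address):
        try:
            deliver_report(address)  # hypothetical helper doing the actual work
        except ConnectionError as exc:
            # retry later with a delay instead of failing the task immediately
            raise self.retry(exc=exc, countdown=60)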

Keep calm and learn d3.js – d3.js is also something which is on my “todo list”. I have used it sporadically here and there but would like to use it better (and also to gain deeper experience with JS). In any case, this is a very nice tutorial to get started with d3.js.

http://slides.com/kentenglish/getting-started-with-d3-js-using-the-toronto-parking-ticket-data

DBSCAN well explained – unsupervised learning sometimes feels like no man's land, and k-means is almost always the choice when picking a clustering algorithm. This post not only does a great job of explaining the algorithm itself, it also gives great examples and shows how to adjust the parameters (a quick scikit-learn run follows the link).

http://cjauvin.blogspot.ca/2014/06/dbscan-blues.html
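For a quick hands-on feel, a minimal scikit-learn run (my example, not the post's): the two knobs to tune are eps and min_samples, and points labeled -1 are treated as noise:

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    # two interleaved half-moons -- a shape k-means handles badly
    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # eps bounds the neighborhood radius; min_samples sets the density threshold
    labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
    print(set(labels))  # cluster ids, plus -1 for noise points if any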

Machine learning in Airbnb – this post is a case study of machine learning as it is used in Airbnb. I like reading such posts because it is interesting to learn about the challenges other organizations face, about their solutions, and about how they use existing tools and packages and adjust and optimize them to their needs (in this post – scikit-learn and rewriting an R export function in C++).

http://nerds.airbnb.com/architecting-machine-learning-system-risk/

Current events – World Cup predictions – as I love both sports and machine learning, statistics, and so forth, I have been following the several posts which try to employ ML techniques to predict World Cup results. The most interesting post I have read so far is Nate Silver's. It takes many features into account and explains them clearly.

http://fivethirtyeight.com/features/its-brazils-world-cup-to-lose/

Andrew Ng and PyStruct meetup

Yesterday I attended the “Andrew Ng and pyStruct” meetup. 

http://www.meetup.com/berlin-machine-learning/events/179264562/

I was lucky enough to get a place at the meetup thanks to the Germany–Portugal game that happened at the same time 🙂

The first part, by Andrew Ng, was a video meetup joining 3 locations – Paris, Zurich, and Berlin. Andrew Ng is a co-founder of Coursera and a machine learning guru. He teaches the ML course on Coursera, which is one of its most popular courses (I took it myself, and it is a very good and structured introduction to machine learning; a new session started yesterday). He teaches at Stanford, and soon he will be leaving for Baidu Research.

The talk included a 15–20 minute introduction to deep learning: recent results, applications, and challenges. He mainly focused on scaling up deep learning algorithms to use billions of features/properties. The rest of the talk was Q&A, mostly regarding the theoretical aspects of deep learning, future challenges, etc. For me, one of the most important things he said was “innovation is a result of team work”.

Some known applications of deep learning are speech recognition, image processing, etc.

In the end he suggested taking Stanford deep learning tutorial – http://deeplearning.stanford.edu/tutorial/

T.R – there are currently 2 Python packages I know of which deal with deep learning – 

 The next talk was given by Andreas Mueller. You can find his slides here.

Mueller introduced structured prediction, which is a natural extension or generalization of regression problems to a structured output rather than just a number/class. Structured learning has an advantage over other supervised learning algorithms in that it can learn several properties at once and use the correlations between those properties.

Example – customer data, with several properties per customer: gender, marital status, has children, owns a car, etc. One can guess that the married and has-children properties are highly correlated, and that learning those properties together gives a better chance of good results. It is better than LDA in the sense that it has fewer classes (not every combination of the variables is a class) and it requires less training data.

Other examples include pixel classification – classifying each pixel to an object in the image – OCR, etc.

He then talked about PyStruct – a Python package for structured prediction. Actually, there is not much to add that is not written in the docs.
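For a taste, here is a minimal sketch along the lines of the OCR chain-CRF example in the PyStruct docs, as I recall it (double-check the names against the package itself):

    from pystruct.datasets import load_letters
    from pystruct.learners import FrankWolfeSSVM
    from pystruct.models import ChainCRF

    letters = load_letters()
    X, y, folds = letters["data"], letters["labels"], letters["folds"]
    X_train = [x for x, f in zip(X, folds) if f == 1]
    y_train = [t for t, f in zip(y, folds) if f == 1]

    # each sample is a word, i.e. a chain of letter images; the chain CRF also
    # learns transition weights between neighboring letters
    ssvm = FrankWolfeSSVM(model=ChainCRF(), C=0.1, max_iter=10)
    ssvm.fit(X_train, y_train)
    print(ssvm.score(X_train, y_train))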

5 interesting things (14/6/2014)

My little helper – a search engine for code examples and use cases. I haven't yet tried it “live” when I needed something, but I hope I will remember it next time.

https://sourcegraph.com/

Httpie and Percol – two unrelated tools, but I see a lot of similarity between them, as both try to change the common way we do things on the command line. Httpie tries to make curl-style requests more human-understandable, and percol tries to make filtering via pipes more interactive. Reminds me a bit of editing in Sublime.

https://github.com/jakubroztocil/httpie

https://github.com/mooz/percol

 

Python 3 is good for you – a bit long but very interesting overview of 10 features which are new in Python 3 (a few samples after the link). There have recently been a lot of posts around the web discussing whether Python 3 is better than Python 2.x, whether Python 3 should be rolled back and buried forever, etc. This is one of the most informative posts I have read (although it could be shorter and better summarized). I think one of the main reasons organizations don't currently move to Python 3, besides the fact that people and organizations don't love change, is that it is an expensive process (mainly compatibility-wise), and even this post does not succeed in making a convincing case for the added value.

http://asmeurer.github.io/python3-presentation/slides.html#1
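For a taste, a few of the Python-3-only conveniences of the kind the post covers (my picks, not necessarily its exact list):

    # extended iterable unpacking
    first, *rest = [1, 2, 3, 4]

    # keyword-only arguments
    def connect(host, *, timeout=10):
        return host, timeout

    # delegating generators with `yield from`
    def chained(*iterables):
        for iterable in iterables:
            yield from iterable

    print(first, rest, connect("db", timeout=3), list(chained("ab", "cd")))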

Are we humans or are we dancers… sorry, computers

http://wired.com/2014/01/how-to-hack-okcupid/all

Kernel tricks – a very clear post about the kernel trick, which also clarifies additional machine learning terminology, and the examples are very good. I would say this is a very good post for beginners-to-intermediates in machine learning (a worked toy example follows the link). Going the extra mile would be writing something similar about PCA, as it has a lot of similar ideas.

http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html
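The core of the trick in a few lines (my own toy example, not from the post): for the degree-2 polynomial kernel, a cheap computation in the original space equals the inner product in the lifted feature space.

    import numpy as np

    x = np.array([1.0, 2.0])
    z = np.array([0.5, -1.0])

    def phi(v):
        """Explicit degree-2 polynomial feature map for 2-d input."""
        return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

    # the trick: k(x, z) = (x . z)^2 matches the lifted inner product
    # without ever computing phi explicitly
    assert np.isclose(np.dot(phi(x), phi(z)), np.dot(x, z) ** 2)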