Spoiler detector – Get your annotated data set for free! A nice (though not perfect) way to get an annotated data set and create some social good. It would be interesting to expand it to more movies and television shows and to other social networks. It would also be interesting to see how well a model trained on one social network performs on other social networks (both in this case and in general).
Deep dive into Python virtualenv – although it only addresses Python requirements and not more general system requirements (e.g. this case), it is a very important and common tool for making sure that all your Python requirements are in place. This post provides a deeper look into virtualenv and pyenv, a Python version management tool.
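As a small illustration of what an isolated environment changes, here is a minimal sketch that checks whether the current interpreter is running inside a virtual environment. The function name `in_virtual_environment` is made up for this example. It relies on `sys.prefix` differing from `sys.base_prefix` inside an environment, and on the `sys.real_prefix` attribute that older virtualenv releases set; treat it as a heuristic rather than a guarantee.

```python
import sys


def in_virtual_environment():
    """Best-effort check for an active virtual environment (hypothetical helper)."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # With venv and newer virtualenv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base interpreter.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter prefix:", sys.prefix)
    print("Inside a virtual environment:", in_virtual_environment())
```

Running it once with the system interpreter and once with an environment's interpreter should show the prefix changing.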
6 Lesser Known Python Data Analysis Libraries – TL;DR – mrjob, delorean, natsort, tinydb, prettytable and vincent. If I had to write the same blog post, I am not sure that those are the packages I would have chosen. mrjob is maintained, but I feel it is a bit outdated and there are better ways nowadays to run multi-step MapReduce jobs (e.g. Apache Spark). natsort – I understand the need, but personally it is easier for me to write the sort function myself and, even better, to avoid sorting as much as possible. prettytable – I prefer pandas printing over this printing. And last but not least – I am not sure that I would really categorize all those packages as “Data Analysis”.
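To make the natsort remark concrete, here is a minimal sketch of a hand-written natural sort key using only the standard library. The name `natural_key` and the sample file names are made up for this example, and it covers only the simple case of unsigned integers embedded in strings, not everything natsort handles.

```python
import re


def natural_key(text):
    """Sort key that compares digit runs as integers (hypothetical helper)."""
    # Splitting on a capturing group keeps the digit runs in the result;
    # they always land on the odd indices, with text on the even ones.
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


file_names = ["file10.txt", "file2.txt", "file1.txt"]

# Plain sorting compares character by character, so "file10" comes before "file2".
print(sorted(file_names))
# With the key, the numbers are compared by value instead.
print(sorted(file_names, key=natural_key))
```

Because text and numbers always sit in alternating positions of the key, two keys never end up comparing a string against an integer.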
Python datacleaner – would definitely be a strong candidate when writing a post such as the one above. Designed to make the process of cleaning pandas data frames quicker and easier. It will be interesting to see how this project evolves.
Deep Detect – “Open Source + Deep Learning + API + Server”. A deep learning version of PredictionIO. It is written in C++11 and uses Caffe for deep learning. It seems natural to me that PredictionIO and DeepDetect will cooperate in the future, or that someone will develop a deep learning template for PredictionIO.