5 interesting things (04/06/2015)

Pitfalls when mining Wikipedia – this research was done on somewhat old data (2013), but it shows a fairly easy way to clean Wikipedia dumps. Although the authors describe how to repeat their process, I suspect the structure of some of the dumps has changed a bit since then.

An interesting question for me in this context is the relation between the content / views of the same entities across different languages. This could be calculated by joining page count results with the Wikidata dumps once those are complete. From crunching the Wikidata dumps a bit, my impression is that at the moment they are not mature enough and do not include data about all the articles in Wikipedia.

https://blog.lateral.io/2015/06/the-unknown-perils-of-mining-wikipedia/
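A minimal sketch of that join, assuming page counts have already been parsed into simplified `(lang, title, views)` records and Wikidata sitelinks into a simplified `{item_id: {lang: title}}` mapping. Both shapes, the function name `views_per_entity`, and the item id `Q_example` are hypothetical; real dumps need parsing first, and the space vs. underscore title normalization is an assumption about how the two sources spell titles.

```python
from collections import defaultdict


def normalize(title):
    # Assumption: titles in the two sources differ only in spaces vs underscores.
    return title.replace(" ", "_")


def views_per_entity(pagecounts, sitelinks):
    """Join page view counts with (simplified, hypothetical) Wikidata sitelinks.

    pagecounts: iterable of (lang, title, views) tuples
    sitelinks:  dict of item_id -> {lang: title}
    Returns {item_id: {lang: total_views}}; languages without views get 0.
    """
    # Sum views per (lang, title) so each sitelink lookup is a dict hit.
    views = defaultdict(int)
    for lang, title, count in pagecounts:
        views[(lang, normalize(title))] += count

    result = {}
    for item_id, titles in sitelinks.items():
        result[item_id] = {
            lang: views.get((lang, normalize(title)), 0)
            for lang, title in titles.items()
        }
    return result


pagecounts = [
    ("en", "Python_(programming_language)", 120),
    ("de", "Python_(Programmiersprache)", 40),
    ("en", "Python_(programming_language)", 30),
]
sitelinks = {
    "Q_example": {
        "en": "Python (programming language)",
        "de": "Python (Programmiersprache)",
        "fr": "Python (langage)",
    }
}
print(views_per_entity(pagecounts, sitelinks))
# {'Q_example': {'en': 150, 'de': 40, 'fr': 0}}
```

Keying the result by the Wikidata item rather than by title is what makes cross-language comparison possible; a language with a sitelink but no views shows up as 0, which also makes gaps in either dump easy to spot.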

Weather prediction with Amazon Machine Learning – a nice, simple way to start playing with Amazon Machine Learning –

http://arnesund.com/2015/05/31/using-amazon-machine-learning-to-predict-the-weather/

How to evaluate machine learning models – a series of 5 posts by Dato –

http://blog.dato.com/how-to-evaluate-machine-learning-models-part-1-orientation

Mean shift clustering – a clustering approach with a few advantages over common methods, e.g. you don't have to define the number of clusters in advance (but you do have to tune the bandwidth). On the other hand, it is slower than other algorithms.

http://spin.atomicobject.com/2015/05/26/mean-shift-clustering
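To make the bandwidth trade-off concrete, here is a minimal pure-Python sketch of mean shift with a flat kernel. It is my own illustration, not the code from the linked post; the function names, the flat kernel, and merging modes closer than `bandwidth / 2` are assumptions chosen for simplicity.

```python
import math


def _shift(point, data, bandwidth):
    """One mean-shift step with a flat kernel: move to the mean of all
    data points within `bandwidth` of `point`."""
    neighbours = [p for p in data if math.dist(point, p) <= bandwidth]
    if not neighbours:  # defensive guard against an empty neighbourhood
        return point
    n = len(neighbours)
    return tuple(sum(coords) / n for coords in zip(*neighbours))


def mean_shift(data, bandwidth, max_iter=300, tol=1e-6):
    """Return (centers, labels). The number of clusters is not an input;
    it is determined by `bandwidth`."""
    modes = []
    for p in data:
        x = tuple(p)
        # Climb towards the local density peak until the shift becomes tiny.
        for _ in range(max_iter):
            new_x = _shift(x, data, bandwidth)
            if math.dist(new_x, x) < tol:
                break
            x = new_x
        modes.append(x)

    # Points whose modes end up close together belong to the same cluster.
    # Merging at bandwidth / 2 is an arbitrary choice for this sketch.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = mean_shift(data, bandwidth=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that the number of clusters is never passed in; it falls out of `bandwidth` (a much larger bandwidth here would merge everything into one cluster). Each shift step scans the whole dataset, so one pass over all points costs O(n²) distance computations in this naive version, which is part of why mean shift tends to be slower than the alternatives.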

The future of Spark – a post following the Strata + Hadoop conference in London.

https://www.linkedin.com/pulse/future-apache-spark-rodrigo-rivera
