I got a free ticket to the Distributed Matters conference by winning the challenge of the Python Big Data Analytics meetup group. The lineup was impressive (including Kyle Kingsbury of the “Call Me Maybe” project), the topics were very interesting, and the location was very nice (the first time I have heard a talk in a disco party hall) and close to […]

Read more "#dmconf15"

5 interesting things (11/9/2015)

Density-based clustering – the clearest and most practical guide I have read about density-based clustering. http://blog.dominodatalab.com/topology-and-density-based-clustering/ Word segment – this Python library, which is trained on a corpus of over a trillion words, aims to help segment text into words, e.g. “thisisatest” to “this is a test”. I tried a random example, “helloworld”, and it didn’t split it at […]

Read more "5 interesting things (11/9/2015)"

5 interesting things (05/09/2015)

Time map visualization of discrete events – a very good idea for visualizing discrete events when the general pattern matters rather than the order of events. Good for visualizing time between failures, time between user visits to a site or between user actions, etc. https://districtdatalabs.silvrback.com/time-maps-visualizing-discrete-events-across-many-timescales Cyber attacks map – so cool http://map.norsecorp.com/ Why different people […]

Read more "5 interesting things (05/09/2015)"

Apache Flink workshop

On Wednesday I took part in the “Stream Processing with Apache Flink” workshop. The workshop was hosted by Carmeq and was super generous. Apache Flink is a distributed streaming dataflow engine. There are several obvious competitors, including Apache Spark, Apache Storm and MapReduce (and possibly Apache Tez). The main question for me when coming to adopt […]

Read more "Apache Flink workshop"