Five interesting things (25/10/2015)

WhatsApp CLI – control your server using WhatsApp. The next step for me is to connect my home (heating, lights, doors) and control it via WhatsApp. To be honest, I assume some such implementations already exist.

Receipt parser – a project created as part of the Trivago Hackathon. I have thought about the need for this product many times: first for personal accounting, to keep an eye on my spending; second, for taking it a step further with alerts about things I should buy, alerts about buying things I don’t need, and alerts about cheaper prices.

Time magazine visual trends – good ideas are priceless, and when the implementation is also clean and nice and reveals interesting insights, it is even more exciting.

Google spreadsheet to ElasticSearch – to tell the truth, I could not think of a worse architecture than using a Google spreadsheet as a database (or as a proxy on the way to a DB). Having said that, Elastic released a Google spreadsheet plugin to import spreadsheet content into an ElasticSearch instance.

Snakefooding Python – snakefood is a tool for creating dependency graphs. This post shows the dependency graphs of some very common Python libraries (Flask, Django, Celery, requests, etc.). It presents the pros and cons of using snakefood, what it exposes and what it does not expose (many small files -> many imports -> complex dependency graph vs one file, spaghetti code -> no imports -> very clear, simple graph). I see it as a tool that supports development and helps detect unnecessary dependencies.
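To make the "many small files vs one file" point concrete, here is a minimal sketch of the general idea of an import-based dependency graph. It is not how snakefood itself is implemented; it only uses Python's standard `ast` module, and the module names and sources in `split_project` and `single_file` are made up for illustration.

```python
import ast
from typing import Dict, Set


def imported_modules(source: str) -> Set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            modules.add(node.module.split(".")[0])
    return modules


def dependency_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module name to the set of modules it imports."""
    return {name: imported_modules(code) for name, code in sources.items()}


def edge_count(graph: Dict[str, Set[str]]) -> int:
    """Number of import edges in the graph."""
    return sum(len(targets) for targets in graph.values())


# Hypothetical project split into several small modules.
split_project = {
    "app": "import views\nimport models\n",
    "views": "import models\nfrom utils import helper\n",
    "models": "import utils\n",
    "utils": "import os\n",
}

# Hypothetical project where everything lives in one file with no imports.
single_file = {"app": "def everything():\n    pass\n"}

if __name__ == "__main__":
    print(edge_count(dependency_graph(split_project)))
    print(edge_count(dependency_graph(single_file)))
```

The well-structured project produces the busier graph, while the single file produces an empty one, which is exactly why such a graph says nothing about the quality of the code inside each file.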


AWS loft Berlin

This week Amazon opened a loft in Berlin which is supposed to be open for 4 weeks. The loft is currently a pilot, and there are several other lofts around the world. I think it is a very good strategic call to have it in Berlin, as the startup scene is emerging and many people want to try and learn more about cloud services while their hands-on experience is sometimes limited.

So what is going on in the loft? It is open every day 10-18, and there is an Amazon employee “on duty” whom you can consult regarding AWS services. It is a very inviting work space, and there are workshops, demos, bootcamps, etc., all, of course, related to AWS services.

I took part in two workshops on Thursday morning – “An overview of Hadoop & Spark, using Amazon Elastic MapReduce” and “Processing streams of data with Amazon Kinesis (and other tools)”. Both lectures were given by Michael Hanisch, solution architect at Amazon. The first talk was a bit messy: it covered many topics and ended up jumping back and forth between general things about Hadoop, tips about EMR, changes in the AMI concepts and versioning, and hints about Spark.

The second talk was much more focused. It started by introducing the need for Amazon Kinesis, then explained the architecture (producers, streams, shards, clients), clarified the capabilities and constraints, and mentioned the Kinesis autoscaling utils. The next step was a deeper dive into the Kinesis Producer Library and the Kinesis Client Library, moving on to Kinesis Firehose (which was introduced at re:Invent last week) and integration with additional input and output sources and AWS services. The talk ended with tips and best practices. AWS Lambda was also mentioned several times during the talk as a tool for processing stream data.
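As a rough illustration of how records end up on shards, here is a minimal sketch. It relies on a detail from the Kinesis documentation rather than from the talk: each record carries a partition key, Kinesis takes the MD5 hash of that key as a 128-bit integer, and each shard owns a range of that hash space. The even split in `shard_ranges` and the sample keys are assumptions for illustration only; a real stream reports its own hash key ranges, and nothing here calls AWS.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash space evenly into inclusive (start, end) ranges.

    Assumption: an even split. Real shards may own uneven ranges.
    """
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder of the hash space.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's MD5 hash."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    # Hypothetical partition keys; the same key always lands on the same shard.
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "-> shard", shard_for_key(key, ranges))
```

Because the mapping depends only on the key, all records with the same partition key go to the same shard, which is one reason the choice of partition key matters for spreading load across shards.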

Quite an exciting time to be in Berlin.

5 interesting things (06/10/2015)

Restaurant recommendations – I have read quite a lot about recommendation systems lately, and I love this post because it deals with the restaurant domain, while many posts about recommendation systems refer to music, television and movies. Geospatial features are also much more important here compared to movie, music and television recommendations.

Data Science workflow – this blog post presents a well-structured approach to the data science process. While I loved the structured approach, the entry point – “get data, have fun with it” – is twisted. I believe that when working on a product, most of the time you want to solve a problem or introduce a new feature, i.e. you already have the question you want to answer, rather than exploring a dataset and thinking about the questions you can answer with it. I also missed a part about documenting your work.

AWS in plain English – or AWS for humans. If you are not there yet or are not familiar with the different services, this is a nice way to get introduced to the terminology.

Federer vs Djokovic – why not Nadal? It seems like a kind of abuse of the product, but it surely gives the product exposure and creates some buzz.

What to do with small data – big data is one of the buzzwords of the last few years. While it was previously whispered only by tech people, it is now a common, well-known phrase. But many companies do not really have big data; they have few users and need to perform well for those users, and one day they might have big data, many users and tons of features. Until then, there are some clues in this post.