5 interesting things (30/07/2015)

Python neural network in 11 lines – the name is a bit misleading, but it is a nice DIY initiative.
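
For reference, the core of such a tiny network fits in a few lines of NumPy. The sketch below is my own and is not the linked post's code. It is a two-layer network with sigmoid activations, trained by plain full-batch backpropagation on a toy dataset. The dataset, hidden layer size, seed and iteration count are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(X, y, hidden=4, iterations=60000, seed=1):
    """Train a 2-layer network with full-batch gradient descent (learning rate 1)."""
    np.random.seed(seed)
    # Weights initialised uniformly in [-1, 1)
    w0 = 2 * np.random.random((X.shape[1], hidden)) - 1
    w1 = 2 * np.random.random((hidden, 1)) - 1
    for _ in range(iterations):
        # Forward pass
        l1 = sigmoid(X.dot(w0))
        l2 = sigmoid(l1.dot(w1))
        # Backward pass: the error is scaled by the sigmoid derivative s * (1 - s)
        l2_delta = (y - l2) * l2 * (1 - l2)
        l1_delta = l2_delta.dot(w1.T) * l1 * (1 - l1)
        # Weight update (the sign is + because the deltas use y - l2)
        w1 += l1.T.dot(l2_delta)
        w0 += X.T.dot(l1_delta)
    return w0, w1

def predict(X, w0, w1):
    return sigmoid(sigmoid(X.dot(w0)).dot(w1))

# Toy data: the target is the XOR of the first two columns; the third column acts as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T
w0, w1 = train(X, y)
print(np.round(predict(X, w0, w1), 2))
```

The hidden layer is what makes this work. A single layer cannot separate XOR, so the "interesting" part of these short examples is the backward pass through `l1`.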

Python or R – The bottom line for me is that both are tools, and you should choose the tool that fits the task. I personally feel more comfortable with Python, and I am really glad that my ideas, prototypes and thoughts can move to production using almost the same code, without requiring additional work.

Python, Ruby and Golang web framework comparison – I think that for most people and projects this comparison is not relevant, as either they are experts in (or already using) a specific language or the language is dictated by the project environment. But for the cases where it is relevant (switching languages, learning a new language, building a POC), it is nice to have a comparison with frameworks you know and feel comfortable with.

Pandas and Apache Spark data frames differences – Both Pandas and Apache Spark are very common tools for data scientists using Python. It is therefore very important to know the differences between the data frames in the two tools, as one may assume they behave the same.

Anti patterns – at first glance I thought it was a very weird idea, but after going over several examples I found it actually interesting and enlightening. Some are things I do automatically and haven’t thought about for a long time, some are matters of style, etc. Interesting, although I was skeptical in the beginning 🙂
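
A classic example of the kind of thing such collections cover is using a mutable default argument. This particular example is my own pick and may not be in the linked list. The default list is created once, when the function is defined, so every call that relies on the default shares it:

```python
def append_item_bad(item, items=[]):
    # Anti-pattern: the same default list object is reused on every call
    items.append(item)
    return items

def append_item(item, items=None):
    # Idiomatic fix: use None as a sentinel and create a fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items

first = append_item_bad(1)
second = append_item_bad(2)
print(second)  # [1, 2]: the item from the first call leaked into the second
print(append_item(1), append_item(2))  # [1] [2]
```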

5 interesting things (23/07/2015)

Diversity in tech? This post describes a black woman’s experience in the Stanford Computer Science major. As part of the TOA events, I went to Zalando’s diversity in tech panel last week, so I have been thinking about this subject for a while, both from my own situation (a foreign woman in tech who does not speak the local language) and from a wider point of view. Having women (and any other under-represented population) in universities is a necessary, but not sufficient, condition for diversity in tech.

Invitation to Scala – this post tries to make Scala less intimidating for beginners. May the force be with him (and with me).

Pyxley – Python-powered dashboards. I’m always excited about visualization tools. It is built with Pandas data frames in mind and therefore should be relatively intuitive for data scientists who use Python.

Cloudera Ibis – Cloudera reveals the Ibis project, which aims to give Python end-to-end pipelines, especially for data scientists, within the well-known PyLab ecosystem (pandas, scikit-learn, scipy, etc.).

Clustering check-ins with Spark and Cassandra – the title is self-explanatory: loading check-in data into Cassandra, analyzing it with Spark and visualizing it with Zeppelin. All in all, a reasonable data product pipeline, put together beautifully.
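
The clustering step itself is easy to lose among all the infrastructure. As an illustration only, here is a plain-Python k-means over (latitude, longitude) pairs. The linked post does this with Spark, not with the code below. Treating coordinates as points on a plane and seeding the centroids with the first k points are simplifying assumptions. For real check-in data you would want a proper geographic distance and a better initialisation such as k-means++.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on 2-D points given as (x, y) tuples."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

# Hypothetical check-ins: three around Berlin, three around Munich
points = [(52.52, 13.40), (52.53, 13.41), (52.51, 13.39),
          (48.14, 11.58), (48.13, 11.57), (48.15, 11.59)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```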

 

Database debate

As part of the Berlin Tech Open Air events, I went to the “Database debate” at DC Media networks. The two sides of the debate were Simon Willnauer, lead engineer of Elasticsearch and co-founder of Elastic, and Carter Page, technical lead for Google Bigtable.

The talk was hosted by a DC Media networks employee and was moderated really well, with nice questions and interactions between Willnauer and Page. However, it was not a “Database debate”, for at least two reasons. First, Elasticsearch currently does not brand itself as a database and, as Willnauer admitted, is not mature enough. Second, it was not a debate: there were no pros and cons and no one arguing against the other. Rather, there were two solutions to different problems, which both somehow relate to the buzzword “big data”.

Some expected questions were asked (use cases, road map and future features, bug fixes, comparison to other solutions), but also less expected ones: what would you do differently if rebuilding the product, common pitfalls for beginners, and some more technical deep-dive questions from the audience.

One of the nice questions asked by the host was “What is the weirdest use of your product you have seen?”. Willnauer answered: “playing chess with it using near neighbor to compute the next step”. Following the Call me maybe project, I thought of a “Call me checkmate” project: playing chess using different databases.

Overall, a very nice and chill atmosphere, although I’m not sure why it is called open air.


5 interesting things (12/07/2015)

Code management tools by AWS – at the last re:Invent event (November 2014), Amazon said that this year they were going to focus on new tools for code management and deployment. Now they have revealed those tools:

https://aws.amazon.com/blogs/aws/code-management-and-deployment/

Mail received from the closest oak tree – I find this blog post and the whole process charming. The city of Melbourne created a technology interface to get citizens more involved in city life, and exciting things happened. I find these interactions between everyday life and the public sphere one of the most fascinating challenges of the coming years: making the public sphere more accessible, smart and open.

http://www.citylab.com/tech/2015/07/when-you-give-a-tree-an-email-address/398219/

Toyplot – another Python plotting library. It seems to work natively with NumPy. It is still quite young (the number of supported chart types is limited), but I’m sure it will become more mature in the near future. Besides nice, interactive charts that I can configure to my needs (axes, legend, colors, scale, exporting or embedding visualizations, etc.), what I look for in a good plotting package is that it covers all the different types of charts I need. I don’t want to start juggling several packages, each for a different chart type. At least in this area there is always room to grow: heat maps, geo-spatial maps, 3D, etc.

http://toyplot.readthedocs.org/

Python design patterns – I’m Tom and I’m lazy, I admit it. I think I said it before, but IMHO a good software developer is a lazy one: one who automates what she can, uses existing tools and packages when available and reuses her own code. This is the main purpose of design patterns: to solve common problems and provide best practices. They also create a common language so that different developers and stakeholders can communicate. This GitHub repository collects design pattern implementations in Python.
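
As a small taste, here is the Strategy pattern in Python. This is my own minimal sketch, not code from the repository, and the `Order` class and discount functions are hypothetical. Because functions are first-class objects in Python, a strategy can simply be a callable passed in, with no class hierarchy needed:

```python
from dataclasses import dataclass
from typing import Callable

# Each strategy is a plain function: price -> discounted price
def no_discount(price: float) -> float:
    return price

def ten_percent_off(price: float) -> float:
    return price * 0.9

def five_off_over_fifty(price: float) -> float:
    return price - 5 if price > 50 else price

@dataclass
class Order:
    price: float
    discount: Callable[[float], float] = no_discount  # the pluggable strategy

    def total(self) -> float:
        return self.discount(self.price)

for strategy in (no_discount, ten_percent_off, five_off_over_fifty):
    print(strategy.__name__, Order(100, strategy).total())
```

The design choice is that `Order` never needs to know which discount rule it runs. A new rule is just a new function, which is the "open for extension" idea behind the pattern.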

https://github.com/faif/python-patterns

Mining Twitter data with Python – a seven-post series by Marco Bonzanini. It goes through the entire process, starting with getting a Twitter access token and ending with data visualization using d3.js and sentiment analysis.
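
One of the first practical hurdles in this kind of pipeline is tokenizing tweets, where @-mentions, #hashtags, URLs and emoticons should survive as single tokens. Below is a minimal regex-based sketch of the idea. It is my own simplified version, not the series’ exact code, and its emoticon pattern covers only a handful of cases:

```python
import re

# Alternatives are tried left to right, so the more specific patterns come first
TOKEN_RE = re.compile(r"""
    https?://\S+          # URLs
  | @\w+                  # @-mentions
  | \#\w+                 # hashtags (the # is escaped because of re.VERBOSE)
  | [:;=]-?[)(DPp]        # a few simple emoticons such as :) ;-) :D
  | \w+(?:'\w+)?          # words, optionally with an apostrophe (don't)
  | \S                    # any other single non-space character
""", re.VERBOSE)

def tokenize(text):
    """Split a tweet into tokens, keeping mentions, hashtags, URLs and emoticons whole."""
    return TOKEN_RE.findall(text)

print(tokenize("RT @marcobonzanini: just an example! :D http://example.com #NLP"))
```

A limitation of `\S+` for URLs is that it swallows trailing punctuation such as a closing parenthesis. That is usually acceptable for counting hashtags and terms, but worth knowing.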

http://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

5 interesting things (10/07/2015)

Three Useful Python Libraries for Startups – tl;dr: this post suggests Whitenoise, Phonenumbers and Pdfkit as 3 important packages for startups, with requests and python-dateutil as runners-up. IMHO, those are very strange choices. I agree with choosing requests, as a package that simplifies HTTP requests and is important to infrastructure. I expected the other packages to relate to infrastructure as well. Libraries I would think of are django and boto, and possibly also numpy and/or pandas for very common statistics and analysis use cases.

http://blog.instavest.com/three-useful-python-libraries-for-startups

Trending @ Instagram – I used to work on a problem very similar to this one and faced almost the same challenges: ranking, scoring and grouping. It is always interesting to see how different people approach a problem that I know intimately.
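
A common way to approach trending detection is to compare what is happening now with what you would expect from history, favoring large relative jumps over large absolute volume. The scoring function below is a generic sketch of that idea: a KL-divergence-style term with a pseudo-count to avoid dividing by zero. It is not the formula from the Instagram post, and the function name, smoothing constant and sample counts are assumptions:

```python
import math

def trending_score(current: int, history: list[int], alpha: float = 1.0) -> float:
    """Score how surprising the current count is compared with the historical mean.

    `alpha` is a pseudo-count so that items with no history do not divide by zero.
    """
    expected = sum(history) / len(history) + alpha if history else alpha
    if current == 0:
        return 0.0
    # current * log(current / expected) grows with both volume and relative jump,
    # and is negative when the item is below its usual level
    return current * math.log(current / expected)

# Hypothetical hourly counts: (current hour, previous 24 hours)
counts = {
    "steady_popular": (1000, [1000] * 24),  # always busy, nothing new
    "sudden_spike": (50, [1] * 24),         # normally quiet, now much busier
}
ranking = sorted(counts, key=lambda k: trending_score(*counts[k]), reverse=True)
print(ranking)
```

The multiplication by `current` is a deliberate trade-off. Without it, an item jumping from 0 to 2 would outrank a genuine surge, so the score balances novelty against volume.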

http://instagram-engineering.tumblr.com/post/122961624217/trending-at-instagram

Git from the inside out – version control is a very important tool in everyday life, so it is nice to look under the hood of one implementation of it.
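
One detail that makes Git’s internals click is that every object is content-addressed. For a file (a “blob”), the object ID in a default SHA-1 repository is the SHA-1 of a short header followed by the file contents. The header is `blob <size in bytes>` followed by a NUL byte. The small Python sketch below reproduces what `git hash-object` prints for a file’s contents. It covers blobs only; trees and commits are hashed the same way but with their own headers and body formats.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a, as echo 'hello' | git hash-object --stdin
```

Because the ID depends only on content, two identical files anywhere in the history are stored once. This is also why renaming a file does not create a new blob.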

https://codewords.recurse.com/issues/two/git-from-the-inside-out

Document clustering with Python – a simple, clear how-to guide that lightly explains the theory, examines several clustering algorithms and sums up with visualizations.
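
A usual first step in this kind of pipeline is turning each document into a TF-IDF vector and comparing documents with cosine similarity. Clustering algorithms such as k-means then work on those vectors. Below is a dependency-free sketch of that step. Whitespace tokenization and the plain log(N/df) IDF variant are simplifying assumptions; libraries such as scikit-learn default to smoothed variants.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply today",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```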

http://nbviewer.ipython.org/github/brandomr/document_cluster/blob/master/cluster_analysis_web.ipynb

Deploying Python packages @ Nylas – I love posts like this one, which explain the real-life problem they faced, suggest several possible solutions or alternatives with their pros and cons, and show what they eventually chose and why. Especially since I believe every Python developer runs into these problems at least once (a day :))

https://nylas.com/blog/packaging-deploying-python