6 interesting things (20/2/2015)

I'm making an exception, but I really came across some interesting things:

A/A testing – a well-known idea in machine learning (train, cross-validation, test) and in other research best practices, now named as a takeoff on the buzzword “A/B testing”. Still, it is well explained and can be eye-opening at the right moment.
Intro to into – easier conversion between somewhat complex data types in Python.
Reading this I was also exposed
Getting started with Spark in Python – a very clear tutorial covering all the required steps to get started. I cannot wait to find a good enough excuse to work with Spark.
Fuzzy – fast Python phonetic algorithms. Nothing new or too fancy; I just came across it this week and found it useful.
Typo Distance – finds the typo distance between two strings. It uses the QWERTY layout, but you can configure a different layout pretty easily. The algorithm is quite heavy and time-consuming, and there is some room for improvement (although it is not actively maintained). For example, adding a max parameter that stops the computation once the typo distance exceeds the allowed distance.
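
To make the max-parameter idea concrete, here is a minimal sketch of a keyboard-aware edit distance that gives up early. It is not the Typo Distance library's algorithm or API: the function names, the cost values (0.5 for neighbouring keys, 1 otherwise) and the `max_distance` parameter are all my own assumptions. The early exit relies on costs being non-negative, so the final distance can never be lower than the smallest value in the current row.

```python
# Hypothetical sketch: NOT the Typo Distance library's actual algorithm or API.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each key to its (row, column) position on the layout.
KEY_POS = {
    ch: (r, c)
    for r, row in enumerate(QWERTY_ROWS)
    for c, ch in enumerate(row)
}


def substitution_cost(a, b):
    """Return 0 for identical keys, 0.5 for neighbouring keys, 1 otherwise."""
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5  # assumed cost for hitting a neighbouring key
    return 1.0


def bounded_typo_distance(s, t, max_distance=None):
    """Weighted edit distance between s and t.

    Returns None as soon as the distance is certain to exceed max_distance.
    """
    previous = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        current = [float(i)]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1.0,                          # delete a from s
                current[j - 1] + 1.0,                       # insert b into s
                previous[j - 1] + substitution_cost(a, b),  # substitute a -> b
            ))
        # All costs are non-negative, so the final distance is at least
        # the minimum of this row; stop if even that is over the limit.
        if max_distance is not None and min(current) > max_distance:
            return None
        previous = current
    if max_distance is not None and previous[-1] > max_distance:
        return None
    return previous[-1]
```

With a small `max_distance`, most non-matching candidate pairs are rejected after a few rows instead of after filling the whole table, which is where I would expect most of the time saving to come from.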

Topy – Python script to fix typos in text, using rule sets developed by the RegExTypoFix project from Wikipedia. The basic rule set is English, but other rule sets are also available. Having tried it, I’m positive about it, but it is not mature enough yet, and I would like it to be easier to use from code rather than only as a command-line tool.

https://github.com/intgr/topy
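
To illustrate what "easier to use from code" could look like, here is a rough sketch of a function that applies regex typo rules to a string. This is not Topy's API: the `fix_typos` function and the three rules are hypothetical and only stand in for the much larger RegExTypoFix rule set.

```python
import re

# Hypothetical rules for illustration only; the real RegExTypoFix
# rule set is far larger and is not reproduced here.
RULES = [
    (re.compile(r"\bteh\b"), "the"),
    (re.compile(r"\brecieve"), "receive"),
    (re.compile(r"\boccured\b"), "occurred"),
]


def fix_typos(text, rules=RULES):
    """Apply each (pattern, replacement) rule in order and return the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(fix_typos("I recieved teh letter"))
```

Passing the rule list as a parameter would make it easy to swap in a rule set for another language.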
