Five Talks from spaCy-IRL Worth Watching – great summarisation of 5 talks from spaCy-IRL conference which took place in Berlin in the beginning of July. The summarisations are very exact – not too deep, not too shallow and makes you want to watch the talks. From the meta perspective – a very nice connection between academia and industry leveraging ideas from academia to solve industry problems.
King – Man + Woman = King ? In 2016 “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” was published and showed that the pre-trained word2vec model which was trained on Google News articles exhibited gender stereotypes to, “a disturbing extent.”. Apparently, according to “Fair is Better than Sensational:Man is to Doctor as Woman is to Doctor” at least some of the bias stems from optimisations \ restrictions done in order to present better results. Most significant one the answer to “a to b is like c to ..?” cannot be b. This does not mean that there is no bias, it only means that it was not measured and formalised correctly. This emphasises once again the need to understand the algorithms we use and their limitations.
Bonus – linear digression episode – Revisiting Biased Word Embeddings
10 tips for code review – code review can be a stressful task for both the reviewer and the person her work is being reviewed. This post is from the reviewer point of view, how to make this process more efficient and constructive to both sides. A good follow up post would be how to listen and reach to code review. From my experience, many times it is a boiling point for relationships inside teams and can break teams when not done correctly.
How to label data – if you ever did a data science project you know that obtaining tagged data is a real hassle. You often discover that you don’t have enough data, the tagging is not what you need, etc. This guide will help you avoid pitfalls when issuing a labelling project.
Data-Science Recruitment — Why You May Be Doing It Wrong – post by data science team lead in Zencity regarding do’s and don’t do in the interviewing process for data scientists. In the last few years I widnessed many of this flaws – asking non relevant riddles, given a very long home exercise, not well defined with doubtful data. I would like to emphasise for candidates, specially junior candidates, that if you have doubts during the interview process consider looking for another place.