Junior Data Science — Choosing your first job

This post was published on Medium


While many people would like to become data scientists and are looking for their first position, junior data science positions are rare. Data science positions range from very research-oriented roles in companies that also publish at scientific conferences (quite rare) to more hands-on roles that involve lots of coding. (Junior) data scientists also come from diverse backgrounds: recent graduates (BSc, MSc and PhD in different fields), experienced developers who would like to learn new skills, people who are retraining, and so on.

Even though junior data science positions are rare, it is important to choose carefully and avoid common pitfalls. This post was triggered by Ori Cohen’s post “Data-Science Recruitment — Why You May Be Doing It Wrong”, which was aimed at the recruiting side. This post is for data scientists who are looking for their first job. Here are a few insights.

Don’t be the first data scientist in the company

This sounds like a very sexy position: you recently graduated from university and managed to impress a small startup with your skills. They offer you the role of first data scientist in the company, boom! You will be able to shape the methods, processes and tools the right way, just as you always envisioned!

“In theory, theory and practice are the same. In practice, they are not” (Benjamin Brewster).

Many practical tasks are not like the ones in the textbook or in Andrew Ng’s course. You will most probably need guidance and advice from an experienced data scientist who has already made her mistakes and is familiar with the data and with the product’s constraints. The skills you want to learn vary over time, but it is always a good idea to have someone around that you can learn from.

An additional issue is that small companies usually have little data, often not enough to train models, and the data quality might also be a problem. Fixing this requires changes in the product, which have to be defined and implemented. As a junior data scientist, you may find it hard to handle both the technical work and the politics that such a change requires.

How can you tell that you are interviewing for the first data science position?

  1. You will be told so explicitly: “you will be our first data scientist”.
  2. None of your interviewers is a data scientist, and the questions they ask don’t reflect a deep understanding of the topic.

People Don’t Quit Jobs — They Quit Bosses

And before quitting — people work for bosses.

Interviews are two-sided. The company interviews you, but you also interview the company. Does the product excite you? Do you think the company has the right values and is a good culture fit for you? Would you like to work for this manager?

Most likely you will work closely with your manager and teammates. Did they impress you? Would you value their feedback?

In order to learn and improve, you need a lot of feedback and communication, especially in a junior position. Are there regular 1:1s? Is there an onboarding plan? Do they participate in conferences? Is there an education budget? Does the company have the work-life balance you are looking for?

During an interview, the interviewer might want to please you, so if you ask these questions directly they might tell you what you want to hear. Talking with teammates and other co-workers in the company can give you additional insights about the team and the company.

Tools and Technologies

If you mainly focus on research you might find this point secondary. However, your next position may require hands-on experience. Be sure to choose a place that uses reasonable technologies rather than niche, esoteric ones, e.g. using assembly for machine learning or working in a mainframe environment.

A reasonable current technology stack for a data scientist includes: Python (maybe Scala, maybe R, depending on your risk aversion) with the scientific Python packages (pandas, NumPy, SciPy, etc.), a cloud environment, and some kind of database (Postgres / MySQL / Elasticsearch / MongoDB).

Last but not least — choose something you are passionate about so you will be happy to go to work in the morning and dream about your code at night 🙂


Special thanks to Liad Pollak and Idit Cohen who made this text readable

5 interesting things (23/07/2019)

Five Talks from spaCy-IRL Worth Watching – a great summary of five talks from the spaCy-IRL conference, which took place in Berlin at the beginning of July. The summaries are well balanced, not too deep and not too shallow, and they make you want to watch the talks. From the meta perspective, it is a very nice connection between academia and industry, leveraging ideas from academia to solve industry problems.

King – Man + Woman = King ? – In 2016, “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” was published and showed that the pre-trained word2vec model, which was trained on Google News articles, exhibited gender stereotypes to “a disturbing extent”. Apparently, according to “Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor”, at least some of the bias stems from optimisations and restrictions applied in order to present better results. The most significant one is that the answer to “a is to b as c is to ..?” cannot be b. This does not mean that there is no bias; it only means that it was not measured and formalised correctly. This emphasises once again the need to understand the algorithms we use and their limitations.

Bonus – Linear Digressions episode – Revisiting Biased Word Embeddings
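
To make the restriction from the word-embedding item concrete, here is a minimal sketch of an analogy query with and without it. Everything in it is an assumption made for illustration: the 3-d vectors are invented toy values (not real word2vec embeddings), and the function `analogy` and its flag `exclude_inputs` are hypothetical names. The sketch excludes all three query words when the flag is on, which is a common convention; the item above only mentions that the answer cannot be b.

```python
import math

# Toy 3-d vectors, invented for illustration only (not real word2vec values).
VECTORS = {
    "man": (1.0, 0.0, 0.1),
    "woman": (1.0, 0.0, -0.1),
    "doctor": (0.0, 1.0, 0.0),
    "nurse": (0.0, 0.6, -0.8),
    "teacher": (0.5, 0.5, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, exclude_inputs=True):
    """Answer "a is to b as c is to ?" by finding the word closest to b - a + c.

    With exclude_inputs=True the query words a, b and c can never be returned,
    which is the restriction discussed above (hypothetical flag name).
    """
    target = tuple(
        vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    banned = {a, b, c} if exclude_inputs else set()
    candidates = [word for word in VECTORS if word not in banned]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))


print(analogy("man", "doctor", "woman", exclude_inputs=True))   # nurse
print(analogy("man", "doctor", "woman", exclude_inputs=False))  # doctor
```

With these toy vectors the unrestricted query returns "doctor" itself, while the restricted query is forced to return the next-closest word. The numbers prove nothing about real embeddings; they only show how the restriction alone can change the reported answer.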

10 tips for code review – code review can be a stressful task for both the reviewer and the person whose work is being reviewed. This post is written from the reviewer’s point of view: how to make the process more efficient and constructive for both sides. A good follow-up post would be how to listen and react to code review. In my experience, code review is often a boiling point for relationships inside teams and can break teams when not done correctly.

How to label data – if you have ever done a data science project, you know that obtaining tagged data is a real hassle. You often discover that you don’t have enough data, that the tagging is not what you need, etc. This guide will help you avoid pitfalls when launching a labelling project.

Data-Science Recruitment — Why You May Be Doing It Wrong – a post by the data science team lead at Zencity about the dos and don’ts of the interviewing process for data scientists. In the last few years I have witnessed many of these flaws: asking irrelevant riddles, or giving a very long home exercise that is not well defined and comes with doubtful data. I would like to emphasise to candidates, especially junior candidates, that if you have doubts during the interview process, consider looking for another place.