5 interesting things (26/06/2019)

Checklist for debugging neural networks – a well-written troubleshooting guide for neural network models that is not language- or framework-specific!
https://towardsdatascience.com/checklist-for-debugging-neural-networks-d8b2a9434f21

Why Software Projects Take Longer Than You Think: A Statistical Model – great post about a problem we all face. Usually we try to solve it with “instrumental changes”, i.e. changing methods \ processes \ … . This post tries to show that there is more to it than just behavioural change.

Google What-If-Tool (WIT) – A nice tool by Google that was released a few months ago. The terminology is actually a bit misleading: counterfactuals here don’t carry the meaning they have in causal inference. It is more like matching with two possible distance metrics – L1 and L2.
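
To make the “matching” reading concrete, here is a minimal sketch of what such a counterfactual lookup could look like: find the closest data point that the model labels differently, under either L1 or L2 distance. This is not WIT’s code; the function names and the toy data are my own assumptions for illustration.

```python
from math import sqrt


def l1_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_counterfactual(query, query_label, points, labels, distance):
    """Return the index of the closest point whose label differs from
    query_label, or None if every point shares the query's label."""
    candidates = [i for i, label in enumerate(labels) if label != query_label]
    if not candidates:
        return None
    return min(candidates, key=lambda i: distance(query, points[i]))


# Toy data (hypothetical): two features per point, binary model output.
points = [(0.0, 0.0), (3.0, 0.0), (2.0, 2.0), (0.5, 0.5)]
labels = [0, 1, 1, 0]
query, query_label = (0.0, 0.0), 0

nearest_l1 = nearest_counterfactual(query, query_label, points, labels, l1_distance)
nearest_l2 = nearest_counterfactual(query, query_label, points, labels, l2_distance)
print(nearest_l1, nearest_l2)
```

In this toy example the two metrics pick different neighbours, which is a reminder that the “counterfactual” depends on the chosen distance and says nothing about cause and effect.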

causallib – a new Python causal inference package from IBM.
There is also a Python causal inference package from Microsoft which was released about a year ago – https://github.com/Microsoft/dowhy.

A Visual Intro to NumPy and Data Representation – What can I say, I really like Jay’s guides.

4 insights from BDHW19

This week I attended BDHW19 – Big Data in Health Care, which was hosted by the Weizmann Institute of Science in collaboration with Nature Medicine. The conference had a great lineup of speakers – leading researchers in the field from academia, industry and HMOs.
A few ideas and themes came up several times from different angles, and I would like to highlight a few of them.
(all sessions were recorded and I’ll add a link once they are online)

EHR data in Israel – By law, every Israeli resident must be registered with one of the HMOs. The HMOs in Israel are in a special position: they are run as not-for-profit organizations and are prohibited by law from denying membership to any Israeli resident. Israeli HMOs hold EHR data going back to the mid-nineties, which means that the biggest HMO (Clalit) has over 20 years of longitudinal data for 4.5m heterogeneous patients. Together with great researchers and collaboration with academia, this enables amazing research which hopefully later propagates to and influences our daily lives.
The Israeli AI Healthcare Startup Landscape of 2018 – https://www.startuphub.ai/israeli-ai-healthcare-startups-2018

Deployment of HC models – while great results and tools are developed in research, the road to deploying those models and using the new ideas is long and contains many obstacles. Only very few models have really turned into health care products – alert systems, treatment guidelines, biomarkers, personalized medicine, etc. A few of the obstacles along the way are interpretability, robust machine learning and causality. We must keep in mind that eventually our research should affect the end users – clinicians, patients, etc.

More on this –
Suchi Saria – “Tutorial: Safe and Reliable Machine Learning” from FAT* 2019.
Ziad Obermeyer – “Using machine learning to understand and improve physician decision making”.

Collaboration – many parties are making efforts in this field, and in order to get good results and to move from journals to the field we need to cooperate. We need to ask the right questions and design good RCTs or emulate them correctly. We need high-quality data (or at least to be aware of the quality of our data), so biobanks, dataset owners and researchers need to cooperate in order to get the most out of the data. In order to see whether our models generalize well we should run them on different datasets. In order to see that our models make sense from a medical perspective, clinicians must be part of the process. We need everyone on board.

More on this –
Rachel Ramoni – “Mine is Big? Ours is Bigger: Million Veteran Program and the Case for Coordinated Collaboration”
Nigam Shah – “Good machine learning for better healthcare”. See also Clinical Informatics Consult.

Causality – The C word. Causal graphs, counterfactuals, confounders, treatment effects. It was present in almost every talk, implicitly or explicitly. Naturally some studies are more causal by nature, such as “which drug is better” or “does X cause Y”, and some need to take into account causal mechanisms, identify confounding, etc. There is a shift from prediction tasks to causal tasks.
One key insight from Hernan’s tutorial – we don’t compare treatments, we compare strategies. I.e., studies in this field should move from comparing point interventions to comparing sustained treatment strategies. When moving to treatment strategies we should be aware of the treatment-confounder loop.

More on this –
Uri Shalit – “Predicting individual-level treatment effects in patients: challenges and proposed best practices”.
Miguel Hernan – “How do we learn what works? A two-step algorithm for causal inference from healthcare data” and tutorial “Comparative Effectiveness of Dynamic Treatment Strategies: The renaissance of the g-formula”.

5 interesting things (17/01/2019)

How to Grow Neat Software Architecture out of Jupyter Notebooks – Jupyter notebooks are a very common tool among data scientists. However, the gap between notebook code and production, or even just reusing it, is sometimes big. How can we overcome this gap? See some ideas in this post.

https://github.com/guillaume-chevalier/How-to-Grow-Neat-Software-Architecture-out-of-Jupyter-Notebooks

High-performance medicine: the convergence of human and artificial intelligence – a very extensive survey of machine learning use cases in healthcare.

https://www.nature.com/articles/s41591-018-0300-7

New Method for Compressing Neural Networks Better Preserves Accuracy – a paper (mainly) by the Amazon Alexa team. Deep learning models can be huge, so the incentive to compress them is clear. This paper shows how to compress the networks without reducing the accuracy too much (1% vs 3.5% in previous works). This is mainly achieved by compressing the embedding matrix using SVD.

https://developer.amazon.com/blogs/alexa/post/a7bb4a16-c86b-4019-b3f9-b0d663b87d30/new-method-for-compressing-neural-networks-better-preserves-accuracy
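
To get a feeling for why SVD helps here, below is a minimal sketch of compressing an embedding matrix with a plain truncated SVD in NumPy. It only illustrates the general idea and is not necessarily the paper’s exact procedure; the matrix sizes, the chosen rank and the synthetic data are all assumptions of mine.

```python
import numpy as np


def compress_embeddings(embedding, rank):
    """Factor a (vocab, dim) matrix into (vocab, rank) @ (rank, dim)
    by keeping only the top `rank` singular values."""
    u, s, vt = np.linalg.svd(embedding, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # scale the kept columns by their singular values
    right = vt[:rank, :]
    return left, right


# Synthetic embedding matrix (hypothetical): approximately low rank plus small noise.
rng = np.random.default_rng(0)
vocab_size, dim, rank = 1000, 64, 16
embedding = (rng.normal(size=(vocab_size, 8)) @ rng.normal(size=(8, dim))
             + 0.01 * rng.normal(size=(vocab_size, dim)))

left, right = compress_embeddings(embedding, rank)

original_params = embedding.size
compressed_params = left.size + right.size
relative_error = np.linalg.norm(embedding - left @ right) / np.linalg.norm(embedding)

print(original_params, compressed_params, relative_error)
```

The two factors together hold far fewer parameters than the original matrix, and the reconstruction error stays small as long as the matrix is close to low rank. Real embedding matrices are not guaranteed to be, so the rank is a trade-off between size and accuracy.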

Translating Between Statistics and Machine Learning – different paradigms sometimes use different terminology for the same ideas. This guide tries to bridge the terminology gap between statistics and machine learning.

https://insights.sei.cmu.edu/sei_blog/2018/11/translating-between-statistics-and-machine-learning.html

Postmake – “A directory of the best tools and resources for your projects”. I’m not sure how “best” is defined, but after sampling a few categories it seems good (though e.g. the development category is pretty messy, including GitHub, Elasticsearch and Sublime together). I liked the website design and the trajectory. I do miss a task management category (I couldn’t find Jira, and any.do is not really a calendar). It is at least a good resource for inspiration.

https://postmake.io

5 interesting things (4/11/2018)

Deep density networks and uncertainty in recommender systems – Yoel Zeldes and Inbar Naor from the Taboola engineering team published a series of posts (4 so far) about uncertainty in models – where this uncertainty comes from, how one can explore and use it, etc. The series relates to a paper they presented in a workshop at this year’s KDD conference.

First post – https://engineering.taboola.com/using-uncertainty-interpret-model/

Decision tree visualization – this post will be part of The Mechanics of Machine Learning by Terence Parr and Jeremy Howard. The post discusses the creation of dtreeviz from several aspects – considerations regarding visualizing decision trees, comparison to current tools, implementation details, etc. Fascinating read.

The Tale of 1001 Black Boxes – many words have already been spilled about the model Amazon used when trying to automate their HR system. I like this one as I believe it explains the pitfalls clearly even to someone who is not an ML professional, and it tries to grow from this point.

Lessons Learned from Applying Deep Learning for NLP Without Big Data – in the last 2 years everyone has been doing deep learning, but to be honest one of the most common issues in the industry is not having enough labeled data, and thus deep learning cannot always be applied. This post suggests a few techniques to overcome the lack of data for NLP tasks.

Machine Learning for Health Care course – a paper a day keeps the doctor away. Not this doctor 😉

Syllabus of the Princeton Machine Learning for Health Care course (COS597C) given by Barbara Engelhardt. The reading list is very varied (from NLP to vision through reinforcement learning) and interesting. I will definitely add at least some of those papers to my queue.

5 interesting things (13/08/2018)

JQ cookbook – I find myself using JQ quite often, and sometimes for more complex things than just filtering fields.

https://github.com/stedolan/jq/wiki/Cookbook

Bonus point – list of text-based file formats and command line tools for manipulating each – https://github.com/dbohdan/structured-text-tools

Missingno – Missing data visualization module for Python. This package offers a variety of visualizations for understanding the missing data in your data set and the correlations between the absence of different properties.

https://github.com/ResidentMario/missingno
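
As a rough illustration of what “correlation between the absence of different properties” means, here is a small pure-Python sketch that correlates missing-value indicators of two columns. This is not missingno’s API; the helper names and the toy records are assumptions of mine.

```python
from math import sqrt


def missing_indicator(rows, column):
    """1 where the column is missing (None or absent) in a row, else 0."""
    return [1 if row.get(column) is None else 0 for row in rows]


def pearson(xs, ys):
    """Pearson correlation; returns None when one side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, column_a, column_b):
    """Correlation between the missingness patterns of two columns."""
    return pearson(missing_indicator(rows, column_a),
                   missing_indicator(rows, column_b))


# Toy records (hypothetical): age and weight always go missing together.
rows = [
    {"age": 34, "weight": 70.0, "smoker": False},
    {"age": None, "weight": None, "smoker": True},
    {"age": 51, "weight": 82.5, "smoker": None},
    {"age": None, "weight": None, "smoker": False},
]

age_weight = nullity_correlation(rows, "age", "weight")
age_smoker = nullity_correlation(rows, "age", "smoker")
print(age_weight, age_smoker)
```

A value close to 1 means two fields tend to be missing in the same rows, and a negative value means one tends to be present when the other is missing.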

Add time order in Recommendation system – the meaning of time order in this context is that item x should be followed by item y. E.g. for a TV series, chapter 1 should be viewed before chapter 2. I don’t know if it is state-of-the-art work in this domain, but it is nice and should be relatively easy to implement when the prerequisite graph is unknown.

https://medium.com/@jujukala/how-to-use-time-order-in-machine-learning-recommendations-for-education-97091d2ab138

Cognitive biases are everywhere – and this time, how they affect your performance management. Referring to the last 2 points in the summary – “Establish trust and openness with your peers and reports” – I’m a big believer in 1:1s, I found this resource bundle here. “Understand motivational theory, especially intrinsic motivation” – maybe the most important thing I learned as a scout leader is that every person has a different motivation; you cannot lead others by what motivates you. Understanding this made a big change in how I view the world.

https://medium.freecodecamp.org/cognitive-bias-and-why-performance-management-is-so-hard-8852a1b874cd

Want to Improve Your Productivity at Work? Take a Cooking Class – in general I really like it when interdisciplinary ideas mix, and this is an interesting thought on the topic. The point that was most interesting for me was “Set your mise-en-place”. I see it a bit differently \ more widely than the writer – as a manager you should sometimes prepare the “mise-en-place” for your team. If they need to integrate with an external service – take care of the NDA, API documentation, etc. Requirements and design can also sometimes be viewed as “mise-en-place” for developers.

https://medium.com/forbes/want-to-improve-your-productivity-at-work-take-a-cooking-class-37ac08bf2f26