But it takes 11 minutes to read the tutorial…
How to Grow Neat Software Architecture out of Jupyter Notebooks – Jupyter notebooks are a very common tool among data scientists. However, the gap between notebook code and production (or reusable) code is sometimes big. How can we overcome this gap? See some ideas in this post.
High-performance medicine: the convergence of human and artificial intelligence – a very extensive survey of machine learning use cases in healthcare.
New Method for Compressing Neural Networks Better Preserves Accuracy – a paper mainly by the Amazon Alexa team. Deep learning models can be huge, so the incentive to compress them is clear. This paper shows how to compress networks without reducing accuracy too much (a 1% drop vs. 3.5% in previous work). This is mainly achieved by compressing the embedding matrix using SVD.
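The SVD trick itself can be sketched in a few lines (toy numbers, random data – the paper applies it to the embedding layer of a trained network):

```python
import numpy as np

# Hypothetical embedding matrix: vocabulary of 1000, dimension 64.
rng = np.random.default_rng(0)
E = rng.standard_normal((1000, 64))

k = 16  # retained rank, the compression hyperparameter
U, s, Vt = np.linalg.svd(E, full_matrices=False)

# Replace the single big matrix with two thin factors: E ≈ A @ B
A = U[:, :k] * s[:k]   # shape (1000, 16)
B = Vt[:k]             # shape (16, 64)

original_params = E.size             # 64,000
compressed_params = A.size + B.size  # 17,024
```

The product A @ B is the best rank-k approximation of E, so the network keeps most of the embedding information while storing roughly a quarter of the parameters.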
Translating Between Statistics and Machine Learning – different paradigms sometimes use different terminology for the same ideas. This guide tries to bridge the terminology gap between statistics and machine learning.
Postmake – “A directory of the best tools and resources for your projects”. I’m not sure how “best” is defined, and sampling a few categories gives mixed results (e.g. the development category is pretty messy, lumping GitHub, Elasticsearch and Sublime together). I liked the website design and the trajectory. I do miss a task-management category (I couldn’t find Jira, and Any.do is not really a calendar). It is at least a good resource for inspiration.
Deep density networks and uncertainty in recommender systems – Yoel Zeldes and Inbar Naor from the Taboola engineering team published a series of posts (4 so far) about uncertainty in models – where this uncertainty comes from, how one can explore and use it, etc. The series relates to a paper they presented in a workshop at this year’s KDD conference.
Syllabus of Princeton’s Machine Learning for Health Care course (COS597C), given by Barbara Engelhardt. The reading list is very varied (from NLP to vision through reinforcement learning) and interesting. I will definitely add at least some of those papers to my queue.
jq cookbook – I find myself using jq quite often, sometimes for more complex things than just filtering fields.
Bonus point – list of text-based file formats and command line tools for manipulating each – https://github.com/dbohdan/structured-text-tools
Missingno – missing-data visualization module for Python. This package offers a variety of visualizations to understand the missing data in your data set and the correlations between the absence of different properties.
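The statistics behind those plots can be sketched with plain pandas (toy data of my own; the package itself is used via functions like msno.matrix(df), msno.bar(df) and msno.heatmap(df)):

```python
import pandas as pd

# Toy frame with correlated missingness: "age" and "income"
# are always absent together (hypothetical columns).
df = pd.DataFrame({
    "age":    [25, None, 31, None, 40, 28],
    "income": [50_000, None, 62_000, None, 71_000, 55_000],
    "city":   ["NYC", "LA", None, "SF", "LA", None],
})

# Fraction of missing values per column (what msno.bar visualizes)
missing_frac = df.isnull().mean()

# Correlation between the missingness indicators of different columns
# (what msno.heatmap shows) – here age/income correlate perfectly.
nullity_corr = df.isnull().astype(int).corr()
```

Seeing a 1.0 in the nullity correlation is a strong hint that two fields go missing together, e.g. because they come from the same upstream source.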
Adding time order in recommendation systems – the meaning of time order in this context is that item X should be followed by item Y, e.g. for a TV series, episode 1 should be viewed before episode 2. I don’t know if this is state-of-the-art work in the domain, but it is nice and should be relatively easy to implement when the prerequisite graph is unknown.
Cognitive biases are everywhere – and this time, how they affect your management performance. Referring to the last 2 points in the summary – “Establish trust and openness with your peers and reports” – I’m a big believer in 1:1s; I found this resource bundle here. “Understand motivational theory, especially intrinsic motivation” – maybe the most important thing I learned as a scout leader is that every person has different motivations; you cannot lead others by what motivates you. Understanding this made a big change in how I view the world.
Want to Improve Your Productivity at Work? Take a Cooking Class – in general I really like when interdisciplinary ideas mix, and this is an interesting thought on the topic. The point that was most interesting for me was “Set your mise-en-place”. I see it a bit differently \ more broadly than the writer – as a manager you should sometimes prepare the “mise-en-place” for your team. If they need to integrate with an external service – take care of the NDA, API documentation, etc. Requirements and design can also sometimes be viewed as “mise-en-place” for developers.
Trivia facts can drive user engagement. But what is a trivia fact?
Is the fact “Barack Obama is part of the Obama family” a trivia fact?
Is the fact “Barack Obama is a Grammy Award winner” a trivia fact?
This paper tackles the problem of automatically extracting trivia facts from Wikipedia.
In this paper Tsurel et al. focused on exploiting Wikipedia’s category structure (i.e. X is a Y). Categories represent sets of articles with a common theme, such as “Epic films based on actual events”, “Capitals in Europe”, or “Empirical laws”; an article can have several categories. The main motivation to use categories rather than free text is that categories are cleaner than sentences and capture their essence better.
According to the Merriam-Webster dictionary, trivia is:
The first path Tsurel et al. tested was to look for small categories: “presumably, a small category indicates a rare and unique property of an entity, and would be an interesting trivia fact”. This path proved to be too specific, e.g. “Muhammad Ali is an alumnus of Central High School in Louisville, Kentucky”.
[TR] As commented in the paper, this fact is not a good trivia fact because the specific high school has no importance to the reader or to Ali’s character. But, as stated later – when it comes to personalizing trivia facts, there may be readers who find this fact interesting (e.g. other alumni of this high school).
This led Tsurel et al. to the first required property of a trivia fact – surprise.
Surprise reflects how unusual the article is with respect to the category, so we would like to define a similarity measure between an article a and a category C. A category is a set of articles, therefore the similarity is defined as the average similarity between a and the articles of C.
Surprise is defined as the inverse of the average similarity –
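A rough sketch of the idea (cosine similarity over some article representation is my stand-in for the paper’s similarity function; names and data are mine):

```python
import numpy as np

def sim(x, y):
    # Cosine similarity between two article vectors – an illustrative
    # stand-in for the paper's article-similarity measure.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def surprise(a, category):
    # surprise(a, C): inverse of the average similarity of a to C's articles
    return 1.0 / np.mean([sim(a, c) for c in category])

rng = np.random.default_rng(1)
# A cohesive category: articles sharing the same dominant features
category = [np.concatenate([rng.random(8) + 1.0, rng.random(8) * 0.1])
            for _ in range(20)]
member = category[5]
# An unusual article whose weight sits on the other features
outlier = np.concatenate([rng.random(8) * 0.1, rng.random(8) + 1.0])
```

With this toy data the outlier article gets a much higher surprise than a typical member of the category, which is exactly the behavior the measure is after.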
Example of results for this measure for Hedy Lamarr –
As you can see in the example above, the surprise factor by itself is not enough, as it does not capture other aspects of Hedy Lamarr’s life (e.g. she invented radio encryption!).
Examining those categories and seeing that they are very spread out led the team to define the cohesiveness of a category.
Cohesiveness of a category measures the similarity between items in the same category. Intuitively, if an item is not similar to the other items in the category, it might indicate a trivia fact (or, as mentioned later in the paper, an anomaly).
Practically speaking, the cohesiveness of category C is defined as the average similarity between each pair of articles in the category.
Hedy Lamarr’s results w.r.t. cohesiveness –
Tying it together
The trivia score of article a for category C is defined by combining surprise and cohesiveness – an article is a good trivia candidate when it is surprising with respect to a cohesive category.
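Putting the pieces together in one self-contained sketch (the cosine similarity and the product combination of surprise and cohesiveness are my assumptions for illustration, not necessarily the paper’s exact formulas):

```python
import numpy as np
from itertools import combinations

def sim(x, y):
    # Illustrative cosine similarity between two article vectors
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def surprise(a, category):
    # Inverse of the average similarity of a to the category's articles
    return 1.0 / np.mean([sim(a, c) for c in category])

def cohesiveness(category):
    # Average similarity over all pairs of articles in the category
    return float(np.mean([sim(x, y) for x, y in combinations(category, 2)]))

def trivia_score(a, category):
    # Assumed combination: surprising article x cohesive category -> trivia
    return surprise(a, category) * cohesiveness(category)

rng = np.random.default_rng(2)
tight = [np.concatenate([rng.random(8) + 1.0, rng.random(8) * 0.1])
         for _ in range(15)]              # a coherent category
loose = [v + 0.05 for v in np.eye(16)]   # near-orthogonal articles
outlier = np.concatenate([rng.random(8) * 0.1, rng.random(8) + 1.0])
```

The multiplication means a fact only scores high when both conditions hold: being an oddball in a loose, incoherent category is not interesting, but being an oddball in a tight one is.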
Interpreting the trivia score:
Standard similarity methods don’t fit this case for two main reasons –
Further optimizations of the computation, such as caching, comparing only to a subset of articles, and parallelizing, can be applied when implementing this algorithm in a production setting.
The authors evaluated their algorithm empirically against –
The authors crawled Wikipedia and created a dataset of trivia facts for 109 articles. For each article they created one trivia fact per algorithm. The textual format was “a is in the group C”.
Trivia Evaluation Study
Using the trivia facts above, each fact was presented to 5 crowd workers, yielding 2,180 evaluations.
The respondents were asked to agree or disagree with the facts according to the following statements, or to note that they don’t understand the fact:
The score of a fact was determined by the majority vote.
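Majority vote here is just the plurality label among the five worker responses, e.g.:

```python
from collections import Counter

def fact_score(votes):
    # Label chosen by most of the 5 crowd workers; with three possible
    # labels a plurality decides (tie-breaking left unspecified here).
    return Counter(votes).most_common(1)[0][0]

votes = ["agree", "agree", "disagree", "agree", "don't understand"]
label = fact_score(votes)
```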
In this part the team used Google Ads to tie trivia facts to searches, and analyzed the bounce rate and time on page for the collected clicks (almost 500).
Discussion and Further work
Limitation – the algorithm works well for human entities but worse in other domains, such as movies and cities.
Future work –
[TR] – this would require an additional notion of what makes a good trivia question. The “good” question in this example is interesting since it involves a contrast between two categories.
Other applications –
[TR] – Improve the learning experience on learning platforms by enriching the UI with trivia facts.