5 interesting things (05/09/2015)

Time map visualization of discrete events – A very good idea for visualizing discrete events when the general pattern matters more than the order of the events. It works well for time between failures, time between user visits to a site, time between user actions, etc.

https://districtdatalabs.silvrback.com/time-maps-visualizing-discrete-events-across-many-timescales
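To make the idea concrete, here is a minimal sketch of how the points of a time map can be computed, assuming the usual construction where each event is placed at (time since the previous event, time until the next event). The function name `time_map_points` and the sample timestamps are hypothetical and only for illustration.

```python
def time_map_points(timestamps):
    """Return (gap_before, gap_after) pairs, one per interior event.

    Assumes timestamps are numbers (e.g. seconds); the first and last
    events are skipped because they lack one of the two gaps.
    """
    ts = sorted(timestamps)
    # Gaps between consecutive events.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    # Pair each gap with the gap that follows it.
    return list(zip(gaps[:-1], gaps[1:]))


# Hypothetical event times with steadily growing gaps.
events = [0, 1, 3, 6, 10]
print(time_map_points(events))  # [(1, 2), (2, 3), (3, 4)]
```

Scatter-plotting these pairs (often on log axes) gives the map: points near the diagonal are events with regular spacing, while points far from it mark a change of pace.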

Cyber attacks map – so cool

http://map.norsecorp.com/

Why different people think differently – the actual title of this post is “Why a Mathematician, Statistician, & Machine Learner Solve the Same Problem Differently”, but I think it misses the mark in several respects. First, the comparison is a bit shallow: it ignores non-parametric statistics, machine learning models with hyper-parameters, etc. Machine learning does make assumptions about the data. By choosing the features you use (even if they are eventually assigned a weight of 0) you make an assumption about the data. Moreover, choosing the model, the kernel, etc. assumes something about the features’ distribution.

Researchers, data scientists, statisticians: people think differently. Some of them tend to use tools they know and that worked for them before, and some of them want to try new tools and ideas. I believe that people with the same education (ML, statistics, etc.) but from different sources \ institutes will also have different approaches, which does not match the claims in this post.

http://www.galvanize.com/blog/2015/08/26/why-a-mathematician-statistician-machine-learner-solve-the-same-problem-differently-2

Spreadsheets are graphs? I like this post because it brings a fresh perspective to a painful problem we all experience – sharing documents \ preserving knowledge. In almost every organization there are all kinds of documents (spreadsheets, Word documents, presentations, RFPs, etc.), but they are almost never connected to each other and are almost always a mess. So this is another angle from which to look at the problem.

http://neo4j.com/blog/spreadsheets-are-graphs-too/

Cross validation done wrong – things that are clear once you think about them, but that we don’t usually spend time thinking about. The bottom line is to always keep your training set completely isolated from the cross validation and test sets.

http://www.alfredo.motta.name/cross-validation-done-wrong/
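One way to read the "isolate completely" advice is that any step that learns something from the data (scaling, feature selection, etc.) should be fitted inside each fold, on the training part only. The sketch below illustrates this with a simple standardization step. It is an assumption-laden illustration, not the code from the linked post: the helper names (`k_fold_indices`, `fit_scaler`, `apply_scaler`, `cross_validate`) and the choice of scaling as the preprocessing step are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def fit_scaler(rows):
    """Learn per-column mean and std from the given rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid division by 0
    return means, stds


def apply_scaler(rows, scaler):
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def cross_validate(X, y, k, fit_model, score):
    """Run k-fold CV, fitting the preprocessing inside each fold."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        X_test, y_test = [X[i] for i in test], [y[i] for i in test]
        # The scaler sees the training fold only, never the held-out fold.
        scaler = fit_scaler(X_train)
        model = fit_model(apply_scaler(X_train, scaler), y_train)
        scores.append(score(model, apply_scaler(X_test, scaler), y_test))
    return scores
```

The "done wrong" version would call `fit_scaler(X)` once on the whole dataset before the loop, which lets information from the held-out fold leak into training. Here `fit_model` and `score` are placeholders for whatever model and metric you use.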
