Earlier today, I spoke at the DataTLV conference about box plots – what they expose, what they hide, and how they mislead. My slides can be found here, and the code used to generate the plots is here.
Key takeaways
- Box plots show the five-number summary – min, max, median, Q1, and Q3.
- The flaws of box plots fall into two groups: information that is absent from the visualization (e.g. the number of samples, the shape of the distribution) and aspects of the visualization that are counter-intuitive (e.g. quartiles are a hard concept to grasp).
- I chose solutions that are easy to implement, either by leveraging existing packages' code or by adding small tweaks. I used plotly.
- Aside from those adjustments, many times a box plot is just not the right graph for the job.
- If your audience's statistical literacy is not well established, I would avoid using box plots.
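To make the first two takeaways concrete, here is a minimal sketch (not the code from the talk) using only Python's standard library. The three datasets and the `five_number_summary` helper are made up for illustration, and the choice of the "inclusive" quartile method is an assumption. All three datasets produce the same five numbers, so their box plots would look the same, even though the sample sizes and the shapes of the distributions differ.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    ordered = sorted(values)
    # "inclusive" is one of the two quartile methods the standard library
    # offers; picking it here is an assumption made for this illustration.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


# Hypothetical datasets, chosen so that the summaries coincide.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clustered = [1, 3, 3, 3, 5, 7, 7, 7, 9]  # values pile up at 3 and 7
tiny_sample = [1, 3, 5, 7, 9]            # only five observations

for name, data in [
    ("evenly_spread", evenly_spread),
    ("clustered", clustered),
    ("tiny_sample", tiny_sample),
]:
    # Print the sample size next to the summary, since a plain box plot
    # does not show it.
    print(name, "n =", len(data), five_number_summary(data))
```

Each line prints the same summary, (1, 3.0, 5.0, 7.0, 9), next to a different `n` or a different underlying shape, which is exactly the information a plain box plot leaves out.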
Topics I didn’t talk about that are worth mentioning
- Mary Eleanor Hunt Spear – a data visualization specialist who pioneered the development of the bar chart and box plot. I had a slide about her but went too fast and skipped it. See here.
- How percentiles are calculated – several methods exist, and different Python packages use different default methods. Read more – http://jse.amstat.org/v14n3/langford.html
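As a small illustration of the point about percentile methods, the sketch below compares the two methods available in Python's standard library `statistics.quantiles` on a made-up dataset. It is only an example of how the method changes the quartiles, not a survey of what each package does by default.

```python
from statistics import quantiles

# Hypothetical data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# "exclusive" is the default method of statistics.quantiles.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)  # [2.25, 4.5, 6.75]
print("inclusive:", inclusive)  # [2.75, 4.5, 6.25]
```

The median is the same under both methods, but Q1 and Q3 move, so the box drawn from the same data can be wider or narrower depending on the method used.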
Resources I used to prepare the talk
- https://blog.minitab.com/en/statistics-and-quality-data-analysis/how-to-think-outside-the-boxplot
- https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/box-plot/
- https://asklexph.com/thinking-outside-the-box-plot
- https://nightingaledvs.com/ive-stopped-using-box-plots-should-you/
- https://www.greenbook.org/mr/market-research-news/replacing-boxplots-and-histograms-with-rugs-violins-and-bean-plots/