Picture This: Descriptive Statistics Don't Tell the Whole Story

May 16, 2017

Here's a great example of why it is important to plot your data and review it visually before drawing any conclusions. Remarkably, the X and Y data used to create both of these scatter plots have the same descriptive statistics and nearly the same correlation coefficient (r), yet each plot tells a very different story.

Here's a link to the original work by Justin Matejka and George Fitzmaurice, two researchers at Autodesk, who developed an algorithm that generates datasets with matching statistics but very different shapes: https://www.autodeskresearch.com/publications/samestats
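You can check the same effect numerically with a few lines of code. The sketch below does not use the Autodesk datasets. It uses Anscombe's quartet (1973), an older and well-known set of four small x/y datasets built to make the same point. The helper names `pearson_r` and `summarize` are illustrative, not from any library. Rounded to two decimals, all four datasets report the same means, standard deviations, and r, even though their scatter plots look nothing alike.

```python
import math
from statistics import mean, stdev

# Anscombe's quartet (Anscombe, 1973): four small x/y datasets with nearly
# identical summary statistics but very different scatter plots.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys, digits=2):
    """Mean and sample standard deviation of x and y, plus r, all rounded."""
    return (
        round(mean(xs), digits),
        round(stdev(xs), digits),
        round(mean(ys), digits),
        round(stdev(ys), digits),
        round(pearson_r(xs, ys), digits),
    )


if __name__ == "__main__":
    # Every row prints the same summary; only a plot reveals the differences.
    for name, (xs, ys) in QUARTET.items():
        print(name, summarize(xs, ys))
```

Each row prints (9, 3.32, 7.5, 2.03, 0.82). The numbers alone can't tell a straight line from a curve or a single outlier, which is exactly why the plots matter.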

