There are many things of which teenagers are “sure”: that they will never be like their parents; that the greatest artist of all time is (insert name of band of the moment); that pretty much any clothing is appropriate to wear in public (I’m talking to you, leggings-as-pants).
Thankfully, with age comes wisdom.
It would have been impossible to convince my teenage self that I would ever, ever use algebra – but I use it all the time. Since even the most advanced statistical techniques are rooted in basic math, artfully conveying most data results only requires invoking some familiar principles that we learned in high school.
Here, we’ll discuss four of the more common advanced statistical analysis types used in market research, and suggest techniques to display the data results in a way that makes sense to most audiences.
ANOVAs – Analysis of Variance is a technique for comparing group means. For non-technical audiences, bar graphs with error bars or simple boxplots (with a labeled grand mean) effectively help viewers visualize how the data are spread within and between groups. It’s also nice to provide a companion footnote or sidebar reporting the ANOVA results, including the F-statistic and degrees of freedom.
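For anyone curious where the numbers in that footnote come from, here is a minimal sketch of a one-way ANOVA in plain Python. The segment names and scores are made up purely for illustration, and `one_way_anova` is a hypothetical helper, not a function from any particular statistics package.

```python
# Hypothetical satisfaction scores for three customer segments (illustrative data only).
groups = {
    "Segment A": [6, 8, 4, 5, 3, 4],
    "Segment B": [8, 12, 9, 11, 6, 8],
    "Segment C": [13, 9, 11, 8, 7, 12],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(s) / len(s) for s in samples]

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means)
    )
    # Within-group variation: how far each value sits from its own group mean.
    ss_within = sum(
        (v - m) ** 2 for s, m in zip(samples, group_means) for v in s
    )

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The printed line is in the compact form that fits a chart footnote: the F-statistic with its two degrees of freedom in parentheses.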
Correlations – Correlations describe relationships between variables: if the value of A changes, does the value of B change as well? (Age and wisdom, anyone?) (Disclaimer: correlation is not the same as causation.) These relationships can be positive or negative, weak or strong. Scatter plot matrices show both direction and magnitude well (remember not to connect the dots!). The correlation coefficients and the sample size (N) should also be reported.
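As a rough sketch of the number that would sit next to each scatter plot, the Pearson correlation coefficient can be computed in a few lines of plain Python. The age and "wisdom score" values below are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: respondent age and a made-up "wisdom score".
age = [15, 25, 35, 45, 55]
wisdom = [2, 4, 5, 4, 5]

r = pearson_r(age, wisdom)
print(f"r = {r:.2f}, N = {len(age)}")
```

The sign of r gives the direction (positive or negative) and its distance from zero gives the strength, which is why the coefficient and N are worth reporting together.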
Regressions – The general idea behind regression will be familiar to anyone who took algebra: y = mx + b. In linear regression, we’re trying to estimate the fit between our dependent variable and the independent variable(s), under the assumption that the relationship is linear. Accordingly, simple line graphs are the best way to show regression results. When comparing the results of multiple regression models (e.g., does model A, model B, or model C fit the data better?), multiple lines with different colors or patterns are appropriate, as long as they are labeled. As a rule, the R-squared, degrees of freedom, and sample size (N) should also be provided in a companion footnote or sidebar.
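To connect y = mx + b back to the numbers in that footnote, here is a small sketch of a one-variable least-squares fit in plain Python. The ad spend and sales figures are hypothetical, and `fit_line` is an illustrative helper rather than any library's API.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)

    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept

    # R-squared: share of the variation in y that the fitted line accounts for.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical data: ad spend (x) and sales (y), in arbitrary units.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

m, b, r_squared = fit_line(ad_spend, sales)
n = len(ad_spend)
print(f"y = {m:.2f}x + {b:.2f}, R-squared = {r_squared:.2f}, "
      f"residual df = {n - 2}, N = {n}")
```

With a single predictor, the residual degrees of freedom are N minus 2, because the line uses up two estimates: the slope and the intercept.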
Time Series – Most often, time series are used to identify patterns in one period of time and use those patterns to estimate or predict future events. When it comes to displaying time series data, line graphs are again the way to go. (Stephen Few at PerceptualEdge has several excellent examples of how to present time series data well.) A crucial but often overlooked part of creating compelling time series graphs is selecting the most appropriate time range: important patterns in the data can be lost if the time focus is either too narrow or too broad, and viewers can come away with very different interpretations of the data based on what time period is highlighted. Other musts: using simple labeled trend lines of different colors or textures (no stacking!), with clearly labeled time values on the x-axis. Some people also prefer to present the y-axis on the right-hand side (thus drawing attention to the most recent data point).
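To see how much the chosen time range can change the story, here is a tiny sketch in plain Python. The monthly figures are invented for illustration, and `window_change` is a hypothetical helper that simply compares the first and last points of the most recent window.

```python
# Hypothetical monthly values for one year (illustrative data only).
monthly_values = [100, 104, 109, 115, 122, 130, 127, 123, 118, 114, 111, 109]


def window_change(series, last_n):
    """Change from the first to the last point of the most recent last_n periods."""
    window = series[-last_n:]
    return window[-1] - window[0]


full_year = window_change(monthly_values, 12)
last_six_months = window_change(monthly_values, 6)

print(f"Full year: {full_year:+d}")
print(f"Last six months: {last_six_months:+d}")
```

The same series reads as growth over twelve months and as a decline over the last six, so it is worth checking a few ranges before settling on the one to chart.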
To cap off my series on data visualization, my next post will offer the ultimate list of do’s and don’ts for displaying data visually. Stay tuned!
Data Visualization Lesson 1: Examine the Y-Axis
Data Visualization Lesson 2: Think of Grandma
Data Visualization Lesson 3: Abela’s Rubric
Data Visualization Lesson 4: The Best Pies are Desserts
Data Visualization Lesson 6: The Ultimate List of Dos and Don’ts