I’ve said it before and I’ll say it again: interaction is one of the key underrated topics in statistics.
I thought about this today (OK, a couple months ago, what with our delay) when reading a post by Dan Kopf on the exaggeration of small truths. Or, to put it another way, statistically significant but not practically significant.
I’ll give you Kopf’s story and then explain how everything falls into place when we think about interactions.
Immediately after an album is released, the critics descend. The period from the first to last major review for an album typically falls between one and six months. As time passes, the average review gets slightly worse. Assuming my methodology is appropriate and the data is accurate and representative, this is very likely a statistical truth.
But is this interesting? . . .
My result about album reviews worsening over the review period is “statistically significant.” The p-value is so small it risks vanishing. My initial response to the finding was excitement, and I began armchair psychologizing about what could be causing this. I even wrote an extensive article on my speculations.
But I [Kopf] was haunted by this image:
Each point is a review. Red ones are above average for that album, and green ones below. . . . With so many data points, it can be difficult for the human eye to determine correlation, but your eyes don’t deceive you. There is not much going on here. Only 1% of the variation of an album’s rating is explained by knowing when in the order of reviews an album fell. . . .
Is 1% worth considering? It depends on the subject matter, but in this case, it’s probably not. When you combine the large sample sizes that come with big data and the speed of modern computing, it is relatively easy to find patterns in data that are statistically significant . . . . But many of these patterns will be uninteresting and/or meaningless for decision making. . . .
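Kopf’s point about big samples is easy to demonstrate. Here’s a minimal sketch on simulated data (the slope, noise level, and sample size are my own made-up numbers, not Kopf’s): a trend that explains about 1% of the variance still comes with a vanishingly small p-value once n is large.

```python
# Simulated illustration: tiny effect + huge n = "statistically significant."
# All numbers here are hypothetical, chosen so the trend explains ~1% of variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                                  # many reviews, big-data style
order = rng.uniform(0, 1, n)                 # where in the review window a review falls
score = -0.35 * order + rng.normal(0, 1, n)  # slight negative trend swamped by noise

res = stats.linregress(order, score)
print(f"slope = {res.slope:.3f}, p = {res.pvalue:.2e}, R^2 = {res.rvalue**2:.4f}")
```

The regression reports an R-squared of about 0.01 together with a p-value that is effectively zero, which is exactly the “significant but tiny” pattern in the album data.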
But here’s the deal. What does it mean for the pattern to be tiny but still statistically significant? There are lots of albums that get reviewed. Each set of reviews has a time trend. Some trends go up, some go down. Is the average trend positive or negative? Who cares? The average trend is a mixture of + and – trends, and whether the avg is + or – for any given year depends on the population of albums for that year.
So I think the answer is the secret weapon (or, to do it more efficiently, a hierarchical model). Slice the data a bunch of ways. If the trend is negative for every album, or for 90% of albums, then this is notable, if puzzling: how exactly would that be, that the trend is almost always negative, but the aggregate pattern is so weak?
More likely, the trend is positive for some, negative for others, and you could try to understand that variation.
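The secret-weapon idea can be sketched in a few lines. This is simulated data, not the actual review dataset, and the between-album spread of trends is an assumption of mine: fit the time trend separately for each album and look at the whole distribution of slopes rather than a single pooled estimate.

```python
# "Secret weapon" sketch on simulated data: estimate the trend within each
# album separately, then examine the distribution of per-album slopes.
# The true slopes vary across albums (a hypothetical choice): some up, some down.
import numpy as np

rng = np.random.default_rng(1)
n_albums, reviews_per_album = 500, 20

slopes = []
for _ in range(n_albums):
    true_slope = rng.normal(-0.1, 0.5)              # album-level trends vary
    t = np.linspace(0, 1, reviews_per_album)        # review order within album
    y = true_slope * t + rng.normal(0, 1, reviews_per_album)
    slopes.append(np.polyfit(t, y, 1)[0])           # least-squares slope, this album

slopes = np.array(slopes)
print(f"mean slope = {slopes.mean():.3f}")
print(f"share of albums with negative trend = {(slopes < 0).mean():.0%}")
```

The average slope comes out slightly negative while the individual trends split close to 50/50 between positive and negative, which is how a real aggregate pattern can be so weak: it is averaging over heterogeneous albums. A hierarchical model would do the same thing more efficiently by partially pooling the per-album estimates.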
The key is to escape from the trap of trying to estimate a single parameter. Also to point out the near-meaninglessness of statistical significance in the context of varying patterns.