
“It’s Always Sunny in Correlationville: Stories in Science,” or, Science should not be a game of Botticelli

There often seems to be an attitude among scientists and journal editors that if a research team has gone to the trouble of ensuring rigor in some part of their study (whether in the design, the data collection, or the analysis, though typically rigor is associated with “p less than .05” and some random assignment or regression analysis, somewhere in the paper), then they are allowed to speculate for free.
Story time can take over.

It’s a bit like that word game, Botticelli, where once you stump your opponent, you get a free question.

Indeed, in the science game, it often seems that story time is the goal, and the point of all the experimentation is just to amass a chit that will allow you to publish a story that you can then disseminate to the world under the imprimatur of the scientific establishment.


I got to thinking about this after Javier Benitez pointed me to this article by Nate Kornell (no relation to the famous researchers on ESP and eating behavior) subtitled, “Education studies always seem to have a happy ending. Why doesn’t education?”

Kornell writes:

Data don’t speak for themselves. I’ve been to a few painful lab meetings where a new grad student tried letting data speak for themselves. It is ugly. Even interesting data, without interpretation, are boring to the point of being pointless.

So scientists tell stories. Ask any scientist. If you don’t tell a story you don’t publish. There is nothing wrong with stories, of course, as long as they’re non-fiction.

Sometimes, though, the storytelling gets too creative. . . .

Recently it hit me: These stories always seem to have happy endings, at least in education. You know those Hollywood movies about the white teacher who comes into a poor school, and at first she clashes with her minority students, but in the end, they lift each other up educationally and morally? Correlational research in education is the same way: The news is good, the problem can be solved, the moral is uplifting, and optimism flows like water. It’s always sunny in correlationville.

What’s wrong with a little bit of optimism?

Kornell explains:

Optimism is great. But science is a search for truth. The truth is not always pretty. When it isn’t—when the pessimists are right—too much optimism can be harmful.

I [Kornell] got to thinking of this because of a new study in the journal Psychological Science.

Chen, L., Bae, S. R., Battista, C., Qin, S., Chen, T., Evans, T. M., & Menon, V. (2018). Positive Attitude Toward Math Supports Early Academic Success: Behavioral Evidence and Neurocognitive Mechanisms. Psychological Science.

Here’s the synopsis you’ll find in the press release: If a kid is not doing well in math, it might be because of his attitude. Make him feel more positive about math and he’ll start doing better.

Let’s look closer. . . . This is a correlational study. As the authors say in their penultimate paragraph: “We could not determine the direction of causal influences between positive attitude and math achievement” . . .

Yet they also say, in the very next paragraph: “In conclusion, our study demonstrates, for the first time, that PAM in children has a unique and significant effect on math achievement independent of general cognitive abilities and that this relation is mediated by the MTL memory system.” In fact, the title of the article is “Positive Attitude Toward Math Supports Early Academic Success: Behavioral Evidence and Neurocognitive Mechanisms.” [emphases added]

The words “effect” and “supports” are causal language. . . .

Kornell continues:

These optimistic conclusions should not be taken at face value because there are other, equally valid, ways to look at the data. . . .

First, it’s a truism that people often like things they’re good at. Therefore, we should expect being good at math to cause kids to like math. That alone is enough to explain the attitude-performance correlation. (If that wasn’t enough, did you notice that part of the PAM attitude measure asked kids whether they’re good at math? How could PAM scores not be correlated with math performance?)

In other words, attitude might not actually have any effect on performance. If this is true, then changing a kid’s attitude toward math will not make them better at it. That’s more pessimistic, but it gets worse. . . .

If a kid has below-average math aptitude, they will tend to struggle with math. This struggle will affect their attitude. That is, kids who are not good at math will grow to hate it. In fact, it is the very strength of the correlation in this study, between performance and attitude, that brings down the hammer on kids who don’t do well in math. It’s a strong correlation, which means few will buck the trend. In other words, this study shows that very few kids with low math aptitude will ever like math (which seems, anecdotally, true). They’re doomed to hate it. (Hopefully, this is going too far. Remember, this is storytelling. But it’s consistent with the data.)

In summary:

Different stories about the data produce different headlines:

Optimist: “Math performance can be improved by changing kids’ attitudes toward math!”

Pessimist: “Kids’ math performance determines their attitudes, and kids with low aptitude are doomed to hate math.”

Here’s why it matters. The optimist is going to invest funds into improving attitudes to create a positive cycle. The pessimist is going to give extra math help to kids who are struggling at a young age to prevent a negative cycle.

When we read science, we want to hear the truth. But we also like to see problems (like low math scores) get solved. It’s not that we want scientists to tell happy stories about a bad world. We want them to tell happy stories and we want the world to conform to the stories they tell. We want to live in correlationville. But we live on earth.
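Kornell’s point that the optimist and pessimist stories are observationally equivalent can be made concrete with a quick simulation. This is purely illustrative, with made-up numbers rather than anything from the Chen et al. data: one dataset is generated with attitude causing performance, the other with performance causing attitude, and the resulting cross-sectional correlations look the same.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Story A ("optimist"): attitude causally boosts performance.
attitude_a = rng.normal(size=n)
performance_a = 0.6 * attitude_a + rng.normal(scale=0.8, size=n)

# Story B ("pessimist"): performance drives attitude, not vice versa.
performance_b = rng.normal(size=n)
attitude_b = 0.6 * performance_b + rng.normal(scale=0.8, size=n)

r_a = np.corrcoef(attitude_a, performance_a)[0, 1]
r_b = np.corrcoef(attitude_b, performance_b)[0, 1]
print(f"Story A (attitude -> performance): r = {r_a:.2f}")
print(f"Story B (performance -> attitude): r = {r_b:.2f}")
# Both correlations come out around 0.6; the correlation alone
# cannot tell the two causal stories apart.
```

No cross-sectional correlation coefficient, however strong or however many brain scans accompany it, distinguishes these two data-generating processes.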

One thing that Kornell didn’t mention is that the article in question uses brain scans. On the plus side, this represents potentially valuable information on intermediate outcomes (the “neurocognitive mechanisms” in the paper’s title). On the minus side, this is the sort of high-tech flourish that can fool reviewers into thinking there’s more to the research than there really is.

I have not read the article in detail, so I can’t really comment on Kornell’s specific points, but I agree with his general message, which is that we have to be careful about unidirectional spins on research.

An example I recall from a few years ago was the finding that Nobel prize winners live two years longer than comparable non-winners. Setting aside any qualms about the study itself, there’s a big problem with the positive-spin interpretation. As I wrote, one could just as well summarize the study as, “Not getting the Nobel Prize reduces your expected lifespan by two years.” A lot more people don’t get the prize than do, so the negative spin could be warranted.

It’s related to the fallacy of the one-sided bet.

The only place I’d alter Kornell’s article is to reduce his emphasis on correlation vs. causation, as all these problems of interpretation arise in experimental studies as well.


  1. Curious says:

    Another possible causal path Kornell fails to mention is that the way in which we structure maths education and teach it is only effective with a small percentage of people.

  2. Jonathan says:

    Another (?) aspect is that the standard for statistical meaning gives imprimatur to an aggregated concept without fully accounting for the complex nature of the concept being aggregated. A study about a school involves complex variables that can never be exactly reproduced, at least not outside studies within homogeneous populations applied within that homogeneous population, and then with caveats regarding complexity infiltrating through the definition of ‘homogeneous’. Mathematically, complex variables literally invade your space. This isn’t hard to prove: argue who is the greatest player in any sport and you cannot definitively cross generational lines, just as you can definitively take a player’s career and re-imagine it as if he or she played for a different team in the same era or in a different era only when you reduce ‘definitive’ to raw adjustments like factors for relative batting average levels. You can’t adequately account for the availability of talent or who was allowed in the game, or the history of who played it and so on. That’s apparently what The Ringer is for: lists of anything subjectively ranked being the great male interest, so I saw yesterday a ranking of celebrities named Paul (which did not include St. Paul so that already subjectively defines the choice context).

    I sometimes think of this issue as related to the Axiom of Choice: it isn’t ‘real’ in the sense the other Axioms can be seen – like disjoint is rather obviously not the same as joint in the attributes under consideration. Choice is interpreted as existing. Haphazard application allows people to avoid talking about the definitions they used to restrict the field to which choice is constrained. That is, the process of reducing the infinities of choice is often hidden behind words like ‘we controlled for …’ a list of issues which you treat as fixed or as varying in specific ways when you really don’t know how they interact behind the scenes. What you present is a slice, which is a choice, but the entirety of the choice process is not visible, all the way down to pretending that this or that variable can be fixed or averaged or otherwise treated in regards to all the other variables, and thus that you can pretend that this or that other variable can be …

    I’ve referred to the system pushing back at you in education, that anyone who has ever been alive and gone to school knows that people resist being taught, that the educational system may work in this moment and not the next, that it may work for this kid for this period of time but then not for that kid but maybe it works for this other kid. All these responses suggest – no, they tell us plain a huge bell ringing – that the system alters as you push on it and the way it alters is complicated and our studies have great trouble because they don’t adequately account for push back, for whatever it is that happens when you try this or that. Think of education as a complicated game in which you send a ball to a kid and the kid hits it back: except the ball isn’t a ball but is ideas and lessons and skills and methods of behavior that are attempting to connect to the kid’s individual persona, to the kid’s milieu of friends, family and community, to larger culture, to all sorts of things including psychological fears. How do you include in a study a kid’s fear that success can remove that kid from that kid’s milieu, which whether seen as good or bad by you is what the kid knows? You can try to study that point but it’s balled up with other complex points. Pick a point: it’s complex! Treat it as real and you’re treating complex numbers as real! You can do that but so many people don’t grasp that’s what they’re doing and the limits of what it means to say that you’re generating slices which are statistically valid – not ‘true’ – in the slice you took. As your blog makes eminently clear, they don’t grasp that slices which appear the same don’t replicate.

    This is why physicists prefer working with assumed spherical cows. And why they work with assumed spherical cow equivalents in layer upon layer of possibility. I think it would be a great service if people recognized that calculation of physical results using stuff that reliably can be measured objectively in great detail requires a layer of potential interaction within another layer of potential interaction within another layer of potential interaction so the numbers of possible things that can happen to even a simple ball of stuff becomes extraordinarily huge very quickly. We can calculate those results only because all of those possibilities repeat the basic steps, so we can get values for huge chains of possibilities. Can’t do that with higher order complex things.

  3. Nick says:

    Many psychologists are highly skilled at using language that is obviously intended to suggest causality, but which is ju-u-u-ust capable of being interpreted otherwise if it ever got to a court of law that required proof “beyond a reasonable doubt”. Another common ploy is to have the penultimate paragraph of the Discussion section start with “Of course, our analyses were cross-sectional, so we make no claims that A causes B”, following several paragraphs in which the authors in effect made precisely that claim.

    And of course, when it comes to the subsequent media coverage, “Sadly, we can’t prevent journalists making mistakes” — although one could, of course, decide not to place links to all of those erroneous articles on one’s bio page.

  4. Kyle C says:

    Brain scans = alchemy. “Oooh, it lights up.” One day we may have a causal model of how the brain works (a la chemistry). We do not now. We do not even know what it means for brain “activity” to be “associated with” a mental state. It’s faux-causal spin all the way down.

  5. Scott Porter says:

    As a minor note, I think the Nobel prize example is not a good one to add here. It just muddies the water. Since the policy of that prize is not to award posthumously, the effect is really one of survivor bias, which both optimist and pessimist headlines ignore.

  6. Matt Skaggs says:

    “Data don’t speak for themselves.”

    “In my career as a psychology researcher, I have never seen data that spoke for themselves.”

    There, fixed it for you, Dr. Kornell.

    I can’t be the first person who has ever sat through a PowerPoint presentation where a well-conceived early slide summarizes the data, and the rest of the (interpretation) slides add nothing to the summary slide. I don’t think this is just picking nits. All researchers should be thinking about how to present data in such a way that the data speak for themselves. IIRC, this has been discussed on this blog before.

    At the other end of the spectrum, if the presentation is all about the story and not the data, the data slide is usually a mess.

  7. Terry says:

    Isn’t this just the self-esteem silliness in slightly different garb?

    Wasn’t all that put down decades ago?

    Is it really so hard to see that people don’t like being forced to do things they aren’t good at?

  8. Anonymous for this one says:

    The active insistence on conflating correlation with causation is rampant among administrators at my institution. Examples:

    – Students who fail to complete 30 credits freshman year are less likely to graduate in 4 years than students who complete at least 30 credits freshman year. Therefore academic advisors should push students who are on track for 27 credits to get themselves up to 30.

    – Research has shown that students who are not meeting expectations within the first four weeks of classes are more likely to fail the class. Therefore we should devote extra resources to early assessment during the first four weeks of classes. As though the first four weeks are special in a way that the 2nd four weeks are not, or in a way that the last four weeks of the previous semester are not. As if every measure of educational achievement was not correlated with every other measure of educational achievement.

    – Incoming freshmen who attend the big pre-semester orientation event for new students get better grades and have better graduation rates than incoming freshmen who do not. Therefore something about pre-semester orientation must be really important for determining later success!

    Most of this stuff is benign; I think early assessment is a good thing, and I think discouraging students who’d like to graduate in 4 years from taking low credit loads early on is (usually) a good thing, and I think the pre-semester orientation event is a good thing. But the active, knowing refusal to take “correlation doesn’t imply causation” seriously frustrates me. Every now and then administration people will give that cursory acknowledgment… “of course, this is only correlational, BUT…” and then everything after the “but” screams “let’s just go ahead and pretend it’s causal anyway”. They are so obviously using their data cynically, because on a superficial level it supports their story. And because the story is a happy, positive story, who’s gonna object?

    I think this kind of stuff is embarrassing coming from an institution of higher learning.

    • When I was at Duke Stats Dept (2007/8), I got an email message that it was very important that some faculty attend a meeting of a group of US-wide university deans that were going to discuss an ongoing study of graduate student completion. About 30 deans and about 30 other faculty showed up.

      One of the deans had a summer student who had time to look at the data – basically time to completion of degrees. Not knowing what to do with censored data, they had analysed just the completers. No one seemed concerned about this. That’s when I realized no one else in the room had any background in statistics.

      They tolerated my comment about there being better methods for analyzing censored data, but when I raised concerns about them trying to discern causality from uncontrolled, unadjusted, poorly analyzed data (completers only), especially with regard to small groups, some of the deans lost their patience and told the other deans that statisticians often raise unnecessary and distracting considerations. They then happily went back to story time discussions.

      I was a bit surprised as I had an inflated opinion of university deans at the time. Also, I did not understand that “very important” in the email was code for “lost cause, don’t waste your time”.
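The completers-only analysis described in the comment above is a textbook source of censoring bias, and a toy simulation makes it visible. This is an illustrative sketch with invented numbers, not the actual completion data: true times to degree are drawn from a distribution with a known mean, students still enrolled when the study window closes are censored, and averaging only the completers understates the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Invented true times to degree (years); the mean is 6.0 by construction.
true_time = rng.gamma(shape=4.0, scale=1.5, size=n)

# The study window closes at 6 years; slower students are censored.
window = 6.0
completed = true_time <= window

# "Completers only": drop the censored students and average the rest.
# This analysis can only ever observe times <= 6, so it is biased low.
naive_mean = true_time[completed].mean()
print(f"True mean time to degree: {true_time.mean():.2f} years")
print(f"Completers-only estimate: {naive_mean:.2f} years")
# The naive estimate falls well below the truth.
```

A proper analysis would keep the censored students and use survival methods such as Kaplan-Meier, rather than discarding them.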

  9. James says:

    “Indeed, in the science game, it often seems that story time is the goal, and the point of all the experimentation is just to amass a chit that will allow you to publish a story that you can then disseminate to the world under the imprimatur of the scientific establishment.”

    The distortion can be further magnified by a second-order effect whereby a polemicist uses the published study as a “chit” to publish *their* interpretation of the conclusions of the study, which has the “imprimatur of the scientific establishment.”

  10. Kyle C says:

    “Statisticians often raise unnecessary and distracting considerations” is basically the response you get whenever you raise elementary questions (of the kind I have learned from this blog to ask) in comments on social scientists’ blog posts. Sounds as though you folks are like the lawyers of the Ph.D. world. :-)
