The 2019 project: How false beliefs in statistical differences still live in social science and journalism today

It’s the usual story. PNAS, New York Times, researcher degrees of freedom, story time. Weakliem reports:

[The NYT article] said that a 2016 survey found that “when asked to imagine how much pain white or black patients experienced in hypothetical situations, the medical students and residents insisted that black people felt less pain.” I [Weakliem] was curious about how big the differences were, so I read the paper.

Clicking through to read the research article, which was published in PNAS, I did not find any claim that the medical students and residents insisted that black people felt less pain.

I did see this sentence from the PNAS article: “Specifically, we test whether people—including people with some medical training—believe that black people feel less pain than do white people.” But, as Weakliem finds out, it turns out there were no average differences:

Medical students who had a high number of false beliefs rated the white cases as experiencing more pain; medical students who had a low number of false beliefs rated the black cases as experiencing more. High and low were defined relative to the mean, so that implied that medical students with average numbers of false beliefs rated the black and white cases about the same.

The authors included their data as a supplement to the article, so I [Weakliem] downloaded it and calculated the means. The average rating for the black cases was 7.622, on a scale of 1-10, while the average rating for the white cases was 7.626—that is, almost identical. The study also asked how the different cases should be treated—135 gave the same recommendation for both of their cases, 40 recommended stronger medication for their white case, and 28 for their black case. Since the total distribution of conditions was the same for the black and white cases, this means that in this sample, treatment recommendations were different for blacks and whites. However, the difference was not statistically significant at conventional levels (p is about .14)—that is, the sample difference could easily have come up by chance.

So you could conclude that, in this sample, there is no evidence that medical students rate the pain of blacks and whites differently, but perhaps some evidence that they treat white pain more aggressively. (If you just went by statistical significance, you would accept the hypothesis that they treat hypothetical black and white cases the same, but a more sensible conclusion would be that you should collect more data). The paper, however, didn’t do this. . . .
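For readers who want to check the arithmetic behind that p-value, here is a minimal sketch. Weakliem does not say which test he used, so this is an assumption on my part: a sign test on the 68 respondents who gave different recommendations (40 vs. 28), dropping the 135 ties. The function names are made up for illustration. The normal approximation, which is the same thing as McNemar's test without a continuity correction, lands in the neighborhood of the quoted figure; the exact binomial version comes out somewhat larger.

```python
from math import comb, erfc, sqrt


def sign_test_exact(a: int, b: int) -> float:
    """Two-sided exact binomial p-value for a vs. b discordant pairs,
    under the null that each direction is equally likely."""
    n = a + b
    k = max(a, b)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)


def sign_test_normal(a: int, b: int) -> float:
    """Two-sided normal-approximation p-value
    (McNemar's test without continuity correction)."""
    n = a + b
    z = abs(a - b) / sqrt(n)
    return erfc(z / sqrt(2))


# Counts quoted above: 40 stronger for the white case, 28 for the black case.
# Treating this as a sign test is an assumption, not the paper's stated method.
print("normal approximation:", round(sign_test_normal(40, 28), 3))
print("exact binomial:      ", round(sign_test_exact(40, 28), 3))
```

Either way the conclusion in the quote holds: a 40/28 split among 68 people is not strong evidence of anything.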

Hey, this is a big fat researcher degree of freedom! The authors of this paper easily could’ve summarized their results as, “White people no longer believe black people feel less pain than do white people.” That could’ve been the title of the PNAS article. And then the New York Times article could’ve been, “Remember that finding that white people believe that black people feel less pain? It’s no longer the case.”

OK, I guess not, as PNAS would never have published such a paper. The interaction between beliefs of physical differences, beliefs about pain, and attitudes toward pain treatment—that’s what made the paper publishable. Unfortunately, the patterns that were found could be explainable by noise—but, no problem, there were enough statistical knobs to be turned that the researchers could find statistical significance and declare a win. At that point, maybe they felt that going back and reporting, “No average difference,” would ruin their story.

Weakliem summarizes:

The statement that “the medical students and residents insisted that black people felt less pain” is false: they rated black and white pain as virtually equal. I [Weakliem] don’t blame Villarosa [the author of the NYT article] for that—the way it was written, I could see how someone would interpret the results that way. I don’t really blame the authors either—interaction effects can be confusing. I would blame the journal (PNAS) for (1) not asking the authors to show means for the black and white examples as standard procedure and (2) not getting reviewers who understand interaction effects.
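Since interaction effects are at the center of the confusion, here is a toy sketch of the pattern Weakliem describes: a crossover interaction in which the high-belief group rates white pain higher, the low-belief group rates black pain higher, and the overall average gap is zero. Every number below is made up for illustration; nothing here comes from the PNAS data.

```python
from statistics import mean

# Hypothetical false-belief counts for eight respondents (made up).
beliefs = [0, 1, 1, 2, 2, 3, 3, 4]

# Hypothetical slope: white-minus-black pain rating gap per extra false belief.
slope = 0.5

center = mean(beliefs)

# Crossover interaction: the gap is zero at the average belief score,
# positive above it, negative below it.
gaps = [slope * (b - center) for b in beliefs]

high = [g for b, g in zip(beliefs, gaps) if b > center]
low = [g for b, g in zip(beliefs, gaps) if b < center]

print("mean gap, high-belief group:", mean(high))
print("mean gap, low-belief group: ", mean(low))
print("mean gap, everyone:         ", mean(gaps))
```

The two subgroup gaps have opposite signs and cancel, so reporting only the subgroup results says nothing about whether respondents as a whole rated black pain lower.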

I don’t know if I agree with Weakliem in letting the authors of these articles off the hook. The NYT article did misrepresent the claims in the PNAS article; the PNAS article did come in to test a hypothesis and then never report the result of that test; so both these articles failed their readers, at least regarding this particular claim. Indeed, the title of the NYT article is, “Myths about physical racial differences were used to justify slavery — and are still believed by doctors today”—a message that is completely changed by reporting that the PNAS study found no average belief in pain differences.

As for PNAS, I think it’s too much to expect they can find reviewers who understand interaction effects—that’s really complicated—and I guess it’s too much to expect that they would turn down an article that fits their political preconceptions. But, jeez, can’t they at least be concerned about data quality? Study 1 was based on 121 participants on Mechanical Turk. Study 2 was based on 418 medical students at a single university. I can see the rationale for Study 2—medical students grow up and become doctors, so we should be concerned about their views regarding medical treatment. But I can’t see how it can be considered scientifically legitimate to take data from 121 Mechanical Turk participants and report them in the abstract of the paper as telling us something about “a substantial number of white laypeople.” You don’t need to understand interaction effects to see the problem here; you just need to stop drinking the causal-identification Kool-Aid (the attitude by which any statistically significant difference is considered to represent some true population effect, as long as it is associated with a randomized treatment assignment, instrumental variable analysis, or regression discontinuity).

24 thoughts on “The 2019 project: How false beliefs in statistical differences still live in social science and journalism today”

  1. Hi. You might have added that the reference to slavery in the NYT headline supercharges the whole discussion so it’s not just about judgments of pain levels and how aggressively to treat them.

  2. “Myths about myths about physical racial differences were used to justify bad journal articles — and are still believed by journalists today”

    • Michael,

      Is your argument that these beliefs never existed? And that this research is evidence against the existence of those beliefs? Because that’s how it sounds.

      • Curious:

        It’s no myth that myths of physical racial differences were used to justify slavery. The myth, as reported in the above-linked news article, is that these myths of physical racial differences are still believed by doctors today. But I guess we have to be careful here. These myths of physical racial differences are surely believed by some, even many, doctors today. These differences just don’t seem to appear on average, and they don’t turn up in the study being cited.

        One of the difficulties of correcting misreporting, whether from the news media or the scientific literature, is that the original reporting is often sloppy, and in the correction we often have to be super-careful not to misrepresent what we are correcting.

        • “One of the difficulties of correcting misreporting, whether from the news media or the scientific literature, is that the original reporting is often sloppy, and in the correction we often have to be super-careful not to misrepresent what we are correcting.”

          Sad but true.

      • Nah, it was just me being a smartass. “Myths about myths” refers to the research article’s authors making claims not supported by the data. It’s now a meta-myth: the paper, and then the column, have created a myth that there’s empirical evidence that medical professionals still believe these myths. I have no doubt that racist myths guided white doctors, and may still. Look no further than the horrific history of gynecology: “The modern vaginal speculum was developed by J. Marion Sims, a plantation doctor in Lancaster County, South Carolina. Between 1845 and 1849, Sims performed dozens of surgeries, without anesthesia, on at least 12 enslaved women.”

        https://en.wikipedia.org/wiki/Speculum_(medical)

  3. What I find really disturbing about both the study and the way it was used by the NYT is that they support the claim that medical people in some general sense believe that physical racial differences make Blacks less susceptible to pain. We are seeing that a lot these days, “white people”, “the culture” etc. are racist.

    I agree that racism is an immense social problem, but to characterize it — and misrepresent research accordingly — as a generic, undifferentiated force is really letting true racists off the hook. I have no doubt there are some racist doctors. It would be useful to have a sense of how many there are and to what extent it influences their treatment. That would mean doing research designed around capturing differences across medical people. And that would take us back to a longstanding issue, the fixation on finding average effects when the structure of effect differences is what we ought to be interested in.

    • This is reminiscent of how, in the implicit bias studies, it turned out that explicit (albeit grudgingly expressed) biases really did the harm if you gave people time to reflect on their decisions.

    • Peter:

      It is also important to acknowledge the long term impacts of both past and current overt & covert racism on future outcomes. I agree with Kyle that ‘implicit’ is mostly doing the work of ‘covert’.

    • Peter said,

      “I agree that racism is an immense social problem, but to characterize it — and misrepresent research accordingly — as a generic, undifferentiated force is really letting true racists off the hook. I have no doubt there are some racist doctors. It would be useful to have a sense of how many there are and to what extent it influences their treatment. That would mean doing research designed around capturing differences across medical people. And that would take us back to a longstanding issue, the fixation on finding average effects when the structure of effect differences is what we ought to be interested in.”

      +1

  4. Not in defense of the study’s conclusions or the Times article…IMO it’s useful to look at that study in a broader context…

    When I first read the following, I was left scratching my head:

    > The present work sheds light on a heretofore unexplored source of racial bias in pain assessment and treatment recommendations within a relevant population (i.e., medical students and residents), in a context where racial disparities are well documented (i.e., pain management). It demonstrates that beliefs about biological differences between blacks and whites—beliefs dating back to slavery—are associated with the perception that black people feel less pain than do white people and with inadequate treatment recommendations for black patients’ pain.

    https://batten.virginia.edu/about/news/black-americans-are-systematically-under-treated-pain-why

    I was wondering how the authors got to the “associated with…inadequate treatment recommendations for black patients’ pain.” Seemed like a big overreach.

    Then I Googled…from one of the authors…

    > Racial disparities are particularly striking in pain treatment, Trawalter said, with studies showing that Black patients are significantly less likely to be prescribed pain medication and that they generally receive lower doses of it when they are. One possible reason for this, supported by existing studies, is that white people believe Black people experience less pain.

    So at some level I think the author may have been filling in some gaps based on “priors” that aren’t explicated in the study in question. Not an excuse, (i.e., could be evidence of “motivated reasoning”) but perhaps something of an explanation.

  5. I think Taibbi’s Hate Inc. book comes into play here. The NYT increasingly sells its audience outgroup-based fear and teeth gnashing. That model worked for Fox for years. For the NYT that means selling to its readership the menace of racist white Trump supporters. It also offers salvation from the original sin of racism to white liberal readership through awareness and penance. There’s probably less interest in parsing truth for stories that fit the narrative of this business model.

  6. Not surprisingly, a side issue or the main issue of a discussion can be easily buried under an agenda. For those past middle age, a realization that neural sensitivity declines with age may not be surprising if they are at all curious and sufficiently knowledgeable. Medical focus seems to be on the extreme manifestations, e.g., diabetic feet and gross edema, as a gauge of pathology.
    As an experiment, run a nail brush on your foot from toes upward and notice the change in sensitivity. Or run a hair brush on your scalp. Try to remember how hot a drink you could tolerate when younger. I have found this to be the case and I have no reason to think that my response is different in kind rather than in degree. What I had believed to be an advantage (high pain threshold) is actually a decline.
    So it is possible that loss of sensitivity varies with age, pathology and possibly populations. The salient thing for me is the degradation and its progress as a possible marker for treatment.
    Nothing to do with racism per se, but that is what will get attention if somehow connected and even if some evidence exists. I don’t believe that such evidence has been pursued, as this does not benefit the medical business.

  7. I had been surprised to see so relatively few comments to this post — especially given the click-baity title — until now. There is a twitter shouting match happening today between a controversial writer (and supporters) and a social science editor at Science (and supporters) who has accused the writer of mangling data in his writing and being a bigot. The writer says he didn’t and is not a bigot. You can guess what that twitter “conversation” looks like. Now, I read this blog as a relative layperson with an interest in statistics. But if I were an academic or other professional whose career rides on publishing journal articles, and I saw an editor at Science engage so readily in a vitriolic tweet war and level serious accusations at a person, I wouldn’t comment on a blog post on a sensitive subject either.

  8. Definitely could be a mistake on my part, but I calculated 7.624 as the pain rating for black cases and 7.637 as the rating for white cases using the ‘Data – Raw and Cleaned’ file (all data cleaned worksheet). Converted the data into table form, filtered column E to include ‘w’ respondents only, and then used =AGGREGATE(1,3,Table1[blpain]) and =AGGREGATE(1,3,Table1[whpain]) to calculate average pain scores on the filtered set.

    Either way, the pain score difference is clinically insignificant and the point of this post still stands. Just hoping for some clarification.

  9. I went to UVA med school, and while I wasn’t part of this study I did talk to a student that was and he told me that the students didn’t take the questions seriously (I’m pretty sure they had no idea it would turn into a publication and become a New York Times story). Whenever researchers do these response-based studies they should really have most of the questions be positive control questions to see who is taking it seriously and have the question of interest be buried among the rest.
