BMJ FAIL: The system is broken. (Some reflections on bad research, scientism, the importance of description, and the challenge of negativity)

tl;dr

It’s not the British Medical Journal’s “fault” that they published a bad paper. I mean, sure, yeah, it’s 100% their fault, but you can’t fault a journal for publishing the occasional dud. And there’s not really a mechanism for retracting a paper that’s just seriously flawed, if no fraud is suspected.

So the system is broken. Not in allowing bad papers to be published—there’s no way to avoid that—but in not having a way to say, Whoops! We messed up there.

For that matter, I don’t think the Journal of Personality and Social Psychology ever published a “this was a screw-up” notice regarding that ESP paper from 2011.

Four themes

This post ties together four recent themes:

– Bad research

– Scientism or the fallacy of measurement

– The importance of descriptive science

– The challenge of negativity

The bad research is this article from the British Medical Journal making strong claims based on models like this:

Scientism or the fallacy of measurement arises when researchers take a particular measurement or statistical procedure and simply assert it as their goal, as in this recent article published by the U.S. National Academy of Sciences:

Evidence for bisexual orientation requires that Minimum Arousal have an inverted U-shaped distribution and that Absolute Arousal Difference be U-shaped.
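To make the quoted criterion concrete, here is a minimal sketch of what those two summary statistics are, under the simplifying assumption that each subject contributes two standardized arousal scores, one for male and one for female stimuli. The variable names and the simulated data are illustrative assumptions, not the paper’s actual measurements or analysis.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical standardized arousal scores for n subjects:
    # one response to male stimuli, one to female stimuli.
    n = 1000
    arousal_male = rng.normal(size=n)
    arousal_female = rng.normal(size=n)

    # The two summary statistics named in the quoted criterion.
    minimum_arousal = np.minimum(arousal_male, arousal_female)
    abs_arousal_diff = np.abs(arousal_male - arousal_female)

    # Algebraic link: min(a, b) = ((a + b) - |a - b|) / 2, so with total
    # arousal held fixed the two statistics are mirror images of each other.
    total = arousal_male + arousal_female
    assert np.allclose(minimum_arousal, (total - abs_arousal_diff) / 2)

    # In this simulation the two come out strongly negatively correlated.
    print(np.corrcoef(minimum_arousal, abs_arousal_diff)[0, 1])

Whether particular shapes in these two derived quantities should be taken to define bisexual orientation, rather than merely to measure something related to it, is exactly the question at issue.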

The importance of descriptive science is that it can be valuable to look at raw data and perform statistical analyses without feeling the need to jump to big generic statements.

Finally, the challenge of negativity is that critical work is often held to a higher standard than original claims. This might make sense in terms of the process of science (we should be open to speculation, and new ideas are often wrong, but exploring them is how we move forward), but what it means in practice is that if you’re cheerleading, you can be all kinds of sloppy and the science establishment will accept it, whereas if you’re critical you have to be extra careful. It’s the research incumbency rule.

The story is that the British Medical Journal published a bad article. That happens: journals publish bad articles all the time. I’m sure I’ve published some bad articles myself on occasion! I posted a comment, and the authors posted a content-free reply. This is standard operating procedure in all of science—indeed, we’ve been trained to give minimal, unthinking responses to criticism.

But it makes me sad. These researchers—their careers aren’t over! They’ll have lots more chances to do more work in their jobs. Why not take this opportunity to learn something from their mistakes? The problem they’re working on is important, so it’s important to try to get things right.

That’s bad research, but the example also featured scientism, in that the paper used statistical methods (in this case, meta-analysis) without careful thought about the quality of the data or the details of the model being fit.

It would be as if you tried to build a bus by throwing together an engine, some gears, wheels, some seats, a steering wheel, etc., but without a real plan of how they fit together.
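For what it’s worth, here’s a minimal sketch of the mechanics in question: a plain inverse-variance (fixed-effect) pooling step weights each study only by its reported standard error, so weak or biased studies flow straight into the pooled estimate unless someone judges their quality separately. The numbers below are made up for illustration; this is not the BMJ paper’s actual analysis.

    import numpy as np

    # Hypothetical study-level estimates and standard errors
    # (made-up numbers, not the BMJ paper's data).
    estimates = np.array([0.30, 0.10, 0.45, 0.05])
    std_errs = np.array([0.10, 0.08, 0.20, 0.15])

    # Fixed-effect inverse-variance pooling: the weights depend only on
    # the standard errors, not on study design or data quality.
    weights = 1.0 / std_errs**2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))

    print(f"pooled estimate = {pooled:.3f} (se = {pooled_se:.3f})")

The formula does exactly what it’s told; the judgment about whether each input deserves to be in the mix has to come from outside the procedure.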

This relates to the importance of descriptive science: don’t go making strong claims from analysis of data that you haven’t fully described or understood.

And then, the challenge of negativity. Suppose I were to try to publish a paper in the British Medical Journal saying that their recently published paper was worthless. That would take a lot of work! Not really worth the effort, probably. It’s great that the British Medical Journal has a place for online comment, but it’s too bad that, for those readers who don’t look really carefully, the published article will stand on its own.

There’s no easy answer here. It’s not like the journal should retract the article just because a couple of people say it’s wrong, and it’s not like the editors could go in, reread the article, decide it’s a hot mess, and pull it from the journal. That way lies madness. So I have no great solution, beyond more open and prominent post-publication review.

BMJ publishes the prepublication reviews of the aforementioned paper, but in this case, as in so many others, the peer reviewers missed some important problems. I guess it’s possible that the article was first sent to JAMA, NEJM, and Lancet, and the reviewers at those journals saw through its problems, so it ended up at BMJ. But I guess it doesn’t matter.

You know that joke:

Q: What do you call the guy who finished last in his class at medical school?

A: Doctor.

Here’s another:

Q: What do you call a paper that has serious flaws but finally made its way through the peer review process, somewhere?

A: Published.

It’s a problem.

Was the system always broken?

But none of this is new. Scientific journals have always published papers of varying quality, including some that were flat-out wrong in the sense of not offering the claimed evidence (recall evidence vs. truth). And it’s not like they were ever removing bad papers from their records. Actually, now that everything’s online, it’s easier than ever to remove or fix mistakes. We’re not seeing such corrections now, but we weren’t seeing them before either. So maybe all that’s going on is that we’re noticing the problem more?

There’s a temptation to say the system isn’t broken. Even if you don’t want to go full “Harvard” and say “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” (that quote’s like an outtake from a low-budget remake of The Social Network), you might say that science is self-correcting, and mistakes get corrected. And maybe that’s true—but it seems like a serious flaw in the system that there’s a mistake out there, and we all know it’s a mistake, but it’s gonna be sitting there in the literature, wasting people’s time, forever.

Again . . .

I have no animus toward this journal, its editors, or the authors of the article under discussion. We all make mistakes, and there are hundreds of examples I could’ve picked to make this point. I just had to pick one example, that’s all.

37 thoughts on “BMJ FAIL: The system is broken. (Some reflections on bad research, scientism, the importance of description, and the challenge of negativity)”

  1. This seems eerily familiar, except minus some of the ideological cover for grifting that I’ve run into:

    https://advances.sciencemag.org/content/6/23/eaax3787
    https://arxiv.org/abs/1902.09442

    It’s really important for journals to have a routine Comment procedure, implemented promptly and impartially. I’ve found that the Physical Review journals do an excellent job of this. AAAS (see above), not so much. I’m told that medical journals are also bad.

    • Michael:

      Grifting’s another story. The BMJ example doesn’t seem like grifting (except in the very general sense that people are being paid to do bad work). Rather, it seems like they sincerely think they’re doing good science, that they think of science as a series of buttons to press. I remain distressed at their attitude that they can do a good meta-analysis without assessing the quality of the separate analyses that go into it.

  2. Here is an idea to throw out there. Instead of asking each journal to implement its own Comment procedure (I agree this would be good), what about making essentially a Yelp, an open review site, for all papers? It could be indexed by DOI. If a paper is bad enough to warrant substantial criticism, the criticism can be posted there. Authors can reply if they wish, etc. I’m sure this thought has occurred to others–would it work?

    • As Andrew notes, this has been done. To answer the question “would it work?”, it depends on what you mean by “work.” PubPeer exists and seems robust. Most papers, of course, are never commented on. There are, however, a lot of papers with comments; some of the comments are good. It is rare that anyone cares about the comments, though, except in cases where they reveal actual fraud.

      As an example, consider the comments here, on a very high-impact gut microbiome & Parkinson’s disease paper that’s been cited over 1,000 times since 2016. Note especially the comment at the bottom, which is very good, raising serious questions about the paper’s analysis. (Most readers of this blog will be able to follow the argument. As another good comment notes, “Unfortunately, the statistical methods used in this paper are something of a compilation of how not to do things.”) Have the authors responded? No. (It’s been four years…) How many of the thousands of authors of the citing papers have read the PubPeer comments? I’d guess a few dozen. Is there any consequence to the criticisms? No. But, at least they’re out there…

        • Thanks, I like this from one of the reviews:

          “the weasel words describing the error bars for the motor tests and microgial quantifications suggest that all measurements were pooled…”

          Sounds like, at the very least, they didn’t make clear whether measurements were pooled, and they probably kept it *unclear* deliberately by not saying so one way or the other while using words that leave the possibility open – which is probably appropriately interpreted as “we pooled the measurements but we don’t want to say so”. I think that’s a fair conclusion, since they didn’t say otherwise.

      • One improvement would be displaying PubPeer comments on journal article homepages (if altmetrics can link to tweets, why not?). Currently you need to already know about PubPeer, either to search it or to install the PubPeer browser plugin.

        • True enough, although their screening seems to have improved recently. I suppose I was just suggesting that journals should note the presence of a comment, rather than displaying its content, as the current PubPeer browser plugin already does. The altmetric links to tweets can already point to spam or troll comments; maybe this is more likely than PubPeer trolling in some areas (e.g., climate change research), although in this case the trolling helps the ‘impact’ score!

  3. Another fallacy that appears a lot in papers (complementing the fallacy of measurement) is the fallacy of begging the question. It’s a shame that, colloquially, “begging the question” is now a synonym for “raising the question,” as the logical fallacy itself gets obscured by the colloquial use of the phrase.

  4. The challenge of scientism is a serious one. Theory as a practice within disciplines is often spurned as “airy and detached”, while statistical analysis is likewise scorned as “undertheorized”. It’s a tough thing to convince students that these need to work together so that you understand what you’re testing and can adjust your understanding based on the outcomes of data analysis. But absent a working relationship between those two groups, the spurning and scorn will justify themselves.

  5. A peer-reviewed article by Pippa Smart, recently published in the journal European Science Editing at https://ese.arphahub.com/article/52201/, highlights the themes in this post by Andrew.

    This article contains no objective measurements or definitions for terms like ‘more professional’, ‘dispassionate tone’, ‘unprofessional tone’, ‘social media accusations’, ‘unfounded accusations’, and ‘abusive emails’.

    Readers will wonder why Pippa Smart has used personal judgements without scientific merit, in particular given her background. These readers will also wonder why Pippa Smart has not provided them with her scientific insights into the intellectual shortcomings of the rejected manuscript.

    The article by Pippa Smart also contains a sentence that conveys no information: “The root of the problem with this author was that our decision was based on the submitted article and not the subject of the article.” Readers will wonder how this passed peer review.

    The views about the availability of raw research data (for reviewers and editors of life-science journals) in this article are heavily outdated. These outdated views are also in strong contrast with the text in the journal’s online manuscript submission system: “Authors are informed that open-access publication of data that underpin the present manuscript is strongly encouraged in this journal (please see https://doi.org/10.3897/rio.3.e12431 and read and follow the Data Quality Checklist).”

    After more than a year, there is still no response to a query at PubPeer about another article by Pippa Smart; see https://pubpeer.com/publications/06FE137BEBCAB56CF6DD42D9DD8973

    It thus seems plausible that Pippa Smart is unlikely to communicate about the issues in her recent article.

  6. “Scientism or the fallacy of measurement arises when researchers take a particular measurement or statistical procedure and simply assert it as their goal…” Repudiating the idea of measuring as a fallacy is not anti-scientism; it’s just anti-science. The question of whether any given kind of measurement succeeds in measuring any given phenomenon is a serious one, which is never resolved in a single paper. (In the particular example offered, demanding self-report as the only common-sense possibility was wrong. Worse, unless “Minimum Arousal” and “Absolute Arousal Difference” aren’t, mathematically, inversions or reciprocals or other correlates, the criticism isn’t valid…unless the objection is to using the word “orientation”?) Other examples of this kind of scientism would be IQ, the Big Five personality traits, pretty much everything in Evolutionary Psychology so far as I can tell, but also such concepts as utility, marginal revenue productivity, money supply, interest rates, full employment, and the whole panoply of national accounts (GDP, etc.). The thing is, of course, these measures were never simply assumed. Criticizing them as simply assumed is a wrong-headed approach, like criticizing religion for being dogmatic instead of untrue.

    • I can’t quite parse out what you’re saying, whether you agree with Andrew about the idea of “scientism” or not.

      The fallacy of measurement is when someone assumes that the measurement they take directly reflects the phenomenon of interest, and/or *defines* the phenomenon as a certain behavior in the measurement (i.e., a bimodal distribution in penis responses as the definition of bisexuality in the recently discussed paper).

      Repudiating this type of “scientism” is *NOT* anti-science. For example, take BMI. It’s a straightforward ratio of mass to height squared. No one is claiming people are unable to measure mass or length; the point is that BMI is NOT necessarily a particularly good indicator of healthiness in all cases, and *defining* healthiness as being in a certain BMI range would be anti-scientific “scientism”. Creating treatments to change BMI, instead of to improve healthiness, would be another form of scientism.

      • The criticism of BMI is not that it is “scientism,” but that it is not sufficient to measure health. This criticism has to be demonstrated with specifics, not by simply asserting that measuring is reductive, which is exactly what calling it “scientism,” a methodological principle, amounts to. And, insofar as BMI is misused as a sole indicator, everything I know suggests this is driven almost entirely by health insurance companies, not the science community. (Another example would be the health insurance companies pushing A1c as the sole diagnostic criterion.) This kind of thing was not the misdeed of wrong-headed miscreants writing a single paper. If this were scientism even as defined in the OP, statistical modeling would have problems just as great, or even greater. Do you really want to argue that statistical modeling is scientism?

        And consider your own example: you have committed yourself to the confused proposition that sexual response can’t be measured, that orientation can’t be measured (only self-reported), and that bisexuality is something separate from sexual response. This is all incoherent on the face of it, and extraordinarily obtuse with regard to common experience about the reliability of self-report, or, much worse, the pressures to deny such things, to oneself as well as to others. Covering this up with generalities about scientism doesn’t support the case.

        Again, the assumption that statistical models are ever relevant (even when they aren’t refuted by a mismatch with reality) and can play a role in the ongoing debate within the scientific community is also the assumption that some phenomenon can be measured. Measured or modeled? If there’s a difference, it counts against statistical modeling!

        In some respects you are insisting that anyone who writes a paper, especially one that implicitly criticizes existing prejudices or literature (there’s an overlap), simply must do a vastly superior job of justifying such negativity, immediately, in every single paper.

        • > You have committed yourself to the confused proposition

          Sounds like you might be confused yourself, since you keep telling people that they are saying things they are not saying.

          It might be more efficient for you to argue with your imagined interlocutors directly, rather than placing their words in the mouths of others.

    • What Daniel said.

      I’m not “repudiating the idea of measuring”! I don’t know where Steven got that. My problem is “when researchers take a particular measurement or statistical procedure and simply assert it as their goal.” Measurement and the goal of measurement are two different things. I have zero problem with researchers taking whatever measurements they want, writing them up, discussing their implications, etc. My problem is when researchers take an existing concept (such as bisexuality) and simply define it as a particular measurement they took.

      Regarding religion being dogmatic: Recall the distinction between evidence and truth. To say that someone is providing no good evidence is not to say that their underlying claim is untrue. This came up in our discussion of the Stanford study of coronavirus prevalence.

    • “Assuming what you want to prove” is a common mistake for undergraduate math majors to make. For many of them, getting over this tendency is a big step in developing what we sometimes call “mathematical maturity”.

      (This brings to mind one particular student: in grading an exam paper of his, I commented at one point that he had used the word “therefore” inappropriately. After I returned the exams, he came into my office, indignant. I calmly explained what the word “therefore” meant; he was chagrined. He said he never knew that; he had thought that “therefore” was just a word you inserted to make something mathematical.)

    • Ulrich:

      I agree that Bem’s paper is notorious, and I remain annoyed with people who characterized it as good science, but is it really a “hoax”? I think “junk science” or “bad science” would be a better characterization?
