There has been increasing discussion about the proliferation of flawed research in psychology and medicine. Landmark events include John Ioannidis’s article “Why most published research findings are false” (according to Google Scholar, cited 973 times since its appearance in 2005); the scandals of Marc Hauser and Diederik Stapel, two leading psychology professors who resigned after disclosures of scientific misconduct; and Daryl Bem’s dubious recent paper on ESP, published to much fanfare in the Journal of Personality and Social Psychology, one of the top journals in the field.
Alongside all this are the plagiarism scandals, which are uninteresting from a scientific perspective but are relevant in that, in many cases, neither the institutions housing the plagiarists nor the editors and publishers of the plagiarized material seem to care. Perhaps these universities and publishers are more worried about bad publicity (and maybe lawsuits, given that many of the plagiarism cases involve law professors) than they are about scholarly misconduct.
Before going on, perhaps it’s worth briefly reviewing who is hurt by the publication of flawed research. It’s not a victimless crime. Here are some of the malign consequences:
– Wasted time and resources spent by researchers trying to replicate non-findings and chasing down dead ends.
– Fake science news bumping real science news off the front page.
– When the errors and scandals come to light, a decline in the prestige of higher-quality scientific work.
– Slower progress of science, delaying deeper understanding of psychology, medicine, and other topics that we deem important enough to deserve large public research efforts.
This is a hard problem!
There’s a general sense that the system is broken with no obvious remedies. I’m most interested in presumably sincere and honest scientific efforts that are misunderstood and oversold as more than they really are (the breakthrough-of-the-week mentality criticized by Ioannidis and exemplified by Bem). As noted above, the cases of outright fraud have little scientific interest, but I brought them up to show that, even in extreme cases, the groups whose reputations are at risk from the unethical behavior often seem more inclined to bury the evidence than to stop the madness.
If universities, publishers, and editors are inclined to look away when confronted with out-and-out fraud and plagiarism, we can hardly be surprised if they’re not aggressive against merely dubious research claims.
In the last section of this post, I briefly discuss several examples of dubious research that I’ve encountered, just to give a sense of the difficulties that can arise in evaluating such reports.
What to do (statistics)?
My generic solution to the statistics problems involved in estimating small effects is to replace multiple comparisons by multilevel modeling, that is, to estimate configurations rather than single effects or coefficients. This tactic won’t solve every problem, but it’s my overarching conceptual framework. There’s lots of room for research on how to do better in particular problem settings.
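Here is a minimal sketch of what estimating a configuration, rather than a set of separate effects, can look like in the simplest case. It is not the author’s own code: it assumes a normal-normal hierarchical model and swaps in the DerSimonian-Laird moment estimator for the between-group variance instead of a full Bayesian fit. The function name `partial_pool` and the example numbers are hypothetical.

```python
import math


def partial_pool(estimates, std_errors):
    """Shrink noisy group-level estimates toward a common mean.

    Assumed model (normal-normal hierarchy):
        y_j ~ Normal(theta_j, se_j^2),  theta_j ~ Normal(mu, tau^2).
    tau^2 is estimated by the DerSimonian-Laird method of moments, a
    simplification: a full Bayesian fit would also average over
    uncertainty in mu and tau.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two groups to estimate tau^2")
    w = [1.0 / se**2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-group variance, floored at 0
    # Common mean, weighting each group by its total (within + between) variance
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    # Each theta_j is pulled toward mu; noisier groups are pulled harder
    pooled = []
    for y, se in zip(estimates, std_errors):
        shrink = se**2 / (se**2 + tau2)
        pooled.append(shrink * mu + (1.0 - shrink) * y)
    return mu, tau2, pooled


if __name__ == "__main__":
    # Hypothetical raw estimates for six related comparisons, equal standard errors
    raw = [0.30, -0.10, 0.05, 0.20, -0.25, 0.02]
    mu, tau2, pooled = partial_pool(raw, [0.15] * len(raw))
    print(f"common mean {mu:.3f}, between-group sd {math.sqrt(tau2):.3f}")
    for y, p in zip(raw, pooled):
        print(f"raw {y:+.2f} -> partially pooled {p:+.3f}")
```

Rather than testing each comparison on its own and then correcting for multiplicity, every estimate is shrunk toward the common mean by an amount the data determine: when the groups look alike, the individual estimates are pooled heavily, so an isolated large value is less likely to be reported as a discovery.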
What to do (scientific publishing)?
I have clearer ideas about short-term resolutions of the Bem paradox, that is, what to do with dubious but potentially interesting findings.
So far there seem to be two suggestions out there: either top journals publish such claims (as, for example, JPSP did with Bem’s paper, or NEJM with the contagion-of-obesity paper), or they reject them (perhaps from some combination of more careful review of methodology, higher standards than classical 5% significance, and Bayesian skepticism).
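Higher significance thresholds and Bayesian skepticism pull in the same direction, which a short calculation can illustrate. The sketch below uses the basic positive-predictive-value formula from Ioannidis’s 2005 article, ignoring his bias terms; the prior odds and power are made-up inputs chosen only for illustration, not estimates for ESP or any real literature.

```python
def positive_predictive_value(prior_odds, power, alpha):
    """Share of statistically significant findings that reflect true effects.

    Basic formula from Ioannidis (2005): PPV = (1 - beta) R / ((1 - beta) R + alpha),
    where R is the pre-study odds that a tested effect is real and
    power = 1 - beta. Bias and multiple competing teams are ignored.
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


if __name__ == "__main__":
    # Hypothetical inputs: 1 real effect per 100 null ones, 50% power
    for alpha in (0.05, 0.005):
        ppv = positive_predictive_value(prior_odds=0.01, power=0.5, alpha=alpha)
        print(f"alpha = {alpha}: PPV = {ppv:.2f}")
```

With these made-up inputs, only about 9% of results significant at the 5% level would be real, versus about half at the 0.5% level; a low prior probability (skepticism about implausible claims) drags the PPV down whatever threshold is used.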
The problem with the publish-in-top-journals strategy is that it ensures publicity for some mistakes and it creates incentives for researchers to stretch their statistics to get a prestigious publication.
The problem with the reject-’em-all-and-let-arXiv-sort-’em-out strategy is that it’s perhaps too rigorous. So many papers have potential methodological flaws. Recall that the Bem paper was published, which in some sense means that its reviewers thought the paper’s flaws were no worse than what usually gets published in JPSP. Long-term, sure, we’d like to improve methodological rigor, but in the meantime a key problem with Bem’s paper was not just its methodological flaws but also the implausibility of the claimed results.
So here’s my proposed solution. Instead of publishing speculative results in top journals such as JPSP, Science, Nature, etc., publish them in lower-ranked venues. For example, Bem could publish his experiments in some specialized journal of psychological measurement. If the work appears to be solid (as judged by the usual corps of referees), then publish it, get it out there. I’m not saying to send the paper to a trash journal; if it’s good stuff it can go in a good journal, the sort where peer review really means something. (I assume there’s also a journal of parapsychology, but that’s probably just for true believers; it’s fair enough that Bem and others would like to publish somewhere that outsiders would respect.)
Under this system, JPSP could feel free to reject the Bem paper on the grounds that it’s too speculative to get the journal’s implicit endorsement. This is not suppression or censorship or anything like it; it’s just a recommendation that the paper be sent to a more specialized journal where there will be a chance for criticism and replication. At some point, if the findings are tested and replicated and seem to hold up, then it could be time for a publication in JPSP, Science, or Nature.
From the other side, this should be acceptable to the Bems and Fowlers who like to work on the edge. You still get your ideas out there in a respectable publication (and you still might even get a bit of publicity), and then you, the skeptics, and the rest of the scientific community can go at it in public.
There have also been proposals for more interactive publications of individual articles, with bloglike opportunities for discussion and replies. That’s fine too, but I think the only way to make real progress here is to accept that no individual article will tell the whole story, especially if the article is a report of new research. If the Bem finding is real, this can be demonstrated in a series of papers in some specialized journal.
Appendix: Individual cases can be tough!
I’ve encountered a lot of these borderline research findings over the past several years, and my own reaction is typically formed by some mix of my personal scientific knowledge, the statistical work involved, and my general impressions. Here are a few examples:
“Beautiful women have more daughters”: I was pretty sure this one was empty just based on my background knowledge (the claimed difference was 8 percentage points, which is much more than I could possibly expect based on the literature). Careful review of the articles led me to find problems with the statistics.
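One way to formalize that background-knowledge reaction is a conjugate normal update: combine the reported estimate with a skeptical prior centered at zero. This is a sketch, not the author’s actual calculation; only the 8-percentage-point figure comes from the text, while the standard error of 3 points and the prior standard deviation of 0.5 points are hypothetical stand-ins for the study’s precision and for what the literature suggests.

```python
import math


def normal_posterior(estimate, std_error, prior_mean, prior_sd):
    """Conjugate normal update: combine a noisy estimate with a skeptical prior.

    Posterior precision is the sum of prior and data precisions; the
    posterior mean is the precision-weighted average of prior mean and estimate.
    """
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / std_error**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)


if __name__ == "__main__":
    # 8 points is the claimed difference; the SE (3 points) and prior sd
    # (0.5 points) are hypothetical values chosen for illustration
    mean, sd = normal_posterior(estimate=8.0, std_error=3.0, prior_mean=0.0, prior_sd=0.5)
    print(f"posterior: {mean:.2f} +/- {sd:.2f} percentage points")
```

Under these assumptions the 8-point claim is pulled down to roughly 0.2 points, with a posterior standard deviation of about 0.5: the data are too noisy to overturn a prior that says effects of this kind are tiny.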
Dennis the dentist, Laura the lawyer, and the proclivity of Dave Kingman and Vince Koleman to strike out a lot: I was ready to believe the Dennis/Laura effect on occupations and only slightly skeptical of the K effect on strikeouts, but then the work was later strongly criticized on methodological grounds. Still, my back-of-the-envelope calculation led me to believe that the hypothesized effects could be there.
Warming increases the risk of civil war in Africa: This one certainly could be true but something about it rang some bells in my head and I’m skeptical. The statistical evidence here is vague enough that I could well take the opposite tack, believing the claim and being skeptical about skepticism of it. To be honest, if I knew these researchers personally I might very well be more inclined to trust the result. (And that’s not so silly: if I knew them personally I could ask them a bunch of questions and get a sense of where their belief in this finding is coming from.)
“45% hitting, 25% fielding, and 25% pitching”: I was skeptical here because it was presented as a press release with no link to the paper but with enough details to make me suspect that the statistical analysis was pretty bad.
“Minority rules: scientists discover tipping point for the spread of ideas”: I don’t know if this should be called “junk science” or just a silly generalization from a mathematical model. Here I was suspicious because the claim was logically inconsistent and the study as a whole fit the pattern of physicists dabbling in social science. (As I wrote at the time, I’ll mock what’s mockable. If you don’t want to be mocked, don’t make mockable claims.)
“Discovered: the genetic secret of a happy life”: There’s potentially something here but the differences are much smaller than implied by the headlines, the news articles, or even the abstract of the published article.
Whatever medical breakthrough happens to have been reported in the New York Times this week: I believe all of these. Even though I know that these findings don’t always persist, when I see it in the newspaper and I know nothing about the topic, I’m inclined to just believe.
That’s one reason the issue of flawed research is important! I’m as well prepared as anyone to evaluate research claims, but as a consumer I can be pretty credulous when the research is not close to my expertise.
If there is any coherent message from the above examples, it is that my own rules for how to evaluate research claims are not clear, even to me.