“Confirmation, on the other hand, is not sexy”

Mark Palko writes:

I can understand the appeal of the cutting edge. The new stuff is sexier. It gets people’s attention. The trouble is, those cutting-edge studies often collapse under scrutiny. Some can’t be replicated. Others prove to be not that important.

Confirmation, on the other hand, is not sexy. It doesn’t drive traffic. It’s harder to fit into a paragraph. In a way, though, it’s more interesting because it has a high likelihood of being true and fills in the gaps in big, important questions. The interaction between the ideas is usually the interesting part.

In this particular example, Palko is telling the story of a journalist who reports a finding as new when it is essentially a replication of decades-old work. Palko’s point is not that there’s anything wrong with replication but rather that the journalist seems to feel it is necessary to report the idea as new and cutting-edge, even when it falls within a long tradition. (Also, Palko is not claiming that the newly published work is unoriginal, merely that it is more valuable when understood in the context of the earlier work.)

Palko’s observations fit into a topic that’s been coming up a lot in this blog (as well as in statistical discussions more generally) in recent years:

To review:

– Lots of iffy studies are published every year in psychology, medicine, biology, etc. For reasons explained by Uri Simonsohn and others, it’s possible to get tons of publishable (i.e., “statistically significant”) results out of pure noise. Even some well-respected work turns out to be quite possibly wrong.

– It would be great if studies were routinely replicated. In medicine there are ethical concerns, but in biology-lab or psychology experiments, why not? What if first- and second-year grad students in these fields were routinely required to conduct replications of well-known findings? There are lots of grad students out there, and we’d soon get a big N on all these questionable claims—at least those that can be evaluated by collecting new data in the lab.

– As many have noted, it’s hard to publish a replication. But now we can have online journals. Why not a Journal of Replication? (We also need to ditch the system where people are expected to anonymously review papers for free, but that’s not such a big deal. We could pay reviewers, using the money that otherwise would go to the executives at Springer etc., and also move to an open post-publication review system; that is, the journal would look something like a blog, with a space to comment on any article. Paying reviewers might sound expensive, but peer review is part of the scientific process. It’s worth paying for.)

– There’s also the Bayesian point that surprising claims are likely to be wrong. Journalists like to report “man bites dog,” not “dog bites man,” but when you look into some of those “man bites dog” stories, they turn out not to be true. I don’t see a resolution for this one. (A small worked example of this Bayesian point appears just after this list.)

– Everybody’s talking about the problem of false claims in the scientific literature. People are talking about it much more now than they were twenty years ago. Back then, we thought of publication bias as a minor nuisance; now we see it as part of the big picture of the breakdown of the culture of science.

– I think statistics needs to move beyond the paradigm of analyzing studies or datasets one at a time. I doubt this paradigm even made sense in the days of R. A. Fisher. I’m guessing that, back in the day at Rothamsted Experimental Station or the experimental farm in Ames, Iowa, each experiment was part of a long thread of trial and error (perhaps Steve Stigler can supply more details on this). But somewhere along the way came the idea that each little experiment was supposed to come in a box, nicely tied up and statistically significant.
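To make that Bayesian point concrete, here’s a minimal sketch with made-up numbers; the base rate, power, and significance threshold below are assumptions for illustration, not estimates from any real literature:

```python
# Made-up numbers illustrating why surprising significant findings
# are often false: suppose only 5% of "surprising" hypotheses are
# true, tests are run at alpha = 0.05, and studies have 50% power.

prior_true = 0.05   # hypothetical base rate of true surprising claims
alpha = 0.05        # false-positive rate of the significance test
power = 0.50        # P(significant | claim is true)

# Bayes' rule: P(claim is true | significant result)
p_sig = prior_true * power + (1 - prior_true) * alpha
posterior_true = prior_true * power / p_sig

print(f"P(true | significant) = {posterior_true:.2f}")  # about 0.34
```

Under these assumptions, roughly two-thirds of the statistically significant “surprising” findings would be false, which is the sense in which surprising claims are likely to be wrong.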

P.S. The picture above is the first item that came up in a Google image search on *confirmation is not sexy*.

18 thoughts on ““Confirmation, on the other hand, is not sexy””

  1. “In medicine there are ethical concerns…” This is actually untrue. Many randomized controlled trials in medicine should be replicated, for the “first” RCTs are commonly incorrect. To name just one example, better control of glucose levels (“tight glucose control”) in Intensive Care Units was instituted in 2001 after a single-centre RCT done in Belgium. Many subsequent RCTs have shown this to be a harmful intervention, and it is no longer practised. If the study hadn’t been replicated, we’d still be harming people with this “treatment”.

    Many industry-funded trials are irreparably biased, and the only way to deal with this is by replicating them. This is expensive, but in the public’s interest.

    Therefore, there is an ethical imperative to perform MORE replication of trials in medicine, not less.

    • OK, let me revise that and say: “In medicine a replication can require more effort . . .” My point is that many experiments in psychology or biology can easily be replicated by a grad student in a lab, whereas a medical experiment can require a lot more in the way of permissions and often takes more time.

      • My suggestion is: why don’t the funding agencies, when they award grants, set aside a portion for replication? Once the core grant’s work is done, an independent researcher (or researchers) gets a smaller grant merely to verify the results.

        Agreed, fewer distinct proposals would be funded, but your confidence in every result rises.

  2. A similar idea: why not have an electronic subjournal directly linked to the main journal? E.g., for the journal ‘Marketing Science’ there would be ‘Marketing Science Replication and Extension’. (Marketing Science is a print journal, but under the auspices of INFORMS, a professional society. It’s harder to imagine this working if a professional organization isn’t behind it.)

    This would have the added advantage of providing a good place for work that takes an existing data set from a study and does a more extended analysis.

    This might also be a good place for articles that serve as a sort of “running meta-analysis” of the original finding. As new studies were done, the meta-analysis could be continuously updated by the authors (a sketch of such an update appears at the end of this comment).

    I’m not an academic so there may be good reasons against these ideas in academic publishing.
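    For what it’s worth, here’s a minimal sketch of what such a running meta-analysis might compute: simple fixed-effect inverse-variance pooling, recomputed as each new study arrives. The effect estimates and standard errors are made up, and a real version would also have to consider study quality, heterogeneity, and random-effects models.

    ```python
    # Hypothetical "running meta-analysis": pool all results so far by
    # inverse-variance weighting (fixed-effect model), updating the
    # summary as each new replication reports an estimate and SE.

    import math

    def pooled_estimate(estimates, std_errors):
        """Fixed-effect inverse-variance pooled estimate and its SE."""
        weights = [1 / se**2 for se in std_errors]
        total_w = sum(weights)
        est = sum(w * e for w, e in zip(weights, estimates)) / total_w
        return est, math.sqrt(1 / total_w)

    # Original study plus two later replications (made-up numbers):
    studies = [(0.42, 0.15), (0.10, 0.12), (0.05, 0.10)]
    ests, ses = zip(*studies)
    for k in range(1, len(studies) + 1):
        e, se = pooled_estimate(ests[:k], ses[:k])
        print(f"after {k} studies: pooled estimate = {e:.2f} (SE {se:.2f})")
    ```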

  3. Andrew:

    I’m all in on the replication journal. Make it electronic, and publish all those replications grad students will do.

    Students can learn a lot via replication, but so can those whose work is replicated. It’s a win-win.

    There should be no fear of having your work replicated and people finding mistakes. We are all human. I’d welcome people double-checking my work.

  4. I have a student whose Honours project is using NHANES data to do replications of papers that appear as food-related stories in the local newspapers. We’ve started with perfluorooctanoic acid (used to make non-stick pans) and heart disease. Next up is tomatoes and depression, and just this weekend there was a story on carotenoid-containing fruits and vegetables and optimism that I’m adding to the list.

    The idea was to get some information on whether these associations are mostly publication bias (in which case they won’t replicate) or mostly confounding (in which case they will replicate).
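    For illustration, here’s a minimal sketch of that kind of replication check, assuming a hypothetical NHANES extract nhanes.csv with columns pfoa, heart_disease, age, and sex; the real NHANES files and variable codes differ, and a serious analysis would also account for the survey design and weights.

    ```python
    # Hypothetical replication check of the PFOA / heart-disease
    # association in NHANES. File name and column names are
    # assumptions for illustration only.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("nhanes.csv")       # hypothetical extract
    df["log_pfoa"] = np.log(df["pfoa"])  # PFOA is right-skewed, so log it

    # Logistic regression of heart disease on log serum PFOA,
    # adjusting for two obvious confounders.
    fit = smf.logit("heart_disease ~ log_pfoa + age + C(sex)", data=df).fit()
    print(fit.summary())

    # Roughly: if the published association is mostly publication bias,
    # the log_pfoa coefficient should sit near zero here; if it is mostly
    # confounding, the association should show up in these data as well.
    ```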

  5. “In order to assert that a natural phenomenon is experimentally demonstrable we need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher, Design of Experiments)

    Fisher continually emphasized that reporting only part of the relevant evidence allowed one to “lie with statistics” (yes, they had the phrase back then). For causal inference, one was to plan a variety of tests of diverse consequences so as to be better able to refute alternative explanations.

    • Agreed. Also, in one of the earliest papers on the problem of p-value censorship (from the 1950s or ’60s), the person credited with suggesting the paper’s topic was R. A. Fisher.

      Also, in his earlier papers Fisher often referred to multiple studies, including in one of his early definitions of sufficiency.

  6. Pingback: Improving the quality of articles in scientific journals | Robust Analysis

  7. So there is at least one journal that is dedicating a section to replication results, or was trying to make it happen. This crossed my emails a couple of years ago. I don’t really know how it’s all worked out, but I’ve also heard rumblings about some of the bigger econ journals dedicating one slot per issue to replication studies of major recent work. From the interwebmails of the Public Finance Review:

    We write to ask you to invite your graduate students to submit well-executed replication studies to Public Finance Review for possible publication.

    Replication is key to scientific progress. However, with a few notable exceptions, economic replication studies have rarely been published, making it difficult to assess the validity of economic research. One thing standing in the way of replications is that there is little professional reward for doing so. With its new “Replication Studies” section, Public Finance Review hopes to alter the incentives by publishing replications of important studies in the field of empirical public economics.

    Public Finance Review will consider three kinds of replications for publication:

    1. Positive replications: These are studies where the replicating author shows the original paper’s findings are robust to substantial extensions over time, explanatory variables, different data sources, or alternative estimation procedures.

    2. Negative replications (Type 1): These are studies where the replicating author is unable to reproduce the original author’s results using the same data, procedures, etc. In these cases, supplementary correspondence with the replications co-editors should provide evidence that adequate efforts were made to work with the original author to reproduce their results.

    3. Negative replications (Type 2): These are studies where the replicating author is able to reproduce the original author’s results, but he/she finds that the original results are not robust.

  8. Pingback: Replication and its low allure – can it change? | …not that kind of psychologist

  9. Paying reviewers is done in some journals (it’s more common in finance, perhaps), but alternatively we could raise the status of reviewing papers. If publishing in a top journal is fantastic, shouldn’t reviewing for a top journal be pretty darn good too? As it stands, the only reason to review papers (from a self-interested perspective) is to score points toward eventually being named associate editor or co-editor.
