Transparency, replications, and publication

Bob Reed responded to my recent Retraction Watch article (where I argued that corrections and retractions are not a feasible solution to the problem of flawed science, because there are so many fatally flawed papers out there and retraction or correction is such a long, drawn-out process) with a post on openness, data transparency, and replication. He writes:

A recent survey article by Duvendack et al. reports that, of 333 journals categorized as “economics journals” by Thomson Reuters’ Journal Citation Reports, 27, or a little more than 8 percent, regularly publish data and code to accompany empirical research studies. As some of these journals are exclusively theory journals, the effective rate is somewhat higher.

Noteworthy is that many of these journals only recently instituted a policy of publishing data and code. So while one can argue whether the glass is, say, 20 percent full or 80 percent empty, the fact is that the glass used to contain virtually nothing. That is progress.

But making data more “open” does not, by itself, address the problem of scientific unreliability. Researchers have to be motivated to go through these data, examine them carefully, and determine if they are sufficient to support the claims of the original study. Further, they need to have an avenue to publicize their findings in a way that informs the literature. . . .

Without an outlet to publish their findings, researchers will be unmotivated to spend substantial effort re-analysing other researchers’ data. Or to put it differently, the open science/data sharing movement only addresses the supply side of the scientific market. Unless the demand side is addressed, these efforts are unlikely to be successful in providing a solution to the problem of scientific unreliability.

Reed concludes:

The irony is this: The problem has been identified. There is a solution. The pieces are all there. But in the end, the gatekeepers of scientific findings, the journals, need to open up space to allow science to be self-correcting.

Could be. It could also be that the journals become less important. In statistics and political science, I see journals as still being very important in determining academic careers, but not so important anymore for the dissemination of knowledge. There’s just so much being published and so many places to look for things. Econ might be different because there are still a few journals that are recognized to be the top. From the other direction, the CS publication system is completely broken: all the conferences are flooded with papers, and it’s just endless layers of hype. CS research, on the other hand, is doing just fine. At this point the publication process just seems to be going in parallel with the research process, not really doing anything very useful.

33 thoughts on “Transparency, replications, and publication”

  1. Without an outlet to publish their findings, researchers will be unmotivated to spend substantial effort re-analysing other researchers’ data.

    True, but a) I thought there was at least one new on-line journal for this; and
    b) it might be very useful for graduate students or even senior undergrads to have a look and see if the results make sense.

    A useful learning exercise with the possibility of hitting a jackpot like the Reinhart-Rogoff (R&R) re-analysis.

    And even a poster presentation goes on a young researcher’s C.V.

    • Reminds me of a cancer researcher lamenting about the early 1960s: we thought that once we pointed out the dangers of cigarette smoking, most folks would stop smoking.

      I would add random audits of research to your prevention box.

      Evidence-based medicine failed (or at least has not yet succeeded) because there was no way to identify and quarantine improperly conducted and reported research that contaminated the pool of evidence. The early quality-of-research scoring guides (a kind of statistical education) seemed to be used mostly by those who did poor-quality research to puff their papers up by claiming they did the good things.

      Evidence does not come from studies that stand on their own – so to have evidence one needs pools of it (that are not contaminated). So you will likely need to worry about this for evidence-based data analysis.

    • >”scaling up stats education”

      No, I cannot disagree more strongly. The current stats education is the cause of the confusion. I remember sitting in class (grad school), learning NHST, and thinking “shouldn’t we set our hypothesis as the null hypothesis?” I was kept so busy that I just let it go and passed the class. It wasn’t until years later, when I had actual data and experience, that I realized that failing to get an answer to that initial question had caused me to waste years of my life on a poorly designed experiment. It was designed *for NHST*, meant only to test a “null hypothesis”. I needed to rethink everything from scratch and attempt to salvage the project, and was only able to achieve this to any degree due to exceptional personal circumstances. I would say that 99.99% of my peers are wasting their careers due to what was taught in stats class. They are trapped, because it is very difficult to admit you wasted the prime of your life on producing misinformation.

      Stats education in its current state needs to be abandoned entirely; it is causing so much pain and suffering, not just for its users but also for the people subjected to the treatments and policies it is used to generate.

  2. While I agree with everything, I’d like to paraphrase Jean-Luc Godard: “The best way to criticize a paper is to write another paper.” (Of course he said “movie”).

    There are many people working in any given area. If a paper is wrong, other researchers will react by writing new papers showing why the original paper’s results are questionable. It does not have to be a literal replication of the paper. What I mean is that there is a lot of work that is a form of replication, but not in a literal sense.

    This is feasible in economics because most empirical economists work on public or quasi-public data (you may have to pay for it, or get permission to use it, but many have access to it). In psychology where data are often unique and generated by the study’s own researchers, it is probably necessary to have some literal replication in addition to new studies using new data.

    • It can be very hard, though, to publish a paper pointing out that a published paper reports statistically impossible results or was based on some incorrect analytic process. I’ve tried to jump through that hoop. Of course, in psychology any non-replication can simply be attributed to some moderator variable – or even to simple sampling error.

    • > If a paper is wrong, other researchers will react by writing new papers showing why the original paper’s results are questionable.

      I agree with mark: “It can be very hard, though, to publish a paper pointing out that a published paper reports statistically impossible results or was based on some incorrect analytic process.” I have also tried to jump through that hoop, and it is not as easy as you may think, for several reasons.

      First, chances are that you will get very little support from senior faculty. Remember what David Broockman said? His advisers and his (then future) colleagues all told him to forget about LaCour’s paper. I had almost the same experience. If I wasn’t told to let it go to protect my career, I was told it wasn’t enough to demonstrate that other papers were flawed; I had to “fix” the flaws, or else nobody would care.

      Second, chances are that you will get a hostile referee. It’s very common that your paper will be reviewed either by the original paper’s authors or by someone who has written a paper similar to the original. You can imagine how helpful the referee report will be.

      For an example of the second case, take a look at what happened to Steven Ludeke. He submitted a co-authored criticism of Peter Hatemi’s research and wrote a cover letter asking that Hatemi not be a reviewer, but the editor sent it off to Hatemi anyway.

      http://nymag.com/scienceofus/2016/07/why-it-took-social-science-years-to-correct-a-simple-error-about-psychoticism.html

      Gelman has written about this too:

      http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics8.pdf

      • I understand, but that is not the point I am arguing. What I am saying is that writing a “comment” paper or a direct replication paper is not the only way to critique bad research. One can also write a new paper referencing the flawed research and showing how to correct and improve upon it. Moreover, since this is commonplace among economists, I am arguing that there is in fact more replication than it appears, but that it does not fall into the narrow definition of paper replication espoused by Prof. Gelman.

        Lastly, the LaCour issue is also not the point here, because Prof. Gelman is talking about bad research, not fraudulent research.

        • Jack PQ: Still difficult to get published. Who do you think gets invited to review these “corrected and improved” articles?

        • Jack:

          What Carol said. Also I’d say that it’s a useful contribution to point out flaws in published work, even without adding new things, showing how to correct and improve, etc. For one thing, it can be a contribution to knowledge to show that something doesn’t work. For another, some research is so hopeless that the best way to correct and improve it is to just abandon it. That’s my feeling about the beauty-and-sex-ratio work, the ovulation-and-clothing work, etc. This is just my view, of course, but my point is that I should be able to publish criticisms of that published work without a requirement or expectation that I show how to correct and improve upon it.

        • Fair enough. My point is, to make progress it is essential to be specific about the problems, to figure out how to overcome them.
          1. Fraudulent data → replicate and shout it from the rooftops.
          2. False results due to bad statistics → replicate and publish in the journal’s Comment section (and better stats training).
          ***Hence we need journals to have a Comment section, but most don’t!
          3. Dubious methodology → this is a murky area because there can be honest disagreements about methodology, hence my Godard quote: write a new and better paper.
          4. Conflict of interest in refereeing → editors need to wise up…

          I worry that if Prof. Gelman is correct, there is a kind of Gresham’s Law at work: bad statistics will drive out good statistics, because bad statistics is both easier and more likely to get published (flashy results!).

  3. The solution is to not have journals any longer. Just repositories. If you have a fresh result, or a replication, or a critique, put it out there for everyone to see, and email a link to the people you think will find it most interesting. If you want a job or tenure, tell the committee what you think your best work is, and point them to what others did with it. That’s what html is for.

      • I share both your hopes and your doubts.

        I just noticed that arXiv’s annual budget is well under one million dollars. And everyone can access it. To put that in perspective, the 24 universities of the Russell Group in the UK pay over 20 million dollars for access to the journals of a single publisher.

        In an important sense (and I’m aware of the irony) there is simply too much money to be saved (i.e., lost by publishers) to imagine anything changing very soon.

      • Like arXiv, these efforts are, I think, praiseworthy. But I think the ideal situation is for university libraries to run their own repositories, where their own scholars can upload their work. A “discipline” will form as a network of linked personal, scholarly homepages (written in simple html) that link to PDF files (by way of a metadata page, as in arXiv).

        SJS seems also to offer a “journal-like” surface. It’s that sort of thing that I envisage will one day be a thing of the past. My vision is that libraries will function as precisely that: stores of documents that can be easily accessed. Knowledge will change slowly as more documents (i.e., research reports) are added and the relationships between them develop. There will be an ever-changing “state of the art”, not a series of “publishing events” in which “a recent study shows…”.
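
        To make the idea concrete, here is a minimal sketch in Python of the kind of metadata page I have in mind: one static HTML page per paper, linking to the PDF. The names, fields, and paths are hypothetical placeholders, not any actual repository’s format.

        ```python
        # Sketch of a static "metadata page" in the spirit of an arXiv
        # abstract page. All names and paths below are hypothetical.
        from pathlib import Path

        def write_meta_page(title, authors, abstract, pdf_url, out_path):
            """Write a simple HTML landing page linking a paper to its PDF."""
            html = f"""<!DOCTYPE html>
        <html>
          <head><title>{title}</title></head>
          <body>
            <h1>{title}</h1>
            <p><em>{', '.join(authors)}</em></p>
            <p>{abstract}</p>
            <p><a href="{pdf_url}">Full text (PDF)</a></p>
          </body>
        </html>
        """
            Path(out_path).write_text(html, encoding="utf-8")

        # Example: one page per uploaded paper, served as static files.
        write_meta_page(
            title="An Example Working Paper",
            authors=["A. Author", "B. Author"],
            abstract="One-paragraph abstract goes here.",
            pdf_url="papers/example-working-paper.pdf",
            out_path="example-working-paper.html",
        )
        ```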

        • > There will be an ever-changing “state of the art”, not a series of “publishing events” in which “a recent study shows…”.
          That would be nice – and likely _papers_ will be like GitHub repositories for continuous collaborative development.

          Perhaps an adventurous university might be foolish enough to pioneer such a change in academic survival.

    • I am no fan of the current system, but I fear that the system that you discuss would give great advantage to the students of well-connected researchers. One of the very few merits of the current system is that the supposedly anonymous review process still allows the unknown to get published in top-tier outlets and to get known in that way.

      • I think something like that could happen in a completely open, repository system too. An unknown at a marginal institution comes up with a great result and publishes it in the repository. It gets noticed at a conference, or on a blog, or simply by the author informing some key people by email. After a bit of correspondence and, perhaps, a replication, the scholar is off and running.

        • That is certainly possible, but your work is far more likely to be noticed if you submit to the Harvard or MIT digital depository than if your work is found at some obscure state (or non-USA) university.

        • I’m not sure that’s true. The repository should be a pretty passive place. People just upload stuff to it. The mere fact of being uploaded shouldn’t be an event. (Most of the stuff that is uploaded to Harvard or MIT will be unremarkable.) The effect occurs when your paper actually makes a contribution to someone’s understanding of the world. Then all they need is a stable URL.

        • In order to have an impact, that paper needs to be read, and I suspect that many great papers from researchers at small or obscure or foreign universities would never even be read if simply posted online somewhere. I also suspect that the speed at which the better ideas are disseminated would slow down substantially. Would anyone have read the very important work of a 21-year-old Albert Einstein on intermolecular forces (published five years before he had his PhD) if it had not been published in the journal “Annalen der Physik”?

        • I don’t remember the story in detail. But I think Einstein’s ideas had impressed key people before they were published. Indeed, I think he needed the help of those people to make them “publishable”. In any case, I don’t think we can or should thank the current system of scholarly publication for making Einstein’s career possible. However he pulled it off, he’s a genius. Most of us want the most efficient way of making ordinary knowable things public.

        • I agree with the general goal you state, and I also think that the current system is deeply flawed. I just think that the depository system would have been the death knell for my own career in its early years. In a discipline with thousands of researchers writing many thousands of papers every year, no one would have paid any attention to the work of a graduate student or an assistant professor at a tiny teaching university. Once you are established, people are interested in your ideas and findings, but we obviously cannot pay attention to everyone.

        • It’s important to remember that if a new system is implemented, everyone would use it. I agree with Martha (below) that notifications would have to supplement the e-prints in the repository. So you would have to do a little self-promotion (instead of dealing with reviewers and editors). Also, you’d be competing for your first jobs with others in the same situation. It is very likely that your first tenure-track hire on the new system would be based solely on a reading of your work. This would replace being hired based on a couple of top-tier publications that haven’t yet been cited by anyone (and may never be cited).

        • Uploading needs to be supplemented by notification. I’m thinking of a modern version of what was once the custom in math: One would send preprints of papers to colleagues who might be interested in the topic. This can now easily be replaced by submitting the paper to the digital depository, then sending an email to a list of possibly interested colleagues notifying them of the submission of the paper, including an abstract and information on accessing the paper. (I haven’t been active in math research for many years, but would guess that this is how the arXiv has been used by mathematicians.)
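
      As a minimal sketch of that notification step in Python (the addresses, repository URL, and mail host below are hypothetical placeholders; a real setup would need authentication and a real mail server), the whole custom could be a few lines:

      ```python
      # Sketch of the "upload, then notify colleagues" custom described above.
      # All addresses, URLs, and hosts are hypothetical placeholders.
      import smtplib
      from email.message import EmailMessage

      def notify_colleagues(colleagues, title, abstract, eprint_url,
                            sender, smtp_host="localhost"):
          """Email an abstract and a stable link for a newly uploaded e-print."""
          msg = EmailMessage()
          msg["Subject"] = f"New e-print: {title}"
          msg["From"] = sender
          msg["To"] = ", ".join(colleagues)
          msg.set_content(f"{abstract}\n\nFull text: {eprint_url}")
          with smtplib.SMTP(smtp_host) as server:  # assumes a local mail server
              server.send_message(msg)

      notify_colleagues(
          colleagues=["colleague@example.edu"],
          title="An Example Working Paper",
          abstract="One-paragraph abstract goes here.",
          eprint_url="https://repository.example.edu/eprints/1234",
          sender="author@example.edu",
      )
      ```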

    • There’s room for journals, but they could play a different role. Studies, data and critiques would go into a public repository. Journals could feature articles of a theoretical, historical, literary, or philosophical nature (within a particular field).
