How should journals handle submissions of non-reproducible research?

We had a big discussion recently about the article, “A Graph Placement Methodology for Fast Chip Design,” which was published in Nature, and was then followed up by a rebuttal, “Stronger Baselines for Evaluating Deep Learning in Chip Placement.” Both articles are based on work done at Google, and there was some controversy because it seems that Google management did not allow the Stronger Baselines article to be submitted to a journal, hence its appearance in samizdat form.

I won’t address the media coverage, the personal disputes in this case, or any issues of what has or should be done within Google.

Rather, I want to discuss two questions relevant to outsiders such as myself:

1. How should we judge the claims in the two papers?

2. How should scientific journals such as Nature handle submissions of non-reproducible research?

The technical controversy

Just as a reminder, here are the abstracts to the two articles:

A Graph Placement Methodology for Fast Chip Design:

Stronger Baselines for Evaluating Deep Learning in Chip Placement:

The Nature paper focuses on the comparison to human designers, but here’s what is said in the conclusion of the Stronger Baselines paper:

So they’re just flat-out disagreeing, and this seems easy enough to resolve: just try the different methods on some well-defined examples and you’re done.

I’d say this comparison could be performed by some trusted third party—but the Stronger Baselines paper was already by a trusted third party, so I guess now we need some meta-level of trust. Anyway, it seems that it should be possible to resolve the question without having to go to the Google executives who are defending the Nature paper and ask them where exactly they disagree with the Stronger Baselines paper.

Although this comparison should be easy enough, it would require having all the code available, and maybe that isn’t possible. Which brings us to our second question . . .

How should scientific journals such as Nature handle submissions of non-reproducible research?

A big deal with reproducibility in computing is training, choosing hyperparameters, settings, options, etc. From the Nature paper:

For our method, we use a policy pre-trained on the largest dataset (20 TPU blocks) and then fine-tune it on five target unseen blocks (denoted as blocks 1–5) for no more than 6 h. For confidentiality reasons, we cannot disclose the details of these blocks, but each contains up to a few hundred macros and millions of standard cells. . . .

For confidentiality reasons, it’s not reproducible.

Fair enough. Google wants to make money and not give away their trade secrets. They have no obligation to share their code, and we have no obligation to believe their claims. (I’m not saying their claims are wrong, I’m just saying that in the absence of reproducible evidence, we have a choice of whether to accept their assertions.) They do have this Github page, which is better than you’ll see for most scientific papers; I still don’t know if the results in the published article can be reproduced from scratch.

But here’s the question. What should a journal such as Nature do with this sort of submission?

From one perspective, the answer is easy: if it’s not reproducible, it’s not public science, so don’t publish it. Nature gets tons of submissions; it wouldn’t kill them to only publish reproducible research.

At first, that’s where I stood: if it can’t be replicated, don’t publish it.

But then there’s a problem. In many subfields of science and engineering, the best work is done in private industry, and if you set a rule that you’re only publishing fully replicable work, you’d be excluding the state of the art, just publishing old or inferior stuff. It would be like watching amateur sports when the pros are playing elsewhere.

So where does this leave us? Publish stuff that can’t be replicated or rule out vast swathes of the forefront of research. Not a pleasant choice.

But I have a solution. Here it is: Journals can publish unreplicable work, but publish it in the News section, not the Research section.

Instead of “A Graph Placement Methodology for Fast Chip Design,” it would be “Google Researchers Claim a Graph Placement Methodology for Fast Chip Design.”

The paper could still be refereed (the review reports for this particular Nature article are here); it would just be a news report not a research article because the claims can’t be independently verified.

This solution should make everybody happy: corporate researchers can publish their results without giving away trade secrets, journals can stay up-to-date even in areas where the best work is being done in private, and readers and journalists can be made immediately aware of which articles in the journal are reproducible and which are not.

P.S. I’ve published lots of articles with non-reproducible research, sometimes because the data couldn’t be released for legal or commercial reasons, often simply because we did not put all the code in one place. I’m not proud of this; it’s just how things were generally done back then. I’m mentioning this just to emphasize that I don’t think it’s some kind of moral failure to have published non-reproducible research. What I’d like is for us to all do better going forward.

40 thoughts on “How should journals handle submissions of non-reproducible research?”

    • Mark:

      I answered this question in the above post:

      The paper could still be refereed (the review reports for this particular Nature article are here); it would just be a news report not a research article because the claims can’t be independently verified.

  1. IMO when corporate researchers publish an article with amazing claims but can’t give the details that allow the work to be reproduced, it’s really just a PR stunt isn’t it? They are using the journal to provide a stamp of authenticity on a claim that can’t be verified. How are these articles peer reviewed? By insiders at the company? How does anyone know this “cutting edge” research is really “cutting edge” and not just BS? Not that I think it is in this case or any other, but it might as well be for all that can be verified.

    On the one hand I agree that this *information* could be in the “news” section rather than the research section of a journal, but if it’s written by company personnel isn’t it still just a promotion? What if Amazon wrote a paper on the working conditions in its warehouses, sent it to the NYT, and NYT just published it straight up? Would that be appropriate? If the paper was published on the company website wouldn’t it be just as newsworthy? Then a science journalist – an independent party – could report on it and break down the claims. Then it would be “news” rather than “promotion”.

    I don’t think this should be published in a journal at all. News or not. The details can’t be released so it can’t be replicated, so **no one really knows** if it’s cutting edge or not, and when a company writes articles about itself or its own work, it’s not appropriate to call it “news”. There’s nothing stopping the company from publishing the paper on its own website if it’s that important. But that’s not as useful to them because it doesn’t provide the illusion of authenticity that the journal provides.

    • Anon:

      I see what you’re saying, but the trouble is that this would lock the journals out of all the fields of science and engineering where the best work is done privately. The journal can still send the paper out for review, and the reviewers can judge problems with the work and require revisions. Nobody knows what Daryl Bem really did with his ESP experiments either; it’s trust all the way down.

    • I think there’s a little more value than that. In principle, another company that manufactures chips can make use of the reinforcement learning approach, even fork their source code, to lay out their own proprietary chips. If it doesn’t work, no one can conclusively say if it’s a fault of the implementation or the idea, or confirm that it ever worked on the first architecture in the first place, so it’s not really a replication, but it’s worth a try at least.

    • Andrew, Somebody:

      My response is that all of your concerns are already covered by the paper being posted on the company website. In terms of distribution this even seems preferable, since many journals are gated. If the company is excited about promoting the results, then of course they can notify the “news” sections of journals and trade pubs and give talks at conferences, many of which probably already have appropriate forums.

  2. Should the policy vary by field of research? In some fields, publicly releasing and open-sourcing everything are de rigueur. In others, much is proprietary or sensitive (often to protect privacy or prevent harm). An extreme form of the latter could be research on military intelligence. Also, different kinds of research projects differ in the amount of effort required to prepare the research materials for public release. In some areas, the amount of effort is trivial and presumably such effort should be mandatory. In other areas, requiring public release would require so much effort as to prevent anyone from publishing anything. In the experimental sciences, the longstanding standard seems to have been that researchers must describe their procedures in sufficient detail to allow an expert in the area to reproduce the results, but need not necessarily open their laboratories and equipment to the public (albeit during some controversies they would open them to visiting experts). More recently, depositing data (but not physical equipment) in public repositories has become increasingly expected. Software systems can be more like data or more like physical laboratory equipment, depending on their scale, complexity, and domain.

  3. The original article is not replicable, but the Stronger Baselines article is (at least to anyone with access to an expensive GPU cluster) since they use an open benchmark. Since the two articles make nearly incompatible claims, and Google claims the Stronger Baselines article is too technically flawed to publish, isn’t it worth a replication attempt? Fork the RL code, run it on the IBM benchmark, and see if it does better or worse than RePlAce. I’m sure someone at OpenAI/Facebook/AWS can do it in a couple of days.
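    Concretely, that comparison might look something like the sketch below: a hypothetical Python driver that runs each placer on the same open benchmarks and compares half-perimeter wirelength. The run_rl_placer / run_replace helpers and the benchmark names are placeholders, not the real interfaces of the open-sourced RL code or of RePlAce.

    ```python
    # Hypothetical sketch of the replication attempt described above: run both
    # placers on the same open benchmarks and compare a standard quality metric
    # (half-perimeter wirelength). The run_* helpers are placeholders for
    # whatever driver scripts the real tools provide; nothing here is the actual
    # API of the open-sourced RL placer or of RePlAce.
    from statistics import geometric_mean

    def run_rl_placer(benchmark: str) -> float:
        """Placeholder: place `benchmark` with the RL method, return wirelength."""
        raise NotImplementedError

    def run_replace(benchmark: str) -> float:
        """Placeholder: place `benchmark` with RePlAce, return wirelength."""
        raise NotImplementedError

    def compare(benchmarks: list[str]) -> None:
        ratios = []
        for name in benchmarks:
            wl_rl, wl_rp = run_rl_placer(name), run_replace(name)
            ratios.append(wl_rl / wl_rp)
            print(f"{name}: RL={wl_rl:.3e}  RePlAce={wl_rp:.3e}  ratio={wl_rl / wl_rp:.3f}")
        # A geometric-mean ratio below 1 would favor the RL placer on these benchmarks.
        print(f"geometric-mean ratio: {geometric_mean(ratios):.3f}")

    # compare(["ibm01", "ibm02", "ibm03"])  # illustrative ISPD/IBM-style benchmark names
    ```

    The hard part in practice is getting both tools to accept the same netlist format and report the same metric; the sketch only shows the shape of the comparison.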

  4. Isn’t the difference who writes the news articles? The researchers don’t — indeed they aren’t even responsible for what’s in the story.

    How about just creating a new section of the journal for irreproducible results?

  5. Something here feels very wrong to me. But I wonder how analogous problems used to be handled 50-60 years ago, in the days when Bell Telephone Labs was producing a lot of cutting-edge research on the psychology of hearing and speech recognition. How was that research published, and by whom?

    • > How was that research published, and by whom?

      Much of that work was published in traditional journals. For example, Roger Shepard’s work at Bell Labs was regularly published in Psychological Review, Journal of Experimental Psychology, Psychometrika, Journal of the Acoustical Society of America, Journal of Mathematical Psychology, etc. The work included both computation-heavy (for the time) developments in multidimensional scaling (along with Kruskal) as well as human subjects experiments that were run the same way they would have been in an academic department.

  6. I guess the problem comes when corporate research becomes top-tier and traditional academic publishing is the minor leagues. Companies like Google only need academic journals to make their work seem credible. They have the means to publish it themselves, and it won’t just languish in obscurity, as a random paper by an assistant prof from an out-of-the-way university might. The only thing Google gets out of publishing in academic journals is that they can say, “it’s valid because it appears in an academic journal.” Wouldn’t downgrading that work to an announcement mean losing the kind of credibility they are trying to get by publishing it in the first place?

    But it does seem like academic journals in some fields are irrelevant if they don’t include the research of private companies. But if that’s so, maybe it just means that they aren’t needed anymore? Why do the journals exist in the first place?

    • > Why do the journals exist in the first place?

      To the broader question of why corporations would publish/do open source: it seems like this would be a simple way to market from one side of the company to another (or within all the companies working inside a big company like Alphabet).

      If it’s not published/marketed using external tools then you gotta figure out how to do that internally. Just more work. Also if your thing is popular then you can hire people familiar with your toolsets and you’re good to go. TensorFlow is a Google project and I think PyTorch at least started as a Facebook one (really not sure what the relationships between projects and companies are now in either case).

      I suppose that drifts maybe too far from the specifics of the routing problem here but just a guess.

  7. Just to continue the thought, the current situation reminds me of the marriage of convenience at the turn of the 19th/20th centuries between a dying aristocracy (titles but no money) and a rising class of businessmen (money but no titles).

    • Anon:

      I took a look but it left me confused. It says, “the Venn diagram showing the funders of ML and politicians is a circle.” That doesn’t make sense to me. There are some funders of ML who are politicians, some funders of ML who are not politicians, some non-funders of ML who are politicians, and of course lots of people in neither category.

      • I think the author means that the Venn diagram of “funders of ML” and “[funders of] politicians” (not the politicians themselves) is a circle. Allowing for some hyperbole, the contention that funders of ML also spend a lot on political lobbying, and vice versa, is a reasonable one.

        • Adede:

          Sure, funders of machine learning spend a lot on political lobbying, but I don’t see this Venn thing working even as hyperbole, as lots of funders of machine learning are not funders of politicians. I guess this is just a case of the author of the blog being too clever by half: instead of simply saying that big companies spend a lot on lobbying, he or she constructs this Venn diagram scenario that doesn’t even make sense.

  8. I’m confused by the conflation of reproducibility and replicability in this post. For me these are two different things. Reproducibility for me involves regenerating the results given the original data and code; replicability is about whether the same experiment done again yields similar results to the original. So a reproducibility crisis would be a different animal than a replicability crisis.

    • Shravan:

      Setting aside issues of terminology (I don’t think everyone would define the words “reproducibility” and “replicability” the way you do), in this case I think both issues arise. First, it’s not possible to repeat the exact experiment done by the authors of the Nature paper as they have left out some of the necessary steps. Second, it’s not clear that their method, if it indeed works as advertised on their example, would work on other problems. More generally, if you want to replicate (using your terms) a published experiment, then a natural first step is to reproduce it (again, using your meaning of the word). So I think the two things are the same “animal,” just in two slightly different stages of development.

  9. I use the terms like you do (I think I got it from Roger Peng, and I think he got it from Jon Claerbout). But I think many in the machine learning community follow Chris Drummond (“Replicability is not Reproducibility: Nor is it Good Science”), and of course the “Reproducibility Project” is the “Replication Project” by your definition. I do think it’s unfortunate that experts use the same terms to define the same distinction but in opposite ways, but we’re stuck with it now, and you need to rely upon context to see which one is meant. (And we linguists have a similar issue. Some people use the term “determiner” to refer to the function and the term “determinative” to refer to the word class, whereas others use the term “determiner” to refer to the word class and the term “determinative” to refer to the function.)

    Personally, I think the distinction is worth preserving but with some added ones as well: “robustness” (analyse the same data with different assumptions) and “generalizability” (different data and different methods). I think the problem with the latter is that it sounds like conceptual replication, and the problems with that have been well discussed here.

  10. The actual source code is available? That’s already a huge improvement over past practice. At least you can try it out on other data sets and see how it does. From an engineering standpoint, the ability to easily try out the method on a variety of other problems is more important than reproducing the exact problem on which they tried the method.

  11. You say “I’ve published lots of articles with non-reproducible research, sometimes because the data couldn’t be released for legal or commercial reasons” and this is certainly true for many of us. But I don’t think we should let this pass too easily – as you say, we should do better. I think we need to ask what the value is of research that can’t be reproduced due to legal or commercial reasons. If it may influence public policy, then I don’t think legal or commercial reasons are a sufficient excuse. Certainly some commercial research is proprietary and publication of the results may be of value to other researchers. But it is often used (either by the authors or others) to influence public policy, and then the commercial-reasons excuse becomes thin (perhaps too thin to use).

    Here is a case in point: https://www.nber.org/papers/w30028#fromrss. The authors expend a great deal of energy developing the data set they use – as we’ve discussed before, they should receive credit for that. More credit than current systems provide, where only the publication gets credit. This particular research is interesting in its own right. But it also pertains to important public policy issues – GDPR and other privacy related policies. Since this research may be used to influence policy, I’m not sure they should be permitted to claim commercial reasons as an excuse for not releasing the data. To be fair, they don’t claim any particular reasons – they just follow standard practice of not releasing their data set. Also, to be fair, it is a paper in the NBER working paper series, so it is not yet published in a peer-reviewed journal (though I have no doubt it will eventually appear, perhaps in several papers).

    But my point is that this paper is a good example of important research that cannot easily be reproduced (reading it reveals many many forking paths taken), but whose value is complex to determine. There are methodological features of interest, interesting data, but also potential policy implications of the research. Personally, I’d prefer to see a requirement that the data be provided – along with recognition and credit for assembling the data used in the paper. But, without the data being available, I’m not sure this should be published.

    • Dale:

      A few years ago some colleagues and I published an article, “An analysis of the NYPD’s stop-and-frisk policy in the context of claims of racial bias,” with policy-relevant conclusions. For legal reasons we were not able to make the data public. I think it’s fine that our article was published; it also would’ve been fine had there been a big red banner across the top of the first page saying something like: WARNING: THE DATA USED IN THIS ARTICLE ARE NOT AVAILABLE.

      • Question: why was the data not available? I don’t mean this to indicate that you should have provided it – but when I read that article I see no good reason why it needed to be withheld. I’m sure it wasn’t up to you but there appears to be no real check on an entity declaring that the data is commercial, proprietary, or needs to be withheld to protect privacy – regardless of whether or not that is a viable claim.

        • Dale:

          It was a project originating with the office of the New York State Attorney General, and they told us that we were forbidden to release the data. I asked them what would happen if we released the data anyway and they told us they’d sue us, or prosecute us, I don’t remember which.

        • OK, let me walk that back just a bit.

          Seems like there’s a pretty big distinction between researcher availability and public availability. Researchers can be held accountable for inappropriate release of data. Data sets could be given a “privacy” ranking. I suggest four levels:

          Level 3: the data cannot be accessed by any other researchers or individuals. This must be justified extensively.
          Level 2: the data can be available to “qualified researchers” on a limited basis (2?–5? investigations, FCFS). This must be justified.
          Level 1: the data can be available to “qualified researchers” on an unlimited basis. This need not be justified.
          Level 0: no data restrictions.

          1. “qualified researchers” would include any investigator with access to NSF/NIH or other Federal grants, possibly more.
          2. The institutions supporting researchers would agree to penalties or suspensions for privacy violations.
          3. FCFS = first-come, first-served, on a qualification-pending basis.

          Think of it as a sketch framework.
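          For what it’s worth, the sketch framework can be written down concretely. Here is a minimal Python illustration with made-up names; the levels and rules are just the proposal above, not any existing policy or API.

          ```python
          # Minimal sketch of the tiered-access proposal above, with illustrative
          # names only; the levels and rules are the commenter's suggestion, not an
          # existing standard or API.
          from enum import IntEnum

          class PrivacyLevel(IntEnum):
              OPEN = 0        # Level 0: no data restrictions
              UNLIMITED = 1   # Level 1: qualified researchers, unlimited access
              LIMITED = 2     # Level 2: qualified researchers, a capped number of investigations
              CLOSED = 3      # Level 3: no outside access; must be justified extensively

          def may_access(level: PrivacyLevel, qualified: bool, slots_remaining: int) -> bool:
              """Decide whether a requester may use a data set under this scheme."""
              if level is PrivacyLevel.OPEN:
                  return True
              if level is PrivacyLevel.UNLIMITED:
                  return qualified
              if level is PrivacyLevel.LIMITED:
                  return qualified and slots_remaining > 0
              return False  # CLOSED

          # Example: a level-2 data set with one slot left, requested by a qualified researcher.
          print(may_access(PrivacyLevel.LIMITED, qualified=True, slots_remaining=1))  # True
          ```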

        • I think the “qualified researchers” category is just gatekeeping bullshit. Data can either not be released at all (level 3 in your scheme), be accessed by people who sign some document about keeping the data limited (level 2), or be open… The idea that we’re going to keep, say, workers at a tech company from seeing the data because they aren’t at an “approved institution,” or that interested bloggers can’t have the data because they don’t have a PhD or NSF funding or the like, is full of crap.

          I would also argue that if you’re going to do studies to get a drug approved, then you must have all the participants agree that the data may be released publicly, and must release the data in computer-readable format publicly on the web before the FDA can approve it. We need public review, not just “trust us, we’re the FDA.”

        • Daniel said: “I think the ‘qualified researchers’ category is just gatekeeping bullshit.”

          You won’t get anywhere with that. When privacy is breached it can’t be put back in the bottle. If you want people to allow the use of private data, the penalty has to be strong enough that violations would be not just unlikely, but would not occur at all. “Sign some document” isn’t sufficient. The penalty for violations must have serious career implications that the employer must agree to implement – or lose access for all their employees. That rules out random bloggers. Better for some people to have access than for no one to have access. Workers at tech companies could conceivably have access as well, pending their company’s agreement to the rules. Random unemployed individuals or freelancers? Nope.

        • There’s no “breach of privacy” if you’re intentionally giving out the data to everyone, it’s just “this isn’t private data”.

          The point is that there is plenty of *non-private* data (i.e., data where the participants would be happy to sign a document saying their data could be released) which is nevertheless gate-kept, not for privacy reasons but to further people’s careers by making it hard to use the data. That’s just bullshit.

    • Some research that can’t be reproduced due to legal or commercial reasons is clearly valuable. A lot of the work during the pandemic was performed on raw health records from patients under very tight, legally binding ethics agreements. There’s no way that that information could be released with the paper legally (and I would argue that even if the legal barriers were not in place, it would have been massively unethical to even consider doing so). That work directly led to changes in patient management (and the implementation/removal of restrictions in some countries). It’s hard to argue that those works were not publishable, or that the data should have been mandated for publication to occur.

      • I want to push back on your claim that “a lot of the work during the pandemic” used data that could not legally or ethically have been released. The legal restrictions are clear, although I believe they go too far. Ethical restrictions are less clear. I do agree that there is danger in releasing patient information, especially if it cannot be adequately anonymized. But if we stop imagining the horrors of how it could be misused, I think much of it is not really so sensitive. As I’ve said, I’m willing to share all of my health records with anyone who is interested.

        Normally, I would think things like vaccination status and other health conditions along with COVID data should not really be so controversial. It is unfortunate (disastrous in many ways) that even vaccination status has become so politicized that it isn’t unrealistic to imagine personal vaccination status being used in harmful ways. But in a more civil setting, I don’t see why such data really needs to be protected.

        The early hydroxychloroquine study’s data (from the initial French study) was released, and there were 36 participants. It was not formally de-identified, although what was released wasn’t sufficient to figure out who the participants were. In fact, a bit more data would have been useful, albeit making identification potentially feasible. Would it have been unethical for that data to be released? I guess it is debatable, but the answer is far from obvious.

        I don’t doubt the need for privacy restrictions. But I do question the automatic response that some horrible ethical boundary is crossed any time health data is made available. Your claims about the necessity of keeping a lot of the pandemic data secret are, in my view, unsubstantiated. HIPAA aside, the moral grounds for keeping COVID data secret are not so clear to me.

  12. I personally would not participate (as a research subject) in any study which was going to make my medical records data publicly available, no matter how well they assure me it will be anonymized. If that were on the informed consent documents, surely it would reduce the potential pool of participants.

    • When the NEJM did the SPRINT competition a few years ago, they held a conference to discuss the competition and the general idea of opening clinical trial data. One of the panels was made up of participants in the trial – their motivations (almost universally) were (1) the hope of getting an improved treatment and (2) the desire to help improve public health. They were literally shocked to hear that the clinical trial data was not generally made available publicly. So, you might not participate, but plenty of people would. As for my medical records data, I’m willing to provide that publicly for free (if I could only figure out how to get it all).

      • Exactly, I think it’s far more common for people to be shocked that the “research” they’re participating in is in fact a goldmine that the experimenters are going to use for personal gain rather than something to be made available to everyone. Most people go into these things trying to help their community, and discover later that it’s mostly about getting grants and helping the institution, and they come away disillusioned.

        • “and discover later that it’s mostly about getting grants and helping the institution”

          e.g., most of the work is useless to begin with because the studies are poorly designed and/or have insufficient participation and/or any number of other problems that are discussed here on a regular basis. That’s the “getting grants” part of the game – putting any old POS out there and sucking the unknowing public into contributing to the stupidity.
