In the open-source software world, bug reports are welcome. In the science publication world, bug reports are resisted, opposed, buried.

Mark Tuttle writes:

If/when the spirit moves you, you should contrast the success of the open software movement with the challenge of published research.

In the former case, discovery of bugs, or of better ways of doing things, is almost always WELCOMED. In some cases, submitters of bug reports, patches, suggestions, etc. get “merit badges” or other public recognition. You could relate your experience with Stan here.

In contrast, as you observe often, a bug report or suggestion regarding a published paper is treated as a hostile interaction. This is one thing that happens to me during more open review processes, whether anonymous or not. The first time it happened I was surprised. Silly me, I expected to be thanked for contributing to the quality of the result (or so I thought).

Thus, a simple-to-state challenge is to make publication of research, especially research based on data, more like open software.

As you know, sometimes open software gets to be really good because so many eyes have reviewed it carefully.

I just posted something related yesterday, so this is as good a time as any to respond to Tuttle’s point, which I think is a good one. We’ve actually spent some time thinking about how to better reward people in the Stan community who help out with development and user advice.

Regarding resistance to “bug reports” in science, here’s what I wrote last year:

We learn from our mistakes, but only if we recognize that they are mistakes. Debugging is a collaborative process. If you approve some code and I find a bug in it, I’m not an adversary, I’m a collaborator. If you try to paint me as an “adversary” in order to avoid having to correct the bug, that’s your problem.

It’s a point worth making over and over. Getting back to the comparison with “bug reports,” I guess the issue is that developers want their software to work. Bugs are the enemy! In contrast, many researchers just want to believe (and have others believe) that their ideas are correct. For them, errors are not the enemy; rather, the enemy is any admission of defeat. Their response to criticism is to paper over any cracks in their argument and hope that nobody will notice or care. With software it’s harder to do that, because your system will keep breaking, or giving the wrong answer. (With hardware, I suppose it’s even more difficult to close your eyes to problems.)

45 thoughts on “In the open-source software world, bug reports are welcome. In the science publication world, bug reports are resisted, opposed, buried.”

  1. I can’t help but think that part of this discussion hinges on the meaning of “published”, “peer reviewed”, etc. Other discussions have talked about a need to redefine publishing and peer review in light of new developments (this blog and blogs like it, arXiv, etc) and in light of old weaknesses (statistical education consisting of p-values, gatekeepers with agendas, “pal reviews”, etc).

  2. Good point. It’s worth noting that many (most?) successful open source projects have pretty strong norms about how criticism is delivered. For instance, https://www.python.org/psf/codeofconduct/

    Yesterday, Andrew wrote: “I think it’s great to have your work criticized by strangers online.” I completely agree.

    But then he gives an example and says: “The criticisms were online, non-peer-reviewed, by a stranger, and actually kinda rude. So what, who cares!” I think it’s admirable that he was able to take it so well, and I hope I could say the same if it happened to me. However, it is a social fact that a lot of people *do* care, and it would be helpful to establish norms akin to the open source community’s codes of conduct.

    Tact is important, but it’s also hard. My own experience trying to publish replications is that you have incentives to play up the differences between your results and the original’s; otherwise reviewers get bored and you get the dreaded “not important enough for journal X” response.

    In any case, given that people are what they are and that science is also a social enterprise, I don’t think it’s as simple as saying “who cares”. Many in the open source community understand that.

  3. I don’t work on open source, but I can say with confidence that in corporate software development, the people who report bugs are generally not welcomed with open arms. Developers will gladly cry “user error” even when it is clear the software could be made better. I suspect, as in research, it’s a question of what is motivating you – the joy of building/discovering, or the fear of getting fired or losing face, or simply not enjoying the crossing of T’s and dotting of I’s that it takes to make really good software.

    • Yeah, there may be other parallels.

      “Open source” scientists may have values similar to those of open source programmers. They value transparency, they do things for the ‘greater good’, they want others to use what they’ve done however they wish, they invite criticism, they want to make sure they themselves have done things correctly, they want post-dissemination review, etc.

      “Closed source” scientists don’t want to be scooped, they don’t want people to know the shakiness of the methods used, they don’t want to expose the flaws, etc.

      I would say that the open source software world is quite similar to the open-source science world, and the closed source software world is more similar to the closed source science world, in terms of practices and values – “It’s not a bug, it’s a feature!” “It’s not bad data; it’s a data-dive!”

    • Yes, “generally not welcomed with open arms”, in large part because you’re creating more work. They don’t like more work.

      I also see a divide. In general, production developers (whose code might be traversed millions of times a week and is central to some function of the company) tend to respond better. This is particularly true if they have an opportunity to fix the problem behind the scenes or in a way that they can take some credit for fixing it. If they want to say “we fixed this problem and zbicyclist was of some small help” instead of “zbicyclist pointed out a big error in our code”, I’m fine with that. I depend on them; they depend on me.

      Where I’ve noticed more resistance/hostility is in one-off projects, particularly once they’ve been reported out (internally or to an external client). This is, of course, more analogous to criticizing an accepted journal article, since most academic journal articles are basically one-off projects.

      And, not surprisingly, most of the real boner errors are going to be in one-off projects where the aim is to make it “good enough” for the client, or “good enough” for publication, so you can go on to the next project.

    • Only the best open source projects have good bug-reporting histories. I’d guess that most projects are small projects that don’t get bug reports (and aren’t maintained), many are medium-sized and get bug reports but don’t have the time or manpower to do anything about them, and some get bug reports that are useful and supported. On the other hand, many projects feel the need to explicitly state a Code of Conduct (see https://www.contributor-covenant.org/ for one) because bug reports have drawn criticism and outright vitriol in the past.

      Finally, in the closed-source software realm, attempting to report potentially serious security flaws in some projects is met with lawyers and law enforcement. There is recognition among security researchers that this needs to change, and that it has limited the reporting of issues and limited the validity/security of software we all use. This is very similar to the publishing world’s sometimes hostile reactions to comments.

      • “Only the best open source projects have good bug-reporting histories. I’d guess that most projects are small projects that don’t get bug reports (and aren’t maintained), many are medium-sized and get bug reports but don’t have the time or manpower to do anything about them, and some get bug reports that are useful and supported.”

        Pretty sure I’ve heard an analogous argument regarding post-publication peer review also: that it’s hard enough to get people to review papers before publication, and that if we revamp our system in any way that relies on papers getting good critical attention after publication, we’ll be making the noise problem worse. (Which is not to say we shouldn’t make it reasonably easy to disseminate such critical attention, for whatever fraction of papers garner enough interest.)

        • It takes me a long time to do a solid paper review: I read the paper and all the notes multiple times, really consider the analysis approach, theoretical framing, literature review, and quality of the writing, and then suggest alternative approaches, etc. This is true for most people I know. The idea that quick comments post publication are going to replace that is really unreasonable.

        • It’s good to hear that you and most people you know take a long time and do a solid paper review. I suspect that many people aren’t that thorough with pre-publication reviews.

    • I do work in open source, and I can assure you that the first response to a lot of bugs is “user error,” but that doesn’t mean that the poor usability (which is often what causes user error) doesn’t get fixed eventually. I think that’s the key difference.

  4. Another issue is that many scientists see coding as a means to an end, and little more. Unlike good programmers, who write clean code as a matter of pride and professional responsibility, scientists just want the thing to be done, no matter how buggy or messy the result. Once published, the software becomes abandonware; the scientist has already moved on to the next thing.

    Some researchers have the sense (and budget) to hire programmers to do the implementation, but the best programmers don’t want to work in academic research. The best programmers want to work with the best tools (scientific software is terrible), they want to take pride in their code and documentation (no one cares), they want to write tests (“what’s a unit test?”), and they want to be paid commensurate with their abilities (any private industry pays better).

    There aren’t many areas of research left where you can get away with doing zero coding. There needs to be a massive realignment of our collective values such that software development is seen as an integral part of the research process and not an annoyance.

  5. Maybe some of it has to do with the fact that a program can have its bugs fixed, and version 2.0 is perceived as an improvement (and a sign of a healthy, active project). But issuing a correction for a paper makes it look bad. And having to issue a retraction is perceived as a potentially career-ending event.

    It might also tie into the hypercompetition of academia vs the less job-scarce realm of software.

    Finally, I’ve noticed that some of the big names in science criticism can come across as self-righteous assholes. E.g.: https://mobile.twitter.com/OmnesResNetwork/status/921769621434388480.

    I agree with Jordan’s point and I think his work is great, but ad hominems and profanity do not contribute anything. Bug reports filed with the same level of incivility would also be poorly received, I imagine. And I know that tweet is not the full bug report (I have no complaints about his preprints and publications), but I think the point still stands.

    • The thread he links to there is also interesting, in that it shows how some outsiders (here, Neil Gaiman) perceive the NYT article as revealing sexism and bullying. Obviously people read it in varied ways.

      In some sense, this supports Anaya’s comments elsewhere that he is grateful that he’s been finding problems in the work of Brian Wansink, so at least he can avoid sexism accusations.

      • One of the more frustrating things in the world is when someone from your “team” is a jerk.

        1. If someone from my team is civil that is great.
        2. If someone from the opposing team is civil that is also great.
        3. If someone from the opposing team is a jerk that’s fine too! My opponents are wrong AND they’re jerks, that is easy to accept and fits very neatly into my worldview.
        4. But when someone from my team is a jerk it just makes me sad. Not only is it unpleasant, I suspect it is generally ineffective as a persuasion tool (due to point 3).

      • I think the comments from Gaiman (essentially that the effectiveness of Andrew’s work is totally irrelevant since he has criticized a woman’s work harshly) are telling. When we do science out in the open, we are also necessarily doing it in the public eye, and the public eye is not going to see things through our lenses.

        • Kyle:

          Framing is all. I’m pretty sure that if Susan Dominus had written an article for the NYT on Eva Ranehill, Anna Dreber, and Dana Carney, and the struggles these women had in bucking the power-pose TED-world media machine, and on how the male researchers Simmons and Simonsohn had lent their support to the brave outsiders, then Gaiman would’ve been tweeting on how wonderful we all had behaved.

          If, as an outsider, the first thing you see is a news article that presents one side of the story, it’s easy for that to drive your perceptions.

        • This is a good point. I’m not especially surprised that the NYT ended up with one of these narratives rather than the other, and it’s probably best to just assume you’re going to be misrepresented and try to do good work regardless.

  6. Another analogy from software might be a complaint or request related to the design of the application. If you create software you are already trying to identify bugs (or you have someone whose job it is to do that), and a report from outside is easily viewed as an assist in that process (because, as others have noted, software often gets updated). Unsolicited input about the design of a feature is often a more direct challenge to how you view the problem you’re trying to solve, even though feature enhancements are part of the software life cycle.

    Criticism of any type can be delivered with varying levels of tact and accepted with differing degrees of grace, but if the message of a bug report is “you made a mistake,” a feature or enhancement request is more likely to be read as “you’re wrong.”

    • A lot depends on the tone. The thing about open source is that you can always say: you are free to implement that yourself. I maintain my own versions of some R packages because the owners aren’t interested in supporting RStudio Server, which I use for my courses, and that’s all fine; because they are open source, I can do it myself.
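
      (To make that concrete, here is a hypothetical sketch in R. It assumes the remotes package, and “myuser/theirpkg” is a placeholder for your own patched fork, not a real repository.)

      # Hypothetical sketch: install your own patched fork of a package.
      # "myuser/theirpkg" is a placeholder, not a real repository.
      install.packages("remotes")                 # one-time setup
      remotes::install_github("myuser/theirpkg")  # use your fork, not the CRAN version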

  7. It’s worth noting that all the bug reports I’ve received start with “Thank you so much for making X! This has been extremely helpful in my work. However, I’ve been running into problem Y.” This definitely sets a very positive tone. While I don’t think all criticism should be stated in a positive tone, it’s been a considerably more pleasant experience to receive a bug report than, say, to have a paper rejected (even when rejecting the paper was the right decision by the reviewer and ultimately led to a better paper).

  8. The analogy between open source software development and science is not as good as it seems superficially. Software development is engineering rather than science. The measure of success is operational functionality and the tolerance for minor errors and imperfections is far higher than it is in a scientific context (in both cases I am supposing good faith effort). The design, realization, and interpretation of experiments is completely different in concept from the design of a device or tool. A scientific error is not a bug, and is not obviously comparable to a bug.

    On the other hand, the notion that open source works well seems to me poorly justified. There are open source programs that work well, or well enough, but there are many that do not, and their density is probably as great as the density of equally poorly written scientific papers. The difference is that the mechanisms for distributing the two kinds of work are different, and poorly functioning open source software is possibly less visible than poor science.

    Finally, most of the examples of replication problems described on this blog are only marginally characterized as “science”. Most come from the social sciences, psychology, applied medicine, etc.: areas dominated by observation rather than by experiment, or in which things are admitted as “experiments” that no chemist or physicist would take seriously (“controlled” studies of 30 students answering questionnaires are not experiments in the same sense that a physicist makes experiments). Perhaps much of the replication crisis has to do with low-quality problem formulation in certain areas/contexts (or simply a lower density of talented researchers).

    • I think your observations about where the analogy breaks down are good. All models are wrong &c &c. Maybe this is a useful one, maybe not. To the extent that the open science movement is helping anything, I think it is doing so by building on the OSS movement.

      I think reasoning about the frequency of bad science in various fields based on what’s presented on this blog is not a good plan. I feel certain a few layers of selection bias are in play.

    • Dan:

      It may be that social science is not real science, but the title of this blog is “Statistical Modeling, Causal Inference, and Social Science,” so you’re gonna be hearing a lot about social science here!

    • It really wasn’t necessary to add the final paragraph, which weakens your point.
      It is true that most people build software to solve a problem rather than to create knowledge, and that is fine.

  9. I think the title of this post is misleading in two ways: (1) while the authors may appreciate bug reports for their software (as I do), the feeling is not universal. The open software world is full of examples where projects have been forked because different groups of developers disagreed strongly, and loudly, about the direction of a project or the approach taken to fix bugs. This is not my area of expertise, but I believe that Linux operating system development has become very tightly controlled because of bug-fixing feuds.

    (2) And I think some writers are being disingenuous about how paper “corrections” are being treated. Yes, there are authors who are not comfortable revisiting their publications. But there are also critics who are very quick to move from criticizing the science to criticizing the scientist, if the scientist does not respond correctly. There seems to be an unfortunate tendency in some forums to assume the worst. Unlike a software development project, which is an on-going effort, many paper publications mark the end of a project, the graduate student or post-doc moves on, the datasets are poorly archived, etc. And the peer-review process for publishing corrections (or criticisms) largely does not exist, so there is no “independent” process to evaluate whether criticisms are valid or serious, or whether an author has made a good faith effort to correct them.

    Software production and scientific publishing are human endeavors; both show a distribution of responses to criticism. It is a lot easier to fix a programming bug (if that is what it is) than to correct a paper, so it’s not surprising that there are more bug fixes than paper corrections, even when both groups have similar commitments to accuracy and correctness.

    • “Unlike a software development project, which is an on-going effort, many paper publications mark the end of a project, the graduate student or post-doc moves on, the datasets are poorly archived, etc. And the peer-review process for publishing corrections (or criticisms) largely does not exist, so there is no “independent” process to evaluate whether criticisms are valid or serious, or whether an author has made a good faith effort to correct them.”

      I think part of the point here is that the system needs to change. There are some moves in this direction that are currently useful: PubPeer, posting “updates” on one’s own website (as Carney did), and blogs (including this one).

      • +1 to the system needing to change.

        Bill Pearson’s point about the lifespan of projects is true, but I think this is a place where scientists can and should do better than we have been. There are really good tools now for making analyses reproducible and preserving them for our future selves. My own workflow isn’t perfect, but with every new project it gets a little bit better.

        Sharing data and code is great where it’s possible and ethical. Do we have a way to share model objects? There’s so little room in papers to describe a model, and if you could publish it, curious people could poke at it and build on it. (Heck, if I knew how to share model objects with my collaborators down the hall, that would also be a win.)
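
        (One partial answer, at least for R users: base R’s serialization can round-trip most fitted model objects. A minimal sketch, with the model and file name as placeholders:)

        # Minimal sketch: serialize a fitted model so others can load and poke at it.
        # The model and file name are placeholders.
        fit <- lm(mpg ~ wt, data = mtcars)  # any fitted model object
        saveRDS(fit, "fit.rds")             # write the object to disk
        fit2 <- readRDS("fit.rds")          # a collaborator restores it...
        summary(fit2)                       # ...and pokes at it

        (This doesn’t address long-term archiving or cross-language sharing, but it would work for the collaborators down the hall.)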

        • “There’s so little room in papers to describe a model, and if you could publish it, curious people could poke at it and build on it.”

          +1

  10. I think the important commonality is with the famous statement known as Linus’s Law, coined by Eric Raymond in honor of Linus Torvalds: “given enough eyeballs, all bugs are shallow” (https://en.wikipedia.org/wiki/Linus%27s_Law). That is, it can be extremely hard to fix bugs correctly, and it may take months to find the actual cause of a problem, but with a lot of people digging it can be much faster.
    That is one of the basic arguments for FOSS software being more secure — that because people can see the code, they can find the security issues. Many, many security issues in FOSS software are found by helpful people and solved before they are ever found “in the wild.”

    • But, while Linus may believe this, the evidence suggests that either there will never be enough eyeballs, or bugs are more complicated than that. The past several years (e.g., the Heartbleed bug) have revealed serious, fundamental bugs in large software projects that went undiscovered for years.

      Today, perhaps because of the mismatch between the complexity of large software systems and the number of careful software readers, most security bugs are probably found by programs designed to break the security by sending zillions of random inputs, rather than by people reading code.
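
      (For the curious, here is a toy sketch in R of the “zillions of random inputs” idea, fuzzing in only the loosest sense; jsonlite::fromJSON is a stand-in for whatever code is under test, and nothing here resembles a production fuzzing setup.)

      # Toy "random inputs" loop: throw random printable strings at a parser
      # and record what happens. jsonlite::fromJSON is just a stand-in target.
      library(jsonlite)

      fuzz_once <- function(parse_fn, max_len = 64) {
        n <- sample(1:max_len, 1)
        input <- rawToChar(as.raw(sample(32:126, n, replace = TRUE)))
        tryCatch(parse_fn(input),
                 error = function(e) list(input = input, error = conditionMessage(e)))
      }

      results <- lapply(1:1000, function(i) fuzz_once(fromJSON))
      # Nearly every random string yields an ordinary, well-handled parse error;
      # real fuzzers look for crashes, hangs, and memory corruption instead.

      (As the replies below note, serious vulnerability hunting uses far more deliberate inputs than this.)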

      • However, the “sending zillions of random inputs” approach doesn’t seem likely to be adaptable to critiquing research, so “many eyeballs” may be the best approach we can develop for the latter. Thus it may be worth discussing how to alter the system of “incentives” to encourage more research critiquing. In fact, I suspect that research would make greater progress if we had fewer people conducting research and more critiquing it.

      • Even if your premise is true, it’s not an argument for closed source or against what Linus says, since mechanized searching for bad code is just using artificial eyeballs, and it will be more effective with open source, since it is not just in-house people writing the search algorithms. Most bugs are not security bugs, either. Having done time on a security team for an open source project, I will say that someone looking for security issues uses a combination of automated tools to deliberately (not randomly) search for certain flawed practices that may indicate the presence of a problem, then spends time manually working out whether and how the issue might be exploited.

        In contrast, sending zillions of random inputs would be a very poor way of finding vulnerabilities. First, what do you mean by “random”? The worst vulnerabilities usually require specially crafted inputs, such as a very specific URL, often with a request containing very specific data that might be JavaScript or SQL, or that would cause an integer overflow. Others require a series of steps. These require a good understanding of the software and language being attacked.

        Applying what we know about other things, I think you should ask whether it is really true that there are “more” bugs found today, or whether it might be that you and others are more aware of them than you would have been in the past, when personal technology was less pervasive. How would you even measure that? Having bugs found is mainly a function of software usage rather than of the bugginess of the code, meaning that widely used software gets a lot more bug reports as people use it for more and more edge cases. I think in the comments on a statistics blog it is worth thinking about measurement and the equivalent of p-hacking.

  11. How much do you think Mindset plays into all this? http://blogs.edweek.org/edweek/finding_common_ground/2017/06/misinterpreting_the_growth_mindset_why_were_doing_students_a_disservice.html

    Andrew blogged on it back in 2015 (http://statmodeling.stat.columbia.edu/2015/10/07/mindset-interventions-are-a-scalable-treatment-for-academic-underachievement-or-not/) and I just added a comment there, but I was thinking that perhaps all of science needs a mindset intervention. I think if others approached criticism as Carol Dweck does, then things would be much improved.
    https://www.edweek.org/ew/articles/2015/09/23/carol-dweck-revisits-the-growth-mindset.html
