The puzzle: Why do scientists typically respond to legitimate scientific criticism in an angry, defensive, closed, non-scientific way? The answer: We’re trained to do this during the process of responding to peer review.

[image of Cantor’s corner]

Here’s the “puzzle,” as we say in social science. Scientific research is all about discovery of the unexpected: to do research, you need to be open to new possibilities, to design experiments to force anomalies, and to learn from them. The sweet spot for any researcher is at Cantor’s corner. (See section 7 of this article for further explanation of the Cantor connection.)

Buuuut . . . researchers are also notorious for being stubborn. In particular, here’s a pattern we see a lot:
– Research team publishes surprising result A based on some “p less than .05” empirical results.
– This publication gets positive attention and the researchers and others in their subfield follow up with open-ended “conceptual replications”: related studies that also attain the “p less than .05” threshold.
– Given the surprising nature of result A, it’s unsurprising that other researchers are skeptical of A. The more theoretically-minded skeptics, or agnostics, demonstrate statistical reasons why these seemingly statistically-significant results can’t be trusted. The more empirically-minded skeptics, or agnostics, run preregistered replication studies, which fail to replicate the original claim.
– At this point, the original researchers do not apply the time-reversal heuristic and conclude that their original study was flawed (forking paths and all that). Instead they double down, insist their original findings are correct, and they come up with lots of little explanations for why the replications aren’t relevant to evaluating their original claims. And they typically just ignore or brush aside the statistical reasons why their original study was too noisy to ever show what they thought they were finding.

So, the puzzle is: researchers are taught to be open to new ideas, research is all about finding new things and being aware of flaws in existing paradigms—but researchers can be sooooo reluctant to abandon their own pet ideas.

OK, some of this we can explain by general “human nature” arguments. But I have another explanation for you, that’s specific to the scientific communication process.

My story goes like this. As scientists, we put a lot of effort into writing articles, typically with collaborators: we work hard on each article, try to get everything right, then we submit to a journal.

What happens next? Sometimes the article is rejected outright, but, if not, we’ll get back some review reports which can have some sharp criticisms: What about X? Have you considered Y? Could Z be biasing your results? Did you consider papers U, V, and W?

The next step is to respond to the review reports, and typically this takes the form of, We considered X, and the result remained significant. Or, We added Y to the model, and the result was in the same direction, marginally significant, so the claim still holds. Or, We adjusted for Z and everything changed . . . hmmmm . . . we then also thought about factors P, Q, and R. After including these, as well as Z, our finding still holds. And so on.

The point is: each of the remarks from the reviewers is potentially a sign that our paper is completely wrong, that everything we thought we found is just an artifact of the analysis, that maybe the effect even goes in the opposite direction! But that’s typically not how we take these remarks. Instead, almost invariably, we think of the reviewers’ comments as a set of hoops to jump through: We need to address all the criticisms in order to get the paper published. We think of the reviewers as our opponents, not our allies (except in the case of those reports that only make mild suggestions that don’t threaten our hypotheses).

When I think of the hundreds of papers I’ve published and the, I dunno, thousand or so review reports I’ve had to address in writing revisions, how often have I read a report and said, Hey, I was all wrong? Not very often. Never, maybe?

So, here’s the deal. As scientists, we see serious criticism on a regular basis, and we’re trained to deal with it in a certain way: to respond while making minimal, ideally zero, changes to our scientific claims.

That’s what we do for a living; that’s what we’re trained to do. We think of every critical review report as a pain in the ass that we have to deal with, not as a potential sign that we screwed up.

So, given that training, it’s perhaps little surprise that when our work is scrutinized in post-publication review, we have the same attitude: the expectation that the critic is nitpicking, that we don’t have to change our fundamental claims at all, that if necessary we can do a few supplemental analyses and demonstrate the robustness of our findings to those carping critics.

And that’s the answer to the puzzle: Why do scientists typically respond to legitimate scientific criticism in an angry, defensive, closed, non-scientific way? Because in their careers, starting from the very first paper they submit to a journal in grad school, scientists get regular doses of legitimate scientific criticism, and they’re trained to respond to it in the shallowest way possible, almost never even considering the possibility that their work is fundamentally in error.

P.S. I’m pretty sure I posted on this before but I can’t remember when, so I thought it was simplest to just rewrite from scratch.

P.P.S. Just to clarify—I’m not trying to slam peer review. I think peer review is great; even at its worst it can be a way to convey that a paper has not been clear. My problem is not with peer review but rather with our default way of responding to peer review, which is to figure out how to handle the review comments in whatever way is necessary to get the paper published. I fear that this trains us to respond to post-publication criticism in that same way.

114 thoughts on “The puzzle: Why do scientists typically respond to legitimate scientific criticism in an angry, defensive, closed, non-scientific way? The answer: We’re trained to do this during the process of responding to peer review.”

  1. One reason why people don’t want to move on from their hypotheses is that the scientific publishing process takes such a long time.

    For most researchers it takes a long time to develop ideas, run experiments, do the analysis and write up the results to the standard that journals expect. By the time that you get the reviews back for a piece of work it is likely that you are coming towards the end of your funding, or even that your funding has long since run out. If a reviewer points out a likely problem and the author recognises it as such, they are often left with the thought that they don’t have the time to go back to the drawing board. Developing a better idea can happen the next day, but it could also require several months of intense work and those months may not be available.

    The issue here is partly that the journal publishing process demands a higher standard than most papers are ready for. A potential solution would be a first stage where authors submitted a stripped-down version (perhaps more like a blog post) for review that set out the basic idea and results. This submission would focus on the data, methods, and results, not so much on the introduction and discussion. This stripped-down version could be much quicker to write/review and occur much sooner in the research process. Only submissions that passed this stage would be ready for the full publication process.

    • I think Liam is on to something here.

      For a lot of papers there is no turning back–the researcher needs the paper published no matter what. They can’t afford to consider the possibility that their paper shouldn’t be published in its current form and needs substantial revisions.

      Grad students need to get a paper published to graduate, postdocs need to get a paper for their job interviews, new PIs need papers for grants and their tenure application.

      So if a paper gets rejected from a journal the response usually isn’t “let’s pause our other projects and see what we can do to improve this paper”, it’s usually “let’s just submit the exact same paper to another journal”.

      And here I can’t really blame researchers. In my experience the quality of 2 or 3 anonymous reviewers usually is not very good, so why should you listen to their advice anyways? The reviewers are essentially just a peanut gallery after all.

      I would suggest that making reviewers publish their reviews with their names could help the situation, but as we’ve seen with letters to the editor, the authors usually respond with their own letter about how none of the conclusions are altered despite the fundamental errors in their article.

      So yeah, we’re just screwed.

      • Jordan:

        When I was in grad school I was doing some work with an untenured prof in another department. We were following up on some work he’d done that he’d already written up and submitted for publication in a leading journal in his field. During our research, I realized that the paper he’d written was fatally flawed—the statistical model he was fitting wasn’t doing what he thought it was doing, and the conclusions in his paper didn’t make sense. There was no fraud; he was just working on the bleeding edge and had bitten off more statistical modeling than he could chew.

        Anyway, during this period he got his paper accepted by the journal. I told him something like, “Too bad, I guess you’ll have to withdraw your paper from the journal since it’s all wrong.” To my surprise he said: No, I’m gonna publish it. He had some reasons for this but they were not good reasons: basically, he had a publication in a good journal and he wasn’t planning to give that up.

        At that point I was a student and didn’t have a lot of experience with scientific publishing, but I had been coauthor on a few published articles from my summer jobs, and it seemed pretty obvious that the point of scientific publication was to present true results and interesting ideas—there was no real role for publishing bad ideas except by accident. This untenured professor, though, he had different ideas. As noted above, I think he managed to convince himself that his erroneous paper had value, but any self-convincing of this sort was at best a rationalization. He didn’t even correct the paper before sending in the final version—I guess because there was no easy way to correct it without admitting that it was wrong. At the time, I thought about sending a letter to the journal editor pointing out the error, but it didn’t seem worth the bother.

        • I had an experience that is at least thematically related to Andrew’s. Actually it had some more complicated elements to it, but for purposes of this discussion here’s the part that matters: someone sent me a paper that they were about to submit for publication, and asked for my comments. (Actually they also offered to make me a co-author, so perhaps I gave it more attention than usual.) I also noticed a “fatal” flaw in the paper: the point of the paper was to test whether a particular computed tomography algorithm could be used to effectively map air pollutant concentrations if path-integrated concentrations were measured along paths with a specific geometry. Rather than testing the approach against real data or even realistic data, they had tested it against simulated data for which the algorithm was guaranteed to be able to find a perfect fit. This meant there was really no point to the paper whatsoever: if you used that arrangement of optical paths with that algorithm on that specific test dataset, it was going to work perfectly, and this told you nothing you didn’t already know. I’m not sure how they had gotten all the way to the point of finishing the paper without realizing this, but they did not in fact realize it until I pointed it out. They agreed that oops, sure enough, they weren’t actually testing whether this set of optical paths would work in the real world, and since it obviously had to work with their test data, they hadn’t learned anything new. They thanked me for my ‘insights’. I assumed they wouldn’t submit the paper…but they did. And it was published. And I was disgusted.

        • Do you think it would be possible to keep the “reward” of publication in a good journal while allowing for correction of the record? One extreme rule that would do this: once you get into a good journal, it’s in – even if your final revision includes a new final section “why the analysis in the preceding article is actually rubbish.” Do you think the untenured prof would have taken that deal? I’m an untenured prof right now, and I’m pretty sure I would.

          This wouldn’t completely break the bad incentives, since you would still be rewarded for getting a sloppy article past peer reviewers, but it might mitigate them a little. I can imagine this being “gamed” a little bit in order to introduce e.g. previously-unpublishable negative results into the literature, but that is perhaps as much feature as bug. Presumably your record would look bad if you only published these sorts of quasi-retracted articles, so there would still be some incentive to fight back against some reviews, but it might allow you to concede defeat some percentage of the time.

          To try to link this to Dan Simpson’s post about Bayesian LASSO: it seems inevitable that somebody would write that initial article, and that it would be possible to cook up examples and theorems that make the Bayesian LASSO look like a good idea. If the article were never published, lots of people would waste time writing it down. Having the existing article with a tacked-on ending (perhaps Dan’s blog post) is much better, and allows the original authors to get some credit for their “wasted” time.

          I think the usual response here is that we should just not publish the Bayesian LASSO, since it is taking up space that could have been used by methods that really work. I guess there may be some journals that really do fill up with new and interesting methods that really work every month (Annals of Math?), but there aren’t many and in any case we have the internet.

        • The puzzle is over why people are defensive about published articles. If it isn’t published yet, obvious self-interest is a reason to defend the paper even after you see flaws in it. But people are defensive even after it’s published. Unless it turns out to be fraud or silly error (e.g., coding error), it doesn’t reflect badly on the author, and, in fact, the criticisms increase his citation count. His dean, and lots of us, in fact, will just count publications and citations and not look at the substance.

        • Eric: thanks for replying. I was focused on Andrew’s problem of submitting obviously-flawed research, but in the back of my head I did think that my (non-serious?) proposal gives a partial response to your puzzle as well. To try to pull it out:

          If I make a real mistake in a journal & admit it, the article might get pulled. If it isn’t pulled, I guess there is a bit of a gamble: I certainly take some hit to my reputation, but maybe get some extra citations. Ultimately, though, the main worry is exactly the same as when an error appears pre-publication: I could lose a good line on my resume for being honest, and it seems pretty clear that I am unlikely to get a substantial reward. If it were easy and common to write addenda to articles with errors, perhaps one has less worry here: the article won’t get pulled, and the reputation hit might be diluted by the common acknowledgement that errors are in fact quite common.

          I don’t think it is at all uncommon to face this sort of gamble. The following story from my grad school days, details lightly changed, seems typical: the proof of one lemma in a paper of mine missed a special case, and as a result an error bound that appeared in the statement of the lemma had to be multiplied by a factor of 1.5 or so. The missing case could be filled in with about a line of explanation, and the changed constant in the bound never actually appeared outside of the statement of the lemma (it was always immediately swallowed by big-O notation). Nonetheless, the editor initially decided that the article should be pulled, and it took quite a bit of discussion & another trip through the review process to get it back in.

          If a postdoc came to me and said they’d found a similar error in a paper that was in press, should I tell them to admit it? Have you seen admissions of this sort in reference letters and grant applications?

          It feels a little funny to write this here. By reputation, you (and Andy, and many other big names on the blog) are both fantastic researchers and honest. But many of us are not so fantastic, and are keenly aware of the fact that being straightforward in this way risks our careers and especially those of our students. I’ve been able to be honest in this way without taking too much of a penalty, but I’m aware enough of my many failings that this seems to have taken quite a bit of luck.

          Thanks,
          -NTT

          PS: Lee, I was aware of that story (see in the comments also the perhaps-more-famous stories of Biss, who ended up in politics, and Ionel-Parker, which is an ongoing controversy that you must know more about than I do). I was perhaps a little tongue-in-cheek about this, and guess these errors are not *that* rare even in AoM… certainly Voevodsky famously talked about his opinion that large errors were probably not too uncommon even in big math papers.

        • Really, in my experience fixable errors of this sort rarely cause that kind of problem. It might matter what field of mathematics you are in but my experience has been if you can fix it without adding a full new page you just make the changes (perhaps with a bit of fussiness at the proof stage) and no one cares. If anyone had been reading closely enough to find the changes objectionable they would have caught the initial error.

          Interestingly, despite the fact that something like 80% of published math papers include substantial errors, there doesn’t seem to be the same issue of a rotten stock of wrong results undermining the field. I presume this is because of the simple fact that in math a result rarely gets substantial reuse without re-verification: one or two people might cite a paper without really understanding the argument, but either a result is pretty quickly forgotten or people need to walk through the result in an attempt to generalize it.

          Perhaps if it was mandatory to publish sufficient information about the dataset to allow its reuse we’d see a similar effect in other fields, i.e., the attempt to reuse/expand the analysis ends up exposing the failure.

          But I suspect we’d have to actually give people resume lines for undermining confidence in someone else’s work.

        • I guess there may be some journals that really do fill up with new and interesting methods that really work every month (Annals of Math?)

          Not quite (but pretty close): this post from the RetractionWatch blog describes (what seems to be) the unique retraction of an article from the Annals, in 2017; and the PDF file linked to from this webpage at the Annals is called an “Erratum” but seems to qualify practically as a retraction, insofar as its first short paragraph (of two) concludes with the sentence “This affects the proof of the main theorem, Theorem 1.1, which therefore remains a conjecture.”

        • During the second year of my PhD studies, while discussing a first year MSc student’s lab data – there was no formal mentoring or supervision involved, we just shared a lab – an observation I made led him to realize the entire scope of his planned thesis project was based on a sample preparation artifact.

          To his and our supervisor’s credit, they immediately investigated the error, accepted the implications, and switched to a different project. I grew to know our supervisor well enough to learn that had this error been discovered in the third year rather than the third month of a project, he still would not have considered publishing it.

          The incident Andrew recounts is of a somewhat different sort, involving the severe personal and professional pressures faced by untenured professors, who can find themselves in a direct conflict of interest where their scientific ethics and best practices collide with their short-term (and maybe long-term) career interests.

    • Do we really have any evidence that scientists respond in any worse a way than anyone else? If one were dedicated, one could go through the comment logs of this very blog and look at how people respond even in low-stakes situations where they have put little effort into their answer, and I suspect you would still see an incredibly low willingness to admit mistakes, a willingness that drops the more time/effort people have invested in something.

      I know you say some of this could be explained in this way, but why not all of it? I mean, heck, isn’t religious belief the limiting case of something whose truth matters greatly to the individual, so much so that people are willing to do violence rather than admit they were incorrect? The results of your study being true may not be quite as important to you as many people’s religious belief is, but it still has a huge effect on how you see yourself, and as such we treat it the same way humans treat any belief whose truth really matters to us.

  2. Yeah, that is the crux of my puzzlement. Why in the shallowest way? You nailed it, Andrew. Kudos. In fact, if you respond graciously, some get more bent out of shape. Admittedly, from my own observation, even their legitimate criticism can be presented in a cranky manner. Some use extreme language. Now, the ones who are gracious tend to be more confident and humble.

  3. Nicely put. I think there’s a perverse incentive here which causes this shallowness and pro-publication bias: many current academic hiring and promotion and grant systems do not reward papers or experiments which we decide are not suitable for publication, even when not publishing was the “right thing” to do. You’ll get rewarded more by the system (better job, tenure, grants, etc.) if you push all papers, even the flawed ones, through peer review (and through your own misgivings) than if you ‘do the right thing’ and pull them when you realize there’s a problem. And those sloppy publications only become a problem if somebody hits the point of retraction-cascade that you described previously, which is still fairly uncommon.

    So long as a researcher with many, mildly-to-moderately sloppy publications is more likely to be rewarded by the hiring and promotions and grants system than a researcher with relatively fewer careful ones, we’re incentivizing the bad actors. And then we hit the main difficulty with discouraging the submission and publication of sloppy, weak, or flawed research: As Upton Sinclair says, “It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”

    It’s no excuse, obviously. And it’s perhaps a bit cynical. But it seems to be one of the issues at the core of the replication crisis. Quantity over quality, as a hiring/promoting/grant-awarding metric, will never favor the careful.

      • I think it’s variably true. In fields where quantitative and research literacy is ubiquitous among committees, and where sufficient time is given to carefully review applicants’ work, sloppy work (I hope!) won’t be hired/promoted/granted.

        But it’s far more likely in fields where people on the relevant committees understand experimental or quantitative work to varying degrees, where a pretty introduction and larger CV might mask the weak foundations for the few people hiring. And, of course, it can happen anywhere where “Yes, but this applicant has N publications, and this one has N*2. Much more promising tenure case!” is uttered.

        • Yes, I think that is possible. I decided that academia had undergone such a shift from the ’70s on. All I remember is rancor. My family was largely academic. But it was hoped that I would become an academic. And believe me, some were very enthusiastic to have me as their student.

    • And then the literature gets filled with crap and you can no longer build upon it. You have to redo all the foundations and background yourself if you care about the validity of your findings (though, why bother?).

      Meanwhile the patients continue to die.

      • And then there’s money.

        From this week’s The Straight Dope:

        “The problem was described a bit more heatedly in a 2009 article in the New York Review of Books by Marcia Angell — longtime editor at a little pamphlet out of the northeast called the New England Journal of Medicine — about the infiltration of industry money into things like “expert panels” on health issues. Angell cited as an example the National Cholesterol Education Panel, which in 2004 recommended lowering acceptable levels of “bad” cholesterol, and eight of whose nine members proved to have financial ties to cholesterol-drug makers. Angell’s conclusion? “It is simply no longer possible to believe much of the clinical research that is published.”

        https://www.straightdope.com/columns/read/9051/how-credible-are-the-new-blood-pressure-guidelines/

        Rhetorical question: how much money do you have to receive before you’re unwilling to question the hand that’s feeding you?

        • Angell would like to believe that the only bad incentives come from profit-making corporations. But non-profits (including, of course, government agencies) also have outcomes that they want and that they’ll pay good money for.

          And every researcher wants to have a reputation for being right, both for the money and the self-esteem. Upton Sinclair didn’t go far enough when he said, “It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”

  4. They’re attached to their ideas because they have career and economic incentives to be attached. It is the rational, and optimal, decision from the scientist’s perspective.

    The issue is that scientists make a name for themselves by discovering *new* ideas. One typically doesn’t get fame and fortune by confirming old knowledge. They get it by innovating. To innovate, one starts with a hypothesis that, if true, would be novel (rarely do they start with a hypothesis that is consistent with established results). They spend months designing studies – spending time and money to run experiments, collect data, and analyze the data – to test this hypothesis.

    They’re not going to easily admit they are wrong after sinking so much time and money into it.

    Scientists advance by making new, unexpected, unconventional findings. That’s how they get tenure. That’s how they get published. That’s what they discuss at performance reviews when justifying the value they add to the research institution. If in your performance reviews you say, “Oh, I did 10 studies and contributed to the scientific community by finding that these particular hypotheses are NOT true; the conventional thought is more supported,” well, that’s great. A scientific idealist would say this still adds value to the community – to know what is not true. Yes, but it rarely adds money to their wallet.

    Expecting a scientist to easily abandon his hypothesis is unrealistic. There’s simply little economic incentive to do so. Now, there is the potential risk of ending up like Amy Cuddy…but for every Amy Cuddy there are hundreds of scientists who are equally unskeptical of their own work and get away with it! And who in your community will actually call you out for lack of sufficient self-rigor?…it’s a glass house. So, in my opinion, this is happening because scientists are incentivized to behave this way. In terms of practical career and financial goals, this is the optimal strategy.

  5. I think you’re right about this, Andrew. Many academics are indignant about criticism from editors and peer reviewers. (Of course, sometimes that criticism is less than qualified, so perhaps the training goes both ways.) I think PhD students pick this up too early. They aren’t taught (by their elders) to respect their elders. This makes them less likely to respect their peers when they get older. The solution, as I’ve recently argued, is to do away with peer review and put everything out in the open. In the short term, it’s to get used to rejection.

  6. I dunno — to me, it seems like *every* scientific paper is published, or should be, with an implicit understanding that the work might be fundamentally in error. If I ran a study that found some evidence I think supports hypothesis X, and a referee says “you need to consider the possibility that X is totally wrong, and that Y is the explanation for your observation,” I think the right thing to do is to publish the paper and say “It seems to me the observations support X but, as a referee helpfully pointed out, it might also be Y.” The other alternative, throwing the paper in the trash, deprives others of whatever utility the observations have to offer!

    Now this is assuming the referee report says “you need to consider Y,” rather than “the methodology of this paper is so fundamentally flawed that the reported observations are valueless and the paper shouldn’t be published” — but in that case, it wouldn’t be up to the author what to do because the paper would just be rejected, right?

  7. There may be a problem with this hypothesis. Defensiveness is a typical response, not a universal response. Yet scientists universally undergo the training that you propose as the explanation. So there must be another factor, quite possibly more important, in determining who gets defensive and who does not. Fragile egos?

    • Jean-Luc:

      Sure, there are lots of explanations; I didn’t mean to imply that my explanation above is the whole story. But it’s an important part, I think. In science, we really are trained by example to respond to criticism with whatever it takes to get our papers published. Sure, we learn from critical referee reports of our work, but the main focus, almost always, is to get the paper published. It’s so rare that I’ve seen the response of, “Hey, the referee is right: our project is completely misguided and we should quit and start over.”

    • Jean-Luc:

      What you wrote made me think of the opposite. Maybe defensiveness is universal, not restricted to scientists. So the solution would need to be broader.

      • If defensiveness is nearly universal (as I think it is), perhaps our time is better spent identifying instances where defensiveness is *not* common and understanding why.

        • My totally nonscientific sense is that defensiveness is primarily a function of one of two factors: number of total publications and personality. In other words, someone with a single published research program almost never accepts critiques of that research program because that’s their entire public persona. Of people with more extensive programs, some accept critiques and some don’t, and the difference seems to be driven by personality. The result in some ways is the inverse of Andrew’s recent points about investment in defending everything you’ve published once you retract: admitting honest error or acknowledging superior subsequent research that produces results at odds with your prior findings in practice is a signal that the researcher’s other work is significant and probably of high quality.

  8. “…they’re trained to respond to it in the shallowest way possible, almost never even considering the possibility that their work is fundamentally in error.”

    I don’t see this to be the case, at least not in the field of ecology, where, I think, many of our students are told that they may be fundamentally in error. I rather see the problem as resulting from dichotomania. We have learned to judge a result as either right or wrong, because of all the thresholds we are using. So of course we react strongly to criticism, because if our research is not true, it must be wrong. However, it may even be questionable whether something can be “completely wrong”: it is almost never the case that everything we thought we found is just an artifact of the analysis. Almost always there is some sort of effect (perhaps it’s in the opposite direction, of course). I think the best cure would be to describe our results as: “We found that our subjects were green. But we don’t know yet whether this is a general pattern, so we are looking forward to seeing replications.” Perhaps the day will come when scientists and journals are prepared to accept such a claim as “scientific”. And then, I hope, both criticisms and responses to criticism will be much less angry.

    • No, not so much in ecology. It depends on so many factors, I would speculate: the personalities of the researchers, the constraints, their not wanting to appear out of the loop, etc.

      Yes dichotomania. Sander Greenland by any chance? I should call his attention to this thread. LOL

    • Valentin Amrhein said, “I rather see the problem as resulting from dichotomania. We have learned to judge a result as either right or wrong, because of all the thresholds we are using. So of course we react strongly to criticism, because if our research is not true, it must be wrong. However, it may even be questionable whether something can be “completely wrong”: it is almost never the case that everything we thought we found is just an artifact of the analysis. Almost always there is some sort of effect (perhaps it’s in the opposite direction, of course). I think the best cure would be to describe our results as: “We found that our subjects were green. But we don’t know yet whether this is a general pattern, so we are looking forward to seeing replications.” Perhaps the day will come when scientists and journals are prepared to accept such a claim as “scientific”.”

      +1

  9. Hi,

    Could anyone tell me what Cantor’s corner is in this context, and why it’s nice to be in it?

    I know Cantor proved that you can’t shove all the real numbers into a countably infinite set, so a continuous range can’t be put in one-to-one correspondence with a discrete one.

    But what’s the metaphor here? I don’t see it in Gelman’s handy-dandy Statistical Lexicon.

    • J:

      From the first paragraph of the above post:

      Scientific research is all about discovery of the unexpected: to do research, you need to be open to new possibilities, to design experiments to force anomalies, and to learn from them. The sweet spot for any researcher is at Cantor’s corner.

      It goes like this: a model does what it does, then we look for problems; having done that we expand the model and push it harder, until it breaks down, etc.

      Here’s the story in ASCII art:

      Model 1 . . . X
      Model 2 . . . . . . X
      Model 3 . . . . . . . . X
      Model 4 . . . . . . . . . . . . . . . . X
      Model 5 . . . . . . . . . . . . . . . . . . . X
      . . .
      

      For each model, the dots represent successful applications, and the X represents a failure, a place where the model needs to be improved. The X’s form Cantor’s diagonal, and the most recent X is Cantor’s corner. Cantor’s diagonal argument, taken metaphorically, is that we can never fully anticipate what is needed in future models. Or, to put it another way, the sequence of problems associated with the sequence of X’s cannot, by definition, be fit by any single model in the list.
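
      If it helps to see the metaphor written out, here’s a loose formalization of the picture above (my own sketch of the analogy, nothing more):

      Let $S_i$ be the set of problems that Model $i$ handles successfully, and let
      $x_i = \min\{k : k \notin S_i\}$ be its first failure (the X in row $i$).
      The diagonal is the sequence $D = (x_1, x_2, x_3, \dots)$.
      For every $i$ we have $x_i \in D$ but $x_i \notin S_i$,
      so no single model on the list succeeds on all of $D$: each one fails at its own corner.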

  10. There’s also a more charitable answer for why some researchers at times respond negatively to some criticism: not all criticism is of high value.

    No statistical analysis is perfect…and as such you need to pick what issues are acceptable simplifications and what are not. Sometimes researchers make unacceptable simplifications and this is worth discussing and pointing out weaknesses, but other times it’s just not. As an example, I’ve seen comments on this blog along the lines of “well, I see they discretized a continuous variable, so this analysis must be worthless”. If that’s as deep as the sophistication of the criticism gets, it’s really not worth the researcher’s time to engage, and a short, curt response is totally reasonable to me.

    Of course, I’m not saying that it’s fine to be rude to all those who offer criticism or that the criticism on this blog is unsophisticated, but there is criticism that’s just not worth one’s time. As an example, Bob was not particularly kind to a blog post that came to the conclusion that JAGS was faster than Stan, which to me was totally reasonable given that it was not much of an in-depth criticism of Stan. Of course, he gave a detailed response to this, but if there were 15 such posts, at a certain point he’d be well justified in being ruder and shorter.

    • A:

      Sure, we can’t spend all our time engaging with crappy or insincere criticism. My issue is that scientists are trained to respond dismissively even to good and serious criticism; see the word “legitimate” in the title of the above post.

    • It’s also important to distinguish between comments that are meant as fully formed critiques and those that are not. Many comments are tentative and fragmentary; as long as they don’t claim undeserved authority and scope, they should have a place. Researchers shouldn’t have to respond to transient comments of that sort, but other commenters may chime in, and the exchange can be interesting and productive.

      Maybe commenters should provide certain “ethos markers”: for instance, they should indicate whether they have read the full study, glanced at the abstract, or done something in between. That allows readers to put the comment in proper perspective.

  11. So is the “Beyond Power Calculations” article you linked above what you mean when you say significance testing doesn’t work well with noisy measurements?

  12. Might this also partially explain why outsiders might find experts doubling down on their certainty concerning some controversial topic (e.g. the undoubted universal benefits of free trade) and suggesting that anyone who disagrees is an anti intellectual buffoon less than convincing? After all, this, given what you are saying, would be the response of the community whether or not the skepticism was justified, right? Or does your explanation apply severally but not jointly?

  13. Allow me to play the carping critic here: Although I’ve seen the story played out as described – defensiveness about legitimate criticism – “legitimate” is highly subjective and I’ve seen the opposite played out even more…
    1) All too often in my experience, the anonymity of peer review brings out some exceptional nastiness, or nit-picking criticisms blown up as if they were fatal flaws, when they aren’t. Sometimes, the referee sees it as their job to keep the paper out of print or at least delay it maximally, especially for a prestigious journal (there is stiff competition for space in leading journals, so an intrinsic COI against fair review can occur). I recall an early experience in which my paper was savaged for, among other things, having a typo (of “June 31st” in an example) presented as if it were proof that the whole paper was careless. Sadly, sometimes referees are opponents and even try to maliciously obstruct publication; sometimes they even seem to relish that role (especially if what we are writing goes against their published positions, or they have a competitive paper under review, or they simply guess who we are and happen to hate on us for other reasons, e.g., some imagined past slight). Becoming defensive in the light of those experiences is at the very least understandable.
    2) There is the usual null bias in the chosen illustration for “the pattern we see a lot” – which for some reason I’ve rarely seen in the papers I get to review. What I have too often seen is authors making unsupportable claims of finding ‘evidence of no effect’ because P>0.05 or the confidence interval includes the null. Now in the area I work in (medical side effects) such claims can be as publishable and newsworthy as claims of evidence for an effect, or more so, especially if the effect is an established matter of controversy (which in medical research usually takes about one report accompanied by a hot press release); those authors can get every bit as defensive about their null claims as anyone, citing of course how lack of statistical significance corresponds to support for the null (which of course it typically does if you pre-load the null with a prior probability of 50%; although that excuse is rarely offered, I did encounter it once and it made me realize the abuse potential for Bayesian testing was at least as high as for null-hypothesis significance testing).

    • Sander:

      Thanks for commenting. Just to clarify: yes, as I wrote below, I think that even misguided referee reports can be valuable as indicating communication difficulties, but I’m not saying that all or even most criticism that we receive in the review process is legitimate. The topic of my above post is that researchers often seem to dodge or dismiss even the most serious, thoughtful, well-reasoned criticism. It feels to me that we’ve been trained in techniques of handling weak criticism, to such an extent that even when very high-quality criticism comes along, our default response as academic scientists is to brush it aside. Think how many times it’s happened that researchers, faced with clear and serious errors in their published work, insist that the main conclusions of their papers are unaffected by the errors. I think this is in part coming from the practice of thinking of criticisms as sallies to be parried, rather than as serious and equal steps in scientific reasoning.

    • On the other side, a percentage of my reviews have resulted in major revisions of papers (10 to 20% by recollection).

      Not sure why. In a couple cases I think the primary author was junior and the senior authors decided to let them experience some reviewer criticism. Or maybe the errors I noticed were just too obvious to dismiss.

      Once the authors and editors benefited from my suggestion to sort out an issue – but chose not to share it with the readers (despite my complaining about that) :-(

    • the abuse potential for Bayesian testing was at least as high as for null-hypothesis significance testing

      What is the distinction you are making between Bayesian testing and NHST here?

      To me, “Bayesian testing” is a tool that can be used (unfortunately) to do NHST. Whether a practice is NHST depends on the null model chosen, research hypothesis/theory, and data collected. You can do it with a t-test, anova, bayes factors, or even with no calculations at all. You can do NHST just by looking at a graph and saying “the groups look different (or not), therefore this drug cures cancer by inhibiting mitogenesis (or whatever)”.

      • By NHST I meant the usual “testing” of the null (the hypothesis of no difference) by seeing whether P<0.05.
        I think that null-P dichotomization procedure is what’s being discussed in the literature I see complaining about “NHST”. But I’d be happy to indict Jeffreys-type (null-spiked) “objective-Bayes” null-hypothesis testing as a way to distort the literature more effectively than any method Fisher or Neyman imagined. Then again, P dichotomizers can come back to match Bayesian distortion by moving the cutpoint down to .005. What then will Bayesians do? Put even more prior probability on the null to ward off the evil false-positive spirits (which statisticians created by dichotomizing inference in the first place)?
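
        To make the spike-prior arithmetic concrete, here is a minimal sketch (my own illustration, with made-up numbers: a normal mean with known sigma, a N(0, tau^2) alternative, and two choices of prior weight on the point null) of how the same data and the same Bayes factor can be pushed toward the null simply by raising the prior probability of the spike:

          # Sketch: point-null ("spike") Bayes factor vs. a two-sided p-value,
          # for the mean of normal data with known sigma. Illustrative numbers only.
          import numpy as np
          from scipy import stats

          n, sigma, tau = 50, 1.0, 0.5                  # sample size, known sd, prior sd under H1 (assumed)
          rng = np.random.default_rng(1)
          y = rng.normal(loc=0.3, scale=sigma, size=n)  # simulated data with a true effect of 0.3
          ybar, se = y.mean(), sigma / np.sqrt(n)

          # Two-sided p-value for H0: mu = 0
          z = ybar / se
          p_value = 2 * stats.norm.sf(abs(z))

          # Marginal density of ybar under H0 (mu = 0) and under H1 (mu ~ N(0, tau^2))
          m0 = stats.norm.pdf(ybar, loc=0, scale=se)
          m1 = stats.norm.pdf(ybar, loc=0, scale=np.sqrt(tau**2 + se**2))
          bf01 = m0 / m1                                # Bayes factor in favor of the null

          for prior_p0 in (0.5, 0.9):                   # prior probability placed on the spike at mu = 0
              post_odds0 = (prior_p0 / (1 - prior_p0)) * bf01
              post_p0 = post_odds0 / (1 + post_odds0)
              print(f"p = {p_value:.3f}, BF01 = {bf01:.2f}, "
                    f"Pr(H0 | data) = {post_p0:.2f} when Pr(H0) = {prior_p0}")

        The p-value and the Bayes factor are computed once from the same data; only the prior mass on the spike changes between the two printed lines, which is the “ward off the false-positive spirits” move being described.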

        • Then again, P dichotomizers can come back to match Bayesian distortion by moving the cutpoint down to .005. What then will Bayesians do? Put even more prior probability on the null to ward off the evil false-positive spirits (which statisticians created by dichotomizing inference in the first place)?

          There are two axes here:
          1) Bayesian vs Frequentist
          2) Default nil-null-Hypothesis vs Research Hypothesis (if the research hypothesis happens to be a nil-null, then it is no longer a “default”)

          I propose that the Bayesian and the Frequentist should both stop testing the default nil-null hypothesis (ie NHST): https://i.imgur.com/rZOXkfX.png

          Personally, I think it is up in the air whether Frequentist approaches have utility beyond cheaply approximating low-prior-info Bayesian solutions. I honestly haven’t bothered to think too hard about it because that whole debate is a huge red herring. First we should deal with the much more pressing problem of testing a research hypothesis vs a default nil-null-hypothesis.

          You seem to define NHST as only the upper right corner of that chart, yet from your comments you obviously realize that a Bayesian approach cannot solve the problems you see either. It will all make much more sense if you think of NHST as the entire upper half of that chart. Ie, NHST is defined by testing a default nil-null hypothesis that differs from the research hypothesis, something orthogonal to the Bayesian-Frequentist distinction.

          If not NHST, is there another term we could use for the vertical axis in my chart? I would love to read the title/abstract of a paper and immediately know what type of hypothesis they tested. This would make it very convenient to then disregard all conclusions arrived at by testing something other than the research hypothesis.

        • Anoneuoid, I think you are right about your NHST comments – I was simply saying that when I see “NHST” used it is always for frequentist NHST dichotomization. Yes, Bayesian NHST dichotomization exists and is as destructive; in fact it has the potential for even more harm – so I’ll include that in future comments against NHST.

          Both frequentists and Bayesians have been major promoters of nullism, dichotomania, and model reification – in fact it seems to be a cognitive disability that much of statistics took up, formalizing instead of fighting innate human biases.

          I would however add that a simple first-step solution to nullistic bias would be to require any testing to involve at least two hypotheses, which could and often would be the null and an alternative that is of contextual relevance (after all, typical grant applications have to submit power calculations and thus have pre-specified an alternative). Better still would be several tests (e.g., null, pre-specified alternative, and some points between). Even better: Multiple tests of each point using different models (AKA sensitivity analyses) to start to make inroads against reification. This demand will require complete reporting of the analyses; but with online appendices now available, no excuse of “space limitations” for incomplete (and selective) reporting can be justified.
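
          As a purely illustrative sketch of that “at least two hypotheses” idea (the numbers, the pre-specified alternative of 0.5, and the grid of in-between points are all my own assumptions, not anything prescribed here): with a simple normal-mean setup one can report a test at the null, at the pre-specified alternative, and at a few points between, rather than a single accept/reject call at zero.

            # Sketch: report tests at several hypothesized effect sizes, not just the nil null.
            import numpy as np
            from scipy import stats

            n, sigma = 40, 1.0
            rng = np.random.default_rng(2)
            y = rng.normal(loc=0.25, scale=sigma, size=n)  # simulated data; true effect 0.25
            ybar, se = y.mean(), y.std(ddof=1) / np.sqrt(n)

            null = 0.0          # the usual nil null
            alternative = 0.5   # a pre-specified, contextually relevant effect size (assumed)
            grid = np.linspace(null, alternative, 6)       # null, alternative, and points between

            for mu0 in grid:
                t = (ybar - mu0) / se
                p = 2 * stats.t.sf(abs(t), df=n - 1)       # two-sided p-value for H: mu = mu0
                print(f"H: mu = {mu0:.1f}   t = {t:+.2f}   p = {p:.3f}")

          Reading down the column of p-values gives a crude p-value function: the data may turn out to be compatible with both the null and the pre-specified alternative, which is exactly the ambiguity that a single dichotomized test at zero hides.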

          As for frequentist vs Bayes, that is a whole other topic. I happen to think that frequentist demands for calibration evaluation of methods is legitimate and should lead us to reject certain popular Bayesian testing tools like Jeffreys (spike-prior) tests and posterior predictive P-values.

    • @Sander Greenland
      I think that there is a lot to be said for your point of view. That is why I’ve come, reluctantly, to the view that reviewers should not be anonymous. The standard (and the courtesy) of reports is increased when the writer knows that their name is attached to them.

      It’s why I also love preprint servers. Much of the reviewing has been done by the time you get to the point of submitting the final revision to a journal. (In fact it’s probably only a matter of time until the final stage vanishes: huge amounts of money will be saved by universities when they no longer have to pay for over-priced, paywalled journals).

      Incidentally, I have found that styles of reviewing differ quite a lot in different fields. My erstwhile field, the stochastic properties of single molecules, was small and specialised, and it was almost always possible to recognise reviewers (this was at a time when they were always anonymous). The standard of reviewing was usually very high, at least in specialist journals (less so for glam journals). In contrast, getting into statistical inference is a bit like entering a pit of vipers :-(

  14. Andrew wrote: “During our research, I realized that the paper he’d written was fatally flawed—the statistical model he was fitting wasn’t doing what he thought it was doing, and the conclusions in his paper didn’t make sense. (…) To my surprise he said: No, I’m gonna publish it. He had some reasons for this but they were not good reasons: basically, he had a publication in a good journal and he wasn’t planning to give that up. (…) This untenured professor, though, he had different ideas. As noted above, I think he managed to convince himself that his erroneous paper had value, but any self-convincing of this sort was at best a rationalization.”

    A few years ago I got an offer to become a co-author of a manuscript in the field of ecology on developments in the number of Hares on the island of Schiermonnikoog. The first author was a post-doc, and he decided soon afterwards, during a phone call, to remove me as co-author because he insisted on incorporating data on birds in the manuscript which were totally unsuitable for the aim of this paper. I had provided him with these data on birds, but had told him at the same time that they were unsuitable for the aim of the paper.

    I filed an appeal with the institute of the first author and a few months later got a new version with my name as co-author. The set of totally unsuitable data on birds was still present in the manuscript. There were still many anomalies and other issues in this version as well. The general scientific level was, in my opinion, very low.

    I was still struggling to put together a thorough and full set of comments on this version when I got a message from the first author in which he insisted on submitting the manuscript within one week, with or without my consent, to the Journal of Animal Ecology. I immediately contacted the institute of the senior author and told them that such conduct is not allowed according to the guidelines for authors of the Journal of Animal Ecology.

    Within a few days I got a letter from this senior author in which I was told that the authors had decided to remove the full set of bird data from the manuscript, and that they would prepare another paper and thus no longer needed me as co-author. I spent around one week scrutinizing the entire manuscript and sent my findings to the first author and to various other parties. The authors and other parties never responded. Efforts to contact one of the other co-authors remained unanswered. Google revealed a few years later that the paper was published at http://www.tandfonline.com/doi/abs/10.1080/11956860.2015.1079409

    The first author got furious when I suggested / told him that his hurry to get this manuscript published / submitted to the Journal of Animal Ecology was related to his position as a post-doc.

    The first author also admitted to me that he had more or less hidden, in the version which I had received the first time, the fact that this research was some sort of follow-up to an earlier paper on this long-term project of annual counts of these Hares. That paper documented the first time series of these counts. The first author argued, among other things, that no one within his field of research was aware of this journal (Lutra, the scientific journal of the Dutch mammal society). See
    http://www.zoogdierwinkel.nl/sites/default/files/imce/nieuwesite/Winkel/pdf%20download/Lutra%2049%282%29_van%20Wieren%20et%20al_2006.pdf for the paper in question.

  15. The problem isn’t with peer review, per se, but instead with the role that peer review plays in the current publication system. What could and should be a means for authors to get constructive feedback to improve the quality of their work instead serves as the main barrier to publication. The fact that referees are chosen by the editors (whose sole purpose is to decide ‘accept/reject’) instead of being chosen by the authors (who are much more qualified to determine who their “peers” are and whose opinion about their work is valued) naturally pits authors against reviewers in the way Andrew mentions.

    I think authors would be much more open to legitimate criticism if there were no danger of it derailing their attempt at publication. But given the way editors often magnify minor, or even incorrect, criticisms, authors have no choice but to be defensive.

    • Harry:

      Again, I don’t think the problem is with peer review. I think the problem (or a problem) is that scientists have been trained through the peer-review process to respond to criticism in a non-serious way, to keep our main claims unchanged even when serious errors are found in our work, and that this training makes scientists behave badly even when they get post-publication criticism that is outside the peer-review process.

      • Andrew:

        Yes, I think we agree regarding the merits of peer review. I should have been more clear that I agree with most (or all) of what you’ve written. I’m just pointing out a contributing factor which I think supports your overall claim: that peer review has been co-opted by the accept/reject publication process. As for the post-publication attitudes you mention: Part of it is carryover from the way we’ve been conditioned to address reviewer comments. But I think it can also be partly attributed to the ‘accept-reject’ mentality. Since publication is seen by many as the ultimate stamp of approval, I have little incentive to take any criticisms of my work seriously once it’s already been published.

        Again, the problem isn’t with peer review, as you mention. But the way peer review figures into the equation seems to be a far cry from how it was originally intended.

        • Harry,

          In all the years I’ve listened to professorial publication woes, I’ve wondered whether much has changed in the history of universities. Political calculations play into these decisions too. I do agree with your pointing out that there is conditioning toward an ‘accept-reject’ mentality. I saw it in my father too.

  16. In my career, none of this behavior was very much conscious. I did my master’s and PhD in neuroscience in a psych. hospital. No one in the team was particularly knowledgeable in statistics, and that didn’t change much during my entire 8 years’ stay.

    The team members and I just soaked up the attitude that reviewer criticism is something you need to deal with but want to do so with as little effort as possible. If a reviewer had a point, you grudgingly acknowledged it but still tried the minimalist strategy. You would also be as obsequious as necessary, and not dare disagree with a comment even if you thought it was misguided. It was really about pleasing the reviewers.

    The style of the limitation sections we wrote back then – now that I’ve been salubriously mangled through Andrew’s blog – strikes me as pathetic. Again, enumerating limitations was just something that journals required, but you would dig in your heels and concede as little as possible. Some limitations were acknowledged by everyone in the field, and it was “OK” to concede these without too much resistance, but basically the strategy was to come up with whatever ridiculous reason you could think of why a particular limitation was likely not to affect *your* particular study. The typical statement went like “our study used retrospective reports which might have bla bla bla, but we don’t believe this has significantly affected our findings because bla bla bla”. I still see this behavior today in some colleagues. It’s just what we’ve grown up with.

    So I guess my point is that one additional/alternative explanation for the defensiveness is just the trivial and unreflected passing on of habits. Of course, if you’re in a research environment where knowledgeable people, critical thinking, journal clubs, etc. are the norm, your habits will be better. But I know a lot of research (e.g. medical) is going on at the level that I’ve become accustomed to.

    • I am somewhat surprised at what has resulted after all these decades. Richard Posner wrote a book on public intellectuals in which he addressed some features of the modern university that have altered the knowledge landscape. Derek Bok also has a book on the role of university education. It’s been a long while since I read them, but they seem relevant to research culture.

      Observations of how people behave lead me to think that our high-school experiences leave a lot to be desired, because some of these behaviors are so immature. In saying this, I began to reflect on my own behaviors and decided that I was not going to turn into a harpy, an affliction that results from popularity contests and the emphasis on beauty, etc.

  17. Taking off a bit from Jordan Ellenberg’s comments, the problem I often see in published social-science papers is conclusions that exceed the results of the study.

    In most cases the best the study can do is show that there is some connection between a cause C and some effect E, but that the connection is remote. That is, there is a path (or there is no path) between C and E given W, X, Y, Z (which we know about), further conditioned on unknown variables {α, Β, Γ, …}, and the strength of this connection is …

    The conclusion should be written as “Here are the possible causal models, and here are the suggested next steps to evaluate and alter the causal models….”
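    To make that concrete, here is a minimal simulation sketch (entirely hypothetical: one assumed causal structure with made-up coefficients, estimated by plain least squares in numpy). The point is that the apparent C-to-E connection depends on which of the known variables you condition on, which is exactly why the conclusion should lay out the candidate causal models rather than a single headline effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One assumed causal structure (hypothetical, for illustration only):
# W confounds C and E, X mediates C -> X -> E, and there is no direct C -> E effect.
W = rng.normal(size=n)
C = 0.8 * W + rng.normal(size=n)
X = 0.5 * C + rng.normal(size=n)
E = 0.7 * W + 0.6 * X + rng.normal(size=n)

def coef_of_C(covariates):
    """Least-squares coefficient of C in a regression of E on C plus the given covariates."""
    design = np.column_stack([np.ones(n), C] + list(covariates))
    beta, *_ = np.linalg.lstsq(design, E, rcond=None)
    return beta[1]

for label, covs in [("nothing", []), ("W", [W]), ("X", [X]), ("W and X", [W, X])]:
    print(f"adjusting for {label:8s}: estimated 'effect' of C = {coef_of_C(covs):+.2f}")
```

    Each adjustment set yields a different number, and none of them alone says which causal model is right; that has to come from the kind of follow-up steps suggested above.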

  18. I would think that refereeing would have the opposite effect from the defensiveness the post describes. Cosmetic improvements and sniping at referees are likely to enrage them rather than get your paper published. You have to truly convince them. You will be tempted to fool them, but an unsuccessful attempt will, if it demonstrates bad faith, doom you. Refereeing ought to make us better-behaved scholars because it imposes authorities on us before which we have to humble ourselves.

    I do agree that defensiveness about published results is a puzzle. It’s actually flattering if someone thinks a paper of yours is worth debunking; most papers are ignored. If it’s the story described in the post (a paper that comes up with an interesting and important result that turns out to be nothing but random error), I don’t think the author gets blamed, and he’s gotten some name recognition that will advance his career.

    • Eric:

      I agree that criticism is useful and can even be a career-booster (either directly from the added attention to your work, or indirectly by pointing you away from less fruitful lines of research); that’s one reason it’s such a puzzle to me that scientists so often don’t take criticism to heart.

      Regarding refereeing: Yes, to respond effectively to referees you have to take the criticisms seriously enough to address the referees’ concerns—but, still, in almost every interaction with referees that I’ve seen, the author has the primary goal of publishing something close to the original paper, with all of the key ideas. I’ve almost never seen anybody—including me!—react to a referee report by saying: Hey, I didn’t realize it, but this research is completely wrong, my claims are unsupported, I really have no idea what’s going on, etc.

      • To be fair, how often is it really the case that you were completely wrong, and didn’t have any idea what was going on? I think that’s pretty rare. More often you’ve spent a LOT of time thinking about the project, and the referee is some prominent researcher’s postdoc who’s barely really aware of the field… or whatever.

        The big problem with anonymous refereeing is that it just doesn’t do *anything like* what the imagined role of refereeing is. It’s frequently used as a way to put roadblocks in the way of competitors, and/or to try to shape the global story in a field by keeping competing ideas out of publication.

        Whether it’s anonymous or not, the only legitimate review in my opinion is *post publication*. First, someone does some research and comes up with an idea, and makes it public… second, if anyone cares about that idea, they should engage with it by either providing open commentary to criticize or refine the research view, or they should do some additional research in the same field to try to provide further evidence and refine the understanding.

        The ONLY role that anonymous gatekeeping plays is *reputation mongering* as you get reputation by jumping bigger and bigger hurdles designed to keep information *out* of the public view. It’s absurd.

        • Refereeing only keeps information out of public view if it’s not on preprint servers. (Andrew’s post after this one.) So with preprint servers, it is the way you want it.

      • I’ve almost never seen anybody—including me!—react to a referee report by saying: Hey, I didn’t realize it, but this research is completely wrong, my claims are unsupported, I really have no idea what’s going on, etc.

        What this really means is that it was impossible to ever achieve the stated goals of the project with the methods used and the data available. I.e., the project should never have been approved and funded in the first place; it was doomed before it began. I will say this is quite common in biomedical research, though most don’t realize it because they harbor one or another misunderstanding about statistical significance (yes, I attribute the situation to this single cause). It certainly was the case for me once I figured out what was going on.

        In the case that you do realize it you can either attempt to salvage the project, quit, or become a fraudster. The first is very difficult, even impossible, unless you have exceptional circumstances (financial, etc). The second is honestly the most ethical option for most people, just treat it as sunk cost and move on.

        The third is the most popular, and has very little risk since you are essentially undetectable when the fraud is actually standard behavior. Most often people who choose this option just get the degree (“get that piece of paper”) and move out of academia into some profession where they can be rewarded for doing a good job.

        tl;dr:
        You rarely see this because those projects were already abandoned, or because the person submitting the paper no longer cares about its actual value (only its “nominal value”). There is a sort of Gresham’s law of research whereby what remains after a while is primarily a bunch of really hyped-up junk generated by people who are either severely confused or just want to publish a paper for career reasons.

      • I do a lot of reviewing and associate editorship, and of course I have my own papers reviewed (mostly but not exclusively methodological statistics). I don’t agree with the negative tone here about peer review. I have often seen authors drop claims that were criticised in peer reviews. Sure, I’ve seen unfair, incompetent, lazy, or otherwise problematic reviews, but they are a small minority. Many reviewers are critical but happy to be convinced of the merits of a paper once some really problematic material is amended or removed, and many authors comply. Many papers I’ve seen reviewed, including my own, have been improved a lot by peer review. And regarding nasty reviewers who supposedly only want to delay or prevent the publication of a paper: for the majority of rejected papers I’ve seen, I’m pretty sure you’d be happy not to have to read them, or glad that somebody told the authors to work harder on this before publishing.

        • Oh yeah, I had seen that. My comment was only partly directed at you (the part about authors who drop claims and improve their papers substantially in response to peer review); a bigger part was aimed at others who had been more generally negative about peer review.

        • Christian, aren’t the negative experiences largely a function of the specialty, the journal, and the stakes in the topic?
          My most negative experiences, both as reviewer and as reviewed, concerned methodologic papers submitted to desirable journals. Only a minority of cases were bad, but for such papers there seemed to be enormous degrees of freedom (forking paths, if you will), both for reviewers to invent an explanation for rejection when that was their goal and for authors to invent defensive rationalizations.

          When it came to papers that were study-data reports, there was the visible constraint of having actual data to report, with an (admittedly optimistic) upper bound on its information content given by the width of the confidence intervals; but even then the creativity of malicious reviewers could be impressive (sometimes abetted by editors threatened by the result).

          For the latter, my experience is largely within health and med research, where the stakes are often far higher than in (say) psych and social sciences, adding to the motivation for distorted reports from both reviewers and researchers. And by far higher I mean direct financial stakes (not just potential grant funding) in the 6-, 7-, or even 8-figure range from consultancies and investments in treatment modalities (which I have never seen disclosed by reviewers, and which are often undisclosed by researchers), and indirect stakes from potential liability for having used the modality in practice.

          Take a look at the literature on intra-articular pain pumps if you want some terrifying examples of dissembling by those with enormous direct stakes, and bear in mind that the published material is just the tip of the iceberg. (Disclosure: I was an expert witness against the manufacturers, who promoted the product without FDA approval; yes, that can be and is done.)

        • Just to make sure I understand – if I were to apply my skills outside of academia, I could get remunerated with an 8 figure salary and a constant, direct injection of opiates into my bloodstream?

          I maybe chose the wrong profession.

        • Fair enough, your experiences are different from mine then. As Associate Editor of four journals (and also as author, and as reviewer, to the extent that I can see other reviews of the papers I’m reviewing) I have seen hundreds of reviews, and I’d estimate the proportion of reviewers who are open-minded about a paper and honestly evaluate it on its weaknesses and merits, without a prior preference for acceptance or rejection, as surely above 90%. Of course one can still find some issues with some of these reviews, and there may be unconscious bias etc., but overall I still think this is a useful exercise for the profession that serves to improve, or at least keep up, standards.

          I have to admit that I have hardly any experience with reviewing in health and medicine, so it indeed may depend on the area.

  19. The line of argumentation sounds reasonable, but I have not observed this myself. I may be blind to my own flaws, but I have also not seen it in my colleagues, at least not beyond reasonable stubbornness, which also has a good side: a new idea naturally meets with some resistance, and stubbornness can help one pull through. I wonder whether this is more the case in more empirical fields and in the publish-or-perish heartland of America.

    It was not a fatal flaw, but I even wrote a blog post about problems we later learned about in the paper I am most famous for. I would expect that such a blog post improves my reputation as a diligent researcher, as long as you do not have to write one about every paper. Conversely, getting a bad paper published is bad for your reputation. Your dean may not notice, but your colleagues will.

    Anyway, I may be working on something that could also solve this problem. I am setting up a new way to review scientific papers, which I call grassroots scientific publishing. For a scientific community it collates all articles found across many different journals and assesses their contribution to the field. Because it is a community initiative, the editors know the reviewers and can assess the reviews well. It reviews published papers and could later (once accepted as at least as good as traditional peer review) also review unpublished manuscripts. In this scheme, if a flaw is found in a paper, its grade could be reduced (to zero if the flaw is completely fatal and the paper no longer interesting). That would reduce the incentive to officially pass peer review for a journal while not caring what the peers actually think of the paper.

    I work on the homogenization of climate station data; a first attempt at a grassroots journal for my field, to show what it would look like, can be found here:
    http://homogenisation.wordpress.com/

  20. The problem is that a scientist’s prestige is based not on the quality of the experiment but on exciting results. High-quality, useful (and even boring) experiments with null results don’t win prizes. Scientists with a track record of null results find it hard to get future grants. Scientists aren’t just defending a hypothesis; they’re defending their careers.

    In the big picture, it’s not merely how scientists behave (let’s face it, scientists aren’t magically going to respond in an unbiased way, scientists are just as irrational as other people), it’s about how science is funded and perceived by society.

    • You first claim that “scientists” are just acting rationally (in an economic sense) by producing BS to “defend their careers” due to the system they work under, then you call them “just as irrational as other people”. The two ideas you offer in that post seem to be in contradiction.

        • I’m not sure what you mean. The usual excuse for bad research behavior is “I/they/we need to do it to survive,” i.e., that the bad researchers are acting rationally in their own self-interest.

          That is my wider context; what context are you referring to here? Perhaps if you give an example of the irrationality you were thinking of, it will make sense.

  21. Andrew in his comment (third from above) recalled a story of a completely wrong paper being accepted and subsequently published, during which time the authors discovered the error but chose not to disclose it to the prestigious journal. Is this a failure of the peer-review process? The prestigious journal presumably has top reviewers, and they seem to have accepted a paper which is completely wrong!

  22. Why is the simpler theory—that scientists are humans who get emotionally attached to their ideas—so insufficient that you need more theory to explain defensiveness? Sure our publishing model might make things worse, but the problem is as old as science. And, in response to those who attribute the problem to incentives, even Nobel Laureates and other huge figures, the scientists most free from economic and career incentives, seem plenty vulnerable to going to their grave as the last believer in an otherwise long-dead theory. If anyone is interested, I keep a collection of such figures here: http://enfascination.com/weblog/archives/410

    My recent favorite example: geologist TC Chamberlin, who wrote “The Method of Multiple Working Hypotheses” to help scientists defend themselves against emotional attachments, himself became so toxic and aggressive an early critic of continental drift (which challenged some of his own pet theories), that he and his son made it anathema for decades.

    No one is immune to becoming close-minded, certainly not those who think they are.

    • Enfascination:

      The simplest answer to the question posed in your first sentence above is that I’m coming from a social science perspective in which we try to understand behavior externally.

      Yes, “scientists are humans.” Scientists, like other humans, behave differently in different situations. In particular, to do successful science (that is, science that is successful in learning about the world, not necessarily science that leads to career advancement) it’s pretty much a requirement to be aware that one can be wrong; it’s pretty much a requirement to be able to learn from one’s mistakes. It’s hard to be a good scientist if you’re being close-minded while doing the science. But, externally, after the paper is published (or, I should say, after the paper has been accepted by the journal), so many scientists seem to refuse to admit error, and nearly refuse to admit even the possibility of error. An extreme case is Brian “Pizzagate” Wansink, who was busy assuring everyone that none of his hundreds of published errors affected any of his published conclusions, saying all this without even seeming to be aware of what his data were measuring. But we see this all the time. Not everyone does it, or at least not everyone does it all the time: I’m not the only scientist who welcomes corrections. But the “puzzle” (to return to social-science jargon) is that scientists so often refuse to admit error after publication, even though learning from one’s errors is so key a part of science.

      Again, return to the very first paragraph of the above post.

      • However you see the “puzzle,” I think that one important part of improving the situation is to address the problem consistently from the beginning — that is, one important part of teaching is to include the ethic of taking (or at least trying to take) criticism seriously.

        Put another way: Maybe we spend too much teaching time focusing on Andrew’s first sentence in this post: “Scientific research is all about discovery of the unexpected: to do research, you need to be open to new possibilities, to design experiments to force anomalies, and to learn from them.” Perhaps we need to put equal focus on the “reality checks” that might show that the new possibilities are fantasies, or that the anomalies might be artifacts.

        • Yes! Definitely “we need to put equal focus on the ‘reality checks’ that might show that the new possibilities are fantasies, or that the anomalies might be artifacts.” Also (to impose essential epistemic symmetry) we need to focus on checks that might show claimed refutations or replication failures are fantasies, or that failures to detect anomalies might be from artifacts or intrinsic limitations (e.g., small samples).
          I’d go so far as to argue that, human nature being what it is, all these checks on our cognitive biases should be given top priority over “discovery”. It seems that the original noble goal of significance testing was as such a check; it is then a testament to the strength of those biases that it became utterly warped into a means of claiming discovery (P less than .05) or claiming refutation (P greater than .05, again equally publishable in areas I’ve worked in).

        • Sander:

          Yes. Sometimes I put it like this:

          Consider a significance test.

          The usual thing you’re taught in a sophisticated statistics class is that if the test rejects, you’ve learned something (you’ve rejected the model), but if the test does not reject, you’ve learned nothing (non-rejection does not imply acceptance.)

          Actually, it’s nearly the opposite. If the test rejects, you’ve learned nothing (you already knew ahead of time that the null hypothesis—exactly zero effect and exactly zero systematic error—was false), but if the test does not reject, you’ve learned something (that your data are so weak that you cannot distinguish them from random noise).

          That’s a slight oversimplification—in particular, if your test rejects a model that you liked (not a model you actually believed, as that’s generally impossible), then you can take this as a useful signal that there’s room for elaboration of your model (or for checking of your data)—but I think the general point is right. If we (correctly) take a hypothesis test as a check on inappropriate claims of discovery, then it’s from non-rejections that we’re learning.

          Beyond all this, p-values are super-noisy (by construction), and that should be recognized in the interpretation of results.

        • Andrew:
          1) Just noticed that WordPress deleted a portion of my final sentence’s final clause (I guess it did not like inequality signs), which was supposed to say that the P-value “became utterly warped into a means of claiming discovery (P less than .05) or claiming refutation (P greater than .05, again equally publishable in areas I’ve worked in).” So I think either way P gets misinterpreted.
          2) Re: “P-values are super-noisy by construction”: absolutely! There are published criticisms of P-values (e.g., by Trafimow) condemning them because they aren’t replicable. As you know, that criticism is patently absurd: a frequency-valid P-value (your U-value) will by construction be uniform(0,1) if every assumption, including the tested hypothesis, is correct, which means it will bounce around on replication all across the board from 0 to 1. Some people seem to think P-values should converge to some sort of parameter, which is the exact opposite of what they are supposed to do (which is to measure, on a 0-to-1 scale, the residual “random” deviation from the model).
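          A quick simulation sketch of that point (my own illustration, using an ordinary two-sample t-test with fifty observations per group as a stand-in for a test whose assumptions all hold exactly): when the tested hypothesis and the auxiliary assumptions are all true, replicated P-values do not settle down to anything; they spread roughly uniformly over (0, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_reps, n = 10_000, 50

# Two groups drawn from the same normal distribution: the tested hypothesis
# and the auxiliary assumptions (normality, equal variance, independence) all hold.
pvals = np.empty(n_reps)
for i in range(n_reps):
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    pvals[i] = stats.ttest_ind(a, b).pvalue

# A frequency-valid P-value should look uniform(0,1) across replications,
# i.e. it bounces all over the place rather than converging to anything.
print("mean of p: ", round(pvals.mean(), 3))             # close to 0.5
print("sd of p:   ", round(pvals.std(), 3))              # close to 1/sqrt(12), about 0.29
print("P(p < .05):", round((pvals < 0.05).mean(), 3))    # close to 0.05
```

          The exact numbers vary with the seed, which is the point: the replicated P-values scatter over the whole unit interval rather than homing in on any value.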

        • > from non-rejections that we’re learning.
          You get to stick with your current model, given a lack of any sense of how to make it importantly less wrong, along with the not-yet-dashed hope that it may not be too wrong.

  23. I agree with much of Andrew’s post, but I believe that the defensive response has a lot to do with how senior researchers teach their junior staff and students, in the same way that bad analysis is often passed down. If we want to change things, I believe that intervening in the teaching of grad students is a key element. Grad students, in my experience, are often willing to be persuaded to do things in a better way. For example, I consider it my responsibility as a teacher to ensure that students have reflected on whether there are alternative explanations to the ones given in papers.

  24. Two points are worth making:

    1) Another factor that contributes to our defensive/conservative posture when replying to criticism is time. The “publish or perish” philosophy that haunts academia forces us to publish, and the less effort we need to get a paper published, the better. So you kind of just go with what the referees stated and try to make as few changes as possible to your already-written manuscript.

    2) Authors who become completely convinced that their research is the ultimate statement of the true truth of nature (mistakenly thinking there is any such thing) carry these views into their own criticism when they themselves write reports as referees. And that completely ruins the peer-review process. In other words, authors push their views onto others even when working as referees, and that may ruin science as a whole: a paper that obtains a different result from the “established true result” may not be accepted (and usually isn’t), just because the referee (the erstwhile author) is a stubborn person who simply thinks this contradictory paper is flawed.

    In this sense, it would be healthy for science if we had journals specialized in publishing papers with negative results. Of course, these journals would have to be rigorous in verifying the authors’ methods instead of caring about the results per se. Things like “we tried A via method B, but we actually got C instead”.

    • I am curious as to how statistics can be made legitimate if evidence-based medicine is now being used as a marketing tool by stakeholders in the biomedical industry and in medicine. John Ioannidis has a compelling YouTube video that addresses this development. Moreover, John claims that a statistician can produce any result that a stakeholder wants. And yet even highly accomplished experts imply and claim that statistics can be redeemed. Seventy years have gone into this effort.

      Gerd Gigerenzer has critiqued the replication movement, though I haven’t read his entire case for that critique. Gigerenzer is not a ‘Nudge’ proponent, at least not of the nudging from governments and those who have conflicts of interest.

    • In this sense, it would be healthy to science if we had journals specialized in publishing papers with negative results.

      The idea of “negative/positive” results is pseudoscience to begin with. Just report what the x, y, z measurements are under conditions a, b, c. Even with a small sample size and highly variable measurements (the cause of “negative results”), you can still get some information about the order of magnitude of x, y, z.
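      As a minimal sketch of that idea (made-up numbers, a plain t-interval, nothing more): even a handful of noisy measurements pins down the rough order of magnitude of a quantity, and that estimate plus its uncertainty is reportable whether or not any test comes out “significant”.

```python
import numpy as np
from scipy import stats

# Five hypothetical, highly variable measurements of some quantity x
x = np.array([120.0, 480.0, 310.0, 90.0, 260.0])

n = len(x)
mean = x.mean()
sem = x.std(ddof=1) / np.sqrt(n)
lo, hi = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

# No "positive" or "negative" result here: the report is just the estimate and
# its uncertainty, which already rules out x being on the order of 10 or 10,000.
print(f"x is roughly {mean:.0f} (95% interval {lo:.0f} to {hi:.0f})")
```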

  25. I think that the main reason why scientists respond in a defensive way is metrics. Publications are one of the most important metrics of scientific success, so it is not surprising that scientists are encouraged to push for their analyses to be published, regardless of scientific quality.
