Ecologists push for more reliable research

John Williams points us to the above-titled news article by Cathleen O’Grady, subtitled, “Psychology’s replication crisis inspires efforts to expand samples and stick to a research plan.” There’s some new thing called the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology.

I’m glad that evolutionary biology is on this list, because one of the very first bits of junk science we discussed on this blog was some publicity-grabbing noise mining published in the Journal of Theoretical Biology. (And I was too nice back then, too.)

18 thoughts on “Ecologists push for more reliable research”

    • However, if p-values are used with a threshold for dichotomous judgments about the ‘presence’ or ‘absence’ of an effect, or about whether an effect is ‘real’ or not, as is typical within the NHST framework, we may easily reach overconfident conclusions in either direction. Such overconfident dichotomous generalizations from single studies often lead to the erroneous perception that replication studies show ‘conflicting’ evidence and that science is in a general replication crisis (Amaral & Neves, 2021; Amrhein et al., 2019a, b).

      It looks like there is some collective “stages of grief” phenomenon going on.

      Please clarify how studies coming to different conclusions are no longer considered “conflicting”.

      And how 50-85% failure to agree on the conclusion is not a crisis. People expect more like 5% disagreement from researchers.

      What would need to happen for you to believe there is a replication crisis?

      • I think the point he’s making is that the crisis is one of stupidity in the conclusions, not of fundamentally conflicting evidence.

        Imagine a world where there are a lot of small effects, there’s a certain amount of money people get to study them, and that amount of money typically buys a study in which a test of the null hypothesis of zero effect will give a p-value between 0.02 and 0.08…

        Every second study will say “effect is real” and the rest will say “effect doesn’t exist”… In reality they are all consistent with “effect is small,” and no one has the brains to overcome the decades of indoctrination in logically fallacious reasoning that NHST has instilled in them.

        It’s a little worse than this, of course, because there are all the other issues (varied experimental designs, measurement problems, etc.), but even in an ideal case, the basic effect of noisy evaluations because NHST SUCKS is there… even when there’s no underlying conflict in the experimental data.
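
        Here’s a toy simulation of that world (a hypothetical sketch in Python; the effect size, sample size, and power level are all made-up numbers for illustration, not anyone’s actual study design):

            # Hypothetical world: one small true effect, many studies run at ~50% power.
            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(0)
            true_effect = 0.2   # small standardized effect (assumed for illustration)
            n = 200             # per-group sample size, chosen to give roughly 50% power
            n_studies = 1000

            significant = 0
            for _ in range(n_studies):
                treatment = rng.normal(true_effect, 1.0, n)
                control = rng.normal(0.0, 1.0, n)
                _, p = stats.ttest_ind(treatment, control)
                significant += p < 0.05

            print(f"{significant / n_studies:.0%} of studies declare 'effect is real'")
            # Roughly half reject the null and half don't, so the literature looks
            # "conflicting" even though every study saw the same small effect.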

        • I definitely agree NHST automatically generates conflicting results, but this reframing of the issue seems (amazingly) set to make things even worse.

          Instead of a “replication crisis,” now it is an “inconclusivity crisis.” I.e., we have learned little if anything from all these studies, but if we give the same researchers even more money, then sample sizes will be big enough to actually draw conclusions.

          That is false, though: these compare-A-vs-B studies are designed for NHST. Once you drop NHST, you would never design such studies to begin with. Instead you would observe how phenomena evolve over time in various circumstances, then come up with reasons why the curves look the way they do.

          And this reframing is wrong anyway because the replication projects also report systematic inflation of effect sizes. At what point does the evidence become “fundamentally conflicting”?
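
          To make that inflation mechanism concrete, here’s a hypothetical simulation (made-up numbers again): run many underpowered studies of one small true effect and average only the “significant” estimates.

              # Hypothetical: selecting on p < 0.05 inflates published effect sizes
              # (the winner's curse / type M error). All numbers are assumptions.
              import numpy as np
              from scipy import stats

              rng = np.random.default_rng(1)
              true_effect, n, n_studies = 0.2, 50, 10_000  # underpowered design (assumed)

              all_estimates, significant_estimates = [], []
              for _ in range(n_studies):
                  treatment = rng.normal(true_effect, 1.0, n)
                  control = rng.normal(0.0, 1.0, n)
                  _, p = stats.ttest_ind(treatment, control)
                  all_estimates.append(treatment.mean() - control.mean())
                  if p < 0.05:
                      significant_estimates.append(all_estimates[-1])

              print(f"mean estimate, all studies:      {np.mean(all_estimates):+.2f}")
              print(f"mean estimate, significant only: {np.mean(significant_estimates):+.2f}")
              # The significant-only average comes out well above the true 0.2: with
              # low power, only overestimates clear the significance bar, so honest
              # replications will tend to look smaller than the original result.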

  1. I was discussing the replication crisis with a colleague in natural language processing, who told me that there is no replication crisis in machine learning because nobody believes anything until it’s been replicated in multiple places. So how about calling it a “publication crisis,” in which unreproducible results published in “prestigious” venues form the basis for hiring, tenure, promotion, and grants. Much of the rest of the signal feels like bias from over-the-top reference letters (see the next blog post!), which say more about the letter writer than the candidate. I only read reference letters to see if there are red flags about a person’s behavior or if they’re from someone I trust not to lead me astray (which is, of course, yet another form of hiring bias).

    • Bob:

      Sure—but it’s not just hiring, tenure, promotion, and grants. The problem is not just within academia; it leaks into the outside world. Consider that Brian “Pizzagate” Wansink had a position within the U.S. government to promote his ideas, as did Cass “Stasi” Sunstein, who promoted Wansink’s work along with other similar work. Yes, they’re part of a cozy club of law school professors, business school professors, etc.—but these sorts of people have lots of influence in government, business, and the news media.

      And machine learning has some replicability issues outside of academia too. If “nobody believes anything until it’s been replicated in multiple places,” then what does it mean when a high-profile Google engineer makes public claims that he won’t back up with public code or evidence of replication?

    • Bob:

      I would rather call it a “communication crisis” than a “publication crisis,” because, as Andrew wrote, it is not only about scientific publications (although I agree that our paper culture may be the main culprit).

      Of course, science has always had replicability issues; if most claimed scientific discoveries had been replicable, we would probably have solved most energy and health problems in the Middle Ages.

      But since about 1925, science has had formal methods and words for convincing others that a particular scientific result is significant or credible, without the need for replication. And this works across disciplines! Now I can judge the reliability of a study in economics, although I have no knowledge of the subject. It also works in communication to the public: science journalists have learned that a significant result is a reliable result.

      I have come to believe that most of the “replicability crisis” is due to unjustified expectations that must necessarily be disappointed. Yes, there is fraud and imperfect methods, but I think the most problematic and durable part of the crisis is that people did not learn, or have unlearned, to accept uncertainty and to embrace variation (I guess you know who coined those words).

      By the way: in my academic world (biology), it is far from generally accepted or widely known that there is a crisis. I recently discussed this with a dean of a major university, who asked me why we need a Swiss Reproducibility Network at all (https://www.swissrn.org). He said: “the very definition of science is that the results are reproducible.”

      • Valentin:

        Yes, “unjustified expectations that must necessarily be disappointed.” Well put. That’s related to my point that replication should not be oversold. Replication is not a positive path to discovery; it’s more of a way to help avoid thinking you have a discovery when you don’t. When the question comes up of whether study X (on beauty and sex ratio, or ovulation and voting, or ESP, or whatever) should be replicated, my response is usually: Sure, people can replicate it if they want to, but I don’t personally recommend they waste their time!

        • “Sure, people can replicate it if they want to, but I don’t personally recommend they waste their time!”

          Yes, I agree, it’s stupid to waste time replicating studies that have so many blatant fatal errors. But the problem is that it seems that many researchers don’t understand that these studies have numerous fatal errors. So what then?

        • Andrew –

          > Replication is not a positive path to discovery; it’s more of a way to help avoid thinking you have a discovery when you don’t.

          I think that “help” is doing a lot of heavy lifting there. Studies can fail to replicate for a lot of reasons. I think often a replication failure is seen as proof that the original finding was invalid, when the issue should be viewed as a bit more complex. There can be a lot of reasons that studies don’t replicate.

        • Joshua said: “There can be a lot of reasons that studies don’t replicate.”

          Yes, good point. People should think of the problem as being larger than just “replication”. Failure to replicate could result from trivial errors. Social science is experiencing a *much* more serious problem: a major wave of invalid claims and results.

          The more appropriate way to think about the problem in the social sciences and statistics is that there is a major validity problem: researchers are generating ridiculous claims. It’s not a problem with statistical minutiae. It’s a problem with researchers repeatedly using proven-invalid experimental methods to generate and promote false claims.

        • Chipmunk –

          >It’s not a problem with statistical minutiae. It’s a problem with researchers repeatedly using proven-invalid experimental methods to generate and promote false claims.

          This is, as per usual, totally unqualified. In my view, while I get that it can feel good to arm-wave or hand-wring, it doesn’t really shed that much light. Yes, it’s not a perfect world. Nor is it a binary world.

          For all the use of proven-invalid experimental methods to generate and promote false claims that takes place, are we worse off? That’s not an excuse, nor is it a reason not to talk about problems and investigate poor practices. Neither is it a statement, with certainty, that we’re better off. It’s a question. I just don’t quite get assuming an answer when the question hasn’t been very thoroughly interrogated.

      • “But since about 1925, science has formal methods and words for convincing others that a particular scientific result is significant or credible, without the need for replication. And this works across disciplines! ”

        No. Science does *not* have “formal methods” that work “without the need for replication”. Any method can be used or reported improperly. Therefore, no method can obviate the need to validate, reproduce, replicate or confirm results.

        Here’s what to say to your dean:
        It is true that “the very definition of science is that the results are reproducible”. That’s exactly why results have to be reproduced: to verify their scientific validity.

        • “No. Science does *not* have “formal methods” that work “without the need for replication”. Any method can be used or reported improperly. Therefore, no method can obviate the need to validate, reproduce, replicate or confirm results.”

          Yes, I agree. That “1925” paragraph contained several things that I don’t believe but that I suspect others do believe. Well, perhaps few people believe what I wrote, but many act as if they do.

        • Valentin Amrhein linked to the paper:

          “Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication”

          The third and fourth sentences in the abstract are:

          “Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires.”

          Sorry, this fails the Oklo Natural Nuclear Reactor Test. The theory was a single study, the discovery was a single study, there were no varying assumptions or random variation (except the purely stochastic and perfectly predictable decay processes), there was excessive agreement throughout, and the results have been generalized to every solid body in the universe.

      • Valentin –

        >I have come to believe that most of the “replicability crisis” is due to unjustified expectations that must necessarily be disappointed.

        Thanks. It’s nice to read someone say that.

        I’m not sure I agree. But I think that the question must be entertained and interrogated. It’s certainly a possibility, as is the idea that we’re in a “replication crisis.” The problem for me is that a lot of people seem to be operating as if the “crisis” is a given, without the question being fully interrogated. It’s a sexy concept.
