Biology as a cumulative science, and the relevance of this idea to replication

Megan Higgs and I were talking with a biologist, Pamela Reinagel, the other day about replication, statistical significance, and related topics, and Pamela commented that the replication crisis didn’t seem to be as big a problem in biology (at least of the wet lab variety) as in psychology.

This was interesting to me. I don’t have much knowledge of biology, but, like psychology, it’s both an observational and experimental science with lots of variation.

One interesting thing about the psychology replication crisis is that it centers on experimental psychology. An experiment should be easier to replicate than an observational study, and my biologist colleague was surprised when I informed her that various famous claims from experimental psychology were believed to be true, sometimes for decades, before everything changed when the studies failed to replicate. I think I gave her the example of the elderly-priming-slow-walking study.

Pamela thought this was wack: how could a famous study sit there for 20 years with nobody trying to replicate it? I said that there’d been lots of so-called conceptual replications, and that researcher degrees of freedom and forking paths had led to these conceptual replications all appearing to be successes, even though it turned out there was nothing going on (or, to say it more carefully, that any real effects were too small and variable to show up in these crude, nearly theory-free, between-person experiments).

Pamela said this doesn’t happen as often in biology. Why? Because in biology, when one research team publishes something useful, then other labs want to use it too. Important work in biology gets replicated all the time—not because people want to prove it’s right, not because people want to shoot it down, not as part of a “replication study,” but just because they want to use the method. So if there’s something that everybody’s talking about, and it doesn’t replicate, word will get out.

The way she put it is that biology is a cumulative science.

Another way to put it is that in psychology and social science, we do research for other people; biologists do research for themselves.

For example, consider social priming of the elderly-words-and-slow-walking variety. Psychologists publish this work, but it’s not like they’re giving themselves subliminal tapes featuring the speeches of Speedy Gonzales. Even when psychologists do make use of their own research (for example, using nudging ideas to lose weight), it’s within their personal lives, not part of their research. In contrast, biologists are using published biology research in order to do better biology research. Biology is cumulative, not just in the sense of new research building on old research, but in the sense of methods cumulating as well.

OK, I guess that some psychology is cumulative in this way too, for example the psychometrics used to build measurement scales. One research team will define and test a depression inventory, for example, and other teams will validate it for their own use.
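Here’s a minimal sketch, purely for illustration (the scale, the data, and the numbers are all made up), of what that second-team check might look like in code: computing Cronbach’s alpha, a standard internal-consistency measure, on one’s own sample of a hypothetical 8-item depression scale.

```python
# Illustrative only: a made-up 8-item scale checked for internal
# consistency (Cronbach's alpha) on a second team's own simulated sample.
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items array of item scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(1)
severity = rng.normal(size=(200, 1))                         # latent trait
responses = severity + rng.normal(scale=1.0, size=(200, 8))  # 8 noisy items
print(f"alpha on our sample: {cronbach_alpha(responses):.2f}")
```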

The point here is not that biology is “good” and social psychology is “bad”; there are just some differences between the fields. I’d say that most of economics and political science is like social psychology in this way: there’s a core of methods, but particular research results don’t tend to be directly used going forward, with the exception of some methods-focused subfields such as designing and evaluating survey questions.

Here’s an interesting quote from economist Noah Smith:

Quality control in academic economics is nonexistent. There is ZERO incentive for peer reviewers to go looking for errors in papers. And there is no incentive (other than sheer orneriness or ideological opposition) for anyone to check papers carefully after publication!!

According to Pamela Reinagel, it’s not like that in biology: there, researchers have a clear incentive to try to replicate published work, because they’re using it in their own research. That’s what’s meant by biology being a cumulative science.

What about statistics and computer science: where do they fit in this spectrum? I think it depends on what you’re trying to do. If we come out with a new algorithm for Stan or whatever and it doesn’t really work, then sure we might be able to squeeze a few papers out of it using hand-picked examples, but if it looks good, other people are gonna want to try it out themselves (for example). And then if our results don’t replicate, people will know. That’s not to say that statistical computing is a completely clean field. There’s lots of hype, lots of methods that are supposed to work but don’t really replicate, lots of supposed speed comparisons that are done completely misleadingly (it matters how you flip those switches!), not to mention papers in the literature with tons of citations but describing methods that don’t really work. So there are lots of mini-replication issues. But, as with biology, I don’t think this builds up to a full-fledged crisis because any method that gets a lot of attention will get used and tested.

The role of overconfidence

This still leaves the question of why experimental psychology, the most replicable part of the field, is the part with the biggest replication crisis. First and most obviously, the replication crisis has arisen in large part because it has been possible to replicate these studies and find their problems. The actual replications were an important part of the story. It was apparent to Paul Meehl and others in the 1970s that there were fatal flaws in the application of null hypothesis significance testing to psychology, and Meehl wrote about this a lot—but it didn’t really matter for a few decades until the failed replications started to pile up.

The second reason that I think the replication crisis has been more serious in experimental, rather than observational, psychology is that experimentation gives causal identification, which leads to researcher overconfidence. An extreme example of this overconfidence came from a prominent psychology professor who angrily stated that if you have a randomized experiment, then you don’t need to be concerned about your sample being nonrepresentative of the population (see here for the story). We’ve seen a similar overconfidence from economists doing regression discontinuity studies. The existence of the identification strategy encourages these researchers to turn off their brains. They are committing the strongest-link fallacy. Anyway, the point is that in non-experimental psychology the threats to validity are obvious, and that tempers any strong claims that researchers will make from their data. With experiments, they just think everything’s scientifically hunky-dory: give them their “p less than 0.05” (or, if they’re studying something important like early childhood intervention, their “p less than 0.1”) and they’re off to the races.

Hence the replication crisis. Just as it is said that our modern megafires arise from having forests full of trees, all ready to ignite and preserved in that kindling-like state by firefighting policies that have favored preservation of existing trees over all else, so has the replication crisis been fueled by a decades-long supply of highly vulnerable research articles, kept in their pristine state through an active effort of leaders of the academic psychology establishment to suppress criticism of any published work.

Again, it’s not that psychology is worse than other fields; as we’ve discussed elsewhere, psychology has lots of experiments which are easy to replicate (unlike in the fundamentally observational fields of economics and political science) and which are inexpensive in time, money, and lives (unlike in medicine or education research). Other fields also have woods that are ready to burst into flames, but the matches have not yet been struck in sufficient quantity.

Back to biology

I sent the above to Pamela, who had some things to add:

1. Surely people can come up with counterexamples. But the more a conclusion matters to anyone, the more it will have been incidentally replicated. Some subfields of biology, like epidemiology or ecology or paleontology, are probably more like the social sciences you mentioned in terms of studying intrinsically non-repeatable things, sometimes at great expense, and with little or no experimental manipulation.

2. It’s true that when biologists publish new methods others rush to use them. But I think the bigger story is the way biologists build on results.

By way of example: figuring out what’s going on with one specific protein, say, involves a winnowing down of possibilities. First we might ask “is it in the nucleus or cytoplasm of the cell?” and if the result is “it’s the nucleus”, the next experiment might purify it from nuclear protein extract to ask the next question “is it phosphorylated or not?”; if that one’s a yes, then the next question will be “what enzyme phosphorylates it?” or “does phosphorylation alter protein function X?”… and so on. Every new experiment is based on conclusions reached up to that point. Competent biologists also routinely include “controls” in every experiment to verify all the important facts they think they know as well as any assumptions that are newly-minted and less secure. We do this because experiments are hard, and that’s how one verifies that a particular experiment worked properly, such that any new finding can be trusted. If in your next experiment the protein is not found in the nucleus, or is not phosphorylated, then you know either you’ve screwed up your experiment, or your situation isn’t comparable to the original one, or the original result was a mistake. So you pause the planned experiment to track down the problem. Redo the same experiment with freshly made solutions; re-test the old sample from the freezer; etc. So the knowledge is cumulative in this very fine-grained way, within as well as across research teams.

One thing that varies is the timescale of this loop, however. Biochemistry or bacterial genetic experiments are cheap and fast, so you might have several cycles of that sort happening per week within one person’s project. Animal behavior experiments take much longer and cost much more, so one or two rounds of iteration might represent enough work to be a published paper. The pattern is the same, just scaled.

P.S. In comments, some people point out that biology is a huge science with many different subfields. I agree; this came up in my discussion with Megan and Pamela. Gene-association studies can be very statistical, and there are some areas within evolutionary biology that look a lot like some areas of psychology; consider for example the notorious beauty-and-sex-ratio paper, which was published in the Journal of Theoretical Biology and is a classic example of a publication based entirely on a series of statistical errors that allowed the author to make a politically convenient claim, which then appeared in a real journal and received media publicity.

The point about the cumulative aspect of research, at least in some areas of experimental biology, is that research group A will publish a paper that shows a method that research group B will want to use. So the point of the “replication” by group B is not to support group A or to shoot down group A or whatever, but rather just to use this method to make their (group B’s) research better. A similar thing arises in some areas of computational statistics: when we published the NUTS algorithm, lots of people wanted to use it on their own problems, so they programmed it up and used it. To the extent our findings didn’t replicate, people would’ve learned right away. It wouldn’t have taken 20 years as in that notorious elderly-walking study, because people didn’t just want to cite the NUTS result and move on, they wanted to use it.
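To give a sense of what that sort of reuse looks like in practice, here’s a minimal sketch (a toy example, not anything from the original paper) of a “group B” running an off-the-shelf NUTS implementation, in this case NumPyro’s, on its own little problem. If the sampler didn’t work as advertised, the diagnostics would show it immediately.

```python
# Toy "group B" reuse of an off-the-shelf NUTS sampler (illustrative only).
import numpy as np
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS
from jax import random

y = np.array([2.1, 1.7, 2.5, 1.9, 2.2])  # group B's own (made-up) measurements

def model(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    numpyro.sample("obs", dist.Normal(mu, 1.0), obs=y)

mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), y=y)
mcmc.print_summary()  # poor mixing or divergences would show up right here
```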

And again, this doesn’t make biology (or some areas of computational statistics) “better” than psychology. It’s just a difference between the fields, and it helps us understand how it is that replication issues have played out differently in these different fields.

63 thoughts on “Biology as a cumulative science, and the relevance of this idea to replication”

  1. Mathematics is another discipline with little or no replication crisis. This may be because academics in this field publish less frequently than in other disciplines, and publication pressure is one factor driving the replication crisis.

    • In mathematics, every reader of a paper “replicates” the results, i.e., checks the proofs. I may not check all the proofs in a paper if I’m just skimming. But, if I am seriously reading a paper, then I have to be checking the proofs. So, anyone can replicate. And, authors generally want to know if they’ve made a mistake, and will publish corrections if mistakes are found. At least this is usually true. An exception would be Mochizuki (Google “abc conjecture Mochizuki”).

        • Jay:

          You wrote that when you are a peer reviewer you check all the proofs. I don’t think most people do that. I know I don’t. I read the paper to see if I think it is worth publishing. I will check some of the proofs. But, it is the author’s responsibility to check the proofs. If a paper is extremely important, the editors may ask the reviewers to check the proofs.

          Many mistakes can be fixed because the argument is heading in the right direction, but you just forgot a case or something and so you can find another way to get to where you are going. But, of course, sometimes you can’t, and then you need to come up with a whole new argument. I agree that intuition is an important guide. We don’t remember how to prove everything we know is true, so we rely on our intuition to plot the course, then fill in the details.

  2. I work in human genetics and I think that there are replication issues, though nothing like so bad as in experimental psychology. Examples:

    1) In studying genetic history using ancient DNA, true replication is hard or impossible. The raw fossils are probably unavailable, and either one uses different samples (which may have different history) or one can replicate the calculations of the first study, which is far from a real confirmation.

    2) In association studies (matching genotype to phenotype) we can now use very large sample sets (good) and find small effects (good) but are vulnerable to subtle confounders. A true random sample is usually impossible.

    And in both cases there are potential forking path issues.

  3. As with other posts along the lines of comparing one discipline with another, I’m not sure I buy the claims. Frankly, I don’t know enough about research in biology to pretend that I know whether it is more cumulative science than psychology or economics, or whether the comparison involves carefully selected subsets of research in those fields.

    But my skeptical nature causes me to offer another factor to consider. It seems like the more easily research can be translated into headlines (and associated publications, grants, TED talks, etc.), the worse the replication crisis. The biological research being discussed here is important but complex stuff and requires building theoretical understanding incrementally (perhaps this is what you mean by “cumulative”). It seems rarer for a single biological experiment to offer profound conclusions on its own. But in psychology or economics it is relatively easy to design an experiment that would permit (somewhat wild) speculation about what it implies about the world (e.g., how power poses can bring about success, how nudging people’s financial decisions can lead to large financial impacts).

    To the extent that this is what separates cumulative research from fields that don’t require it, there would also seem to be a difference in training. Most biologists I know were trained in the scientific method, are generally understated in what they conclude from their research, and are careful not to hype their findings. This is certainly not the case with most psychologists and economists that I know. I am not so sure about statisticians, though I think they are much more like the biologists at least compared with data scientists (who are often economists or computer scientists).

    In any case, I don’t find the difference between disciplines of much interest (after all, what are the GRE scores of biologists vs psychologists? – a poor inside joke; if you don’t get the reference, just ignore it). But I do think that viewing research as a cumulative enterprise imparts a healthy respect for the incremental nature of understanding, and is a useful protection against the increasing pressure to seek the limelight in a world of attention scarcity.

  4. (I’m a biophysicist; I spend at least as much time talking to biologists as physicists, probably more. I have no firsthand knowledge of psychology.) I would agree that replication problems aren’t as big in biology as they seem to be in psychology, but they are certainly significant — more so than the tone of the post implies. This seems worse the closer one gets to clinical / health-related things. It’s not always as striking as entire studies being nonsense, but (i) having methods that aren’t reproducible, much to the dismay of graduate students, and (ii) having “statistical” conclusions that are nonsense, due to p-hacking, etc., are routine.
    It’s hard, though, to make generalizations about something as gigantic as biology. A lot of molecular biology studies allow precise, well-constructed, quantitative experiments that lend themselves well to being built on. In other areas (ecology, anything touching the microbiome), an experiment pokes one knob in a fascinating, important complex system that really lends itself to over-interpretation or delusion. All of this is “biology.”

    • I think much the same is true of Psychology. There are areas like psychophysics which are much more replicable, with huge effects (and some things, like perceptual illusions and the Stroop effect, are immediately replicable and obvious even to viewers with no science training), whereas social and clinical psych suffer a lot more because the events simply can’t be controlled in the way the scientific method demands, with lots of statistical handwaving.

      Generalizations of whole subfields are rarely of much use, there’s so much diversity within them.

      • Sean:

        One of the problems with psychology is that the psychology establishment (for example the Association for Psychological Science) has for years been heavily promoting some really bad work. One reason the replication crisis was such a big deal in psychology is that it affected the work of some big names in the field. And they continue to promote junk science. I feel bad for the many many psychology researchers who do solid work; it’s gotta be tough to be in a field where much of the leadership promotes the bad stuff. Biology seems a bit different: for example, that beauty-and-sex-ratio guy is a fringe figure, not a leader in the field.

        • Thanks for taking the time to respond. It is a good point regarding fame/prestige. In Psychology research (which I work in mostly) my own heuristic is that there is an inverse relationship between the fame of the researcher and the likely utility of the research. Famous psychology people really have to earn my trust more before I believe the claims. Research that is more “boring” is often the most solid stuff! I don’t know enough famous biologists to make that sort of claim for them, but I suspect that media hype affects most scientific disciplines, but agree psych has it bad (medicine too, in my experience).

          The news hype cycles really make it worse though. I still remember one time when I found a modest effect: partner effects in couples’ alcohol consumption (e.g., partners influenced each other to drink more). My university released a press release “Drinking as contagious as the common cold”… Made me less likely to want to promote my own research if it was gonna get hyped like that!

  5. I partially disagree. I think the primary difference between biology and psychology is the time/effort it takes to replicate a finding. In cell biology, an experiment can often be repeated in a week or so; in psychology, repeating a well-powered study could take months of recruiting volunteers.

    I think the replication crisis simply takes a different form in biology. Instead of big, well-known studies crumbling under scrutiny, in biology we have tens of thousands of low-quality experiments published every week and for the vast majority, no one even tries to replicate the findings. I agree that eventually the field muddles through this muck and settles on a textbook description of a biological process. And yes, those get corrected over time, too.

    But the process is super wasteful. I wish journals were better at weeding out low-quality science. Case in point, the majority of cell biology papers mistakenly count each cell measurement in a single sample as n and calculate tiny p-values based on these inflated sample sizes. It’s not uncommon to see a p-value of 10^-29 on a tiny difference between a grand total of 2 samples (one control and one treated). It’s comical, but it’s common practice:
    https://doi.org/10.1083/jcb.202001064
    https://bmcneurosci.biomedcentral.com/articles/10.1186/1471-2202-11-5
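    To make the pseudoreplication point concrete, here’s a quick simulation sketch (purely illustrative, not taken from either paper above): one control dish and one treated dish, no true treatment effect, but counting each cell as an independent observation yields an absurdly small p-value.

    ```python
    # Purely illustrative: one control dish and one treated dish, 500 cells
    # each, no true treatment effect. The dishes happen to differ by 1 unit
    # for reasons unrelated to treatment (passage number, plating density, ...),
    # which is exactly what n = 2 cannot rule out.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_cells = 500
    control = 0.0 + rng.normal(0, 1, size=n_cells)  # cells from the control dish
    treated = 1.0 + rng.normal(0, 1, size=n_cells)  # cells from the treated dish

    t, p = stats.ttest_ind(treated, control)
    print(f"per-cell t-test: t = {t:.1f}, p = {p:.1e}  (n inflated to {2 * n_cells})")
    # The honest unit of replication is the dish: one mean per dish, n = 2,
    # and no p-value can separate treatment from dish-to-dish variation.
    ```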

    I think preprints should be the land of all these pseudoreplicated findings, but journal editors should require that authors demonstrate their findings are reproducible before actually publishing. In fact, I’d prefer to jettison traditional peer review and replace it with “peer replication,” where another lab replicates key findings before they are published:
    https://blog.everydayscientist.com/?p=3913

    • In fact, I’d prefer to jettison traditional peer review and replace it with “peer replication,” where another lab replicates key findings before they are published

      I agree, but disagree with the framing. The rise of peer review actually parallels the adoption of NHST quite well. Following WWII, peer review gradually replaced peer replication. At the same time, testing a null hypothesis replaced testing your hypothesis.

      So, peer replication is “traditional” and peer review is the “new” method that replaced it:

      Well into the twentieth century, many renowned scientists went their entire careers without having a paper refereed—and were not always enthusiastic when introduced to the practice. In 1936, for instance, Albert Einstein was extremely offended when he learned that the editor of Physical Review had sent his submitted paper to an external referee. In a terse note to the editor, John Tate, Einstein wrote that he and his co-author

      had not authorized you to show [our manuscript] to specialists before it is printed. I see no reason to address the—in any case erroneous—comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.

      Furthermore, many high-profile journals did not adopt external refereeing until the 1960s or even later. One especially striking example is that of the prestigious scientific weekly Nature, which did not consult referees for every paper it printed until 1973.

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528400/

      Both practices appear to stem from the rise of primarily government-funded “industrialized” science.

  6. Comparing “biology” to any discipline is a bit much: biology has a huge range of subdisciplines, from evolutionary functional morphology to cell chemistry. Psychology relies extremely heavily on statistics to even identify effects, but most of biology doesn’t rely on statistics at all. Certainly almost nothing about the fundamentals of biology relies on statistics. No one conducted RCTs to determine that the heart is responsible for moving blood through the body. No one conducted RCTs to determine rabbits are prey to foxes. Scientists simply observed these things and, as their observations were validated time and time again, they became accepted as fact.

    Social sciences frequently propose “hypotheses” that have no possibility of direct observation and only the most rudimentary reasoning (if any) to suppose they exist at all. They then propose to measure whether or not the hypothesis is real with a measurement technique that is no more confirmed than the original hypothesis, and frequently on a subset of subjects that is small and has only the most rudimentary constraints on its “representativeness” of what it purports to represent. Finally, they’re forced to use unreliable statistical methods to assess their unreliable measurements of their unreliable sample to assess their unlikely hypothesis.

    There’s simply no analogy for this experimental approach in physical and biological sciences.

    I tried to imagine how a social scientist would do a biology study. I can’t do it! I don’t think it’s even possible to create a sensible parody.

    Andrew: you should have a contest! $1000 for the best parody of a social science experiment in biology! There must be 50 or 100 readers that would contribute $10-$20! Then people could vote on the winner!

    • I think this makes more sense than saying biology is a cumulative science. To the extent that fields *don’t* rely on statistics, they’re likely to avoid the replication crisis. By contrast, fields trying to measure tiny effects, effects so small that without statistics it’s hard to evaluate the outcome of an experiment, are the fields that are set up for a replication crisis when statistics are misused (as is often the case).

  7. The sub-disciplines often mentioned here are ‘couch’ psych. varieties: social, personality and to some extent clinical. Experimental psych. is about statistics and research design, which can be applied to any science or sub-specialty of Psychology. I keep reading of a few non-replicated examples that get recycled (beauty ratio, ESP, Obama and ovulation, etc.), but nothing from cognitive, perceptual, attention research. Sometimes those branches are under a broad term ‘Vision science’, depending on the school. Your average cognitive or perception researcher knows way, way more about statistics and research design than your average biologist (just follow TWIV or UCSF grand rounds during COVID pandemic to get some good laughs).
    To be fair, Biology is so broad, with new disciplines emerging every decade, that the above statement is relative. I don’t put one Biophysicist (such as Raghu above) or molecular Biologist working on protein structures in the same bin with other (wet) Biologists. Yes, the findings seem to replicate, but the thing that gets replicated the most in Biology is how much is still unknown. Just look at immunology, microbiome, CRISPR and other TED talk-worthy topics. The surface is barely scratched and, yes, it replicates, but because of mechanisms that are largely unknown or misinterpreted, not because of better study design or effect size interpretations.
    Finally, NHST is as rampant in Bio as in social sciences, if not worse (TWIV, UCSF rounds youtube again), and it goes deeper than that. I have seen many immunology papers showing a handful of data points spread over three or more boxplots, that don’t match overconfident interpretations of the findings.

    • We just got a review back from a high-end biomedical journal for a paper that avoids NHST etc. as prescribed in this blog. One reviewer states a position that I think dominates about 95% of social science/biomedical research:

      “P-values are necessary to know if the effect is observable.”

      How does one respond to THAT?

    • Navigator:

      You write, “I keep reading of a few non-replicated examples that get recycled (beauty ratio, ESP, Obama and ovulation, etc.) . . .”

      First off, I don’t “recycle” these examples, I refer to them. It’s convenient to refer to examples we’ve discussed before. There are lots and lots of other examples of psychology that have failed to replicate, and some of these examples have been promoted by the Association for Psychological Science and leaders in academic psychology as well as the usual media outlets such as NPR and TED. That elderly-walking study alone has been cited thousands of times!

      I agree with your other point that things seem to be different in other subfields of psychology. The discipline of psychology is doing itself no favors by heavily promoting bad work.

      • Sure, Andrew. I meant those are the ones I noticed used often on this blog, and Psychology is full of other examples:

        https://www.frontiersin.org/articles/10.3389/fpsyg.2021.769294/full
        Looks like we finally know what wisdom is (sarc.)

        My point was that Psychology is very broad with a lot of sub-disciplines. But I agree that APA’s job should be to oversee all branches and do some ‘police work’. What bothers me is singling out all of Psychology, while ‘Nutritional Epidemiology’ or many branches of Biology and probably other sciences silently walk by without getting noticed as usual non-replicating suspects.

        Why? Because ‘couch’ Psychology is about everyday things and we can all relate to its concepts. Almost everyone can follow research methods used in pop Psych. Meanwhile if someone publishes a bad study of some ‘XYZW-R2D2-QUERTY protein’ that does this and that, in some obscure Bio journal, nobody will notice it. Worse, actually. Most readers, not being intimately familiar with research methods in molecular Biology, will think it’s good solid science; they just don’t have enough background knowledge to follow it :-)

        Finally, bad info on beauty ratio won’t kill anyone (hopefully). However, bad science in nutrition, exercise, COVID, etc. will cause some harm for sure, to put it mildly.

    • While I tend to agree that there are differences between different fields of experimental psychology, it’s not true that there is “nothing” from cognitive, perceptual, and attention research. I just published an investigation of studies of object based attention. Slightly over half of them seem “too good to be true”.

      https://rdcu.be/cH2EW

  8. (Posting this again without the links, because my previous post was sent to the spam folder.)

    I mostly disagree. I think the primary difference between biology and psychology is the time/effort it takes to replicate a finding. In cell biology, an experiment can often be repeated in a week or so; in psychology, repeating a well-powered study could take months and require a lot of resources.

    I just think the replication crisis takes a different form in biology. Instead of big, well-known studies crumbling under scrutiny, in biology we have tens of thousands of low-quality experiments published every week and for the vast majority, no one even tries to replicate the findings. I agree that eventually the field muddles through this muck and settles on a textbook description of a biological process. And yes, those get corrected over time, too.

    But the process is super wasteful. I wish journals were better at weeding out low-quality science. Case in point, the majority of cell biology papers mistakenly count each cell measurement in a single sample as n and calculate tiny p-values based on these inflated sample sizes. It’s not uncommon to see a p-value of 10^-29 on a tiny difference between a grand total of 2 samples (one control and one treated). It’s comical, but it’s standard practice.

    I think preprints should be the land of all these pseudoreplicated findings, and journal editors should require that authors demonstrate their findings are reproducible before actually publishing. In fact, I’d prefer to jettison traditional peer review altogether and replace it with “peer replication,” where another lab replicates key findings before they are published. (An ancillary benefit of this scheme would be that it would incentivize peer replicators by giving them authorship. Plus it would disseminate novel techniques faster and encourage collaboration among labs.)

  9. As the biologist quoted above, I just want to support some of the comments that have been made.

    -I agree that Biology is a huge set of disciplines with different constraints and subcultures, so the observations I offered pertain to some but not all branches. Specifically my observations come from two fields of experimental basic research: molecular biology (including biophysics, biochemistry, genetics) and systems neuroscience (including sensory neurophysiology, psychophysics, and quantitative animal behavior). Key similarity: the usefulness of a new finding is primarily to guide further research. So perhaps the thesis should be: to the extent that ANY field has this cumulative structure, that field is less prone to spurious results being believed for long.

    -I strongly agree that the closer a result is to “TED talk” material, the more skeptical we should be. We live in an attention economy. Any result that can garner attention or produce a headline of interest to the general public is at higher risk of having been motivated by that goal, just as is a result that stands to benefit the authors financially. The kind of biology research I was talking about is geeky – ONLY interesting to other researchers for the purpose of figuring out underlying mechanistic models for how some basic biological phenomenon works. Basic research is often most efficiently accomplished by choosing example cases for their amenability to study; often those cases are of no intrinsic medical or social interest. (To state the obvious for the record: the long game is the premise that some fraction of the fundamental insights or fortuitous discoveries gained thereby will prove to be of immense practical importance, perhaps in a way not anticipated at the time of the research.)

    • I agree with your points here to some degree. It makes sense to separate techniques papers from typical biology findings. If a technique (or probe or construct, etc.) seems useful, other people will try it. However, a finding that “protein X accumulates in the nucleus and colocalizes with protein Y upon stimulation Z” is likely to not get a lot of retesting. In fact, most biologists would not be surprised at all if they failed to replicate such a finding. They’d chalk it up to a different cell line or a different part of the cell cycle or something. Eventually, the original finding will likely stop being cited if no one can reproduce it, but it seems pretty inefficient.

      I think the “replication crisis” in biology just takes a different form. We just churn out reams of irreproducible papers and the wheat eventually floats up above the chaff. I think maybe that’s what you mean, Prof. Reinagel. I just think it’s a very wasteful way to go, instead of taking more steps to ensure reproducibility from the beginning. Why is it OK that most papers are probably false?

      (Finally, I think there’s also a lot of irreproducible science in the mid- and lower-tier journals, not just the flashy stuff. The perverse incentives in publishing apply to all levels.)

      • I would say that, to some degree, even the papers that reach conclusions by relying on the previous methods papers end up being replicated in some way, because if you try to show something similar in a similar system and get results that are not even remotely similar, you won’t necessarily chalk it up to different systems.

        I know people who have redone experiments as reported to ensure they understand the details of the techniques and can get reliable results with them before they use them in their own systems. And people in fancy places with lots of money go and visit labs (especially if they don’t seem to be getting it right). But I will admit that people doing that kind of work often have no grasp of statistical methods… I imagine because in some ways you’ve shown something happens or it doesn’t.

        As ecologists, we are often in the position of psychology/sociology, which is why many statistical methods end up being used in the same places (starting from RCT in agricultural systems). And in the unenviable position of undertaking projects that can take 5+ years and result in one or two papers.

        In my area of research, it is to some extent cumulative. We can’t really reproduce the research that took many thousands of hours to produce (even if you had unlimited seasons, as you do in a lab setting). But when reporting results about a system you studied, you are (almost) never attempting to suggest they transfer to being universally true. Or even true in similar systems. You do get to say, this is a thing that CAN happen. Particularly in ecology, where everything is interacting with everything else, it is important to be aware of things that you may have thought were ignorable.

        The accumulation of evidence happens when multiple studies ask similar questions in multiple systems. If some of them (especially systems that are like yours) say that interaction X CAN be important, you may find there is enough evidence that the effect of interaction X should be considered (and the data to measure it needs to be collected) before you decide it’s ignorable. That of course is why ecological studies can be so labor intensive. Don’t have to buy a lot of expensive equipment or chemicals… but you do need to employ an army of undergrads.

    • Pam:

      I get your idea of “cumulative” research. But is it a feature peculiar to biology? If we go back to Aristotle or even just to Newton, biology was nowhere near as “cumulative” as it is today, and really, until the Renaissance there was almost no cumulative science whatsoever, just a little bit of physics, math, and astronomy. IMO the cumulative nature of biology, geology, physics, and chemistry reflects the fact that fundamental observations have been established by repeated observations and thus become the basis for new questions.

      But should we call that “cumulative science”? Or just plain “science”?

      Note the key feature: repeated observation. I don’t agree with Sam that it’s “easy” or cheap to repeat observations in other sciences. In a lab it might be, depending on the experiment. But in many disciplines the repetition occurs over years and even decades, not over weeks or months, and is anything but cheap and easy. How many times has the iridium layer at the K-T boundary (dinosaur extinction) been tested? Dozens, surely, over a period of decades; and there must be tens of thousands of biostratigraphic studies of the K-T boundary, probably each requiring several years of dedicated research.

      And as others have said, the social sciences also have fields where knowledge builds progressively, like polling.

      So I’d suggest that *cumulative* science *is* science.

        • Andrew said:

          “Tell it to the Association for Psychological Sciences and the National Academy of Sciences.”

          I read a book by one of the founders of CBT. I vaguely recall he came up with CBT because the methods available were, for the most part, totally ineffective.

          In your PS you add:

          “The point about the cumulative aspect of research, at least in some areas of experimental biology, is that research group A will publish a paper that shows a method that research group B will want to use. So the point of the “replication” by group B is not to support group A or to shoot down group A or whatever, but rather just to use this method to make their (group B’s) research better. ”

          That’s one of many possible reasons B would “replicate” or retest work by A. Another reason is that A proposes general principles that need to be tested in a variety of different circumstances. A certain compound has a certain effect in pig cells, the researchers postulate the effect will be general to all mammals; the compound and effect are tested in the cells of other mammals. The variation of context allows the principles behind the effect to be further refined and developed.

          But ultimately if the work isn’t reliable nothing can be built on it, so unreliable work will never produce “cumulative” science. Sam’s idea about separating the wheat from the chaff is relevant: there is a selection effect. Reliable work is built upon, unreliable work can’t be built on. But the fact that a work isn’t built on doesn’t imply it isn’t reliable. Maybe people just don’t know what to do with it.

  10. I’m linking this here in case you haven’t seen it: Tim Errington et al. (2021) “Investigating the replicability of preclinical cancer biology”, https://elifesciences.org/articles/71601.

    It basically shows the same pattern we’ve seen in psychology. Figure 3 is impressive: median replication effect sizes in cancer biology were about 1/5 of those in the original studies.

    • Yes, and keep in mind they couldn’t even figure out the methods for half the studies. Those are also failed replications. And these were all very prominent studies.

      Similar poor performance was seen in the case of spinal cord injury research:

      https://pubmed.ncbi.nlm.nih.gov/22078756/

      So the evidence available shows there is a huge replication crisis in biology.

      It is actually worse than a coinflip. At that point we should stop funding the actual experiments. Instead come up with ideas and flip a coin.

  11. It reminds me of this paper:

    As we have seen, traditional reliance on statistical significance testing leads to the false appearance of conflicting and internally contradictory research literatures. This has a debilitating effect on the general research effort to develop cumulative theoretical knowledge and understanding. However, it is also important to note that it destroys the usefulness of psychological research as a means for solving practical problems in society.

    The sequence of events has been much the same in one applied research area after another. First, there is initial optimism about using social science research to answer socially important questions that arise. Do government-sponsored job-training programs work? One will do studies to find out. Does integration increase the school achievement of Black children? Research will provide the answer. Next, several studies on the question are conducted, but the results are conflicting. There is some disappointment that the question has not been answered, but policymakers—and people in general—are still optimistic. They, along with the researchers, conclude that more research is needed to identify the interactions (moderators) that have caused the conflicting findings. For example, perhaps whether job training works depends on the age and education of the trainees. Maybe smaller classes in the schools are beneficial only for lower IQ children. Researchers may hypothesize that psychotherapy works for middle-class patients but not lower-class patients.

    In the third phase, a large number of research studies are funded and conducted to test these moderator hypotheses. When they are completed, there is now a large body of studies, but instead of being resolved, the number of conflicts increases. The moderator hypotheses from the initial studies are not borne out. No one can make much sense out of the conflicting findings. Researchers conclude that the phenomenon that was selected for study in this particular case has turned out to be hopelessly complex, and so they turn to the investigation of another question, hoping that this time the question will turn out to be more tractable. Research sponsors, government officials, and the public become disenchanted and cynical. Research funding agencies cut money for research in this area and in related areas. After this cycle has been repeated enough times, social and behavioral scientists themselves become cynical about the value of their own work, and they begin to express doubts about whether behavioral and social science research is capable in principle of developing cumulative knowledge and providing general answers to socially important questions (e.g., see Cronbach, 1975; Gergen, 1982; Meehl, 1978). Cronbach’s (1975) article “The Two Disciplines of Scientific Psychology Revisited” is a clear statement of this sense of hopelessness.

    http://www2.psych.ubc.ca/~schaller/528Readings/Schmidt1996.pdf

    However, what I see is the exact same process playing out in biology. E.g., look at ivermectin, or vitamin D for covid. A pile of conflicting results gets generated, people cherry-pick the ones they like (there is always a legitimate reason to throw out any of them), and no progress is made.

      • Yes, biology has made enormous progress since Aristotle. It has some enormously powerful theories – since Darwin we are all successfully replicating studies on his ideas and on refinements of his ideas developed by many biologists who came after him, for example, by R. A. Fisher, the biologist. So, in this sense there really is no replication crisis in biology: the theories about evolution are successful; we all witness how the coronavirus evolves. Modern medicine is unthinkable without the successful methods developed in biology, and indeed the distinction between medicine and biology is rather blurry. And yes, methods are cumulating. Think of all those genetic methods that have made huge progress since the genetic code was deciphered in the 1960s (a century after Darwin’s On the Origin of Species).
        But strangely, most biologists I know still use statistical methods developed in the stone age, by R. A. Fisher (the statistician) and his buddies Neyman and Pearson. In evolutionary biology, there are a few freaks using Bayesian methods, but all the rest relies on NHST that was actually developed by nobody (for a survey on one of our best journals, see https://doi.org/10.20944/preprints202112.0235.v1).
        I work in a field that one could call the psychology department of biology (animal behavior) and in another field, conservation biology, that is sort of the epidemiology department (using observational data to study patterns of health deterioration of planet earth). I agree there is not much talk about a replication crisis in those fields, but the reason is only that almost nobody publishes replication studies.

        I often wondered why the biomedical sciences (that Andrew calls the wet lab variety of biology) made such enormous progress although it seems that conclusions from single studies are at least as unreliable as anywhere else (see the Errington et al. paper linked above). Maybe what makes them *appear* successful is the sheer volume of biomedical studies that are published each year, and we usually only see the results that were important enough to be replicated and then survived years or even decades of checks by other scientists. Also, it seems to me, Andrew, that in your post you are mostly speaking about biological *methods*, not about conclusions from single studies. Those wet lab methods are constantly being further developed, while apparently the applied statistical methods that researchers use for drawing conclusions are not. Maybe that is a reason why biology as a field looks so successful and reliable, while conclusions from individual studies are not. To be clear, I think the statistical methods *are* being further developed, but it seems that most researchers do not consider it necessary to adapt to the new developments.

        • >R. A. Fisher (the statistician) and his buddies Neyman and Pearson

          Fisher loathed Neyman and Pearson! Fisher’s last book (Statistical Methods and Scientific Inference, 1956) was to a great extent an attack on the Neyman-Pearson (N-P) school. He especially loathed the idea of “types of errors”. Statistics as taught to biologists is a (usually) unacknowledged hybrid of N-P “acceptance procedures” given a Fisherian evidential interpretation; neither N-P nor Fisher would probably approve of this. The conflict of methods and personalities between Fisher and N-P has been well documented by historians (see, e.g., G. Gigerenzer et al., 1990, The Empire of Chance), and commented upon at some length by a geneticist (A.W.F. Edwards, 1972, Likelihood) and a biostatistician (R. Royall, 1997, Statistical Evidence). Neither N-P nor Fisher were Bayesians, but they were far from intellectual allies; there are many more axes of variation of statistical practice than Bayesianism.

        • Hi Valentin,
          Andrew focused on the idea of cumulative methods (maybe because he works on developing methods), but you are correct that I was mostly talking about the cumulative nature of results.

          I agree with you that the majority of basic research biologists (at least in the fields I personally have closely observed) either use no statistical methods at all, or use and interpret NHST incorrectly. But the question you raise is one that really interests me: given that biologists in some fields have not been using, or have been misusing, statistics all these years, how is it that these fields have gotten so much right? In short I think it is because Biologists are relying on other robust and rigorous methods of inductive inference, which are as yet poorly codified and communicated. I think it would be extremely valuable to better articulate these methods, some of which are not statistical in nature. And I also think we could do even better by adopting (other more appropriate) statistical methods.

          There are lots of interesting issues raised by your post. I’ll try to scratch the surface of just one point. TLDR: a lot of biology research is *exploratory*; it is a mistake to try to assess the literature as if it were meant to be confirmatory.

          One issue I notice is a misconception about the nature and purpose of isolated results or effects reported in the literature. Most experiments contribute to the progress of the field more in the way that one loop of Velcro contributes to the strength of a whole Velcro strip; they were never meant to be rivets. If a paper presents a set of data with an analysis suggesting some effect, this is a “data point”, not a “Fact”. Its purpose is to provide *some* evidence, usually with respect to the relative plausibility of alternative causal mechanistic models, or estimating the relative likelihood that alternative research directions will prove fruitful. If a finding reduces my uncertainty by a fraction of a bit or updates the relative posterior probabilities by a smidge, it has value. Of course more evidence has more value, but that doesn’t make modest amounts of evidence bad science. The progress of the field as a whole benefits from sharing interim, provisional observations and ideas with each other as we go along. It is incumbent on the reader of the literature to assess and keep track of how certain or provisional each claim is.

          The success of prediction markets is evidence that experts within a field are quite good judges of the epistemic status of the claims in their field. Those who are not directly involved in research in a field, however, might do better to consult review articles. Things are often hotly debated for years or decades. When you see a stable consensus among several review articles, it is reasonable to interpret those claims as conclusions that the field considers reasonably general and well established. Everything else should be interpreted as a hypothesis still under discussion and investigation, or else, an isolated observation that isn’t being followed up because it currently doesn’t seem useful.
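          To put a toy number on “a fraction of a bit” (an illustrative calculation only, not data from any study): suppose two rival mechanistic models start at even odds and one modest experiment favors model 1 by a Bayes factor of 1.5.

          ```python
          # Toy numbers only: how much one modest result moves belief between two models.
          import math

          def entropy_bits(p):
              return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

          prior = 0.5          # two rival mechanistic models, initially even odds
          bayes_factor = 1.5   # one modest experiment favoring model 1
          post_odds = (prior / (1 - prior)) * bayes_factor
          posterior = post_odds / (1 + post_odds)  # 0.60

          print(f"posterior probability of model 1: {posterior:.2f}")
          print(f"uncertainty reduced by {entropy_bits(prior) - entropy_bits(posterior):.3f} bits")
          ```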

        • But the question you raise is one that really interests me: given that biologists in some fields have not been using, or have been misusing, statistics all these years, how is it that these fields have gotten so much right?

          This is the premise that needs to be questioned. How do you show they “have gotten so much right” without circular logic?

          I would point to the success rate of direct replications, which is very low. That shows we typically do not even know the relevant experimental conditions that need to be reported in the paper.

      • United States reported 3,440,548 deaths of all ages for the year 2020. Expected deaths were 3,028,959. That is an increase of 411,589 deaths (+13.6%).

        United States reported 3,432,727 deaths of all ages for the year 2021. Expected deaths were 2,971,452. That is an increase of 461,275 deaths (+15.5%).

        To date, for the year 2022, United States reported 450,327 deaths of all ages. Expected deaths thus far, were 377,477. That is an increase of 72,850 deaths (+19.3%).

        https://www.usmortality.com/excess-absolute#unitedstates

        What has impressed you about these vaccines? Pretty much everyone got infected within less than a year and excess deaths actually appear to be increasing.

        Meanwhile, studies that ignore (or handwave) the glaring difference in testing between the vaccinated and unvaccinated, and how that difference changes over time, continue to be pumped out. It is just sad at this point.

    • @Anon,
      ‘…However, what I see is the exact same process playing out in biology. Eg, look at ivermectin, or vitamin D for covid. A pile of conflicting results gets generated, people cherry pick the ones they like (there is always a legitimate reason to throw out any of them), and no progress is made…’

      It’s a pile, all right. Some of those things have as much chance at preventing COVID as chocolate pudding does. But what about that famous Pfizer vaccine trial, based on which vaccination was (provisionally) approved? Over 36K recruited and randomized into placebo and vacc. Only 162 in the placebo group contracted COVID and 8 in the vacc. group. Isn’t that the mother of all cherry picking? The conclusion could very well be that placebo is highly protective. I mean, measly 162 out of more than 18K!! Vaccine is even more protective, but placebo is at least fantastic, judging purely by numbers ;-) Not to mention the lack of control over the dependent variable (letting everyone go about their lives, as if they have the same probability of getting infected).
      What happened at the end? Vaccines were shown to be excellent. However, if you start with something that had relatively low mortality to begin with and give it a nudge, you get excellent results.

      • @Navigator

        Already in the Pfizer trial we saw more “covid-like illness” in the vaccinated group, in particular in the first two weeks after the shot.

        They also failed to perform any type of exit survey and the vaccines have a rather distinct side effect profile. So it is doubtful the blinding was very successful (both of subjects and those they reported symptoms to who decided if a test was merited).

        Then, just like in the Moderna trial, they saw slightly more all-cause mortality in the vaccinated group.

        This was all pointed out at the time. And I have no idea how one can claim the vaccines are excellent when there have been more cases than ever and excess mortality has increased since the mass vaccination campaigns all over the world.

        There are probably severe covid cases that were turned into mild/paucisymptomatic covid cases, but from all-cause and excess mortality this appears to be outweighed by some side effect. Whether that is due to blood clots, the immunosuppression period, stress due to the immune response in the frail, allergies to PEG products, or something else I don’t know.

    • I was a grad student in Ralph Levine’s course (see the last paragraph of the Schmidt paper). It wasn’t primarily the faculty who were complaining. We were completely fine with meta-analysis and the emphasis on effect sizes; we just wanted them to address the file drawer problem.

      In the end I drove to Ann Arbor to take courses at ICPSR.

  12. One thing that makes molecular biology and related subfields effective is that biologists often use multiple lines of evidence within a study to establish causality. For example, a common approach when establishing what a gene does is to first wreck the gene entirely with a mutation, then see if you can restore the original state by adding a new copy of the gene to compensate for the mutated version. You can also take the gene out of its original context, put it into a different organism, and see if you can recreate the phenotype associated with the gene.

  13. How many animals were killed for your vaccines? As a student of Jainism, can you understand why I didn’t want to cooperate with your “scientific” violence?

    • Rsm:

      I have no idea whatsoever why you are commenting on my posts on biology and not about my posts on Jamaican beef patties. I respect that you have your religious convictions, but, like most Americans, I eat meat, wear leather, use modern medicine, etc etc. You might want to start by setting up an information booth outside the local McDonalds rather than wasting your valuable time here with blog comments.

      • Are you really saying that the still, quiet voice inside you cannot morally justify doing violence to animals, so you’ll just act irritated at anyone who brings up the topic?

        • Rsm:

          No, I’m not saying that. I’m just giving you the advice that if you’d like to say that vaccines and meat-eating are immoral, and debate with people about it, that this isn’t a good place for it. I’d recommend twitter or 4chan or some place like that where you and others can really go at it.

  14. I’m pretty skeptical of the biologist’s claim. One reason is that biologists seem to be pretty much clueless about statistics. I mean, absolutely zero knowledge. I read a book recently, Writing Science, written by Joshua Schimel, who seems to be a biologist. The book is really great, but he has some hilarious advice in it about how to interpret data. Here are some juicy excerpts (you can guess what the figures look like from the descriptions; he does write well):

    “As an example, consider figure 8.3. In panel A there is a large difference (the treatment is 2.3x the control) that is unquestionably statistically significant. Panel B shows data with the same statistical significance (p = 0.02), but the difference between the treatments is smaller. You could describe both of these graphs by saying, “The treatment significantly increased the response (p = 0.02).” That would be true, but the stories in panels A and B are different — in panel A, there is a strong effect and in panel B, a weak one. I would describe panel A by saying, “The treatment increased the response by a factor of 2.3 (p = 0.02)”; for panel B, I might write, “The treatment increased the response by only 30 percent, but this increase was statistically significant (p = 0.02).”

    In the figure he mentions, panel A probably reflects a Type M error (just look at the uncertainty of the estimates compared to panel B), and what he calls a weak effect in panel B is more likely to be the accurate estimate (again, just look at those uncertainty intervals). So it is very misleading to call A a strong effect and B a weak effect. If given data like those in panels A and B, I would take panel B more seriously.
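
    To make the Type M point concrete, here is a minimal Python sketch with made-up numbers rather than Schimel’s data: the true effect is the modest 30% increase of panel B, but the design is noisy, and the runs that happen to reach significance overstate the effect.

        # Hypothetical simulation of Type M (magnitude) error; all numbers are made up.
        # True effect is a 30% increase, but measurements are noisy and n is small.
        import numpy as np

        rng = np.random.default_rng(1)
        true_control, true_treatment = 10.0, 13.0   # true ratio = 1.3
        sd, n, sims = 5.0, 5, 10_000

        ratios, significant = [], []
        for _ in range(sims):
            c = rng.normal(true_control, sd, n)
            t = rng.normal(true_treatment, sd, n)
            diff = t.mean() - c.mean()
            se = np.sqrt(t.var(ddof=1) / n + c.var(ddof=1) / n)
            ratios.append(t.mean() / c.mean())
            significant.append(abs(diff / se) > 1.96)   # crude z-test cutoff

        ratios, significant = np.array(ratios), np.array(significant)
        print("mean estimated ratio, all runs:        ", round(ratios.mean(), 2))
        print("mean estimated ratio, significant runs:", round(ratios[significant].mean(), 2))
        # The significant runs tend to overstate the true 1.3 ratio: that is the Type M error.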

    But it gets worse. Here is what Schimel has to say about panel C. Again, I highlight the absurd part of his comments/advice:

    “The tricky question is what to write about panel C. The difference between treatment and control is the same as in panel A (a factor of 2.3), but the data are more variable and so the statistics are weaker, in this case above the threshold that many use to distinguish whether there is a “significant” difference at all. Many would describe this panel by writing, “There was no significant effect of the treatment (p > 0.05).” Such a description, however, has several problems. The first problem is that many readers would infer that there was no difference between treatment and control. In fact though, they differed by a factor of 2.3. That is never the “same.” Also, with a p value of 0.07, the probability that the effect was due to the experimental treatment is still greater than 90 percent. Thus, a statement like this is probably making a Type II error — rejecting a real effect. The second problem is that just saying there was no significant effect mixes results and interpretation. When you do a statistical test, the F and p values are results. Deciding whether the test is significant is interpretation. When you describe the data solely in terms of whether the difference was significant, you present an interpretation of the data as the data, which violates an important principle of science. Any specific threshold for significance is an arbitrary choice with no fundamental basis in either science or statistics.”
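
    For what it’s worth, the “greater than 90 percent” line confuses the p-value with the posterior probability that the effect is real. A back-of-the-envelope Python sketch (the base rate, power, and threshold below are all made up for illustration) shows how different the two can be:

        # Hypothetical illustration: the probability that an effect is real given a
        # "significant-ish" p-value depends on the base rate and power, not just on p.
        # All three numbers below are made up for illustration.
        prior_real = 0.10   # fraction of tested effects that are actually real
        power      = 0.50   # chance a real effect gives p < 0.07 in this design
        alpha      = 0.07   # chance a null effect gives p < 0.07

        p_signal = prior_real * power
        p_noise  = (1 - prior_real) * alpha
        posterior_real = p_signal / (p_signal + p_noise)
        print(f"P(effect is real | p < 0.07) = {posterior_real:.2f}")
        # ~0.44 with these made-up numbers, nowhere near "greater than 90 percent".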

    OK, this is a sample size of 1; maybe all other biologists know what they are doing. But this author is a very senior scientist, so a lot of his students must now be senior scientists themselves. The fact that such absurd statements are driving this line of work in biology makes me very skeptical that the field in general is doing anything better than the psychologists.

    • Shravan:

      I’m pretty sure that Pamela agrees with you on all this. Indeed, this is why she contacted me originally: she was unhappy about the way that statistics was taught in biology, and she had the impression that most biologists (or, at least, most wet-lab-style biologists) were clueless about statistics, indeed much more clueless than psychologists were. It still seemed to her that there was no major replication crisis in the field. I guess she’d agree that the sorts of claims illustrated in the quote in your comment wouldn’t replicate, but maybe she’d also say that these sorts of claims weren’t the aspects of biology that other biologists would care about: the other biologists would be interested not in the purported effects but in the lab methods used to get there. On the other hand, there must be some ultimate reason for these methods. If the methods are replicable but the ultimate claims are not, what do we do with that? I’m not sure.

      • First, Andrew is right that I agree that most biologists don’t know much statistics, and most of them mis-use or mis-interpret the few methods they know about, which is a problem. More positively: I think biology could be more productive if we truly collaborated with statisticians. But most statisticians don’t know much biology, and even less about how biologists have hitherto been as successful as they have been. Lack of understanding and lack of appreciation in both directions hinders such productive collaborations. Nevertheless, I think many fields of biology have other rigorous methods for determining what results are real, and those methods underlie their remarkable progress. Indeed that progress is so impressive that I think it would be worthwhile to make a study of what those other rigorous methods are.

        Second, Andrew seems stuck on an interpretation I keep denying: my point was never that biologists care about “the lab methods used to get there”. Methods are a thing, but not my point. My point is that every experiment we do is based on everything we think is *known* already. But a lot of what we “know” is tentative, provisional, approximate, and conditional. Sometimes the question we would ask next requires a very costly experiment, so before going forward it’s worth re-checking everything we already think we know, first. (But re-checking by repeating the same experiment is weak; far better to re-check by trying to verify things in some independent way). In other cases, it may be more efficient to just try the next experiment now, even when you are only 90% sure or even 50% or 30% sure of some premise, because the next experiment simultaneously tests the premise and, should it hold up, also refines or extends it. Researchers vary in this risk tolerance, and I think the best biologists are the ones who tend to re-check things exhaustively at each step. But overly risky/cocky research just leads to failure (experiments don’t work, no results), not apparently-strong but wrong results.

        • (But re-checking by repeating the same experiment is weak; far better to re-check by trying to verify things in some independent way). In other cases, it may be more efficient to just try the next experiment now, even when you are only 90% sure or even 50% or 30% sure of some premise, because the next experiment simultaneously tests the premise and, should it hold up, also refines or extends it.

          A theory T can imply multiple observations O1 and O2. To check the theory, of course you should see if O1 *and* O2 are actually observed. This adds support for the theory T, along with any other theories that also predict O1 and O2.

          The point of direct replication is different. It is to ensure that O1 and O2 are actually reliable observations we can “hang our hats on”.

          From the replication projects we see that one report of an observation has something like 20% chance of being replicated. But let’s say it is 50%.

          In that case the probability that observations O1 and O2 are both correct works out to 0.5 × 0.5 = 0.25. Thus there is only a 25% chance that T is actually consistent with reality. The more unreliable results you string together, the less support there is for the theory!

          This is why we need to ensure the observations are reliable, say 90+%. The way to do that is direct replication.
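
          A minimal Python sketch of that arithmetic (the reliability figures are hypothetical, matching the 50% and 90% numbers above):

              # If each supporting observation independently has probability r of
              # holding up, chaining n of them leaves only r**n support for the
              # theory that rests on all of them. Rates here are hypothetical.
              def joint_reliability(r: float, n: int) -> float:
                  """Probability that all n independent observations are reliable."""
                  return r ** n

              for r in (0.5, 0.9):
                  print(r, [round(joint_reliability(r, n), 3) for n in (1, 2, 3, 5)])
              # 0.5 -> [0.5, 0.25, 0.125, 0.031]: unreliable results compound quickly
              # 0.9 -> [0.9, 0.81, 0.729, 0.59]: hence the push for 90+% reliability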

          There are some cases, like transfecting GFP, that really have been replicated many times. And the observation of a green glow after the transfection is difficult to explain in any way other than the GFP mRNA getting translated into GFP. This is an impressive feat that does indicate some understanding of the system.

          But this is not the case for the vast majority of claims.

        • > But re-checking by repeating the same experiment is weak; far better to re-check by trying to verify things in some independent way
          Agree, and that worked for me in one experiment on preserving lungs for transplant, with 6 dogs in placebo and 6 in active treatment. The observed effect was huge, but given just a single study it could easily have been a fluke. No one wanted to have to do another study with 12 dogs, but the next step in the research program would clearly not work if the first study had been a fluke, so we could move to the next step safely.

          But it was far more common in clinical research for no one to have a clue what the next study should be, or how it would also verify the results of the initial study.

          My speculation is that if researchers keep banging away, there is a reasonable signal-to-noise ratio, and it’s not too hard to assess likely replication, then stuff will progress. Things can change that improve this: better instruments, decreases in the cost of replication, better ways to assess likely replication without having to redo studies, etc.

          For instance, in microarray expression studies, other groups often already have most of what’s needed to check the results of a new study. Say there are 5 such studies: in 2 of them the uninteresting but fairly well-known expression signals don’t correlate highly, and in 3 they do; then we know which 3 studies are very likely to replicate.
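
          As a rough Python sketch of that kind of screen (the data, gene count, and 0.6 threshold below are all hypothetical): correlate the shared, well-known expression signals of each incoming study against what a lab already has, and flag the ones that line up.

              # Hypothetical screen: which incoming expression studies line up with the
              # signals a lab already has for a panel of well-known (shared) genes?
              import numpy as np

              rng = np.random.default_rng(7)
              reference = rng.normal(size=200)   # our lab's signal for 200 shared genes

              def looks_replicable(study_signal, reference, threshold=0.6):
                  """Crude screen: high correlation on the shared genes suggests the
                  study measures the same thing we do and is more likely to replicate."""
                  r = np.corrcoef(study_signal, reference)[0, 1]
                  return r > threshold, r

              # Five made-up incoming studies: three track the reference, two are mostly noise.
              studies = ([reference + rng.normal(scale=0.5, size=200) for _ in range(3)]
                         + [rng.normal(size=200) for _ in range(2)])

              for i, s in enumerate(studies, 1):
                  ok, r = looks_replicable(s, reference)
                  print(f"study {i}: r = {r:.2f} -> {'likely to replicate' if ok else 'check further'}")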

        • Pamela: excellent! Well said!! Applause!!

          But can I add one component? In general, in bio and the physical sciences, people are striving to find the reality, rather than striving to find some quackadoodle clickbait result that scores a quick pub somewhere, so they actually care about checking the foundations before proceeding. Also, reviewers and other biologists have the same goals, and they’re more than happy to shoot down papers that overlooked relatively simple checks on shaky foundations.

    • My two cents:

      – I agree that the second claim (panel C) is absurd
      – your interpretation of panel A seems to be based on (1) a prior distribution on the effect sizes and/or (2) a presumption of p-value filtering/a file drawer effect; you’re probably right, but I wouldn’t necessarily give Schimel a hard time for trying to point out the disconnect between p-values and effect sizes.

      More generally, the people I know who are quantifying animal behaviour wouldn’t call themselves “systems neuroscientists”, so maybe Dr. Reinagel is talking about a different field, but I will say that there’s a lot of room for improvement in animal behaviour, and in microbiome studies … (the Spider Guy work(s/ed) in animal behaviour, although maybe that’s not a relevant data point here …)

  15. I think another important aspect in this discussion of differences among fields and sub-fields of science is the degree to which the results of a single experiment (or study) are disseminated in a single publication. In Pamela’s examples, there seems to be much less emphasis on each step being a “result to publish,” as opposed to the examples referenced in psychology where an experiment is typically viewed as a potential publication.

    • In a similar vein – I have interviewed a number of highly accomplished geneticists, molecular biologists, and biochemists about how they use statistics in their work. The most common answer is a tie between: “If you need statistics to determine if there is an effect, there isn’t one” and: “If you need statistics, you’ve designed a bad experiment”.

  16. Experimental social psychologists make “tools” for each other more than you’d think, and not just in the realm of psychometrics. One example I’m aware of is the “fast friends” procedure, a short activity that makes two people feel like they are good friends (there’s a romantic version that got some press a while back, unfortunately without a fun alliterative name: https://www.wired.co.uk/article/how-to-find-love-in-45-mins). This might seem like just a “that’s interesting” type of result, but it’s actually being used as a research tool: it turns out there are a lot of questions in social psychology that are easier to answer if you can induce friendships in a laboratory. For example, there are people interested in whether having friends from an outgroup makes you better disposed to that group (direct contact) or whether the effect is more powerful when the outgroup member is a friend-of-a-friend (extended contact). They use the procedure to test whether these attitudes change from before to after the “friendship” is established.

    I think the distinction between “bench biology” and experimental psychology isn’t really one of whether the research is cumulative, exactly. It’s about how easy it is to tell when a tool is useful or not, versus it just being a statistical accident.
