Would we be better off if randomized clinical trials had never been born?

This came up in discussion the other day. In statistics and medicine, we’re generally told to rely when possible on the statistical significance (or lack of statistical significance) of results from randomized trials. But, as we know, statistical significance has all sorts of problems, most notably that it ignores questions of cost and benefit, and it doesn’t play well with uncertainty. Hence my post, “Evidence-based medicine eats itself.”

In comments, Nick wrote:

Ok, so it is not easy, but small incremental gains can get you a long way.

The amelioration of symptoms and prognosis of almost every common disease has improved since I [Nick] started clinical medicine in 1987; progress built on very many RCTs, none of them perfect but together forming a tapestry of overlapping evidential strands that can be read.

This made me wonder: Would this benefit have occurred without randomized clinical trials (RCTs), just by clinicians and researchers trying different things and publishing their qualitative findings? I have no idea (by which I really mean I have no idea, not that I’m saying that RCTs have no value).

There are famous examples of mistakes being made from poorly adjusted observational studies (see for example here), where the bias disappears in a randomized controlled trial. But my question is not, Can randomized clinical trials work well in particular examples? or, Are there examples where nonrandomized comparisons can be misleading? or, Can randomized trials be analyzed better? My question is, If randomized clinical trials were never done at all, would we be worse off than we are now, in terms of medical outcomes?

I think it’s possible that the answer to this question is No, that the damage done by statistical-significance thinking is greater than the benefits of controlled studies.

I have no idea how to estimate where we would be under that counterfactual, but perhaps it’s a useful thought experiment.

P.S. Just to clarify, I’m not saying that the alternative to randomized clinical trials would be pure guesswork or reasoning by anecdote. I assume that there would still be controlled comparisons of different treatments performed in comparable conditions, just without the randomization and the randomization-based inference.

Also, yes, I recognize that randomization can often make sense. I’m not saying that randomization is always a bad idea. I’m just wondering if the idea of randomization had never come up, whether on balance we’d be better off, in part because researchers and decision makers would need to wrestle directly with the issues of variation, uncertainty, and comparability of treatment and control groups, without thinking they had a magic wand that could just give them the answer.

132 thoughts on “Would we be better off if randomized clinical trials had never been born?”

  1. The statistical significance paradigm is not exclusive to RCTs though, right? Observational studies use NHST all the time so I don’t think it’s fair to compare RCTs + statistical significance vs. no RCTs + no statistical significance.

    • Michael:

      Agreed. But randomization and statistical significance go together. Randomization is used to justify the probability calculations that result in statistical significance decisions. What if, instead of randomization, they just assigned treatments to similar groups and then plotted the results?

      • I don’t really see how that follows? The significance decisions usually come out of some kind of sampling framework, and there’s always going to be sampling. If they start assigning treatments by, say, pairing up matching participants instead of pure random sampling, then you’d probably see paired t-tests or something becoming the preferred analysis method.
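
        For what it’s worth, here is a minimal sketch of what that matched-pairs analysis could look like, on simulated data with a made-up effect size (nothing here comes from a real trial):

        ```python
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        # 50 matched pairs; each pair shares a nuisance "pair effect"
        # (age, severity, ...); assume a hypothetical benefit of 2 units.
        pair_effect = rng.normal(0, 3, size=50)
        control = pair_effect + rng.normal(0, 1, size=50)
        treated = pair_effect + 2 + rng.normal(0, 1, size=50)

        # The paired t-test works on within-pair differences,
        # so the shared pair effect cancels out.
        print(stats.ttest_rel(treated, control))
        ```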

      • > Randomization is used to justify the probability calculations that result in statistical significance decisions.

        Randomization is also used to justify that the “standard ignorability condition that is common in causal inference with observational data” is indeed verified when one has the opportunity to do an experiment.

        > What if, instead of randomization, they just assigned treatments to similar groups

        How would that be, if done properly, different from (stratified) randomization?

    • There is a point of view that NHST should not be used with observational data, on the grounds that without randomization there is no guarantee that the distribution of p-values is uniform under the null. That would leave the analysis of observational data to the methods used by Andrew and others, which I’m fine with.
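
      That claim is easy to illustrate by simulation: with random assignment and no true effect, the p-values from repeated experiments come out roughly uniform, so about 5% land below 0.05. A minimal sketch (pure simulation, no real data):

      ```python
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      pvals = []
      for _ in range(2000):
          y = rng.normal(size=100)         # outcomes with no treatment effect
          arm = rng.permutation(100) < 50  # random assignment, 50 per arm
          pvals.append(stats.ttest_ind(y[arm], y[~arm]).pvalue)

      # Under the null with randomization, about 5% of p-values fall below 0.05.
      print(np.mean(np.array(pvals) < 0.05))
      ```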

  2. I think the COVID situation shows that at least in the early stages of a problem where we don’t have much idea what’s up at all, the qualitative and informal experimentation paradigm works reasonably well. As we get further along, I think more well-controlled experiments are needed. One form of control is to randomize, but I think there are other useful forms of control as well, for example optimization-based strategies in which we try to move uphill based on stochastic gradient estimates or differential evolution strategies.

    In general it seems RCTs are used to find out if x is better than y, but RCTs are best at identifying causality; it’s not a good strategy for optimization to run causal experiments and then use the outcomes to choose the more effective method from among the two or three options. I think we’d get much more improvement using an explicit optimization strategy.
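
    To make the optimization idea concrete, here is a toy sketch that uses differential evolution to climb a noisy, entirely invented dose-response curve, rather than comparing two fixed arms; every number in it is hypothetical:

    ```python
    import numpy as np
    from scipy.optimize import differential_evolution

    rng = np.random.default_rng(2)

    def observed_outcome(dose):
        # Hypothetical dose-response: benefit peaks near dose = 6,
        # and each observation is noisy, as any real outcome would be.
        benefit = 10 * np.exp(-((dose[0] - 6.0) ** 2) / 8.0)
        return -(benefit + rng.normal(0, 0.5))  # minimize negative benefit

    result = differential_evolution(observed_outcome, bounds=[(0, 12)],
                                    seed=2, maxiter=50, polish=False)
    print(result.x)  # should land near the (unknown to us) optimum at 6
    ```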

    • A (real) question about “RCTs are best at identifying causality”? Is that true? Or ‘best’ for identifying associations? Given the variety of measured and unmeasured factors, and their variable correlations, it seems to me that finding associations would work well but not that x causes y. (Asking for a non-expert friend who lives on the non-stats side of the world.) Finding associations of course leads to more specific tests – progress. (I also wonder if causality has meaning when biological systems are organized around deeply inter-dependent and context-dependent architectures. For example, does it make sense to say that x causes y, and that y causes x? There are many examples of course. Would it be more informative to say that they are inter-dependent? Further, one can say that dysfunction in x causes disease y in some organ system. But that does not mean that normal x causes health.)

      • I’m on the fence re: ‘RCTs are best at identifying causality’, which is not to say that the trial may yield causal connections in some cases. It would be truly beneficial to do a live trial which can be evaluated by diverse experts.

      • To disambiguate the language, I actually meant “the thing that RCTs are best at is identifying causality” not that “among all options RCTs are the best one for identifying causality”

        Causality here is basically defined to mean the difference between what would happen with the intervention and what would happen without the intervention. So “incorrect glasses prescriptions cause eyestrain” is a meaningful statement even though in fact there are a lot of moving parts there, all the way from the quantum mechanics of light to the materials science of lenses to the sociology of access to eye care, to the method of teaching of optometry and the distribution of eye abnormalities in the population etc.

        Basically, if you discover what looks to be an effect of one thing on another thing, then you can verify that it’s a causal effect by getting a group, splitting it at random into two groups, and studying the treatment on one group vs a different (typically null) treatment on another group. The randomization ensures in the large sample limit that whatever could possibly be different about the two groups is balanced among them so that for every person with modifying factors A in one group, there’s a similar person in the other group.

        Of course, that’s *in the large sample limit* and RCTs can only really identify statistical/average effects. Given the high dimensionality of human existence, it’s entirely possible that a RCT doesn’t get anywhere near the large sample limit even with what’s normally considered very large samples (like say 100,000 people or 2M people or whatever). This can be particularly true when there are unmodeled mediating factors, like age, genetic background, diet, cultural factors, drug interactions, etc.

        My point about other methods is that the power of inference under RCTs doesn’t necessarily lead to good decision making, so as Garnett pointed out below, the differences should be considered carefully. If we want to make people say have the least amount of pain, it can be better to try different treatments on each person, using an optimization procedure, until each one arrives at the best available option for pain control. You can’t infer causality here in the same way, but you get way less pain nevertheless.
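
        A toy sketch of that per-person idea, framed as a simple explore-then-exploit rule over three hypothetical pain treatments (all of the response numbers are invented):

        ```python
        import numpy as np

        rng = np.random.default_rng(3)
        # Hypothetical mean pain relief of three treatments *for this one patient*.
        true_relief = {"A": 2.0, "B": 5.0, "C": 3.5}
        observed = {k: [] for k in true_relief}
        options = list(true_relief)

        for week in range(26):
            if week < len(options):            # try each treatment once first
                choice = options[week]
            elif rng.random() < 0.1:           # keep exploring occasionally
                choice = str(rng.choice(options))
            else:                              # otherwise stick with the best so far
                choice = max(options, key=lambda k: np.mean(observed[k]))
            observed[choice].append(true_relief[choice] + rng.normal(0, 1))

        # The best treatment should usually end up with most of the weeks.
        print({k: len(v) for k, v in observed.items()})
        ```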

        • >> If we want to make people say have the least amount of pain, it can be better to try different treatments on each person, using an optimization procedure, until each one arrives at the best available option for pain control. You can’t infer causality here in the same way, but you get way less pain nevertheless.

          Yeah. This is one concern I have about FDA approvals of drugs.

          Sometimes with psychiatric stuff, you have to try several drugs of the same general sort (say, antidepressants) for a particular individual before you get the best result, and it isn’t necessarily predictable. For example, anecdotally – citalopram works for me, escitalopram doesn’t. Escitalopram is the biologically active isomer of citalopram [the racemic mixture] so there should be *no biological difference whatever*. Yet… there is for me. (Nocebo effect? Who knows…)

          So it may be better to have more options available even if some are less good population-wide.

        • Daniel,

          Thanks for clarifying about RCTs and causality.

          I worry about the large sample limit. As has been discussed here in many ways at many times, causes, consequences and interventions can be highly heterogeneous. Seeking large samples to make confident claims about causality means averaging across increasingly heterogeneous variability to arrive at an average estimate that may not apply to anyone in the population, e.g. a drug or nutrient that works brilliantly in a few with no benefit, or harm, in others.

          I especially like your last paragraph which implies addressing heterogeneity directly, not as a nuisance but as reality. Will read Garnett’s links. This is an interesting problem, probably important too.

      • Well, control prevents association without causation because the only difference between the groups in an RCT is meant to be the treatment under investigation. And reverse causation is not possible because the treatment temporally precedes the measurement of the outcome. That only leaves causation.

        • Nick said,

          “Well, control prevents association without causation because the only difference between the groups in an RCT is meant to be the treatment under investigation.”

          But “meant to be” is not the same as “is”.

  3. A related concern with medical science is how infrequently results are conveyed in concrete, meaningful units. I think most practitioners have a strong desire to consider the costs and benefits of the treatments they have available, but the data is presented in non-intuitive ways (e.g., conditional probabilities). Even if nothing else is changed, I think we could make large gains by simply forcing researchers to convert their results into something more interpretable. For example, instead of saying that some intervention reduces mortality from some disease by, say, 50%, we should be saying that the drug reduces mortality from 1-in-350 to 1-in-700. The former (percent change in risk) is scale-free and is almost entirely useless it is paired with the base rate. However, the latter is a single piece of information that’s easily interpreted while also carrying information about the base rates. (The arithmetic is sketched below.)

    • Oops, second-to-last sentence should read:

      “The former (percent change in risk) is scale-free and is almost entirely useless UNLESS it is paired with the base rate.”
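
      A tiny sketch of the conversion being advocated above, using the made-up 1-in-350 and 50% numbers from the comment:

      ```python
      # Made-up example: baseline mortality 1-in-350, relative risk reduction 50%.
      baseline_risk = 1 / 350
      treated_risk = baseline_risk * (1 - 0.50)             # 1-in-700
      absolute_risk_reduction = baseline_risk - treated_risk
      number_needed_to_treat = 1 / absolute_risk_reduction

      print(f"1-in-{1 / baseline_risk:.0f} -> 1-in-{1 / treated_risk:.0f}")
      print(f"absolute risk reduction: {absolute_risk_reduction:.3%}")
      print(f"number needed to treat: {number_needed_to_treat:.0f}")
      ```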

  4. Well, you have the example of scurvy, where an early randomised controlled trial demonstrated the importance of citrus back in 1753 and people sort of forgot about it, relying on theory and anecdotal observations, until the mid 20th century when new trials were done.

    • My understanding is that the British Navy changed suppliers for their limes, and the new varieties had way, way less vit C, so the effect of their citrus ration on scurvy was eliminated by this change. In many ways this example goes the other way from the interpretation you implied. The lack of mechanism in the original studies led to people being unable to determine what the real cause was, and then to be unable to cope with changes that eliminated the original cause. Mechanism was critically important here.

      • I’ve heard this story as the vitamin C in lime juice being destroyed by a new method of preparation of the juice.

        But yeah – that was a case where lime juice “mysteriously stopped working” so people started to look for other causes. Definitely a problem caused by the mechanism not being understood.

        • Scurvy had been the leading killer of sailors on long ocean voyages; some ships experienced losses as high as 90% of their men. With the introduction of lemon juice, the British suddenly held a massive strategic advantage over their rivals, one they put to good use in the Napoleonic wars. British ships could now stay out on blockade duty for two years at a time, strangling French ports even as the merchantmen who ferried citrus to the blockading ships continued to die of scurvy, prohibited from touching the curative themselves.

          The success of lemon juice was so total that much of Sicily was soon transformed into a lemon orchard for the British fleet. Scurvy continued to be a vexing problem in other navies, who were slow to adopt citrus as a cure, as well as in the Merchant Marine, but for the Royal Navy it had become a disease of the past.

          By the middle of the 19th century, however, advances in technology were reducing the need for any kind of scurvy preventative. Steam power had shortened travel times considerably from the age of sail, so that it was rare for sailors other than whalers to be months at sea without fresh food. Citrus juice was a legal requirement on all British vessels by 1867, but in practical terms it was becoming superfluous.

          So when the Admiralty began to replace lemon juice with an ineffective substitute in 1860, it took a long time for anyone to notice. In that year, naval authorities switched procurement from Mediterranean lemons to West Indian limes. The motives for this were mainly colonial – it was better to buy from British plantations than to continue importing lemons from Europe. Confusion in naming didn’t help matters. Both “lemon” and “lime” were in use as a collective term for citrus, and though European lemons and sour limes are quite different fruits, their Latin names (citrus medica, var. limonica and citrus medica, var. acida) suggested that they were as closely related as green and red apples. Moreover, as there was a widespread belief that the antiscorbutic properties of lemons were due to their acidity, it made sense that the more acidic Caribbean limes would be even better at fighting the disease.

          In this, the Navy was deceived. Tests on animals would later show that fresh lime juice has a quarter of the scurvy-fighting power of fresh lemon juice. And the lime juice being served to sailors was not fresh, but had spent long periods of time in settling tanks open to the air, and had been pumped through copper tubing. A 1918 animal experiment using representative samples of lime juice from the navy and merchant marine showed that the ‘preventative’ often lacked any antiscorbutic power at all.

          https://idlewords.com/2010/03/scott_and_scurvy.htm

      • Scurvy is actually a good example of the main RCT failure mode. I’m still waiting for the chemo/radiation therapy study that accounts for caloric restriction due to the “side effect” of nausea slowing tumor growth.

        Also, wasn’t the use of lemons kind of a military secret? Seems like only the British navy really adopted it.

        https://en.wikipedia.org/wiki/Limey

        • Anoneuoid, what is your hypothesis, that some of the effect of chemo on tumors or mortality is due to weight loss?

        • I don’t know if weight loss is required but poorer absorption, appetite, etc leading to a caloric restriction or intermittent fasting effect. I’ve never seen this accounted for and it is definitely going on.

  5. I started medical school in the late 1960s. It was common for medical evidence to consist of small collections of anecdotes and for the main comment at tumor boards, collegial conferences held to discuss cases typically once a week at hospitals, to be “in my experience.” Much of the teaching I got was based on my elders’ personal experience. Reliance on RCTs and statistical analysis of RCTs was boosted when we learned that the new chemotherapy for lymphoma, highly touted by the leading authorities, was no better than an older regimen. In the 1990s my profession experienced a great shock with transplantation for breast cancer, which made so much sense and was so attractive and trendy that almost every leader in my profession jumped on it before doing actual phase three trials. I have previously recommended “Malignant” by Vinay Prasad that outlines these issues very well.
    RCTs have lots of problems. Statistics is hard. Retreating to the old ways of trusting some master is not the answer.

      • “I wonder if primary care docs take any of the zillions of reported RCT results seriously.”

        I would not be at all surprised if a lot of primary care docs (and probably specialists as well) take seriously those RCT results that agree with what they already think, and dismiss RCT results that disagree with what they already think.

    • Also, Stephen Goodman has argued that understanding and doing RCTs was a way for the young upstarts to get promoted and replace the old masters and their small collections of anecdotes.

      So that “social” change would also need to be sorted out…

      But the real question is how to modify RCT practices in the future so that they do more good and less harm – e.g. dynamic trials, federated trials, and no conclusions allowed from single trials, etc.

      • I don’t know what you mean by “dynamic trials” and “federated trials”, but “no conclusions allowed from single trials” sounds good.

  6. Ya, we need fewer RCT and more *MARKETING*. And “Ancient Chinese Medicine” and other “wisdom of the ages”. The Alternative Medicine industry can show us the way!

    • Divalent:

      If RCTs had never been born, I’d hope we’d not be making all our decisions based on marketing and traditions. There are lots of ways of doing comparative studies that don’t rely on randomization and statistical significance.

      • I would enjoy knowing the names of some of those lots of ways. Getting ready to crack open Douglas Montgomery’s “Design and Analysis of Experiments” for the first time and I would like to pay extra attention to the right chapters.

        • Jai – I think the idea is a comparison between similar groups, estimation, and a qualitative understanding of un-modeled uncertainty. So in practice this means controlling for confounders, getting the best estimate of treatment effect possible given everything you know / want to assume about the process in question, and using that information in combination with all the other qualitative stuff you couldn’t model (e.g. maybe you think there was unobserved confounding or generalizability issues).
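
          Here is a minimal sketch of that kind of non-randomized comparison with adjustment for a measured confounder, on simulated data; the residual worry, of course, is the confounders you did not measure:

          ```python
          import numpy as np
          import statsmodels.api as sm

          rng = np.random.default_rng(4)
          n = 500
          severity = rng.normal(0, 1, n)  # measured confounder
          # Sicker patients are more likely to get the treatment.
          treated = (severity + rng.normal(0, 1, n) > 0).astype(float)
          # True treatment effect is +1, but severity hurts the outcome.
          outcome = 1.0 * treated - 2.0 * severity + rng.normal(0, 1, n)

          naive = sm.OLS(outcome, sm.add_constant(treated)).fit()
          adjusted = sm.OLS(
              outcome, sm.add_constant(np.column_stack([treated, severity]))
          ).fit()
          print(naive.params[1])     # badly biased, even gets the sign wrong
          print(adjusted.params[1])  # close to the true effect of 1.0
          ```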

        • Michael,

          Thanks for responding. I have another question. You said that, out of the gate, the comparison must control for confounders. Yet, that is the point of randomization in the first place. I don’t get how to control for confounders without relying on randomization.

      • The marketing problem is a reasonable concern. Various supplements (or TCM, or fad diets, etc.) could be more appealing to people who care more about health and wellness, and would be actively marketed towards them. As a result, the observational evidence favoring such remedies accumulates, creating a causal inference problem that’s intractable without RCTs.

        • Naive observational studies would have trouble, sure.

          But I don’t think RCTs per se are necessarily required to distinguish between, say, “Supplement X decreases risk of heart attacks” and “people who take Supplement X tend to have healthier lifestyles which reduces their risk of heart attacks”.

          I am thinking of air pollution studies (my own field). The whole thing is controversial and muddy, since there are all kinds of correlations between the presence of various pollutants.* But still – we can generally draw correlations between pollution levels in city A and health outcomes, without actually randomizing what city has what pollution level, by controlling for many confounders.

          Now, randomized human exposure trials to low levels – “controlled human exposure” – are used to understand the biological effects. As are animal studies (for higher exposures it would be very unsafe/unethical to expose humans to). But that’s just one part of the picture, not all of it.

          *some of which are unavoidable since given the pollutant categories defined by EPA, one turns into another: sulfur and nitrogen oxides produce particulate matter through various reactions, nitrogen oxides and organic compounds react with oxygen to produce ozone, etc. But we can at least correlate health with pollution exposure, even if we can’t distinguish – say – smog and particulate effects in a particular study.

  7. My father was an internist who started practice in the 1930s and was a big fan of randomized clinical trials, but I think it mattered more to him that they were blinded than that they were randomized. He was acutely aware of the way the placebo effect and selection bias affected clinical impressions. Any effective alternative to RCTs needs to take account of these problems.

  8. Hi Andrew:

    Interesting question. I believe that the answer is we are better off. The Women’s Health Initiative is a good example of why RCTs are necessary. Of course, you didn’t really expect us to answer this hypothetical.

    Rodney

    • Rodney:

      Indeed, I link to a discussion of the Women’s Health Initiative in the above post! I have no doubt that particular randomized clinical trials have improved people’s lives. On the other hand, there’s a cost if people make wrong decisions from RCTs or delay good decisions because they’re waiting for RCTs. My question is not, Have RCTs ever been beneficial? but rather, Are RCTs beneficial in net?

      • > Have RCTs ever been beneficial? but rather, Are RCTs beneficial in net?

        Maybe, “how far above/below replacement level are RCTs as a way to do medical science”?

  9. I would politely disagree with the statement “In statistics and medicine, we’re generally told to rely when possible on statistically significant results from randomized trials.”

    Rather I would say “In statistics and medicine, we’re generally told to rely when possible on results from randomized trials.” Many RCTs with negative outcomes have changed clinical practice (e.g. rosiglitazone increasing the risk of heart failure in type 2 diabetes…the RECORD trial).

    I have worked in clinical development in the pharma industry for 25 years and RCTs provide us all (pharma/regulators/physicians/patients/analyst) with the best estimates and uncertainty of the true effects of drug regimens for both efficacy and safety. Thus where possible, it is always the most scientific approach, and modern data analysis methods use (longitudinal) hierarchical non-linear (Bayesian) modelling based on all the data to inform and educate us (no p values in my work, just estimation!).

    Thus to answer your question:

    “If randomized clinical trials were never done at all, would we be worse off than we are now, in terms of medical outcomes?”

    Absolutely! As an example, the increasingly common approach of using Real World Evidence (RWE) is plagued by limitations due to lack of randomisation and selection bias in the data (e.g. only the sickest patients get the new drug or the higher doses). Thus sound inference based on such data is painfully limited. In contrast, RCTs yield sound (unbiased) evidence.

    • One question that I don’t find adequately discussed here is whether the people who are willing to participate in RCTs are representative of the population for which the treatment might be useful. Two possible extremes are individuals afraid of possible side effects of experimental treatments for psychiatric conditions, and individuals who are terminally ill (or close to it) and are willing to try anything. In both cases failures in extreme cases may mask possible success in less extreme cases.
      Of course, something similar may take place in long term observational studies, in which the individuals who continue to participate may be quite different from those who drop out.
      The only way to deal with these issues is to rely on evidence outside the studies themselves.

      • Hi David,

        Not sure if your comment was specifically directed to me, but here is my opinion on this matter anyway!

        RCTs have rightly been questioned over their inclusion/exclusion criteria. I think there is a balancing act of two opposing positions:

        1) Any patient who could reasonably be expected to be prescribed the drug post-approval should be studied pre-approval.
        2) During the drug development process, very little data has been generated to be confident that every patient in category 1) can ethically be given what is a new, untested, experimental drug regimen.

        For example, an 80 year old with multiple comorbidities taking multiple comedications. Whilst this patient might satisfy 1), ethics boards/pharma are rightly concerned that giving the new drug to this patient when so little data has been generated beforehand may be highly questionable.

        A solution that is worth mentioning (and one I support) is the more general idea of adaptive licensing (see Hans Eichler’s work)…here we might start drug trials with more restrictive inclusion/exclusion criteria, but as safety data are accrued, these criteria are relaxed. This would tick both boxes, but it does need pharma/regulators to be more flexible than we currently are.

        Finally, indeed there is an assumption that anyone in 1) is ultimately reflected in the trials. However the efficacy/safety we generate is nearly always summarised at the population level (i.e. mean drug v mean placebo), and hence we have no guarantees of individual patient outcomes. Thus, given we have done a good job of including patients in 1), we can only conclude that the drug regimen may be worth investigating for any specific patient.

        cheers

        Al

    • Descriptions are not explanations. RCTs are just descriptions. All RCTs can do is show we are not dealing with snake oil. That’s it. RCTs are completely overrated, overstated, and not any better than other methods. Do a search for “Misunderstanding Randomized Clinical Trials”… you will see the article from thoughtful scientists. What we need is a near-complete understanding of biochemical cellular operations. We barely understand cells’ workings – just fragmentary. What we need are deep explanations, as genius physicist David Deutsch notes, NOT more RCTs. RCTs have misled doctors in the past and led to suboptimal treatments and outcomes – one case had to do with a nasal vaccine spray. RCTs are still weak evidence and statistical descriptions – NOT explanations. A million variables are ALWAYS present in studies on human beings. And bias does not really matter: lots of great physicists like Einstein and Bohr were very biased… RCTs sacrifice bias (and are overrated even for reducing bias) for accuracy. RCTs do NOT deserve their status – they are not anything that special. RCTs are still weak evidence in lieu of deep explanation of arterial cells’ workings! We need NEWER, better methods, more creative and deeply explanatory!

      Take care,
      Michael Wenington M.D.

  10. Based on this thought experiment I would guess that you are a libertarian. Am I correct? Haven’t done a RCT study on this…but my anecdotal data suggests it.

    Sounds an awful lot like: The harm transfer payments make to the community is far worse than the benefit those transfer payments provide.

    • Michael:

      Huh? Randomized clinical trials are nothing like transfer payments. They’re more like investments in common goods. If you want to draw a budgetary analogy, it would be to spending on roads or canals. The question in my post is not, Should there be zero investment in medical research? My question is, Are randomized clinical trials a good investment?

      • I noticed a lack of an answer to the question.

        I’m not making an analogy about the budget. I’m making an analogy about the sort of logic one might use to come to your conclusion.

        RCTs, done well, and analyzed properly are absolutely critical to move research forward. If done poorly, analyzed poorly, and not reported on properly it’s like anything else you might invest in: in need of improvement. It sounds like you’re inclined to throw the baby out with the bath water.

        RCTs are a common framework that is generally agreed upon. It can however be abused by those with agendas. Leaving the framework and rigor up to “the market” is just as bad of an idea as any libertarian might make about their awful reading of the invisible hand. To me, there are other alternatives to using RCTs than “let anecdotal evidence become king” or “nothing”. It’s a false choice. How about “let’s work together to make RCTs more robust and actually try to get booked on television programs to inform the public…”.

          Just like the price of freedom is eternal vigilance, so too is the price of creating mathematical tools to analyze large amounts of data with many factors, unknowns, and random chance. Especially when presenting the findings to THIS public, which is uniquely bad at interpreting data.

        • Michael:

          You write, “RCTs, done well, and analyzed properly are absolutely critical to move research forward.” That’s an assertion. I don’t think it’s true. I think lots of research can and has moved forward without RCTs.

          This has nothing to do with “the market.” RCTs can be done by governments or corporations or individuals.

          I agree with you completely that “there are other alternatives to using RCTs than ‘let anecdotal evidence become king’ or ‘nothing.'” See my P.S. above.

  11. I disagree that randomization and statistical significance “go together” in the sense that randomization inevitably ends with statistical significance. Randomization has benefits that go beyond distributional assumptions of test statistics.

    (Perfect) Randomization says that we don’t have to adjust for things to get causal effects. Randomization can be used to eliminate unmeasured confounding. Without randomization, we have to have the type of subject matter knowledge (which can never be guaranteed fully) that allows us to rule out unmeasured confounding. Randomization allows us to start from a place of ignorance.
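
    A small simulation of that point: the confounder below is deliberately treated as unmeasured and left out of the analysis, and the simple difference in means is badly biased under self-selection but fine under randomization (all numbers invented):

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 100_000
    frailty = rng.normal(0, 1, n)  # unmeasured confounder

    def outcome(t):
        return 1.0 * t - 2.0 * frailty + rng.normal(0, 1, n)  # true effect = 1.0

    # Self-selected treatment: frailer patients seek treatment more often.
    t_obs = (frailty + rng.normal(0, 1, n) > 0).astype(float)
    # Randomized treatment: assignment ignores frailty entirely.
    t_rct = rng.integers(0, 2, n).astype(float)

    y_obs, y_rct = outcome(t_obs), outcome(t_rct)
    print(y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean())  # badly biased
    print(y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean())  # close to 1.0
    ```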

    • “(Perfect) Randomization says that we don’t have to adjust for things to get causal effects. Randomization can be used to eliminate unmeasured confounding.”

      I think this is a critical point – and the “perfect” qualifier is necessary. In the real world, I don’t find the difference between RCTs and observational studies quite as stark as it is being portrayed in this post. Given the reality of significant costs associated with RCTs, the sample sizes are likely to be quite limited. This means that confounders cannot be ignored, just as they must not be ignored in observational studies. So, I see it more as a continuum – RCTs have the potential to get clearer results from a limited sample size than an observational study, but both have to deal with potential confounding influences. The choice between an RCT with sample size n, and an observational study with sample size N, then becomes more of an empirical question and not necessarily a matter of principle.

      • Points worth thinking about/taking into consideration, but I don’t think they give the whole story. One thing that comes to mind is that the randomization scheme and the statistical model used for analysis need to be compatible.

    • “If they are *not* the best way, that could be quite problematic.”

      Even if they’re not the best method now, they were so for several generations.

      I’m not sure what Andrew is on about. We can question whether they may have declining utility. But I don’t think you have to look past the human life expectancy gains in the last 50 years to see that RCTs have delivered massive benefits.

      • I don’t know enough to have a meaningful opinion on whether RCTs are or are not the best way.

        But…

        >>But I don’t think you have to look past the human life expectancy gains in the last 50 years to see that RCTs have delivered massive benefits.

        That assumes all the medical benefits of the last 50 years are directly attributable to RCTs, and wouldn’t have been achieved if other kinds of studies were favored. Isn’t that exactly the question being asked in this post? Whether RCTs as a standard approach yield better results overall than a less standardized approach to experimental design and statistical analysis?

        • I think I see your point — but it may need expansion. One thing in particular that comes to mind: Decreasing things like air pollution, lead in paint, and lead in drinking water have almost certainly contributed to improvements in human longevity and in quality of life — and my guess is that RCT’s were not involved in deciding to make these changes.

        • I don’t think penicillin was discovered by RCT, nor were sanitation ditches dug by RCT nor were many dangerous jobs eliminated by RCT. The concept of vaccination was invented by Pasteur before RCTs. I’d guess that almost all of the improvement in lifespan between 1900 and 1950 or 1960 came without RCTs. Certainly RCTs helped figure many things out, but not everything.

        • Some of that early stuff, like penicillin, maybe didn’t *need* RCTs because the effect was so huge that it was immediately obvious, way beyond placebo!

      • But I don’t think you have to look past the human life expectancy gains in the last 50 years to see that RCTs have delivered massive benefits.

        Or have countries like the US just been preferentially aborting many of the babies that would be born with diseases or into poverty for 50 years? If you don’t drop abortion deaths when calculating life expectancy, has it changed since the 1960s?

    • The FDA is an active obstacle to saving lives: https://www.baromedical.com/post/the-fda-s-unflattering-view-of-hyperbaric-medicine

      They should go back to making sure the ingredient list is accurate. They themselves have admitted they cannot do the job of properly assessing efficacy and safety of treatments.

      The FDA cannot adequately monitor development of food and medical products because it is unable to keep up with scientific advances (Finding 3.1.2).

      […]

      The FDA cannot fulfill its surveillance mission because of inadequate staff and IT resources to implement cutting-edge approaches to modeling, risk assessment and data analysis (Finding 3.1.3). The FDA lacks a coherent scientific structure and vision as a result of weak organizational infrastructure (Finding 3.1.4).

      […]

      The Subcommittee found that despite the significant increase in workload during the past two decades, in 2007 the number of appropriated personnel remained essentially the same — resulting in major gaps of scientific expertise in key areas. More importantly, despite the critical need for a highly trained workforce to fulfill its mission, the FDA faces substantial recruitment and retention challenges. The turnover rate in FDA science staff in key scientific areas is twice that of other government agencies (GAO-02-958 PDUFA User Fees; Finding 3.2.1). There are insufficient programs of measurement to determine worker performance (Finding 3.2.2). There is insufficient investment in professional development, which means that the workforce does not keep up with scientific advances (Finding 3.2.3). Finally, for various reasons, the FDA does not have sufficiently extensive collaboration with external scientists, thus limiting infusion of new knowledge and missing opportunities to leverage resources (Finding 3.2.4).

      http://www.gmptrainingsystems.com/files/u1/pdf/FDA_Mission_Risk.pdf

  12. This is a tricky question.

    Your point as I understand it is that the ratio of “apparent value of RCTs” to “true value of RCTs” may be higher than for the alternatives. Like RCTs seem to be able to solve all your problems — but they can’t; and so people do less of the more informal (?) thinking that’s necessary to figure out what’s going on.

    Maybe without RCTs people would’ve been forced to think more; but if people reacted this way to RCTs who’s to say they wouldn’t have reacted the same way to whatever alternatives came up?

    I think the problem, whatever it may be, is a bit deeper — the “apparent value” of many things associated with “science” has become wildly overrated.

    Like it’s the same phenomenon (on a smaller scale) when people seem to think learning the word “dopamine” greatly increases their understanding of human behavior and the human mind.
    [I do not mean to attack “neuroscience” in general — I am saying nothing about that. Just that a lot of people wildly overrate the apparent value of learning that word/concept — like people in the past surely had words that were used very similarly to how dopamine is used in popular discourse today; I am also aware of effective “dopaminergic” anti-depressants.]

    • +1 to the sigh about people substituting “dopamine” for phenomena at a different level of description (such as “interest” or “craving”).

      I think the way RCTs tend to focus inference on the average treatment effect could be more problematic than the use of significance, although the two are probably related. There’s something about the way randomization enables estimation of an average effect despite unknown confounders that can entice one to believe that those confounders are of no scientific interest. That way the strength of the RCT design may lead to disregarding questions about heterogeneity of treatment effects.
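
      A toy illustration of how a perfectly real average treatment effect can describe almost nobody (numbers invented):

      ```python
      import numpy as np

      rng = np.random.default_rng(6)
      n = 20_000
      responder = rng.random(n) < 0.2         # say 20% of patients can respond
      effect = np.where(responder, 5.0, 0.0)  # big benefit for them, none otherwise

      treat = rng.integers(0, 2, n).astype(bool)
      y = effect * treat + rng.normal(0, 1, n)

      print(y[treat].mean() - y[~treat].mean())  # ATE is about 1.0
      for grp, name in [(responder, "responders"), (~responder, "non-responders")]:
          diff = y[treat & grp].mean() - y[~treat & grp].mean()
          print(name, round(float(diff), 2))     # about 5.0 and about 0.0
      ```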

      • Yes, I agree (as I would guess a lot of people here do) that focusing on average effects is often problematic. There is a place for anecdotal evidence in the designing of good clinical trials — e.g., if there is noticeable anecdotal evidence that people with characteristic X do not do as well under a certain treatment as people without characteristic X, then that needs to be studied more — perhaps with an RCT of the treatment that compares people with characteristic X and people without characteristic X.

  13. Andrew:
    We (women) might still be taking that form of hormone replacement therapy (HRT) that doctors had been foisting on us until the 2002 (I think) RCTs (which doctors said would be immoral to run since they all knew how beneficial HRT was against a slew of diseases in post-menopausal women). You could say, eventually we would have found out*–maybe– but I thought the primary role of intelligently designed experiment (it needn’t be a strict RCT) was to speed up what might have been discovered by just happening to come across things. Causal inference w/o RCTs succeeds where it manages to mimic the counterfactual knowledge that proper RCTs can provide.
    More than ever, today, researchers are looking to controlled trials to distinguish genuine benefits, in relation to Covid-19, from those easily explained by background variability. The fact that some, e.g., in social psych, see the supposed randomized assignment of a “treatment”, with questionable measurements and multiple testing, as legitimating P-values, only points to abuse of the method. We should stop blaming methods for abusive uses of them. On the other hand, it would be informative if the new anti-statistical significance/anti test movement admitted their skepticism of controlled statistical trials. It would help open people’s eyes to see just what is really at stake in jumping on the bandwagon.

    As I’ve said before, it still seems to me to be a significant change from when you said in your 2012 paper with Shalizi “What we are advocating, then, is what Cox and Hinkley (1974) call ‘pure significance testing’, in which certain of the model’s implications are compared directly to the data”(p. 20), and were keen to falsify statistically. Statistical falsification requires a threshold.

    *What we see often are people taking what was learned from a randomized controlled trial and then looking back through observational studies to find the same thing. But they wouldn’t have looked for it without the controlled trial.

    • Deborah:

      As I wrote in response to an earlier comment, I link to a discussion of the Women’s Health Initiative in the above post! I have no doubt that particular randomized clinical trials have improved people’s lives. On the other hand, there’s a cost if people make wrong decisions from RCTs or delay good decisions because they’re waiting for RCTs. My question is not, Have RCTs ever been beneficial? but rather, Are RCTs beneficial in net?

      • I don’t really see why this is being held up as a success. It is just one in a series of conflicting results attributed to differences in the treatment timing/formulation/etc. or population.

        After 10 years of randomised treatment, women receiving hormone replacement therapy early after menopause had a significantly reduced risk of mortality, heart failure, or myocardial infarction, without any apparent increase in risk of cancer, venous thromboembolism, or stroke.

        https://www.bmj.com/content/345/bmj.e6409

        Unfortunately, the surge in HRT use and its consolidation was abruptly stopped by the publication of the WHI trial, which was inadequately designed, evaluated, and reported. The damage done was huge, basically leaving many symptomatic women without an effective treatment, even if the epidemiological data were not strong enough to document a clear harm to women’s health. Although most of the evidence obtained was only with oral conjugated estrogen with or without medroxyprogesterone acetate, further studies and analyses have consolidated the view that HRT is highly beneficial when given to symptomatic women within 10 years since the onset of menopause or to symptomatic women that are under 60 years of age. However, the damage remains, and low HRT use, which is unjustified, continues to occur throughout the world.

        https://pubmed.ncbi.nlm.nih.gov/31540401/

    • “On the other hand, it would be informative if the new anti-statistical significance/anti test movement admitted their skepticism of controlled statistical trials. It would help open people’s eyes to see just what is really at stake in jumping on the bandwagon.”

      Bingo!

  14. >Would we be better off if randomized clinical trials had never been born?

    I don’t think there is a way to answer this.

    >I think it’s possible that the answer to this question is No, that the damage done by statistical-significance thinking is greater than the benefits of controlled studies.

    It seems to me that you are saying that our great reliance on RCT’s has, at least partially, been responsible for encouraging “damage done by statistical-significance thinking”, and that perhaps this damage is greater than the benefit gained by RCTs. Ok, but in the counterfactual where there were no RCTs, do you not think that some other sort of generally erroneous thinking would have been spawned? Not “statistical-significance”, but something else equally damaging? I’ve pretty much subscribed to the kind of analysis that you promote on this blog (I use Bayesian multi-level models every day!), but of course that doesn’t mean you can’t make big mistakes with those kinds of methods either.

    I just think the “magic wand” is more a product of human thought/nature than a result of any particular method (though, some methods do seem more prone to it!), and it might certainly have appeared in some other form in a counterfactual world without RCT’s.

  15. We’d be screwed without RCTs. Most of what we have today wouldn’t exist.

    Andrew if you think RCTs are bad, you should go back and see what was before them. It took generations just to get doctors to wash their freakin’ hands.

    The reason RCTs look bad to you now is that we’re looking for ever smaller effects from ever more narrowly targeted treatments, meaning the signal:noise ratio is declining fast. NHST is poorly suited to these conditions and is thus yielding unpredictable results. IOW, the low-hanging fruit, which required only the simple NHST tool to pick, is gone. Now we need more sophisticated techniques – or more sophisticated experimental design – that can amp up the signal or tamp down the noise.

      • I agree. I think that things like arrogance, established beliefs, and “that’s the way we’ve always done it” played a big part in the problem of getting doctors to wash their hands. (Nowadays the idea of going directly from dissecting a corpse to attending at childbirth seems abhorrent, but it didn’t back then. Frankly, I’m not sure just what promoted the change, but I’m guessing that growing peer pressure was part of it. My grandfather’s uncle was a physician, and was a strong proponent of physicians’ washing their hands before attending at childbirth — because his mother had died from childbed fever.)

  16. I don’t buy the theory of tight bundling between RCTs and statistical significance. A world without RCTs would just be development economics pre-2000: statistical significance without the added benefit of randomization. Actually, there’d probably also be more regression discontinuity designs and instrumental variables etc, which have their own set of problems. I think RCTs have been useful in development economics. At the very least they’ve shown that microfinance is meh, but conditional cash transfers ain’t so bad. Given how much money goes into those things, that’s useful, no?

  17. For fun: versions of the times make the person or the person makes the times. The chains are too complicated, like figuring out whether the South would have won if Stephen Douglas were President (or if there would now be statues of him all over). Easy to get bogged down in arguments over specifics.

    Maybe the point is a variation on the word ‘tapestry’, that the efforts are necessary parts of the motion forward of the wave of progressive understandings. Like we reach for something inaccessible down multiple, related pathways, and those efforts as an abstract whole generate an understanding which becomes manifest in real results. Like counting ordinals.

    Better analogies then might be from physics where an understanding builds which allows someone to develop an experiment to test an idea which previously was inaccessible. Inaccessible to the mind is inaccessible in experiment, in descriptive form, in provable theory and theorems.

    Of course this same tapestry also generates such nonsense as ESP research; bits of noise encourage others to look for bits of noise they collectively think stitches together into more than bits of noise, without understanding that form of noise repeatedly found indicates the system accepts noise as a result, which is obviously true since, by definition of power relationships, there are infinitely more failed fits than fits.

    Actually, just for fun, can you imagine a world in which fits were easy? We’d all be more similar in shape. Our reactions would need to be more narrowly constrained into order classes of reducing, not increasing complexity. Can you imagine if every analysis began with fits and we had to look for what doesnt fit? Like every pair of pants fits, so which ones dont? Every model fits, so which ones dont?

    Can you imagine if everyone were equally attractive to everyone else, but we still preserve conceptions of love between individuals? So, life would become how we arent the same because that way we’d be able to identify different layerings of similarity to match beyond the simple. The more similar, the more we’d need to identify dissimilarity at higher orders, like if we were all exactly the same but with differences that become apparent only through group of group tendencies that attach to our similar selves. The group whose fingernails tend to grow at a slightly higher rate. The group that eats the daily same-same food slightly slower or earlier. Or which farts more. I love you for the way your fart tendencies plus your natural response delay matches mine, she says. Dr. Seuss called them Star-bellied Sneetches (and Sneetches with no stars upon thars).

    It’s a version of Flatland, right? (Which abstracts the conception of right turn to correct turn, so right is now a step in a direction we label right though we’re actually blind to the direction, which means right could be wrong, which is exactly the problem when you have a system that accepts fits where sign may be hidden because you’ve constrained out higher order operations so they become inaccessible.)

    • “Better analogies then might be from physics where an understanding builds which allows someone to develop an experiment to test an idea which previously was inaccessible. Inaccessible to the mind is inaccessible in experiment, in descriptive form, in provable theory and theorems.”

      Yes — and I think the idea of “inaccessible to the mind is inaccessible in experiment” — fits the example above of childbirth fever. As long as people believed in things like “miasmas”, their minds were not open to the idea of bacterial infection.

  18. Nobody is mentioning the sample size. Many RCTs are low in sample size and high in experimental control. We could do without them, but the sample size requirements would skyrocket, making a study impossible (see the sketch below). Not to mention that the alternative explanations would be plentiful (which is the case even in well-controlled studies).

    Also, in medicine many phenomena cannot be ethically studied by implementing RCTs, making quasi-experiments of various levels very popular.

    I just ignore the binary NHST part of the RCTs output and focus on the descriptives and the effect size(s) instead.
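
    A rough sketch of the sample-size point above, using a standard two-sample power calculation (all numbers hypothetical): halving the effect-to-noise ratio roughly quadruples the required sample size.

    ```python
    from statsmodels.stats.power import TTestIndPower

    solver = TTestIndPower()
    raw_effect = 2.0  # hypothetical benefit on some clinical scale

    for noise_sd, label in [(4.0, "tightly controlled"), (8.0, "less controlled")]:
        n_per_arm = solver.solve_power(effect_size=raw_effect / noise_sd,
                                       alpha=0.05, power=0.8)
        print(label, round(n_per_arm))  # about 64 vs about 252 per arm
    ```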

  19. Andrew,

    Like others, I am a bit confused by the coupling of statistical significance and randomized controlled trials embedded in the possible thought experiment. That seems to be the least compelling reason for the goal of randomized controlled trials. To extend Deborah Mayo’s argument above, doctors and clinicians publishing qualitative work would likely not have concerned themselves with the allocation of treatment in a rigorous way at all.

    Historically, the rationale for randomization was not statistical significance, but fairness. Writers, clinicians, scholars, and philosophers had been concerned with achieving like for like or comparable treatment groups since antiquity, though the explicit justifications begin to appear in the 19th century. Early “randomized” trials were done via alternate allocation (Balfour’s study of belladonna and scarlet fever) in an effort to control selection bias, an effort that we would look at rather dubiously today because of our knowledge of the true requirements of RCTs. Nonetheless, the idea here is clearly present. The first medical trial that I am aware of to successfully use randomization as an allocation scheme is the MRC streptomycin trial. The goal of randomization, as stated by Hill even earlier, was to conceal the allocation schedule from those entering. Later, that same principle was expanded to blind experimenters. As David Cox has noted, the most important function of randomization is to help blind treatment assignment and thus avoid selection bias in the creation of comparable groups. That idea does not exist prior to or without randomization, or if it is discovered, occurs substantially later. Consequently, there is a prima facie argument that RCTs are always net beneficial relative to a world in which they did not exist.

    Jamison’s “The Entry of Randomized Assignment into the Social Sciences” has a nice overview of the history of randomized assignment that pins the creation of it as a holistic construction to the 1920s, and Chapter 2 of Imbens and Rubin’s text makes me skeptical that Rubin would have made his immense contributions absent a world in which randomization had been discovered and conceptualized.

  20. Perhaps your goal isn’t to engage such readers, but I suspect that most readers, myself included, don’t have much of a feeling of what “wrestl[ing] directly with the issues of variation, uncertainty, and comparability of treatment and control groups” looks like in detail and whether it would be any better. Why would these approaches lead to less uncertainty-hacking, comparison-hacking, etc. (a la p-hacking)?

    • Country:

      This is just a blog post! There’s no room for me to explain all those things each time I blog. I recommend you read our new book, Regression and Other Stories, which covers these topics in detail with many examples. I want to engage all readers, but I can’t start from scratch on each post, or I’d be writing page 1 of my textbook over and over again, 400 times a year, and never getting to anything new.

  21. I haven’t read the whole discussion but anyway… the argument that is behind the idea that maybe we are not better off than we’d be without RCTs seems to go like this: RCTs are maybe a good idea but they have their problems. The fact that they are a good idea and have been promoted for that reason makes some people use them uncritically as a panacea, and this can do more harm than good.

    But well… I don’t really see how this kind of thing can be avoided for any good but not totally simplistic idea. And I also tend to think that if one panacea is taken away from people (or not even given to them in the first place) they will look for another, eventually find something, and then do their black magic (i.e., pervasive uncritical misuse) on that one. What would the world be if there were no significance tests and only Bayesian analysis? I tell you, misuse of Bayesian statistics would be all over the place, and some people would invent significance testing in order to deal with it. And we’d be happy with that, up to the time point when significance tests become the governing paradigm again and the new leader in misuse.

    I think effort is better spent making as many people understand whatever approach and method as well as possible, rather than trying to solve problems by replacing one much misused approach with another one that hasn’t yet been popular enough to be misused to the same extent.

    • “I think effort is better spent making as many people understand whatever approach and method as well as possible, rather than trying to solve problems by replacing one much misused approach with another one that hasn’t yet been popular enough to be misused to the same extent.”

      +1

    • Christian:

      Sure, but it’s a matter of emphasis. Deciding to rely on RCTs is a choice that has real costs, including:
      – Setting aside information from other sources.
      – Opportunity cost from waiting too long to make a decision.
      – Smaller sample sizes because RCTs can be unwieldy and expensive.
      – Poor data analysis because people think that with RCTs you don’t need good data analysis, also a general recourse to “conservatism” in design and analysis choices.

      Here’s an analogy. Suppose you’re concerned with the (real) risks of exposure to cosmic rays at high altitudes, so you decide to require that airplanes all have thick lead coatings. This will have costs! The lead will protect you from cosmic rays, but you’ll be paying for it in fuel costs and increased pollution.

      • Except that it’s not the RCT’s fault that information from other sources is ignored, that people think they shouldn’t look at anything else before they have an RCT, or that people do poor data analysis because they’ve found an excuse (would they be able to do better if they hadn’t found that excuse?).

        Your analogy is good though. I’d totally be with you opposing the idea that *everything* should be done by RCTs (as in “all airplanes”), and I have elsewhere argued against the “only evidence from RCTs should count”-people. This whole silver bullet kind of thinking is what we should be fighting.

      • > Smaller sample sizes because RCTs can be unwieldy and expensive.

        Are (non-R)CTs less expensive? The actual cost of the randomization itself is probably minimal compared to the cost of the clinical procedures and the cost of designing, running, and analyzing the trial. (Of course, if the alternative is not to run a trial at all, that would be much cheaper.)

  22. I kind of think there might be a problem with RCTs as a requirement for drug approval versus the movement toward personalized medicine. If a treatment is inherently personalized, average effects will be totally uninformative. IMO, personalized medicine is what we will really need to solve things like cancer and maybe autoimmune diseases. But I don’t see how we can get there within the current regulatory structure.

    And once we get a good enough understanding of the underlying biology we shouldn’t need to do massive human trials to find out if something works.

    This may be testable very soon! For example, if nearly all the COVID vaccine candidates turn out to work, IMO that would be evidence that the understanding of vaccine development is good enough that (in hindsight) maybe we should have rolled out the vaccine sooner without waiting for phase III trials.

    If only one or two (or, God forbid! – none) work, that would show the opposite.

    • Based on the info we have about SARS, the vaccines will work to induce a temporary immunity, but only well and safely in young healthy people who were at little risk anyway.

      Not sure why that isn’t the default expectation.

      • Immunity might well wane, but it’s probably not *that* temporary.

        If immunity were *really* temporary (say sub-1 year) we should see a *lot* of reinfections, not just one or two semi-anecdotal stories. People’s immune systems vary… so if the average duration of immunity was say 6 or 8 months we would expect there to be a pretty noticeable fraction below 4 months. And the March peak in infections* is now 4 months past. Yet we now see infections increasing in precisely those states that were mostly spared in March-April, *not* a lot of reinfections in NYC/Massachusetts/Chicago/Detroit.

        *as back-extrapolated from deaths peaking in the 2nd week of April; not the peak of “confirmed cases” which is distorted by bad testing in March.
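
        A minimal back-of-the-envelope sketch of that waning argument, purely for illustration (the mean durations and the person-to-person spread below are assumptions, not estimates from data):

          # Hypothetical: immunity duration varies across people roughly normally.
          # Question: what fraction of March-April infections would already have
          # lost immunity ~4 months later, under different assumed means?
          from scipy.stats import norm

          months_since_peak = 4          # time since the March-April infection peak
          sd = 2                         # assumed person-to-person spread (months)
          for mean_duration in (6, 8):   # assumed average duration of immunity (months)
              frac_waned = norm.cdf(months_since_peak, loc=mean_duration, scale=sd)
              print(f"mean {mean_duration} months: ~{frac_waned:.0%} already past their immunity")
          # -> roughly 16% under the 6-month assumption, 2% under the 8-month one;
          #    either way, even a few percent of millions of early infections would
          #    be a lot of people, so reinfections should already be visible.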

        And is SARS-1 immunity even temporary? T-cell immunity is supposed to be still persisting now, 17 years later… antibodies do not tell the whole story.

        The vaccine people know what they are doing, IMO. Biology has advanced a lot in 17 years; if SARS hadn’t disappeared in 2004, I am sure we would have a pretty good vaccine for it.

        • Yes, the illness is probably modified the second (etc.) time around. It may usually be that you are asymptomatic but can still spread the virus until the T-cells kill the infected cells.

        • Well, that would not be that good in the short term (as unexposed people in older/more vulnerable groups could still be infected), but actually fine in the long term (once most people are either vaccinated or exposed)… if you’re only “at risk” once (later infections are asymptomatic or like a cold), then it would stop being a big issue as soon as people are vaccinated or (if vaccines aren’t developed quickly) infection spreads far enough that there aren’t many unexposed people left.

        • Yes, the problem is only for the vulnerable population. The people who have a mild illness and weak immune response are fine either way.

        • Dengue is a very different case – there are multiple circulating strains much more different than what we see in COVID mutations (or are likely to any time soon). Antibody-dependent enhancement is *definitely* a big issue there … but for reasons that are not at all applicable to COVID.

        • The thing with dengue, as I understand it, is that antibodies to strain X are only partially relevant to strain Y, so they cause ADE instead of being protective.

          As there are not multiple strains of SARS-COV circulating in the human population (the original SARS-COV-1 is gone) it’s not relevant in this case.

          That doesn’t mean that ADE couldn’t happen for other reasons – but it does mean that dengue is irrelevant.

          >>A new “SARS” comes out every few years…

          What do you mean? SARS-COV-1 and SARS-COV-2 are the only ones known to infect humans, and the first is gone. Sure there are tons of other coronaviruses out there, but they’re either from different groups (like MERS and “common cold” coronaviruses) or animal-only.

          Coronaviruses don’t mutate nearly as fast or in the same way as influenza viruses.

          The thing with dengue, as I understand it, is that antibodies to strain X are only partially relevant to strain Y, so they cause ADE instead of being protective.

          You have to think about the mechanism. ADE happens when there are non-neutralizing antibodies. Ie, the antibodies bind to the virus but are not effective in preventing infection. Ie, when they are of low affinity or low quantity. That is why it is seen for Dengue in the case of a similar strain after antibodies wane, because there are low numbers of low affinity antibodies. Same for SARS, in vitro and in vivo:

          To evaluate the efficacy of existing vaccines against infection with SHC014-MA15, we vaccinated aged mice with double-inactivated whole SARS-CoV (DIV). Previous work showed that DIV could neutralize and protect young mice from challenge with a homologous virus14; however, the vaccine failed to protect aged animals in which augmented immune pathology was also observed, indicating the possibility of the animals being harmed because of the vaccination15. Here we found that DIV did not provide protection from challenge with SHC014-MA15 with regards to weight loss or viral titer (Supplementary Fig. 5a,b). Consistent with a previous report with other heterologous group 2b CoVs15, serum from DIV-vaccinated, aged mice also failed to neutralize SHC014-MA15 (Supplementary Fig. 5c). Notably, DIV vaccination resulted in robust immune pathology (Supplementary Table 4) and eosinophilia (Supplementary Fig. 5d–f). Together, these results confirm that the DIV vaccine would not be protective against infection with SHC014 and could possibly augment disease in the aged vaccinated group.

          https://www.nature.com/articles/nm.3985

          We found that higher concentrations of anti-sera against SARS-CoV neutralized SARS-CoV infection, while highly diluted anti-sera significantly increased SARS-CoV infection and induced higher levels of apoptosis. Results from infectivity assays indicate that SARS-CoV ADE is primarily mediated by diluted antibodies against envelope spike proteins rather than nucleocapsid proteins. We also generated monoclonal antibodies against SARS-CoV spike proteins and observed that most of them promoted SARS-CoV infection. Combined, our results suggest that antibodies against SARS-CoV spike proteins may trigger ADE effects. The data raise new questions regarding a potential SARS-CoV vaccine, while shedding light on mechanisms involved in SARS pathogenesis.

          https://www.sciencedirect.com/science/article/pii/S0006291X14013321

          So the risk factors for ADE are:

          1) Old/obese/diabetic people/animals with poor immune response (lower affinity/quantity)
          2) Exposure to a similar strain with slightly different epitope (lower affinity)
          3) After the antibodies wane, but not too much (lower quantity)

          No study of this for covid has been published. It is mid-July now.

        • Maybe. Not sure it is quite as simple as that. Antibody-dependent enhancement of viral entry into cells may not necessarily mean antibody-dependent enhancement of severity of disease for example.

          Sure, ADE for COVID could exist because of a low quantity of antibodies. But the dengue issue of low affinity due to a different strain won’t apply here. So the case for dengue as a model is still very poor.

          IE – the only reason to think COVID is more likely to show ADE than any other random respiratory virus (most of which don’t) is the comparison to SARS-1. Or is there another reason?

        • IE – the only reason to think COVID is more likely to show ADE than any other random respiratory virus (most of which don’t) is the comparison to SARS-1. Or is there another reason?

          That is a pretty good reason to repeat the same studies where ADE was seen for SARS with the SARS2 virus.

      • But anyway — whether they work or not, and how well, is not actually the point of the comment.

        My point is that the *distribution* of effectiveness can tell us something about the practical “state of the science”.

        If ultimately most or all the efforts with serious resources succeed, then we probably know more than we realized we did*, and vaccine development is probably overcautious (and we could have saved a lot of lives by going straight to distribution after Phase II).

        If a few (or none!) succeed and most or all fail, then it might be useful to see if that could have been predicted earlier in the process, but probably that means it’s genuinely a hard problem with current science/tech and the level of caution is appropriate.

        If many “succeed” but not ultimately in a very useful way, as you suggest, same thing, it’s genuinely a hard problem with current science/tech and the level of caution is appropriate.
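
        A toy version of that reasoning, with made-up numbers (the candidate count and outcomes below are hypothetical, not claims about the actual pipeline): treat each serious program as succeeding with some unknown probability p, put a flat prior on p, and see how the number of successes shifts beliefs about p, i.e., about how well the underlying science is understood.

          # Beta-binomial sketch: flat Beta(1,1) prior on the per-candidate success
          # probability p, updated on hypothetical outcomes.
          from scipy.stats import beta

          n_candidates = 10              # hypothetical number of serious programs
          for n_success in (9, 2):       # "nearly all work" vs. "only a couple work"
              post = beta(1 + n_success, 1 + n_candidates - n_success)
              print(f"{n_success}/{n_candidates} succeed: posterior mean p = {post.mean():.2f}, "
                    f"P(p > 0.5) = {post.sf(0.5):.2f}")
          # -> 9/10 successes make P(p > 0.5) ~ 0.99; 2/10 make it ~ 0.03. The
          #    distribution of outcomes is informative about the state of the science.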

        Which of those options we think is most likely is an entirely different question.

        *In the sense that the regulatory scheme/development stages assume a greater uncertainty than now exists, with the advancement of science since the scheme was developed. I mean, if we understood the immune system as well as we understand (say) orbital mechanics, there would be no justification for trials at all – we could go straight from the lab to mass distribution.

        Obviously that’s not happening any time soon. But in the course of a pandemic, when each week’s delay means thousands of deaths, I think we have to ask how plausible the risk of a “really bad” (IE – more deaths/sickness caused than prevented) vaccine actually is.

        • I think we have to ask how plausible the risk of a “really bad” (IE – more deaths/sickness caused than prevented) vaccine actually is.

          Based on what was seen for SARS, it is likely that the vaccine will enhance the illness when elderly, obese, etc are later challenged with the virus. Could be that the younger people who got mild illness during the first wave suffer similar if challenged when antibodies have waned sufficiently (but are still present).

          Why is it now late July and no one has repeated those same studies with the SARS2 virus? I couldn’t tell you.

        • Could be that the younger people who got mild illness during the first wave suffer similar if challenged when antibodies have waned sufficiently (but are still present).

          And if this is the case, we want them to be getting regular “booster” exposures to maintain the immunity. Ie, not hiding in their house.

          Lots of room for the cure to be worse than the disease here, which we already saw once with the ventilators.

        • >>Could be that the younger people who got mild illness during the first wave suffer similar if challenged when antibodies have waned sufficiently (but are still present).

          But then why aren’t we seeing it now? There were *millions* of asymptomatic/mild infections in March, and this is late July. If antibodies wane relatively quickly, we ought to be seeing ADE happening “naturally” if it is more than a vanishingly-rare occurrence.

          Again it’s not like everybody’s antibodies are going to wane on exactly the same time scale; there will be a bell curve, and even say 5% of millions of people is still a lot. Some studies (at least preprints) are suggesting antibodies are waning on the scale of months* – so there should be lots of people in this risk group.

          It seems more parsimonious to me that the vaccine people basically know what they are doing and ADE is probably not going to be relevant for this disease. (The disease itself is not the same as SARS-1, even though the virus is closely related.)

        • But then why aren’t we seeing it now? There were *millions* of asymptomatic/mild infections in March, and this is late July. If antibodies wane relatively quickly, we ought to be seeing ADE happening “naturally” if it is more than a vanishingly-rare occurrence.

          How do you know it isn’t, when we don’t know who already had it months ago?

          It seems more parsimonious to me that the vaccine people basically know what they are doing and ADE is probably not going to be relevant for this disease.

          If they knew what they were doing, they would repeat the same studies that showed a problem for SARS before devoting so much time and money to a vaccine, only for it to possibly fail for that reason later. So either some incentive is off, or they don’t know what they are doing.

        • Sure the majority of people who had COVID in the US in March/April never tested positive. But there are so many that *do* know that I still think we ought to be seeing it if it was likely.

          >> So either some incentive is off, or they don’t know what they are doing.

          But those aren’t the only possibilities. Perhaps they know more than either of us about what happened with SARS, and it turns out not to be relevant to this, for example.

          From what I’ve heard, the signs seem to be pretty promising re: ADE not really being a thing with COVID.

          (For that matter, convalescent plasma treatments are being used: if vaccines are going to cause ADE problems, why wouldn’t those?)

        • The vast majority of people diagnosed were not mild/asymptomatic. They were pretty ill.

          If they have top secret info about ADE but are keeping it from the public, it is more likely it shows the presence of ADE. Hiding that lets you keep getting gov funding. Why hide evidence your product is safer than expected?

          Yes, there is little sign of ADE in cases with a robust antibody response. That is not when it’s likely. There is “no evidence” for ADE in the situations where it is likely, because no one has published anything on it at all.

        • I am not talking about “top secret info”, just a better understanding of the immunology. IE – the hypothesis is not that they have data that we don’t (it would all be published in the open literature), but that they have better subject matter knowledge and so can come to conclusions from that data that you or I couldn’t.

          I have seen arguments that in-vitro measurements of antibody-dependent enhancement of *viral entry into cells* may not be predictive of in-vivo antibody-dependent enhancement of *disease*, for example. So the SARS-1 stuff might not actually be that relevant. But I don’t have the subject matter knowledge to evaluate this.

        • I am not talking about “top secret info”, just a better understanding of the immunology. IE – the hypothesis is not that they have data that we don’t (it would all be published in the open literature), but that they have better subject matter knowledge and so can come to conclusions from that data that you or I couldn’t.

          I’ve never seen anyone propose any theory like this. They just keep saying “no evidence”, but only looking where we don’t expect to find evidence.

          I have seen arguments that in-vitro measurements of antibody-dependent enhancement of *viral entry into cells* may not be predictive of in-vivo antibody-dependent enhancement of *disease*, for example. So the SARS-1 stuff might not actually be that relevant. But I don’t have the subject matter knowledge to evaluate this.

          Might not. Does that mean don’t even run the simple cell culture studies where it was seen for SARS to check? Funding is not in short supply.

        • >>I’ve never seen anyone propose any theory like this. They just keep saying “no evidence”, but only looking where we don’t expect to find evidence.

          What I’m suggesting is that they may have a better understanding than you or I of where we “expect” to find evidence…

          >>Does that mean don’t even run the simple cell culture studies where it was seen for SARS to check?

          What’s the point at this stage? We’re doing human trials of vaccine now, I think some are already to Phase III. (And China is reportedly already vaccinating its military and businesspeople working outside China!)

          If they work, they work, and in vitro is irrelevant. If they don’t work, we’ll know soon.

          If we delay development for say 2-3 months because of a scary-looking in vitro result that turns out not to be relevant in actual humans, that’s an *enormous* cost in lives, suffering, etc.

          >>Funding is not in short supply.

          The limit is not money but time.

        • What I’m suggesting is that they may have a better understanding than you or I of where we “expect” to find evidence…

          But this reasoning hasn’t been published anywhere, i.e., shared with the public. So then it’s “top secret”.

          The most likely place to expect it is exactly where it was seen for SARS and other viruses. There is no top secret info. And testing diluted antibodies in cell culture and doing vaccine + challenge in aged/obese/diabetic mice would not delay human trials.

          >>But this reasoning hasn’t been published anywhere, i.e., shared with the public. So then it’s “top secret”.

          Are you sure of that? I haven’t read every publicly-available document about COVID vaccine candidates/development – have you?

          I don’t really have the time/interest to dig through all of it (in fact, I’m not sure that I’m interested in debating this much further). I just think it’s far more parsimonious to assume dozens of vaccine projects with different methods in different nations aren’t all making exactly the same mistake – which is obvious to people outside the field.

          >>And testing diluted antibodies in cell culture and doing vaccine + challenge in aged/obese/diabetic mice would not delay human trials.

          It absolutely could, if the safety results look bad.

          The relevant question is whether the risk/harm of “missing a real safety signal by not doing that” is greater than the risk/harm of “letting the pandemic go on longer by delaying vaccine development due to a false safety signal”.

          I am really skeptical of the usefulness of cell culture studies here as the immune system is really complex with multiple types of cells, antibodies, etc. Antibody-dependent enhancement of one part of the infection process is not necessarily incompatible with an overall protective effect, if other parts of the process are impeded sufficiently.

        • Are you sure of that? I haven’t read every publicly-available document about COVID vaccine candidates/development – have you?

          I’ve been following this very closely, so I am confident I would have seen it. While I haven’t read every paper, someone would have referenced it or mentioned it in a paper I read by now. Instead it is exactly as I said, they say “no evidence for ADE” based on the results from situations where ADE is not expected.

          It absolutely could, if the safety results look bad.

          The relevant question is whether the risk/harm of “missing a real safety signal by not doing that” is greater than the risk/harm of “letting the pandemic go on longer by delaying vaccine development due to a false safety signal”.

          So now apparently more information is a bad thing when developing these vaccines. More information is never a bad thing. Everyone knows cell culture and animals are imperfect models of humans already, yet we still do studies using those models that are not ethical to do in humans.

        • >>Everyone knows cell culture and animals are imperfect models of humans already, yet we still do studies using those models that are not ethical to do in humans.

          This pandemic is not a usual situation; time is far more critical than usual.

          (And I think medical development is overcautious even under usual circumstances, since the regulatory structure derives largely from times when biology was far less understood. Something like thalidomide is not really applicable now — and it’s worth noting that thalidomide for morning sickness was not in fact approved in the US. It was not an FDA failure, so using it as justification to increase pre-approval scrutiny is somewhat questionable.)

          This pandemic is not a usual situation; time is far more critical than usual.

          As I said, you can do both in parallel. There is no problem with time nor funding. Apparently, you think more information about the dangers of the vaccine would make people more careful and that is a bad thing. So we shouldn’t collect that information to begin with.

        • Depends what you mean by “careful”.

          If by careful you mean reasoned balance of risk vs. reward, being careful is necessarily a good thing.

          If by careful you mean erring on the side of not giving the treatment/vaccine (as is usual in medical stuff) it may not be.

          >>As I said, you can do both in parallel.

          Scientifically, sure. In terms of regulatory approvals to do stuff (either Phase III trials or actual rollouts)? I think the risk of delays is non-zero. It may not be significant, but it might be.

          >>There is no problem with time nor funding.

          Time absolutely is a problem. Every day sooner a vaccine can get mass distribution will save several hundred lives in the US alone and avoid massive suffering.

          Potential side effects seem a lot less significant on that scale.

        • Well, you are advocating not collecting information that may indicate a medical treatment is dangerous because if collected it may show it is dangerous and delay production of the dangerous medical treatment.

          I’m sure purposefully not collecting info on the danger of a medical treatment will lead to great success.

        • I’m not advocating anything.

          But – my question is whether the sort of in vitro studies you are talking about are /relevant/ – i.e. whether the information is in fact useful.

          That is — if these studies are universally not being done, there is probably a reason for that. Since many different nations with very different political systems (US, UK, India, and China for example) are working on it, political motives are probably not a factor.

          So (for me) the most parsimonious assumption is that there is a good reason they are not being done.

          The most obvious ‘good reason’ is that they would not actually provide useful information (would not be relevant to the actual system) and might delay production of a vaccine.

          There are times when basically everyone in a field makes a fundamental assumption mistake visible from outside the field — but it’s really rare (and usually in much less established/well-based fields than immunology), and it’s not the way to bet.

        • and anyway, whether vaccines are likely to work well was not really my point.

          My point was, *if* all or most of the vaccines which are far along *do* prove to work well, that probably tells us something about the degree of ‘overcaution’ in the development process – ie we understand the theoretical basis more than regulators are confident that we do.

          I’m not saying it *will* happen. The opposite would also tell us something. If we do see serious problems with COVID vaccines, I would be a lot less convinced that our drug development/authorization system is excessively overcautious.

        • But – my question is whether the sort of in vitro studies you are talking about are /relevant/ – i.e. whether the information is in fact useful.

          In vitro refers to cell culture, etc. It was also seen in animal studies, then never checked in humans.

          I’m not saying it *will* happen. The opposite would also tell us something. If we do see serious problems with COVID vaccines, I would be a lot less convinced that our drug development/authorization system is excessively overcautious.

          Where I’m at, the news announces a huge storm, everyone stays home, etc., then nothing happens. The next week they are silent and a huge storm shuts everything down and causes all sorts of problems. This happens regularly, not just once. Drug development is more like that, where there is no real trade off between false positives and false negatives.

        • >>Drug development is more like that, where there is no real trade off between false positives and false negatives.

          Hmm. I completely disagree, and I would even say that any regulatory structure that is more or less honest (IE – doesn’t approve/deny stuff just because of the political affiliation of the people doing it or personal economic advantage/disadvantage, but is actually *trying* to balance risk and benefit) must *necessarily* have this kind of tradeoff.

          But I don’t know how to demonstrate that.

          Not sure there is any point in pursuing this line of argument further, given that.

        • Just look at what people are saying about replication in biomed:

          Fifty-three papers were deemed ‘landmark’ studies (see ‘Reproducibility of research findings’). It was acknowledged from the outset that some of the data might not hold up, because papers were deliberately selected that described something completely new, such as fresh approaches to targeting cancers or alternative clinical uses for existing therapeutics. Nevertheless, scientific findings were confirmed in only 6 (11%) cases.

          […]

          A team at Bayer HealthCare in Germany last year reported4 that only about 25% of published preclinical studies could be validated to the point at which projects could continue.

          […]

          Some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis. More troubling, some of the research has triggered a series of clinical studies — suggesting that many patients had subjected themselves to a trial of a regimen or agent that probably wouldn’t work.

          https://www.nature.com/articles/483531a

          Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone.

          https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

          Early on, Begley, who had raised some of the initial objections about irreproducible papers, became disenchanted. He says some of the papers chosen have such serious flaws, such as a lack of appropriate controls, that attempting to replicate them is “a complete waste of time.” He stepped down from the project’s advisory board last year.

          Amassing all the information needed to replicate an experiment and even figure out how many animals to use proved “more complex and time-consuming than we ever imagined,” Iorns says. Principal investigators had to dig up notebooks and raw data files and track down long-gone postdocs and graduate students, and the project became mired in working out material transfer agreements with universities to share plasmids, cell lines, and mice.

          […]

          ALTHOUGH ERRINGTON SAYS many labs have been “excited” and happy to participate, that is not what Science learned in interviews with about one-fourth of the principal investigators on the 50 papers. Many say the project has been a significant intrusion on their lab’s time—typically 20, 30, or more emails over many months and the equivalent of up to 2 weeks of full-time work by a graduate student to fill in protocol details and get information from collaborators. Errington concedes that a few groups have balked and stopped communicating, at least temporarily.

          http://www.sciencemag.org/content/348/6242/1411

          An effort to reproduce the key findings of 50 influential cancer studies has announced that it will have to settle for just 37, citing budgetary constraints.
          […]
          “It’s a naïveté that by simply embracing this ethic, which sounds eminently reasonable, that one can clean out the Augean stables of science,” says Robert Weinberg, a cancer biologist at the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts.

          https://www.nature.com/news/cancer-reproducibility-project-scales-back-ambitions-1.18938

          In the studies published to date, there is a surprising preponderance of failures to replicate. The same applies to the replication studies reported in this special issue. Why has there been a failure to replicate and what does it mean to the field? There is no simple answer.

          Possible reasons for a failure to replicate include:
          1) The original findings are incorrect, for example due to type I statistical error;
          2) The replication was not exact, and differences in experimental details led to different results that may raise important mechanistic questions;
          3) Effects were not significant because of greater variability in the replication (this would explain failure to detect an effect but not observations of different effects);
          4) Those carrying out the replications were not sufficiently experienced in the injury models or methods for assessment;
          5) There were differences in experimental animals (sources, housing, genetic drift);
          6) There were differences in the lesions;
          7) There were differences in animal care and use protocols (housing, handling, lighting, season, anesthetics, post-injury analgesics);
          8) There were differences in reagents (especially problematic for cellular therapies);
          9) There was inadvertent and unrecognized bias;
          10) The effect is not robust (encompasses some or all of the above);
          11) Finally, the possibility of scientific misconduct cannot be discounted.

          https://pubmed.ncbi.nlm.nih.gov/22078756/

          It is something like 10-30% of “promising” (i.e., statistically significant) results that replicate. Some papers that can’t be replicated spawn hundreds of other papers over decades. So let’s say 70-90% of actual good treatments get rejected out of an abundance of caution. Then it will be like I described with the bad weather predictions. (A rough sketch of how replication rates that low can arise from significance filtering is at the end of this comment.)

          And keep in mind this is just simply getting the same results in different labs by repeating the same procedures. That is the easy part, the hard part is interpreting them correctly.
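
          One standard way to see how replication rates that low can fall out of significance-based screening is the usual false-discovery arithmetic. A minimal sketch with made-up inputs (the prior, power, and alpha values below are illustrative assumptions, not estimates from the papers quoted above):

            # If true effects are rare and power is modest, most "significant"
            # findings are false positives, and an honest replication attempt
            # then fails most of the time.
            prior_true = 0.05   # assumed share of tested hypotheses that are really true
            power      = 0.30   # assumed power of a typical preclinical study
            alpha      = 0.05   # conventional significance threshold
            rep_power  = 0.80   # assumed power of the replication attempt

            p_sig_true  = power * prior_true        # true effects reaching significance
            p_sig_false = alpha * (1 - prior_true)  # false positives reaching significance
            ppv = p_sig_true / (p_sig_true + p_sig_false)

            # expected replication rate among the original "significant" findings
            rep_rate = ppv * rep_power + (1 - ppv) * alpha
            print(f"PPV = {ppv:.2f}, expected replication rate = {rep_rate:.2f}")
            # -> PPV ~ 0.24 and replication rate ~ 0.23 with these made-up inputs,
            #    i.e., in the 10-30% ballpark quoted above.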

  23. Wasn’t Ronald Fisher a leading proponent of randomization in the experimental context? If we judge by the standards of the day, surely Prof. Fisher’s intuition, findings, and teachings weren’t completely misguided?

  24. In the context of the evaluation of social programs, the operative word is “controlled,” not “randomized.” If we were humble enough to accept that “controlled” is a tough standard to meet in that context, we would be more creative… and credible. I would rather pay attention to an analysis that tells me whether the estimated impact is consistent with the hypothesized magnitude of the change, and when it would manifest, than to a false sense of rigor from a single p-value of a randomized, not-really-controlled trial (RNCT).

  25. As I may have mentioned in an earlier post, I think I received the placebo in the polio RCT. The latter was a huge deal at the time. Others should weigh in on whether the same effect, speed, etc. could have been achieved with an observational effort.

    Many decades later, I was helping manage data for a melanoma RCT in San Francisco. The subjects just took their medicine to a pharmacy/chemist to see whether they were on the placebo or not, thereby torpedoing the trial. That particular medicine wasn’t going to help, but now melanoma can sometimes be treated, so progress was made eventually.
