“Efficacy of Ivermectin Treatment on Disease Progression Among Adults With Mild to Moderate COVID-19 and Comorbidities: The I-TECH Randomized Clinical Trial”

Bob Young writes:

JAMA paper published today. Do you see any problems with their conclusion?

From the abstract:

The Ivermectin Treatment Efficacy in COVID-19 High-Risk Patients (I-TECH) study was an open-label randomized clinical trial conducted at 20 public hospitals and a COVID-19 quarantine center in Malaysia between May 31 and October 25, 2021. . . . the study enrolled patients 50 years and older with laboratory-confirmed COVID-19, comorbidities, and mild to moderate disease. . . . Patients were randomized in a 1:1 ratio to receive either oral ivermectin, 0.4 mg/kg body weight daily for 5 days, plus standard of care (n = 241) or standard of care alone (n = 249). . . . The primary outcome was the proportion of patients who progressed to severe disease, defined as the hypoxic stage requiring supplemental oxygen to maintain pulse oximetry oxygen saturation of 95% or higher.

And the results:

52 of 241 patients (21.6%) in the ivermectin group and 43 of 249 patients (17.3%) in the control group progressed to severe disease (relative risk [RR], 1.25; 95% CI, 0.87-1.80 . . . . In this randomized clinical trial of high-risk patients with mild to moderate COVID-19, ivermectin treatment during early illness did not prevent progression to severe disease. The study findings do not support the use of ivermectin for patients with COVID-19.

My reaction: Assuming the study has been done well, it tells us that we can be pretty sure that this treatment is not super-effective for this set of patients, nor is it super-counterproductive. That’s no surprise, but not everything has to be a surprise. It’s a data point. I’d think that useful research on this would not just look at outcomes but also track how the drug works in the body; otherwise it just seems like a shot in the dark. But I don’t really know anything about biomedicine; I’m just speaking as a statistician and saying that the value of this kind of purely empirical approach is limited, considering that the goal is to generalize to new groups of patients and new strains of the virus. Not that this sort of study shouldn’t be published; we just need to be aware of its limitations.

P.S. Just to clarify one point: The conclusion, “The study findings do not support the use of ivermectin for patients with COVID-19,” is literally true but is misleading if “do not support the use” is taken to imply “provide strong evidence against the use.” To put it another way, it would be a mistake to see this study and conclude: Negative finding, published in JAMA, therefore ivermectin doesn’t work. Better to say this is one data point that is consistent with what is already believed, which is that this treatment is neither devastating nor devastatingly effective. I don’t think it’s wrong for JAMA to publish such a paper; at the same time, publication in JAMA should not lead people to overstate the evidence from this study.

141 thoughts on ““Efficacy of Ivermectin Treatment on Disease Progression Among Adults With Mild to Moderate COVID-19 and Comorbidities: The I-TECH Randomized Clinical Trial””

  1. I think this study is simply too small to conclude much. Those confidence intervals are way too wide.

    I think this study is a good example of why power analysis is so important. They should have run the numbers and never run it once they realized that it was underpowered.

    • Ethan:

      Yes, the conclusion, “The study findings do not support the use of ivermectin for patients with COVID-19,” is literally true but is misleading if “do not support the use” is taken to imply “provide strong evidence against the use.”

    • The study is not underpowered. Their sample of 500 would be small if they selected people from all ages. 500 patients following Malaysia’s age distribution would lead to a study of mostly young people, and the study would not be powerful enough to distinguish effects, since bad outcomes would be rare. BUT, in this study they recruited only high-risk patients 50 years of age or older. In their sample, bad outcomes from covid and relevant effects from ivermectin (if they exist) are expected to happen at a higher rate. I do not think there is a statistical power problem with their study design.

      • It’s not underpowered for the primary outcome. But it does seem underpowered if they were trying to measure deaths. There’s a 60% decrease in deaths and it’s not significant. That’s by definition underpowered, no?

        • “That’s by definition underpowered, no?”
          No that is not the definition of underpowered. Power is defined on the primary endpoint and depends on anticipated effect size. We cannot automatically conclude “underpowered” applies to every trial that is not powered on mortality.

    • > I think this study is a good example of why power analysis is so important.

      That comment seems a good example of why it’s important to follow the links.

      > They should have have run the numbers and never ran it once they realized that it was underpowered.

      Unless I misunderstood what you mean by that, they did run the numbers:

      Sample Size Calculation

      The sample size was calculated based on a superiority trial design and primary outcome measure. The expected rate of primary outcome was 17.5% in the control group, according to previous local data of high-risk patients who presented with mild to moderate disease.11 A 50% reduction of primary outcome, or a 9% rate difference between intervention and control groups, was considered clinically important. This trial required 462 patients to be adequately powered. This sample size provided a level of significance at 5% with 80% power for 2-sided tests. Considering potential dropouts, a total of 500 patients (250 patients for each group) were recruited.
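      For concreteness, here is a minimal sketch that roughly reproduces that sample-size figure, assuming the usual normal-approximation formula for comparing two proportions (the paper does not say exactly which variant it used, so the result differs slightly from 462):

```python
# Rough reproduction of the I-TECH sample-size calculation (assumed formula:
# standard two-sided normal-approximation comparison of two proportions).
from math import ceil, sqrt
from scipy.stats import norm

p_control = 0.175            # expected primary-outcome rate in the control group
p_treat = 0.5 * p_control    # posited 50% relative reduction (the ~9-point absolute difference)
alpha, power = 0.05, 0.80

z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
p_bar = (p_control + p_treat) / 2

numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
             + z_b * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
n_per_group = ceil(numerator / (p_control - p_treat) ** 2)

print(n_per_group, 2 * n_per_group)  # ~233 per group, ~466 in total
```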

      • Carlos:

        Sure, but that’s the problem: they’re powering the study to detect if the treatment is, as I put it in the above post, “devastating or devastatingly effective,” and, as such, when they get a null finding, all they’ve found is that the data are not consistent with huge effects. If the true effect were to decrease risk by 20%, then lots of people would want to use it; if the true effect were to increase risk by 20%, then that would be a good reason to tell people not to use it. But this study can’t identify effects of that size.

        What they did in this study is, unfortunately, standard practice: posit huge effects, then design the study, then report the results as if decisions should be made from this single noisy data point.

        • The point is that power analysis was done.

          But I wouldn’t say that they posit huge effects. It’s the kind of effect that you get with other treatments. It’s the kind of effect that is claimed elsewhere for this treatment.

        • What could readers (without the raw data) do as follow-up analysis, based on the point estimate and CI of the RR, to better understand the result?

          Not sure if this is suitable: Gaining an idea of the expected RR from literature –> calculate the power of this study (how many SE away from the expected RR the observed RR is) –> If the study is underpowered w.r.t. identifying the expected RR, calculate the sample size needed to have an 80% power (2.8 SE away). Is this reasonable? Any suggested papers that do well in this respect?
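          One way to sketch that follow-up calculation from the published numbers alone (an assumed approach, not anything the authors did; the expected RR of 0.5 below is only a placeholder, matching the effect size the trial was powered for):

```python
# Sketch of the proposed follow-up analysis: recover the SE of log(RR) from the
# reported 95% CI, then ask what power the study had against a hypothetical
# "expected" RR, and how much larger a study would need to be for 80% power.
import numpy as np
from scipy.stats import norm

rr_lo, rr_hi = 0.87, 1.80                     # reported 95% CI for the primary outcome
z_a, z_b = norm.ppf(0.975), norm.ppf(0.80)
se_log_rr = (np.log(rr_hi) - np.log(rr_lo)) / (2 * z_a)

rr_expected = 0.5                             # hypothetical effect size from the literature

# approximate power to detect rr_expected at the study's achieved precision
power = norm.cdf(abs(np.log(rr_expected)) / se_log_rr - z_a)

# SE needed for 80% power, and the implied multiplier on the current sample size
se_needed = abs(np.log(rr_expected)) / (z_a + z_b)
size_multiplier = (se_log_rr / se_needed) ** 2

print(round(power, 2), round(size_multiplier, 2))  # ~0.96 power; multiplier below 1 for RR = 0.5
```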

        • Patrick:

          The trouble is that if you go to the literature to get estimates of effect sizes, you’ll get huge overestimates because of selection on statistical significance.

      • > A 50% reduction of primary outcome, or a 9% rate difference between intervention and control groups, was considered clinically important.

        That’s what vaccines need to achieve to be considered effective. So if your question is, “is Ivermectin as effective as a vaccination”, that’d be the effect size to go for.

        • Mendel:

          I don’t think it makes sense to compare the effectiveness of a treatment with the effectiveness of a vaccine. I mean, sure, you can do it, but these are just different things. The reason for wanting high effectiveness of the vaccine is to stop the exponential spread in the population. That’s not an issue when considering the effectiveness of a treatment. So if we have an experiment that finds that in this particular group it does not appear that the treatment either doubles or halves the risk, I guess that tells us something, but I don’t see why it would be a good reason to say that trying out the treatment is a bad idea.

        • I’d lean heavily toward Mendel’s point here: for clinical interventions of infectious disease, a 50% effectiveness rate is pretty standard. Andrew: you suggest that it’s not reasonable to expect the same effectiveness from vaccines and direct treatment, but I don’t think that’s really true, given monoclonal antibodies were 77-85% effective at preventing severe disease (during delta). Moreover, as others have pointed out, the population is high risk. Recent research on remdesivir shows it’s >50% effective at preventing hospitalization for high-risk, infected individuals (noting this is a drug that has been widely panned as being largely ineffective overall).

          I’m 100% on board that we should be careful making general claims. I also fully agree that a less-than-50% effectiveness threshold might be useful for some circumstances. But their findings seem pretty well calibrated and measured (“does not support” is obviously different from “shows X is bad”). The directional effects (though not significant by traditional measures) pointed in the “wrong” direction for ivermectin’s use.

          Andrew, this might be a good chance to step back and reanalyze what the authors wrote and your critique keeping all of your other, broader concerns in mind. They don’t get ahead of themselves, they aren’t p-hacking, they back up their claims with (what we are at least told is) high quality data, and they aren’t saying “X is dangerous and should never be tried again by anyone.” (If you think a 20%, or 30%, or 5% effectiveness threshold is the proper bar, then go for it!) I’m not really sure why you’re “picking a fight” in this case to begin with.

        • Stepping:

          I don’t see why you say I’m “picking a fight”! Read my above post. As I wrote, I think the summary in that paper is literally true but could be misinterpreted. I also think it would be good for them to have data on what is happening inside the body, but I can understand that such data could be difficult to gather in these settings. When we consider the limitations of any study, starting with the standard error and then going to nonsampling errors, that should not be taken as a statement that the study is no good.

    • It’s not underpowered for the primary outcome. They did power analysis with an estimated 17.5% progression in control group, and powered the study to observe if ivermectin dropped it by half. That seems like a large effect size on the surface, but it’s exactly what ivermectin advocates have been saying Ivermectin would do. One of the nice things about the study (IMO) is that the cohort was selected (50+, laboratory confirmed, at least one co-morbidity) so that the expected proportion of patients who progressed to severe disease was relatively high even though the patients were enrolled early with mild to moderate disease. They based their estimates on clinical characteristics of Covid-19 patients in Malaysia, and progression in the control group ended up being 17.3%.

      • Greg –

        > They did power analysis with an estimated 17.5% progression in control group, and powered the study to observe if ivermectin dropped it by half. That seems like a large effect size on the surface, but it’s exactly what ivermectin advocates have been saying Ivermectin would do. One of the nice things about the study (IMO) is that the cohort was selected (50+, laboratory confirmed, at least one co-morbidity) so that the expected proportion of patients who progressed to severe disease was relatively high even though the patients were enrolled early with mild to moderate disease. They based their estimates on clinical characteristics of Covid-19 patients in Malaysia, and progression in the control group ended up being 17.3%.

        Thanks for that comment. It helps me to better understand the issues involved. Seems pretty remarkable that the estimated progression and progression in the control group were that close.

        What is your opinion regarding the secondary outcomes?

        • The 17.5% and 17.3% were impressively close, but they had a lot of prior information on patients in Malaysia, who were at the time (like Singapore) all treated in hospital setting. Or maybe they just got lucky.

          I think the secondary outcomes are woefully underpowered. I see the point made by Andrew and others that the study itself is underpowered if they were looking for 20% reduction in progression to severe disease. But (and perhaps this is a little cynical) the only reason these studies keep being done on Ivermectin for Covid-19 is that Ivermectin advocates have been describing it as a miracle treatment. So is it so unreasonable to set a target at 50% reduction?

        • Joshua, I’d like to add to my comments about secondary outcomes.

          The observation that 4/10 deaths in the control group were from nosocomial sepsis is, the more I think about it, relevant to the interpretation of deaths and mechanical ventilations. It’s not unreasonable that ivermectin could provide some protection against nosocomial infections, and I’d be curious what the rate of nosocomial sepsis (and parasitic infections) is for ICU patients in Malaysia.

        • Greg –

          I was wondering if it might be related to the worm/corticosteroids/ivermectin interaction. I was surprised no one (I’ve seen) has mentioned it.

    • Indeed, many issues….
      1. Details of the primary endpoint were not defined a priori – if the WHO definition of severe disease had been used, the results are positive for IVM. The definition they used is the only endpoint definition that would result in a negative outcome. Even the 95% threshold is suspect: the authors indicate the 95% value comes from clinical stage 4, but even the Malaysian government defines 94% as the threshold for stage 4.
      2. The primary outcome is based on SpO2 <95%, however baseline SpO2 is not provided. This is highly unusual since it would be easy to measure and is arguably the most important baseline in this study.
      3. The trial was open label and the primary outcome is subject to investigator bias – clinicians could easily bias the results by altering how they monitor SpO2, how precisely they enforced the threshold, or other aspects of SOC such as propensity to use prone positioning.
      4. On average the primary endpoint (progression to severe disease, defined in this study to be SpO2 <95% which is a very slight progression) was reached within 3.1 days after treatment. This appears to be too short a time for treatment to have much effect and most probably is due to differences at baseline (see 2. above regarding missing SpO2 baseline data…).
      5. All clinically important endpoints show benefit, why was a clinically totally unimportant outcome chosen as primary outcome?
      6. Mortality outcome was so good that it almost reached statistical significance even though it was severely underpowered.

      • Adriaan:

        Your last statement, “Mortality outcome was so good that it almost reached statistical significance even though it was severely underpowered,” reflects a common statistical misunderstanding; see here. The fact that an estimate is noisy is not a point in its favor. Your statement would more accurately be rephrased as, “Mortality outcome was very noisy.”

  2. Can you comment on whether it was sufficiently powered to measure an effect on mortality? The results seem inconclusive but are consistent with a benefit against death.

    I’m looking forward to the ACTIV-6 or PRINCIPLE results. My guess? They will show a significant but very small benefit.

    • Sam:

      The mortality data from the Malaysia study are consistent with a large benefit against death, or a small benefit against death, or no effect, or a small increase in death rate. I can’t really answer the question of whether the study was sufficiently powered, because the power depends on the underlying effect size. But I’d guess that the effect on mortality is small enough that the study was not sufficiently powered to detect a realistic effect size.

      • These are the mortality results: RR, 0.31; 95% CI, 0.09-1.11; P = .09
        Am I mistaken to say that these results would imply that the study was insufficiently powered to detect an effect size of 69% on mortality?
        I am not saying the power calculations were not made to show a 69% effect size, but rather that either the power calculations were not done to power the study to show mortality effect of 69% or that one of the assumptions made during the calculation was incorrect.
        Or am I misunderstanding something?
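        For a rough sense of scale (a back-of-the-envelope sketch, assuming the control group’s observed mortality of about 4% and the same standard two-proportion formula as above), detecting even a 69% relative reduction in deaths would take on the order of a thousand patients:

```python
# How many patients would be needed for 80% power to detect a mortality reduction
# of the size observed here (RR ~ 0.31), assuming ~4% control-group mortality?
# (Assumed formula: standard normal-approximation comparison of two proportions.)
from math import ceil, sqrt
from scipy.stats import norm

p_control = 10 / 249           # observed control-group mortality, about 4%
p_treat = 0.31 * p_control     # hypothetical 69% relative reduction
z_a, z_b = norm.ppf(0.975), norm.ppf(0.80)

p_bar = (p_control + p_treat) / 2
numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
             + z_b * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
n_per_group = ceil(numerator / (p_control - p_treat) ** 2)

print(n_per_group, 2 * n_per_group)  # roughly 520 per group, over 1000 patients in total
```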

  3. Aren’t the authors, with their conclusion sentence, pointing to the idea that treating COVID-19 with ivermectin is a waste? (“The study findings do not support the use of ivermectin for patients with COVID-19.”)

    If there’s no evidence that ivermectin helps, then why give patients an additional and unnecessary treatment that can also have negative side effects?

    • Harvey Risch, MD, PhD, of Yale told me that there’s enough evidence that HCQ and ivermectin work, but not necessarily in the form of clinical trials. There’s also enough literature that urges scientists to look BEYOND clinical trials.

      Harvey says IVM and HCQ are harmless and have proven to be extremely safe for decades. So in this pandemic, it’s best if they are used to treat people as soon as they become ill.

        • Ask him yourself. Or ask the 60% more patients that survived in the ivermectin group. What I found interesting was: of the 10 control group patients hooked up to ventilators, all 10 died. Of the 8 on ivermectin, only 3 died. I think you and everyone else with common sense would rather be in the ivermectin group, right? Please reply if you would rather be one of the 100% that died in the control group.

        • Mr:

          What you say may be correct, and if so I hope to see it replicated in a larger controlled trial. I suspect that much depends on how the treatment is done, who the people in the experiment are, and what stage of the disease they are in.

          No doubt that, conditional on being in that study, we’d rather be in the treatment group. The issue is that we’re not in that study, so we’re interested in outcomes more generally. We discuss these general points in today’s post on meta-analysis.

  4. NO, no, no!

    OK, higher power may make the interpretation less problematic, but proper interpretation of what information the study does provide “should” always be helpful.

    Think of it this way – this can inform whether one should now recommend the treatment (no), whether one should do another trial (maybe not), and if so how (much larger sample size).

    As long as there are still uncertainties about the treatment, any new study, even if underpowered, could provide helpful information.

    The main conceptual error is thinking/acting there was and only will be this one study!

  5. I could agree with Andrew’s original post about the Lim et al. trial but it most oddly stopped short of quoting and discussing the so-called secondary outcomes
    – these outcomes in the paper’s abstract are pre-specified but are labeled “secondary” because the study as designed has poor power for any realistic effect on them:

    “For all prespecified secondary outcomes, there were no significant differences between groups. Mechanical ventilation occurred in 4 (1.7%) vs 10 (4.0%) (RR, 0.41; 95%CI, 0.13-1.30; P = .17), intensive care unit admission in 6 (2.4%) vs 8 (3.2%) (RR, 0.78; 95%CI, 0.27-2.20; P = .79), and 28-day in-hospital death in 3 (1.2%) vs 10 (4.0%) (RR, 0.31; 95%CI, 0.09-1.11; P = .09). The most common adverse event reported was diarrhea (14 [5.8%] in the ivermectin group and 4 [1.6%] in the control group).”
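    As an arithmetic check, the reported mortality risk ratio and interval can be reproduced from the raw counts; a minimal sketch, assuming the usual log-RR normal approximation (the paper may have used a slightly different method):

```python
# Reconstructing the 28-day mortality risk ratio and 95% CI from the raw counts
# (3/241 deaths with ivermectin vs 10/249 with standard of care), using the
# usual log-RR normal approximation (assumed; the paper's exact method may differ).
import numpy as np
from scipy.stats import norm

d1, n1 = 3, 241    # deaths / total, ivermectin group
d2, n2 = 10, 249   # deaths / total, control group

rr = (d1 / n1) / (d2 / n2)
se_log_rr = np.sqrt(1 / d1 - 1 / n1 + 1 / d2 - 1 / n2)
ci = np.exp(np.log(rr) + np.array([-1, 1]) * norm.ppf(0.975) * se_log_rr)

print(round(rr, 2), np.round(ci, 2))  # ~0.31 and ~(0.09, 1.11), matching the abstract
```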

    – Taken together, these secondary results tell us that the study can’t distinguish whether ivermectin is a miracle life-saver or worse than worthless (I happen to think the most reliable indicator of covid severity is whether you die from it, although in population data I think total deaths is needed since the classification of death from covid vs. with covid is controversial). Yet look at how the medical news headlines the results…
    “Ivermectin Does Not Stop Progression to Severe COVID: Randomized Trial” at https://www.medscape.com/viewarticle/968855

    It is also legitimate to note (but can get you badly savaged) that the trial protocol did not come close to most treatment protocols promoted by ivermectin advocates (e.g., treatment was not early for all patients; and it used no zinc when the advocates’ rationale for ivermectin is that it synergizes and thus must be given with zinc).

    There are now results of over 30 RCTs left on the web, even after eliminating the ones that looked like they are faked or seriously compromised, e.g., https://ivmmeta.com/#rct
    although that listing gets dismissed because the site is from a major ivermectin advocate. More are in progress. Even if one thinks that only a third or fewer of these trials are worthy of consideration, focusing on one alone as the arbiter of effect is absurd, and it is even more absurd to claim this one demonstrated no effect; yet that’s exactly how many if not most of the Twitter “experts” talk about the statistics, while accusing the ivermectin advocates of bias.
    Such irony abounds in this thread and its branches:
    https://twitter.com/NAChristakis/status/1494772663356641290

    • Sander –

      > …while accusing the ivermectin advocates of bias.

      I gotta say….it seems odd to me that you focus only on the one side of the ledger.

      We could just as easily select out inane Twitter threads where ivermectin advocates go wild accusing ivermectin doubters (as a monolithic group – INCLUDING researchers who have actively researched the efficacy of ivermectin) of bias on a scale tantamount to genocide.

      This thread, for my money, is the most amusing – where vaccine doubters go after a high profile vaccine doubter, because he says that study provides good evidence against ivermectin usage.

      https://alexberenson.substack.com/p/ivermectin-fails/comments?utm_source=url

      By way of some background, previously Berenson ran afoul of many of his supporters because he criticized Peter McCullough on Fox News.

      • I will check it out. But it sounds like Berenson is part of the opposition. Or perhaps just someone posing as antivax to discredit antivaxxers. OMG, you need only to read the first sentence to know this guy is not an antivaxxer. I myself am not antivax, just anti-covid-vax. The exact study he uses to debunk ivermectin is the one we are discussing, which shows a 60% decrease in deaths!!!! Incredible.

    • This, also, seems like an odd take to me:

      > although that listing gets dismissed because the site is from a major ivermectin advocate.

      Scott Alexander does a pretty nice job of discussing ivmmeta, and your characterization isn’t close to his discussion. That doesn’t mean that his take on them is typical – I’m sure it isn’t – but I find it problematic that you seem to sum up *all* critiques of ivmmeta as comprising merely something on the order of “They’re advocates!!!”

      https://astralcodexten.substack.com/p/ivermectin-much-more-than-you-wanted?utm_source=url

      Seems to me there’s plenty of knee-jerk reaction against ivermectin, but I certainly don’t think that reverse knee-jerk generalizing about this issue does anyone any good.

        • Joshua,
          I think you are reading way more into my post than is there. I was trying to be brief, not advocate ivermectin. My main point was that there is tremendous wish-bias against ivermectin just as there is tremendous wish-bias against it, and this anti-ivermectin bias dovetails with the standard misinterpretations of p>0.05 and “secondary outcomes” that dominate the medical literature.

          The mainstream has been beating on those who say the evidence tends to favor ivermectin, e.g., the ugly “horse-dewormer” labeling when in reality ivermectin is an essential medicine against human parasites as well (some even propose that its de-worming action supplies its covid benefit, by reducing the worm burden of the patients – which means we should see little or no benefit in Canada but some in the tropics).

          In this flaming atmosphere seen on that Twitter thread and in the Berenson substack link (where he is being idiotically certain based on one ambiguous and far-from-perfect trial of many, and the commenters rightly take him to task for that) I see no reason to emphasize how biased some of the ivermectin promoters have been – fine with me if others point it out, but the media already did that and continues to show ample reporting bias as in yesterday’s MedPage headline
          “Ivermectin Flops Again for COVID, This Time in High-Risk Adults
          – Malaysian trial finds no difference in progression to severe disease over standard of care”
          – That entire headline is false under my reading of the actual trial (it being a most ambiguous study, especially if we care about hard outcomes like death).

        • Sander –

          > I think you are reading way more into my post than is there. I was trying to be brief, not advocate ivermectin.

          Maybe I am. But to be clear – I didn’t read into your comment advocacy for ivermectin.

          What I was reacting to was what seemed to me to be a truncation of the important full context. And this follow-on comment strikes me in a similar way.

          > My main point was that there is tremendous wish-bias against ivermectin just as there is tremendous wish-bias against it, and this anti-ivermectin bias dovetails with the standard misinterpretations of p>0.05 and “secondary outcomes” that dominate the medical literature.

          Assuming that was just an editing issue and not a freudian typo (tremendous wish-bias *against* it just as there is a tremendous wish-bias *against* it…).

          I’m struggling a bit with the p>0.05 secondary outcomes issue. On the one hand there are a lot of people out there saying that the “secondary outcomes” aspect completely invalidates the authors’ conclusions regarding the (lack of) efficacy, but on the other hand as Anoneuoid points out in his comments below, it does seem to me that a focus on the secondary outcomes is indeed weak scientifically. And I know it’s a big issue on this blog, but I’m not exactly clear on how the “statistical significance” aspect really should be viewed in the polarized public discussion about these kinds of studies. Of course, as ALWAYS the one study rule should be applied – but Alexander’s substack post has an interesting discussion of how the “statistical significance” question becomes complicated even across multiple studies.

          The point of a meta-analysis is that things that aren’t statistically significant on their own can become so after you pool them with other things. If you see one green box, it could mean the ivermectin group just got a little luckier than the placebo group. When you see 26 boxes compared to only 4 red ones, you know that nobody gets that lucky.
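          To illustrate the pooling point with purely hypothetical numbers (not the actual ivermectin trials), here is a minimal fixed-effect inverse-variance sketch in which no single study is significant but the pooled estimate is:

```python
# Toy illustration of fixed-effect inverse-variance pooling: hypothetical log risk
# ratios and standard errors for five small studies, none individually significant.
import numpy as np
from scipy.stats import norm

log_rr = np.array([-0.30, -0.25, -0.40, -0.20, -0.35])   # made-up study estimates
se = np.array([0.25, 0.30, 0.28, 0.22, 0.26])             # made-up standard errors

z_individual = log_rr / se           # all well below 1.96 in magnitude

w = 1 / se**2                        # inverse-variance weights
pooled = np.sum(w * log_rr) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
z_pooled = pooled / pooled_se
p_pooled = 2 * norm.sf(abs(z_pooled))

print(np.round(z_individual, 2), round(np.exp(pooled), 2), round(p_pooled, 3))
```

          (That is only the mechanics of pooling; it says nothing about the garbage-in-garbage-out worry raised elsewhere in this thread about which studies deserve to be pooled in the first place.)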

          Anyway….

          > The mainstream has been beating on those who say the evidence tends to favor ivermectin, e.g., the ugly “horse-dewormer” labeling when in reality ivermectin is an essential medicine against human parasites as well (some even propose that its de-worming action supplies its covid benefit, by reducing the worm burden of the patients – which means we should see little or no benefit in Canada but some in the tropics).

          This just seems off to me for a number of reasons. Right off the top, the concept of “mainstream media” doesn’t hold up for me – as Fox News and Joe Rogan and the Washington Times, etc., etc., aren’t meaningfully *not* “mainstream media” in my taxonomy. That’s not to defend the “horse-dewormer” rhetoric (I remember ivermectin from working with a business school case study 15 years ago that looked at why a pharmaceutical company would expend significant financial and intellectual resources on developing a drug to treat a disease – river-blindness – that basically afflicted only people who couldn’t afford to pay enough for the drug to be very lucrative). That seems stupid to me at a political/strategic level as well as from a public health perspective (given the polarized landscape, the “horse-dewormer” rhetoric might very well, if anything, encourage MORE people to take it due to a blowback effect).

          But then I just don’t get how you lift the “horse-dewormer” rhetoric from the rhetoric attacking vaccines and pushing the message that there’s a vast conspiracy to suppress ivermectin at the expense of hundreds of millions of lives. The one doesn’t meaningfully exist independent from the other and you seem to be repeating a line of thinking that seems to me to suggest that you view them as if somehow they can be disaggregated.

          btw

          > (some even propose that its de-worming action supplies its covid benefit, by reducing the worm burden of the patients – which means we should see little or no benefit in Canada but some in the tropics).

          If you read the Alexander piece, you’ll see that’s probably a bit off (as it may help with lessening the negative impact of corticosteroids in people with worms), as here:

          https://www.economist.com/graphic-detail/2021/11/18/ivermectin-may-help-covid-19-patients-but-only-those-with-worms

          > In this flaming atmosphere seen on that Twitter thread and in the Berenson substack link (where he is being idiotically certain based on one ambiguous and far-from-perfect trial of many, and the commenters rightly take him to task for that)

          But here you missed my point – which is that the same commentariat don’t take him to task for the same facile thinking on any of his posts about the vaccines.

          > I see no reason to emphasize how biased some of the ivermectin promoters have been – fine with me if others point it out,

          I’m not suggesting the one as opposed to the other – but that lifting either piece out of context is going to be, necessarily, incomplete at best and quite likely counterproductive – because you take your eye off the ball. The ball is the overall context of polarization.

          > but the media already did that and continues to show ample reporting bias as in yesterday’s MedPage headline
          “Ivermectin Flops Again for COVID, This Time in High-Risk Adults
          – Malaysian trial finds no difference in progression to severe disease over standard of care”

          I think there’s enough syntactical ambiguity in the wording of the article’s conclusion to make it complicated enough that describing it as “reporting bias” is too simplistic to be very useful. Not that the headline writers, in the best of all worlds, would be more sophisticated in their approach.

          > – That entire headline is false under my reading of the actual trial (it being a most ambiguous study, especially if we care about hard outcomes like death).

          Again, I don’t exactly understand this. It does seem to me that it’s a mostly ambiguous study, but then you seem to be saying (in a reactionary way) that the study had meaningful results on “hard outcomes like death.” I’m struggling with that. And I hate the whole “if we care about” kind of rhetoric – as if anyone involved doesn’t care about hard outcomes like death?

          I dunno. Your whole take on this seems to me kinda reactionary – even after giving thought to your response that I’m reading way too much into your comments.

        • Joshua,

          Not sure if my response will land in the right place but this is in response to your post ending “Your whole take on this seems to me kinda reactionary – even after giving thought to your response that I’m reading way too much into your comments.”
          First, thanks for pointing out the typos, which unfortunately I can’t correct.

          Second, at this point I don’t know what you mean by “reactionary”, which I know of as a political term associated with aggressive conservatism, far from anything I hold. My guess is my use of “mainstream media” provoked your label, and indeed it was a poor choice – I had in mind mainstream medical reporting as seen in Medscape, MedPage Today, and similar sources. Fox News was never my go-to place for pandemic reporting, and I simply took it for granted you would know that. But then I’ve seen ample bias in headlines in the New York Times too, especially in the first year of the pandemic; does noting their editorial transgressions as well make me reactionary?

          Labels aside, can we stick to the facts about how both sides in this controversy (and several others in this pandemic) have had their share of misleading headlines and wish bias, at least according to some ideal of neutral and valid reporting of the research literature? Values that we might both share? Do you agree that no side should be exempt from criticism if they violate these essentially journalistic values? Do you agree that many of the comments in the Twitter link I gave initially reflect the dovetailing of wish bias with prevalent misunderstandings of statistics? Please re-read that Twitter thread before responding.

          As for secondary outcomes, I think the distinction from primary outcomes when all are prespecified has no epistemic justification whatsoever. Randomization can be used to analyze them all in the same way, and indeed a sophisticated analysis would treat them together as a multivariate outcome, taking account of their intercorrelations (which are high and would invalidate multiple-comparisons adjustments like Bonferroni).

          Alas, most statistical thinking and analysis in medical journals is far removed from state-of-the-art methodology or even correct common-sense statistics. As far as I can see on Twitter and medical articles in examples like this one, statistical training and conventions have instead damaged scientific thinking and reporting among actual medical researchers. Here, my quick take is that scientific thinking includes being informed by details of context, the hypothesized causal pathways that link those details, and what those details and pathways might together entail – especially what they entail about warranted uncertainty (which in real human-subject medical research always exceeds the nominal 5% error rate attached to interval estimates).

        • Sander –

          Agreed, “reactionary” was a poor choice.

          As for Medscape, MedPage, etc.,…headlines, NYTimes, Fox News, etc. I think identifying “bias” per se is tricky as it generally suggests mind-probing at some level. But sure, I think that we’ve seen sub-optimal treatment of the ivermectin issue at many levels. Still, again, I think then that makes it all that much more important to identify the actual underlying mechanisms of causality. For ivermectin, I think it’s important to always keep in mind the full context, the multifactorial nature of the polarization. Settling for simpler narratives is, IMO, counterproductive.

          > does noting their editorial transgressions as well make me reactionary?

          Taking “reactionary” back off the table, I think that assigning “bias” to editorial transgressions is simplistic in reaction. In this particular case, the technical issues are complicated. But beyond that, the polarized narratives are complicated as well. Sure, there’s nothing wrong, IMO, with noting the problems or expecting something better. But that isn’t really compatible with simplistic causal narratives, IMO.

          > Labels aside, can we stick to the facts about how both sides in this controversy (and several others in this pandemic) have had their share of misleading headlines and wish bias,

          In a sense, sure. What’s going on, is going on on both sides. I’ve got no problem with that. But “misleading” can either imply intent or not – so I think it’s important to be careful there. And “wish-bias” is, IMO, similarly complicated to assess. What is the “wish,” exactly, that is biasing headlines and the like? What is it in full context?

          > at least according to some ideal of neutral and valid reporting of the research literature? Values that we might both share?

          Yes, we do share those values. But I think perhaps I view the problem as more complex than you.

          > Do you agree that no side should be exempt from criticism if they violate these essentially journalistic values?

          Of course.

          > Do you agree that many of the comments in the Twitter link I gave initially reflect the dovetailing of wish bias with prevalent misunderstandings of statistics? Please re-read that Twitter thread before responding.

          In a sense, yes. But I think the identification of “wish bias” is complicated. And I also think that deconstructing Twitter comment threads is usually of dubious value from some kind of analytical perspective. I mean it’s information – but how to generalize to something larger from that information is tricky.

          > As for secondary outcomes, I think the distinction from primary outcomes when all are prespecified has no epistemic justification whatsoever. Randomization can be used to analyze them all in the same way, and indeed a sophisticated analysis would treat them together as a multivariate outcome, taking account of their intercorrelations (which are high and would invalidate multiple-comparisons adjustments like Bonferroni).

          That gets too technical for me to follow. But let me just make sure I understand. Are you saying that in this case, you see no valid distinction between the primary outcomes and secondary outcomes – and therefore it’s correct to say that the study actually shows that ivermectin is (very) efficacious in the sense that while it didn’t produce significant benefits on the primary outcome of interest it was significantly associated with lower deaths in the intervention group? Or perhaps you’re saying that the findings can only validly be considered as ambiguous for both types of endpoint? If so, are you saying a critical issue is whether the endpoints are prespecified (otherwise, I’m thinking, one could point to any outcomes associated with an intervention and attribute them to the intervention)?

          Again, my technical skills are limited but for me there does seem to be a problem there, as it would seem to me that interventions would generally be designed and executed differently as different outcomes are prioritized.

          > Alas, most statistical thinking and analysis in medical journals is far removed from state-of-the-art methodology or even correct common-sense statistics. As far as I can see on Twitter and medical articles in examples like this one, statistical training and conventions have instead damaged scientific thinking and reporting among actual medical researchers.

          Interesting. Again, I’m skeptical about the causal chain there. Not to say I think that problems don’t exist, but because problems exist doesn’t suffice to explain causality. Has the work of scientists actually been degraded because of inane hot takes (Twitter) or sub-optimal reporting (Medscape)?

          > Here, my quick take is that scientific thinking includes being informed by details of context, the hypothesized causal pathways that link those details, and what those details and pathways might together entail – especially what they entail about warranted uncertainty (which in real human-subject medical research always exceeds the nominal 5% error rate attached to interval estimates).

          What’s interesting to me there is that seems pretty much what I’ve been saying in reaction to what you’ve written about the whole ivermectin polarization.

        • Joshua:

          I did find that post to be surprising in that I thought there was a consensus that any aggregate effects of ivermectin would be small, but the person who wrote that article seemed to think the average effect was very large.

          If you start from that position (which I think requires accepting some questionable studies, but that gets into details which I have not studied), then I think the appropriate reaction to this new study is that it’s one more data point and it represents a scenario in which ivermectin did not have the expected huge beneficial results, which could be explained by the treatment not being applied appropriately in some way.

        • Andrew –

          > I thought there was a consensus that any aggregate effects of ivermectin would be small,

          But the consensus among the proponents is that the effect is huge.

          Almost invariably, those who are convinced that ivermectin is efficacious will respond to research finding no benefit by saying the wrong protocol was followed. The same pattern played out with hydroxychloroquine.

          It turns into an unfalsifiable argument (you can always say the protocol should be different, you need to use it with zinc or vitamin D or vitamin C or whatever) but letting that problem go – I suppose the protocol argument could have merit in any particular application. But the problem is that these same people are saying its benefit is huge – sometimes literally 100% – and a “miracle cure.”
          So then what I don’t get is why they think such a miraculous benefit, when used with the correct protocol, would go to zero benefit when an incorrect protocol was used. Why would there be, effectively, no dose-effect? Doesn’t seem likely to me. And then there’s the whole issue of no evidence (at least to my knowledge) in lab-based research of a mechanism of effect, in vitro, at the doses that they’re recommending in their protocols.

        • This is in reply to Joshua:

          I think your latest post exposes your own prejudices against defenses of ivermectin and criticisms of “negative” evidence, along with unfamiliarity with the history.

          From its start, part of the pro-ivermectin argument was based on the notion of an interaction with zinc metabolism, advising ivermectin be given with supplemental zinc (which has long been promoted for treating colds, another coronavirus disease grouping). It’s a theory pre-specified and long-standing in the topic, not a post hoc excuse invented to refute “negative” trials.

          If you claim otherwise, please provide a thorough review of all pro-ivermectin articles during this pandemic, documenting how every one of ivermectin’s defenders have claimed it has dramatic effects even when given without zinc. Until then I’ll consider it major limitations of the Lim (Malaysia) trial that it was not designed and conducted as a fourfold experiment crossing zinc and ivermectin treatments (and apparently ignored zinc entirely), and that it lacked power to detect plausible effects on the “secondary” outcomes.

          I think the ivermectin controversy can serve as a textbook example in which classical notions of both scientific and journalistic neutrality went to die hand-in-hand. As if to illustrate, given the atmosphere you help maintain in which any criticism of reportedly “negative” trials is taken as ivermectin promotion, I have to repeat that I am not an ivermectin advocate. I simply recognize that one more trial among many does not settle the controversy in any scientific sense, especially when its results are ambiguous both statistically and medically.

        • Sander –

          Apologies in advance for the length of this response. I hope you can get through it – particularly to the question at the end.

          > I think your latest post exposes your own prejudices against defenses of ivermectin and criticisms of “negative” evidence, along with unfamiliarity with the history.

          First, for my prejudices: I think IVM might “work” at some level. From what I’ve seen, the answer isn’t conclusive. I’m reluctant to just dismiss all those who say there’s evidence it has a positive benefit. I think the probability that it works as “miraculously” as some proponents are saying is lower than that it has a marginal benefit. It clearly has become a polarized issue that reflects a political/partisan signal. As such, I have a tendency to align with those who are doubting its efficacy against those who are promoting it. I try to interrogate that tendency for bias. In so doing, I see what I consider to be sub-optimal arguments being made on both sides.

          > From its start, part of the pro-ivermectin argument was based on the notion of an interaction with zinc metabolism, advising ivermectin be given with supplemental zinc (which has long been promoted for treating colds, another coronavirus disease grouping). It’s a theory pre-specified and long-standing in the topic, not a post hoc excuse invented to refute “negative” trials.

          From what I’ve seen, arguments promoting the use of ivermectin span a pretty wide spectrum, coming from a wide range of proponents. Included in there have been some advocates who say that it’s effective when used with a particular recommended protocol in terms of dosage, timing, and what it’s administered in conjunction with. But there are three problems there as I’ve seen it.

          The first is that many of the same people who reject studies that don’t use that particular protocol also point to evidence where it wasn’t used with that protocol, or where it is entirely unknown to what extent it was administered following that protocol, as evidence of its efficacy. So as someone who’s looking at their arguments with skepticism for the reasons I stated earlier, I begin to think that these inconsistencies suggest that the advocates are pushing a goal independent of the actual evidence. That isn’t to say that I’m rejecting the idea that it’s only effective when administered with a particular protocol. I obviously wouldn’t know whether that’s true – and so studies that don’t utilize that protocol could obviously have limited value. OF COURSE that could be correct.

          So what might help to evaluate this question is if there’s lab-based evidence that shows a differential (beneficial) mechanism of causality from IVM when applied in conjunction with the other substances which some proponents are saying are critical. Theories have been put forth, but as far as I know they haven’t been supported with actual empirical evidence.

          To make matters worse, different proponents say that different protocols are critical for it to work.

          And then as I said above – I don’t get why, even if a particular protocol would lead towards optimal efficacy, studies that utilize different protocols would just show no effect at all, as opposed to showing a sub-optimal effect. It would seem that you should be able to look at the data and show an increasing trend in efficacy that parallels the extent to which the protocol used approaches the recommended protocol – a dose effect.

          > If you claim otherwise, please provide a thorough review of all pro-ivermectin articles during this pandemic, documenting how every one of ivermectin’s defenders have claimed it has dramatic effects even when given without zinc.

          I’m not suggesting that every one of its defenders has explicitly claimed that “it has a dramatic effect even when given w/o zinc.” I’m saying that many of the highest profile advocates, as well as advocates more generally, have made arguments that selectively treat the question of whether administration of zinc is necessary for there to be any benefit.

          > Until then I’ll consider it major limitations of the Lim (Malaysia) trial that it was not designed and conducted as a fourfold experiment crossing zinc and ivermectin treatments (and apparently ignored zinc entirely),

          I don’t disagree with that. It certainly was an imperfect study. Ideally it would have included and controlled for any interaction effects with zinc. Since the use of zinc has widely been promoted by the highest profile IVM advocates, such a study would seem more useful. But including zinc would necessarily, significantly complicate any such study. So in the end, I look at that study as ONE study, which provides some information, which has limited value. It’s certainly not dispositive, but I wouldn’t be expecting that.

          > and that it lacked power to detect plausible effects on the “secondary” outcomes.

          I’m a bit confused by you saying that, because it seemed above that you thought that the secondary outcomes were meaningful in terms of assessing IVM’s efficacy.

          I will say this, however. When I see many IVM proponents concluding that the study was “designed to fail” (I’ve seen that a lot), in a way that fits within a larger conspiracy framework that IVM has been specifically rejected by the medical establishment because widespread use wouldn’t generate as much in the way of profits as vaccines, it only lowers my estimation of the likely validity of the arguments that dismiss evidence on the basis of not following a particular protocol. Now there’s a potential element there of seeking out the most extreme and implausible arguments as a way to confirm a “wish bias” as you describe it. Except that it’s not only an extreme outlier set who are promoting that narrative. It’s a narrative that’s being actively promoted by many of the highest profile (and most credentialed) of the advocates. That’s not to say I consider that narrative as 100% impossible – but I do think that for many reasons it isn’t particularly plausible, and when I see advocates expressing total certainty in theories I consider pretty implausible, and I see arguments they make that are “adjacent” to a model of unfalsifiable analysis, my skepticism is enhanced. It’s also true that the promotion of implausible conspiracy theories by some proponents, about why the study was designed as it was, is completely irrelevant to evaluating the study’s value given that zinc wasn’t administered.

          > As if to illustrate, given the atmosphere you help maintain in which any criticism of reportedly “negative” trials is taken as ivermectin promotion,

          If you’re still reading at this point….I’m curious about that. I haven’t said that I consider you as promoting IVM. In fact, I’ve definitively said that’s NOT the case. So then I’m hoping you could point to what it is that I said that has you convinced I do view you that way – despite my saying explicitly that’s not the case.

          > I simply recognize that one more trial among many does not settle the controversy in any scientific sense, especially when its results are ambiguous both statistically and medically.

          The funny thing is here, I agree completely with that.

        • Sander –

          Here’s a very limited window into what I was talking about. Here’s a meta-survey article with Kory as the first author. I did a search for “zinc” and there was exactly one hit. Of course, if I read it in more detail and dug through the references I might find that zinc was used from within their recommended protocol in all (or even most of) the studies they cite as evidence of efficacy, but I doubt it.

          Also, there’s this: “Finally, the many examples of ivermectin distribution campaigns leading to rapid population-wide decreases in morbidity and mortality indicate that an oral agent effective in all phases of COVID-19 has been identified.”

          This was one issue in particular I was thinking of. Kory and other high profile IVM advocates have referred to IVM use in India, in particular, as evidence of its efficacy. It’s been kind of stunning to me that they’ve done so without actually having data (that I’ve seen) on how much IVM has even been administered, let alone what exact protocol has been used. Many claims have been made about “distribution campaigns” in other locations as well (like Japan), just simply on the basis of whether IVM has been included in public health recommendations. It’s been a hand-wavy correlation-equals-causation-palooza, as far as I can tell. That’s not something that inspires confidence.

        • Joshua:
          Thanks for the detailed reply; no need to apologize for length. My problem with your posts stemmed from what I saw as a failure to address the null-biased dogmatism among anti-ivermectin parties (a bias reflected in headlines spinning the trial in question as some nail in the coffin of ivermectin efficacy) in the same way as the huge-effect dogmatism among pro-ivermectin parties. But it seems you may agree with my points that the trial is in fact not very informative given its problems, making it incapable of settling anything and unworthy of the publicity and overinterpretation it has received. If so, we can move on to the bigger picture into which it should be fit (as a small part).

          For that, I would ask you to re-read very carefully the Alexandros Marinos post on ivermectin trials that k.e.j.o. kindly provided, to see what I think constitutes sound scientific analysis of a medical controversy; then judge how the outcomes (primary and secondary) reported in the Lim et al. study would affect the analyses he provides. I also advise re-reading as many of the comments as you can manage:
          https://doyourownresearch.substack.com/p/a-conflict-of-blurred-visions

        • Sander –

          I just spent some time looking at the Marinos article. I had looked at it before (and by coincidence I had responded to a Tweet of his, about this study, just a few days ago).

          I have to say that I’m surprised that you’re so impressed by that article. Honestly, I thought it was pretty piss-poor, and a rather typical contrarian take of the sort I have frequently seen on climate “skeptic” websites.

          I could go into more detail with my criticism if you’d like, but for now I’ll just say that it’s very disappointing that his basic thesis is that his view (that IVM is probably effective) is probably right because it isn’t REALLY inconsistent with Alexander’s view, since Alexander would be in agreement with him if Alexander didn’t just “trust” GMK.

          OK, well that seems like a pretty unscientific argument to start with, but then he argues he’s established that GMK isn’t trustworthy on a pretty weak basis. He claims that his reasoning isn’t an ad hom, and I think that’s questionable – but regardless, what he didn’t do was engage with the content as to WHY GMK felt some of the studies should be excluded.

          Yah. I’m confused as to why you think that article is an example of what science should look like. Seems to me it’s pretty much a good example of what science SHOULDN’T look like. Further, and more importantly, I don’t see how anything presented in that article actually addresses any of the technical questions related to how to interpret the existing literature on IVM.

          I will look at the comments later – as maybe there’s something there which would help me to understand why you think that article is such a tour de force.

        • Joshua,
          My recommendation of the Marinos post is based on the fact that he presents study results and his methodology for selecting and synthesizing them, open for anyone to dissect and criticize; worst I can see is that he doesn’t spell out his formulas, and I know those math-stat choices can make large differences in this setting. Admittedly I’m biased because I already did my own analyses (for a meeting) at about the same time he did his, and I reached basically the same conclusion about what the evidence leans toward (not “shows”) and how disconnected the ivermectin extremists on both sides seem from that evidence (however, I made no discussion of GMK; Marinos explained why he did, although one could challenge his reasons).

          In contrast to the Marinos post, I don’t see where your comments add information to the evidence surrounding ivermectin or how it should be viewed or synthesized, or add a new analysis to counter his, or where you offer any concrete criticism of the methodology he presented. All I can get out of your comments is that you don’t like Marinos or his conclusions. His conclusions are tentative compared to the extremes on both sides, and I think that’s good. If you come up with something he overlooked about study specifics or statistical methods, by all means post it back here and at his blog.

        • Sander –

          > Admittedly I’m biased because I already did my own analyses (for a meeting) at about the same time he did his,

          That’s interesting. So when you did your analysis, did you exclude any of the studies? If so, which did you reject? Did you review GMK’s rejections (which Alexander apparently considered valid – or perhaps, as Marinos suggests, Alexander just accepted based on “trust” in a non-credible source) and reject his rejections?

          If you did evaluate and reject GMK’s rejections, I’d be interested in reading why you did so. Or did you just not evaluate the validity of the studies?

        • Sander, Joshua:

          It’s complicated. I agree that Marinos is clear on what he did; I’m also with Alexander in being concerned about a lot of the studies that are included in those analyses. Marinos is doing a kind of meta-meta-analysis with different exclusion rules, but I have some GIGO concerns here regarding selection bias in what is being looked at in the original studies. Also I’m skeptical in that he reports an average effect of the treatment having 67% improvement. I guess this is measured on different outcomes, but 67% improvement—that’s a huge effect. And now I can see why the Malaysia study is considered big news, because it really does rule out anything like a 67% improvement, at least for this particular version of the treatment in this particular population. I think it’s a weakness of Marinos’s analysis that he takes this 67% as a starting point, as it means that he’s taking at face value a bunch of iffy reports.

          In his post, Marinos writes, “In a remotely sane world, some country’s public health establishment would have sponsored a trial large enough to dwarf anything we have available, and settle the question for good.” This Malaysia study is an example of such a study: it’s large enough to rule out the purported huge benefit of ivermectin that was found from that controversial meta-analysis. But, no, the Malaysia study does not settle anything for good, because there are lots of outcomes that can be looked at and there are lots of ways for ivermectin to be used.

          Also, yes, this is all political. Go into the comment thread of Marinos’s post and right away you see a bunch of vaccine skeptics and conspiracy theorists. From the other direction you have lots of overconfidence (as in those horrible “Don’t worry about coronavirus” posts by Meyerowitz-Katz from early 2020). There’s lots to distrust all around. It’s a goddam minefield.

        • BTW –

          Perhaps a last comment.

          > All I can get out of your comments is that you don’t like Marinos or his conclusions.

          Just to correct this:

          First, I don’t know Marinos, so I have no take on him personally.

          Second, it’s true that I don’t “like” his conclusions in a sense. But that’s because I think the argument he presents to support his rejections of GMK’s rejections is weak. I take absolutely no issue with his assessment of the results of the studies as a technical matter. I have no reason to doubt Marinos’ analysis indicating that the studies (without GMK’s exclusions) show that early treatment with IVM netted a 67% improvement. (I couldn’t say either way because I’m not capable of evaluating that analysis). If, in fact, GMK’s suggested exclusions (which Alexander apparently seemed to think were valid) are invalid, then I could see where Marinos’ analysis is an improvement on Alexander’s. But I think that Marinos’ explanation for why he rejected GMK’s rejections is weak, for the reasons I explained. Apparently you think all those studies should be included in any analysis?

          So as for “liking” his conclusions, to be clear, of course if IVM is effective, as Marinos suggests with his analysis, then I would “like” that, as it would mean that there is an effective treatment that could prevent illness and death, and perhaps offer me (my friends, my family) protection in particular.

        • Andrew –

          > but 67% improvement—that’s a huge effect.

          But maybe that’s actually a conservative estimate! The leading advocate for IVM, in testimony before Congress, says it’s a “100% cure!!”

        • Andrew: Yes it’s complicated and informed comment thus requires careful reading of the articles and blog posts, including the Marinos article.

          You said “the Malaysia study … really does rule out anything like a 67% improvement, at least for this particular version of the treatment in this particular population” – the differences in endpoints undercut that description: the endpoints (outcomes) used in the Marinos meta-analyses are, according to Marinos, “the most serious endpoints” reported in the trials. All the most serious endpoints in the Malaysian trial had wide confidence intervals centered on strong inverse associations with 67% well within the intervals.

          The endpoint you referred to in the Malaysian trial is the least serious of those it reported, and there are causal hypotheses as to why there would be an increasing gradient in effect going from less to more severe – not that those are correct, but again the differences in endpoints undercut your description.

          Also, the effective size (or power) of the trial for these serious endpoints is very small, so it is also a mistake to claim that “it’s large enough to rule out the purported huge benefit of ivermectin that was found from that controversial meta-analysis.” – once again the differences in endpoints undercut your description.

          Returning to your GIGO comment, have you reviewed all the studies in that article’s most strict meta-analysis to verify that the concerns raised about ivermectin RCTs apply to those articles? The Marinos article does not present one meta-analysis, it presents several based on various exclusion criteria, such as omitting studies declared fraudulent by GMK. His final analysis uses so many exclusions (including GMK’s) that there are only 9 RCTs left out of about 30 possible candidates at the time (there are at least a dozen more trials than that but they hadn’t posted results back when Marinos wrote his article). He displays how the point estimate hardly budges from the increasing exclusions, although as must happen the interval estimate widens. Are you claiming that even with his strictest selection (~70% exclusion) the remaining 9 trials constitute “garbage in”? If so, please support your assertion by presenting the problems in each included study that make its inclusion more concerning than would the problems in the Malaysian study (lack of blinding, poor treatment timing, no consideration of adjuvant treatments).

        • Joshua:
          You wrote back as if Marinos did not use GMK’s exclusions, which is false: He applied GMK’s exclusions in his final meta-analysis, and showed that they hardly changed the point estimate, even though the exclusions widened the interval estimate as they must. That’s all made clear in Marinos’ graphic below his “Overall, the 5 pre-announced analyses put together, end up looking something like this.” Marinos also opined that GMK was wrong to single out ivermectin RCTs as having a high “junk” rate, and he presented data to support the idea that such high rates are not unusual in the world of medical trials. From your description of Marinos I can only conclude that you are neither reading the material carefully nor understanding the statistics.

        • Sander:

          It’s hard for me to say because there’s so much going on and this is not an area of my expertise. Here are a few points. I’m following the Scott Alexander style of laying out the different arguments as I see them.

          1. You’re an epidemiologist so I’m inclined to trust your judgment on this one.

          2. I agree that Marinos is clear about what he does; overall, though, I’m more impressed by Alexander’s post because he goes through the individual studies. Marinos is starting with that ivmmeta.com analysis and I think that gives him a huge anchoring bias. As Alexander says, some of the studies reported there are really bad, so I think it’s a mistake to use that as a starting point. I accept the idea of including non-randomized trials—we want to use all the information that we can—but look at Alexander’s discussion of some of these. I don’t see how it makes sense to ever consider these in a serious analysis as counted equally to higher-quality studies.

          3. I took a look at Marinos’s final analysis excluding all the studies that were excluded by any of the groups:

          It seems that most of the information must be coming from the Bukhari study, as its confidence interval is so much narrower than all the others.

          But, given the data summaries above, I was surprised that the meta-analysis inference was so strong (the estimate at the bottom of the above graph). So I typed in the numbers and did my own meta-analysis, using the standard Bayesian template as in chapter 5 of BDA.

          The data, stored in the file meta_analysis_data.txt:

          study       est  se
          Ahmed       -1.897 1.505
          Bukhari     -1.715 0.489
          Buonfrate    1.955 1.505
          Chaccour    -3.219 1.681
          Krolewiecki  0.924 1.634
          Lopez       -1.109 1.668
          Mahmud      -1.966 1.551
          Mohan       -0.968 0.795
          Ravikirti   -2.207 1.523
          Together    -0.198 0.321
          Vallejos     0.285 0.760
          

          The Stan program, stored in the file meta_analysis.stan:

          data {
            int J;
            vector[J] est;
            vector[J] se;
          }
          parameters {
            real mu;
            real<lower=0> tau;
            vector<offset=mu, multiplier=tau>[J] theta;
          }
          model {
            est ~ normal(theta, se);
            theta ~ normal(mu, tau);
          }
          generated quantities {
            real theta_new = normal_rng(mu, tau);
          }
          

          The R code:

          library("cmdstanr")
          data <- read.table("meta_analysis_data.txt", header=TRUE)
          stan_data <- list(est=data$est, se=data$se, J=nrow(data))
          model <- cmdstan_model("meta_analysis.stan", pedantic=TRUE)
          fit <- model$sample(data=stan_data, parallel_chains=4, refresh=0)
          fit$print(max_rows=100)
          

          And the output:

            variable   mean median   sd  mad     q5   q95 rhat ess_bulk ess_tail
           lp__      -11.05 -10.77 3.34 3.29 -17.08 -6.00 1.00      959     1541
           mu         -0.81  -0.80 0.43 0.39  -1.53 -0.13 1.00     1490     1829
           tau         0.90   0.81 0.54 0.47   0.19  1.85 1.00     1122     1039
           theta[1]   -1.09  -1.00 0.89 0.77  -2.71  0.24 1.00     3504     2793
           theta[2]   -1.40  -1.39 0.49 0.50  -2.21 -0.61 1.00     2541     2621
           theta[3]   -0.07  -0.24 0.96 0.82  -1.34  1.72 1.00     2577     2346
           theta[4]   -1.33  -1.17 0.96 0.85  -3.09  0.02 1.00     3191     3143
           theta[5]   -0.41  -0.47 0.87 0.74  -1.69  1.15 1.00     3340     2922
           theta[6]   -0.88  -0.83 0.88 0.73  -2.41  0.51 1.00     4462     3107
           theta[7]   -1.08  -1.00 0.84 0.73  -2.56  0.15 1.00     4143     3332
           theta[8]   -0.87  -0.84 0.61 0.57  -1.93  0.11 1.00     5155     3330
           theta[9]   -1.14  -1.05 0.87 0.79  -2.72  0.10 1.00     3534     3384
           theta[10]  -0.31  -0.32 0.31 0.32  -0.82  0.21 1.00     3389     2908
           theta[11]  -0.23  -0.28 0.61 0.61  -1.13  0.85 1.00     3447     3284
           theta_new  -0.81  -0.79 1.14 0.84  -2.67  1.00 1.00     3015     3088
          

          All this is assuming the numbers in that table are correct summaries of the studies. Everything's on the logit scale. The result:

          - Inference for mu, the average effect in the hypothetical superpopulation of studies: the estimate is -0.81 with a 90% posterior interval of [-1.53, -0.13]. Exponentiating gives an estimate of 0.44 and a 90% interval of [0.22, 0.87]. That's all on the odds scale, so 0.44 corresponds to a multiplication of the odds of a bad outcome by 0.44, etc., if I'm reading this all correctly.

          - Inference for theta_new, the effect in a new study sampled from the hypothetical superpopulation: the estimate is -0.81 with a 90% interval of [-2.67, 1.00]; exponentiating gives an estimate of 0.44 and a 90% interval of [0.07, 2.71], i.e. a new study could have a true effect on the odds of a bad outcome of somewhere between a factor of 0.07 and a factor of 2.7. Neither of these extremes sounds even remotely plausible; the wide uncertainty arises from the flat prior on tau. (A short R snippet for pulling these exponentiated summaries from the fit follows below.)
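
          Here is that sketch (it uses the posterior package, which isn't loaded in the R code above); the 5% and 95% quantiles of the exponentiated draws give the 90% intervals:

          library("posterior")
          draws <- as_draws_df(fit$draws(c("mu", "theta_new")))
          # 90% intervals on the odds scale, from the logit-scale draws
          quantile(exp(draws$mu), c(0.05, 0.50, 0.95))
          quantile(exp(draws$theta_new), c(0.05, 0.50, 0.95))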

          I'm not claiming that my simple Bayesian meta-analysis is definitive; I'm just using it to explore what these particular numbers can tell us, and where additional assumptions can come in.

          Anyway, the point is that Marinos's strong conclusion is coming essentially from this one study, looking at the PCR results of 86 patients after a 14-day trial.

          4. What about that Malaysia study? It reports results on progression to severe disease on 490 patients. As I wrote above, I didn't find the result to be particularly newsworthy on its own, as I don't typically expect to see controversial treatments showing factor-of-2 effects in either direction---but this study seems stronger than the Bukhari study that is driving the meta-analysis. In your comment you discuss seriousness of outcomes, but "progression to severe disease" is pretty serious. If you want to restrict to "most serious," then you can just use the death outcome for all the studies---but then the Bukhari study could be coded as 0/45 vs. 0/41. So I don't think that it really works to apply this most-severe-outcome rule in interpreting the studies.

          5. To loop back to the basics: a key question seems to be if ivermectin can have huge effects in some settings. I'm generally suspicious of claims of huge effects, but, sure, I get it that some treatments really can have large effects. That's why this discussion has convinced me that the Malaysia study is more valuable than I'd thought at first. In a scientific context, it's just one data point. But in the context of these debates, this one data point can make a real difference.

        • The Marinos article does not present one meta-analysis, it presents several based on various exclusion criteria, such as omitting studies declared fraudulent by GMK. His final analysis uses so many exclusions (including GMK’s) that there are only 9 RCTs left out of about 30 possible candidates at the time (there are at least a dozen more trials than that but they hadn’t posted results back when Marinos wrote his article). He displays how the point estimate hardly budges from the increasing exclusions, although as must happen the interval estimate widens.

          Can you explain what this demonstrates? I’m not getting it.

          Let’s ignore all further developments that took place after the Marinos post and just engage with the logic in that post.

          I don’t care about the point estimate, and I do believe the excluded studies have data irregularities that disqualify them from being taken seriously. Thus, my final conclusion is that there isn’t enough data to draw a useful conclusion.

          What additional insight does the fact that my exclusions wouldn’t have changed a point estimate grant me?

          Let’s take his point that the people who have discovered the data irregularities are biased in favor of the established “respectable” opinion (a good point, well argued, in my opinion). Does that tell me I should actually ignore the data irregularities and don’t use the exclusions? Or that I should split the difference somehow between using and not using them? I don’t think so–the disqualifying data irregularities are still there. In my opinion, it means I should go through with a fine-toothed comb and look for more data irregularities, especially in the studies that are widening the confidence interval towards the negative.

        • Andrew: Let me once again clarify and expand that I am not advocating ivermectin, nor do I think it has as large an effect as shown by the meta-analyses – for one thing, like you (I believe) I have a null-centered prior to begin with. What I do see is Marinos’ point that the evidence does lean in the direction of a risk reduction for the most consequential outcomes, however you whittle studies down based on critical claims like those from GMK and Alexander.

          The key point here is empirical: As Marinos whittles the study pool down, the summary estimate stays in the 60-70% reduction range for the endpoints labeled “most serious”, with the interval estimate expanding as it must. The question should be: What are plausible explanations for this phenomenon? You focus on Bukhari but you don’t note that it could not have driven the much larger pool of studies so heavily, yet that pool showed the same estimate.

          But fine, let’s see what happens when we drop Bukhari: We still get an apparent risk reduction, although now the interval is so wide that nothing initially plausible is excluded, and any firm conclusion has to be a product of one’s prior beliefs, not the trials. Adding the results of Lim (the Malaysia study) based on the same “most serious endpoint” criterion in use before that was published doesn’t change that outcome. If you want to add consideration of study problems to this judgement, you have to address the problems of the Lim study, which I did not see you do before declaring it such an especially important one.

          Absent such in-depth analyses, I will repeat that the Lim et al. result is ambiguous: it can be explained or “spun” as either supporting or refuting ivermectin benefit, and there are no data that can knock out these explanations. So no, the Malaysia study is not particularly valuable; that is just a misimpression based on one spin of many equally defensible spins.

          What we are left with is that, after going on 2 years and by now something on the order of 40 RCTs plus many observational studies, the controversy remains. What bothers me most in this saga is the degree of anchoring in what should have been tentative prior beliefs, whether those are that ivermectin is a miracle drug or that ivermectin cannot have more than a trivial benefit. There follows from that anchoring a pattern of literature discussion conducted as if any study or analysis that can’t be spun as supporting the prior must be scrutinized to find flaws (which exist in all studies) to reject it, while any study that can be spun as supporting the prior is hailed as an important game changer and given a pass on its flaws. Alexander was good in spotting and avoiding that trap; you mention that but again didn’t scrutinize Lim et al. that I can see.

          As for politics, I think there’s a lesson somewhere in contrasting the ivermectin situation to another covid treatment: molnupiravir. That was authorized based on just a trial or two whose full data were unavailable to the public. What is available indicates that molnupiravir has no more benefit (30% risk reduction when properly timed) than does ivermectin in the more modest claims (beyond the covid topic, aducanumab didn’t even meet its prespecified efficacy criterion for the primary endpoint yet was approved by the FDA last June). I’ve seen none of the prior skepticism for the molnupiravir effect that was on ample display for ivermectin, despite general arguments to apply such priors routinely when evaluating efficacy claims for new drugs (e.g., see van Zwet et al., Significance Dec. 2021). It’s interesting to note that Merck manufactures both ivermectin and molnupiravir, but the price difference is an order or two of magnitude thanks to ivermectin being off-patent and molnupiravir being on; so perhaps the lesson is simply one about trial financing (no one immediately stepped up with an approval-worthy trial of ivermectin because there was no profit potential in it).

        • Sander –

          > I’ve seen none of the prior skepticism for the molnupiravir effect that was on ample display for ivermectin,

          I’ve seen plenty of skepticism about molnupiravir. For example, a mainstream article – depending on how we define mainstream:

          https://www.forbes.com/sites/williamhaseltine/2021/11/01/supercharging-new-viral-variants-the-dangers-of-molnupiravir-part-1/?sh=67f525266b15

          > It’s interesting to note that Merck manufactures both ivermectin and molnupiravir, but the price difference is an order or two of magnitude thanks to ivermectin being off-patent and molnupiravir being on; so perhaps the lesson is simply one about trial financing (no one immediately stepped up with an approval-worthy trial of ivermectin because there was no profit potential in it).

          A couple of comments. Molnupiravir is apparently not being utilized very much. Compare that to fairly widespread use of IVM. So I wonder if, despite the lower costs, the profit aspect gets a bit complicated because comparing actual profits might tip the scales towards IVM. Adding to that issue is that quite a few (and some high profile) advocates for IVM are making money by offering a service where it is made available – sometimes in ethically questionable ways (e.g., providing it to people who are taking it in hospitals w/o consultation with treating physicians).

          Also, I would say it might be relevant that IVM is already approved for other use and available for off-label use, when considering why there’s less momentum for IVM trials than for molnupiravir, where it could only be made available after a trial. Sure, profit motive might be in play. But consider a more mundane explanation: medical providers actually care about providing help to sick people, and exploring a new treatment that is completely unavailable might legitimately have more appeal than exploring a treatment that is already effectively available.

        • > What is available indicates that molnupiravir has no more benefit (30% risk reduction when properly timed) than does ivermectin in the more modest claims

          On the other hand, if we want to look at the hardest endpoint available in the trial, molnupiravir has shown a (barely) statistically significant 90% reduction in deaths.

          https://www.nejm.org/doi/full/10.1056/NEJMoa2116044

          One could also say that in the “more modest claims” ivermectin has shown no benefit. As far as I know there isn’t a collection of successful and unsuccessful trials for molnupiravir. There is a phase 3 trial which was stopped early when an interim analysis showed efficacy. Based on that one trial Merck was granted an authorization for emergency use (which has some limitation and is not the same as a drug approval).

        • As a fan of irony, I do want again to point out an amusing overlap of the conspiracy-hued reasoning behind why IVM hasn’t achieved more purchase (i.e., “because Big Pharma”) and the history of the drug.

          IVM was developed with the expenditure of considerable intellectual and financial resources – largely to treat an illness for which hardly anyone afflicted could pay for expensive medication (river blindness). To add even further to the irony, considering the overlap between anti-WHO rhetoric and IVM advocates, the drug was distributed largely through a collaboration between Merck and, you guessed it, the WHO (oh, and also Big Government).

          I should also note the irony of my appearing as an apologist for Big Pharma (and Big Government). My father is probably rolling over in his grave.

          I’ll also note that there is an issue of scale, but (again) if we’re going to reverse engineer from profit-taking, IVM might not be the hill to die on (even if we don’t consider the money being made by those prescribing IVM).

          Since July 2020, Taj Pharma has sold $5 million worth of the pills for human use in India and overseas.

          https://www.bloomberg.com/news/articles/2021-10-13/ivermectin-demand-sends-sales-soaring-for-foreign-generic-drugmakers?utm_content=business&utm_campaign=socialflow-organic&cmpid=socialflow-twitter-business&utm_source=twitter&utm_medium=social

        • Thanks Carlos for the NEJM link. Notable perhaps is how (again) the more severe outcome has a far larger estimated % reduction than the less severe one.
          A thought is that this phenomenon may in part be due to the smaller event numbers for more severe outcomes (corresponding to the small-sample/sparse-data bias well known to afflict ratio estimates);
          as a rough first step to account for that I add 1 to each event count (equivalent here to putting a mean-zero logistic prior on the log ratio), which here would drop the death reduction estimate to 80%.
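
          To make that adjustment concrete, here is a toy R sketch with purely illustrative counts (not the actual trial numbers): with 1 vs. 9 deaths in equal-sized arms the crude risk ratio is about 0.11, roughly a 90% reduction, and adding 1 to each event count pulls it to 0.2, an 80% reduction.

          # toy illustration of the add-1 adjustment for sparse event counts; counts are made up
          deaths <- c(trt = 1, ctl = 9)
          n      <- c(trt = 700, ctl = 700)
          (deaths["trt"] / n["trt"]) / (deaths["ctl"] / n["ctl"])              # crude RR ~0.11 (~90% reduction)
          ((deaths["trt"] + 1) / n["trt"]) / ((deaths["ctl"] + 1) / n["ctl"])  # adjusted RR 0.2 (~80% reduction)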

          The controversy suggests an experiment which, while perhaps infeasible, may be interesting to ponder:
          Using participants that had no knowledge of this trial but had strong measured baseline opinions on ivermectin, randomize the article to two groups (blocking on baseline opinion), with the journal location blanked out,
          and for the one given to the treated group substitute “ivermectin” for “molnupiravir” everywhere so that it is presented as a trial of ivermectin.
          Lots of variations are possible, like including journal title as a crossing factor and randomizing that to NEJM vs. some very obscure journal title that includes a developing country name.

          You wrote:
          “One could also say that in the ‘more modest claims’ ivermectin has shown no benefit.”:
          I strongly disagree. A claim of no benefit is vastly overconfident in the null, as badly as 90% benefit claims.
          The majority of estimates so far fall to the beneficial side and for all we know those are reflecting some real benefits; that they are all just errors is a pretty strong belief.

          You also wrote:
          “As far as I know there isn’t a collection of successful and unsuccessful trials for molnupiravir.”:
          As far as I know there is no such collection for ivermectin either, unless by “successful and unsuccessful” you mean some measure of success based on getting a benefit estimate with p<0.05, getting in a high profile journal, receiving no challenge to the trial's integrity, and securing FDA authorization. I suspect that's Merck's idea of success.
          As for the science however, I don't like talking in terms of success based on observed endpoints. Success is getting funded and conducting the study in a well-documented way to produce reliable data.
          The difference is then that (as far as I know) there's only one published study of molnupiravir, looking highly reliable on its face, but the study conduct records and raw data aren't available for auditing or verification;
          while there is a wide spectrum of results for ivermectin from many trials whose reliability has more often than not been questioned (which is good if done blind to the trial outcome) and in some cases have supplied their raw data which led to discoveries of fatal problems.

        • Sander:

          You write:

          As Marinos whittles the study pool down, the summary estimate stays in the 60-70% reduction range for the endpoints labeled “most serious”, with the interval estimate expanding as it must. The question should be: What are plausible explanations for this phenomenon? You focus on Bukhari but you don’t note that it could not have driven the much larger pool of studies so heavily, yet that pool showed the same estimate.

          So, there are two things going on. The first is that the strong conclusion obtained from the final meta-analysis is essentially entirely driven by the Bukhari study. The second is, as Alexander and Meyerowitz-Katz discussed, the larger meta-analysis contains studies with a lot of problems. I just don’t think that larger meta-analysis, which includes all those problematic studies, is particularly useful, as I have no reason to think that the biases would cancel each other out. This is why I said that I disagree with Marinos’s decision (and yours) to use this larger meta-analysis as a starting point.

          Regarding the “most serious endpoint” criterion: I don’t buy this at all. Again, if you want to look at death, which in this case is indeed the most serious endpoint, then you should be looking at death for the Bukhari study, in which case the results from meta-analysis completely disappear.

          Regarding problems of the Bukhari and Lim studies: That’s right, I did not talk about that, except to say that each of these studies is measuring a different treatment in a different setting on a different group of people with different outcomes. Any meta-analysis is problematic. And that’s why I described the Lim study in my above post as “one data point.” The Bukhari study is one data point too—actually a noisier data point—but it’s what is anchoring Marinos’s strong conclusions.

          I am not attacking the Bukhari study or defending the Lim study. What I’m saying is that, from the Marinos post, I see that some people have assessed the evidence and view there to be good evidence that the ivermectin treatment has strongly positive effects, reducing bad outcomes by two-thirds or more. In Marinos’s post, that view is supported largely by the Bukhari study. When I first saw the Lim study, I thought it was no big deal because all it was doing was ruling out huge effects in either direction—but after seeing the Marinos post, I realize that there’s value to ruling out huge effects, so now I think the Lim study adds something useful. In the same way that the Bukhari study adds something useful. Each is a data point.

          If there had earlier been a consensus that any effects of ivermectin would be relatively small, for example decreasing or increasing bad outcomes by no more than 20%, then I’d say that the Lim study didn’t tell us much. I’d just assumed that there was such a consensus, but this discussion makes me realize that I was wrong: the studies are a mess, the discussion out there is a mess, lots can depend on how the treatment is applied, who it’s applied to, and when in the time course it’s applied, so, yeah, one more study, if trustworthy, can be informative.

        • > You also wrote: “As far as I know there isn’t a collection of successful and unsuccessful trials for molnupiravir.”:
          As far as I know there is no such collection for ivermectin either, unless by “successful and unsuccessful” you mean […]

          By “unsuccessful” I meant for example the study under discussion. I don’t know what your idea of “a successful trial” is, though.

          Anyway, I don’t particularly care about ivermectin. Or about molnupiravir for that matter. Which, by the way, looked better at the interim analysis – which is why the trial was stopped early – but efficacy came down from 50% to 30% with the additional data.

          I just found the comparison inadequate. We agree that there are many differences, including the fact that it’s much easier to do a clinical trial properly when you are trying to sell something. (It goes both ways, though: you also need to be confident about what you’re doing if you’re throwing millions of dollars at it.)

        • Andrew: First, your post at its end shows on my screen as reproducing what I wrote as if you wrote it. I hope you can fix that.

          Now I’m aggrieved – you are misrepresenting what I wrote, slipping into straw-man argumentation:
          You said
          “This is why I said that I disagree with Marinos’s decision (and yours) to use this larger meta-analysis as a starting point.”
          It’s not a “starting point” or reference point (for me anyway), it’s simply something that’s been done before and serves to illustrate the impact of selection on the estimates.
          You presenting it as you do makes it sound like I give credence to the effect estimate from the larger analysis, which I certainly do not. I too want a more restricted analysis.

          You say “Again, if you want to look at death,” – I do but I also want to look at endpoints like ICU admission and mechanical ventilation, as those are associated with some of the most serious long-term sequelae.

          You say the “Bukhari study is what is anchoring Marinos’s strong conclusions.” I think that’s a misrepresentation of his methods. It’s contributing to them but he cites other lines of evidence as contributing too.
          His main point is not about ivermectin, but on how he thinks lines of evidence should be merged as opposed to how others have done it in this example. It’s a general argument which has appeared in other forms in many other fields (including epidemiology), as he cites.

          You obliquely misrepresent what I said in your stating “one more study, if trustworthy, can be informative.” I did not say Lim et al. was uninformative; I said its results were ambiguous and not about to settle the controversy. In other words, it’s not as informative as the headlines or you made out. It may be good for the sociomedical goals of staunching ivermectin usage and inflated claims. Is that part of what you call scientific importance?

          As far as its trustworthiness, that depends on what you mean. I do not think we should trust confidence intervals to capture the uncertainty warranted about Lim et al. in light of its already mentioned problems. The same could likely be said of almost all the studies – positive, negative, whatever. If we expanded those intervals to capture uncertainty beyond the impoverished oversimplification of confidence intervals (which assume the study was perfectly conducted and analyzed), the entire set of results will look even more ambiguous than it does already (although I’d bet the posterior will still tilt a bit in the benefit direction, based on what I’ve seen in real examples of expanded uncertainty analyses involving priors on bias distributions).

          Finally, I should add a technical caution that every ivermectin meta-analysis I’ve seen used what might be charitably called suboptimal statistical methods for combining studies (suboptimal even if all are perfectly conducted and reported). These methods are alas standard in software, and sometimes matter – at least if one thinks whether p is above or below 0.05 is a central point to emphasize.

        • Sander:

          1. Thanks for pointing out the cut-and-paste error in my comment; I fixed it.

          2. When I said you were using the larger meta-analysis as a reference point, I was referring to your statement, “You focus on Bukhari but you don’t note that it could not have driven the much larger pool of studies so heavily, yet that pool showed the same estimate.” I don’t think the estimate from that pool is relevant, as this is a pool that has lots of bad studies, as is discussed by Alexander and Meyerowitz-Katz in the above link.

          3. It’s fine to look at death and endpoints like ICU admission and mechanical ventilation. I was just commenting on your statement that, “Adding the results of Lim (the Malaysia study) based on the same ‘most serious endpoint’ criterion in use before that was published doesn’t change that outcome.” My response is that the “most serious endpoint” criterion was not used for the Bukhari study; if it had been, the data for that study would’ve been 0/45 vs. 0/41.

          4. I agree with you that the Lim results were ambiguous and not about to settle the controversy. Indeed, that’s what I said in my above post. At best, the Lim results rule out some extreme effect sizes for one particular version of the treatment in one place. As I wrote above, it’s one data point, just as the Bukhari study is one data point.

        • Andrew: OK, thanks for clarifying your positions…

          2. We disagree about the relevance of the larger inclusion (smaller exclusion) meta-analysis: It is the only data I know of providing some information about the net impacts of bias sources on study results. Its comparison to more restricted and hopefully more valid meta-analyses is a source of prior information for bias analysis.

          3. OK, good point.

          4. OK if amended to “At best, the Lim results refute some extreme effect sizes for one particular version of the treatment and outcome in one place.”
          The Lim result can’t rule out anything unless you assume any contradiction it faces from another study using the same variables and population has to be due to problems with that other study, not with Lim et al. I would not assume that for reasons given in 5 below.

          Regarding lines of evidence, part of our dispute may stem from what some see as excessive weight placed on RCTs, which can have plenty of practical problems and sources of bias. The mess of the ivermectin literature could almost serve as a case-study of why RCTs should not be taken as the supreme arbiter of what is claimed as medical knowledge: They can have all sorts of problems that call into question their consideration for some discussants (as has happened for some 70% of the ivermectin trials). Other lines of evidence need to be integrated with RCT information, hopefully in a transparent fashion. There’s a huge debate on the limits of randomized trials in the social and health science literature, e.g., see
          https://www.sciencedirect.com/journal/social-science-and-medicine/vol/210/suppl/C
          https://larspsyll.wordpress.com/2022/02/16/do-rcts-really-control-for-lack-of-balance/
          https://link.springer.com/article/10.1007/s10654-017-0230-6

          5. One more crucial point on which I think we may agree: That an analysis (e.g., that in Marinos) made what we would judge as mistakes or oversights does not (at least for me) void its status as scientific. The latter label identifies not whether it is correct but rather whether its methods meet some preconditions for producing reliable outputs from reliable inputs (preconditions which are controversial, as is the meaning of “reliable”).

          Everything is imperfect, especially our knowledge and most of all our assessments of its extent and accuracy; that includes our assertions about the mistakes of others. I think statistics causes a lot of harm via uncertainty laundering (a favorite of mine among your phrases), due to giving us interval estimates with labels like “95% confidence” or “95% probability”. In most of the applications I see, those percentages are derived under models we know are wrong in ways that, if rectified, could greatly expand the intervals and our uncertainty. Any discussion of these intervals in practice thus needs to reduce that 95% in some way to account for that – sometimes down to more like 50% if we did a dispassionate review of the many and possibly conflicting causal explanations for the observations; for a real example see sec. 19.3.3 in https://link.springer.com/referenceworkentry/10.1007/978-0-387-09834-0_60.
          As Judea Pearl has emphasized and I reviewed for an upcoming festschrift at https://arxiv.org/abs/2011.02677, statistics as taught does not seem to address the crucial nature of this coverage for spotting our information gaps, and thus leads to understatement of warranted uncertainty levels. To ignore this problem is to encourage overconfident inferences.

      • Sander –

        I thought I made it clear from the top that I know next to nothing about statistics (which would naturally lead to the question of why I’m commenting on a statistics-based blog, but that’s another matter). So… ​

        I can only conclude that you are neither reading the material carefully nor understanding the statistics…

        Both are true. Because I don’t understand the statistics, I skim the more technical discussions (hoping I can parse enough to draw general concepts).

        ​You wrote back as if Marinos did not use GMK’s exclusions, which is false:

        Looks like that (very rough process) steered me in the wrong direction in this case – where I missed that he was saying that even working with GMK’s exclusions, he finds a much larger benefit from IVM than what Alexander found. Thanks for correcting my misunderstanding. Not terribly relevant, but just by way of defense…

        but his trust in GidMK waters down the power of the dataset to the point where the effect starts to look uncertain, even if ivermectin still looks more likely than not to have a significant effect….What I intended to show, and I think this exercise makes clear, is that Scott’s conclusions are very much dominated by what he excluded because of GidMK. Had he not done so, the conclusion of his analysis would have been very different.

        Which I took to mean that rejecting or accepting GMK’s exclusions was the critical factor in whether Alexander’s analysis stands up, and in determining whether or not a meta-analysis supported the use of IVM.

        And what I missed (where I put in the ellipses) is this:

        I must admit that I did not expect that after all the exclusions we would still be in commonly accepted as “significant” territory, but them’s the maths.

        As for this:

        Marinos also opined that GMK was wrong to single out ivermectin RCTs as having a high “junk” rate, and he presented data to support the idea that such high rates are not unusual in the world of medical trials.

        I don’t see GMK’s characterizing or singling out the IVM RCTs as being more prone to “junk” than other RCTs as terribly important to the larger discussion. IMO, the critical factor would be to go through the content of why GMK excluded what he did to examine the validity of the exclusions, and not to rely on (1) backwards engineering from a cherry-picked inventory of when GMK was wrong or said something tribalistic (did GMK ever say anything right, or anything inconsistent with Marinos’s conclusion that his input can just be rejected as a tribal approach? If he did, shouldn’t that fit into the reverse engineering?) and (2) to blithely assume that Alexander was working from “trust” in a non-credible source (maybe he was, but how would Marinos know that?).

        That logic is, IMO, highly problematic. “Somebody” above seems to think the case against GMK was well-made, so maybe I need to revisit that. But IMO, Marinos excuses ad hominem logic by just asserting it isn’t really ad hominem…. Notice here: “If Scott’s analysis depends on his trust on GidMK, it is paramount that I demonstrate that GidMK’s track record is not one deserving of that trust.”

        So Marinos says “if” X, but then just assumes that “X” is the case without really establishing that it is the case. Now maybe he’s right, and Alexander was just working from “trust.” And maybe he’s right that GMK just isn’t credible. But that’s a whole logical rabbit hole that I think is entirely unscientific. What would make sense, to me, is for him to evaluate GMK’s criteria for exclusion.

        That’s why I asked you what your take is on GMK’s exclusions and whether or not your analysis included the articles GMK excluded. I’ve noticed that throughout this thread I’ve asked you a number of questions that you didn’t answer. It’s certainly your right to not answer questions, of course. But I would nonetheless appreciate it if you’d answer my questions.

        • Sorry Joshua about the unanswered questions, but you ask a lot and I simply don’t have time for them all. At least some you could answer for yourself if you studied the technical details of the materials and general methods in play. If you e-mail me directly with your background, perhaps I could recommend some useful articles or books for that. In the meantime, I hope you find my exchange with you through yesterday and my discussion with Andrew to be of some help.

    • > the trial protocol did not come close to most treatment protocols promoted by ivermectin advocates (e.g., […] it used no zinc when the advocates’ rationale for ivermectin is that it synergizes and thus must be given with zinc)

      I don’t know about “most protocols” but a search for [ ivermectin protocol ] leads to the Front Line COVID-19 Critical Care Alliance’s protocols:

      https://covid19criticalcare.com/wp-content/uploads/2020/12/FLCCC-Protocols-–-A-Guide-to-the-Management-of-COVID-19.pdf

      They seem to be ivermectin advocates – it’s the first treatment they mention in all settings.

      Zinc appears in the document, but not directly linked to ivermectin. Zinc is mentioned in combination with hydroxychloroquine though.

      Zinc appears in the ‘ Pre- and Postexposure Prophylaxis’ section ‘Nutritional Supplements (in order of priority, not all required)’.

      With low priority: Vitamin D, Curcumin (Turmeric), Nigella Sativa (black cumin) and honey, Vitamin C, Quercetin, Zinc, Probiotics, B complex vitamins.

      Zinc appears in the ‘Symptomatic Patients At Home’ section ‘First Line Treatments (in order of priority, not all required)’.

      It has the lowest priority of all, following Ivermectin, Hydroxychloroquine, Oropharyngeal sanitization, ASA, Melatonin, Curcumin (turmeric), Nigella Sativa (black cumin) and honey, Kefir and/or Bifidobacterium Probiotics, Vitamin D3, Vitamin C and Quercetin.

      In the ‘Mildly Symptomatic Patients (On floor/ward in hospital)’ section ‘First Line Therapies (in order of priority)’ we have Ivermectin, Nitazoxanide, Methylprednisolone, Enoxaparin, Vitamin C, Zinc, Melatonin, Anti-androgen therapy, Fluvoxamine.

      Interestingly, the relevance of zinc in the protocols seems to increase with disease progression but then disappears in the last one, ‘For Patients Admitted to the ICU’.

  6. For all prespecified secondary outcomes, there were no significant differences between groups. Mechanical ventilation occurred in 4 (1.7%) vs 10 (4.0%) (RR, 0.41; 95% CI, 0.13-1.30; P = .17), intensive care unit admission in 6 (2.4%) vs 8 (3.2%) (RR, 0.78; 95% CI, 0.27-2.20; P = .79), and 28-day in-hospital death in 3 (1.2%) vs 10 (4.0%) (RR, 0.31; 95% CI, 0.09-1.11; P = .09).

    […]

    In this randomized clinical trial of high-risk patients with mild to moderate COVID-19, ivermectin treatment during early illness did not prevent progression to severe disease. The study findings do not support the use of ivermectin for patients with COVID-19.

    Reads like a stats 101 misinterpretation to me.

  7. Wait what? You left out the most salient piece of the abstract, which is (formatted for readability):

    Mechanical ventilation occurred in 4 (1.7%) vs 10 (4.0%) (RR, 0.41; 95% CI, 0.13-1.30; P = .17),
    intensive care unit admission in 6 (2.4%) vs 8 (3.2%) (RR, 0.78; 95% CI, 0.27-2.20; P = .79),
    and 28-day in-hospital death in 3 (1.2%) vs 10 (4.0%) (RR, 0.31; 95% CI, 0.09-1.11; P = .09).

    Size of ivermectin group was 241, size of control group was 249.

    The truly interesting question is how we should treat 3 (1.2%) deaths on ivermectin compared to 10 (4%) deaths in the control group. It's a tiny sample, but at the same time death is the only non-subjective variable here; another commenter mentioned that this is single-blind. Now I'm not an ivermectin supporter, but this study updates my thinking towards it being effective, not away from it.
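
    Just to put numbers on how uncertain that is, here is a quick R sketch using the counts quoted above (3/241 vs. 10/249); the approximate 95% interval for the risk ratio is wide and crosses 1, matching the reported RR 0.31, 95% CI 0.09-1.11:

    deaths <- c(ivm = 3, ctl = 10)
    n      <- c(ivm = 241, ctl = 249)
    rr <- (deaths["ivm"] / n["ivm"]) / (deaths["ctl"] / n["ctl"])   # ~0.31
    se_log_rr <- sqrt(sum(1 / deaths) - sum(1 / n))                 # Wald standard error on the log-RR scale
    exp(log(rr) + c(-1.96, 1.96) * se_log_rr)                       # roughly [0.09, 1.11]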

    • There’s no sense stressing out about such an uncertain result, especially if you have to cherry pick amongst a bunch of other results to get there. What does “non-subjective” matter here? If there’s an issue with blinding and thus the measure of “progression to severe disease”, then naturally that will flow on to the interventions provided, and hence the result of whether the patient dies or not will be affected by subjective decisions as well.

      There’s really no a priori reason to think that patient death is a better measure. Statistically it’s worse because the numbers are smaller and there are more confounding factors.

      • “There’s really no a priori reason to think that patient death is a better measure.” – better measure for what purpose? Not for doing statistical number crunching, sure. But anyone with a grasp of the medical implications of the outcomes could understand this much: If you want a measure of the real impact that a treatment may have on a covid patient’s ultimate outcome, you would be better off basing it on the “secondary outcomes” in this study (ICU, mechanical ventilation, death) rather than on the primary outcome (supplemental oxygen); better yet, use an integrated analysis of all outcomes. I will wager oxygen was chosen as the primary outcome because it was the only one that could provide the power they wanted within a feasible trial, not because it was the most important one in terms of ultimate patient outcome.

        Regardless and again, this is but one trial of many on the topic, with quite ambiguous results, and needs to be audited and pooled with others that can be audited.

        • A better measure for the efficacy of a procedure. You need to measure at an early point in the causal chain to reduce confounding variables. Suppose I say, “pfft, why 28 day deaths, why not eventual life expectancy?” Surely you can see the problem with that theory.

        • Zhou Fang: Precisely what confounding variables are you referring to? The study was a randomized trial and the randomization indicator is a causal instrumental variable for all outcomes, regardless of where they occur in the causal chain. Plus, you haven’t cited any data to indicate any outcome was confounded by pretreatment variables. The study does have other major problems: Treatment was not blinded, so there is the possibility that post-treatment patient care and outcome measurements were biased; also, the treatment as administered was not the one advised by many ivermectin advocates (e.g., as others here have noted, zinc was not part of the protocol and the treatment was not always administered early enough).

          But none of that determines which of the outcomes is most relevant to the patient. Long-term (including lifetime) outcomes are very important and ideally would be studied too; for example there can be severe sequelae of ICU treatments like mechanical ventilation. Those outcomes simply can’t be measured beyond a trial’s end and often are never accurately known due to increases in censoring and competing risks as time passes. But there is one lifetime-outcome indicator that the trial measured, and which is presumably immune from problems like lack of blinding: Death within 28 days, which carries rather profound and obvious implications about a patient’s quantity of life after treatment.

        • > Zhou Fang: Precisely what confounding variables are you referring to?

          Confounding variables like the fact that an unrelated hospital infection killed four patients in the control group and zero in the ivermectin group.

        • Because the primary outcome is a *treatment* (specifically, the transfer to supplemental oxygen), the secondary outcome of death is *more* affected by lack of blinding, because the exposure is not just the decision to transfer to oxygen but also a whole host of other decisions to do with the full treatment of these patients.

          Ivermectin patients were transferred to supplemental oxygen more often, so if that is a blinding issue and the investigators perceived that the ivermectin patients were in health trouble, you are actually saying ivermectin patients got better (or at least, more intensive) treatment (which goes with the idea that the ivermectin patients saw more adverse events), which shoots a big hole in any mortality claim.

        • Replying to @Zhou Fang:

          >Confounding variables like the fact that an unrelated hospital infection killed four patients in the control group and zero in the ivermectin group.

          So to quote from the article, “Four patients in the control group died from nosocomial sepsis”. How sure are you that this is “unrelated” either to covid or to ivermectin? For all we know, ivermectin may prevent ‘nosocomial sepsis’, or covid-19 may encourage it. Are you familiar with the worm-hypothesis for ivermectin? It’s rather unusual that these four deaths just happened to only hit the control group, given that this was randomized: the probability that a fixed four “unrelated” deaths happen to be in the control group by chance is about 1/16.
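
          (As a back-of-the-envelope check of that 1/16, here is an R one-liner assuming the 241 vs. 249 split quoted above and the four deaths falling at random among the 490 patients:)

          # chance that all 4 of 4 such deaths land in the 249-patient control arm
          prod((249 - 0:3) / (490 - 0:3))   # ~0.066, in the ballpark of 1/16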

          It should be obvious why “death” is a more important outcome than e.g. “the doctor, who knew what treatment arm the patient is in and makes decision based on what they think is best for the patient, decided to put the patient on oxygen”. There’s a reason investigators are blinded in good trials, but you don’t need the person declaring the patient to be dead to be blinded.

          > especially if you have to cherry pick amongst a bunch of other results to get there

          I didn’t have to cherry pick very far – the worse the outcomes, the better ivermectin seems to look. Similar numbers are there for the “mechanical ventilation” part, which shouldn’t have been affected by the “unrelated” deaths btw. But I was careful not to draw too many conclusions, is it forbidden to say that mildly cherry-picked data looks interesting?

        • Zhou Fang: You have invented an explanation that favors the reduction in deaths seen with ivermectin as an indirect effect resulting from lack of blinding, rather than a more direct ivermectin benefit. That’s not what I call confounding; but semantics aside, it’s good to imagine and propose possible explanations for observations, as long as you don’t start to believe your explanations (cf. Popper). Now see if you can invent an opposing explanation that favors a direct ivermectin reduction of death and explains the opposite oxygen effect as an indirect effect from lack of blinding. (Hint: it’s just as easy to make up that kind of explanation as the one you gave; and the data can’t tell us which if either is correct.) Either way, we should agree that the study has a problem due to lack of blinding, as well as other problems like failure to ensure early treatment start …
          https://twitter.com/alexandrosM/status/1495062648357937153?s=20&t=j8x6zd-6AQVcTf9F6_HyGg

          To repeat what I replied to Joshua: The point is that the trial is not very informative given its problems, making it incapable of settling anything and unworthy of the publicity and overinterpretation it has received. If you want to move on to the bigger picture into which it should be fit as a small part, read very carefully the Alexandros Marinos post on ivermectin trials that k.e.j.o. kindly provided, as it provides what I think constitutes sound scientific analysis of a medical controversy; then judge how the outcomes (primary and secondary) reported in the Lim et al. study would affect the analyses Marinos provides. I also advise reading as many of the comments as you can manage:
          https://doyourownresearch.substack.com/p/a-conflict-of-blurred-visions

      • Matty –

        Is there a cutoff point at which you’d say the secondary endpoints become uninteresting? If so, what is it?

        To go ad absurdum…suppose there’s a 100k person study. Primary outcomes (say whether or not someone reported a sore throat) show no effect from the treatment but two people died in the control arm whereas none died in the intervention arm.

        On the surface, deaths seem way more important than whether someone got a sore throat.

        Are the secondary outcomes interesting there? If not, where is the cutoff?

  8. They say 40% of the control deaths were due to other infections picked up at the hospital:

    Among the 13 deaths, severe COVID-19 pneumonia was the principal direct cause (9 deaths [69.2%]). Four patients in the control group died from nosocomial sepsis.

    So it could be ivermectin protects against some other fungal, bacterial, whatever infection. It could also be that the unblinded hospital staff was more hesitant to put the ivermectin patients on a ventilator and/or otherwise treated them differently. Or the unblinded patients had less anxiety if they knew they were getting the treatment.

      • I see it as: no matter what the results or sample size were, the study would still be inconclusive.

        That is a problem in general for these studies that compare group A vs B.

        We need to see what patterns show up in a dose-response, a time series, or something else that could help us distinguish between the plethora of explanations.

        • What we need is to acknowledge basic pharmacology.
          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7404744/pdf/BCP-9999-na.pdf
          Concentrations achieved pharmacologically are 3 logs too low to impact viral replication.

          Biomedical research isn’t just a bunch of RCTs and observational studies founded on wild guesses.

          As far as the data available go, the vast majority of covid cases do not have detectable viremia. If we accept that, it does not make sense to use plasma levels of the drug as a quantitative indicator of effectiveness anyway.

          What would matter is how much concentrates into the respiratory mucosa or wherever the virus is primarily located.

        • Bold claim. What is the precedent for an oral drug that is concentrated 1000x in pulmonary mucosa? And, actually, does it? It would be cheaper and faster to test that in animals and even in the lungs of people who took ivermectin and died anyway.

          Show the data supporting this extraordinary claim. Biomedical research isn’t a bunch of RCTs based on wild guesses.

        • > What is the precedent for an oral drug that is concentrated 1000x in pulmonary mucosa?

          Well, the paper you cited goes on to cite a paper that says it was at least 10x more concentrated in various tissues than the blood after injection into cows.

          Other than that, pretty much every lipophilic drug or any substance that gets actively transported into cells.

        • To be clear, I really have no opinion on whether ivermectin works or not. I personally would never take it. But comparing in vitro IC50 to in vivo blood concentrations is not at all a convincing argument.

        • 10x concentration in tissues clearly isn’t enough. It would still be 100x too low to have an effect on the virus. Pharmacology matters. Here it is from people with requisite expertise:
          https://www.medrxiv.org/content/10.1101/2020.04.11.20061804v1
          https://www.science.org/content/blog-post/what-s-ivermectin

          Biomedical research isn’t just a bunch of RCTs testing random and/or existing drugs founded on wild guesses. Expertise in medicinal chemistry and pharmacology matters. It is also crucial for the Bayesian approach.
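
          Just to make the arithmetic in this exchange explicit, a tiny sketch using the round numbers asserted above (both ratios are the commenters’ assumptions, not measurements):

            ic50_to_plasma_ratio = 1000   # assumed: in vitro IC50 sits ~3 logs above achievable plasma levels
            tissue_accumulation = 10      # assumed: tissue concentration ~10x plasma
            shortfall = ic50_to_plasma_ratio / tissue_accumulation
            print(shortfall)              # 100: still ~2 logs below the concentration that worked in vitro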

    • (Whatever happened to all those HCQ clinical studies?)

      HCQ was being given to hospitalized people (who had likely already cleared the virus) at 2-4x the normal dose, thus causing widespread methemoglobinemia (which looks very similar to severe covid and is treated with vitamin C).

      If we trust the studies that did come out, this likely increased the mortality rate by 5-20%.

      Whether the original idea, of giving normal doses as soon as symptoms appear along with zinc, would have worked was never checked afaik.

      • > Whether the original idea..

        Well, the original HCQ-idea dude has been busy with some other stuff.

        https://www.science.org/content/article/scientists-rally-around-misconduct-consultant-facing-legal-threat-after-challenging

        > The letter reflects a concern that “legitimate scientific criticism can be squelched by behaviors that go beyond scholarly debate,” says University of Virginia social scientist Brian Nosek, one of its authors. Threats like these are a “substantial threat to science as a social system,” adds Nosek, who has led a push for greater replicability in science

        • If he turns out to have a history of fraud, that makes the whole debacle even dumber. The healthcare system took the idea of a fraud, made it much more dangerous, and then gave it to (millions?) of people when it was not even expected to have any benefit. This was worldwide.

          Where are the scientific safeguards (independent direct replication and comparing surprising predictions to future data) meant to prevent stuff like this? They are gone, replaced by NHST, peer review, and the evidence-based-medicine paradigm.

        • Yes. It isn’t a perfect world, and unintended consequences and undesirable outcomes do occur. At least outside the borders of that wonderful land of libertarian Shangri-La.

        • So you are saying the first treatments were early intubation based on anonymous rumors from China and HCQ as an antiviral given at much higher doses than normal after it was expected the virus was cleared.

          It is just people following fads. And fraud would be caught with standard independent replication, which is actively discouraged because that is “not novel”.

          We agree.

        • …and so the grifters moved on to ivermectin which was the next fad largely based on fraud and crappy preclinical evidence.

  9. I found this comment:

    I looked at this paper yesterday and the outcome is not new or surprising. Late stage study (Treatments started on average 5.1 days after exposure) and only 1000 participants total /w 1:1 allocation between the treatment and control arms. It should not be surprising to anyone this is way too under-powered from the outset to find a statistically significant (p-value <= 0.05) outcome even if you stipulate in advance one existed.

    This study is not even blinded. If you did a trial of either Pfizers or Mercks treatments in these conditions they wouldn't show a benefit either. Treatments that screw with virus obviously are not going to be of much use at the tail end of the viral stage. (Damage already done prior to treatment)

    If you look at table 2 the conclusion by far with the lowest p-value (0.09) is 3 deaths in the Ivermectin group and 10 in the control group. The second lowest p-value (0.17) is mechanical ventilation 4 ivermectin, 10 control. All of the other p-values are substantially higher. Other studies have had similar results.
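
    As a rough check of the table 2 mortality numbers quoted there, here is a minimal sketch using Fisher’s exact test on 3/241 vs. 10/249 deaths; the trial itself may have computed its p-value differently.

      from scipy.stats import fisher_exact

      # Deaths: 3 of 241 in the ivermectin arm vs. 10 of 249 in the control arm.
      table = [[3, 241 - 3],       # ivermectin: deaths, survivors
               [10, 249 - 10]]     # control:    deaths, survivors
      odds_ratio, p_value = fisher_exact(table)
      print(p_value)               # roughly 0.09 two-sided, in the same ballpark as the value quoted above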

  10. “Bayesian inference: Every analysis uses prior information; the only question is whether you want to acknowledge it explicitly.”

    What are the categories comprising prior information? Here are some:

    [1] That which I have seen.
    [2] That which I remember having seen.
    [3] The degree to which I trust my recollection in [2].
    [4] That which someone else has seen.
    [5] The degree to which I trust the account in [4].
    [6] The commonalities between diverse things that I have seen: e.g. ripe bananas are sweet; that fruit there looks like a banana but is brick-colored; it turns out it too is sweet (though less so).
    [7] The differences between similar things that I have seen: e.g. Steve and Gerry are twins and are in almost all respects indistinguishable; but Steve is the more trustworthy of the two, and Gerry the more sentimental.

    The higher-order classifications (of similarity and difference), and the second- and third-order removed reports, probably account for a good deal of what I actually believe; e.g. that Julius Caesar was murdered; that Winston Churchill was a melancholic — though I wasn’t witness to the murder, and I only knew Churchill from his radio broadcasts (in which there was not the least bit of melancholy). But remembered direct experience is the base of the pyramid of belief — there’s no forgetting the “experiment” I carried out putting my finger into the lamp socket to see what the devil it was in there that liked to bite.

    If someone wants to persuade me of something and they bring in as “evidence” the strength of their belief, simpliciter — I say, well …. let’s talk about the *evidence* for that belief.

  11. The principal problems with all these ivermectin clinical trials are:
    (a) The prior plausibility is extremely low – the preclinical evidence shows that an effect of the drug in culture requires a concentration 1000x higher than what can be achieved pharmacologically. The rationale for the trials is basically due to its popularity from twitter, disinformation websites, politicians, fringe MDs, and conspiracy amplifiers… plus maybe some faked and/or extremely poorly designed/biased observational studies.
    (b) Opportunity cost – by spending so much time on ivermectin, we are effectively wasting resources and missing the opportunity to identify, from among 10^60 other possible molecules, ones that may be effective. Yep, that’s 10^60. Scientists are spending an incredible amount of time/money/effort/patient resources on 10^-60 of the possible therapies because of… twitter???

  12. In assessing the effectiveness of any drug/substance, we are always assuming some biochemical homogeneity that doesn’t exist. As a matter of fact, there are no two individuals that are the same, biochemically speaking. Heck, even the sterile mice used for most experiments are not that similar to each other. Sure, some basic pathways are known, but at a deeper level, at the intersection of biophysics and biochemistry, there is so much unknown that it’s mind-boggling. And that’s only for the body. When it comes to the brain and nervous system, forget it. We’ve just started scratching the surface.
    So, can’t we simply accept that Ivermectin, vaccines, coffee, etc. ‘work’ for some people and not the others, while they may work partially in some third group?

    • Navigator:

      To go one step further, all these things are beneficial for some people in some settings and have negative effects in other settings. Beware the fallacy of the one-way bet. I expect that most effects, positive and negative, are much smaller than what has been reported in the literature.

      • Navigator –

        > So, can’t we simply accept that Ivermectin, vaccines, coffee, etc. ‘work’ for some people and not the others, while they may work partially in some third group?

        Are you kind of replicating a similar issue here when you assume that we can identify the “groups” that are differentially affected – or in other words, the single attribute or group of attributes that correlates with different outcomes?

        We look for patterns to make meaning. At some point you have to draw some lines of distinction to identify patterns. We can never do it perfectly. How do we determine exactly where to draw the lines so a choice isn’t characterized as arbitrary? Is 0.05% any worse than any other line of distinction?

        • I don’t see why the idea that a treatment may work for some groups and not others is necessarily tied to the idea of having a cutoff probability. Regarding the first issue, I think many things will work for some groups and not others – more specifically, it will work differentially for different groups (even somewhat harmful for others). That remains true regardless of a cutoff or even in the absence of a cutoff. I prefer not to have a cutoff at all. It is simply not necessary and not even helpful. The reality is that there will be evidence of some strength regarding how well a treatment works and for whom. There never needs to be a cutoff declaring “success” or “failure.”

          Of course, decisions must be made. For example, Medicare will have to decide whether to pay for a treatment or not, and under what conditions. But can’t we separate the decision from the analysis? In fact, I think the decision depends on a number of factors – statistical, medical, economic, political, and social. And it should depend on a myriad of considerations. Analysis in all these dimensions helps. But what does not help, in my opinion, is for those analyses to reach any kind of binary classifications, such as “works” or “does not work.” I keep arguing that decision makers have a different role than analysts. I’d like them to use the analysis, but I think they should be accountable for making decisions – inevitably in the face of uncertainty and learning from new information. We expect too much from the analysts and too little from the decision makers.

        • I think Joshua’s point is that the information “this works for some people” is not actionable unless you have some way to determine whether the person being treated is one of those people. There is a bright line between “we decided to give the treatment” and “we decided not to give the treatment.” Sure, sometimes you can try the treatment and then stop later if there seems to be no effect, but in patients who are rapidly heading toward death, you may not have that luxury of time. In the absence of a decision criterion, just knowing that sometimes a thing helps and sometimes it hurts is useless, perhaps worse than useless if it keeps you from investigating other treatments.

        • Daniel
          Isn’t that a somewhat different issue? There will always be a gap between the individual being treated and the evidence regarding how groups are affected. Is this individual really part of any particular group or not? I find myself in that position – I have a condition and the evidence that exists is quite inconclusive – especially without the individual data (thank you again, HIPAA). I somewhat fit some of the groups that have been identified, but not entirely. Yet I must make a treatment decision. P-value cutoffs really don’t help with the decision for me. What would help more would be access to the data that exists.

        • Dale –

          > What would help more would be access to the data that exists.

          Sure, more access to more data can help. But I think the basic structural problem remains.

          I think of my mother-in-law. 92 years old. She’s had multiple myeloma for some 25 years. My guess is that she’s a fairly extreme statistical outlier when you look from one angle. But anyone who knows her knows that she’s a tough bird. She perseveres in the face of adversity like no one else I know. Never feels sorry for herself. She has daily rituals that sustain her reasons to live – cooks elaborate meals 2 or 3 times daily, has her wine every night, sometimes followed with a nightcap of a good whisky. In the last 8 years or so she’s been hit with a series of additional medical setbacks – bone fractures, heart valve replacement, falls with copious amounts of blood due to the blood thinners that treat her constant afib…

          But when I think of the details of who she is, I feel maybe she isn’t a statistical outlier. When she experiences a setback I have learned to more or less expect that she’ll just rally, whereas typically something like a hip fracture in someone her age could likely lead to a cascade of physical decline until death.

          As Daniel describes, my point is that there’s always going to be a difficulty in delineating where to draw lines to distinguish among the data. Pretty much all the lines could be considered arbitrary in some sense. But then you have to make decisions. Do you take a medication or not? Have a surgery or not? At some point, don’t you have to assume a less-than-perfect cutoff point if you’re going to make a decision? Yes, imposing some binary categorization of “works” versus “doesn’t work” often seems discordant with the data. But what is the real option if a decision – which is inherently binary – has to be made? You can hedge by treating for a while and seeing what happens, or waiting a while before performing surgery, but there can be an accompanying cost or risk there.

          (I realize now I probably didn’t add anything here Daniel didn’t already cover).

        • Joshua
          I guess I don’t understand what your point is. I agree with everything you say about the details (e.g., your mother in law’s case). I have been in the same situation. There is the question of whether to trust the imperfect data that describes the “group” I am closest to or to declare myself an outlier and follow a different protocol. But the question is who should make that decision and on what basis? I thought you were supporting the idea that the analysis should conclude things. I do know that the providers will follow the protocol – they have little choice (and legal liability if they don’t). But to the extent that I know I differ from the group they are basing their recommendation on, I want to have the right to make that decision. And, ideally, I’d like some better information on which to base my decision. Are you saying that you think the statistical analyst and/or medical provider should decide what “group” a patient belongs to and base their treatment on that?

        • I think some of the comments in this thread are very hand-wavy. I don’t think “depending on context, there are likely small positive or negative effects” is super helpful from a decision-making standpoint. Given the notion that there are likely alternate therapies/interventions that are effective, and perhaps less costly, there need to be ways to prevent companies from marketing and overcharging for what amounts to snake oil. Something like the following (a rough numerical sketch of step 5 follows the list):

          1) Define group receiving intervention
          2) Quantify direction and magnitude of net benefit to group receiving intervention
          3) Quantify certainty of net benefit
          4) Repeat for competing interventions
          5) Quantify incremental benefit for interventions and incremental costs to determine if intervention is “worth it” as frontline response, taking into account uncertainty of evidence
          6) Treat starting with most cost-effective intervention, and if no response, try the next less cost-effective intervention, and repeat until options are exhausted. Then try experimental therapies/snake-oil.
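
          A minimal sketch of step 5, with entirely made-up costs and effect sizes (the “new drug” numbers below are hypothetical, just to show the incremental cost-effectiveness arithmetic):

            # Hypothetical inputs: cost per patient and probability of avoiding severe disease.
            standard_cost, standard_effect = 100.0, 0.80
            new_drug_cost, new_drug_effect = 600.0, 0.83

            # Incremental cost-effectiveness ratio: extra cost per additional severe case avoided.
            icer = (new_drug_cost - standard_cost) / (new_drug_effect - standard_effect)
            print(icer)   # ~16,667; compare against a willingness-to-pay threshold, as in step 5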

        • Dale –

          > Are you saying that you think the statistical analyst and/or medical provider should decide what “group” a patient belongs to and base their treatment on that?

          Isn’t that necessary at some level? Of course a patient needs the relevant information to inform that analysis and is free to weight the different influences as they see fit. Ideally the decision would be made in collaboration. It certainly is for me with my medical providers, and when I run across those who aren’t interested in collaboration I find another. Of course, I take their expertise and experience very seriously, and I also move on if they don’t seem informed about the different variables that can inform the grouping. I doubt you and I see this differently in practice?

        • Joshua
          Lost the thread above, so it is easier to respond here. Providers follow the protocol and that protocol is based upon the research that has been conducted. So, if the closest group to my case is group X, then their advice is based upon results of treatment options for group X. The fact that I differ from X in some ways (for which we have no data or results) they will discuss with me – if I point it out – but in the end they will stay with the protocol. I don’t fault them for that – they could hardly do otherwise. But I am not the typical patient and I think most people just assume that the recommendation is based on evidence related to their specific situation. Or, some deny the evidence entirely and insist on their own uniqueness.

          I think we agree on the substantive issues. Where we may disagree is on how we wish medical providers and medical analysts would act. When they provide advice or results, I’d like to see them offer more uncertainty based on how I may deviate from the closest group X that has been studied. There are strong forces to deter that – legal, institutional, the hero myth (applied to doctors), and perhaps the strongest is from patients themselves. Many people want to believe their doctors are more certain than the evidence can support. I’d like to see them admit more uncertainty than they do. If nothing else, that would help support the idea that more clinical trial data needs to be made available.

          Medical decisions are difficult. The research (and resulting protocols) will always apply to aggregate groups and individual patients will never match those groups precisely. That tension is unavoidable. I just don’t want to sweep it under the rug by supporting the notion that it is ok for research to support more definitive recommendations than it should. It was this line in your comment that provoked me to respond:

          ” At some point you have to draw some lines of distinction to identify patterns. We can never do it perfectly. How do we determine exactly where to draw the lines so a choice isn’t characterized as arbitrary? Is 0.05% any worse than any other line of distinction?”

          Perhaps 0.05% (or 5%?) is no worse than any other line – but I still don’t like the idea that there should be any line. Yes, it’s a slippery slope, but that is the reality of the evidence.

        • Dale –

          Thanks for the response.

          > There are strong forces to deter that – legal, institutional, the hero myth (applied to doctors), and perhaps the strongest is from patients themselves. Many people want to believe their doctors are more certain than the evidence can support. I’d like to see them admit more uncertainty than they do. If nothing else, that would help support the idea that more clinical trial data needs to be made available.

          Well put, and I agree.

          > Medical decisions are difficult. The research (and resulting protocols) will always apply to aggregate groups and individual patients will never match those groups precisely. That tension is unavoidable. I just don’t want to sweep it under the rug by supporting the notion that it is ok for research to support more definitive recommendations than it should. It was this line in your comment that provoked me to respond:

          I think what you’re picking up on is a more general frustration that I have, in that I feel there’s a general tendency for people to complain about how things are done wrong without a full appreciation for how difficult and complicated it is to do things right. That frustration has grown somewhat for me throughout the pandemic. I’m alluding to the dynamic where public health officials lose out no matter what happens. If many people are sick and die, they didn’t do enough, and if few people are sick and die, they did too much. That isn’t to say that they should be given blanket immunity – just to say that, well, medicine is hard.

          So maybe I’m forcing that frustration onto this discussion. But yes, I fully agree that there’s too much of a tendency towards binary analyses. And I agree that there are some problematic reasons for that, on top of just a more general tendency humans have to want to simplify complicated decision-making and to go with a simplistic answer to satisfy a need to compensate for uncertainty. And I suspect that you, like me, are more comfortable with uncertainty than most – which would come into play here.

          > Perhaps 0.05% (or 5%?) is no worse than any other line – but I still don’t like the idea that there should be any line. Yes, it’s a slippery slope, but that is the reality of the evidence.

          I’m uncomfortable with that also. I’ll offer this tentative thought: I agree that 0.05% shouldn’t be used as a single definitive point to determine statistical “significance” independent of context. That does seem so arbitrary as to be meaningless. But maybe it is nonetheless useful to pick some point as a theoretical cut off, and then add or subtract from that as you can assess the relevant influences. It kind of goes back to the “all models are wrong but some are useful.”

  13. Seems like this post is getting a lot of play here

    https://doyourownresearch.substack.com/p/a-conflict-of-blurred-visions?utm_source=url

    I’m surprised so many people seem to like it. Ivermectin, I dunno, maybe it does something? But this post is pretty bad.

    It seems to be mainly a rebuttal of Scott Alexander’s blog meta-analysis of ivermectin, primarily on the basis that if you don’t exclude the studies he’s excluded, the evidence looks quite clearly positive. However, he doesn’t interrogate the actual reasons the studies are being excluded. His arguments are:

    1. GidMK recommended much of the exclusion, and GidMK has a string of bad opinions (he does look pretty silly)
    2. There are lots of junk studies in all medical literature that point in all sorts of directions, and medical science sees frequent reversals. It’s therefore unfair to single out pro-ivermectin junk studies.

    But the thing is, there are specific reasons for the exclusion of the ivermectin studies–nonexistent patients, duplicate patients, percentages that don’t add up to 100%, impossible computations, etc. Now that those have been pointed out, I really can’t think of a reason to not exclude them. He seems to be arguing for a kind of ensemble averaging of evidence, and so by that reasoning, excluding pro-ivermectin fraud biases the average. I agree that the people he’s calling out are probably selective in what studies they examine. But I don’t expect anything meaningful out of averaging a bunch of garbage. Maybe you should just check everything for fraud, including anti-ivermectin fraud too?

    He doesn’t seem to be willing or able to do that, though. Generally, he seems to advocate a kind of “democratize the meta-analysis” approach: trust the overall numbers but not the expert interpretations.

    > If we believe a superintelligent AI could find a way to reliably subvert RCTs to get its desired outcomes in a short period of time, then the only question is whether a big pharmaceutical company has enough brain power to accomplish something similar after decades of trying. That would certainly explain the reversals.

    > Not all is lost though. Biology has solved this problem before we knew it existed, and we ourselves have solved it over and over again in areas other than medicine. The solution to sensory monoculture is the exact opposite: cultivating sensory pluralism.

    > What does this have to do with ivermectin? Well, here’s how I process the state of the evidence. There are several different lines of evidence pointing us in the direction that ivermectin helps significantly with COVID. Any of those being true is sufficient, and several of them could be wrong, but the conclusion would still hold. This is how I like my theses, supported by multiple independent lines of evidence.

    This paragraph makes no sense to me. The average of garbage is garbage. Just looking at the number of ivermectin studies with a positive effect, or pulling the final mean and standard error from every article published anywhere and averaging them together, tells me nothing. You don’t form cogent opinions by feeling the general vibe of which side has the most and loudest voices. That’s even more corruptible by conspiratorial actors than expert opinion is. GidMK may be a fool, but he makes a good point in his title (I haven’t read the article): “meta-analyses based on summary data alone are inherently unreliable.” Bad meta-analysis and trust in numbers over judgement are how we got all our bad science to begin with!
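
    To make the “average of garbage is garbage” point concrete, here is a toy sketch of inverse-variance pooling with made-up log risk ratios (none of these numbers correspond to real studies): one fabricated study reporting a large effect with an implausibly tight standard error dominates the pooled estimate.

      import numpy as np

      # (log risk ratio, standard error) for three hypothetical null-ish studies
      # plus one hypothetical fraudulent study with a huge, precise "benefit".
      studies = [(0.05, 0.30), (-0.10, 0.25), (0.08, 0.35), (-1.20, 0.10)]
      est = np.array([e for e, _ in studies])
      se = np.array([s for _, s in studies])
      w = 1.0 / se**2                       # inverse-variance weights
      pooled = (w * est).sum() / w.sum()
      print(np.exp(pooled))                 # pooled RR ~0.4, driven almost entirely by the last "study"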

    What I do think is a coherent way to synthesize information from multiple studies is to indeed “Do Your Own Research.” I don’t mean aggregating lots of topline numbers — I mean, go through a lot of studies, form a judgement of their methodology, check their math, and pool the raw data together if you can find it. If other meta-analyses of the form are biased because they only exclude pro-ivermectin studies with obvious data irregularities, you should go through all the studies, and see if you find the same problem in the anti-ivermectin studies.

    I don’t want to do it. That sounds like it takes a long time, and it’s boring, and I kind of just don’t want to. So I will just trust my doctor, and maybe a couple of researchers that I personally trust. That’s reality–everyone has a finite amount of time, sometimes you have to trust. Note that I’m not advocating that you trust “experts”, because yeah that makes you a patsy for powerful, interested establishments. I don’t trust “the statisticians”, I trust Andrew Gelman, Sander Greenland and co because I’ve done close readings of their work in the past and judged it to be good. That’ll occasionally still get me in trouble, but it’s certainly better than abandoning rigor and specific examination altogether to just uncritically ingest all the numbers and let the overall vibe take me where it may.

    This is neither here nor there, but Alexandros argues, contrary to GidMK et al., that the amount of fraud in the ivermectin literature isn’t par for the course. In doing so, he compares the fraction of published studies with irregularities to the fraction of submitted trials with irregularities. Part of the peer review process is to weed out the garbage and irregularities – I’m not so nihilistic as to think it doesn’t do anything for that!

    • Somebody:

      I agree. One doesn’t need to trust Scott Alexander and Gideon Meyerowitz-Katz to read what they wrote and think they have good reasons for wanting to exclude those studies.

      Another way of saying it is that meta-analysis, like any statistical method, can be thought of in two ways: as an estimate of the truth, and as an approach to data reduction. The argument in favor of the first meta-analysis that Marinos shows, coming from that ivermectin website, is not that it’s an estimate of the truth—it’s a horrible way to estimate the truth, to include a bunch of junk studies—but rather that it’s a summary of what’s out there in that particular set of studies. The trouble is that there’s a temptation to slide back and forth, to do a data reduction because that’s something we can do, and then to treat that as an estimate of the truth. That’s what I was getting at in my GIGO post on the nudge meta-analysis earlier this month.

  14. I would like to know: as a statistician, if you had only this data and nothing else available, and you got COVID, would you want to be in the treatment group or the control group?

    I think this is a very simple question that should be easy to answer definitively by any statistician.

    • Adriaan:

      As a statistician, if all I had were the data from this Malaysia study, and I had to decide what to do, I’d choose the control, because the control did better than the treatment under the proximate outcome. But I don’t think that it makes sense for you to say that I should be able to answer this “definitively,” as a key point of this discussion is that there’s a lot of uncertainty in these conclusions. See the P.S. in the above post.
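
      One minimal way to quantify that “only this data” reading, assuming independent flat Beta(1,1) priors on each arm’s progression rate and ignoring everything else about the trial’s design and context:

        import numpy as np

        rng = np.random.default_rng(0)
        n_draws = 1_000_000
        # Primary outcome: 52/241 progressed on ivermectin, 43/249 on control.
        p_ivm = rng.beta(1 + 52, 1 + 241 - 52, size=n_draws)
        p_ctl = rng.beta(1 + 43, 1 + 249 - 43, size=n_draws)
        print((p_ivm > p_ctl).mean())   # roughly 0.85-0.9: mild evidence the control arm did better
                                        # on this outcome, far from settling anything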
