Doubts about that article claiming that hydroxychloroquine/chloroquine is killing people

James Watson (no, not the one who said that cancer would be cured by 2000, and not this guy either) writes:

You may have seen the paper that came out on Friday in the Lancet on hydroxychloroquine/chloroquine in COVID19 hospitalised patients. It’s got quite a lot of media attention already.

This is a retrospective study using data from 600+ hospitals in the US and elsewhere with over 96,000 patients, of whom about 15,000 received hydroxychloroquine/chloroquine (HCQ/CQ) with or without an antibiotic. The big finding is that when controlling for age, sex, race, co-morbidities and disease severity, the mortality is double in the HCQ/CQ groups (16-24% versus 9% in controls). This is a huge effect size! Not many drugs are that good at killing people.

This caught my eye, as an effect size that big should have been picked up pretty quickly in the interim analyses of randomized trials that are currently happening. For example, the RECOVERY trial has a hydroxychloroquine arm and they have probably enrolled ~1500 patients into that arm (~10,000 + total already). They will have had multiple interim analyses so far and the trial hasn’t been stopped yet.

The most obvious confounder is disease severity: this is a drug that is not recommended in Europe and the USA, so doctors give it as “compassionate use”. I.e. very sick patient, so why not try just in case. Therefore the disease severity of the patients in the HCQ/CQ groups will be greater than the controls. The authors say that they adjust for disease severity but actually they use just two binary variables: oxygen saturation and qSOFA score. The second one has actually been reported to be quite bad for stratifying disease severity in COVID. The biggest problem is that they include patients who received HCQ/CQ treatment up to 48 hours post admission. This means that someone who comes in OKish and then deteriorates rapidly could be much more likely to get given the drug as compared to someone as bad but stable. This temporal aspect cannot be picked up by a single severity measurement.

In short, seeing such huge effects really suggests that some very big confounders have not been properly adjusted for. What’s interesting is that the New England Journal of Medicine published a very similar study a few weeks ago where they saw no effect on mortality. Guess what, they had much more detailed data on patient severity.
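A toy calculation makes the point about coarse severity adjustment concrete. This is a minimal sketch, not a reanalysis of either paper: every number below is invented for illustration, the four severity levels and the binary "severe" flag are assumptions, and the drug is given no effect on death at all.

```python
# HYPOTHETICAL population: true severity has four levels, but the record keeps
# only a binary flag (severe = level 2 or 3). The drug has NO effect on death.
severity_share = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}      # P(severity)
p_treated = {0: 0.05, 1: 0.15, 2: 0.20, 3: 0.50}       # P(drug | severity)
p_death = {0: 0.02, 1: 0.08, 2: 0.15, 3: 0.40}         # P(death | severity), drug or no drug


def risk(treated, levels):
    """Expected mortality among treated (or untreated) patients in the given severity levels."""
    weight = {
        s: severity_share[s] * (p_treated[s] if treated else 1 - p_treated[s])
        for s in levels
    }
    return sum(weight[s] * p_death[s] for s in levels) / sum(weight.values())


def standardized_rr(strata):
    """Risk ratio after standardizing both arms to the population's stratum shares."""
    num = den = 0.0
    for levels in strata:
        share = sum(severity_share[s] for s in levels)
        num += share * risk(True, levels)
        den += share * risk(False, levels)
    return num / den


crude = standardized_rr([[0, 1, 2, 3]])               # no adjustment at all
binary_adjusted = standardized_rr([[0, 1], [2, 3]])   # adjust for the coarse flag only
fully_adjusted = standardized_rr([[0], [1], [2], [3]])  # adjust for true severity

print(f"crude {crude:.2f}, binary-adjusted {binary_adjusted:.2f}, fully adjusted {fully_adjusted:.2f}")
```

With these made-up inputs the crude ratio comes out a bit above 2, adjusting for the binary flag still leaves a ratio of roughly 1.4, and only adjusting for the full severity scale brings it back to 1. The direction of the leftover bias is the point here, not the particular values.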

One thing that the authors of the Lancet paper didn’t do, which they could have done: If HCQ/CQ is killing people, you would expect a dose (mg/kg) effect. There is very large variation in the doses that the hospitals are giving (e.g. for CQ the mean daily dose is 750 but standard deviation is 300). Our group has already shown that in chloroquine self-poisoning, death is highly predictable from dose (we used stan btw, very useful!). No dose effect would suggest it’s mostly confounding.
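One simple way to look for a dose effect is a trend test across dose bands. The sketch below uses a Cochran-Armitage trend statistic, which is a swapped-in textbook method and not the Stan model mentioned above. The dose bands and death counts are hypothetical, invented only to show what "no dose effect" and "clear dose effect" would look like.

```python
import math


def trend_z(scores, patients, deaths):
    """Cochran-Armitage z statistic for a linear trend in death rate across ordered dose bands."""
    n_total = sum(patients)
    p_overall = sum(deaths) / n_total
    # Score-weighted excess of observed over expected deaths in each band.
    t = sum(x * (d - n * p_overall) for x, n, d in zip(scores, patients, deaths))
    spread = sum(n * x * x for x, n in zip(scores, patients)) - sum(
        n * x for x, n in zip(scores, patients)
    ) ** 2 / n_total
    return t / math.sqrt(p_overall * (1 - p_overall) * spread)


# HYPOTHETICAL dose bands and counts, invented purely for illustration.
dose_scores = [250, 500, 750, 1000]
patients = [200, 200, 200, 200]
flat_deaths = [36, 36, 36, 36]      # same death rate in every band
rising_deaths = [20, 30, 42, 52]    # death rate climbs with dose

z_flat = trend_z(dose_scores, patients, flat_deaths)
z_rising = trend_z(dose_scores, patients, rising_deaths)
print(f"flat: z = {z_flat:.2f}, rising: z = {z_rising:.2f}")
```

A z near zero is what the "mostly confounding" reading predicts. A large positive z would not settle the question on its own, since sicker patients might also be given higher doses.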

In short, it’s a pretty poor dataset and the results, if interpreted literally, could massively damage ongoing randomized trials of HCQ/CQ.

I have not read all these papers in detail, but in general terms I am sympathetic to Watson’s point that statistical adjustment (or, as is misleadingly stated in the cited article, “controlling for” confounding factors) is only as good as what you’re adjusting for.

Again speaking generally, there are many settings where we want to learn from observational data, and so we need to adjust for differences between treated and control groups. I’d rather see researchers try their best to do such adjustments, rather than naively relying on pseudo-rigorous “identification strategies” (as, notoriously, here). So I applaud the authors for trying. I guess the next step is to look more carefully at pre-treatment differences between the two groups.

Are the (de-identified) data publicly available? That would help.

Also, when I see a paper published in Lancet, I get concerned, as they have a bit of a reputation for chasing headlines. I’m not saying that it is for political reasons that they published a paper on the dangers of hydroxychloroquine, but this sort of thing is always a concern when Lancet is involved.

P.S. More here.


  1. Carlos Ungil says:

    > The big finding is that when controlling for age, sex, race, co-morbidities and disease severity, the mortality is double in the HCQ/CQ groups (16-24% versus 9% in controls). This is a huge effect size!

    Aren’t those the raw mortality outcomes? If I understand correctly, the hazard ratios they calculate point to an increase in mortality but far from double:

    “Compared with the control group (9·3%), hydroxychloroquine alone (18·0%; HR 1·335, 95% CI 1·223–1·457), hydroxychloroquine with a macrolide (23·8%; 1·447, 1·368–1·531), chloroquine alone (16·4%; 1·365, 1·218–1·531), and chloroquine with a macrolide (22·2%; 1·368, 1·273–1·469) were independently associated with an increased risk of in-hospital mortality.”

    • zbicyclist says:

      Good point!

      So, the huge effect (16-24% versus 9%) is substantially lessened with weak controls (“they use just two binary variables: oxygen saturation and qSOFA score. The second one has actually been reported to be quite bad for stratifying disease severity in COVID.”)

      Then we have the NEJM study, with better controls, which doesn’t find a worthwhile effect on mortality. We may have another textbook example of where weak controls produce misleading results.

      BTW, someone near and dear to me is on HCL for one of its approved uses, and gets a bit nervous with talk of such severe side effects.

      • Carlos Ungil says:

        FWIW, the endpoint in the NEJM study is “time to intubation or death” instead of mortality, so I don’t know how comparable the results are, and the 95% confidence intervals are not so distinct: [0.82 1.32] overlaps with [1.22 1.46] for patients on HCQ alone ([1.37 1.53] when a macrolide was added; as far as I can see the NEJM study doesn’t split the results).

        • Dom says:

          The breakdown of the composite endpoint is available in the appendix (page 9) but this is the raw data, before PSM.
          Death: 108
          Intubation: 154

          Death: 58
          Intubation: 26

      • Martha (Smith) says:

        zbicyclist said,
        “BTW, someone near and dear to me is on HCL for one of its approved uses, and gets a bit nervous with talk of such severe side effects.”

        Sorry to hear this. Adding to the problem: I heard on the radio recently that HCL is an important drug for treating lupus — but with increased demand for treatment of coronavirus, lupus patients are finding that HCL is either harder to get than it used to be, or much more expensive.

    • James Watson says:

      Yes – you’re right, mistake on my part. I was looking at the supplementary (propensity score matching, tables S7 A-D) where they get up to twice the mortality. A hazard ratio of 1.35 is still pretty huge though.

    • valerie kreutz says:

      The problem is the design: we do not know what phase of the disease each patient was in when HCQ was started! The Brazilian protocol is only for phase 1 of the disease, at a very low dosage for a short period: just 5 days, with azithromycin 500 mg per day, also for 5 days. Therefore these results cannot be extrapolated to what the health ministry of Brazil is making available for medical prescription. Our country has lots of experience with this drug. The problem is the political side that the usage of this medication involves in Brazil.

    • Charles Carter says:

      Don’t follow-
      “(18·0%; HR 1·335, 95% CI 1·223–1·457)” is [raw mortality]; [adjusted HR], 95% CI [adjusted HR interval] ???
      Why would they mix raw and adjusted figures? Is that common? I’ve never noted it.
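The quoted sentence can be pulled apart with a few lines of arithmetic. This is only a sketch using the percentages and hazard ratios quoted earlier in this thread. A crude risk ratio and an adjusted hazard ratio are not the same quantity, so the comparison is rough, and the variable names are my own.

```python
# Figures as quoted from the Lancet paper earlier in this thread:
# raw in-hospital mortality (%) and the adjusted hazard ratio (HR) per regimen.
control_mortality = 9.3
regimens = {
    "HCQ alone": (18.0, 1.335),
    "HCQ + macrolide": (23.8, 1.447),
    "CQ alone": (16.4, 1.365),
    "CQ + macrolide": (22.2, 1.368),
}

# Crude risk ratio = raw mortality in the treated group / raw mortality in controls.
crude_ratios = {name: raw / control_mortality for name, (raw, _hr) in regimens.items()}

for name, (raw, hr) in regimens.items():
    print(f"{name:16s} raw {raw:4.1f}%  crude ratio {crude_ratios[name]:.2f}  adjusted HR {hr:.3f}")
```

The crude ratios land around 2, which is where "double" comes from, while every adjusted HR is well below its crude counterpart.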

  2. Anoop says:

    They do talk about the dose-response effect in the Limitation section.

    Their conclusion is spot on: “These findings suggest that these drug regimens should not be used outside of clinical trials and urgent confirmation from randomized clinical trials is needed.”

  3. Pablo Verde says:

    I like Watson’s point regarding the limits of adjusting for the patient’s severity in this observational study. However, by reading the statistical methods of the paper I was surprised to see that the data have not been analyzed using a hierarchical model.

    This is a multi-center and multi-national observational study. The data were collected at the level of the hospital and we can expect that the variation between hospitals’ mortality is important. Moreover, the volume of COVID-patients treated within hospitals could be an indicator of the quality of the hospital’s service.

    I bet that by accounting for hospital effects as potential sources of mortality rate variation the results presented by the authors could be diluted.

  4. Steve Sailer says:

    Hasn’t the hope always been more that hydroxychloroquine might work as a prophylactic or early, mild stage therapy rather than as a late stage Hail Mary pass lifesaver?

    • zbicyclist says:

      (I am not a medical professional). I thought it was the reverse; hydroxychloroquine is given to those with autoimmune diseases (RA, lupus …) to keep the immune system in check. In the later stages of COVID, there can be an overreaction of the immune system, so HCQ was a reasonable choice to try and control this. But in the early stages, you don’t want to suppress the immune system.

      But perhaps someone more knowledgeable will chime in here.

      • Steve Sailer says:

        Somebody should explain that to Trump and all the hospital doctors who started taking H. as a prophylactic: they are taking it wrong! Of course, if they were taking it right, it might kill them, but they are still taking it wrong.

        Seriously, this study doesn’t seem to have much relevance to the Trump controversy, even though the Trump controversy is why this study has attracted so much attention.

        • Henry Garante says:

          (I’m also not a medical professional but I’m following this subject closely out of professional duty) I can confirm what valerie kreutz said above: hydroxychloroquine is seen by some doctors as an antiviral drug, just as the antibiotic azithromycin would also have some antiviral properties, as strange as it may seem. The viral load of covid-19 patients is high in the early days of illness. After 7-10 days, it drops sharply. Therefore, these doctors advise treatment immediately after the onset of symptoms and said from the beginning that it’s of no use afterwards. As valerie kreutz said, this is unfortunately a very politicised affair, in France too. Doctors who wanted to introduce this treatment (which has already been tested on other coronaviruses) are under enormous pressure. You can find information on the main French institute involved in this controversy here:

    • Luca Beltrame says:

      That’s what studies like the ones ongoing at the U of Minnesota (publication pending for at least one of them, according to the PI) want to see.

  5. tc says:

    According to Table 2, around 8% of the control group eventually went on mechanical ventilation, as compared to 20% of the HCQ group. This is a huge difference. The paper classifies this as an outcome, but there is no reason to believe that HCQ causes this kind of difference in ventilator use.

    The simplest explanation is that the HCQ group was sicker.

  6. Andre says:

    Forget HCQ, put that aside. Put standard medications in the hands of doctors (antibiotics, anti-inflammatories, anticoagulants, antipyretics, etc.) that everybody knows are used to alleviate symptoms but are no cure for SARS-CoV-2.
    Then, on any given day there will be hospitalized patients where doctors decide they don’t need any medication, or much more severe cases where all medication is needed.
    If you don’t have good controls and good detailed data to account for that, the end result will be exactly what this study shows, i.e. the more medication is given, the more patients die.

  7. Alex Kravetz says:

    I think we can all agree there is likely some confounding by indication, i.e., the HCQ group was likely sicker at baseline. That being said, the direction of the mortality effect makes it pretty clear: HCQ is certainly not the miracle people are claiming. And likewise, the rate of ventricular arrhythmias should give any clinician pause before adding this drug on for a crashing patient.

    • Luca Beltrame says:

      I don’t think anyone is claiming it is a miracle (like with remdesivir, I expect a rather small effect size, if any), but the application in this study is not what the infamous Raoult paper was proposing (which is yet unanswered, as far as I can tell): does early post-exposure prophylaxis work?

      I don’t think there are proper randomized trials with published data on that yet.

      • Steve Sailer says:


        I’d be very interested in observational studies of people who were already taking H. for lupus, rheumatoid arthritis, or malaria and what happened to them regarding CV. Kaiser Permanente has about 10 million clients, so they could probably come up with a decent non-randomized overview pretty fast, although the CV rate isn’t very high on the West Coast where KP is centered.

        • Martha (Smith) says:

          Steve said,
          “I’d be very interested in observational studies of people who were already taking H. for lupus, rheumatoid arthritis, or malaria and what happened to them regarding CV.”

          Yes, such studies might be worthwhile (although, as you suggest re Kaiser Permanente, it is important to look for and take into account any possibly confounding factors).

        • Ricardo Silva says:

          I don’t know where to look for the actual data, but an interesting starting point might be the city of Manaus, Brazil. Higher incidence of malaria than a typical place of similar climate as it’s in the middle of the rainforest (so perhaps above average use of antimalarial drugs like HCQ). Turned out one of the worst affected cities in the world, mass graves and all. Many confounding factors, including baseline healthcare quality, may make it hard to get to any interesting conclusions though.

  8. Njnnja says:

    I’ve been coming to this blog for years for the highest quality analysis “behind the headlines” and I think this is an example of as good as it gets.

    But it looks like the WHO cancelled the running study, relying in part on the Lancet study. Score 1 for Thomas Kuhn.

  9. Vijay Gupta says:

    Just to focus on the model specification. Model specification for the regression has no interaction terms. See for impact on bias of coefficients. Quote: “misspecified models without interaction terms resulted in up to 8.95 fold bias in estimated coefficients.” Relevant variables are excluded: anti-virals, oxygen use, & blood group. This will cause major bias. The May 1 NEJM study on the same database does include information on other variables that could have been included in the Lancet study.

    Also, on page 4 of the study: “if the patient’s electronic health record did not include information on a clinical characteristic, it was assumed that the characteristic was not present.” (This means that if there is no data on hypertension, then it is recorded as “no hypertension.”)

  10. Silv says:

    There are some very strange things going on. The company that seems to have provided the data seems to have the same address as a Regus (rented office). Maybe they are on the same floor, maybe not. They seem to have a bunch of other websites:
    Surgical Outcomes / Surgisphere / vascularoutcomes / QuartzClinical
    Does this company have legitimate data? I don’t know.

    • Gary Wolf says:

      Going back over some of the public blog posts and comments about this now discredited study. This is the earliest I’ve found that used public info available about Surgisphere to question the legitimacy of the data. I may well find an earlier one, but thought it was nice to note.

  11. Silv says:

    Here is the address by the way. 875 N Michigan Avenue, 31st Floor, Chicago, IL 60611

  12. Ko Rinne says:

    You have a great understanding of the limitations of the study, it’s very interesting to read you! Could you write a response to the Lancet quite quickly? WHO is suspending the ongoing studies, and France is also starting the procedure of suspension (16 trials). It is quite urgent I think, and you have all the arguments to convince that this study is bad (sorry for my terrible English, I am French-speaking (Switzerland)).

    deepl translate of the document :
    The ANSM was informed of the position of the Scientific Committee of the international Solidarity trial in connection with the WHO on the suspension of the inclusion of new patients who were to be treated with hydroxychloroquine, pending a global re-evaluation of the benefit/risk of this molecule in clinical trials.

    In this context and as a precautionary measure, we have initiated a procedure with sponsors evaluating hydroxychloroquine to suspend the inclusion of patients in clinical trials conducted in France. It will take effect after a 24-hour objection period.
    Patients currently being treated with hydroxychloroquine as part of these clinical trials will be able to continue treatment until the end of the protocol.

    Since the beginning of the COVID-19 epidemic, we have authorized 16 clinical trials evaluating hydroxychloroquine.
    This is pending new data on the use of hydroxychloroquine in COVID-19 patients.
    It should be noted that only the results of robust randomized trials of hydroxychloroquine, whether or not in combination with azithromycin, can provide evidence of its efficacy and safety.

    • Al says:

      I wrote the response here; I will probably add more. Basically, they use only 2 very coarse indicators for disease severity, which makes all their propensity matching useless and their analysis fundamentally flawed, no different than the VA study.

      • Ko Rinne says:

        I’m not a statistician, but I know a little about clinical practice. That’s exactly what bothered me from the beginning, but I couldn’t pin down the right confounding variables, and that is the problem of assessing the severity of the disease. Great job!

        Concerning co-morbidities, they seem to be the same between groups for all the participants. But elderly patients very often have several co-morbidities, and it doesn’t seem to me that this has been taken into account; the prognosis is also worse in that case, because these patients can quickly become very unstable (example of reference)

        Nothing about cancers either, which are also common in the elderly, and cancer increases the risk of death from infections.

        I think that these variables should be taken into account before comparing groups.

        But do as you think best! Thank you so much!

      • Carlos Ungil says:

        It’s interesting to see both analyses side by side, and the Lancet paper may be “under-matched”, with the effect over-estimated as a result.

        But saying that “In Mehra the HR for HCQ is 1.335, while in Geleris the HR is around 1 – demonstrating neutral effect” seems too strong given the level of uncertainty: (hazard ratio, 1.04; 95% CI, 0.82 to 1.32). By the way, you’re comparing different endpoints (I’ve not looked in detail and it may strengthen or weaken your point, or be irrelevant, but I think it’s worth mentioning).

        That’s even more true for the JAMA study where “no significant differences in mortality were found between patients receiving hydroxychloroquine + azithromycin (adjusted HR, 1.35 [95% CI, 0.76-2.40]), hydroxychloroquine alone (adjusted HR, 1.08 [95% CI, 0.63-1.85]), or azithromycin alone (adjusted HR, 0.56 [95% CI, 0.26-1.21]), compared with neither drug”

        1.35 [95% CI, 0.76-2.40] doesn’t really contradict 1·447 [1·368–1·531] and the same goes for 1.08 [95% CI, 0.63-1.85] vs 1·335 [1·223–1·457].
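A rough way to check the "doesn't really contradict" claim is to compare the estimates on the log scale. This is a sketch under stated assumptions: the two studies are treated as independent, the reported intervals are taken to be symmetric normal intervals on the log hazard ratio, and the function names are mine rather than anything from the papers.

```python
import math

Z95 = 1.96  # normal quantile behind a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Recover the log hazard ratio and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def z_difference(a, b):
    """z-score for the difference between two independent log hazard ratios."""
    log_a, se_a = log_hr_and_se(*a)
    log_b, se_b = log_hr_and_se(*b)
    return (log_a - log_b) / math.hypot(se_a, se_b)


# (HR, lower, upper) as quoted in this thread.
jama_combo, lancet_combo = (1.35, 0.76, 2.40), (1.447, 1.368, 1.531)
jama_hcq, lancet_hcq = (1.08, 0.63, 1.85), (1.335, 1.223, 1.457)

z_combo = z_difference(jama_combo, lancet_combo)
z_hcq = z_difference(jama_hcq, lancet_hcq)
print(f"HCQ + macrolide: z = {z_combo:.2f}")
print(f"HCQ alone:       z = {z_hcq:.2f}")
```

Both z-scores come out well inside ±1.96, so under these assumptions the JAMA estimates are too imprecise to be told apart from the Lancet ones.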

        “despite this poorly analized study, it doesn’t kill patients, it just has no beneficial effect”

        It may or it may not kill patients, the studies that you present as more reliable don’t tell us much about that.

  13. Tom Parke says:

    Oh I retract that post!
    Turns out my statistical intuition is well off….
