Is JAMA potentially guilty of manslaughter?

No, of course not. I would never say such a thing. Sander Greenland, though, he’s a real bomb-thrower. He writes:

JAMA doubles down on distortion – and potential manslaughter if on the basis of this article anyone prescribes HCQ in the belief it is unrelated to cardiac mortality:

– “compared with patients receiving neither drug cardiac arrest was significantly more likely in patients receiving hydroxychloroquine+azithromycin (adjusted OR, 2.13 [95% CI, 1.12-4.05]), but not hydroxychloroquine alone (adjusted OR, 1.91 [95% CI, 0.96-3.81]).”

– never mind that the null is already not credible… see
“Do You Believe in HCQ for COVID-19? It Will Break Your Heart” and “Hydroxychloroquine-Triggered QTc-Interval Prolongations in COVID-19 Patients.”

I’m not so used to reading medical papers, but I thought this would be a good chance to learn something, so I took a look. The JAMA article in question is “Association of Treatment With Hydroxychloroquine or Azithromycin With In-Hospital Mortality in Patients With COVID-19 in New York State,” and here are its key findings:

In a retrospective cohort study of 1438 patients hospitalized in metropolitan New York, compared with treatment with neither drug, the adjusted hazard ratio for in-hospital mortality for treatment with hydroxychloroquine alone was 1.08, for azithromycin alone was 0.56, and for combined hydroxychloroquine and azithromycin was 1.35. None of these hazard ratios were statistically significant.

I sent along some quick thoughts, and Sander responded to each of them! Below I’ll copy my remarks and Sander’s reactions. Medical statistics is Sander’s expertise, not mine, so you’ll see that my thoughts are more speculative and his are more definitive.

Andrew: The study is observational not experimental, but maybe that’s not such a big deal, given that they adjusted for so many variables? It was interesting to me that they didn’t mention the observational nature of the data in their Limitations section. Maybe they don’t bother mentioning it in the Limitations because they mention it in the Conclusions.

Sander: In med there is automatic downgrading of observational studies below randomized (no matter how fine the former or irrelevant the latter – med RCTs are notorious for patient selectivity). So I’d guess they didn’t feel any pressure to emphasize the obvious. But I’d not have let them get away with spinning it as “may be limited” – that should be “is limited.”

Andrew: I didn’t quite get why they analyzed time to death rather than just survive / not survive. Did they look at time to death because it’s a way of better adjusting for length of hospital stay?

Sander: I’d just guess they could say they chose to focus on death because that’s the bottom line – if you are doomed to die within this setting, it might be arguably better for both the patient in suffering (often semi-comatose) and terminal care costs to go early (few would dare say that in a research article).

[This doesn’t quite answer my question. I understand why they are focusing on death as an outcome. My question is why don’t they just take survive/death in hospital as a binary outcome? Why do the survival analysis? I don’t see that dying after 2 days is so much worse than dying after 5 days. I’m not saying the survival analysis is a bad idea; I just want to understand why they did it, rather than a more simple binary-outcome model. — AG]
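One generic reason to prefer a time-to-event analysis over a binary survive/die tally is censoring: some patients are still in hospital when the data are pulled, and a binary tally silently counts them as survivors. Here is a minimal sketch of a Kaplan-Meier estimate to show the difference. The eight patients are made up for illustration, and nothing here is taken from the paper's data or is meant to reproduce its Cox model.

```python
from fractions import Fraction


def kaplan_meier(days, died):
    """Kaplan-Meier survival estimate at each day on which a death occurs.

    days[i] is the last day patient i was observed; died[i] is True if that
    day was a death and False if the patient was censored (for example,
    still in hospital when the data were extracted).
    """
    survival = Fraction(1)
    curve = []
    death_days = sorted({t for t, d in zip(days, died) if d})
    for day in death_days:
        # Everyone observed at least up to this day is still "at risk".
        at_risk = sum(1 for t in days if t >= day)
        deaths = sum(1 for t, d in zip(days, died) if d and t == day)
        survival *= Fraction(at_risk - deaths, at_risk)
        curve.append((day, survival))
    return curve


# HYPOTHETICAL data: 8 patients, one of them censored after only 3 days.
days = [2, 3, 4, 4, 6, 6, 6, 6]
died = [True, False, True, True, False, False, False, False]

curve = kaplan_meier(days, died)

# Binary tally: anyone not recorded as dead counts as a survivor.
naive_survival = Fraction(died.count(False), len(died))

for day, s in curve:
    print(f"day {day}: KM survival = {float(s):.3f}")
print(f"binary 'not dead' fraction = {float(naive_survival):.3f}")
```

In this toy example the binary tally says 5 of 8 survived, while the Kaplan-Meier estimate through day 4 is 7/12, because the patient censored on day 3 drops out of the risk set instead of being credited as a survivor.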

Andrew: The power analysis seems like a joke: the study is powered to detect a hazard ratio of 0.65 (i.e., about 1.5 if you take the ratio in the other direction). That’s a huge assumed effect, no?

Sander: I view all power commentary for data-base studies like this one as a joke, period, part of the mindless ritualization of statistics that is passed off as needed for “objective standards”. (It has a rationale in RCTs to show that the study was planned to that level of detail, but still has no place in the analysis.)

Andrew: I can’t figure out why they include p-values in their balance table (Table 1). It’s not a randomized assignment so the null hypothesis is of no interest. What’s of interest is the size and direction of the imbalance, not a p-value.

Sander: Agreed. I once long ago argued with Ben Hansen about that in the context of confounder scoring, to no resolution. But at least he tried his best to give a rationale; I’m sure here it’s just another example of ritualized reflexes.

Andrew: Figure 2 is kinda weird. It has those steps, but it looks like a continuous curve. It should be possible to make a better graph using raw data. With some care, you should be able to construct such a graph to incorporate the regression adjustments. This is an obvious idea; I’m sure there are 50 biostatistics papers on the topic of how to make such graphs.

Sander: Proposals for using splines to develop such curves go back at least to the 1980s and are interesting in that their big advantage comes in rate comparisons in very finite samples, e.g., most med studies. (Technically the curves in Fig. 3 are splines too – zero-order piecewise constant splines).

[But I don’t think that’s what’s happening here. I don’t think those curves are fits to data; I’m guessing these are just curves from a fitted model that have been meaninglessly discretized. They look like Kaplan-Meier curves but they’re not. — AG]

Andrew: Table 3 bothers me. I’d like to see the unadjusted and adjusted rates of death and other outcomes for the 4 groups, rather than all these comparisons.

Sander: Isn’t what you want in Fig. 3 and Table 4? Fig. 3 is very suspect for me as the HCQ-alone and neither groups look identical there. I must have missed something in the text (well, I missed a lot). Anyway I do want comparisons in the end, but Table 3 is in my view bad because the comparisons I’d want would be differences and ratios of the probabilities, not odds ratios (unless in all categories the outcomes were uncommon, which is not the case here). But common software (they used SAS) does not offer my preferred option easily, at least not with clustered data like theirs. That problem arises again in their use of the E-value with their odds ratios, but the E-value in their citation is for risk ratios. By the way, Ioannidis has vociferously criticized the E-value in print from his usual nullistic position, and I have a comment in press criticizing the E-value from my anti-nullistic position!
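To see why odds ratios are a poor stand-in for risk ratios when the outcome is common, here is a small sketch using the crude counts from the paper's main results section (243 deaths among 1006 patients receiving HCQ, 49 among 432 not receiving it). These are unadjusted numbers, so this says nothing about the paper's adjusted estimates; the function name is just something I made up for the illustration.

```python
def compare_groups(deaths_a, n_a, deaths_b, n_b):
    """Crude risk difference, risk ratio, and odds ratio for group a vs. b."""
    risk_a = deaths_a / n_a
    risk_b = deaths_b / n_b
    odds_a = risk_a / (1 - risk_a)
    odds_b = risk_b / (1 - risk_b)
    return {
        "risk_a": risk_a,
        "risk_b": risk_b,
        "risk_difference": risk_a - risk_b,
        "risk_ratio": risk_a / risk_b,
        "odds_ratio": odds_a / odds_b,
    }


# Crude in-hospital mortality: HCQ (243 of 1006) vs. no HCQ (49 of 432).
crude = compare_groups(243, 1006, 49, 432)

for name, value in crude.items():
    print(f"{name}: {value:.3f}")
```

With risks of about 24% and 11%, the crude risk ratio comes out near 2.13 while the crude odds ratio is near 2.49. The gap between the two is exactly the distortion that shows up when outcomes are not rare.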

Andrew: Their conclusion is that the treatment “was not significantly associated with differences in in-hospital mortality.” I’d like to see a clearer disentangling. In the main results section, it says that 24% of patients receiving HCQ died (243 out of 1006), compared to 11% of patients not receiving HCQ (49 out of 432). The statistical adjustment reduced this difference. I guess I’d like to see a graph with estimated difference on the y-axis and the amount of adjustment on the x-axis.

Sander: That’s going way beyond anything I normally see in the med lit. And I’m sure this was a rush job given the topic.

[Yeah, I see this. What I really want to do is to make this graph in some real example, then write it up, then put it in a textbook and an R package, and then maybe in 10 years it will be standard practice. You laugh, but 10 years ago nobody in political science made coefficient plots from fitted regressions, and now everyone’s doing it. And they all laughed at posterior predictive checks, but now people do that too. It was all kinds of hell to get our R-hat paper published back in 1991/1992, and now people use it all the time. And MRP is a thing. Weakly informative priors too! We can change defaults; it just takes work. I’ve been thinking about this particular plot for at least 15 years, and at some point I think it will happen. It took me about 15 years to write up my thoughts about Popperian Bayes, but that happened, eventually! — AG]
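For what it's worth, here is a rough sketch of the numbers that could sit behind such a graph: the estimated risk difference after adjusting for zero, one, then two covariates. Everything in it is hypothetical. The counts are invented so that sicker patients are treated more often and the treatment has no effect within any stratum, the covariate names are placeholders, and the adjustment is simple stratification with standardization to the full sample rather than the regression adjustment used in the paper. It also leaves out the uncertainty intervals a real plot would need.

```python
from fractions import Fraction

# HYPOTHETICAL counts, one row per covariate cell:
# (low_oxygen, fever, treated_n, treated_deaths, untreated_n, untreated_deaths)
CELLS = [
    (0, 0, 100, 5, 400, 20),
    (1, 0, 200, 30, 200, 30),
    (0, 1, 200, 30, 200, 30),
    (1, 1, 400, 160, 100, 40),
]


def standardized_risk_difference(cells, adjust_for):
    """Risk difference (treated minus untreated), stratified on the covariates
    whose column indices are listed in adjust_for, then averaged over strata
    with weights proportional to stratum size."""
    strata = {}
    for cell in cells:
        key = tuple(cell[i] for i in adjust_for)
        totals = strata.setdefault(key, [0, 0, 0, 0])
        for j in range(4):
            totals[j] += cell[2 + j]
    population = sum(t[0] + t[2] for t in strata.values())
    difference = Fraction(0)
    for treated_n, treated_d, untreated_n, untreated_d in strata.values():
        weight = Fraction(treated_n + untreated_n, population)
        difference += weight * (
            Fraction(treated_d, treated_n) - Fraction(untreated_d, untreated_n)
        )
    return difference


# x-axis: amount of adjustment (no covariates, then one, then both).
adjustment_path = [(), (0,), (0, 1)]
estimates = [standardized_risk_difference(CELLS, a) for a in adjustment_path]

for adjusted, estimate in zip(adjustment_path, estimates):
    print(f"{len(adjusted)} covariates adjusted: difference = {float(estimate):.4f}")
```

In this made-up table the crude difference is 7/60, adjusting for one covariate halves it to 7/120, and adjusting for both removes it entirely. Plotting those three values against the number of covariates adjusted for gives the kind of picture described above; which order to add covariates in is a design choice the sketch does not settle.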

Andrew: This would be a good example to study further. But I’m guessing I already know the answer to the question, Are the data available?

Sander: Good luck! Open data is an anathema to much of the med community, aggravated by the massive confidentiality requirements imposed by funders, IRBs, and institutional legal offices. Prophetically, Rothman wrote a 1981 NEJM editorial lamenting the growing problem of these requirements and how they would strangulate epidemiology; a few decades later he was sued by an ambulance-chasing lawyer representing someone in a database Rothman had published a study from, on grounds of potentially violating the patient’s privacy.

[Jeez. — AG]

Andrew: I assume these concerns are not anything special with this particular study; it’s just the standard way that medical research is reported.

Sander: A standard way, yes. JAMA ed may well have forced everything bad above on this team’s write-up – I’ve heard several cases where that is exactly what the authors reported upon critical inquiries from me or colleagues about their statistical infelicities. JAMA and journals that model themselves on it are the worst that I know of in this regard. Thanks I think in part to the good influences of Steve Goodman, AIM and some other more progressive journals are less rigid; and most epidemiology journals (which often publish studies like this one except for urgency) are completely open to alternative approaches. One, Epidemiology, actively opposes and forbids the JAMA approach (just like JAMA forbids our approach), much to the ire of biostatisticians who built their careers around 0.05.

[Two curmudgeons curmudging . . . but I think this is good stuff! Too bad there isn’t more of this in scientific journals. The trouble is, if we want to get this published, we’d need to explain everything in detail, and then you lose the spontaneity. — AG]

P.S. Regarding that clickbait title . . . OK, sure, JAMA’s not killing anybody. But, if you accept that medical research can and should have life-and-death implications, then mistakes in medical research could kill people, right? If you want to claim that your work is high-stakes and important, then you have to take responsibility for it. And it is a statistical fallacy to take a non-statistically-significant result from a low-power study and use this as a motivation to default to the null hypothesis.

30 Comments

  1. Joshua says:

    Andrew –

    What is the source of the first set of statements by Greenland that you included in this post? Was it in correspondence with you?

      • Joshua says:

        Andrew –

        Thanks. Also, not related but I hope you saw this (specifically, last two sentences):

        > Others might have proceeded with more caution or perhaps waited to confirm these results with a larger, more rigorous trial. Raoult likes to think of himself as a doctor first, however, with a moral obligation to treat his patients that supersedes any desire to produce reliable data. “We’re not going to tell someone, ‘Listen, today’s not your lucky day, you’re getting the placebo, you’re going to be dying,’” he told me. He believes it to be unnecessary, in addition to being unethical, to run randomized controlled trials, or R.C.T.s, of treatments for deadly infectious diseases. If these have become the accepted standard in biomedical research, Raoult contends, it is only because they appeal to statisticians “who have never seen a patient.” He refers to these scientists disdainfully as “methodologists.”

        https://www.nytimes.com/2020/05/12/magazine/didier-raoult-hydroxychloroquine.html?action=click&module=Top%20Stories&pgtype=Homepage

        • dhogaza says:

          ‘ “We’re not going to tell someone, ‘Listen, today’s not your lucky day, you’re getting the placebo, you’re going to be dying,’” he told me.’

          Thus the *double*-blind model for trials.

          The implication that you’d tell people getting the placebo that they’re getting it and will die as a result in itself could bias the study results. Hopefully he’s not so unethical that he’d really do this. Makes one wonder, though. Well, his hydroxychloroquine study didn’t include placebos so I guess we’ll never know.

          • Joshua says:

            Don’t want to drag this further off topic – but I found the article an interesting read, especially when you look at how his original “studies” became such a widespread focus in the climate blogosphere. Intersects with a number of interesting issues that tie into a lot of the discussion on Andrew’s blog.

            • dhogaza says:

              Probably shouldn’t go too far down this path, but the fact that his hydroxychloroquine work became such a hit in the climate [science denialist] blogosphere might be because he himself is a climate science denialist. But I don’t have the stomach to visit some of the places you go to (curry, for instance) so you know better than I.

              • Joshua says:

                dhogaza –

                > but the fact that his hydroxychloroquine work became such a hit in the climate [science denialist] blogosphere might be because he himself is a climate science denialist.

                I think probably indirectly. In the article they describe the explicit and direct pathway from Raoult into the conservative media universe. My guess is that the link was mediated in that way into the “skept-o-sphere,” rather than his identification with climate “skepticism” per se.

                Some might argue that there’s a direct anti-establishment dogma link. Or that there is a “skeptical of authority” link. But I tend to reject that. I think that libz and climate “realists” are probably equally likely to be skeptical of dogma and authority. My perspective is that it isn’t really characterological profiles that distinguish people along these types of cleavages, but just basic root identity. I like my group and scientists that become affiliated with my group, and I hate the other group and scientists that become affiliated with the other group – even if those scientist/group affiliations are more or less random or arbitrary.

                Maybe we should pick this up over at ATTP if we’re going to continue? It’s a hard topic to just drop but I don’t want to drag this post even further off topic.

              • dhogaza says:

                My curiosity is satisfied, but others at ATTP’s might be interested in discussing it. I agree that this is enough for here!

        • Carlos Ungil says:

          Related: https://www.statnews.com/2020/05/11/inside-the-nihs-controversial-decision-to-stop-its-big-remdesivir-study/

          “The National Institute of Allergy and Infectious Diseases has described to STAT in new detail how it made its fateful decision: to start giving remdesivir to patients who had been assigned to receive a placebo in the study, essentially limiting researchers’ ability to collect more data about whether the drug saves lives — something the study, called ACTT-1, suggests but does not prove.”

  2. Andrea says:

    Re Figure 2: those are (standardised) cumulative incidence functions (1 minus the survival) under the assumption of proportionality in the hazards (they are based on a Cox regression model).

  3. Joseph Delaney says:

    “Thanks I think in part to the good influences of Steve Goodman, AIM and some other more progressive journals are less rigid; and most epidemiology journals (which often publish studies like this one except for urgency) are completely open to alternative approaches. One, Epidemiology, actively opposes and forbids the JAMA approach (just like JAMA forbids our approach), much to the ire of biostatisticians who built their careers around 0.05.”

    Epidemiology is a very good journal and I note that the approach there seems to lead to very fine papers. It is not the case that the JAMA approach is needed for good research. My favorite recent paper was in Epidemiology and I think their refusal to allow significance and tough focus on making me be much more clear about the objectives/claims in the paper led to a MUCH better paper. A case where peer review made a very positive difference.

    Open data would be a huge benefit to medicine but all of the incentives run in the wrong direction. I am about to try and publish the code I used with my next study and my colleagues think I am mad. I suspect the journals will agree.

    • dhogaza says:

      Data I can understand, but what’s the objection to publishing your code? Within your peer group, that is, obviously in a politicized context we know that code will be attacked by some who disagree with your work’s implication regardless of the correctness of the implementation.

  4. george says:

    Regarding binary versus survival outcome: is there not an argument for efficiency, using the full survival time instead of the dichotomized live/die outcome?

    There are formal connections between the odds ratio and hazard ratio, but in a rush job like this it seems okay to say they are near-enough answering the same question and using survival doesn’t throw out information.

  5. Anoneuoid says:

    IMO, the WHO definitely is:

    Tips for managing respiratory distress

    Keep SpO 2 > 92–95%.

    Do not delay intubation for worsening respiratory distress. Be prepared for difficult airway!

    https://www.who.int/publications-detail/clinical-care-of-severe-acute-respiratory-infections-tool-kit

    It is wholly negligent to still recommend this. If I had a family member die unnecessarily on a ventilator due to premature intubation I would be suing them.

    • Anoneuoid says:

      And this was never even published to be useful for covid-19, so they didn’t even follow their own “evidence-based medicine” rules that ignore all prior information when recommending it.

  6. Anoneuoid says:

    Also, look at table 1. Clinical severity features within 24 h of admission.

    Hydroxychloroquine + azithromycin (n = 735)
    -spO2 <90%: 20.9%

    Hydroxychloroquine alone (n = 271)
    -spO2 <90%: 13.1%

    Azithromycin alone (n = 211)
    -spO2 <90%: 9.3%

    Neither drug (n = 221)
    -spO2 <90%: 6.6%

    Fever and abnormal chest imaging show a similar pattern. So it is clear that the sicker patients upon admission got more aggressive treatment 2-3x more often. So all these comparisons are confounded. As usual statistical significance is irrelevant to anything.

  7. It is hard for most to grasp how poor most published clinical research is, and a crisis just magnifies that.

    That is why I have been pointing to the OHDSI group, which has multiple data sources, experienced methodologists and well-studied defaults including negative controls.

    Their observational study on Safety of hydroxychloroquine in pre-Covid19 patients comprised “Overall, 956,374 and 310,350 users of hydroxychloroquine and sulfasalazine, and 323,122 and 351,956 users of hydroxychloroquine-azithromycin and hydroxychloroquine-amoxicillin were included.”

    Now the assessment was done in past use of hydroxychloroquine in patients pre-covid19 and the summary conclusion was “Conclusions: Short-term hydroxychloroquine treatment is safe, but addition of azithromycin may induce heart failure and cardiovascular mortality, potentially due to synergistic effects on QT length. We call for caution if such combination is to be used in the management of Covid-19.”

    Now, the JAMA study is in Covid19 patients and possibly higher doses? (so a related but different question), but the Forest plot in the figures in the full OHDSI paper shows various data sources alone would have suggested almost significant or significant effects in both directions on their own.

    And by the way, the study was rejected without review by the NEJM and is now under review at another journal.

  8. jd says:

    “The power analysis seems like a joke”

    I feel like this most of the time when I am required to do a power analysis (every proposal ever).

    Andrew Gelman – any chance y’all will get a paper on design analysis for future studies out sometime? https://statmodeling.stat.columbia.edu/2019/10/11/whats-the-p-value-good-for-i-answer-some-questions/#comment-1139483

  9. Michal says:

    I think the curves in figure 2 really are KM curves. Or rather one of them is and the rest are based on hazard ratios. And the discretization is not artificial; if you look at the “Medical Record Abstraction Form” in supplement 1, they only recorded the dates of admission and death, so the figure just reflects the fact that they didn’t have exact time.

  10. Eric says:

    “You laugh, but 10 years ago nobody in political science made coefficient plots from fitted regressions, and now everyone’s doing it.”

    See figs 3 & 4 here for a 20 year old political science example:

    Bartels, L.M., 2000. Partisanship and voting behavior, 1952-1996. American Journal of Political Science, pp.35-50.

    • Andrew says:

      Eric:

      Figures 3 and 4 of that paper are fine; it’s the secret weapon, which I love. But here I’m talking about displaying all of a regression with a coefplot. I’m not saying this is a deep idea; it’s just not something that people used to do.

  11. pophealth says:

    Don’t you lose information by using logistic regression vs. survival analyses?

    • Andrew says:

      Pophealth:

      It depends what you’re interested in. I’m guessing that the survival analysis is intended to correct some sort of bias having to do with people leaving the hospital in different days. If the issue is not bias correction, I can’t see how it would help to learn that some people die after 2 days and other people die after 4 days. Either way, they’re dying.

  12. OliP says:

    The adjustment graph idea sounds interesting. I’m trying to think how it would work: average effect & uncertainty (y-axis) for all possible sets of 1,2,…,n covariates (x-axis)? Or am I not understanding the suggestion?

  13. Ryan King says:

    Might have done a time to event analysis because it lets them do something with the 254 included patients who hadn’t died or been discharged from hospital. Having not died after N days when most people in the (died or discharged) sample were dead by N*0.8 days suggests you’re a survivor.

    The steps in Fig 2 are reasonable to me … the data only exist at integers so why not plot it at integers? I assume the output is just sas’s phreg with the “baseline” option set to something. There are other options off the shelf.

    E-value! It is a really efficient way to check the box that you did a “sensitivity analysis.”

    Power analyses are probably requested by reviewers and editors even where inappropriate. In this paper they had a target # charts to review, so hopefully they really did a sample size calculation instead of just figuring out how many charts they could review (or serially checking the results and sampling more charts if needed). The assumed numbers are huge, so I wouldn’t be surprised if it’s post-hoc. Similarly, table 1 p values for observational studies are probably requested by editors and reviewers. A tradition that won’t die. I’ve had them requested for large database studies where the p-values are all < 1e-6. I tried to explain, but it was easier to just put them in.

  14. Rich says:

    You two need a show!
