The horrible convoluted statistical procedures that are used to make decisions about something as important as Alzheimer’s treatments

Richard Juster points to this article, “Evaluation of Aducanumab for Alzheimer Disease: Scientific Evidence and Regulatory Review Involving Efficacy, Safety, and Futility.” The article begins:

On November 6, 2020, a US Food and Drug Administration (FDA) advisory committee reviewed issues related to the efficacy and safety of aducanumab, a human IgG1 anti-Aβ monoclonal antibody specific for β-amyloid oligomers and fibrils implicated in the pathogenesis of Alzheimer disease. . . .

The primary evidence of efficacy for aducanumab was intended to be 2 nearly identically designed, phase 3, double-blind, placebo-controlled randomized clinical trials . . . The studies were initiated after a phase 1b safety and dose-finding study indicated suitable drug safety . . . Approximately halfway through the phase 3 studies, a planned interim analysis met prespecified futility criteria and, in March 2019, the sponsor announced termination of the trials.

However, following this decision, and augmenting the data set with additional trial information that had been gathered after the futility determination, conflicting evidence of efficacy was identified in the 2 studies.

This sort of thing must happen all the time: decisions must be made based on partial evidence.
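The article doesn’t spell out the futility rule, but a common way to operationalize one is conditional power: at the interim look, estimate the probability of ending up with a significant result if the trial runs to completion, and stop if that probability is low. Here is a minimal sketch under a standard Brownian-motion approximation, using the “current trend” assumption that the observed effect continues; the 0.20 threshold mentioned in the comment is a conventional choice, not necessarily what the aducanumab protocol specified.

```python
import numpy as np
from scipy import stats

def conditional_power(z_interim, info_frac, z_alpha=1.96, drift=None):
    """Conditional power at an interim analysis of a two-arm trial.

    z_interim: observed z-statistic at the interim look
    info_frac: fraction of planned statistical information accrued (0 < t < 1)
    drift:     assumed drift for the remaining data; defaults to the
               "current trend" estimate (the observed effect continues)
    """
    t = info_frac
    if drift is None:
        drift = z_interim / np.sqrt(t)  # current-trend estimate of the drift
    # Brownian-motion approximation: B(t) = sqrt(t) * Z_t, and the final
    # statistic is B(1) ~ Normal(B(t) + drift * (1 - t), 1 - t).
    mean_final = np.sqrt(t) * z_interim + drift * (1 - t)
    sd_final = np.sqrt(1 - t)
    return 1 - stats.norm.cdf((z_alpha - mean_final) / sd_final)

# Halfway through the trial (info_frac = 0.5) with a weak interim signal.
# Stop for futility if conditional power falls below, say, 0.20.
print(f"{conditional_power(z_interim=0.5, info_frac=0.5):.2f}")  # ~0.04
```

With these made-up numbers, the chance of ending up significant is about 4%, the kind of value that triggers a futility stop. The catch, as with aducanumab, is that data accrued after the stopping decision can tell a different story.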

But then things start getting weird:

Study 301 (n = 1647 randomized patients) did not meet its primary end point of a reduction relative to placebo in the Clinical Dementia Rating–Sum of Boxes (CDR-SB) score. According to prespecified plans to protect against erroneous conclusions when performing multiple analyses, no statistically valid conclusions could therefore be made for any of the secondary end points in study 301. By contrast, study 302 (n = 1638 patients) reached statistical significance on its primary end point, estimating a high dose treatment effect corresponding to a 22% relative reduction in the CDR-SB outcome compared with placebo (P = .01). In the low-dose aducanumab group in study 302, the effect was not statistically significant compared with placebo, and based on the prespecified analytic plan, this precluded the ability to assess efficacy with respect to secondary outcomes in both the high- and low-dose groups. . . .

Lots of jargon here, but the message seems to be that the decisions rest on some sort of house of cards built out of various statistical-significance statements.
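To unpack one of those numbers: the “22% relative reduction” is a ratio of mean declines on the CDR-SB scale, not an absolute difference. A toy calculation, with placeholder declines invented for illustration rather than the reported trial values:

```python
# Placeholder CDR-SB changes, invented for illustration only.
placebo_decline = 2.00   # mean CDR-SB worsening, placebo arm
treated_decline = 1.56   # mean CDR-SB worsening, high-dose arm
relative_reduction = 1 - treated_decline / placebo_decline
print(f"{relative_reduction:.0%} relative reduction")  # -> 22%
```

Since CDR-SB runs from 0 to 18, a 22% relative reduction in decline can amount to a fraction of a point in absolute terms.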

I feel like these authors are doing their best. It’s just that they’re working with very crude tools, trying to paint a picture using salad tongs.

5 thoughts on “The horrible convoluted statistical procedures that are used to make decisions about something as important as Alzheimer’s treatments”

  1. The problem is that amyloids accumulate in all diseased/injured tissue. If a cell does not force proteins to fold properly and remove the ones that don’t, amyloids will form since that is the lowest energy state for peptides.

    So, removing amyloid-beta is like being too sick to take out the trash so your neighbors call your mom to come over and do it for you.

    While accumulated trash could also become unhealthy, taking it out doesn’t stop you from being sick. It likely also tricks your neighbors into thinking you are better off than you really are, so they are less vigilant.

    The approval of Aducanumab based on conflicting results, despite everyone on the FDA committee voting against approval, is a fitting pinnacle achievement for the amyloid hypothesis.

  2. It looks like fairly standard “gatekeeping” procedures. Instead of modifying the p / FDR threshold to account for multiple study hypotheses, you go down a fixed ranking of tests, stopping at the first null that isn’t rejected (see the sketch after this comment thread). It controls the Type I error rate in a roundabout way. If your ordering is clever (say, test N only makes sense if test N−1 isn’t null), it’s basically free; if you have strong priors (null N is much more likely to be rejected than null N+1), it also saves you from spending a lot of alpha on weak tests. The problem, as you see here, is that when the expected ranking of tests doesn’t work out, it seems silly to hold onto the procedure. There are modifications that allow further-down-the-chain tests to escape and be evaluated independently, at some cost in alpha for the main procedure.

    • Ok, but how does that work? Does the analysis plan suddenly have a bunch of forks in it? For example, they could have designed the study to make a choice and drop one of the treatment arms early. So, maybe you say that was a missed opportunity. When you start going down this path, I don’t see where it ends.

  3. Commenting from the future… the Centers for Medicare & Medicaid Services’ National Coverage Analysis for aducanumab and like drugs is out (link). The public comment period closed a little over a week ago. This was my first time at the public comment rodeo and I saw some things I didn’t expect, including a few very large tranches of obviously-astroturfed comments.
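A footnote to comment 2 above: here is a minimal sketch of the basic fixed-sequence version of gatekeeping. The endpoint names and p-values are hypothetical, and the actual aducanumab analysis plan was more elaborate than this, but the mechanism is the same: each hypothesis is tested at the full alpha in a prespecified order, and the first failure closes the gate on everything after it.

```python
def fixed_sequence_test(p_values, alpha=0.05):
    """Test hypotheses in a prespecified order; the first hypothesis
    that fails closes the gate on every hypothesis after it."""
    results = {}
    gate_open = True
    for name, p in p_values:
        if gate_open and p <= alpha:
            results[name] = "rejected"
        elif gate_open:
            results[name] = "not rejected"
            gate_open = False  # every later test is now off the table
        else:
            results[name] = "not tested (gate closed)"
    return results

# Hypothetical ordering and p-values, loosely echoing the quoted passage:
plan = [
    ("primary, high dose", 0.01),   # significant: gate stays open
    ("primary, low dose", 0.12),    # not significant: gate closes
    ("secondary endpoint", 0.001),  # strong signal, but never evaluated
]
for name, verdict in fixed_sequence_test(plan).items():
    print(f"{name}: {verdict}")
```

This is how a p = .001 on a secondary endpoint can end up supporting no “statistically valid conclusions” at all: it sits behind a gate that a non-significant primary comparison already closed.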
