
Bayesian decision analysis for the drug-approval process (NSFW)

Bill Jefferys points me to a paper, “Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design,” by Vahid Montazerhodjat and Andrew Lo. Here’s the abstract:

Implicit in the drug-approval process is a trade-off between Type I and Type II error. We propose using Bayesian decision analysis (BDA) to minimize the expected cost of drug approval, where relative costs are calibrated using U.S. Burden of Disease Study 2010 data. The results for conventional fixed-sample randomized clinical-trial designs suggest that for terminal illnesses with no existing therapies such as pancreatic cancer, the standard threshold of 2.5% is too conservative; the BDA-optimal threshold is 27.9%. However, for relatively less deadly conditions such as prostate cancer, 2.5% may be too risk-tolerant or aggressive; the BDA-optimal threshold is 1.2%. We compute BDA-optimal sizes for 25 of the most lethal diseases and show how a BDA-informed approval process can incorporate all stakeholders’ views in a systematic, transparent, internally consistent, and repeatable manner.
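To make the trade-off in the abstract concrete, here is a minimal sketch of an expected-cost calculation for a one-sided z-test with a fixed sample size. It is not the paper's model: the function names, the prior probability, the two cost values, and the `shift` parameter (the mean of the test statistic when the drug works) are all assumptions for illustration, and the paper additionally optimizes over sample size, which this sketch does not.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def type1_error(critical_value):
    """P(approve | drug is ineffective) when we approve if z > critical_value."""
    return 1.0 - STD_NORMAL.cdf(critical_value)


def type2_error(critical_value, shift):
    """P(reject | drug is effective); z has mean `shift` under the alternative."""
    return STD_NORMAL.cdf(critical_value - shift)


def expected_cost(critical_value, shift, prior_h0, cost_type1, cost_type2):
    """Prior-weighted cost of the two error types (all inputs hypothetical)."""
    return (prior_h0 * cost_type1 * type1_error(critical_value)
            + (1.0 - prior_h0) * cost_type2 * type2_error(critical_value, shift))


def optimal_critical_value(shift, prior_h0, cost_type1, cost_type2):
    """Critical value minimizing expected_cost for fixed `shift` > 0.

    Setting the derivative of expected_cost to zero gives a likelihood-ratio
    condition whose solution is shift/2 + log(ratio)/shift.
    """
    ratio = (prior_h0 * cost_type1) / ((1.0 - prior_h0) * cost_type2)
    return shift / 2.0 + log(ratio) / shift


if __name__ == "__main__":
    # Hypothetical scenario: wrongly rejecting a working drug is 10x as costly
    # as wrongly approving an ineffective one.
    cv = optimal_critical_value(shift=2.0, prior_h0=0.5,
                                cost_type1=1.0, cost_type2=10.0)
    print(f"critical value {cv:.3f}, implied threshold {type1_error(cv):.3f}")
```

The qualitative behavior matches the abstract: as the assumed cost of a Type II error grows relative to a Type I error, the cost-minimizing threshold rises above the conventional 2.5%, and as it shrinks, the threshold falls below it.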

Hey—the acronym “BDA” is already taken! But let’s set that aside . . . In all seriousness, here are my reactions to the above abstract:

(a) I like the idea of applying formal decision analysis to the drug approval problem, an idea I discussed in this article a few years ago but have never actually done. So I’m glad to see this new paper.

(b) As always, I find the Type 1 and Type 2 error framework to be inappropriate. I continue to be frustrated by researchers, starting perhaps with Jimmie Savage, who want to apply Bayesian inference but remain stuck in this essentially deterministic or discrete way of thinking. In general, I doubt it makes sense to say that a drug works or does not work; rather, a drug will have different effects on different people, and these effects are themselves unknown. That is, there is variation and there is uncertainty. But no Type 1 or Type 2 errors. To put it another way, Yes Yes Yes on the drug-approval process as tradeoff, No No No on the discrete framing of therapies as “effective” or “not effective.” To try to perform a sophisticated, data-based analysis but tie yourself to the Type 1 and Type 2 error framework, that’s like, ummmm, I already used the “paint a picture using salad tongs” analogy, but I still like it so, yeah, that’s it.

(c) It seems a bit un-Bayesian to present numbers like “27.9%,” given the uncertainty that must be present in these estimates.

Anyway, my overall reaction is positive and I hope these comments can inspire these and other researchers in this area to do even better.

23 Comments

  1. Sam says:

    I think there’s a balance that must be struck between starting from common ground and not continuing to use unhelpful tools. While we would like the FDA to start using Type S and Type M error instead of Type 1 and Type 2 error, sometimes merely making that suggestion is enough to cause the key decision-makers affecting FDA policy to turn off the rational parts of their brain. It seems like the goal of this paper is to influence people to question some of the most unhelpful traditions that are currently hedging the way, and it’s hard to accomplish that when people turn their brains off.

  2. Jonathan (another one) says:

    OK. I’m just trying to figure out the NSFW part from your on-deck preview. I have several hypotheses:
    (a) If BDA has been reappropriated for an acronym, you can reappropriate NSFW… Now Something For Weekdays?
    (b) The salad tongs were originally going to have a different illustration.
    (c) Type I and Type II errors are literally not safe to discuss at Columbia.

    Am I warm?

    PS: The post itself (and the article) were quite good.

  3. Z says:

    Haven’t read the full paper either, but it seems possible from the abstract alone that the researchers themselves do not think or even model in binary terms but rather accept that the FDA is likely to continue to do so.

    • Michael Lew says:

      You are correct to suppose that the FDA is likely to continue with dichotomous thinking about drug trials. Perhaps they should. After all, the output of their deliberations is pretty much binary: approve or don’t approve.

      • Andrew says:

        Michael:

        The decision may be binary. That does not mean it’s a good idea to model the underlying reality as binary.

        • Phil says:

          See Lin, Gelman, Price, and Krantz (1999) for an example of making binary decisions based on a continuous underlying model!

        • Economist says:

          The authors are very aware of these issues. In the conclusion they discuss the need to extend their framework to deal with uncertainty and variability (across heterogeneous subjects) in the responses. It is always important to view a study as an incremental contribution to a larger goal, rather than the last word on anything.

          There are very few Claude Shannons out there.

          • Andrew says:

            Economist:

            Yes, I think what you just wrote is consistent with what I wrote above, in particular the last paragraph of my post.

            • Economist says:

              My point is simply that the authors appear to be aware of your suggestions and allude to some (and discuss at least one explicitly) as avenues for further research. I am not one of the authors, but my understanding is that they view this study as one step towards a fully developed “BDA” of experiments that inform FDA decisions.

              Being an expert and an author of the seminal textbook on the subject, I have no doubt that you will have many invaluable suggestions. But the issues you raised in this post – regarding the issues surrounding binary outcomes, as well as the related uncertainty and distribution of treatment effects – are issues that the authors are aware of and mention in their conclusion.

              • Andrew says:

                Economist:

                Yes, I agree that Montazerhodjat and Lo address these issues in their last two paragraphs, and I hope their paper is influential. That’s one reason I blogged it, to promote their ideas.

  4. Alex says:

    Regardless of how exactly the effectiveness of a drug is measured, in the end the FDA will approve it for further testing (and/or use) or not. So at some point, these drugs will be labeled ‘effective’ or ‘not effective’, right? Is there a way around that?

    • D.O. says:

      Life does not stop with approval/disapproval. Suppose they will build an approval scale, rate different types of evidence and societal needs and say that they are going to green-light any drug that goes over 70 on that scale. It might be a completely different story for everyone involved whether a particular drug reached 68 or 42 on that scale.

      Different approach: same scale, two drugs. One has 65+/-1 according to all the uncertainties involved in the process and another 65+/-10. Makes a big difference for how we go from here. I hope the FDA already does this kind of analysis informally.

      • Alex says:

        I’m not going to claim to be super-familiar with the FDA process, but my understanding is that life for a drug does indeed stop with disapproval. If the green-light point is a 70 and a drug goes through trials and hits a 68, it is not approved and it does not get used. It would be illegal to administer the drug, doctors would lose their licenses, etc. Perhaps the pharmaceutical company decides to keep pursuing similar compounds or whatever, but that’s aside from the FDA’s decision.

  5. Anoneuoid says:

    >”Let us define the treatment effect of the drug, δ, as the difference of the response means in the two arms, i.e., δ = μt − μp. The event in which the drug is ineffective and has adverse side effects defines our null hypothesis, H0, corresponding to δ = 0 (and the assumption of side effects is meant to represent a “worst-case” scenario since ineffective drugs need not have any side effects). On the other hand, the alternative hypothesis, H1, represents a positive treatment effect, δ = δ0 > 0. Therefore, a one-sided superiority test is appropriate for distinguishing between these two point hypotheses.”

    Same old story of null hypothesis is false therefore favorite explanation is true. How different were the groups and in what way(s)? What are the possible explanations for any differences? That is the relevant info.

    Also, the study should be designed to gather as much such info as possible and the analysis not designed to punish the researchers for doing this by requiring multiple comparison corrections. The merits of each explanation should be judged on how well they can explain the rich set of data that has been collected. That means the explanations will likely need to be quantitative in some way, at least with upper/lower bounds on what is consistent with each.

  6. Clyde Schechter says:

    I believe the prospects for Bayesian thinking to take hold at the FDA are very close to nil. They are a government agency who are subject to intense, hostile scrutiny by overseers in Congress, who are, in turn, the focus of intense lobbying by the pharmaceutical industry. The very existence of the FDA, to say nothing of its appropriations, is constantly in jeopardy. Let there be even a hint that there is something “subjective” about what they are doing, and they are toast. Yes, on this site, we know that likelihoods are the product of subjective models, and that the impact of different priors can be tested in sensitivity analysis. But Congress doesn’t know that, and probably isn’t really educable on the subject. The FDA has no choice but to operate in a paranoid/defensive mode, lest we all return to the era when snake oil was freely marketed.

  7. Keith O'Rourke says:

    Donald Berry did a lot of work on drug regulations from a Bayesian perspective and concluded that regulatory agencies did need to focus on type 1 and 2 error rates (i.e. if using a Bayesian approach these had to be assessed and this is/was actually in current FDA guidance on using Bayesian methods.) Think he would be OK with other kinds of error rates if they offered advantages in context of well designed and powered multiple trials.

    Currently, drug regulators are trying to move past “once off yes or no” drug approvals to adaptive approvals that change over the life cycle of the drug. Not hard to realize you want continuous improvements of the knowledge of drug benefits, harms and uncertainties (of both) if you or a loved one is going to be getting it. There are real challenges (as Clyde pointed to) making this happen, which likely has to be done globally.

    • I personally think the real barriers are entirely political. If I had the power, I’d pass a law tomorrow that eliminated “approval” and purely forced drug companies to have third parties test their drugs and report statistics on a variety of outcomes, and continue to collect those statistics yearly and publish updated versions…

      “Consumer Reports” for drugs would then become possible, and is really what we need.

      Of course, there are challenges, but the FDA is a wrongheaded approach to begin with.

      • Keith O'Rourke says:

        Have you looked at what they do and what is currently available from them to third parties?

        They are not the ones slowing down access to individual clinical trial data (that Stephen Senn and others promote) – as far as I am aware.

        Also google mini-Sentinel and perhaps recent research funded around that.

        Disclaimer, I used to work in that area.

        • Right, there are efforts to increase transparency, but the FDA model is still a one-time YES/NO decision.

          I think the “stamp of approval from the FDA” is a bad idea, and I think drug companies would much rather compete on some kind of metrics (effectiveness metrics, side-effect metrics, risk, reward etc). I also think consumers would be better off in a “metrics” environment.

          Of course metrics are gameable, so I think having a regulatory body in place to ensure that the metrics are reasonably accurate is still important.

          For example, suppose a drug is found to be slightly less effective than an existing drug but have too many side effects, so it’s not approved. For a person who is allergic to the “existing drug” but would have been able to handle the side effects of the non-approved drug… this is not a good thing!

          I think if you explained this idea to 1000 consumers they would probably be receptive and largely be in favor of the change. I think if you proposed it to doctors and drug companies, they’d also be in favor. I conclude from this that the only reason we continue to have the FDA be the way it is, is entrenched political power.
