
Specifying a prior distribution for a clinical trial: What would Sander Greenland do?

We had a brief discussion here about prior distributions for clinical trials, motivated by a question from Sanjay Kaul. In that discussion, I asked what Sander Greenland would say. Well, here are Sander’s comments:

My view is that if you are going to do a Bayesian analysis, you ought to do an informative (“subjective”) Bayesian analysis, which means you should develop priors (note the plural) from context. I don’t believe in reference or default priors except as frequentist shrinkage devices, and even then they need to be used with careful attention to proper contextual scaling. In my view the frequentist result is the noninformative analysis (I attach a recent Bayes primer I wrote on invitation from a clinical journal which explains why), and there is no contextual value added (and much harm possible) in Bayesian attempts to imitate it.

I think the idea of noncontextual default priors is like the idea of a one-size-fits-all shoe: Damaging except in the minority of cases it accidentally fits. Yet it seems even nominally Bayesian statisticians dislike the idea of procedures that must be fit to each context. In contrast, I think the ability to take advantage of expert contextual information is the chief argument for Bayesian over frequentist procedures (note that modern high-level applied frequentism has largely morphed into “machine learning,” meaning fully automated procedures which have their place, but not in clinical-trial analysis).

On a separate issue that applies to Andrew’s reply: Any regression procedure (frequentist or Bayesian) based on scaling the covariates using their sample distribution confounds accidental features of the sample with their effects, which is to say it adds more noise to an already noisy system, and should be avoided. So it is with the default prior recommended in the Gelman et al. preprint.

For “conservative” Bayesian analysis I recommend instead contextual “covering” priors (from “covering all opinions” or some might say “covering your ass”) which assign high relative probability to all values that anyone involved in the topic would take remotely seriously. Your alternative from the power calculation provides an example of a non-null point that needs to be covered that way and gives an idea of the minimum range of high probability density for the prior. I discourage uniform priors for unbounded parameters because they unrealistically cut off suddenly, a property they transmit to the posterior (meaning that the posterior will cut off even if the data screams “big effect”!).

After proper contextual (not SD) scaling, I think a not bad very vague prior to start off in most clinical and epidemiologic contexts is the logistic for a log odds ratio or log hazard ratio, which is just a generalization of Laplace’s law of succession and assigns over 95% prior probability to the ratio being between 1/40 and 40 (if the effect were larger than this there would hardly be a question of subtle analysis). This corresponds to adding 2 prior observations (as opposed to about 1 for the Gelman et al. default). I do believe in examining results under different priors, and one can go from the logistic inward, skewing as desired by altering the number of observations added to the treated and untreated. One can use non-null centers but I find them a little scary and prefer skewing instead unless the null is improbable to all observers (e.g., as often the case with age and sex coefficients when using priors for confounder control).
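The "over 95% between 1/40 and 40" figure can be checked directly. The following is a minimal sketch, assuming the prior meant here is the standard logistic distribution (location 0, scale 1) on the log ratio; the function names are illustrative only. The link to Laplace's rule is that a uniform prior on a probability implies a standard logistic prior on its log odds.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution (location 0, scale 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def prior_mass_between(ratio_low, ratio_high):
    """Prior probability that the ratio lies in [ratio_low, ratio_high]
    when the log ratio has a standard logistic prior (an assumption)."""
    return logistic_cdf(math.log(ratio_high)) - logistic_cdf(math.log(ratio_low))

# Mass between 1/40 and 40; algebraically this is 40/41 - 1/41 = 39/41.
mass_40 = prior_mass_between(1 / 40, 40)

# Upper limit of the exact central 95% interval: the 0.975 quantile of the
# logistic is ln(0.975 / 0.025), so the ratio limit is 0.975 / 0.025 = 39.
upper_95 = 0.975 / 0.025

print(f"P(1/40 < ratio < 40) = {mass_40:.4f}")
print(f"central 95% interval for the ratio: 1/{upper_95:.0f} to {upper_95:.0f}")
```

Under that assumption the exact central 95% interval runs from 1/39 to 39, so 1/40 to 40 holds slightly more than 95% of the prior mass.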

Also Sander had a couple of technical notes regarding our paper:

1) The Witte-Greenland-Kim 1998 article you cited was rapidly superseded by the attached Witte et al. 2000 paper, which takes advantage of a then new SAS feature and works better than the 1998 proc. So you ought to sub the 2000 paper for the 1998 one.

2) Note that the log-F prior gets heavier tails as the A (half the F degrees-of-freedom) parameter drops, so another and computationally easier way to accomplish what you wanted would be to reduce A below 1/2 to the point that it roughly matched the Cauchy and then rescale, instead of rescaling the Cauchy to match the add-1/2 prior. See my SIM paper for details on use of the rescaled log-F (which I called generalized conjugate in my Biometrics 2003 paper).

3) My IJE paper shows how to use all this in logistic and Cox regressions.

In recognition of the crudeness of all these models in health and social sciences (which you acknowledge), I have a hard time seeing the need for the more elaborate R calculations you use, when I can just use ordinary software with added records and an offset term to get the same effects (see the section on the offset method in the IJE paper).

My reply (beyond, of course, Thanks for all the suggestions) is:

– I think that what you (Sander) are calling an informative or “subjective” prior is similar to what I’m calling “weakly informative,” in that you’re adding information but not putting in everything you might currently believe about the science of a problem. I agree that, ideally, priors adapted to individual problems are better than default priors. But I think default priors are better than no priors or flat priors (which are themselves a default, just not always a very good default).

– I have to look at all your articles… It makes sense that similar functionality could be gained using different functional forms or parametric families of models.

– Regarding your comment on the “elaborate R calculations”: Actually, bayesglm is more robust and easier to run than glm. Whether or not the programming was elaborate, it’s transparent to the user. With classical glm (which, of course, most users also don’t understand) you get separation and instability all the time. I have no problem with ordinary software etc… but if you’re using R, then bayesglm _is_ “ordinary software”–it’s part of the freely downloadable “arm” package. You might as well say, “Who needs fancy linear regression software, when I can just use ordinary matrix inversion to solve least squares problems?” If the method is good (which we can argue about, of course), then it’s a perfectly natural step to program it directly so that the user (including me) can just run it, thus automating the steps of calculating an offset term, etc.

Sander responded to my last comment as follows:

Most epidemiologists simply will not use anything — I mean will not use ANYTHING — that is not a regular proc in one of a few major packages: SAS, Stata, maybe SPSS. Some will not even allow publication of their study results except as verified through a major package (for a reason why see the attached, in which the authors could not bring themselves to come out and say upfront “we goofed because we trusted the S-Plus defaults”). So, however good your R procs may be, their outputs just aren’t about to enter my field except through occasional forays as you may make, and I’ll wager the situation isn’t all that different in many other fields. Hence I developed pure-data augmentation procedures that everyone can use with SAS (and thus pioneer new kinds of errors). You can stabilize the SAS/Stata results using data priors with fractional A (say 0.5) if that’s your only goal — my article from AJE (2000) is on that topic, more or less, and about a very common problem in epi.
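To make the data-augmentation idea concrete, here is a rough sketch in plain Python rather than SAS or Stata. Several things in it are assumptions, not details from the papers mentioned: the data are hypothetical, the fitting routine is a bare Newton-Raphson written for this illustration, and the prior is encoded as one pseudo-record with the constant set to 0, the treatment indicator set to 1, and A events plus A non-events. That encoding matches a log-F(2A, 2A) prior on the treatment coefficient, since the log-F density is proportional to the binomial likelihood of such a record.

```python
import math

def fit_logistic(records, iterations):
    """Newton-Raphson for a two-column logistic regression on grouped data.

    Each record is (const, x, events, trials) and the linear predictor is
    b0*const + b1*x. Returns (b0, b1) after a fixed number of iterations.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for const, x, events, trials in records:
            p = 1.0 / (1.0 + math.exp(-(b0 * const + b1 * x)))
            resid = events - trials * p      # score contribution
            w = trials * p * (1.0 - p)       # information weight
            g0 += const * resid
            g1 += x * resid
            h00 += w * const * const
            h01 += w * const * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical, completely separated data: every treated patient has the event.
data = [(1, 0, 3, 10),    # untreated: 3 events in 10
        (1, 1, 10, 10)]   # treated: 10 events in 10

def prior_record(a):
    """Assumed pseudo-record for the prior: constant 0, x = 1,
    a events and a non-events (so 2a trials)."""
    return (0, 1, a, 2 * a)

# Without a prior the treatment coefficient keeps growing with each iteration.
unpenalized = {n: fit_logistic(data, n)[1] for n in (5, 10, 20)}

# With the pseudo-record appended, the same routine settles on a finite value.
augmented = {a: fit_logistic(data + [prior_record(a)], 50)[1] for a in (0.5, 1.0)}

for n, b1 in unpenalized.items():
    print(f"no prior, {n:2d} iterations: b1 = {b1:.2f}")
for a, b1 in augmented.items():
    print(f"prior record with A = {a}: b1 = {b1:.2f}")
```

The unpenalized fit is stopped after a fixed, small number of iterations on purpose: with separated data there is no finite maximum, and running much longer ends in a division by zero. The augmented fits use the same routine unchanged, which is the appeal of the approach: the prior enters only as an extra row of data.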

I’m perfectly happy with the idea that new ideas will start in R and other developer-friendly packages and gradually work their way to Stata, SAS, etc. I’m hoping that bayesglm (or something better) will eventually be implemented in SAS, so that the SAS die-hards will use it also! But I agree that it’s a useful contribution to develop work-arounds that work in existing packages.

Meanwhile, Sanjay wrote:

Thank you for your note and the papers that lucidly lay out your perspective. I agree with you that the choice of priors should be driven by context. In the index example, the “context” is the investigator’s estimate of what they felt to be a clinically important difference, i.e., 25% relative risk difference (so-called ‘delta’). This should ideally be based on prior evidence and/or clinical/biological plausibility. Unfortunately, this number is often driven by trial feasibility issues (inverse square relation between ‘delta’ and sample size). So, one could justify constructing a prior based on the investigator’s expectation of a 25% benefit by centering the distribution on a RR of 0.75 (equivalent to a relative risk reduction of 25%) and a very small likelihood (say 2.5% probability each) of RR>1.0 and RR<0.56. Thus the 95% CI of this prior distribution (characterized as ENTHUSIASTIC) would be 0.56 to 1.0 RR or a mean ln (RR) of -0.288 and sd ln (RR) of 0.147. The other choice of prior would be based on a mean RR of 1.0 (c/w null effect) and a very small probability of benefit <0.75 RR. Thus the 95% CI of this prior distribution (characterized as SKEPTICAL) would be 0.75 to 1.33 RR or a mean ln (RR) of 0 and sd ln (RR) of 0.147. Thus, one can construct posteriors from a range of priors that span the spectrum of beliefs in a "contextualized" manner. One can construct different priors based on different estimates of clinically important differences which will vary from physician to physician, patient to patient, disease to disease, and outcome to outcome (for example less severe outcomes requiring a larger delta and vice versa), thereby preserving the critical role of "context" in interpretation.
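The means and standard deviations quoted for the two priors follow from their 95% intervals if each prior is taken to be normal on the ln(RR) scale. A small sketch of that arithmetic, with the normality assumption and the helper name being illustrative choices rather than anything stated in the email:

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def normal_prior_from_interval(rr_low, rr_high):
    """Mean and SD of ln(RR) for a normal prior whose central 95%
    interval on the RR scale is [rr_low, rr_high]."""
    lo, hi = math.log(rr_low), math.log(rr_high)
    mean = (lo + hi) / 2
    sd = (hi - lo) / (2 * Z_975)
    return mean, sd

# Enthusiastic prior: centered on RR = 0.75 with upper limit 1.0, so the
# lower limit is 0.75**2 = 0.5625 (quoted as 0.56 in the text).
enthusiastic = normal_prior_from_interval(0.75 ** 2, 1.0)

# Skeptical prior: centered on RR = 1.0 with lower limit 0.75, so the
# upper limit is 1 / 0.75, about 1.33.
skeptical = normal_prior_from_interval(0.75, 1 / 0.75)

print("enthusiastic: mean ln(RR) = %.3f, sd = %.3f" % enthusiastic)
print("skeptical:    mean ln(RR) = %.3f, sd = %.3f" % skeptical)
```

Both priors share the same standard deviation because each interval spans the same distance, ln(1/0.75), on either side of its center.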

To which Sander replied:

As a matter of application, neither of the priors you mention below are what I would call “covering priors”, that is neither of them cover the range of opinions on the matter; instead, they represent different opinions in the range. Those priors may represent what some individuals want to use, given their strong prior opinions (whether enthusiastic or skeptical), but I doubt those priors and their results are what a consensus or policy formulation panel would want. If those two priors represent the extremes of views held by credible parties or stakeholders, one covering prior might have a 95% central interval of roughly 1/2 to sqrt(2), thus encompassing each extreme view but focused between those extremes. The posterior from that prior would be of more interest to me than those from the extremes. And I would not leave out the ordinary frequentist result, which is the limiting result from a normal prior allowed to expand indefinitely, and hence the limit of a normal covering prior as the range of opinions covered expands indefinitely (the normality is not essential, although some shape restriction is needed). Context determines what is adequate covering by the prior.
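Two points in this reply lend themselves to a quick numerical illustration: the covering prior with a 95% central interval of 1/2 to sqrt(2), and the frequentist result as the limit of an ever wider normal prior. The sketch below assumes a normal prior and a normal approximation to the likelihood on the ln(RR) scale. The trial estimate (RR = 0.80, standard error 0.10) is made up for illustration and does not come from the trial under discussion.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def normal_prior_from_interval(rr_low, rr_high):
    """Mean and SD of ln(RR) for a normal prior with the given 95% interval."""
    lo, hi = math.log(rr_low), math.log(rr_high)
    return (lo + hi) / 2, (hi - lo) / (2 * Z_975)

def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Precision-weighted normal-normal update on the ln(RR) scale."""
    w_prior = 1 / prior_sd ** 2
    w_data = 1 / std_error ** 2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)

# Covering prior spanning both extreme views: 95% interval from 1/2 to sqrt(2).
cover_mean, cover_sd = normal_prior_from_interval(0.5, math.sqrt(2))

# Hypothetical trial result (illustration only).
estimate, std_error = math.log(0.80), 0.10

# Widen the prior step by step; the posterior approaches the data-only result.
results = {}
for scale in (1, 10, 1000):
    results[scale] = normal_posterior(cover_mean, cover_sd * scale,
                                      estimate, std_error)
    mean, sd = results[scale]
    print(f"prior sd x{scale:<4d}: posterior RR = {math.exp(mean):.3f}, "
          f"sd of ln(RR) = {sd:.4f}")
```

The covering prior is centered on ln(RR) = -ln(2)/4, that is, an RR of about 0.84, which sits between the enthusiastic and skeptical centers. As the prior standard deviation grows, the prior's weight in the update shrinks toward zero and the posterior mean and standard deviation converge to the estimate and its standard error.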

I pretty much agree with Sander except that I don’t know that there is always an “ordinary frequentist result”; at least, such a thing doesn’t really exist for logistic regression with separation.


  1. Bill says:

    Could you post the attachments to which Sanjay and Sander refer, as well as a link to the Witte et al. (2000) article?


  2. Rehan says:

    Andrew, I second Bill, posting those attachments or links will be quite helpful. Thanks

  3. Ted Dunning says:

    I pretty much agree with Sander except that I
    don't know that there is always an "ordinary
    frequentist result"; at least, such a thing
    doesn't really exist for logistic regression
    with separation.

    Actually, in this case, the comparison is quite simple, to wit:

    As a comparison, we used un-regularized logistic regression which resulted in numerical instability and inherently invalid results.


  4. Darlehen says:

    Please correct me if I am wrong, but aren't all clinical trials guided by advisory boards and by self-legislation bodies from the Pharma industry?

    I was happy that for our research we set our own prior (sample) distribution, oef :)

    Darlehen (Germany)