Making decisions based on the endpoint of a 95% interval

Troels Ring writes:

You undoubtedly know this site and the idea behind it, presented also in the book by Spiegelhalter et al. (2004). A recent reason for wondering about this is a paper in the American Journal of Kidney Diseases (2009; 53: 208-217) claiming that protein restriction kills people, with a hazard ratio of 1.15 to 3.20. To “believe” this, if I understand it, the prior would have to put weight on hazard ratios above about 1.9, which is strange since the anticipated effect would be beneficial. I have found few references to this method (a paper by Greenland mentions it briefly), and I’d like to hear your view of it.

My reply: I actually hadn’t heard of this research before. It looks like it could be useful. I have no time to think more about this now, but my quick thought is that there’s something a bit wacky about making decisions based on the endpoint of a 95% interval. It doesn’t seem so Bayesian to me. On the other hand, I do something like this myself often enough, and in ARM we have a whole chapter on power calculations, so I don’t quite follow a consistent line on this myself.

Ring adds:

As I understand it [from the paper “Methods for assessing the credibility of clinical trial outcomes,” by Robert A. J. Matthews, Drug Information Journal 35, 1469-1478, 2001], the whole idea, originating with Good (Matthews’s ref. 15), is to work out what prior you would have to hold for the confidence interval under scrutiny to be believable (or not). So to buy the package you have to accept that the notion of a prior is interesting at all, which is more than just acknowledging that Bayesians can “algebraically” reconstruct a frequentist confidence interval or estimate by using, e.g., a flat prior. But my area, and clinical medicine generally, is showered with conclusions that are “statistically significant” which, using the logic here, turn out to require an arcane conception of reality. Matthews only considers Gaussian models and, for clinical trials, centers the prior on an odds ratio of 1 for equipoise, but the principle should extend easily to any likelihood and prior, so that a frequentist conclusion can be exposed to a Bayesian sensitivity analysis by varying the whole setup of priors and watching the posterior. But I wonder if there is enough power in it to convince those who do not find Bayes credible in the first place.
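
To make the arithmetic concrete, here is a minimal sketch (mine, not taken from Matthews’s paper) of the conjugate-normal calculation that seems to underlie this kind of credibility analysis, applied to the kidney-disease example above. It assumes the reported hazard-ratio CI is symmetric on the log scale and that the skeptical prior is normal on the log scale and centered on a hazard ratio of 1; the function name and variable names are mine.

```python
import math

def critical_prior_limit(lo, hi, z=1.96):
    """Upper 95% limit (on the ratio scale) of the most skeptical normal prior,
    centered on a ratio of 1, under which the posterior 95% interval still
    excludes 1. Assumes normality on the log scale and that (lo, hi) excludes 1."""
    m = (math.log(lo) + math.log(hi)) / 2          # likelihood mean on the log scale
    se = (math.log(hi) - math.log(lo)) / (2 * z)   # likelihood standard error
    # Prior variance tau^2 at which the posterior interval just touches zero:
    #   m * tau^2 / (tau^2 + se^2) = z * tau * se / sqrt(tau^2 + se^2)
    tau2 = (z * se**2) ** 2 / (m**2 - (z * se) ** 2)
    return math.exp(z * math.sqrt(tau2))

# Hazard-ratio 95% CI of 1.15 to 3.20 from the kidney-disease paper:
print(critical_prior_limit(1.15, 3.20))   # about 1.91
```

The result is about 1.91, matching the figure of 1.9 quoted above: unless your prior already puts non-trivial weight on hazard ratios that large, the reported interval should not, on its own, move you to a posterior that excludes no effect.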

4 thoughts on “Making decisions based on the endpoint of a 95% interval”

  1. Perhaps so, but the idea is not to make a decision but to expose the prior belief "needed" to make a frequentist 95% (or whatever) CI into a Bayesian posterior – at least this is how I understand it. And the papers cited by Matthews show that a frequentist CI that would only be upheld by an extreme prior is likely to come into conflict with evidence produced afterwards. So I'm pretty sure it has little to do with the famous paradox, since for every frequentist CI you can find a limit the prior would conceivably have to exceed (if the OR is above 1, etc.) to make the result believable. If that limit is very high you will likely distrust the CI. I guess that is not too strange, but it is not well reported, either.

  2. It doesn't sound very Bayesian to me, either. Which is why frequentist-derived p-values are only really useful in aggregate (replicated in multiple samples, consistent, mechanistically plausible, etc.). One must never make too much of frequentist CIs and believe that they're actually estimating the probability of a given hypothesis. I think Ring is correct to point out the ridiculously extreme priors needed to justify some of the published frequentist p-values you see in the literature. We don't always teach this well enough, and (at least in medicine) we regularly overstep the bounds of inference in our write-ups.

  3. I think this is sort of the opposite of Lindley's paradox. That paradox arises when the prior distribution of a parameter is diffuse. Then the probability that the parameter estimate will be close to any given value is small, so that the posterior probability of the model is low. However, with a diffuse prior distribution, the Bayesian credible interval becomes essentially identical to the classical confidence interval. The Matthews/Spiegelhalter calculations suppose that there might be prior information that the parameter is close to zero, so that the credible interval is pulled towards zero. If the prior information is strong enough relative to the information in the sample, the credible interval may include zero, even if the classical confidence interval does not (see the sketch below for the arithmetic). In a way, they're making the same point that Andrew and I did in our recent American Scientist paper: that when there's reason to believe that the parameter is small, a large estimate from a small or otherwise weak sample may not provide convincing evidence that anything at all is going on.
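
To put the shrinkage described in that comment into symbols: with a likelihood that is normal on the log scale, N(m, sigma^2), and a skeptical prior N(0, tau^2), the posterior mean is m * tau^2 / (tau^2 + sigma^2), so a small tau pulls the interval toward no effect. Here is a small sketch, in the same spirit as the one above, that runs the calculation in the forward direction for the same hazard-ratio example; the particular skeptical prior is made up purely for illustration.

```python
import math

def posterior_interval(lo, hi, prior_sd, z=1.96):
    """95% posterior interval on the ratio scale, combining a trial whose
    95% CI is (lo, hi) with a normal prior on the log scale, centered at 0
    (i.e., on a ratio of 1), with standard deviation prior_sd."""
    m = (math.log(lo) + math.log(hi)) / 2
    se = (math.log(hi) - math.log(lo)) / (2 * z)
    post_var = 1 / (1 / se**2 + 1 / prior_sd**2)   # precisions add
    post_mean = post_var * m / se**2               # prior mean is 0
    half = z * math.sqrt(post_var)
    return math.exp(post_mean - half), math.exp(post_mean + half)

# A skeptical prior whose 95% interval for the hazard ratio is roughly (0.67, 1.5):
print(posterior_interval(1.15, 3.20, math.log(1.5) / 1.96))   # roughly (0.94, 1.77)
```

With that prior the posterior interval includes 1, so the trial result alone would not settle the question; with a vaguer prior the posterior interval approaches the classical CI, which is the point made in the comment about diffuse priors.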
