How do you think about the values in a confidence interval?

Philip Jones writes:

As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange).

I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the ‘tails’ of the 95% CI less likely than those in the middle of the CI closer to the point estimate?”

I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question).

I received some interesting replies, and they were not unanimous; in fact, there is some serious disagreement there! After seeing this disagreement, I naturally thought of you and wondered whether you might be able to clear this up.

Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals.

My response:

First off, I’m going to forget about the official statistics-textbook interpretation, in which a 95% confidence interval is defined as a procedure that has a 95% chance of covering the true value. For most of the examples I’ve ever seen, this interpretation is pretty useless because the goal is to learn about the situation we have right now in front of us, not merely to make a statement with certain average properties.

I would say that the usual interpretation of a confidence interval is as a set of parameter values that are consistent with the data. Typically the values near the center of the interval are more consistent, and sometimes this idea is formalized by thinking about hypothetical nested 1%, 2%, 3%, …, 99% intervals, where the more central parameter values are in more of these intervals.

The real problem is that the interval will exclude the true value at least 5% of the time. 5% doesn’t sound like much, but given that it is the more significant findings that get noticed, these can be an important 5%. Also, when the sample size is small, the confidence interval can include lots of implausible values too. Consider the notorious claim that beautiful parents were more likely to have girls. Here, the confidence interval included all sorts of big numbers (for example, the data were consistent with beautiful parents being 10 percentage points more likely to have a girl, compared to ugly parents) that a quick literature review revealed were highly implausible. This was a setting where the prior information was much stronger than the data.
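
To make that last point concrete, here is a minimal sketch (in Python) of a conjugate normal-normal update in which the prior is much tighter than the data. The point estimate, standard error, and prior scale below are hypothetical stand-ins, not the numbers from the actual beauty-and-sex-ratio study.

```python
# A minimal sketch (not from the original post) of "the prior information was
# much stronger than the data": a conjugate normal-normal update. The point
# estimate, standard error, and prior scale are hypothetical stand-ins.
import math

estimate, se = 0.08, 0.04          # hypothetical estimate and standard error
prior_mean, prior_sd = 0.0, 0.003  # hypothetical tight prior on the difference in Pr(girl)

# Classical 95% interval: estimate +/- 1.96 * se
ci = (estimate - 1.96 * se, estimate + 1.96 * se)

# Precision-weighted posterior
post_prec = 1 / prior_sd**2 + 1 / se**2
post_mean = (prior_mean / prior_sd**2 + estimate / se**2) / post_prec
post_sd = math.sqrt(1 / post_prec)

print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")            # roughly [0.00, 0.16]
print(f"posterior: {post_mean:.4f} +/- {post_sd:.4f}")  # hugs the prior, near zero
```

Under these assumptions the 95% interval stretches to big effects, but the posterior sits essentially at the prior: the big values are technically consistent with the data without being plausible.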

17 thoughts on “How do you think about the values in a confidence interval?”

  1. Excellent response! In the area of non-destructive testing, for example, a 95 percent confidence level is the generally accepted norm, but if we are really serious the confidence level should be more like that of the old Ivory Soap commercials, as in 99 and 44/100 percent :)

  2. Neat idea, and very intuitive explanation. Can you use hypothetical nested intervals to understand the posterior distribution over parameter values? For example, you could just numerically estimate various nested confidence intervals and count, for each parameter value, how many of the intervals it falls inside.

    I wouldn’t be surprised if there were philosophical reasons why this isn’t a palatable idea, but it seems like it might work pragmatically.
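
The counting idea in the comment above can be sketched for the simplest case, a normal estimate with known standard error; the estimate, standard error, and grid here are made up for illustration.

```python
# Sketch of the nested-interval counting idea for a normal estimate with known
# standard error; all numbers are made up for illustration.
import numpy as np
from scipy.stats import norm

estimate, se = 3.0, 1.0
levels = np.linspace(0.01, 0.99, 99)           # nested 1%, 2%, ..., 99% intervals
half_widths = norm.ppf(0.5 + levels / 2) * se  # half-width of each central interval

grid = np.linspace(estimate - 4 * se, estimate + 4 * se, 81)
# For each grid value, count how many of the nested intervals contain it.
counts = (np.abs(grid[:, None] - estimate) <= half_widths[None, :]).sum(axis=1)

for theta, c in zip(grid[::10], counts[::10]):
    print(f"theta = {theta:5.2f}: inside {c:2d} of {len(levels)} intervals")
# Central values sit inside nearly all of the intervals; values near the edge of
# the 95% interval sit inside only the widest few. In this flat-prior normal
# case the fraction of intervals containing a value is essentially its two-sided
# p-value, so the ranking matches the one the likelihood (or the flat-prior
# posterior density) would give.
```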

  3. Is it wrong to say that 95% CIs tell us nothing about the behavior within the interval, only the chance of the true value falling in it? It could be “all equally likely” or “tails more likely” or even something else entirely.

  4. You could build a nice publication bias example here. In surveying the literature, I would expect to see a vigorous debate between the papers finding that beautiful parents have more girls and the papers finding that beautiful parents have more boys, with no papers whatsoever making the boring claim that parental beauty makes no difference.

  5. From a Bayesian perspective, all of the points in a 95% posterior interval for a continuous parameter themselves have probability zero. It’s only intervals of non-zero width that have measurable probability (this can be generalized to sets of intervals). By definition, any 95% posterior interval has the same probability. The best you can do is say that intervals of the same width have higher probability in high density areas than low density areas. Andrew’s been urging us to use the smallest posterior interval (SPIn) for Stan, which isn’t even always well defined (consider a uniform posterior).

    • The original query used the phrase “likely (probable)”. The points may have zero probability, but some (perhaps all) of them have positive likelihood. (I agree with Bob about probability; I’m just trying to better understand Philip’s question.)
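
For what it’s worth, the smallest-interval idea mentioned above can be sketched naively from posterior draws: sort the draws and keep the narrowest window covering 95% of them. This is not the actual SPIn algorithm (which does more to stabilize the endpoints), just the basic idea, and the uniform example shows why the endpoints can be poorly determined.

```python
# A naive sketch of the shortest 95% interval from posterior draws: slide a
# window covering 95% of the sorted draws and keep the narrowest one. This is
# not the actual SPIn algorithm, just the basic idea.
import numpy as np

def shortest_interval(draws, prob=0.95):
    x = np.sort(np.asarray(draws))
    k = int(np.ceil(prob * len(x)))      # number of draws the interval must cover
    widths = x[k - 1:] - x[: len(x) - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

rng = np.random.default_rng(0)
print(shortest_interval(rng.gamma(2.0, size=100_000)))  # skewed draws: interval hugs the mode
print(shortest_interval(rng.uniform(size=100_000)))     # near-uniform: endpoints essentially arbitrary
```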

  6. > First off, I’m going to forget about the official statistics-textbook interpretation, in which a 95% confidence interval is defined as a procedure that has a 95% chance of covering the true value.

    Is this merely an interpretation, or the basis of the CI’s very definition? It would be weird to forget about a definition while continuing to strive to find relevance in the defined entity.

    > I would say that the usual interpretation of a confidence interval is as a set of parameter values that are consistent with the data.

    Why isn’t likelihood a strictly better path for assessing “parameter values that are consistent with the data” than taking a concept whose underlying definition you reject, while trying to retain its purpose by interpreting it as a (IMO) broken likelihood assessment?

    • Bxg:

      1. I don’t think the classical definition of confidence intervals is helpful in answering the original question: “Are all values within the 95% CI equally likely (probable), or are the values at the ‘tails’ of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” To answer this question, I think we need to consider any given case, not merely averages.

      2. Likelihood is fine. For that matter I could’ve suggested to my correspondent that he just use Bayesian inference! But I wanted to answer his question, which is an important one because we see confidence intervals all the time.

      Also, in many examples the points with the highest likelihood are clearly not the most probable parameter values. I refer you again to the notorious study of beauty and sex ratios.

  7. @Rahul, errors are typically assumed to be normally distributed (i.e., a bell curve). Indeed, we usually calculate 95% confidence intervals by assuming normally distributed errors. As such, we expect most of the density to be in the middle. Say that for a given regression, the coefficient was estimated to be 3 with a standard error of 1. Then we would expect to find the “true” coefficient between 2 and 4 about 68% of the time and between 1 and 5 about 95% of the time. So for this normally distributed error term, doubling the width of the interval from 2 to 4 units adds far less confidence (roughly 27 percentage points) than the first two units of width did (roughly 68 percentage points).

    @Bob, Isn’t your point about the point probability being 0 true for all continuous probability distributions, from a Bayesian perspective or otherwise?
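
A quick numeric check of the interval arithmetic in the comment above (the estimate of 3 and standard error of 1 are the comment’s own hypothetical numbers):

```python
# Quick check of the interval arithmetic above: estimate 3, standard error 1.
from scipy.stats import norm

est, se = 3.0, 1.0
for lo, hi in [(2, 4), (1, 5), (est - 1.96 * se, est + 1.96 * se)]:
    p = norm.cdf(hi, loc=est, scale=se) - norm.cdf(lo, loc=est, scale=se)
    print(f"[{lo:.2f}, {hi:.2f}] -> {p:.3f}")
# Roughly 0.683, 0.954, and 0.950: the second unit of half-width adds about 27
# percentage points of coverage on top of the first 68.
```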

  8. My thoughts:

    (1) The orthodox answer is that you can’t say whether the tails are less likely or the whole CI is equally likely. The theory just can’t give post-data probabilities like that. Excluding other knowledge, all areas within a CI are on a par. Spanos (2007) outlines this in section 4.2.4 of his summary of the Philosophy of Econometrics.

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.8707&rep=rep1&type=pdf

    (2) It might be helpful to one-up AG’s “implausible” and think about an example where you absolutely know some part of the CI is impossible, like measuring a weight, which must be positive. How likely the different parts of the CI are just depends on how the CI falls: if you get 1 to 11 you can’t add anything; if you get -9 to 1, the tail is more likely; if you get -3 to 7, the center is more likely. But it’s also the case that 0 to 10 and -2 to 10 and -10m to 10 are all 95% CIs, so it’s not clear what “middle” or “center” means. It depends on the situation; there’s no advance rule.

    (3) Mayo and Cox give almost exactly the same reasoning as AG in section 3.6 (p 90) of their summary of frequentist inference. http://arxiv.org/pdf/math.ST/0610846.pdf

    (4) I’m not sure it’s always the case that a confidence interval is thought of as a set of parameter values that are consistent with the data. The obvious counterexample is a one-sided CI. If you put a 95% upper bound on something, you’re including very inconsistent low values and excluding more consistent upper values.
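
Point (2) in the comment above can be made concrete by renormalizing a normal curve to the physically possible region (positive weight). The three point estimates and the standard error below (chosen so each 95% interval has half-width 5) are made up for illustration.

```python
# Sketch of point (2): a normal curve centered at the estimate, renormalized to
# the physically possible region (weight > 0). The estimates and the standard
# error (chosen so each 95% interval is estimate +/- 5) are made up.
from scipy.stats import norm

SE = 5 / 1.96

def share(a, b, est):
    """Fraction of the positive-weight-renormalized curve lying between a and b."""
    a, b = max(a, 0.0), max(b, 0.0)      # clip away the impossible region
    mass = 1 - norm.cdf(0, loc=est, scale=SE)
    return (norm.cdf(b, loc=est, scale=SE) - norm.cdf(a, loc=est, scale=SE)) / mass

for est in [6.0, -4.0, 2.0]:             # 95% CIs roughly [1, 11], [-9, 1], [-3, 7]
    lo, hi = est - 5, est + 5
    print(f"CI [{lo:5.1f}, {hi:5.1f}]: lower half {share(lo, est, est):.2f}, "
          f"upper half {share(est, hi, est):.2f}")
# [1, 11]: the two halves carry roughly equal weight; [-9, 1]: the lower half is
# impossible and only the upper tail matters; [-3, 7]: in between.
```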

  9. The question has two answers, one based on likelihoods and one based on probabilities. The likelihood answer is in response to what I assumed the original questioner had in mind. The likelihood function is usually higher near the centre of the interval than at the edges, so the answer is that points in the tails have lower likelihoods than points near the centre of the interval, as long as the likelihood function and the confidence interval are centred near the same point. The probability answer deals with the “(probable)” part of the question, which is probably a mistake. The probability that any estimate corresponds to the true parameter value depends on the likelihood function and the prior probability distribution. That answer depends on the prior, but it is probably not an answer to what the questioner had in mind.

    • Michael:

      I think you are making the error of treating a traditional textbook explanation as the correct story. “Likelihood” (meaning, the probability density of the data conditional on parameters) is not necessarily what someone is talking about when they say “more likely” or “less likely” or “equally likely.” If, for example, theta is the difference in Pr(girl), comparing babies of beautiful parents to babies of average parents, I would not say that it is “more likely” that theta is 0.08 than that theta is 0.001, even though for one particular study the value 0.08 had a higher likelihood. I do, however, think it’s fair to say that theta=0.08 is more consistent with those data. The point is that “more consistent with the data” != “more likely” in common English usage.

      • I think that you’re only saying that because you’re trying to introduce your Bayesian reasoning with your own idea of a priori probabilities into a purely frequentist study.

        If you apply the same logic consistently throughout and stick with the frequentist model, you might find that the mean is 0.047, the 95% CI is [-0.039, 0.133], and the 68% CI is [0.004, 0.09], which means that 0.08 is inside the 68% CI and your “more likely” 0.001 is outside. So, strictly speaking, 0.08 is more likely.

        On the other hand, if you insert your preference for theta close to zero into the model from the beginning, you’ll arrive at a different CI where 0.001 is closer to the mean than 0.08.

        The answer to the original question is: yes, points closer to the mean are somewhat more likely than points near the edges, but not significantly so. Just looking at the Gaussian probability density might help: the local density of the normal distribution at the edge of the 95% interval is about one-seventh of the density at the mean.

      • I agree entirely with your suggestion that colloquial likelihood is often different from statistical likelihood, but in this case it is not. The question asked is most sensibly asked and answered in terms of likelihood. The fact that the asker is probably unaware of likelihood is no reason to answer the question in another manner. If a person asks which way to the train station in Pennsylvania, you would presumably not consider saying “The train station in Melbourne has lots of clocks”.

        • Michael:

          I think we have different definitions of what is sensible. To me, it is not sensible to say that the point in the center of the confidence interval is more likely than the points at the extreme, at least not in the beauty-and-sex-ratio example. Nor do I think the points in the interval are equally probable. I think I was answering the question as directly as possible.

          To say it again, here was the question: “Are all values within the 95% CI equally likely (probable), or are the values at the ‘tails’ of the 95% CI less likely than those in the middle of the CI closer to the point estimate?”

          And here is my answer: In general, No and It depends.
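
The numbers argued over in the thread above can be checked directly; the only assumption beyond the quoted summary (estimate 0.047 with 95% CI [-0.039, 0.133]) is a normal approximation for the likelihood.

```python
# Quick check of the numbers in the thread above, taking the quoted summary at
# face value: estimate 0.047 with 95% CI [-0.039, 0.133], i.e. a standard error
# of about 0.044 under a normal approximation.
from scipy.stats import norm

est = 0.047
se = (0.133 - (-0.039)) / (2 * 1.96)

# Likelihood ordering: 0.08 is closer to the estimate than 0.001, so it has the
# higher likelihood (the "strictly speaking, 0.08 is more likely" claim).
print("likelihood at 0.08 :", norm.pdf(0.08, loc=est, scale=se))
print("likelihood at 0.001:", norm.pdf(0.001, loc=est, scale=se))

# Density at the edge of a normal 95% interval relative to its center:
print("edge/center ratio:", norm.pdf(1.96) / norm.pdf(0))  # ~0.146, about one-seventh

# A prior concentrated near zero, as in Gelman's replies, would reverse the
# ordering of 0.08 and 0.001 in the posterior; that is the substance of the
# disagreement.
```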

  10. Isn’t the original question (“Are all values within the 95% CI equally likely (probable), or are the values at the ‘tails’ of the 95% CI less likely than those in the middle of the CI closer to the point estimate?”) precisely what Jim Berger’s conditional frequentist testing is designed to address? http://www.stat.duke.edu/~berger/papers/02-01.pdf

    In particular he says in section 2.1, under “criticisms of Neyman-Pearson” that “Both Fisher and Jeffreys criticized (unconditional) Type I and Type II errors for not reflecting the variation in evidence as the data range over the rejection or acceptance regions.” He then says of conditional frequentist testing in section 4.1 that “the result — having a true frequentist test with error probabilities fully varying with the data — would have certainly had some appeal, if for no other reason than that it eliminates the major criticism of the Neyman–Pearson frequentist approach.”

    This is not incompatible with what you have said, or the other commenters, but certainly deserves a mention in this discussion.
