And also on a pragmatic note: as you say, people produce these things all the time, and it’s hard to stop them from putting an interpretation on a realized CI. The temptation is often just too great. But I’m a little more optimistic about getting people to use an interpretation that actually has a legitimate foundation. It’s easier to tell someone “don’t say that – say this instead” than it is to be completely negative and just tell them “don’t say that”. Even if the legit interpretation isn’t so easy to understand, it’s still an improvement over using one that’s wrong.

Maybe worth quoting here how they define “standard problems”:

“In the standard problem of inference about an unrestricted mean of a normal variate with known variance, which arises as the limiting problem in well behaved parametric models, the usual interval can hence be shown to be bet-proof.”

I’m not sure that “bet-proofness” is likely to be very helpful to people, but I suspect I’m still not understanding something there.

[I see Daniel said a similar thing further down, but by inserting my comment here it looks as though I said it first!]

No, in fact I am much more interested in “non-standard” problems.

> Off-the-shelf realized CIs for standard problems have the bet-proof interpretation.

OK, but that does not mean they perform well on what matters, e.g. type S and M errors; see

http://statmodeling.stat.columbia.edu/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/

I think you’re conflating the “standard problem” result vs. what their paper is mostly about.

The quote from their conclusion paragraph about “enlarging a credible set” that you cite refers to what they call “non-standard” problems. An example is a procedure that can generate an empty CI (see the earlier comment). To be fair to the authors, the contribution of the paper is about these non-standard problems so that’s what they spend most of their time discussing.

But for standard problems (p. 2186) no enlargement of a credible set or anything else is necessary. Off-the-shelf realized CIs for standard problems have the bet-proof interpretation. (At least, that’s how I read it.)

Russian dolls!

I am not sure, and it takes a fair amount of effort to understand their claimed solution and what trade-offs it makes in order to be bet-proof.

I do think it’s the wrong strategy: searching for procedures that meet properties that are taken as good (good for what?).

From their conclusions paragraph “… , we derive confidence sets that are reasonable by construction. Specifically, we suggest enlarging a credible set relative to a prespecified prior by some minimal amount to induce frequentist coverage.”

I doubt that prespecified prior is trying to represent an underlying reality to any meaningful extent.

Additionally: “One might also question the appeal of the frequentist coverage requirement [uniform for all possible parameter values]. We find Robinson’s (1977) argument fairly compelling: In a many-person setting, frequentist coverage guarantees that the description of uncertainty cannot be highly objectionable a priori to any individual, as the prior weighted expected coverage is no smaller than 1 − α under all priors.” I don’t agree. Someone might have a prior on effect sizes in psychology that puts considerable probability on huge effect sizes, and I think I should be able to ignore them.
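Spelled out, the step in the quoted argument is just averaging a uniform lower bound over an arbitrary prior π (notation ours, not necessarily the paper’s: C(X) is the confidence set and P_θ the sampling distribution at θ):

```latex
P_\theta\{\theta \in C(X)\} \ge 1-\alpha \ \text{for all } \theta
\quad\Longrightarrow\quad
\int P_\theta\{\theta \in C(X)\}\,\pi(d\theta) \;\ge\; \int (1-\alpha)\,\pi(d\theta) \;=\; 1-\alpha
\ \text{for every prior } \pi .
```

The implication only runs one way: a procedure can have prior-averaged coverage of 1 − α under a particular π while undercovering at θ values that π downweights.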

Now, that prior should likely not be the one assumed but rather the prior(s) one is worried about, and here it might make sense for that to be a point parameter. Note that from this Bayesian perspective the coverage would not be uniform, but why should uniform coverage be seen as of overriding importance?

This would be one way to do what Daniel suggests http://statmodeling.stat.columbia.edu/2016/11/26/reminder-instead-confidence-interval-lets-say-uncertainty-interval/#comment-353634

This won’t be easy – getting at the pragmatic meaning of concepts (what to make of instances of the concept for future thinking and actions) is always difficult and somewhat elusive.

Mueller-Norets (2016, published version, p. 2185):

“Suppose an inspector does not know the true value of θ either, but sees the data and the confidence set of level 1−α. For any realization, the inspector can choose to object to the confidence set by claiming that she does not believe that the true value of θ is contained in the set. Suppose a correct objection yields her a payoff of unity, while she loses α/(1−α) for a mistaken objection, so that the odds correspond to the level of the confidence interval. Is it possible for the inspector to be right on average with her objections no matter what the true parameter is, that is, can she generate positive expected payoffs uniformly over the parameter space?”

If the answer is “no”, the confidence set is bet-proof. For “standard problems” (see Mueller-Norets), a standard frequentist realized CI is bet-proof (simplifying a bit here).
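In symbols (notation ours, not necessarily the paper’s): let C(x) be the confidence set and φ(x) ∈ [0, 1] the probability that the inspector objects after seeing data x. Her expected payoff R(θ) under the odds in the quote, and the special case of objecting every time when coverage is exactly 1 − α, are:

```latex
R(\theta) = \mathbb{E}_\theta\!\left[\varphi(X)\left(\mathbf{1}\{\theta \notin C(X)\} - \frac{\alpha}{1-\alpha}\,\mathbf{1}\{\theta \in C(X)\}\right)\right],
\qquad
\varphi \equiv 1,\ P_\theta\{\theta \in C(X)\} = 1-\alpha
\;\Longrightarrow\;
R(\theta) = \alpha - (1-\alpha)\,\frac{\alpha}{1-\alpha} = 0 .
```

So with exact coverage, objecting every time breaks even at every θ; the question in the quote is whether some smarter φ can make R(θ) > 0 for all θ at once.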

Paraphrasing: say α=0.5, θ is a scalar, and I’m a casino customer. The casino picks a θ, then uses θ and a computer to generate a random dataset, and then uses this dataset to calculate a realized frequentist 50% CI. They show me the calculated 50% CI but not the true θ. I can bet that the CI doesn’t contain the true θ. If I’m right, I get $1. If I’m wrong, I lose $1. Can I make money on average? No – realized frequentist CIs are bet-proof, for “standard problems”. (Simplifying a bit again, but I *think* I got that right.)
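A minimal sketch of the casino game, assuming the simplest “standard problem”: a single observation X ~ N(θ, 1) with known variance, and the 50% interval X ± z with z = Φ⁻¹(0.75) ≈ 0.674. It computes expected payoffs exactly (no simulation) for two betting rules: always object, and a hypothetical rule that objects only when |X| > c. The rules and function names are my own illustration, not taken from Mueller-Norets.

```python
from statistics import NormalDist

STD = NormalDist()          # standard normal, mean 0, sd 1
Z = STD.inv_cdf(0.75)       # half-width of the 50% interval X ± Z (alpha = 0.5)


def prob_between(lo, hi, theta):
    """P(lo <= X <= hi) for X ~ N(theta, 1); 0 if the range is empty."""
    if hi <= lo:
        return 0.0
    return STD.cdf(hi - theta) - STD.cdf(lo - theta)


def always_object_payoff(theta):
    """Bet 'the CI misses theta' every round: +1 if right, -1 if wrong."""
    cover = prob_between(theta - Z, theta + Z, theta)  # 0.5 by construction
    return (1 - cover) - cover


def threshold_payoff(theta, c):
    """Expected payoff of the hypothetical rule 'object only when |X| > c'."""
    p_object = 1 - prob_between(-c, c, theta)
    # P(object AND covered): X lands in [theta - Z, theta + Z] but outside [-c, c].
    p_cover = prob_between(theta - Z, theta + Z, theta)
    p_cover_inside = prob_between(max(theta - Z, -c), min(theta + Z, c), theta)
    p_object_and_cover = p_cover - p_cover_inside
    # +1 for objections where the CI misses, -1 for objections where it covers.
    return (p_object - p_object_and_cover) - p_object_and_cover


if __name__ == "__main__":
    for theta in (0.0, 1.5, 5.0):
        print(f"theta={theta}: always={always_object_payoff(theta):+.3f}, "
              f"|X|>1 rule={threshold_payoff(theta, c=1.0):+.3f}")
```

With c = 1, the threshold rule earns about +0.317 per round at θ = 0 but loses about 0.185 per round at θ = 1.5, so it cannot beat a casino that is free to pick θ. One failed strategy illustrates the bet-proofness claim; it does not prove it.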

The “standard problems” caveat rules out settings that can generate e.g. empty CIs, because I can make money on average if I bet “no, the CI doesn’t contain the true θ” every time I see a realized empty CI (easy money – the realized CI is guaranteed to be wrong).
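A sketch of the empty-CI loophole under an assumed non-standard setup: the mean is known to be nonnegative (θ ≥ 0), and the usual 50% interval X ± z is clipped to [0, ∞). The clipping never removes a true θ ≥ 0, so coverage stays at exactly 50%, yet the interval is empty whenever X < −z. The setup and names are hypothetical, chosen only to make the point concrete; they are not necessarily the example the earlier comment had in mind.

```python
import random
from statistics import NormalDist

STD = NormalDist()
Z = STD.inv_cdf(0.75)  # 50% interval half-width, alpha = 0.5 as in the casino example


def truncated_ci(x):
    """50% CI for a mean known to be >= 0 (hypothetical setup):
    (x - Z, x + Z) intersected with [0, inf). Returns None when that is empty."""
    lo, hi = max(x - Z, 0.0), x + Z
    return (lo, hi) if lo <= hi else None


def empty_bet_payoff(theta):
    """Expected payoff of objecting only when the CI is empty.
    Each such objection is guaranteed correct (+1), so the payoff is
    P(CI empty) = P(X < -Z), which is positive for every theta >= 0."""
    return STD.cdf(-Z - theta)


def simulate(theta, n_draws=100_000, seed=1):
    """Monte Carlo estimate of (coverage, fraction of empty CIs) at theta."""
    rng = random.Random(seed)
    covered = empty = 0
    for _ in range(n_draws):
        ci = truncated_ci(rng.gauss(theta, 1.0))
        if ci is None:
            empty += 1
        elif ci[0] <= theta <= ci[1]:
            covered += 1
    return covered / n_draws, empty / n_draws
```

Betting only on empty intervals wins with probability Φ(−z − θ) > 0 at every θ ≥ 0, so this interval is not bet-proof even though its coverage is correct.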

Hey, we just saw that movie a couple days ago!

“… all possible samples of the same sample size … 95% of them would produce intervals that contain the true value … and this also assumes that all the model assumptions are true … and you have no knowledge of whether or not the one interval you have contains the true value.”

To be concrete, let’s say we have a sample with a confidence interval for the mean that ranges from 1.1 to 1.4. It would be correct to state that, with repeated sampling, 95% of the intervals computed from random samples (of this size, etc.) will contain the true mean. It would not be correct to state that we are 95% confident that the true mean lies in the range of 1.1 to 1.4, i.e. this particular confidence interval. The true mean either is in that interval or not. And we only have this one sample. So, what is the probability that the true mean is in the interval 1.1 to 1.4? Now we are on a slippery slope. Rather than debate the meaning of “probability” (which is a key issue, I agree, and one worth exploring), note that our particular interval is either one of the 95% that contain the true mean or one of the 5% that do not. We don’t know which, but if you ask me what the probability is that we have one of the “good” ones, I’d say 95%. And I’d be wrong, but how wrong? What would you rather I say about the one interval I have? Personally, I’d prefer to say 95% (with all of its faults and potential for misunderstanding) than to say nothing at all.
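A small simulation of the repeated-sampling statement, under assumptions the comment doesn’t specify: i.i.d. normal data with known σ, so the 95% interval is the sample mean ± 1.96σ/√n. The true mean 1.25, σ = 0.5 and n = 30 are hypothetical values, chosen only to give intervals of roughly the width in the example.

```python
import random
from statistics import NormalDist, fmean


def realized_interval(sample, sigma, level=0.95):
    """One realized interval from one sample, assuming KNOWN sigma:
    sample mean ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def coverage_rate(true_mean=1.25, sigma=0.5, n=30, n_reps=20_000, level=0.95, seed=42):
    """Fraction of repeated-sampling intervals that contain true_mean.
    This is a property of the procedure, not of any single realized interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = realized_interval(sample, sigma, level)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / n_reps
```

coverage_rate() comes out close to 0.95, but that number describes the procedure; any single output of realized_interval either contains 1.25 or it doesn’t.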

Could you expand on what information it conveys? (Serious and sincere question, not rhetorical or sarcastic.)

My feeling is that people commonly misinterpret frequentist 95% CIs as Bayesian credible intervals in at least two ways: 1) that they mean “there is a 95% chance that…”, and 2) that the CI and its guarantees apply to the particular sample we have in hand rather than to a process repeated an infinite number of times. Once you correct these misconceptions, though, what’s left for the frequentist-because-that’s-how-I-was-trained practitioner?

I have to confess that I can’t see the bridge from the seemingly abstract frequentist definition of CIs (as I understand them) to practical use on a particular sample that I happen to have. It seems like once the misunderstandings are corrected, we’re left with a somewhat vague, relative comparison between CIs: “This one’s bigger, so it’s more uncertain in some sense, at least as long as we weren’t unlucky enough to have a sample from the 5% of all possible samples where the CI may be nonsense.”

I agree that forked paths, low power, etc., are more important in some sense. At the same time, forked paths, power, etc., seem like “more advanced” topics that involve things like experimental design, while CIs are “basic” topics that are built into every piece of statistical software we might use and are de rigueur in most fields, so perhaps we have more chance of making an impact on the basics. (And in my wild-guess estimating, I wonder if the more advanced topics severely affect some studies while the basic topic mildly affects all studies, so that the sum over all of science is about the same. Just my fantasizing, really.)

The other strand of thought that keeps recurring is the Bayesian-frequentist debate. I really don’t see confidence intervals as the avenue through which to convince the world they should be Bayesians. I’ve seen plenty of coherent arguments on this blog that make that point very well, and yet most of the world is trained (if at all) in frequentist methodology. Do you think that attacking the confidence interval is really an effective argument for convincing people to become Bayesian?

I do think we can teach people what a confidence interval actually means and that the common use of that interval, while flawed, does convey some information. I only want a sense of humility about what that information is. After all, the problems with incorrect interpretations of the confidence interval are not as serious as the issues with forked paths, lack of replicability, low power, failure to include prior information, etc. etc.

Links here:

The common use as an indication of uncertainty in an estimate seems to depend on it being kind of OK most of the time. Is that really good enough?

Larry: I can interpret, Clive. I know what you meant me to understand.

Clive: Mere sir, my sir.

Larry: Mere sir, my sir?

Clive: Mere surmise, sir. Very uncertain.

In the last, the uncertainty is nil. We just say, “This is bull crap and that’s not in doubt.” But when you approach a balance point between certainty and uncertainty, we start to worry about how certain or uncertain we are. What I’m trying to say is that we rarely need to worry about this issue when things are obviously true or false, on or off, 1 or 0, so the hard cases reduce to those that land near enough the balance point: is this true or not? Or is this false or not? (And the converses, which statistics unfortunately decided to use as the general approach: is this getting to be true from being false, or getting to be false from being true?) Calling it the confidence interval or the uncertainty interval balls all this stuff up, which means people tend to read it as they want or can. If you focus on the specific interval and label it at both ends with both sets of labels, you see it’s where uncertainties balance with certainties. That extends to effects. I think of it as the “if.interval”, as in “if true, how far from false, and if false, how far from true?” In my peculiar notation, the dot signifies process, so you have “if” processed over the interval, which should automatically imply bidirectionality with appropriate endpoint labeling. I subdivide by direction, if-1.interval and if-2.interval, so the “-” denotes direction, which labels the endpoints appropriately, and thus the disappearance of the “-” has meaning.

But I think “coverage interval” is still not quite accurate enough, since the realized interval doesn’t actually have the coverage; rather, the procedure generating the intervals guarantees (conditional on the assumptions) their overall coverage. So, how about “coverage procedure interval” or, in full, “frequentist coverage procedure interval”? And add “estimator”/“estimate”/“realized” etc. as needed.

Following on from the other discussion, what would be your preferred terminology for distinguishing between the procedure for calculating the interval vs an actual interval using some dataset? “Uncertainty interval” vs “realized uncertainty interval”? “Uncertainty interval estimator” vs “uncertainty interval estimate”? Something else again?

Maybe I’m in the minority but I think I’ve convinced myself that failing to distinguish between the two concepts is a bigger terminological sin than the use of the word “confidence”.

(PS: I rather like your suggested “frequentist coverage interval”. Or even just “coverage interval”.)

Mathematically speaking, a standard frequentist interval need not represent “uncertainty” in its usual English sense. But, for the same reason, it also need not represent “confidence” in its usual English sense.

One option would be to simply call it “a frequentist coverage interval” and eliminate all senses of confidence, uncertainty, probability, etc. But if we *are* going to use an evocative term, I’d prefer “uncertainty” to “confidence” because in practice these intervals *are* used to convey uncertainty.

Now, I wouldn’t use “confidence interval” for a Bayesian posterior interval of some sort – I see it as a fundamentally frequentist term. “Uncertainty interval” might be a good term in that context.
