What’s out of context is that you make it an unqualified definition and Neyman does not.

Yes, the submarine example can be nicely dealt with (actually when I was in Don Fraser’s class Welch’s example was the canonical example) but there is (yet) no general (satisfactory) theory that comes out of it.

The challenge of what should you condition on in a given problem to get a unique reference set remains unanswered.

Spanos’s proposed truncation fix falls to a minor modification of the Uniform model. Suppose that the values of X are sampled from the Uniform and then observed with normally distributed error with tiny variance. Then the unconditional CI includes, not impossible parameter values, but rather very, very improbable values (improbable in what sense? Bayesian, natch), but since the data do not strictly rule them out, Spanos’s set of possible parameter values A(x_0) is again the entire real line, and no strict truncation is justified.
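To make the point concrete, here is a minimal numerical sketch (the observed value, the error sd, and the offending parameter value are all made up for illustration): under the Uniform-plus-tiny-normal-error model, the density of the data stays strictly positive at parameter values that the error-free Uniform model would rule out entirely, so no hard truncation of A(x_0) is possible.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def density(x, theta, sigma=0.01):
    """Density of one observation under Uniform(theta-5, theta+5) plus
    N(0, sigma^2) measurement error; the convolution works out to
    (Phi((x - theta + 5)/sigma) - Phi((x - theta - 5)/sigma)) / 10."""
    return (phi((x - theta + 5) / sigma) - phi((x - theta - 5) / sigma)) / 10.0

x0 = 0.0  # a single observed value (hypothetical)
# theta = 5.05 is strictly impossible under the error-free Uniform model
# (|x0 - theta| > 5), yet with tiny measurement error its density is
# positive, just astronomically small, so it cannot be strictly excluded.
impossible_without_error = density(x0, theta=5.05)
typical = density(x0, theta=0.0)
```

The sd of 0.01 is an arbitrary "tiny variance"; any positive value gives the same qualitative conclusion.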

I see now that the Welch example is much discussed. See references in Spanos: http://www.econ.vt.edu/directory/spanos/spanos6.pdf

Link to the Fraser article: http://projecteuclid.org/download/pdfview_1/euclid.ss/1105714167

We are in agreement then, and matus provided an example of how a naive student would do just as badly trying to follow BDA: http://statmodeling.stat.columbia.edu/2014/12/11/fallacy-placing-confidence-confidence-intervals/#comment-202998

I’m not sure why so many writers seem unable to avoid giving this false sense of “Bayesians having the complete solution”.

Oh, I agree that relevant subsets are a serious problem. (And thanks for the additional background!) What I’m having a problem with is the way the paper leaps from that to a blanket condemnation of CIs as a methodology. If the problem is fundamental to CIs in a way that Bayesian inference is somehow not, the paper does not illustrate or support this; the examples given ignore available frequentist solutions to the relevant subset problem and ignore possible Bayesian solutions that would have the same problems.

@RM: I did read an earlier version of the manuscript (linked in my blog post). I have spent more time on that manuscript than I’m willing to admit – double-checking and triple-checking the calculations. Now, you seem to say that your conclusion in the manuscript (which I cited, and TY points out another section) does not characterize the paper’s argument. I guess I will just wait until a coherent version of the paper gets published…

Erf: It is similar to Simpson’s paradox, which is used to draw attention to confounding: the reversal is dramatic, but the important point is confounding of any kind.

In Morey’s paper (which does a good job of making obscure technical material accessible) the important point is relevant subsets – the submarine example just makes the point dramatic. Relevant subsets are a very serious problem for frequentist methods; they stopped Fisher in his tracks, and Mike Evans, for instance, in his rejoinder to Mayo’s recent likelihood principle paper, raised them as the serious unresolved problem.

Interestingly, George Casella, in his paper on relevant subsets, suggested they were not likely to be of much practical importance. I raised this with him in 2008, as I had found them very important in my practical work (as has Stephen Senn), and he assured me he had changed his mind.

Now formally, Bayes does avoid the problem – but only if one never questions the joint model, prior or likelihood. See, for instance, Box’s and Rubin’s work on checking and calibrating Bayesian models, or John Lewis’s (2014) work with Steve MacEachern and Yoonkyung Lee on Bayesian restricted likelihood methods (not conditioning on all the data, for good reasons).

From what I can get out of Welch (1939) in a quick read (I don’t have a lot of time today), CP2 would be preferred over “CP3” (which can be derived using purely frequentist methods) because, for any given “false” value (theta+delta), CP2 would be (very slightly) less likely to include this false value than CP3 would. Is that what you’re thinking of? Why are you assuming that this is the only valid “frequentist principle” by which one may choose a CP?

What it comes down to is that your paper is arguing that frequentist confidence intervals are completely broken and should never be used, but the examples you use to support this don’t support it at all. If instead you were making the point that some of the recommended criteria used to choose a particular CP can sometimes ignore important information and lead to poor (or even “absurd”) CIs, or that one must be careful to consider assumptions and available information when interpreting CIs, then I don’t think most people would be complaining. (The same is true of Bayesian inference!) But nowhere _in your paper_ does it say why a frequentist statistician would necessarily come up with CP1 or CP2 but never CP3. Instead, it argues that because this particular Bayesian CredInt performs so much better than these two particular frequentist CIs, therefore CIs are all fundamentally broken and should never be used. This makes absolutely no sense.

The problems your paper ascribes to CIs could all apply to Bayesian inference methods, if those methods were restricted to use (or throw away) the same information. (For example, if you build a Bayesian credibility interval based on a non-parametric probability model and use no information at all about the submarine, wouldn’t the result have all of the same problems you ascribe to CP1 throughout the paper?)

The issues you address are real problems, obviously. But they are not problems with CIs or CPs, but with applying and interpreting statistical procedures without thinking.

It’s funny how one person’s irrelevant example is another’s useful practical one. Submarines seem much more practical and realistic to me than 1D cubic spatial regression discontinuity on coal fires and health… just sayin’

If that counts as relevant context, isn’t the statement a tautology? “The mean is 5 is a correct statement*” (…) “*: Where applicable.”

I’m just going to comment here that “(look, we found a guy, Cummings, 2014)” seems to ignore the fact that Cummings (2014) is essentially the APA’s new statistical guidelines.

I’m not trying to “scare” you by quoting from the actual statistical literature on this topic. If you are unwilling to interface with this literature — instead relying on an intro textbook that does not address the details of CI theory, and doesn’t even support your points insofar as it discusses many different ways of constructing CIs — there’s not much else to be said.

Matus, Andrew did read the paper, and is correctly characterising our argument. You should read the rest of the paper, including the part where we discuss cases where a proper Bayesian interval will be numerically the same as the confidence interval. Anything but Andrew’s interpretation would be nonsensical given the rest of the paper, though the paper is a draft so we can make it clearer to avoid misunderstandings…

Andrew, have you read the paper? They are actually saying that all CIs are bad, that they should be abandoned, and that Bayesian intervals should be used instead. Here is their conclusion:

“We have suggested that confidence intervals do not support the inferences that their advocates believe they do. The problems with confidence intervals – particularly the fact that they can admit relevant subsets – shows a fatal flaw with their logic. They cannot be used to draw reasonable inferences. We recommend that their use be abandoned.” (p.9)

Is this what you subscribe to?

“I find this general approach to inquiry—take a generally-recommended principle and explore simple special cases where it fails miserably—to often be a helpful way to gain understanding.”

How about the following inquiry into the submarine problem:

It is highly unlikely that the lost submersible has been carried 1000 miles away from the initial submersion point. Therefore a Gaussian prior with mean at the initial submersion point is a much better choice. Then we choose the likelihood to be a conjugate distribution (i.e., Gaussian), as advocated in your textbook (BDA 2nd ed., chapter 3.3). We add a prior for sigma and marginalize over it. We then obtain a Student’s t distribution for the posterior. From this posterior we obtain a 50% credible interval that is similar (up to the prior) to CP1 in Morey et al.
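For what it’s worth, this derivation can be checked numerically. The following is only an illustrative sketch: the two bubble positions, the prior settings, and the crude flat grid over sigma are all my own hypothetical choices, not anything taken from BDA or from Morey et al.

```python
import math

# Hypothetical data and prior (all numbers invented for illustration):
# two bubble positions, and a Gaussian prior on the location theta
# centred at the initial submersion point (taken to be 0).
x = [4.0, 6.0]
mu0, tau = 0.0, 10.0

def log_norm(v, m, s):
    # log of the normal density, up to an additive constant
    return -0.5 * ((v - m) / s) ** 2 - math.log(s)

thetas = [i * 0.05 for i in range(-400, 401)]  # grid for theta on [-20, 20]
sigmas = [i * 0.1 for i in range(1, 201)]      # crude flat grid for sigma

# Unnormalized posterior for theta, with sigma marginalized by summation.
post = []
for t in thetas:
    like = sum(math.exp(sum(log_norm(xi, t, s) for xi in x)) for s in sigmas)
    post.append(math.exp(log_norm(t, mu0, tau)) * like)

# Central 50% credible interval from the discretized posterior CDF.
z = sum(post)
cdf, lo, hi = 0.0, None, None
for t, p in zip(thetas, post):
    cdf += p / z
    if lo is None and cdf >= 0.25:
        lo = t
    if hi is None and cdf >= 0.75:
        hi = t
```

With only two observations and sigma marginalized, the posterior is heavy-tailed and the 50% interval is wide; the exact numbers depend entirely on the hypothetical choices above, which is rather the point of the exercise.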

Does this show that Bayesian credible intervals should be abandoned? Do you feel the need to rewrite BDA so that it says in chapter 3.3 that a uniform likelihood should be used instead of a normal one when estimating the position of submarines, whales, and farting divers? Do you find this inquiry illuminating? I do not. I find it silly. CP1 is similarly silly, except it has a frequentist flavour.

While you point out in your post that Bayesian intervals have their own problems, Morey et al. will have none of it. They don’t find any fault with the Bayesian intervals – they certainly don’t point out any problems in the paper. Instead we learn that “by adopting Bayesian inference, [researchers] will gain a way of making principled statements about precision and plausibility.” (p.9) And you were complaining about the overconfident tone of statistics textbooks when they describe CIs…

Andrew,

I just split what you call CI derivation into two parts: I. the derivation of P(D|theta), and II. the derivation of the CI from it. The former part is difficult for complex models, while the latter is trivial and exact once we have P(D|theta). I called the latter part CI derivation. My use of the term is narrower. I now realize this is confusing, since frequentists will work on both I. and II. when they try to improve the calibration of their CI procedure. However, I. is not only about long-run performance. Assumptions and background knowledge about the data-generating process enter here. I wanted to focus the discussion with RM on part I and put part II aside.

Andrew: Yes, it is true that “frequentism” can be a bit of a grab bag (particularly if one counts Fisher as a frequentist, which is at least debatable). In the context of our paper, however, by “frequentism” we mean Neyman’s theory, since that is the theory that birthed CIs. Thankfully, that is much less of a grab bag since Neyman was so careful and principled (re: your “foxhole” comment, if anyone would use a counter-intuitive CI on the high seas, it would be him! Of course, Fisher would just say that Neyman didn’t get out on the high seas enough…)

Tal: Bayesianism and frequentism aren’t exhaustive. “Fiducial” or “likelihoodist” may be the third label you’re looking for (depending).

John:

This somehow reminds me of the foxhole fallacy.

Sounds similar to some of the points Feynman made about the unusual efficacy of math in describing physics. Some of the concepts in Physics you can translate into words, & analogies, & try to make it *seem* as if you are describing the Physics itself in sentences.

For simpler concepts & situations it may indeed work. But there’s no mistaking that the *real* physics is described by a precise math equation. And whatever translation into sentences you do is only an approximation and in some limiting case it always fails.

So also, for NHST, the biggest problems seem to arise when someone tries to translate a complex situation into “simple” sentences. That’s where the biggest errors & pitfalls lie.

You’re making a pretty bold assumption that Neyman would use either of those CIs out there on the high seas. I think you’re probably wrong on that.

Since Mayo has not responded, perhaps I can answer part of my own question through Mayo’s work. Mayo (1982; available from Mayo’s website) writes:

“It must be stressed, however, that having seen the value x, NP theory never permits one to conclude that the specific confidence interval formed covers the true value of θ with either (1 – alpha)100% probability or (1 – alpha)100% degree of confidence. Seidenfeld’s remark seems rooted in a (not uncommon) desire for NP confidence intervals to provide something which they cannot legitimately provide; namely, a measure of the degree of probability, belief, or support that an unknown parameter value lies in a specific interval. Following Savage (1962), the probability that a parameter lies in a specific interval may be referred to as a measure of final precision. While a measure of final precision may seem desirable, and while confidence levels are often (wrongly) interpreted as providing such a measure, no such interpretation is warranted. Admittedly, such a misinterpretation is encouraged by the word ‘confidence’.” (p 272)

Mayo is here accusing Seidenfeld of committing what we call the fundamental confidence fallacy. (In our article, we also make the point that the FCF is encouraged by the word ‘confidence’.) Mayo’s remarks about CIs being inappropriate for “final” precision can of course be extended to what we call the likelihood and precision fallacies, since those are “final” judgments as well, and CIs are just as inappropriate for these. So it seems that Mayo agrees with us on our three fallacies (unless I misunderstand her, or she has changed her mind).

In response to Seidenfeld’s counter-intuitive example that showed a similar disconnect between “confidence” and “final precision”, Mayo writes:

“To this the NP theorist could reply that he never intended for a confidence level to be interpreted as a measure of final precision; and that he never attempted to supply such a measure, believing, as he does, that such measures are illegitimate. It is not the fault of NP theory that by misinterpreting confidence levels an invalid measure of final precision results.” (p 273)

This echoes (or, rather, we echo this, since she was first…) our statement that “Confidence procedures were merely designed to allow the analyst to make certain kinds of dichotomous statements about whether an interval contains the true value, in such a way that the statements are true a fixed proportion of the time *on average* (Neyman, 1937). Expecting them to do anything else is expecting too much.” As Bayesians, we disagree with Mayo on the value of measures of final precision, of course, but we definitely agree on the problem of misinterpreting confidence intervals.

Finally, Mayo also writes:

“In my own estimation, the NP solution to the problem of inverse inference can provide an adequate inductive logic, and NP confidence intervals can be interpreted in a way which is both legitimate and useful for making inferences. But much remains to be done in setting out the logic of confidence interval estimation before this claim can be supported — a task which requires a separate paper.” (p 273)

Mayo, almost 50 years after the theory of CIs was laid out, is saying that “much remains to be done” before one can support the statement that NP confidence theory is “both legitimate and useful for making inferences.” I found this to be a fairly staggering statement from a frequentist, to be sure. We disagree with Mayo’s estimation that this can be done — and are unaware of any work in the 32 years since this paper that did it — but it would certainly be valuable work trying.

Well, I’m happy to take your word for this, but then the upshot is that you’re arguing against something that almost nobody actually cares about in practice. If you really believe that anyone who thinks it’s a good idea to take into account the length of the sub and the uniformity of the bubble distribution can’t possibly be a frequentist, then it seems to me that you’re rendering the term ‘frequentist’ largely useless in modern discourse. I think you’ll have a very hard time finding people to endorse the view you describe (i.e., that anyone who’s a frequentist must reject the obviously correct “frequentist” CI here in favor of suboptimal alternatives).

If anything, I think you will probably make many people very happy, since as I understand it, you’re basically saying that anyone can be a Bayesian without ever formally integrating a prior–all one has to do is occasionally think about the plausibility of the models one is testing, and then it doesn’t really matter how those models are formalized beyond that.

Or is there some third label you think we should apply to someone who has no problem using a confidence (and not credible) interval that doesn’t have the “preferred frequentist properties” even if it’s clearly the most sensible model?

Richard:

I almost agree with you here. But let me emphasize that one key aspect of frequentist theory is that it includes many different principles which can contradict each other. Frequentist principles include unbiasedness, efficiency, consistency, and coverage. Careful frequentists realize that no single principle can work, and they recognize that different principles are more or less relevant in different settings, even if this is not always clear in textbook presentations.

Frequentist theory is a principled theory of inference; it is not “whatever makes sense to you”. It has implications, which we are highlighting. In order to say that the frequentist in our example is “dumb” you have to call Welch and Neyman “dumb”. Either they are “dumb frequentists” or you misunderstand the theory.

Frequentism, as a theory, cares about different things than you might expect. The implications of this may not make sense to you, but the answer is not to deny that those are, in fact, implications. This stuff has been known for a long time, but has gotten little attention outside theoretical statistics. We aim to change that.

I agree completely with the conclusion you ascribe to Morey et al., Andrew, but I think they clearly go well beyond that in the paper. For instance, even in the abstract, they claim that “CIs do not necessarily have any of these properties, and generally lead to incoherent inferences”. That’s a very strong claim, and doesn’t seem accurate to me. I think it would have been better to say that in most cases naive CIs will lead to coherent inferences, but that there is a non-negligible set of cases where one is liable to draw very wrong conclusions if one is not careful. Of course, this is true of just about any model, whether frequentist or Bayesian.

I think the point you’re making could have been much more simply demonstrated in the paper without introducing Bayes at all–e.g., by comparing the naive frequentist CI to the “correct” one (which could just as easily be presented in its frequentist flavor). It strikes me as quite misleading for the authors to claim that Bayesian approaches solve this particular problem, when as others have noted above, one could have come to the right inference using a different frequentist CI (or, conversely, arrived at the wrong inference with a different Bayesian model).

Tal:

I can’t speak for Morey et al., but I think the key confusion here is that they are not saying that all confidence intervals are bad (certainly not that all frequentist methods are bad, given that any Bayesian procedure can be interpreted as “frequentist,” as all that this means is that various theoretical properties of a method are evaluated). What they are saying is that “confidence intervals” do not represent a general principle of interval estimation. This is a point that may well be obvious to you but is not always clear in textbooks. The issue is not that they are picking a bad frequentist method; the issue is that they are pointing out that a procedure that is sometimes recommended as a good general principle can actually have some big problems. From the perspective of the user, what’s important is to understand where these methods have such problems.

I find this general approach to inquiry—take a generally-recommended principle and explore simple special cases where it fails miserably—to often be a helpful way to gain understanding. Indeed, I apply this approach myself in criticizing the noninformative Bayesian approach (which I, to my embarrassment, recommend in my textbooks) in the second-to-last paragraph of my above post.

For your argument it is immaterial who invented the CI and what his opinion on this matter was – especially since this was more than half a century ago. You need to engage with the modern definition of the CI. Recent statistics textbooks are where you want to start.

Further, for your argument it is immaterial that statistician X derived a CI (for the submarine example) by assuming Y, because you are not arguing against X and his derivation but against the concept of the CI. If you want to pit the CI against the credible interval, you need to provide a CI derivation where your assumptions are close/identical to the assumptions made in the derivation of the credible interval. Your CP1 and CP2 fail to assume a uniform distribution for x, as your derivation of the Bayesian interval does.

(Btw you won’t scare me by quoting dead people. I have no problem in stating that some of the work by Fisher, Neyman, Welch etc. was mistaken, wrong, confused, misguided, ignorant and naive. If they did make mistakes in derivation or did make implausible/unstated assumptions, too bad for them.)

Suppose I’m a frequentist. You come to me and say, “hey, we have two models we can consider for the process that generated these bubbles. One of them is completely implausible given everything we know about our lost submarine, but has nice frequentist properties if we completely ignore all of the specific information we have here; the other is almost certainly the correct model, but would produce some very strange results in some hypothetical situations that are nothing like the current one.”

It seems you’re claiming that, as a frequentist, I’m obligated to choose the latter model because frequentists aren’t allowed to think about, or take into account, the likelihood of different data-generating processes. This hardly seems fair; as matus pointed out in his/her blog post, you’re basically pitting a dumb frequentist against a clever Bayesian. Obviously, if you stipulate that only Bayesians are allowed to take into account *any* kind of contextual information about a problem, then frequentism is going to get it wrong much of the time. But that seems like a rather odd view. Surely one doesn’t have to be a Bayesian to realize that it’s a bad idea to privilege a model that makes no logical sense over one that does, right? (Or, to put it differently, if you think that one *does* have to be a Bayesian for that, then I think you will be surprised at the number of people who are happy to be called Bayesians even though it apparently has no implications whatsoever for the way they do their analysis.)

Matus:

No, you misunderstand. I’m not talking about a lack of a closed-form solution, I’m saying that there is no solution at all.

Here’s a simple example: you have data y_1,…,y_n from the logistic regression model, Pr(y_i=1) = invlogit(X_i*b), where X is an n*k matrix and X_i is its i-th row. Let’s say n=100 and k=10, just to be specific, and let’s also specify that X has no multicollinearity. Also, just to be specific, suppose we want a 95% confidence interval for b_1. There is no general procedure for defining such an interval, not in the sense that you mean, in which the interval is obtained by inverting a hypothesis test.

Nonetheless, practitioners want such intervals (in part, I’d argue, because they’ve been misled by the confident tones of statistics textbooks, but that’s another story). So procedures exist. But it’s not an exact science, it’s a bunch of rules and approximations.
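As a concrete illustration of those “rules and approximations”: the interval practitioners typically report for a logistic regression coefficient is a Wald interval, estimate ± 1.96 standard errors, with the standard error taken from the inverse of the observed information, which is an approximation rather than an inverted exact test. A minimal sketch with made-up data and a single predictor (the data and the fixed iteration count are illustrative choices):

```python
import math

# Made-up data: one predictor, binary outcome (chosen to avoid separation).
xs = [0.5, 1.2, -0.3, 2.1, 1.7, -1.0, 0.2, 1.5, -0.6, 2.4]
ys = [1,   1,    0,   1,   0,    0,   0,   1,    0,   1]

def invlogit(z):
    # numerically safe inverse logit
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Newton-Raphson for (a, b) in Pr(y=1) = invlogit(a + b*x).
a = b = 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = invlogit(a + b * x)
        g0 += y - p              # gradient wrt a
        g1 += (y - p) * x        # gradient wrt b
        w = p * (1.0 - p)        # observed-information weights
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    a += (h11 * g0 - h01 * g1) / det
    b += (h00 * g1 - h01 * g0) / det

# Approximate 95% Wald interval for the slope b: invert the 2x2
# information matrix and take the (b, b) entry as the variance.
se_b = math.sqrt(h00 / (h00 * h11 - h01 * h01))
ci = (b - 1.96 * se_b, b + 1.96 * se_b)
```

The normal approximation behind the 1.96 multiplier is exactly the kind of convention Gelman is describing: useful, standard, and not an exact science.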

That’s ok, not everything has to be an exact science. Even if something is an exact science, it depends on assumptions that we don’t in general believe (in your example, you want me to give you P_theta(D|theta) but in any real example I’ve ever seen, this probability distribution is only a convenient approximation).

So I don’t think it’s a devastating criticism on my part to say that hypothesis-test-inversion confidence intervals are not an exact science. They represent a statistical method that works in some important special cases, as well as a principle that can be applied with varying success in other situations. And that’s fine. The mistake has not been in people coming up with and using this method, the mistake is when people treat it as a universal principle for interval estimation.

Exact science: if you give me P_theta(D|theta) and D and specify alpha, I will give you the alpha CI (a,b). You may fail to provide a closed-form solution for P_theta(D|theta) for a particular model (and take recourse to an approximate solution), but that does not make the particular step where we derive the CI from P_theta any less exact. Similarly, you would not describe the second law of motion as approximate or inexact just because you can’t measure mass with infinite precision or because you can’t derive a closed-form solution for a complex model.
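Concretely, the claim amounts to something like the following sketch, using the thread’s Uniform submarine model as the sampling model (the grid resolution and the choice of the midpoint as the test statistic are my own illustrative choices): given P(D|theta) and alpha, a central-test inversion produces the interval mechanically.

```python
# Model (from the thread's example): x1, x2 ~ Uniform(theta-5, theta+5);
# the statistic is the midpoint xbar, and under theta, xbar - theta has a
# symmetric triangular distribution on [-5, 5].

def cdf_mid(m):
    """CDF of xbar - theta (triangular on [-5, 5])."""
    if m < -5:
        return 0.0
    if m > 5:
        return 1.0
    if m <= 0:
        return (m + 5) ** 2 / 50.0
    return 1.0 - (5 - m) ** 2 / 50.0

def ci_by_inversion(x1, x2, alpha=0.5):
    """Collect every theta on a fine grid whose central level-alpha
    acceptance region contains the observed midpoint."""
    xbar = (x1 + x2) / 2.0
    accepted = [
        k / 100.0
        for k in range(int((xbar - 6) * 100), int((xbar + 6) * 100) + 1)
        if alpha / 2 <= cdf_mid(xbar - k / 100.0) <= 1 - alpha / 2
    ]
    return min(accepted), max(accepted)

# 50% CI for two observed bubbles at 4.0 and 6.0 (midpoint 5.0).
lo, hi = ci_by_inversion(4.0, 6.0)
```

This is the mechanical step II in the comment’s terms; the contested work, as the surrounding discussion makes clear, is whether the sampling model and the chosen statistic were the right ones in the first place.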

Thanks bsg for giving me a chance to clarify my position.

I was reluctant to start pointing fingers at people making “outlandish claims”, and I debated whether to start with that now (as I don’t think it helps much for the overall discussion). But since I consider myself an observant person who likes basing her statements on some evidence, I want to combat the “straw man argument” accusation. Here is a great example of what I’m talking about: http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/

That person is clearly credible, clever and has more than basic stats training. Still he claims things like (I just take some statements): ”Frequentism [in the context of a CI – added] and Science do not mix” or “The frequentist confidence interval […] is _usually_ [emphasis added] answering the wrong question” or “[…] if you follow the frequentists [idea of a CI – added] in considering “data of this sort”, you are in danger at arriving at an answer that tells you _nothing meaningful_ [emphasis added] about the particular data you have measured.” or “‘Given this observed data, I can put no constraint on the value of \(\theta\)’ If you’re interested in what your particular, observed data are telling you, frequentism [in the context of a CI – added] is _useless_.” Morey et al also claim “confidence intervals do not have the properties that are often claimed on their behalf” and the blog post here is entitled “fallacy of confidence in CI” (I suspect both of these latter statements were chosen for rhetorical rather than substantive reasons).

To wrap my position up. I’m convinced that

a) CIs can be useful procedures for scientific inference about plausible values of an unknown parameter

b) It is very important that we know when, why, and how they do and do not work well (just as with every other inferential procedure).

I think we (and most if not all others on this blog) will agree with a) and b). For those who do not agree with a) (like the blogger I cited here), I suggest the game above.

You’re missing the fact, which I’ve repeated several times here, that CP2 (and another CP derived by Welch) have better frequentist properties than the credible interval and would be preferred by a frequentist. We explicitly say that the objective Bayes interval is a 50% CP. But it isn’t preferred. The preferred intervals lead to absurdities. I dislike being told over and over how obvious it is that frequentists would use the likelihood/Bayes interval when I’ve cited one of the most important frequentists of the 20th century saying otherwise, and giving explicit frequentist reasoning.

Ah, I see what you’re saying about “additional information”. I think you’re misinterpreting what’s meant by that, though. If all you tell me is that you have a 50% CI, then from my point of view there is a 50% probability that your CI contains the true value. If you also tell me other information, such as the width of the CI, or known flaws in the model you used to produce it, etc, then I can use this “additional information” to come up with a conditional probability that your CI contains theta _given_ that information. Which, I agree, should be done when possible. But this doesn’t mean the “FCF” is a fallacy. (Your example of a 9-m CP1 interval on p.5 is a case of additional information: the size of the submarine and the shape of the bubble distribution, neither of which are used by CP1.)

(I wonder how much of this disagreement is due to a semantic or philosophical difference in the definition of “probability”. I assert that e.g. “in the absence of any other information, there is a 95% probability that the obtained confidence interval includes the population mean” is completely correct and consistent as a statement about frequentist probability.)

Related to this, you’re very concerned in your paper that there’s more than one “valid” CP, and each one gives a different CI; you seem to think this invalidates the statement that a given CI has a 50% chance of containing theta. But if you perform two Bayesian analyses using different priors and different probability models, obviously you’ll get different posteriors; wouldn’t that mean that you can’t interpret posteriors as likelihoods either?

Thank you for clarifying where CP1 and CP2 come from. This example still doesn’t show what you claim, though, because the biggest difference between the three methods (CP1, CP2, and CredInt) is not in whether they’re Bayesian or frequentist but in how much information each one uses.

The Bayesian credibility interval (CredInt) that you give uses the length of the submarine, the fact that the bubble distribution is uniform along that length, and the separation between the bubbles; basically all of the available information.

It’s easy to construct a frequentist confidence interval procedure that uses all this as well (and it basically follows the argument you describe in your supplement for CredInt): Let dx=|x1-x2| be the separation between the bubbles. The farthest the mean of this sample can be from theta is 5 - dx/2, so the sampling distribution of the mean, given dx, is a uniform distribution centred at theta with width (10 - dx). Thus the 50% CI is x-bar +/- (10 - dx)/4. Which of course is the same as your CredInt, but arrived at using purely frequentist methods.
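A quick Monte Carlo check of this procedure (the true hatch position and the simulation size are arbitrary choices): conditional on dx, the midpoint is uniform over an interval of width 10 - dx around theta, so the interval x-bar ± (10 - dx)/4 should cover theta exactly half the time.

```python
import random

random.seed(1)

# Simulate the submarine model: x1, x2 ~ Uniform(theta-5, theta+5),
# and check coverage of the interval xbar +/- (10 - dx)/4.
theta = 3.7      # arbitrary true hatch location
n = 100_000
hits = 0
for _ in range(n):
    x1 = random.uniform(theta - 5, theta + 5)
    x2 = random.uniform(theta - 5, theta + 5)
    xbar = (x1 + x2) / 2
    half_width = (10 - abs(x1 - x2)) / 4
    if abs(xbar - theta) <= half_width:
        hits += 1
coverage = hits / n   # should be very close to 0.5
```

Because the coverage is 50% conditional on every value of dx, it is also 50% unconditionally, which is exactly the relevant-subsets property the thread is arguing about.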

CP2 uses the submarine length and the uniform distribution information, but throws away the bubble separation, so obviously it’s going to perform more poorly. Could Bayesian methods do any better without using the bubble separation?

CP1 does use the bubble separation, but it doesn’t use any information about the submarine at all! If you don’t know whether the length of your submarine is 10mm, 10m, or 10km, it shouldn’t be a surprise that the CI you get from 2 data points is not that useful — and I don’t see how any method could give you a better idea of your measurement precision in that case! This seems to me to illustrate a serious problem with trying to use non-parametric methods with tiny sample sizes, but it doesn’t say anything about CIs in general. What would an _equivalent_ Bayesian credibility interval look like if the ONLY information it’s allowed to use is x1 and x2 (nothing about the bubble probability distribution, submarine size, etc)?

There are certainly many situations where Bayesian methods are the easiest and/or best ways to incorporate information. But I think the reason your submarine example strikes people as silly is because in this case there’s a very straightforward frequentist CP that you’re ignoring, which undermines your entire argument.

Matus:

You write, “Derivation of conf intervals is an exact science.” I’m not quite sure what is meant by “exact science” in this context but I don’t think your description is accurate. What, for example, is the exact science behind the derivation of confidence intervals for logistic regression coefficients?

The derivation of confidence intervals is an exact science in some simple examples but not in general.

]]>I’m unclear how one might argue that. We merely state Neyman’s definition of a confidence procedure. There’s nothing “out of context” about it.

]]>The last I looked at intro textbooks (2 or 3 years ago), the one that seemed the best of those I looked at was DeVeaux, Velleman, and Bock, Stats: Data and Models, 3rd ed. I’ve got some comments on using it in a particular course at http://www.ma.utexas.edu/users/mks/M358KInstr/M358KInstructorMaterials.html . However, that course is for math majors, so includes more mathy stuff than the usual intro stat course. But possibly one of the other books by the same authors might be good for a more standard intro stats course.

For many years, Moore and McCabe’s Introduction to the Practice of Statistics was pretty good, but it seems to have gone downhill since Moore retired and a third author was added.

]]>You might also be interested in Fisher’s opinion on the matter. This is from the discussion on Neyman’s 1934 paper, where he first introduces the idea of the confidence interval (http://www.jstor.org/stable/2342192):

“In particular, [Fisher, as opposed to Neyman] would apply the fiducial argument, or rather would claim unique validity for its results, only in those cases for which the problem of estimation proper had been completely solved, i.e. either when there existed a statistic of the kind called sufficient, which in itself contained the whole of the information supplied by the data, or when, though there was no sufficient statistic, yet the whole of the information could be utilized in the form of ancillary information. Both these cases were fortunately of common occurrence, but the limitation seemed to be a necessary one, if they were to avoid drawing from the same body of data statements of fiducial probability which were in apparent contradiction.

“Dr. Neyman claimed to have generalized the argument of fiducial probability, and he had every reason to be proud of the line of argument he had developed for its perfect clarity. The generalization was a wide and very handsome one, but it had been erected at considerable expense, and it was perhaps as well to count the cost. The first item to which he would call attention was the loss of uniqueness in the result, and the consequent danger of apparently contradictory inferences.” (pp. 617-618)

Fisher also understood that there is not *one unique* way to build a confidence interval. [It is worth noting that in the submarine/uniform case, the Bayes/likelihood interval can be obtained by conditioning on the ancillary statistic, and hence Fisher would identify the objective Bayes interval as the unique fiducial interval. It is also worth noting that Fisher explicitly notes there that Neyman’s theory *does not require this*.]

]]>Our whole point is that yes, the CIs are inappropriate. We actually discuss the “valid” CI for this example in the paper. Its validity rests on its likelihood/Bayes properties, NOT frequentist properties, because the “valid” interval has suboptimal frequentist properties. If you think anyone who uses CP2 over the objective Bayes interval is an idiot, you’re calling Neyman and Welch (and many other frequentists) idiots.

]]>As it is, the paper points out two issues that can arise in rare instances (and when they’re big issues, they’ll generally be pretty obvious), and then rails against the FCF, which is known to be false anyway. While the FCF may have proponents in their field, there are CI proponents who know the FCF is false, and it doesn’t dissuade them.

I’d be much happier with the paper if they changed their use of the word “general” in many places to the word “sometimes”.

]]>You’re just plain wrong; there is not one way of generating confidence intervals. I cited the actual theoretical literature on CIs – some of which deals with the precise example we discuss – and you suggest that I’m embarrassing myself (you know Wasserman didn’t invent CIs, right?). Read Welch (1939): are you going to tell me that he didn’t understand frequentist CI theory, but you do? Welch explicitly says that the Bayes/likelihood interval is not the best way to generate a CI in the uniform example, from a frequentist point of view. He then gives an interval that dominates it in frequentist terms. Our CP2 also dominates the Bayes interval.

Under CI theory, there is *not* one way to build a CI. What matters is coverage of the true value and exclusion of false values (in long-run terms). There are sometimes several ways to approach this, leading to the counterintuitive results we discuss.

You should read Neyman (1937) and Welch (1939) before you post again.

(I also noted with some amusement that you and Erf suggested that we didn’t use the “obviously correct” way of generating the interval, and then you two suggested two *different* intervals as obviously correct, both of which we mention in the paper.)

]]>The post may be incorrect about the specific assumptions of your CP1 and CP2 but it’s not incorrect that there are perfectly valid CIs for the situation and your paper only works if the CI researcher is an idiot using inappropriate CIs and the Bayesian isn’t. matus just reversed the idiots.

]]>As to the point 2 quote from Neyman, he prefaces the whole thing by essentially saying that it holds only if this confidence procedure is applicable to the situation, in which case he’s correct. So one might argue the quote is out of context given how it’s being used.

]]>He’s talking about a correct confidence procedure for that situation. The CIs do get the true value 95% of the time; it’s just that sometimes the CIs lie entirely in regions which are impossible (according to the same assumptions as were used to construct the CI).

What’s happening is that the CIs are chasing after that 95% coverage. In order to get it in the long run, the normal, unexceptional CI procedure will generate absurd intervals for certain data points. The insane part of this is that, given the actual data, you know whether you’re in one of those bad cases. The Bayesian credible intervals automatically take this into consideration and give the optimal interval estimate for the data actually seen.
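This "you can tell from the data" point can be made concrete with a simulation. The sketch below assumes (my assumption, for illustration) that the unconditional procedure is the fixed-width interval x-bar +/- 5*(1 - 1/sqrt(2)) built from the sampling distribution of the mean; it has exactly 50% long-run coverage, but its coverage conditional on the observed bubble separation dx is wildly different:

```python
import random

# Assumed fixed-width 50% procedure: x-bar +/- 5*(1 - 1/sqrt(2)).
# This half-width makes the *unconditional* coverage exactly 50%
# for the triangular sampling distribution of the mean.
HALF_WIDTH = 5 * (1 - 2 ** -0.5)  # about 1.46

def trial(rng, theta=0.0):
    """Draw two bubbles from a length-10 submarine; report dx and a hit."""
    x1 = rng.uniform(theta - 5, theta + 5)
    x2 = rng.uniform(theta - 5, theta + 5)
    xbar = (x1 + x2) / 2
    dx = abs(x1 - x2)
    hit = abs(xbar - theta) <= HALF_WIDTH
    return dx, hit

rng = random.Random(0)
bins = {"narrow (dx < 2)": [], "wide (dx > 8)": [], "all": []}
for _ in range(300_000):
    dx, hit = trial(rng)
    bins["all"].append(hit)
    if dx < 2:
        bins["narrow (dx < 2)"].append(hit)
    elif dx > 8:
        bins["wide (dx > 8)"].append(hit)

for label, hits in bins.items():
    print(f"{label}: coverage {sum(hits) / len(hits):.2f}")
```

Overall coverage comes out near 0.50, but within the bins it does not: when the bubbles are far apart the interval essentially always contains theta, and when they are close together it contains theta far less than half the time. A bettor who conditions on dx exploits exactly this gap.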

More to the point, if you’re allowed to vary the bet sizes in Momo’s bet above, a Bayesian can use this fact to bankrupt the frequentist very quickly (unless, of course, the CIs are always equal to the corresponding Bayesian credible intervals).

Chalk one up for Bayes.

]]>bxg,

The backwards conclusion Morey et al make at the end of their paper is, “we have shown that confidence intervals do not have the properties that are often claimed on their behalf,” and it’s not subtle at all. That’s not about the specific intervals they looked at. They’re making a sweeping generalization, and it uses logic just as bad as the FCF fallacy (using their terminology).

]]>Daniel,

If, across repeated uses of the procedure in a given situation, it doesn’t capture the true value 95% of the time, then by definition it isn’t a confidence procedure for that situation. So that’s a pretty specious argument. By the same token, the procedure for getting a proper credible interval can’t be misapplied either.

]]>Is there an introductory text that you have found to be particularly good?

]]>