Daniel Lippman sent me this news article which would be an excellent thing to give your statistics students to read if you’re covering confidence intervals and sampling.

You could ask students what they think of the author's explanation of margin of error. The article says:

"In theory, when a poll says 40 percent, with a 3 percentage point margin of sampling error, that means that 19 times out of 20, the number would be between 37 and 43 if you interviewed every adult American. But it’s a bell curve. So 40 is the most likely number. Thirty-nine and forty-one are a little less likely. Thirty-eight and 42 are even less likely than 39 and 41. And 37 and 43 are still less likely."
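The 3-point margin the article quotes is roughly what the standard normal approximation gives for a poll of about 1,000 respondents. A minimal sketch (the sample size of 1,000 is an assumption for illustration, not stated in the article):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate 95% normal-theory interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# With p_hat = 0.40 and n = 1000, the margin comes out to about 0.03,
# i.e. the 3 percentage points in the article's example.
moe = margin_of_error(0.40, 1000)
```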

Re: previous comment. I think that explanation is just plain wrong. Constructing a 95 percent confidence interval says that there is a 95 percent probability that the observation came from a distribution where p was between 37 and 43.

Ironically, the probability is actually quite low that the observation came from one where, say, p was very close to .4.

Vince, Ollie:

I think the article's description is pretty good. It's a Bayesian inference based on a flat prior distribution on the unknown probability. The flat prior distribution is not perfect, and problems certainly show up when sample sizes are small, but it's a good start, and it's consistent with how polls are summarized.

OK, for the sake of discussion: say you take your poll and calculate a 50% confidence interval. That confidence interval would be a fairly narrow band around 40, no?

Now suppose we took a 25 percent confidence interval. It is even narrower, and centered around 40, right? Now drop the interval to 10%. It still contains 40 at the center, right?
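The nested intervals described above are easy to compute under the usual normal approximation. A sketch, again assuming p-hat = 0.40 from a hypothetical poll of 1,000:

```python
from statistics import NormalDist

def interval(p_hat, n, level):
    """Normal-approximation interval for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return (p_hat - half, p_hat + half)

# All three intervals are centered at 0.40; the width shrinks as the level drops.
i50 = interval(0.40, 1000, 0.50)
i25 = interval(0.40, 1000, 0.25)
i10 = interval(0.40, 1000, 0.10)
```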

I guess that my beef with this explanation (the one from the article) is that it gives too much sway to the actual statistic; that is why you see people saying "oh, Thompson's support is dropping since it was 40 today and was 42 yesterday." In fact, it is "probable" that the 40 and the 42 were samples from the same distribution.
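The point about 40 versus 42 can be checked with a standard two-proportion z test. A sketch under assumed sample sizes (two independent polls of 1,000 each, which is not given in the comment):

```python
import math
from statistics import NormalDist

def diff_z(p1, n1, p2, n2):
    """z statistic for the difference of two independent poll proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical: 42% yesterday vs. 40% today, 1,000 respondents each.
z = diff_z(0.42, 1000, 0.40, 1000)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

The p-value comes out well above conventional significance thresholds, consistent with the comment's claim that the two numbers could easily be draws from the same underlying distribution.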

I welcome correction and/or feedback as my questions are designed to strengthen my own understanding!

Ollie,

A frequentist 95% confidence interval should be interpreted as follows:

In repeated sampling, the confidence interval constructed from the data will cover the true value of the parameter 95% of the time.
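That repeated-sampling interpretation can be demonstrated directly by simulation. A sketch with an assumed true p of 0.40 and polls of 1,000:

```python
import random

random.seed(1)
p_true, n, z = 0.40, 1000, 1.96
trials = 1000
covered = 0
for _ in range(trials):
    # Simulate one poll and build its 95% interval.
    x = sum(random.random() < p_true for _ in range(n))
    p_hat = x / n
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    covered += (p_hat - half) <= p_true <= (p_hat + half)

coverage = covered / trials  # close to 0.95 in repeated sampling
```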

If you are constructing a 95% Bayesian interval, then the posterior probability that p belongs to the interval (.37, .43) is .95. If you construct an interval with a lower confidence level, then the probability that p belongs to the constructed interval will be smaller.

The width of the confidence interval gives you some indication of how much information you have about the "true" location of the parameter. If your 95% confidence interval is fairly wide, then the posterior probability that the parameter lies within an extremely small region around p-hat is going to be really close to zero. However, the posterior mode is still the most likely estimate for p, even though the probability that p is exactly equal to the posterior mode is extremely small.
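This can be made concrete by Monte Carlo sampling from the posterior under a flat prior, which for binomial data is a Beta(x+1, n-x+1) distribution. A sketch assuming 400 "yes" answers out of 1,000 (hypothetical numbers matching the 40%/3-point example):

```python
import random

random.seed(0)
n, x = 1000, 400  # 40% support in a hypothetical poll of 1,000
# Posterior under a flat Beta(1, 1) prior is Beta(x + 1, n - x + 1).
draws = [random.betavariate(x + 1, n - x + 1) for _ in range(100_000)]

# Posterior probability that p lies in the 3-point interval (0.37, 0.43).
in_interval = sum(0.37 < p < 0.43 for p in draws) / len(draws)

# Probability that p sits in a tiny window around the mode 0.40 is small,
# even though 0.40 is the single most likely value.
near_mode = sum(0.399 < p < 0.401 for p in draws) / len(draws)
```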

Is it obvious that I'm a graduate student in statistics? I think that my explanation would make sense to a statistician, though I think I probably just failed in my attempt to explain a confidence interval to a non-statistician.

Your explanation is fine (at least for me); my writing is not. :)

I am not a statistician; I am a topologist, and therefore my understanding of statistics is basic.

I know that the measured p-hat is the maximum likelihood estimate (MLE) of the parameter p.

So, the explanation in the Times article, while not precise, is OK for the lay person, though I'd be a bit more cautious and explain that the observed "p" from a poll is really a "point" estimate of the parameter "p".