it’s just a test question, so it shouldn’t be taken as advice about what to actually do.

Sure, fair enough. I hadn’t thought of that. I agree that we could do better using prior information. Considering the question is not just current officers but “Have you ever held,” the true population value is not negligible. I’ve never had a student try to include that information in the answer, but it would be ok if they did.

Every place with 10k inhabitants has a mayor, almost certainly every place above 5k, hence my numbers. Larger communities have more than one elected office. Not American, so can’t guess what the lowest reasonable bound is, but could be as low as 1/500 which is meaningfully different from zero.

@Carlos

1) But the prior knowledge that the CI can’t start below zero is expected to be incorporated? That’s doublethink. I took the instruction to mean that you can’t plug in the true number you happen to know from elsewhere.

2) I thought this place was about thinking beyond the numbers.

3) I’m not suggesting students whose CI starts at zero should be considered as having failed. My issue is with ‘confidence intervals that excluded zero (which can’t be right, given that the data are 0/50)’ which can’t be right, given that I know we’re talking about _Americans_. That information is given in the question, yet subsequently we’re expected to selectively ignore it.

This concern appears to be dismissed by ‘given the data are 0/50’, but that’s irrelevant, because the data also are ‘Americans’ and we know Americans have elected offices held by other Americans. Even if we polled 5 million Americans without finding a single one who held elected office it’s still more likely that our procedure is faulty or we’re just very very unlucky than that everything we know about the American political system is wrong.

import pymc3 as pm

with pm.Model() as q_8_model:
    p = pm.Beta("p", alpha=2, beta=2)
    y = pm.Binomial("y", n=50, p=p, observed=0)
    trace = pm.sample()

pm.summary(trace):

       mean      sd        mc_error  hpd_2.5   hpd_97.5  n_eff       Rhat
p      0.036841  0.025556  0.000446  0.001293  0.08663   3875.80476  1.001024

From Chris “Bayesian inference is all conditional on the model, up to and including the prior specification … It puts the emphasis squarely back on the workflow of scientific reasoning and modeling assumptions at play.”

And “only ever feel like I can understand most non-Bayesian procedures after mapping them to some kind of Bayesian analogue” – me too.

It really should, but that’s more recent than you think and likely still disregarded in many places. I took a shot at it being disregarded in 1996: “When a posterior is presented, I believe it should be clearly and primarily stressed as being a ‘function’ of the prior probabilities and not the probability ‘of treatment effects.’” Two cheers for Bayes: https://www.sciencedirect.com/science/article/pii/S0197245696907539?via%3Dihub

From Daniel “I think it’s clear that Classical Statistics has the most political power … applied in many areas as a way to try to wield power to crush dissent.” That certainly is the case in areas I have worked. One concern would be how it would hold up in court :-(

From Andrew “first the statistical establishment required statistical significance, now the statistical establishment is saying that statistical significance isn’t good enough” I would add “not providing a clear sense of what would be widely acceptable today.”

Yup. But the funny thing is that I think that a lot of the people doing bad science also feel that they’re being pounded by classical statistics.

It goes like this:

– Researcher X has an idea for an experiment.

– X does the experiment and gathers data, would love to publish.

– Because of the annoying hegemony of classical statistics, X needs to do a zillion analyses to find statistical significance.

– Publication! NPR! Gladwell! Freakonomics, etc.

– Methodologist Y points to problems with the statistical analysis, the nominal p-values aren’t correct, etc.

– X is angry: first the statistical establishment required statistical significance, now the statistical establishment is saying that statistical significance isn’t good enough.

– From Researcher X’s point of view, statistics is being used to crush new ideas and it’s being used to force creative science into narrow conventional pathways.

This is a narrative that’s held by some people who detest me (and, no, I’m not Methodologist Y; this might be Greg Francis or Uri Simonsohn or all sorts of people.) There’s some truth to the narrative, which is one thing that makes things complicated.

when we interpret powerful as political power, I think it’s clear that Classical Statistics has the most political power, that is, the power to get people to believe things and change policy or alter funding decisions etc… Today Bayes is questioned at every turn, and ridiculed for being “subjective” with a focus on the prior, or modeling “belief”. People in current power to make decisions about resources etc are predominantly users of Classical type methods (hypothesis testing, straw man NHST specifically, and to a lesser extent maximum likelihood fitting and in econ Difference In Difference analysis and synthetic controls and robust standard errors and etc all based on sampling theory typically without mechanistic models…

The alternative is hard: model mechanisms directly, use Bayes to constrain the model to the reasonable range of applicability, and do a lot of computing to get fitted results that are difficult for anyone without a lot of Bayesian background to understand, and that specifically make a lot of assumptions and choices that are easy to question. It’s hard to argue against “model free inference procedures” that “guarantee unbiased estimates of causal effects” and etc. But it’s easy to argue that some specific structural assumption might be wrong and therefore the result of a Bayesian analysis might not hold…

So from a political perspective, I see Classical Stats as it’s applied in many areas as a way to try to wield power to crush dissent.

Thankfully, I find that it is super humbling and clarifying to always insist on recalling that the Bayesian inference is all conditional on the model, up to and including the prior specification – and of course the particular data at hand. The posterior measure is relative to the prior measure. It’s all conditional on “this is the (small) world” as Lindley would say. What this prevents is naive reification of probabilities, or models for that matter, as some kind of ultimate thing (“probability does not exist!”). It puts the emphasis squarely back on the workflow of scientific reasoning and modeling assumptions at play.

Anyhow, only tangentially related, but perhaps pertinent is that personally I only ever feel like I can understand most non-Bayesian procedures after mapping them to some kind of Bayesian analogue.

No, but hopefully we have learned how damaging unsafe Classical statistics has been.

Which style of unsafe statistics is worse (Daniel’s question)?

Not sure – whichever is the most powerful or gives the seemingly most compelling results…

The Agresti-Coull method was the only thing I taught in class, so students either got this right because they remembered what to do, or they screwed up somewhere. If someone had come up with a completely different approach on their own, I’m not sure what I’d’ve done, but it didn’t come to that.

I find the two-tailed intervals less principled, even though they behave better and are easier to calculate. For the Clopper-Pearson interval I don’t understand why the two-tailed probability alpha/2 is used even when one of the tails disappears. Using alpha (and recovering the rule of three) makes more sense for x=0 (or x=1), leading to narrower intervals which still have the right coverage (we see in figure 11 that close to 0 or 1 using alpha/2 gives too much coverage). Even when two tails with probability alpha/2 exist, the equal-tailed interval is not optimal: https://hal.archives-ouvertes.fr/hal-00537683/document
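The difference is easy to check directly for x = 0, since the lower tail vanishes and the Clopper-Pearson upper bound solves (1 − p)^n = α/2 (or α for the one-tailed version, which recovers the rule of three). A minimal sketch for n = 50, with the function name my own:

```python
def cp_upper_x0(n, tail_prob):
    """Upper bound of the Clopper-Pearson interval when x = 0.

    With zero successes the interval is [0, upper], where upper
    solves (1 - p)**n = tail_prob.
    """
    return 1 - tail_prob ** (1 / n)

n, alpha = 50, 0.05
print(cp_upper_x0(n, alpha / 2))  # equal-tailed convention: ~0.071
print(cp_upper_x0(n, alpha))      # one-tailed, as argued above: ~0.058
print(3 / n)                      # rule-of-three approximation: 0.06
```

The one-tailed bound is noticeably narrower (~0.058 vs ~0.071) while still covering the true p at least 95% of the time near zero, which is the point being made.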

I assume these plots are for n = 50?

I obviously meant [0, 0.057] and [0, 0.038].

As a bonus, Haldane’s prior is included here: https://imgur.com/a/v4JU3MZ

It gives the interval [0, 0] which is indeed quite unreasonable!

I added the average coverage (even though it doesn’t really mean anything from a frequentist point of view).

Note as well that the reasonable response for the confidence interval is [0, 0.088], so if Laplace’s [0, 0.57] is bad, Jeffreys’ [0, 0.38] is twice as bad.
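The corrected Laplace and Jeffreys intervals quoted in this thread, [0, 0.057] and [0, 0.038], can be reproduced with scipy’s beta distribution; a sketch, assuming (as the numbers suggest) a one-sided 95% posterior quantile:

```python
from scipy.stats import beta

n = 50  # sample size; y = 0 successes observed

# With y = 0, a Beta(a, b) prior gives a Beta(a, b + n) posterior for p.
laplace_upper = beta.ppf(0.95, 1, 1 + n)       # uniform Beta(1, 1) prior
jeffreys_upper = beta.ppf(0.95, 0.5, 0.5 + n)  # Jeffreys Beta(0.5, 0.5) prior

print(round(laplace_upper, 3))   # ~0.057
print(round(jeffreys_upper, 3))  # ~0.038
```

The Laplace case has a closed form, 1 − 0.05^(1/(n+1)), which is essentially the rule of three again.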

The course and the book are mostly Bayesian (you can see in a few months when Regression and Other Stories appears in print), but we do have some classical confidence intervals.

Well, I asked for a 95% confidence interval, which is not what those other methods give. Anyway, this was the final exam for the class I taught, based on the book I wrote. I hadn’t mentioned those other approaches in my class or my book, so it’s unlikely any students would come up with these.

That is, “what will people repeatedly do” later, given what they should have learned in the course.

And if the course won’t or can’t cover Bayesian Workflow adequately, pointing them to Bayes may well do more harm than good.

Bayes has to be practiced safely!

I like teaching the Agresti-Coull method because (a) these y=0 or y=n examples do come up in real life from time to time, and (b) talking about these examples reveals a problem with the standard sqrt(p(1-p)/n) interval, which is what I usually recommend. The interval also has approximately 95% coverage, which is not such a big deal to me but can be convenient in applications because then I can just point to the Agresti-Coull paper and move on. As can be seen from some of the comments in the above thread, it’s good to give *some* reasonable answer here or else people can do all sorts of weird things.

Since I’d already told them how to do this in class, and since it’s in the book too, I wasn’t really expecting anything other than the Agresti-Coull interval here. There are lots of answers that one can give to this problem, but in the context of this applied statistics course, Agresti-Coull was the answer. We didn’t ever even talk about the beta distribution in that class.

And I do think that calculating the numbers is an important part of applied statistics. I see too many cases where students (and, later on, practitioners) give nonsensical answers just because they popped out of some wrongly-applied formula.
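As a sketch of the intended calculation, here is the add-two-successes-and-two-failures version of the Agresti-Coull interval; using z = 2 rather than 1.96 is my assumption, chosen because it matches the [0, 0.088] figure quoted elsewhere in the thread:

```python
import math

def agresti_coull(y, n, z=2.0):
    """Agresti-Coull interval: add z**2/2 successes and z**2/2 failures,
    then use the standard normal interval, truncated to [0, 1]."""
    n_adj = n + z ** 2
    p_adj = (y + z ** 2 / 2) / n_adj
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - z * se), min(1.0, p_adj + z * se)

lo, hi = agresti_coull(0, 50)
print(round(lo, 3), round(hi, 3))  # 0.0 0.088
```

With z = 2 this is the familiar (y + 2)/(n + 4) adjustment; the raw lower endpoint is negative and gets truncated to zero.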

No, I’m asking for a confidence interval, not a hypothesis test.

I recommend you read the Agresti and Coull paper, which discusses many of these issues. It’s good stuff. I don’t think the Agresti-Coull method solves all problems—I agree that if you have real prior information you should use it—but it does give that approximate 95% coverage, it’s easy to use, and the result is reasonable.

For the high end of the confidence interval (the low end obviously being zero), aren’t we asking how high the percentage of the population that has held office could be, such that the probability of polling 0/50 is .05 or less? If so, then if .088 of the population has held office, the probability of sampling 0/50 is 0.912^50, which is 0.01, not 0.05. (OK, 0.009994 given these exact rounded figures.) Am I misunderstanding the question?
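Both numbers in the comment check out; the gap arises because the Agresti-Coull endpoint is a normal approximation, not the exact value of p whose tail probability equals 0.05. A quick check:

```python
# Tail probability of seeing 0/50 if 8.8% of the population held office:
prob_zero = (1 - 0.088) ** 50
print(round(prob_zero, 3))  # ~0.01, as the comment computes

# The p whose chance of producing a 0/50 sample is exactly 0.05:
p_exact = 1 - 0.05 ** (1 / 50)
print(round(p_exact, 3))  # ~0.058
```

So exact one-sided inversion gives an upper bound near 0.058, while Agresti-Coull trades that exactness for a simple formula with roughly 95% coverage.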

In the interest of being as simple as possible, and not being picky, we could just say that the result of zero responses out of 50 is unlikely if the probability p were actually non-zero. How unlikely? Probably no more than one standard deviation out. For this condition, we would have

s = sqrt(Npq) = Np    # the count’s standard deviation equals its mean Np

==> q = Np = 1 - p    # squaring both sides: Npq = (Np)^2

==> p = 1/(N + 1) ~= .02

Sure, this is smaller than the one based on Agresti-Coull, but considering the other uncertainties, not too bad.

Agresti-Coull is not really equivalent to a Bayesian analysis; it’s what Jaynes would call an ad hoc device to produce an interval with some frequentist properties.

If I had given the response [-0.01, 0.09] and it was marked as incorrect, I would claim it’s as good as [0, 0.09] as far as the definition of a confidence interval goes.

Anyway, it’s interesting that the confidence intervals given here (using 4 different methods) were substantially narrower: [0, 0.06] or [0, 0.07] instead of [0, 0.09].
