Question 11 of my final exam for Design and Analysis of Sample Surveys

11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES.

Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval.

Solution to question 10

From yesterday:

10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office.

Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].
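The arithmetic in this solution can be checked with a short sketch. It assumes the simplified "add 2 successes and 2 failures" form used above, with a multiplier of 2 rather than 1.96; the function name `agresti_coull_simple` is made up for illustration.

```python
import math

def agresti_coull_simple(y, n, z=2.0):
    """Simplified Agresti-Coull interval based on (y+2)/(n+4).

    Assumption: uses the multiplier z=2 from the solution above,
    and truncates the interval to [0, 1].
    """
    p_tilde = (y + 2) / (n + 4)                          # adjusted estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))    # adjusted standard error
    lower = max(0.0, p_tilde - z * se)                   # truncate below at 0
    upper = min(1.0, p_tilde + z * se)                   # truncate above at 1
    return p_tilde, se, lower, upper

p_tilde, se, lower, upper = agresti_coull_simple(y=0, n=100)
print(round(p_tilde, 2), round(se, 3), round(lower, 2), round(upper, 2))
```

Without truncation the lower end would be slightly negative, which is why the reported interval starts at 0.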

12 thoughts on “Question 11 of my final exam for Design and Analysis of Sample Surveys”

  1. Not having studied statistics since long before Agresti-Coull was published, I would have chosen to solve this a different way.

    Being Bayesian at the core, I would have calculated a posterior distribution and found the interval [0, x] with 95% cumulative probability.

    Doing so with a uniform prior (with 100 trials the prior doesn’t matter much), the cumulative distribution is cdf(s) = 1 - (1 - s)^101 over the range [0, 1].

    Solving cdf(s) = 0.95 gives s = 0.0292252.

    So, did I make a mistake somewhere or is the Agresti-Coull shortcut imprecise in this case? (0.029 vs. 0.05)?

    Chuck
    PS. For the posterior density of p, I got
    d(p) = 101*(1 - p)^100
    for p in the range [0, 1].

    • Chuck:

      I’m not sure. One way to look at it is that Agresti-Coull is based on a Beta(2,2) prior rather than a flat Beta(1,1). Another take on it is that Agresti-Coull is supposed to have good frequency properties for the 95% interval, whereas the interval based on (y+1)/(n+2) doesn’t have such good coverage conditional on the true p.

      • Why the obsession with coverage properties? Once you admit that a 95 percent confidence interval doesn’t have exactly 95 percent coverage in any finite case, aren’t you freed up to use results like Chuck’s? (I admit I’ve used it before, so I’m looking for a little Bayesian imprimatur.) That applies particularly since your interval, unlike Chuck’s, doesn’t include 0. I grant that in your particular problem zero is impossible (although it’s always possible that all political offices up to now have been held by zombies), but there are lots of times when zero might actually be the true fraction in the population in question. Agresti-Coull will never put in the confidence interval the result best supported by the data!

        • Jonathan:

          1. This was a final exam for my class on design and analysis of sample surveys in the political science department. I wanted them to have something to do when y=0, and Agresti-Coull is a standard method.

          2. My interval does include zero! You must have misread what I wrote. Look again at the very last sentence of the above post.

        • Yeah, I hadn’t had my coffee yet, which disturbs both my confidence intervals and my reading comprehension. But doesn’t that make it worse? You actually calculate an Agresti-Coull interval and then truncate to make it make sense. What is the coverage probability of such a beast?

          Agresti-Coull is certainly a standard method, and your students (and everyone else) should know it. I also know you don’t like exact methods (Clopper-Pearson comes immediately to mind), but the “conservatism” of exact methods has always seemed a misnomer to me, except in situations where some narrow Neyman-Pearson consideration is at play, i.e., almost never.

        • Jonathan:

          The Agresti-Coull interval is supposed to be truncated. I have not studied the coverage probability of this “beast” (as you call it); that’s the subject of the Agresti and Coull paper!

    • The “rule of 3” (the 95th percentile if there are zero cases observed) is 3/n, or 0.03 here.

      One reference to this rule is Winkler et al., The American Statistician, 2002. In that article, they discuss how in many contexts there is a better Bayesian approach than the “rule of 3”. That article is later than Agresti-Coull but does not reference it.

      But I’ve just used the rule of thumb and never done research on it.

    • Using a Beta(1,1) as a prior is also the Bayesian “point of view” of the well-known frequentist “rule of three”.

      B. D. Jovanovic and P. S. Levy, A Look at the Rule of Three, The American Statistician, Vol. 51, No. 2 (May, 1997), pp. 137-139
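The uniform-prior bound and the rule of three discussed in this thread can be compared numerically. The sketch below is only illustrative: the function names are invented here, and the "exact" bound assumes the usual one-sided derivation of the rule of three, solving (1 - p)^n = 0.05 for y = 0.

```python
import math

def uniform_prior_upper(n, level=0.95):
    """Upper end s of [0, s] holding `level` posterior mass when y = 0.

    With a Beta(1,1) prior the posterior is Beta(1, n+1), whose CDF is
    1 - (1 - s)^(n+1); setting that equal to `level` and solving gives s.
    """
    return 1 - (1 - level) ** (1 / (n + 1))

def exact_one_sided_upper(n, level=0.95):
    """Largest p for which seeing zero successes still has probability 1 - level.

    Assumption: solves (1 - p)^n = 1 - level, the equation behind the rule of three.
    """
    return 1 - (1 - level) ** (1 / n)

def rule_of_three(n):
    """Rule-of-thumb approximation 3/n, since -ln(0.05) is close to 3."""
    return 3 / n

n = 100
bayes = uniform_prior_upper(n)
exact = exact_one_sided_upper(n)
thumb = rule_of_three(n)
print(round(bayes, 4), round(exact, 4), round(thumb, 4))
```

For n = 100 all three bounds land near 0.03, noticeably below the 0.05 upper end of the truncated Agresti-Coull interval in the post.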

  2. Pingback: Question 12 of my final exam for Design and Analysis of Sample Surveys « Statistical Modeling, Causal Inference, and Social Science

  3. It doesn’t bother you that the correct answer on your test was that 2% (+/- something) of the population has held political office?

    • Tim:

      I actually do think the true population percentage is between 0 and 5%. The point of the Agresti-Coull procedure is to get a 95% confidence interval. The (y+2)/(n+4) value is just a step in that procedure; it’s not really to be interpreted as a point estimate.

      • And (beating a dead horse), it’s not intended to actually get a 95 percent confidence interval, either. From Brown, Cai, and DasGupta, Annals of Statistics, 2002:

        “To summarize, the conclusion is that the Agresti–Coull interval dominates the other intervals in coverage, but is also longer on an average and is quite conservative for p near 0 or 1. The Wilson, the likelihood ratio and the Jeffreys prior interval are comparable in both coverage and length, although the Jeffreys interval is a bit shorter on average. If we also take simplicity of presentation and ease of computation into account, the Agresti–Coull interval, although a bit too long, could be recommended for use in this problem. If simplicity is not a paramount issue, either the Wilson, the likelihood ratio, or the Jeffreys interval may be used, depending on taste.”
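The coverage question raised in this thread can be explored directly, since for fixed n and p the coverage is a finite sum over binomial outcomes. The following is a rough sketch, not the calculation from either paper: it assumes the simplified (y+2)/(n+4) interval with multiplier 2 and truncation to [0, 1], as in the post, and the helper names are hypothetical.

```python
import math

def truncated_interval(y, n, z=2.0):
    """Simplified Agresti-Coull interval from (y+2)/(n+4), truncated to [0, 1]."""
    p_tilde = (y + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return max(0.0, p_tilde - z * se), min(1.0, p_tilde + z * se)

def coverage(p, n, z=2.0):
    """Exact probability that the interval contains p when y ~ Binomial(n, p)."""
    total = 0.0
    for y in range(n + 1):
        lower, upper = truncated_interval(y, n, z)
        if lower <= p <= upper:
            # add the binomial probability of this outcome
            total += math.comb(n, y) * p**y * (1 - p) ** (n - y)
    return total

# Coverage at a few illustrative true proportions for n = 100
for p in (0.005, 0.02, 0.1, 0.5):
    print(p, round(coverage(p, 100), 3))
```

At p = 0.02 and n = 100 this gives coverage above the nominal 95%, which is in line with the quoted remark that the interval is conservative for p near 0 or 1.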
