Question 17 of my final exam for Design and Analysis of Sample Surveys

17. In a survey of n people, half are asked if they support “the health care law recently passed by Congress” and half are asked if they support “the law known as Obamacare.” The goal is to estimate the effect of the wording on the proportion of Yes responses. How large must n be for the effect to be estimated within a standard error of 5 percentage points?

Solution to question 16

From yesterday:

16. You are doing a survey in a war-torn country to estimate what percentage of unemployed men support the rebels in a civil war. Express this as a ratio estimation problem, where the goal is to estimate Y.bar/X.bar. What are x and y here? Give the estimate and standard error for the percentage of unemployed men who support the rebels.

Solution: x is 1 if the respondent is an unemployed man, 0 otherwise. y is 1 if the respondent is an unemployed man and supports the rebels, 0 otherwise. The estimate is y.bar/x.bar [typo fixed]; the standard error is (1/x.bar)*(1/sqrt(n))*s.z, where s.z is the standard deviation of the residuals z.i = y.i – (y.bar/x.bar)*x.i.
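To make the formula concrete, here is a minimal sketch in Python of the ratio estimate and its standard error as written in the solution above. The function name ratio_estimate and the simulated data (30% unemployed men, 40% of them supporting the rebels) are made up for illustration. It also assumes that s.z is the sample standard deviation with an n-1 divisor, and, like the formula in the solution, it ignores any finite-population correction.

```python
import math
import random
import statistics


def ratio_estimate(x, y):
    """Return (estimate, standard error) for the ratio Y.bar/X.bar.

    x[i] is 1 if respondent i is an unemployed man, 0 otherwise.
    y[i] is 1 if respondent i is an unemployed man who supports the rebels, 0 otherwise.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    r = y_bar / x_bar
    # Residuals z.i = y.i - (y.bar/x.bar)*x.i
    z = [yi - r * xi for xi, yi in zip(x, y)]
    # Assumption: s.z is the sample standard deviation (n-1 divisor)
    s_z = statistics.stdev(z)
    se = (1 / x_bar) * (1 / math.sqrt(n)) * s_z
    return r, se


# Hypothetical simulated survey, for illustration only
random.seed(0)
n = 1000
x = [1 if random.random() < 0.3 else 0 for _ in range(n)]
# y can be 1 only when x is 1
y = [xi if random.random() < 0.4 else 0 for xi in x]

estimate, std_err = ratio_estimate(x, y)
print(f"estimate = {estimate:.3f}, standard error = {std_err:.3f}")
```

Because y is zero whenever x is zero, the residuals are nonzero only for unemployed men, so the standard error is driven by the number of unemployed men in the sample rather than by the full n.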

8 thoughts on “Question 17 of my final exam for Design and Analysis of Sample Surveys”

    • Just to elaborate on my previous comment: Classical statistical education seems to me to be cluttered with special cases, so-and-so’s method or test or whatever. I much prefer working from basic principles. For example, in Bayesian Data Analysis we briefly discuss how the so-called Wilcoxon test can be seen as a standard analysis obtained after first doing a rank transformation of the data. To me, the key principles are the rank transformation (typically not a good idea in my opinion, but, sure, there’s a time and a place for everything) and the standard analysis (normal-theory is fine here). Having a named test to me just distracts from the key ideas.

      • Here’s what I would do. Assuming that both questions are asked simultaneously of respondents, there are four possible outcomes: x1 = unemployed, supports rebels; x2 = unemployed, no support; x3 = employed, supports rebels; x4 = employed, no support. This is a multinomial, so the means and covariances are known. Then the percent of unemployed men who support the rebels is x1/(x1 + x2).
        Formulas for the distribution of this random variable are given in Newsom “Asymptotic distributions of linear combinations of logs of multinomial parameter estimates” (this seems to be unpublished, but google it–it references published work, including that by Lindley from 1988 on this same problem).

        I think your response would be that your approximation is sufficient, and actually I agree with you: fighting over a marginally better approximation is not typically worthwhile, since the errors from other sources are (almost always) much greater than the error from using one approximation of a distribution rather than another. The advantage of the above approach(es) is that they allow one to build on any linear combination of the elements of the multinomial (and hence it is worth pointing to the references in the literature, since one of the points of your blog is information).

  1. Ummm.. the original problem said MEN so the proposed solution is inaccurate… unless the “you” in Andrew’s “if you’re unemployed” is implicitly understood as referring only to men.
