When I teach this material I make the point that p*(1-p) has a maximum of 0.25 (so sqrt(p*(1-p)) has a maximum of 0.5) and that it is very close to this maximum for a wide range of values of p. If a student were to use 0.4*0.6 in the above problem, I’d still give full credit. Either answer is ok here.
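To see how flat the curve is near its peak, here is a minimal Python sketch (the function name `sd_factor` is my own, purely for illustration). It tabulates p(1-p) and its square root for a few values of p; the variance term peaks at 0.25 and the square root at 0.5, both at p = 0.5.

```python
import math

def sd_factor(p):
    """Return sqrt(p * (1 - p)), the standard deviation of a single 0/1 outcome."""
    return math.sqrt(p * (1 - p))

# Tabulate the variance term and its square root over a range of p.
for p in (0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9):
    print(f"p = {p:.1f}   p(1-p) = {p * (1 - p):.3f}   sqrt = {sd_factor(p):.3f}")
```

Between p = 0.3 and p = 0.7 the square root stays within about 10% of its maximum, which is why the choice between 0.25 and 0.24 barely moves the standard error.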

I disagree that ‘It doesn’t matter.’ When you make people double-check and doubt the ‘correct’ answer, that’s annoying.

Sure it doesn’t matter in practice. For a student’s sanity it might.

If the model used for estimating the effect size is a logistic regression, wouldn’t it make more sense to frame the effect size as an odds ratio (1.5) or a log odds ratio rather than a difference in percentages (10%)?

It seems like the odds ratio is an effect size measure that’s more likely to generalize, for example if you want to predict the effect of an incentive on a similar survey which would otherwise get, say, a 30% response rate.
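A small Python sketch of what that generalization would look like. I am assuming the original effect was a move from 50% to 60% (that is what an odds ratio of 1.5 and a 10-point difference jointly imply), and the helper names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

def apply_odds_ratio(baseline, odds_ratio):
    """Response rate implied by multiplying the baseline odds by odds_ratio."""
    return prob(odds(baseline) * odds_ratio)

# Assumed original setting: 50% baseline, odds ratio 1.5.
print(f"50% baseline -> {apply_odds_ratio(0.5, 1.5):.3f}")
# Same odds ratio carried over to a survey with a 30% baseline.
print(f"30% baseline -> {apply_odds_ratio(0.3, 1.5):.3f}")
```

On the 30% survey the same odds ratio predicts a rise of roughly 9 percentage points rather than 10, so the two framings give different predictions once the baseline moves.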

In any case, thanks for sharing these questions. It’s always interesting, and slightly humbling, to read through them.

If it is a concern, and if we still want to solve it by hand rather than referring to a Beta distribution table, we can use a log-normal approximation. Let (1-q)~z and z~log_normal(mu, sigma). Matching the mean and variance of the log-normal, exp(mu + sigma^2/2) = 1 - \hat p and (exp(sigma^2) - 1) exp(2mu + sigma^2) = \hat p (1 - \hat p)/100, we can derive mu and sigma (by hand), and the normal interval for log z translates into a log-normal interval for p and q.
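For anyone who would rather check the algebra than do it by hand, here is a rough Python sketch of the moment matching. It uses the standard log-normal mean exp(mu + sigma^2/2) and variance (exp(sigma^2) - 1) exp(2mu + sigma^2), which solve to sigma^2 = log(1 + v/m^2) and mu = log(m) - sigma^2/2. The estimate 0.95 below is an illustrative value I picked, not a number from the problem, and `lognormal_params` is a hypothetical helper name.

```python
import math

def lognormal_params(mean, var):
    """Solve for (mu, sigma) of a log-normal with the given mean and variance."""
    sigma2 = math.log(1 + var / mean**2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)

# Illustrative values (assumptions): estimate 0.95 with n = 100.
est, n = 0.95, 100
mu, sigma = lognormal_params(1 - est, est * (1 - est) / n)

# A normal 95% interval on log z, mapped back to the z scale.
z_lo = math.exp(mu - 1.96 * sigma)
z_hi = math.exp(mu + 1.96 * sigma)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"interval for z: ({z_lo:.4f}, {z_hi:.4f})")
print(f"interval for 1 - z: ({1 - z_hi:.4f}, {1 - z_lo:.4f})")
```

Because z is positive by construction, the interval for 1 - z can never run past 1, which is the point of the approximation.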

A more concerning problem: the procedure is seriously wrong if the percentage in the population who know the answer is close to 100%. In that case all the responses may be correct by chance (because all the students selected know the answer, or those who don’t get lucky), and the confidence interval produced collapses to a single point and doesn’t include the true value.

I did a simulation (and I may have done something wrong!) but what I got is that coverage is reasonable for true values below 0.75. There are some patterns (low-frequency and high-frequency oscillations) but the coverage remains more often than not between 94% and 95% (and always above 92%). For true values above 0.75 the oscillations in coverage become larger, going below 90% four times between 0.85 and 0.95, down to 80% coverage for p=0.96 and again for p>0.98 (with coverage approaching 0% as we get closer to 100%).
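The settings of that simulation aren't stated, so the following Python sketch is only my guess at the setup: n = 100 students, a guessing probability of 0.25, and the plug-in standard error. Instead of simulating, it sums binomial probabilities exactly over every possible number of correct answers, which removes Monte Carlo noise.

```python
import math

def binom_pmf(k, n, q):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

def coverage(p_true, n=100, guess=0.25, z=1.96):
    """Exact coverage of the plug-in interval for p (assumed setup)."""
    q_true = guess + (1 - guess) * p_true  # P(correct answer)
    total = 0.0
    for k in range(n + 1):
        q_hat = k / n
        p_hat = (q_hat - guess) / (1 - guess)
        se = math.sqrt(q_hat * (1 - q_hat) / n) / (1 - guess)
        # Add this outcome's probability if its interval contains p_true.
        if p_hat - z * se <= p_true <= p_hat + z * se:
            total += binom_pmf(k, n, q_true)
    return total

for p in (0.3, 0.5, 0.75, 0.9, 0.96, 0.99, 0.999):
    print(f"p = {p:<6} coverage = {coverage(p):.3f}")
```

When every response is correct, q_hat = 1 and the standard error is zero, so the interval is the single point p_hat = 1. As p approaches 1 that outcome takes up most of the probability, which is what drags coverage toward zero.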

It doesn’t matter. For this purpose, 0.25 and 0.24 are the same thing. I went with 0.25 because it’s generally cleaner in considering the design of studies to just use that 0.25 upper bound. If the estimated probability were 0.1 or something, then, yeah, I’d use p*(1-p).

Hey, question 2 is just more of the same if you’re doing this by simulation. At least if I correctly recall what a confidence interval is :-) It also helps that I’ve already coded an IRT model with guessing in working out Stan case studies. I sort of feel like all these questions have a little trick in them like this, in that they’re not just turn-the-crank problems. You have to think about how the log odds scale matters in that first question. At least these questions should get students thinking hard about the likelihoods involved.

P.S. Thanks for getting the `pre` tags working in comments again. It helps if you want to post code.

P(Correct) = q

P(Correct | Guess) = 0.25

P(Correct | Know) = 1

P(Know) = p

P(Guess) = 1 - p

q = 0.25(1 - p) + p = 0.25 + 0.75p

p = (q - 0.25) / 0.75

q.hat = 0.6

p.hat = (q.hat - 0.25) / 0.75 = 0.467

n = 100

Var[p.hat] = Var[(q.hat - 0.25) / 0.75] = 1.778 * Var[q.hat] = 1.778 * (q(1 - q) / n)

SE.hat[p.hat] = sqrt(1.778 * (q.hat(1 - q.hat) / n)) = sqrt(1.778 * (0.6 * 0.4 / 100)) = 0.065

95% CI = p.hat +/- 1.96 * SE.hat[p.hat] = 0.467 +/- 1.96 * 0.065 = (0.340, 0.594)

Answer:

47% (34%, 59%)
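The same arithmetic as a short Python sketch, in case anyone wants to rerun it with other inputs. The function name `knowledge_ci` and its parameter names are my own. It keeps full precision throughout, so the endpoints may differ from the hand calculation in the third decimal place.

```python
import math

def knowledge_ci(q_hat, n, guess=0.25, z=1.96):
    """Estimate P(Know) and a normal-approximation interval from P(Correct)."""
    p_hat = (q_hat - guess) / (1 - guess)
    # Var[p.hat] = Var[q.hat] / (1 - guess)^2, so the SE scales by 1 / (1 - guess).
    se = math.sqrt(q_hat * (1 - q_hat) / n) / (1 - guess)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lo, hi) = knowledge_ci(q_hat=0.6, n=100)
print(f"p.hat = {p_hat:.3f}, SE = {se:.3f}")
print(f"95% CI = ({lo:.3f}, {hi:.3f})")
```

Rounded to whole percentages this agrees with the answer above.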
