Sam:

When I teach this material I make the point that sqrt(p*(1-p)) has a maximum of 0.25 and that it is very close to this maximum for a wide range of values of p. If a student were to use 0.4*0.6 in the above problem, I’d still give full credit. Either answer is ok here.

]]>Andrew:

I disagree that ‘It doesn’t matter.’ When you make people double check and doubt the ‘correct’ answer then thats annoying.

Sure it doesn’t matter in practice. For a student’s sanity it might.

]]>D’oh, sorry, I meant to reference Question 1 and not Question 2.

]]>If the model used for estimating the effect size is a logistic regression, wouldn’t it make more sense to frame the effect size as an odds ratio (1.5) or a log odds ratio rather than a difference in percentages (10%)?

It seems like the odds ratio is an effect size measure that’s more likely to generalize, for example if you want to predict the effect of an incentive on a similar survey which would otherwise get, say, a 30% response rate.

In any case, thanks for sharing these questions. It’s always an interesting and slightly-humbling thing to read through.

]]>Hi Carlos, you are questioning the normal approximation of a restricted variable p \in (0,1). The requirement for q+2sd< 1 result in P(observed correct)=q <0.96 and therefore P(know)=p <0.95 for this sample size. Anything higher than that leads to mass loss on the right end. This explains your simulation results.

If it is a concern, and if we still want to solve it by hand rather than referring to the Beta distribution table, we can use log-normal approximation. Let (1-q)~z and z~log_normal(mu, sigma), using mu+sigma^2/2= 1-\hat p and exp(sigma^2-1) exp(2mu +sigma ^2) = \hat p ( 1- \hat p) /100 we can derive mu and sigma (by hand), and the normal interval of log z translates into the log-normal interval of p and q.

]]>This procedure can generate “ugly” confidence intervals including impossible values. For example if the correct responses were 25% instead of 60% the confidence interval would be centered around 0. And if we make the sample size small enough the confidence interval covers completely the [0,1] range.

A more concerning problem: the procedure is seriously wrong if the percentage in the population who know the answer is close to 100%. In that case all the responses may be correct by chance (because all the students selected know the answer, or those who don’t get lucky) and the confidence interval produced is pointwise and doesn’t include the true value.

I did a simulation (and I may have done something wrong!) but what I got is that coverage is reasonable for true values below 0.75. There are some patterns (low-freq and high-freq) but the coverage remains more often than not between 94% and 95% (and always above 92%). For true values above 0.75 the oscillations in coverage become larger, going below 90% four times between 0.85 and 0.95, down to 80% coverage for p=0.96 and again for p>0.98 (with coverage approaching 0% as we get closer to 100%).

]]>No.

]]>Ethan:

It doesn’t matter. For this purpose, 0.25 and 0.24 are the same thing. I went with 0.25 because it’s generally cleaner in considering the design of studies to just use that 0.25 upper bound. If the estimated probability were 0.1 or something, then, yeah, I’d use p*(1-p).

]]>Oops. Is that second mistake because you just wanted students to use prebuilt software or is it because you wanted me to do something like a bootstrap estimate from the actual data? It gives you the same draws here to bootstrap from data and to binomial sample from the MLE. So I don’t see the problem.

]]>Hey, question 2 is just more of the same if you’re doing this by simulation. At least if I correctly recall what a confidence interval is :-) It also helps that I’ve already coded an IRT model with guessing in working out Stan case studies. I sort of feel like all these questions have a little trick in them like this in that they’re not just turn the crank. You have to think about how the log odds scale matters in that first question. At least these questions should get students thinking hard about the likelihoods involved.

P.S. Thanks for getting the `pre` tags working in comments again. It helps if you want to post code.

The alternative approach is to obtain the CI for q, and then transform the endpoints to obtain the CI for p, but by linearity this produces the same CI as directly obtaining the CI for p.

]]>P(Correct) = q

P(Correct | Guess) = 0.25

P(Correct | Know) = 1

P(Know) = p

P(Guess) = 1 – p

q = 0.25(1 – p) + p = 0.25 + 0.75p

p = (q – 0.25) / 0.75

q.hat = 0.6

p.hat = (q.hat – 0.25) / 0.75 = 0.467

n = 100

Var[p.hat] = Var[(q.hat – 0.25) / 0.75] = 1.778Var[q.hat] = 1.778(q(1 – q) / n)

SE.hat[p.hat] = sqrt(1.778(q.hat(1 – q.hat) / n)) = sqrt(1.778(0.6 * 0.4 / 100)) = 0.065

95% CI = p.hat +/- 1.96 * SE.hat[p.hat] = 0.467 +/- 1.96 * 0.065 = (0.340, 0.594)

Answer:

47% (34%, 59%)

]]>