# Frequentist

n = 100 # number of students

theta = 0.47 # assumed population proportion of students who know answer

z = rbinom(500, n, theta) # 500 samples of # of students who know answer

y = rbinom(500, n-z, 0.25) # total number of correct answers among those who don’t know answer

p = (z+y)/n # sample proportion of correct answers

hist(p)

mean(p)

quantile(p, probs=c(0.025, 0.975))

# when I ran this, I got 0.60 mean, (0.51, 0.69) 95% interval as expected

# Bayesian

theta = runif(1000) # “flat prior”: 1000 draws of proportion of correct answers

z = sapply(theta, function (q) rbinom(1, n, q)) # number of students who know answer

y = sapply(z, function (k) rbinom(1, n-k, 0.25)) # number of correct answers among those who don’t know

p = (z+y)/n # proportion of correct answers

# let’s look at p versus theta

plot(p ~ theta, las=1, cex=0.7, cex.axis=0.7)

# get regression line

ptheta.lm = lm(p~theta)

abline(ptheta.lm, col="red")

text(0.3, 0.9, paste("p = ", round(coef(ptheta.lm)[1],2), "+ ", round(coef(ptheta.lm)[2],2), "* theta"), col="red", cex=0.7)

abline(h=0.6) # observed p was 0.6

abline(h=c(0.51, 0.69), lty=2) # 95% interval around 0.6 as found above

# restrict attention to samples with p around 0.6 as observed

theta2 = theta[round(p,1)==0.6]

hist(theta2)

mean(theta2)

quantile(theta2, probs=c(0.025, 0.975))

# when I ran this, I got mean 0.47, 95% interval (0.34, 0.62), close to expected

Andrew, thanks for your patience and explanation. Population theta (duh!). Sigh.

This blog is fun and useful. I only have one problem with it. When I go to my mailbox, I fear I'll find a large tuition bill from Columbia.

Bob

Which by the way is the same as a binomial(0.55,100), but it was not completely obvious to me at first sight.

https://en.wikipedia.org/wiki/Binomial_distribution#Conditional_binomials

Correction: X is distributed as binomial(0.25,100-K).

> Your formulation has number of right answers distributed as binomial(0.55, 100). But it appears to me that, assuming theta = 0.4, the number of right answers is distributed as 40 + X, where X is distributed as binomial(0.25, 60).

The number of right answers is distributed as K+X where K (the number of students who know the correct answer) is distributed as binomial(theta,100) and X is distributed as binomial(0.25,K).
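This construction can be checked by simulation. A minimal sketch in Python (rather than the R used above), using the corrected form where the guessers are the 100-K students who don't know the answer, and assuming theta = 0.4:

```python
import random
import statistics

random.seed(1)
n, theta, guess = 100, 0.40, 0.25
sims = 20_000

totals = []
for _ in range(sims):
    # K: students who know the answer, binomial(theta, n)
    k = sum(random.random() < theta for _ in range(n))
    # X: correct guesses among the n - k who don't know, binomial(0.25, n - k)
    x = sum(random.random() < guess for _ in range(n - k))
    totals.append(k + x)

# By the conditional-binomials result, K + X should behave like a single
# binomial with Pr(correct) = 0.25 + 0.75*theta = 0.55:
print(statistics.mean(totals))      # close to 100 * 0.55 = 55
print(statistics.variance(totals))  # close to 100 * 0.55 * 0.45 = 24.75
```

So the two-stage process and the single binomial(0.55, 100) agree in both mean and variance.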

Bob:

I think my answer is correct but I didn’t describe it so clearly. Theta is the proportion of students *in the population* who know the answer. So if theta=0.4, that doesn’t mean that 40 of the 100 students *in the class* know the answer.

To put it another way, you’re getting inference for the finite-sample quantity: what proportion of students in the class know the answer? In my problem, I’m asking for inference for the population quantity: what proportion of students in the population know the answer? The finite-sample inference will be more precise than the population inference, which makes sense: we know more about these 100 students than we do about the general population of which we are considering them as a sample.

Well, after reflection I am even more confused.

Your formulation has number of right answers distributed as binomial(0.55, 100). But it appears to me that, assuming theta = 0.4, the number of right answers is distributed as 40 + X, where X is distributed as binomial(0.25, 60).

Unless I have made a mistake, the variance of the first distribution is 24.75 while the variance of the second is 11.25.

I haven’t tried to do the relevant simulations, but it seems to me highly likely that the 40 + X model will give different answers than will the binomial(0.55, 100) model.

Bob

Thanks

Bob

Bob:

Try this. Start by assuming a true value of theta. It shouldn’t matter exactly what value we choose; say 0.40 as this is comfortably within our confidence interval. Then Pr(correct answer) is 0.25 + 0.75*theta = 0.55 in this case. Now it’s easy to simulate 100 students’ responses: y = rbinom(1, 100, 0.55). Do this 1000 times, so you have 1000 simulations from the sampling distribution of y, conditional on the assumed true value of theta.

Now for each of these 1000 simulated y’s, compute p_hat = y/100 and theta_hat = (p_hat - 0.25)/0.75.

And now we’re ready to use these simulations to approximate the sampling distribution of theta_hat | theta. Compute the mean of the 1000 theta_hat’s; this will be approx 0.40 because theta_hat is an unbiased estimate of theta. Compute the sd of the 1000 theta_hat’s, and you’ll get something close to 0.07, because that’s the standard error we worked out above.
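That recipe translates almost line for line into code. A sketch in Python (rather than the R used above), under the same assumed theta = 0.40:

```python
import random
import statistics

random.seed(2)
theta = 0.40
p = 0.25 + 0.75 * theta   # Pr(correct answer) = 0.55

# 1000 simulated classes of 100 students each
theta_hat = []
for _ in range(1000):
    y = sum(random.random() < p for _ in range(100))  # simulated number correct
    p_hat = y / 100
    theta_hat.append((p_hat - 0.25) / 0.75)

print(statistics.mean(theta_hat))   # approx 0.40: theta_hat is unbiased
print(statistics.stdev(theta_hat))  # approx 0.07, the standard error
```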

I tried to do such a simulation. But I stumbled on the problem of “What is the process to be simulated?” In particular, do we (1) assume a fixed number of students who know the answer and a variable number of students who guess correctly—leading to variation in the number of correct answers, or

(2) assume a fixed number (60) of students who get the answer correct, and therefore a varying number of students who guess correctly and a correspondingly varying number of students who know the answer?

If we choose (1) what is the proper number of students to assume know the answer?

Bob

I often ask multiple choice questions with the instruction “select all that apply” to signal that the correct answer may consist of multiple items.

But, this doesn’t work with popular auto-grading systems. Multiple choice exams are popular because they can be graded by machines, and the machines usually require that there be a single correct answer.

Hmmm… I was thinking of more subtle questions. For example, given a problem P and a set of possible solution methods, one could ask, for example

Q: What are the admissible methods for solving P, and why?

A: S1 because X

B: S2 because Y

C: S1 and S2, but S1 can be preferred if Z(P)

D: S1 and S2, but S2 should be preferred because T(P)

Example: a set of possible solutions to a given statistical problem, where the subject matter *may* involve a preference in the bias/precision tradeoff. Depending on P, some combinations of the answers may or may not be possible.

This type of question allows one to get more information about the knowledge and reasoning abilities of the respondent than a simple alternative…

This would likely be cultural. Multiple correct options on a multiple choice test in the US is almost always handled as an option.

e.g., What is the answer?

A: Red

B: Blue

C: Green

D: All of the above

E: A and B

> A multiple-choice test item has four options

I read this as meaning “four *non-mutually-exclusive* options”. So the set of possible answers had cardinality 16, only one being actually correct. (The rest of my reasoning was identical to yours.)

Is that a frequent misunderstanding?

Got it. Thanks.

Ea:

Nope, you’re wrong on this one.

If you need convincing, you could fit the model in Stan. Or you could do a simulation study using fake data, simulating the process 1000 times and each time computing the estimate of theta, then looking at the sd of those 1000 estimates. Or you could consider how your (mistaken) se formula works in the edge case when 100% of students get the question correct.

Your students’ responses seem just as correct. The binomial model applies to theta. The agent is either type 1 or type 2 and p is just a linear transformation of theta. A binomial model certainly applies to p too, but I would describe it with a mixture model.

Ea:

The binomial model applies to p. Theta is a linear transformation of p. It’s theta = (p - 0.25)/0.75. The easiest way to solve the problem, as in the above solution, is to first use the binomial model to get inference for p, then apply the linear transformation to obtain inference for theta.
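The arithmetic of that two-step approach is short enough to write out. A sketch in Python (rather than the R used above), assuming the observed p_hat = 0.6 and the conservative 0.5-based standard error discussed elsewhere in this thread:

```python
import math

n = 100
p_hat = 0.60               # observed proportion of correct answers
se_p = 0.5 / math.sqrt(n)  # conservative s.e. using 0.5, as in the thread

# Linear transformation to the population proportion who know the answer:
theta_hat = (p_hat - 0.25) / 0.75
se_theta = se_p / 0.75     # the s.e. transforms by the same factor

ci = (theta_hat - 1.96 * se_theta, theta_hat + 1.96 * se_theta)
print(round(theta_hat, 2))             # 0.47
print(round(se_theta, 3))              # 0.067
print(tuple(round(x, 2) for x in ci))  # (0.34, 0.6)
```

This matches the point estimate and interval that come up in the thread: inference for p first, then a linear map to theta.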

I don’t really care about the problem either, I care (a little) about the definition of “95% confidence interval”. If this is not one of the basic principles that you want to test you could ask the students to provide just an estimate and standard error.

Carlos:

Sure, if I really cared about the problem I’d just fit the model in Stan. But the point of this aspect of the exam is to check that students understand the basic principles of estimation and uncertainty.

Evan:

I think the quickest simplest way is to search for “estimating a population proportion” in your favorite search engine. You’ll likely get enough there to bridge the gap.

Strictly speaking, using 0.5 for the s.e. calculation gives a “better” confidence interval, at least if valid values of theta are between 0 and 1.

When the true value of the parameter is not around 0.35 the coverage gets better than 95%: for theta=0 or theta>0.7 it is above 97%, and it becomes essentially 100% for theta > 0.95. But this is “acceptable” in the sense that the coverage is at least 95% for all values of theta.

The proposed alternative doesn’t even guarantee 90% coverage for theta between 0.85 and 0.95, and coverage collapses as theta approaches 1. This seems an acceptable confidence interval calculation only if the domain of theta is restricted and doesn’t extend to the whole [0, 1] interval.
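A coverage check along these lines is easy to run. A sketch in Python (rather than the R used above), taking theta = 0.95 as one test point near the boundary and comparing the 0.5-based interval with the plug-in sqrt(p_hat*(1-p_hat)) interval:

```python
import math
import random

random.seed(3)
n, theta, sims = 100, 0.95, 10_000
p = 0.25 + 0.75 * theta                             # Pr(correct) = 0.9625
half_fixed = 1.96 * (0.5 / math.sqrt(n)) / 0.75     # half-width, s.e. based on 0.5

cover_fixed = cover_plugin = 0
for _ in range(sims):
    y = sum(random.random() < p for _ in range(n))  # simulated number correct
    p_hat = y / n
    theta_hat = (p_hat - 0.25) / 0.75
    half_plugin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n) / 0.75
    cover_fixed += abs(theta_hat - theta) <= half_fixed
    cover_plugin += abs(theta_hat - theta) <= half_plugin

print(cover_fixed / sims)   # essentially 1: the 0.5-based interval is conservative here
print(cover_plugin / sims)  # noticeably below 0.95 for theta this close to 1
```

At this theta the 0.5-based interval over-covers while the plug-in interval under-covers, consistent with the coverage pattern described above.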

Thanks. This clears it up for me. I had not seen this when I posed my question below.

Bob

I’ve got the same question or misunderstanding.

Consider a modified version of the question. Assume (1) there are a million possible answers and (2) of 100 test takers, 99 get the right answer. I estimate that 99% of the population know the answer and the confidence interval is more or less meaningless (given the discrete nature of the test takers)—but it’s very small.

Bob

Jean:

1. Good catch. I added the 95% conf interval to the above solution.

2. I used 0.5 for the s.e. because that’s standard practice when the probability is near 50%. Had it been 85%, I would’ve used sqrt(p*(1-p)).
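Near 50% the two choices barely differ, which a line of arithmetic shows. A sketch in Python (rather than the R used above), at the observed p_hat = 0.6:

```python
import math

n = 100
# Conservative s.e. using 0.5 vs. plug-in s.e. at p_hat = 0.6:
se_half = 0.5 / math.sqrt(n)
se_plug = math.sqrt(0.6 * 0.4 / n)

print(se_half)            # 0.05
print(round(se_plug, 4))  # 0.049
```

With p_hat this close to 50%, the plug-in and 0.5-based standard errors agree to about 2%, so the choice is immaterial here; it matters only for proportions far from 0.5.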

Ok now for my perhaps ignorant question: why use 0.5 for the estimate’s s.e. calculation? Would you have chosen the same if 85% of students got it correct? Am I missing something obvious?

Sorry, I misread my numbers… CI = exp(0.19) to exp(0.27)

But this one is either trivial or hard, depending on what you mean by “how much more likely”. If you want an absolute measure then the result depends utterly on the distance to the well. Maybe that’s what you are testing?

If you want a relative measure and accept relative odds as an approximation, then the OR is exp(0.23), confidence bounds exp(0.21) to exp(0.29), with the approximate SE taken as the width of the CI divided by 4.