Completely agree. I don’t think this question is up to par for an exam. It’s mostly testing whether the student can untangle a hidden question rather than use the statistical skills needed.

By real world, I mean the analyst will not be supplied all the info/data needed to solve the problem. The analyst then needs to either find out the necessary data from other people, or make a reasonable assumption and keep going. Don’t be the analyst who throws up his/her hands and does nothing because he/she doesn’t have all the data.

It was not completely clear to me whether (7) was a question about power, but looking at the answer it seems it was not.

I don’t particularly mind the problem as stated. But it is not at all real world. In the real world, if you don’t know the sizes of the control and treatment groups, you wouldn’t get past GO.

Oops! You are totally correct.

In general, seems about right – overly conservative if ‘true’ theta is very close to either 0 or 1. Thinking more about this, if we are bothering to specify a conjugate Beta prior, let’s just go full Bayesian and use a more informed prior :) I wouldn’t put the prior expectation any higher than 2.5%, so a Beta(1,39). Plugging that in, my 95% credible interval becomes [0,4%].
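Something like this sketch shows the arithmetic (assuming scipy, and assuming the same 0-successes-out-of-50 data as above, so the Beta(1,39) prior updates to a Beta(1,89) posterior):

```python
from scipy.stats import beta

# Informed prior Beta(1, 39), prior mean 1/40 = 2.5%.
# Conjugate update with 0 successes in 50 trials:
# posterior is Beta(1 + 0, 39 + 50) = Beta(1, 89).
post = beta(1, 39 + 50)

# Equal-tailed 95% credible interval.
lo, hi = post.ppf([0.025, 0.975])
# lo is essentially 0 and hi is about 0.04,
# matching the quoted [0, 4%] after rounding.
```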

If I haven’t made a mistake, this is what the coverage looks like for true values between 0.001 and 0.999 (for each value there are three points, corresponding to simulations with 10,000 events each, to give an idea of the variability).
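For anyone who wants to reproduce it, a simulation along these lines would do it (a sketch only: I’m assuming the interval being checked is the 95% equal-tailed credible interval under a Beta(1,1) prior with n = 50, and using numpy/scipy):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def coverage(theta, n=50, sims=10_000):
    """Fraction of simulated datasets whose 95% equal-tailed
    Beta(1,1)-prior credible interval contains the true theta."""
    y = rng.binomial(n, theta, size=sims)
    lo = beta.ppf(0.025, 1 + y, 1 + n - y)
    hi = beta.ppf(0.975, 1 + y, 1 + n - y)
    return np.mean((lo <= theta) & (theta <= hi))
```

Evaluating `coverage` over a grid of true values between 0.001 and 0.999, three runs per value, gives the plotted points.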

Do you mean 0 to 0.058? I think that what you gave is the 5% confidence interval.
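The two upper bounds can be checked directly by inverting the exact binomial test, i.e. solving (1 − theta)^n = alpha with 0 successes in n = 50 (a quick sketch, assuming the 0-of-50 data):

```python
# Inverting the exact binomial test with 0 successes in n = 50:
# the upper bound solves (1 - theta)**n = alpha.
n = 50
upper_95 = 1 - 0.05 ** (1 / n)  # 95% bound, about 0.058
upper_5 = 1 - 0.95 ** (1 / n)   # 5% bound, about 0.00103
```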

Indeed, a conjugate Bayesian model here results in a posterior distribution for theta of Beta(1,51). From this, I get a 95% credible interval of [0%,7%] after rounding, with a median of 1.3%. I don’t know about the confidence coverage, but I suspect this works just fine, although it is a bit larger than the ‘rule of 3’ adduced above.
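The conjugate update is easy to verify (a sketch assuming scipy: a Beta(1,1) prior with 0 successes in 50 trials gives a Beta(1,51) posterior):

```python
from scipy.stats import beta

# Beta(1,1) prior, 0 successes in 50 trials -> posterior Beta(1, 51).
post = beta(1, 51)

median = post.median()              # about 1.3%
lo, hi = post.ppf([0.025, 0.975])  # about [0%, 7%] after rounding
```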

One approach here is to get a confidence interval by inverting the hypothesis test. This would give you an interval from 0 to 0.00103. (And yeah, this approach has issues as noted on this blog earlier in https://statmodeling.stat.columbia.edu/2014/12/11/fallacy-placing-confidence-confidence-intervals/).

The baseline voting rate does matter in that the variance will be lower for voting rates near 0 or 1 (the standard error of a sample proportion is sqrt(p * (1 – p) / n)).

The variance estimator used by Andrew is conservative in that it assumes the worst-case (for variance) voting rate of 0.5.
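To make the point concrete, a small sketch: p(1 − p) is maximized at p = 0.5, so plugging in 0.5 gives an upper bound on the standard error whatever the true voting rate is.

```python
import math

def se(p, n=50):
    """Standard error of a sample proportion, sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# p(1-p) peaks at p = 0.5, so se(0.5, n) bounds the standard
# error for any baseline voting rate; rates near 0 or 1 give
# strictly smaller values.
```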

p = (0+1)/(50+2) = 1.92% is the estimate obtained using Laplace’s succession rule (which corresponds to a Bayesian analysis with a Beta(1,1) prior, equivalent to having sampled two people beforehand, one of whom had held office). The standard error of this estimate is sqrt(p*(1-p)/(50+3)) = 1.89%.
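Both numbers fall out of the Beta(1,51) posterior directly (a sketch assuming scipy: the succession-rule estimate is the posterior mean, and the quoted standard error is the posterior standard deviation):

```python
from scipy.stats import beta

# Beta(1,1) prior with 0 of 50 -> posterior Beta(1, 51).
post = beta(1, 51)

p_hat = post.mean()  # (0 + 1) / (50 + 2), about 1.92%
se_hat = post.std()  # sqrt(p_hat * (1 - p_hat) / (50 + 3)), about 1.89%
```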

Yeah, I think that “how large an effect” is confusing (I had also decided to say that I assumed half the sample got the treatment and not to think about how this sampling was done). I also wondered about the one-tailed issue. What bothers me more is not knowing the baseline voting rate, since I think that actually matters. But maybe it’s a fair assumption that this is not a place with extreme voting rates.

Maybe some students didn’t know how to start because they didn’t understand what was being asked.

Did anyone propose a one-tailed test?

I love these questions!

Would it be possible to publish the complete exam and solutions all in one place once you are finished with the blog posts? This would make it more accessible for future reference.

Best,

Andrew