dataset = data.frame(treated=c(rep(1,500),rep(0,500)), responded = c(rep(1,500*.5),rep(0,500*.5),rep(1,500*.4),rep(0,500*.6)))

summary(glm(responded~treated,data=dataset,family=binomial))

That gives a very similar answer to Bob’s, a treatment estimate of .405 and a SE of .128. Maybe this is the same as Bob got but he respected Andrew’s feeling on number of significant digits.
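As a quick hand check on those two numbers, here is a minimal sketch that assumes the usual large-sample formula for the standard error of a log odds ratio (the square root of the summed reciprocal cell counts). The variable names are mine, not from the original comment.

```r
# Counts from the example: 250/500 responders under treatment, 200/500 under control.
y1 <- 250; n1 <- 500
y0 <- 200; n0 <- 500

# Log odds ratio; this is what the coefficient on `treated` estimates.
log_or <- log(y1 / (n1 - y1)) - log(y0 / (n0 - y0))

# Large-sample standard error: square root of the summed reciprocal cell counts.
se_log_or <- sqrt(1 / y1 + 1 / (n1 - y1) + 1 / y0 + 1 / (n0 - y0))

print(round(c(estimate = log_or, se = se_log_or), 3))
```

With only a binary predictor the logistic regression is saturated, so this should reproduce the glm estimate and SE quoted above.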

]]>From a Bayesian perspective I’d probably have solved it with a beta-binomial conjugate prior system: Beta(251,251) vs. Beta(201,301). You can sample and subtract to get a posterior distribution for the effect size on the probability scale.
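A minimal sketch of that sample-and-subtract step, assuming the two Beta posteriors come from a uniform Beta(1,1) prior on each response probability (my reading of the parameters; the comment does not state the prior). The seed, number of draws, and variable names are arbitrary choices of mine.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n_draws <- 1e5   # arbitrary number of posterior draws

# Posterior draws for each group's response probability.
p_treated <- rbeta(n_draws, 251, 251)  # 250 responders, 250 non-responders
p_control <- rbeta(n_draws, 201, 301)  # 200 responders, 300 non-responders

# Posterior for the effect on the probability scale.
effect <- p_treated - p_control

print(c(mean = mean(effect), sd = sd(effect)))
print(quantile(effect, c(0.025, 0.975)))
```

The posterior mean should land near 0.10 with a posterior sd of about 0.03. Note this is on the probability scale, so it is not directly comparable to the log odds numbers from the glm fit.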

]]>Having been raised by wolves in the wilds of machine learning, I’ve never had a proper classical stats education. So I always have to work these things out by first principles.

My first inclination would be to do the Bayesian equivalent. That’d be a simple Stan program, with the data manipulation baked in (I didn’t understand Andrew’s qualifications on the code he wanted, so I’m just hard coding the data in the Stan program for this question—I think Andrew will give me a pass on knowing how to manipulate data in Stan.) This is assuming improper uniform priors for both treatment and control log odds. Adding a standard logistic prior would correspond to a uniform prior on probability of response in both cases; I didn’t evaluate that case, but would if I was doing this for real just to get a sense of prior sensitivity.

parameters {
  real alpha;  // treatment
  real beta;   // control
}
model {
  250 ~ binomial_logit(500, alpha);
  200 ~ binomial_logit(500, beta);
}
generated quantities {
  real delta = alpha - beta;
}

Then if we put this program in a file called q1.stan, we can run the sampler and compute the answers:

library(rstan)
model <- stan_model("q1.stan")     # compile the Stan program
fit <- sampling(model)             # run the sampler with default settings
delta_draws <- extract(fit)$delta  # posterior draws of alpha - beta
avg_treatment <- mean(delta_draws)
avg_treatment_posterior_sd <- sd(delta_draws)

Running this gives me a posterior mean for the average treatment effect on the log odds scale (we need a loss function to properly choose a point estimate, so I'm just assuming you wanted square loss). The posterior mean for delta is 0.4 and the posterior sd for delta is 0.13.

To simulate the classical standard error, I think (though my confidence is only so-so) that I'd take the MLE for the same model I wrote above. You can also compute that in Stan using the optimizing function, but we know the answer's going to be hat_alpha = logit(0.5) and hat_beta = logit(0.4), so hat_delta = logit(0.5) - logit(0.4) = 0.40. So the MLE agrees with the posterior mean under the Bayesian uniform prior model. We'd then treat those estimates as the true values, simulate the MLE calculation many times, and take the standard deviation of the resulting differences:

logit <- function(u) log(u / (1 - u))
M <- 1e5  # number of simulated experiments
sim_hat_alpha <- logit(rbinom(M, 500, 0.5) / 500)  # simulated treatment MLEs
sim_hat_beta <- logit(rbinom(M, 500, 0.4) / 500)   # simulated control MLEs
sim_hat_delta <- sim_hat_alpha - sim_hat_beta
std_error <- sd(sim_hat_delta)

The program gets the same answer I worked out by hand for sim_hat_delta. The standard error works out to be approximately 0.13. That also agrees with the simple Bayesian model's posterior standard deviation for the difference in effects on the log odds scale.

This took me about 45 minutes to write up, so I'm going to fail to finish this exam if I try to work out both Bayesian and frequentist answers. If I didn't need to actually get the code running there would've been a couple bugs (like failures to convert to logit in simulation, wrong constraints on Stan model parameters, etc.), but I'd have been roughly done on time but with only partial credit. Stan's compilation time is killing my test performance!

]]>This is not relevant for the question at hand. But if Andrew had asked his class to state their confidence in their estimate out of sample, it definitely would be. And so I thought it worth mentioning.

]]>Since the experiment is “randomized”, no further context can be relevant so far as I can see.

]]>Good point. Fake-data simulation is key, and we do teach it in the class, but maybe not enough.

]]>No, it’s not a trick question. I’m looking for the straightforward answer here.

]]>I realize most people asking this question on an exam would be expecting answers like p.hat.1 – p.hat.0, but I suspect you’re asking something trickier.
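For reference, the p.hat.1 – p.hat.0 answer can be sketched as below. This is a minimal sketch that assumes the standard binomial variance formula for two independent proportions; the counts are the ones used elsewhere in this thread, and the variable names are mine.

```r
# Observed response rates: 250/500 treated, 200/500 control.
n1 <- 500; n0 <- 500
p_hat_1 <- 250 / n1
p_hat_0 <- 200 / n0

# Estimated average treatment effect on the probability scale.
ate_hat <- p_hat_1 - p_hat_0

# Standard error for a difference of two independent proportions.
se_ate <- sqrt(p_hat_1 * (1 - p_hat_1) / n1 + p_hat_0 * (1 - p_hat_0) / n0)

print(round(c(estimate = ate_hat, se = se_ate), 3))
```

This gives an estimate of 0.1 with a standard error of about 0.031, on the probability scale rather than the log odds scale.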

]]>