As promised, it’s time to go over the final exam of our applied regression class. It was an in-class exam, 3 hours for 15 questions.

Here’s the first question on the test:

1. A randomized experiment is performed within a survey. 1000 people are contacted. Half the people contacted are promised a $5 incentive to participate, and half are not promised an incentive. The result is a 50% response rate among the treated group and 40% response rate among the control group.

(a) Give an estimate and standard error of the average treatment effect.

(b) Give code to fit a logistic regression of response on the treatment indicator. Give the complete code, including assigning the data, setting up the variables, etc. It is not enough to simply give the one line of code for running the logistic regression.

See tomorrow’s post for the solution and a discussion of common errors.

Very nice. What should also have been required is to list the data and methods used to test the code. Replicability in science should also require posting such things. For a comprehensive treatment of analytics in systems and software testing, see the edited book published last year: https://www.amazon.com/gp/product/1119271509/ref=dbs_a_def_rwt_bibl_vppi_i1

Ron:

Good point. Fake-data simulation is key, and we do teach it in the class, but maybe not enough.

Is this a trick question? If you want the average treatment effect, and you haven’t told us whether simple random sampling was used, as opposed to some complex survey design, the sampling procedure would need to be incorporated into the calculations. Also, without a data set, I don’t know what steps I’m supposed to do before the logistic regression. Maybe the data is already set up as two columns, y 1/0 and x 1/0, and the only missing step is the read.csv call. If the data set needs more preprocessing, I’d need to know what it actually looks like to say what extra steps there are.

I realize most people asking this question on an exam would be expecting answers like p.hat.1 – p.hat.0, but I suspect you’re asking something trickier.
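For reference, the straightforward calculation alluded to here can be sketched in a few lines of R. This is a minimal sketch that assumes 500 people per arm (half of the 1000 contacted) and the usual independent-binomial standard error for a difference in proportions; the variable names are illustrative and not taken from the exam.

```r
# Counts taken from the question: 1000 contacted, half in each arm.
n_1 <- 500          # treated (promised the $5 incentive)
n_0 <- 500          # control
p_hat_1 <- 0.50     # response rate, treated
p_hat_0 <- 0.40     # response rate, control

# Difference in proportions as the estimated average treatment effect.
ate_hat <- p_hat_1 - p_hat_0

# Standard error, treating the two arms as independent binomial samples.
se_ate <- sqrt(p_hat_1 * (1 - p_hat_1) / n_1 + p_hat_0 * (1 - p_hat_0) / n_0)

print(round(c(estimate = ate_hat, se = se_ate), 3))
```

Under those assumptions the estimate is 0.10 with a standard error of about 0.03.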

Ram:

No, it’s not a trick question. I’m looking for the straightforward answer here.

Are you asking them to supply a fake data set to import and analyze? Or to simulate one in the program?

The complete data set is given in the question.

Since the experiment is “randomized”, no further context can be relevant so far as I can see.

There is something I think needs fleshing out with your comment. There can be quite a bit of difference between randomized sampling and randomized assignment of treatment within a given sample. The question only says that the people who were contacted are randomly assigned to control / treatment. We do not know if those contacted were chosen via random sampling or via some other means.

This is not relevant for the question at hand. But if Andrew had asked his class to state their confidence in their estimate out of sample, it definitely would be. And so I thought it worth mentioning.

Well, this is humbling. Do statistical practitioners find that their ability to answer the simple, technical statistical questions declines as they move along in their careers, despite the fact that their general statistical knowledge and intuition in practical scenarios have grown immensely? Bleh. I’d take much too long to do this exam given the effort that was required to (presumably) correctly answer this one.

I imagine practitioners who are also teachers (or textbook authors or software developers) generally stay up to speed.

I wouldn’t be too sure of that.

Is the treatment being promised $5 or *not* being promised $5?

I waited until I saw that day 2’s post had been posted before posting a spoiler.

Having been raised by wolves in the wilds of machine learning, I’ve never had a proper classical stats education. So I always have to work these things out by first principles.

My first inclination would be to do the Bayesian equivalent. That’d be a simple Stan program, with the data manipulation baked in (I didn’t understand Andrew’s qualifications on the code he wanted, so I’m just hard coding the data in the Stan program for this question—I think Andrew will give me a pass on knowing how to manipulate data in Stan.) This is assuming improper uniform priors for both treatment and control log odds. Adding a standard logistic prior would correspond to a uniform prior on probability of response in both cases; I didn’t evaluate that case, but would if I was doing this for real just to get a sense of prior sensitivity.

Then if we put this program in a file called q1.stan, we can run the sampler and compute the answers

Running this gives me a posterior mean for the average treatment effect (we need a loss function to properly choose a point estimate, so I'm just assuming you wanted square loss). The posterior mean for delta (our point estimate for average treatment effect assuming square loss) is 0.4. The posterior sd for delta is 0.13.
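These numbers can be checked without Stan. A minimal sketch, assuming the model is two independent binomials with flat priors on the log odds: a flat prior on the log odds corresponds to an improper Beta(0, 0) prior on the response probability, so with 250 of 500 and 200 of 500 responders the posteriors are Beta(250, 250) and Beta(200, 300). This swaps direct sampling in R for the Stan sampler, and the names below are illustrative.

```r
set.seed(1234)
n_draws <- 1e5

# Posterior draws of the response probabilities under flat log-odds priors.
theta_treated <- rbeta(n_draws, 250, 250)   # 250 of 500 responded
theta_control <- rbeta(n_draws, 200, 300)   # 200 of 500 responded

# Treatment effect on the log odds scale; qlogis() is the logit function.
delta <- qlogis(theta_treated) - qlogis(theta_control)

print(round(c(mean = mean(delta), sd = sd(delta)), 2))
```

The draws should give a mean close to 0.4 and a standard deviation close to 0.13, in line with the figures reported above.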

To simulate the classical standard error, I think (though my confidence is only so-so) that I'd take the MLE for the same model I wrote above. You can also get that in Stan using the optimizing function, but we know the answer's going to be hat_alpha = logit(0.5) and hat_beta = logit(0.4), so hat_delta = logit(0.5) - logit(0.4) = 0.40. So the MLE agrees with the posterior mean from the Bayesian uniform-prior model. We'd then take those numbers, simulate the MLE calculation, and calculate the diffs and their standard deviation.
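The simulation described here can be sketched in R as follows. This is a minimal sketch, not the commenter's original program: it assumes 500 people per arm, draws binomial response counts at the fitted rates 0.5 and 0.4, and recomputes the MLE on each replicate. Apart from sim_hat_delta, the names are illustrative.

```r
set.seed(1234)
n_sims <- 1e4
n <- 500

# Simulate response counts from the fitted response rates in each arm.
y_treated <- rbinom(n_sims, n, 0.5)
y_control <- rbinom(n_sims, n, 0.4)

# The MLE of each log odds is the logit of the observed proportion.
sim_hat_delta <- qlogis(y_treated / n) - qlogis(y_control / n)

# The spread of the simulated estimates approximates the standard error.
print(round(c(mean = mean(sim_hat_delta), se = sd(sim_hat_delta)), 2))
```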

The program gets the same answer I worked out by hand for sim_hat_delta. The standard error works out to be approximately 0.13. That also agrees with the simple Bayesian model's posterior standard deviation for the difference in effects on the log odds scale.

This took me about 45 minutes to write up, so I'm going to fail to finish this exam if I try to work out both Bayesian and frequentist answers. If I didn't need to actually get the code running there would've been a couple bugs (like failures to convert to logit in simulation, wrong constraints on Stan model parameters, etc.), but I'd have been roughly done on time but with only partial credit. Stan's compilation time is killing my test performance!

I’m just reading this on my phone, but are you saying you’ve calculated a 40% difference in response rate between two groups? There’s something wrong in that calculation.

On the log odds scale.

I see, the logistic regression coefficient estimate is 0.4. By itself this doesn’t mean much because of the nonlinear transform, so it has to be transformed back to be interpreted as a difference in probability.
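A short R sketch of that back-transformation, assuming the fitted intercept equals the control group's log odds, logit(0.4), and the coefficient equals logit(0.5) - logit(0.4); the variable names are illustrative.

```r
# Logistic regression estimates implied by the observed rates:
# intercept = log odds of response in the control group,
# coefficient = difference in log odds between treated and control.
intercept <- qlogis(0.4)
coef_treated <- qlogis(0.5) - qlogis(0.4)

# plogis() is the inverse logit; apply it to each group's linear predictor.
p_control <- plogis(intercept)
p_treated <- plogis(intercept + coef_treated)

print(round(c(coef = coef_treated, diff_in_prob = p_treated - p_control), 3))
```

So a coefficient of about 0.405 on the log odds scale corresponds here to a difference of 0.10 (10 percentage points) in response probability.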

From a Bayesian perspective I’d probably have solved it with a beta-binomial conjugate prior system: Beta(251,251) vs. Beta(201,301). You can sample and subtract to get a posterior distribution for the effect size on the probability scale.
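A minimal R sketch of that sample-and-subtract approach, assuming uniform Beta(1, 1) priors on each response probability (which is what turns 250 of 500 and 200 of 500 responders into the two beta posteriors above); the names and the number of draws are illustrative.

```r
set.seed(1234)
n_draws <- 1e5

# Beta(1, 1) priors updated with 250/500 and 200/500 responders.
p_treated <- rbeta(n_draws, 251, 251)
p_control <- rbeta(n_draws, 201, 301)

# Posterior draws of the effect on the probability scale.
effect <- p_treated - p_control

print(round(c(mean = mean(effect), sd = sd(effect)), 3))
print(round(quantile(effect, c(0.025, 0.975)), 3))
```

The posterior mean should come out near 0.10 with a standard deviation near 0.03, close to the classical difference in proportions and its standard error.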

If you aren’t allowed to do it in one line of code, I guess you could do two?

dataset = data.frame(treated=c(rep(1,500),rep(0,500)), responded = c(rep(1,500*.5),rep(0,500*.5),rep(1,500*.4),rep(0,500*.6)))

summary(glm(responded~treated,data=dataset,family=binomial))

That gives a very similar answer to Bob’s, a treatment estimate of .405 and a SE of .128. Maybe this is the same as Bob got but he respected Andrew’s feeling on number of significant digits.