Sadish Dhakal writes:

I am struggling with the problem of conditioning on post-treatment variables. I was hoping you could provide some guidance. Note that I have repeated cross sections, NOT panel data. Here is the problem simplified:

There are two programs. A policy introduced some changes in one of the programs, which I call the treatment group (T). People can select into T. In fact, there’s strong evidence that T programs become more popular in the period after the policy change (P). But this is entirely consistent with my hypothesis. My hypothesis is that high-quality people select into the program. I expect that people selecting into T will have better outcomes (Y) because they are of higher quality. Consider the specification (avoiding indices):

Y = b0 + b1 T + b2 P + b3 T X P + e (i)

I expect that b3 will be positive (which it is). Again, my hypothesis is that b3 is positive only because higher-quality people select into T after the policy change. Let me reframe the problem slightly (and please correct me if I’m reframing it wrong). If I could observe and control for quality Q, I could write the error term e = Q + u, and b3 in the specification below would be zero.

Y = b0 + b1 T + b2 P + b3 T X P + Q + u (ii)
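A quick simulation sketch of this setup (all numbers invented, just to illustrate the mechanism): if selection into T after the policy runs through an unobserved Q, then the omitted Q loads onto the T X P interaction in specification (i), and including Q, as in (ii), drives b3 back toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Unobserved quality Q and a post-policy indicator P (made-up data).
Q = rng.normal(size=n)
P = rng.integers(0, 2, size=n)

# Selection: after the policy (P = 1), higher-Q people are more
# likely to choose the treated program T.
T = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * Q * P - 0.5))))

# Outcome depends on Q, but the true T*P interaction effect is zero.
Y = 1.0 + 0.2 * T + 0.1 * P + 1.0 * Q + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_i = ols(np.column_stack([ones, T, P, T * P]), Y)      # spec (i): Q omitted
b_ii = ols(np.column_stack([ones, T, P, T * P, Q]), Y)  # spec (ii): Q included

print("b3 with Q omitted: ", round(b_i[3], 3))   # clearly positive: selection
print("b3 with Q included:", round(b_ii[3], 3))  # near zero
```

Here the positive b3 in the first fit is pure selection, by construction.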

My thesis is not that the policy “caused” better outcomes, but that it induced selection. How worried should I be about conditioning on T? How should I go about avoiding bogus conclusions?

My reply:

Best would be if you can simply observe Q and include it in the model. If that’s not possible, get some estimate of Q, some pre-treatment measure of quality. Label that measure as z. Then you can fit a measurement-error model:

Y = b0 + b1 T + b2 P + b3 T X P + Q + u (ii.a)

z = Q + error (ii.b)

Your inferences will be sensitive to your model for the error in z: the higher the error, the further it is from your ideal model (ii) above. But that’s life. You gotta make assumptions somewhere.
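To see why the error in z matters, here’s a simulation sketch (all numbers invented). Instead of fitting the full measurement-error model (ii.a)–(ii.b), it does the naive thing and just plugs the noisy proxy z into the regression: that only partially removes the selection bias in b3, and the residual bias grows with the error variance, which is exactly why you need a model for that error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Made-up setup: selection on unobserved quality Q after the policy (P = 1).
Q = rng.normal(size=n)
P = rng.integers(0, 2, size=n)
T = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * Q * P - 0.5))))
# True T*P interaction effect is zero; Q drives the outcome.
Y = 1.0 + 0.2 * T + 0.1 * P + 1.0 * Q + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b3_by_err = {}
for err_sd in (0.0, 0.5, 1.0, 2.0):
    z = Q + rng.normal(scale=err_sd, size=n)  # (ii.b): z = Q + error
    b = ols(np.column_stack([ones, T, P, T * P, z]), Y)
    b3_by_err[err_sd] = b[3]
    print(f"error sd {err_sd}: naive b3 = {b[3]:.3f}")
```

With zero measurement error, b3 goes to zero as in the ideal model (ii); as the error grows, more of the selection bias leaks back into b3.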

To put it another way, you should proceed on two fronts:

1. Data,

2. Modeling.

Get better data on the selection process, and model it too. Both data and model are important.

“+ z + u” in (ii.a), no?

Isn’t this just a case of change the model -> change the coefficients?

Does it even make sense to attempt interpreting these coefficients when you don’t think you have all, and only, the relevant variables included (obviously there is room for ignoring negligible ones)?

That is like saying “if this model is approximately correct (we believe it is not), the effect of x on y is…”, then going on to assume this effect is a real thing.

Also, the description of the hypothesis includes an element of time but I don’t see time anywhere in the proposed model.

I think the lack of a time variable is because it is cross-sectional and not panel. There is a before and after but not a measure of time in the data, if I understood correctly. Honestly, I’m not sure how repeated cross-sections over time don’t constitute a panel dataset; perhaps they were all taken at the same time on different units?

On your main point, it seems like he has all the variables he needs in his model, just not a quality proxy.

What are you basing this on? Perhaps this study was done in the fall but there is a completely opposite effect in the spring, e.g., “high quality people” are less likely to join at that time of year. So then the model is missing a season variable. Maybe the “high quality people” were also mostly female, so when you add gender as a variable it changes the other coefficients. I could come up with a million things like that, also nonlinear interactions, etc.

Perhaps I should’ve said he doesn’t seem to think he has that problem, and it seems plausible that his concerns can be addressed without confronting it.

How so? I am really wondering, not being sarcastic. It seems to me if you want to interpret a coefficient you need the model to be approximately correct. I don’t know how you know whether your model is approximately correct or not without deriving it from some principles or making accurate predictions with it.

I would say it’s even more difficult than that. Two or more models can have similar predictive performance on a given test set but wildly different coefficients for identical predictors, and that isn’t surprising, since the coefficients depend on the model, which has changed.

So maybe you can say that in the absence of scientific reasoning, a model that predicts poorly is generally a bad candidate for causal inference. But that doesn’t mean a model that predicts well is necessarily a good candidate for the same.
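A toy example of that point (made-up data): when two predictors are highly correlated, a model with one of them and a model with both can predict almost equally well while putting very different coefficients on the same predictor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Two highly correlated (invented) predictors.
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

def fit(X, y):
    """Least-squares coefficients and in-sample R^2."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1 - np.var(y - X @ b) / np.var(y)
    return b, r2

ones = np.ones(n)
b_a, r2_a = fit(np.column_stack([ones, x1]), y)      # model A: y ~ x1
b_b, r2_b = fit(np.column_stack([ones, x1, x2]), y)  # model B: y ~ x1 + x2

print(f"x1 coefficient, model A: {b_a[1]:.2f} (R^2 = {r2_a:.3f})")
print(f"x1 coefficient, model B: {b_b[1]:.2f} (R^2 = {r2_b:.3f})")
```

The two fits explain essentially the same share of variance, yet the coefficient on x1 roughly halves once x2 enters: both coefficients are “right” within their own model, which is why predictive performance alone can’t license a causal reading of either one.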

I have multi year data, so can control for seasonality. It’s data for each person enrolling in programs over time, so only one observation per person (not panel data). And let’s say I’m not worried about the other biases as much, or can control for them to some extent. I definitely don’t have Q though. Any ideas?

What would be an observable consequence of the “caused selection” vs the “policy worked” model? If there’s nothing you would expect to see different between the two mechanisms in your data, there’s no way to determine which is at work.

That’s a good point, but my argument is that there is no reason for the policy to lead to a change in **this particular outcome**. So the only explanation is selection. I can share the paper if it’s getting too abstract.