I just came back from a talk by Jere Behrman on “What Determines Adult Cognitive Skills? Impacts of Pre-School, School-Years and Post-School Experiences in Guatemala.” Here’s the paper.

It was all interesting, but what confused me here, as in other talks of this type, was the interpretation of regressions controlling for several variables that are sequential in time. This particular example was a longitudinal study of about 1500 people, looking at adult cognitive outcomes and including, as predictors, measures of health at age 6, years of schooling, and work experience after schooling was over. It’s tricky to interpret the coefficient of pre-school health in this regression as a “treatment effect,” since pre-school health can itself affect the later predictors. People at the seminar were talking along the lines of “causal pathways,” but this always confuses me too. A simple response is to follow the basic advice of not controlling for post-treatment outcomes, but doing such an analysis wouldn’t address some of the questions the researchers were trying to study here.
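To make the difficulty concrete, here is a minimal simulated sketch in Python. It is not the paper's data or model: the variable names, the linear structure, and every coefficient below are made-up assumptions, chosen only so that early health affects cognition both directly and through schooling.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1500  # roughly the sample size mentioned above

# Hypothetical data-generating process; all coefficients are invented.
health = rng.normal(size=n)                    # health at age 6
schooling = 0.5 * health + rng.normal(size=n)  # schooling depends on health
cognition = 0.3 * health + 0.4 * schooling + rng.normal(size=n)


def ols(y, *predictors):
    """Return least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regression without the post-treatment variable: the health coefficient
# estimates the total effect, 0.3 + 0.5 * 0.4 = 0.5 in this simulation.
total = ols(cognition, health)[1]

# Regression controlling for schooling: the health coefficient now
# estimates only the part not passing through schooling, 0.3 here.
direct = ols(cognition, health, schooling)[1]

print(f"health coefficient, schooling omitted:  {total:.2f}")
print(f"health coefficient, schooling included: {direct:.2f}")
```

In this toy setup the two coefficients answer different questions, and neither is wrong. The clean split relies on the simulation having no unobserved variable that affects both schooling and cognition; in real data that assumption is hard to defend, which is part of why I find the controlled coefficient hard to interpret.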

So I’m left simply confused. I’m not trying to be critical of this paper, since I’m not really offering an alternative. But I’m not quite sure how to interpret all these regression coefficients. (Even setting aside the issues involving instrumental variables, which are used in this study also.) I’m just a little stuck here.

The best that correlation can do is to suggest areas where further investigation might establish causality. Correlation is a statistical phenomenon; causality is the establishment of a sequence of events by the employment of scientific method.