Daniel Hawkins pointed me to a post by Kevin Drum entitled, “Crime in St. Louis: It’s Lead, Baby, Lead,” and the associated research article by Brian Boutwell, Erik Nelson, Brett Emo, Michael Vaughn, Mario Schootman, Richard Rosenfeld, Roger Lewis, “The intersection of aggregate-level lead exposure and crime.”
The short story is that the areas of St. Louis with more crime and poverty, had higher lead levels (as measured from kids in the city who were tested for lead in their blood).
Here’s their summary:
I had a bit of a skeptical reaction—not about the effects of lead, I have no idea about that—but on the statistics. Looking at those maps above, the total number of data points is not large, and those two predictors are so highly correlated, I’m surprised that they’re finding what seem to be such unambiguous effects. In the abstract it says n=459,645 blood measurements and n=490,433 crimes, but for the purpose of the regression, n is the number of census tracts in their dataset, about 100.
So I contacted the authors of the paper and one of them, Erik Nelson, did some analyses for me.
First he ran the basic regression—no Poisson, no spatial tricks, just regression of log crime rate on lead exposure and index of social/economic disadvantage. Data are at the census tract level, and lead exposure is the proportion of kids’ lead measurements from that census tract that were over some threshold. I think I’d prefer a continuous measure but that will do for now.
In this simple regression, the coefficient for lead exposure was large and statistically significant.
Then I asked for a scatterplot: log crime rate vs. lead exposure, indicating census tracts with three colors tied to the terciles of disadvantage.
And here it is:
He also fit a separate regression line for each tercile of disadvantage.
As you can see, the relation between lead and crime is strong, especially for census tracts with less disadvantage.
Erik also made separate plots for violent and non-violent crimes. They look pretty similar:
In summary: the data are what they are. The correlation seems real, not just an artifact of a particular regression specification. It’s all observational so we shouldn’t overinterpret it, but the pattern seems worth sharing.