Tyler Cowen links to a blog by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn’t so interesting–when I saw the title of the post, my first thought was “the weather,” and, in fact, that and “the wind” are the most common responses of the blog commenters–but it reminded me of a more general question that we discussed the other day, which is how to think about Why questions.
Many years ago, Don Rubin convinced me that it’s a lot easier to think about “effects of causes” than “causes of effects.” For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn’t paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of “why” is pretty meaningless.
Similarly, if you ask a question such as, What caused World War 1, the best sort of answers can take the form of potential-outcomes analyses. I don’t think it makes sense to expect any sort of true causal answer here.
But, now let’s get back to the “volatility of the Boston marathon” problem. Unlike the question of “why did my cat die” or “why did World War 1 start,” the question, “Why have the winning times in the Boston marathon been so variable” does seem answerable.
What happens if we try to apply some statistical principles here?
Principle #1: Compared to what? We can’t try to answer “why” without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what “expectedly variable” is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds that we are comparing to some model or another.
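To make "compared to what" concrete, here is a minimal sketch. The winning times below are invented for illustration, not actual race results; the point is only that "more variable" is a statement about a ratio, not about one series in isolation.

```python
# Hypothetical winning times (minutes) for two marathons over the same years.
# These numbers are made up for illustration, not actual race results.
from statistics import stdev

boston = [129.5, 127.8, 134.2, 126.9, 131.7, 125.4, 133.0, 128.8]
new_york = [129.1, 128.6, 129.9, 128.2, 130.4, 129.5, 128.9, 130.1]

# "More variable" only means something relative to a comparison:
# here, the spread of one series relative to the other.
ratio = stdev(boston) / stdev(new_york)
print(f"SD Boston:   {stdev(boston):.1f} min")
print(f"SD New York: {stdev(new_york):.1f} min")
print(f"Ratio:       {ratio:.1f}")
```

Saying "Boston is unexpectedly variable" is exactly the claim that this ratio is far from what some reference model (equal spreads, say) would predict.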
Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in the marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.
What’s going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?
OK, so now maybe we’re getting somewhere. The question on deck now is, how do the “Boston vs. NY marathon” and “too many rodents” problems differ from the “dead cat” problem?
One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn’t quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, “Why was the winning time so much lower in NY than in Boston?” And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn’t quite make sense to ask why the cats are dying.
Another difference is that the marathon question and the rodent question are comparisons (NY vs. Boston and blacks/Hispanics vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we’re looking for, the idea being that a “cause” (in this sense) is something that takes you away from some default model. In these examples, it’s a model of zero differences between groups, but more generally it could be any model that gives predictions for data.
In this model-checking sense, the search for a cause is motivated by an itch–a disagreement with a default model–which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, “effects of causes,” forward-thinking way.
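The "itch" can be sketched as a model check: simulate data under the default model (here, equal variability in both races) and ask whether the observed discrepancy could plausibly have arisen under it. Again, the numbers are invented for illustration.

```python
# Sketch of model checking as "scratching the itch": simulate replicated
# data under a default model of equal variability and compare the observed
# discrepancy to what the model produces. Data are invented for illustration.
import random
from statistics import stdev

random.seed(1)

boston = [129.5, 127.8, 134.2, 126.9, 131.7, 125.4, 133.0, 128.8]
new_york = [129.1, 128.6, 129.9, 128.2, 130.4, 129.5, 128.9, 130.1]
observed = stdev(boston) / stdev(new_york)

# Default model: both series share the same spread (a pooled SD).
pooled = stdev(boston + new_york)
n = len(boston)

sims = []
for _ in range(2000):
    a = [random.gauss(0, pooled) for _ in range(n)]
    b = [random.gauss(0, pooled) for _ in range(n)]
    sims.append(stdev(a) / stdev(b))

# How often does the default model produce a ratio as extreme as observed?
p = sum(s >= observed for s in sims) / len(sims)
print(f"observed SD ratio: {observed:.2f}, simulated tail proportion: {p:.3f}")
```

If the default model rarely reproduces the observed discrepancy, the itch remains, and we elaborate the model (weather, wind, course conditions) until replicated data look like the real data.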
Is this the resolution I’m seeking? I’m not sure. But I need to figure this out, because I’m planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.
P.S. I remain completely uninterested in questions such as, What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variation between Boston and New York–was it the temperature, the precipitation, the wind, or something else? Of course, if it can be any of these factors, it can be all of them.) I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation to draw artificial distinctions, fundamentally no different from the notorious comparisons of statistical significance to non-significance.
This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I’ve worked in social science, environmental science, and public health.