In the most recent round of our discussion, Judea Pearl wrote:
There is nothing in his theory of potential-outcome that forces one to “condition on all information” . . . Indiscriminate conditioning is a culturally-induced ritual that has survived, like the monarchy, only because it was erroneously supposed to do no harm.
I agree with the first part of Pearl’s statement but not the second part (except to the extent that everything we do, from Bayesian data analysis to typing in English, is a “culturally induced ritual”). And I think I’ve spotted a key point of confusion.
To put it simply, Donald Rubin’s approach to statistics has three parts:
1. The potential-outcomes model for causal inference: the so-called Neyman-Rubin model in which observed data are viewed as a sample from a hypothetical population that, in the simplest case of a binary treatment, includes y_i^0 and y_i^1 for each unit i. (A small sketch of this setup appears just after this list.)
2. Bayesian data analysis: the mode of statistical inference in which you set up a joint probability distribution for everything in your model, then condition on all observed information to get inferences, then evaluate the model by comparing predictive inferences to observed data and other information.
3. Questions of taste: a preference for models supplied from the outside rather than models inspired by data, a preference for models with relatively few parameters (for example, trends rather than splines), a general lack of interest in exploratory data analysis, a preference for writing models analytically rather than graphically, and an interest in causal rather than descriptive estimands.
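To make item 1 concrete, here's a minimal sketch (in Python, with made-up simulated numbers; none of this is Rubin's own code) of the potential-outcomes setup: every unit carries two potential outcomes, but we only ever observe the one corresponding to the treatment the unit actually received.

```
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Each unit has two potential outcomes; here the treatment effect is a constant 0.5.
y0 = rng.normal(0.0, 1.0, n)   # outcome if unit i gets control
y1 = y0 + 0.5                  # outcome if unit i gets treatment
z = rng.binomial(1, 0.5, n)    # randomized binary treatment assignment

# The "fundamental problem of causal inference": only one potential outcome is observed per unit.
y_obs = np.where(z == 1, y1, y0)

# Under randomization, a simple difference in means estimates the average treatment effect.
print(y_obs[z == 1].mean() - y_obs[z == 0].mean())   # close to 0.5
```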
As that last list indicates, my own taste in statistical modeling differs in some ways from Rubin’s. But what I want to focus on here is the distinction between item 1 (the potential outcomes notation) and item 2 (Bayesian data analysis).
The potential outcome notation and Bayesian data analysis are logically distinct concepts!
Items 1 and 2 above can occur together or separately. All four combinations (yes/yes, yes/no, no/yes, no/no) are possible:
– Rubin uses Bayesian inference to fit models in the potential outcome framework.
– Rosenbaum (and, in a different way, Greenland and Robins) use the potential outcome framework but estimate using non-Bayesian methods.
– Most of the time I use Bayesian methods but am not particularly thinking about causal questions.
– And, of course, there’s lots of statistics and econometrics that’s non-Bayesian and does not use potential outcomes.
Bayesian inference and conditioning
In Bayesian inference, you set up a model and then you condition on everything that’s been observed. Pearl writes, “Indiscriminate conditioning is a culturally-induced ritual.” Culturally-induced it may be, but it’s just straight Bayes. I’m not saying that Pearl has to use Bayesian inference–lots of statisticians have done just fine without ever cracking open a prior distribution–but Bayes is certainly a well-recognized approach. As I think I wrote the other day, I use Bayesian inference not because I’m under the spell of a centuries-gone clergyman; I do it because I’ve seen it work, for me and for others.
Pearl’s mistake here, I think, is to confuse “conditioning” with “including on the right-hand side of a regression equation.” Conditioning depends on how the model is set up. For example, in their 1996 article, Angrist, Imbens, and Rubin showed how, under certain assumptions, conditioning on an intermediate outcome leads to an inference that is similar to an instrumental variables estimate. They don’t suggest including an intermediate variable as a regression predictor or as a predictor in a propensity score matching routine, and they don’t suggest including an instrument as a predictor in a propensity score model.
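To see the distinction numerically, here's a small simulated sketch (this is a hypothetical encouragement design with a constant treatment effect, not Angrist, Imbens, and Rubin's own setup): the instrument z enters the inference through the structure of the model, as in the Wald/IV ratio, rather than as another predictor in a regression of y.

```
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical encouragement design: z is a randomized instrument, d is the
# intermediate outcome (treatment actually received), u is an unobserved confounder.
u = rng.normal(0, 1, n)
z = rng.binomial(1, 0.5, n)
p_take_up = 1 / (1 + np.exp(-(0.5 * z + u)))   # z encourages take-up; u also affects it
d = rng.binomial(1, p_take_up)
y = 2.0 * d + u + rng.normal(0, 1, n)          # true effect of d on y is 2.0

# Naive comparison of y across d is confounded by u.
naive = y[d == 1].mean() - y[d == 0].mean()

# The instrument is used through the model structure (intention-to-treat effects),
# not as a regression predictor: the Wald / IV ratio recovers the effect of d.
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_d = d[z == 1].mean() - d[z == 0].mean()
print(naive, itt_y / itt_d)   # naive is biased upward; the ratio is close to 2.0
```

Tossing z or d into the regression of y as just another predictor would not give this answer; the information that z is an instrument and d an intermediate outcome has to live in the model itself.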
If a variable is “an intermediate outcome” or “an instrument,” this is information that must be encoded in the model, perhaps using words or algebra (as in econometrics or in Rubin’s notation) or perhaps using graphs (as in Pearl’s notation). I agree with Steve Morgan in his comment that Rubin’s notation and graphs can both be useful ways of formulating such models. To return to the discussion with Pearl: Rubin is using Bayesian inference and conditioning on all information, but “conditioning” is relative to a model and does not at all imply that all variables are put in as predictors in a regression.
Another example of Bayesian inference is the poststratification which I spoke of yesterday (see item 3 here). But, as I noted then, this really has nothing to do with causality; it’s just manipulation of probability distributions in a useful way that allows us to include multiple sources of information.
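For what it's worth, the poststratification step itself is just a weighted average: estimates within cells (from a fitted model or otherwise), weighted by known population cell counts. A toy sketch with made-up numbers:

```
import numpy as np

# Hypothetical cell-level estimates (say, from a fitted model of survey responses)
# and known population counts for the same cells.
theta_j = np.array([0.42, 0.55, 0.61, 0.48])        # estimate within each cell
N_j = np.array([30_000, 45_000, 15_000, 10_000])    # population size of each cell

# Poststratified estimate: a population-weighted average of the cell estimates.
print(np.sum(N_j * theta_j) / np.sum(N_j))
```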
P.S. We’re lucky to be living now rather than 500 years ago, or we’d probably all be sitting around in a village arguing about obscure passages from the Bible.