Judea Pearl (Dept of Computer Science, UCLA) spoke here Tuesday on “Inference with cause and effect.” I think I understood the method he was describing, but it left me with some questions about the method’s hidden assumptions. Perhaps someone familiar with this approach can help me out here.

I’ll work with a specific example from one of my current research projects.

A treatment is being considered of giving zinc supplements to some HIV-positive children in South Africa. The treatment, call it Z, would be randomly assigned: Z=1 for half the kids and Z=0 for the others. The outcome of interest is CD4 percentage after a year of treatment (or control); call this Y. High values of Y are good, low values are bad. There’s also an intermediate outcome, the amount of diarrhea during the year; call this D. Zinc supplements are known to reduce diarrhea, and diarrhea can in turn have bad consequences for CD4 (something to do with the immune system).

The causal path diagram looks like this:

Z has arrows going to D and to Y, and D has an arrow going to Y. Thus, Z can affect Y directly and also through D.

Now suppose I want to estimate the effect of D–that is, the effect of some treatment that would reduce diarrhea directly (and not by giving zinc). From Pearl’s talk, I gather that this would be the operation of “fixing” D to a specified value, and he would do this using a “mutilated model” [Pearl’s term] that would remove all the arrows to D (in this case, that would just be the arrow from Z to D).
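As a minimal sketch of what that mutilation does to the diagram, here it is in Python. The dictionary encoding and the `mutilate` helper are hypothetical names chosen for illustration, not anything from Pearl's talk.

```python
# Hypothetical encoding of the path diagram: each node maps to the set of its parents.
parents = {
    "Z": set(),        # Z is randomized, so nothing points into it
    "D": {"Z"},        # Z -> D
    "Y": {"Z", "D"},   # Z -> Y and D -> Y
}

def mutilate(graph, fixed):
    """Return a copy of the graph with every arrow into `fixed` removed."""
    return {node: (set() if node == fixed else set(pa))
            for node, pa in graph.items()}

mutilated = mutilate(parents, "D")
print(mutilated)
```

In the mutilated graph D has no parents, while the arrows Z -> Y and D -> Y are untouched; the original diagram is left unchanged.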

OK, so suppose I actually supplied some data on (Z,D,Y) for 200 kids, and also suppose that, in these data, (D,Y) had a joint normal distribution given Z. (Here, Z is binary, so I’m saying that (D,Y) have one joint normal distribution for the treated kids, and another joint normal distribution for the untreated kids.)

So now, can we apply Pearl’s method, and actually get an estimate for the direct effect of D on Y? His talk led me to believe that we could. But what would that estimate mean? In real life, we can’t really estimate this direct effect without having direct manipulation of D. Or, to put it another way, we can only estimate this effect if we make some additional assumptions. What assumptions is Pearl’s model making? I’m willing to swallow distributional assumptions, and I’m happy with the arrows in the path diagram, but there’s gotta be something else going on here.

The techniques Pearl discusses are built on top of probability theory, and causality is essentially derived from conditional independence structures. Now, in this particular case, you could look at the partial correlation of D and Y given Z. But while the notion of partial correlation is restricted to a Gaussian model, the logic of conditional independence holds for an arbitrary probabilistic model. No distributional assumptions.
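To make the partial correlation concrete, here is a rough sketch on simulated data. Everything numeric in it is made up (the coefficients, the noise scales, the seed); it only stands in for the 200 kids in the example. Because Z is binary, removing the linear effect of Z is the same as subtracting the within-group means.

```python
import math
import random

random.seed(0)

# Simulated stand-in for the 200 kids; every coefficient below is invented.
n = 200
Z = [i % 2 for i in range(n)]                        # half treated, half control
D = [5.0 - 2.0 * z + random.gauss(0, 1) for z in Z]  # zinc lowers diarrhea
Y = [20.0 + 1.0 * z - 1.5 * d + random.gauss(0, 1) for z, d in zip(Z, D)]

def residualize(values, groups):
    """Subtract each group's mean; for binary Z this removes the effect of Z."""
    means = {}
    for g in set(groups):
        members = [v for v, gg in zip(values, groups) if gg == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]

def correlation(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Partial correlation of D and Y given Z.
partial_r = correlation(residualize(D, Z), residualize(Y, Z))
print(partial_r)
```

With these invented coefficients the partial correlation comes out strongly negative. That number describes association within each treatment arm; on its own it says nothing about what would happen under direct manipulation of D.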

For direct manipulation, Pearl proposes the operator do() and treats each application of do() onto a variable X essentially as another variable do(X). This appears in his "Causality" book. So, without this operator, you just don't model interventions. There is some amount you can capture with 'controlling-for', but that's it.
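For the three-variable diagram in the post, and assuming the diagram is complete (no unobserved common causes), the standard truncated factorization gives the following. It differs from ordinary conditioning only in how Z is weighted.

```latex
% Intervening on D: arrow Z -> D removed, Z keeps its marginal distribution
P(y \mid do(D = d)) = \sum_{z} P(y \mid d, z)\, P(z)

% Ordinary conditioning on D: Z is weighted by its distribution given D = d
P(y \mid D = d) = \sum_{z} P(y \mid d, z)\, P(z \mid d)
```

The two agree only when Z and D are independent, which is not the case here since zinc affects diarrhea.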

Aleks,

I see what you're saying–I caught this from Pearl's talk. My point is: in the example I described, the model is fully specified and I think Pearl really could do his do(X) operator on the variable D. Thus he would come up with an estimated causal effect. But I don't see how such an estimate could make sense. At least not without some additional assumptions, and I don't understand the method well enough to know what these implicit assumptions are.

Aha! Thanks for the extra explanation. The hidden assumptions are:

Probability theory as a model of the world.

The completeness of your data and of your variables.

Whatever probabilistic model you build for your data. You do have to build one, but it is your assumption, not the methodology's.

Now, in your case, you don't have an "observable" Z, you have a "controllable" do(Z) (random decision on whether to treat or not). You do *not* have a controllable do(D). You can *assume* that you control D purely by controlling Z; this implies the assumption that the model is complete, i.e., that there is going to be no Simpson's paradox given a new variable.

In summary, your model is specified as {do(Z)->Y, do(Z)->D, D->Y}. The assumption in the inference you mention is D|do(Z) == do(D), i.e., that Z is the sole causal influence acting on D.
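One way to see why completeness matters is a small simulation, sketched here under loud assumptions: the linear model, all coefficients, and the unobserved variable U (some factor outside the diagram that affects both D and Y) are hypothetical. The estimate is the slope of Y on D after adjusting for Z, which is what the three-variable model would report as the effect of D.

```python
import random

TRUE_EFFECT = -1.5  # invented effect of D on Y, built into the simulation

def simulate(n, confounding, seed):
    """Draw (Z, D, Y); U is never recorded, mimicking a variable missing from the model."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        z = i % 2                      # randomized treatment
        u = rng.gauss(0, 1)            # hypothetical unobserved common cause
        d = 5.0 - 2.0 * z + confounding * u + rng.gauss(0, 1)
        y = (20.0 + 1.0 * z + TRUE_EFFECT * d
             - 2.0 * confounding * u + rng.gauss(0, 1))
        rows.append((z, d, y))
    return rows

def adjusted_slope(rows):
    """Slope of Y on D within levels of Z (the coefficient on D in Y ~ Z + D)."""
    num = den = 0.0
    for g in (0, 1):
        ds = [d for z, d, _ in rows if z == g]
        ys = [y for z, _, y in rows if z == g]
        md, my = sum(ds) / len(ds), sum(ys) / len(ys)
        num += sum((d - md) * (y - my) for d, y in zip(ds, ys))
        den += sum((d - md) ** 2 for d in ds)
    return num / den

# Complete model: the adjusted slope targets TRUE_EFFECT.
slope_complete = adjusted_slope(simulate(20000, confounding=0.0, seed=1))
# U present: the population slope is -2.5 under these invented coefficients.
slope_confounded = adjusted_slope(simulate(20000, confounding=1.0, seed=1))
print(slope_complete, slope_confounded)
```

When U is switched off, the adjusted slope recovers the effect that was built in. When U is switched on, the same calculation on the same three observed variables returns a noticeably different number, even though Z is still randomized.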

I'm not a probability theorist or a statistician, but I am a logician/type theorist. I understand Judea Pearl's formalism as a system that bears the same relationship to a modal logic of counterfactuals that regular probability theory bears to propositional logic.

In particular, you can think of the modal logic as David Lewis's theory of counterfactuals, and interpret a claim "Y | do(X)" as Lewis's counterfactual implication "X []-> Y", which means "Y holds in the closest possible world where X is true". The modal worlds are the graphs, and you can set up an accessibility relation between them as precisely the "minimal mutilations" between them.

I suspect Pearl would dispute this characterization, though. Nevertheless, I'm fascinated by it, because it gives a very simple, mechanizable notion of causality.

Aleks,

I guess I'd like to frame the assumptions for my example based on the potential outcomes of D,Y given do(Z). I still don't quite understand "D|do(Z) == do(D)."

Neel,

I agree that the computable nature of Pearl's method is appealing. My struggle still is to see exactly what assumptions he's making, in applications where I'm trying to do causal inference.