Judea Pearl and Dana Mackenzie sent me a copy of their new book, “The Book of Why: The New Science of Cause and Effect.”

There are some things I don’t like about their book, and I’ll get to that, but I want to start with a central point of theirs with which I agree strongly.

**Division of labor**

A point that Pearl and Mackenzie make several times, even if not quite in this language, is that there’s a division of labor between qualitative and quantitative modeling.

The models in their book are qualitative, all about the directions of causal arrows. Setting aside any problems I have with such models (I don’t actually think the “do operator” makes sense as a general construct, for reasons we’ve discussed in various places on this blog from time to time), the point is that these are qualitative, on/off statements. They’re “if-then” statements, not “how much” statements.

Statistical inference and machine learning focus on the quantitative: we model the relationship between measurements and the underlying constructs being measured; we model the relationships between different quantitative variables; we have time-series and spatial models; we model the causal effects of treatments and we model treatment interactions; and we model variation in all these things.

Both the qualitative and the quantitative are necessary, and I agree with Pearl and Mackenzie that typical presentations of statistics, econometrics, etc., can focus way too strongly on the quantitative without thinking at all seriously about the qualitative aspects of the problem. It’s usually all about how to get the answer given the assumptions, and not enough about where the assumptions come from. And even when statisticians write about assumptions, they tend to focus on the most technical and least important ones, for example in regression focusing on the relatively unimportant distribution of the error term rather than the much more important concerns of validity and additivity.

If all you do is set up probability models, without thinking seriously about their connections to reality, then you’ll be missing a lot, and indeed you can make major errors in causal reasoning, as James Heckman, Donald Rubin, Judea Pearl, and many others have pointed out. And indeed Heckman, Rubin, and Pearl have (each in their own way) advocated for substantive models, going beyond data description to latch on to underlying structures of interest.

Pearl and Mackenzie’s book is pretty much all about qualitative models; statistics textbooks such as my own have a bit on qualitative models but focus on the quantitative nuts and bolts. We need both.

Judea Pearl, like Jennifer Hill and Frank Sinatra, is right that “you can’t have one without the other”: If you think you’re working with a purely qualitative model, it turns out that, no, you’re actually making lots of data-based quantitative decisions about which effects and interactions you decide are real and which ones you decide are not there. And if you think you’re working with a purely quantitative model, no, you’re really making lots of assumptions (causal or otherwise) about how your data connect to reality.
