Columbia University computer science professor Elias Bareinboim points to a new textbook he’s been developing, Causal Artificial Intelligence. He also points to a recent paper with Drago Plecko, On the Structural Basis of Conditional Ignorability, that revisits the connection between potential outcomes and graphical models. Bareinboim writes that it is intended as a more technical note that addresses specific questions and provides a mathematical grounding to some topics we discussed back in 2012, on the use of hierarchical modeling to generalize to new settings.
Bareinboim’s book and course look great. He doesn’t use the methods that I am familiar with, but it is important for students to be exposed to multiple perspectives. (If you’re curious what we say, you can take a look at chapters 18-21 of Regression and Other Stories, which can be downloaded here.) Ideally students would take both of our courses so they can be experts in both approaches.
A key theme of Bareinboim’s book is that mechanistic or process models are important, and I agree. When I first took causal inference from Rubin back in 1985, he emphasized the amazing thing about randomized experiments: that you can measure a causal effect without having any mechanism. But more and more I think that a causal effect without a mechanistic model is rarely useful: first, because effects tend to be small, so it’s rare to precisely identify an effect size from data alone with no model; second, because we are always interested in generalization (in causal jargon, we almost always care about the population distribution of treatment effects, not the sample distribution); third, because we care about variation; and fourth, because even without the other three items, in the rare cases when we can discover a causal effect from a black-box experiment, we’ll want to understand the mechanism going forward. This reasoning does allow space for black-box causal discovery in the screening process (we look for clear effects which can then be studied in more detail), but even there we have an implicit population of possible effects under study (e.g., many different drugs, or many different genes, or many different policy innovations), and then I’d argue we’re already halfway to some sort of process model, which in practice I would implement as a Bayesian latent-variable model, though there are lots of ways to do it. To me, the latent variables correspond to the “gears” in a mechanistic model.
It’s not that I think Rubin was wrong on the technical point; I just think, in retrospect, that by concentrating on the estimation of the sample average treatment effect he was attaining mathematical beauty at the cost of generalizability. Of course, in practice Rubin was very interested in generalization and very sensible about such issues; it’s just that in his theoretical work he focused on the in-sample problem. His argument was that it was best to start with what could be estimated from the data with minimal assumptions. I expressed my disagreement with this focus in item 2 of my (generally positive) review of the Imbens and Rubin book: https://statmodeling.stat.columbia.edu/2015/09/07/comments-on-imbens-and-rubin-causal-inference-book/
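To make the idea of an implicit population of effects concrete, here is a minimal sketch of the simplest Bayesian latent-variable model of that kind: a normal-normal hierarchical model in which each true effect theta_j is a latent variable drawn from a population distribution. It assumes, purely for illustration, that the population mean `mu` and scale `tau` are known; in a full analysis they would be estimated too (for example in Stan). The function name and all numbers are hypothetical, not taken from the post or from any book.

```python
def partial_pool(estimates, std_errors, mu, tau):
    """Normal-normal hierarchical model with known hyperparameters (an assumption).

    Population of effects:  theta_j ~ Normal(mu, tau^2)
    Noisy estimate:         y_j     ~ Normal(theta_j, se_j^2)
    Returns the posterior mean and sd of each latent theta_j, which is a
    precision-weighted average of the estimate y_j and the population mean mu.
    """
    means, sds = [], []
    prior_precision = 1.0 / tau**2
    for y, se in zip(estimates, std_errors):
        data_precision = 1.0 / se**2
        total = data_precision + prior_precision
        means.append((data_precision * y + prior_precision * mu) / total)
        sds.append(total ** -0.5)
    return means, sds


# Hypothetical screening results for three candidate treatments.
means, sds = partial_pool([0.8, -0.1, 0.3], [0.5, 0.5, 0.1], mu=0.1, tau=0.2)
print([round(m, 3) for m in means], [round(s, 3) for s in sds])
```

The noisy estimate of 0.8 is pulled down to about 0.20, while the precise estimate of 0.3 stays near 0.26: the population model, not the single experiment, determines how much to trust each black-box result.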
I made some of the above points in a post a few years ago, “Causal” is like “error term”: it’s what we say when we’re not trying to model the process. Unfortunately, Judea Pearl didn’t seem to understand my point there, but overall the comments on that post are helpful. Perhaps the title of that post was misleading. In any case, my point was that the mechanistic models we use in science are indeed causal (under Pearl’s definition or Rubin’s): they say that if you do X, then Y will happen. For example, if I fit a multi-compartment model in pharmacometrics, that’s causal: it says that if you increase the concentration of the drug in one compartment, this will have predictable effects going forward in time, as governed by a certain differential equation. But in statistics and econometrics, the term “causal inference” tends to be reserved for black-box settings where there’s no mechanistic model, and inference is done using a purely design-based “identification strategy” such as regression discontinuity or whatever. Causal inference is very glamorous right now in statistics and econometrics, and that’s fine, but people who love causal inference should also love mechanistic models. I think that the association of “causal” with black-box models leads to lots of problems. So I think this puts me in agreement with much of the spirit of Bareinboim’s book even if we are using different methods.
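As a rough illustration of why a compartment model is causal in this sense, here is a sketch of a generic linear two-compartment model with a bolus dose into the central compartment. The rate constants `k12`, `k21`, `k10` and the function name are hypothetical choices, not the model from any particular study, and the fixed-step fourth-order Runge-Kutta integrator stands in for the adaptive ODE solvers used in real pharmacometric work.

```python
def simulate_two_compartment(dose, k12, k21, k10, t_end, dt=0.01):
    """Integrate a linear two-compartment model after a bolus dose.

    dA1/dt = -(k10 + k12) * A1 + k21 * A2   (central compartment, eliminates at rate k10)
    dA2/dt =   k12 * A1 - k21 * A2          (peripheral compartment)
    Uses classical fourth-order Runge-Kutta with a fixed step dt.
    Returns lists of times and drug amounts in each compartment.
    """
    def deriv(a1, a2):
        return (-(k10 + k12) * a1 + k21 * a2, k12 * a1 - k21 * a2)

    a1, a2 = float(dose), 0.0
    times, central, peripheral = [0.0], [a1], [a2]
    n_steps = int(round(t_end / dt))
    for i in range(1, n_steps + 1):
        s1 = deriv(a1, a2)
        s2 = deriv(a1 + 0.5 * dt * s1[0], a2 + 0.5 * dt * s1[1])
        s3 = deriv(a1 + 0.5 * dt * s2[0], a2 + 0.5 * dt * s2[1])
        s4 = deriv(a1 + dt * s3[0], a2 + dt * s3[1])
        a1 += dt / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        a2 += dt / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
        times.append(i * dt)
        central.append(a1)
        peripheral.append(a2)
    return times, central, peripheral


# An "intervention": compare a dose of 100 with a dose of 200 (hypothetical units and rates).
t, c_low, p_low = simulate_two_compartment(100.0, k12=0.3, k21=0.2, k10=0.1, t_end=24.0)
_, c_high, _ = simulate_two_compartment(200.0, k12=0.3, k21=0.2, k10=0.1, t_end=24.0)
```

Because this particular model is linear, doubling the dose doubles the whole predicted trajectory in both compartments. The prediction of what happens if you do X comes from the equations themselves rather than from a comparison of treated and control groups.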
Spillovers
There are some interesting things going on at the border of black-box causal inference and process models. One such example is spillover effects. In the traditional statistical formulation of black-box causal inference, spillovers are an annoyance, a violation of the “stable unit treatment value assumption.” (There are other reasons to abandon the stable unit treatment value assumption–see this paper–but we won’t get into that here.) So the idea would be to design the study so there would be no spillover or to get estimates that were robust to spillover or to construct estimands of total effects averaging over spillover or to fit models in which spillover wouldn’t happen . . . but all that is the old way of thinking about things. The new way to handle spillover effects is to model them using some sort of mechanistic or process model, that is, a parametric model that corresponds to some model of the spillover process. It could be a spatial model, for example. I think of this as being on the border between black-box and process models for causal inference, in that the magnitude of the treatment effect might be estimated using some black-box regression approach (as in chapters 18-20 of Regression and Other Stories) or some black-box identification approach (as in chapter 21 of that book), but in a context where the spread of the effect across units is modeled as a process.
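Here is a minimal sketch of what modeling spillover as a process can look like, under strong simplifying assumptions: units sit on a ring, each unit's exposure to spillover is the share of its two neighbors that are treated, and the outcome is linear in own treatment and exposure, y_i = alpha + tau z_i + gamma e_i + error. This exposure mapping, the function names, and the simulated parameter values are all hypothetical illustrations, fit here by ordinary least squares; a real spatial model would be richer.

```python
import random


def neighbor_exposure(z):
    """Share of each unit's two ring neighbors that are treated (hypothetical spatial structure)."""
    n = len(z)
    return [(z[(i - 1) % n] + z[(i + 1) % n]) / 2.0 for i in range(n)]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_spillover(y, z):
    """OLS fit of y_i = alpha + tau * z_i + gamma * exposure_i via the normal equations."""
    e = neighbor_exposure(z)
    rows = [[1.0, zi, ei] for zi, ei in zip(z, e)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve3(xtx, xty)  # [alpha, tau, gamma]


# Simulated experiment with randomized treatment (all values hypothetical).
rng = random.Random(1)
n = 2000
z = [rng.randint(0, 1) for _ in range(n)]
e = neighbor_exposure(z)
y = [1.0 + 2.0 * zi + 0.5 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
alpha, tau, gamma = fit_spillover(y, z)
print(f"direct effect ~ {tau:.2f}, spillover ~ {gamma:.2f}, treat-all vs none ~ {tau + gamma:.2f}")
```

In this sketch, the direct effect tau could still be estimated by a black-box comparison of treated and control units, but the spread of the effect across units is carried by the modeled exposure term. That term is what lets you answer a question such as "what if we treated everyone?" (here, tau + gamma).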
One potential point of confusion is that any model will have elements that are “black boxes” because all models are intentional simplifications. For example, Newton’s famous model for gravitational attraction does not specify the mechanism of gravity but it does a great job of capturing the form of the relationship between mass and distance, at least at large scales. So is Newton’s model a “black box” model or a “causal” model?
In my view, “causal” and “black box” are terms that define a dimension along which models may vary, but do not necessarily pick out categories of model. In other words, I think it is meaningful to say that one model is more “causal” than another, at least in a particular domain of application, even if it is not possible to unambiguously assign a model to a “causal” or “black box” category. Models that fall more toward the “black box” end of the scale try to describe patterns in observable data without necessarily specifying why those patterns are there. Models that fall more toward the “causal” end of the scale try to describe the processes that produce observed patterns.
Finally, relevant to the discussion from a few days ago, I wish we had better terminology for specifying this distinction. “Causal”, “process”, and “mechanistic” are all terms that have been used somewhat interchangeably, but “causal” especially seems to be understood in different ways by different people.
Gec:
When you’re talking about science, sure. But statistics textbooks (including my own) typically present causal models as entirely black box.
In the above post I’m not trying to say that “causal” and “black-box” are opposites. Rather, the opposites are “mechanistic/process model” and “black-box.” Both can be causal.
I think you have reinforced my displeasure with the available terminology—if there is no difference between “causal” and “black box” under a common use of the term “causal”, then I don’t understand what “causal” is intended to mean. I’m not disputing the fact that those terms can be used that way, just that I find it very confusing!
I’d say there are “causal black box models” and “causal mechanistic models”, and then there are “associational models”.
A black box causal model is often one where you’ve done some random assignment or have some “as if random” external assignment, and so you want to quantify the size of the effect you got from some intervention. Here you don’t know *why*; you don’t have a sense of what causes what, only that the random assignment ensures that on average your observed association is somehow a good estimate of the change you caused.
In mechanistic causal models, you hypothesize a mechanism… doing A causes B, which causes C and D, etc. Here you use measurements of the system under perturbations to estimate the details of the functional form of the causal equation. Then you can estimate “under such and such circumstances, if you do XYZ then PDQ will happen.” This is scientific model building, while the non-mechanistic causal model is more like measurement.
Associational models have no random assignment and no mechanistic model, and simply tell you “typically when XYZ is like this then PDQ will be like that” without telling you things like “if I take unit U and set its XYZ to such and such then we will change its PDQ to this or that”…
Thanks for the breakdown, Daniel!
My confusion is localized to the first term, a “black box causal model”. It sounds like what makes such a model causal is not a property of the model per se, in the sense that the model contains some formal structure that represents a causal relationship distinct from a noncausal one. Rather, what makes it causal is our interpretation of the model in light of our knowledge of the data collection process. I would call such a model “associational” in your terminology, since they are formally equivalent. It is only by dint of knowledge that is not directly represented in the model that a black box model can be interpreted in causal terms.
Gec:
Even only considering black-box models, there are causal models and there are purely descriptive models. Most of Regression and Other Stories is about descriptive models, then in the final section we consider causal models.
A black-box causal model does not require random assignment. Randomization-based models are just one example of black-box causal models.
> Even only considering black-box models, there are causal models and there are purely descriptive models
Thanks for this additional context; it is helping me to understand this issue better. In addition, it brings me back around to my original conception of causal vs. black box being a continuum as opposed to distinct categories. Would it be reasonable to say that some “black box” models tend toward being used for purely descriptive purposes while other “black box” models are constructed so as to represent causal relationships, where “causal” is operationally defined within the context of the application? If so, then I think you could put “causal black box” models somewhere between purely descriptive models and process models that explicitly represent mechanisms of causation.
Whenever I see “causal” in the name of a technique, I assume it’s some byzantine scheme based on ambiguous terminology that doesn’t work.
It’s as if approaches that actually help with causality just do it as a matter of course and don’t need to mention it. If a restaurant advertised having napkins, you would think something was up.
Andrew: you have hit on a major issue in economics today. Economics used to rely heavily on models. These models, like all models, are simplifications. And one might object to them on other grounds. But when an empirical researcher tested something based on the model, everyone (including the researcher) knew what the mechanism was thought to be.
Now most applied micro research obsesses about causality without thinking hard about the mechanisms. Many papers end with some mutterings about the mechanism that seem like little more than post-hoc rationalizations. We end up with causal estimates of things that are so vague that the causality seems misleading.
At least in economics, most statistical estimations have *some* hypothetical causal model underlying them. These models rarely have a compelling functional form (there are a few exceptions here… I’m thinking of financial CAPM models, in which linearity is implied by the underlying construct, though it fails most rigorous tests of linearity), and often the data use proxies for unobservable quantities. But there are, I think, few pure black boxes that ask “how does this vector of variables predict this outcome?” without any underlying model at all.
The “causal revolution” is, IMO, an attempt to make the loose causal language underlying these models more precise by constraining the statistical methods that might be used to those which might possibly extract the causal signal from the noise. Its triumph is not that it actually uncovers causal relationships… you can still do a lousy regression discontinuity study. It’s that it stops researchers from publishing studies that could never in principle separate causal mechanisms from mere statistical association, at least without acknowledging that fact.
I think it’s useful to separate two purposes of assessing possible causation. One is about generalizability, or more precisely, identifying the conditions under which an effect would be likely to occur. When I taught biostats (only one year), I emphasized that for practitioners it comes down to the patient in front of you. Most associations are conditional rather than universal, so you want to know which ones are likely to apply to this particular person. That’s what causal identification can do for you if it’s done carefully.
The second is, in the long run, much more important. We have slowly been building an edifice of scientific understanding, based on the processes that produce outcomes. Empirical work often occurs at the frontier of that edifice. Here is where testing a mechanistic model is really crucial. You might be able to make predictions about narrowly defined situations without it, but ultimately we’re creating something that can truly explain and generate predictions, or at least hypotheses, well outside the realm of current data.
Incidentally, building that set of explanatory models is a work in progress. Often chunks will be developed that are incomplete in various ways. Newton’s gravitation was like that.
I don’t quite follow your distinction. What I don’t see is how the first purpose (I’m thinking of your example of the patient in front of you) calls for a different model than your second purpose. Whatever the state of the mechanistic models at a point in time, the patient in front of you needs to be evaluated in light of that knowledge plus whatever unique or experiential knowledge pertains to that individual. The second more general purpose you describe is going to be applied to individual cases as well. It may result in algorithms (standard practices) that are sure to include some room for modification based on individual circumstances. So, how do you distinguish between the two purposes you state?
I can know that, under a specified set of conditions, X causes Y without knowing *why* that’s the case. The why part is what I’m referring to as use #2. Empirical methods can be employed to assess how likely a particular why may be, and that may or may not be helpful for use #1.
Peter: That does not seem logical. How can you know if something is causal if you don’t theoretically understand how it occurs? How would you differentiate between the exact same result with a different variable that was purely correlative?
Isn’t that about experimental design? If you’ve got sufficient control, you can adduce causation even if you can’t describe it mechanically. Is this a semantic question?
I think of the example of smoking and lung cancer, where causal association was established early on, but there were (and are) struggles over identifying the precise mechanisms (and mechanisms that attenuate them).
For smoking, experimental results from animal studies existed early on. Pretending that you don’t need them for behaviors that are not being manipulated seems more like sleight of hand than a causal argument.
I choose to believe theory X1 rather than theory X2 without rigorous argument and without an actual experiment.
Not to mention that controlled experiments on humans with smoking were deemed unethical.
Why not use an example that is true? The situation in reality is the opposite of what you describe (Andrew asked me not to touch this sacred cow, but you can search old posts on here).
Use vitamin C for scurvy, or something like cardiac stents. Historically, without treatment almost no one gets better. But after treatment almost everyone gets better.
I.e., you can establish “causality” with very-low-NNT interventions that require only small, easily reproducible studies.
This article resonates strongly. In academic medicine, ‘pragmatic’ trials are quite popular. The definitions vary, but they are generally understood as trials that try to show whether A or B is the better choice in routine care. The idea is that if a trial is embedded in routine care, the results will be more generalizable (broader inclusion criteria, adherence more like ‘real life’, etc.). Results might be obtained from a registry, for example, and study-specific data collection is minimal. Paradoxically, I think this often leads to worse generalization to the patient in front of me, because there’s just no data on the mechanism…
Very simple example: The effectiveness of lipid-lowering drugs depends on the reduction in cholesterol (apoB) achieved. We guide our decision to increase dosage or add a second drug based on this mechanism and lipid-targets. If we had conducted all our trials without assessing the effect of the drugs on the lipids, just on incidence of death or myocardial infarction, we’d still know that the drugs work on average, but not how and it would be much less useful for clinical practice.
This paper on a framework to enable generalized claims about cause and effect also seems relevant.
Esterling, Kevin M., David Brady, and Eric Schwitzgebel. 2025. “The Necessity of Construct and External Validity for Deductive Causal Inference.” Journal of Causal Inference 13 (1). https://doi.org/10.1515/jci-2024-0002.
Wise point, Andrew. Importantly, there has been an absence of procedures for explicitly linking mechanistic knowledge to causal interpretations. Toward that end, I have recently developed and demonstrated a process, “causal knowledge analysis,” that includes procedures for explicitly documenting the mechanistic causal knowledge for a problem of interest. The important point is: how can we build causal knowledge if we don’t have a way of documenting it? I now have three papers describing and demonstrating this procedure: https://doi.org/10.1002/ecm.1628, https://doi.org/10.1111/ele.70029, and https://doi.org/10.1111/1365-2745.70152.
Jim Grace