Epidemiologist Donna Spiegelman sez: SUTVA is “mostly not necessary for valid causal estimation and inference most of the time”

Donna Spiegelman shares this presentation she gave at the recent American Causal Inference Conference. I like what she has to say.

Here are the two parts of the stable unit treatment value assumption (SUTVA):

1. No interference between units. As Spiegelman says, nowadays it’s not hard to model spillovers. As I say, untangling spillovers is an ill-posed inverse problem that can be solved using Bayesian inference with reasonable priors. Serious practical work has moved past the demonstrate-that-spillover-doesn’t-matter stage to the just-model-the-spillover-directly stage.

2. Deterministic potential outcomes. As Spiegelman says, in the real world, outcomes are stochastic. Jonas and I talk about this in our Russian roulette paper.
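Both relaxations in the list can be written down together. The sketch below assumes a known, made-up neighbour network and a linear model for the probability of a binary outcome. Each unit's outcome probability depends on its own treatment and on the share of its neighbours who are treated, which models the spillover directly. The outcome itself is then drawn at random, so the potential outcome is a probability, not a fixed value. The coefficients are invented for illustration. In practice they would be estimated, for example with a Bayesian model and reasonable priors. None of this reproduces the notation of the Russian roulette paper.

```python
import random

# Hypothetical network: unit -> list of neighbours (made up for illustration).
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
treated = {"a": 1, "b": 0, "c": 0, "d": 0}


def treated_share(unit: str) -> float:
    """Fraction of a unit's neighbours that are treated (0.0 if it has none)."""
    nbrs = neighbors[unit]
    return sum(treated[n] for n in nbrs) / len(nbrs) if nbrs else 0.0


def outcome_probability(unit: str, base: float = 0.1, direct: float = 0.3,
                        spill: float = 0.2) -> float:
    """P(outcome = 1) for a unit, given the whole treatment assignment.

    Interference: neighbours' treatments enter through treated_share.
    Stochastic outcome: this returns a probability, not a fixed 0/1 value.
    base, direct, and spill are illustrative values, not estimates.
    """
    return base + direct * treated[unit] + spill * treated_share(unit)


def draw_outcomes(seed: int = 0) -> dict:
    """One random realisation of every unit's binary outcome."""
    rng = random.Random(seed)
    return {u: int(rng.random() < outcome_probability(u)) for u in neighbors}


for u in neighbors:
    print(u, treated_share(u), round(outcome_probability(u), 2))
print(draw_outcomes())
```

Unit "c" is untreated, but half its neighbours are treated, so its outcome probability is raised by the spillover term alone. A single draw does not reveal any unit's probabilities. Only the model, or many comparable draws, does.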

The part that I’m less sure about is Spiegelman’s claim that adjustments for pre-treatment variables usually don’t matter. I’m persuaded that they usually don’t matter in the epidemiology and biostatistics applications she’s worked on, but I think that in social science, such adjustments can be important. Especially if there are big treatment interactions and your population is a lot different from your sample.

In any case, I recommend you look through Spiegelman’s slides, as she offers a refreshing perspective compared to our usual obsessive focus on the details of causal identification:

36 thoughts on “Epidemiologist Donna Spiegelman sez: SUTVA is ‘mostly not necessary for valid causal estimation and inference most of the time’”

  1. I like the take-home messages a lot, and they seem to at least mitigate some of the ‘adjustment doesn’t matter’ concerns. Where I have become increasingly persuaded, working in applied health econ, is that you are usually better off trying to model the process itself than trying to contort your data into complex simulacra of trials.

    I feel as though it shouldn’t be controversial to say that scientific understanding should guide our analysis and interpretation, but causal inference as a discipline is young enough that there are plenty of efforts to carve out fiefdoms proclaiming the advent of the One True Method, and these thrive on drawing strong and often artificial distinctions.

    • Say you want to infer the effect caused by a drug. The drug is given to an individual person, *not* an average person. The individual level is where cause and effect needs to be modeled.

      Yet, standard practice is to look at an averaged difference between drug vs not, then try to infer causality from that.

      I don’t see how this can ever work, and that’s probably related to the layers-upon-layers of jargon that have been piled onto this topic.

      Instead we want:

      1) Models (could be arbitrary) with predictive skill at the individual level; once a successful model exists, we can modify input variables to check counterfactuals.

      2) The next, but harder, step is to derive models like in 1 from plausible premises, rather than arbitrary regressions/networks.

      • You keep saying this and I keep asking how realistic it is to expect good predictive ability at the individual level. While I agree that is a great goal, I’ve seen very few data sets where individual predictions are at all accurate. You can get 95% of the predictions within a 95% confidence interval, but I think you are asking for much more than that. And I’m not convinced it is possible.

        • Compare to going on Stack Exchange in 2015 and reading answers citing the proof that a neural network with one hidden layer is all you need. The proof is correct, but has nothing to do with the reality of limited time/resources.

          https://en.wikipedia.org/wiki/Universal_approximation_theorem

          I’d rather have imperfect information about the real world rather than perfect information about some hypothetical world that doesn’t exist, with a bunch of ad hoc stuff piled on top instead of fixing the root problem.

          Another recent example is the move from LIBOR to SOFR.

        • Anoneuoid –

          I’d rather have imperfect information about the real world rather than perfect information about some hypothetical world that doesn’t exist, with a bunch of ad hoc stuff piled on top…

          You’re offering a false framing just to set up your conclusion.

          You contrast “imperfect information about the real world” with “perfect information about a hypothetical world,” but that distinction doesn’t hold. The “real world” is itself hypothetical. Every description of it involves abstraction unless you have perfect knowledge. And hypothetical worlds aren’t purely imaginary; they incorporate aspects of reality.

          So the choice you’re presenting isn’t real. It’s just a rhetorical device.

        • Dale, the point, I think, is that causal mechanisms act at an individual level. Take blood pressure drugs, for example. Maybe they cause some vasodilation, or they reduce total blood volume by affecting the kidneys, or they affect the feedback mechanisms that alter stress hormone production, or whatever. We should be looking at those mechanisms and collecting the data needed to determine the extent to which the mechanisms act on individuals. It’s fine for this to be noisy and not super accurate at the individual level, but the aggregate level should come from aggregating effects that are visible at the individual level and that reflect plausible biological mechanisms.

        • Daniel
          Of course the effects are on the individual. The effect of drug A on person B is an individual effect, and of course, modeling this effect should be based on “plausible biological mechanisms.” But, the data will either be on a limited (small) number of individuals in some form of RCT, or for a larger number of people in an observational study. Either way, estimating the effect of drug A on “persons like B” is likely to have too much uncertainty to be useful. That is, unless there are many people “like B,” in which case we are dealing with averages, but across subgroups. If the subgroups are very small, this will look like individual effects, but we are back to the large uncertainties. If the subgroups are large, then we are getting away from individual effects and back to average effects.
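Dale's point is that shrinking subgroups brings back large uncertainty. The textbook standard error of a difference in means makes this concrete. The sketch assumes two independent arms of equal size and a common outcome standard deviation, set to 1 here purely for illustration:

```python
import math


def se_difference(n_per_arm: int, sigma: float = 1.0) -> float:
    """Standard error of a difference in two means.

    Assumes independent arms of equal size n_per_arm and a common outcome
    standard deviation sigma (both illustrative assumptions).
    """
    return sigma * math.sqrt(2 / n_per_arm)


for n in (10_000, 1_000, 100, 10):
    print(n, round(se_difference(n), 3))
```

Cutting the per-arm sample size by a factor of 100 multiplies the standard error by 10. This is the granularity-versus-precision tradeoff in its simplest form.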

          So, I think the criticism is somewhat empty. I agree that we should avoid thinking of average effects as applying to individuals. Andrew has often made the point that most interventions (in medicine or social science at least) will help some and hurt (or not help) others, so the individual effects necessarily diverge from the average effects. But to take the next step and say that we need to study individual effects, and that average effects are useless, seems unrealistic and misguided to me.

        • Anoneuoid, I agree with your sentiment. However, I am wondering what people think about these related comments from the book Causal Inference by Imbens and Rubin. On page 8 they write “learning about causal effects typically requires multiple units… There is sometimes a tendency to view the same physical object at different times as the same unit. We view this as a fundamental mistake.”

        • Brian:

          I disagree with the Imbens and Rubin quote. I mean, sure, yes, I agree with them that units change over time; to say it another way, to generalize unit A from time 1 to time 2 requires assumptions. But it also requires assumptions to generalize from unit A to unit B!

          I am bothered by the sleight of hand in which statisticians (including me in my textbooks!) first emphasize that all that can be estimated is the average treatment effect, but then we just take that average and carry it around as if it implied an individual effect. This is just as strong an assumption as the stationarity assumption involved in a before-after study and maybe more pernicious because it is often made implicitly.

        • Andrew
          You say you disagree with Imbens and Rubin when they write “learning about causal effects typically requires multiple units…”, but you agree with the obvious point that how to treat the same individual at two time points requires assumptions. Please explain what it is that you disagree with in the Imbens and Rubin statement – I think this is at the heart of the individual vs average question. You only complain about the case of taking “that average and carry it around as if it implied an individual effect.” I agree with you about that being dangerous and requiring strong assumptions (which are not generally satisfied). But that leaves open the question of whether you can learn about causal effects from a single observation. I’m having trouble seeing how you view that.

        • I think it’s best to explain what I mean by pointing to a study that looks like what I think medical studies should look like, and that’s the ultra-processed foods study from 2019, which we discussed twice on the blog (once when it came out and once when there were questions about some data collection, which were answered satisfactorily by the authors).

          I’m not saying that we should never do observational stuff or whatever, but most health science studies should look a lot more like this one:

          https://www.cell.com/cell-metabolism/fulltext/S1550-4131(19)30248-7

        • Daniel
          That study is very well done – but how does it relate to the issue of individual vs average predictions? The study had 20 people. I don’t see individual measurements except in one graph, and it appears that 3 of the 20 had results opposite in sign to the other 17. I’m having difficulty seeing how this avoids the issue of multiple observations or the need to generalize from a small sample to a large population. I am not criticizing the study – I’m just not understanding how it relates to Anoneuoid’s claim that we need to study individual predictions.

        • Thanks again Anoneuoid, and Robin, Joshua, Daniel, and Dale. Andrew, I think I agree with you. The “distance” between say a person today and that same person tomorrow is less than the “distance” between that same person today and some other similar individual today, even a monozygotic twin.
          To address a related problem during sensitivity analysis of the consistency assumption with stochastic potential outcomes, my collaborators and I distinguish between different kinds of probability spaces: one for the “baseline” population and three (one natural and two forced) filtered probability spaces for each individual over time. See Section 2.1 of https://arxiv.org/abs/2512.21379, which will appear soon in the June 2026 issue of Observational Studies. The ideas behind that paper come from (the coaching of) ice hockey, but we use golf as a simpler example in the paper. To bring it back to the context of this discussion, would you rather study 100 different golfers who play the same hole once each, or one golfer playing the same hole 100 times over and over again? I think in the end, if you as a coach want to give a recommendation to that one specific golfer, then it is better to study just that one specific golfer play the hole 100 times in sequence over and over again. However, coaching becomes more complicated in the context of a team sport like ice hockey, where it seems like Donna is right to say that interference (or spillover effects) should be embraced.

        • Brian
          Well, since you bring up golf… if you observe me playing the same hole 100 times in succession or 100 different golfers playing that hole, you are likely to find similar amounts of variability in both cases. I’m not even sure that degree of skill would change that comparison – more skill would result in less variation in both situations. But at my skill level, I wouldn’t place bets on the comparison.

        • The Weber-Fechner law, one of the most well-tested laws of perception, was obtained based on *individual* measurements.

          Learning curves of animals are also individual. Each animal has the same equation describing its learning curve, but each of them takes a slightly different time to learn whatever is being taught. So blindly averaging them together destroys information and does not allow you to learn the curve. You have to shift the curves and then average them to get any useful information. You can also measure the individual learning times.
          See e.g. https://www.pnas.org/doi/10.1073/pnas.0404965101.
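Toy step-shaped curves can illustrate the shift-then-average idea. The shapes and learning times below are assumptions made up for this sketch. They are not data or methods from the linked PNAS paper:

```python
# Each hypothetical animal's success rate jumps from `low` to `high`
# at its own (made-up) time to learn.
switch_times = [5, 10, 15, 20]
n_sessions = 30
low, high = 0.2, 0.9


def curve(switch: int) -> list[float]:
    """Step-shaped individual learning curve over n_sessions sessions."""
    return [low if t < switch else high for t in range(n_sessions)]


curves = [curve(s) for s in switch_times]

# Naive average across animals at each session: looks gradual.
naive = [sum(c[t] for c in curves) / len(curves) for t in range(n_sessions)]

# Aligned average: index each curve relative to its own switch time,
# over a window of lags that every curve covers, then average.
lags = range(-5, 10)
aligned = [
    sum(c[s + k] for c, s in zip(curves, switch_times)) / len(curves)
    for k in lags
]

print([round(v, 3) for v in naive])
print([round(v, 3) for v in aligned])
```

The naive average rises gradually, even though no individual curve does. Averaging after aligning each curve on that animal's own learning time keeps the abrupt jump. Every aligned value equals `low` or `high`, up to floating-point rounding.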

          The point is: we need to know how the subjects differ from one another before we can average.
          For medical studies, the study that Daniel linked is a small step forward.

        • I think these comments are straying from the original issue. Nobody is claiming (at least I’m not) that we don’t need individual measurements. The issue was whether individual predictive performance is a sensible goal. The learning curve example seems like an issue of the ecological fallacy – the average relationship differs greatly from the underlying individual relationships. But it sounds like the individual characteristics are still modeled to estimate an average relationship – with that average incorporating the relevant individual characteristics. In my experience, the confidence intervals for that relationship can be useful and accurate (depending on the model assumptions of course). But even when those intervals are useful, the individual predictions are still subject to huge uncertainties.

          I’d put the issue in perspective this way: when I discuss my health (for example, should I go on statins) with my physician, I get advice that comes from studies based on averages over a large number of people (at least broken down by age, smoking history, and a couple of variables – I wish there were more). If my physician offered advice for my individual case, saying that this is a prediction of what will happen to me, I’d be inclined to see another physician. This is because I don’t believe we have studies that are based on people “like me” except on a few fairly gross measurements and characteristics. I’m all for eschewing gross averages and having meaningful breakdowns by important characteristics. I am frustrated that the only characteristics used are often age, sex, and a few types of family medical history. I think we can do better than that (at least from observational data – granular RCT data is very expensive while granular observational data is mainly expensive because of the fragmented nature of the US health care system). But the goal of predictive accuracy at an individual level seems unrealistic and counterproductive to me. I think there will always be a tradeoff between granularity and predictive precision.

        • Dale:
          I don’t think anyone is claiming that individual predictions in medicine are valid right now. But we can and do expect individual predictions in other fields to be valid. Earlier I said that you have to shift the individual learning curves so they can be averaged, but after some further reading, I think you don’t even need to do an averaging step. Each person’s learning curve can be calculated (it’s the same for everyone) and their “time to learn” (which varies between people) can be measured. So I would say that individual prediction IS a good goal for medicine to aim at. That will make medicine a true science.

          Here is a quote from the discoverer of homeostasis:
          Certain experimenters . . . published experiments by which they found
          that the anterior spinal roots are insensitive; other experimenters
          published experiments by which they found that the same roots were
          sensitive. These cases seemed as comparable as possible; here was the
          same operation done by the same method on the same spinal roots.
          Should we therefore have counted the positive and negative cases and
          said: the law is that anterior roots are sensitive, for instance, 25 times out
          of a 100? Or should we have admitted, according to the theory called
          the law of large numbers, that in an immense number of experiments
          we should find the roots equally often sensitive and insensitive? Such
          statistics would be ridiculous, for there is a reason for the roots being
          insensitive and another reason for their being sensitive; this reason had
          to be defined; I looked for it, and I found it; so that we can now say:
          the spinal roots are always sensitive in given conditions, and always
          insensitive in other equally definite conditions.

        • Brian —

          “I think in the end, if you as a coach want to give a recommendation to that one specific golfer, then it is better to study just that one specific golfer play the hole 100 times in sequence over and over again.”

          No doubt that’s true. If the goal is to optimize for one particular golfer, repeated observations of that golfer would be best. And if you want to optimize and can only include one approach, then of course you would focus on repeated observations of the individual golfer.

          But you can also learn relevant things from watching the 100 different golfers. If a given golfer tends to slice or hit long, those tendencies might be advantages or disadvantages depending on how the hole plays, and in that sense looking at how different golfers play the hole might give you relevant information you would not get from the same golfer playing repeatedly.

          What I disagree with, to the extent anyone is making this argument, is the argument that average data are useless and only individualized data are valuable. First, I don’t think that maps onto reality, because the categories themselves are not clearly distinguishable. And back to the analogy, what you can learn from the many golfers informs how you understand the one golfer, and the tendencies you learn from the one golfer can be more useful in light of how different golfers interact with the same hole. Second, I reject the idea of an idealized form of individual data on one side and a completely useless form of average data on the other. That feels to me like a version of the nirvana fallacy, where the imperfect option is dismissed simply because it is not the ideal.

          The broader point is that the two sources of information are not mutually exclusive from the perspective of predictive value. They overlap, they inform each other, and they both contribute to understanding how the golfer and the hole interact.

        • Meese
          I will defer to your (and others’) better knowledge of medical conditions. But I believe it is rare to know with certainty that a treatment works on individual A under conditions X and never works under conditions Y. But I can believe such cases exist. The issue, then, is whether pursuit of such cases is a good thing. I’m not so sure about that. It may seem like “a good goal for medicine to aim at” (your statement), but it comes at a cost. Many conditions and treatments can hope for no such certainty, and if we aim at cases where this can be achieved, we may fail to study the most important cases. So, it isn’t clear to me what it means to state that as a goal. More individual precision in predictions is a good thing – I think we can all agree on that – but that doesn’t automatically translate into what we should study or how we should study that.

        • Dale, I think that’s far too pessimistic. There is also a cost to our current method of giving up on individual prediction. We have almost no real scientific knowledge and a huge replication crisis.

          My proposal:
          1. Get together a huge number of subjects and measure, essentially, dose-response curves or some analog, depending on application. At least one curve per person. Something like the study that Daniel linked.
          2. Find out what causes the individual variation, as was done with spinal root sensitivity.
          3. Make a causal, predictive model that works for everyone (this may require careful averaging, or it may not). Just like learning curves.

          It’s of course easy to say this, but hard to get such a study funded. But studies like this are what would need to be done.
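For the first step of the proposal, one common way to write a per-person dose-response curve is the Emax (hyperbolic) model from pharmacology. The sketch below assumes that form and invents all parameter values. The commenter did not specify a model. The two hypothetical people share a curve shape and differ only in ED50, the dose that gives half the maximal effect. This echoes the one-parameter variation in the learning-curve example:

```python
def emax_response(dose: float, e0: float, emax: float, ed50: float) -> float:
    """Emax model: E(d) = E0 + Emax * d / (ED50 + d)."""
    return e0 + emax * dose / (ed50 + dose)


# Two hypothetical people: same E0 and Emax, different ED50 (made-up values).
ed50_by_person = {"person_1": 10.0, "person_2": 40.0}
doses = [0.0, 10.0, 40.0, 160.0]
curves = {
    name: [emax_response(d, e0=0.0, emax=100.0, ed50=ed50) for d in doses]
    for name, ed50 in ed50_by_person.items()
}
for name, values in curves.items():
    print(name, [round(v, 1) for v in values])
```

Under this assumed form, finding what causes individual variation would mean finding what explains a person's ED50.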

        • Meese –

          “this may require careful averaging, or it may not”

          I don’t understand.

          Suppose you’re collecting data on 10k variables for each individual. How would you evaluate which of the variable changes are causal without making comparisons across people (effectively averaging)?

          Without contrasts, every variable change looks equally plausible. What’s the mechanism for identifying relevant variables if you can’t rely on population-level structure? And are you going to have a mechanistic model for each and every variable change?

        • Joshua:
          10k variables?? This is the issue with social and biomedical scientists. I was thinking about 2-3 variables for each person, each with a mechanistic model. Again, look at learning curves. Only a single variable per animal is measured, the success rate over time. It is easy to make causal models of this process, make predictions and rule out alternatives (animals seem to use simple heuristics to make decisions). It is also easy to see that the only real variation between animals is the time that they take to figure out what to do. One single parameter varies.
          Also consider the study linked by Daniel. Only the weight is measured over time.
          Consider a further example: pharmacological models. Pharmacology is where the idea of a dose-response curve comes from. We have mechanistic models for figuring out the concentration of a drug over time in the body. These are very useful.
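The simplest textbook case of such a pharmacokinetic model is a one-compartment model after a single intravenous bolus, with first-order elimination. The sketch below uses hypothetical parameter values:

```python
import math


def concentration(t_hours: float, dose_mg: float, volume_l: float,
                  half_life_h: float) -> float:
    """Plasma concentration (mg/L) at time t after one IV bolus.

    One-compartment model with first-order elimination:
    C(t) = (dose / V) * exp(-k * t), where k = ln(2) / half-life.
    """
    k = math.log(2) / half_life_h
    return (dose_mg / volume_l) * math.exp(-k * t_hours)


# Hypothetical person: 500 mg dose, 50 L volume of distribution, 4 h half-life.
for t in (0, 4, 8, 12):
    print(t, round(concentration(t, 500, 50, 4), 3))
```

In a model like this, individual differences enter through person-level parameters such as volume of distribution and half-life.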

          You’re not going to find the small number of relevant variables instantly, by the way. It takes time and the efforts of many investigators. But that’s how science is.

          We want to find the simple truth hidden behind the complexity. That is science. Models with 10k variables are essentially black box machine learning models, and they may in certain cases be useful (though I honestly doubt that), but they are definitely not science.

        • Meese —

          Just to clarify, it seems maybe you think I’m a social scientist (with an agenda?)

          I’m not a social scientist (or any kind of scientist). I’m just trying to puzzle my way through this.

          The reason I used a big number like 10,000 variables was to be generous in the sense that more variables could increase predictive power.

          If you’re saying that with a strong mechanistic theory you only need 2–3 variables, I’m not going to argue with that in principle. If you genuinely have a well‑supported causal mechanism, then sure, that naturally reduces the number of variables you need to measure.

          But first, I would imagine there’s a pretty limited number of scenarios where you have such strong mechanistic theories that you can reduce your analysis to only 2 or 3 mechanistic pathways. Second, even if the model has only 2–3 variables, I still don’t see how you avoid averaging or comparisons across people in order to assess causality.

          I’m not familiar with the details of the specific examples you gave, but my assumption is that even those mechanistic models were discovered or validated by looking at variation across individuals. Maybe I’m wrong about that — but that’s exactly why I’m asking. How do you determine which variables show causality without relying on population‑level structure?

          And by the way, contrary to an agenda you might think I have, I’m actually a big believer that the antidote to the mass confusion of correlation with causation — especially with the explosion of cheap correlation arguments online — is to require a strong theory of causal mechanisms behind any causal claim.

      • Anoneuoid –

        The drug is given to an individual person, *not* an average person.

        Seems to me you’re drawing an arbitrary line between “individual” and “average,” as if they are cleanly separable. But average effects have some predictive value for individuals. Their predictive value depends on the context, and there’s no universal point where “individual” stops and “average” begins.

        Saying we should only model individual effects is the nirvana fallacy: rejecting valuable but imperfect models because they aren’t an idealized model. It’s fascinating how frequently you do this. Of course contextual pieces are critical, but that doesn’t make average effects meaningless or illegitimate.

        • I agree with the core of this answer. In reality the individual prediction applies to any group of persons with this combination of observed characteristics. There is no real individual in the context of a model’s predictions.

        • @Robin

          There are currently about 2^33 ≈ 8.6 billion individuals. So a model with 34 binary features could model every individual with room to spare. It would be better to frame the problem as interpolation vs extrapolation.

          For extrapolation we need my #2: “derive models like in 1 from plausible premises, rather than arbitrary regressions/networks.”
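The counting behind the 34-feature remark can be checked directly. This sketch assumes a world population of roughly 8.2 billion. It only verifies that enough distinct binary codes exist. It does not show that any particular 34 features would be predictive:

```python
import math

population = 8.2e9  # assumed rough world population
bits_needed = math.ceil(math.log2(population))  # fewest binary features for unique codes
print(bits_needed, 2 ** bits_needed, 2 ** 34)
```

Thirty-three binary features already give more distinct codes than people. Thirty-four give roughly twice as many.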

        • Anoneuoid –

          “So a model with 34 binary features could model every individual with room to spare.”

          How are you choosing your set of 34 binaries? It seems like the choice would be pretty arbitrary, given the infinite set of possible binaries. The only way it wouldn’t be is if you’re using a hypothetical model to determine inclusion/exclusion criteria. How does this not conflict with your “real world” vs. “hypothetical world” framing?

        • Anoneuoid,

          Thinking about it a bit more… another aspect of what you are saying interests me.

          You reference modeling individuals with a set of 34 binaries. Perhaps that works as a snapshot, but of course individuals change over time. Real people are in flux, their internal states shift, and their responses drift.

          That is why longitudinal data is so valuable: it lets you see how the same person changes across time.

          But assembling longitudinal data gets complicated. Short of longitudinal data, in a large sample it seems to me that averaging across people captures something similar, because the population includes people at different stages of a response trajectory. It is not the same as longitudinal data for individuals, of course, but in some sense it approaches what longitudinal data gives you. In that sense, it feels to me that averaging has an advantage you are not acknowledging.

        • @Anoneuoid
          Theoretically, sure, you could make a model so comprehensive that it could account for all possible sources of variation in outcomes between individuals. I suspect we are a long way away from being able to distil complex phenomena like genetic dose responses and pharmacokinetics to a series of variables. Even data veracity in, say, an EMR — somewhere you want exactly correct entries! — is hilariously poorly verified.

          I suppose you may have been agreeing with my initial sentiment, which is to do our best trying to model the phenomenon itself rather than deriving average treatment effects. I think most diehard causal inference researchers aim for this individualised counterfactual estimation as a holy grail.

      • Meese
        I am not understanding. I need to be concrete. Let’s use the case of statins – I’ve been advised to try these since the risk calculator says there is about a 10% chance of my having a heart attack in the next 10 years (and statins might prevent that). The risk calculator is a severely averaged estimate – based on age, sex, and I think family history (perhaps not even that). I think we can do better. But your suggestion to get a huge number of people and develop dose response curves for each seems utterly unrealistic to me. Of course, I’d love to know how a particular dose of a particular drug will affect my heart attack probability. But I don’t see arranging the huge number of people and developing the individual dose response relationships. It is only partially a matter of cost – the whole exercise seems fraught with impossible issues (how long should the study last, how can I use the output of the study assuming I’m not already dead, how many potential variables do we need to collect data on and how do we estimate these dose response curves with any precision, etc.).

        What I’d like to see is a huge data set made up of the many people that are already on statins along with myriad variables about them: physical factors, medical history, lifestyle variables, prescription medicines, dietary variables, … In principle this should not be too expensive to collect and the sample size should still be quite large. Since it would only be observational data, it is limited in what it can tell me about my potential outcomes if I start taking statins. And I think the uncertainty in my individual case would still be quite large – but it would be better than the over-simplified risk calculator that is currently available.

        But it seems a long way from the scenario you are describing.

  2. Andrew, thanks for posting this. Donna, thanks for contributing at ACIC 2026. Here are my thoughts in case they help.

    1. I think of consistency as more than just deterministic potential outcomes. There can be consistency violations when the potential outcomes are stochastic. Consistency roughly means that hidden (pre-treatment or concomitant) versions of treatment are irrelevant, and also that the outcome arising naturally is equal (in distribution) to the outcome arising from intervention.

    2. The E-value is defined with maximums and minimums over the support of the unmeasured covariate vector. The notation in https://pmc.ncbi.nlm.nih.gov/articles/PMC4820664/ may obscure how in many realistic situations the bounding factor could actually be quite large, due to maximizing, minimizing, and taking ratios over what could be a high-dimensional space.

    However, computing an E-value is preferable to assuming no unmeasured confounding. Likewise, reasoning about the possibility of bias due to hidden versions of treatment is preferable to assuming that hidden versions are irrelevant.
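The sketch below shows how the maximizing and minimizing can inflate the bounding factor. It computes the standard E-value formula for a risk ratio, the joint bounding factor, and the two component risk ratios for a discrete unmeasured confounder. The function names and the confounder tables are hypothetical. The component definitions follow a reading of the linked paper, so check them against it before relying on this:

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)).

    Protective estimates (rr < 1) are inverted first.
    """
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def bounding_factor(rr_eu: float, rr_ud: float) -> float:
    """Joint bounding factor B = RR_EU * RR_UD / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def rr_eu_from_table(p_u_given_e1: list[float],
                     p_u_given_e0: list[float]) -> float:
    """Max over confounder levels u of P(U=u | E=1) / P(U=u | E=0)."""
    return max(a / b for a, b in zip(p_u_given_e1, p_u_given_e0))


def rr_ud_from_table(p_d_given_e_u: list[list[float]]) -> float:
    """Max over exposure levels e of max_u P(D=1|e,u) / min_u P(D=1|e,u).

    Assumes every probability in the table is strictly positive.
    """
    return max(max(row) / min(row) for row in p_d_given_e_u)


# Hypothetical 4-level unmeasured confounder U (made-up numbers).
p_u_e1 = [0.1, 0.2, 0.3, 0.4]
p_u_e0 = [0.4, 0.3, 0.2, 0.1]
p_d = [
    [0.05, 0.10, 0.20, 0.40],  # P(D=1 | E=0, U=u)
    [0.10, 0.20, 0.30, 0.50],  # P(D=1 | E=1, U=u)
]

rr_eu = rr_eu_from_table(p_u_e1, p_u_e0)
rr_ud = rr_ud_from_table(p_d)
b = bounding_factor(rr_eu, rr_ud)
print(rr_eu, rr_ud, round(b, 3), round(e_value(2.0), 3))
```

In this toy table, a single extreme confounder level drives both component ratios: RR_EU = 4 and RR_UD = 8. That gives B = 32/11, about 2.91. Since B exceeds an observed risk ratio of 2, a confounder like this could fully explain such an association. A high-dimensional U offers more levels over which an extreme ratio can occur.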

  3. A few comments about the positivity slide:

    – In some cases, you can actually increase bias by excluding people who can’t experience treatment: https://pubmed.ncbi.nlm.nih.gov/40971781/

    – She says everyone must be able to experience the outcome but even that isn’t always necessary: https://pubmed.ncbi.nlm.nih.gov/40293398/

    – Outcome regression is also affected by positivity violations. If we ignore those violations we are implicitly relying on homogeneity assumptions about the people in the missing strata.
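A toy example shows the homogeneity point about outcome regression. The sketch uses made-up data in which nobody in stratum X = 1 is treated, and fits an additive (no-interaction) outcome model:

```python
# Toy data (hypothetical): (stratum X, treatment A, outcome Y).
data = [
    (0, 0, 1.0), (0, 0, 3.0),    # X=0, untreated
    (0, 1, 4.0), (0, 1, 6.0),    # X=0, treated
    (1, 0, 10.0), (1, 0, 12.0),  # X=1, untreated; nobody treated in X=1
]


def mean(values):
    return sum(values) / len(values)


def cell(x, a):
    """Outcomes observed in stratum x under treatment a."""
    return [y for xx, aa, y in data if xx == x and aa == a]


# Positivity check: share treated within each stratum.
share_treated = {x: mean([a for xx, a, _ in data if xx == x]) for x in (0, 1)}
print(share_treated)

# Additive outcome model Y = b0 + b1*A + b2*X. With three observed cells and
# three parameters, least squares reproduces the observed cell means exactly,
# so the coefficients can be read off the cell means directly.
b0 = mean(cell(0, 0))
b1 = mean(cell(0, 1)) - b0
b2 = mean(cell(1, 0)) - b0

# Imputed treated outcome in X=1 and the implied effect there.
imputed_y1_x1 = b0 + b1 + b2
effect_x1 = imputed_y1_x1 - mean(cell(1, 0))
print(b1, effect_x1)
```

The fitted model returns an effect for X = 1, but that number is the X = 0 effect carried over by the no-interaction assumption. The data contain no treated units in X = 1 to check it against.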

  4. Andrew (if I may), I was in the same session as Donna, with Jonas there too, and want to echo your, Jonas’s, and Donna’s call to incorporate stochastic potential outcomes. In that session I tried to develop the stochastic setting further by addressing Philip Dawid’s (2000) worry that the LATE identification result is in trouble there. His concern is that a single joint distribution over everything—factual and counterfactual—presupposes a strong metaphysical view, namely Laplacian determinism. I sidestep this by staying neutral about it: rather than a single joint distribution, I use one only over the factuals plus a separate probability space for each possible intervention, and prove a stochastic generalization of the identification result on that basis. I hope this satisfies Dawid.
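In the standard deterministic setting, LATE is identified by the Wald (instrumental-variable) ratio. This requires a randomized instrument Z, the exclusion restriction, monotonicity (no defiers), and a nonzero effect of Z on treatment uptake D. The sketch below computes the sample analogue on hypothetical toy data. It is not the stochastic generalization described in the comment:

```python
# Hypothetical toy data: (Z = instrument/encouragement, D = treatment taken, Y = outcome).
data = [
    (1, 1, 5.0), (1, 1, 7.0), (1, 0, 2.0), (1, 0, 2.0),
    (0, 1, 6.0), (0, 0, 1.0), (0, 0, 3.0), (0, 0, 2.0),
]


def mean(xs):
    return sum(xs) / len(xs)


def wald_late(rows):
    """Wald ratio: effect of Z on Y divided by effect of Z on D."""
    y1 = mean([y for z, d, y in rows if z == 1])
    y0 = mean([y for z, d, y in rows if z == 0])
    d1 = mean([d for z, d, y in rows if z == 1])
    d0 = mean([d for z, d, y in rows if z == 0])
    if d1 == d0:
        raise ValueError("instrument has no effect on treatment uptake")
    return (y1 - y0) / (d1 - d0)


print(wald_late(data))
```

Here the intent-to-treat difference in Y is 1.0 and the difference in uptake is 0.25, so the ratio is 4.0. That number is an effect for compliers only, and only under the assumptions listed above.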

    I also connected this to a precursor of his worry—the 1970s debate between David Lewis and Robert Stalnaker on counterfactuals in philosophy.

    My ACIC presentation is based on this manuscript (still a bit rough, but has everything in my presentation):
    https://arxiv.org/pdf/2605.12847

    In case slides are preferred:
    https://www.dropbox.com/scl/fi/tc1dj0vtg1sa5xrpd2kvb/Lin-Never-Too-LATE-2026-ACIC.pdf?rlkey=5hdwhkrmv45907gcpryw63vwg&dl=0
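The manuscript's construction is more careful than any simulation, but a quick Monte Carlo sketch shows the kind of result at stake. Assuming a randomized binary instrument, no defiers, the exclusion restriction, and hypothetical compliance shares and outcome probabilities, the usual Wald ratio still targets the compliers' average difference in success probabilities when each unit's outcome is itself random. Unlike the approach described above, this simulation does posit one joint distribution over the units' success probabilities, so it illustrates the estimand rather than addressing Dawid's objection.

```python
import numpy as np

rng = np.random.default_rng(2026)
n = 200_000

# Hypothetical compliance types: always-takers, never-takers, compliers (no defiers).
types = rng.choice(np.array(["at", "nt", "c"]), size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)  # randomized binary instrument
d = np.where(types == "at", 1, np.where(types == "nt", 0, z))

# Stochastic potential outcomes: unit i succeeds with probability p0[i] if
# untreated and p1[i] if treated, instead of having a fixed 0/1 outcome.
p0 = rng.uniform(0.1, 0.5, size=n)
p1 = p0 + np.where(types == "c", 0.2, 0.05)
y = rng.random(n) < np.where(d == 1, p1, p0)  # z affects y only through d

first_stage = d[z == 1].mean() - d[z == 0].mean()
reduced_form = y[z == 1].mean() - y[z == 0].mean()
wald = reduced_form / first_stage
complier_effect = (p1 - p0)[types == "c"].mean()  # 0.2 by construction
print(round(wald, 3), round(complier_effect, 3))
```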

    • Given that I’ve been obsessed with stochastic counterfactuals for a while now, let me share my alternative approach to addressing Dawid’s criticism in which I develop a semantics for stochastic causal counterfactuals, including a connection to the Lewis/Stalnaker debate: https://proceedings.mlr.press/v275/beckers25b.html

      A follow-up paper on the topic is currently under review (including a brief discussion of Andrew’s Russian roulette paper); any feedback is more than welcome: https://arxiv.org/abs/2512.12804

    • Hanti and Sander (or anyone else), can you help me answer this? Suppose there is a penalty kick in soccer/football. It’s just the kicker vs. the goalie. For simplicity, the kicker has only two available actions: kick the ball to the kicker’s left or to the kicker’s right. At the moment the kicker kicks, say to the right, there is a stochastic potential outcome. Suppose that at that moment the probability of success is 50%. But success is not determined like a coin flip. There is an opponent (the goalie), who is conscious and will react to the kick. Also, for the most part the kicker is “done” after the kick and can only watch helplessly as the ball sails through the air. But the kicker is still free to take additional actions to help cause success, like yelling (the sound of the yell could pass the ball and reach the goalie). So, in general, my question is this: how do you define a potential outcome as a random variable when additional actions by the same individual, or actions by other individuals, can still influence the outcome? How are we justified philosophically in focusing on the treatment (kicking) as the most important cause, when a great many other things matter as well? Are exclusion restrictions/assumptions necessary for causal inference?

      • I believe there are many issues going on at the same time here.

        1: I regard the very notion of a potential outcome variable as something rather strange. To me there are just “normal” variables, and we can speak of those variables taking on values in different possible worlds, one of which is the actual one, while all others are counterfactual. I see no need for introducing special variables to accomplish this. Therefore I’d prefer simply having a normal variable for the outcome (score or not), and that variable is given meaning entirely independently of whatever causal setup it finds itself in, be it stochastic, one cause, or multiple causes. So the question of defining a variable doesn’t show up.

        2: As far as I can tell the primary focus on just a single treatment variable is also rather specific to the potential outcomes literature, and I suspect — but I’m just guessing — it’s due to the fact that the standard medical treatment scenario is the most paradigmatic one to which it is currently applied. In the Pearlian framework one can focus on as many causal variables for any effect as one likes, although I agree that even there this is not done so often in practice.

        3: Now of course there is always a context-dependent choice to be made as to which variables to include and which ones to ignore, but that’s unavoidable. The discussion on what criteria to use to leave out certain variables is a hard one, and one that has received far too little attention. At least in the literature on actual causation, both on the philosophical and on the computer science side, this topic has been studied to some extent. A classic example is that we do not usually consider the presence of oxygen to be a cause of a flame when a match is being struck, and yet structurally it plays the same role as the actual striking of the match. But once one chooses to add a variable for the presence of oxygen, it only seems right to say that it’s also a cause of there being a flame, and similarly with there being a variable for the shouting of the kicker in your example.

        4: Still, although the issue of variable selection has received attention in the actual causation literature, a big gap is that work on a quantified account of actual causation is relatively scarce, and yet that seems to be what is at stake in your example, given that you speak of “most important”. Obviously one would imagine probabilities come into this, but how exactly, and what other factors might enter into it, is far from clear. (The topic of root cause analysis also comes to mind as a recent addition to this literature.)
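One way to make the third point above concrete is a toy structural model, sketched here in Python with entirely hypothetical probabilities: the kick direction is intervened on, the kicker may yell afterwards, and the goalie guesses a side with a probability that depends on the yell. The “potential outcome” of kicking right is then a probability that averages over the later yell, and once the yell is a variable in the model, intervening on it changes the probability of scoring, so it counts as a cause alongside the kick. The model is an illustration only, not a proposal for how such cases should be analyzed.

```python
from itertools import product

# Hypothetical toy model of the penalty kick; every number is an assumption.
P_YELL = 0.5                       # chance the kicker yells after kicking
P_DIVE_RIGHT = {0: 0.6, 1: 0.45}   # goalie dives to the kicker's right, given yell
P_SCORE = {"saved_side": 0.2, "open_side": 0.9}  # success prob by goalie's guess


def p_score(kick, yell=None):
    """P(score | do(kick=kick)), also setting do(yell=yell) when yell is given.

    With yell=None the yell keeps its own (hypothetical) distribution, so the
    'potential outcome' of the kick is a mixture over the kicker's later action.
    """
    yells = [(0, 1 - P_YELL), (1, P_YELL)] if yell is None else [(yell, 1.0)]
    total = 0.0
    for (y, p_y), dive in product(yells, ["R", "L"]):
        p_dive = P_DIVE_RIGHT[y] if dive == "R" else 1 - P_DIVE_RIGHT[y]
        side = "saved_side" if dive == kick else "open_side"
        total += p_y * p_dive * P_SCORE[side]
    return total


if __name__ == "__main__":
    for kick in ["R", "L"]:
        print(kick, p_score(kick), p_score(kick, yell=0), p_score(kick, yell=1))
```

Setting the yell to 0 or 1 gives different success probabilities for the same kick, which is the sense in which “what would happen if the kicker kicks right” depends on what is held fixed about the later action.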

        I hope this somewhat addresses the issue you raise, but I might be misinterpreting what you’re after…
