
Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking

As I’ve written here many times, my experiences in social science and public health research have left me skeptical of statistical methods that hypothesize or try to detect zero relationships between observational data (see, for example, the discussion starting at the bottom of page 960 in my review of causal inference in the American Journal of Sociology). In short, I have a taste for continuous rather than discrete models.

As discussed in the above-linked article (with respect to the writings of cognitive scientist Steven Sloman), I think that common-sense thinking about causal inference can often mislead.

In many cases, I have found that the theoretical frameworks of instrumental variables and potential outcomes (for a review see, for example, chapters 9 and 10 of my book with Jennifer) help clarify my thinking.

Here is an example that came up in a recent blog discussion. Computer science student Elias Bareinboim gave the following example: “suppose we know nothing about the world, except that one causal link is missing (e.g., skin color does not affect intellectual capacity).” Bareinboim describes this as a “transparent set of assumptions” but to me it’s not transparent at all. That’s ok, that’s my problem not his. But to resolve my problem, I’ll bring out my tools to understand it.

What does it mean to say that “skin color does not affect . . .”? I have to imagine an alteration of skin color. There are different ways to do this. For example, I could go to the beach and get a tan. This could well negatively affect my cognitive skills (we can call this the Jersey Shore theory). Or maybe at conception you could switch some of my genes around. Assuming this sort of manipulation were technically possible, it would change other things about me than skin color. That’s ok. Similarly, tanning has effects other than changing my skin; it also puts me at the beach (or the tanning salon) rather than in the library where I might be improving my intelligence.

All of this is the “instrumental variables” way of thinking about the world. If you want to understand the effect of some observed condition X on an outcome Y, you manipulate some instrument I that affects X, then you look at the effects of I on X and on Y.

The example X = skin color is typical in that there are different possible instruments that can be imagined, and these will have different effects on Y. For that reason, I find a claim such as “skin color does not affect intellectual capacity” to be undefined (until I know what instrument is being considered to affect skin color) and implausible, in that any instruments I can think of would have some (possibly small) effects on intellectual capacity. This is the “potential outcome” approach: we consider possible outcomes under different potential treatments (that is, different assignments of the instrument).
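
To make this concrete, here is a minimal simulation sketch of the idea that two instruments for the same X can imply different effects on Y. Everything in it is an assumption made up for illustration: the function `randomized_trial`, the two instrument names, and the numbers (each instrument shifts X by one unit but has its own side effect on Y). It is not an analysis of any real data.

```python
import random
import statistics


def randomized_trial(shift_x, side_effect_y, n=20000, seed=0):
    """Randomize a binary instrument I; return (effect of I on X, effect of I on Y)."""
    rng = random.Random(seed)
    x_by_arm = {0: [], 1: []}
    y_by_arm = {0: [], 1: []}
    for _ in range(n):
        i = rng.randint(0, 1)  # coin-flip assignment of the instrument
        x = shift_x * i + rng.gauss(0, 1)        # I moves the observed condition X
        y = side_effect_y * i + rng.gauss(0, 1)  # I also moves Y by its own route
        x_by_arm[i].append(x)
        y_by_arm[i].append(y)
    effect_on_x = statistics.mean(x_by_arm[1]) - statistics.mean(x_by_arm[0])
    effect_on_y = statistics.mean(y_by_arm[1]) - statistics.mean(y_by_arm[0])
    return effect_on_x, effect_on_y


# Hypothetical instruments: (shift in X, side effect on Y). Invented numbers.
instruments = {
    "tanning_at_beach": (1.0, -0.3),
    "gene_change_at_conception": (1.0, 0.1),
}

ratios = {}
for name, (shift_x, side_effect_y) in instruments.items():
    effect_on_x, effect_on_y = randomized_trial(shift_x, side_effect_y)
    ratios[name] = effect_on_y / effect_on_x  # change in Y per unit change in X
    print(f"{name}: effect on X = {effect_on_x:.2f}, "
          f"effect on Y = {effect_on_y:.2f}, ratio = {ratios[name]:.2f}")
```

Both hypothetical instruments change X by about the same amount, yet the implied "effect of X on Y" differs in sign, which is why the claim is undefined until the instrument is named.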

For an applied example, you can see our 1990 article on incumbency advantage, where we were explicit in defining conditions and potential outcomes. The point is that in studying such causal relations, it can be helpful to define the manipulation or instrument explicitly, even if it only has a theoretical existence. In that sense, instrumental variables and potential outcomes are a sort of accounting principle, giving us a tool to define as precisely as possible what we are studying.


  1. Gabriel says:

    Wouldn’t it be easier to think about this in terms of an experimental approach? In other words, a “causal effect” is what you would observe as the effect of a hypothetical experiment where you directly manipulated condition X and then observed outcome Y.

    The instrumental variables approach to thinking just seems to complicate matters, because then (as with a real world IV application), you have to think about whether the I you can imagine affects Y in some way other than through X.

    In your example, it’s impossible to think of an experiment which involves directly changing skin color. As I see it, THAT is why the statement “skin color does not affect intellectual capacity” is not defined.

    • Andrew says:


      Yes, I’m thinking of potential experiments. I find the instrumental variables formulation a helpful way of thinking this way. As noted above, I can think of different ways to directly change skin color, hence different possible experiments, different possible instruments, different possible causal effects.

  2. I don’t think you are really disagreeing with Elias, you’re just imagining different things. If I’m understanding correctly, Elias is thinking about what would happen if you were to directly set the value of the variable Skin_Color. He is, in effect, treating it like you are treating your instrumental variable I.

    Suppose you had a perfect instrument for Skin_Color: an instrument that directly affects only Skin_Color. The idea for Elias (I think) is that if you used that instrument to change the value of Skin_Color, the value of Intellectual_Capacity would not change. Whereas, if Skin_Color is a cause of something, like maybe how likely a mosquito is to bite you, then we expect that if one uses a perfect instrument to change Skin_Color, the other variable, say Bite_Frequency, changes as well.

    On preview, Gabriel seems to be saying the same thing.

    • Andrew says:


      No, that’s my point: the statement “directly set the value of the variable Skin_Color” can mean different things if applied to the real world. There are different ways to set this variable.

      • Ricardo says:

        I think I understand what you mean, but if there is no imaginable way of changing “X” with “I” that does not directly affect “Y”, then in principle one cannot say anything about the causal relationship between “X” and “Y” without throwing in even more untestable assumptions. And what about the effect of “I” on “Y”? If I randomize “I” in an experiment (apologies for the awful number of “Is” here!), but in some other situation “I” is chosen based on some decision-theoretic criterion (say, because I know “I = i” in the randomization scenario maximized the expected value of “Y”), how can I say anything about transferring my results in randomization to the latter scenario without making assumptions about invariance? Should “I” be defined by some meta-instrument “I'” that causes “I”? How is “I'” set, etc.?

        And if it is assumed there is a way of changing “X” with “I” that does not directly affect “Y”, then I don’t see how this is fundamentally different from what Elias was talking about. There are of course parametric assumptions one can also exploit which purely graphical models cannot account for (that’s where potential outcome/structural equation models can be particularly important), but I don’t think that’s the main point here…

      • I would agree with you if the claim were that one cannot directly set things like Skin_Color. Or if the claim is limited to the point that real-world manipulations could be carried out in many different ways. But I think I must still be missing something because it looks to me like you have no problem with directly setting the value of your instrument I. If that’s right, then I don’t understand why directly setting Skin_Color is mysterious but directly setting something that affects Skin_Color, like melanin concentration or something in the DNA or whatever, is unmysterious. And if manipulating an instrument is mysterious, then don’t we end up with a vicious regress in which no manipulation is ever unmysterious?

        • Andrew says:


          I can imagine more than one way to set skin color. I can imagine going to the beach. I can imagine altering a photo (in an online setting). I can imagine altering genes before birth. These can have different effects, that’s why I prefer to consider specific interventions rather than considering unspecified changes in the variable.

          • Agreed. But you can also imagine different ways to go to the beach or alter a photo or alter genes, right? For example, you could go to the beach in Florida or you could go to the beach in California. Or you could go on a sunny day or on a cloudy day or in the summer or on a Tuesday or …

            I agree that the way a variable gets set to whatever value it takes is typically under-specified. Maybe ineliminably under-specified. And I agree that this sometimes matters, for example when I am trying to replicate someone else’s experiment. But I’m not yet seeing how appealing to instruments — giving more specificity about how an intervention is actually carried out — leads to greater conceptual clarity about causation in general. In this regard, you seem to think that directly setting the value of a variable is unclear or unhelpful or mysterious in some way. Do you find it similarly mysterious or unhelpful or unclear when someone says that they set the variable Goes_to_Beach to “yes”? If so, doesn’t that land you in a vicious regress? After all, if you have to have an instrumental variable in order to understand any proposed intervention/manipulation, then don’t you need an instrument for the instrument and another one for the second one and so on? If not — if you don’t think that setting Goes_to_Beach to “yes” is mysterious in the same way as setting Skin_Color to “dark” or whatever value you like –, then what is the difference between the two cases?

          • Andrew says:


            Yes, and to the extent that choosing a different day to go to the beach could yield a different effect, I’d want to include that. But that’s exactly the point: I’d like the discussion to be at the level of specified possible (or hypothetical) treatments, rather than what seems to me a context-free statement about altering a variable. It’s not a “vicious regress” (to use your term) because I have a much clearer sense of what it means to go_to_the_beach than to alter skin_color.

  3. BP says:

    Your interpretation of instrumental variables is a bit foreign to me, but it sounds like you’re getting at the local average treatment effect. You imagine one treatment effect if the instrument which moves skin color from one shade to another is tanning at the beach, and another if it is direct genetic manipulation, and another if it is some sort of surgery as an adult. None of these identify the ATE of skin color on intelligence, but they are presumably better than nothing.

    My bigger philosophy of science concern is about manipulability, i.e. no causation without manipulation.

  4. Manoel Galdino says:

    I’m a bit surprised that people are missing what Gelman is saying. The whole point is that when you set the value of the variable Skin_Color, you define what is the causal effect that you’re supposed to estimate. Just talking about setting the value of the variable isn’t the same as talking about the causal effect of skin_color. There is no such thing as the causal effect of skin color. There is only the causal effect of skin_color against a counter-factual world, in which we define precisely the instrument (or the treatment, if you prefer). In other words, there is only the causal effect of a specific value of skin_color.

    I had a problem like this in my current job a few weeks ago. Our client asked us to devise an experiment to test the causal effect of segmenting (targeting) an e-mail marketing campaign. The problem is that they didn’t realize that they were asking us to carry out a meaningless task. Without defining what would be the counter-factual world, I can’t devise an experiment. In fact, it was quite hard to explain to her that the implicit counter-factual (no segmentation) didn’t make sense in the context she was talking about. What she really wanted was to test if a good segmentation (in a definite, operational sense) was better than a poor segmentation (in a definite, operational sense as well). The problem is that she lacked the potential outcome/IV framework to think clearly about causality. I think it would really help if people thought in this way.

    • Ricardo says:

      I don’t think people are misunderstanding the issue. It is fair to say there are some variables for which one cannot imagine a practical way of changing Y only through X. One can think using potential outcomes if so desired, but this is still just a blackbox prediction problem given a regime indicator one can still define mathematically anyway. In any case, if we fail to accept a well-defined invariant notion of intervention on X, we have two options: i) give up and estimate E[Y | I] (which might answer a question nobody cares about, e.g. the effect of going to the beach on intellect); ii) add some untestable assumptions that link I to X and Y and somehow accept that whatever notion of causal effect of X on Y is obtained is better defined than a perfect intervention on X – which I don’t buy at all.

      • Andrew says:


        In general I don’t find it so helpful to work with concepts such as “only through X.” I’d prefer to think of potential interventions that change skin color (or whatever) and go from there.

        • Ricardo says:

          Fair enough. Personally, I think it is easier to think in terms of variables for which one feels comfortable postulating the notion of a perfect intervention, even if not humanly possible. Ceteris paribus is an assumption, not a law, as you have discussed in many nice examples through the blog. But, really, my take is that if one cannot possibly hypothesize a perfect intervention for X, then one should throw it away and not try to preserve it by conjuring an IV – because whatever the “causal effect of X on Y” is, this will still not make any sense to me regardless of how clever the choice of IV is (don’t get me wrong, I think IV analysis can be very helpful when one cannot manipulate X in practice – but this will not make the problem of an ill-defined causal effect of X on Y go away). In this situation, I see no shame in settling for estimating E[Y | I] if this is a reasonably interesting question.

  5. Ilir Deebran says:

    Your views may change if you make the right things continuous and the right things discrete. Alex Gheg has a new consumer theory that makes more accurate predictions and allows the measurements of previously hidden thoughts, which means we can measure utility growth. Quantity, quality, variety and convenience in one equation.

  6. Fernando says:

    Assumptions are just assumptions.

    According to Pearl, to identify the effect X -> Y using instrument Z you need three conditions, not two:

    1. Z -> X
    2. not(Z -> Y)
    3. not exist u s.t. Z Y

    The latter merely rules out unobserved common causes that affect both the instrument and the outcome. Might be trivial if we are doing the manipulation and Z is randomized, not otherwise.

    Why don’t you like assumption not(X -> Y) yet are happy with assumption – not(Z->Y)?

    PS Is it possible to add *Markdown* markup language in comments? Combined with Mathjax would help the discussion (test $\sum_i$).
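
The instrument conditions listed in this comment can be illustrated with a small simulation. The sketch below is hypothetical throughout: the data-generating model, the coefficient values, and the helper names `cov` and `simulate` are assumptions chosen for illustration, not anything taken from Pearl or from the thread. It compares a naive slope of Y on X with the instrumental-variables ratio cov(Z, Y) / cov(Z, X), first when Z has no direct effect on Y and then when that exclusion condition is violated.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(direct_z_to_y=0.0, beta=0.5, n=50000, seed=1):
    """Return (naive slope of Y on X, IV ratio estimate) for simulated data."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unobserved common cause of X and Y
        z = rng.randint(0, 1)  # randomized, so u does not affect Z (condition 3)
        x = 1.0 * z + u + rng.gauss(0, 1)  # Z -> X (condition 1)
        # Condition 2, not(Z -> Y), holds only when direct_z_to_y == 0.
        y = beta * x + u + direct_z_to_y * z + rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    naive = cov(xs, ys) / cov(xs, xs)  # confounded by u
    iv = cov(zs, ys) / cov(zs, xs)     # ratio of Z's effect on Y to its effect on X
    return naive, iv


naive_valid, iv_valid = simulate(direct_z_to_y=0.0)
naive_bad, iv_bad = simulate(direct_z_to_y=0.3)
print(f"exclusion holds:    naive = {naive_valid:.2f}, IV = {iv_valid:.2f} (true beta = 0.5)")
print(f"exclusion violated: naive = {naive_bad:.2f}, IV = {iv_bad:.2f} (true beta = 0.5)")
```

Under these assumed numbers, the IV ratio should land near the true coefficient only when Z has no direct path to Y; with a direct path it absorbs that path as well, which is the sense in which a "no direct effect" assumption is built into the method.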

    • Andrew says:


      I have no idea what is the meaning of not(). As explained above, I find it helpful to think in terms of hypothetical interventions or instruments.

      • Fernando says:

        not(Z->Y) is a short way of saying Z does not affect Y directly.

        If I read you correctly you criticize Elias for making statements of the sort “Assume X has no effect on Y” or not(X->Y).

        But an exact same statement is implicit in any variable regarded as an instrument.

        • Andrew says:


          I’m not criticizing Elias. In fact, I explicitly wrote, “That’s ok, that’s my problem not his.” I’m just giving my perspective, under which there are different possible ways to alter skin color. I find it helpful to specify a single (possibly hypothetical) intervention, or some set of possible interventions. That’s how I think of causal effects, which is why I have difficulty talking about the effect of skin color in the abstract without considering the intervention.

          • I agree with Andrew that an instrument “helps one’s thinking about causality”, but if the aim is to help our thinking about causality, there are other sources of information beside familiarity with a specific instrument that can help us think. For example, why not use scientific knowledge, “the moon causes the tides”, “earthquakes cause damage”, etc. Such knowledge should help us think even though we have no instrument in mind.  Apropos, here is a quote from a recent article by Bollen and Pearl, “Eight Myths about Causality …”

            Myth #3 No causation without manipulation.

            In an influential JASA article on causal inference, Paul Holland (1986:959) discusses the counterfactual or potential outcome view on causality. Among other points, Holland (1986:959) states that some variables can be causes and others cannot: “The experimental model eliminates many things from being causes, and this is probably very good, since it gives more specificity to the meaning of the word cause. Donald Rubin and I once made up the motto

            NO CAUSATION WITHOUT MANIPULATION

            to emphasize the importance of this restriction.”

            Holland uses race and sex as examples of “attributes” that cannot be manipulated and therefore cannot be causes and explicitly criticized SEMs and path diagrams for allowing arrows to emanate from such attributes.

            We have two points with regard to this myth: (1) we disagree with the claim that the “no causation without manipulation” restriction is necessary in analyzing causation and (2) even if you agree with this motto, it does not rule-out doing SEM analysis.

            Consider first that the idea that “no causation without manipulation” is necessary for analyzing causation. In the extreme case of viewing manipulation as something done by humans only, we would reach absurd conclusions such as there was no causation before humans evolved on earth. Or we would conclude that the “moon does not cause the tides, tornadoes and hurricanes do not cause destruction to property, and so on” (Bollen 1989:41). Numerous researchers have questioned whether such a restrictive view of causality is necessary. For instance, Glymour (1986), a philosopher, commenting on Holland’s (1986) paper finds this an unnecessary restriction. Goldthorpe (2001:15) states: “The more fundamental difficulty is that, under the – highly anthropocentric – principle of ‘no causation without manipulation’, the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.”

            Bhrolchain and Dyson (2007:3) critique this view from a demographic perspective: “Hence, in the main, the factors of leading interest to demographers cannot be shown to be causes through experimentation or intervention. To claim that this means they cannot be causes, however, is to imply that most social and demographic phenomena do not have causes—an indefensible position. Manipulability as an exclusive criterion is defective in the natural sciences also.” Economists Angrist & Pischke (2009:113) also cast doubt on this restrictive definition of cause.

            A softer view of the “no causation without manipulation” motto is that actual physical manipulation is not required. Rather, it requires that we be able to imagine such manipulation. In sociology, Morgan and Winship (2007:279) represent this view: “What matters is not the ability for humans to manipulate the cause through some form of actual physical intervention but rather that we be able, as observational analysts, to conceive of the conditions that would follow from a hypothetical (but perhaps physically impossible) intervention.” A difficulty with this position is that the possibility of causation then depends on the imagination of researchers who might well differ in their ability to envision manipulation of putative causes.

            Pearl (2011) further shows that this restriction has led to harmful consequence by forcing investigators to compromise their research questions only to avoid the manipulability restriction. The essential ingredient of causation, as argued in Pearl (2009:361) is responsiveness, namely, the capacity of some variables to respond to variations in other variables, regardless of how those variations came about.


          • revo11 says:

            can’t reply one level down..

            I think the issue is not that human manipulation has to occur for causation to exist, but that the manipulation (not necessarily by a person) has to be specified for a causal effect one is estimating to be well defined.

            For example, I might make a claim, _without_ manipulation that “taking drug X will cure Y”. I then run a trial where patients ingest X, nothing happens, and I conclude that the statement “drug X will cure Y” is false. However, it turns out that the drug had to be taken intravenously in order to be effective. The problem is that without specifying the specific manipulation, what I am measuring the causal effect of is not precisely defined because “taking drug X” corresponds to many different meanings with different effects.

            Perhaps you could say that this is a straw man, that the “cause” needs to be defined more precisely as an “intravenously taken dose”, but then the endpoint of defining the cause as precisely as necessary for the effect to be well-defined is that one has defined a manipulation.

          • Ricardo says:

            I think revo11 got it exactly right. We cannot really disentangle the notions of causes and manipulations. Variables have to be precisely defined, and I’d rather not let its definition change according to different choices of manipulation but according to what it means to be “perfectly intervened on”.

          • Andrew says:


            The variable “skin color” can be perfectly defined using some optical device. But I think that any statement about “the effect of skin color” is most clearly defined by stating (at least) one potential intervention. Otherwise I really don’t know what people are talking about. See my discussion of Sloman’s book as linked in the above blog post. I think people write all sorts of sloppy things about causality.

          • Ricardo says:


            Thanks for the time taken with your comments, I really appreciate it. I don’t want to abuse your patience (as usual, this blog has been a great forum), so let me just make one more comment.

            I certainly agree from the very beginning that one has to think about which manipulations we are interested in, but I don’t see an easy way to disentangle it from the very definition of the measurements being made.

            Even “Skin color” is not that straightforward to agree on. Is it pigmentation due to genetic factors? Due to exposure to the sun? Body paint? Raw measurements from naked eye? (why?) Hold on, is the researcher really interested in the effects of skin color, or of something else (genetic background?) of which skin color is an indirect observation? We know that there are different “types” of skin color if we know there are different ways of manipulating it. It is all circular. We need to have some reasonable amount of information about the measurements because otherwise our manipulation can be a confounder of both X and Y (does sun tanning take time away from mental exercises to improve intelligence?), and in this case we don’t really learn anything. Yes, I agree there is a lot of sloppiness in causal research, but I see it all starting with measurement.

    • Fernando says:

      My comments get garbled in HTML. Assumption 3 should read:

      3. not exist u s.t. u->Z and u->Y

      When I put u in the middle of Z and Y with arrows, neither u nor the arrows are rendered….

      • * On assumptions (or, “skin color does not affect intellectual capacity”-type of claim as a logical necessity for the IV analysis) 
        I do not have much to add to the technical comments made by Ricardo and Fernando, the assumptions are not personal preferences but embedded in the IV analysis, I just want to make this point more explicit.

        A claim of the type ‘skin color does *not* affect intellectual capacity’ means that there is no direct causal effect of ‘skin color’ on ‘intellectual capacity’, and in the instrumental variable analysis, at least as understood in Epidemiology, Economics, and CS, the assumption that the instrument does NOT affect the outcome is needed, i.e., the claim that there is no direct effect of I on Y (only through X). (Such assumptions are usually called ‘exclusion restrictions’ in Econometrics, and are depicted as missing arrows in the graphs.)

        In other words, the IV scenario encompasses several assumptions, and one of them is precisely of ‘no direct effect’-type, which means that, technically speaking, we cannot avoid judging the plausibility of this type of claim in order to proceed with the IV analysis. 


  7. Mayo says:

    There’s a difference between a concept being meaningful and your being able to operationalize it; much less does a concept’s meaningfulness depend on being able to give instruments for its detection.

    • K? O'Rourke says:

      I think you are correct that some clarity may come from a semiotic or representational analysis – what is being represented for what purpose?

      Andrew may be thinking in terms of Pearl’s responsiveness in trying to get at what is responsive to what I mean/define as “change in skin colo[u]r”.

      (These responsivenesses can be represented – but not empirically verified with any certainty [models always mis-represent if it was _reality_ that was intended to be represented].)

  8. Steve Sailer says:

    When Americans say “skin color” they don’t mean skin color, they mean race, which is, basically, who you are descended from. You can’t change who your ancestors are.

    • Andrew says:


      Could be. That’s why I prefer to consider a specific comparison (or set of comparisons) rather than just using a phrase such as “skin color” as if its causal properties are clear. To me, it’s difficult to understand a claim such as “skin color does not affect intellectual capacity” without knowing what way it is that skin color could be altered, whether by genes or tanning or whatever.

      • Fernando says:

        I think this gets to construct validity. There is a distinction between our concept of X and an actual manipulation of X.

        For example, we talk about whether labor training programs are effective in changing employment status or increasing compensation. But each training program is very different. Some only recruit youth, others women; some involve formal teacher-pupil instruction, others are seminar-type group learning, whatever.

        So if X is the concept of labor training, X=x is an actual implementation (x could be a vector of characteristics). The question is whether we care about X -> Y or x -> Y. A single experiment identifies the latter. A sequence of them may identify the former.

        BTW this is a hierarchical model where X may contain population parameters and x is a member of the population: much like Bois et al.

        • Fernando says:

          OK, I think I’ve found what Andrew is getting at.

          Define DAG G(X -> Y; X Y) where X causes Y but effect is confounded by unobserved U.

          An experiment to identify the effect of X on Y is an instrument Z that yields G'(Z -> X -> Y; X Y). This allows us to identify the effect of X on Y using IV.

          Z is an experimental protocol, an intervention, an encouragement to take X.

          In thinking about causality it might be more natural to think only about Zs.

          • Fernando says:

            Again HTML suppresses my notation. G(X -> Y; X Y) should read G(X -> Y; U -> X; U -> Y). When I type X U Y with arrows in between X U and Y, the arrows and the U disappear in HTML so the published statement makes no sense whatsoever.

            If this blog had Mathjax enabled we could write $X \leftarrow U \rightarrow Y$ and avoid confusion.

  9. Phil says:

    Elias goes on to quote Bollen: “Consider first that the idea that “no causation without manipulation” is necessary for analyzing causation. In the extreme case of viewing manipulation as something done by humans only, we would reach absurd conclusions such as there was no causation before humans evolved on earth. Or we would conclude that the “moon does not cause the tides, tornadoes and hurricanes do not cause destruction to property, and so on” (Bollen 1989:41).” A good point! We can indeed talk about causation, and even be sure that it occurs, even when there is no manipulation and no prospect of making the necessary manipulation. (We are not going to be able to remove the moon or to eliminate hurricanes and tornadoes).

    But I think it’s worth noting that at least we can perform “thought experiments” to elucidate what we mean by claims like “the moon causes tides”: we can _imagine_ eliminating the moon, and agree that when we say “the moon causes tides” we mean that “without the moon there would be no tides.” Well, actually there would still be tides because the sun causes them too. So maybe we mean “the tidal range would be lower without the moon,” or even “tides would be very different if there were no moon.” Reasonable people could probably agree on what manipulations could be performed to test the hypothesis that “the moon causes tides”, even if those manipulations are not possible. And in this case, since the physics of the situation is well understood, it is possible to confirm through calculation that this is true.

    In contrast, as Andrew points out in his post, reasonable people might not agree on what is meant by the assertion that skin color affects intelligence.