The butterfly effect: It’s not what you think it is.

John Cook writes:

The butterfly effect is the semi-serious claim that a butterfly flapping its wings can cause a tornado halfway around the world. It’s a poetic way of saying that some systems show sensitive dependence on initial conditions, that the slightest change now can make an enormous difference later . . . Once you think about these things for a while, you start to see nonlinearity and potential butterfly effects everywhere. There are tipping points everywhere waiting to be tipped!

But it’s not so simple. Cook continues:

A butterfly flapping its wings usually has no effect, even in sensitive or chaotic systems. You might even say especially in sensitive or chaotic systems.

Sensitive systems are not always and everywhere sensitive to everything. They are sensitive in particular ways under particular circumstances, and can otherwise be quite resistant to influence.

And:

The lesson that many people draw from their first exposure to complex systems is that there are high leverage points, if only you can find them and manipulate them. They want to insert a butterfly at just the right time and place to bring about a desired outcome. Instead, we should humbly evaluate to what extent it is possible to steer complex systems at all. We should evaluate what aspects can be steered and how well they can be steered. The most effective intervention may not come from tweaking the inputs but from changing the structure of the system.

Yes! That’s an excellent, Deming-esque point.
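
To see both halves of Cook’s point in one toy example, here is a minimal Python sketch using the logistic map at r = 4, a standard chaotic system (the starting value, the 1e-10 perturbation, and the horizon are arbitrary choices for illustration). Nearby trajectories diverge exponentially, yet the perturbation only reshuffles which path occurs; it gives no handle for steering:

```python
# Logistic map x -> r*x*(1-x) with r = 4, which is chaotic.
r = 4.0

def trajectory(x0, n):
    """Iterate the map n times starting from x0."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = trajectory(0.2, 51)            # unperturbed run
b = trajectory(0.2 + 1e-10, 51)    # "butterfly": a 1e-10 nudge

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: |difference| = {abs(a[step] - b[step]):.2e}")
# The gap grows roughly like 2^step (Lyapunov exponent ln 2) until it
# saturates at order 1: the nudge changes which trajectory you get,
# but it does not let you steer the system toward a chosen outcome.
```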

Bradley Groff pointed me to the above-linked post and noted the connection to my recent note on the piranha principle, where I wrote:

A fundamental tenet of social psychology, behavioral economics, at least how it is presented in the news media, and taught and practiced in many business schools, is that small “nudges,” often the sorts of things that we might not think would affect us at all, can have big effects on behavior. . . .

The model of the world underlying these claims is not just the “butterfly effect” that small changes can have big effects; rather, it’s that small changes can have big and predictable effects. It’s what I sometimes call the “button-pushing” model of social science, the idea that if you do X, you can expect to see Y. . . .

In response to this attitude, I sometimes present the “piranha argument,” which goes as follows: There can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data.

I’m thinking of social science and I’m being mathematically vague (I do think there’s a theorem there somewhere, something related to random matrix theory, perhaps), whereas Cook is thinking more of physical systems with a clearer mathematical connection to nonlinear dynamics. But I think our overall points are the same, and with similar implications for thinking about interventions, causal effects, and variation in outcomes.
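
For what it’s worth, the piranha argument can be made concrete in a toy simulation (a sketch only; the sample size, the 100 candidate causes, and the 0.5 effect size are all invented for illustration). If the outcome is standardized and the candidate causes are roughly independent, the sum of their squared standardized effects is bounded by the outcome’s variance, so only a few effects can be large:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 100
X = rng.standard_normal((n, k))   # 100 roughly independent candidate causes
b = np.full(k, 0.5)               # suppose each had a "large" half-sd effect
y = X @ b + rng.standard_normal(n)

print(round(y.std(), 1))          # ~5.1: the outcome's scale explodes
# For a human-scale outcome with Var(y) = 1, independence would force
# sum(b_j^2) <= 1, allowing at most four effects of size 0.5. Many large
# effects can coexist only by interfering with one another, which is
# exactly what makes them unstable in observational data.
```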

P.S. This is related to my skepticism of structural equation modeling, path analysis, and similar approaches that have been used in some quarters of sociology and psychology for many years and promoted in slightly different form by Judea Pearl and other computer scientists: these methods often seem to me to promise a sort of causal discovery that cannot realistically be delivered and that in many cases I don’t think even makes sense (see this article, especially the last full paragraph on page 960 and the example on page 962). I see this as connected with the naive view of the butterfly effect described above, the attitude that if you just push certain buttons in a complex social system, you can get predictable results.

In brief: I doubt that the claims deriving from such data analyses will replicate in new experiments, but I have no doubt that anything that doesn’t replicate will be explained as the results of additional butterflies in the system. What I’d really like is for researchers to just jump to the post-hoc explanation stage before even gathering those new validation data. The threat of replication should be enough to motivate people to back off of some of their extreme claims.

To speak generically:
1. Research team A publishes a paper claiming that X causes Y.
2. Research team B tries to replicate the finding, but it fails to replicate.
3. Research team A explains that the original finding is not so general; it only holds under conditions Z, which contain specifics on the experimental intervention, the people in the study, and the context of the study. The finding only holds if the treatment is done for 1 minute, not 3 minutes; it holds only in warm weather, not cold weather; it holds only in Israel, not in the United States; it works for some sorts of stimuli but not others.
4. Ideally, in the original published paper, team A could list all the conditions under which they are claiming their result will appear. That is, they could anticipate step 2 and jump right to step 3, saving us all a lot of time and effort.

P.P.S. This post was originally called “Of butterflies and piranhas”; after seeing some comments, I changed the title to focus the message.

47 thoughts on “The butterfly effect: It’s not what you think it is.”

  1. “Cook is thinking more of physical systems with a clearer mathematical connection to nonlinear dynamics.”

    The math is control systems theory. In general, the most interesting thing about a control system is how it maintains stability, not how it gets perturbed, but the small input change that results in a big output change gets the attention because it is the “man-bites-dog” story.

    The butterfly was always a lousy choice. The atmosphere is overdamped by orders of magnitude with respect to the effects of a butterfly flap. Better to start with a pebble that rolls down a hill, causing a catastrophic flood in the city. That hews much more closely to how instability works.
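
    To illustrate with a minimal sketch (the damping values and the size of the kick are made up): a damped oscillator given a tiny impulse, the toy version of a butterfly flap hitting an overdamped medium.

    ```python
    # Damped oscillator x'' + 2*zeta*x' + x = 0, kicked with x'(0) = 1e-6.
    def peak_after_kick(zeta, steps=20_000, dt=0.01):
        x, v = 0.0, 1e-6                      # the "butterfly flap"
        peak = 0.0
        for _ in range(steps):
            v += (-2.0 * zeta * v - x) * dt   # semi-implicit Euler step
            x += v * dt
            peak = max(peak, abs(x))
        return peak

    print(peak_after_kick(zeta=5.0))   # overdamped: response stays ~1e-7
    print(peak_after_kick(zeta=0.05))  # lightly damped: larger, still bounded
    # Either way the flap decays. Amplification requires an unstable
    # equilibrium with positive feedback, like the pebble poised on the hill.
    ```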

  2. “A fundamental tenet of social psychology, behavioral economics, at least how it is presented in the news media, and taught and practiced in many business schools, is that small “nudges,” often the sorts of things that we might not think would affect us at all, can have big effects on behavior (…). There can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data.”

    I wonder if, and how, this relates to a possibly neglected (but crucial) aspect of science: the principle of parsimony. According to this source (http://www.oxfordreference.com/view/10.1093/oi/authority.20110803100346221), it can be defined as:

    “The principle that the most acceptable explanation of an occurrence, phenomenon, or event is the simplest, involving the fewest entities, assumptions, or changes”

    I am not well-versed in the philosophy of science, so please correct me if I am wrong about any of this, but I get the feeling the principle of parsimony hardly ever gets mentioned in psychology, for instance.

    I think this has possibly contributed to many sub-optimal practices in psychology, for instance re-inventing the wheel by coming up with new names/constructs for existing ones (e.g. “grit” vs. “conscientiousness”), and emphasizing/hyping research into tiny, highly variable effects.

    Perhaps you could even go so far as to say that not (attempting to) include the constructs or variables that have much larger effects paves the way for gross misinformation and a skewed picture of reality.

      • Thank you for the links. Next to not being well-versed in the philosophy of science, I am also not well-versed in statistics, so please forgive me if I say something nonsensical in the following.

        I assume the scientific goal of, let’s say, psychology is to explain behavior. If this is correct, I reason this explanation should come with some sort of weight for the variables involved, or some comparison to other things that might explain behavior.

        In one of the links you provided you wrote the following: “Maybe it’s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that’s even better.”

        I think what I am referring to with the principle of parsimony is that 1) you can use as many variables (parameters?) as you like in your explanation, as long as there is some information about their relative weights/importance, and 2) you could (should?) perhaps focus your attention (and resources) on those variables (parameters?) which explain the most, or best.

        If not, what is stopping you from adding thousands of variables (parameters?) to your study? Perhaps they will each explain a small fraction of behavior, but I would reason that would not be very useful, even though technically their inclusion could result in “a better” explanation from a statistical point of view (i.e. higher total variance accounted for).

        I am not sure if this reasoning makes scientific sense, though; like I said, I am bad with statistics and know almost nothing of the philosophy of science. It’s just what I intuitively think could be important in (the goal of) science. If I am not mistaken, this reasoning fits nicely with some statistical analyses where there is talk of “unique variance explained,” etc. I also wonder if, and how, this reasoning could relate to Meehl’s “everything correlates with everything” point (https://journals.sagepub.com/doi/10.2466/pr0.1991.69.1.123).

        Or in different words:

        1) If the goal of, let’s say, psychology is to explain behavior, I reason there should be (at least) mention of the (relative) weight/importance/size of the variables in predicting this behavior.

        2) I would additionally reason that it could be even better if there were a research program that systematically adds and compares different variables, to determine the relative and unique weights of all possible variables (explanations) and in turn find the “best”, “simplest”, or “most parsimonious” explanation.

        If 1) and 2) are done, I reason you are adhering to what I take to be the most crucial part of my interpretation of the principle of parsimony. I feel 1) is sometimes done, and 2) not so much. But again, I don’t know if any of this makes any sense whatsoever :)

      • Quote from above: “I see the “nudge tenet” as a warped version of the principle of parsimony.”

        I view the “nudge tenet” as a possible result of not adhering to the principle of parsimony in research.

        In my reasoning, all this research on nudges seems to be about tiny, highly variable, and/or non-replicable effects, which I reason is a possible result of a certain type of research that is probably based on small sample sizes, p-hacking, bad “theorizing”, etc.

        I reason that this could all be a direct result of bad practices, in combination with (and facilitating) not giving priority to the principle of parsimony in theorizing and experimental study design (“parsimony” in the sense of trying to theorize with, and find, the most important/biggest influencing variables to explain behavior).

        • OK — different definitions of parsimony. What I was thinking of is that the “nudge” theories are often “promoted” as giving simple explanations.

  3. Most systems observed in the world are at least semistable. We have a version of the anthropic principle to thank for this: if either people or the physical world were radically unstable, it’s quite unlikely we’d be around to observe it. A world in which people were prone to large effects from small stimuli would produce personality instability unless either (a) the stimuli themselves were so numerous as to be normally distributed (in which case individual stimuli wouldn’t matter much… I take this to be your piranha principle) or (b) humans themselves could perceive the imbalance and instantiate countervailing change. (If the ground shifts under you, you shift your weight to maintain balance.) What we observe out in the world is a combination of large measures of both (a) and (b), since otherwise we’d all be an inconsistent stew of characteristics.

  4. I think the butterfly effect is more of a description of the system/nature/true-data-generating-mechanism, whereas the piranha argument is more of a requirement of the statistical model being used in order to give a more robust/reliable/replicable finding. Can one really do statistics in a truly butterfly world?

  5. I’ve always thought the butterfly effect is hooey. It seems based on a version of the ‘Eve’ idea that there was an original ‘woman’ from whom we all descend, and that if she hadn’t existed then all humanity would not exist. Another is the Hitler story: if Hitler had succeeded as a painter or been killed in WWI, then … Both of these miss the point that they’re premised on counterfactuals, phrased as though they happened when they didn’t, as though you could alter one piece of what actually happened, leave the rest, and something huge would turn out differently. Yep, maybe that person killed by a terrorist or drunk driver might have cured cancer! And we all live happily ever after.

    This stuff really bothers me mathematically, because if there is one thing that math teaches us, it’s that things are complicated: they combine in vast numbers of ways, some of these ways are relatively stable, some are useful as analytical tools, and a few are actual constants. The difficulty of discovering what is constant is evident in string theory’s inability to connect the complex to physically measurable existence. And by complex I mean complex numbers, because all of these reductions in fact involve reducing complexity to some area or point that we can treat as real in some meaningful perspective. That last sentence includes the idea that they remain complex in other perspectives, and yet there is no mechanism to restrict that complexity other than what often reads to me like a recitation of the old song ‘Dem Bones’, where the knee bone’s connected to the hip bone and the hip bone’s connected to the … except that works with an actual freaking skeleton as a model, so you can point at the connections. It gets extremely complex when you add dimensions, like one of them bones has metastasized cancer, or there’s a calcium deficiency, or the marrow isn’t working right.

    I can’t remember if it was Feynman who said something on the order of: it’s unappreciated how many ways you can combine basics like pi, etc., to come up with things that don’t mean anything at all other than that they combine. To me, that’s like saying: here’s a big composite number, now you tell me how many ways you can get to that number. Or to a number related to it, either by the process you used – or didn’t use – and thus by all the numbers all those choices open up.

    This comes up a lot with primes. A simple example is cryptography, which isn’t simple, but whose idea is: specify a single address, or a series of them, that can’t be duplicated in some other way. That’s one reason Fermat’s speculation is so important: if you can’t make anything but squares add up perfectly, then there is no backdoor. In physics, this translates into a lack of hidden variables that explain non-classical probability results. But mostly by primes I mean that not many people – certainly not many social science researchers – have a clue that prime has meaning beyond being divisible only by itself and 1. They don’t even grasp the concept of coprime, even though that’s a basic abstraction of the concept, so that prime applies to numbers that aren’t prime. One way of phrasing what they do is: they come up with compound answers that they think are prime, meaning they come up with ways to get from A to Z (or O to Z if you’re L. Frank Baum) as though those answers are generally prime, when they are only unstably prime, meaning ‘unique’ or ‘specifically identifiable’ only when you identify the uniqueness. They misunderstand that the answer itself is compound and that at best they are identifying one combinatorial or factorial method for reaching a compound answer that admits an indefinite number of other methods.

    • “if there is one thing that math teaches us it’s that things are complicated, that they combine in vast numbers of ways, that some of these ways are relatively stable, that some are useful as analytical tools and a few are actual constants”

      I see this as something that math *ought* to teach us, but that math classes often do not. It is something that I have tried to teach my students, but it is difficult to teach, since so many of them come in with the preset notion that math is about memorizing and learning algorithms and facts. This is why I volunteered to teach prospective math teachers — because they can’t teach real math if they have the memorize-facts-learn-algorithms mindset.

      • Quote from above: “This is why I volunteered to teach prospective math teachers — because they can’t teach real math if they have the memorize-facts-learn-algorithms mindset.”

        This made me think of several things i recently read in a Bruce Lee book called “Striking thoughts: Wisdom for daily living”. Here are 2 of them:

        1) “The truth is outside of all set patterns. – Conditioning is to limit a person within the framework of a particular system. All fixed set patterns are incapable of adaptability or pliability. The truth is outside of all fixed patterns.”

        2) “Be flexible to change with change. – Be flexible so you can change with change. Empty yourself! Open up! After all, the usefulness of a cup is in its emptiness.”

  6. It seemed to me that the real point of the butterfly effect was that it is futile to assume you can always find the singular cause.

    But the piranha principle to me isn’t that the piranhas will all eat each other, which is to say that the causal effects of tiny changes will undo the causal effects of all the others. The piranha principle seems to me to be that since you can’t truly identify the tiny cause that leads one piranha to swim this way and attack there, you have to look at the rules describing how the whole school of fish behaves.

    The notion that causality necessarily implies predictability still seems to me pretty common. Talking about the butterfly effect seems to me to be still necessary.

    The notion that causal explanation has to be reductionist, that you have to talk about the rules governing the behavior of whatever your analysis reduces the phenomenon to, and that you have to be able to predict how it does this, seems to me somewhat outmoded.

    • Steven:

      I think one of the problems is that, in traditional statistical education (including the education that I received, and including much of the perspective of our books), you can and indeed should estimate causal effects in a completely atheoretical way, with no model of the underlying process. I’ve been coming to think that this “reduced form” approach is a dead end.

      • +1/2 — there remains the problem that models of the underlying process can be way off in fantasy land — yet people believe in them and “torture the data” to purportedly support their model.

        • I recall someone saying “Statisticians are like ageing rock stars: they get far too attached to their models.”

        • It’s not just statisticians — it’s often non-statisticians who use statistical models in a rote way (or because “that’s the way we’ve always done it in our field”) rather than considering the individual circumstances of the study.

  7. It’s interesting to look at the history of the ‘butterfly effect’ metaphor. Lorenz only presented it at an obscure symposium, and it was not picked up again (I think) until Gleick dug it up for ‘Chaos’. I recently wrote about it as one of the two most misinterpreted analogies in science: http://metaphorhacker.net/2018/11/cats-and-butterflies-2-misunderstood-analogies-in-scientistic-discourse. (The other one is Schrödinger’s cat.)

    It is meant to be a metaphor for sensitivity to initial conditions, but as Lorenz himself pointed out, it is actually a metaphor for measurement error (something that should be near and dear to this blog). In complex systems, we don’t really have initial conditions, only a point of measurement and a point of the predicted state. In fact, even seemingly perfectly linear systems look linear only because we picked scales (time and space) on which we can measure them well enough on both ends. Zoom in or out a bit and it’s chaos again.

    But the bottom line is that any time anybody mentions the butterfly effect, you can pretty much guarantee that they don’t understand complexity. What often follows is the assumption that because they identified a flap, they can now trace the path directly to the hurricane.
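
    The measurement-error reading has a sharp consequence: with exponential error growth at rate lambda, the usable forecast horizon grows only logarithmically in instrument precision. A quick sketch (the growth rate and tolerance are illustrative):

    ```python
    import math

    lam = math.log(2)   # error doubles each step, as in many chaotic maps
    tol = 0.1           # forecast is "usable" until error exceeds 10%

    for e0 in (1e-3, 1e-6, 1e-9):
        horizon = math.log(tol / e0) / lam
        print(f"measurement error {e0:.0e}: horizon ~ {horizon:.1f} steps")
    # Each 1000-fold improvement in measurement buys the same ~10 extra
    # steps: chaos converts precision into forecast time only logarithmically.
    ```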

  8. These methods [of Pearl et al.] often seem to me to promise a sort of causal discovery that cannot realistically be delivered and that in many cases I don’t think even makes sense

    This is what was bothering me about Pearl’s claims. Yes, his methods can be useful for making a model’s causal assumptions clear, but how far does that get us? How intricate a causal web can we usefully analyze? It is often devilishly difficult to run even a single regression, so each link in the causal web is only vaguely understood, and errors propagate through the system.

    By contrast, Claude Shannon’s early work on analyzing how a network of circuits behaves was extremely helpful because each circuit is very reliable, so we can be confident that an analysis of a very complicated network of circuits will be accurate.
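
    A toy calculation of the contrast (the chain length, link strengths, and error levels are invented): the effect transmitted along a causal chain is the product of the link coefficients, so relative errors in the links compound.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    links = np.full(6, 0.8)        # a 6-link causal chain, total effect ~0.26
    true_effect = links.prod()

    def relative_error_of_chain(link_error, sims=10_000):
        """Perturb each link by a given relative error; return the spread."""
        noisy = links * (1 + link_error * rng.standard_normal((sims, 6)))
        return np.std(noisy.prod(axis=1)) / true_effect

    print(relative_error_of_chain(0.30))   # ~0.8: vaguely known links -> mush
    print(relative_error_of_chain(0.001))  # ~0.002: Shannon's reliable circuits
    ```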

    • Terry:

      I see Pearl as having two messages, one of which I strongly agree with and one of which I strongly disagree with.

      The message that I agree with is the statement that causal reasoning requires assumptions external to the data, and that these assumptions can be clarified using graphs of networks of variables.

      The message that I disagree with is the statement that causal relationships can be discovered using automated analysis of observational data.

      It seems to me that Pearl, along with his collaborators and followers, go back and forth between these two messages, which oddly enough seem to me to be in contradiction!

      • > The message that I disagree with is the statement that causal relationships can be discovered using automated analysis of observational data.

        Is that really his message? I think the point is that causal effects can be estimated from the analysis of observational data, given the causal assumptions you mention before and under some conditions.

        • Carlos:

          I’m never really clear on what Pearl’s message is, but it is my impression that he is sympathetic to methods that purport to discover causal relationships from observational data alone.

        • It’s possible to do both effect estimation and causal discovery from observational data under certain causal assumptions. I don’t think this is controversial, it’s just math. What could be controversial is how often (if ever) the required causal assumptions hold (or come close enough to holding) in realistic settings. Another thing that can be controversial is the utility of defining counterfactuals based on ill-defined interventions in a causal model. I don’t see what either of these potential shortcomings of Pearl’s causal inference framework have to do with the butterfly effect.
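
          To make the “just math” half concrete, here is a minimal simulated example (the graph and all coefficients are invented): in the graph X <- Z -> Y with X -> Y, the backdoor criterion says that adjusting for Z identifies the effect of X from observational data alone.

          ```python
          import numpy as np

          rng = np.random.default_rng(2)
          n = 100_000
          Z = rng.standard_normal(n)
          X = Z + rng.standard_normal(n)                  # confounded treatment
          Y = 2.0 * X + 3.0 * Z + rng.standard_normal(n)  # true effect of X is 2

          naive = np.polyfit(X, Y, 1)[0]                  # ignores the backdoor
          design = np.column_stack([X, Z, np.ones(n)])
          adjusted = np.linalg.lstsq(design, Y, rcond=None)[0][0]
          print(round(naive, 2), round(adjusted, 2))      # ~3.5 vs ~2.0
          # The estimation step is just math; the controversy is whether
          # the assumed graph describes reality.
          ```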

        • Z:

          The connection between Pearl’s assumptions and the butterfly effect is that I’ve seen attempts to extract causal inference from observational data using complex causal structures that I don’t believe. Stories such as the indirect effect of X on Y often seem to have a butterfly-effect flavor.

          An example would be that claim of the effects of subliminal smiley faces on attitudes toward immigration. This particular model was not fit using a Pearl-derived method, but I have a feeling that the methods he favors would give similar results.

        • Yeah mediation is subtle, and I don’t want to fully defend Pearl’s exact approach to it. In particular, I have some qualms with the assumptions required to estimate Pearl’s “Natural Direct Effect”. However, I do think that mediation questions can often be well defined in a counterfactual framework (e.g. I think I’ve seen you speak approvingly of Sobel’s framing on the blog), and causal graphs can be extremely helpful for encoding and evaluating assumptions required for estimating those effects. Jamie Robins explains how to do this. So I would say that causal inference employing causal graphs as an aid (which is what I think we’re talking about when we talk about Pearl) could actually be a safeguard against the type of analysis in the link.

          I also think that causal graphs can help *prevent* butterfly analyses. Often in discussions of causality in the blog comments some people will say that it’s necessary to model an entire process involving many variables to get at some effect of interest (which is real butterfly territory). Causal graphs make clear which (usually proper) subset of the variables in the system actually need to be modeled to get at an effect of interest.

  9. Robust reduced-form findings are the foundation of many disciplines. Maybe the study subject of social science is too complex for social science to be truly scientific right now.

  10. “if you just push certain buttons in a complex social system that you can get predictable results”

    In the cases in which you do get the results you predict, you often get additional results you didn’t predict (The Law of Unintended Consequences).

    Some examples:
    http://www.aei.org/publication/ten-examples-of-the-law-of-unintended-consequences/
    http://www.aei.org/publication/five-examples-of-the-law-of-unintended-consequences/

  11. Can you explain the piranha thing? I read the linked post and I’m still not sure I’m getting it.

    For example, there are thousands of easily identifiable causes that have extremely large and predictable effects on human mortality: being in a high velocity crash, eating cyanide, eating botulinum, being shot in the chest, exposure to high dose radiation… I could easily go on all day with these. This does not appear to present either an epistemological problem or a conceptual one.

    Why can’t the same be true of social behavior? It may be challenging to estimate effects in observational data if there are thousands of large causes of, say, voting behavior, but I would think we can solve that with sufficient sample sizes and appropriately designed work.

    It’s trivially true that there can’t be a large number of factors that each explain a large proportion of the ultimate variance in whatever outcome, but what’s the problem with a large number of factors whose independent, partial effects are large?

    • Joe:

      Good question.

      The difference is that the examples you give are rare, and they act immediately. The social science analogy would be something like this: being exposed to words relating to elderly people, having an older sibling, having a younger sibling, being in a certain part of the monthly cycle, having your local football team win or lose, being exposed to someone in an expansive or contractive posture, receiving implicitly racist signals, having an age ending in 9, riding in coach class on an airplane, hearing about a local hurricane, etc.: these are common, not rare, and they can all potentially interact.

      To put it another way, your mortality example is like a set of piranhas that are each in their own tank, whereas “Psychological Science” or “PNAS”-style psychology research is like a set of piranhas that are all in a tank together.

    • IMO social behavior also just has a MUCH larger number of competing influences operating at similar magnitudes, whereas the physical world’s effects are spread over a large range of magnitudes, so that at any given magnitude the number of competing factors is small. Say two speeding vehicles collide. There are also sand grains blowing in the wind, but the impact of a sand grain on the colliding cars is very, very small. OTOH, put twenty people in a room and measure the impact when two of them get into a loud heated argument: the impact on the arguers is probably larger than on the bystanders, but roughly of the same magnitude, and there are many interactions.

  12. The “piranha effect” is basically the same as the criticism of instrumental variables. People will use the same instrument to model a dozen phenomena, but that practice tends to invalidate the instrument, because some of these derivative phenomena will depend on each other.
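
    A quick simulated version of this point (all coefficients invented): if the instrument Z also moves a second channel X2 that affects the outcome, the exclusion restriction fails and the IV estimate for X1 absorbs the other channel.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    Z = rng.standard_normal(n)
    X1 = Z + rng.standard_normal(n)
    X2 = Z + rng.standard_normal(n)   # the instrument's "other" phenomenon
    Y = 1.0 * X1 + 1.0 * X2 + rng.standard_normal(n)   # true effect of X1 is 1

    iv_estimate = np.cov(Z, Y)[0, 1] / np.cov(Z, X1)[0, 1]   # Wald estimator
    print(round(iv_estimate, 2))   # ~2.0: the X2 pathway is credited to X1
    ```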

  13. “In brief: I doubt that the claims deriving from such data analyses will replicate in new experiments, but I have no doubt that anything that doesn’t replicate will be explained as the results of additional butterflies in the system. What I’d really like is for researchers to just jump to the post-hoc explanation stage before even gathering those new validation data. The threat of replication should be enough to motivate people to back off of some of their extreme claims.”

    1) How does this work with large data sets?

    For instance, I am worried that proposed large-scale “collaborative” projects will result in “data dredging”/“mining for significant results”, whose conclusions will not easily be in a position to be replicated and/or corrected in the future (due to the extremely large N, for instance). Also see http://datacolada.org/73

    2) Is there (or should there be) some sort of statistical correction when analyzing pre-existing large data sets?

    If I am not mistaken, even randomly generated large data sets may contain many spurious statistically significant correlations. I can imagine many spurious significant correlations, and other types of statistical findings, are present in actually collected large data sets.

    If you had, let’s say, 10 hypotheses with associated statistical tests and then collected the large data set, you would probably adjust the p-values. But I’m not sure something like this happens, or is even possible, for later analyses of a pre-existing large data set.

    Also, if I understood things correctly, a p-value of a certain statistical analysis is only “valid” if you decided upfront, before you gathered the data, what the statistical analysis would be. This also makes it easier to adjust the p-values in the case of multiple analyses. If this is correct, I wonder how this all relates to analyzing pre-existing large data sets. I wonder if the assumptions behind p-values and/or the statistical analyses could actually make p-values coming from analyses of large pre-existing data sets technically “invalid” (e.g. because you are not able to correct for multiple comparisons).

    In other words, if HARKing and p-hacking are “wrong” in little N = 52 studies, why is it okay to look for statistically significant findings among the many possible hypotheses concerning large pre-existing data sets?
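
    The pure-noise version of this worry is easy to simulate (a sketch; the sample size and the number of variables are arbitrary):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n, k = 1_000, 50
    X = rng.standard_normal((n, k))   # 50 mutually unrelated variables
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]

    hits = sum(stats.pearsonr(X[:, i], X[:, j])[1] < 0.05 for i, j in pairs)
    print(hits, "of", len(pairs))     # ~61 of 1225 "significant" by chance
    # A Bonferroni threshold of 0.05/1225, or a pre-registered analysis plan,
    # is the usual guard; with pre-existing data the analyst has to supply it.
    ```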

  14. Quote from above: “4. Ideally, in the original published paper, team A could list all the conditions under which they are claiming their result will appear. That is, they could anticipate step 2 and jump right to step 3, saving us all a lot of time and effort.”

    I think this sort of thing is being proposed in a recent paper on “constraints on generality”: https://www.ncbi.nlm.nih.gov/pubmed/28853993

    I never understood, or agreed with, that paper, though. Nor do I understand, or agree with, the quote above.

    I reason that either you investigate the actual different conditions in your studies, and then mention those in your paper, or you just stick to what you can conclude based on the study you performed.

    I reason that either you write some sort of review paper presenting actual evidence about the possible conditions under which a certain effect appears, or you just stick to writing about your specific study.

    I reason that trying to write about all the possible conditions under which a certain effect could or could not appear will probably boil down to guessing and/or storytelling: that’s not scientific, in my reasoning and understanding. It’s basically more of the same kind of BS “theorizing” and “reasoning” that can already be read in the introduction and conclusion sections of many psychology papers, and in my reasoning it’s a total waste of time and energy.

  15. “The message that I disagree with is the statement that causal relationships can be discovered using automated analysis of observational data.”

    Wish I could understand this. When my “check engine” light comes on, I hook up the code reader, and it tells me what the electronic diagnostic circuitry read. Then I (or my mechanic) fix it based upon what caused the anomalous performance. What am I missing? How deep into epistemology do I need to go for this to not make sense?

    • Matt:

      It depends on the context. With your car’s engine light, there’s a lot of theory and engineering built into the system, and we understand the connection between the engine trouble and the light going on. The analogy in social science would be a clean randomized experiment, in which the design and data collection give us identification.

      In observational data in social science, there is generally no such clear pattern. To use your analogy, someone might observe the engine in car A, the light in car B, and the circuitry in car C. No amount of analysis of such observational data would tell us much about causal relationships here.

    • The check engine light is not observational data. It’s a system designed specifically to detect and diagnose issues. It’s subject to a huge quantity of design and testing to ensure that it does its job.

      Observational data would be something like collecting the tire-wear patterns, paint oxidation, zip codes, and the race and educational attainment of the owners of cars brought in to have their transmissions fixed, and then, using some structural assumptions about people’s decision-making skills together with this observational data, inferring something about the causal effects of culture, education, and income on maintenance behavior and its resulting impact on longevity.

    • I would say the computer in your car is not discovering causal relationships automatically in anything near the sense that Andrew means. It is not trying to discover how cars in general work from the data it gathers. Instead, it just filters data through a calibrated theoretical model of how cars work.

  16. I’m not sure if the following story has a point or is relevant but…

    I still remember one of our mathematical physiology group meetings from when I was a PhD student.

    One of the students was presenting work on a model of cellular calcium dynamics (nonlinear ODEs with fast-slow dynamics etc). Our advisor stopped him part way through an explanation of some phenomena in the model to say something to the effect of: ‘you can’t explain this well by saying this goes up, so this goes down, so this goes down, so this goes up etc etc’.

    I took his point to be that we can write down equations governing the proposed underlying mechanisms for something like calcium oscillations in a cell (including many feedbacks etc) but it’s pointless to try to interpret/trace *each* of the individual ‘causes’ and ‘effects’ in such a system.

    The pathways are so interconnected that it quickly becomes meaningless to talk in such a way – whereof one cannot speak, thereof one must be silent?

    And yet we can still make progress by writing down systems of nonlinear ODEs and analysing their ‘emergent’ behaviour. So…shrug?
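
    For instance (a sketch, not the calcium model itself; all parameters are arbitrary): the van der Pol oscillator with strong fast-slow separation settles into a relaxation oscillation whose period is an emergent property of the whole feedback loop, not of any single “this goes up, so this goes down” step.

    ```python
    import numpy as np

    def van_der_pol_period(mu=10.0, dt=1e-4, T=100.0):
        """Integrate the fast-slow van der Pol system; return the period."""
        x, y = 0.5, 0.0
        xs = np.empty(int(T / dt))
        for i in range(xs.size):
            x += mu * (x - x**3 / 3.0 - y) * dt   # fast variable
            y += (x / mu) * dt                    # slow variable
            xs[i] = x
        ups = np.where((xs[:-1] < 0) & (xs[1:] >= 0))[0]  # upward crossings
        return np.diff(ups).mean() * dt

    print(van_der_pol_period())   # ~19 for mu = 10: a stable emergent rhythm
    ```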

  17. The piranha effect gives words to my unease with the “priming” literature. My existence exposes me to all kinds of primes that interfere with each other (including many that have not been studied and are thus unknown), and what is their cumulative effect?

  18. I had a discussion a while back with a colleague who works in hydrology about methods. He’s been working on causal discovery, and my initial response was “people try to do that?” I figured Gelman would have a discussion on this topic, and I was right!

    I’m trained as an engineer but mostly use social science in my research. My sense is that my colleague is working from a Pearl perspective, on which I agree with Andrew (the graphical approach is nice, but I don’t buy causal discovery in observational settings). Perhaps hydrology is closer to the car example than my work on travel behavior, and thus causal discovery is more plausible there. In applied economic research, the causal identification question is really more a matter of consensus within the community that, for example, an instrumental variable is plausible, rather than a concrete provable fact. Even hydrology seems too dynamic and uncontrolled to me to believe an “automatic” causal discovery argument. I emphasize “automatic” because testing a hypothesis about causation is always a form of discovery, but it isn’t drawn from the data through automatic processes.
