
More on the piranha problem, the butterfly effect, unintended consequences, and the push-a-button, take-a-pill model of science

The other day we had some interesting discussion that I’d like to share.

I started by contrasting the butterfly effect (the idea that a small, seemingly trivial intervention at place A can potentially have a large, unpredictable effect at place B) with the “PNAS” or “Psychological Science” view of the world, in which a small, seemingly trivial intervention at place A can have large, consistent, and predictable effects at place B. My point was that the “butterfly effect” and what might be called “PNAS hypotheses” seem superficially similar but are actually very different.

Related to this is that butterfly effects are, presumably, not just inconsistent; it’s also that any particular butterfly effect will be rare. As John Cook puts it:

A butterfly flapping its wings usually has no effect, even in sensitive or chaotic systems. You might even say especially in sensitive or chaotic systems. . . . The lesson that many people draw from their first exposure to complex systems is that there are high leverage points, if only you can find them and manipulate them. They want to insert a butterfly at just the right time and place to bring about a desired outcome. Instead, we should humbly evaluate to what extent it is possible to steer complex systems at all.

I then connected this to the “piranha principle,” that there can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data. As Kaiser puts it, “The piranha effect gives words to my unease with the ‘priming’ literature. My existence exposes me to all kinds of primes that interfere with each other (including many that have not been studied and are thus unknown), and what is their cumulative effect?”

From here we had some comments.

Zbicyclist connects to the law of unintended consequences. I’ve on occasion expressed skepticism about this so-called law (see also here and, more recently, here). But, sure, unpredictability of outcomes is real.

Joe writes:

Can you explain the piranha thing? I read the linked post and I’m still not sure I’m getting it.

For example, there are thousands of easily identifiable causes that have extremely large and predictable effects on human mortality: being in a high velocity crash, eating cyanide, eating botulinum, being shot in the chest, exposure to high dose radiation… I could easily go on all day with these. This does not appear to present either an epistemological problem or a conceptual one.

Why can’t the same be true of social behavior? It may be challenging to estimate effects in observational data if there are thousands of large causes of, say, voting behavior, but I would think we can solve that with sufficient sample sizes and appropriately designed work.

It’s trivially true that there can’t be a large number of factors that explain a large proportion of the ultimate variance in whatever outcome, but what’s the problem with a large number of factors the independent and partial effect of which is large?

To which I respond:

Good question.

The difference is that the examples you give are rare, and they act immediately. The social science analogy would be something like this: being exposed to words relating to elderly people, having an older sibling, having a younger sibling, being in a certain part of the monthly cycle, having your local football team win or lose, being exposed to someone in an expansive or contractive posture, receiving implicitly racist signals, having an age ending in 9, riding in coach class on an airplane, hearing about a local hurricane, etc.: these are common, not rare, and they can all potentially interact.

To put it another way, your mortality example is like a set of piranhas that are each in their own tank, whereas “Psychological Science” or “PNAS”-style psychology research is like a set of piranhas that are all in a tank together.

And Jim continues:

IMO social behavior also just has a MUCH larger number of competing influences operating at similar magnitudes, whereas the physical world’s effects are spread over a large range of magnitudes, so that at any given magnitude the number of competing factors is small. So, say two speeding vehicles collide. There are also sand grains blowing in the wind, but the impact of a sand grain on colliding cars is very, very small. OTOH, put twenty people in a room and measure the impact if two of them get into a loud heated argument – the impact on arguers is probably larger than for bystanders, but roughly of the same magnitude, and there are many interactions.

Relatedly, Matt asks:

“The message that I disagree with is the statement that causal relationships can be discovered using automated analysis of observational data.”

Wish I could understand this. When my “check engine” light comes on, I hook up the code reader, and it tells me what the electronic diagnosis circuitry read. Then I (or my mechanic) fix it based upon what caused the anomalous performance. What am I missing? How deep into epistemology do I need to go for this to not make sense?

To which I respond:

It depends on the context. With your car engine light, there’s a lot of theory and engineering going into the system, and we understand the connection between the engine trouble and the light going on. The analogy in social science would be a clean randomized experiment, in which the design and data collection give us identification.

In observational data in social science, there is generally no such clear pattern. To use your analogy, someone might observe the engine in car A, the light in car B, and the circuitry in car C. No amount of analysis of such observational data would tell us much about causal relationships here.

And Daniel continues:

The check engine light is not observational data. It’s a system designed specifically to detect and diagnose issues. It’s subject to a huge quantity of design and testing to ensure that it does its job.

Observational data would be something like collecting the tire wear pattern, paint oxidation, zip code of owner, owner race, and owner educational attainment data of cars brought in to have their transmissions fixed and using some structural assumptions about people’s decision making skills and this observational data inferring something about the causal effects of culture, education, and income on maintenance behavior and its resulting impact on longevity.

And Dzhaughn adds:

I would say the computer in your car is not discovering causal relationships automatically in anything near the sense that Andrew means. It is not trying to discover how cars in general work from the data it gathers. Instead, it just filters data through a calibrated theoretical model of how cars work.

Anyway, I hope the above explanations are helpful for some readers out there.

In a world where things like this get published in top journals, followed by major media exposure, I think it’s important that we understand the systemic problem with these sorts of “Psychological Science” or “PNAS”-style claims, rather than just playing whack-a-mole when each one comes along.

To put it another way, there are two things that work together to keep cargo-cult science alive. First, there are the misunderstandings of statistical methods (researcher degrees of freedom, forking paths, problems with null hypothesis testing, etc.) which give researchers the tools to routinely make inappropriately certain claims from noisy data. Second, there are the theoretical misunderstandings by which researchers think that we live in a world populated by huge effects of priming, as if our every step is buffeted by mysterious forces that are beyond our conscious understanding yet can be easily manipulated by psychologists.

We (the statistics profession and quantitative social researchers) have spent a lot of time addressing that first misconception, but maybe not enough time addressing the second.

P.S. I still want to do that statistical theory research project where we model the piranha problem and prove (given some assumptions) that it’s unlikely to see many large effects in a single system.
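A toy version of that argument can be sketched numerically. This is not the proof described above, only an illustration under assumptions chosen for simplicity: the outcome is the plain sum of K independent standard-normal factors with no extra noise, and the function name `mean_abs_correlation` and all parameter values are hypothetical. Under those assumptions each factor's correlation with the outcome is 1/sqrt(K), so the more factors that share the tank, the smaller each one's visible effect has to be.

```python
import math
import random


def mean_abs_correlation(num_factors, num_samples=4000, seed=0):
    """Average |correlation| between each factor and the outcome.

    Toy assumption: the outcome is the plain sum of `num_factors`
    independent standard-normal factors, with no extra noise. This is
    the most favorable case for every factor to matter equally.
    """
    rng = random.Random(seed)
    factors = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)]
               for _ in range(num_factors)]
    outcome = [sum(column[i] for column in factors)
               for i in range(num_samples)]

    def correlation(xs, ys):
        # Plain Pearson correlation, written out to stay stdlib-only.
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var_x = sum((x - mx) ** 2 for x in xs)
        var_y = sum((y - my) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)

    return sum(abs(correlation(column, outcome))
               for column in factors) / num_factors


if __name__ == "__main__":
    for k in (1, 4, 25, 100):
        simulated = mean_abs_correlation(k)
        print(k, round(simulated, 3), "theory:", round(1 / math.sqrt(k), 3))
```

Real factors can be correlated with each other and can interact, which this sketch ignores; it only shows the simplest budget constraint on how many large, consistent effects one outcome can carry.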

48 Comments

  1. Anoneuoid says:

    When my “check engine” light comes on, I hook up the code reader, and it tells me what the electronic diagnosis circuitry read. Then I (or my mechanic) fix it based upon what caused the anomalous performance. What am I missing? How deep into epistemology do I need to go for this to not make sense?

    Reminds me of:

    Yet, we know with near certainty that an engineer, or even a trained repairman could fix the radio. What makes the difference? I think it is the languages that these two groups use (Figure 3). Biologists summarize their results with the help of all-too-well recognizable diagrams, in which a favorite protein is placed in the middle and connected to everything else with two-way arrows. Even if a diagram makes overall sense (Figure 3A), it is usually useless for a quantitative analysis, which limits its predictive or investigative value to a very narrow range. The language used by biologists for verbal communications is not better and is not unlike that used by stock market analysts. Both are vague (e.g., “a balance between pro- and antiapoptotic Bcl-2 proteins appears to control the cell viability, and seems to correlate in the long term with the ability to form tumors”) and avoid clear predictions.

    These description and communication tools are in a glaring contrast with the language that has been used by engineers (compare Figures 3A and 3B). Because the language (Figure 3B) is standard (the elements and their connections are described according to invariable rules), any engineer trained in electronics would unambiguously understand a diagram describing the radio or any other electronic device. As a consequence, engineers can discuss the radio using terms that are understood unambiguously by the parties involved. Moreover, the commonality of the language allows engineers to identify familiar patterns or modules (a trigger, an amplifier, etc.) in a diagram of an unfamiliar device. Because the language is quantitative (a description of the radio includes the key parameters of each component, such as the capacity of a capacitor, and not necessarily its color, shape, or size), it is suitable for a quantitative analysis, including modeling.

    […]

    In biology, we use several arguments to convince ourselves that problems that require calculus can be solved with arithmetic if one tries hard enough and does another series of experiments.

    One of these arguments postulates that the cell is too complex to use engineering approaches. I disagree with this argument for two reasons. First, the radio analogy suggests that an approach that is inefficient in analyzing a simple system is unlikely to be more useful if the system is more complex. Second, the complexity is a term that is inversely related to the degree of understanding. Indeed, the insides of even my simple radio would overwhelm an average biologist (this notion has been proven experimentally), but would be an open book to an engineer. The engineers seem to be undeterred by the complexity of the problems they face and solve them by systematically applying formal approaches that take advantage of the ever-expanding computer power.

    https://www.cell.com/cancer-cell/fulltext/S1535-6108(02)00133-2

    • Anonymous says:

      “Moreover, the commonality of the language allows engineers to identify familiar patterns or modules (a trigger, an amplifier, etc.) in a diagram of an unfamiliar device.”

      Yes, this is very close to the idea of the “check engine” light I was trying to express. Observational data is a sort of ground truth, and we lose touch with that at our peril. Observational data, for me at least, is any data measured from the wide world outside the lab. It sits at the top of the knowledge hierarchy, which works like this:

      1. We develop a hypothetical model using inductive and deductive reasoning. An untested model can make a Type 1 prediction.

      2. We use deductive reasoning to develop and perform experiments to validate our model, hewing as close as possible to the reality we wish to model, but also attempting to account for the idiosyncrasies of our experimental design. If our experiment produces results which conflict with what the model predicted, we tweak the model (not the experimental results). Using our experimental results, we can make a Type 2 prediction about performance. And now we can use our model to make Type 2 predictions about other behavior.

      3. We go out and measure the phenomenon of interest in the real world. This is our “observational data.” We use our observational data to tweak our experimental set-up to better reproduce it, and we also tweak our model. We do not adjust the observational data to match the experimental results or the model. Now we can make a Type 3 prediction about future performance. We can use our now fully validated model to make Type 3 predictions about other behavior.

      The “Type” predictions serve solely to show that the strength of the prediction increases, even if we cannot measure how much. This shows the importance of observational data.

      The “check engine” light is a warning to the driver that the automated detection system has used observational data to determine that a significant fault exists that requires the driver’s attention.

      Dzhaughn wrote:

      “I would say the computer in your car is not discovering causal relationships automatically in anything near the sense that Andrew means. It is not trying to discover how cars in general work from the data it gathers. Instead, it just filters data through a calibrated theoretical model of how cars work.”

      The check engine light alerts the driver to go to a mechanic. The mechanical code reader tells the human mechanic the source of the fault. The mechanic tests the part under suspicion and discovers that it is bad. The mechanic replaces the part and tests to verify the fault is fixed. She then runs the car for 30 minutes and verifies that the check engine light does not trip.

      We have used purely observational data to determine the root cause of the fault. To me it just seems silly in this type of situation to say “well we can’t REALLY know why that light tripped.”

  2. Tom Passin says:

    When Andrew talks about this effect, he writes things like “that there can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data”. This always makes me think of the interference of waves, like light waves.

    But illumination with a lot of unsynchronized waves can result in a general illumination level, like a room lit by a lot of light bulbs. That occurs when the contribution of each wave is rectified before being detected. It strikes me that these putative stimuli (like priming) might often act in a rectified way. If so, the effects of a lot of such inputs would be to raise the general stimulation level of a person. Sensory overload from too many sound sources would be an example.

    So the effects of a large number of these things could be to raise the background stimulus level of a person, which would entail a higher stress level. This sounds like it could be detected and measured.

  3. Jonathan says:

    You could use light to model the piranha problem: if you treat the surface as having depth, which is the correct way, then strong effects band into colors, something Zeke Newton showed. If you have a lot of effects, you end up with white light, like that of a bowl of sugar where individual grains appear relatively colorless but the whole is white. Running this through a multi-layer model with competing, adjustable effects levels matches pretty well.

    I like to think of the wing flapping thing as the odds that some tiny point of barely visible light keeps approaching until you realize it’s a train. In other words, sure: everything is a wing flap at some level but this doesn’t magically transform into a material effect because it has to pass through all the intermediate stages. I think they confuse how things grow in complexity with how many threads can be followed back into complexity. As in, a great big wave is made of particles and each of those has its story, but if you pick this one right here and write the story of the wave from its perspective then you can easily assign an exaggerated importance. This is a limitation even if you’re using spherical cows or other idealizations because they group, attach, and thread differently unless you impose increasingly large constraints on interactions. That’s impossible: every threading entangles with other threadings. It doesn’t take much math to see that a simple base2 or binary expansion translated into regular counting numbers rapidly generates massively large threading potential. And that’s without considering the inherent magnification, meaning these expansions contain and are contained, so any point appears in multiple counting threads.

    BTW, I’m not entirely convinced piranha is the best name: they eat each other in two circumstances, but mostly when there’s not enough room or food. They don’t eat each other until one survives.

  4. Radford Neal says:

    Quote from John Cook: A butterfly flapping its wings usually has no effect, even in sensitive or chaotic systems.

    I think this is incorrect. In one world, a butterfly flaps its wings. In another counterfactual world, everything is identical except that the butterfly’s wings stay still. If you look at the weather exactly ten years after this, I think the locations of storms will be quite different in the two worlds.

    Of course, the overall frequency of storms is very likely to be quite similar, but not their timing and locations.

    Obviously, one can’t test this experimentally, but I think it would happen in a simulation (which would obviously need to be at a higher resolution than anything currently feasible, else the butterfly wouldn’t even appear).

    • This is exactly the butterfly effect. *Every* little thing produces perturbations whose effect causes the system to diverge exponentially quickly from an alternate path (according to the dominant Lyapunov exponent of the system).

      There’s a great little video on the wikipedia site demonstrating how this works. Notice how everything looks pretty similar through about 20 seconds, and by 40 seconds everything has diverged completely…

      https://en.wikipedia.org/wiki/File:Double_pendulum_simultaneous_realisations.ogv

      Of course, with energy being sucked out of the system by friction, we know that eventually it will return to its stable state. That’s not true for most systems because there’s no one single stable state.

      I think the point is, at any given time and place, if a butterfly flaps its wings, there is *zero* chance to predict what the overall effect of that one action will have been when viewed at a sufficiently later time. So we can’t talk about “the effect” because, over a sufficiently long time, we know no more about what will happen than we know about the output of a cryptographic random number generator.
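The exponential divergence described in this comment can be seen in a much simpler system than weather or a double pendulum. The sketch below uses the logistic map as a stand-in; the map, the parameter `r=3.9`, the starting point, and the size of the nudge are all assumptions picked for illustration and do not come from the discussion. Two runs start 1e-10 apart; the gap stays tiny for a while, then grows until the two paths are unrelated.

```python
def logistic_trajectory(x0, steps, r=3.9):
    """Iterate x -> r * x * (1 - x) and return every visited value."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def gaps(x0=0.2, nudge=1e-10, steps=100):
    """Absolute gap, step by step, between a path and its nudged twin."""
    base = logistic_trajectory(x0, steps)
    twin = logistic_trajectory(x0 + nudge, steps)
    return [abs(a - b) for a, b in zip(base, twin)]


if __name__ == "__main__":
    g = gaps()
    # Print the gap at a few checkpoints to watch it grow.
    for step in (0, 10, 20, 30, 40, 60, 80, 100):
        print(step, g[step])
```

Both paths stay inside the same bounded interval the whole time, so the long-run range of values is predictable even though the position at any late step is not.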

      • jim says:

        “there is *zero* chance to predict what the overall effect of that one action will have been”

        I must be misunderstanding something, because the above statement seems wrong.

        There is, in fact, a 100% chance that, viewed at a later time, there will be zero effect. Your pendulums are analogous to the air wave initiated by each flap of a butterfly’s wing. Sure, the waves don’t all propagate exactly the same, but the outcome is the same for all of them: the entire effect is entirely dissipated, and the system returns to a state in which it’s impossible to tell that the flap ever occurred.

        Is the system irrevocably changed by the flaps? Sure, but those changes are so small and irrelevant that they can’t possibly impact the future of the system. They amount roughly to rearranging the random arrangement of air molecules to a new state of randomness.

        • >Is the system irrevocably changed by the flaps? Sure, but those changes are so small and irrelevant that they can’t possibly impact the future of the system. They amount roughly to rearranging the random arrangement of air molecules to a new state of randomness.

          The butterfly flaps are analogous to the tiny differences in initial position and velocity of the pendulums. You claim these are small enough to be irrelevant, yet at 40 seconds are all the pendulums doing essentially the same thing? No. Macroscale variables like the pendulum angles and their velocities are dramatically different. We aren’t just talking about vibrational modes of individual molecules in the bearings.

          • jim says:

            “yet at 40 seconds are all the pendulums doing essentially the same thing? No.”

            Yet at 5 minutes you can’t tell that any of the pendulums ever moved at all. Right? After they stop moving, there is no evidence whatsoever in the system that they ever moved. The exact position of every particle in the system is different, but the system itself is the same because the distribution of the particles is of the same form.

            So in the end, you can predict with absolute certainty that very small actions will have no impact whatsoever.

            Seems like you also have to specify in what dimension you’re measuring impact. There are both distance and time elements of the system, right? At some time T, you can’t sense that the pendulums have moved. At some distance X, you can’t sense that the pendulums have moved.

            The idea of time as a variable is what people are actually measuring when they try to determine if the intervention “sticks”. For example, with “quality preschool” yes it impacts the child’s performance in first grade, but what about lifetime degree of attainment? After all our objective isn’t to get smarter first graders, it’s to get smarter adults. So on what scale does the effect dissipate?

            One interesting property of the pendulums is that their behavior diverges, the divergence peaks, then converges as the pendulums come to rest.

            • Chris Wilson says:

              Yes, but your argument requires a rather strong form of ergodicity. In the non-ergodic world – good luck!

            • The convergence is because there’s basically 1 state of the system that is a stable equilibrium. The system is 2 dimensional (two angles define the position). In most real world systems the dimensionality is very high, there is no single stable fixed point, and to the extent that it can settle into one of many many stable states at large time, tiny perturbations cause those states to be vastly different.

              • jim says:

                “The convergence is because there’s basically 1 state of the system “

                I disagree. The convergence is because the perturbation is small relative to the system and thus doomed to end in the same arrangement of molecules in which it began.

                If the dimensionality were higher, the disturbance would dampen faster. Imagine dropping a pebble in a pond. If the energy of the disturbance propagates only along a line in two directions, the wave will propagate much further than if the energy were allowed to spread out in all directions.

                Fill a ten gallon tub with rice. Push your hand through the rice until it touches the bottom of the tub, then pull it out. Is the arrangement of the rice grains meaningfully different? No, right? The grains are in different places. But so what? They’re all the same.

              • > I disagree. The convergence is because the perturbation is small relative to the system and thus doomed to end in the same arrangement of molecules in which it began.

                No, it converges to pointing straight down because this is its only stable equilibrium state. You can prove this mathematically by doing a perturbation analysis around each of the two equilibrium points you find by writing down the equations of motion and solving them for where d/dt = 0. There is one equilibrium point where both coordinates point “up” ie. the pendulum is “standing on end” and one equilibrium point where both coordinates point down … hanging… the first equilibrium point is unstable in that any perturbation from the upper equilibrium makes the machine move farther from that point, and the “hanging” equilibrium is stable, any small perturbation from that point causes the system to move back towards the hanging point.

                If you constantly remove kinetic energy from the system through mild friction, such as air resistance, it can *only* end up in a single state: hanging. I believe you can actually use the kinetic energy of the system as a Lyapunov function to prove this.

                If you provided 10 notched detents in the bearing where it could get locally “stuck” you would find that tiny perturbations to the initial conditions would dramatically alter which of the notches it got stuck into. The key would be wherever it finally fell to low enough kinetic energy that it couldn’t “climb” the local notch potential energy barrier. If you started with enough energy, the position where the friction finally removed sufficient energy to cause it to stick could be any of the notches, and small perturbations would result in dramatic changes of the final position. Depending on the notch construction, the distribution of final states wouldn’t necessarily be uniform, but the final state wouldn’t be predictable from a photograph of the initial state.

                An alternative is the “Galton Box” or “Bean Machine”

                https://commons.wikimedia.org/w/index.php?title=File%3AGalton_box.webm

                If you drop a single grain into the top of such a machine from a high enough height you will not be able to reliably predict which slot it will eventually wind up in, because small perturbations to its initial conditions will result in large variations in the path it takes. (if you slide a grain in from the right while tilting the box flattish so the friction of the grain sliding was high, you could probably make it reliably slowly slide always to the same place).

                Nevertheless, we can see that the range of x coordinates is bounded, and the probability of winding up in a given location has a pretty well described distribution.
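A minimal simulation of the bean machine makes both halves of this point concrete. It rests on an idealization that is not in the comment: each bounce off a peg is treated as a fair coin flip, standing in for the sensitivity to initial conditions. The names `drop_ball` and `run_box` and the default sizes are hypothetical.

```python
import random
from collections import Counter


def drop_ball(rows, rng):
    """Final slot index for one ball: its number of rightward bounces."""
    return sum(rng.random() < 0.5 for _ in range(rows))


def run_box(num_balls=10000, rows=12, seed=1):
    """Drop many balls and count how many land in each slot."""
    rng = random.Random(seed)
    return Counter(drop_ball(rows, rng) for _ in range(num_balls))


if __name__ == "__main__":
    counts = run_box()
    # Crude text histogram: one '#' per 50 balls in the slot.
    for slot in range(13):
        print(f"{slot:2d} {'#' * (counts[slot] // 50)}")
```

No single ball's slot can be called in advance, yet every ball lands between slot 0 and slot `rows`, and the counts pile up around the middle in a binomial shape.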

                In the rice grains system the coordinate you observe “how high the surface of the rice is” is not a chaotic dynamical system for the most part. However, the positions of the grains *are* chaotic and dynamical under sufficient perturbation. When you choose to ignore the fact that the grains undergo a lot of complex motions it doesn’t alter the fact that they do in fact undergo those motions.

                If you look at the orbits of the planets in our solar system for a few years or even decades, it may seem that they are non-chaotic, but if you integrate this system out to billions of years it seems very likely you would find that perturbed orbits diverge exponentially. Here is some early work on that from Jack Wisdom at MIT: http://web.mit.edu/wisdom/www/pluto-chaos.pdf

            • Carlos Ungil says:

              > After all our objective isn’t to get smarter first graders, it’s to get smarter adults. So on what scale does the effect dissipate?

              Very much as in the pendulum example: when they stop moving.

            • > The idea of time as a variable is what people are actually measuring when they try to determine if the intervention “sticks”. For example, with “quality preschool” yes it impacts the child’s performance in first grade, but what about lifetime degree of attainment? After all our objective isn’t to get smarter first graders, it’s to get smarter adults. So on what scale does the effect dissipate?

              Not all systems are necessarily chaotic dynamical systems either. Education is all about control (of the development of skills). You can take a pendulum and attach a control system to it, and it will force the double pendulum to stand upright and stable. The pendulum alone is chaotic dynamical, the pendulum with control system rapidly converges to the same stable state for most perturbations:

              https://youtu.be/JpNAhKT7yY4?t=131

              Biological systems are inherently “controlled” in many of their important dimensions, which is how they manage the constant perturbations they experience. Still, uncontrolled dimensions may be chaotic. So, your heart beat will maintain regular intervals for decades but what you choose to eat for lunch today might easily be wildly influenced by perturbations in your environment, such as smells, sights, or conversations you have with passing tourists…

          • IamNJK says:

            I think your pendulum analogy might be a bit too simple. Of course small differences in the initial conditions are able to manifest themselves as large differences later since they are otherwise relatively unperturbed and the small differences have a chance to cumulate. This analogy would work better with pendulums that are on the bed of a truck that’s driving on a gravel road during an earthquake; the question would then be if gently blowing on one of the pendulums will have a significant impact on where it will end in, say, ten minutes.

            More abstractly, of course functions 0.5*x and 0.51*x will be in dramatically different states when x is large enough, but adding 0.01 at some single time point to a Gaussian random walk will be largely inconsequential.
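The contrast in this comment can be checked directly. The sketch below assumes a purely additive Gaussian random walk with no feedback, which is what the comment describes and is not a chaotic system; the function names, seed, and step counts are hypothetical. A single bump of 0.01 shifts every later position by that same 0.01 and is never amplified, while the gap between the lines 0.5*x and 0.51*x keeps growing with x.

```python
import random


def random_walk(steps, seed, bump_at=None, bump=0.01):
    """Gaussian random walk; optionally add `bump` once at step `bump_at`."""
    rng = random.Random(seed)
    position, path = 0.0, []
    for t in range(steps):
        position += rng.gauss(0.0, 1.0)
        if t == bump_at:
            position += bump
        path.append(position)
    return path


def walk_gap(steps=1000, seed=42, bump_at=100):
    """Gap between a bumped walk and the same walk without the bump."""
    plain = random_walk(steps, seed)
    bumped = random_walk(steps, seed, bump_at=bump_at)
    return [b - a for a, b in zip(plain, bumped)]


def slope_gap(x):
    """Gap between the lines 0.51 * x and 0.5 * x."""
    return 0.51 * x - 0.5 * x


if __name__ == "__main__":
    g = walk_gap()
    print("walk gap just before bump:", g[99])
    print("walk gap at the last step:", g[-1])
    print("slope gap at x = 1000:", slope_gap(1000))
```

Whether a given real system behaves more like the additive walk or more like a chaotic map is exactly what the thread is arguing about; the code only shows what each assumption implies.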

            • >This analogy would work better with pendulums that are on the bed of a truck that’s driving on a gravel road during an earthquake; the question would then be if gently blowing on one of the pendulums will have a significant impact on where it will end in, say, ten minutes.

              Yes, it will, absolutely. The point is that at *any* given time, an epsilon perturbation to the path will cause *exponential* increases in the divergence of the path from its original path.

              The fact that you’re constantly perturbing the system due to the gravel road doesn’t change any of this. Now, what *might* happen though is that the gravel road etc dramatically increases the rate of energy dissipation (that is, conversion of energy from bulk motion of the pendulum arms to noise, vibration, heat etc). If that’s the case, then the pendulum will tend to rapidly converge to the vicinity of its stable state, and the little perturbations will be irrelevant because the amount of mechanical energy in the system is so small it no longer displays chaotic behavior.

              The biological equivalent of this would be something like rapid extinction of an endangered species due to constant pollution or loss of habitat. It doesn’t matter whether you drop a penny into a fishtank or not if you are pouring a steady drip of cyanide into the tank, you can predict that all the fish will be dead and decayed 6 months from now (but at the microscopic scale, the number, type, position, genetics etc of the bacterial biofilms in the tank will be perturbed by that penny, you just don’t have a counterfactual to compare it to)

        • Tom Passin says:

          Here’s an example of chaotic behavior that most people can probably visualize, and it’s actually a counter-example to what @jim wrote. Imagine you are on a bridge over a river that is in turbulent flow, and you drop two leaves side by side into the water. At first the two leaves move downstream side by side. As time goes on, they move apart. One may even get trapped in an eddy near the bank of the river, while the other doesn’t.

          Much later, the river flow becomes laminar as it becomes wider and slower. Our two leaves will hardly ever end up floating side by side, even though the river’s chaotic flow has been dissipated.

          As Daniel Lakeland wrote, at the end it’s virtually impossible to predict where the two leaves will end up… except for one thing. The leaves will be constrained to be between the banks of the river. The randomness applies only to locations within those boundaries.

          • Martha (Smith) says:

            “at the end it’s virtually impossible to predict where the two leaves will end up… except for one thing. The leaves will be constrained to be between the banks of the river. The randomness applies only to locations within those boundaries.”

            Uh … What if a shaggy dog is swimming in the river and one of the leaves gets stuck in the dog’s coat and dragged out of the river when the dog climbs out? ;~)

    • Anonymous says:

      There is a lot of comment below this, but seems easier to add to the main one – maybe somebody else has said this, if so, apologies: anyway I personally find the explanation/characterisation of chaotic systems that I read once from Vladimir Arnold, and via John Baez (I’m assuming that is the path), very intuitive.

      It goes like this: any dynamic system (modelled as a symplectic manifold) has a group action that preserves volume of any (measurable) set containing the initial state – in Lyapunov systems, this action preserves locality over time, but in many systems (Baez used the nice example of a tumbling irregular asteroid) what actually happens is that this set gets stretched and eventually laminated over time. The result is that your initial localised volume in the phase space can quite rapidly come to resemble the layer of butter in a croissant, at which point arbitrarily close starting points can be arbitrarily distant. Note that ergodicity is a feature of this but not an explanation.

      Exactly the same analysis applies to any particular butterfly wing flap: because you can only model that effect as a measurable set, which will be smeared over the phase space in the same way, which is why you can’t make predictions.

      What the butterfly wing effect is about is the arbitrarily small distances between possible initial conditions, not about interventions.
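The stretch-and-fold picture above can be seen in a small example. The sketch below uses Arnold's cat map on the unit torus as an assumed stand-in (it is not the asteroid example, and the starting points are arbitrary): the map's matrix has determinant 1, so it preserves area, yet two points that start 1e-9 apart separate quickly until they are as far apart as the torus permits.

```python
import math


def cat_map(point):
    """One step of Arnold's cat map: (x, y) -> (2x + y, x + y), each taken mod 1."""
    x, y = point
    return ((2.0 * x + y) % 1.0, (x + y) % 1.0)


def torus_distance(p, q):
    """Shortest distance between two points when both axes wrap around at 1."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))


# The map's matrix [[2, 1], [1, 1]] has determinant 1, so areas are preserved.
determinant = 2 * 1 - 1 * 1

p, q = (0.3, 0.4), (0.3 + 1e-9, 0.4)  # two starting points 1e-9 apart
distances = [torus_distance(p, q)]
for _ in range(40):
    p, q = cat_map(p), cat_map(q)
    distances.append(torus_distance(p, q))

print("determinant:", determinant)
for n in (0, 5, 10, 15, 20, 30, 40):
    print(f"step {n:2d}  distance {distances[n]:.3e}")
```

Because area is conserved while nearby points separate, a small blob of initial conditions has to become long and thin, which is the croissant layering described in the comment.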

  5. The butterfly effect is perhaps the most misunderstood analogy in the social sciences. Even celebrated academic superstars spout complete garbage about it. Here is my attempt to elaborate: http://metaphorhacker.net/2018/11/cats-and-butterflies-2-misunderstood-analogies-in-scientistic-discourse.

    Key quotes that elaborate on the above:
    > It is not true that a butterfly’s wings cause anything but minute variations of air right next to them. But it may be true that they are one of the infinite variations of the whole weather system that is simply impossible to measure with finite precision. It’s not that it is hard to calculate all the variations, it is that there are more variations than we have atoms to calculate them with.

    And

    > The only thing we can learn from the butterfly effect is that we cannot measure complex systems accurately enough to predict their behavior over the long term with enough precision. The big mismatch is that while the variation in ‘initial conditions’ is too small to measure, the variation in the outcomes is not. And that feels wrong.

    But perhaps the deeper problem is the overpromise of what causal inference is possible. Our typical example of causality is an engine whereas it really should be weather.

  6. Terry says:

    Joe writes:

    Can you explain the piranha thing? I read the linked post and I’m still not sure I’m getting it.…

    It’s trivially true that there can’t be a large number of factors that explain a large proportion of the ultimate variance in whatever outcome, but what’s the problem with a large number of factors the independent and partial effect of which is large?

    To which I respond:

    The difference is that the examples you give are rare, and they act immediately. … these are common, not rare, and they can all potentially interact.

    To put it another way, your mortality example is like a set of piranhas that are each in their own tank, whereas “Psychological Science” or “PNAS”-style psychology research is like a set of piranhas that are all in a tank together

    Seems to me that Joe correctly nailed the essence of the piranha effect and correctly notes it is very straightforward. If a set of factors each have a large effect, then the factors must be rare. If there are a hundred factors that can kill a person, and if people rarely die, then the lethal factors must be correspondingly rare. Conversely, if a set of factors occurs frequently, then their effects must be small. If “poisoning” is common, then the effects of a single “poisoning” must be small. (Health-obsessed people often talk about being “poisoned” by common foods).

    Interaction is not necessary to explain the piranha effect, and, while interaction might be important in some rare cases, it sounds like interaction is wildly unimportant in most cases. The lethal factor example is a good one. Getting hit by a car has little or no interaction with being poisoned. The PNAS stuff Andrew mentions is similar. Having an age ending in 9 has little interaction with riding in coach on a plane. Further, it seems improbable that the effects of each piranha would be negatively correlated so as to cancel out.

    Interaction between piranhas just seems like a weak second-order effect compared to the first order inverse relation between frequency of a factor and magnitude of a factor’s effect put forward by Joe.

    (I don’t understand Joe’s second comment: “what’s the problem with a large number of factors the independent and partial effect of which is large?” This seems to contradict his first, insightful comment.)

    • For many people, suddenly having $5000 more in their bank account would be a big effect, but this is an effect that happens regularly for many people on the day their paycheck deposits… the issue is that throughout the month there are similar magnitude offsetting transactions like rent or mortgage, insurance payments, food costs, transportation, etc that nearly cancel out. Many people are not saving anything, or even spending more than they make, and those that are saving are often saving just a few percent of their monthly income, maybe $50 or $100 a month.

      Under these common circumstances, the “partial effect” of your paycheck is large and positive, but the “partial effect” of your rent payment or car payment is large and negative… the net effect of all your bank transactions over a month is nearly zero.

      The claim is that if each small psychological priming has a large and consistent effect then, together with the fact that people are relatively psychologically stable

      1) Either these primes are rare, and so only rarely do we have sudden dramatic changes to our psychological state

      or

      2) They could be common enough, but there are so many of them and in such different directions, that they cancel each other out so that the net effect of all of them is just small wiggles around our daily average.

      The third possibility is that they are all small effects in the first place, but this isn’t the claim of the published papers.
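One hedged way to put numbers on these cases is to treat each prime as an independent on/off factor. Independence, the count of 100 primes, and the effect sizes are all assumptions of this sketch, not something claimed in the comment. Effects are measured in units of the outcome's observed day-to-day standard deviation, so an implied SD well above 1 means the scenario cannot match what we see.

```python
import math


def total_sd(n_factors, effect, prevalence):
    """SD of the summed effect of n independent on/off factors.

    Each factor is present with probability `prevalence` and, when present,
    shifts the outcome by `effect` (in units of the outcome's observed SD).
    A single such factor contributes variance effect**2 * p * (1 - p).
    """
    variance = n_factors * effect**2 * prevalence * (1.0 - prevalence)
    return math.sqrt(variance)


# All numbers below are hypothetical illustrations.
scenarios = {
    "common and large": total_sd(100, effect=1.0, prevalence=0.5),
    "rare and large": total_sd(100, effect=1.0, prevalence=0.001),
    "common and small": total_sd(100, effect=0.1, prevalence=0.5),
}
for name, sd in scenarios.items():
    print(f"{name:17s} implied SD = {sd:.2f} observed SDs")
```

Under independence, common and large effects pile up rather than cancel, so the sketch covers case 1 and the third possibility only. Case 2 needs the effects to offset one another systematically, as the rent and paycheck do in the bank account example, and that kind of dependence is not modelled here.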

  7. zbicyclist says:

    I recently finished Michael Blastland’s book, The Hidden Half.

    Blastland’s point is that we tend to emphasize the variation we can account for (via genes, environment, etc.) but tend to ignore the roughly half that we can’t explain.

    Blastland is a journalist, but has co-authored with David Spiegelhalter and originated the excellent BBC statistics program More or Less.

    If you go to LOOK INSIDE! on Amazon you get a nice write-up on the marmorkrebs example: crayfish with identical genetics reared in identical environments that turn out very differently. https://www.amazon.com/gp/product/B07FN76Y67/ref=dbs_a_def_rwt_hsch_vapi_taft_p1_i0

    It’s a biological analogue to Daniel Lakeland’s link to the pendulum experiment.

    • jim says:

      “It’s a biological analogue to Daniel Lakeland’s link to the pendulum experiment.”

      The variation in the crayfish is analogous to the pendulums in one sense: although there is variation in the “geometry” of the effect, that variation ultimately dies out and has no impact on the larger system. In other words, whether the crayfish is green or red or a little shorter or a little longer, with respect to the local ecology and larger system of the universe, these effects are irrelevant.

      You can think of the variation in pendulum behavior and the color of the crayfish as the eddies created in the airstream by the butterfly’s wings: yes, every set of eddies will be slightly different, but that doesn’t mean that they propagate indefinitely, much less amplify. They don’t: they simply die out.

      Here’s the paper about the crayfish:
      https://jeb.biologists.org/content/211/4/510.short
      https://jeb.biologists.org/content/jexbio/211/4/510.full.pdf?download=true

    • Martha (Smith) says:

      Thanks for the link. At first, I found it really interesting. But toward the end, it didn’t seem that he considered one (to me) obvious source of the observed variability: Namely, that when a cell divides, the two “daughter” cells are not guaranteed to be identical — they may get different proportions of all the “stuff” that is in the cell. In other words, the process of multi-celled life creates an internal mechanism within the organism that produces variation within the individual and between individuals. (I guess this might be called a second order effect? Maybe he did mention it, but toward the end I was just skimming — it just seemed like a rant or mystical or something.)

      • Anoneuoid says:

        Namely, that when a cell divides, the two “daughter” cells are not guaranteed to be identical — they may get different proportions of all the “stuff” that is in the cell.

        Even just genetically, it is extremely unlikely for the daughter cells to have the exact same DNA sequence as the parent cell.

        • jim says:

          In this case the crayfish are clones so they have identical genes. Quoting from the abstract of the paper, the crayfish “were shown to be isogenic”.

          Here’s the abstract and paper:
          https://jeb.biologists.org/content/211/4/510.short
          https://jeb.biologists.org/content/jexbio/211/4/510.full.pdf?download=true

          • Nick Adams says:

            Every mitotic division is subject to random error. So even if they are genotypically identical they won’t necessarily be phenotypically identical.

            • jim says:

              That sounds reasonable, however the paper notes in the introduction:

              “Phenotypic variation can be produced by genetic differences, environmental influences and stochastic developmental events.”

              I suspect copying error isn’t considered because it’s extremely small relative to the other influences. Also, copying error is random for each reproduction; thus it couldn’t manifest as variation in a trait over a population.

          • Anoneuoid says:

            I haven’t looked but I can 100% assure you whatever they did does not show the same DNA sequence in all the cells of two different animals. That is just so implausible… It is nonsensical.

            If true, then everything we think we know about mutations and replication needs to be reconsidered from scratch.

          • Anoneuoid says:

            Genomic DNA was obtained from ethanol-preserved tissue from walking legs or the pleon

            […]

            Ten microsatellite primer combinations… were tested for potential use with the marbled crayfish. Out of these, two primer combinations rendered successfully amplified DNA with microsatellites in the marbled crayfish

            […]

            All of the 21 marbled crayfish examined from our livestock and two further German populations showed the same allelic pattern, namely alleles of 156 bp and 172 bp at locus PclG-04 (repeat motif: TCTA) and alleles of 186 bp and 188 bp at locus PclG-26 (repeat motif: CA).

            We can see the result is an average sequence from a bunch of cells, not single-cell. Also, IIUC they only compared the sequences of 10 microsatellites (~100 bp repetitive sequences). Only 2 out of those 10 “worked” for whatever reason.

            The marbled crayfish apparently has a genome consisting of 3.5 billion bp:

            We determined the genome size at approximately 3.5 gigabase pairs

            https://www.nature.com/articles/s41559-018-0467-9

            So they looked at a couple thousand bp at most out of 3.5 billion to determine what they call “isogeneticity”, or about .0001% of the genome. Also, their description is rather vague but I don’t see how this method is going to detect copy number variation.

            Finally, skimming the paper I see they are not actually using “isogenic” to mean genetically identical:

            The progeny of such apomictic parthenogens is generally regarded as being genetically uniform and identical to the mother, with the exception of random mutations.
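The coverage estimate above is easy to check. The sketch below takes "a couple thousand bp" as exactly 2,000, which is an assumption used as a generous upper bound, and divides by the 3.5 gigabase genome size quoted from the Nature paper.

```python
assayed_bp = 2_000   # "a couple thousand bp at most", read here as an upper bound
genome_bp = 3.5e9    # ~3.5 gigabase pairs, from the quoted genome paper

fraction = assayed_bp / genome_bp
percent = 100.0 * fraction

print(f"fraction of genome assayed: {fraction:.2e}")
print(f"as a percentage:            {percent:.5f}%")
```

The result is within a factor of two of the ".0001%" figure given in the comment, so the order of magnitude holds.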

            • Martha (Smith) says:

              “with the exception of random mutations.”

              The real question seems to be: How common are these “random mutations”, and how wide is the resulting phenotype variability?

              • Anoneuoid says:

                I don’t know about crayfish, but:

                we calculated a median germline mutation rate of 3.3 × 10^−11 and 1.2 × 10^−10 mutations per bp per mitosis for humans and mice, respectively.

                […]

                The results indicate a median somatic mutation frequency of 2.8 × 10^−7 and 4.4 × 10^−7 per bp for human and mouse, respectively, more than an order of magnitude higher than the germline mutation frequency in both species

                https://www.ncbi.nlm.nih.gov/pubmed/28485371

                Then you have the seemingly (millions to billions of times) more common larger structural mutations:

                the rate of chromosome missegregation in untreated RPE-1 and HCT116 cells is ~0.025% per chromosome and increases to 0.6 – 0.8% per chromosome upon the induction of merotely through mitotic recovery from either monastrol or nocodazole treatment (Fig. 3C). These basal and induced rates of chromosome missegregation are similar to those previously measured in primary human fibroblasts (Cimini et al., 1999). Assuming all chromosomes behave equivalently, RPE-1 and HCT116 cells missegregate a chromosome every 100 cell divisions unless merotely is experimentally elevated, whereupon they missegregate a chromosome every third cell division. Chromosome missegregation rates in three aneuploid tumor cell lines with CIN range from ~0.3 to ~1.0% per chromosome

                https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2265570/
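The step from a per-chromosome rate to "a chromosome every 100 cell divisions" in the quoted passage can be reproduced with a short calculation. The sketch assumes 46 chromosomes per cell and that every chromosome missegregates independently at the same rate; the chromosome count is an assumption added here, while the equal-behaviour assumption comes from the quote itself.

```python
def divisions_per_missegregation(rate_per_chromosome, n_chromosomes=46):
    """Average number of divisions per division with at least one missegregation.

    Assumes chromosomes missegregate independently at the same rate and a
    46-chromosome karyotype; both are simplifying assumptions.
    """
    p_any = 1.0 - (1.0 - rate_per_chromosome) ** n_chromosomes
    return 1.0 / p_any


basal = divisions_per_missegregation(0.00025)        # ~0.025% per chromosome
induced_low = divisions_per_missegregation(0.006)    # 0.6% per chromosome
induced_high = divisions_per_missegregation(0.008)   # 0.8% per chromosome

print(f"basal:   about one event per {basal:.0f} divisions")
print(f"induced: about one event per {induced_high:.1f} to {induced_low:.1f} divisions")
```

Under these assumptions the basal figure comes out a little under 100 divisions and the induced figure at roughly every third or fourth division, consistent with the rounded numbers in the quote.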

              • Anoneuoid says:

                I tried to get better info on the rate of chromosomal abnormalities in “normal” cells. It looks like even at the “best” age of 29, only 80% of the embryos are aneuploid: https://www.ncbi.nlm.nih.gov/pubmed/24355045/

                I’d love to see data on this by tissue and age.

              • Anoneuoid says:

                typo: only 80% of the embryos are euploid

  8. Bob says:

    Anoneuoid wrote:
    . . . any engineer trained in electronics would unambiguously understand a diagram describing the radio . . . Because the language is quantitative (a description of the radio includes the key parameters of each component, such as the capacity of a capacitor, and not necessarily its color, shape, or size), it is suitable for a quantitative analysis, including modeling.

    Well, yes and no. Often the above is true. But there are many circuit elements in the real world device that are not in the circuit diagram. A friend, who is even older than I am, worked at RCA in the early days of color TV. RCA had two or three engineers who designed the UHF circuitry in the TV receivers. He said that no one else in the company understood what those guys did. A radio in a car might appear to be failing because another system in the car is generating unwanted interference.

    Bob

    • > He said that no one else in the company understood what those guys did

      That’s undoubtedly true, but if you were a trained electrical engineer, how long would it take one of those guys to teach you what they were doing if you started working for them? My guess is, in 6 months to a year you’d be one of the elite, maybe not equal in capabilities to the best of the group, but so steeped in the knowledge that the outsiders would say “I don’t know what that guy does”, and you *would* be able to design and modify things to reliably achieve a goal.

      I don’t think this is true of biological systems like say cancer research. It doesn’t matter how long you study with the top people in cancer biology, you will not be able to have a person with cancer come to you and you reliably design a treatment that eliminates their cancer.

      • Bob says:

        I agree that electronics, and indeed most manufactured devices, possess simpler structures than biological systems and those structures are built of subsystems or components that people pretty much understand. But there are a variety of weird ways that electronic systems can fail and some failure modes involve “circuit elements” that nature, rather than the designer, put there.

        Bob
        PS For an example of complex electronics see this: https://bit.ly/2LbwmFM (a trillion-transistor chip).

        • Anoneuoid says:

          I agree that electronics, and indeed most manufactured devices, possess simpler structures than biological systems

          I think you missed the point of that paper I quoted, and Daniel’s comment. The biological systems may or may not be more complex, but the methods used by biologists fail even when trying to apply them to the supposedly simpler system of a transistor radio.

  9. This comes up in Genome-Wide Association Studies. GWAS is a canary in the coal-mine due to being applied on a scale where problems become obvious.

    Figures like this are popular:
    https://en.wikipedia.org/wiki/Genome-wide_association_study#/media/File:GWAS_Disease_allele_effects.png

    Causation is definitely a hard problem in GWAS. Mendelian Randomization is an interesting approach.

  10. While you’re certainly right about the flaws of the PNAS-type literature, why are you confident that it is not the case that at least some of our steps are “buffeted by mysterious forces that are beyond our conscious understanding yet can be easily manipulated by psychologists”?

    I would say there’s many examples that show exactly that (which don’t suffer from the problems of the PNAS-style literature). For instance,
    – Framing and status quo bias have extremely robust effects that have been replicated hundreds of times in many different guises. See for instance Kahneman / Tversky’s Asian Disease problem, or Ariely et al.’s Coherent Arbitrariness.
    – There’s a huge economics literature on peer effects showing how your actions are influenced by things outside your control.
    – There’s neurological effects like “blindsight” (the ability of people who are cortically blind due to lesions in their striate cortex to respond to visual stimuli that they do not consciously see). These people can’t explain their choices, but it would be easy to manipulate them. This type of multiplicity, that your decisions are influenced by multiple neural systems, some of which are evolutionarily old and not conscious, seems to be pervasive across the brain (see e.g. David Redish’s “The Mind Inside the Brain”)
    – There’s a large psychology literature showing that people often don’t know why they did something (which is not terribly surprising given that answering why you did something requires you to infer which of hundreds of causes caused the n=1 decision you’ve just made). See e.g. Maier’s famous Two-Cord Problem (Nisbett, Wilson, Psych. Review, 1977 is an older review).

    And my second question: Your intuition about the piranha problem seems to rely on some implicit assumption that “behavior” is much lower-dimensional than the set of possible influences, right? (I.e. you can’t have 100 factors have a large effect on whom you vote for, but you can easily have 1 factor with a large effect on your vote, 1 with a large effect on what you eat, 1 with a large effect on what movie you watch etc.)

  11. ojm says:

    I missed this whole discussion…but these might be relevant and of interest to some:

    A bonus lecture on Lorenz’s paper (see rest of course for intermediate dynamical systems class):

    https://github.com/omaclaren/open-learning-material/blob/master/qualitative-analysis-dynamical-systems/qualitative-analysis-lectures-2019/combined-lecture-notes/engsci-711-2019-lectures-14-combined.pdf

    Preprint on ‘What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems’

    https://arxiv.org/abs/1904.02826

    This considers stability issues in causal inference

    • I like the abstract of “what can be estimated?” I haven’t read the paper yet. I’m imagining that your point is related to the sensitive dependence issue. For example, suppose we get some video of a double physical pendulum which is given a fair amount of energy and has little friction. We view it for a few seconds. It’s obviously imperfect measurements. Can we estimate where the pendulum will be in 30 seconds or a minute? The answer is no, because even tiny errors in the estimate of where it is now will lead to a state of knowledge not much better than “it will be somewhere within its reachable phase space”. Since we knew that a-priori, the data is uninformative.

      Now, if we get 1 second of video and want to estimate where it will be in 2 seconds… this may be very different.
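The contrast between the 2-second and 30-second forecasts can be sketched with the usual rule of thumb that errors in a chaotic system grow roughly like e^(rate × t). The growth rate and the tolerance below are hypothetical placeholders, not measurements from any real double pendulum, and the function name is made up for this example.

```python
import math


def prediction_horizon(initial_error, tolerance, lyapunov_rate):
    """Time until an error growing like e**(rate * t) reaches `tolerance`."""
    return math.log(tolerance / initial_error) / lyapunov_rate


RATE = 2.0       # hypothetical error growth rate, in 1/second
TOLERANCE = 1.0  # hypothetical error (radians) at which the forecast is useless

for error in (1e-2, 1e-3, 1e-6):
    horizon = prediction_horizon(error, TOLERANCE, RATE)
    print(f"initial error {error:.0e} rad -> useful for about {horizon:.1f} s")
```

The horizon grows only with the logarithm of the measurement precision, so with these assumed numbers even a thousandfold improvement in the video measurements buys just a few extra seconds, nowhere near 30.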
