If we’re against parsimony, what are we for?

Longtime readers will know about my distaste for parsimony, so you won’t be surprised to hear of my interest in this recent article in International Studies Quarterly by Seva Gunitsky, which begins:

Parsimony remains a vague, enigmatic, and unusually divisive concept in political science. Proponents see it as essential for theory building, while critics attack it as a naive simplification of reality. As a result, parsimony is either ignored completely or tolerated as an arcane element of social science philosophy. . . .

Despite this confusion, parsimony remains a key concept in international relations theory. It is embedded in ideas like Weber’s ideal types, Lakatos’ hard core, and Kuhn’s normal science. It plays a crucial role in statistical concepts such as degrees of freedom and model overfitting. And issues related to parsimony surface, often implicitly, in debates about “the end of theory” and the merits of paradigms.

Gunitsky continues:

I argue that there are three distinct justifications for parsimony . . . The aesthetic justification emphasizes theoretical elegance; the ontological justification sees parsimony as a reflection of reality; and the epistemological justification treats parsimony as a stylized assumption made for the sake of theory. . . . distinguishing among them has important implications for how we think about evaluating theories.

Here are some good nuggets:

A common pitfall of parsimonious explanations is the reification of stylized assumptions into statements about the world. This is the source, for example, of Hirschman’s (1985) critique of neoclassical economics—economists coming to believe their simplified models are descriptions of human behavior. . . .

[In theoretical physics,] parsimony becomes a desired attribute because the symmetry of parsimonious theories is not just elegant but signals a closer correspondence to reality. . . . In the physical sciences, therefore, parsimony is a desirable feature of theories not because of their elegance, but because elegance signals that a theory is more likely to be a true description of physical reality. . . .

At Marx’s funeral, Engels proclaimed that just as Darwin “discovered the law of evolution in organic nature, so Marx discovered the law of evolution in human history.” This trait wasn’t limited to communism; in its cruder forms, modernization theory also embraced a determinist ontology that saw political evolution as a series of discrete and even predictable stages.

Here’s an example from a few years back of some silly social-science modeling by a group of physicists writing about decision-making bodies and claiming that “there is a phase transition when committee size reaches 20.” This is bad social science and also bad math.

Gunitsky concludes:

For common types of social inquiry—namely, explanatory theory that emphasizes generalization and causal inference—parsimony is a key element of theory building and hypothesis testing. . . . From this perspective, parsimony is best seen as a necessary evil rather than an intrinsic virtue. . . . To the extent that theorizing demands abstraction, judgments about theoretical assumptions are also judgments about the appropriateness of parsimony.

I’m not sure what to make of all this, but I thought I’d share it with you because the general topic is important to what we do.

33 thoughts on “If we’re against parsimony, what are we for?”

  1. There’s a phrase, often attributed to Einstein but probably apocryphal, that the right level of complexity is “as simple as possible but no simpler”.

    Human interactions are not particularly simple. If you want to make progress in social science theories, I’d guess models need to be somewhere between one and two orders of magnitude more complicated than the stuff we see featured here all the time.

    For example, that study about pollution and the river boundary for the free coal policy. If that model doesn’t have some kind of average weather/convection, some kind of model of economic development, and some kind of model for migration, it’s dead in the water.

    Voting models in the US… If you’re not including a timeseries model of educational attainment, migration/assortment, and active psy-ops by foreign state actors, then what are you doing?

    Inflation in the US: you’ve gotta have the entire history of Federal Reserve activity since before the dot-com boom, debt-fueled spending on foreign wars, supply/demand/supply-chain shocks from COVID, and the effect of accelerated inheritance due to 1M deaths to understand what’s going on.

    • Daniel:

      In answer to your question, “Voting models in the US… If you’re not including a timeseries model of educational attainment, migration/assortment, and active psy-ops by foreign state actors, then what are you doing?”, see the political science articles here.

    • Along the lines of your Einstein reference, Tufte’s conclusion (Visual Display of Quantitative Information):

      “What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult – that is, the revelation of the complex.”

    • If you want to make progress in social science theories I’d guess models need to be somewhere between one and two orders of magnitude more complicated than the stuff we see featured here all the time.

      I dunno. Look at Conway’s game of life. You can deduce “surprisingly” complex phenomena from a very few simple premises (a minimal sketch is at the end of this comment). In fact, that it seems so surprising probably reflects some deeply flawed assumptions we make about the universe.

      Regarding inflation, people pretend you can increase the money supply without increasing prices by ignoring the Cantillon effect. For decades healthcare, education, and stocks primarily inflated because the government and banks/investors got the new money first, and that is where they spent it. Then stimulus checks came out so the price of everyday items increased.

      Like water flowing downstream, we know it follows the path of least resistance, but it is still not easy to predict exactly what that path will be or the timing.
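      To make the Game of Life point above concrete, here is a minimal numpy sketch (my own illustration, not from the comment): the entire “physics” is two rules, yet a five-cell glider crawls across the grid indefinitely. The grid size and starting pattern are arbitrary choices.

      ```python
      import numpy as np

      def life_step(grid):
          """One update of Conway's Game of Life on a wrap-around grid.

          Rules: a live cell survives with 2 or 3 live neighbors;
          a dead cell becomes live with exactly 3 live neighbors.
          """
          # Count the 8 neighbors of every cell by summing shifted copies of the grid.
          neighbors = sum(
              np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)
          )
          return (neighbors == 3) | (grid & (neighbors == 2))

      # A "glider": five live cells that keep translating diagonally forever.
      grid = np.zeros((10, 10), dtype=bool)
      for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
          grid[r, c] = True

      for _ in range(8):
          grid = life_step(grid)
      print(grid.astype(int))  # the same five-cell shape, shifted two cells down and right
      ```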

    • “There’s a phrase often attributed to Einstein but probably apocryphal”

      The smart money has it that this is the (ahem) parsimonious restatement of a phrase from one of Einstein’s 1934 lectures:

      “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” (p. 165)

      https://quoteinvestigator.com/2011/05/13/einstein-simple/?amp=1

      https://drive.google.com/file/d/1vNF_cy0jAqesubzL49T8fAJC1hzl9Xdl/view?usp=drivesdk

    • Depends on what kind of science you’re trying to do.

      The research culture in economics prioritizes isolating and measuring specific effects/mechanisms, where the rest is noise orthogonal to the thing you are trying to study.

      This usually means economics has limited predictive power, because most of the variation in the outcome doesn’t come from any particular cause.

      To the extent economists are interested in understanding how counterfactual “policies” change, say, average outcomes, this kind of parsimony is usually a trade-off worth making.

      A “complete” theory incorporating every conceivable channel is at best overkill and likely to be harder to estimate and less robust.

      • In general, no one builds models of “every conceivable channel”. But what is going on is that there’s a spectrum of importance, and if you don’t include the top few important things, you have basically no chance of getting anywhere.

        Imagine you want to understand the effect of different quantities of some chemical added to tire rubber on stopping distance. Suppose you don’t understand anything about physics, so you don’t control any of the following:

        1) Initial speed (varies haphazardly and uncontrollably from 1 MPH to 120 MPH)
        2) Wet vs dry pavement (both types are present and unmeasured)
        3) Hydroplaning (nobody even wrote it down)
        4) Mass of the car (varies from a Geo Metro to 3-axle trucks fully loaded)
        5) Turning vs straightaway (no one bothers to check where the braking occurred)

        Now, there are bajillions of other things that affect stopping distance, for example wind, dust on the roadway, brake pad temperature at start of braking, rust on the brake discs, mass of the occupants, etc.

        But **initial velocity** is a huge one… fail to consider that one and you’re completely hosed. And wet vs dry is another big one, you can’t get anywhere without that one… hydroplaning, yeah, if you are hydroplaning you’ve got a totally different process going on, so that’s a huge one. Etc., etc.

        “the lifetime income effect of an extra year of graduate education post baccalaureate” is **not a thing** in exactly the same way as “the average effect of sulfur compounds on stopping distance” is not a thing… The sulfur compounds may affect the friction on dry pavement, but stopping distance is such a complex interaction between multiple things that the sulfur compound itself doesn’t really have “an effect”.

  2. Obviously, “parsimony” is sometimes an imperfect correlate to some other factor or effect relevant to building models. It would be better to discuss that other factor directly and leave parsimony out of it.

  3. Thanks for sharing! I like the breakdown of the various justifications for parsimony and I agree with most of the author’s main points.

    I don’t agree with the author’s claim that, “in the physical sciences a theory’s elegance may indeed offer a hint of concordance with reality”. It seems like the author views parsimony as scientifically unjustified everywhere except theoretical physics. Sabine Hossenfelder has convinced me that parsimony is not scientifically justified there either. Elegance can be useful if it helps people understand and apply a theory. Elegance can also be a useful guide for thinking, in the same way that an informed prior can help steer an MCMC sampler through an otherwise featureless region of parameter space. But I don’t think it is reasonable to assume that elegance in any theory, regardless of field, necessarily makes the theory more “true”.

  4. If parsimony refers to favoring simplicity over complexity, then I believe the author is missing the point of something like a complexity budget (or maybe it’s just not in the abstract — I didn’t read the paper). If you need to do A and B, then you trade off your efforts between those two things.

    Maybe that is unfair because it implies parsimony can only be understood by trading off against other things (so somehow it can’t be evaluated on its own merits), but I don’t think you need to really know the details of the trade offs to know they’re coming.

    This is like the tortoise and the hare — a fast start often doesn’t pay off. In this theme, I favor parsimony all the time, but just cuz of the expectation of unforeseen complexities, which doesn’t seem to fit the author’s list at all. A simple explanation (which I’m inclined to believe) is that I’m missing the point.

  5. Using parsimony as a rule would seem ill-advised, but in practice it often works out to be best. I would love to use more complex mechanistic or first-principles models in the projects that I work on, but this really isn’t possible – the data isn’t available and is also unattainable. When I make much more complex models, I am often folding in lots of data that may be proxies for some other data that I wish I had, or data that is poor, with lots of missing values or measurement error. The point is that the complex model often seems to fold in lots of misinformation, or information of dubious worth, nixing any gain from the complexity or, worse, degrading performance. Of course, I can assign better priors, but then that is hard or impossible too. Unless they really are nice mechanistic or first-principles models, complex models seem to require a lot more data, which I don’t have. I think there has been some discussion recently on this blog about this regarding forecasting. https://statmodeling.stat.columbia.edu/2022/11/23/time-series-forecasting-futile-but-necessary-an-example-using-electricity-prices/#comment-2134838
    Also, I recently found this article linked on the Stan forum while looking around for posts on forecasting, which seemed interesting https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0194889

    I guess in summary, I’m often forced to be reluctantly parsimonious.

    • Zhou:

      See the wonderful quote from Radford Neal. The short answer is that if your complicated model is performing worse, then you’re not regularizing it enough. To put it another way, a parsimonious model can be viewed as a very strong regularization applied to a more general model (a small sketch follows at the end of this comment).

      That said, I agree with the other commenters that simple models are good starting points. See the golf case study for an example of how I did this in a particular example.
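      (Not Andrew’s or Radford’s code, just a minimal numpy sketch of the “parsimony as strong regularization” point: a model with extra predictors plus a heavy zero-centered penalty on their coefficients collapses back to the simple model. The simulated data and penalty values are made up for illustration.)

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n = 200
      x1 = rng.normal(size=n)              # the predictor the "simple" model uses
      x2 = rng.normal(size=(n, 5))         # extra predictors the "complex" model adds
      y = 1.5 * x1 + rng.normal(size=n)    # data generated without the extras

      X = np.column_stack([np.ones(n), x1, x2])   # intercept + x1 + 5 extras

      def penalized_ls(X, y, lam):
          """Least squares with an L2 penalty only on the 5 extra coefficients.

          The penalty acts like a zero-centered prior on those coefficients whose
          width shrinks as lam grows; the intercept and x1 are left unpenalized.
          """
          penalty = np.diag([0.0, 0.0] + [lam] * 5)
          return np.linalg.solve(X.T @ X + penalty, X.T @ y)

      for lam in [0.0, 10.0, 1e6]:
          beta = penalized_ls(X, y, lam)
          print(f"lam={lam:>9}: x1 coef = {beta[1]:.3f}, "
                f"largest extra coef = {np.max(np.abs(beta[2:])):.4f}")
      # As lam grows, the extra coefficients go to ~0 and the fit matches
      # the plain regression of y on x1 -- the "parsimonious" model.
      ```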

  6. On a mundane level, to estimate variance from a sample, you normally divide the sum of squares by (n – 1). But if you have fewer degrees of freedom (i.e., m constraints or parameters), you divide by (n – m), which increases the estimated variance. Of course, this is not very sensitive to m if n >> m, but it does suggest using the smallest value of m that is possible, if only to get the lowest estimated variance. (A small numerical sketch follows at the end of this comment.)

    I like to say “Start simple because it’s only going to get more complex”
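    A toy numerical illustration of the (n – m) point above (my own example; the straight-line data and the polynomial fits with m = 2, 6, 12 coefficients are arbitrary choices):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, true_var = 50, 4.0

    # Simulate noisy data from a straight line, then fit polynomials
    # with m = 2, 6, and 12 coefficients.
    x = np.linspace(0, 1, n)
    y = 1.0 + 3.0 * x + rng.normal(scale=true_var**0.5, size=n)

    for m in (2, 6, 12):
        X = np.vander(x, m)                        # n x m design matrix
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        print(f"m={m:2d}: RSS/n = {rss / n:.2f}, RSS/(n-m) = {rss / (n - m):.2f}")
    # Dividing by n understates the noise variance more and more as m grows;
    # dividing by the degrees of freedom (n - m) stays near the true value of 4.
    ```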

  7. “In practice, I often use simple models—because they are less effort to fit and, especially, to understand. But I don’t kid myself that they’re better than more complicated efforts!”

    There’s no particular reason why less complicated models should be worse than more complicated models. But there **are** reasons why they’re likely to be better. The most obvious reason is that a simple model almost always costs the same or less than a complex model to design and deploy, so whatever improvement the complex model offers, if any, must exceed the cost of design and implementation.

    Going back to the debate I had with “somebody” about calories in vs. calories out, one can solve the problem so that C(out) > C(in) without knowing any of the terms involved in internal calorie processing, much less their values. The fact that a more complex model is possible and may with some additional knowledge ultimately fit the data better doesn’t make a more complex model appropriate or better for solving the problem at hand – which in that case was individual health through weight loss.

    Even if every term of the more complex model can be specified and its terms accurately measured, it’s not necessarily better for solving the problem because the cost and trouble of measurement might not be worth the improvement. In the case of maintaining individual health through weight loss, it’s highly doubtful the complex model would justify the cost for the average person.

    In the social sciences the chance that a complex model is better seems very low because the primary measurements often have extremely low accuracy and precision. Often the measurements themselves depend on an unstated model – that is, they depend on assumptions that aren’t even expressed, let alone sensibly evaluated.

    • While trivially true, the calories in vs calories out model doesn’t give much insight.

      How did Bill Gates get so rich?
      He just made more money than he spent.

  8. Simple models seem more general, and they often are in an approximate sense, but in another they’re more restrictive. Every simplification comes with explicit or implied assumptions, each of which narrows the scope of generalization. So a lot depends on how consequential the departures from assumptions are in practice.

    To put it differently, a very precise model will probably be *extremely* intricate and complex, and it may well be on a knife’s edge, where slight discrepancies from the “true” representation are fatal. Then we should look for more tractable, forgiving alternatives that perform well with the data and purposes we actually have.

    Isn’t one of the Big Changes with Big Data that the relaxation of data constraints now favors models that are as complex as ML can make them? Alpha Zero can whomp all of us at chess, but its model is absolutely opaque to human understanding. Why did it push that h-pawn?

    • You bring up Alpha Zero, which seems kinda interesting here. I think the “Zero” part in the name references that it is using no human data — it’s bootstrapping itself from nothing. I think the original AlphaGo used human games, and the no-human-games version came later.

      In this case I believe the Zero thing is supposed to be more general because you don’t need a big pile of reference games. But is it more complicated or not to bootstrap from nothing? Clearly you don’t need the reference games, but you’d probably need some more training infrastructure?

      It’s not clear to me which of these models is simpler really. Seems like it could vary based on what you want to do with them.

  9. In his previous blog linked above, Andrew wrote:

    “I’ve never seen any good general justification for parsimony. (I don’t count “Occam’s Razor,” or “Ockham’s Razor,” or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.)”

    I can’t shoulder the burden of coming up with aesthetic, ontological, and epistemological justifications, but I will take a shot at this on behalf of Willie of Ockham, who no doubt would shrug if you told him he was still famous all these years later for those particular words.

    The Law of Parsimony applies to reverse causation, so I’m going to use my trusty Black Squirrel model.

    You wake up to discover that your power is out on a fine summer morning. You walk outside and notice that all your neighbors have power. Then you look up and there is an unusually large squirrel, somewhat blackened, draped across the insulator on the power line to your house. The wires look blackened as well.

    Now you need to make a decision about what to do next, so you quickly build a model in your brain. The first possibility is that the squirrel somehow shorted the line. We can think of this possibility as requiring a single assumption, that the squirrel was able to reach across the gap and cause a short that took out your power. But there are a vast number of other possibilities, and you can add them to your model. What if the transformer failed and the squirrel is unrelated? What if someone put that squirrel up there to hide an act of vandalism? And so on. But all the possibilities except the first one require you to assume that the two anomalous events – your power going out and there being a dead black squirrel across your insulator – did not occur in direct causal succession, and therefore require you to assume that more than one act was involved.

    Meanwhile, you need to figure out what to do next. Should you keep working on making a more complete model, or should you start the process of getting the squirrel off the wire and your power back on?

    The blog posts, most of the comments, and the drive-by from von Neumann all suggest that parsimony is telling us that our model is better with just the squirrel-that-reached. But that is not what William of Ockham meant. Of course you can make a more complete model if you add in all those low-probability causes. The only point being made with Occam’s Razor is that if one explanation for a phenomenon requires fewer assumptions – fewer unknown things to be true – than all the other explanations, the “simpler” explanation is most likely to be correct.

    So now that you have thought it through, go ahead and call the power company and tell them about the squirrel rather than working on your model, and thank William of Ockham for giving you the rationale.

    In summary, I think the Law of Parsimony, as originally intended, is nothing more than a simple logical statement about reverse causation probabilities under very specific conditions in which the amount of information is fixed, the circumstances suggest a single direct line of causation, and all other causal explanations require more independent things to be true.

    With all that out of the way, I will admit that I often cringe when I see parsimony invoked in scientific literature to mean simpler is better, so I do recognize that it is misused.

    • Thanks, Matt. This is well done. There is also the famous med school diagnostic expression: “If you hear hooves, think horses, not zebras.” But the reverse causality is highly data-dependent. Suppose you don’t see a squirrel. Then a lightning bolt is a more likely explanation than a fried squirrel spirited off by roadkill enthusiasts.

      • Whether zebras or horses makes more sense depends a lot on which part of the world you’re in. If you’re in a southern African savannah and you hear hooves, you should think zebras, wildebeest, maybe some kind of antelope…

        Which brings me to Jaynes, chapter 20. What Ockham’s razor really means is not to prefer the *simplest* explanation but rather the one with the higher *prior probability*.

    • “I think the Law of Parsimony, as originally intended, is nothing more than a simple logical statement about reverse causation probabilities under very specific conditions in which the amount of information is fixed, the circumstances suggest a single direct line of causation, and all other causal explanations require more independent things to be true. ”

      Well, when it’s a single cause vs. a single cause + other things, it would seem that in a Bayesian context the prior would handle this. After all, the prior for single cause + other things has to be lower than the prior for single cause alone in order to satisfy the Kolmogorov axioms.
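      Stated explicitly (this is just the product rule and monotonicity of probability, nothing specific to the squirrel example):

      ```latex
      P(\text{cause} \cap \text{other things})
        \;=\; P(\text{cause})\, P(\text{other things} \mid \text{cause})
        \;\le\; P(\text{cause}),
      ```

      with equality only when the extra things are certain given the single cause.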

  10. “[In theoretical physics,] parsimony becomes a desired attribute because the symmetry of parsimonious theories is not just elegant but signals a closer correspondence to reality. . . . In the physical sciences, therefore, parsimony is a desirable feature of theories not because of their elegance, but because elegance signals that a theory is more likely to be a true description of physical reality. . . .”

    I’m for sure not the best physics expert around here, so… how do we know this? Do you agree with that claim?

    • Physics PhD here. IMHO that is an extremely difficult question that’s still open. There are plenty of cases where parsimony does seem to have been a good guide to truth. One classic example would be Newton’s law of gravitation. It says that gravitational attraction is proportional to 1/r^2. In principle, you could have any other number in place of the 2. It could be 2.173489 or whatever. Just plain “2” is obviously simpler than that, and in the nineteenth century it was confirmed to very high accuracy that the actual number is indeed “2”.

      A similar pattern has held in more recent theoretical developments, e.g. Einsteinian general relativity.

      However, there are plenty of fundamental laws that don’t look so simple. If you look at the Lagrangian for the electroweak interaction, for example, “simple” isn’t quite the word that comes to mind. More quantitatively, there are a lot of free parameters with experimentally-found values that look sort of random.

      Is there an even more fundamental theory that’s distinguished by its elegance? Or does the elegance of some physical principles merely emerge in the course of approximating a less elegant underlying reality? I don’t think anyone really knows.

      • Occam’s Razor does have some empirical basis in physics and related fields. The Copernican model is simpler than epicycles, and turned out to be right. De Broglie’s model of hydrogen is extremely simple and works really well. Even when some simple physical law has turned out to be wrong the replacement has generally turned out to be not all that much more complicated (e.g. Newton’s expression for kinetic energy is wrong, but special relativity’s expression is still pretty simple; Schroedinger’s equation is wrong but the Dirac equation is not super complicated either).

        As Einstein supposedly said, “The most incomprehensible thing about the world is that it is comprehensible.” Perhaps there’s some sort of weak anthropic principle here: if physical laws were a lot more complicated we wouldn’t be able to understand them at all and nobody would be making these claims.

        • The Copernican model is simpler than epicycles, and turned out to be right.

          The Copernican model not only had epicycles, it had more of them than the Ptolemaic model. Further, you can have mathematically equivalent geocentric and heliocentric models. The choice of reference frame is a matter of preference/convenience. There is no right or wrong choice; that is the whole point of relativity.

        • Yeah, OK, Copernicus got twisted up in knots because he insisted on uniform circular motion for the planets. What I should have said is that the Copernican Revolution led to a simpler model that turned out to be correct…well, at least until general relativity came along to mess with it again.

          As for being able to express planetary motions in any frame of reference you wish, that’s true…but if you genuinely believe it’s as simple to work out what’s going on with an earth-centric solar system as with a heliocentric one, you’re nuts.

        • The best model is a matter of convenience. If you only really need the sun, moon, and stars for navigating the Earth’s surface then geocentric with epicycles is simpler.

          Each ellipse is four parameters (length, width, theta, and which focus holds the sun) while each circle only has a radius plus a single equant (offset) for the main orbit. So you get an average of two epicycles per “crystalline sphere” before epicycles become more complex.

          But if you are trying to understand/navigate the solar system, then heliocentric with ellipses is the simplest choice. For navigating the galaxy you would use Sagittarius A*, and the solar system appears to trace a vortex-like path, etc.

        • Here is a good one. It turns out people have even worked out an equivalent “Hollow Earth” model (not only is the Earth hollow, but it contains all of what is usually considered outer space):

          Using simple equations, Abdelkader performs on space what geometers call an “inversion” with respect to the sphere. All points outside the sphere are exchanged with all points inside. The sphere’s center maps to infinity, and infinity maps to the center. Inversion theory is often used by geometers for proving difficult theorems, and it has been extremely useful in physics.

          After inverting the cosmos, Abdelkader then applies the same inversion to all the laws of physics. The result is a consistent physics that cannot be falsified by any conceivable observation or experiment! Of course the equations for the laws become horribly complex. Light rays follow circular arcs, the velocity of light goes to zero as it approaches the center of inversion, and all sorts of other bizarre modifications of laws are required. To an observer in this inverted universe everything looks and measures exactly the same as in the Copernican model, even though the heavenly bodies become minuscule. Day and night, eclipses, and the orbits of the sun, moon, and planets—everything—can be explained by suitably inverted laws. Instead of the earth rotating, the shrunken celestial bodies revolve the opposite way around the earth’s “axis.” Because light follows curved paths, the sun seems to set as usual below the “horizon” as it travels a conical helix, six months in one direction and six months in the other. The Foucault pendulum, Coriolis effects, and other inertial “proofs” of the earth’s rotation are all accounted for by the drastically modified laws.

          […]

          Why, then, does science reject it? The answer is that the price one has to pay in complicating physical laws is too high. A similar situation arises in relativity theory. There is nothing “wrong” in supposing the earth fixed, as Ptolemy believed it was, with the cosmos whirling around it. The question of which frame of reference is “right,” a fixed earth or a fixed universe, is as meaningless as asking whether you stand on the earth or the earth stands on your feet. Only relative motions are “real,” but the complexity of description required when the earth is taken as the preferred fixed frame is too great a price to pay.

          The opposite is the case with respect to choosing between Euclidian space and the non-Euclidian spacetime of general relativity. It is possible to preserve Euclidian space and modify the laws of relativity accordingly—indeed, just such a proposal was advanced by Alfred North Whitehead—but here simplicity is on the side of non-Euclidian space. In the spacetime of relativity, light continues to move in straight lines, rigid objects do not alter their shapes, and gravity becomes identical with inertia. It is only when we talk in a Euclidian language that gravity bends light, objects contract at fast relative speeds, and gravity and inertia appear as distinct forces.

          https://cdn.centerforinquiry.org/wp-content/uploads/sites/29/1988/07/22165255/p21.pdf
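          For reference, the “inversion” in the quoted passage is, as I read it, the standard geometric inversion in a sphere of radius R centered at O: each point P maps to the point P′ on the same ray from O satisfying

          ```latex
          |OP| \cdot |OP'| \;=\; R^2 ,
          ```

          so points on the sphere stay fixed, the center is sent to infinity, and infinity to the center, exactly as the passage describes.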

  11. Apart from the case of physics I mentioned in my posting above, for me science, modelling, even theorising, is never exclusively about making true claims about reality, and I have a hard time accepting the “ontological justification” – I mean on what basis? (That’s the physics case again.)

    Rather I think that we do science because as human beings we want to get a handle on things, and theory and models are tools. And part of getting the tools to work is communication. Parsimony is key for communication, even though it’s complicated, as communication in different setups can require (or at least be helped by) different levels of parsimony. Also people can learn to handle more complicated models and theories, but there will always be limits.

    Also, organised conscious action may require simplicity, sometimes more, sometimes less. But indeed, “A common pitfall of parsimonious explanations is the reification of stylized assumptions into statements about the world” is a good point.
