Statistical analysis recapitulates the development of statistical methods

There’s an old saying in biology that the development of the organism recapitulates the development of the species: thus in utero each of us starts as a single-celled creature and then develops into an embryo that successively looks like a simple organism, then like a fish, an amphibian, etc., until we reach our human form in preparation for birth.

Recently it struck me that some version of this recapitulation occurs just about whenever we do applied statistics. We start with the simplest methods—univariate data summaries and some basic multivariate analyses—then we perform some comparisons which we check via standard errors and off-the-shelf hypothesis tests, then we move to modeling. We might well start with least squares and maximum likelihood and then move to regularization and multilevel modeling as needed, then throw in measurement error models, selection models, nonparametric this and that, and so forth.
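
To make the progression concrete, here is a minimal sketch in Python with numpy; the data, the coefficients, and the penalty value are all hypothetical, invented just for illustration. It starts with a univariate summary, fits least squares (which doubles as the maximum-likelihood estimate under Gaussian errors), and then adds one complication, a ridge penalty, on top of the same model:

```python
# A toy "recapitulation": summaries first, then least squares, then regularization.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.5])  # made-up coefficients
y = X @ beta_true + rng.normal(size=n)

# Step 1: univariate data summaries.
print("mean of y:", round(y.mean(), 2), " sd of y:", round(y.std(ddof=1), 2))

# Step 2: least squares, which is also the maximum-likelihood fit
# under independent Gaussian errors.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 3: add one complication at a time -- here, a ridge (L2) penalty.
lam = 1.0  # hypothetical penalty; in practice chosen by cross-validation or a prior
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS:  ", np.round(beta_ols, 2))
print("ridge:", np.round(beta_ridge, 2))
```

Each step contains the previous one as a special case (ridge with lam = 0 is just least squares), which is part of what makes adding features one at a time feel so natural.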

The analogy isn’t perfect—in particular, we don’t always begin an analysis with simple averages and plots; sometimes we begin with a sophisticated nonparametric data-exploration tool such as lowess or BART. And, indeed, lots of methods for graphical exploratory data analysis were developed only recently.

Within the context of modeling, though, it does seem to me that we tend to start simple and then add more complicated features one at a time—and this seems like a sensible way to proceed. In so proceeding, we’re motivated in part by computational stability but also in part by the logic of increasing complexity: we take each step for a reason. Thus it is logical that statistical analysis recapitulates the development of statistical methods.

P.S. It’s been pointed out to me that modern biologists don’t believe in this recapitulation. So let me emphasize that this is just supposed to be a thought-provoking analogy, not a literal statement about recapitulation of a historical chain of developments in statistics (or evolution).

5 thoughts on “Statistical analysis recapitulates the development of statistical methods”

  1. > There’s a saying in biology that the development of the organism recapitulates the development of the species: thus in utero each of us starts as a single-celled creature and then develops into an embryo that successively looks like a simple organism, then like a fish, an amphibian, etc., until we reach our human form in preparation for birth.

    This is called Haeckelian recapitulation, and it’s largely discredited. What actually happens is that embryos of various species of vertebrates all converge on a pretty similar shape at the pharyngula stage, but from there they all develop differently (and their development does not really recapitulate the species phylogeny).

  2. > development of statistical methods

    Also, this does not sound like the historical development of statistics. Very roughly: Daniel Bernoulli, in a 1778 paper, stating a principle of maximum likelihood (“of all the innumerable ways of dealing with errors of observations one should choose the one that has the highest degree of probability for the complex of observations as a whole”); then Laplace trying to work this out for the double-exponential error distribution; then Gauss putting the squared error in the exponent (the Normal) and seeing that the answer was the lowly mean after all.
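
    A quick numerical check of that mean-vs-median contrast, sketched in Python with numpy (the data here are made up purely for illustration): the maximum-likelihood location under Laplace’s double-exponential error minimizes the sum of absolute deviations and lands on the median, while Gauss’s Normal minimizes the sum of squared deviations and lands on the lowly mean.

    ```python
    # Hypothetical data, chosen only so that the median and mean differ.
    import numpy as np

    x = np.array([1.0, 2.0, 2.5, 4.0, 10.0])
    grid = np.linspace(0.0, 12.0, 12001)  # candidate locations m, step 0.001

    # Negative log-likelihood, up to constants, at each candidate location:
    abs_loss = np.array([np.abs(x - m).sum() for m in grid])  # Laplace error
    sq_loss = np.array([((x - m) ** 2).sum() for m in grid])  # Normal error

    print("argmin of sum |x - m|  :", round(grid[abs_loss.argmin()], 3),
          " median:", np.median(x))
    print("argmin of sum (x - m)^2:", round(grid[sq_loss.argmin()], 3),
          " mean:", x.mean())
    ```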

  3. The trouble is that the narrative does not exist. The notion that there is a natural progression from descriptive statistics to predictive statistics, and that technology has allowed a model to be more real than events in the world, is untrue. Economists have tried for several hundred years to find for their science the kind of validation that Newton gave physical science. Like Social Darwinism, economics will continue to be the definition of pseudoscience no matter how complex the associated technology becomes. Period.
