We ran this a few years ago, but it remains interesting, so I’m reposting:
There’s an old saying in biology that the development of the organism recapitulates the development of the species (“ontogeny recapitulates phylogeny”). Thus, in utero, each of us starts as a single-celled creature and then develops into an embryo that successively looks like a simple organism, then like a fish, an amphibian, and so on, until we reach our human form in preparation for birth.
Modern biologists have rejected this theory of recapitulation. Still, taking it as an intriguing idea, I see an analogy with statistical practice.
Some version of this recapitulation occurs just about whenever we do applied statistics. We start with the simplest methods (univariate data summaries and some basic multivariate analyses), then perform some comparisons, which we check using standard errors and off-the-shelf hypothesis tests, and then move to modeling. We might well start with least squares and maximum likelihood, move to regularization and multilevel modeling as needed, then throw in measurement-error models, selection models, nonparametric this and that, and so forth.
The analogy isn’t perfect. In particular, we don’t always begin an analysis with simple averages and plots; sometimes we begin with a sophisticated nonparametric data-exploration tool such as lowess or deep nets. Also, many methods for graphical exploratory data analysis were developed only recently; indeed, even methods as basic as scatterplots are only a few centuries old.
Within the context of modeling, though, it does seem to me that we tend to start simple and then add more complicated features one at a time, and this seems like a sensible way to proceed. We’re motivated partly by computational stability and partly by the logic of increasing complexity: we take each step for a reason. It is therefore logical that statistical analysis recapitulates the development of statistical methods.