Stephen Senn quips: “A theoretical statistician knows all about measure theory but has never seen a measurement whereas the actual use of measure theory by the applied statistician is a set of measure zero.”
Which reminds me of Lucien Le Cam’s reply when I asked him once whether he could think of any examples where the distinction between the strong law of large numbers (convergence with probability 1) and the weak law (convergence in probability) made any difference. Le Cam replied, No, he did not know of any examples. Le Cam was the theoretical statistician’s theoretical statistician, so there’s your answer.
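For the record, here are the two statements, for i.i.d. random variables X_1, X_2, … with finite mean mu and sample mean \bar{X}_n:

    \text{Strong law:}\quad \Pr\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1
    \text{Weak law:}\quad \lim_{n \to \infty} \Pr\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0 \quad \text{for every } \varepsilon > 0

Convergence with probability 1 implies convergence in probability, so the strong law is the mathematically stronger result; Le Cam's point was that he had never seen that extra strength matter in an application.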
The other remark of Le Cam's that I remember came when I showed him my draft of Bayesian Data Analysis. I told him I thought that chapter 5 (on hierarchical models) might especially interest him. A few days later I asked him if he'd taken a look, and he said yes, this stuff wasn't new; he'd done hierarchical models himself when he was an applied Bayesian back in the 1940s.
A related incident occurred when I gave a talk at Berkeley in the early 90s in which I described our hierarchical modeling of votes. One of my senior colleagues (a very nice guy) remarked that what I was doing was not particularly new; he and his colleagues had done similar things for one of the TV networks at the time of the 1960 election.
At the time, these comments irritated me. But with the perspective of years, I now think they were probably right. Our work in chapter 5 of Bayesian Data Analysis is, to put it in its best light, a formalization or normalization of methods that people had already applied in various particular examples and mathematical frameworks. (Here I’m using “normalization” not in the mathematical sense of multiplying a function by a constant so that it sums to 1, but in the sociological sense of making something more normal.) Or, to put it another way, we “chunked” hierarchical models, so that future researchers (including ourselves) could apply them at will, allowing us to focus on the applied aspects of our problems rather than on the mathematics.
To put the question directly: why did Le Cam’s hierarchical Bayesian work in the 1940s, and my other colleague’s work in the 1960s, not lead to more widespread use of these methods? Because these methods were not yet normalized: there was not a clear separation between the math, the philosophy, and the applications.
To focus on a more specific example, consider the method of multilevel regression and poststratification (“Mister P”), which Tom Little and I wrote about in 1997, which David Park, Joe Bafumi, and I picked back up in 2004, and which finally took off with the series of articles by Jeff Lax and Justin Phillips (see here and here). That’s a lag of over ten years, but really it’s more than that: when Tom and I sent our article to the journal Survey Methodology back in 1996, the reviews said basically that it was a good exposition of a well-known method. Well known, but it took many, many steps before it became normalized.
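For readers who haven’t encountered Mister P: the idea is to fit a multilevel regression to the survey responses, get an estimate for each demographic-by-geography cell, and then average the cell estimates weighted by census counts. Here’s a minimal sketch of the poststratification step in Python; the cells, estimates, and counts are made-up numbers for illustration, and in a real analysis the cell estimates would come out of a fitted multilevel model, not be typed in by hand.

    # Sketch of the poststratification ("P") step of MRP.
    # Assumes a multilevel regression has already produced an estimated
    # mean outcome for each demographic-by-state cell; the estimates and
    # census counts below are invented numbers for illustration only.

    # Each cell is (state, age_group).
    cell_estimates = {
        ("NY", "18-29"): 0.61,
        ("NY", "30+"):   0.53,
        ("TX", "18-29"): 0.48,
        ("TX", "30+"):   0.41,
    }
    census_counts = {
        ("NY", "18-29"): 3_100_000,
        ("NY", "30+"):   12_400_000,
        ("TX", "18-29"): 4_800_000,
        ("TX", "30+"):   16_200_000,
    }

    def poststratify(estimates, counts, keep=lambda cell: True):
        """Population-weighted average of cell estimates over selected cells."""
        cells = [c for c in estimates if keep(c)]
        total = sum(counts[c] for c in cells)
        return sum(estimates[c] * counts[c] for c in cells) / total

    # National estimate: weight every cell by its census count.
    print(round(poststratify(cell_estimates, census_counts), 3))

    # State-level estimate for TX: weight only the TX cells.
    print(round(poststratify(cell_estimates, census_counts,
                             keep=lambda c: c[0] == "TX"), 3))

The multilevel (“Mister”) part earns its keep on the sparse cells: partial pooling gives reasonable estimates even for cells with few or no respondents, and the poststratification (“P”) step then corrects for the ways the sample fails to match the population.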