The greatest works of statistics never published

Posted on July 5, 2010 9:20 AM by Andrew

The other day I came across a paper that referred to Charlie Geyer’s 1991 paper, “Estimating Normalizing Constants and Reweighting Mixtures in Markov Chain Monte Carlo.” I expect that part or all of this influential article was included in some published paper, but I only know it as a technical report–which at the time of this writing has been cited an impressive 78 times!

This made me wonder: what are the most influential contributions to statistics that were never published (where posthumous publication or decades-later reprints in compilation volumes do not count as publication)?

Here’s all that I can think of:

– Thomas Bayes’s original article, published only in 1763, two years after his death.

– John Tukey’s legendary manuscript on multiple comparisons from the 1950s. I actually think Tukey’s work on multiple comparisons was horribly illogical stuff, very clever but also bad bad bad, and I’m very happy that he moved on to other things. But I can’t deny that his unpublished multiple comparisons book was highly influential.

– The paper by Hammersley and Clifford with their eponymous theorem about conditional independence, a legendary work that I think was never published because it was superseded by Besag’s classic 1974 paper on spatial statistics.

– E. T. Jaynes’s book on probability theory, which in his lifetime only existed as an incomplete pdf file.

– Geyer’s aforementioned 1991 paper on computing normalizing constants.

Any other important unpublished statistical works?

(In political science, Larry Bartels must have set the record for the most influential set of unpublished papers, some of which made their way into his Unequal Democracy book.)

15 thoughts on “The greatest works of statistics never published”

Manolo on July 5, 2010 at 5:48 am said:

I guess it is field (substantive and methodological) specific, but Steiger's conference presentation on RMSEA could be included in the unpublished influential.
Kevin Wright on July 5, 2010 at 7:03 am said:

Bill Venables' "Exegeses on Linear Models".
Andrew Gelman on July 5, 2010 at 11:02 am said:

I hadn't heard of either of these, but a Google search appears to show 73 citations for Steiger's paper. Venables's paper has a less-impressive 12, but I guess it's possible that it was influential in the S community.

I wasn't able to download Steiger's paper. I did take a look at Venables's, and it doesn't quite fit in the category I was thinking of. Unlike the above-noted Tukey, Hammersley and Clifford, Bayes, and Geyer papers, it doesn't contain any original research. It's more like one of those html and pdf "tutorials" that are floating around the web. This is not a putdown–after all, I've written lots of textbooks and review articles–I just wouldn't put this article in the category of "great works in statistics."
Michael Ernst on July 5, 2010 at 8:39 pm said:

Andrew,

Pitman's 1949 lecture notes are famous among nonparametricians. In fact, I have never seen them and would love a copy if you have access to them at Columbia:

Pitman, E.J.G. (1949), "Lecture notes on non-parametric statistical inference," Columbia University.
Giles Warrack on July 6, 2010 at 2:37 am said:

I think quite a lot of this is in "Some Basic Theory for Statistical Inference" (Chapman & Hall, 1979). His son, who is a professor in the Stat Dept at Cal, might be able to help you.
Kevin Wright on July 6, 2010 at 10:04 am said:

How did you get 12 for "Exegeses on Linear Models"? I get 418.

Yes, this paper is mentioned from time to time in the R/S community.
JID on July 6, 2010 at 7:28 pm said:

Never been here before, just thought I'd chip in: Kaplan and Meier, JASA 1958, on nonparametric estimation of survival curves with censored data.
JID on July 6, 2010 at 7:29 pm said:

Scratch that, you said never published, duh.
Andrew Gelman on July 6, 2010 at 10:03 pm said:

Giles:

If an important manuscript was written in 1949 and not published until 1979, I'd count that as "unpublished" for my purposes here.

Kevin:

Thanks for the info. I guess you did a better search than I did. 418 citations sounds very influential! I still wouldn't put it in my "greatest works" category because it's expository and has no research content. Again, this is not meant as a disparagement but rather as a clarification of what I'm looking for.
Manolo on July 7, 2010 at 4:48 am said:

It was actually the Steiger and Lind (1980) handout with notes, and even though they introduced RMSEA as a goodness-of-fit measure, it was Browne and Cudeck who called it that way, according to the handout. I was going off the top of my head. Sorry for the wild goose chase.
Manolo on July 7, 2010 at 4:51 am said:

Addendum: A Google Scholar search says the Steiger and Lind conference presentation has been cited 789 times.
Bob Carpenter on July 7, 2010 at 9:55 am said:

How about

Neal, R. M. (1993) Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto

Google says 848 citations, Citeseer says 372.
Andrew Gelman on July 7, 2010 at 10:51 am said:

Bob: Isn't that Radford's Ph.D. thesis, which was published that year as a Springer softcover?
jimmy on July 7, 2010 at 5:59 pm said:

according to radford neal's website, this is his ph.d. thesis.

Neal, R. M. (1994) Bayesian Learning for Neural Networks, Ph.D. Thesis, Dept. of Computer Science, University of Toronto

from the table of contents, the thesis and technical report both talk about hybrid monte carlo. but neural networks do not seem to come up in the tech report?

my knowledge is pretty limited though so i cannot judge the overlap, if any, between the two.
mjp on July 14, 2010 at 1:00 pm said:

Max-Stable Processes and Spatial Extremes by R.L. Smith (1990). Unpublished.
