The other day I came across a paper that referred to Charlie Geyer’s 1991 paper, “Estimating Normalizing Constants and Reweighting Mixtures in Markov Chain Monte Carlo.” I expect that part or all of this influential article was included in some published paper, but I only know it as a technical report–which at the time of this writing has been cited an impressive 78 times!

This made me wonder: what are the most influential contributions to statistics that were never published (not counting posthumous publication or decades-later reprints in compilation volumes)?

Here’s all that I can think of:

– Thomas Bayes’s original article, published only in 1763, two years after his death.

– John Tukey’s legendary manuscript on multiple comparisons from the 1950s. I actually think Tukey’s work on multiple comparisons was horribly illogical stuff, very clever but also bad bad bad, and I’m very happy that he moved on to other things. But I can’t deny that his unpublished multiple comparisons book was highly influential.

– The paper by Hammersley and Clifford with their eponymous theorem about conditional independence, a legendary work that I think was never published because it was superseded by Besag’s classic 1974 paper on spatial statistics.

– E. T. Jaynes’s book on probability theory, which in his lifetime only existed as an incomplete pdf file.

– Geyer’s aforementioned 1991 paper on computing normalizing constants.

Any other important unpublished statistical works?

(In political science, Larry Bartels must have set the record for the most influential set of unpublished papers, some of which made their way into his Unequal Democracy book.)

I guess it is field-specific (both substantively and methodologically), but Steiger's conference presentation on RMSEA could be counted among the influential unpublished works.

Bill Venables' "Exegeses on Linear Models".

I hadn't heard of either of these, but a Google search appears to show 73 citations for Steiger's paper. Venables's paper has a less-impressive 12, but I guess it's possible that it was influential in the S community.

I wasn't able to download Steiger's paper. I did take a look at Venables's, and it doesn't quite fit in the category I was thinking of. Unlike the above-noted Tukey, Hammersley and Clifford, Bayes, and Geyer papers, it doesn't contain any original research. It's more like one of those html and pdf "tutorials" that are floating around the web. This is not a putdown–after all, I've written lots of textbooks and review articles–I just wouldn't put this article in the category of "great works in statistics."

Andrew,

Pitman's 1949 lecture notes are famous among nonparametricians. In fact, I have never seen them and would love a copy if you have access to them at Columbia:

Pitman, E.J.G. (1949), "Lecture notes on non-parametric statistical inference," Columbia University.

I think quite a lot of this is in "Some Basic Theory for Statistical Inference" (Chapman & Hall, 1979). His son, who is a professor in the Stat Dept at Cal, might be able to help you.

How did you get 12 for "Exegeses on Linear Models"? I get 418.

Yes, this paper is mentioned from time to time in the R/S community.

Never been here before, just thought I'd chip in: Kaplan and Meier, JASA 1958, on nonparametric estimation of survival curves with censored data.

Scratch that, you said never published, duh.

Giles:

If an important manuscript was written in 1949 and not published until 1979, I'd count that as "unpublished" for my purposes here.

Kevin:

Thanks for the info. I guess you did a better search than I did. 418 citations sounds very influential! I still wouldn't put it in my "greatest works" category because it's expository and has no research content. Again, this is not meant as a disparagement but rather as a clarification of what I'm looking for.

It was actually

Steiger and Lind's 1980 handout with notes, and even though they introduced RMSEA as a goodness-of-fit (gof) measure, it was Browne and Cudeck who gave it that name, according to the handout. I was going off the top of my head. Sorry for the wild goose chase.

Addendum: A Google Scholar search says the Steiger and Lind conference presentation has been cited 789 times.

How about

Neal, R. M. (1993) Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto

Google says 848 citations, Citeseer says 372.

Bob: Isn't that Radford's Ph.D. thesis, which was published that year as a Springer softcover?

according to radford neal's website, this is his ph.d. thesis.

Neal, R. M. (1994) Bayesian Learning for Neural Networks, Ph.D. Thesis, Dept. of Computer Science, University of Toronto

from the table of contents, the thesis and technical report both talk about hybrid monte carlo. but neural networks do not seem to come up in the tech report?

my knowledge is pretty limited though so i cannot judge the overlap, if any, between the two.

Max-Stable Processes and Spatial Extremes by R.L. Smith (1990). Unpublished.