Just kidding. Here’s the real story. Susanna Makela writes:

A few of us want to start a journal club for the statistics PhD students. The idea is to read important papers that we might not otherwise read, maybe because they’re not directly related to our area of research/we don’t have time/etc.

What would you say are the top ten (or five) statistics papers that you think statistics PhD students should read?

What do all of *you* think? We’ve listed the most-cited statistics papers here. This list includes some classics but I don’t think they’re so readable.

For my recommendations I’d probably like a few papers that demonstrate good practice. Not necessarily the papers that introduce new methods, but the papers that show the use and development of good ideas.

And then there are the papers that introduce bad methods but are readable and thought-provoking. A lot of Tukey’s papers fit this category.

Anyway, let me think . . . I don’t really know where to start . . . let’s look at the index to my books . . .

Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: evidence from Social Security administrative records. American Economic Review 80, 313-336. I’m not saying this paper is perfect—what paper is?—but it’s a good one to read and discuss, treating it as an interesting study to criticize, not something to just be admired.

Rubin, D. B. (1980). Using empirical Bayes techniques in the law school validity studies (with discussion). Journal of the American Statistical Association 75, 801–827. This is good because there’s a lively discussion, and the paper itself is very solid.

Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, ed. S. Brooks, A. Gelman, G. L. Jones, and X. L. Meng, 113–162. New York: Chapman & Hall. This is not an applied paper at all, it’s an expository paper on a computational method. But it’s just full of interesting ideas and excellent throwaway lines.

OK, three recommendations is enough from me.

What recommendations can the rest of you give?

All pretty conceptual pieces. I don’t agree with all of their points but all are very interesting.

“To Explain or Predict” http://arxiv.org/pdf/1101.0891.pdf

“Statistical Modeling: Two Cultures” http://projecteuclid.org/euclid.ss/1009213726

“Statistical Inference: The Big Picture” http://www.stat.cmu.edu/~kass/papers/bigpic.pdf

Zach:

Regarding the first paper you mention, see here.

Regarding the second, see here.

I remember reading both of these posts and your reply to the Kass paper as well! I haven’t read the 1997 Breiman article you mention. I know some of your papers on predictive checking/EDA more or less give your opinions on the 1st and 2nd pieces, but it might be interesting to read what you think of them directly. The idea of using learning algorithms for semi-automatic EDA is one thing in particular that I started thinking a lot about after reading the first two.

I say give them the GAISE College Report so they can start to think about teaching good statistical practice to people with (presumably) less mathematical skill than themselves.

I think Richard Royall’s short book, Statistical Evidence: A Likelihood Paradigm, should be required reading for anyone wanting to do inferential statistics, even if they ultimately reject a pure likelihood approach, i.e., only cover Chapters 2-5.

If we’re just adding one each, I’d add:

Fienberg, S. E. (1971). “Randomization and Social Affairs: The 1970 Draft Lottery,” Science, 171, 255-261.

Galton (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute.

http://galton.org/bib/JournalItem.aspx_action=view_id=157

The paper is surprisingly readable for 19th C writing, and has much that can be recognized as forerunners of modern regression, but is unburdened by all of the jargon and other stuff that has been added to modern regression. Also, there is an ingenious mechanical calculator for forecasting the child’s height.

Also, didn’t somebody recently publish another collection of the writings of I.J. Good? Good is not the most careful of scholars, but his writing is always entertaining and thought-provoking.

I’m a fan of Paul Meehl’s “Theory-testing in psychology and physics: A methodological paradox” (Philosophy of Science, 34 (2), pp. 103–115). It really makes you think about what you’re trying to achieve when you use classical hypothesis tests.

I also like many of Ioannidis’ papers, such as “Why Most Discovered True Associations Are Inflated” (Epidemiology, 19 (5), pp. 640–648. doi:10.1097/EDE.0b013e31818131e7), which show the impact of statistical procedures in real research.

While I too like that Meehl paper, it always annoyed me to no end that he conflates an observed p-value and Type I error early on. It’s a mistake that taints the entire presentation for me and leads to further logical errors later. Meehl does have some very good papers, but I’m not sure they’re an ideal recommendation for a statistics student.

An alternative would be to use the Meehl paper to ignite a discussion of measurement and statistics across fields but also point out Meehl’s mistakes (or have students find them) and then have a discussion of whether his arguments are compromised.

Is this what you refer to:

“Hence the investigator, upon finding an observed difference which has an extremely small probability of occurring on the null hypothesis, gleefully records the tiny probability number “p < .001,” and there is a tendency to feel that the extreme smallness of this probability of a Type I error is somehow transferable to a small probability of “making a theoretical mistake.” It is as if, when the observed statistical result would be expected to arise only once in a thousand times through a Type I statistical error given H0, therefore one’s substantive theory T, which entails the alternative H1, has received some sort of direct quantitative support of magnitude around .999 [ = 1 – .001].”
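
To make the distinction in this exchange concrete, here is a minimal simulation sketch; it is not taken from Meehl’s paper. It assumes a two-sided z-test with known standard deviation, and the sample size, number of replications, seed, and function names are all illustrative choices. With the null hypothesis true, the observed p-value changes from sample to sample, while the Type I error rate is the long-run share of rejections at a threshold fixed in advance.

```python
import math
import random


def two_sided_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == mu0, with sigma assumed known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_p_values(n_reps=10_000, n=30, seed=1):
    """Draw n_reps samples with H0 true and return the observed p-values."""
    rng = random.Random(seed)
    return [
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_reps)
    ]


p_values = simulate_null_p_values()
alpha = 0.05
# Type I error rate: the long-run share of true-null samples that get rejected.
type_i_rate = sum(p < alpha for p in p_values) / len(p_values)
print(f"share of p-values below {alpha}: {type_i_rate:.3f}")
print(f"smallest observed p-value: {min(p_values):.5f}")
```

The rejection share should land near the chosen alpha, while the individual p-values spread over the whole unit interval. Neither number is the probability that a substantive theory is correct, which is the transfer the quoted passage describes.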

Roger Koenker and Olga Geling (2001). Reappraising Medfly Longevity: A Quantile Regression Survival Analysis. JASA 96, 458-468.

http://www.jstor.org/stable/2670284

OT: I happened to find a typo on p. 15 of your “Mythical Swing Voter” paper:

“We estimate that in fact only 0.5% of individuals switched from Obama of Romney ….”

I suggest the Daryl Bem paper, the red-dress-fertility paper & the cubic-polynomial-China-regression paper.

It’s good to learn what not to do too, eh?

I agree.

Maybe the ideal reading list would pair each great paper with its Nemesis.

Agreed.

Get them to read this http://www.ncbi.nlm.nih.gov/pubmed/11738537

then a day later read this http://www.badscience.net/2010/08/give-us-the-trial-data/

Terrible warnings often trump good examples.

+1 for terrible warnings.

One of the best Chemical Engineering classes I ever had was the one that dissected past disasters (Bhopal, Seveso, Flixborough, Chernobyl, Three Mile Island, etc.).

There’s even a superb engineer, Donald Kletz, who has made an entire career out of compiling, categorizing and examining past accidents, big & small. He wrote an excellent book called “What Went Wrong”.

I think we need more such books in statistics on forensics. Describe crap that ended up being published in top journals. Might do a lot of good for future academics.

Reading one Daryl-Bem-like paper can sometimes have a stronger impact than methodological papers exhorting you to not do something.

You mean Trevor Kletz?

Lalonde AER(1986) should be in that list, IMHO

Along with Bertrand, Duflo & Mullainathan QJE (2004) – you know, one for point estimates, one for inference.

Agree. Two cautionary tales.

What about the classic papers from Christian Robert’s class? I will try to find the link, but it can be found by searching on this blog.

https://xianblog.wordpress.com/2010/07/01/top-15-and-more/

http://statmodeling.stat.columbia.edu/2010/06/25/classics_of_sta/

hat tip to bodhi for the reminder.

https://normaldeviate.wordpress.com/2013/07/23/the-five-jeff-leeks-challenge/

Actually, the first paper on the just kidding list with Christian Hennig, http://www.stat.columbia.edu/~gelman/research/unpublished/objectivity10.pdf, would be really good, but most PhD students may not be at a point yet to appreciate it.

It is long, so I have not completely read it all yet, but I really do like it and I do think it will move the conversation forward.

My one criticism, for now, is that the phrase “Correspondence to observed reality” would have been better as “lack of discordance with observed reality”.

That would be more in line with the Chang quote “act without being frustrated by resistance from reality” (but also, annoyingly, with Peirce’s terminology).

I assume there is a post on it in the queue.

Angrist published a correction to the 1990 AER paper:

http://www.jstor.org/stable/2006780

Sorry to be a pain, folks, but can you write out the references fully enough that people from different disciplines know which paper you are describing? One of the nice things about this blog is that there are people with different backgrounds. I want to look at what some of you are suggesting!

This is shameless self-promotion and friend-promotion, but everyone should read Gelman and Price’s “All Maps of Parameter Estimates Are Misleading”.

Seriously, this issue does come up a lot and everyone should know about it; and the paper is quite readable.

Edwards, Lindman, & Savage (1963, Psychological Review). Published in the premier journal in psychology, this paper is cited almost exclusively by statisticians. A wonderful read. Another fantastic article is Dennis Lindley’s “Philosophy of statistics” (2000). You might not agree with all that Lindley has to say, but boy, you’d have a hard time winning the argument.

I’m not sure how much the papers I’d like to mention will be of use for Ph.D. students in statistics, but for me (a sociology student with some additional basic mathematical background) they opened my eyes to the glaring problems of real applied statistics, especially in the social sciences. There’s nothing new in there for readers of this blog, but to many (social) scientists using statistics all this is still unknown and unquestioned:

It all started with the paper by King (1986) and his critique of the R-squared parameter. Ioannidis (2005) disillusioned me about common research practices, and Gill’s (1999) paper not only has a catchy title but also allowed me to understand my own collywobbles about NHST far better. I don’t feel so comfortable with some of Gill’s recommendations anymore, and specifics of the problem are far better described by other papers (e.g. the one from our gracious host together with Hal Stern, 2006), but his paper was an eye-opener for me.

Ioannidis, John P. A. “Why Most Published Research Findings Are False.” PLoS Med 2, no. 8 (August 30, 2005): e124. doi:10.1371/journal.pmed.0020124.

King, Gary. “How not to lie with statistics: Avoiding common mistakes in quantitative political science.” American Journal of Political Science, 1986, 666–87.

Gelman, Andrew, and Hal Stern. “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant.” The American Statistician 60, no. 4 (2006): 328–31.

Gill, Jeff. “The insignificance of null hypothesis significance testing.” Political Research Quarterly 52, no. 3 (1999): 647–74.

While none of these papers actually need to be read, I really think it might help statistics Ph.D. students to get a sense of the gap between research practice and statistical theory and the problems and efforts of communication between statisticians and applied scientists.

In that general genre I’d put

Maltz, Michael. “Deviating from the Mean: The Declining Significance of Significance.” Journal of Research in Crime and Delinquency, Vol. 31, No. 4, November 1994, 434–463.

and Cohen’s “The Earth Is Round (p < .05)” article.

+1

I’ll give a shout-out to Jaynes’s statistical papers. They were obviously not written by a statistician, and they have a 1/3 philosophy, 1/3 theory, and 1/3 applied quality to them that makes them somewhat eccentric when compared to most statistics papers.

Personally, I’ve had far more success deriving stat methods from the sum and product rules in the style of Jaynes from scratch and only consulting the literature for implementation and computational issues, than I ever had applying anything I learned in the huge number of graduate stat classes I took. If anyone is interested in doing the same here are some of his more pure stat papers which are interesting not because of the specific problems they solve, but for what they teach you about how to do statistics:

New Engineering Applications of Information Theory

http://bayes.wustl.edu/etj/articles/new.eng.app.pdf

This paper walks step by step through a series of real engineering probabilistic decision theory problems. The decision problems start off so simple you can see the answer without any theory, but at each stage they get more involved, to the point where eventually probability/decision theory is required.

Confidence Intervals vs Bayesian Intervals

http://bayes.wustl.edu/etj/articles/confidence.pdf

A direct comparison of classical vs Bayesian methods in fairly run-of-the-mill stats problems where confidence intervals are a disaster. Jaynes shows that the Bayesian methods still work perfectly in these examples and explains in depth why all this is happening. This paper was actually written half a century ago, but the median statistician will still find its contents shocking.

Prior Information and Ambiguity in Inverse Problems

http://bayes.wustl.edu/etj/articles/ambiguity.pdf

A huge number of problems (far more than most realize) can be interpreted abstractly as trying to invert a non-invertible transformation (i.e. find a ‘reasonable’ inverse where mathematically more than one possible inverse exists).

Highly Informative Priors

http://bayes.wustl.edu/etj/articles/highly.informative.priors.pdf

Jaynes walks through several time series type problems using progressively more informative priors and analyzing the results at each stage.

Monkeys, Kangaroos, and N

http://bayes.wustl.edu/etj/articles/cmonkeys.pdf

A deep discussion of the issue of setting up priors for image reconstruction problems that effectively use all the prior information available.

Bayesian Spectrum and Chirp Analysis

http://bayes.wustl.edu/etj/articles/cchirp.pdf

A great paper which conducts a Bayesian analysis of chirped signals. The methodology is of high interest well beyond that specific problem, though. Of special interest is his discussion of the classical periodogram and the way he destroys Tukey’s supposed successes in this field. This paper led to the work by Bretthorst which so revolutionized Nuclear Magnetic Resonance Imaging it was thought to be a hoax at first.

Detection of extra-solar-system planets

http://bayes.wustl.edu/etj/articles/cplanets.pdf

The intended audience is presumably scientists doing applied work rather than stats students, but I’d like to nominate Hogg et al.’s pedagogical overview of model fitting from a few years ago:

http://arxiv.org/abs/1008.4686

The text itself is great, but I love this manuscript because of the rich footnotes throughout offering the authors’ candid thoughts on the different statistical methods being presented.

For a more detailed summary, check out the Astrobite on this paper:

http://astrobites.org/2011/07/26/astrostatistics-how-to-fit-a-model-to-data/

In fact, that’s what I’d really like to recommend: that some enterprising stats students start a Statsbites site! On Astrobites, a team of astrophysics graduate students from around the world collaborate to summarize and discuss contemporary research papers as a daily resource for students worldwide. We started this effort four years ago with motivation almost identical to what Susanna wrote, and now we have an archive of more than a thousand perspectives on recent literature that span the gamut of research in our field.

More information and contact details for the Astrobites collaboration here –

http://astrobites.org/about/

My favourite conceptual/philosophical papers on statistics are about how to use models without “believing” them and particularly about the notion of “approximating” reality with probability models. It would be great if PhD students knew at least one of them:

P. L. Davies “Data features”, Statistica Neerlandica 49, 185–245, 1995

http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.1995.tb01464.x/abstract

P. L. Davies “Approximating data”, Journal of the Korean Statistical Society, 37, 2008, 191–211

http://www.sciencedirect.com/science/article/pii/S1226319208000380

J. W. Tukey “More honest foundations for data analysis”, Journal of Statistical Planning and Inference 57, 1997, 21–28

http://www.sciencedirect.com/science/article/pii/S0378375896000328

How about this one, from a special issue of Ecology (mostly) examining P-values.

Aho K., Derryberry D. & Peterson T. (2014) Model selection for ecologists: the worldviews of AIC and BIC. Ecology 95, 631–636.

They explain the difference between AIC and BIC in terms of the different goals of the two GOF metrics, and the kind of problems they are each best suited to. In brief: AIC for “the world is infinitely complicated and I want to identify as many tapering effects as possible” and BIC for “the world is quite simple, with a few true causal effects, and I want to identify these from a much larger set of candidates”.
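
As a rough illustration of that contrast, here is a sketch using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; it is not code from the paper. The log-likelihoods, parameter counts, and sample size are made-up numbers, and the function and model names are hypothetical. The point is only that the two criteria can rank the same pair of fits differently.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits to the same n = 1000 observations (made-up values).
n = 1000
models = {
    "small": {"log_likelihood": -1500.0, "k": 3},
    "large": {"log_likelihood": -1495.0, "k": 6},
}

scores = {
    name: {
        "aic": aic(m["log_likelihood"], m["k"]),
        "bic": bic(m["log_likelihood"], m["k"], n),
    }
    for name, m in models.items()
}

best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
print("AIC prefers:", best_by_aic)
print("BIC prefers:", best_by_bic)
```

With n = 1000, each extra parameter costs ln(1000), about 6.9, under BIC against 2 under AIC. The larger model’s gain of 5 log-likelihood units for 3 extra parameters is therefore enough for AIC but not for BIC.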

Anscombe, F. J. (1973). “Graphs in Statistical Analysis”. American Statistician 27 (1): 17–21. JSTOR 2682899