When I say “best,” I mean coolest, funnest to read, most thought-provoking, etc. Not necessarily the most path-breaking. For example, did Andrew Wiles write a paper with the proof of Fermat’s last theorem? If so, I can’t imagine this would be readable. So, sure, it’s a great accomplishment but it’s not what I’m talking about.
But the paper has to be important in some way. It can’t just be readable. So, for example, I don’t know that Mark Kac’s classic, Can One Hear the Shape of a Drum?, would count, wonderful as it is.
Let’s make a list. Maybe we can put them all together into a fun book. Let’s start with these two:
1. Benoit Mandelbrot, How Long is the Coast of Britain?
2. Daniel Kahneman and Amos Tversky, Judgment under Uncertainty: Heuristics and Biases.
Can you add more to the list? No jokes, please. The paper also has to be of high quality. For example, Bem’s 2011 paper on ESP was important, in a historical sense, but not scientifically important or interesting.
Joe Redmon’s YOLO papers are pretty casual but well written and fun.
A. Einstein, “On a Heuristic Point of View about the Creation and Conversion of Light”.
http://users.physik.fu-berlin.de/~kleinert/files/eins_lq.pdf
Its elegant derivation has always amazed me.
Sourav:
Sorry, it was not my intention to comment as a reply.
“Science and Statistics”
George E. P. Box, Journal of the American Statistical Association, Vol. 71, No. 356. (Dec., 1976), pp. 791-799.
https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writings/Boxonmaths.pdf
Here’s one from economics: “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” George A. Akerlof,
The Quarterly Journal of Economics, Vol. 84, No. 3 (Aug., 1970), pp. 488-500. It eventually earned him a (quasi)Nobel Prize and influenced considerable thought regarding the role of information in markets – particularly asymmetric information. Akerlof also wrote an interesting narrative of the difficulties he had getting the article published: https://www.nobelprize.org/prizes/economic-sciences/2001/akerlof/article/. I also recall him writing another piece about the publication difficulties that I can’t seem to find – in that one, he said that the first version of the paper was actually better than what eventually got published. In order to publish it, he had to make it more mathematical (and less readable). If someone can locate where he said that, I’d appreciate it.
I would add to this Michael Spence’s article on market signaling, which introduced the hugely influential idea of costly signaling, and got him a Nobel along with Akerlof and Stiglitz:
https://www.jstor.org/stable/1882010
Dear Dale,
I liked the “quasi” Nobel Prize remark; the economics prize actually has no link to the original prize… /Magnus
Schrödinger, E., “Die gegenwärtige Situation in der Quantenmechanik” (“The Present Situation in Quantum Mechanics”), https://doi.org/10.1007/BF01491891
It describes an important debate in the development of quantum mechanics. It also presents the famous Schrödinger’s Cat scenario, which was a joke making fun of the Copenhagen interpretation of quantum mechanics, not a thought experiment.
There’s an English translation here. The famous cat scenario isn’t a joke making fun of ‘the’ CI but an example – a “ridiculous case” – serving as a warning against naive “psiontology”:
How about Bertlmann’s Socks and the Nature of Reality, by Bell? I don’t know if this is the original paper, or a post-publication precis of the concept.
https://hal.archives-ouvertes.fr/jpa-00220688
Good science reports should have these properties:
1) Deduction of an accurate quantitative model from a small set of simple, common-sense laws that can explain an apparently complex and confusing phenomenon
2) Open source, easily reproducible
3) Immediate practical applications
This is probably the best bio paper I’ve seen along those lines:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916857/
It’s also funny that you can basically ignore everything in neuroscience from 1920-2010 and get that paper. If Ramon y Cajal had a computer it could have been written during WWI.
On the Akerlof Lemons paper
https://www.nobelprize.org/prizes/economic-sciences/2001/akerlof/article/
“By June of 1967 the paper was ready and I sent it to The American Economic Review for publication. I was spending the academic year 1967-68 in India. Fairly shortly into my stay there, I received my first rejection letter from The American Economic Review. The editor explained that the Review did not publish papers on subjects of such triviality.”
“was again rejected on the grounds that The Review [of Economic Studies] did not publish papers on topics of such triviality.”
“The next rejection was more interesting. I sent “Lemons” to the Journal of Political Economy, which sent me two referee reports, carefully argued as to why I was incorrect.”
“I may have despaired, but I did not give up. I sent the paper off to the Quarterly Journal of Economics, where it was accepted.”
> 2. Daniel Kahneman and Amos Tversky, Judgment under Uncertainty: Heuristics and Biases.
Really, a paper presenting work that is based totally on small samples and p-values? I didn’t know you had it in you.
This also comes as a surprise to me.
Sentinel:
The Kahneman and Tversky article is an analytical summary of many previously-conducted experiments. I’m sure that some of these experiments are flawed and would not replicate, but I also think that a lot of what they found was real. The 1974 article is valuable both in putting all these stories in one place and in attempting to put the findings in a larger theoretical context.
I’m not a fan of T&K in general, but this paper is a classic. It’s easy to read, provides a good summary of findings up to that point in time, helped spark the development of behavioral economics, and much of the work is replicable.
There’s nothing wrong with small samples and p-values. The problems are with how they tend to be interpreted.
Mitchell J. Feigenbaum, “Quantitative universality for a class of nonlinear transformations,” J. Stat. Phys. 19, 25–52 (1978). doi: 10.1007/BF01020332
The paper that presented universal scaling properties in all period-doubling routes to chaos. Very readable and with great figures to illustrate the math.
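For anyone who wants to see the period doubling for themselves, it fits in a few lines. A minimal sketch (my own illustration, not code from the paper; the parameter values are standard textbook choices): the attractor of the logistic map doubles from a fixed point to a 2-cycle to a 4-cycle as r grows, and the ratios of successive doubling intervals approach Feigenbaum’s constant, roughly 4.669.

```python
# Sketch: attractors of the logistic map x -> r*x*(1-x) at a few r values,
# showing the period-doubling route to chaos that Feigenbaum analyzed.
def attractor(r, n_settle=2000, n_keep=16):
    """Iterate past the transient, then collect the distinct cycle values."""
    x = 0.5
    for _ in range(n_settle):
        x = r * x * (1 - x)
    seen = set()
    for _ in range(n_keep):
        x = r * x * (1 - x)
        seen.add(round(x, 6))
    return sorted(seen)

for r in (2.8, 3.2, 3.5):  # fixed point, 2-cycle, 4-cycle
    print(r, attractor(r))
```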
“A Mathematical Theory of Communication,” by Claude Shannon, is both important—it defined information theory—and very readable.
http://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
“Reflections on Trusting Trust,” by Ken Thompson, is just plain awesome.
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
I don’t know if two papers on fractals are too many, but Bak, Tang, and Wiesenfeld’s “Self-organized criticality: An explanation of the 1/f noise” can change your life. The Abelian sandpile model they introduce shows how a simple system can tune itself to exhibit scale-invariant behavior (in their case, avalanches), and is a great toy model for thinking about all sorts of complex phenomena.
Journal link: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.59.381
PDF link: http://www.chialvo.net/Curso/UBACurso/DIA3/Papers/SOC1.pdf
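The model itself is tiny to implement, which is part of its charm. A minimal sketch (my own toy version; the grid size and number of drops are made up): add grains one at a time at the center; any site reaching four grains topples, sending one grain to each neighbor, and the resulting avalanche sizes span many scales with no parameter tuning.

```python
# Sketch of the Bak-Tang-Wiesenfeld sandpile on a small 2-D grid.
import numpy as np

def drop_and_relax(grid, i, j):
    """Add one grain at (i, j); topple until stable; return avalanche size."""
    grid[i, j] += 1
    size = 0
    while True:
        unstable = np.argwhere(grid >= 4)
        if len(unstable) == 0:
            return size
        for x, y in unstable:
            grid[x, y] -= 4       # topple: shed one grain to each neighbor
            size += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < grid.shape[0] and 0 <= ny < grid.shape[1]:
                    grid[nx, ny] += 1   # grains at the edges simply fall off

grid = np.zeros((21, 21), dtype=int)
sizes = [drop_and_relax(grid, 10, 10) for _ in range(5000)]
print("largest avalanche:", max(sizes))
```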
Turing’s “On Computable Numbers”
Link: https://academic.oup.com/plms/article-abstract/s2-42/1/230/1491926?redirectedFrom=fulltext
In similar bracket, Shannon’s A Mathematical Theory of Communication, https://onlinelibrary.wiley.com/doi/10.1002/j.1538-7305.1948.tb01338.x
Same as with Turing, I don’t know about “funnest to read”, but the underlying concepts are approachable. And definitely cool.
Turing’s paper on AI (“Computing Machinery and Intelligence”) is very good, although it’s one of the most badly misunderstood papers in academic history (mostly because almost no one has read it; we all just assume what it says). His “Turing test”, that is, the real Turing test, not the inane silliness that everyone thinks is the Turing test, asks that a man pretend to be a woman to an interlocutor who tries to figure out if she’s a she, and then it asks that the computer play the role of the impostor in that game, and be rated on how well it pretends to be a woman. So it’s a real test of intelligence and human empathy. It isn’t about the computer fooling people at all, it’s about actually being intelligent.
Turing, A. M. Computing Machinery and Intelligence, Mind, Volume LIX, Issue 236, October 1950, Pages 433–460
https://academic.oup.com/mind/article/LIX/236/433/986238
Shannon, “A mathematical theory of communication.” Should be readable to most anyone here.
Edward Purcell’s “Life at low Reynolds number” (1977; doi: 10.1119/1.10903; PDF here) is fun to read, thought-provoking, and hugely influential. The topic is how simple physics reveals deep constraints on what microorganisms can do.
It perhaps shouldn’t count, since it’s essentially a transcript of a talk, which helps give it a casual tone, and since it’s a pedagogical piece describing things that Purcell and others figured out. Still, it’s a wonderful paper.
+1
Raghu:
Maybe that’s another category of such explanatory articles. Other examples would be Mr. Tompkins in Wonderland and How Animals Work.
Good point. As a “primary” (not explanatory) article on similar lines, I’ll list:
“Physics of chemoreception” by Howard Berg and Edward Purcell (Biophys J., 1977, PDF) — also a classic, beautiful and influential. The abstract makes it sound boring; the rest is great. Many later papers by various authors flesh out the details of small bits of this one; the broad strokes here sketch out a whole landscape.
— Raghu
Would love to see a post on that category! This one is great too :)
Also, Thomas J. Schelling, “Dynamic Models of Segregation” J. Math. Sociol. 1, 143-186 (1971). doi: 10.1080/0022250X.1971.9989794 and Robert Axelrod and William D. Hamilton, “The Evolution of Cooperation” Science 211, 1390-1396 (1981). doi: 10.1126/science.7466396
Two landmark papers (both very readable and enjoyable) on surprising emergent phenomena that arise from the interactions of individual decisions with collective behavior.
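The Schelling dynamic in particular takes only a few lines to implement. Here is a minimal one-dimensional sketch (my own toy version, not Schelling’s; the 0.5 tolerance, neighborhood width, and population sizes are made up): agents move only if fewer than half of their nearby neighbors share their type, and strong clustering still emerges from those mild preferences.

```python
# Sketch of a 1-D Schelling segregation model.
import random

random.seed(1)
line = ["A"] * 20 + ["B"] * 20 + [None] * 10
random.shuffle(line)

def unhappy(i):
    """True if the agent at i has fewer than half same-type neighbors."""
    if line[i] is None:
        return False
    nbrs = [line[j] for j in range(max(0, i - 2), min(len(line), i + 3))
            if j != i and line[j] is not None]
    if not nbrs:
        return False
    return sum(n == line[i] for n in nbrs) / len(nbrs) < 0.5

for _ in range(2000):
    movers = [i for i in range(len(line)) if unhappy(i)]
    if not movers:
        break
    i = random.choice(movers)
    j = random.choice([k for k, a in enumerate(line) if a is None])
    line[j], line[i] = line[i], None   # move to a random empty spot

print("".join(a or "." for a in line))
```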
+1
And in the same vein I would recommend “Inductive Reasoning and Bounded Rationality” by W. Brian Arthur, often referred to as The El Farol Bar problem. Available at http://tuvalu.santafe.edu/~wbarthur/Papers/El_Farol.pdf
Jonathan:
I have mixed feelings about both these celebrated papers. The ideas in the papers are both pretty and relevant. These are simple theoretical explanations for important social phenomena. My problem with the papers is that people can take them too seriously, that people can see a possible explanation as the explanation.
I wrote about Axelrod’s work in a paper, “Methodology as ideology: Some comments on Robert Axelrod’s ‘The Evolution of Cooperation,'” which is based on my undergraduate thesis from 1986; see here for background.
Andrew:
Your criticism of the way people can easily abuse the results of both of those papers is very important.
I teach both papers in my course on Agent-Based Modeling and I am always uncomfortable teaching the Schelling paper in particular because it’s easy to completely misuse it to argue that housing segregation was caused by individual choice or that attempts to desegregate neighborhoods will inevitably fail.
But I use that discomfort as a way to teach my students to be cautious about generalizing from their own simple models. I bring in “The Color of Law” to talk about how there was an extensive body of public policy at federal, state, and local levels, to deliberately segregate neighborhoods, and that Schelling knew about this when he wrote his paper. And I ask students to discuss what it means to make a model of segregation three years after the Fair Housing Act that looks only at individual-level voluntary choices, and that treats the groups of people symmetrically.
I think the problem you identify, “that people can see a possible explanation as the explanation” is an important problem that goes way beyond these papers, and when I teach these papers they provide a great way for me to warn students against that tendency because their simplicity makes it easy to highlight the dangers of rushing to generalize from might be to must be without compelling evidence.
I also really liked your paper about Axelrod. I hadn’t seen it before and I am very happy that you called my attention to it now. I especially liked the critical reflection on the assumptions Axelrod makes about soldiers’ motivations and the gulf that often grows between sweeping theories about motives, decisions, and behavior, and what we see in detailed empirical observations (e.g., that firing makes you a target, so game theory may be overthinking a much simpler phenomenon).
But I still feel that these are great papers, despite the problems we see with them, and I still enjoy returning to them and re-reading them every year when I teach them.
Jonathan:
I agree. Teach the complexity. Back when I used to teach decision analysis, I’d first teach classical von Neumann decision theory and then discuss the flaws in the theory.
Andrew said,
“My problem with the papers is that people can take them too seriously, that people can see a possible explanation as the explanation.”
and Jonathan G replied,
“I think the problem you identify, “that people can see a possible explanation as the explanation” is an important problem that goes way beyond these papers, and when I teach these papers they provide a great way for me to warn students against that tendency because their simplicity makes it easy to highlight the dangers of rushing to generalize from might be to must be without compelling evidence.”
I agree wholeheartedly. I think of the problem in mathematical terms: “A conjecture is not a proof.” I really wish that teaching this maxim were part of standard mathematical education. One quote that sticks in my mind (I forget the name of the person who said it) goes something like, “I have eyes to see where I am going, and feet to get me there. That is the difference between conjecture and proof.”
Yes, “A Mathematical Theory of Communication” was the first that came to mind.
Highly readable, interesting, and different (field work in law and econ): Ellickson’s “Of Coase and Cattle”.
https://digitalcommons.law.yale.edu/fss_papers/466/
I like Leo Breiman’s “Statistical modeling: the two cultures”
Kenneth:
Indeed. See here.
Nice response! Thanks
Goodman, Goodman, Goodman, & Goodman. 2014. “A Few Goodman: Surname Sharing Economist Coauthors” https://scholar.harvard.edu/files/joshuagoodman/files/goodmans.pdf
Ha ha! Don’t forget Leo Goodman…
I wonder how many Smiths have coauthored (or could coauthor) a paper.
(Side comment: In math, the custom is to list authors by alphabetical order of last name. A mathematician whose last name is Small once said he’d like to write a paper with me sometime, so that he could be first author for once.)
Paul Romer on “Endogenous Technological Change,” Journal of Political Economy, 1990. Just beautifully written and laid out.
“Fuck Nuance” (Kieran Healy) was a fun read – https://kieranhealy.org/files/papers/fuck-nuance.pdf
Since you posted a link to T&K, here’s a good one from their arch nemesis titled “Why Heuristics Work” – https://pure.mpg.de/rest/items/item_2100099/component/file_2100098/content
This one is about how the parasite-host relationship might have contributed to the evolution of the brain. Surprisingly readable for someone who knows almost nothing about immunology – https://www.journals.uchicago.edu/doi/abs/10.1086/705038?journalCode=qrb
“Top 10 replicated findings from behavioral genetics” – https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4739500/
Bryan, W. L., & Harter, N. (1899). Studies on the telegraphic language. Psychological Review, 6, 345-375.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Oh neat! What is the significance of the telegraph paper? My dad likes this stuff, so I’ll send this to him.
I’m fond of Drew McDermott’s “Artificial Intelligence Meets Natural Stupidity”. Despite the cutesy title, it has some very important ideas in it that AI has to get right to work, and which keep getting forgotten (and sometimes rediscovered, albeit rarely). It does have the problem that its examples are taken from MIT AI Lab work from the 1970s, so it is a bit opaque to folks not familiar with that work. Sigh. Among other things, it points out the problem of the “wishful mnemonic”: if your representation uses English words, your program’s reasoning probably isn’t anywhere near what you think it is. E.g., (Move John Ball1) is a very different thing from (Move John Mary). (Roger Schank insisted that different primitives be used: PTRANS for physical things, AFFECTS (or whatever it was) for emotional things.) That English words don’t/can’t work as a substrate for reasoning is why ideas like grovelling over the internets, reading pages, and extracting information/building “knowledge bases” simply can’t work. It appears AI is currently in a period of forgetting how really hard human reasoning is*. Sigh.
*: In its defense, AI isn’t trying to do human reasoning, only to make magic black boxes that do kewl things.
Evolution in Mendelian Populations, Sewall Wright:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1201091/pdf/97.pdf
A landmark in human thought. Darwin figured out evolution but never got close to speciation. Wright’s paper explains how it works in a way that fits the patterns we see. His “shifting balance theory” is one of those discoveries that seems to be cut from whole cloth rather than built on the ideas of others.
And pertinent to this blog, a full understanding of Wright’s theory pretty much yanks the rug out from under evolutionary psychology. Evo psych is based upon the idea that if you can name and describe a genetic trait it must be adaptive, but Wright showed that actual speciation is far too messy for that.
This provides a nice example of theory versus data:
Evolution. 2007 Nov; 61(11):2528-43. doi: 10.1111/j.1558-5646.2007.00219.x. Epub 2007 Sep 25.
Spatial Differentiation for Flower Color in the Desert Annual Linanthus parryae: Was Wright Right?
Douglas W Schemske, Paulette Bierzychudek
Abstract
Understanding the evolutionary mechanisms that contribute to the local genetic differentiation of populations is a major goal of evolutionary biology, and debate continues regarding the relative importance of natural selection and random genetic drift to population differentiation. The desert plant Linanthus parryae has played a prominent role in these debates, with nearly six decades of empirical and theoretical work into the causes of spatial differentiation for flower color. Plants produce either blue or white flowers, and local populations often differ greatly in the frequencies of the two color morphs. Sewall Wright first applied his model of “isolation by distance” to investigate spatial patterns of flower color in Linanthus. He concluded that the distribution of flower color morphs was due to random genetic drift, and that Linanthus provided an example of his shifting balance theory of evolution. Our results from comprehensive field studies do not support this view. We studied an area in which flower color changed abruptly from all-blue to all-white across a shallow ravine. Allozyme markers sampled across these regions showed no evidence of spatial differentiation, reciprocal transplant experiments revealed natural selection favoring the resident morph, and soils and the dominant members of the plant community differed between regions. These results support the hypothesis that local differences in flower color are due to natural selection, not due to genetic drift.
“This provides a nice example of theory versus data:”
Nah. Folks have been taking potshots at Wright’s theory for over 80 years, resulting in some good challenges that have yet to be resolved. This one doesn’t rise to anywhere near that level. I’m afraid I got a chuckle out of this part:
“Our results from comprehensive field studies do not support this view. We studied an area in which flower color changed abruptly from all-blue to all-white across a shallow ravine.”
Derivation of theory by means of factor analysis or Tom Swift and his electric factor analysis machine
J. Scott Armstrong (1967)
https://repository.upenn.edu/cgi/viewcontent.cgi?article=1015&context=marketing_papers
This was the first cautionary-tale-style paper I read while in grad school. Its lesson is still relevant.
D. Gale and L. S. Shapley (1962), College admissions and the stability of marriage, The American Mathematical Monthly, 69:9–15.
http://www.jstor.org/stable/2312726
This short paper is wonderfully clear – almost chatty – and yet eventually won Shapley a Nobel prize in Economics. Although Gale and Shapley didn’t know it at the time, the algorithm they invented had been discovered independently and was used – indeed, still is used – by the National Resident Matching Program to match junior doctors with training posts in American hospitals. This real-world applicability caused ravening packs of economists, computer scientists, and mathematicians to fall on the problem and gnaw it to shreds, yielding many publications, but the original is still the most fun to read.
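The deferred-acceptance algorithm at the heart of the paper really does fit in a few lines. A minimal sketch (my own illustration, with made-up preference lists, not code from the paper): proposers work down their preference lists; each receiver holds the best proposal seen so far and releases the previous holder back into the pool.

```python
# Sketch of Gale-Shapley deferred acceptance (proposer-optimal matching).
def gale_shapley(prop_prefs, recv_prefs):
    """prop_prefs/recv_prefs: dicts mapping name -> ordered preference list."""
    free = list(prop_prefs)                   # proposers not yet matched
    next_choice = {p: 0 for p in prop_prefs}  # index of next name to propose to
    engaged = {}                              # receiver -> current proposer
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in recv_prefs.items()}
    while free:
        p = free.pop()
        r = prop_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:  # r prefers the new proposer
            free.append(engaged[r])
            engaged[r] = p
        else:
            free.append(p)                      # r rejects p; p tries again
    return engaged

matching = gale_shapley(
    {"a": ["X", "Y"], "b": ["Y", "X"]},
    {"X": ["a", "b"], "Y": ["b", "a"]},
)
print(matching)  # {'Y': 'b', 'X': 'a'} -- a stable matching
```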
My favorite is
“Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: randomised controlled trial” by Leonard Leibovici. It appeared in the BMJ, 22 December 2001. The conclusion is that “Remote intercessory prayer said for a group of patients is associated with a shorter hospital stay and shorter duration of fever in patients with a bloodstream infection, even when the intervention is performed 4–10 years after the infection.” In more detail, “Mortality was 28.1% (475/1691) in the intervention group and 30.2% (514/1702) in the control group… The length of stay in hospital and duration of fever were significantly shorter in the intervention group (P=0.01 and P=0.04, respectively).”
The conclusion is “Remote, retroactive intercessory prayer can improve outcomes in patients with a bloodstream infection. This intervention is cost effective, probably has no adverse effects, and should be considered for clinical practice.” The savings in our skyrocketing health care costs cannot be overstated.
A worthwhile paper (unless misused to …).
Efron, B., 1986. Why isn’t everyone a Bayesian?. The American Statistician, 40(1), pp.1-5.
Dennis Lindley’s response to Efron in particular has helped me understand the Bayesian argument. I’m still trying to figure it out, but this was very useful. The responses from others were good too, but I find Lindley’s writing especially clear and cogent to me, a non-statistician!
Anon,
The Efron paper is fine for what it is, but I wouldn’t call it one of the best scientific papers ever written. It’s a thoughtful expression of a position—I’ve written such papers myself—but I would not say that it makes a scientific contribution on its own.
Ward et al., “The Perils of Policy by P-value: Predicting Civil Conflicts.” Clearly written and on important methodological and substantive topics. https://journals.sagepub.com/doi/pdf/10.1177/0022343309356491
Einstein, A.; Podolsky, B.; Rosen, N. (1935-05-15). “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?” (PDF). Physical Review. 47 (10): 777–780.
Strong contender for most philosophical paper ever. Einstein and co put their finger precisely on the weirdest features of QM. Bell turned philosophy into experiment in another landmark paper.
Bell, J. S. (1964). “On the Einstein Podolsky Rosen Paradox” (PDF). Physics Physique Физика. 1 (3): 195–200.
I meant most “philosophical *physics* paper ever”.
Alfred Knudson (1971) “Mutation and Cancer: a statistical study of retinoblastoma”
Logical analysis of a simple dataset (the relative incidence of unilateral and bilateral retinal tumours in their familial and sporadic forms) produced a remarkably prescient theory of carcinogenesis – Knudson’s Two Hit hypothesis.
Paul Samuelson’s note written in words of one syllable: “Why we should not make mean log of wealth big though years to act are long.” He cheated a little in footnotes. Available at e.g., http://www.nccr-finrisk.uzh.ch/media/pdf/Samuelson_JBF1979.pdf
I don’t know if it quite counts as science, but Tony Hoare’s 1978 paper “Communicating Sequential Processes” is a masterpiece. I still read it every so often. The treatment of concurrency in the Go programming language was heavily influenced by it – Rob Pike is also a fan.
Baby bear’s dilemma; a statistical tale. Agronomy Journal, 1982. On the use of multiple comparisons for comparing varieties of oats for making porridge.
Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician – Donald B. Rubin
“What the Frog’s Eye Tells the Frog’s Brain”, Jerome Lettvin …
At one time, I think this was the most cited scientific paper, ever.
Never had the pleasure of meeting him. Lettvin was a physician during the Battle of the Bulge.
“Body Maps on the Human Genome”, Christopher Cherniak
https://molecularcytogenetics.biomedcentral.com/articles/10.1186/1755-8166-6-61
— Mark
Cohen, J. (1994). The Earth is Round (p < .05)
This line alone warrants consideration…
“And we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing (NHST; I resisted the temptation to call it statistical hypothesis inference testing) to the point of meaninglessness and beyond.”
And pretty much anything written by Paul Meehl.
Eb:
Hmmm, I guess I should clarify. See comment here. I’m more interested in papers with some research contributions, not just position papers, no matter how influential and charming they may be.
Thanks to all for lots of interesting suggestions. I’ll still go back to Kac. The title is fantastic, as is the paper. One should also add the paper of Gordon, Webb, and Wolpert, which answered Kac’s question with a resounding No. Wikipedia has the relevant citations.
I debated submitting this one because Andrew requested no jokes, but the impact of Sokal’s paper far transcended a mere joke:
https://physics.nyu.edu/faculty/sokal/transgress_v2/transgress_v2_singlefile.html
Sokal’s paper had more than a little of seeing the mote in someone else’s eye while ignoring the beam in one’s own.
Several years before Sokal’s paper was published, The Canadian Journal of Physics published a paper that purported to use chaos theory to “prove” that feminism was dangerous to society, using “evidence” drawn from the author’s informal interviews with undergraduate physics majors, combined with some incoherent ranting about chaos theory, working parents, and the nuclear family. G.R. Freeman, “Kinetics of nonhomogeneous processes in human society: Unethical behaviour and societal chaos,” Can. J. Phys. 68, 794-798 (1990).
Given this, Sokal would have done better to be a good deal humbler rather than crowing about how bad peer review was in other disciplines.
Jonathan:
I don’t get why Sokal, just cos he’s a physicist, should feel humble because some other physicist was a horse’s ass. Sokal is not responsible for every stupid and obnoxious thing that physicists do.
Andrew: Your point would be more persuasive if Sokal had used his prank only to criticize the editors of the journal Social Text, but he generalized from that one journal to broadly criticize the entire field of science studies and the humanities in general.
This paper is fun to read and certainly influential:
The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information
http://www2.psych.utoronto.ca/users/peterson/psy430s2001/Miller%20GA%20Magical%20Seven%20Psych%20Review%201955.pdf
Dan:
That psychology paper by George Miller reminds me of this classic from psychology by Dale Miller, The Norm of Self-Interest.
Jaynes’ Information Theory and Statistical Mechanics
Markowitz’s Portfolio Selection
So many great ones already listed, and so many that are new to me that I look forward to reading.
As an applied clinical scientist with no formal statistical training, I found that the Ioannidis paper “Why most published research findings are false” had a big impact on my thinking. I don’t know if it’s really the kind of thing you were looking for here…
https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
Of course, now he’s even more famous for being one of the most prominent and vocal covid-19 skeptics. Hmmmm…
Michael,
The Ioannidis paper is influential and I have cited it many times. I agree with its overall message, but I’m not thrilled with its framing of findings as “true” or “false,” as we’ve discussed on this blog.
I’m totally fine with it as a persuasive argument to people working in the field under discussion on their own terms.
We agree on the inappropriateness of true/false dichotomization.
Nevertheless, it was a real eye-opener for me. Also, I work largely with M.D.s, and I have found it to be a pretty good gateway drug for them. They tend to not get the math – but the graphs are good enough to help them understand the issue.
That was the tricky part of it being influential: it set a poor precedent of being careless about the logic and math, even if the overall message was sensible.
That is what I actually said to Stephen Goodman when he raised various issues that were later written up here – https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040168
I now think that was a big mistake on my part and only recently saw the author’s response https://journals.plos.org/plosmedicine/article/file?type=printable&id=10.1371/journal.pmed.0040215
Bill Hamilton (1971) Geometry for the Selfish Herd, Journal of Theoretical Biology
Lawrence Slobodkin (1986) “The Role of Minimalism in Art and Science”, The American Naturalist
Beginning,
We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.
(Almost) ending
It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.
Watson & Crick 1953. Yes, that Watson.
Incidentally, how did they choose the order of the authors?
By ego presumably.
John Nelder’s “From Statistics to Statistical Science” published in JRSS Series D
Actually, this might be too niche
Waldfogel, Joel. “The Deadweight Loss of Christmas.” The American Economic Review 83, no. 5 (1993): 1328-336. Accessed June 9, 2020. http://www.jstor.org/stable/2117564.
Krugman’s Theory of Interstellar Trade (1978) is a fun one! http://www.standupeconomist.com/pdf/misc/interstellar.pdf
Claude Shannon and John Conway are known for fun, easy-to-read papers, but I couldn’t pick a good representative here.
Rose’s Sick Individuals and Sick Populations is something I come back to again and again and has influence in epidemiology.
Rose, G. (2001). Sick individuals and sick populations. International journal of epidemiology, 30(3), 427-432.
+Inf
W. S. Jevons’s 1875–78 writings on sunspot theory nicely capture the uncertainty around correlation/causation and still seem to attract a new advocate every couple of decades.
Reducing the Dimensionality of Data with Neural Networks, Science, July 2006
G. E. Hinton and R. R. Salakhutdinov
I think this was the earliest paper in the current (i.e., second) coming of neural networks, the one that reignited interest in the field. As is typical for papers in the journal Science, it is very compact, just about 3 pages.
It always amazes me that for years deep learning was discouraged because “one layer could approximate any function”.
https://en.wikipedia.org/wiki/Universal_approximation_theorem
When I was learning about it I would always come across stuff saying that many layers were not needed because of that proof, which ignored the computational resources required. You can still find these types of posts on Stack Exchange.
Then it turned out to be the most efficient way to perform many tasks.
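For what it’s worth, the theorem’s existence claim is easy to demonstrate in a few lines. A minimal sketch of my own (random tanh features plus a least-squares readout, not anyone’s published code): a single hidden layer approximates sin(x) to small error, which says nothing about whether one layer is an efficient way to learn harder functions.

```python
# Sketch: one hidden layer with fixed random weights can approximate
# sin(x) on [-pi, pi] -- the universal approximation theorem in action.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x).ravel()

# Random hidden layer: tanh(w*x + b) with fixed random weights and biases.
n_hidden = 50
w = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
h = np.tanh(x @ w + b)

# Fit only the output weights by least squares.
coef, *_ = np.linalg.lstsq(h, y, rcond=None)
print("max abs error:", np.max(np.abs(h @ coef - y)))
```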
As far as I can tell, nobody knows why they work beyond saying the phrase “low dimensional manifold” and throwing a salt shaker over your shoulder, so not sure they were really “wrong.” Just so happened that someone eventually tried it and it worked.
The part that’s wrong is assuming that a proof that ignores how practical an approach is should guide what we do in practice.
Although the case in point is not strictly an “impossibility proof”, I find JS Bell sums up the fallacy nicely: “what is proved by impossibility proofs is lack of imagination.”
Ah yeah, that’s true. Reminds me of Troubling Trends in Machine Learning Scholarship
Mathiness manifests in several ways: First, some papers abuse mathematics to convey technical depth—to bulldoze rather than to clarify. Spurious theorems are common culprits, inserted into papers to lend authoritativeness to empirical results, even when the theorem’s conclusions do not actually support the main claims of the paper. We (JS) are guilty of this in [70], where a discussion of “staged strong Doeblin chains” has limited relevance to the proposed learning algorithm, but might confer a sense of theoretical depth to readers.
The ubiquity of this issue is evidenced by the paper introducing the Adam optimizer [35]. In the course of introducing an optimizer with strong empirical performance, it also offers a theorem regarding convergence in the convex case, which is perhaps unnecessary in an applied paper focusing on non-convex optimization. The proof was later shown to be incorrect in [63].
There was an article in Quanta (fairly recently) pointing out that what they do is essentially _texture_ recognition. So they’re really good at tasks that can be solved by differentiating textures or pattern recognition (e.g., they’re perfect for move generation in Go). Also, they were shown to be really good at differentiating between portable and fixed medical X-ray equipment. So there’s no more need for salt.
To the best I can tell, the things that Minsky and Papert proved the perceptron model couldn’t do current neural networks still can’t do. I suppose “Rebooting AI” is too long to be listed as a favorite article, but it’s good.
It’s sort of like how you can represent any smooth function on a compact set as a polynomial, but sometimes it’s useful to run your function through inv_logit to bound it between 0 and 1… composition of functions is maybe more efficient even if it’s not strictly necessary.
I don’t know enough to tell, but do you think that similarity might provide insight into the success of deep learning?
I mean the easiest way to do something depends on the tools available, maybe with a different type of computer (in the general sense) a different approach would be better.
I definitely think deep learning is just taking advantage of the basic idea that it’s easier to solve a hard problem by breaking it into a series of steps.
Bits Through Queues, 1996, V. Anantharam and S. Verdú
This is one of my favorite information theory papers. Very well written and easy to read. It sets up a queue as a noisy channel, through which information can be made to flow by encoding it in the inter-arrival times between entities arriving at the queue. The channel capacity for various types of queues is derived.
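If I remember the headline result correctly (this is from memory, so worth checking against the paper), the single-server queue with exponential service at rate μ has the strikingly clean timing-channel capacity

```latex
C = \frac{\mu}{e} \quad \text{nats per second.}
```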
Berger and Sellke, Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence
Equilibrium Points in N-Person Games, John Nash
Deterministic Nonperiodic Flow; Edward N. Lorenz; Journal of the Atmospheric Sciences (1963) 20 (2): 130–141.
The “butterfly” system of ODEs, derived from a drastically simplified model of atmospheric convection.
But also, Lorenz’s subsequent paper, “Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set off a Tornado in Texas?” delivered at the 1972 meeting of the AAAS, in which he concludes that despite the chaotic properties of the atmosphere, there are many good reasons to be skeptical that a butterfly flapping its wings in Brazil could possibly influence tornadoes in Texas.
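The system itself is just three ODEs and is easy to play with. A minimal sketch (standard σ, ρ, β values; the crude Euler integration and step count are my own choices, not from the paper) showing the sensitive dependence on initial conditions that the paper identified:

```python
# Sketch: two Lorenz '63 trajectories that differ in the 8th decimal
# of one coordinate diverge to order-one separation.
def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system."""
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a, b = (1.0, 1.0, 1.0), (1.0, 1.0, 1.00000001)
for step in range(3001):
    if step % 1000 == 0:
        print(step, abs(a[2] - b[2]))   # separation grows with time
    a, b = lorenz_step(a), lorenz_step(b)
```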
The Economic Organisation of a P.O.W. Camp
R. A. Radford
Economica New Series, Vol. 12, No. 48 (Nov., 1945), pp. 189-201
Also:
Econometrica, April 1937, Volume 5, Issue 2, pp. 147–159
Mr. Keynes and the “Classics”; A Suggested Interpretation
J. R. Hicks
And +1 on Akerlof Lemon Paper.
Also:
Martin Weitzman, “Prices vs. Quantities” (1974)
Can a Biologist Fix a Radio?
https://www.cell.com/cancer-cell/pdf/S1535-6108(02)00133-2.pdf
The radio inspection analogy is pretty funny. I kinda disagree with the idea that there’s some great airplane/radio engineering-specific secret that transfers to learning about biology though.
Isaac Asimov “The relativity of wrong” https://chem.tufts.edu/answersinscience/relativityofwrong.htm
Ah! I just remembered: “The Tau Manifesto”.
It turns out that Pi can be argued to be the wrong thing for trig/calculus/analysis. Pi is the circumference over the _diameter_, but for doing a lot of math, things are defined in terms of the radius. So 2*pi appears all over the place in math. In particular, Tau (which is 2*pi) _means_ one whole cycle of (the argument to) a trig function, so when it’d be natural to talk about whole cycles or whole turns, pi only gets you halfway there. A lot of high school and first/second year college math makes way more sense if you’ve read The Tau Manifesto.
This is interesting because it means that our whole cultural gestalt that pi is somehow transcendentally fundamental to the universe might be quite inscrutable to an alien intelligence that just did their math using the more rational circle constant.
https://hexnet.org/files/documents/tau-manifesto.pdf
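To make the arithmetic concrete (my own summary of the manifesto’s point, in symbols):

```latex
\tau = 2\pi, \qquad \sin(\theta + \tau) = \sin\theta, \qquad
\text{quarter turn} = \frac{\tau}{4} \ \text{rather than} \ \frac{\pi}{2}.
```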
Trivers paper on Reciprocal Altruism:
https://api.semanticscholar.org/CorpusID:19027999
Richard Nisbett’s & Timothy Wilson’s “Telling more than we can know”:
http://people.virginia.edu/~tdw/nisbett&wilson.pdf
Not sure how replicable those experiments are from today’s perspective. But they nicely fit the neuropsychological literature (e.g., Michael Gazzaniga’s work) showing that there is an entire brain system dedicated to spinning just-so stories that try to make sense of our own (and others’) behavior without having any privileged introspective insight into the actual roots of our behavior. Decades earlier Freud made a similar point. And although he may have erred on many (or even most) other accounts, I think he was dead-on in this case.
From the days when (medical) science was slow: John Snow On the Mode of Communication of Cholera.
Rosenhan, David (19 January 1973). “On being sane in insane places”. Science. 179 (4070): 250–258.
From Wikipedia:
The Rosenhan experiment or Thud experiment was conducted to determine the validity of psychiatric diagnosis. The experimenters feigned hallucinations to enter psychiatric hospitals, and acted normally afterwards. They were diagnosed with psychiatric disorders and were given antipsychotic drugs. The study was conducted by psychologist David Rosenhan, a Stanford University professor, and published by the journal Science in 1973 under the title “On being sane in insane places”. It is considered an important and influential criticism of psychiatric diagnosis.
Rosenhan’s publication was probably based on fraud.
From Wikipedia:
In a 2019 popular book on Rosenhan by author Susannah Cahalan, The Great Pretender, the veracity and validity of the Rosenhan experiment was questioned; Cahalan argues that Rosenhan never published further work on the experiment’s data, nor did he deliver on a book on it that he had promised. Moreover, she presents her inability to find the experiment’s subjects, save two—a Stanford graduate student who had experiences similar to Rosenhan’s, and one whose positive psychiatric hospital experience was excluded from the published results. As noted by Alison Abbott in a review of the book in the journal Nature, Kenneth J. Gergen, a Stanford University colleague, stated that “some people in the department called him a bullshitter,” a conclusion with which Cahalan appeared to be in agreement, although, Abbott writes, “[s]he cannot be completely certain that Rosenhan cheated. But she is confident enough to call her engrossing, dismaying book The Great Pretender.”
See also https://www.spectator.co.uk/article/how-a-fraudulent-experiment-set-psychiatry-back-decades
https://www.theguardian.com/books/2020/jan/10/great-pretender-susannah-cahalan
Edward Lorenz, Deterministic nonperiodic flow, Journal of the atmospheric sciences, 1963
Albert Einstein’s “On a Heuristic Point of View about the Creation and Conversion of Light” (1905)
The math in the paper is not that hard to follow even if you are not a physicist, as long as you know the basics.
The paper presents a simple heuristic argument (not a theory) that light is made of quanta, using clever metaphorical thinking.
He figures out that the mathematical equation for blackbody entropy is identical to that for the entropy of a molecular gas when you substitute E/hν in the blackbody entropy for the number of molecules in the gas entropy. Then, like a bolt from the blue, with no gradual development, comes what has been called the most revolutionary sentence written by a physicist in the twentieth century: “According to the assumption to be contemplated here, when a light ray is spreading from a point, the energy is not distributed continuously over ever-increasing spaces, but consists of a finite number of ‘energy quanta’ that are localized in points in space, move without dividing, and can be absorbed or generated only as a whole.” (Fölsing, Albrecht (1997), Albert Einstein: A Biography, trans. Ewald Osers, Viking)
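In modern notation (my paraphrase of the comparison; the 1905 paper writes Boltzmann’s constant as R/N), the two entropy formulas he matched are

```latex
S - S_0 = k \,\frac{E}{h\nu}\, \ln\frac{V}{V_0} \quad \text{(Wien-regime radiation)},
\qquad
S - S_0 = k\, n \,\ln\frac{V}{V_0} \quad \text{(ideal gas of $n$ molecules)},
```

so monochromatic radiation of energy E behaves as if it consisted of n = E/hν independent quanta.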
Then he goes on to solve some concrete physical problems using this heuristically derived viewpoint. One of them was the explanation of the photoelectric effect.
What makes this the most amazing paper of all time is that for 18 years, in the middle of the quantum revolution, Einstein was the only physicist to think that light had energy quanta. His Nobel Prize citation specifically mentioned the photoelectric effect, yet the ‘heuristic viewpoint’ was seen as an error made by a genius until Compton discovered the Compton effect in 1923.
Planck felt the need to defend Einstein when he recommended him for a university position, because of this crazy view. Planck didn’t believe in photons, but he was Einstein’s biggest champion for other reasons.
An Investigation of the Therac-25 Accidents, Nancy G. Leveson and Clark S. Turner, 1993.
http://www1.cs.columbia.edu/~junfeng/08fa-e6998/sched/readings/therac25.pdf
This paper, decades old, always gives me the chills. I don’t usually get that from analysis of computer systems.
phew…that’s an interesting one.
How’s this for irony, too? One of the authors is listed at the end of the article as “Boeing Professor of Computer Science and Engineering at UW.”
A few people have mentioned John Bell on QM. I’d add “Is the Moon There When Nobody Looks? Reality and the Quantum Theory” by N. David Mermin:
https://physicstoday.scitation.org/doi/10.1063/1.880968
It’s an absolutely wonderful exposition of matters related to Bell’s theorem, so good that Feynman wrote the author a fan letter about a version of it.
three suggestions:
1. Cox, David R (1972). “Regression Models and Life-Tables”. Journal of the Royal Statistical Society, Series B. 34 (2): 187–220. JSTOR 2985181. MR 0341758.
2. Samuel Karlin and James McGregor (1959), A Characterization of Birth and Death Processes, PNAS March 1, 1959, 45 (3), 375–379; https://doi.org/10.1073/pnas.45.3.375
3. Box, G. E. P. (1980), Sampling and Bayes’ Inference in Scientific Modelling and Robustness, JRSS Series A, 143, https://www.cs.princeton.edu/courses/archive/fall09/cos597A/papers/Box1980.pdf
Monetary Theory and the Great Capitol Hill Baby Sitting Co-op Crisis
Authors: Joan Sweeney and Richard James Sweeney
Source: Journal of Money, Credit and Banking, Vol. 9, No. 1, Part 1 (Feb., 1977), pp. 86-89
URL: http://www.eecs.harvard.edu/cs286r/courses/fall09/papers/coop.pdf
Short and readable, it explains in simple human terms the need for monetary policy in any society which uses a medium of exchange.
It was also the basis of perhaps Paul Krugman’s most famous explanation, appearing in his book Peddling Prosperity and also in this Slate column: https://slate.com/business/1998/08/baby-sitting-the-economy.html
Krugman reportedly called it “life-changing”. It’s a monument to the power of a good story.
Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81-97. https://doi.org/10.1037/h0043158
Hick, W. E. (1952). On the Rate of Gain of Information. Quarterly Journal of Experimental Psychology, 4(1), 11-26. https://doi.org/10.1080/17470215208416600
Treisman, A. (1977). Focused attention in the perception and retrieval of multidimensional stimuli. Perception & Psychophysics, 22(1), 1-11. https://doi.org/10.3758/BF03206074
Seminal psychological work on selective perception
As far as I can see, this is the only work by a woman that has been recommended.
All three works address crucial psychological questions about time and strategies used to make decisions
Hubble’s paper which led to the Big Bang theory of the beginning of the universe: “A relation between distance and radial velocity among extra-galactic nebulae”. PNAS March 15, 1929 15 (3) 168-173; https://doi.org/10.1073/pnas.15.3.168
Hands down meets the criteria for a great paper.
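The relation in the title is the one now taught as Hubble’s law (my gloss): radial velocity proportional to distance,

```latex
v = H_0 \, d.
```

Hubble’s own estimate of H₀ was several times larger than the modern value, but the linearity was the point.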
Wallace: On the Tendency of Varieties to Depart Indefinitely from the Original Type
He unified all living creatures past, present and future with 2 simple rules. Darwin did the same but it took him a whole book.
Even better, Wallace got the idea during a three-day delirium from malaria.
I had some time to spare so I compiled all the papers and found their DOIs.
In case anyone is interested, here’s a csv: https://pastebin.com/7zwdMd1j