What are the best scientific papers ever written?

When I say “best,” I mean coolest, funnest to read, most thought-provoking, etc. Not necessarily the most path-breaking. For example, did Andrew Wiles write a paper with the proof of Fermat’s last theorem? If so, I can’t imagine this would be readable. So, sure, it’s a great accomplishment but it’s not what I’m talking about.

But the paper has to be important in some way. It can’t just be readable. So, for example, I don’t know that Mark Kac’s classic, Can One Hear the Shape of a Drum?, would count, wonderful as it is.

Let’s make a list. Maybe we can put them all together into a fun book. Let’s start with these two:

1. Benoit Mandelbrot, How Long is the Coast of Britain?

2. Daniel Kahneman and Amos Tversky, Judgment under Uncertainty: Heuristics and Biases.

Can you add more to the list? No jokes, please. The paper also has to be of high quality. For example, Bem’s 2011 paper on ESP was important, in a historical sense, but not scientifically important or interesting.


  1. Sourav says:

    Joe Redmon’s YOLO papers are pretty casual but well written and fun.

  2. John Mauer says:

    “Science and Statistics”
    George E. P. Box, Journal of the American Statistical Association, Vol. 71, No. 356. (Dec., 1976), pp. 791-799.

  3. Dale Lehman says:

    Here’s one from economics: “The Market for “Lemons”: Quality Uncertainty and the Market Mechanism,” George A. Akerlof
    The Quarterly Journal of Economics, Vol. 84, No. 3 (Aug., 1970), pp. 488-500. It eventually earned him a (quasi)Nobel Prize and influenced considerable thought regarding the role of information in markets – particularly asymmetric information. Akerlof also wrote an interesting narrative of the difficulties he had getting the article published: I also recall him writing another piece about the publication difficulties that I can’t seem to find – in that one, he said that the first version of the paper was actually better than what eventually got published. In order to publish it, he had to make it more mathematical (and less readable). If someone can locate where he said that, I’d appreciate it.

  4. Ron Beavis says:

    Schrödinger, E. Die gegenwärtige Situation in der Quantenmechanik,
    It describes an important debate in the development of quantum mechanics. It also presents the famous Schrödinger’s Cat scenario, which was a joke making fun of the Copenhagen interpretation of quantum mechanics, not a thought experiment.

    • Paul Hayes says:

      There’s an English translation here. The famous cat scenario isn’t a joke making fun of ‘the’ CI but an example – a “ridiculous case” – serving as a warning against naive “psiontology”:

      It is typical of these cases that an indeterminacy originally restricted to the atomic domain becomes transformed into macroscopic indeterminacy, which can then be resolved by direct observation. That prevents us from so naively accepting as valid a “blurred model” for representing reality.

      In his 1935 analysis of “the present situation in QM” (the ‘cat’ paper), he made decisive steps toward an epistemic interpretation of quantum states, even if not under that name. According to Schrödinger, the Ψ function does not represent an existing physical state, but a maximal catalogue of possible measurements. It embodies “the momentarily-attained sum of theoretically based future expectations, somewhat as laid down in a catalog…. It is the determinacy bridge between measurements and measurements” ([1935]1983.p.158).

  5. VLK says:

    How about Bertlmann’s Socks and the Nature of Reality, by Bell? I don’t know if this is the original paper, or a post-publication precis of the concept.

  6. Anonymous says:

    Good science reports should have these properties:

    1) Deduction of an accurate quantitative model from a small set of simple and common sense laws that can explain an apparently complex and confusing phenomenon

    2) Open source, easily reproducible

    3) Immediate practical applications

    This is probably the best bio paper I’ve seen along those lines:

    It’s also funny that you can basically ignore everything in neuroscience from 1920-2010 and get that paper. If Ramon y Cajal had a computer it could have been written during WWI.

  7. Stephen McKay says:

    On the Akerlof Lemons paper

    “By June of 1967 the paper was ready and I sent it to The American Economic Review for publication. I was spending the academic year 1967-68 in India. Fairly shortly into my stay there, I received my first rejection letter from The American Economic Review. The editor explained that the Review did not publish papers on subjects of such triviality.”

    “was again rejected on the grounds that the The Review [of Economic Studies] did not publish papers on topics of such triviality.”

    “The next rejection was more interesting. I sent “Lemons” to the Journal of Political Economy, which sent me two referee reports, carefully argued as to why I was incorrect.”

    “I may have despaired, but I did not give up. I sent the paper off to the Quarterly Journal of Economics, where it was accepted.”

  8. sentinel chicken says:

    > 2. Daniel Kahneman and Amos Tversky, Judgment under Uncertainty: Heuristics and Biases.

    Really, a paper presenting work that is based totally on small samples and p-values? I didn’t know you had it in you.

    • Friedrich says:

      This also comes as a surprise to me.

    • Andrew says:


      The Kahneman and Tversky article is an analytical summary of many previously-conducted experiments. I’m sure that some of these experiments are flawed and would not replicate, but I also think that a lot of what they found was real. The 1974 article is valuable both in putting all these stories in one place and in attempting to put the findings in a larger theoretical context.

    • Jordan says:

      I’m not a fan of T&K in general, but this paper is a classic. It’s easy to read, provides a good summary of findings up to that point in time, helped spark the development of behavioral economics, and much of the work is replicable.

      There’s nothing wrong with small samples and p-values. The problems are with how they tend to be interpreted.

  9. Mitchell J. Feigenbaum, “Quantitative universality for a class of nonlinear transformations,” J. Stat. Phys. 19, 25–52 (1978). doi: 10.1007/BF01020332

    The paper that presented universal scaling properties in all period-doubling routes to chaos. Very readable and with great figures to illustrate the math.

  10. Madeleine Thompson says:

    “A Mathematical Theory of Communication,” by Claude Shannon, is both important—it defined information theory—and very readable.

    “Reflections on Trusting Trust,” by Ken Thompson, is just plain awesome.

  11. Joe says:

    I don’t know if two papers on fractals are too many, but Bak, Tang, and Wiesenfeld’s “Self-organized criticality: An explanation of the 1/f noise” can change your life. The Abelian sandpile model they introduce shows how a simple system can tune itself to exhibit scale-invariant behavior (in their case, avalanches), and is a great toy model for thinking about all sorts of complex phenomena.

    Journal link:
    PDF link:
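    For anyone who wants to watch the self-tuning happen, here is a minimal sketch of the toppling rule, assuming a square grid, a threshold of four grains per cell, and grains that fall off the edge being lost. The function names, grid size, and seed are illustrative choices, not taken from the paper.

```python
import random


def topple(grid, threshold=4):
    """Relax the grid in place; return the number of topplings (avalanche size)."""
    n = len(grid)
    size = 0
    unstable = [(r, c) for r in range(n) for c in range(n) if grid[r][c] >= threshold]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier pass
        grid[r][c] -= threshold
        size += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:  # grains pushed past the edge are lost
                grid[rr][cc] += 1
                if grid[rr][cc] >= threshold:
                    unstable.append((rr, cc))
    return size


def run_sandpile(n=20, grains=5000, seed=0):
    """Drop grains one at a time on random cells and record each avalanche size."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        r, c = rng.randrange(n), rng.randrange(n)
        grid[r][c] += 1
        sizes.append(topple(grid))
    return grid, sizes
```

    Tabulating the values in `sizes` once the pile has filled up is one way to look for the broad, scale-free spread of avalanche sizes the paper describes.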

  12. Witold says:

    Turing’s “On Computable Numbers”

  13. Turing, A. M. Computing Machinery and Intelligence, Mind, Volume LIX, Issue 236, October 1950, Pages 433–460

  14. yyw says:

    Shannon, “A mathematical theory of communication.” Should be readable to most anyone here.

  15. Edward Purcell’s “Life at low Reynolds number” (1977; doi: 10.1119/1.10903; PDF here) is fun to read, thought-provoking, and hugely influential. The topic is how simple physics reveals deep constraints on what microorganisms can do.

    It perhaps shouldn’t count, since it’s essentially a transcript of a talk, which helps give it a casual tone, and since it’s a pedagogical piece describing things that Purcell and others figured out. Still, it’s a wonderful paper.

  16. Also, Thomas J. Schelling, “Dynamic Models of Segregation” J. Math. Sociol. 1, 143-186 (1971). doi: 10.1080/0022250X.1971.9989794 and Robert Axelrod and William D. Hamilton, “The Evolution of Cooperation” Science 211, 1390-1396 (1981). doi: 10.1126/science.7466396

    Two landmark papers (both very readable and enjoyable) on surprising emergent phenomena that arise from the interactions of individual decisions with collective behavior.

    • Jonathan (another one) says:


    • Andrew says:


      I have mixed feelings about both these celebrated papers. The ideas in the papers are both pretty and relevant. These are simple theoretical explanations for important social phenomena. My problem with the papers is that people can take them too seriously, that people can see a possible explanation as the explanation.

      I wrote about Axelrod’s work in a paper, “Methodology as ideology: Some comments on Robert Axelrod’s ‘The Evolution of Cooperation,'” which is based on my undergraduate thesis from 1986; see here for background.

      • Andrew:

        Your criticism of the way people can easily abuse the results of both of those papers is very important.

        I teach both papers in my course on Agent-Based Modeling and I am always uncomfortable teaching the Schelling paper in particular because it’s easy to completely misuse it to argue that housing segregation was caused by individual choice or that attempts to desegregate neighborhoods will inevitably fail.

        But I use that discomfort as a way to teach my students to be cautious about generalizing from their own simple models. I bring in “The Color of Law” to talk about how there was an extensive body of public policy at federal, state, and local levels, to deliberately segregate neighborhoods, and that Schelling knew about this when he wrote his paper. And I ask students to discuss what it means to make a model of segregation three years after the Fair Housing Act that looks only at individual-level voluntary choices, and that treats the groups of people symmetrically.

        I think the problem you identify, “that people can see a possible explanation as the explanation” is an important problem that goes way beyond these papers, and when I teach these papers they provide a great way for me to warn students against that tendency because their simplicity makes it easy to highlight the dangers of rushing to generalize from might be to must be without compelling evidence.

        I also really liked your paper about Axelrod. I hadn’t seen it before and I am very happy that you called my attention to it now. I especially liked the critical reflection on the assumptions Axelrod makes about soldiers’ motivations and the gulf that often grows between sweeping theories about motives, decisions, and behavior, and what we see in detailed empirical observations (e.g., that firing makes you a target, so game theory may be overthinking a much simpler phenomenon).

        But I still feel that these are great papers, despite the problems we see with them, and I still enjoy returning to them and re-reading them every year when I teach them.

        • Andrew says:


          I agree. Teach the complexity. Back when I used to teach decision analysis, I’d first teach classical Neumann decision theory and then discuss the flaws in the theory.

        • Martha (Smith) says:

          Andrew said,
          “My problem with the papers is that people can take them too seriously, that people can see a possible explanation as the explanation.”

          and Jonathan G replied,
          “I think the problem you identify, “that people can see a possible explanation as the explanation” is an important problem that goes way beyond these papers, and when I teach these papers they provide a great way for me to warn students against that tendency because their simplicity makes it easy to highlight the dangers of rushing to generalize from might be to must be without compelling evidence.”

          I agree wholeheartedly. I think of the problem in mathematical terms: “A conjecture is not a proof.” I really wish that teaching this maxim were part of standard mathematical education. One quote that sticks in mind with me (I forget the name of person who said it) goes something like, “I have eyes to see where I am going, and feet to get me there. That is the difference between conjecture and proof.”

  17. Dave says:

    Yes, “A Mathematical Theory of Communication” was the first that came to mind.

  18. Anonymous says:

    Highly readable, interesting, and different (field work in law and econ): Ellickson’s “Of Coase and Cattle”.

  19. Kenneth Tay says:

    I like Leo Breiman’s “Statistical modeling: the two cultures”

  20. Matt says:

    Goodman, Goodman, Goodman, & Goodman. 2014. “A Few Goodman: Surname Sharing Economist Coauthors”

    • Eliot J says:

      Ha ha! Don’t forget Leo Goodman…

    • Martha (Smith) says:

      I wonder how many Smiths have coauthored (or could coauthor) a paper.

      (Side comment: In math, the custom is to list authors by alphabetical order of last name. A mathematician whose last name is Small once said he’d like to write a paper with me sometime, so that he could be first author for once.)

  21. Joshua Gans says:

    Paul Romer on “Endogenous Technical Change,” Journal of Political Economy, 1990. Just beautifully written and laid out.

  22. Jordan says:

    “Fuck Nuance” (Kieran Healy) was a fun read –

    Since you posted a link to T&K, here’s a good one from their arch nemesis titled “Why Heuristics Work” –

    This one is about how the parasite-host relationship might have contributed to the evolution of the brain. Surprisingly readable for someone who knows almost nothing about immunology –

    “Top 10 replicated findings from behavioral genetics” –

  23. Jonathan baron says:

    Bryan, W. L., & Harter, N. (1899). Studies on the telegraphic language. Psychological Review, 6, 345-375.

    Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.

  24. David J. Littleboy says:

    I’m fond of Drew McDermott’s “Artificial Intelligence Meets Natural Stupidity”. Despite the cutesy title, it has some very important ideas in it that AI has to get right to work, and which keep getting forgotten (and sometimes rediscovered, albeit rarely). It does have the problem that its examples are taken from MIT AI Lab work from the 1970s, and is a bit opaque to folks not familiar with that work. Sigh. Among other things, it points out the problem of the “wishful mnemonic”: If your representation uses English words, your program’s reasoning probably isn’t anywhere near what you think it is. E.g. (Move John Ball1) is a very different thing from (Move John Mary) (Roger Schank insisted that different primitives be used: PTRANS for physical things, AFFECTS (or whatever it was) for emotional things). That English words don’t/can’t work as a substrate for reasoning is why ideas like grovelling over the internets, reading pages, and extracting information/building “knowledge bases” simply can’t work. It appears AI is currently in a period of forgetting how really hard human reasoning is*. Sigh.

    *: In its defense, AI isn’t trying to do human reasoning, only to make magic black boxes that do kewl things.

  25. Matt Skaggs says:

    Evolution in Mendelian Populations, Sewall Wright:

    A landmark in human thought. Darwin figured out evolution but never got close to speciation. Wright’s paper explains how it works in a way that fits the patterns we see. His “shifting balance theory” is one of those discoveries that seems to be cut from whole cloth rather than built on the ideas of others.

    And pertinent to this blog, a full understanding of Wright’s theory pretty much yanks the rug out from under evolutionary psychology. Evo psych is based upon the idea that if you can name and describe a genetic trait it must be adaptive, but Wright showed that actual speciation is far too messy for that.

    • This provides a nice example of theory versus data:

      Evolution. 2007 Nov; 61(11):2528-43. doi: 10.1111/j.1558-5646.2007.00219.x. Epub 2007 Sep 25.

      Spatial Differentiation for Flower Color in the Desert Annual Linanthus Parryae: Was Wright Right?
      Douglas W Schemske, Paulette Bierzychudek

      PMID: 17894812 DOI: 10.1111/j.1558-5646.2007.00219.x

      Understanding the evolutionary mechanisms that contribute to the local genetic differentiation of populations is a major goal of evolutionary biology, and debate continues regarding the relative importance of natural selection and random genetic drift to population differentiation. The desert plant Linanthus parryae has played a prominent role in these debates, with nearly six decades of empirical and theoretical work into the causes of spatial differentiation for flower color. Plants produce either blue or white flowers, and local populations often differ greatly in the frequencies of the two color morphs. Sewall Wright first applied his model of “isolation by distance” to investigate spatial patterns of flower color in Linanthus. He concluded that the distribution of flower color morphs was due to random genetic drift, and that Linanthus provided an example of his shifting balance theory of evolution. Our results from comprehensive field studies do not support this view. We studied an area in which flower color changed abruptly from all-blue to all-white across a shallow ravine. Allozyme markers sampled across these regions showed no evidence of spatial differentiation, reciprocal transplant experiments revealed natural selection favoring the resident morph, and soils and the dominant members of the plant community differed between regions. These results support the hypothesis that local differences in flower color are due to natural selection, not due to genetic drift.

      • Matt Skaggs says:

        “This provides a nice example of theory versus data:”

        Nah. Folks have been taking potshots at Wright’s theory for over 80 years, resulting in some good challenges that have yet to be resolved. This one doesn’t rise to anywhere near that level. I’m afraid I got a chuckle out of this part:

        “Our results from comprehensive field studies do not support this view. We studied an area in which flower color changed abruptly from all-blue to all-white across a shallow ravine.”

  26. Anonymous says:

    Derivation of theory by means of factor analysis or Tom Swift and his electric factor analysis machine
    J. Scott Armstrong (1967)


    This was the first cautionary tale style paper I read while in grad school. Its lesson is still relevant.

  27. Mark Muldoon says:

    D. Gale and L. S. Shapley (1962), College admissions and the stability of marriage, The American Mathematical Monthly, 69:9–15.

    This short paper is wonderfully clear – almost chatty – and yet eventually won Shapley a Nobel prize in Economics. Although Gale and Shapley didn’t know it at the time, the algorithm they invented had been discovered independently and was used – indeed, still is used – by the National Resident Matching Program to match junior doctors with training posts in American hospitals. This real-world applicability caused ravening packs of economists, computer scientists and mathematicians to fall on the problem and gnaw it to shreds, yielding many publications, but the original is still the most fun to read.
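    For the curious, the deferred-acceptance idea fits in a few lines. What follows is a rough sketch of the one-to-one (marriage) version, assuming equal numbers on both sides and complete preference lists; the function name and the toy preferences are made up for illustration, and this is not the National Resident Matching Program's actual procedure.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict {proposer: receiver}.

    Both arguments map each participant to a complete, ordered list of
    everyone on the other side (most preferred first).
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p  # receiver was unmatched: tentatively accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p  # receiver trades up; the old partner is free again
            free.append(current)
        else:
            free.append(p)  # rejected; p will try the next name on the list
    return {p: r for r, p in engaged_to.items()}


proposers = {"a": ["x", "y", "z"], "b": ["y", "x", "z"], "c": ["x", "y", "z"]}
receivers = {"x": ["b", "a", "c"], "y": ["a", "b", "c"], "z": ["a", "b", "c"]}
print(gale_shapley(proposers, receivers))
```

    With these toy lists, a ends up with x, b with y, and c with z: c is turned down by x and then by y before settling on z, and no pair would rather have each other than the partners they got.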

  28. paul alper says:

    My favorite is

    “Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: randomised controlled trial” by Leonard Leibovici. It appeared in the BMJ, 22 December 2001. The conclusion is that “Remote intercessory prayer said for a group of patients is associated with a shorter hospital stay and shorter duration of fever in patients with a bloodstream infection, even when the intervention is performed 4–10 years after the infection.” In more detail, “Mortality was 28.1% (475/1691) in the intervention group and 30.2% (514/1702) in the control group… The length of stay in hospital and duration of fever were significantly shorter in the intervention group (P=0.01 and P=0.04, respectively).”
    The conclusion is “Remote, retroactive intercessory prayer can improve outcomes in patients with a bloodstream infection. This intervention is cost effective, probably has no adverse effects, and should be considered for clinical practice.” The savings in our skyrocketing health care costs cannot be understated.

  29. Anon says:

    Efron, B., 1986. Why isn’t everyone a Bayesian?. The American Statistician, 40(1), pp.1-5.

    Dennis Lindley’s response to Efron in particular has helped me understand the Bayesian argument. I’m still trying to figure it out, but this was very useful. The responses from others were good too, but I find Lindley’s writing especially clear and cogent to me, a non-statistician!

    • Andrew says:


      The Efron paper is fine for what it is, but I wouldn’t call it one of the best scientific papers ever written. It’s a thoughtful expression of a position—I’ve written such papers myself—but I would not say that it makes a scientific contribution on its own.

  30. gdanning says:

    Ward, et al, The Perils of Policy by P-value: Predicting Civil Conflicts. Clearly written and on important methodological and substantive topics.

  31. Einstein, A.; Podolsky, B.; Rosen, N. (1935-05-15). “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?” (PDF). Physical Review. 47 (10): 777–780.

    Strong contender for most philosophical paper ever. Einstein and co put their finger precisely on the weirdest features of QM. Bell turned philosophy into experiment in another landmark paper.

    Bell, J. S. (1964). “On the Einstein Podolsky Rosen Paradox” (PDF). Physics Physique Физика. 1 (3): 195–200.

  32. Nick Adams says:

    Alfred Knudson (1971) “Mutation and Cancer: a statistical study of retinoblastoma”
    Logical analysis of a simple dataset (the relative incidence of unilateral and bilateral retinal tumours in their familial and sporadic forms) produced a remarkably prescient theory of carcinogenesis – Knudson’s Two Hit hypothesis.

  33. David P says:

    Paul Samuelson’s note written in words of one syllable: “Why we should not make mean log of wealth big though years to act are long.” He cheated a little in footnotes. Available at e.g.,

  34. Winston says:

    I don’t know if it quite counts as science, but Tony Hoare’s 1978 paper “Communicating Sequential Processes” is a masterpiece. I still read it every so often. The treatment of concurrency in the Go programming language was heavily influenced by it – Rob Pike is also a fan.

  35. Peter Chapman says:

    Baby bear’s dilemma; a statistical tale. Agronomy Journal, 1982. On the use of multiple comparisons for comparing varieties of oats for making porridge.

  36. Keith O'Rourke says:

    Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician – Donald B. Rubin

  37. Mark Samuel Tuttle says:

    “What the Frog’s Eye Tells the Frog’s Brain”, Jerome Lettvin …

    At one time, I think this was the most cited scientific paper, ever.

    Never had the pleasure of meeting him. Lettvin was a physician during the Battle of the Bulge.

    “Body Maps on the Human Genome”, Christopher Cherniak

    — Mark

  38. EB says:

    Cohen, J. (1994). The Earth is Round (p < .05)
    This line alone warrants consideration…
    “And we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing (NHST; I resisted the temptation to call it statistical hypothesis inference testing) to the point of meaninglessness and beyond.”

    And pretty much anything written by Paul Meehl.

    • Andrew says:


      Hmmm, I guess I should clarify. See comment here. I’m more interested in papers with some research contributions, not just position papers, no matter how influential and charming they may be.

    • Bill Anderson says:

      Thanks to all for lots of interesting suggestions. I’ll still go back to Kac. The title is fantastic, as is the paper. One should also add the paper of Gordon, Webb, and Wolpert, which answered Kac’s question with a resounding No. Wikipedia has the relevant citations.

  39. Ed Hagen says:

    I debated submitting this one because Andrew requested no jokes, but the impact of Sokal’s paper far transcended a mere joke:

    • Sokal’s paper had more than a little of seeing the mote in someone else’s eye while ignoring the beam in one’s own.

      Several years before Sokal’s paper was published, The Canadian Journal of Physics published a paper that purported to use chaos theory to “prove” that feminism was dangerous to society, using “evidence” drawn from the author’s informal interviews with undergraduate physics majors, combined with some incoherent ranting about chaos theory, working parents, and the nuclear family. G.R. Freeman, “Kinetics of nonhomogeneous processes in human society: Unethical behaviour and societal chaos,” Can. J. Phys. 68, 794-798 (1990).

      Given this, Sokal would have done better to be a good deal humbler rather than crowing about how bad peer review was in other disciplines.

      • Andrew says:


        I don’t get why Sokal, just cos he’s a physicist, should feel humble because some other physicist was a horse’s ass. Sokal is not responsible for every stupid and obnoxious thing that physicists do.

        • Andrew: Your point would be more persuasive if Sokal had used his prank only to criticize the editors of the journal Social Text, but he generalized from that one journal to broadly criticize the entire field of science studies and the humanities in general.

  40. Dan says:

    This paper is fun to read and certainly influential:

    The Magical Number Seven, Plus or Minus Two Some Limits on Our Capacity for Processing Information

  41. Carlos Ungil says:

    Jaynes’ Information Theory and Statistical Mechanics

    Markowitz’s Portfolio Selection

  42. Michael Schwartz says:

    So many great ones already listed, and so many that are new to me that I look forward to reading.

    As an applied clinical scientist with no formal statistical training, the Ioannidis paper “Why most published research findings are false” had a big impact on my thinking. I don’t know if it’s really the kind of thing you were looking for here…

    Of course, now he’s even more famous for being one of the most prominent and vocal covid-19 skeptics. Hmmmm…

  43. Jason says:

    Bill Hamilton (1971) Geometry for the Selfish Herd, Journal of Theoretical Biology

  44. Mike says:

    Lawrence Slobodkin (1986) “The Role of Minimalism in Art and Science”, The American Naturalist

  45. Gur Huebrman says:

    We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.

    (Almost) ending
    It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.

    Watson & Crick 1953. Yes, that Watson.

    Incidentally, how did they choose the order of the authors?

  46. Zad says:

    John Nelder’s “From Statistics to Statistical Science” published in JRSS Series D

  47. Anonymous says:

    Waldfogel, Joel. “The Deadweight Loss of Christmas.” The American Economic Review 83, no. 5 (1993): 1328-336. Accessed June 9, 2020.

  48. Gabe Durazo says:

    Krugman’s Theory of Interstellar Trade (1978) is a fun one!

    Claude Shannon and John Conway are known to have fun and easy to read papers, but I couldn’t pick a good representative here.

  49. Andrew (another one) says:

    Rose’s Sick Individuals and Sick Populations is something I come back to again and again and has influence in epidemiology.

    Rose, G. (2001). Sick individuals and sick populations. International journal of epidemiology, 30(3), 427-432.

  50. Megen de la Mer says:

    WS Jevons’s 1875-78 writings on sunspot theory nicely capture the uncertainty around correlation/causation, and the theory still seems to attract a new advocate every couple of decades.

  51. Suhas Mathur says:

    Reducing the Dimensionality of Data with Neural Networks, Science, July 2006
    G. E. Hinton and R. R. Salakhutdinov

    I think this was the earliest paper in the current (i.e. second) coming of neural networks, that reignited interest in the field. As is the case for papers in the journal Science, it is very compact, just about 3 pages.

    • Anoneuoid says:

      It always amazes me that for years deep learning was discouraged because “one layer could approximate any function”.

      When I was learning about it I would always come across stuff saying that many layers were not needed due to that proof that ignored computational resources required. You can still find these types of posts on stack exchange.

      Then it turned out to be the most efficient way to perform many tasks.

      • somebody says:

        As far as I can tell, nobody knows why they work beyond saying the phrase “low dimensional manifold” and throwing a salt shaker over your shoulder, so not sure they were really “wrong.” Just so happened that someone eventually tried it and it worked.

        • Anoneuoid says:

          The part that’s wrong is assuming a proof that ignores how practical an approach is should guide what we do in practice.

          • Ori says:

            Although the case in point is not strictly an “impossibility proof”, I find JS Bell sums up the fallacy nicely: “what is proved by impossibility proofs is lack of imagination.”

          • somebody says:

            Ah yeah, that’s true. Reminds me of Troubling Trends in Machine Learning Scholarship

            Mathiness manifests in several ways: First, some papers abuse mathematics to convey technical depth—to bulldoze rather than to clarify. Spurious theorems are common culprits, inserted into papers to lend authoritativeness to empirical results, even when the theorem’s conclusions do not actually support the main claims of the paper. We (JS) are guilty of this in [70], where a discussion of “staged strong Doeblin chains” has limited relevance to the proposed learning algorithm, but might confer a sense of theoretical depth to readers.
            The ubiquity of this issue is evidenced by the paper introducing the Adam optimizer [35]. In the course of introducing an optimizer with strong empirical performance, it also offers a theorem regarding convergence in the convex case, which is perhaps unnecessary in an applied paper focusing on non-convex optimization. The proof was later shown to be incorrect in [63].

        • David J. Littleboy says:

          There was an article in Quanta (fairly recently) pointing out that what they do is essentially _texture_ recognition. So they’re really good at tasks that can be solved by differentiating textures or by pattern recognition (e.g. they’re perfect for move generation in Go). Also, they were shown to be really good at differentiating between portable and fixed medical X-ray equipment. So there’s no more need for salt.

          To the best I can tell, the things that Minsky and Papert proved the perceptron model couldn’t do current neural networks still can’t do. I suppose “Rebooting AI” is too long to be listed as a favorite article, but it’s good.

      • It’s sort of like how you can represent any smooth function on a compact set as a polynomial, but sometimes it’s useful to run your function through inv_logit to bound it between 0 and 1… composition of functions is maybe more efficient even if it’s not strictly necessary.

        • Anoneuoid says:

          I don’t know enough to tell, but do you think that similarity might provide insight as to the success of deep learning?

          I mean the easiest way to do something depends on the tools available, maybe with a different type of computer (in the general sense) a different approach would be better.

  52. Suhas Mathur says:

    Bits Through Queues, 1996, V. Anantharam and S. Verdú

    This is one of my favorite information theory papers. Very well written and easy to read. It sets up a queue as a noisy channel, through which information can be made to flow by encoding it in the inter-arrival times between entities arriving at the queue. The channel capacity for various types of queues is derived.
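
    A rough sketch of the setup described above, not the paper's own construction: a sender encodes symbols in inter-arrival gaps, and a single-server FIFO queue with exponential service times perturbs them before the receiver sees the departure times. The function names, the gap lengths, and the binary encoding are all assumptions made for illustration.

```python
import random

def encode(symbols, short=0.5, long=2.0):
    """Map each binary symbol to an inter-arrival gap (hypothetical scheme)."""
    t, arrivals = 0.0, []
    for s in symbols:
        t += long if s else short
        arrivals.append(t)
    return arrivals

def queue_channel(arrivals, service_rate, rng):
    """Single-server FIFO queue: each job starts service when it has
    arrived and the previous job has departed."""
    departures, last_departure = [], 0.0
    for t in arrivals:
        start = max(t, last_departure)
        last_departure = start + rng.expovariate(service_rate)
        departures.append(last_departure)
    return departures

rng = random.Random(0)
sent = encode([0, 1, 1, 0, 1])
received = queue_channel(sent, service_rate=1.0, rng=rng)
```

    The receiver only observes the departure times, so the random service times act as the channel noise on the timing information.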

  53. Jason Grossman says:

    Berger and Sellke, Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence

  54. d says:


  55. Christos says:

    Deterministic Nonperiodic Flow; Edward N. Lorenz; Journal of the Atmospheric Sciences (1963) 20 (2): 130–141.
    The “butterfly” system of ODEs. Motivated by and derived from atmospheric chaos.
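
    For readers who want to see the "butterfly" behavior, here is a small sketch that integrates the three Lorenz equations with a fixed-step fourth-order Runge-Kutta scheme and tracks how far two nearly identical starting points drift apart. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the classic ones; the step size, start point, and function names are assumptions chosen for illustration.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def max_separation(eps=1e-8, dt=0.01, steps=4000):
    """Largest distance reached between two runs that start eps apart."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + eps, 1.0, 1.0)
    largest = 0.0
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        largest = max(largest, math.dist(a, b))
    return largest
```

    Calling max_separation() with the tiny default perturbation gives a separation comparable to the size of the attractor itself, which is the sensitivity to initial conditions the paper is known for.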

  56. The Vole says:

    The Economic Organisation of a P.O.W. Camp
    R. A. Radford
    Economica New Series, Vol. 12, No. 48 (Nov., 1945), pp. 189-201

    • The Vole says:

      ECONOMETRICA: APR 1937, VOLUME 5, ISSUE 2 p.147-159

      Mr. Keynes and the “Classics”; A Suggested Interpretation
      J. R. Hicks

      And +1 on Akerlof Lemon Paper.

      Martin Weitzman: “Prices vs. Quantities” (1974)

    • Ben says:

      The radio inspection analogy is pretty funny. I kinda disagree with the idea that there’s some great airplane/radio engineering-specific secret that transfers to learning about biology though.

  57. David J. Littleboy says:

    Ah! I just remembered: “The Tau Manifesto”.

    It turns out that Pi can be argued to be the wrong thing for trig/calculus/analysis. Pi is the circumference over the _diameter_, but for doing a lot of math, things are defined in terms of the radius. So 2*pi appears all over the place in math. In particular Tau (which is 2*pi) _means_ one whole cycle of (the argument to) a trig function, so when it’d be natural to talk about whole cycles or whole turns, pi only gets you half way there. A lot of high school and first/second year college math makes way more sense if you’ve read The Tau Manifesto.

    This is interesting because it means that our whole cultural gestalt that pi is somehow transcendentally fundamental to the universe might be quite inscrutable to an alien intelligence that just did their math using the more rational circle constant.
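
    A tiny illustration of the "whole turns" point, using the math.tau constant that Python's standard library provides (Python 3.6 and later). The helper names sin_turns and cos_turns are hypothetical; they just take the angle as a fraction of a full cycle.

```python
import math

def sin_turns(turns):
    """Sine of an angle given as a fraction of one full turn."""
    return math.sin(math.tau * turns)

def cos_turns(turns):
    """Cosine of an angle given as a fraction of one full turn."""
    return math.cos(math.tau * turns)

quarter = sin_turns(0.25)   # a quarter turn: top of the circle
half = cos_turns(0.5)       # half a turn: opposite side of the circle
full = sin_turns(1.0)       # one whole turn: back at the start
```

    With pi, the same three angles are pi/2, pi, and 2*pi, so the fraction of the circle is always off by a factor of two from the number you write down.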

  58. Richard Nisbett’s & Timothy Wilson’s “Telling more than we can know”:

    Not sure how replicable those experiments are from today’s perspective. But they nicely fit the neuropsychological literature (e.g., Michael Gazzaniga’s work) showing that there is an entire brain system dedicated to spinning just-so stories that try to make sense of our own (and others’) behavior without having any privileged introspective insight into the actual roots of our behavior. Decades earlier Freud made a similar point. And although he may have erred on many (or even most) other accounts, I think he was dead-on in this case.

  59. Jesper Schneider says:

    From the days when (medical) science was slow: John Snow, On the Mode of Communication of Cholera.

  60. Alex Foss says:

    Rosenhan, David (19 January 1973). “On being sane in insane places”. Science. 179 (4070): 250–258.

    From Wikipedia:

    The Rosenhan experiment or Thud experiment was conducted to determine the validity of psychiatric diagnosis. The experimenters feigned hallucinations to enter psychiatric hospitals, and acted normally afterwards. They were diagnosed with psychiatric disorders and were given antipsychotic drugs. The study was conducted by psychologist David Rosenhan, a Stanford University professor, and published by the journal Science in 1973 under the title “On being sane in insane places”. It is considered an important and influential criticism of psychiatric diagnosis.

    • Ney says:

      Rosenhan’s publication was probably based on fraud.

      From Wikipedia:

      In a 2019 popular book on Rosenhan by author Susannah Cahalan, The Great Pretender, the veracity and validity of the Rosenhan experiment was questioned; Cahalan argues that Rosenhan never published further work on the experiment’s data, nor did he deliver on a book on it that he had promised. Moreover, she presents her inability to find the experiment’s subjects, save two: a Stanford graduate student who had experiences similar to Rosenhan’s, and one whose positive psychiatric hospital experience was excluded from the published results. As noted by Alison Abbott in a review of the book in the journal Nature, Kenneth J. Gergen, a Stanford University colleague, stated that “some people in the department called him a bullshitter”, a conclusion with which Cahalan appeared to be in agreement, although, Abbott writes, “[s]he cannot be completely certain that Rosenhan cheated. But she is confident enough to call her engrossing, dismaying book The Great Pretender.”

  61. Asher says:

    Edward Lorenz, Deterministic nonperiodic flow, Journal of the Atmospheric Sciences, 1963

  62. N/A says:

    Albert Einstein’s “On a Heuristic Point of View about the Creation and Conversion of Light” (1905)

    The math in the paper is not that hard to follow even if you are not a physicist and know the basics.

    The paper presents a simple heuristic argument (not a theory) that light is made of quanta, using clever metaphorical thinking.

    He figures out that the mathematical equation for blackbody entropy is identical to that of the entropy of a molecular gas when you substitute E/hν in the blackbody entropy for the number of gas molecules in the molecular entropy. Then like a bolt from the blue, no gradual development, what is called the most revolutionary sentence written by a physicist of the twentieth century: “According to the assumption to be contemplated here, when a light ray is spreading from a point, the energy is not distributed continuously over ever-increasing spaces, but consists of a finite number of ‘energy quanta’ that are localized in points in space, move without dividing, and can be absorbed or generated only as a whole.” (Fölsing, Albrecht (1997), Albert Einstein: A Biography, trans. Ewald Osers, Viking)
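
    The analogy can be written out in modern notation (Einstein himself did not use the symbols k and h in the paper, and the radiation formula holds only in the high-frequency Wien regime he considered). Comparing how the entropy changes with volume for n gas molecules and for radiation of frequency ν and total energy E:

```latex
\begin{align}
  S_{\text{gas}} - S_0 &= n\,k \,\ln\frac{V}{V_0}, \\
  S_{\text{rad}} - S_0 &= \frac{E}{h\nu}\,k \,\ln\frac{V}{V_0}.
\end{align}
```

    The two expressions coincide when n = E/hν, which is what suggests reading the radiation as E/hν independent quanta, each of energy hν.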

    Then he goes on to solve some concrete physical problems using this heuristically derived viewpoint. One of them was the explanation of the photoelectric effect.

    What makes this the most amazing paper of all time is that Einstein was the only physicist to think that light had energy quanta for 18 years in the middle of the quantum physics revolution. His Nobel Prize specifically mentioned the photoelectric effect, yet the ‘heuristic viewpoint’ was seen as an error made by a genius until Compton discovered the Compton effect in 1923.

    Planck felt the need to defend Einstein when he recommended him for a university position because of this crazy view. Planck didn’t believe in photons, but he was the biggest champion of Einstein for other reasons.

  63. Therac says:

    An Investigation of the Therac-25 Accidents, Nancy G. Leveson and Clark S. Turner, 1993.

    This paper, decades old, always gives me the chills. I don’t usually get that from analysis of computer systems.

    • dl says:

      phew…that’s an interesting one.
      how’s this for irony, too? one of the authors is listed at the end of the article as “Boeing Professor of Computer Science and Engineering at UW.”

  64. Winston says:

    A few people have mentioned John Bell on QM. I’d add “Is the Moon There When Nobody Looks? Reality and the Quantum Theory” by N. David Mermin:

    It’s an absolutely wonderful exposition of matters related to Bell’s theorem, so good that Feynman wrote the author a fan letter about a version of it.

  65. Ron Kenett says:

    three suggestions:

    1. Cox, David R (1972). “Regression Models and Life-Tables”. Journal of the Royal Statistical Society, Series B. 34 (2): 187–220. JSTOR 2985181. MR 0341758.

    2. Samuel Karlin and James McGregor (1959) A CHARACTERIZATION OF BIRTH AND DEATH PROCESSES, PNAS March 1, 1959 45 (3) 375-379;

    3. Box, G. E. P. (1980) Sampling and Bayes’ Inference in Scientific Modelling and Robustness, JRSS, A 143,

  66. David Walker says:

    Monetary Theory and the Great Capitol Hill Baby Sitting Co-op Crisis
    Authors: Joan Sweeney and Richard James Sweeney
    Source: Journal of Money, Credit and Banking, Vol. 9, No. 1, Part 1 (Feb., 1977), pp. 86-89

    Short and readable, it explains in simple human terms the need for monetary policy in any society which uses a medium of exchange.

    It was also the basis of perhaps Paul Krugman’s most famous explanation, appearing in his book Peddling Prosperity and also in this Slate column:

    Krugman reportedly called it “life-changing”. It’s a monument to the power of a good story.

  67. Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

    Hick, W. E. (1952). On the Rate of Gain of Information. Quarterly Journal of Experimental Psychology, 4(1), 11-26.

  68. Treisman, A. (1977). Focused attention in the perception and retrieval of multidimensional stimuli. Perception & Psychophysics, 22(1), 1-11.
    Seminal psychological work on selective perception.
    As far as I can see, the only work by a woman that has been recommended.

    Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

    Hick, W. E. (1952). On the Rate of Gain of Information. Quarterly Journal of Experimental Psychology, 4(1), 11-26.

    All three works address crucial psychological questions about time and strategies used to make decisions

  69. E. Craig Dukes says:

    Hubble’s paper which led to the Big Bang theory of the beginning of the universe: “A relation between distance and radial velocity among extra-galactic nebulae”. PNAS March 15, 1929 15 (3) 168-173;

  70. carlos arnade says:

    Hands down meets the criteria for a great paper

    Wallace: On the Tendency of Varieties to depart indefinitely from the Original Type

    He unified all living creatures past, present and future with 2 simple rules. Darwin did the same but it took him a whole book.

    Even better, Wallace got the idea during a 3-day delirium from malaria.

  71. Jay says:

    I had some time to spare so I compiled all the papers and found their DOIs.

    In case anyone is interested, here’s a csv:
