Tukey’s philosophy

The great statistician John Tukey, in his writings from the 1970s onward (and maybe earlier), was time and again making the implicit argument that you should evaluate a statistical method based on what it does; you should not be staring at the model that purportedly underlies the method, trying to determine if the model is “true” (or “true enough”). Tukey’s point was that models can be great to inspire methods, but the model is the scaffolding; it is the method that is the building you have to live in.

I don’t fully agree with this philosophy: I think models are a good way to understand data and also often connect usefully to scientific models (although not as cleanly as is thought by our friends who work in economics or statistical hypothesis testing).

To put it another way: What makes a building good? A building is good if it is useful. If a building is useful, people will use it. Eventually improvements will be needed, partly because the building will get worn down, partly because the interactions between the many users will inspire new, unforeseen uses, partly for the simple reason that if a building is popular, more space will be desired. At that point, work needs to be done. And, at that point, wouldn’t it be great if some scaffolding were already around?

That scaffolding that we’d like to have . . . if we now switch the analogy back from buildings to statistical methods, that scaffolding is the model that was used in constructing the method in the first place.

No statistical method is perfect. In fact, it is the most useful, wonderful statistical methods that get the most use and need improvements most frequently. So I like the model and I don’t see the virtue in hiding it and letting the method stand alone. The model is the basis for future improvements in many directions. And this is one reason why I think that one of the most exciting areas in statistical research is the systematization of model building. The network of models and all that.

But, even though I don’t agree with the implicit philosophy of late Tukey (I don’t agree with the philosophy of early Tukey either, with all that multiple comparisons stuff), I think (of course) that he made hugely important contributions. So I’d like to have this philosophy out there for statisticians and users to evaluate on their own.

I have not ever seen Tukey’s ideas expressed in this way before (and they’re just my own imputation; I only met Tukey once, many years ago, and we spoke for about 30 seconds), so I’m posting them here, on the first day of this new decade.


  1. Basil says:

    Try reading or watching the documentary, "The Case for Christ". It's by Lee Strobel, a lawyer and journalist for Chicago Tribune who started a book to disprove Jesus. He was an atheist when he began the book and became a believer during his research. The story of Jesus is incredibly accurate.

  2. Dominik Lukeš says:

    The problem with your building allegory (as with so many allegories) is that there are always more stories to be told.

    Another thing about buildings that have proved useful or popular is that they often outlive their utility without us noticing. Far too much is invested in them (materially and symbolically) for us to be able to tear them down and build new ones when the time has come. Also, very often we take shortcuts when building them, intending to go in later and fix them, but the shortcuts keep on working so we leave them in, sometimes forgetting that they were shortcuts in the first place and taking them to stand for something real.

    This happens in the humanities and sciences all the time. The causality of genes, rational choice theory, conversational maxims, formal semantics, etc.

  3. I did my own armchair research on the historical Jesus inspired by Bill Maher's Religulous documentary.

    Prior to the documentary, I assumed a historical Jesus existed that was a messianic leader that got crucified because why make up something as historically banal as a fringe messianic leader that gets killed by authorities. After the documentary I thought it was counterintuitively plausible that no Jesus ever existed, and that he's more a version of the telephone game played with near eastern religious ideas over thousands of years.

    I played around on Wikipedia and internet sources to try to get more info on the key facts from Maher indicating no historical Jesus existed, and was even more surprised by the results. From what I could tell, there's no clear historical evidence that Jesus either existed or didn't exist, and even from armchair Bayesian-type extrapolations it's a coin toss (apparently Maher exaggerated the extent to which there's secular scholarly consensus that key Christian narrative elements match more ancient religious narratives). What an annoying state of inconclusiveness.

  4. Andrew Gelman says:


    The analogy of the building is just for fun. I have written more seriously about statistical modeling in my research papers, for example here and here, as well as in Bayesian Data Analysis.

  5. Basil says:

    I don't believe the fact that Jesus existed can be doubted at all. Historical evidence has been gathered and early exponential church growth leads conclusively to this messianic figure truly existing. Your doubt of the miracles recorded would be more interesting to write about than your socioidiocracy. For you to say that Jesus didn't exist is practically the same as saying the Holocaust didn't happen because pictures weren't published until years later. Manuscripts from the first four gospels have been shown as early as atleast 70 AD.

  6. Kevin says:

    I would distinguish the ideas of a robust statistical METHOD and an underlying statistical MODEL. In my mind the former is applicable generally, and the latter is specific to a type of data (naming latent variables for example).

    I don't think Tukey was arguing that fully model agnostic data analysis was the ideal. He explicitly states in the intro to EDA that the goal of analysis is to make an incremental inference about the data. His points about methods were related to ROBUSTNESS, that a good method ought to be able to give you the answer you weren't looking for when it's there.

    Put another way, my understanding is that Tukey would have promoted robust methods to advance statistical models.

    Taking liberties with the analogy, a robust method may be like the construction crane that needs to be taller than the building in order to build it up.

    From Tukey's intro (memory dependent paraphrase): "It is important to understand what you CAN DO before you measure how well you seem to have done it." I take understanding what you can do to mean knowing how high you can build your model.

  7. Bill Jefferys says:

    There are NO manuscripts of any part of the NT that are as early as 70 AD. The earliest known manuscript FRAGMENT is the Rylands Library Papyrus P52, which has been dated on paleographic evidence anywhere from CE 100 to the second half of the second century. This FRAGMENT contains about 114 letters. That's all.

    I don't know what sources you are reading, but if they make such claims, they are unreliable and you should ignore them.

    I have generally thought that the probability that an historical Jesus existed is somewhat less than the probability that an historical Socrates existed and somewhat greater than the probability that an historical King Arthur existed. Probably closer to the latter than the former. There are a number of scholars who think, on the evidence, that there was no historical Jesus. For anyone to compare the evidence for the Holocaust to the evidence for an historical Jesus is, to my mind, perverse and ill-informed.

    And I don't care who knows who wrote this.

  8. Cool bio on your username link, Prof. Jefferys. You've had a well-lived life and retirement.

  9. Anon says:

    How did Jesus get involved in this? I don't follow.

  10. Andrew Gelman says:

    There's also this.

  11. Basil says:

    Dr. Gelman mentioned his lack of knowledge of Jesus in one of the above responses, so I referred him to a documentary by a former atheist.

    I believe the year I quoted was of the transcribers written year, not the paleo. age. Here's a website that speaks of these different age measurements for three different incomplete New Testaments.
    I think even more appropriate is the chart that comes before the most recent written record. It has numerous ancient texts publication years and earliest copy available. The complete copies of the greek new testament are 130 AD and total 5600 copies at 99.5% accuracy in copying. The chart makes comparisons of other books relatively similar in age.
    You can hold to your disbelief that Jesus existed because we can't find a copy of a book written by completely different authors that followed Jesus and hid from persecution in less than 100 years of transcription. I've read in numerous studies that all of the 11 disciples of Jesus excluding John died martyrs' deaths for the cause of Jesus. There aren't many people who would die for something that wasn't true, especially those who have seen it firsthand. These books by the disciples were later put together to form what we now call the New Testament.
    Concerning Socrates: Plato was supposedly a student of the famous Socrates. The earliest recorded document of Plato is from 1100 AD. Plato lived around 400BC. You still find it more probable that Socrates existed after his publications are 1500 years after his death? Sounds like wishful thinking from a very smart person to me.

  12. Bill Jefferys says:

    "I believe the year I quoted was of the transcribers written year, not the paleo. age"

    You wrote, "Manuscripts from the first four gospels have been shown as early as atleast 70 AD."

    I interpreted this to mean that you think that there are manuscripts from CE 70 of the "first four gospels."

    If that is not what you meant, you should write more clearly what you mean.

    That claim, as written, is not true. There are no extant manuscripts of the NT that are that early. If you think otherwise, then "show" them to us.

    Then you say, "The complete copies of the greek new testament are 130 AD and total 5600 copies at 99.5% accuracy in copying."

    The FIRST KNOWN FRAGMENT of the NT is the Rylands P52 I mentioned, which is generally dated by SCHOLARS at about 125 CE.

    There are NO complete manuscript copies of the Greek NT before the fourth century.

    You are relying on unreliable sources. Please cease this.

    Then you say, concerning Socrates: "Plato was supposedly a student of the famous Socrates. The earliest recorded document of Plato is from 1100 AD. Plato lived around 400BC. You still find it more probable that Socrates existed after his publications are 1500 years after his death? Sounds like wishful thinking from a very smart person to me."

    Gosh, I don't know how to respond to this. You are relying for your information about Jesus on manuscript documents that don't exist, and then relying on your information about Socrates on a claim that there are no original manuscripts by Plato earlier than 1100?

    If you wanted to raise my belief that an historical Jesus existed, you should be raising my probability that an historical King Arthur existed (since I am using similar evidence), not trying to decrease the probability that an historical Socrates existed.

    You have a very naive notion about how historical criticism works.

    You need to reboot.

    And, you need to remember that this blog is about statistics, not about proselytizing your peculiar version of Christianity to the great unwashed and unconverted.

    And, P.S. your recently born baby is very cute. I wish you and your family a long and happy life.

  13. Nick Cox says:

    What about Judas? Why leave him out of the data summary? Good statisticians always are up front about outliers.

  14. Basil says:

    True, if you want to include him as one of the 12 after his suicide, he was replaced by Thomas if I remember correctly. Thomas suffered a Martyr's death. Judas hung himself after realizing what he did and threw his blood money back at the temple priests. They would also not take this money and hence "the field of blood" exists today.
    NonStatistics response: I am in no way a history buff. I just clicked on the second link in Google that had about 6 references listed below it. I would agree this is mainly a statistical blog, but Dr. Gelman puts religion-related material on the blog regularly. If he wanted people to not discuss this, he wouldn't use the small religious metaphors in much of his writing. He would also not approve comments in his threads related to religion.
    Thanks for the compliment concerning my baby! He will be 4 months tomorrow. Proverbs 22:6 concerns my child and Ephesians 6:19-20 concerns my responses in this blog. You have a beautiful place in Vermont and excellent contributions to astronomy. I pray you have an excellent and long-lived retirement.
    Statistics Response: By decreasing the probability of Socrates's existence, I would argue that Jesus's existence is not affected. This is not a dependent event in my mind. I'm just trying to have you put your probabilities for evidence of historical figures in proper order. Increasing your belief that Jesus existed seems to be pointless. Moses went to Pharaoh many times to request that he let his people go. Eventually God hardened his heart and no longer spoke to Pharaoh, but God's will proceeded accordingly.

  15. Bill Jefferys says:

    Since I consider the evidence for Socrates considerably greater than the evidence for Jesus, trying to undermine the historicity of Socrates can only simultaneously undermine the historicity of Jesus in my mind, since the available evidence remains the same. (BTW, Plato is not the telling evidence for me. You are beating the wrong horse.)

  16. I think Prof. Atran's take on religion and history is more interesting than arguing over whether Jesus existed (Prof. Atran hits what I think are the biggest questions of all regarding religion, and provides a few candidate epiphanies):

    For fun, Prof. Atran actually gathers some survey data and does some statistical analysis of it here (although not specifically on religion):

  17. Dear Andrew,

    The most recent paper by Tukey on the foundations of statistics is to my knowledge
    Tukey, J. W. More honest foundations for data analysis. Journal of Statistical Planning and Inference 57 (1997) 21-28.
    You will see that he doesn't advocate "hiding the model" at all.

    The way you sketch Tukey's approach, I find myself positively reminded of my own way of seeing things (although I also don't advocate "hiding the model"), which I have unfortunately not yet properly put together and published. But let me use a very simple example to illustrate the point that underlying distributional assumptions are only one out of many possible ways to understand a statistical method, and that overall understanding improves the more such ways are known to the researcher.

    Considering the arithmetic mean, one can say that this is the ML and UMVU estimator for the location parameter of a Gaussian distribution. But there is a whole host of other things to be understood about the arithmetic mean.
    The arithmetic mean can be obtained from a squared loss function, which obviously connects it to the Gaussian likelihood, but may be interesting in its own right in some applications.
    However (following Tukey), the Gaussian justification is not so convincing after all, because we don't really believe any data to stem from an exact Gaussian distribution, and if your data are, for example, from a 0.99*Gaussian + 0.01*Cauchy distribution, the arithmetic mean becomes a perfect mess, at least in theory. What is remarkable is that you need an awful lot of data to distinguish such a distribution from a Gaussian distribution, so you may well call it "approximately Gaussian", which is as strong a statement in favour of "real Gaussianity" as we can ever get.
    On the positive side, you can interpret the arithmetic mean as "distributing the overall sum of all data equally to every single instance", which may be something you are interested in in terms of interpretation if for example the data refer to amounts of money or something one can count, regardless of whether they are distributed approximately Gaussian or not. On the other hand, I know a few applications in which you can argue that this property is exactly what you want the least.
    The influence function of the arithmetic mean tells you that the influence of data on it is proportional to its size. Mathematically equivalent (but probably not interpretatively) is the observation that the mean can be characterised by the fact that by changing any data value by an amount of epsilon, the mean is changed by epsilon/n, regardless of what the data value is (absolute, and relative to the rest of your sample). Again, it is possible to nominate applications where this is either required or something that you definitely don't want, regardless of the underlying distribution.
    The mean is a disaster in terms of robustness if you have an unlimited value range, but if your data is restricted on some interval, the robustness problem goes out of the window, unless the main body of your data is concentrated on a very small subset of your interval.
    As opposed to the sample median, the arithmetic mean treats the data as information on interval scale level (and there are instances in which a researcher may want this even if the data are in fact not interval scaled in Stevens's sense).
    One could go on and say some stuff about what the mean is going to do under asymmetric distributions and under what conditions the researcher is going to like this or not.
    So that's the point: we should be generally interested in what the methods do. Under which models they are good or not so good is relevant, but it's by far not all that is of interest.
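
Three of the properties of the arithmetic mean described in the comment above can be checked numerically. What follows is only a minimal sketch, not anything taken from the comment: the toy sample, the helper names `squared_loss` and `mean_shift`, and the single extreme value (a crude stand-in for a rare heavy-tailed draw, not a simulation of the 0.99*Gaussian + 0.01*Cauchy distribution) are all assumptions made for illustration.

```python
import statistics


def squared_loss(center, data):
    """Sum of squared deviations of the data from a candidate center."""
    return sum((x - center) ** 2 for x in data)


def mean_shift(data, index, epsilon):
    """Change in the mean when data[index] is moved by epsilon."""
    perturbed = list(data)
    perturbed[index] += epsilon
    return statistics.fmean(perturbed) - statistics.fmean(data)


# Hypothetical toy sample, chosen only for illustration.
sample = [2.0, 3.0, 5.0, 7.0, 11.0]
n = len(sample)
mean = statistics.fmean(sample)

# 1. Squared loss: the mean does at least as well as nearby candidate centers.
candidates = [mean - 0.5, mean - 0.1, mean + 0.1, mean + 0.5]
mean_is_best = all(
    squared_loss(mean, sample) <= squared_loss(c, sample) for c in candidates
)

# 2. Moving any single value by epsilon moves the mean by epsilon / n,
#    regardless of which value is moved.
epsilon = 1.0
shifts = [mean_shift(sample, i, epsilon) for i in range(n)]

# 3. One extreme value (an assumed stand-in for a rare heavy-tailed draw)
#    drags the mean far away while the median moves only slightly.
contaminated = sample + [1000.0]
contaminated_mean = statistics.fmean(contaminated)
contaminated_median = statistics.median(contaminated)

print("mean minimizes squared loss among candidates:", mean_is_best)
print("shift in mean per perturbed value:", shifts)
print("mean before/after contamination:", mean, contaminated_mean)
print("median before/after contamination:",
      statistics.median(sample), contaminated_median)
```

Whether the behaviour in step 3 is a defect or exactly what is wanted depends on the application, which is the comment's point: these are statements about what the method does, independent of any assumed distribution.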

  18. DKB @ NYU says:

    I'm not sure what you mean by "models," but an economics colleague referred to models as measurement tools. They help us to organize data in ways that give us sharper insights. Conditional, of course, on the model, but you have to start somewhere.