Exploratory and confirmatory data analysis

Seth points me to this discussion he wrote on Tukey’s famous book, Exploratory Data Analysis. I use exploratory methods all the time and have thought a lot about Tukey and his book and so wanted to add a few comments.

In particular, I’d like to separate Seth’s important points about statistical practice from his inflammatory rhetoric and pop sociology. (Disclaimer: I engage in rantin’, ragin’, and pop sociology all the time—but, when I do it, it’s for a higher purpose and it’s ok.)

I have several important points to make here, so I recommend you stop whatever else you’re doing and read all of this.

1. As Seth discusses, so-called exploratory and confirmatory methods are not in opposition (as is commonly assumed) but rather go together. The history on this is that “confirmatory data analysis” refers to p-values, while “exploratory data analysis” is all about graphs, but both these approaches are ways of checking models. I discuss this point more fully in my articles, Exploratory data analysis for complex models and A Bayesian formulation of exploratory data analysis and goodness-of-fit testing. The latter paper is particularly relevant for the readers of this blog, I think, as it discusses why Bayesians should embrace graphical displays of data—which I interpret as visual posterior predictive checks—rather than, as is typical, treating exploratory data analysis as something to be done quickly before getting to the real work of modeling.

2. Let me expand upon this point. Here’s how I see things usually going in a work of applied statistics:

Step 1: Exploratory data analysis. Some plots of raw data, possibly used to determine a transformation.

Step 2: The main analysis—maybe model-based, maybe non-parametric, whatever. It is typically focused, not exploratory.

Step 3: That’s it.

I have a big problem with Step 3 (as maybe you could tell already). Sometimes you’ll also see some conventional model checks such as chi-squared tests or qq plots, but rarely anything exploratory. Which is really too bad, considering that a good model can make exploratory data analysis much more effective and, conversely, I’ll understand and trust a model a lot more after seeing it displayed graphically along with data.
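As a concrete version of the "visual posterior predictive check" mentioned in point 1, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and is not taken from the papers cited above: the data are simulated from an exponential distribution, the model is a normal distribution with the standard noninformative prior, and the function names are hypothetical. The idea is simply to draw replicated datasets from the fitted model and look at them next to the real data. Text histograms stand in for proper plots so that the sketch needs nothing beyond the standard library.

```python
import random
import statistics


def posterior_predictive_replications(y, n_rep, rng):
    """Draw replicated datasets from a normal model fit to y.

    Assumes the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2,
    for which the posterior can be sampled directly.
    """
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)  # sample variance (n - 1 denominator)
    reps = []
    for _ in range(n_rep):
        # sigma^2 | y is scaled inverse-chi-squared with n - 1 degrees of freedom;
        # a chi-squared(k) draw is a gamma draw with shape k/2 and scale 2.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y is normal with mean ybar and variance sigma^2 / n.
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # One replicated dataset of the same size as the observed data.
        reps.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)])
    return reps


def text_histogram(values, lo, hi, bins=12, width=40):
    """Return a crude text histogram of values on the common range [lo, hi]."""
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for k, c in enumerate(counts):
        left = lo + k * (hi - lo) / bins
        lines.append(f"{left:7.2f} | " + "#" * round(width * c / peak))
    return "\n".join(lines)


rng = random.Random(0)
# Hypothetical skewed, nonnegative data that a normal model should fit poorly.
y = [rng.expovariate(1.0) for _ in range(100)]
reps = posterior_predictive_replications(y, n_rep=200, rng=rng)

# Use one shared axis so the displays are directly comparable.
lo = min(min(y), min(min(r) for r in reps))
hi = max(max(y), max(max(r) for r in reps))

print("observed data")
print(text_histogram(y, lo, hi))
for i in range(3):
    print(f"replication {i + 1}")
    print(text_histogram(reps[i], lo, hi))

# A numerical companion to the pictures: how often does a replicated
# dataset reach below the smallest observed value?
frac_lower = sum(min(r) < min(y) for r in reps) / len(reps)
print("share of replications with a lower minimum:", frac_lower)
```

With data simulated this way, the replications routinely extend below zero while the observed values cannot, which is the sort of misfit that is obvious in a display and easy to miss in a table of estimates.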

3. Seth writes:

A more accurate title of Tukey’s book would have been Low-Status Data Analysis. Graphs and transformations are low-status. They are low-status because graphs are common and transformations are easy. Anyone can make a graph or transform their data. I believe they were neglected for that reason. To show their high status, statistics professors focused their research and teaching on more difficult and esoteric stuff — like complicated regression. That the new stuff wasn’t terribly useful (compared to graphs and transformations) mattered little. Like all academics — like everyone — they cared enormously about showing high status. It was far more important to be impressive than to be useful.

This is, in my experience, ridiculous. Seth doesn’t just say that some work that is useful is low status, or that some work that is low status is useful. He says that useful statistical research work is generally low status. No, no, no, no! It’s hard to be useful! Just about everybody in statistics tries to do work that is useful.

OK, I know what Seth is talking about. I used to teach at Berkeley (as did Seth), and indeed the statistics department back then was chock-full of high-status professors (the department was generally considered #1 or #2 in the world) who did little if anything useful in applied statistics. But they were trying to be useful! They were just so clueless that they didn’t know better. And there was also some socialization going on, where the handful of faculty members who really were doing useful work seemed to most highly value the non-useful work of the others. It’s certainly true that they didn’t appreciate graphical methods or the challenges of getting down and dirty with data. (They might have dismissed such work as being insufficiently general and enduring, but in my experience such applied work has been crucial in motivating the development of new methods.) And they were also dismissive of applied research areas such as survey research that are fascinating and important but did not happen to be “hot” at the time. This is consistent with Seth’s hypothesis of status-seeking, but I’m inclined to give the more charitable interpretation that my Berkeley colleagues wanted to work on what they viewed as the most important and challenging problems.

I repeat: I completely disagree with Seth’s claim that, in statistics, it is “low status” to develop useful methods. Developing useful methods is as central to statistics as saving souls is in church—well, I’m guessing on this latter point, but stay with me on this, OK?—it’s just hard to do, so some people occupy themselves in other useful ways such as building beautiful cathedrals (or managing bazaars). But having people actually use your methods—that’s what it’s all about.

Seth is the psychologist, not me, so I won’t argue his claim that “everyone cares enormously about showing high status.” In statistics, though, I think he has his status attributions backward. (I also worry about the circularity of arguments about status, like similar this-can-explain-anything arguments based on self-interest or, for that matter, “the unconscious.”)

4. Hey, I almost forgot Seth’s claim, “Anyone can make a graph or transform their data.” No, no, no. Anyone can run a regression or an Anova! Regression and Anova are easy. Graphics is hard. Maybe things will change with the software and new media—various online tools such as Gapminder make graphs that are far far better than the Excel standard, and, with the advent of blogging, hot graphs are popular on the internet. We’ve come a long way from the days in which graphs were in drab black-and-white, when you had to fight to get them into journals, and when newspaper graphics were either ugly or (in the case of USA Today) of the notoriously trivial, “What are We Snacking on Today?”, style.

Even now, though, if you’re doing research work, it’s much easier to run a plausible regression or Anova than to make a clear and informative graph. I’m an expert on this one. I’ve published thousands of graphs but created tens of thousands more that didn’t make the cut.

One problem, perhaps, is that statistics advice is typically given in terms of the one correct analysis that you should do in any particular setting. If you’re in situation A, do a two-sample t-test. In situation B, it’s Ancova; for C you should do differences-in-differences; for D the correct solution is weighted least squares, and so forth. If you’re lucky, you’ll get to make a few choices regarding selection of predictors or choice of link function, but that’s about it. And a lot of practical advice on statistics actually emphasizes how little choice you’re supposed to have—the idea that you should decide on your data analysis before gathering any data, that it’s cheating to do otherwise.

One of the difficulties with graphs is that it clearly doesn’t work that way. Default regressions and default Anovas look like real regressions and Anovas, and in many cases they actually are! Default graphics may sometimes do a solid job at conveying information that you already have (see, for example, the graphs of estimated effect sizes and odds ratios that are, I’m glad to say, becoming standard adjuncts to regression analyses published in medical and public health journals), but it usually takes a bit more thought to really learn from a graph. Even the superplot—a graph I envisioned in my head back in 2003 (!) back at the very start of our Red State, Blue State project, before doing any data analysis at all—even the superplot required a lot of tweaking to look just right.

Perhaps things will change. One of my research interests is to more closely tie graphics to modeling and to develop a default process for looking through lots of graphs in a useful way. Researchers were doing this back in the 1960s and 70s–methods for rotating point clouds on the computer, and all that—but I’m thinking of something slightly different, something more closely connected to fitted models. But right now, no, graphs are harder, not easier, than formal statistical analysis.

Seth also writes:

Most statistics professors and their textbooks have neglected all uses of graphs and transformations, not just their exploratory uses. I used to think exploratory data analysis (and exploratory science more generally) needed different tools than confirmatory data analysis and confirmatory science. Now I don’t. A big simplification.

All I can say is: Things are changing! The most popular book on Bayesian statistics and the most popular book on multilevel modeling have a strong exploratory framework and strongly support the view that similar tools are used for exploratory and confirmatory data analysis. (Not exactly the same tools, of course: there are lots of technical issues specific to graphics, and other technical issues specific to probability calculations. But I agree with Seth, there’s a lot of overlap.)

5. To return briefly to Tukey’s extremely influential book: EDA was published in 1977 but I believe he began to work in that area in the 1960s, about ten or fifteen years after doing his also extremely influential work on multiple comparisons (that is, confirmatory data analysis). I’ve always assumed that Tukey was finding p-values to be too limited a tool for doing serious applied statistics–something like playing the piano with mittens. I’m sure Tukey was super-clever at using the methods he had to learn from data, but it must have come to him that he was getting the most from his graphical displays of p-values and the like, rather than from their Type 1 and Type 2 error probabilities that he’d previously focused so strongly on. From there it was perhaps natural to ditch the p-values and the models entirely—as I’ve written before, I think Tukey went a bit too far in this particular direction—and see what he could learn by plotting raw data. This turned out to be an extremely fruitful direction for researchers, and followers in the Tukey tradition—I’m thinking of statisticians such as Bill Cleveland, Howard Wainer, Andreas Buja, Diane Cook, Antony Unwin, etc.—are continuing to make progress here.

(I’ll have to talk with my colleagues who knew Tukey to see how accurate the above paragraph is as a description of his actual progression of thoughts, rather than merely my rational reconstruction.)

The actual methods and case studies in the EDA book . . . well, that’s another story. Hanging rootograms, stem-and-leaf plots, goofy plots of interactions, the January temperature in Yuma, Arizona—all of this is best forgotten or, at best, remembered as an inspiration for important later work. Tukey was a compelling writer, though—I’ll give him that. I read Exploratory Data Analysis twenty-five years ago and was captivated. At some point I escaped its spell and asked myself why I should care about the temperature in Yuma–but, at the time, it all made perfect sense. Even more so once I realized that his methods are ultimately model-based and can be even more effective if understood in that way (a point that I became dimly aware of while completing my Ph.D. thesis in 1990—when I realized that the model I’d spent two years working on didn’t actually fit my data—and which I first formalized at a conference talk in 1997 and published in 2003 and 2004. It’s funny how slowly these ideas develop.).

P.S. It’s funny that Seth characterizes Freakonomics as low-status economics. Freakonomics is great, but this particular “rogue economist” was tenured at the University of Chicago and had been given major awards by the economics profession. The problem here, I think, is Seth’s tendency to characterize everyone and everything into good guys and bad guys. Levitt’s a good guy and low-status work is good, therefore Levitt’s work must be low-status. Levitt’s “real innovation” (in Seth’s words) was to do excellent, headline-worthy work, then to actually get the headlines, to do more excellent, newsworthy work, and to attract the attention of a dedicated journalist. (Now he has to decide where to go next.)

That said, the economics profession (and academia in general) faces some tough questions, such as: How much in the way of resources should go toward studying sumo wrestlers and how much should go toward studying international trade? I assume Seth would argue that sumo wrestling is low-status and deserves more resources, while the study of international trade is overvalued and should be studied less. This line of thinking can run into trouble, though. For example, consider various declining academic subjects such as anthropology, philosophy, and classics. I’m not quite sure how Seth (or others) define status, but it’s my impression that—in addition to their difficulties with funding and enrollment—anthropology, philosophy, and classics are lower status than more applied subjects such as economics, architecture, and law. I guess what I’m saying is: Low status isn’t always such a good thing! Some topics are low-status for a good reason, for example that they’ve been bypassed or discredited. Consider various forgotten fads such as astrology or deconstructionism.

(I have some sympathy for “let the market decide” arguments: we don’t need a commissar to tell us how many university appointments should be made in art history and how many in statistics, for example. Still, even in a market or quasi-market situation, someone has to decide. For example, suppose a particular economics department has the choice of hiring a specialist in sumo wrestling or in international trade. They still have to make a decision.)

P.P.S. You might wonder why I’m spilling so much ink (so to speak) responding to Seth. I’ll give two reasons (in addition to the fact that he’s my friend, and that one of the functions of this blog is to allow me to share thoughts that would otherwise be restricted to personal emails). First is the journalistic tradition of the hook to hang a story on. All these thoughts spilled out of me, but Seth’s thoughts were what got me started in this particular instance. Second, I know that exploratory statistical ideas have been important in Seth’s own applied research, and so his views on what makes these methods work are worth listening to, even if I disagree on many points.

In any case, I’ll try at some point to figure out a way to package these and related ideas in a more coherent way and perhaps publish them in some journal that no one reads or put them deep inside a book that nobody ever reads past chapter 5. Blogging as the higher procrastination, indeed.

P.P.P.S. Sometimes people ask me how much time I spend on blogging, and I honestly answer that I don’t really know, because I intersperse blogging with other work. This time, however, I know the answer because I wrote this on the train. It took 2 hours.

14 thoughts on “Exploratory and confirmatory data analysis”

  1. It's great to get your views about all this. My comments about "statistics professors" were meant to apply to the whole history of the field (starting in the 1920s, say) not just today & yesterday where I agree things are different. It's easier to do an ANOVA than make a graph, you say. For most of the 1900s, that was false — especially if by "graph" you mean scatterplot. In 1940, it was a lot easier to make a scatterplot than do an ANOVA. One effect of systematic neglect is to make the stuff you never do (such as graphs) harder relative to the stuff you usually do (such as ANOVA). Several years ago, the psychology dept at Berkeley had a statistics consultant (a psych grad student) who didn't know how to use SPSS to make a graph.

    "But they were trying to be useful! They were just so clueless that they didn't know better." You know them much better than I do but I think there is some truth to that. A friend of mine in Croatia said that after Soviet Russia fell, they were free to decorate their houses — but they had forgotten how. There's certainly an increased interest among stat professors in being useful, but, from the long neglect of being useful, they have forgotten how. What your comments don't mention is what I didn't bother mentioning: All other departments are the same way. (In my experience.) In every department they look down on being useful. In some more than others, sure, and it isn't constant over time, sure, but the general preference for useless over useful is blindingly clear. This is why academia is called "ivory tower". This is why Thorstein Veblen could write a whole book about status display, not give any sources, yet be highly persuasive. One of his chapters was about professors. So to me you seem to be arguing that stat professors are somehow different than all other professors. Not to mention all other human beings. Unlikely.

  2. As you note, when EDA came out John Tukey was already famous (by statistician standards).

    If somebody less esteemed had published EDA, would anyone have paid much attention? Would it have had any impact at all? If Tukey had published it as his main work product prior to tenure review, would he have gotten tenure at Berkeley?

    You wrote "there was also some socialization going on, where the handful of faculty members who really were doing useful work seemed to most highly value the non-useful work of the others." That's a curious statement, and I wonder if you have thoughts on why this was/is.

  3. Great post, two hours well spent!

    Absolutely agree about integrating graphics and models: models should reveal structure in the data, and once structure is found, it should be easy to visualize. If the model cannot be visualized in one way or another, it is a sign of trouble. So I agree with you and with Seth that EDA is too restrictive a description for this area.

    Anyone who disagrees with Andrew on "graphics is hard" should hop over to my blog. Many of my posts take days to compose, and even then, the charts are not yet publishable.

    Unlike Andrew, I find Tukey's writing style different and enjoyable but often hard to grasp because it's too vague.

    And oops, my just published book is low status since I am trying to make statistics accessible but I think it will be useful to many.

  4. Zbicyclist – interestingly Fisher often bitterly complained that the real mathematicians around him would never help him make his work more rigorous.

    In looking for how other faculty can help in your research _without_ stealing your thunder/limelight – technical skills that you are not quite up on are likely high on the list.

    Andrew: so you do sometimes exceed the 15 minute blog rule ;-)


  5. Thanks to you and Seth for great posts!

    Minor point, even if graphs are hard, I don't think that they are understood to be hard by social scientists. Are they understood to be hard by statisticians?

  6. Seth:

    In econ and poli sci today, it's much much easier to run reasonably-good regressions than to make useful graphs. Psychology may be different, though.

    On your other point, I'm not suggesting that professors are different from everybody else (except, of course, in the ways that they are). Here's my point: Whether or not status-seeking is the universal behavioral solvent you seem to feel it is, I don't think it explains much. It seems to serve for you the same tautological purpose that for others is served by explanations such as "self-interest" or "unconscious drives." Basically, if someone does something you don't like, you're attributing it to status-seeking. In my experience in statistics departments, it is applied work, not theoretical work, that has the highest status. Not always, and not everywhere, but most places.

    Also, I respect that you've learned a lot from Thorstein Veblen. I'm a big fan of Mark Twain, too. But I don't know that I'd cite either of these long-dead authors as authorities on the contemporary academic or business scene. I'd rather just appreciate these authors for themselves and move forward.


    The Berkeley stat dept of the 1990s was unusual for its time, I believe, in being so theoretically focused. I do think there is such a thing as mob psychology, and I think the few professors there who did applied work were socialized to believe that theoretical work was better. (I think that, even among those who liked what I did, they would've felt comfortable if I'd proved a few more theorems.)

    No, I don't think Tukey would've gotten tenure at a top place for EDA alone. On the other hand, the Tukey followers I mentioned in my above entry have all had successful academic careers while focusing on statistical graphics. The difference, I guess, is that they couldn't get away with simply writing EDA-style idiosyncratic books full of assertions and bright ideas; they had to write about their ideas more systematically. But that's the academic, scholarly way, and not such a horrible thing, I think.

    For that matter, I don't know if I would've gotten tenure at Berkeley for doing applied statistics and developing new methodology without proving theorems . . . Hey, wait a minute, yes, maybe I do know the answer to that one!


    I suppose that social scientists don't think it's hard to make graphs, but they generally don't really understand the difference between a good graph and a bad graph. Or, to put it another way, they realize there are such things as good graphs (as in Tufte) but they think of these more as gimmicks than as serious data analysis.

    A social scientist can push a button and run a regression–and he or she can even think seriously about it, considering issues such as measurement error, endogeneity, and so forth. But most social scientists–and most statisticians–don't have a framework for thinking systematically about graphs. Tukey's EDA book was a start but only a start. In my 2003 and 2004 papers, I tried to build upon Tukey's ideas and say explicitly some of the concepts that I believe were implicit in his work. Formalizations that might be useful to those of us who are not Tukeylike in our statistical intuitions and might appreciate some more general principles.

  7. I think that you could work through a list of things people have spotted in graphs and find a few tests that would cover most of them. E.g. use some orthogonal family to expand the relationships or distribution functions considered slightly and test against this. How well would human eyes do against such competition (e.g. when presented with generated data with known right answers)? It might at least be good practice for the test subjects.

  8. Andrew – my experiences in statistics departments, with two exceptions, seem to be the exact opposite of yours, with Berkeley seeming the most usual.

    One of the two exceptions and the biggest was a primarily Bayesian department and that might explain a lot of it.

    But the following was typical – an enjoyable short course/seminar given by and for the statistics faculty had an applied section that was solely about different ways one could prove the central limit theorem (again this was the applied component).

    Usefulness is in the eye of the beholder and there is nothing more practical than a good theory – but fairly recently and persisting, I believe many statistics faculty are primarily, as Don Rubin put it, obsessed with "baby" mathematics.


  9. Seth: "A friend of mine in Croatia said that after Soviet Russia fell, they were free to decorate their houses — but they had forgotten how."

    Interesting how the Russians were able to check whether people in Yugoslavia were decorating their houses. A country must be really powerful to enforce such regulations in a foreign country that is not your ally or "satellite".

  10. Anon, I was using "the fall of Soviet Russia" as shorthand for the broad retreat of their influence. Maybe my Croatian friend said "when communism went away".

    Andrew, before computers, scatterplots were a lot easier to do than regressions. Sure, it's different now. To repeat, I am talking about the whole history of statistics, not just the current situation.

    Veblen described a lot of facts in The Theory of the Leisure Class, drawn from many time periods and cultures. That his evidence was so broad is a good reason to think the aspect of human nature he described hasn't gone away in the last 100 years. I'm just repeating Veblen, basically. To describe my argument as "tautological" ignores all the facts that Veblen gave.

  11. There are two points I think are important to add:

    i) what I value most from Tukey's work are not the plots or techniques he published but the whole philosophy of data analysis, which is so much different from what is usually taught in statistics.

    ii) the (only) reason why graphics are hard for most statisticians is the fact that they never ever really learned how to use graphics. Students spend days and weeks to understand regression and ANOVA, and all statistics software offers these things with a single click – so that feels easy. IMHO we need to change the philosophy behind what we teach in statistics class in order to make a change.

  12. I think we can say this:

    Anyone can make a bad graph
    Anyone can fit a bad model
    Anyone can point and click in an SPSS GUI and get something that LOOKS like statistics.


    It is (often) hard to make a good graph
    It is (often) hard to make a good model
    It is (almost) never easy to do good statistics.

    Didn't George Box say

    "There are no routine statistical questions, only questionable statistical routines."?
