Seth points me to this discussion he wrote on Tukey’s famous book, Exploratory Data Analysis. I use exploratory methods all the time and have thought a lot about Tukey and his book and so wanted to add a few comments.
In particular, I’d like to separate Seth’s important points about statistical practice from his inflammatory rhetoric and pop sociology. (Disclaimer: I engage in rantin’, ragin’, and pop sociology all the time—but, when I do it, it’s for a higher purpose and it’s ok.)
I have several important points to make here, so I recommend you stop whatever else you’re doing and read all of this.
1. As Seth discusses, so-called exploratory and confirmatory methods are not in opposition (as is commonly assumed) but rather go together. The history on this is that “confirmatory data analysis” refers to p-values, while “exploratory data analysis” is all about graphs, but both these approaches are ways of checking models. I discuss this point more fully in my articles, Exploratory data analysis for complex models and A Bayesian formulation of exploratory data analysis and goodness-of-fit testing. The latter paper is particularly relevant for the readers of this blog, I think, as it discusses why Bayesians should embrace graphical displays of data—which I interpret as visual posterior predictive checks—rather than, as is typical, treating exploratory data analysis as something to be done quickly before getting to the real work of modeling.
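To make the "visual posterior predictive check" idea concrete, here is a minimal sketch (my own illustration, not code from the papers cited above): fit a model, simulate replicated datasets from it, and compare a test statistic of the observed data to its distribution under the replications. The graphical version would plot the replicated statistics with the observed value marked; the numerical byproduct is a posterior predictive p-value.

```python
# Hypothetical sketch of a posterior predictive check for a normal model.
# The data-generating choices here (exponential data, plug-in posterior)
# are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=100)  # skewed "observed" data

# Fit a normal model; as a shortcut, use plug-in estimates rather than
# full posterior draws.
n = len(y)
mu_hat, sigma_hat = y.mean(), y.std(ddof=1)

# Draw replicated datasets y_rep from the fitted model.
n_rep = 1000
y_rep = rng.normal(mu_hat, sigma_hat, size=(n_rep, n))

# Compare a tail-sensitive test statistic across replications.
T_obs = y.max()
T_rep = y_rep.max(axis=1)
ppp = (T_rep >= T_obs).mean()  # posterior predictive p-value
print(f"observed max = {T_obs:.2f}, ppp = {ppp:.3f}")
# A histogram of T_rep with a vertical line at T_obs is the graphical version:
# if the line sits in the tail, the model fails to capture that data feature.
```

The point of the exercise is exploratory: the choice of test statistic (here the maximum, which a normal model will tend to underpredict for skewed data) directs attention to the aspect of the data the model misses.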
2. Let me expand upon this point. Here’s how I see things usually going in a work of applied statistics:
Step 1: Exploratory data analysis. Some plots of raw data, possibly used to determine a transformation.
Step 2: The main analysis—maybe model-based, maybe non-parametric, whatever. It is typically focused, not exploratory.
Step 3: That’s it.
I have a big problem with Step 3 (as maybe you could tell already). Sometimes you’ll also see some conventional model checks such as chi-squared tests or qq plots, but rarely anything exploratory. Which is really too bad, considering that a good model can make exploratory data analysis much more effective and, conversely, I’ll understand and trust a model a lot more after seeing it displayed graphically along with data.
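As a toy illustration of why exploratory checks deserve a place after the "main analysis" (again, my own hypothetical example, not from any of the sources discussed here): fit a straight line to curved data, then look at the residuals exploratorily. Binned residual means make the misfit jump out in a way a chi-squared statistic alone would not.

```python
# Hypothetical sketch: exploratory residual check after a "main analysis."
# The quadratic truth and noise level are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 1.5 + 0.8 * x + 0.3 * x**2 + rng.normal(0, 2, size=x.size)  # curved truth

# Step 2, the main analysis: fit a straight line.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# The missing Step 3: plot (or here, print) binned residual means vs. x.
# A well-fitting model would leave patternless residuals; the linear fit
# leaves a U shape, pointing directly at the omitted curvature.
bins = np.array_split(np.argsort(x), 10)
binned_means = np.array([resid[idx].mean() for idx in bins])
print(np.round(binned_means, 2))
```

The same computation, drawn as a plot of residuals against x with the binned means overlaid, is exactly the kind of graphical display of model-plus-data that builds (or, here, undermines) trust in the fitted model.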
3. Seth writes:
A more accurate title of Tukey’s book would have been Low-Status Data Analysis. Graphs and transformations are low-status. They are low-status because graphs are common and transformations are easy. Anyone can make a graph or transform their data. I believe they were neglected for that reason. To show their high status, statistics professors focused their research and teaching on more difficult and esoteric stuff — like complicated regression. That the new stuff wasn’t terribly useful (compared to graphs and transformations) mattered little. Like all academics — like everyone — they cared enormously about showing high status. It was far more important to be impressive than to be useful.
This is, in my experience, ridiculous. Seth doesn’t just say that some work that is useful is low status, or that some work that is low status is useful. He says that useful statistical research work is generally low status. No, no, no, no! It’s hard to be useful! Just about everybody in statistics tries to do work that is useful.
OK, I know what Seth is talking about. I used to teach at Berkeley (as did Seth), and indeed the statistics department back then was chock-full of high-status professors (the department was generally considered #1 or #2 in the world) who did little if anything useful in applied statistics. But they were trying to be useful! They were just so clueless that they didn’t know better. And there was also some socialization going on, where the handful of faculty members who really were doing useful work seemed to most highly value the non-useful work of the others. It’s certainly true that they didn’t appreciate graphical methods or the challenges of getting down and dirty with data. (They might have dismissed such work as being insufficiently general and enduring, but in my experience such applied work has been crucial in motivating the development of new methods.) And they were also dismissive of applied research areas such as survey research that are fascinating and important but did not happen to be “hot” at the time. This is consistent with Seth’s hypothesis of status-seeking, but I’m inclined to give the more charitable interpretation that my Berkeley colleagues wanted to work on what they viewed as the most important and challenging problems.
I repeat: I completely disagree with Seth’s claim that, in statistics, it is “low status” to develop useful methods. Developing useful methods is as central to statistics as saving souls is in church—well, I’m guessing on this latter point, but stay with me on this, OK?—it’s just hard to do, so some people occupy themselves in other useful ways such as building beautiful cathedrals (or managing bazaars). But having people actually use your methods—that’s what it’s all about.
Seth is the psychologist, not me, so I won’t argue his claim that “everyone cares enormously about showing high status.” In statistics, though, I think he has his status attributions backward. (I also worry about the circularity of arguments about status, like other this-can-explain-anything arguments based on self-interest or, for that matter, “the unconscious.”)
4. Hey, I almost forgot Seth’s claim, “Anyone can make a graph or transform their data.” No, no, no. Anyone can run a regression or an Anova! Regression and Anova are easy. Graphics is hard. Maybe things will change with new software and new media—various online tools such as Gapminder make graphs that are far, far better than the Excel standard, and, with the advent of blogging, hot graphs are popular on the internet. We’ve come a long way from the days in which graphs were in drab black-and-white, when you had to fight to get them into journals, and when newspaper graphics were either ugly or (in the case of USA Today) of the notoriously trivial “What Are We Snacking on Today?” style.
Even now, though, if you’re doing research work, it’s much easier to run a plausible regression or Anova than to make a clear and informative graph. I’m an expert on this one. I’ve published thousands of graphs but created tens of thousands more that didn’t make the cut.
One problem, perhaps, is that statistics advice is typically given in terms of the one correct analysis that you should do in any particular setting. If you’re in situation A, do a two-sample t-test. In situation B, it’s Ancova; for C you should do differences-in-differences; for D the correct solution is weighted least squares, and so forth. If you’re lucky, you’ll get to make a few choices regarding selection of predictors or choice of link function, but that’s about it. And a lot of practical advice on statistics actually emphasizes how little choice you’re supposed to have—the idea that you should decide on your data analysis before gathering any data, that it’s cheating to do otherwise.
One of the difficulties with graphs is that they clearly don’t work that way. Default regressions and default Anovas look like real regressions and Anovas, and in many cases they actually are! Default graphics may sometimes do a solid job at conveying information that you already have (see, for example, the graphs of estimated effect sizes and odds ratios that are, I’m glad to say, becoming standard adjuncts to regression analyses published in medical and public health journals), but it usually takes a bit more thought to really learn from a graph. Even the superplot—a graph I envisioned in my head back in 2003 (!), at the very start of our Red State, Blue State project, before doing any data analysis at all—even the superplot required a lot of tweaking to look just right.
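For readers who haven't met them, the effect-size displays mentioned above are built from nothing more than estimates and standard errors; here is a hypothetical sketch of the table behind such a coefficient plot (simulated data and made-up variable names, not from any real analysis).

```python
# Hypothetical sketch: the numbers behind a coefficient ("effect size") plot.
# Data, true coefficients, and predictor names are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, 0.0, -0.3])
y = X @ beta_true + rng.normal(0, 1, size=n)

# Ordinary least squares estimates and standard errors.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Each row becomes one dot-with-interval in the plot: estimate +/- 2 s.e.
for name, b, s in zip(["intercept", "x1", "x2", "x3"], beta_hat, se):
    print(f"{name:9s} {b:+.2f}  [{b - 2*s:+.2f}, {b + 2*s:+.2f}]")
```

This is the "default graphic that conveys information you already have": useful, but it only re-displays the regression table, which is why learning something new from a graph usually takes more thought than this.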
Perhaps things will change. One of my research interests is to more closely tie graphics to modeling and to develop a default process for looking through lots of graphs in a useful way. Researchers were doing this back in the 1960s and 70s—methods for rotating point clouds on the computer, and all that—but I’m thinking of something slightly different, something more closely connected to fitted models. But right now, no, graphs are harder, not easier, than formal statistical analysis.
Seth also writes:
Most statistics professors and their textbooks have neglected all uses of graphs and transformations, not just their exploratory uses. I used to think exploratory data analysis (and exploratory science more generally) needed different tools than confirmatory data analysis and confirmatory science. Now I don’t. A big simplification.
All I can say is: Things are changing! The most popular book on Bayesian statistics and the most popular book on multilevel modeling have a strong exploratory framework and strongly support the view that similar tools are used for exploratory and confirmatory data analysis. (Not exactly the same tools, of course: there are lots of technical issues specific to graphics, and other technical issues specific to probability calculations. But I agree with Seth, there’s a lot of overlap.)
5. To return briefly to Tukey’s extremely influential book: EDA was published in 1977 but I believe he began to work in that area in the 1960s, about ten or fifteen years after doing his also extremely influential work on multiple comparisons (that is, confirmatory data analysis). I’ve always assumed that Tukey was finding p-values to be too limited a tool for doing serious applied statistics—something like playing the piano with mittens. I’m sure Tukey was super-clever at using the methods he had to learn from data, but it must have come to him that he was getting the most from his graphical displays of p-values and the like, rather than from their Type 1 and Type 2 error probabilities that he’d previously focused so strongly on. From there it was perhaps natural to ditch the p-values and the models entirely—as I’ve written before, I think Tukey went a bit too far in this particular direction—and see what he could learn by plotting raw data. This turned out to be an extremely fruitful direction for researchers, and followers in the Tukey tradition—I’m thinking of statisticians such as Bill Cleveland, Howard Wainer, Andreas Buja, Diane Cook, Antony Unwin, etc.—are continuing to make progress here.
(I’ll have to talk with my colleagues who knew Tukey to see how accurate the above paragraph is as a description of his actual progression of thoughts, rather than merely my rational reconstruction.)
The actual methods and case studies in the EDA book . . . well, that’s another story. Hanging rootograms, stem-and-leaf plots, goofy plots of interactions, the January temperature in Yuma, Arizona—all of this is best forgotten or, at best, remembered as an inspiration for important later work. Tukey was a compelling writer, though—I’ll give him that. I read Exploratory Data Analysis twenty-five years ago and was captivated. At some point I escaped its spell and asked myself why I should care about the temperature in Yuma—but, at the time, it all made perfect sense. Even more so once I realized that his methods are ultimately model-based and can be even more effective if understood in that way (a point that I became dimly aware of while completing my Ph.D. thesis in 1990—when I realized that the model I’d spent two years working on didn’t actually fit my data—and which I first formalized at a conference talk in 1997 and published in 2003 and 2004). It’s funny how slowly these ideas develop.
P.S. It’s funny that Seth characterizes Freakonomics as low-status economics. Freakonomics is great, but this particular “rogue economist” was tenured at the University of Chicago and had been given major awards by the economics profession. The problem here, I think, is Seth’s tendency to characterize everyone and everything into good guys and bad guys. Levitt’s a good guy and low-status work is good, therefore Levitt’s work must be low-status. Levitt’s “real innovation” (in Seth’s words) was to do excellent, headline-worthy work, then to actually get the headlines, to do more excellent, newsworthy work, and to attract the attention of a dedicated journalist. (Now he has to decide where to go next.)
That said, the economics profession (and academia in general) faces some tough questions, such as: How much in the way of resources should go toward studying sumo wrestlers and how much should go toward studying international trade? I assume Seth would argue that sumo wrestling is low-status and deserves more resources, while the study of international trade is overvalued and should be studied less. This line of thinking can run into trouble, though. For example, consider various declining academic subjects such as anthropology, philosophy, and classics. I’m not quite sure how Seth (or others) define status, but it’s my impression that—in addition to their difficulties with funding and enrollment—anthropology, philosophy, and classics are lower status than more applied subjects such as economics, architecture, and law. I guess what I’m saying is: Low status isn’t always such a good thing! Some topics are low-status for a good reason, for example that they’ve been bypassed or discredited. Consider various forgotten fads such as astrology or deconstructionism.
(I have some sympathy for “let the market decide” arguments: we don’t need a commissar to tell us how many university appointments should be made in art history and how many in statistics, for example. Still, even in a market or quasi-market situation, someone has to decide. For example, suppose a particular economics department has the choice of hiring a specialist in sumo wrestling or in international trade. They still have to make a decision.)
P.P.S. You might wonder why I’m spilling so much ink (so to speak) responding to Seth. I’ll give two reasons (in addition to that he’s my friend, and one of the functions of this blog is to allow me to share thoughts that would otherwise be restricted to personal emails). First is the journalistic tradition of the hook to hang a story on. All these thoughts spilled out of me, but Seth’s thoughts were what got me started in this particular instance. Second, I know that exploratory statistical ideas have been important in Seth’s own applied research, and so his views on what makes these methods work are worth listening to, even if I disagree on many points.
In any case, I’ll try at some point to figure out a way to package these and related ideas in a more coherent way and perhaps publish them in some journal that no one reads or put them deep inside a book that nobody ever reads past chapter 5. Blogging as the higher procrastination, indeed.
P.P.P.S. Sometimes people ask me how much time I spend on blogging, and I honestly answer that I don’t really know, because I intersperse blogging with other work. This time, however, I know the answer because I wrote this on the train. It took 2 hours.