In the middle of a long comment thread on a silly Psychological Science paper, Ed Hagen wrote:
Exploratory studies need to become a “thing.” Right now, they play almost no formal role in social science, yet they are essential to good social science. That means we need to put as much effort in developing standards, procedures, and techniques for exploratory studies as we have for confirmatory studies. And we need academic norms that reward good exploratory studies so there is less incentive to disguise them as confirmatory.
The problem goes like this:
1. Exploratory work gets no respect. Do an exploratory study and you’ll have a difficult time getting it published.
2. So people don’t want to do exploratory studies, and when someone does do one, he or she is motivated to cloak it in confirmatory language. (Our hypothesis was Z, we performed test Y, etc.)
3. If you tell someone you will interpret their study as being exploratory, they may well be insulted, as if you’re saying their study is only exploration and not real science.
4. Then there’s the converse: it’s hard to criticize an exploratory study. It’s just exploratory, right? Anything goes!
And here’s what I think:
Exploration is important. In general, hypothesis testing is overrated and hypothesis generation is underrated, so it’s a good idea for data to be collected with exploration in mind.
But exploration, like anything else, can be done well or it can be done poorly (or anywhere in between). To describe a study as “exploratory” does not get it off the hook for problems of measurement, conceptualization, etc.
For example, Ed Hagen in that thread mentioned that horrible ovulation-and-clothing paper, and its even more horrible followup, in which the authors pulled the outdoor temperature variable out of a hat to explain away an otherwise embarrassing non-replication (which shouldn’t have been embarrassing at all, given the low, low power and many researcher degrees of freedom of the original study that had gotten them on the wrong track in the first place). As I wrote in response to Hagen, I love exploratory studies, but gathering crappy one-shot data on a hundred people and looking for the first thing that can explain your results . . . that’s low-quality exploratory research.
From “EDA” to “Design of exploratory studies”
With the phrase “Exploratory Data Analysis,” the statistician John Tukey named and gave initial shape to a whole new way of thinking formally about statistics. Tukey of course did not invent data exploration, but naming the field gave a boost to thinking about it formally (in the same way that, to a much lesser extent, our decades of writing about posterior predictive checks has given a sense of structure and legitimacy to Bayesian model checking). And that’s all fine. EDA is great, and I’ve written about connections between EDA and Bayesian modeling; see here and here.
But today I want to talk about something different, which is the idea of design of an exploratory study.
Suppose you know ahead of time that your theories are a bit vague and omnidirectional, that all sorts of interesting things might turn up that you will want to try to understand, and you want to move beyond the outmoded Psych Sci / PPNAS / Plos-One model of chasing p-values in a series of confirmatory studies.
You’ve thought it through and you want to do it right. You know it’s time for exploration first and confirmation later, if at all. So you want to design an exploratory study.
What principles do you have? What guidelines? If you look up “design” in statistics or methods textbooks, you’ll find a lot of power calculations, maybe something on bias and variance, and perhaps some advice on causal identification. All these topics bear on data exploration and hypothesis generation, but only indirectly, since the output of an exploratory analysis is not an estimate or a hypothesis test.
So I think we—the statistics profession—should be offering guidelines on the design of exploratory studies.
An analogy here is observational studies. Way back when, causal inference was considered to come from experiments. Observational studies were second best, and statistics textbooks didn’t give any advice on the design of observational studies. You were supposed to just take your observational data, feel bad that they didn’t come from experiments, and go from there. But then Cochran, and Rosenbaum, and Angrist and Pischke, wrote textbooks on observational studies, including advice on how to design them. We’re gonna be doing observational studies, so let’s do a good job at them, which includes thinking about how to plan them.
Same thing with exploratory studies. Data-based exploration and hypothesis generation are central to science. Statisticians should be involved in the design as well as the analysis of these studies.
So what advice should we give? What principles do we have for the design of exploratory studies?
Let’s try to start from scratch, rather than taking existing principles such as power, bias, and variance that derive from confirmatory statistics.
– Measurement. I think this has to be the #1 principle. Validity and reliability: that is, you’re measuring what you think you’re measuring, and you’re measuring it precisely. Related: within-subject designs or, to put it more generally, structured measurements. If you’re interested in studying people’s behavior, measure it over and over, ask people to keep diaries, etc. If you’re interested in improving education, measure lots of outcomes, try to figure out what people are actually learning. And so forth.
– Open-endedness. Measuring lots of different things. This goes naturally with exploration.
– Connections between quantitative and qualitative data. You can learn from those open-ended survey responses—but only if you look at them.
– Where possible, collect or construct continuous measurements. I’m thinking of this partly because graphical data analysis is an important part of just about any exploratory study. And it’s hard to graph data that are entirely discrete.
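To make the last two points concrete, here’s a minimal sketch (a hypothetical example, not from the post) of “constructing” a continuous measurement from discrete ones: if each respondent answers several yes/no items on the same construct, a single item takes only two values, while the average across items takes many values and so is far friendlier to plotting and graphical exploration.

```python
import random

random.seed(1)

# Hypothetical setup: 100 respondents each answer 10 yes/no items
# intended to measure the same underlying construct.
n_respondents, n_items = 100, 10
responses = [[random.random() < 0.5 for _ in range(n_items)]
             for _ in range(n_respondents)]

# A single binary item: at most 2 distinct values, hard to graph.
single_item = [r[0] for r in responses]

# Averaging the 10 items gives a near-continuous score
# (up to 11 distinct values here), which graphs much better.
scores = [sum(r) / n_items for r in responses]

print(len(set(single_item)))  # at most 2 distinct values
print(len(set(scores)))       # many more distinct values
```

The same logic motivates repeated within-subject measurement more generally: many cheap structured measurements per person can be combined into a richer, more graphable quantity than any one-shot response.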
I think much more can be said here. It would be great to have some generally useful advice for the design of exploratory studies.