Geoff Norman: Is science a special kind of storytelling?

Javier Benítez points to this article by epidemiologist Geoff Norman, who writes:

The nature of science was summarized beautifully by a Stanford professor of science education, Mary Budd Rowe, who said that:

Science is a special kind of story-telling with no right or wrong answers. Just better and better stories.

Benítez writes that he doesn’t buy this.

Neither do I, but I read the rest of Norman’s article and I really liked it.

Here’s a great bit:

I [Norman] turn to a wonderful paper by Cook et al. (2008) that described three fundamentally different kinds of research question:

1. Description

“I have a new (curriculum, questionnaire, simulation, OSCE method, course) and here’s how I developed it”

That’s not even poor research. It’s not research at all.

2. Justification

“I have a new (curriculum, module, course, software) and it works. Students really like it” OR “students self-reported knowledge was higher” OR “students did better on the final exam than a control group” OR even “students had lower mortality rates after the course”

OK, it’s research. But is it science? After all, what do we know about how the instruction actually works? Do we have to take the whole thing on board lock, stock and barrel, to get the effects? What’s the active ingredient? In short, WHY is it better? And that brings us to

3. Clarification

“I have a new (curriculum, module, course, software). It contains a number of potentially active ingredients including careful sequencing of concepts, imbedding of concepts in a problem, interleaved practice, and distributed practice. I have conducted a program of research where these factors have been systematically investigated and the effectiveness of each was demonstrated”.

That’s more like it. We’re not asking if it works, we’re asking why it works. And the results truly add to our knowledge about effective strategies in education. So one essential characteristic is that the findings are not limited to the particular gizmo under scrutiny in the study. The study adds to our general understanding of the nature of teaching and learning.

I don’t want to get caught up in a debate on what’s “science” or what’s “research” or whatever; the key point is that science and statistics are not, and should not be, just about “What works?” but rather “How does it work?” This relates to our earlier post about the problem with the usual formulation of clinical trials as purely evaluative, which leaves you in the lurch if someone doesn’t happen to provide you with a new and super-effective treatment to try out.

To put it another way, the “take a pill” or “black box” approach to statistical evaluation would work ok if we were regularly testing wonder-pills. But in the real world, effects are typically highly variable, and we won’t get far without looking into the damn box.

Norman also writes:

The most critical aspect of a theory is that, instead of addressing a simple “Does it work?,” to which the answer is “Yup” or “Nope”, it permits a critical examination of the effects and interactions of any number of variables that may potentially influence a phenomenon. So, one question leads to another question, and before you know it, we have a research program. Programmatic strategies inevitably lead to much more sophisticated understanding. And along the way, if it’s really going well, each study leads to further insights that in turn lead to the next study question. And each answer leads to further insight and explanation.

One interesting thing here is that this Lakatosian description might seem at first glance to be a good description, not just of healthy scientific research, but also of degenerate research programmes associated with elderly-words-and-slow-walking, or ovulation and clothing, or beauty and sex ratio, or power pose, or various other problematic research agendas that we’ve criticized in this space during the past decade. All these subfields, which have turned out to be noise-spinning dead ends, feature a series of studies and publications, with each new result leading to new questions. But these studies are uncontrolled. Part of the problem there is a lack of substantive theories—no, vague pointing to just-so evolutionary stories is not enough. But another problem is statistical, the p-values and all that.

Now let’s put all these ideas together:
– If you want to engage in a scientifically successful research programme, your theories should be “thick” and full of interactions, not mere estimates of average treatment effects. That’s fine: all the areas we’ve been discussing, including the theories we don’t respect, are complex. Daryl Bem’s ESP hypotheses, for example, were full of interactions.
– But now the next step is to model those interactions, to consider them together. If, for example, you decide that outdoor temperature is an important variable (as in that ovulation-and-clothing paper), you go back and include it in the analysis of earlier as well as later studies. And if you’re part of a literature that includes other factors such as age, marital status, political orientation, etc., then, again, you include all of these too.
– Including all these factors and then analyzing in a reasonable way (I’d prefer a multilevel model but, if you’re careful, even some classical multiple comparisons approach could do the trick) would reveal not much other than noise in those problematic research areas. In contrast, in a field where real progress is being made, a full analysis should reveal persistent patterns. For example, political scientists keep studying polarization in different ways. I think it’s real.
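To make the last point concrete, here’s a minimal sketch (purely illustrative; the subgroup split, sample sizes, and null effect are my invention, not from the post) of what “not much other than noise” looks like: simulate a study with a true treatment effect of zero, then estimate the effect separately within subgroups defined by some interacting variable, such as the outdoor temperature in the ovulation-and-clothing example.

```python
import random
import statistics

# Hypothetical illustration: under a null model (true effect = 0), subgroup
# estimates of the treatment effect scatter around zero by sampling noise
# alone. Picking out the largest subgroup estimate looks like a discovery;
# analyzing all the subgroups jointly reveals only noise.

random.seed(1)

def subgroup_effects(n_per_group=200, n_subgroups=6, true_effect=0.0):
    """Estimated treatment-minus-control effect within each subgroup."""
    effects = []
    for _ in range(n_subgroups):
        treated = [true_effect + random.gauss(0, 1) for _ in range(n_per_group)]
        control = [random.gauss(0, 1) for _ in range(n_per_group)]
        effects.append(statistics.mean(treated) - statistics.mean(control))
    return effects

effects = subgroup_effects()
print([round(e, 3) for e in effects])   # subgroup estimates scatter near 0
print(round(statistics.mean(effects), 3))  # pooled estimate is near 0
```

A multilevel model would formalize the pooling in the last line, partially shrinking each subgroup estimate toward the overall mean; in a field with real effects, the subgroup estimates would instead show persistent patterns that survive the pooling.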

The point of my above discussion is to elaborate on Norman’s article by emphasizing the interlocking roles of substantive theory, data collection, and statistical analysis. I think that in Norman’s discussion, he’s kinda taking for granted that the statistical analysis will respect the substantive theory, but we see real problems when this doesn’t happen, in papers that consider isolated hypotheses without considering the interactions pointed to even within their own literatures.

P.S. Regarding the title of this post, here’s what Basbøll and I wrote a few years ago about science and stories. Although here we were talking not about storytelling but about scientists’ use of and understanding of stories.


  1. Jag Bhalla says:

    “Papers that consider isolated hypotheses” also run the risk of “rigor distortis” errors. Of being narrowly rigorously correct but in almost all real cases broadly wrong or irrelevant. Beware imprudent use of the “all else equal” move. In many real situations you can’t keep all else equal.

    See Garrett Hardin’s First Law of Human Ecology: “We can never do merely one thing.”
    Any seeming pill-like intervention has numerous effects and shifts many other factors.

    A field that suffers especially badly from rigor distortis tendencies is economics.
    Why There Are No “Unfailed” Markets In Reality

    • Jonathan (another one) says:

      If I may respond to your essay, Jag: for *this* economist, answering the questions “what is the implication of rational behavior,” “does the particular market under discussion look like the workings of rational behavior,” and even “the behavior of this market might seem irrational, but given the incentives of the market participants, I’m probably thinking about it incorrectly… let’s see what I got wrong (other than the fact that people aren’t perfectly rational maximizers)” are all normatively and positively interesting. As to spectacularly wrong predictions, figuring out why those turned out differently may indeed come down to some factor that one assumed would remain constant but did not. That’s no more something to be embarrassed about (assuming your original predictive claim was carefully couched) than one should be embarrassed to say: “Reagan will lose if the generalization holds true that divorced men are unelectable.” (Note that saying “Reagan will lose. No divorced man has ever been elected President,” while qualitatively similar, ought to embarrass the predictor.) Your critique seems to me to properly chastise the *confidence* of economists, but not their methodology. Social science is largely a series of uncontrolled experiments. It’s unsurprising that a lot of bad inferences are drawn. It’s human nature that some of those bad inferences are stated more confidently than the data and the theory warrant.

  2. Ed Hagen says:

    “Part of the problem there is a lack of substantive theories—no, vague pointing to just-so evolutionary stories is not enough.”

    Fun fact. This volume:

    played a seminal role in the rise of evolutionary psychology (see, e.g., ch. 4). The co-editor, Susan Gelman, is (I believe) Andrew’s sister.

  3. Clyde Schechter says:

    What I think is missing from the perspective taken is recognition that description, justification, and clarification feed each other and evolve over fairly long periods of time. Working from rich theories to formulate and test new ideas is actually a late-stage condition that results from many iterations of a description-justification-clarification cycle.

    Take electromagnetism as an example. The phenomena of magnetism and static electricity were known at least as far back as the ancient Greeks. Only in the past couple of centuries did scientists begin to catalog and systematize those observations into a somewhat coherent compendium of known phenomena, phenomena that could be used to speculate on theories, and to develop instrumentation that could be used to test theories experimentally, generate new knowledge of less obvious facts, and improve theories.

    Even then, it took nearly a century to go from the early work of, say, Faraday and Coulomb, to James Clerk Maxwell’s synthesis of electricity, magnetism, and light. And there were many false steps along the way. One could even say that Maxwell’s equations, although they neatly implied all the earlier known phenomena of electricity and magnetism, and predicted electromagnetic radiation, were a somewhat superficial “explanation.” They were still more phenomenological than explanatory. It took several more decades before the work of Einstein, Dirac, and others established that Maxwell’s equations synthesizing electricity and magnetism were the consequences of a much deeper theory of Lorentz invariance. And still more decades to develop modern quantum electrodynamics and quantum field theory. None of these great intellectual achievements would have been possible without the earlier “what happens if I do this?” kind of simple experimentation. And the development of instrumentation to test the limits of the principles of quantum field theory would not have been possible without the phenomenologic implications of earlier, less profound theories.

    There is also some irony in that while modern physics has developed really deep insights into the underlying “why and how” of what we see in the world, the theories are so computationally intensive that, as a practical matter, they can be directly applied only to relatively simple and limited systems. The study of, for example, living organisms, though in principle reducible to the calculations of quantum field theory (and even just quantum electrodynamics), could never be carried out in that way, and would probably miss important insights at larger levels of organization of matter and energy if it could.

    I think that the attempt to systematically study human behavior with a scientific approach didn’t really get underway until after World War II. I think decades or centuries of just identifying and cataloging the phenomena that need to be explained by any good theory will be needed before non-trivial theories can begin to be formulated. Baby steps first.

    • Anoneuoid says:

      I think that the attempt to systematically study human behavior with a scientific approach didn’t really get underway until after World War II. I think decades or centuries of just identifying and cataloging the phenomena that need to be explained by any good theory will be needed before non-trivial theories can begin to be formulated. Baby steps first.

      This has been happening, and actually seems to have been much more common before WWII. I found one of my earlier posts responding to this common trope with some examples:

      I really encourage people to expose themselves to the pre-NHST literature. It is a great guide for how to actually approach a problem scientifically (rather than through the lens of generations of NHST-guided thinking).

  4. Yes, it’s all telling stories. Let me put it in terms of The Prestige.

    The pledge

    Newton formulates laws of gravity and now we can predict the positions of the planets and tides and build better cannons.

    The turn

    Observations violating Newton’s theory are found, showing it isn’t true, it’s only an approximation.

    The prestige

    It’s approximations all the way down, making science no more “truthful” than literature.

    Cue ensuing postmodern ennui.

    Cue Keith O’Rourke to explain the pragmatist escape hatch where the scientist is hiding to help mitigate that ennui.

  5. ojm says:

    Probably a bit cliché to make this argument but – I reckon more ‘bad science’ is probably done in the name of developing a ‘why’ explanation before a ‘what’ is actually solidly established than vice versa.

    I know I’ve definitely wasted time trying to ‘explain’ something that seems cool before taking the proper time to properly establish that it is in fact a real thing. You only realise there is no ‘what’ there when you actually return from story land.

    It seems like many scientific phenomena are in fact weird ‘shut up and calculate’ things that we later develop convenient stories about so that we can more easily remember them. Story agnosticism can be a good thing!
