How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.

Posted on April 7, 2014 10:30 AM by Andrew

In “Story: A Definition,” visual analysis researcher Robert Kosara writes:

A story

ties facts together. There is a reason why this particular collection of facts is in this story, and the story gives you that reason.

provides a narrative path through those facts. In other words, it guides the viewer/reader through the world, rather than just throwing them in there.

presents a particular interpretation of those facts. A story is always a particular path through a world, so it favors one way of seeing things over all others.

The relevance of these ideas to statistical graphics is apparent.

From a completely different direction, in “When do stories work? Evidence and illustration in the social sciences,” Thomas Basbøll and I write:

Storytelling has long been recognized as central to human cognition and communication. Here we explore a more active role of stories in social science research, not merely to illustrate concepts but also to develop new ideas and evaluate hypotheses, for example in deciding that a research method is effective. We see stories as central to engagement with the development and evaluation of theories, and we argue that for a story to be useful in this way, it should be anomalous (representing aspects of life that are not well explained by existing models) and immutable (with details that are well-enough established that they have the potential to indicate problems with a new model).

We draw a connection to posterior predictive checking, which I had earlier argued is fundamentally connected with statistical graphics and exploratory data analysis (see this paper from 2003 and this one from 2004).
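
For readers unfamiliar with the term, the following is a minimal sketch of a posterior predictive check, not the procedure from either paper. It assumes a deliberately simple model (normal data with known standard deviation and a flat prior on the mean), and the function name, the test statistic (largest absolute value), and the data are all hypothetical choices made for illustration. The idea is to simulate replicated datasets from the fitted model and ask how often they look as extreme as the observed data.

```python
import random
import statistics


def posterior_predictive_pvalue(y, sigma=1.0, n_sims=2000, seed=0):
    """Posterior predictive p-value for T(y) = max |y_i|.

    Assumed model (illustrative only): y_i ~ Normal(mu, sigma) with sigma
    known and a flat prior on mu, so mu | y ~ Normal(mean(y), sigma / sqrt(n)).
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    t_obs = max(abs(v) for v in y)  # observed test statistic

    exceed = 0
    for _ in range(n_sims):
        # Draw a plausible mean from the posterior, then a replicated dataset.
        mu = rng.gauss(ybar, sigma / n ** 0.5)
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]
        if max(abs(v) for v in y_rep) >= t_obs:
            exceed += 1
    # Fraction of replications at least as extreme as the observed data.
    return exceed / n_sims


# Hypothetical data: seven unremarkable values and one anomalous one.
anomalous = [-0.3, 0.1, 0.4, -0.8, 0.6, 0.2, -0.1, 6.0]
typical = [-0.3, 0.1, 0.4, -0.8, 0.6, 0.2, -0.1, 1.1]

print(posterior_predictive_pvalue(anomalous))
print(posterior_predictive_pvalue(typical))
```

A p-value near zero for the first dataset says the model almost never reproduces an observation that extreme, which is the sense in which a single anomalous case can expose a problem with a model rather than serve as direct evidence about what is typical.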

I don’t have anything more to say on this right now. I just wanted to juxtapose these two perspectives, each of which connects statistical graphics to literature, but in a different way. Kosara focuses on the idea that stories have narrative and viewpoint, and Basbøll and I focus on the idea that effective stories are anomalous and immutable. All these ideas seem important to me, and it would be interesting to think about how they fit together.

9 thoughts on “How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.”

jrc on April 7, 2014 at 5:04 pm said:

Statistical Writing as Literature and Statistical Reasoning as Literary Criticism

I think comparing statistical reasoning with literature is the wrong metaphor. Yes, there is something similar between writing literature and writing up quantitative analyses – that is, they are both writing. Writing generally intends to convey a message or idea of some sort. We organize our thinking into ideas, put these ideas into some logical or rhetorically powerful order, and then translate these ideas into words. The remaining thing in the world – the report or the novel – is a constructed object of words intended to convey an idea. But that does not mean that “literature” is like statistical reasoning; it means that literature and academic reports are both kinds of writing.

I think what you are getting at is that the rhetorical aspect of academic writing is just as important as the rhetorical aspect of literature – a great story badly written isn’t worth reading, just as an interesting idea poorly tested empirically isn’t worth using to update our priors about the world.

But what is more interesting, to me, is how statistical reasoning is like literary criticism. In both of these cases, there is some objective thing there to be analyzed – a dataset or a piece of literature. We take a series of well-developed but imperfect theoretical tools (hermeneutics, statistics) to this thing in the world: close reading, deconstruction, historical/biographical techniques (on the one hand) or least-squares, cluster analysis, hierarchical modelling (on the other). And then we organize the results of those analyses into logical/rhetorical arguments regarding the nature of that thing.

And what gives either of these interpretive modes epistemological weight (causes us to update our priors about the thing under consideration) is how well the point is argued (logically and rhetorically). I think this is one of the key points that many researchers with only basic statistical training miss, and a different perspective on the perils of ignorant NHST. Stars in your regression tables are meaningless. What is convincing (or not) is the argument you give regarding those stars: was the experiment properly conceived? were protocols followed? is your variation exogenous? does it test the thing in the world you claim it tests?

I think the interesting thing about the similarities between literature and statistics is that both are primarily rhetorical enterprises engaged in the production of truth. Literature and statistical reports are both writing, but statistical reasoning is not like literature; it is like literary criticism. To me, though, the most important point is in thinking about statistical arguments as rhetorical arguments.
Steve Sailer on April 7, 2014 at 5:28 pm said:

In my experience, the concept of the exception that proves [supports] the rule [tendency] is very useful in thinking about which stories make good examples for a statistical hypothesis, but most people are resistant to agreeing with that in the abstract.
- Andrew on April 7, 2014 at 5:38 pm said:
  
  Steve:
  
  As Thomas and I discuss in our paper, it is somewhat paradoxical that good stories tend to be anomalous, given that when it comes to statistical data, we generally want what is typical, not what is surprising. Our resolution of this paradox is that stories should not generally be viewed as direct evidence for learning about the world, but rather they should be considered as tools for probing our understanding. Hence the importance (and attraction) of stories that are anomalous, which make us say, in the famous words attributed to Isaac Asimov, “not ‘Eureka’ but ‘hmm . . . that’s funny . . .’”
  
  Probably worth its own blog post…
  - Phillip M. on April 8, 2014 at 3:08 am said:
    
    +1e25
    
    I like Steve’s use of ‘tendency’ above. The point in any scientific endeavor is not to prove anything, but rather to take an observation, make a hypo, and ball that little edumacated suspicion up into a little paper wad. Take the wad, throw it into a body of water and then begin taking rifle shots (tests) at it (and invite other researchers to do the same). If it continues to float despite your (and others’) attempts to sink it, then there is some likelihood in truth to the hypo, until otherwise excepted.
    
    So I don’t think the exception proves the rule, I think the contrapositive merely helps discount the exceptions. Call it employing the ‘evil eye’ in scientific efforts.
    
    But conducting and communicating research in this way, which I feel is appropriate and important, does have the tendency to defy the ‘impact’ factor. Even scientific communities prefer binary outcomes or interpretations to a continuous distribution of probable ones – they’re just more ‘exciting’.
Peter on April 7, 2014 at 8:12 pm said:

Both literature and statistics involve induction, that is, reasoning from the specific to the general. In literature, a general idea or principle is often presented in the context of particular events happening to specific characters. In statistics, we try to infer a general principle from the specific data on hand.

(maybe this isn’t quite the simile being discussed in the post…)
- Daniel Gotthardt on April 7, 2014 at 8:47 pm said:
  
  Peter, I think Andrew and Thomas Basbøll make a good point that stories can be used for more than inductive reasoning. They stress that anecdotes cannot be used for generalization; instead, good (anomalous and immutable) stories can be used for model checking, especially critical evaluation. One could consider this an informal version of a falsification strategy: good stories could be seen as analogous to “good” or “hard” tests in the Popperian sense. Hard tests and good stories allow us to learn about holes in our models and encourage us to improve them. In a way, one could say that they are necessary for learning in general, but especially for advancing scientific knowledge.
Robert L Bell on April 11, 2014 at 8:42 pm said:

As I try to teach my students, so many unlikely events are possible that some unlikely event is bound to occur – thus coincidence.

And by coincidence, just this afternoon I was holding forth on how some of us tried – back in the eighties – to introduce narrative into the chemical literature as a tool for organizing our presentations. In the end it was not a successful trial, for the temptation was strong to bend the facts to fit the narrative instead of cutting the narrative to fit the facts.

But the idea does have its strengths and I wish you all success.