Real statistics and folk statistics: modeling mental models

I was lucky to see most of the talk that Josh Tenenbaum gave in the psychology department a couple weeks ago. He was talking about some experiments that he, Charles Kemp, and others have been doing to model people’s reasoning about connectedness of concepts. For example, they give people a bunch of questions about animals (is a robin more like a sparrow than a lion is like a tiger, etc.), and then they use this to construct an implicit tree structure of how people view animals. (The actual experiments were interesting and much more sophisticated than simply asking about analogies; I’m just trying to give the basic idea.) Here’s a link to some of this work.

My quick thought was that Tenenbaum, Kemp, et al. were using real statistics to model people’s “folk statistics” (by which I mean the mental structures that people use to model the world). I have a general sense that folk statistical models are more typically treelike or even lexicographical, whereas reality (for social phenomena) is more typically approximately linear and additive. (I’m thinking here of Robyn Dawes’s classic paper on the robust beauty of additive models, and similar work on clinical vs. statistical prediction.) Anyway, the method is interesting. I wondered whether, in the talk, Tenenbaum might have been slightly blurring the distinction between normative and descriptive, in that people might actually think in terms of discrete models, but actual social phenomena might be better modeled by continuous models. So, in that sense, even if people are doing approximate Bayesian inference in their brains, it’s not quite the Bayesian inference I would do, because people are working with a particular set of discrete, even lexicographic, models, which are not what I suspect are good descriptions of most of the phenomena I study (although they might work for problems such as classifying ostriches, robins, platypuses, etc.).

Near the end of his talk, Tenenbaum did give an example where the true underlying structure was Euclidean rather than tree-like (it was a series of questions about the similarity of U.S. cities), and, indeed, there he could better model people’s responses using an underlying two-dimensional model (roughly but not exactly corresponding to the latitude-longitude positions of the cities) than a tree model, which didn’t fit so well.
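To make the comparison in that last example concrete, here is a minimal sketch of fitting the same set of dissimilarities with a two-dimensional spatial model and with a tree model. Everything in it is an assumption for illustration: the "cities" are made-up random points on a plane (not real cities and not data from the talk), the spatial model is classical multidimensional scaling, and the tree model is an ultrametric tree from average-linkage clustering, which is a much more restricted kind of tree than whatever Tenenbaum and Kemp actually used. The function names are hypothetical too.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed a distance matrix D in k dimensions (classical/Torgerson MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def pairwise(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def upgma_ultrametric(D):
    """Average-linkage clustering; returns the implied tree distances."""
    n = D.shape[0]
    clusters = {i: [i] for i in range(n)}
    U = np.zeros_like(D, dtype=float)
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for a_pos, a in enumerate(keys):
            for b in keys[a_pos + 1:]:
                # average distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # every cross-cluster pair gets the merge height as its tree distance
        U[np.ix_(clusters[a], clusters[b])] = d
        U[np.ix_(clusters[b], clusters[a])] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return U

def rms_error(D, D_hat):
    """Root-mean-square misfit over the distinct pairs."""
    iu = np.triu_indices_from(D, k=1)
    return float(np.sqrt(np.mean((D[iu] - D_hat[iu]) ** 2)))

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(8, 2))     # made-up "cities" on a plane
D = pairwise(points)

mds_err = rms_error(D, pairwise(classical_mds(D, k=2)))
tree_err = rms_error(D, upgma_ultrametric(D))
print(f"2-D model RMS error:  {mds_err:.4f}")
print(f"tree model RMS error: {tree_err:.4f}")
```

Because the simulated distances really do come from a plane, the two-dimensional model reproduces them essentially exactly while the tree cannot. Real similarity judgments are noisy, so neither model would fit perfectly, and the interesting question is which one fits less badly.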

I sent Tenenbaum my above comment about real and folk statistics, and he replied:

I’d expect that for either the real world or the mind’s representations of the world, some domains would be better modeled in a more discrete way and others in a more continuous way. In some cases those will match up – I talked about these correspondences towards the end of the talk, not sure if you were still there – while in other cases they might not. It would be interesting to think about both kinds of errors: domains which our best scientific understanding suggests are fundamentally continuous while the naive mind treats them as more discrete, and domains which our best scientific understanding suggests are discrete while the naive mind treats them as more continuous. I expect both situations exist.

Also, the “naive mind” is quite an idealization here. The kind of mental representation that someone adopts, and in particular whether it’s more continuous or discrete, is likely to vary with expertise, culture, and other experiential factors.

My reply:

I think the discrete/continuous distinction is a big one in statistics and not always recognized. Sometimes when people argue about Bayes/frequentist or parametric/nonparametric or whatever, I think the real issue is discrete/continuous. And I wouldn’t be surprised if this is true in psychology (for example, in my sister’s work on how children think about essentialism).

Tenenbaum replied to this with:

While the focus for most of my talk emphasized tree-structured representations, towards the end I talked about a broader perspective, looking at how people might use different forms of representations to make inferences about different domains. Even the trees have a continuous flavor to them, like phylogenetic trees in biology: edge length in the graph matters for how we define the prior over distributions of properties on objects.

I’ll buy that.

On a less serious note . . .

This reminds me of all sorts of things from children’s books, such as pictures of animals that include “chicken” and “bird” as separate and parallel categories, or stories in which talking cats and dogs go fishing and catch and eat real fish! (The most bizarre of all these, to me, are the Richard Scarry stories in which the sentient characters include a cat, a dog, and a worm, and they go fishing. My naive view of the “great chain of being” would put fish above worms, but I guess Scarry had a different view.)


  1. derek says:

    It reminds me of Mickey Mouse who has a colleague, Goofy, who is a dog, and a pet dog, Pluto.

  2. Keith O'Rourke says:

    Reminds me of work I was once involved in, trying to use panels of experts to extract representations:

    Naglie G, et al. Convening Expert Panels to Identify Mental Capacity Assessment Items. Canadian Journal on Aging 14 (4):697-705, 1995.

    An interesting one was an index to predict a court finding of child neglect – we extracted a highly non-additive one based on 5 dimensions (which almost caused someone's PhD committee to fail them because they could not get a better R^2 using an optimal additive linear model with the same 5 dimensions).

    But also, humans can often do better with sub-optimal representations than optimal ones – an old reference being

    Schroder, H.M.; Driver, J.J.; and Streufert, S. (1967) Human Information Processing. New York: Holt, Rinehart, and Winston.


  3. David says:

    Wow, timely. I've been ruminating on cognitive models of learning for the last couple days. Understanding the variability in how people organize things mentally is very interesting.

  4. Michael says:

    He was named Lowly Worm, though.