Bayesian brains?

Psychology researcher Alison Gopnik discusses the idea that some of the systematic problems with human reasoning can be explained by systematic flaws in the statistical models we implicitly use.

I really like this idea and I’ll return to it in a bit. But first I need to discuss a minor (but, I think, ultimately crucial) disagreement I have with how Gopnik describes Bayesian inference. She writes:

The Bayesian idea is simple, but it turns out to be very powerful. It’s so powerful, in fact, that computer scientists are using it to design intelligent learning machines, and more and more psychologists think that it might explain human intelligence. Bayesian inference is a way to use statistical data to evaluate hypotheses and make predictions. These might be scientific hypotheses and predictions or everyday ones.

So far, so good. Next comes the problem (as I see it). Gopnik writes:

Here’s a simple bit of Bayesian election thinking. In early September, the polls suddenly improved for Obama. It could be because the convention inspired and rejuvenated Democrats. Or it could be because Romney’s overly rapid response to the Benghazi attack turned out to be a political gaffe. Or it could be because liberal pollsters deliberately manipulated the results. How could you rationally decide among those hypotheses? . . .

Combining your prior beliefs about the hypotheses and the likelihood of the data can help you . . . In this case, the inspiring convention idea is both likely to begin with and likely to have led to the change in the polls, so it wins out over the other two.
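As a rough sketch of the discrete calculation Gopnik describes, here is Bayes' rule applied to her three hypotheses. The prior and likelihood numbers are invented purely to show the arithmetic; they come from neither her article nor any poll. The hypothesis labels are likewise just illustrative names.

```python
def posterior(priors, likelihoods):
    """Discrete Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: u / total for h, u in unnormalized.items()}

# Hypothetical prior probabilities for each explanation (assumed, not estimated).
priors = {"convention": 0.6, "benghazi_gaffe": 0.3, "poll_manipulation": 0.1}
# Hypothetical probability of seeing the poll bump under each explanation.
likelihoods = {"convention": 0.5, "benghazi_gaffe": 0.2, "poll_manipulation": 0.1}

post = posterior(priors, likelihoods)
for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.3f}")
```

With these made-up inputs the convention hypothesis comes out on top, as in Gopnik's summary. Note that the calculation only works if the three hypotheses are treated as mutually exclusive and exhaustive, so that the posterior probabilities sum to one.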

I have no problem with the general message here (which is why, in this post’s original title, I labeled it as only “slightly” garbled) but I’d like to make one correction on the details. (I’m a statistician; I care about details.) I’m (slightly) unhappy with Gopnik’s framing of the problem as A, or B, or C. It’s really A, B, and C. The appropriate claim, I believe, is not that A is more likely than B or C, but rather that the continuous effect of A is probably larger than that of B or C.

As noted, this point is minor–I have no problem with Gopnik’s summary that one of the hypotheses “wins out over the other two.” But I think we are led to confusion if we place ourselves in an either/or setting. (This is probably a good place for me to plug my article with Kari Lock from a couple years ago on Bayesian combination of state polls and election forecasts, where we use continuous weighting.)

Blame the discrete models, not the priors

One way this seemingly minor point can matter is when we follow Gopnik’s suggestion that Bayesian inference “might explain human intelligence.” I agree that we naturally think discretely. But discrete thinking does not describe how much of the biological and social world works. Out there are lots and lots of effects of varying sizes. If we, as humans, take these continuous phenomena and try to model them discretely, we will trip up, in predictable ways–even if we use (discrete) Bayesian methods.

To put it another way: what if Josh Tenenbaum and his colleagues (not mentioned in Gopnik’s article but you can search for them here on the blog) are right that our brains use some sort of approximate discrete Bayesian reasoning to make decisions and perform inferences about the world? Then this should imply some predictable errors.

Gopnik asks, “If kids are so smart, why are adults so stupid?” She’s referring to this experiment done in her lab: “We gave 4-year-olds and adults evidence about a toy that worked in an unusual way. The correct hypothesis about the toy had a low ‘prior’ but was strongly supported by the data. The 4-year-olds were actually more likely to figure out the toy than the adults were.”

In that example, Gopnik might well be correct: it seems reasonable to suspect that a kid will have a better prior than an adult on how a toy works.

More generally, though, I think we should avoid the temptation to think that, when a Bayesian inference goes wrong, it has to be a problem with the prior. That’s old-fashioned thinking, the idea that the likelihood is God-given and known perfectly, leaving us all to fight over our priors. In many cases, the model matters (for example, in our discussion above about natural-seeming but flawed discrete models). Even if the data model generally makes sense, its details can matter: as I point out to my students, the prior only counts once in the posterior, but the likelihood comes in over and over again, once for each data point.
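One way to see the "once versus over and over" point is a textbook normal model with a known data standard deviation and a normal prior on the mean. This model is an assumption chosen for illustration, not one taken from Gopnik's experiment, and the function name and default values below are hypothetical. In it, the prior contributes a single precision term while the data contribute one term per observation, so the prior's share of the posterior mean shrinks as data accumulate.

```python
def prior_weight(n, prior_sd=1.0, data_sd=1.0):
    """Weight on the prior mean in the posterior mean, for a normal model
    with known data_sd and a normal prior on the unknown mean."""
    prior_precision = 1.0 / prior_sd**2   # the prior enters once
    data_precision = n / data_sd**2       # the likelihood enters once per data point
    return prior_precision / (prior_precision + data_precision)

for n in [0, 1, 10, 100]:
    print(n, round(prior_weight(n), 3))
```

The same logic is why a small misspecification in the data model can matter more than the prior: whatever is wrong in the likelihood is repeated for every observation.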

If, as I think is the case, our brains like discrete models (perhaps they can be more quickly coded and computed) but the world is continuous and varying, this suggests interesting systematic ways that our brains might be misunderstanding the world in everyday reasoning. (Conversely, if discrete models really do have major computational advantages, maybe statisticians like myself should be giving them a second look.)

P.S. This post had been titled, “I notice a (slightly) garbled version of Bayesian inference, which provokes some thoughts on the applicability of Bayesian models of human reasoning.” But I think the new title is better!

13 thoughts on “Bayesian brains?”

  1. As is so often the case, ordinary deductive logic explains the reasoning. The Bayesian reconstruction is a byproduct of its being built on deductive logic about truth-functional connectives. For some discussion on non-Bayesian uses of background to determine what is (and is not) well-corroborated, perhaps see errorstatistics.com.

  2. > kid will have a better prior than an adult
    My speculation would have been a weaker prior that did not ruin the likelihood
    (and this being unusual given as you say it’s usually the other way around)

    My other speculation, picked up from Karl Pribram, would be it has to be _muscle like_
    and to me muscles act very smoothly but discretely (finite set of forces).

    So if by continuous you really mean something that cannot be represented by a rational number, I think you are on the wrong track for anything that is not just an abstraction.

    • I agree. Another way to put it is that when Gopnik writes “the correct hypothesis about the toy had a low prior”, she skips over the question of *whose* prior assigns a low probability to the correct hypothesis. It may be the case that adults have typically accumulated evidence that results in a prior that puts a lot of mass elsewhere in the space of possible hypotheses about the toy, whereas children simply haven’t.

      • Actually, Gopnik doesn’t skip over the question of whose prior — it’s just not in the part AG quoted here. Just prior (haha) to the phrase “the correct hypothesis about the toy had a low prior” she writes, “As we get older our “priors,” rationally enough, get stronger and stronger. We rely more on what we already know, or think we know, and less on new data. In some studies we’re doing in my lab now, my colleagues and I found that the very fact that children know less makes them able to learn more.”

  3. Funny, I find bayesian reasoning most likely to be used on the level of continuous representations and autonomous or autonomic responses. They have had most time to evolve towards the theoretical optimum, which under uncertainty must be some kind of bayesianism — unless the internal representation of the world is totally lacking so that there is only a reflexive loop.

    On the level of concepts we rely more on deduction and uncertainty is poorly represented.

    The power of continuous representations is the ability to generalize well. Discretization destroys the topology: things that are nearby in the world are not nearby in the representation.

  4. interesting post!

    perhaps categorical perception fits what you are thinking about as the “interesting systematic ways that our brains might be misunderstanding the world in everyday reasoning.” categorical perception looks at how learning discrete categories over a continuous dimension shapes our perception of that dimension. the perceptual magnet effect also would fit this label. feldman, griffiths, & morgan (2009) have an interesting paper in psychological review exploring this possibility.

    at any rate, i haven’t discussed this personally with josh or tom, but my own intuition is that likelihoods as God-given is a useful simplification to get started in understanding how people learn in a domain. that’s how i use them at least. there has been a little work looking into what sorts of likelihoods people use and how that varies with domain/context by navarro & lee (2012), but it still is quite limited.

    i agree with janne that discretization destroys the ability of continuous representations to generalize things close in some dimensions. however, i wouldn’t frame it as a bad thing, but a good thing to make it possible to generalize well between things that on the surface look very different (e.g., dolphins and cats).
    =joe

  5. Janne: I agree that any generalization (abstraction/representation) is necessarily continuous (between any two there is a third in between) but anything that currently _exists_ in this particular universe is discrete (at some point between two things there no longer is a third thing).

    And brains exist.

  6. A basic principle of discrete models is that the different categories should be mutually exclusive. Pointing out the problems with violating this rule should not be taken as a criticism of discrete models in general.

  7. This whole “irrational” trope fries our ice cream. It is just dumb. By definition, all animals have to, had to be, very good at probabilities and managing risk and uncertainty – which are very different.

    All life forms MUST operate logically and probabilistically to survive. duh

    This has been a very clever and deeply dishonest lie promoted by economists who found that their models didn’t work – so they demonize humans as “irrational.” The whole behavioral econ scam is based on this. it has proved quite popular and successful as many lies are – why lying is an adaptive strategy.

    In addition, we just found one little paper by some really obscure guys in Ireland that shows human decision making is really logical and probabilistic – except when there is a lot of noise. It’s just one study but….

    Our view is that natural language and the ideologies that it creates are the problem here. It’s just silly semantics. Apparently, if we look at how neurons work they are pretty continuous and designed to track the real world — not human words and ideology.

    Ideology is basically a power over others and sales tactic so it will create discrete, simple categories ‘cus those tax the brain the least. Dichotomies and labels are always popular. They don’t exist, we make them up.

    The brain is designed to get stuff in space and maintain homeostasis in the process by exchanging energy. Any brain that can’t git wid the stats of this — every moment of every day – dies.

    Study neurons and how they calculate stimuli. Start with vision.
