When does mathematical tractability predict human behavior?

This is Jessica. I have been reading theoretical papers recently that deal with human behavior in aggregate (e.g., game theory, distribution shift in ML), and have been puzzling a little over statements made about the link between solution complexity and human behavior. For example: 

In a book on algorithmic game theory: “Can a Nash equilibrium be computed efficiently, either by an algorithm or the players themselves? In zero-sum games like Rock-Paper-Scissors, where the payoff pair in each entry sums to zero, this can be done via linear programming, or if a small amount of error can be tolerated, via simple iterated learning algorithms […] These algorithms give credence to the Nash equilibrium concept as a good prediction of behavior in zero-sum games.” 

Similarly, from a classic paper on the complexity of computing a Nash equilibrium:

“Universality is a desirable attribute for an equilibrium concept. Of course, such a concept must also be natural and credible as a prediction of behavior by a group of agents— for example, pure Nash seems preferable to mixed Nash, in games that do have a pure Nash equilibrium. But there is a third important desideratum on equilibrium concepts, of a computational nature: An equilibrium concept should be efficiently computable if it is to be taken seriously as a prediction of what a group of agents will do. Because, if computing a particular kind of equilibrium is an intractable problem, of the kind that take lifetimes of the universe to solve on the world’s fastest computers, it is ludicrous to expect that it can be arrived at in real life.”

Finally, in a paper on strategic classification (where Contestant refers to a person best responding to a classifier by manipulating their features in order to get a better label):

“We develop our classifier using c_assumed, but then for tests allow Contestant to best-respond to the classifier given the cost function c_true. We note that finding the best response to a linear classifier given the cost function c_true is a simple calculus problem.”

When I read things like this, I wonder: What is the landscape of assumptions behind statements like this, which might seem to suggest that a solution concept is a more valid predictor of human behavior if an algorithm can efficiently solve for an exact solution? As someone who often thinks about how human cognition compares to different kinds of statistical processing, I find this idea intriguing but also kind of murky.

For instance, the first example above reads to me like an assertion that ‘if it’s computable, then it’s likely people will achieve it’ whereas the second reads more like, ‘if it’s computable, then it’s within the realm of possibility that people can achieve it.’ But is this difference an artifact of different writing styles or a real difference in assumptions? 

I can see making the argument, as the second quote above implies, that if no algorithm other than brute-force search through all possible solutions can solve it, then people may never reach the equilibrium because human time is finite. But often solution concepts in theoretical work are exact. Only one of the above examples mentions approximate correctness, which is closer to concepts from the foundations of learning theory, like probably approximately correct (PAC) learning, where it’s not about learning an exact solution but about distinguishing when we can learn one within some error bound, with some probability. This kind of approximate guarantee seems more reasonable as an expectation of what people can do. I’m thinking about all the work on heuristics and biases implying that for many judgments people rely on simple approximations and sometimes substitutions of hard-to-estimate quantities, like deciding whether or not to purchase insurance based on how bad you feel when you imagine your home on fire.
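
To make the “simple iterated learning algorithms” from the first quote a bit more concrete, here is a minimal sketch (a toy example of my own, not taken from any of the quoted papers) of two players running multiplicative-weights updates against each other in Rock-Paper-Scissors. In zero-sum games, the time-averaged play of such no-regret learners approaches an approximate Nash equilibrium; the step size and number of rounds below are arbitrary choices.

    # Two multiplicative-weights ("Hedge") learners playing Rock-Paper-Scissors.
    # In zero-sum games their time-averaged strategies approach an approximate
    # Nash equilibrium (here, uniform play). Step size and rounds are arbitrary.
    import numpy as np

    A = np.array([[ 0, -1,  1],    # row player's payoffs for Rock, Paper, Scissors
                  [ 1,  0, -1],
                  [-1,  1,  0]], dtype=float)

    eta, T = 0.1, 5000
    w_row, w_col = np.ones(3), np.ones(3)        # unnormalized action weights
    avg_row, avg_col = np.zeros(3), np.zeros(3)

    for t in range(T):
        p, q = w_row / w_row.sum(), w_col / w_col.sum()   # current mixed strategies
        avg_row, avg_col = avg_row + p, avg_col + q
        w_row *= np.exp(eta * (A @ q))           # upweight actions that did well
        w_col *= np.exp(-eta * (A.T @ p))        # column player loses what row wins

    print(avg_row / T, avg_col / T)              # both close to [1/3, 1/3, 1/3]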

So I asked a colleague who is a computer scientist and game theorist, Jason Hartline, what to make of statements like this. Here’s his explanation:

The work on computation and equilibrium really starts with the negative result. If a computer can’t compute it, neither can people. This negative result is a critique of one of Nash’s main selling points: existence. And again, this is a negative perspective: a model of behavior cannot be correct if it has nothing to say about some settings. The pure game theorists then say that Nash exists everywhere so (to have a double negative) it’s not obviously wrong. The work on computation says that well, maybe it is wrong, since if computers can’t find Nash, then people can’t find Nash. So there are settings where Nash cannot describe people.

Ok, so what do these “algorithmic game theorists” think about games where Nash is computable, like zero-sum games? Well, this is a negative negative result. We can’t prove that Nash is wrong here using computation. But it is still not a positive result for people playing Nash. I’m not sure what there is for theory people in thinking about whether this negative negative result should be a positive result.

Whether people do or do not play Nash seems like more of an empirical question than a theoretical question.  Here the experiments point to stuff like quantal response equilibrium, or cognitive hierarchy, or quantal cognitive hierarchy.  Sometimes these models are good for theory, and in these cases, theorists are happy to use them instead.

It seems from this that the example statements above are essentially saying that if we can’t efficiently compute it, we can’t expect people to reach it. What’s still a bit ambiguous to me is the underlying assumption about the comparability of human information processing and computation. When can we make logical statements that imply some form of equivalence in computability between the two, and where should we expect it to break down? Maybe people who work on human versus computational learning have thoughts on this, or even neuroscience or mathematical psych. Apparently there is also the Church-Turing thesis (see the section on philosophy of mind).

47 thoughts on “When does mathematical tractability predict human behavior?”

    • This is an incredibly bold claim so it’d be nice to see evidence for it. Why on earth would evolution have made humans capable of solving general equilibrium problems, instead of the (probably) <10 specific ones that occur 99% of the time?

      And how would evolution have done this if (as is likely) there is no feasible-size logic circuit to solve such equilibrium problems?

      • It doesn’t seem like dl is saying that people can solve “general equilibrium problems” as opposed to a relevant subset.

        With respect to general problems, if you blunder into a strategy for achieving equilibrium, you could recognize its benefits without recognizing that it’s actually an equilibrium and repeat it in the future. Evolution doesn’t need to be involved, even for wide adoption of the solution, given humans’ capacity for imitation at the level of individuals and cultures.

        I mean, society as a computer is in Hitchhiker’s Guide, so it must be true, right?

        • “Blundering” into the equilibrium of a hard game is about as likely as “blundering” into a 100-letter password.

      • well but if you are talking games that don’t occur in real life, then you are in the realm of pure math and it’s irrelevant whether your average human can solve it…so the critique of nash doesn’t work in that context… …and in any case egt/behavioral game theory shows that agents can learn unfamiliar games too…

        • The theoretical results show that in fact a wide array of games are too complicated for humans to find the equilibrium of. In fact many natural ones are PPAD-complete, roughly meaning that if you can solve one kind of game efficiently, you could solve them all. (Kind of like how almost any combinatorial problem above a low complexity threshold ends up being NP-complete.)

          The onus is rather on the evolutionary psychologist to show that the games they claim are learnable, are indeed learnable. The default assumption, given our knowledge, is that most games are hard. (Too hard for humans.)

        • I don’t really know this stuff, how do you define learnable? Put me down in front of a large traveling salesman problem, and while I won’t find the solution, I also won’t wander aimlessly. Does that count?

      • See the comment by Sandro below:

        https://statmodeling.stat.columbia.edu/2021/07/06/when-does-mathematical-tractability-predict-human-behavior/#comment-1906767

        Cognition not needed. Think of it as a dynamical system where selection pressure drives out unsuccessful strategies. Genetic transmission of strategies means higher payoffs lead to more offspring playing the successful strategies. Classic reference is John Maynard Smith’s book “Evolution and the Theory of Games”.
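
        To see how this can happen with no cognition at all, here is a minimal sketch (a toy example, not from Maynard Smith’s book) of discrete replicator dynamics on the Hawk-Dove game: strategies that earn above-average payoff simply become more common. The payoff parameters V=2 and C=4 are standard textbook choices, not from the thread.

            # Discrete replicator dynamics on Hawk-Dove: no agent computes anything;
            # higher-payoff strategies just leave more "offspring".
            import numpy as np

            V, C = 2.0, 4.0
            payoff = np.array([[(V - C) / 2, V],       # Hawk vs {Hawk, Dove}
                               [0.0,         V / 2]])  # Dove vs {Hawk, Dove}

            x = np.array([0.9, 0.1])                   # start with mostly Hawks
            for _ in range(200):
                fitness = payoff @ x                   # expected payoff of each strategy
                fitness = fitness - fitness.min() + 1.0   # shift so weights stay positive
                x = x * fitness                        # successful strategies grow
                x = x / x.sum()

            print(x)   # approaches the mixed ESS: a Hawk fraction of V/C = 0.5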

  1. Roger Penrose doesn’t believe the Church-Turing thesis: he famously thinks that human brains are able to solve not just NP-complete but uncomputable problems — recognising mathematical truth. He thinks quantum gravity is involved. As far as I can tell, this hypothesis does get some respect, but mostly because he’s Roger Penrose.

    • I don’t know anybody who works in actual theory of mind or computation who takes Penrose’s claims seriously. They get propagated by dilettantes who think about theory of mind for 10 minutes every 10 years.

      • I’m not saying that anyone thinks he’s right, but quite a few people seem to think his argument is interesting or important enough to be introduced and argued against — that’s the only way I’ve encountered the argument. For example, Scott Aaronson and Daniel Dennett have both written responses to it.

    • Penrose’s claim is based on a mathematical error. For some reason, he thinks that logicians can determine that something is true without being able to prove it. Many logicians have tried to explain his mathematical error, but he persists. If you’ve taken a course in mathematical logic, you can spot the error early in his first book on the topic.

  2. I think the definition of “computable” has gotten fuzzy here. The second quote implies that something is computable if it is in P, i.e., solvable in polynomial time. But determining whether a problem is in P or not is itself very difficult and remains an open question in many important, real-world applications. It’s not clear to me from the quote whether the author is dismissing the plausibility of all theories that haven’t been proven to be in P.

    If that’s the case, I’m still skeptical: if the solution to (or algorithm for solving) a problem is evolutionarily hardwired into a species, it reduces the required computing resources. For example, neurotypical human beings have a part of our brain that’s able to pose, model, and solve complex social problems like, “If Alice knows that Bob has a crush on her, but Bob only suspects that Alice knows, and Alice believes Bob doesn’t suspect, what are the conditions under which I should tell Alice what Bob suspects about what Alice knows about what Bob feels?” So the human brain doesn’t have to go through all the complex calculations that a computer would to get an answer.

    As for where the hardwired solution came from in the first place, remember that natural selection is a brute-force process for trying random variations on existing solutions, one that takes place continuously and in parallel for eons. By “in parallel” I mean every solution is simultaneously and blindly tested for every current problem of adaptation, and many failed solutions are banked in our DNA for later generations to try on new problems. Even natural selection can’t solve an NP-hard problem with brute force, but it doesn’t have to: it’s far easier to find one of the infinitely many hard problems that you can solve by brute force than it is to find the brute-force solution to a specific hard problem.

    All that said, the author’s definition is for feasibility, not computability. Some problems, like computing the unconditional complexity of a sequence (without narrowing the permissible solutions to a specific algorithm), cannot be solved, period. Those probably don’t describe human behavior.


      “and solve complex social problems like, “If Alice knows that Bob has a crush on her, but Bob…”

      This seems like a problem to which every outcome is the solution. I’m not sure I get the larger picture here but in this case it seems like it would be easy to think a human “solved” a problem when all they did was make random decisions that lead to some random outcome. How would you know what outcome they were solving for to know that they came up with some kind of optimized solution?

  3. The idea is essentially this.

    To the best of our current knowledge, a computer can simulate a human brain without being much bigger than the brain itself, and without the simulation being much slower than the brain itself.

    The results you mention in this post say: in order to find a Nash equilibrium of even a moderate-size game, a computer needs to either be insanely big, or take an insanely long time. (Think size of the solar system, age of the solar system.)

    Therefore, a human brain (or several) can’t find that equilibrium either. Because if it could, a reasonable-size computer could simulate it and find the equilibrium as well.

    • What you say is 100% true. If the question we’re asking is, can a brain solve a particular hard problem posed to it? Then, of course not. But a more relevant question is, can a brain perform a behavior that is the solution to some hard problem? Of course it can.

      It’s like, can I mentally compute the prime factors of a very large composite number? Nope. Can I plug a particular prime into a bunch of very large composites and find one of which it’s a factor? Yep.

      So a theory about human behavior that supposes human brains are solving very hard problems is false, but a theory about human behavior that supposes certain behaviors can only be predicted by solving very hard problems may be true.
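
      A toy sketch of that asymmetry (with tiny numbers standing in for the very large ones): checking that a given prime divides a number is one modulo operation, while recovering the factors by search is what blows up at scale.

          # Verification is cheap; search is what gets expensive at scale.
          def divides(p, n):
              """Easy direction: check whether p is a factor of n."""
              return n % p == 0

          def naive_factor(n):
              """Hard direction (for large n): brute-force search for a factor."""
              d = 2
              while d * d <= n:
                  if n % d == 0:
                      return d
                  d += 1
              return n  # n is prime

          composites = [91, 187, 221, 323]
          print([n for n in composites if divides(13, n)])   # instant: [91, 221]
          print(naive_factor(323))                           # 17 here; infeasible at RSA scale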

        • “As in, how baseball outfielders instantly plot curves of the ball.”

          Dunno about that one, seems like the trajectory of a baseball is a relatively simple problem. Even simpler if the computer has ten years experience following the trajectory of thousands of fly balls.

        • I should probably drop it (heh), but I only meant that at the crack of the bat it’s a hard multivariate prediction that the human action of the outfielder solves, as in Michael’s paragraph 1; and if you were to try to program a computer to send the outfielder to the right place at the end of the curve in real time, that would be very, very hard. Human action appears to be doing calculus under a strict time constraint. Or maybe not. Finis.

        • That robot catches a ball tossed—gently—directly in its direction from about 10 feet away, 80% of the time. Major leagues here we come. Move it 200 feet away and make it turn and run, searching the sky, and what is the accuracy? Well worth your 5 seconds of googling! Maybe worth 10! Come on, be nice, I only suggested an analogy that some problems are hard to plot but routinely solved. Other people here are suggesting that I and others are underestimating the Nash computability problems, and that’s fine and interesting.

      • What you’re saying sounds to me basically like the following.

        “A human can’t guess a 100-letter password. But a human could type in the 100-letter password, if it were given to them.”

        Well sure. But how is the human getting a hold of the password in the first place?

        The theoretical results mentioned in this post imply that the 100-letter password (the Nash equilibrium) can’t be found in any reasonable timeframe by any computational device of reasonable size. So a theory that claims humans are going around executing Nash equilibria without “solving for them” is kind of like a theory that claims humans are going around typing in correct 100-letter passwords on the first try.

    • Good question. Any iguana experts out there?

      But also: How do I “compute” where and when to put down a foot when I’m walking on irregular ground? I certainly don’t do it intentionally or consciously. But maybe my brain responds well to “negative reinforcement” — if some “computation” results in a fall, sprained ankle, or other injury, then my brain presumably assigns a low probability to future use of the movement that resulted in the misstep?

      • I am going to make a bold conjecture. Is it possible that our understanding of how things come to be (how this transforms to that and so on) is subject to fundamental overarching limitations set by the organization of the substrate of our minds, so that there are modes of deciding that are forever outside the scope of our conceptual apparatus? Conversely, we tend therefore inevitably, inexorably and without exception to seek solutions to decision and prediction problems within the scope of algorithms, be they sequential, parallel, or stochastic; nevertheless the scope of our imagination is limited to what the mental substrate will support. Perhaps there are “algorithmics” that stand in relation to our conceptual apparatus in the same way that higher transfinite recursions stand in relation to the ordinary induction on the integers which we “see”. (The “Goodstein” sequence converges to zero, though in a number of steps whose terms cannot be enumerated in any sense that is concrete to us.) In short, is it possible that there are problem-solving modes the outline of which we cannot even conceive, within the scope of what the substrate of our thinking will support?

  4. This touches on a lot of areas, so I wanted to make a few somewhat related remarks:

    1) Although any computable function can be performed by a serial tape machine with an alphabet of discrete symbols (this is what Turing showed), that doesn’t mean that all computers need be serial, discrete, or symbolic. For example, old naval fire control computers and other analog computers can do things instantaneously that, in a digital computer, would require onerous numerical integration. Biological systems are built very differently from human-engineered systems, so it is not clear whether something that is difficult or complex for one type of system would necessarily be difficult or complex for another.

    2) What is optimal depends on the constraints of the problem, and biological systems are under many constraints that may not be included in a model. If behavior is “non-optimal”, this may just indicate that the model doesn’t include the necessary constraints. Similarly, adding constraints may make finding an optimum (for whatever is being optimized) easier than in a more abstract setting.

    3) When we’re talking about human minds, we often assume that we can introspect, to a degree, about what we are doing at any given moment. As a result, it often seems unrealistic that we would calculate or compute some solution, and maybe it would be silly if we were doing it “consciously”. The same complaint could be made that an electron does not “calculate” its attraction to charged bodies around it. Much of what goes on in the world, including in our own heads, is not so open to introspection (though, contra Hinton’s remarks in the other post, this doesn’t mean that they are not open to *explanation*). So typically when we have a mathematical model of human or animal behavior, the model is not explaining “what it feels like” to produce the behavior; instead, like any other mathematical model of a phenomenon, it is saying, “the behavior can be explained as the outcome of a set of processes applied to particular types of representations.”

    • The organizational and biological substrate of the mind determines the conceptual categories in which we perceive and in which we think. There may be modes of attention and of thinking that are outside our ken; which we cannot even imagine.

        • Precisely those the nature of which we cannot even imagine. But perhaps we can imagine that there may *be* such (although we cannot imagine *what* they are).

    • Nicely put and with regard to your later Orange Catholic Bible comment – the tick is a great example (apologies for suggesting we are like ticks).

      From Animal Sensing, Acting and Knowing: Bridging the Relations between Brains, Bodies and World.

      “the tick has no visual or aural apparatus [only senses presence of butyric acid, physical contact and heat], and gives no evidence of having even the ability of detecting anything other than the three specific aspects of the world above, it would make no sense to say that the tick “knows” that the blood it feasts on is carried by such unfathomable phenomena as “horses”, “cows” and “pigs”, much less that it “ knows” that it is the sweat of these animals that carries the butyric acid that alone sets the tick’s function cycle in action, feeding it and allowing it to survive. And yet: there are such things as horses, cows and pigs actually existing in the world and it is the sweat of these animals that carries the butyric acid that alone sets the tick’s “function cycle” in action, feeding it and allowing it to survive.”

      So just by adequately representing three limited aspects of reality, the tick has a wrong model that is useful enough to survive and today where I live, flourish!

      • The Vermin only tease and pinch
        Their Foes superior by an Inch.
        So, Naturalists observe, a Flea
        Hath smaller Fleas that on him prey,
        And these have smaller yet to bite ’em,
        And so proceed ad infinitum:
        Thus every Poet, in his Kind
        Is bit by him that comes behind.

        -Swift

  5. A useful concept for this discussion is PAC learnability (https://en.wikipedia.org/wiki/Probably_approximately_correct_learning) which describes, more or less, whether you can improve your performance on a problem via trial and error if you start with no prior knowledge. The idea here is that if a problem is efficiently PAC learnable then you can evolve toward a Nash equilibrium without any explicit computation, by either human or machine.

    Unfortunately, there are some open questions about the relationship between PAC learnability and computational complexity. We’re pretty sure, however, that there are some problems that are easy to solve algorithmically but that it’s hard to learn the solution to via trial and error. It’s also an open question to what extent PAC learnability is a good model for problems learnable by neural networks (like our brain).
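
    For a feel for what “efficiently PAC learnable” quantifies, here is the classical sample-complexity bound for a finite hypothesis class in the realizable case (a textbook result, not specific to games or equilibria); the numbers plugged in below are arbitrary.

        # Classical PAC bound (finite hypothesis class, realizable case):
        # m >= (ln|H| + ln(1/delta)) / epsilon samples suffice for a consistent
        # learner to reach error <= epsilon with probability >= 1 - delta.
        import math

        def pac_sample_bound(hypothesis_count, epsilon, delta):
            return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

        # A million hypotheses, 5% error tolerance, 99% confidence:
        print(pac_sample_bound(10**6, 0.05, 0.01))   # a few hundred examples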

  6. The halting problem is undecidable in general. But I can tell you that in the following


    m, b = 1, 1

    def f(x):
        return m * x + b   # returns a value immediately: f halts on every input

    def g(x):
        return g(x)        # unbounded recursion: g never returns (it does not halt)

    f halts and g doesn’t. A general problem can be NP-complete, while the tiny subclass of the problem actually encountered in nature can be easily computable or approximated heuristically. All this to say, I don’t find these abstract philosophical strategies for model evaluation terribly compelling. The assumptions are hard and can cut either way in a way that feels obvious depending on how you frame them–I’d only really trust empirics here. I think of economics stuff like Cobb-Douglas or whatever less as laws like Noether’s conservation laws and more as a family of narratives that I draw upon to guide model development, just building blocks that give me some structure to grab onto, more tools than worldviews.

  7. It seems strange to believe that because a particular Nash equilibrium is hard to compute (in terms of time or resources), it is not realistic that humans are finding the Nash equilibrium. There are billions of people, and I have a very simple algorithm for finding the Nash equilibrium:
    1. Guess at a solution.
    2. If not satisfied, imitate the solution of someone who is more satisfied.
    3. Repeat until satisfied.

    Over the course of human history, this will often get us to a solution, and explains why people are so wedded to tradition and customs. By imitating others, we are using the entire computational power of humanity (or at least certain subgroups). It also is not the case that every possibility is equally available to me. So, we start with guesses that are close to the equilibrium.
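
    A toy illustration of this guess-and-imitate recipe (my own sketch, with made-up payoffs and population size), on a simple two-action coordination game where it does settle into a pure Nash equilibrium:

        # Guess, then imitate someone doing better, repeated. In this coordination
        # game your payoff is the share of the population making the same choice,
        # so imitation drags everyone to one action: a pure Nash equilibrium.
        # Population size, rounds, and payoffs are arbitrary illustration choices.
        import random

        N, ROUNDS = 201, 100              # odd N avoids an exact tie
        choices = [random.choice("AB") for _ in range(N)]

        def payoff(i):
            return sum(c == choices[i] for c in choices) / N

        for _ in range(ROUNDS):
            for i in range(N):
                j = random.randrange(N)
                if payoff(j) > payoff(i):     # "if not satisfied, imitate someone more satisfied"
                    choices[i] = choices[j]

        print(choices.count("A") / N)         # ends at 0.0 or 1.0: everyone coordinated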

    • Right, if we are talking about “solved problems” it stands to reason that people stumbled on actionable rules of thumb that approximate the formal solutions.

    • It is very easy to give examples where your 1-2-3 strategy does not converge to any equilibrium. Look up “congestion game” for a small introduction.

      Billions of people is nothing compared to the computational resources that the theoretical results imply are necessary to solve even moderate-sized games.

      • The book that my first quote came from has a nice introduction to congestion games in fact. While the sentence I quoted obviously confused me a bit, as a non-theorist trying to get acquainted with these topics I find Roughgarden’s writing very useful.

  8. Interesting observation. Coincidentally, I have been thinking about a similar issue in psycholinguistics. In sentence processing, so-called “information theoretic” accounts of a rational sentence comprehender (a human) are all the rage. Examples of this are the “entropy” and “surprisal” based accounts, “uniform information density”, and the “noisy channel” view of sentence processing. All of these invoke some kind of information-theoretic formulation about the uncertainty associated with uncertain input. The evidence for these accounts is shaky at best (this is not especially a critique of information-theoretic accounts, pretty much all theoretical accounts are backed up by extremely shaky evidence, most of the time fictional claims where everyone decides to agree that p-values larger than 0.05 are in fact smaller than 0.05). But what’s interesting here is not the shaky evidence but the assumption that these mathematical concepts can translate to behavior observed in experimental data that is unfolding in 100s of milliseconds.

    It turns out that psychologists have thought about this point. Luce has written a nice article on it, “Whatever Happened to Information Theory in Psychology?”, published in Review of General Psychology, 2003 (Vol. 7, No. 2, 183-188), in which he quotes Claude Shannon expressing some annoyance at people using his mathematical theory of communication willy-nilly for every conceivable psychology application. Here is a quote from Luce’s paper:

    “The enthusiasm—nay, faddishness— of the times is hard to capture now. Many felt that a very deep truth of the mind had been uncovered. Yet, Shannon was skeptical. He is quoted as saying “Information theory has perhaps ballooned to an importance beyond its actual accomplishments” (as cited in Johnson, 2001). And Myron Tribus (1979, p. 1) wrote: “In 1961 Professor Shannon, in a private conversation, made it quite clear to me that he considered applications of his work to problems outside of communication theory to be suspect and he did not attach fundamental significance to them.” These skeptical views strike me as appropriate.”

    In recent years, information-theoretic accounts have started receiving a lot more attention, and sometimes the story really starts to push the limits of credibility. For example, there is a PNAS article on the noisy channel view (basically, Bayes’ rule dressed up as Shannon’s information theory applied to sentence processing), in which the claim is that if you hear a sentence like “Mary gave the candle the girl” the reader will mentally back off to a more plausible interpretation (the input is noisy, so the system backs off to the prior–Bayes rule) that “Mary gave the candle to the girl”, by inserting a “to” mentally to restructure the sentence. However, the key problem here is that the experimental task asked the participant a question after each sentence was shown: e.g., Did Mary give the girl the candle? Or Did Mary give the candle the girl? What this question does is it primes the participant to back off to a more plausible interpretation. The experiment hardly shows that people do this kind of restructuring spontaneously. Of course if you prime them to rethink the sentence they will.

    Amazing side note: the editor was none other than John Anderson, a famous psychologist at CMU. I guess the authors of the PNAS paper (some of them are friends/colleagues of mine) were an all-star cast so that paper got accepted without any caveats to the conclusions. If I were editing or reviewing that paper, I would have asked the authors to write a limitations section noting that the empirical demonstration for noisy channel is not as strong as it looks, because of the way the task is set up.

    Anyway, my point is that like Jennifer, I find this overhyping of such mathematical constructs as frameworks for explaining moment-by-moment human behavior to be questionable at best. I don’t mind if people make these kinds of theoretical proposals, but I would take them more seriously if they were underhyped and presented as a possible way to think about some fictional “idealized” comprehender, with the acknowledgment that reality might be a lot more messy and complicated. I suspect that these approaches are treated as extremely attractive because it feels right that a mathematical theory of communication should apply to everything under the sun that involves any kind of communication. Sort of like those physicists (nut-jobs in my opinion) who think that the laws of physics should apply to moment-by-moment cognitive processes in humans.
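
    (For readers unfamiliar with the jargon, the quantities these accounts trade in are simple to state: surprisal is the negative log probability of the word that actually appears, and entropy is the expected surprisal before seeing it. A toy sketch with invented probabilities, not taken from any of the papers discussed:)

        # Surprisal and entropy for a toy next-word distribution. The numbers
        # are invented for illustration, not estimated from any corpus or model.
        import math

        def surprisal(p):
            return -math.log2(p)

        def entropy(dist):
            return sum(p * surprisal(p) for p in dist.values() if p > 0)

        # hypothetical p(next word | "Mary gave the candle ..."):
        next_word = {"to": 0.7, "the": 0.2, "and": 0.1}

        print(surprisal(next_word["to"]))    # ~0.51 bits: the expected continuation
        print(surprisal(next_word["the"]))   # ~2.32 bits: the harder continuation
        print(entropy(next_word))            # ~1.16 bits of uncertainty overall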

      • This is a strange one. I would summarize the article as saying, “organisms have memory and use their memory in combination with environmental constraints to select adaptive actions.” Or maybe more broadly, “memory takes many forms.”

        My first reaction was, therefore, negative because, to me, the claims are not novel and, as presented in the article, are too vague and tied up in obscure verbiage to have much explanatory or predictive utility.

        But then I realized that I am not the target audience for this article. The target audience seems to be people who are familiar with the terms from philosophy that they invoke, and that is not me. Their paper is saying something like, “if you know something about what C. S. Peirce said, here is how you can apply those concepts to a different domain.”

        So to the right audience, I can see how this would be useful, it’s just that it wasn’t particularly useful for me. The article is making an analogy, and those can be powerful tools for gaining insight. Of course, the problem with analogies is that they can give a false sense of understanding when the two things being analogized are not perfectly aligned, which except for perfect isomorphisms, is all the time. This is how you get things like the endless stream of “simple models of evolution” parodied by Shalizi or the wheel-spinning info theory work that Shravan mentions in psycholinguistics. Someone thinks that their understanding of one domain can be directly imported to understand another, but it is never so tidy.

        • Thanks, this is a fair comment: “if you know something about what C. S. Peirce said, here is how you can apply those concepts to a different domain”. Sort of like using complex analysis to explain something to those who have not studied complex numbers.

          I was thinking it might be helpful in thinking about Bayesian Workflow as an ongoing conversation involving many contingencies: “an emergent, many ends-directed world of promiscuous, unforeseeable possibility and interacting [purposes and goals]”

      • Hi Keith, sorry, I didn’t see this response until today. I didn’t understand what you meant by “This seemed sensible?”.

        Also, very sorry to Jessica for calling her Jennifer by mistake! Gelman and (Jennifer) Hill is very salient in my head, so I misretrieved the name.

        • No problem, gec and my response to him allowed some clarification.

          I believe it is more in your area of interest and wondered how it might fit in with wider views of conversational trajectories.

          But only if you find it of interest.
