“Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.”

What better day than Christmas, that day of gift-giving, to discuss “loss aversion,” the purported asymmetry in utility, whereby losses are systematically more painful than gains are pleasant?

Loss aversion is a core principle of the heuristics and biases paradigm of psychology and behavioral economics.

But it’s been controversial for a long time.

For example, back in 2005 I wrote about the well-known incoherence that people express when offered small-scale bets. (“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)).”)
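
A quick numerical check of that parenthetical claim, under the stated indifference-at-every-x assumption (the snippet and its numbers are purely illustrative):

```python
# Indifference between a sure +$10 and a 55%/45% gamble on +$20/+$0 at every x
# means U(x+10) = 0.55*U(x+20) + 0.45*U(x), so each successive $10 utility
# increment shrinks by a factor of 0.45/0.55 (a toy consequence of the
# assumption, not an empirical claim).
r = 0.45 / 0.55  # ratio between consecutive $10 utility increments

def utility_gain(lo, hi):
    """Utility gain from $lo to $hi, in units of the first $0 -> $10 increment."""
    return sum(r ** k for k in range(lo // 10, hi // 10))

print("U($1000) - U($950) :", utility_gain(950, 1000))   # ~1.8e-08
print("U($2000) - U($1000):", utility_gain(1000, 2000))  # ~1.1e-08, i.e. smaller
```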

When Matthew Rabin and I had (separately) published papers about this in 1998 and 2000, we’d attributed the incoherent risk-averse attitude at small scales to “loss aversion” and “uncertainty aversion.” But, as pointed out by psychologist Deb Frisch, it can’t be loss aversion, as the way the problem is set up above, no losses are involved. I followed up that “uncertainty aversion” could be logically possible but I didn’t find that labeling so convincing either; instead:

I’m inclined to attribute small-stakes risk aversion to some sort of rule-following. For example, it makes sense to be risk averse for large stakes, and a natural generalization is to continue that risk aversion for payoffs in the $10, $20, $30 range. Basically, a “heuristic” or a simple rule giving us the ability to answer this sort of preference question.

By the way, I’ve used the term “attitude” above, rather than “preference.” I think “preference” is too much of a loaded word. For example, suppose I ask someone, “Do you prefer $20 or [55% chance of $30, 45% chance of $10]?” If he or she says, “I prefer the $20,” I don’t actually consider this any sort of underlying preference. It’s a response to a question. Even if it’s set up as a real choice, where they really get to pick, it’s just a preference in a particular setting. But for most of these studies, we’re really talking about attitudes.

The topic came up again the next year, in the context of the (also) well-known phenomenon that, when it comes to political attitudes about the government, people seem to respond to the trend rather than the absolute level of the economy. Again, I felt that terms such as “risk aversion” and “loss aversion” were being employed as all-purpose explanations for phenomena that didn’t really fit these stories.

And then, in the midst of all that, David Gal published an article, “A psychological law of inertia and the illusion of loss aversion,” in the inaugural issue of the journal Judgment and Decision Making, saying:

The principle of loss aversion is thought to explain a wide range of anomalous phenomena involving tradeoffs between losses and gains. In this article, I [Gal] show that the anomalies loss aversion was introduced to explain — the risky bet premium, the endowment effect, and the status-quo bias — are characterized not only by a loss/gain tradeoff, but by a tradeoff between the status-quo and change; and, that a propensity towards the status-quo in the latter tradeoff is sufficient to explain these phenomena. Moreover, I show that two basic psychological principles — (1) that motives drive behavior; and (2) that preferences tend to be fuzzy and ill-defined — imply the existence of a robust and fundamental propensity of this sort. Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.

I’d completely forgotten about this article until learning recently of a new review article by Gal and Derek Rucker, “The Loss of Loss Aversion: Will It Loom Larger Than Its Gain?”, making this point more thoroughly:

Loss aversion, the principle that losses loom larger than gains, is among the most widely accepted ideas in the social sciences. . . . The upshot of this review is that current evidence does not support that losses, on balance, tend to be any more impactful than gains.

But if loss aversion is unnecessary, why do psychologists and economists keep talking about it? Gal and Rucker write:

The third part of this article aims to address the question of why acceptance of loss aversion as a general principle remains pervasive and persistent among social scientists, including consumer psychologists, despite evidence to the contrary. This analysis aims to connect the persistence of a belief in loss aversion to more general ideas about belief acceptance and persistence in science.

In Table 1 of their paper, Gal and Rucker consider several phenomena, all of which are taken to provide evidence of loss aversion but can be easily explained in other ways. Here are the phenomena they discuss:

– Status quo bias

– Endowment effect

– Risky bet premium

– Hedonic impact ratings

– Sunk cost effect

– Price elasticity

– Equity risk premium

– Disposition effect

– Loss/gain framing.

The article also comes with discussions by Tory Higgins and Nira Liberman, and by Itamar Simonson and Ran Kivetz, and a rejoinder by Gal and Rucker.

58 thoughts on "“Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.”"

  1. OT; Merry Christmas, Happy New Year, and, “The Ballad of Buster Scruggs” was the best statistics movie of the year.

    /Anxiously awaiting your assessment of the miner’s sampling strategy and the deep discourse throughout the vignettes about uncertainty; and, perhaps, the only thing that’s certain – and why it matters, maybe. Anyway.

    • Interesting thought. Would be interested in more discussion of specifics.

      The prospecting strategy seemed to be based on the notion that gold from a pocket would migrate downhill over time in something like a normal distribution with variance increasing as distance increases. I wonder if this is a real prospecting strategy. It seems to have physics on its side.

  2. Not my field, but this sounds like there might be large differences between different persons. Has anyone ever tried running a batch of these tests on several persons and estimating things like degree of risk aversion and other factors (for several interesting theories within one model?) for each participant with a multilevel model?
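
A minimal sketch of the idea in the comment above (everything in it is invented for illustration: N people each make K binary choices, theta_i is person i's propensity to take the sure thing, and the hand-set weight kappa stands in for the partial pooling that a real multilevel model, fit for instance in Stan, would estimate from the data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 20                                  # hypothetical people and choices per person
theta = rng.beta(6, 4, size=N)                 # true per-person propensity to pick the sure thing
y = rng.binomial(K, theta)                     # observed counts of "sure thing" choices

p_no_pool = y / K                              # separate estimate for each person
p_pool = y.sum() / (N * K)                     # one shared estimate for everyone
kappa = 0.5                                    # hand-chosen pooling weight (assumption)
p_partial = kappa * p_pool + (1 - kappa) * p_no_pool

for name, est in [("no pooling", p_no_pool),
                  ("complete pooling", np.full(N, p_pool)),
                  ("partial pooling", p_partial)]:
    rmse = np.sqrt(np.mean((est - theta) ** 2))
    print(f"{name:17s} RMSE vs true theta: {rmse:.3f}")
# With these settings, partial pooling usually has the lowest RMSE of the three.
```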

  3. As an applied ecologist, I find the length of the section on “real-world phenomena” disconcerting. Isn’t the real world what we should be interested in? Why no discussion of buying lottery tickets or other gambling (hard to square with risk aversion), or consideration of what advertisers think will persuade people to buy things?

  4. Going back to the choice between [x+$10] and [55% chance of x+$20, 45% chance of x], I think if you choose the latter there is a way in which you are risking a loss, and it might be relevant to the psychology of the choice: if the person who makes the offer is serious, then at the moment they offer you “x + $10” in the first place, that’s money in the bank. Congratulations, you now have $(x + 10). Would you like to keep it, or would you like to risk $10 for a 55% chance to win another $10? So I don’t think it’s right to say that no losses are involved. Perhaps loss aversion is still a possible explanation.

  5. The Gal and Rucker article is excellent. Very thorough.

    My takeaway is that a bias toward inaction is, for the most part, what is going on. This strikes me as very intuitive and grounded in very rational heuristics.

    First, there is an innate bias towards conservatism. You have already decided on one thing, so you need a pretty good reason to do otherwise. This is because of the additional cost of looking into the issue again, as well as the realization that there was probably some wisdom in the previous choices (both your wisdom and perhaps society’s wisdom as well). Another way to say this is that you have a strong prior that favors your previous choice. Gal and Rucker touch on this explanation.

    The second is suspicion of others, which Gal and Rucker should give more weight to. When someone proposes swapping x for y, you have to ask why they want to swap. What do they know that you don’t? When they want you to swap a certain $10 for an uncertain payoff with odds that favor you, how do you know they aren’t fibbing about the odds? Interaction with salespeople quickly teaches you that you should have a strong bias towards not accepting their deals at face value. Similarly, in a trolley problem, you should have a strong bias towards not pushing fat people off bridges to their deaths because someone told you it was a good idea. There is a saying that a deal that comes to you is a bad deal. Human dishonesty is so ubiquitous that I think humans have encoded a bias towards inaction into their nature. You rarely lose much by passing up a seemingly good deal, but it is very easy to get the short end of a deal.

    In the lab experiments underlying this research area, we assume the bets offered are honest, but why should humans set aside millennia of collective experience in this one instance? Moreover, it turns out that the experimenters are being dishonest after all. So why should we assume that experimental subjects take these experiments at face value?

  6. Inertia.

    Yesterday, at Christmas, I opined on the importance of inertia in a long marriage (tomorrow is our 43rd wedding anniversary). The audience was my wife, daughter and son-in-law, an engaged couple, and a bachelor.

    I have to say that my emphasis on inertia as an important characteristic in a long marriage was not universally appreciated. :)

  7. One thing I’ve never understood about expected utility maximisation, decision theory, or even posterior predictive distributions is: why take expectations?

    Firstly, why summarise a distribution in a single number and secondly, of all functionals you could use to summarise a distribution as a single number, why the expectation?

      • Sure, or any other functional. To be fair there are axiomatic attempts to justify using expectations (von Neumann, Savage etc etc), I guess I just don’t find them very convincing on their own, nor when compared to actual practice.

        • Presumably the arguments to justify expectations as the thing to do require things like linear/convex combinations of whatever you’re averaging over to make sense (eg to show that you’re after a linear functional).

          But I can imagine a fair few situations where a linear combination of your choices doesn’t really make sense and presumably this would block standard arguments in favour of expectations. Haven’t really thought this one through though.

    • ojm, something I’ve been mulling in this regard a lot lately is Ole Peters’s work on ‘ergodicity economics’. He and colleagues like Murray Gell-Mann have made a pretty resounding critique of the use of expected utility theory in economics, and of the use of ensemble expectations more generally. Their analogy is that it is like considering hypothetical parallel worlds, whereas in reality we care about decision making under uncertainties *over time* in our one world/life. Time averages of course diverge from ensemble expectations where ergodicity fails to hold…which is probably quite often!

      • So I guess my general answer is that where your model yields ergodic observables, the expectation value is a useful point summary of uncertainty because it will reflect what tends to happen over time. At least, that’s the best I can come up with…

        • Yep, those are great notes! I hope they write a book. I think their project is on to something big, and the implications seem potentially quite far-ranging (at least to me).
          Here’s a simple problem I am playing with. Let’s say you are evaluating the coin flip gamble they discuss often:
          if heads -> multiply wealth by 1.5
          if tails -> multiply wealth by 0.6
          Assuming a fair coin (p=0.5), the ensemble expectation is positive (1.05), whereas the time average growth rate is negative (log(sqrt(0.9))) (i.e. great simple example of a non-ergodic gamble). What this tells us is to use the time average growth rate to make decisions (i.e. imagine embedding yourself in a sequence of gambles rather than a parallel ensemble). As both resources you linked show, this is actually what the logarithmic utility does (i.e. selects optimal multiplicative growth rate).
          Now, imagine we have uncertainty about p, and wish to use some data to better constrain p, and then make a decision. To my mind, the best route is still to go full Bayes: quantify uncertainty in p using say a Beta distribution, and then evaluate the posterior expectation *of the time average growth rate* (in this case given by p*log(1.5) + (1-p)*log(0.6)). But maybe the best thing to do is just graph/report the full distribution, and then apply some kind of meta-utility to make a decision?
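
A quick simulation of the coin-flip gamble in this sub-thread, just to make the ensemble-versus-time contrast concrete (the 1.5/0.6 multipliers and fair coin are as described above; the rest is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# One player, many flips in time: wealth is multiplied by 1.5 on heads, 0.6 on tails.
T = 100_000
mult = np.where(rng.random(T) < 0.5, 1.5, 0.6)
time_avg_growth = np.log(mult).mean()               # time-average log growth per flip

# Many players, one flip each: the ensemble average of the wealth multiplier.
N = 100_000
ensemble_mean = np.where(rng.random(N) < 0.5, 1.5, 0.6).mean()

print("ensemble mean multiplier per flip:", round(ensemble_mean, 3))         # ~1.05
print("time-average log growth per flip :", round(time_avg_growth, 4))       # ~ -0.053
print("theory: log(sqrt(0.9))           =", round(np.log(np.sqrt(0.9)), 4))  # -0.0527
```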

        • Interesting! Will have to think on this…

          Another misc thought – I’ve often struggled to articulate my preference for thinking about sample size going to infinity for the same thing of interest vs many repeated applications of the same method to different problems…seems quite similar to time average vs ensemble average in stat setting, with sample size playing the role of ‘time’

        • I think this problem has been known for a LONG time in economics (decades at least). Economists think it is solved by using log returns.

          For instance, if you take log returns, the 1.5/0.6 coin flip has a negative expected return, so the problem disappears.

          log(1.5) ≈ 0.41
          log(0.6) ≈ −0.51
          expected log return ≈ 0.5(0.41) + 0.5(−0.51) ≈ −0.05

        • Yes, the use of log returns was initially suggested by Bernoulli in 1738! He made a mistake which Laplace corrected in 1814. But the point is that in the conventional framework this is an arbitrary utility, with only a psychological justification. Within that framework, I could simply say that my utility on wealth is linear, and thus I *should* take the gamble (hint: I shouldn’t).
          What Peters et al. are pointing out is that the correctly interpreted version, from Laplace on down, corresponds to maximizing an ergodic observable, the multiplicative *rate of growth*. No need for arbitrary utility functions, just time averages over dynamics.
          Really, this is all intro material from here:
          https://arxiv.org/pdf/1405.0585.pdf

        • Chris:

          No. Using log returns has nothing to do with a utility function. The use of log returns is motivated by the desire to accurately calculate average growth rates over time. A simple example shows that calculating the average ensemble return gives the wrong answer and that using log returns gives the right answer.

          Use of log returns is often motivated by the simple example of a stock that goes up 50% one day and down 50% the next. The return to an investor over the two days is not 0% (= 50% + -50%), but rather -25% = exp(ln(1+.5) + ln(1-.5)) - 1. No utility functions are invoked in this example. There is just the simple observation that the two-day return is not the simple average of the two one-day returns (ie, the ensemble return), and that using log returns gives you the right answer.

        • Anon:

          Yes, another way of saying this is that if you’re only going to bet once, then it can make sense to accept the wager. It makes sense to evaluate the wager based on expected utility.

          To put it yet another way: In decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

        • Anon, you are correct of course and I’m not arguing with your analysis. It’s actually the same point that Peters et al are making ;) You want the time average of the dynamic, not the ensemble expectation of an arbitrary utility. I’m thinking you haven’t read the work that ojm and I were discussing…

        • So I find this all pretty interesting and enlightening, but I realise that it still doesn’t seem to answer one of my questions – why average, whether ensemble or time?

          That is, what is the motivation for wanting to calculate the average growth rate as opposed to the minimum, the maximum or the full distribution of growth rates? Is there a basic principle that leads to averages of some sort in decision theory?

          How does minimax/maximin and all that fit into the ergodic vs ensemble picture? Are there other ergodic notions like ‘the maximum/minimum etc over time = the maximum/minimum etc over the ensemble of possibilities?’.

        • ojm,
          Not sure I have a complete answer. I’m interested in other thoughts. But here’s my stab at it. First note that expectations are mathematically well-defined entities, which I think is a huge part of the appeal.
          Why care about the time average? Because it unambiguously tells you what happens to a dynamic (in this case over a series of gambles), as T → Inf. It tells you whether you are going up or down if you stay in the game long enough (yes, tons of suppressed assumptions here ;)). How long that convergence takes is a whole other problem. If you want to sort multiple propositions, {Q1, Q2, …, Qn}, the time average allows you to order them and make a decision (provided you are in it ‘long enough’). For finite-time returns, presumably other considerations come into play.
          The ensemble expectation is useful likewise if the aggregate outcome across multiple parallel entities experiencing some gamble is of concern. Again, as n → Inf, we can sort propositions Qn and ultimately make a decision.
          So, in the final analysis, I think the problem is that we need to make a definite decision and *do* something. Seeing the whole distribution of outcomes is not, in itself, helpful. At some level, the probabilities need to get collapsed to a scalar that carries with it a decision.

        • ojm:

          I agree that your question has not been addressed.

          My take is that the expectation is only ONE possible statistic that an investor might take into account. In standard econ theory, the decision is based on TWO statistics, the expectation (the average return the investor foresees) and the variance (the riskiness of the investment). The expectation is not particularly privileged a priori. The min, max or other statistics could potentially play a role in another theory. In standard econ theory, the variance drops out only if the investor is risk-neutral, so the expectation is the only statistic of interest.

        • Hi Anon,

          Thanks.

          So how do investors deal with deciding between two possibilities (e1,v1) and (e2,v2)?

          Actually I now have vague memories of deriving Pareto optimal portfolios in a decision theory course I took a long time ago (which I think stoked some of my skepticism of expected utility!)
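
One conventional answer to the question a couple of lines up about choosing between (e1, v1) and (e2, v2), sketched with made-up numbers: rank each option by expected return minus a risk penalty, lambda times the variance, where lambda encodes risk aversion (this is just the textbook mean-variance idea described a few comments above, not anything specific from this thread):

```python
lam = 2.0                                            # hypothetical risk-aversion weight
options = {"A": (0.07, 0.020), "B": (0.05, 0.005)}   # name: (expected return e, variance v)

for name, (e, v) in options.items():
    print(name, "mean-variance score:", round(e - lam * v, 3))
# With lam = 2, B (lower mean, much lower variance) scores 0.04 and beats A's 0.03;
# with a small enough lam, A would win instead.
```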

        • > decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

          I went back to some old notes and things and yes, I think this is pretty spot on.

          I’m not sure the ergodic folk really manage to get away from ensembles – rather they seem to ultimately be analysing ensembles of decisions in time (eg strategies = one time decision about a sequential decision problem).

          How do they evaluate eg the uncertainty in a sequential strategy? They seem to use ensembles here, just of full time trajectories…does this fall in the ‘physicists attempt to re-formulate things that already exist camp’?

          (My original objections to expected utility seem to be discussed in the main literature too – eg von Neumann etc justified utility by assuming probability and rationality axioms etc, Savage tried to justify both using rationality axioms, limitations have been pointed out, alternatives proposed etc)

        • Hi Terry, I’m not an economist so I don’t really know how far their critique undermines fundamental theory there. My impression is that they have poked a pretty deep hole in the usual approach to decision-making problems – i.e. anything where utility functions and uncertainty are evoked together.

          FWIW, a few years back I took a grad level natural resource econ class, which was basically lots of static and dynamic optimization (via Hamiltonians). It was a great course, but we didn’t get too far into how stochasticity/uncertainty impacts such analyses. My own extracurricular exploration suggested “a lot”, and the couple of examples we covered were definitely using ensemble expectations as plug-in values to the optimization machinery.

          I have a feeling that the ‘ergodicity economics’ project is going to shake things up considerably across the board.

          The biggest questions I have concern how to handle situations that don’t map neatly to an underlying dynamic of multiplicative (or additive) growth. In ecosystem management, we are dealing with fluctuating populations and communities, or saturating stocks of e.g. carbon and nutrients. For managing climate risk, we are definitely dealing with our *one* planet *over time*, and I would argue the biosphere as a whole is deeply non-ergodic. It seems like their perspective should apply in some way.

  8. When I discussed utility/risk functions for money in my freshman/sophomore honors college decision theory class, I always did it in terms of significant amounts of money ($100,000-$1,000,000), since these students (paying large amounts of money for an education) were already making decisions involving perhaps several hundreds of thousands of dollars. It never made sense to me, for the reasons that Andrew states, to consider piddling amounts of money since for most people their utility/risk function ought to be linear for such small amounts. Any reluctance or nonlinearity for small amounts has to be attributed to effects other than a nonlinear utility/risk function, as Andrew lists at the end of his article.

    This also makes it easy to explain why insurance works since when you are talking about the possibility of losing something very valuable — like your house burning down — it is easy to see why making an “unfair” bet with an insurance company (“unfair” in the sense that it has a negative expected return for you) is a sensible thing to do to avoid a large loss, whereas for the insurance company the possibility of a loss of that amount is still very small and thus in its linear range, so the “unfairness” premium becomes the insurance company’s profit (averaged over the entire business).
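
To put toy numbers on the insurance point (the figures are purely illustrative, with log utility standing in for "concave enough"): the premium is actuarially unfair, yet insuring still has higher expected utility for the homeowner.

```python
import math

wealth, house, p_fire, premium = 350_000, 300_000, 0.001, 500   # all made-up numbers

eu_uninsured = (1 - p_fire) * math.log(wealth) + p_fire * math.log(wealth - house)
eu_insured = math.log(wealth - premium)

print("expected wealth, uninsured:", wealth - p_fire * house)    # 349,700 > 349,500 if insured
print("EU uninsured:", round(eu_uninsured, 5))                   # ~12.76374
print("EU insured  :", round(eu_insured, 5))                     # ~12.76426, higher despite the unfair premium
```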

  9. ojm asks why should we use expectation for decision making? (in this comment above: http://statmodeling.stat.columbia.edu/2018/12/25/thus-loss-aversion-principle-rendered-superfluous-account-phenomena-introduced-explain/#comment-935000)

    My intuition has always been that the idea is we need to have a decision rule that takes into account, and should be continuously dependent on, all the different possibilities. Let’s just take a simple one-dimensional case: there is some quantity A which is under our control; using a certain amount of A causes a certain amount of X to occur, with probability p(X|A); the utility of X is U(X); and we need to choose how much A to use. We want a rule D that depends simultaneously and continuously on all the possible X values for a given A and on the utility U(X).

    I think it’s easy to show that an expectation integral over X satisfies this; it’s less clear why it might be unique.

    Proposed rule (maximum expected value): choose A such that integral(p(X|A)U(X),X,-inf,inf) is maximized. Let’s call that I(p(X|A)U(X)) and the chosen value A*(I(p(X|A)U(X)))

    I’m hanging out with my kids and have them watching minecraft videos in the background, so it’s not a great time to do careful math, so for now I’ll just mention some things we might care about and can attempt to prove, kinda sketch the way forward:

    1) Prove that I(p(X|A)U(X)) is continuous with respect to continuous perturbations in U(X)

    method: set up U(X)+eps*dU(X) with dU(X) a test function with compact support and maximum value 1, so that the perturbed function is perturbed by at most eps in a continuous manner in the vicinity of some X*. Show that I(p(X|A) (U(X) + eps dU(X))) changes by at most an infinitesimal amount when eps is a nonstandard infinitesimal (because I’m a nonstandard analysis guy). I think this is straightforward, since the integral is linear you wind up with I(p(X|A) U(X)) + eps * I(p(X|A) dU(X)) and eps is infinitesimal and both p(X|A) and dU are limited.

    2) Prove that A*(I(p(X|A) (U(X) + eps dU(X))) changes by at most an infinitesimal when eps is infinitesimal. This one seems harder, and might require p(X|A) to be “nice” but let’s just assume it’s a standard continuous density that continuously depends on A for now. In other words there are not sudden “transitions” between regimes of X behavior as A changes infinitesimally. That’s pretty normal for scientific models.

    If A* is a maximum of I and is unique, and we perturb U by U + eps * dU, then let A** be the maximum of I(p(X|A) (U + eps*dU)). We’re assuming that p(X|A) is continuously dependent on A, and the integral I is continuously dependent on p(X|A) and eps, so if eps is infinitesimal, A** − A* should be infinitesimal as well, using continuity (this is super handwavy).

    3) Prove similar things for perturbations to p(X|A)… it’s symmetric with the proofs above because the role of p(X|A) and U(X) work the same inside the integral.

    So, as I said, hand waving while my kids watch highly distracting minecraft videos, so if you can point out any basic problems I’ve overlooked, I’d be happy to hear it.

    The next thing would be to somehow prove that no other functional can be continuously dependent on p(X|A) and U(x) simultaneously. It’s not clear to me that the integral has to be the only continuous functional, but then I’ve never really taken functional analysis either. ;-)

    One thing that’s clear is that one of your proposals, which I think corresponds to “take the A that maximizes p(X|A) U(X)”, is not continuous. We can easily imagine a bimodal p(X|A), and we can perturb U(X) in the vicinity of one mode and force A to jump dramatically and discretely from one mode to another. It seems bad for a decision rule to be non-continuous with respect to infinitesimal perturbations in U(X).

    Any thoughts? I’d be happy to come back and think more carefully about it all.
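
A small numerical illustration of that last point (a toy example, not taken from the comment): with a bimodal p(X), a rule that picks the single x maximizing p(x)·U(x) can jump under a tiny perturbation of U near one mode, while the integral of p·U, the expectation-style summary, barely moves.

```python
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

def normal(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Bimodal density with equal modes at -3 and +3.
p = 0.5 * normal(x, -3, 1) + 0.5 * normal(x, 3, 1)
p /= p.sum() * dx                                   # normalize on the grid

U = np.ones_like(x)                                 # hypothetical flat baseline utility

for eps in [0.0, 0.001]:
    U_eps = U + eps * normal(x, 3, 0.5)             # tiny bump in utility near the right mode
    x_star = x[np.argmax(p * U_eps)]                # "pick the single best-looking X" rule
    eu = (p * U_eps).sum() * dx                     # expectation-style rule: uses the whole distribution
    print(f"eps = {eps}: argmax of p*U at x = {x_star:+.2f}, integral of p*U = {eu:.4f}")
# The argmax jumps from x = -3 to x = +3 under the tiny perturbation, while the
# integral barely changes (from 1.0000 to about 1.0002).
```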

        • Not surprisingly, Choquet expected utility (ie using non additive measure theory) also exists…the general problem seems to me to be related to various notions of ‘aggregation’ in order to make a single decision. Adding things is one way to do this, but is by no means the only and obvious way…

        • Sure there are plenty of decision rules, the question is what properties do they enjoy. It seems to me like you want continuity, if an infinitesimal perturbation of the model can produce a nearstandard change in the decision that’s a bad property to have.

          Also, I think quantile based rules ignore all but one value of the uncertain variable. That doesn’t seem to be a good property. For example if there’s a 30% chance of a super good result, a decision based on the median ignores that entirely.

          Finally Wald’s essentially complete class theorem seems relevant, but I’d have to revisit it to see the details

        • If you want one that isn’t strongly dependent on a single outlier then anything related to the mean seems bad…more generally I doubt there is a unique correct answer

        • The “single outlier” issue is a sample issue. In a Bayesian context the decision rule is a functional of the posterior distribution. The sample comes in during inference to generate the posterior, if you want robustness I think it should come into the choice of model, not so much the decision rule.

        • The question of uniqueness probably has to do with the extent of the restriction on the properties, I think the interesting question is what is the smallest set of restrictions that results in the maximized expectation as the unique result, and do we agree that all of those requirements are important. It would be nice to have an answer, this is something I’ve thought about before but never really attacked carefully.

        • So it looks like we all agree that for purposes of a formalized decision rule, a distribution needs to be collapsed to a scalar. In essence our posterior densities need to be integrated over regardless, so the question is whether to integrate over the whole thing to deliver the expected utility, or if instead maybe something like a quantile function (which is also an expectation of sorts) should be used. Are we on the same page there?

        • We definitely need a rule that picks a single value of the control variable, A in my example, but I don’t think that’s enough to require an integral. The rule “pick A such that U(X) for the X with highest density p(X|A) is maximized” doesn’t require an integral, but it also is not satisfactory to me for various reasons.

          My intuition is that an axiom “The decision rule should simultaneously depend on all possible values of the uncertain variable” imposes the requirement of an integral; I can show that an integral satisfies it, but not that there does not exist any other method of satisfying it. Could there be a kind of aggregation method that’s like an integral but doesn’t involve summation?

          I do think we can go stronger though: the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expresses nonlinear preferences, and so the aggregation method shouldn’t distort the preferences. So with this we can show integral(p(X|A)U(X), X, -inf, inf) works. Next, if we can argue that it should be linear in p, we are done I think… And I’m having a hard time seeing why we should put up with nonlinear transformations of the p. If a possibility becomes twice as likely, it seems it should factor twice as much into the decision. Also, if p is zero near some X, then it should factor not at all.

          So I think some set of requirements like the decision rule should simultaneously consider all options, should be linear in the utility so as not to distort preferences, should not consider possibilities with zero probability, and should consider perturbations to p linearly leads to expectation.

          I can’t see why a decision rule should distort preferences nonlinearly or distort probability nonlinearly or ignore some possibilities

        • Daniel, I think I agree.
          “I do think we can go stronger though, the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expressed nonlinear preferences and so the aggregation method shouldn’t distort the preferences.”

          That is also my intuition here. I think you’ve got it. The non-linearity should be in the utility. The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic, or is there some demonstrable class of conditions where that is “best” (for some definition of “best” :))?

          I also think ojm is asking about a broader set of possibilities, like minima or maxima or minimax, whatever. To my mind, all of that should be in the model, i.e. the utility and the probability distribution, e.g.:
          We have rv X, utility U(max(X)), so in that case our expectation is over U(Z)p(Z)dZ, where Z is the EVT distribution of X (e.g. Gumbel or something).

          The point is that we are always trying to bring our analysis down to the level where we have a distribution quantifying our uncertainty, at which level we want to be linear in p and U, and the only thing to do at that point that respects the whole distribution is to integrate over it (i.e. take an expectation).
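
A toy version of the U(max(X)) idea a couple of lines up, with numbers chosen so the answer can be checked exactly (nothing here comes from the comment itself): take X_1, ..., X_n iid Uniform(0,1), care about max(X), and use U(x) = sqrt(x); the maximum has density n*x^(n-1), so E[U(max X)] = n/(n + 0.5).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sims = 30, 200_000

m = rng.random((sims, n)).max(axis=1)                                  # simulated maxima
print("Monte Carlo E[U(max X)]:", round(float(np.sqrt(m).mean()), 4))  # ~0.9836
print("exact       E[U(max X)]:", round(n / (n + 0.5), 4))             # 0.9836
```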

        • “The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic…”

          I certainly don’t have a problem with that being axiomatic, but I could imagine that we could demonstrate that we get better decisions when we do take it entirely into account. An example would be to use a utility that goes very large in regions that are ignored by other rules. So in instances like that you’re ignoring possibilities that are very important.

          Finally there’s Wald’s complete class theorem: https://projecteuclid.org/euclid.aoms/1177730345

          But I’d have to go back and look carefully at its technical requirements.

        • So apparently the classic theorem is something like:

          > A complete and transitive preference relation satisfies continuity and independence if and only if it admits an expected utility representation

          and both the continuity and independence axioms seem to me to lean on convex combinations and/or linear-esque ideas.

          So basically I think something like what I thought applies: you end up with a linear functional representation theorem by assuming preferences satisfy linear style axioms. Presumably if you drop one of them, you get something more general.

        • As I see it continuity is absolutely required for a real world decision theory, you can’t have a theory where infinitesimal changes in utility produce large changes in decisions etc.

          Also, if the independence is with respect to irrelevant alternatives, which I think it is, that seems necessary too: eliminating some of your options shouldn’t change your decision so long as your decision remains an option. So while the rules you mention may be more interesting, they’ll also be very paradoxical.

        • There are numerous discussions about pros and cons of these axioms, they are hardly self evident to me…

          I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz they give the desired result.

        • >I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz they give the desired result.

          I don’t know about all that, I just want small changes in the modeling situation to lead to small changes in the decision, and I don’t see how anything rational could come out of a decision rule where you’re told to choose between A,B,C and you choose A, but then someone says “Ok, here’s A, and by the way, it turns out C wasn’t really an option they ran out of C” and you suddenly say “well if that’s the case, hell forget A I’ll take B”

          If those are the only two requirements to lead to an expectation representation theorem, then they seem perfectly fine to me.

        • Couple of things

          – I think you need to be careful about moving between formal and informal claims. A formal axiom doesn’t necessarily accurately capture the informal claim you’re making

          – If the interpretation is ‘sometimes these axioms are reasonable under a carefully chosen domain of application’ then I’m fine with that. It’s when, as in people’s interpretations of the Cox theorem, the move is made to ‘therefore this is the unique answer’ that I object.

          Eg in Cox’s case, when some very simple conditions are relaxed to more general and imo reasonable requirements, you suddenly get a whole diverse family of ‘plausibility’ measures. Each of which appears to have some intuitive appeal under different scenarios.

          Similarly, my impression with the decision theory stuff is that when you relax the assumptions to more general and reasonable cases you get a whole family of possible approaches.

          Why the resistance to pluralism in thinking about uncertainty, decision etc?

          PS the outlier thing is about model misspecification – ie how your procedure behaves under a different sort of perturbation. This shows that what one means by ‘continuity’ depends on the details – again, the mean is very continuous as a function of the individual observations but very susceptible to individual outliers (which are, somewhat ironically, very common!)

        • RE: Formalization, yes you’re right. The formal claim for independence is something about if you prefer A to B then you should prefer a Lottery(A,C) to Lottery(B,C) when the lotteries work the same. To me this is unobjectionable. In the lottery you’re either going to get A, B, or C; your chance of getting C is the same between the two, so your preference should be determined by whether you like A better than B.

          I’ve already said I think continuity is essential, but I think the continuity you are referring to is the kind described here:

          https://plato.stanford.edu/entries/decision-theory/

          meaning that if you like A < B < C then you can create a lottery between A and C where you would be indifferent between that and B. I think this is your "convex combination" issue, and I could maybe see objections to this. The kind of continuity I care about is continuity of the decision to perturbations in the utility, and that already presupposes the existence of a utility. The kinds of theorems out there are more about "is a utility sufficient?" rather than "given a utility, is an expectation sufficient?"

          So I think our question which goes something more like "given a utility, is an expectation sufficient?" is either unanswered or answered by some other set of theorems.

          Personally I'm fine with axiomatizing the existence of a utility. I think I'm ok with requiring that to set up a decision rule you need to decide how to assign a scalar to a situation. The question I have is "how do you use the utility?" and I want the "use" to be continuous with respect to perturbations in the utility and in the predictive distribution, and to be linear in the utility (so as not to distort preferences already assumed to be described by U).

          I wonder where that really gets us?

      • Breakdown point is relevant to finite sample estimators but much less relevant to decision theory since in decision theory we aren’t operating on a finite sample but instead a whole predictive distribution.

        But I do think that an analogous property is probably desirable. For example if the predictive distribution puts a small probability on a very good (or bad) outcome you want that to influence your decision. If there’s a 1/1000 chance your investment will result in a cure for Ebola, you want your decision rule to know that right?

  10. (“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)).”)

    Imagine a person is indifferent between [x+$10] and [50% chance of x+$21, 50% chance of x], for any x.
    This person’s preferences can be represented by a utility function f satisfying:
    f(x+10)=.5[f(x+21)+f(x)].

    Equivalently, f(x)=.5[f(x+11)+f(x-10)]. The person is always indifferent between the status quo and a 50-50 chance of winning $11 or losing $10.
    Point #1: The phenomenon has nothing to do with “loss aversion.” Once we represent the phenomenon as an algebraic function, we see that any gamble involving a possible loss can be reframed as a gamble only involving gains. That is, the widespread belief among psychologists and economists that “Rabin’s calibration theorem” is caused by “loss aversion” is (ironically) a framing effect.

    Point #2: In order to understand the phenomenon, it helps to convert the recursive function into a standard function where f(x) is defined in terms of x.
    If f(x+10)=.5[f(x+21)+f(x)] then f(x)=1-.9999^x. It is helpful to transform the theorem so that f(1)=1. So f(x)=10,000*(1-.9999^x). That is, the limit of f(x) as x approaches infinity is 10,000. That is, an infinite amount of money is only worth 10,000 times as much as $1.
    This is an absurd utility function that would not describe any real person.
    More generally, if f(x+A)=.5[f(x+B)+f(x)] then f(x)=1-k^x where k^(-A)+k^(B-A)=2.

    Point #3: The trick is that the phrase “for any x” is a seemingly innocuous but extremely restrictive algebraic assumption. It implies a negative exponential utility function where the limit of f(x) as x approaches infinity is very low. That is, the phenomenon has absolutely nothing to do with any psychological explanation (status quo bias, reference dependence, loss aversion, etc.). It is a purely algebraic phenomenon.
    “If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization.” This statement should be rephrased as “The assumption that a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x implies the existence of a utility function where U($100 trillion) is about 10,000 times greater than U($1).” This attitude can be expressed in terms of a utility function, but the utility function implied is insanely implausible.
    It’s really like a “proof by contradiction.” If we assume a person is slightly risk averse for all values of x, we make obviously false predictions. Ergo, the assumption that a reasonable person would prefer [x+$10] to [55% chance of x+$20, 45% chance of x] FOR ANY X must be false.

    Point #4: Three Nobel prize winners in economics (Samuelson, Kahneman, Thaler) and one soon to be winner (Rabin) falsely claim this algebraic phenomenon is evidence of a psychological bias called loss aversion. As demonstrated by Gelman (1998), it has nothing to do with loss aversion. As argued here, it has nothing to do with psychology. Four Nobel prize worthy researchers falsely claim an algebraic phenomenon is a psychological phenomenon.

  11. “Matthew Rabin and I had (separately) published papers about this in 1998 and 2000,”

    Rabin’s (2000, Rabin & Thaler, 2001) calibration theorem is formally equivalent to a two paragraph classroom demonstration in Gelman (1998). Gelman’s way of framing the theorem differs from Rabin’s in three very important ways.
    1. Involves gambles that are “pure gains” and do not involve the risk of a loss
    2. Demonstrates the paradox by varying the stakes instead of the wealth level.
    Gelman: Person is indifferent between [x+$10] and 55% chance of [x+$20]/45% chance of [x] for any x
    Rabin: Person prefers a gain of $10 to a 55% chance of a gain of $20 at all wealth levels
    3. Gelman uses probabilities .55/.45, Rabin uses 50-50

    The importance of difference 1 is that it rules out loss aversion as an explanation. Difference 2 clarifies the theorem in at least three distinct ways.
    1. It makes the conversion into an algebraic equation much more obvious. f(x+10)=.55*f(x+20) +.45*f(x) for all x.
    2. While difference 1 shows that “loss aversion” is not a possible explanation, difference 2 shows that “reference dependence,” “status quo bias,” and other explanations that depend on the concept of “initial wealth level” are not possible. For example, difference 2 seems to rule out the “reference dependent” explanation in a recent paper (“we show that the paradox truly violates expected utility and that it is caused by reference dependence”):
    https://link.springer.com/article/10.1007/s11166-019-09318-0
    3. This framing makes it easy to check whether the assumption “indifferent for all values of x” is intuitively plausible. By contrast, it is very difficult to ask oneself “would I be indifferent between these two gambles at every wealth level?”
    So, for example, assume x=$1000
    Are you indifferent between $1010 and a 55% chance of $1020 and a 45% chance of $1000? I think most people would prefer the gamble as x increases. Intuitively, the reason you prefer $10 to 55% chance of $20 and 45% chance of $0 is because you want to guarantee you win something. Once the minimum value of the gain is $1000, I prefer the gamble.

    A person who always is indifferent has the utility function f(x)=1-k^x where k is a function of the payoffs used in the example (e.g., $10 and $20). A person who is slightly risk averse regardless of the stake has k very high. So the function approaches an asymptote very quickly. Therefore, U($100 trillion) is maybe 10,000 times greater than U($1).

    That is, the assumption that a person would be indifferent for any value of x is algebraically a disaster and intuitively implausible.

    Rabin’s calibration theorem is viewed as evidence for Kahneman and Tversky’s “prospect theory.” Indeed, one could argue that Rabin’s (2000) theorem and Rabin and Thaler’s (2001) simplification paved the way for Kahneman’s 2002 Nobel prize in economics (and maybe Thaler’s 2017 Nobel prize and definitely Rabin’s 202? Nobel prize). Rabin seemed to show mathematically what psychologists had shown empirically – people’s behavior cannot be modeled by expected utility theory because they are loss averse. Prospect theory incorporates loss aversion. Ironically, the specific utility function used in prospect theory (power function) is incompatible with the utility function assumed in Rabin’s theorem (negative exponential).
    http://psych.fullerton.edu/mbirnbaum/calculators/cpt_calculator.htm

    Empirical research estimates that U(x)=x^.6. For intuitive tractability, assume my utility function is U(x)=sqrt(x). What would my preference be in Gelman’s example as x increases?
    x=0:
    U($10) = sqrt(10) = 3.16
    0.55*U($20) + 0.45*U($0) = 0.55*sqrt(20) = 2.46

    x=10:
    U($20) = sqrt(20) = 4.47
    0.55*U($30) + 0.45*U($10) = 0.55*sqrt(30) + 0.45*sqrt(10) = 4.44

    x=20:
    U($30) = sqrt(30) = 5.48
    0.55*U($40) + 0.45*U($20) = 0.55*sqrt(40) + 0.45*sqrt(20) = 5.49

    So a person with the utility function U(x)=sqrt(x) will switch and prefer the gamble once x reaches about $20.
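
A quick scan of the same square-root-utility example, to see where the preference flips (this just redoes the arithmetic above over a grid of x values):

```python
import math

def prefers_gamble(x):
    """Sure x+$10 versus a 55%/45% gamble on x+$20 / x, under U(x) = sqrt(x)."""
    sure = math.sqrt(x + 10)
    gamble = 0.55 * math.sqrt(x + 20) + 0.45 * math.sqrt(x)
    return gamble > sure

for x in range(0, 41, 5):
    print(f"x = {x:3d}: prefers", "the gamble" if prefers_gamble(x) else "the sure thing")
# The preference flips between x = $15 and x = $20, consistent with the calculation above.
```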
