Comments on: “Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.”

By: deb

deb — Mon, 21 Sep 2020 17:42:58 +0000

“Matthew Rabin and I had (separately) published papers about this in 1998 and 2000,”

Rabin’s (2000, Rabin & Thaler, 2001) calibration theorem is formally equivalent to a two paragraph classroom demonstration in Gelman (1998). Gelman’s way of framing the theorem differs from Rabin’s in three very important ways.
1. Involves gambles that are “pure gains” and do not involve the risk of a loss.
2. Demonstrates the paradox by varying the stakes instead of the wealth level.
Gelman: Person is indifferent between [x+$10] and 55% chance of [x+$20]/45% chance of [x] for any x
Rabin: Person prefers a gain of $10 to a 55% chance of a gain of $20 at all wealth levels
3. Gelman uses probabilities .55/.45, Rabin uses 50-50

The importance of difference 1 is that it rules out loss aversion as an explanation. Difference 2 clarifies the theorem in at least three distinct ways.
1. It makes the conversion into an algebraic equation much more obvious: f(x+10)=.55*f(x+20)+.45*f(x) for all x.
2. While difference 1 shows that “loss aversion” is not a possible explanation, difference 2 shows that “reference dependence, status quo bias” and other explanations that depend on the concept of “initial wealth level” are not possible. [For example, difference 2 seems to rule out the “reference dependent” explanation in a recent paper: “we show that the paradox truly violates expected utility and that it is caused by reference dependence.”]
https://link.springer.com/article/10.1007/s11166-019-09318-0
3. This framing makes it easy to see if the assumption “indifferent for all values of x” is intuitively plausible. It is very difficult to ask oneself “would I be indifferent between these two gambles at every wealth level.”
So, for example, assume x=$1000
Are you indifferent between $1010 and a 55% chance of $1020 and a 45% chance of $1000? I think most people would prefer the gamble as x increases. Intuitively, the reason you prefer $10 to 55% chance of $20 and 45% chance of $0 is because you want to guarantee you win something. Once the minimum value of the gain is $1000, I prefer the gamble.

A person who is always indifferent has the utility function f(x)=1-k^x, where k depends on the payoffs and probabilities used in the example (e.g., $10 and $20 at 55/45). A person who is slightly risk averse regardless of the stake has k close to 1; for these payoffs k^10=.45/.55, so k is about .98. Even so, the function approaches its asymptote very quickly. Therefore, U($100 trillion) is only about 50 times greater than U($1).

That is, the assumption that a person would be indifferent for any value of x is algebraically a disaster and intuitively implausible.
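The algebra behind this can be checked numerically. The sketch below assumes the 55/45, $10/$20 gamble from Gelman's framing and the negative-exponential form f(x)=1-k^x; the function names (`gelman_k`, `utility`) are illustrative, not from any paper. Under those assumptions the non-trivial root is k=(.45/.55)^(1/10), roughly .98.

```python
def gelman_k(gain=10.0, p=0.55):
    """Non-trivial root k of k**gain = p*k**(2*gain) + (1 - p).

    Substituting y = k**gain gives p*y**2 - y + (1 - p) = 0, whose roots
    are y = 1 (risk neutrality, discarded) and y = (1 - p)/p.
    """
    return ((1.0 - p) / p) ** (1.0 / gain)


def utility(x, k):
    """Negative-exponential utility f(x) = 1 - k**x, with f(0) = 0."""
    return 1.0 - k ** x


k = gelman_k()

# Indifference should hold at every wealth level x, not just at one.
gaps = [
    utility(x + 10, k) - (0.55 * utility(x + 20, k) + 0.45 * utility(x, k))
    for x in (0.0, 50.0, 1000.0)
]

# How much more is $100 trillion worth than $1 under this utility?
ratio = utility(1e14, k) / utility(1.0, k)

print(f"k = {k:.4f}")
print("indifference gaps:", gaps)
print(f"U($100 trillion) / U($1) = {ratio:.1f}")
```

The ratio is just 1/(1-k), so it depends only on the payoffs and probabilities in the gamble, not on how large the big prize is.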

Rabin’s calibration theorem is viewed as evidence for Kahneman and Tversky’s “prospect theory.” Indeed, one could argue that Rabin’s (2000) theorem and Rabin and Thaler’s (2001) simplification paved the way for Kahneman’s 2002 Nobel prize in economics (and maybe Thaler’s 2017 Nobel prize and definitely Rabin’s 202? Nobel prize). Rabin seemed to show mathematically what psychologists had shown empirically – people’s behavior cannot be modeled by expected utility theory because they are loss averse. Prospect theory incorporates loss aversion. Ironically, the specific utility function used in prospect theory (power function) is incompatible with the utility function assumed in Rabin’s theorem (negative exponential).
http://psych.fullerton.edu/mbirnbaum/calculators/cpt_calculator.htm

Empirical research estimates that U(x)=x^.6. For intuitive tractability, assume my utility function is U(x)=sqrt of x. What would my preference be in Gelman’s example as x increases?
x=0
U($10)=sqrt(10)=3.16
U[.55($20)+.45($0)]=.55*sqrt(20)+.45*sqrt(0)=2.46

x=10
U($20)=sqrt(20)=4.47
U[.55($30)+.45($10)]=.55*sqrt(30)+.45*sqrt(10)=4.44

x=20
U($30)=sqrt(30)=5.477
U[.55($40)+.45($20)]=.55*sqrt(40)+.45*sqrt(20)=5.491

So a person with the utility function U(x)=sqrt(x) prefers the sure $10 at x=0 and x=10, has already switched to the gamble by x=20, and prefers the gamble at every larger x.
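The same comparison can be run at every whole-dollar value of x. This is a minimal sketch assuming U(x)=sqrt(x) and the 55/45, $10/$20 gamble used in the calculation; `prefers_gamble` is a hypothetical helper name.

```python
import math


def prefers_gamble(x, u=math.sqrt, gain=10.0, p=0.55):
    """True if p*u(x + 2*gain) + (1 - p)*u(x) exceeds the sure u(x + gain)."""
    sure = u(x + gain)
    gamble = p * u(x + 2 * gain) + (1 - p) * u(x)
    return gamble > sure


for x in (0, 10, 20, 1000):
    print(x, "gamble" if prefers_gamble(x) else "sure thing")

# Smallest whole-dollar x at which the gamble is preferred.
switch = next(x for x in range(0, 1001) if prefers_gamble(x))
print("first x preferring the gamble:", switch)
```

Swapping `u` for another concave function (for example `lambda x: x ** 0.6`) shows how the switch point moves with the curvature of the utility.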

By: Frank Ramsey

Frank Ramsey — Sun, 20 Sep 2020 15:36:03 +0000

(“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)).”)

Imagine a person is indifferent between [x+$10] and [50% chance of x+$21, 50% chance of x], for any x.
This person’s preferences can be represented by the following utility function:
f(x+10)=.5[f(x+21)+f(x)].

Equivalently, f(x)=.5[f(x+11)+f(x-10)]. Person is always indifferent between the status quo and a 50% chance of winning 11 and losing 10.
Point #1: The phenomenon has nothing to do with “loss aversion.” Once we represent the phenomenon as an algebraic function, we see that any gamble involving a possible loss can be reframed as a gamble only involving gains. That is, the widespread belief among psychologists and economists that “Rabin’s calibration theorem” is caused by “loss aversion” is (ironically) a framing effect.

Point #2: In order to understand the phenomenon, it helps to convert the recursive function into a standard function where f(x) is defined in terms of x.
If f(x+10)=.5[f(x+21)+f(x)] then f(x)=1-k^x, where k is the root below 1 of k^(-10)+k^(11)=2, which is about .991. It is helpful to rescale so that f(1)=1. So f(x)=(1-k^x)/(1-k), or roughly 110*(1-.991^x). That is, the limit of f(x) as x approaches infinity is about 110. That is, an infinite amount of money is worth only about 110 times as much as $1.
This is an absurd utility function that would not describe any real person.
More generally, if f(x+A)=.5[f(x+B)+f(x)] then f(x)=1-k^x where k^(-A)+k^(B-A)=2.
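The condition on k has no closed form in general, but it is easy to solve numerically. The sketch below uses plain bisection. The bracket [0.5, 1-1e-6] and the function name `solve_k` are assumptions made for illustration, and a risk-averse root below 1 exists only when B > 2A. For A=10, B=21 the root comes out near .991, so the rescaled utility (1-k^x)/(1-k) levels off near 110.

```python
def solve_k(A, B, lo=0.5, hi=1.0 - 1e-6, iters=200):
    """Bisection for the root k < 1 of k**(-A) + k**(B - A) = 2.

    k = 1 always solves the equation (risk neutrality), so the bracket
    deliberately stops just short of 1. Assumes B > 2*A.
    """
    def h(k):
        return k ** (-A) + k ** (B - A) - 2.0

    if not (h(lo) > 0.0 > h(hi)):
        raise ValueError("no sign change on [lo, hi]; adjust the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)


k = solve_k(10, 21)
ceiling = 1.0 / (1.0 - k)  # limit of (1 - k**x)/(1 - k) as x grows

print(f"k = {k:.5f}")
print(f"utility ceiling when f(1) = 1: {ceiling:.1f}")
```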

Point #3: The trick is that the phrase “for any x” is a seemingly innocuous but extremely restrictive algebraic assumption. It implies a negative exponential utility function where the limit of f(x) as x approaches infinity is very low. That is, the phenomenon has absolutely nothing to do with any psychological explanation (status quo bias, reference dependence, loss aversion, etc.). It is a purely algebraic phenomenon.
“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization.” This statement should be rephrased as “The assumption that a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x implies the existence of a utility function where U($100 trillion) is only about 50 times greater than U($1).” (For these payoffs k^10=.45/.55, so k is about .98 and the ceiling is 1/(1-k).) This attitude can be expressed in terms of a utility function, but the utility function implied is insanely implausible.
It’s really like a “proof by contradiction.” If we assume a person is slightly risk averse for all values of x, we make obviously false predictions. Ergo, the assumption that a reasonable person would prefer [x+$10] to [55% chance of x+$20, 45% chance of x] FOR ANY X must be false.

POINT #4: Three Nobel prize winners in economics (Samuelson, Kahneman, Thaler) and one soon to be winner (Rabin) falsely claim this algebraic phenomenon is evidence of a psychological bias called loss aversion. As demonstrated by Gelman (1998), it has nothing to do with loss aversion. As argued here, it has nothing to do with psychology. Four Nobel prize worthy researchers falsely claim an algebraic phenomenon is a psychological phenomenon.

By: Chris Wilson

Chris Wilson — Wed, 09 Jan 2019 02:06:31 +0000

In reply to Anonymous.

OJM,
I think I agree. I was hoping to engage more of a discussion with them on Twitter but oh well.
https://mobile.twitter.com/WilsonCattle/status/1080173945809240067
I think we need ensembles to understand and model uncertainty in parameters/models but need to be careful in specifying dynamics for our decisions.

By: ojm

ojm — Tue, 08 Jan 2019 23:49:45 +0000

In reply to Anonymous.

> decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

I went back to some old notes and things and yes, I think this is pretty spot on.

I’m not sure the ergodic folk really manage to get away from ensembles – rather they seem to ultimately be analysing ensembles of decisions in time (eg strategies = one time decision about a sequential decision problem).

How do they evaluate eg the uncertainty in a sequential strategy? They seem to use ensembles here, just of full time trajectories…does this fall in the ‘physicists attempt to re-formulate things that already exist camp’?

(My original objections to expected utility seem to be discussed in the main literature too – eg von Neumann etc justified utility by assuming probability and rationality axioms etc, Savage tried to justify both using rationality axioms, limitations have been pointed out, alternatives proposed etc)

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 18:06:42 +0000

In reply to ojm.

RE: Formalization, yes you’re right. The formal claim for independence is something like: if you prefer A to B then you should prefer Lottery(A,C) to Lottery(B,C) when the lotteries work the same way. To me this is unobjectionable. In the lottery you’re going to get one of A, B, or C, and your chance of getting C is the same between the two, so your preference should be determined by whether you like A better than B.

I’ve already said I think continuity is essential, but I think the continuity you are referring to is the kind described here:

https://plato.stanford.edu/entries/decision-theory/

meaning that if you like A < B < C then you can create a lottery between A and C where you would be indifferent between that and B. I think this is your "convex combination" issue, and I could maybe see objections to this. The kind of continuity I care about is continuity of the decision to perturbations in the utility, and that already presupposes the existence of a utility. The kind of theorems out there are more about "is a utility sufficient?" rather than "given a utility, is an expectation sufficient?"

So I think our question which goes something more like "given a utility, is an expectation sufficient?" is either unanswered or answered by some other set of theorems.

Personally I'm fine with axiomatizing the existence of a utility. I think I'm ok with requiring that to set up a decision rule you need to decide how to assign a scalar to a situation. The question I have is "how do you use the utility?" and I want the "use" to be continuous to perturbations in the utility and in the predictive distribution, and be linear in the Utility (so as not to distort preferences already assumed to be described by U).

I wonder where that really gets us?

By: ojm

ojm — Tue, 01 Jan 2019 06:56:05 +0000

In reply to ojm.

Couple of things

– I think you need to be careful about moving between formal and informal claims. A formal axiom doesn’t necessarily accurately capture the informal claim you’re making

– If the interpretation is ‘sometimes these axioms are reasonable under a carefully chosen domain of application’ then I’m fine with that. My objection is when, as in people’s interpretations of the Cox theorem, the move is made to ‘therefore this is the unique answer’.

Eg in Cox’s case, when some very simple conditions are relaxed to more general and imo reasonable requirements, you suddenly get a whole diverse family of ‘plausibility’ measures, each of which appears to have some intuitive appeal under different scenarios.

Similarly, my impression with the decision theory stuff is that when you relax the assumptions to more general and reasonable cases you get a whole family of possible approaches.

Why the resistance to pluralism in thinking about uncertainty, decision etc?

PS the outlier thing is about model misspecification – ie how your procedure behaves under a different sort of perturbation. This shows that what one means by ‘continuity’ depends on the details – again, the mean is very continuous as a function of the individual observations but very susceptible to individual outliers (which are, somewhat ironically, very common!)

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 06:04:46 +0000

In reply to ojm.

>I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz the give the desired result.

I don’t know about all that, I just want small changes in the modeling situation to lead to small changes in the decision, and I don’t see how anything rational could come out of a decision rule where you’re told to choose between A,B,C and you choose A, but then someone says “Ok, here’s A, and by the way, it turns out C wasn’t really an option they ran out of C” and you suddenly say “well if that’s the case, hell forget A I’ll take B”

If those are the only two requirements to lead to an expectation representation theorem, then they seem perfectly fine to me.

By: ojm

ojm — Tue, 01 Jan 2019 05:54:24 +0000

In reply to ojm.

There are numerous discussions about pros and cons of these axioms, they are hardly self evident to me…

I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz they give the desired result.

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 05:36:50 +0000

In reply to ojm. As I see it continuity is absolutely required for a real world decision theory, you can't have a theory where infinitesimal changes in utility produce large changes in decisions etc. Also if the independence is with respect to irrelevant alternatives which I think it is, that seems necessary too, eliminating some of your options shouldn't change your decision so long as your decision remains an option. So while the rules you mention may be more interesting, they'll also be very paradoxical.

By: ojm

ojm — Tue, 01 Jan 2019 03:19:39 +0000

In reply to ojm. Oops Presumably if you drop one or more you can get more interesting decision theories not tied so strongly to expectations

By: ojm

ojm — Tue, 01 Jan 2019 03:18:40 +0000

In reply to ojm.

So apparently the classic theorem is something like:

> A complete and transitive preference relation satisfies continuity and independence if and only if it admits a expected utility representation

and both the continuity and independence axioms seem to me to lean on convex combinations and/or linear-esque ideas.

So basically I think something like what I thought applies: you end up with a linear functional representation theorem by assuming preferences satisfy linear style axioms. Presumably if you drop one

By: Bill Jefferys

Bill Jefferys — Tue, 01 Jan 2019 01:18:20 +0000

In reply to Bill. [I posted this: For some reason, my last name got deleted. But I do take responsibility for what I write.]

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 00:18:52 +0000

In reply to ojm.

“The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic…”

I certainly don’t have a problem with that being axiomatic, but I could imagine that we could demonstrate that we get better decisions when we do take it entirely into account. An example would be to use a utility that goes very large in regions that are ignored by other rules. So in instances like that you’re ignoring possibilities that are very important.

Finally there’s Wald’s complete class theorem: https://projecteuclid.org/euclid.aoms/1177730345

But I’d have to go back and look carefully at its technical requirements.

By: ojm

ojm — Mon, 31 Dec 2018 22:30:54 +0000

In reply to Anonymous.

Hi Anon,

Thanks.

So how do investors deal with deciding between two possibilities (e1,v1) and (e2,v2)?

Actually I now have vague memories of deriving Pareto optimal portfolios in a decision theory course I took a long time ago (which I think stoked some of my skepticism of expected utility!)

By: Chris Wilson

Chris Wilson — Mon, 31 Dec 2018 17:50:02 +0000

In reply to ojm.

Daniel, I think I agree.
“I do think we can go stronger though, the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expressed nonlinear preferences and so the aggregation method shouldn’t distort the preferences.”

That is also my intuition here. I think you’ve got it. The non-linearity should be in the utility. The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic, or is there some demonstrable class of conditions where that is “best” (for some definition of “best” :))?

I also think ojm is asking about a broader set of possibilities, like minima or maxima or minimax, whatever. To my mind, all of that should be in the model, i.e. the utility and the probability distribution, e.g.:
We have rv X, utility U(max(X)), so in that case our expectation is over U(Z)p(Z)dZ, where Z is the EVT distribution of X (e.g. Gumbel or something).

The point is that we are always trying to bring our analysis down to the level where we have a distribution quantifying our uncertainty, at which level we want to be linear in p and U, and the only thing to do at that point that respects the whole distribution is to integrate over it (i.e. take an expectation).

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 16:45:03 +0000

In reply to ojm.

We definitely need a rule that picks a single value of the control variable, A in my example, but I don’t think that’s enough to require an integral. The rule “pick A such that U(X) for the X with highest density p(X|A) is maximized” doesn’t require an integral, but it also is not satisfactory to me for various reasons.

My intuition is that an axiom “The decision rule should simultaneously depend on all possible values of the uncertain variable” imposes the requirement of an integral. I can show that an integral satisfies it, but not that there does not exist any other method of satisfying it. Could there be a kind of aggregation method that’s like an integral but doesn’t involve summation?

I do think we can go stronger though: the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expressed nonlinear preferences and so the aggregation method shouldn’t distort the preferences. So with this we can show integral(p(X|A)U(X), X, -inf, inf) works. Next if we can argue that it should be linear in p we are done I think… And I’m having a hard time seeing why we should put up with nonlinear transformations of the p. If a possibility becomes twice as likely it should seem to factor twice as much into the decision. Also if p is zero near some X then it should factor not at all.

So I think some set of requirements like the decision rule should simultaneously consider all options, should be linear in the utility so as not to distort preferences, should not consider possibilities with zero probability, and should consider perturbations to p linearly leads to expectation.

I can’t see why a decision rule should distort preferences nonlinearly or distort probability nonlinearly or ignore some possibilities

By: Chris Wilson

Chris Wilson — Mon, 31 Dec 2018 14:23:09 +0000

In reply to ojm. So it looks like we all agree that for purposes of a formalized decision rule, a distribution needs to be collapsed to a scalar. In essence our posterior densities need to be integrated over regardless, so the question is whether to integrate over the whole thing to deliver the expected utility, or if instead maybe something like a quantile function (which is also an expectation of sorts) should be used. Are we on the same page there?

By: Anonymous

Anonymous — Mon, 31 Dec 2018 12:04:07 +0000

In reply to Anonymous.

ojm:
I agree that your question has not been addressed.

My take is that the expectation is only ONE possible statistic that an investor might take into account. In standard econ theory, the decision is based on TWO statistics, the expectation (the average return the investor foresees) and the variance (the riskiness of the investment). The expectation is not particularly privileged a priori. The min, max or other statistics could potentially play a role in another theory. In standard econ theory, the variance drops out only if the investor is risk-neutral, so the expectation is the only statistic of interest.

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 06:00:09 +0000

In reply to ojm. The question of uniqueness probably has to do with the extent of the restriction on the properties, I think the interesting question is what is the smallest set of restrictions that results in the maximized expectation as the unique result, and do we agree that all of those requirements are important. It would be nice to have an answer, this is something I've thought about before but never really attacked carefully.

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 05:57:00 +0000

In reply to ojm. The "single outlier" issue is a sample issue. In a Bayesian context the decision rule is a functional of the posterior distribution. The sample comes in during inference to generate the posterior; if you want robustness I think it should come into the choice of model, not so much the decision rule.

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 05:52:54 +0000

In reply to ojm. Breakdown point is relevant to finite sample estimators but much less relevant to decision theory since in decision theory we aren't operating on a finite sample but instead a whole predictive distribution. But I do think that an analogous property is probably desirable. For example if the predictive distribution puts a small probability on a very good (or bad) outcome you want that to influence your decision. If there's a 1/1000 chance your investment will result in a cure for Ebola, you want your decision rule to know that right?

By: ojm

ojm — Mon, 31 Dec 2018 05:33:31 +0000

In reply to Daniel Lakeland. If you want one that isn’t strongly dependent on a single outlier then anything related to the mean seems bad...more generally I doubt there is a unique correct answer

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 05:16:44 +0000

In reply to ojm.

Sure there are plenty of decision rules, the question is what properties do they enjoy. It seems to me like you want continuity, if an infinitesimal perturbation of the model can produce a nearstandard change in the decision that’s a bad property to have.

Also, I think quantile based rules ignore all but one value of the uncertain variable. That doesn’t seem to be a good property. For example if there’s a 30% chance of a super good result, a decision based on the median ignores that entirely.
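A small sketch of the point about the median, with made-up numbers: the payoffs (10 and 1000) and the helper names are hypothetical, and only the 30% chance of a very good result comes from the comment. Two options share the same median utility, so a median rule cannot tell them apart, while the expectation does.

```python
def quantile(outcomes, q):
    """Smallest utility u with P(U <= u) >= q; outcomes is [(prob, utility)]."""
    total = 0.0
    for prob, value in sorted(outcomes, key=lambda pv: pv[1]):
        total += prob
        if total >= q:
            return value
    return max(value for _, value in outcomes)


def expectation(outcomes):
    """Probability-weighted average utility."""
    return sum(prob * value for prob, value in outcomes)


safe = [(1.0, 10.0)]                    # always a utility of 10
upside = [(0.7, 10.0), (0.3, 1000.0)]   # 30% chance of a super good result

for name, option in (("safe", safe), ("upside", upside)):
    print(name, "median:", quantile(option, 0.5),
          "expectation:", expectation(option))
```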

Finally Wald’s essentially complete class theorem seems relevant, but I’d have to revisit it to see the details

By: ojm

ojm — Mon, 31 Dec 2018 03:13:46 +0000

In reply to ojm.

Not surprisingly, Choquet expected utility (ie using non additive measure theory) also exists…the general problem seems to me to be related to various notions of ‘aggregation’ in order to make a single decision. Adding things is one way to do this, but is by no means the only and obvious way…

By: ojm

ojm — Mon, 31 Dec 2018 02:59:46 +0000

In reply to ojm.

A quick google shows quantile decision theory exists

https://core.ac.uk/download/pdf/23800383.pdf

By: ojm

ojm — Mon, 31 Dec 2018 02:52:57 +0000

In reply to Daniel Lakeland.

Also not a great time for me but…

Might be relevant that the mean is a very smooth function of the observations, in the sense that small changes in the individual observations give small changes in its value, but also that it is very susceptible to outliers eg has a breakdown point of zero (one outlier can destroy its behaviour)

For breakdown point see https://en.m.wikipedia.org/wiki/Robust_statistics
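Both properties can be seen on a toy sample. The numbers below are invented for illustration: a tiny nudge to one observation barely moves the mean, while replacing one observation with a wild value drags the mean arbitrarily far and leaves the median where it was.

```python
from statistics import mean, median

data = [9.8, 10.1, 10.0, 9.9, 10.2]

# Small change in one observation: the mean moves by only eps / n.
nudged = [data[0] + 0.01] + data[1:]

# One gross outlier: the mean follows it, the median does not.
contaminated = data[:-1] + [1e6]

print("mean:  ", mean(data), mean(nudged), mean(contaminated))
print("median:", median(data), median(nudged), median(contaminated))
```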

By: Daniel Lakeland

Daniel Lakeland — Mon, 31 Dec 2018 00:33:10 +0000

ojm asks why should we use expectation for decision making? (in this comment above: http://statmodeling.stat.columbia.edu/2018/12/25/thus-loss-aversion-principle-rendered-superfluous-account-phenomena-introduced-explain/#comment-935000)

My intuition has always been that the idea is we need to have a decision rule that takes into account and should be continuously dependent on all the different possibilities. Let’s just take a simple one-dimensional case, there is some quantity A which is under our control, using a certain amount of A causes a certain amount of X to occur, with probability p(X|A), and the utility of X is U(X), we need to choose how much A to use. We want a rule D that depends simultaneously and continuously on all the possible X values for a given A and the Utility U(X).

I think it’s easy to show that an expectation integral over X satisfies this. It’s less clear why it might be unique.

Proposed rule (maximum expected value): choose A such that integral(p(X|A)U(X),X,-inf,inf) is maximized. Let’s call that I(p(X|A)U(X)) and the chosen value A*(I(p(X|A)U(X)))

I’m hanging out with my kids and have them watching minecraft videos in the background, so it’s not a great time to do careful math, so for now I’ll just mention some things we might care about and can attempt to prove, kinda sketch the way forward:

1) Prove that I(p(X|A)U(X)) is continuous with respect to continuous perturbations in U(X)

method: set up U(X)+eps*dU(X) with dU(X) a test function with compact support and maximum value 1, so that the perturbed function is perturbed by at most eps in a continuous manner in the vicinity of some X*. Show that I(p(X|A) (U(X) + eps dU(X))) changes by at most an infinitesimal amount when eps is a nonstandard infinitesimal (because I’m a nonstandard analysis guy). I think this is straightforward, since the integral is linear you wind up with I(p(X|A) U(X)) + eps * I(p(X|A) dU(X)) and eps is infinitesimal and both p(X|A) and dU are limited.

2) Prove that A*(I(p(X|A) (U(X) + eps dU(X)))) changes by at most an infinitesimal when eps is infinitesimal. This one seems harder, and might require p(X|A) to be “nice” but let’s just assume it’s a standard continuous density that continuously depends on A for now. In other words there are not sudden “transitions” between regimes of X behavior as A changes infinitesimally. That’s pretty normal for scientific models.

If A* is a maximum of I and is unique, and we perturb U by U + eps * dU, then let A** be the maximizer of I(p(X|A) (U+eps*dU)). We’re assuming that p(X|A) is continuously dependent on A and that the integral I is continuously dependent on p(X|A) and eps, so if eps is infinitesimal A**-A* should be infinitesimal as well using continuity (this is super handwavy).

3) Prove similar things for perturbations to p(X|A)… it’s symmetric with the proofs above because p(X|A) and U(X) play the same role inside the integral.
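Steps 1) and 2) can be probed numerically, which is no substitute for a proof but shows what the claim looks like. Everything concrete here is an assumption chosen for illustration: p(X|A) is Normal(A, 1), U(X) = -(X-3)^2, the test function dU is a smooth bump of height 1 supported on (3, 5), the integral is a trapezoid sum on a fixed grid, and the maximizer is found by ternary search (which presumes a single interior maximum).

```python
import math


def normal_pdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def base_utility(x):
    """Assumed utility, peaked at X = 3."""
    return -(x - 3.0) ** 2


def bump(x, center=4.0):
    """Smooth test function with compact support (center-1, center+1), max 1."""
    t = x - center
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - t * t))


def expected_utility(a, eps=0.0, lo=-7.0, hi=13.0, n=400):
    """Trapezoid approximation of integral p(X|A) (U(X) + eps dU(X)) dX."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * normal_pdf(x, a) * (base_utility(x) + eps * bump(x))
    return total * h


def best_action(eps=0.0, lo=0.0, hi=6.0, iters=60):
    """Ternary search for the A in [lo, hi] maximizing expected utility."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if expected_utility(m1, eps) < expected_utility(m2, eps):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


a_star = best_action(0.0)
shift_big = best_action(1e-3) - a_star    # A** - A* for eps = 1e-3
shift_small = best_action(1e-4) - a_star  # A** - A* for eps = 1e-4

print(f"A* = {a_star:.6f}")
print(f"shift for eps=1e-3: {shift_big:.2e}")
print(f"shift for eps=1e-4: {shift_small:.2e}")
```

In this setup the shift in the chosen A shrinks along with eps, which is the behavior the sketch of step 2) is after. It says nothing about densities that are not "nice".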

So, as I said, hand waving while my kids watch highly distracting minecraft videos, so if you can point out any basic problems I’ve overlooked, I’d be happy to hear it.

The next thing would be to somehow prove that no other functional can be continuously dependent on p(X|A) and U(x) simultaneously. It’s not clear to me that the integral has to be the only continuous functional, but then I’ve never really taken functional analysis either. ;-)

One thing that’s clear is that one of your proposals which I think corresponds to “take the A that maximizes p(X|A) U(X)” is not continuous. We can easily imagine a bimodal p(X|A) and we can perturb U(X) in the vicinity of one mode and force A to jump dramatically and discretely from one mode to another. It seems bad for a decision rule to be non-continuous with respect to infinitesimal perturbations in U(X).

Any thoughts? I’d be happy to come back and think more carefully about it all.

By: Chris Wilson

Chris Wilson — Sun, 30 Dec 2018 23:38:47 +0000

In reply to Anonymous.

ojm,
Not sure I have a complete answer. I’m interested in other thoughts. But here’s my stab at it. First note that expectations are mathematically well-defined entities, which I think is a huge part of the appeal.
Why care about the time average? Because it unambiguously tells you what happens to a dynamic (in this case over a series of gambles), as T –> Inf. It tells you whether you are going up or down if you stay in the game long enough (yes, tons of suppressed assumptions here ;)). How long that convergence takes is a whole other problem. If you want to sort multiple propositions, {Q1, Q2, …, Qn}, the time average allows you to order them and make a decision (provided you are in it ‘long enough’). For finite-time returns, presumably other considerations come into play.
The ensemble expectation is useful likewise if the aggregate outcome across multiple parallel entities experiencing some gamble is of concern. Again, as n –> Inf, we can sort propositions Qn and ultimately make a decision.
So, in the final analysis, I think the problem is that we need to make a definite decision and *do* something. Seeing the whole distribution of outcomes is not, in itself, helpful. At some level, the probabilities need to get collapsed to a scalar that carries with it a decision.
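The two kinds of average can point in opposite directions. The simulation below borrows the 1.5/0.6 coin flip mentioned elsewhere in this thread; the seed, sample sizes, and variable names are arbitrary choices for illustration. Averaged across many players for one round, wealth goes up; followed along one player for many rounds, log wealth goes down.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
UP, DOWN = 1.5, 0.6


def flip():
    """Multiplier applied to wealth on one fair coin flip."""
    return UP if rng.random() < 0.5 else DOWN


# Ensemble view: many players, one round each, starting from wealth 1.
n_players = 100_000
ensemble_mean = sum(flip() for _ in range(n_players)) / n_players

# Time view: one player, many rounds (tracked in logs to avoid underflow).
n_rounds = 10_000
log_wealth = sum(math.log(flip()) for _ in range(n_rounds))
time_avg_log_growth = log_wealth / n_rounds

print(f"ensemble mean after one round: {ensemble_mean:.3f}")
print(f"time-average log growth per round: {time_avg_log_growth:.4f}")
```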

By: Daniel Lakeland

Daniel Lakeland — Sun, 30 Dec 2018 23:19:57 +0000

In reply to ojm. No, but I'd be happy to discuss this with you some here, if you're interested ;-) I'll start a new thread below.

By: ojm

ojm — Sun, 30 Dec 2018 22:17:19 +0000

In reply to Anonymous.

So I find this all pretty interesting and enlightening, but I realise that it still doesn’t seem to answer one of my questions – why average, whether ensemble or time?

That is, what is the motivation for wanting to calculate the average growth rate as opposed to the minimum, the maximum or the full distribution of growth rates? Is there a basic principle that leads to averages of some sort in decision theory?

How does minimax/maximin and all that fit into the ergodic vs ensemble picture? Are there other ergodic notions like ‘the maximum/minimum etc over time = the maximum/minimum etc over the ensemble of possibilities?’.

By: ojm

ojm — Sun, 30 Dec 2018 22:10:30 +0000

In reply to Daniel Lakeland. Do you have a reference for showing this leads to expectations?

By: Chris Wilson

Chris Wilson — Sun, 30 Dec 2018 18:59:19 +0000

In reply to Anonymous. Anon, you are correct of course and I’m not arguing with your analysis. It’s actually the same point that Peters et al are making ;) You want the time average of the dynamic, not the ensemble expectation of an arbitrary utility. I’m thinking you haven’t read the work that ojm and I were discussing...

By: Andrew

Andrew — Sun, 30 Dec 2018 16:49:19 +0000

In reply to Anonymous.

Anon:

Yes, another way of saying this is that if you’re only going to bet once, then it can make sense to accept the wager. It makes sense to evaluate the wager based on expected utility.

To put it yet another way: In decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

By: Anonymous

Anonymous — Sun, 30 Dec 2018 15:40:45 +0000

In reply to Anonymous.

Chris:

No. Using log returns has nothing to do with a utility function. The use of log returns is motivated by the desire to accurately calculate average growth rates over time. A simple example shows that calculating the average ensemble return gives the wrong answer and that using log returns gives the right answer.

Use of log returns is often motivated by the simple example of a stock that goes up 50% one day and down 50% the next. The return to an investor over the two days is not 0% = 50% + -50%, but rather -25% = exp(ln(1+.5) + ln(1-.5)) - 1. No utility functions are invoked in this example. There is just the simple observation that the two-day return is not the simple average of the two one-day returns (ie, the ensemble return), and that using log returns gives you the right answer.
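The arithmetic in the stock example, written out as a minimal sketch (variable names are illustrative):

```python
import math

daily_returns = [0.5, -0.5]  # up 50% one day, down 50% the next

naive_total = sum(daily_returns)                        # 0.0
log_returns = [math.log1p(r) for r in daily_returns]    # ln(1 + r)
two_day_return = math.expm1(sum(log_returns))           # exp(sum) - 1
per_day_growth = math.expm1(sum(log_returns) / len(log_returns))

print(f"sum of simple returns: {naive_total:+.2%}")
print(f"actual two-day return: {two_day_return:+.2%}")
print(f"equivalent constant daily return: {per_day_growth:+.2%}")
```

Adding log returns and exponentiating is the same as multiplying the gross returns 1.5 and 0.5, which is why it recovers the -25%.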

By: Chris Wilson

Chris Wilson — Sat, 29 Dec 2018 16:52:34 +0000

In reply to Anonymous.

Yes, the use of log returns was initially suggested by Bernoulli in 1738! He made a mistake which Laplace corrected in 1814. But the point is that in the conventional framework this is an arbitrary utility, with only a psychological justification. Within that framework, I could simply say that my utility on wealth is linear, and thus I *should* take the gamble (hint: I shouldn’t).
What Peters et al. are pointing out is that the correctly interpreted version from Laplace on down corresponds to maximizing an ergodic observable, the multiplicative *rate of growth*. No need for arbitrary utility functions, just time averages over dynamics.
Really, this is all intro material from here:
https://arxiv.org/pdf/1405.0585.pdf

By: Anonymous

Anonymous — Sat, 29 Dec 2018 16:24:50 +0000

In reply to Chris Wilson.

I think this problem has been known for a LONG time in economics (decades at least). Economists think it is solved by using log returns.

For instance, if you take log returns, the 1.5/0.6 coin flip has a negative expected return, so the problem disappears.

log(1.5) = .405
log(0.6) = -.511
expected log return = -.053
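
The arithmetic can be checked with a few lines of Python (natural logs, fair coin assumed; the variable names are illustrative):

```python
import math

up, down = 1.5, 0.6   # wealth multipliers for heads and tails
p_heads = 0.5         # fair coin, as assumed in the thread

# Ensemble expectation of the one-round wealth multiplier.
expected_multiplier = p_heads * up + (1 - p_heads) * down

# Expected log return of one round.
expected_log_return = p_heads * math.log(up) + (1 - p_heads) * math.log(down)

print(f"ln(1.5) = {math.log(up):.3f}")
print(f"ln(0.6) = {math.log(down):.3f}")
print(f"expected multiplier = {expected_multiplier:.3f}")
print(f"expected log return = {expected_log_return:.3f}")
```

The expected multiplier is above 1 while the expected log return is below 0, which is the whole puzzle in two numbers.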

By: Chris Wilson

Chris Wilson — Sat, 29 Dec 2018 14:16:15 +0000

In reply to Terry.

Hi Terry, I’m not an economist so I don’t really know how far their critique undermines fundamental theory there. My impression is that they have poked a pretty deep hole in the usual approach to decision-making problems – i.e. anything where utility functions and uncertainty are invoked together.

FWIW, a few years back I took a grad level natural resource econ class, which was basically lots of static and dynamic optimization (via Hamiltonians). It was a great course, but we didn’t get too far into how stochasticity/uncertainty impacts such analyses. My own extracurricular exploration suggested “a lot”, and the couple of examples we covered were definitely using ensemble expectations as plug-in values to the optimization machinery.

I have a feeling that the ‘ergodicity economics’ project is going to shake things up considerably across the board.

The biggest questions I have concern how to handle situations that don’t map neatly to an underlying dynamic of multiplicative (or additive) growth. In ecosystem management, we are dealing with fluctuating populations and communities, or saturating stocks of e.g. carbon and nutrients. For managing climate risk, we are definitely dealing with our *one* planet *over time*, and I would argue the biosphere as a whole is deeply non-ergodic. It seems like their perspective should apply in some way.

By: Terry

Terry — Sat, 29 Dec 2018 08:05:13 +0000

In reply to Chris Wilson. I looked at their work and it looks very interesting. Has anyone critiqued it? Are there obvious flaws?

By: Martha (Smith)

Martha (Smith) — Sat, 29 Dec 2018 03:29:14 +0000

In reply to Bill. Good points.

By: ojm

ojm — Sat, 29 Dec 2018 01:48:50 +0000

In reply to Chris Wilson.

Interesting! Will have to think on this…

Another misc thought – I’ve often struggled to articulate my preference for thinking about sample size going to infinity for the same thing of interest vs many repeated applications of the same method to different problems…seems quite similar to time average vs ensemble average in stat setting, with sample size playing the role of ‘time’

By: Daniel Lakeland

Daniel Lakeland — Sat, 29 Dec 2018 01:12:13 +0000

In reply to ojm. I find the most useful justification that the decision should depend simultaneously and continuously on the utility of all possible outcomes.

By: Chris Wilson

Chris Wilson — Sat, 29 Dec 2018 01:06:00 +0000

In reply to ojm.

Yep, those are great notes! I hope they write a book. I think their project is on to something big, and the implications seem potentially quite far-ranging (at least to me).
Here’s a simple problem I am playing with. Let’s say you are evaluating the coin flip gamble they discuss often:
if heads -> multiply wealth by 1.5
if tails -> multiply wealth by 0.6
Assuming a fair coin (p=0.5), the ensemble expectation is positive (the expected multiplier is 1.05), whereas the time average growth rate is negative (log(sqrt(0.9))), i.e. a great simple example of a non-ergodic gamble. What this tells us is to use the time average growth rate to make decisions (i.e. imagine embedding yourself in a sequence of gambles rather than a parallel ensemble). As both resources you linked show, this is actually what the logarithmic utility does (i.e. selects optimal multiplicative growth rate).
Now, imagine we have uncertainty about p, and wish to use some data to better constrain p, and then make a decision. To my mind, the best route is still to go full Bayes: quantify uncertainty in p using say a Beta distribution, and then evaluate the posterior expectation *of the time average growth rate* (in this case given by p*log(1.5) + (1-p)*log(0.6)). But maybe the best thing to do is just graph/report the full distribution, and then apply some kind of meta-utility to make a decision?
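
A minimal sketch of that Bayesian calculation in Python. The data (14 heads in 20 flips), the uniform Beta(1, 1) prior, and all names are hypothetical, chosen only for illustration. It takes the time average growth rate per flip to be p*log(1.5) + (1-p)*log(0.6), which is linear in p.

```python
import math
import random

UP, DOWN = 1.5, 0.6

def growth_rate(p):
    """Time-average log growth rate per flip when heads has probability p."""
    return p * math.log(UP) + (1 - p) * math.log(DOWN)

# Hypothetical data: 14 heads and 6 tails, with a uniform Beta(1, 1) prior on p.
heads, tails = 14, 6
a, b = 1 + heads, 1 + tails          # posterior is Beta(a, b)

# The growth rate is linear in p, so its posterior mean is the rate at E[p].
posterior_mean_p = a / (a + b)
posterior_mean_growth = growth_rate(posterior_mean_p)

# Monte Carlo draws give the full posterior distribution of the growth rate.
rng = random.Random(0)
draws = [growth_rate(rng.betavariate(a, b)) for _ in range(20_000)]
prob_positive = sum(g > 0 for g in draws) / len(draws)

# Value of p at which the growth rate changes sign.
break_even_p = -math.log(DOWN) / (math.log(UP) - math.log(DOWN))

print(f"posterior mean growth rate: {posterior_mean_growth:.4f}")
print(f"P(growth rate > 0 | data):  {prob_positive:.3f}")
print(f"break-even p:               {break_even_p:.4f}")
```

Because the growth rate is linear in p, the posterior expectation only needs the posterior mean of p. The draws are there for the "report the full distribution" option.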

By: Bill

Bill — Fri, 28 Dec 2018 23:28:23 +0000

When I discussed utility/risk functions for money in my freshman/sophomore honors college decision theory class, I always did it in terms of significant amounts of money ($100,000-$1,000,000), since these students (paying large amounts of money for an education) were already making decisions involving perhaps several hundreds of thousands of dollars. It never made sense to me, for the reasons that Andrew states, to consider piddling amounts of money since for most people their utility/risk function ought to be linear for such small amounts. Any reluctance or nonlinearity for small amounts has to be attributed to effects other than a nonlinear utility/risk function, as Andrew lists at the end of his article.

This also makes it easy to explain why insurance works since when you are talking about the possibility of losing something very valuable — like your house burning down — it is easy to see why making an “unfair” bet with an insurance company (“unfair” in the sense that it has a negative expected return for you) is a sensible thing to do to avoid a large loss, whereas for the insurance company the possibility of a loss of that amount is still very small and thus in its linear range, so the “unfairness” premium becomes the insurance company’s profit (averaged over the entire business).
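
A small numerical sketch of the insurance argument in Python. It assumes logarithmic utility for both parties as one convenient nonlinear choice. All the figures and names are hypothetical.

```python
import math

def expected_log_wealth(outcomes):
    """Expected log wealth for a list of (probability, wealth) pairs."""
    return sum(prob * math.log(wealth) for prob, wealth in outcomes)

# Hypothetical figures.
p_fire = 0.01
house_value = 400_000
premium = 5_000                  # above the actuarially fair premium
owner_wealth = 500_000           # total wealth, house included
insurer_wealth = 100_000_000

fair_premium = p_fire * house_value   # expected loss

# Homeowner: go uninsured, or pay the premium and remove the risk.
owner_uninsured = expected_log_wealth(
    [(1 - p_fire, owner_wealth), (p_fire, owner_wealth - house_value)]
)
owner_insured = math.log(owner_wealth - premium)

# Insurer: decline the policy, or collect the premium and carry the risk.
insurer_without = math.log(insurer_wealth)
insurer_with = expected_log_wealth(
    [(1 - p_fire, insurer_wealth + premium),
     (p_fire, insurer_wealth + premium - house_value)]
)

print(f"fair premium: {fair_premium:,.0f}, charged premium: {premium:,}")
print(f"homeowner better off insured:    {owner_insured > owner_uninsured}")
print(f"insurer better off with policy:  {insurer_with > insurer_without}")
```

With these numbers the premium exceeds the expected loss, yet both sides come out ahead in expected log wealth. For the insurer the loss is small relative to its wealth, so it is effectively in the linear range.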

By: ojm

ojm — Fri, 28 Dec 2018 23:21:55 +0000

In reply to ojm.

And this article is quite relevant to some of my questions:

‘The time resolution of the St Petersburg paradox’

https://royalsocietypublishing.org/doi/full/10.1098/rsta.2011.0065

By: ojm

ojm — Fri, 28 Dec 2018 18:01:48 +0000

In reply to Chris Wilson.

Thanks for the hot tip! The notes here look interesting: https://ergodicityeconomics.com/lecture-notes/

By: Chris Wilson

Chris Wilson — Fri, 28 Dec 2018 13:50:36 +0000

In reply to Chris Wilson. So I guess my general answer is that where your model yields ergodic observables, the expectation value is a useful point summary of uncertainty because it will reflect what tends to happen over time. At least, that’s the best I can come up with...

By: Chris Wilson

Chris Wilson — Fri, 28 Dec 2018 13:31:33 +0000

In reply to ojm. ojm, something I’ve been mulling in this regard a lot lately is Ole Peters’ work on ‘ergodicity economics’. He and colleagues like Murray Gell-Mann have made a pretty resounding critique of the use of expected utility theory in economics, and of the use of ensemble expectations more generally. Their analogy is that it is like considering hypothetical parallel worlds, whereas in reality we care about decision making under uncertainties *over time* in our one world/life. Time averages of course diverge from ensemble expectations where ergodicity fails to hold...which is probably quite often!
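
A rough simulation sketch in Python of how the two averages can diverge. It uses a repeated multiplicative coin flip (×1.5 on heads, ×0.6 on tails, fair coin) as a stand-in example. The number of rounds, number of paths, and seed are arbitrary choices.

```python
import math
import random
import statistics

UP, DOWN = 1.5, 0.6
ROUNDS = 100        # flips along one trajectory ("time")
PATHS = 10_000      # independent trajectories ("parallel worlds")

rng = random.Random(0)

def final_wealth(rounds):
    """Wealth after `rounds` fair flips, starting from 1.0."""
    wealth = 1.0
    for _ in range(rounds):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth

finals = [final_wealth(ROUNDS) for _ in range(PATHS)]

# Per-round growth factors under the two kinds of averaging.
ensemble_growth_per_round = 0.5 * UP + 0.5 * DOWN     # expected multiplier
time_avg_growth_per_round = math.sqrt(UP * DOWN)      # typical multiplier over time

median_final = statistics.median(finals)
share_losing = sum(w < 1.0 for w in finals) / PATHS

print(f"ensemble growth per round:     {ensemble_growth_per_round:.4f}")
print(f"time-average growth per round: {time_avg_growth_per_round:.4f}")
print(f"median final wealth:           {median_final:.3g}")
print(f"share of paths below 1.0:      {share_losing:.3f}")
```

The expected multiplier is above 1, but the typical trajectory shrinks. The ensemble mean is held up by a handful of very lucky paths.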

By: ojm

ojm — Fri, 28 Dec 2018 00:50:42 +0000

In reply to ojm. Presumably the arguments to justify expectations as the thing to do require things like linear/convex combinations of whatever you’re averaging over to make sense (eg to show that you’re after a linear functional). But I can imagine a fair few situations where a linear combination of your choices doesn’t really make sense and presumably this would block standard arguments in favour of expectations. Haven’t really thought this one through though.

By: ojm

ojm — Fri, 28 Dec 2018 00:14:35 +0000

In reply to Kyle C. Sure, or any other functional. To be fair there are axiomatic attempts to justify using expectations (von Neumann, Savage etc etc), I guess I just don’t find them very convincing on their own, nor when compared to actual practice.

By: Kyle C

Kyle C — Thu, 27 Dec 2018 23:29:31 +0000

In reply to ojm. Do you mean as opposed to the worst case, for example?