OJM,

I think I agree. I was hoping to engage more of a discussion with them on Twitter but oh well.

https://mobile.twitter.com/WilsonCattle/status/1080173945809240067

I think we need ensembles to understand and model uncertainty in parameters/models but need to be careful in specifying dynamics for our decisions.

> In decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

I went back to some old notes and things and yes, I think this is pretty spot on.

I’m not sure the ergodic folk really manage to get away from ensembles – rather they seem to ultimately be analysing ensembles of decisions in time (eg strategies = one time decision about a sequential decision problem).

How do they evaluate eg the uncertainty in a sequential strategy? They seem to use ensembles here, just of full time trajectories…does this fall in the ‘physicists attempt to re-formulate things that already exist camp’?

(My original objections to expected utility seem to be discussed in the main literature too – eg von Neumann etc justified utility by assuming probability and rationality axioms etc, Savage tried to justify both using rationality axioms, limitations have been pointed out, alternatives proposed etc)

]]>RE: Formalization, yes you’re right. The formal claim for independence is something like: if you prefer A to B, then you should prefer Lottery(A,C) to Lottery(B,C) when the lotteries work the same way. To me this is unobjectionable. In either lottery you’re going to get one of A, B, or C, and your chance of getting C is the same in both, so your preference should be determined by whether you like A better than B.
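As a rough sketch of why an expected-utility maximizer satisfies this automatically: the gap between the two lotteries is just q times the gap between A and B. The utilities below are made-up numbers and `mix` is a hypothetical helper, not anything from the literature.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {prize: probability}."""
    return sum(p * utility[prize] for prize, p in lottery.items())

# Assumed utilities for illustration: A is preferred to B, C is bad.
utility = {"A": 3.0, "B": 1.0, "C": -5.0}

def mix(prize, q):
    """Lottery giving `prize` with probability q and C otherwise."""
    return {prize: q, "C": 1.0 - q}

gaps = []
for q in (0.1, 0.5, 0.9):
    gap = expected_utility(mix("A", q), utility) - expected_utility(mix("B", q), utility)
    gaps.append(gap)  # equals q * (u(A) - u(B)); the C terms cancel
    print(f"q={q}: EU(Lottery(A,C)) - EU(Lottery(B,C)) = {gap:.2f}")
```

The value assigned to C never matters to the comparison, which is the content of the axiom.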

I’ve already said I think continuity is essential, but I think the continuity you are referring to is the kind described here:

https://plato.stanford.edu/entries/decision-theory/

meaning that if you like A < B < C then you can create a lottery between A and C such that you would be indifferent between it and B. I think this is your "convex combination" issue, and I could maybe see objections to this. The kind of continuity I care about is continuity of the decision with respect to perturbations in the utility, and that already presupposes the existence of a utility. The theorems out there are more about "is a utility sufficient?" than "given a utility, is an expectation sufficient?"

So I think our question, which goes something more like "given a utility, is an expectation sufficient?", is either unanswered or answered by some other set of theorems.

Personally I'm fine with axiomatizing the existence of a utility. I think I'm ok with requiring that to set up a decision rule you need to decide how to assign a scalar to a situation. The question I have is "how do you use the utility?" and I want the "use" to be continuous with respect to perturbations in the utility and in the predictive distribution, and to be linear in the utility (so as not to distort preferences already assumed to be described by U).

I wonder where that really gets us?

]]>Couple of things

– I think you need to be careful about moving between formal and informal claims. A formal axiom doesn’t necessarily accurately capture the informal claim you’re making

– If the interpretation is ‘sometimes these axioms are reasonable under a carefully chosen domain of application’ then I’m fine with that. My objection is to the move, as in people’s interpretations of the Cox theorem, to ‘therefore this is the unique answer’.

Eg in Cox’s case, when some very simple conditions are relaxed to more general and imo reasonable requirements, you suddenly get a whole diverse family of ‘plausibility’ measures, each of which appears to have some intuitive appeal under different scenarios.

Similarly, my impression with the decision theory stuff is that when you relax the assumptions to more general and reasonable cases you get a whole family of possible approaches.

Why the resistance to pluralism in thinking about uncertainty, decision etc?

PS the outlier thing is about model misspecification – ie how your procedure behaves under a different sort of perturbation. This shows that what one means by ‘continuity’ depends on the details – again, the mean is very continuous as a function of the individual observations but very susceptible to individual outliers (which are, somewhat ironically, very common!)

]]>>I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz they give the desired result.

I don’t know about all that. I just want small changes in the modeling situation to lead to small changes in the decision, and I don’t see how anything rational could come out of a decision rule where you’re told to choose between A, B, C and you choose A, but then someone says “Ok, here’s A, and by the way, it turns out C wasn’t really an option, they ran out of C” and you suddenly say “well if that’s the case, hell, forget A, I’ll take B”.

If those are the only two requirements to lead to an expectation representation theorem, then they seem perfectly fine to me.

]]>There are numerous discussions about pros and cons of these axioms, they are hardly self evident to me…

I don’t really understand people’s desire for unique ways of thinking about these issues, but even more so the willingness to accept axioms coz they give the desired result.

]]>As I see it continuity is absolutely required for a real world decision theory: you can’t have a theory where infinitesimal changes in utility produce large changes in decisions etc.

Also, if the independence is with respect to irrelevant alternatives, which I think it is, that seems necessary too: eliminating some of your options shouldn’t change your decision so long as your decision remains an option. So while the rules you mention may be more interesting, they’ll also be very paradoxical.

]]>Oops

Presumably if you drop one or more you can get more interesting decision theories not tied so strongly to expectations

]]>So apparently the classic theorem is something like:

> A complete and transitive preference relation satisfies continuity and independence if and only if it admits an expected utility representation

and both the continuity and independence axioms seem to me to lean on convex combinations and/or linear-esque ideas.

So basically I think something like what I thought applies: you end up with a linear functional representation theorem by assuming preferences satisfy linear style axioms. Presumably if you drop one

]]>[I posted this: For some reason, my last name got deleted. But I do take responsibility for what I write.]

]]>“The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic…”

I certainly don’t have a problem with that being axiomatic, but I could imagine that we could demonstrate that we get better decisions when we do take it entirely into account. An example would be to use a utility that goes very large in regions that are ignored by other rules. So in instances like that you’re ignoring possibilities that are very important.

Finally there’s Wald’s complete class theorem: https://projecteuclid.org/euclid.aoms/1177730345

But I’d have to go back and look carefully at its technical requirements.

]]>Hi Anon,

Thanks.

So how do investors deal with deciding between two possibilities (e1,v1) and (e2,v2)?

Actually I now have vague memories of deriving Pareto optimal portfolios in a decision theory course I took a long time ago (which I think stoked some of my skepticism of expected utility!)

]]>Daniel, I think I agree.

“I do think we can go stronger though, the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expressed nonlinear preferences and so the aggregation method shouldn’t distort the preferences.”

That is also my intuition here. I think you’ve got it. The non-linearity should be in the utility. The decision rule depending on the whole distribution of the uncertain variable – do we take that as axiomatic, or is there some demonstrable class of conditions where that is “best” (for some definition of “best” :))?

I also think ojm is asking about a broader set of possibilities, like minima or maxima or minimax, whatever. To my mind, all of that should be in the model, i.e. the utility and the probability distribution, e.g.:

We have rv X and utility U(max(X)), so in that case our expectation is the integral of U(Z)p(Z)dZ, where Z = max(X) follows the EVT distribution for X (e.g. Gumbel or something).

The point is that we are always trying to bring our analysis down to the level where we have a distribution quantifying our uncertainty, at which level we want to be linear in p and U, and the only thing to do at that point that respects the whole distribution is to integrate over it (i.e. take an expectation).

]]>We definitely need a rule that picks a single value of the control variable, A in my example, but I don’t think that’s enough to require an integral. The rule “pick A such that U(X) for the X with highest density p(X|A) is maximized” doesn’t require an integral, but it also is not satisfactory to me for various reasons.

My intuition is that an axiom “The decision rule should simultaneously depend on all possible values of the uncertain variable” imposes the requirement of an integral. I can show that an integral satisfies it, but not that no other method satisfies it. Could there be a kind of aggregation method that’s like an integral but doesn’t involve summation?

I do think we can go stronger though, the role that a given value of an uncertain variable X plays should be linear in the utility U(X). This is because the utility already expressed nonlinear preferences and so the aggregation method shouldn’t distort the preferences. So with this we can show integral(p(X|A)U(X), X, -inf, inf) works. Next, if we can argue that it should be linear in p, we are done I think… And I’m having a hard time seeing why we should put up with nonlinear transformations of the p. If a possibility becomes twice as likely it should factor twice as much into the decision. Also if p is zero near some X then it should factor not at all.

So I think some set of requirements like: the decision rule should simultaneously consider all options, should be linear in the utility so as not to distort preferences, should not consider possibilities with zero probability, and should respond linearly to perturbations in p, leads to expectation.

I can’t see why a decision rule should distort preferences nonlinearly, distort probability nonlinearly, or ignore some possibilities.

]]>So it looks like we all agree that for purposes of a formalized decision rule, a distribution needs to be collapsed to a scalar. In essence our posterior densities need to be integrated over regardless, so the question is whether to integrate over the whole thing to deliver the expected utility, or if instead maybe something like a quantile function (which is also an expectation of sorts) should be used. Are we on the same page there?

]]>ojm:

I agree that your question has not been addressed.

My take is that the expectation is only ONE possible statistic that an investor might take into account. In standard econ theory, the decision is based on TWO statistics, the expectation (the average return the investor foresees) and the variance (the riskiness of the investment). The expectation is not particularly privileged a priori. The min, max or other statistics could potentially play a role in another theory. In standard econ theory, the variance drops out only if the investor is risk-neutral, in which case the expectation is the only statistic of interest.
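A toy version of the two-statistic decision, as a sketch only. The scoring form (mean minus half the risk aversion times the variance) is one common textbook choice that I am assuming here, and both investments are invented.

```python
def mean_and_variance(outcomes):
    """Mean and variance of a list of (return, probability) pairs."""
    mean = sum(p * r for r, p in outcomes)
    variance = sum(p * (r - mean) ** 2 for r, p in outcomes)
    return mean, variance

def score(outcomes, risk_aversion):
    """Assumed mean-variance score; risk_aversion = 0 is a risk-neutral investor."""
    mean, variance = mean_and_variance(outcomes)
    return mean - 0.5 * risk_aversion * variance

# Hypothetical investments as (return, probability) pairs.
steady = [(0.04, 0.5), (0.06, 0.5)]      # mean 5%, small spread
volatile = [(-0.20, 0.5), (0.34, 0.5)]   # mean 7%, large spread

for risk_aversion in (0.0, 2.0):
    pick = "volatile" if score(volatile, risk_aversion) > score(steady, risk_aversion) else "steady"
    print(f"risk aversion {risk_aversion}: choose {pick}")
```

With risk aversion set to zero the variance term vanishes and the ranking is by expectation alone.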

]]>The question of uniqueness probably has to do with the extent of the restriction on the properties. I think the interesting question is: what is the smallest set of restrictions that results in the maximized expectation as the unique result, and do we agree that all of those requirements are important? It would be nice to have an answer; this is something I’ve thought about before but never really attacked carefully.

]]>The “single outlier” issue is a sample issue. In a Bayesian context the decision rule is a functional of the posterior distribution. The sample comes in during inference to generate the posterior, if you want robustness I think it should come into the choice of model, not so much the decision rule.

]]>Breakdown point is relevant to finite sample estimators but much less relevant to decision theory since in decision theory we aren’t operating on a finite sample but instead a whole predictive distribution.

But I do think that an analogous property is probably desirable. For example if the predictive distribution puts a small probability on a very good (or bad) outcome you want that to influence your decision. If there’s a 1/1000 chance your investment will result in a cure for Ebola, you want your decision rule to know that right?

]]>If you want one that isn’t strongly dependent on a single outlier then anything related to the mean seems bad…more generally I doubt there is a unique correct answer

]]>Sure there are plenty of decision rules, the question is what properties do they enjoy. It seems to me like you want continuity, if an infinitesimal perturbation of the model can produce a nearstandard change in the decision that’s a bad property to have.

Also, I think quantile based rules ignore all but one value of the uncertain variable. That doesn’t seem to be a good property. For example if there’s a 30% chance of a super good result, a decision based on the median ignores that entirely.
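To make the median point concrete, here is a small sketch with invented numbers: two options that a median-based rule cannot tell apart, even though one carries a 30% chance of a much better result.

```python
def expectation(outcomes):
    """Expected utility of a list of (utility, probability) pairs."""
    return sum(p * u for u, p in outcomes)

def median(outcomes):
    """Smallest utility whose cumulative probability reaches 0.5."""
    total = 0.0
    for u, p in sorted(outcomes):
        total += p
        if total >= 0.5:
            return u
    raise ValueError("probabilities sum to less than 0.5")

# Hypothetical options as (utility, probability) pairs.
safe = [(1.0, 1.0)]
long_shot = [(1.0, 0.7), (100.0, 0.3)]

print("medians:     ", median(safe), median(long_shot))
print("expectations:", expectation(safe), expectation(long_shot))
```

The medians coincide, so a median rule is indifferent between the options, while the expectations differ by a wide margin.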

Finally Wald’s essentially complete class theorem seems relevant, but I’d have to revisit it to see the details

]]>Not surprisingly, Choquet expected utility (ie using non additive measure theory) also exists…the general problem seems to me to be related to various notions of ‘aggregation’ in order to make a single decision. Adding things is one way to do this, but is by no means the only and obvious way…

]]>A quick google shows quantile decision theory exists

]]>Also not a great time for me but…

Might be relevant that the mean is a very smooth function of the observations, in the sense that small changes in the individual observations give small changes in its value, but also that it is very susceptible to outliers eg has a breakdown point of zero (one outlier can destroy its behaviour)
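A quick sketch of both properties on a made-up sample: a small nudge to one observation barely moves the mean, while a single gross outlier moves it arbitrarily far. The median is shown for contrast.

```python
import statistics

observations = [9.8, 10.1, 10.0, 9.9, 10.2]   # invented data

nudged = observations.copy()
nudged[0] += 0.01                  # small change in one observation

contaminated = observations.copy()
contaminated[0] = 1_000_000.0      # one gross outlier

for label, data in [("original", observations),
                    ("nudged", nudged),
                    ("contaminated", contaminated)]:
    print(f"{label:>12}: mean = {statistics.fmean(data):.3f}, "
          f"median = {statistics.median(data):.3f}")
```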

For breakdown point see https://en.m.wikipedia.org/wiki/Robust_statistics

]]>My intuition has always been that the idea is we need to have a decision rule that takes into account and should be continuously dependent on all the different possibilities. Let’s just take a simple one-dimensional case, there is some quantity A which is under our control, using a certain amount of A causes a certain amount of X to occur, with probability p(X|A), and the utility of X is U(X), we need to choose how much A to use. We want a rule D that depends simultaneously and continuously on all the possible X values for a given A and the Utility U(X).

I think it’s easy to show that an expectation integral over X satisfies this. It’s less clear why it might be unique.

Proposed rule (maximum expected value): choose A such that integral(p(X|A)U(X),X,-inf,inf) is maximized. Let’s call that I(p(X|A)U(X)) and the chosen value A*(I(p(X|A)U(X)))

I’m hanging out with my kids and have them watching minecraft videos in the background, so it’s not a great time to do careful math, so for now I’ll just mention some things we might care about and can attempt to prove, kinda sketch the way forward:

1) Prove that I(p(X|A)U(X)) is continuous with respect to continuous perturbations in U(X)

method: set up U(X)+eps*dU(X) with dU(X) a test function with compact support and maximum value 1, so that the perturbed function is perturbed by at most eps in a continuous manner in the vicinity of some X*. Show that I(p(X|A) (U(X) + eps dU(X))) changes by at most an infinitesimal amount when eps is a nonstandard infinitesimal (because I’m a nonstandard analysis guy). I think this is straightforward, since the integral is linear you wind up with I(p(X|A) U(X)) + eps * I(p(X|A) dU(X)) and eps is infinitesimal and both p(X|A) and dU are limited.

2) Prove that A*(I(p(X|A) (U(X) + eps dU(X)))) changes by at most an infinitesimal when eps is infinitesimal. This one seems harder, and might require p(X|A) to be “nice” but let’s just assume it’s a standard continuous density that continuously depends on A for now. In other words there are no sudden “transitions” between regimes of X behavior as A changes infinitesimally. That’s pretty normal for scientific models.

If A* is a maximum of I and is unique, and we perturb U to U + eps * dU, then let A** be the maximum of I(p(X|A) (U+eps*dU)). We’re assuming that p(X|A) is continuously dependent on A, and the integral I is continuously dependent on p(X|A) and eps, so if eps is infinitesimal A**-A* should be infinitesimal as well using continuity (this is super handwavy).

3) Prove similar things for perturbations to p(X|A)… it’s symmetric with the proofs above because the role of p(X|A) and U(X) work the same inside the integral.
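Not a proof, but steps 1 and 2 can at least be sanity-checked numerically. The sketch below assumes p(X|A) is Normal(A, 1), U(X) = -(X - 2)^2, and dU is a smooth bump with compact support and maximum 1. All three choices, the grids, and the function names are mine, picked only for illustration.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def utility(x):
    return -(x - 2.0) ** 2           # assumed smooth utility peaked at X = 2

def bump(x, center=3.0, width=1.0):
    """Test function dU: compact support, maximum value 1 at the center."""
    t = (x - center) / width
    return math.exp(1.0 - 1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

DX = 0.05
XS = [-8.0 + DX * i for i in range(401)]   # grid for X on [-8, 12]
AS = [0.01 * i for i in range(401)]        # candidate A values on [0, 4]

def expected_utility(a, eps=0.0):
    """Riemann-sum approximation of I(p(X|A) (U(X) + eps dU(X)))."""
    return sum(normal_pdf(x, a) * (utility(x) + eps * bump(x)) for x in XS) * DX

def best_a(eps=0.0):
    """Grid approximation of A*, the A that maximizes the expected utility."""
    return max(AS, key=lambda a: expected_utility(a, eps))

a_star = best_a()
for eps in (0.1, 0.01, 0.001):
    change_in_I = expected_utility(a_star, eps) - expected_utility(a_star)
    shift_in_A = best_a(eps) - a_star
    print(f"eps={eps}: change in I = {change_in_I:.6f}, shift in A* = {shift_in_A:.2f}")
```

In this setup the change in I is bounded by eps, and the maximizer moves by an amount that shrinks with eps, down to the 0.01 resolution of the A grid. This says nothing about less well-behaved p(X|A).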

So, as I said, hand waving while my kids watch highly distracting minecraft videos, so if you can point out any basic problems I’ve overlooked, I’d be happy to hear it.

The next thing would be to somehow prove that no other functional can be continuously dependent on p(X|A) and U(x) simultaneously. It’s not clear to me that the integral has to be the only continuous functional, but then I’ve never really taken functional analysis either. ;-)

One thing that’s clear is that one of your proposals, which I think corresponds to “take the A that maximizes p(X|A) U(X)”, is not continuous. We can easily imagine a bimodal p(X|A), and we can perturb U(X) in the vicinity of one mode and force A to jump dramatically and discretely from one mode to another. It seems bad for a decision rule to be non-continuous with respect to infinitesimal perturbations in U(X).
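Here is a sketch of that kind of jump, with two caveats: it uses the closely related rule “maximize the utility at the highest-density X”, and it perturbs the mode weights of p rather than U. The density, utility, and grids are all assumptions chosen for illustration.

```python
import math

SD = 0.5                                      # width of each mode (assumed)
XS = [-8.0 + 0.05 * i for i in range(321)]    # grid for X on [-8, 8]
AS = [-4.0 + 0.1 * i for i in range(81)]      # candidate A values on [-4, 4]

def density(x, a, w):
    """Unnormalized bimodal p(X|A): weight w on a mode at A + 2, 1 - w at A - 2."""
    def bell(center):
        return math.exp(-0.5 * ((x - center) / SD) ** 2)
    return w * bell(a + 2.0) + (1.0 - w) * bell(a - 2.0)

def utility(x):
    return -x * x                             # assumed utility, best at X = 0

def mode_rule(w):
    """Pick the A whose highest-density X has the largest utility."""
    def utility_at_mode(a):
        x_mode = max(XS, key=lambda x: density(x, a, w))
        return utility(x_mode)
    return max(AS, key=utility_at_mode)

def expectation_rule(w):
    """Pick the A with the largest expected utility (Riemann sum up to a constant)."""
    def expected_utility(a):
        return sum(density(x, a, w) * utility(x) for x in XS)
    return max(AS, key=expected_utility)

for w in (0.49, 0.51):
    print(f"w={w}: mode rule A = {mode_rule(w):+.1f}, "
          f"expectation rule A = {expectation_rule(w):+.1f}")
```

Moving the weight from 0.49 to 0.51 swaps which mode is highest, so the mode-based choice flips between roughly +2 and -2, while the expectation-based choice stays at the same grid point.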

Any thoughts? I’d be happy to come back and think more carefully about it all.

]]>ojm,

Not sure I have a complete answer. I’m interested in other thoughts. But here’s my stab at it. First note that expectations are mathematically well-defined entities, which I think is a huge part of the appeal.

Why care about the time average? Because it unambiguously tells you what happens to a dynamic (in this case over a series of gambles), as T –> Inf. It tells you whether you are going up or down if you stay in the game long enough (yes, tons of suppressed assumptions here ;)). How long that convergence takes is a whole other problem. If you want to sort multiple propositions, {Q1, Q2, …, Qn}, the time average allows you to order them and make a decision (provided you are in it ‘long enough’). For finite-time returns, presumably other considerations come into play.

The ensemble expectation is useful likewise if the aggregate outcome across multiple parallel entities experiencing some gamble is of concern. Again, as n –> Inf, we can sort propositions Qn and ultimately make a decision.

So, in the final analysis, I think the problem is that we need to make a definite decision and *do* something. Seeing the whole distribution of outcomes is not, in itself, helpful. At some level, the probabilities need to get collapsed to a scalar that carries with it a decision.

No, but I’d be happy to discuss this with you some here, if you’re interested ;-) I’ll start a new thread below.

]]>So I find this all pretty interesting and enlightening, but I realise that it still doesn’t seem to answer one of my questions – why average, whether ensemble or time?

That is, what is the motivation for wanting to calculate the average growth rate as opposed to the minimum, the maximum or the full distribution of growth rates? Is there a basic principle that leads to averages of some sort in decision theory?

How does minimax/maximin and all that fit into the ergodic vs ensemble picture? Are there other ergodic notions like ‘the maximum/minimum etc over time = the maximum/minimum etc over the ensemble of possibilities?’.

]]>Do you have a reference for showing this leads to expectations?

]]>Anon, you are correct of course and I’m not arguing with your analysis. It’s actually the same point that Peters et al are making ;) You want the time average of the dynamic, not the ensemble expectation of an arbitrary utility. I’m thinking you haven’t read the work that ojm and I were discussing…

]]>Anon:

Yes, another way of saying this is that if you’re only going to bet once, then it can make sense to accept the wager. It makes sense to evaluate the wager based on expected utility.

To put it yet another way: In decision analysis, one should analyze strategies, not individual decisions. It can be a good individual decision to take that wager if it is only offered once. But if it will be offered over and over again, the optimal decision of what to do now will depend on what you plan to do in the future.

]]>Chris:

No. Using log returns has nothing to do with a utility function. The use of log returns is motivated by the desire to accurately calculate average growth rates over time. A simple example shows that calculating the average ensemble return gives the wrong answer and that using log returns gives the right answer.

Use of log returns is often motivated by the simple example of a stock that goes up 50% one day and down 50% the next. The return to an investor over the two days is not 0% = 50% + -50%, but rather -25% = exp(ln(1+.5) + ln(1-.5)) - 1. No utility functions are invoked in this example. There is just the simple observation that the two-day return is not obtained by simply adding or averaging the two one-day returns (ie, the ensemble return), and that using log returns gives you the right answer.
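The two-day example is easy to check numerically. A minimal sketch (variable names are mine):

```python
import math

# One-day simple returns from the example: +50%, then -50%.
daily_returns = [0.5, -0.5]

# Adding simple returns suggests the investor breaks even.
naive_total = sum(daily_returns)

# Adding log returns and exponentiating gives the compounded two-day return.
log_total = sum(math.log(1 + r) for r in daily_returns)
compounded_total = math.exp(log_total) - 1

print(f"sum of simple returns: {naive_total:+.2%}")
print(f"compounded return:     {compounded_total:+.2%}")
```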

]]>Yes, the use of log returns was initially suggested by Bernoulli in 1738! He made a mistake which Laplace corrected in 1814. But the point is that in the conventional framework this is an arbitrary utility, with only a psychological justification. Within that framework, I could simply say that my utility on wealth is linear, and thus I *should* take the gamble (hint: I shouldn’t).

What Peters et al. are pointing out is that the correctly interpreted version from Laplace on down corresponds to maximizing an ergodic observable, the multiplicative *rate of growth*. No need for arbitrary utility functions, just time averages over dynamics.

Really, this is all intro material from here:

https://arxiv.org/pdf/1405.0585.pdf

I think this problem has been known for a LONG time in economics (decades at least). Economists think it is solved by using log returns.

For instance, if you take log returns, the 1.5/0.6 coin flip has a negative expected log return, so the problem disappears.

log(1.5) = .41

log(0.6) = -.51

expected log return = -.05
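The same arithmetic as a small sketch, using natural logs and a fair coin (both assumptions on my part), with the expected simple return alongside for comparison:

```python
import math

# Fair coin: wealth is multiplied by 1.5 on heads and by 0.6 on tails.
multipliers = {"heads": 1.5, "tails": 0.6}
p_heads = 0.5

log_up = math.log(multipliers["heads"])     # about +0.405
log_down = math.log(multipliers["tails"])   # about -0.511

expected_log_return = p_heads * log_up + (1 - p_heads) * log_down
expected_simple_return = (p_heads * multipliers["heads"]
                          + (1 - p_heads) * multipliers["tails"] - 1)

print(f"expected log return:    {expected_log_return:+.4f}")
print(f"expected simple return: {expected_simple_return:+.4f}")
```

The simple return has positive expectation while the log return has negative expectation, which is the whole puzzle in two numbers.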

Hi Terry, I’m not an economist so I don’t really know how far their critique undermines fundamental theory there. My impression is that they have poked a pretty deep hole in the usual approach to decision-making problems – i.e. anything where utility functions and uncertainty are invoked together.

FWIW, a few years back I took a grad level natural resource econ class, which was basically lots of static and dynamic optimization (via Hamiltonians). It was a great course, but we didn’t get too far into how stochasticity/uncertainty impacts such analyses. My own extracurricular exploration suggested “a lot”, and the couple of examples we covered were definitely using ensemble expectations as plug-in values to the optimization machinery.

I have a feeling that the ‘ergodicity economics’ project is going to shake things up considerably across the board.

The biggest questions I have concern how to handle situations that don’t map neatly to an underlying dynamic of multiplicative (or additive) growth. In ecosystem management, we are dealing with fluctuating populations and communities, or saturating stocks of e.g. carbon and nutrients. For managing climate risk, we are definitely dealing with our *one* planet *over time*, and I would argue the biosphere as a whole is deeply non-ergodic. It seems like their perspective should apply in some way.

]]>I looked at their work and it looks very interesting.

Has anyone critiqued it? Are there obvious flaws?

]]>Good points.

]]>Interesting! Will have to think on this…

Another misc thought – I’ve often struggled to articulate my preference for thinking about sample size going to infinity for the same thing of interest vs many repeated applications of the same method to different problems…seems quite similar to time average vs ensemble average in stat setting, with sample size playing the role of ‘time’

]]>I find the most useful justification to be that the decision should depend simultaneously and continuously on the utility of all possible outcomes.

]]>Yep, those are great notes! I hope they write a book. I think their project is on to something big, and the implications seem potentially quite far-ranging (at least to me).

Here’s a simple problem I am playing with. Let’s say you are evaluating the coin flip gamble they discuss often:

if heads -> multiply wealth by 1.5

if tails -> multiply wealth by 0.6

Assuming a fair coin (p=0.5), the ensemble expectation is positive (an expected multiplier of 1.05), whereas the time average growth rate is negative (log(sqrt(0.9))), i.e. a great simple example of a non-ergodic gamble. What this tells us is to use the time average growth rate to make decisions (i.e. imagine embedding yourself in a sequence of gambles rather than a parallel ensemble). As both resources you linked show, this is actually what the logarithmic utility does (i.e. selects optimal multiplicative growth rate).
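A small simulation makes the contrast visible. This is only a sketch: the seed, the number of flips, and the helper `flip` are arbitrary choices of mine.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

UP, DOWN = 1.5, 0.6   # wealth multipliers for heads and tails
N = 100_000           # number of flips (or number of parallel players)

def flip():
    """Return the wealth multiplier from one fair coin flip."""
    return UP if random.random() < 0.5 else DOWN

# Ensemble view: many players each play one round; average their multipliers.
ensemble_mean = sum(flip() for _ in range(N)) / N

# Time view: one player plays N rounds in sequence. Log wealth is tracked
# to avoid underflow, then converted to an average growth rate per round.
log_wealth = sum(math.log(flip()) for _ in range(N))
time_avg_growth = log_wealth / N

print(f"ensemble mean multiplier:      {ensemble_mean:.4f}")    # close to 1.05
print(f"time-average log growth/round: {time_avg_growth:.4f}")  # close to log(sqrt(0.9))
```

The ensemble average sits above 1 while the single long trajectory shrinks, so the two summaries disagree about whether the gamble is worth taking.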

Now, imagine we have uncertainty about p, and wish to use some data to better constrain p, and then make a decision. To my mind, the best route is still to go full Bayes: quantify uncertainty in p using say a Beta distribution, and then evaluate the posterior expectation *of the time average growth rate* (in this case given by p*log(1.5) + (1-p)*log(0.6)). But maybe the best thing to do is just graph/report the full distribution, and then apply some kind of meta-utility to make a decision?
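One way this could look in practice, as a sketch. It assumes the growth rate for a given p is p*log(1.5) + (1-p)*log(0.6), a uniform Beta(1, 1) prior, and invented data of 60 heads and 40 tails.

```python
import math
import random

random.seed(1)  # fixed seed so the sampled summary is reproducible

LOG_UP, LOG_DOWN = math.log(1.5), math.log(0.6)

def growth_rate(p):
    """Time-average log growth per round when heads has probability p."""
    return p * LOG_UP + (1 - p) * LOG_DOWN

# Hypothetical data and prior (assumptions for illustration only).
heads, tails = 60, 40
prior_a, prior_b = 1.0, 1.0                          # uniform Beta(1, 1) prior on p

post_a, post_b = prior_a + heads, prior_b + tails    # conjugate Beta update

# growth_rate is linear in p, so its posterior mean is growth_rate(E[p]).
posterior_mean_p = post_a / (post_a + post_b)
expected_growth = growth_rate(posterior_mean_p)

# The full posterior of the growth rate, by sampling p from the Beta posterior.
samples = [growth_rate(random.betavariate(post_a, post_b)) for _ in range(20_000)]
prob_positive = sum(g > 0 for g in samples) / len(samples)

print(f"posterior mean growth rate: {expected_growth:+.4f}")
print(f"P(growth rate > 0 | data):  {prob_positive:.2f}")
```

The sampled growth rates are the “full distribution” option: they can be plotted or summarized however the decision at hand requires.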

This also makes it easy to explain why insurance works since when you are talking about the possibility of losing something very valuable — like your house burning down — it is easy to see why making an “unfair” bet with an insurance company (“unfair” in the sense that it has a negative expected return for you) is a sensible thing to do to avoid a large loss, whereas for the insurance company the possibility of a loss of that amount is still very small and thus in its linear range, so the “unfairness” premium becomes the insurance company’s profit (averaged over the entire business).
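A toy calculation along these lines, assuming log utility of wealth (my choice of concave utility) and invented figures for the house, premium, and insurer capital:

```python
import math

def expected_log_wealth(outcomes):
    """Expected log wealth for a list of (final_wealth, probability) pairs."""
    return sum(p * math.log(w) for w, p in outcomes)

# All figures are hypothetical.
P_FIRE = 0.001                # yearly chance the house burns down
HOUSE = 300_000.0             # value of the house
PREMIUM = 450.0               # above the expected loss of P_FIRE * HOUSE = 300
owner_wealth = 400_000.0      # homeowner's total wealth, house included
insurer_capital = 1e9         # insurer's capital

owner_uninsured = expected_log_wealth([
    (owner_wealth, 1 - P_FIRE),
    (owner_wealth - HOUSE, P_FIRE),
])
owner_insured = expected_log_wealth([(owner_wealth - PREMIUM, 1.0)])

insurer_declines = expected_log_wealth([(insurer_capital, 1.0)])
insurer_accepts = expected_log_wealth([
    (insurer_capital + PREMIUM, 1 - P_FIRE),
    (insurer_capital + PREMIUM - HOUSE, P_FIRE),
])

print("owner prefers insurance:     ", owner_insured > owner_uninsured)
print("insurer prefers writing it:  ", insurer_accepts > insurer_declines)
```

Both sides come out ahead in expected log wealth even though the premium exceeds the expected loss, because the loss is large relative to the owner's wealth and tiny relative to the insurer's.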

]]>And this article is quite relevant to some of my questions:

‘The time resolution of the St Petersburg paradox’

https://royalsocietypublishing.org/doi/full/10.1098/rsta.2011.0065

]]>Thanks for the hot tip! The notes here look interesting: https://ergodicityeconomics.com/lecture-notes/

]]>So I guess my general answer is that where your model yields ergodic observables, the expectation value is a useful point summary of uncertainty because it will reflect what tends to happen over time. At least, that’s the best I can come up with…

]]>ojm, something I’ve been mulling in this regard a lot lately is Ole Peters’ work on ‘ergodicity economics’. He and colleagues like Murray Gell-Mann have made a pretty resounding critique of the use of expected utility theory in economics, and of the use of ensemble expectations more generally. Their analogy is that it is like considering hypothetical parallel worlds, whereas in reality we care about decision making under uncertainties *over time* in our one world/life. Time averages of course diverge from ensemble expectations where ergodicity fails to hold…which is probably quite often!

]]>Presumably the arguments to justify expectations as the thing to do require things like linear/convex combinations of whatever you’re averaging over to make sense (eg to show that you’re after a linear functional).

But I can imagine a fair few situations where a linear combination of your choices doesn’t really make sense and presumably this would block standard arguments in favour of expectations. Haven’t really thought this one through though.

]]>Sure, or any other functional. To be fair there are axiomatic attempts to justify using expectations (von Neumann, Savage etc etc), I guess I just don’t find them very convincing on their own, nor when compared to actual practice.

]]>Do you mean as opposed to the worst case, for example?

]]>Firstly, why summarise a distribution in a single number and secondly, of all functionals you could use to summarise a distribution as a single number, why the expectation?

]]>Yesterday, at Christmas, I opined on the importance of inertia in a long marriage (tomorrow is our 43rd wedding anniversary). The audience was my wife, daughter and son-in-law, an engaged couple, and a bachelor.

I have to say that my emphasis on inertia as an important characteristic in a long marriage was not universally appreciated. :)

]]>