Russian roulette: You can have a deterministic potential-outcome framework, or an asymmetric utility function, but not both

Jonas Mikhaeil and I write:

It has been proposed in medical decision analysis to express the “first do no harm” principle as an asymmetric utility function in which the loss from killing a patient would count more than the gain from saving a life. Such a utility depends on unrealized potential outcomes, and we show how this yields a paradoxical decision recommendation in a simple hypothetical example involving games of Russian roulette. The problem is resolved if we allow the potential outcomes to be random variables. This leads us to conclude that, if you are interested in this sort of asymmetric utility function, you need to move to the stochastic potential outcome framework. We discuss the implications of the choice of parameterization in this setting.

I like this paper! Working out the example and writing it up helped me understand a bunch of things that had puzzled me regarding causal modeling and inference.

Jonas and I started this project after hearing from Amanda Kowalski about her recent paper with Neil Christy, which got us thinking about what you can get from stochastic models for potential outcomes.

24 thoughts on “Russian roulette: You can have a deterministic potential-outcome framework, or an asymmetric utility function, but not both”

  1. Possibly related: I just read a very good novel. From Wikipedia: “Doctor Fischer of Geneva or The Bomb party (1980) is a novel by the English novelist Graham Greene. The eponymous party has been examined as an example of a statistical search problem.”

    Dr. Fischer is a cruel man who invites rich people to dinner and insults them. They must take it without protest, and if they do, they get a “prize” of jewelry or something worth about $20,000. For his last party, he has English Christmas crackers. Five of them have checks for $2 million (I forget the exact amount), and the sixth has a bomb which will kill whoever pulls the cracker. The interesting question for present purposes is which order they volunteer to go in. If you go first, you have a 5/6 chance of survival; if second, 4/5 (conditional on the first one not blowing up) and so forth. It is a matter of psychology and preferences. In the novel, it is further complicated by the fact that one of the guests wishes to commit suicide.
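The probabilities here are worth a closer look: the conditional survival probability does drop with each turn, but the unconditional probability is the same 5/6 for every position, which is presumably why the choice of order in the novel comes down to psychology and preferences rather than expected value. A quick check, assuming one bomb among six crackers and that the game stops once it goes off:

```python
from fractions import Fraction

n = 6  # six crackers, one contains the bomb, position uniformly random
for k in range(1, n + 1):
    # Guest k dies iff the bomb is in cracker k, so unconditionally
    # every position survives with probability (n-1)/n = 5/6.
    p_unconditional = Fraction(n - 1, n)
    # Conditional on reaching your turn (k-1 safe pulls so far), the bomb
    # is equally likely to be in any of the n-k+1 remaining crackers.
    p_given_reached = Fraction(n - k, n - k + 1)
    print(k, p_unconditional, p_given_reached)
```

Going first or last changes how the risk is experienced (and whether you ever face your turn), not the total risk.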

    • Having grown up in a family where Christmas crackers are a tradition… they require two people to pull, don’t they (i.e., one on each end)? Each person would thus pull 2 (i.e., 1/3 of a pool of 6) of the crackers: One with the person sitting to their right, and one with the person sitting to their left. I guess that would affect the probabilities, if not the strategies.

      • Good extensions. In the novel, the crackers are put into a bran bin far enough away to avoid collateral damage, and you have to pull both sides yourself. The narrator, who wishes to die, lost one hand in the London Blitz, so he pulls one end with his teeth.

  2. This is a timely post (or at least a timely paper) for me, because I’m thinking again about utility functions and the role of uncertainty. I’ll describe my issue; Andrew, maybe you or Jonas or someone else who has thought about this stuff can help me out…although my problem is quite a bit more complicated than the Russian Roulette example.

    I have a client with a ginormous electric bill (well over half a billion dollars per year), spread over a bunch of facilities all around the world. Each facility has its own electricity budget, typically maybe $80K per month. They can either buy electricity on the ‘spot market’, i.e. pay the real-time price of electricity, or they can buy a ‘hedge’: the company pays a given amount per kWh now, for the right to use electricity some number of months in the future.

    In principle (and, I would argue, in practice!) my client ought to take a company-wide, long-term view when trying to figure out how much electricity to buy in the form of hedges versus the spot market, but in fact each facility’s energy manager is judged based on how well they do compared to their budget. And this evaluation is quite asymmetrical: going over budget by $100K in a year is really bad, whereas going under budget by $100K is only moderately good.

    I should also mention that the risks are really driven by the occasional extreme event. The term “black swan” is overused in this context, but there are genuine black swan events on rare occasions (rare by definition, of course), as well as less-extreme events that are rare but not extremely rare. For instance, the typical February-average spot price of electricity in Texas is maybe $30 per MWh, and for many years it was never over $50, but a few years ago there was an ice storm that shut down a lot of wind generators and natural gas pipelines and this loss of generating capacity, combined with Texas’s nearly unique isolation from the rest of the national electric grid, led to prices over $5000 per MWh for several days. My client company has the ability to greatly decrease their demand for electricity for hours or even a couple of days at a time, but even so they ended up using almost their entire annual electricity budget within a few weeks at their Texas facilities.

    I’m on a small consulting team that has made a computer application to help the company decide how much to hedge. We don’t think we can beat the electricity markets, but we do think/hope that we can make use of market signals to figure out what the energy traders think about the probability of a future extreme price event: We fit a time series model to a Winsorized version of historical prices and use this to predict future prices in the absence of something crazy happening. The more the hedge price exceeds the forecast price, the higher the risk of an extreme pricing event…or at least, that’s the theory behind our calculation.
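A minimal sketch of that kind of calculation, with a plain AR(1) standing in for whatever time-series model is actually used. The function names and the quantile cutoffs here are illustrative, not the real system:

```python
import numpy as np

def winsorize(x, lo_q=0.05, hi_q=0.95):
    """Clip extreme prices to quantiles so rare spikes don't dominate the fit."""
    lo, hi = np.quantile(x, [lo_q, hi_q])
    return np.clip(x, lo, hi)

def ar1_forecast(prices, horizon):
    """Fit an AR(1) by least squares to Winsorized prices, then iterate the
    fitted recursion forward.  A deliberately minimal stand-in model."""
    p = winsorize(np.asarray(prices, dtype=float))
    x, y = p[:-1], p[1:]
    A = np.vstack([np.ones_like(x), x]).T
    (c, phi), *_ = np.linalg.lstsq(A, y, rcond=None)
    f, out = p[-1], []
    for _ in range(horizon):
        f = c + phi * f
        out.append(f)
    return np.array(out)
```

The gap between the quoted hedge price and a forecast like this is then read as a signal of how much extreme-event risk the traders are pricing in.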

    We also forecast the client’s future demand for electricity.

    And we’ve used historical prices and hedge prices from many facilities to try to estimate the expected premium — how much extra does it cost to buy electricity in advance rather than waiting and buying on the spot market, on average, for a given spread between (forecast price if there is no extreme event) and (hedge price).

    The final part of what we need is some way of taking our client’s risk preferences into account. We took a kind of ‘revealed preferences’ approach to this: we asked about the hedging decisions they’ve made over the past five years, and how they feel about the decision process that led to those decisions. The key point is that they are willing to pay a premium on average in order to make sure they aren’t going to have a really huge bill (say 4x the budgeted amount), even for a single month at a single facility.

    Our contacts at the company are fairly sophisticated thinkers about this stuff and are aware of the problems with judging based on hindsight. In the end we were able to gin up a utility function that does a decent job at behaving as a proxy for the people who have been making these decisions. The client is generally happy with that because this lets them semi-automate the decision process rather than having to spend time and mental effort every month, deciding what future hedges to buy at all of their facilities. (By the way, our contacts at the company are also aware that it might not really make sense to worry so much about a single month at a single facility, but that is the way the company is currently structured to operate so for now that’s what we are doing).

    So far, so good… but I’m not really happy with the utility function that I ended up coming up with. Actually I’m not sure whether I’m unhappy with the function itself, or with the way I currently parameterize it: it has two parameters that don’t have a clear connection to anything the client (or I) normally think about.

    I don’t want to describe it in detail because I don’t want to anchor anyone who might be willing to provide advice about this. Let me just share some thinking about how to specify the utility function, maybe others can weigh in with helpful insights.

    One possibility is to specify an asymmetric utility relative to the budget. For instance, if the electricity budget in some month is $80K, we could define a utility U($x) = ($80K – $x) for x ≤ $80K, and U($x) = z($80K – $x) for x > $80K. If z > 1, the penalty for exceeding the budget is greater than the reward for under-paying by the same amount.
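In code, a piecewise-linear version of that idea might look like this (the budget and the asymmetry factor z are illustrative):

```python
def utility(bill, budget=80_000, z=2.0):
    """Asymmetric utility around the monthly budget: going over budget is
    penalized z times as heavily as an equal-sized saving is rewarded."""
    if bill <= budget:
        return budget - bill        # under budget: modest reward
    return z * (budget - bill)      # over budget: amplified (negative) penalty
```

With z = 2, coming in $20K under budget scores +20,000 while running $20K over scores −40,000.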

    Another possibility is to specify a utility function for the money the company retains after paying their bills. We could imagine some big pool of money, maybe roughly the entire annual budget, that each month’s bill is paid from, and have a nonlinear utility function such that a large reduction in the pool of money leads to a much larger decrease in utility than does a modest reduction in the pool. (Our current utility function takes this approach).

    A simple solution to our unhappiness with our current function might be to simply re-parameterize the function we have now, or else to add a layer of intuitive questions from which we calculate the parameters. For instance, we could ask: “if you have a choice between spending a guaranteed $80K next July, versus (95% chance of spending $60K, 5% chance of spending $300K), which would you prefer?”. Answer a few questions like that, and we can determine the parameter values that go into our current function.
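With a piecewise-linear utility of the form U(x) = (B − x) for x ≤ B and z·(B − x) for x > B, a single indifference question of that kind pins down z directly. A sketch with made-up elicitation numbers:

```python
# Client says: indifferent between a sure $80K bill and a lottery of
# 95% chance of $60K / 5% chance of $300K.  With utility
#   U(x) = (B - x)      for x <= B
#   U(x) = z * (B - x)  for x >  B
# the sure bill has utility 0, so indifference gives
#   0 = p*(B - low) + (1 - p) * z * (B - high)
B, p, low, high = 80_000, 0.95, 60_000, 300_000
z = p * (B - low) / ((1 - p) * (high - B))
print(round(z, 2))  # ~1.73: overruns weighted about 1.7x as much as savings
```

Two or three questions like this over-determine the parameters, which also gives a consistency check on whether the functional form fits the client's stated preferences at all.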

    I’d be very interested in hearing any specific approach or approaches any of you can recommend, related to anything discussed here.

    • Hedging is a form of insurance and it sounds like your model is capable of estimating the cost of that insurance (price premiums for hedging) as well as the degree to which hedges protect your client against price spikes. I don’t see that much is gained by specifying a utility function. It can automate the decisions as you say, but at the risk of making the wrong decisions (perhaps all the time). Why not query the decision makers with the risk-reward profile from your model and have them choose what they like best? I suppose you could structure those choices and perhaps derive an automated procedure for making the decisions from their responses. But specifying a utility function seems to me like burying the necessary decisions under a rug. I often feel this way about optimizations – rather than “solving” the problem, I think the main value of models is to illuminate the choices that must be made – and to provide that information to the people who make the decisions.

      I can see the value of automating the process if, as you suggest, these decisions must be made often and in numerous places. But even then, posing the tradeoffs and having decision makers identify their preferred point on a risk-reward frontier might provide enough structure to automate the decisions without specifying an explicit utility function.

      • The company has about 450 facilities in about 150 substantially different electricity markets. Having humans make hedging choices several times a year for each facility is a burden they would like to reduce. If we can give automated advice that is pretty close to what they would decide with human judgment, that can save them some time and hassle. Also, it ensures a kind of consistency in the decisions, so there’s less chance of someone making a choice that is hard to justify if it turns out badly (and this is a real factor: people are in fact worried about being punished for ‘bad’ choices as judged by Monday-morning quarterbacks).

        • My job is not to fix their corporate decision-making, it’s to provide tools they can use to evaluate the statistical distribution of electricity costs that they’ll experience, conditional on the decisions they make.

          Yes, looks like a classic xy problem, which is why it doesn’t quite make sense: https://en.wikipedia.org/wiki/XY_problem

          The actual comparison they need is between the current uncoordinated method and the new proposed centralized method (whatever the details end up being).

          Also note local managers may have key insider info about stuff like maintenance of the power plants, grid etc that will never make it into a quantitative model.

          I am proposing that, despite its flaws, the current method may remain superior.* And the peculiar diversity of approaches used should be studied, and possibly mimicked by any centralized model. In particular, there could be multiple models deployed with different utility functions to avoid a “monocropping” problem.

          * Please consider that you have projected your own (and possibly the company contact’s) opinion (that it is “stupid”) and that is why the “old way” is not being used to inform your model.

        • Everyone is aware that the ‘right’ way to look at this problem is to consider the entire portfolio over a span of time, not to try to optimize the decision for each individual facility in each separate month. A diversity of approaches might make sense but it shouldn’t be a haphazard one that occurs purely because each individual facility manager is looking out for his own interests, with varying degrees of competence and attention to the issue.

          Overall I don’t think this is an XY problem so much as a classic misalignment of incentives, which happens all the time.

    • Each facility has its own electricity budget, typically maybe $80K per month.

      […]

      And this evaluation is quite asymmetrical: going over budget by $100K in a year is really bad, whereas going under budget by $100K is only moderately good.

      Seems like an xy problem, which is making it more confusing than need be.

      Some thoughts:

      1) Is your client the company, or these managers getting blame/credit? Seems like two partially conflicting sets of priorities.

      2) The actual concern should be something like widgets produced per $ spent on electricity. Looking at electricity alone the optimal decision is to shut down the facility.

      3) In the cases where prices jump to unprecedented levels theres gotta be substantial chance you won’t be getting electricity regardless of any contracts. Ie, counterparty risk is correlated with the price.

      4) There is no need to go 100% spot or futures. You can have a mix.

      • Anon, see, this is why I haven’t given up on you. These are good thoughts.

        1. Is your client the company, or these managers getting blame/credit? Seems like two partially conflicting sets of priorities.

        Absolutely right. The interests of the company are not perfectly aligned with the interests of the managers…arguably not even well aligned. Some people who are fairly high up in the company recognize this, but the problem hasn’t been fixed.

        I’m not even sure how to answer the question of who, exactly, is my client. I guess I’d have to say the guy who gives me my marching orders is really my client — he’s the one who tells me what to do, and who decides whether I’m doing a good job — and he’s one of the people who recognizes that the company’s policies for how they evaluate facility managers do not incentivize those managers to do what is best for the company. He is trying to centralize some of the decision-making around buying hedges so that at least that part of the decision-making is handled closer to optimally for the company, and individual facility managers would then be off the hook for some of the fluctuations in the energy budget.

        2. The company doesn’t make widgets (by any definition), it provides a service, but the widget example will do fine: I think any company in an energy-intensive business faces many of the same issues. Think about a company that makes aluminum cans. They sign contracts to sell a million cans over the next three months, at $0.18 each, a price that is moderately profitable on average but with a profit that is sensitive to energy costs. Unless they want to breach their contracts, they have to make the cans. They can take their chances on the real-time market, or they can hedge some or all of their energy purchases. In principle (and maybe in practice) they could also write the contracts such that the customer bears some of the energy price risk, but some customers won’t like that uncertainty and will either go to another vendor or will hedge their energy cost risk; either way, somewhere someone is facing the choice of whether to buy electricity on the real-time market or to buy hedges.

        3. “In the cases where prices jump to unprecedented levels theres gotta be substantial chance you won’t be getting electricity regardless of any contracts. Ie, counterparty risk is correlated with the price.”

        Right again! Relatedly, my client does have some contracts in which the purchaser of their service is on the hook for a portion of the energy costs…and in some cases, when those costs have been far higher than expected, the purchaser has simply refused to pay! As I understand it, in essence they said “we acknowledge that if you sue us you can make us pay, but if you do that then we will never do business with you again.” I don’t have a view into that part of my client’s business so I don’t know how common that is, but I know it has happened with at least one major customer.

        4. “There is no need to go 100% spot or futures. You can have a mix.”

        Right. It’s rare for them to be fully hedged or fully un-hedged, they usually hedge between 20% and 90% of their electricity purchases, depending on the perceived volatility of the market and other factors.

        • He is trying to centralize some of the decision-making around buying hedges so that at least that part of the decision-making is handled closer to optimally for the company, and individual facility managers would then be off the hook for some of the fluctuations in the energy budget.

          Sounds like the actual problem is “find a work-around for the culture relying on the poor heuristic of higher than expected electricity bill -> bad facility management.”

          The chosen solution is to transfer power/responsibility for the electric bill away from those managers.

          Then your task to answer something like:
          “Is hundreds of people doing their own thing actually worse than a new way of centralizing the screwups? In particular when dealing with ~ once per decade/century price spikes.”

          Or perhaps it is to find an impressive argument to justify the above, which is already presumed true.

          It seems to be something along those lines at least.

          I’d start with the principle of robustness via diversity. It seems to be chosen by nature to deal with those kinds of problems. Eg 15% of the pop is resistant to A but more sensitive to B/C/D. Then 15% is resistant to B, but tradeoff is more sensitive to A/C/D, etc.

        • Anon,
          I do think there are problems with the way the business is run — this is surely true of most businesses — but that’s almost beside the point. However the decisions are made, there should be some utility function and they should be trying to maximize utility. My job is not to fix their corporate decision-making, it’s to provide tools they can use to evaluate the statistical distribution of electricity costs that they’ll experience, conditional on the decisions they make. If you can help me with ways to think about the utility function, that would be great. If you’re just going to do your usual thing of saying everyone in charge of everything is stupid, that doesn’t help me, even if it were true.

        • What about your utility function? Presumably it involves some combination of “whatever optimizes value for the company” (professionalism) and “whatever maximizes my own benefit in terms of continued employment, etc.” (self-interest; restatable as “whatever makes my client happiest”). Your conundrum seems to arise at least partly from the fact that these two utilities are not aligned in this case.

          Perhaps working out the utility functions for the two separate cases and then determining your own priority on how to apportion the weights is the way to approach the problem.

          Another interesting aspect of the problem is that the company’s asymmetrical utility function should change over time as your algorithm gains acceptance and individual managers become better able to blame the algorithm rather than be blamed themselves.

    • Phil, did the blog eat some of your text in the middle related to less than and greater than signs?

      “we could define a utility U($x) = ($80K – $x) for x = $80K. If z > 1”

      I mean, I assume you introduced z somewhere and it was eaten.

      The first thing I suggest is making the problem dimensionless, then each facility can be compared to something appropriate for that facility. So something like E = (income – expenses) / value_at_risk. Since I happen to know that you’re talking about a storage facility value_at_risk might be something like the amount of spoilage that’s reasonably possible. Or perhaps the uninsured out-of-pocket spoilage possible or whatever.

      then, I really like logarithmic utility as a first pass… log(E) is the utility function for the facility, and you maximize its expected value under your model for the possible future outcomes based on prices etc. Now, however, you’ll also want some model for spoilage, so you can trade off spending electricity to prevent spoilage, vs saving electricity given low risk of spoilage.
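A sketch of what maximizing that expectation might look like, with every input hypothetical (in practice the expense scenarios would come from the price and spoilage models):

```python
import numpy as np

def expected_log_utility(income, expense_draws, value_at_risk):
    """E = (income - expenses) / value_at_risk, scored with log utility and
    averaged over simulated scenarios.  Log utility treats outcomes where
    the margin is wiped out (E <= 0) as infinitely bad."""
    E = (income - np.asarray(expense_draws, dtype=float)) / value_at_risk
    if np.any(E <= 0):
        return float("-inf")
    return float(np.mean(np.log(E)))

# Compare two hypothetical strategies for one facility-month:
rng = np.random.default_rng(1)
spot_bills = 80_000 * rng.lognormal(0, 0.5, 10_000)   # fat-tailed spot market
hedged_bills = np.full(10_000, 90_000.0)              # fixed bill, with premium
for bills in (spot_bills, hedged_bills):
    print(expected_log_utility(1_000_000, bills, 500_000))
```

The concavity of the log is what encodes the risk aversion: rare, very large bills are penalized much more than proportionally, without any extra tuning parameters.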

      One of the concerns might be if they allow the temp of the facility to rise to save money, assuming they’ll be able to tide over until prices drop (say in the evening) but then the power goes out and they can’t bring it back down until it’s restored.

      • Ha, yeah, that x=$80K is supposed to be (x less than or equal to $80K), and then there’s supposed to be another equation, U($x) = z ($80K – $x), for the case that (x greater than 80K). I wish this blog could handle mathematical symbols. Grr.

        You’re right that scaling everything to the individual facility makes sense, although most of the facilities are roughly the same size (in terms of dollars per year) so this is maybe a factor-of-two effect, or less, for most of the facilities. At any rate one of the two parameters in our current utility function is a scale parameter, adjustable with a slider in the computer program that we’ve written.

        “One of the concerns might be if they allow the temp of the facility to rise to save money, assuming they’ll be able to tide over until prices drop (say in the evening) but then the power goes out and they can’t bring it back down until it’s restored.”…

        Even if they just do normal operations, no attempt to change the temperature in response to price signals, they are still vulnerable to power outages.

        Long power outages like that are not impossible but (1) these places have huge heat capacity and can coast for hours without a problem; (2) they also have generators that are typically not big enough to power the whole facility but that can do enough to extend their problem-free period out to a day or more. Very few power outages last more than a day. Where they think there’s an elevated risk of such long outages, they install more generating capacity or storage. (Over their many facilities, they’ve got a big range of linear electric generators, batteries, solar, and conventional generators).

        Finally: if the price pattern of the coming day is fairly predictable, they can pre-cool when the energy cost is especially low, hold the temperature at that lower value with an energy consumption that is barely higher than normal, and then shut off the chillers when the energy cost is extremely high. In this paradigm the temperature never goes above the baseline temperature. This is their preferred approach.

        • I wonder how much of an effect they have on the risk of power outage. Like, suppose it’s a hot day and the grid is at risk. If they plan right and reduce temp overnight, then disconnect at noon and try to tide over until say 9pm … does this lower the risk that there will be an extended outage? Or are they big but not that big compared to a region so their contribution to reduced heating of transmission lines and transformers and etc etc doesn’t actually reduce their risk of losing power for extended periods?

          Anyway, I imagine if they hedge their power consumption a bunch, they’d just power through the hot part knowing they’ve got fixed electricity prices and let someone else take the financial risk… but does that increase their physical risk of power loss?

          Interesting questions.

        • So Phil, doing a more careful analysis, to get a dimensionless group you’d want to compare income to a rate of return on the size of the risk… my previous analysis had dimensions of 1/time. But also you want to compare buying hedges to buying bonds of similar risk.

          log((sales_income – operating_expenses + rate * unspent_budget) / (rate * size_of_facility) ) where the rate should probably be the interest rate they could earn on bonds with a similar risk profile to their operation. The two rates might not be equal.

          in the full model it’d be maximizing that expectation based on your model. Now, there’s a tradeoff between buying the hedges vs buying bonds with the money they would have used on the hedges.

          operating costs would include:

          1) spot cost of electricity
          2) cost of generator fuel
          3) cost of wear and tear on generators
          4) cost of damaged goods
          5) cost of hedges

          And there’s also a tradeoff between choosing to power through the heat with hedges, or cut off power and run the generator, including the costs of operating the generators.

          The sophisticated version of this would include automation of decision making a day ahead of time on cooling strategy, daily weather fluctuations, and an agent based simulation of a full year or quarter or whatever say hour-by-hour, with risk of power outage and spot price fluctuations included. You’d run batches of say 100 hour by hour simulations of say 3 month periods, and then stochastically find the optimal degree of hedging and generator usage.

          Hit me up if someone’s interested in going that direction.
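A much-simplified version of that stochastic optimization, with every number invented for illustration (the real thing would use the fitted price model, demand forecasts, outage risk, generator costs, and so on):

```python
import numpy as np

def monthly_bills(hedge_frac, n_sims=20_000, seed=0):
    """Simulate a facility's monthly electricity bill under a given hedge
    fraction, including a rare extreme price spike.  All parameters invented."""
    rng = np.random.default_rng(seed)
    demand_mwh = 2_000
    hedge_price = 38.0                           # $/MWh, includes the premium
    spot = rng.lognormal(np.log(35), 0.3, n_sims)
    spike = rng.random(n_sims) < 0.005           # once-in-a-long-while event
    spot = np.where(spike, 5_000.0, spot)        # ice-storm-style prices
    blended = hedge_frac * hedge_price + (1 - hedge_frac) * spot
    return blended * demand_mwh

def expected_utility(bills, budget=80_000, z=2.0):
    """Asymmetric piecewise-linear utility: overruns weighted z times savings."""
    u = np.where(bills <= budget, budget - bills, z * (budget - bills))
    return float(np.mean(u))

hedge_grid = np.linspace(0, 1, 21)
best = max(hedge_grid, key=lambda h: expected_utility(monthly_bills(h)))
```

Reusing the same seed across hedge fractions is deliberate (common random numbers), so the comparison across the grid isn’t polluted by simulation noise.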

  3. I thought that it was a settled issue sometime back in an interaction with S. Greenland in this blog that causal effects should be viewed as distributions. Perhaps someone with a better memory, or google prompt can provide a more explicit reference? Most medical treatments seem to fit in this extended causal framework. It is rare that treatments have an outcome that looks like a point mass or is well summarized by an expectation.

    • Roger:

      It came up in a conversation with Andrew Vickers as explained in this post from 2021. At the end of the post I wrote, “In comments, several people make the point that the two frameworks discussed above are mathematically equivalent, and there is no observable difference between them,” which is correct if you are going to do classical (Bayesian) decision analysis but not if you are using an asymmetric utility function, hence the new paper.

      • Thanks! That is precisely the post that I recalled reading, but couldn’t find. I’m not sure I understand your last sentence above: Doesn’t classical Bayesian decision analysis allow for asymmetric loss?

        • Roger:

          To be precise, the difficulty arises with a utility that depends on unrealized potential outcomes. We refer to this as “asymmetric loss,” but what’s relevant is that the asymmetry involves both y^1 and y^0. In classical Bayesian decision analysis, the utility depends only on the realized outcome.
