Speculating about assigning probabilities to speculative theories: Lots of problems here.

Asher Meir writes:

Whenever I see theories about ancient stuff I always feel it is very speculative. “This artifact is a stone ax from a hominid from c. 800,000 BP”. “The Samson story in the book of Judges is based on folk legends about a Hercules-like half-man, half-god figure, but edited to make it conform to a monotheistic worldview”. To the extent that these conclusions really represent the best understanding of experts, they sound to me like a maximum likelihood estimator when the likelihood function is very flat. In such a case, the most likely of all possible explanations is quite unlikely to be the right one.

I’m not really an expert in anthropology or in philology, but when I do look at the professional discussions – and not only at the conclusions – I still have the feeling that scholars are trying to weigh one explanation against others and agree which is the most likely of all explanations considered. I don’t really doubt that they are very good at this, but I’m not sure they realize how far “the most likely” can be from “likely”.

When you have a continuous variable, you can give some kind of confidence interval. You can draw it into your graph with interval bars or whatever. I think it’s worth thinking about how we could generalize this idea to disjoint theories. It’s either the humerus of a huge dinosaur or the femur of a smaller one. It’s either a stone ax or a sharpening tool. Either this ancient narrative took a legend about a superhuman figure and cut him down to human size, or perhaps it took stories about an actual historical hero and talked up his feats. Etc.

Instead of arguing over which explanation is the correct one – which as far as I can tell means arguing over which explanation is most likely to be correct – experts could argue over what probabilities to attach to each explanation. I often feel that when I read something like “experts agree that this is a dinosaur bone” it really means something like: “There is a 40% chance that this is a dinosaur bone, but also a 20% chance it belonged to a hippopotamus and a 10% chance for each of four other, comparatively improbable possibilities, including that it is not ancient at all.” Which is much more informative.

My reply: I think it’s fine to assign such probabilities. We should just be aware of some difficulties that always seem to arise when we try to do so.

First, there’s the political or religious angle. How can we say that there’s an essentially zero probability that Moses crossed the Red Sea or that that Mormon dude found gold tablets underground or that God said to Abraham, “Kill me a son” or whatever? For politeness we’re supposed to either give these sorts of things meaningful probabilities or just somehow avoid talking about them.

Second, there are all the things that stop people from freely betting on these probabilities: thin markets, the vig, betting limits that make it difficult to arbitrage, uncertainty in collection, etc. All of that arises even in bets with outcomes that can be measured (recall some of the implausibly high probabilities implicitly assigned to some outlandish events during the 2020 presidential election). With the dinosaur bone thing it’s even more difficult because sometimes you can never be sure about the answer.

The other challenge is how to think about the “all other possibilities” option. Once you include “all other possibilities” in the probability tree, you can’t do any sort of serious Bayesian inference because there’s no reasonable way to set up a probability model for the data conditional on reality being something not yet considered in the model.
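
To make that concrete, here's a toy calculation (all numbers invented): to put a catch-all category in the tree at all, you have to invent a likelihood for the data under "something else," and the posterior over the named hypotheses can be driven by that arbitrary choice.

```python
# Toy illustration of the catch-all problem (all numbers invented).
# Including "all other possibilities" requires choosing p(data | other),
# and there is no principled way to do that; the posterior over the
# named hypotheses swings depending on that arbitrary choice.

priors = {"dinosaur": 0.5, "hippo": 0.3, "other": 0.2}
likelihoods = {"dinosaur": 0.08, "hippo": 0.03}  # from models we can actually write down

def posterior(p_data_given_other):
    lik = dict(likelihoods, other=p_data_given_other)
    unnormalized = {h: priors[h] * lik[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: round(v / total, 2) for h, v in unnormalized.items()}

print(posterior(0.001))  # catch-all essentially ruled out
print(posterior(0.10))   # same data, catch-all now gets roughly 30% of the posterior
```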

My usual answer to this last point is to emphasize that all our models are conditional on some assumptions, which is fine, but it runs counter to the scientific goal of evaluating imperfect hypotheses about the world. That’s one reason, perhaps the key reason, that I consider Bayesian inference to be a valuable tool of science but not a good framework for scientific learning.

I use Bayesian inference to learn within models, to evaluate models, and to make decisions, but when I’m doing science or engineering I don’t have a sense of working with an overall Bayesian structure.

P.S. Unrelatedly, Meir writes:

I sent my “Gibbs sampling” thing to a few people, and it turns out almost none of them understood it. Sampling is a very common technique in rap music where you include a clip from a song, usually an old one, and you overlay it with new vocals, instrumentals etc. (Go to 1:45 here.) The Bee Gees are the Gibb brothers. (Brothers Gibb.) So a rap song that includes a clip from a Bee Gees song (there are many) is “Gibbs sampling”. I thought it was very funny but evidently the Venn diagram of the people who are into the various pieces of this riddle intersects only on me.

54 thoughts on “Speculating about assigning probabilities to speculative theories: Lots of problems here.”

  1. It’s an interesting idea, but suffers from several issues. I think the “all other possibilities” thing on its own is a dealbreaker. Aside from that, we’d have to contend with a lack of statistical literacy among humanities profs (not universal, but we can see how statistics on its own is tricky for people who do it every day – now we have to deal with a bunch of probability estimates from people who haven’t taken a stats class in decades). And a lack of consensus: Ken Ham will assign p(Moses crossed the Red Sea)=1, while Richard Dawkins will estimate it to be 0. Have we really gained any information? Even among non-celebrity historians, I fear that there will be a tendency to overestimate the probability of their pet theory (to the extent of giving 100% probabilities), which is something that can be seen even in the “hard” sciences (e.g. https://www.science.org/content/article/reversing-legacy-junk-science-courtroom).

  2. > The other challenge is how to think about the “all other possibilities” option. Once you include “all other possibilities” in the probability tree, you can’t do any sort of serious Bayesian inference because there’s no reasonable way to set up a probability model for the data conditional on reality being something not yet considered in the model.

    The possibilities in the denominator are the ones you know about. If someone comes up with a new one it changes the equation. This makes all the others less likely, possibly by a lot if it is a really good explanation.

    This is how humans think in general; I don’t see where the problem is.
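
    Here is a minimal sketch of that arithmetic (numbers made up): adding a new explanation to the denominator shrinks the posterior probability of every explanation already on the table.

```python
# Made-up numbers: prior x likelihood for the explanations currently on the table.
weights = {"stone ax": 0.30, "sharpening tool": 0.15}

def normalize(w):
    total = sum(w.values())
    return {k: round(v / total, 2) for k, v in w.items()}

print(normalize(weights))   # {'stone ax': 0.67, 'sharpening tool': 0.33}

# Someone proposes a third explanation that fits the evidence well...
weights["core for striking flakes"] = 0.25
print(normalize(weights))   # everything else becomes less likely
```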

    Of course, these days dogmatism (resistance to exploring novel explanations) is making a comeback. Newton would get called a conspiracy theorist for connecting seemingly unrelated dots (and he actually was paranoid, convinced that people were conspiring against him to hide women under his bed and such). But for a few hundred years after the Reformation, Western civilization was generally open to exploring novel ways of thinking about problems (i.e., populating the denominator of Bayes’ rule).

    I mean, what is the actual evidence that Julius Caesar crossed the Rubicon? If you look it up there is very little concern for such issues. But probably some manuscript written down 1000 years later, ultimately based on propaganda Caesar had produced. Then maybe some other document dating to the 1400s, supposedly based on something written by Cicero. And how were these documents dated? Carbon-14 dating, which was originally calibrated to the Bible (“known” dates of Egyptian artifacts) and then used to calibrate tree rings, which are then used to calibrate radiocarbon.

    People get upset when you question such things, but it is essential to populate that denominator with as many possibilities as you can.

    • There are hundreds of years of incredibly detailed scholarship by philologists and historians because of “concern for such issues”. Whole academic fields are devoted to them and there are pop books and textbooks as introductions. Just to get started, the best sources for Caesar’s revolt are Cicero’s contemporary letters. They have dates in the Republican calendar and consular years, and scholars went to a lot of trouble to rebuild the list of consuls and anchor it in absolute time. The manuscripts of Cicero’s letters are dated by paleography, i.e., the handwriting. The science of textual criticism works out how the manuscripts relate to what Cicero once wrote, and when the latest common ancestor of the surviving manuscripts likely dates from.

      There were scandals about the possible forgery of ancient texts in the 15th and 16th centuries! A good history of classical scholarship will get you started on how those scandals were resolved.

      • Thanks, I was just guessing Cicero.

        Which of Cicero’s letters describes the crossing of the Rubicon?

        What is the earliest dated manuscript, where/when was it found, and by whom?

        I think you’ll find getting the answers to these questions is not at all easy. There is no reason that must be the case; apparently most people just don’t care, so it doesn’t get put on Wikipedia, etc. “Says so in a bunch of books” is evidence enough.

        • Like most things, it’s much easier to find if it’s your profession to find it. Stock traders can bring up the financial details of a company and historical yields of various assets in their country in a few clicks, engineers can bring up properties of different alloys, and ancient historians and philologists can bring up sources for an ancient event and the history of manuscripts. If you want the manuscripts of Caesar’s “Civil War” or Cicero’s letters, you pick up what is called a critical edition, which has a list of manuscripts and a discussion of their strengths and weaknesses at the beginning (usually in Latin) and notes in the margins on how the various manuscripts differ from the printed text. These days many of the manuscripts are digitized; earlier you needed to check a printed catalogue. The Perseus Project has a translation of Cicero’s letters organized by date rather than correspondent: https://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.02.0022%3Atext%3DF%3Abook%3D16%3Aletter%3D11

          Histories of the Roman civil war and biographies of the protagonists discuss the basis for various stories like the “die is cast” quote. People writing for a mass audience leave most of this out because they think it’s too technical or because they think their readers want unambiguous answers. Wikipedia has its own special quirks: they have a taboo against citing primary sources! There have been cases where someone edits an article to cite the only ancient text that says X, and an editor reverts it to just citing some 19th-century reference book which misrepresents that text because of one of Wikipedia’s rules.

        • If you want a book which goes into the nitty gritty of a mysterious period of ancient history in plain language, I can recommend “Akhenaten: A Historian’s View” by Ronald T. Ridley. He does not discuss our general chronology of ancient Egypt in much detail; Peter James’ “Centuries of Darkness” is a bit old and technical, but it has not quite been disproved despite many attempts. Some people do try to present things which are provisional as facts which no sensible and right-minded person could question, but people are human (in the case of Egyptian chronology, every year or two I find a professional quietly saying that hardly anyone would accept the current system if it were published today, or noting some of the problems squaring it with archaeological timelines in other regions and then changing the subject).

        • Re: Wikipedia primary sources: If there’s a contemporaneous letter that says Caesar crossed the Rubicon on such and such a date, the Wikipedia article can’t cite that letter as evidence that Caesar crossed the Rubicon on such and such a date, but it would be able to say that “at least one contemporaneous letter claims that Caesar crossed the Rubicon on such and such a date” with the letter as citation.

        • John N-G: I am sure there are ways to get around Wikipedia’s absurd policies, but they take time and effort to learn and the editor with more time to play barracks-room lawyer has the advantage. I provided a link to the blog post about one editor’s experience.

      • Wow, I had assumed it was mentioned in Bellum Civile, but apparently it isn’t there either:

        Several contemporary authors wrote about the outbreak of the civil war, but the crossing of the Rubicon itself, now a famous symbol of Caesar’s irrevocable decision to invade Italy, does not take on special significance in the extant literary tradition until decades later. Caesar’s own commentarii record his passage from Ravenna to Ariminum but, perhaps not surprisingly, are silent about his crossing of the provincial boundary (B Civ. 1.8); Asinius Pollio, who was with Caesar at the Rubicon, almost certainly described the crossing in some detail, but his lost account may be glimpsed only through the filter of later writers, which makes it difficult to discover what significance, if any, he attached to the event; Cicero’s writings include a negative response to the start of the war and to Caesar’s actions in general, but they make no mention of the river. Early in the next century, however, the history of Velleius Paterculus provides the earliest extant mention of the Rubicon, marking the point in the surviving literature where the crossing itself begins to be synonymous with the outbreak of civil war, but Velleius includes no details of the event and does not ascribe to it any further significance (2.49.4).

        It is only in the literature of the Neronian period and later that we find fully developed “Rubicon narratives.” As a result, our modern view of the crossing depends almost entirely upon three relatively late passages: an episode in Lucan’s epic poem (1.183–235) and the full but differing accounts in the biographies of Caesar by Plutarch (Caes. 32) and Suetonius (Jul. 31–32). By this time, more than a century after the event, the Rubicon has taken on great interpretive meaning, looming large as the thin line between war and peace, between personal ambition and political consensus, and between the old and the new Roman state.

        https://www.jstor.org/stable/10.7834/phoenix.65.1-2.0074

        Before even getting into the manuscript aspect, we are generations separated from whatever actually happened.

        • Phil: I’m on the wrong computer, but see (IIRC) https://rambambashi.wordpress.com/2012/09/14/original-research-at-wikipedia/ I have personal experience trying and failing to get someone’s statement about their autobiography accepted (someone’s website with their CV is self-published and haram, but online journalism is OK, even though the journalist probably asked the subject and did no further checking too).

          Wikipedia has a mass of contradictory and indifferently enforced policies, and if you have endless time and are willing to learn their weird rules you can get things past a hostile editor, but it costs time and energy. German Wikipedia seems more scholarly/scientific than English Wikipedia.

        • And Phoenix is a very respectable classics / ancient Greek and Roman history journal! So the literature for researchers has lots of questions, but the average TV documentary or encyclopedia article avoids them because they don’t think their audience wants ambiguity.

          (OTOH, Cicero’s letters make clear that a few days after 12 January 49 BCE Caesar had led part of his army out of his province and towards Rome, so in that sense they document the crossing of the Rubicon.)

        • Sean,
          I recall that Andrew wasn’t able to add some information to his own Wikipedia page (such as the fact that he is married, and the number of children that he has), so he mentioned it in an interview somewhere and then was able to use that interview as a source. So, you’re right that they have some absurd policies. But it’s odd because they have a page that discusses their policies and they do not seem absurd.
          Maybe an issue with Andrew editing his own page is: how would Wikipedia know it was really him? If someone said “I’m Andrew Gelman and I have fourteen children”, how would they check? Whereas if there’s some interview in a newspaper or something then you can hope that the interviewer will have made sure they’re talking to the right person. I could see how a personal website would be a gray area as well.

        • @ Sean,

          What is your opinion on this “crossing the Rubicon”? From looking into it a bit now I am pretty confident the story is made up or at least heavily dramatized. It isn’t even clear what river “Rubicon” referred to at the time…

        • Anoneuoid: I guess the question is, what does “crossing the Rubicon” mean to you? There is a lot of evidence that around January 49 BCE Julius Caesar led an army from the Po Valley towards Rome where he seized power, leading to years of civil war and his assassination. Contemporary sources include Caesar’s Commentaries, Cicero’s letters, and coins dated by consular years. The specific details such as “did he say ‘alea iacta est’ as he watched his troops leave their province?” can be debated forever! I would agree that encyclopedias, documentaries, and mass-market books often present one story as the definitive version, but if you look closely you can find all kinds of debates.

        • This is the Suetonius account:

          As he stood there, undecided, he received a sign. A being of marvellous stature and beauty appeared suddenly, seated nearby, and playing on a reed pipe. A knot of shepherds gathered to listen, but when a crowd of his soldiers, including some of the trumpeters, broke ranks to join them, the apparition snatched a trumpet from one of them, ran to the river, and sounding the call to arms blew a thunderous blast, and crossed to the far side. At this, Caesar exclaimed: ‘Let us follow the summons, of the gods’ sign and our enemy’s injustice. The die is cast.’

          https://www.poetryintranslation.com/PITBR/Latin/Suetonius1.php#anchor_Toc276121822

          It is written over a hundred years after the event and involves a supernatural pipe player.

          I’m sure there is something that happened that the story is based on, but it doesn’t seem any more reliable than the Bible as a source of factual info.

          And once again, that isn’t even getting to the issue that the source manuscripts probably come from ~1000 years later.

    • Anoneuoid –

      > Of course, these days dogmatism (resistance to exploring novel explanations) is making a comeback.

      > > I mean, what is the actual evidence Julius Caesar crossed the rubicon?

      Citation needed. What’s the actual evidence that (1) dogmatism ever left in a relative sense and/or that it’s (2) making a comeback?

      Just ’cause you might feel there’s an identifiable pattern of change doesn’t make it so.

  3. Asher Meir writes: “I sent my “Gibbs sampling” thing to a few people, and it turns out almost none of them understood it. […] I thought it was very funny but evidently the Venn diagram of the people who are into the various pieces of this riddle intersects only on me.”

    Clearly, you have the wrong set of friends. :-) Or maybe they’re just too young. I loved it, myself.

  4. Asher Meir needs to read press releases and science journalism better. Journalism about research tends to exaggerate conclusions, exaggerate novelty, and be vague about how the conclusion was reached and other possibilities. That is just the nature of the system of journalism (including, more and more often, university public affairs offices). If you want the details you need longer publications by experts aimed at an audience with skills and background knowledge.

  5. Interesting, but I have a very hard time imagining how one assigns a probability to the options in “either this ancient narrative took a legend about a superhuman figure and cut him down to human size, or perhaps it took stories about an actual historical hero and talked up his feats.” If you ask someone for a probability for something like this, you’ll just get a made up number that’s worse than no number at all — a meaningless quantitative veneer. In fact, the only decent idea that comes to mind for devising a sensible probability would be to calculate the fraction of professionals in the field who hold various non-probabilistic opinions! (E.g. 80% of historians think this ancient narrative took a legend about a superhuman figure and cut him down to human size.) Not that that’s great, but it’s better than pretending that “one historian assigns an 80% probability to the statement that this ancient narrative took a legend about a superhuman figure and cut him down to human size” means anything more than “he/she thinks it’s pretty likely.”

    4 out of 5 dentists agree with this comment.

    • “If you ask someone for a probability for something like this, you’ll just get a made up number that’s worse than no number at all — a meaningless quantitative veneer.”

      +1

      Scientists’ estimates of the probability that their own idea is the correct one will always be subject to the Lake Wobegon effect (…where all the children are above average).

      There is nothing more boring than someone’s likelihood estimate expressed as a probability about some nonquantitative phenomenon. The algorithm that produces the number might be interesting, but the number not so much.

  6. Andrew, you say: Once you include “all other possibilities” in the probability tree, you can’t do any sort of serious Bayesian inference because there’s no reasonable way to set up a probability model for the data conditional on reality being something not yet considered in the model.

    I’m confused by that comment because once you include “all other possibilities” in the probability tree, you ARE considering it in the model, aren’t you?

    I’m currently working on a project that requires modeling the statistical distribution of future electricity costs. How much will a megawatt-hour of electricity cost in Chicago in July 2024 (on average over all weekday afternoons), for example, for commercial customers who are on a real-time pricing plan? I fit a model based on historical data, assuming a shifted lognormal distribution of prices…but I don’t think a lognormal distribution has the right tail. In any market like this there is a small but non-negligible chance of something crazy happening. For example, there was an ice storm in Texas in February 2021 that shut down a huge amount of generating capacity and led to a huge price spike. The electricity price is usually something like $40/MWh in February in Texas but it went to $6000/MWh for a week. In many markets we have no data on any actual crazy events like this, but the potential is always there: terrorist attack, nuclear power plant taken offline, unprecedented weather event, whatever.

    Anyway, in most markets we have a decade of data that don’t contain any crazy months, or maybe have one crazy month. Using the non-crazy data we generate a forecast distribution for future months under the assumption that they’re non-crazy. But the model we actually use is a mixture model, in which the bulk of the distribution is non-crazy but we have a little admixture of craziness, in the form of another lognormal distribution with a high geometric mean (GM) and geometric standard deviation (GSD), and we assign a bit of probability to it. We have some information from the futures market that we can use to determine what ‘the market’ thinks the risk of a crazy price is, but even beyond that we add a soupçon of craziness probability because we think the market systematically underestimates, or at least underprices, the risk of an extremely rare but extremely costly event.
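
    Something along these lines, roughly (the numbers here are purely illustrative, not our actual fitted values):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_prices(n, p_crazy=0.02,
                    gm_normal=45.0, gsd_normal=1.3,   # $/MWh; illustrative only
                    gm_crazy=600.0, gsd_crazy=3.0):
    """Two-component lognormal mixture: a non-crazy bulk plus a small
    admixture of a much wider, higher 'crazy' component."""
    crazy = rng.random(n) < p_crazy
    normal_draws = rng.lognormal(np.log(gm_normal), np.log(gsd_normal), n)
    crazy_draws = rng.lognormal(np.log(gm_crazy), np.log(gsd_crazy), n)
    return np.where(crazy, crazy_draws, normal_draws)

sims = simulate_prices(100_000)
print(np.percentile(sims, [50, 95, 99, 99.9]))  # the far tail comes from the crazy component
```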

    So…isn’t this an example of including “all other possibilities” in the probability tree?

    • I think including an infinite-variance / high-kurtosis mixture does a good job for risk management and captures the possibility that electricity will be $100/kWh, but it still doesn’t capture the possibility that North Korea or Russia launches nuclear strikes on major cities in the US and electricity prices in Chicago go to infinity, though I think there is a discrete, very low but non-zero probability mass on that.

      • somebody,
        There may well be a trick or method that I’m missing, but I couldn’t find a good way to calibrate (or apply prior information to) a single long-tailed distribution. We have a fair amount of historical data, almost all of it from non-crazy months, so the distribution needs to be consistent with that. But we also need to have something like a 1% chance that a future month has a crazy price (more than 3 GSD higher than the GM of the non-crazy months)…but it can’t be too crazy (which of course we can impose by applying a constraint on the upper end if we need to). It’s the bit about needing to conform to the distribution of historical data, while still having a tail that makes sense, that makes it tricky. At least to me. Anyway I found it easiest to just conceptualize it as two separate distributions. It’s easy to think about ‘how likely is it that there will be some huge shock to the system, and, if that happens, what will the price distribution be.’ (It’s also relevant that that is an easy concept to explain to my client.)

        In the ‘all models are wrong’ sense, the mixture model is obviously wrong. It’s not really the case that months are either crazy or non-crazy. What if the Texas ice storm had lasted half as long or had only knocked out half as much capacity? Or 1/4? Or 1/8? How many grains of sand are in a heap, and how extreme does an event have to be in order to cause a crazy month of electricity prices? So I’m not going to defend it as the ‘right’ model. But it is easy to understand, explain, and implement. At least for me, that wasn’t the case when I tinkered with some single long-tailed distributions.

        • I’ve done a fair amount of probabilistic modeling myself, including two books on the subject, so I do believe it is feasible (and worthwhile) to specify as many possibilities as possible. But I also believe in Black Swans – which by definition cannot be quantified (or perhaps I should say “should not”). The existence of Black Swans need not render the quantified model useless – but it should impose some humility on its creators. The real problem is when you start believing your models “are” reality rather than imperfect models of it.

    • Phil. The difficulty comes when you have higher dimensional outcomes. Instead of say price of electricity in Chicago consider price of electricity in every major commercial market in the US simultaneously. I don’t know how the markets are structured exactly. But let’s pretend it’s at least one per state, and maybe in states like CA it might be several separate markets, like SoCal, NoCal, and various regional markets in rural areas….

      So now you need to predict let’s call it 100 dimensional vector of prices. How do you include “all the ways in which weird stuff could happen?”

      I mean, Chicago goes through the roof from a major ice storm, and Florida is wiped out by a hurricane and Texas suddenly has too much electricity capacity because it can’t export… And SoCal has a lot of sun but it’s medium cool weather… Etc etc

      • Sure, indeed we have discussed exactly this sort of thing: the company shouldn’t be optimizing their decisions at each location independently, they should take into account the relationship between them. The fact that all of Europe is now experiencing historically high energy prices at the same time is a great example.

        But one shouldn’t just throw up one’s hands and say “brew me a pot of tea, I must consult the tea leaves.” You’ve gotta do the best you can.

        • Why not? We have a mixture model that includes a long-tailed distribution, through which we try to capture “all other options” that we don’t explicitly model. We know we don’t have enough events from that distribution to model it on its own so we rely mostly on the prior, which in this case is mostly guesswork.

          Both you and Dale suggest there’s something wrong with this but I’m not seeing why. Please explain using small words and tell me what you think I should do instead.

        • Phil
          I am not objecting to what you have described that you did. That is, unless you are saying that your model has included “all other options” in which case I do object. I’m sure I can think of options that your mixture model does not cover, but as I’ve said, some things are just not quantifiable nor should we believe we can quantify everything.

        • Dale,
          The upper component of our distribution is intended to cover “all other options” that are not modeled in component 1. Trivially, it cannot do so accurately because we have nowhere near enough data to characterize the distribution, nor even to know how frequently we should draw from the upper distribution rather than the lower. So if you want to argue that we aren’t _correctly_ modeling component 2, well, what can I say other than “of course we aren’t, how could we?”

          But you seem to be saying something stronger, which is that we needn’t _try_ to quantify the frequency or consequences of “anything else”, and that seems wrong to me. We have to somehow make a decision on how much electricity to buy in advance (if any), and how far in advance. It’s one of those situations in which even “no decision” is a decision. We can decide to ignore the possibility that “anything else” might happen, but that’s equivalent to assigning a probability of 0 that “anything else” will happen. Or we can make up a statistical distribution that is intended to capture “anything else” as well as we can, recognizing that we aren’t going to do it perfectly. The latter seems much better to me than the former, especially since the “anything else” possibilities are largely responsible for the decision to buy in advance.

          Are we talking past each other somehow, or do you think I’m missing something?

        • Phil. If you talk about *one* location then the outcome is a scalar “spot price of electricity” and yes, you can put a probability distribution over all possible outcomes, just have some long tailed distribution whose support includes all quantities from -Inf to Inf.

          But suppose you have 1000 different locations. You need to place a 1000 dimensional multidimensional distribution over the entire spot-price vector. Said distribution has to meaningfully include in the high probability region all the plausible ways in which things could go wrong, and NOT include in the high probability region any of the things which are horribly implausible that you could potentially encode (such as for example every neighboring metropolitan area having prices either going to 0 or to $10k/MWh in alternating order).

          The task of encoding reasonable priors on 1000 dimensional spaces is really hard. You’ll have to pick some specific methodology. And I guarantee once you do I can sample from that vector and find some problem with the result. You’ll agree it’s a problem. You’ll tweak it, and then we’ll do some sampling, and again it’ll have a problem. It’ll just always have a problem. There’s just too much volume in 1000 dimensions!

          I mean, you can make an attempt, and you should, and there are some smart ways to go about this stuff I think. But it’ll likely always include a bunch of wacky outcomes as similarly probable to very mundane everyday outcomes, because it’s just hard to whack-a-mole all the weirdness.

        • Let’s not forget that in addition to the spot price, there’s also the question of how much generation capacity is really available at that price. In a totally unregulated market perhaps when generation capacity goes to 100kW for all of Chicago, some super rich dude can just pay $1M/kWh and keep his penthouse suite comfy. But in reality we all know that it doesn’t really matter what the price is: if there’s just one guy with a portable diesel generator for all of Chicago, basically no one is getting power except him, no matter what. So the dimensionality of the problem is a little more complicated than just spot prices; it’s also about transmission lines being intact and generators being online and fuel being available, etc. Maybe your company has insurance against the price of electricity going above $2500/MWh, but in reality they can’t buy power at all even though the insurance is obligated to pay for it.

          When I say “all the things that could happen” it really means a very high dimensional space of complexity that includes transmission line status and whether the Independent System Operator is even online or has been taken out by a terrorist denial of service attack on their computer infrastructure, or etc.

        • Phil
          With respect to whether you should try to include “everything” in your probability distributions, I don’t think you should. As long as your distribution is continuous and has tails that include -Infinity to +Infinity as possibilities, you could say that everything has been included. An EMP could knock out all generating capacity – the resulting impacts would be included in that probability distribution although the estimated probability of that event was not derived from anything meaningful. To try to include a meaningful estimate of the probability of such an EMP or the probability of large meteor crashes impacting both supply and demand strikes me as a bad idea – not that they are impossible, but I don’t see any legitimate way to model these, given our current state of knowledge (subject to change, of course).

          Probability distributions are great ways to model many uncertainties, but not all. For low probability-high consequence events with no good ways to estimate probabilities, there are other techniques that lead to better decision making. More qualitative analysis can be a better aid to decision making than forcing the quantitative models to try to include everything. I think the risk is two-fold: first, the quantitative models become less useful by mixing fairly solid analysis with mostly speculative hand-waving, and second, the quantitative model is not likely to capture adequately the likelihood or consequences of Black Swans.

          I think the issue of where to draw the line between what is included in your model and what is not is case dependent and something every analyst must decide. Ultimately, the choices you make will be tested by your consulting clients, regulators, and perhaps even in courts of law. If your model did not include the possibility of meteor impacts on electricity prices, I don’t think your work will be undermined, but if you fail to include potential record breaking cold weather in Texas, then I’d say your model is inadequate.

        • Phil wrote:

          > The upper component of our distribution is intended to cover “all other options” that are not modeled in component 1. Trivially, it cannot do so accurately because we have nowhere near enough data to characterize the distribution, nor even to know how frequently we should draw from the upper distribution rather than the lower. So if you want to argue that we aren’t _correctly_ modeling component 2, well, what can I say other than “of course we aren’t, how could we?”

          I think this discussion has gone pretty far afield from what Andrew was talking about. An example of the sort of problem where you list, evaluate, and resolve every possibility is trying to figure out why your car won’t start. It is still science and it still requires a logic model of the system, but every part can be tested P/F and you just go do it. Perhaps if one were to try and predict which part failed from a limited data set, Bayesian reasoning might be useful, but in most real-world applications there is no need for anything other than brute force testing.

          The model used to figure out why your car won’t start is what I call a “Tic-tac-toe model” (I have a pet name for every concept!). While the complexity is just beyond what we can lay out in our head – it’s hard for us to see all possible ends of a tic-tac-toe game after the second move but easy for a computer – everything can be determined to an adequate degree through testing. As soon as you add a single distribution – a variable that can only be estimated – you have a different type of problem where it becomes meaningful to talk about different modes of reasoning.

        • Dale,
          If all of the generating capacity is knocked out then there is no electricity to buy and your electricity cost will be zero. The cost per MWh would be undefined in that case.
          If that’s an example of what you mean when you say we shouldn’t try to model everything, I guess I agree. In a case like that our client would be unable to fulfill their contractual obligation of keeping their warehouses below a specified temperature in order to keep their clients’ food frozen. (Well, actually they have a fair amount of emergency generating capacity at most of their sites that would let them make it through a day or so without external power, but at some point they will be unable.) I don’t know what their contracts specify in a case like that, or whether this would end up in court to decide whether there’s a force majeure exception or something similar.

          But we really are trying to create a distribution — actually it’s a joint distribution of (price, quantity) — that includes every possible (price, quantity) combination that leads to nonzero electricity costs.

          Do I really mean _every possible_? Maybe not. For instance, in most markets there is a limit, set by regulators, on how high the price can go. But what if there’s an emergency and the relevant legislatures pass a law that removes or increases that limit? Should we try to cover that possibility? I think you’d say No. I might agree that in practice we aren’t going to try to do that, which I guess is a weak way of agreeing with you. But I would also say that in principle we _should_ try to do that; the reason not to do so would be pragmatic rather than principled. It would be a question of time and effort expended vs. very low potential gain in the form of improved recommendations.

          Daniel,
          Yes to your point that we have to do something specific; that’s my point too. We need to be able to turn the crank and get out an answer: we recommend buying such-and-such amount of electric energy in advance. We have statistical models that forecast how much energy a facility will need to buy, and what the price will be. Actually we have a joint distribution of these. But we believe the resulting joint distribution from our models does not represent reality, because the models are trained on data from the past but things can happen in the future that have never happened in the past. Specifically, we think there is a higher chance of extremely high future prices than is predicted by our models. We are trying to address that by basically manually adjusting the price distribution so that it no longer has that known shortcoming, or, more precisely, so that we no longer know how to change it to make it better. I (still) don’t understand what is wrong with that approach. Surely it’s better than proceeding with a model that has known shortcomings?

          You keep mentioning the multidimensional problem of dealing with many facilities simultaneously. I agree that is a hard problem but I don’t understand why you keep bringing it up in this context. The fact that we aren’t going to be able to model it perfectly… sure, we can’t even model a single facility perfectly; there’s absolutely no chance that we could even do the joint decision for two facilities perfectly, and our client owns many more than two facilities. But I don’t see how this leads to the conclusion that we shouldn’t do the best we reasonably can. Here’s my request: instead of pointing out the difficulties and telling me I’m doing something wrong, tell me what I should do instead.

        • Phil, I agree with you that you have to do the best you can. I just think that the higher the dimensionality of the problem the more likely you are to be doing something that you know is obviously wrong but you don’t realize you’re doing it.

          What should you be doing instead? I’m not sure. Sometimes I think rather than working with continuous distributions you might be better off working with a mixture model of discrete distributions… i.e., a “basket of events you consider worth considering”. That approach has the problem that it won’t include lots of possibilities… but it also has the advantage that your distribution won’t include a lot of **impossibilities**.

          You can of course fuzz up this basket a little… a Gaussian mixture model or what have you. Sometimes having a restricted model which doesn’t include every possible thing that might happen might ultimately give you better probabilities, because it’s also not erroneously putting probability on a lot of things you think **couldn’t** happen.
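
          Something like this toy version, say (every scenario, weight, and price here is invented):

```python
import numpy as np

rng = np.random.default_rng(7)
regions = ["Chicago", "Houston", "LA"]

# scenario -> (weight, mean $/MWh by region, lognormal fuzz on the log scale)
basket = {
    "normal summer":          (0.90, [ 45,   40,  50], 0.15),
    "midwest heat wave":      (0.06, [250,   60,  55], 0.40),
    "texas grid emergency":   (0.03, [ 60, 3000,  55], 0.60),
    "west coast fire + heat": (0.01, [ 55,   50, 800], 0.50),
}

def draw():
    names = list(basket)
    weights = [basket[s][0] for s in names]
    name = rng.choice(names, p=weights)
    _, means, fuzz = basket[name]
    # fuzz each named scenario a little rather than trying to cover the whole space
    return name, np.array(means) * rng.lognormal(0.0, fuzz, len(regions))

print(draw())
```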

        • I think mixing in a fat-tailed distribution works fine for decision making insofar as the only thing that’s relevant for decision making is the single variable of price. Not to re-open a beaten-to-death argument, but if your utility function only depends on price and cost, then it doesn’t matter why prices rose above $1000/kWh; so long as your model has decently calibrated positive mass above $1000/kWh, it functionally “accounts for everything”.

          But if prices spiking to $1000 for a snowstorm means x is the best bid but prices spiking to the same $1000 for a coronal mass ejection means y is the best bid, then the approach meaningfully diverges from “accounting for everything”.
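
          To put the first part concretely, a toy sketch (everything here is invented): if the objective is just expected cost as a function of price, the advance-purchase decision only needs a decently calibrated price distribution, however its tail was generated.

```python
import numpy as np

rng = np.random.default_rng(3)

# Mixed sample standing in for "a calibrated price distribution with a fat tail."
spot = np.concatenate([rng.lognormal(np.log(45), np.log(1.3), 9_800),
                       rng.lognormal(np.log(600), np.log(3.0), 200)])   # $/MWh

forward_price = 60.0   # $/MWh available today (illustrative)
need = 100.0           # MWh we will end up consuming

def expected_cost(buy_ahead):
    # buy some now at the forward price, buy the rest later at spot
    return np.mean(buy_ahead * forward_price + (need - buy_ahead) * spot)

for q in [0.0, 50.0, 100.0]:
    print(q, round(expected_cost(q), 1))   # the decision depends only on the price distribution
```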

        • The price ceiling example is a good one. If prices go to $1000 because of a price ceiling, shortages are likely, so the optimal early purchase is higher than if prices go to $1000 because that’s just where they stopped. In this case, even a perfect price oracle that predicts the exact price with 0% error could be improved on by accounting for more stuff. But you can only do the best you can do; erring on the safe side by mixing in a heavy-tailed distribution that’s biased higher where utility has a steeper derivative is, I think, a good application of the precautionary principle.

        • I’ll stick to the price ceiling example. Yes, prices are regulated – but as you (Phil) suggest, under extreme circumstances legislatures might change the rules. Or perhaps a case will be brought to the Supreme Court that will cause regulation of electricity prices to be declared an unconstitutional taking (please let’s not debate whether that is possible or not – we can always find a different example). Similarly, a comet strike could wipe out all generating capacity except for x%, where x is small. I’ll agree that your analysis should try to include all such events to the extent reasonable – at least that is the goal.

          However, from my experience, if you maintained that your mixed distribution approach has taken these things into account, then I think you are on very shaky ground. There is a good chance your analysis will be disregarded. I, for one, would not want to be on a witness stand testifying to such analysis. And, if I failed to account for such events (with the justification that there was no reasonable way to assign probabilities), I don’t believe it would hurt my credibility at all.

          Phil, you seem to be taking a theoretical point (which I largely agree with) and forcing it into a practical application where it doesn’t fit. Drawing the line between what has and has not been included in your analysis is, to a great extent, what your expertise is about. You will always be open to criticism that you didn’t include A, B, or C, but the test of that will be whether there was enough information/methodology to reasonably include it. If you maintain that your probability distribution has included “all” possibilities, then good luck to you. I would never want to testify to that.

    • Worth pointing out that, under current market rules, the probability energy prices go to $6,000/MWh in Chicago is 0%, since the current tariff limits energy prices to $3,750. (https://pjm.com/-/media/committees-groups/committees/mic/2021/20210407/20210407-mic-info-only-lmp-during-reserve-shortage.ashx).

      In line with what others have pointed out above, I’m not sure that estimating the probability of extreme events (in this particular case) is as impossible as is suggested. (Not saying it’s easy, either, and getting an accurate distribution is likely very challenging.) But you can use generation mixes, projects currently in development on the interconnection queue, estimated changes in load, forward prices for coal and natural gas, etc. to get a decent idea of what set of contingencies would be needed for extreme events to occur.

      • Yes, even Texas has decreased the maximum…which makes no sense to me if they aren’t going to require hardening of the infrastructure.

        The point of Texas’s wild-west-style (very lightly regulated) market is to let the magic of markets do the work. If there’s a chance of an ice storm and most wind turbines are not ice resistant, someone will invest the money to make ice-resistant turbines in the hope of earning the money back (and more) when electricity goes to thousands of dollars per MWh. It turns out companies were not willing to invest the extra money because they didn’t figure the payoff would be high enough, so they had major blackouts and brownouts and it was pretty disastrous — including some people dying — with many unhappy customers (and voters). A source of unhappiness was that some people got enormous electricity bills because they continued to use electricity when the price went sky-high. A reasonable response would be to require new production to be more weatherproof. Or they could increase the upper limit of the charges and thereby double down on the theory that if companies stand to make enough money by being last man standing they will invest the money to harden their infrastructure. But to decrease the max charge and not take other action…I don’t get it.

        That said, I’m not an expert on the Texas energy market and I may be missing something.

        • “But to decrease the max charge and not take other action…I don’t get it.”

          ‘ “This just sort of takes us back to where we were,” Jones said. “The ($9,000) price created a lot of sticker shock, and the proof is in the pudding — it didn’t work” ‘

          In effect the utility commission is telling the generators that they got the deal they asked for and they didn’t invest in capacity. So the utility commission is tweaking the deal to change the incentives:

          “the utility commission…is considering a related measure to allow the lower ceiling of $5,000 per megawatt hour to kick in earlier than the $9,000 cap did. Commissioners have said their aim is to prompt generators to bring capacity online before power reserves on the grid fall to crisis levels, as well as to maintain enough financial inducements for generators to invest in infrastructure.”

  7. Hi Andrew, long time reader, first time commenter, just wanted to say how much I enjoy and look forward to your posts. Thanks again for sharing them with the community.

    K

  8. When I was an undergrad, there was one anthropologist in the department who was skeptical about many of the archeological claims, and every year in one of the basic archeology courses would give the “anti-lecture”. Basically he would take things from modern society like Playboy centerfolds or pictures on currencies, and interpret them as if they had been archeological relics. Not that he was “right”, but it was illuminating to see how easy it was to draw what seemed like reasonable conclusions based on those “relics” that were in fact quite wrong.

  9. Wait, is the skepticism about the stone axe about its date or something else? Different dating methods may be more or less subject to error and uncertainty (eg contamination), but the fundamental theory underlying them is often rather well understood, and tends to properly accommodate various sources of uncertainty, at least in the dating papers themselves. I think it’s actually the more “statistical” methods of dating that do a poor job there! (e.g. evaluating model / prior sensitivity in phylogenetic divergence time estimation). Fundamentally I don’t think we’re too outside the wheelhouse of e.g. predicting when we’ll next see a comet (not for another 50,000 years! etc.) , depending on the method involved.

    If it’s about the tool typology (calling it an axe), you should take care to not import any functional associations from colloquial uses of the similar term. You *can* do stuff like use-wear analysis to inform functional inference (with lots of uncertainty still!), but there shouldn’t be any assumption that 800ky old bifacial hand-axes were used for chopping trees or anything (last I heard, a more common idea was that they were little portable multitools — need a specific type of flake? pop whatever you want right off. But there was still lots of disagreement there!)

  10. Glenn Shafer’s evidence theory is the right mathematical framework for uncertain reasoning if one fears that something may happen that one is not even able to specify. The possibility set, called the frame of discernment, is not an algebra. No complementation operation is available, hence no residual event can be defined. If one assigns a mass to the belief that some “unknown unknown” may appear, the computations turn out differently from the standard Bayesian framework. Here is the basic book, as well as a couple of introductory papers (a toy calculation follows the references):

    – Shafer, G. (1976) A Mathematical Theory of Evidence. Princeton University Press.
    – Fioretti, G. (2009) Evidence Theory as a Procedure for Handling Novel Events. Metroeconomica, 60 (2): 283-301.
    – Koks, D. and Challa, S. (2005) An Introduction to Bayesian and Dempster-Shafer Fusion. WP DSTO-TR-1436. Available at http://robotics.caltech.edu/~jerma/research_papers/BayesChapmanKolmogorov.pdf
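
    As a toy illustration (numbers invented), in the dinosaur-bone example from the post, mass can be left on the whole frame rather than split among the named hypotheses, and belief/plausibility then bracket each hypothesis instead of giving it a single probability:

```python
# Frame of discernment for the bone example; 0.4 of the mass is left
# uncommitted on the whole frame ("could be something we haven't pinned down").
frame = frozenset({"dinosaur", "hippo", "not ancient"})
mass = {
    frozenset({"dinosaur"}): 0.4,
    frozenset({"hippo"}): 0.2,
    frame: 0.4,
}

def belief(a):        # total mass that definitely supports A
    return sum(m for s, m in mass.items() if s <= a)

def plausibility(a):  # total mass that does not rule A out
    return sum(m for s, m in mass.items() if s & a)

a = frozenset({"dinosaur"})
print(belief(a), plausibility(a))  # 0.4 0.8 -- an interval, not a single number
```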

  11. “This artifact is a stone ax from a hominid from c. 800,000 BP” reminds me of how a judge assigns likelihood to suspects. Presumably a judge uses probabilities in their mind as well, and it can be quite cumbersome to think about how high the probability threshold should be for a judge to reach a verdict, which is always a binary action.

  12. > How can we say that there’s an essentially zero probability that Moses
    > crossed the Red Sea or that that Mormon dude found gold tablets
    > underground or that God said to Abraham, “Kill me a son” or whatever? For
    > politeness we’re supposed to either give these sorts of things meaningful
    > probabilities or just somehow avoid talking about them.

    A myth is still a myth, regardless of whether some people don’t know that it is a myth. Whether you should point out to someone that they are in error depends on the context. In a serious book or lecture, you need to stick to the facts. In a social event, you can avoid topics where the other people are confused.

  13. I thought the Meir discussion was fine until he switched from talking about likelihoods to talking about probabilities. His broad point is that it would be useful to retain in the popular discussion the possible alternatives that are almost as likely as the currently favored one. It seems sensible to state that experts currently think it is half as likely that the bone is a hippopotamus bone as a dinosaur bone. That doesn’t preclude thinking that future discoveries or theories might bring up a new alternative that swamps either of the two currently favored ones.

    • > It seems sensible to state that experts currently think it is half as likely that the bone is a hippopotamus bone as a dinosaur bone. That doesn’t preclude thinking that future discoveries or theories might bring up a new alternative that swamps either of the two currently favored ones.

      There is always the possibility it’s an alien bone.
