‘No regulatory body uses Bayesian statistics to make decisions’

This post is by Lizzie. I also took the kitten photo — there’s a white paw taking up much of the foreground and a little gray tail in the background. As this post is about uncertainty, I thought maybe it worked.

I was back east for work in June, drifting from Boston to Hanover, New Hampshire and seeing a couple colleagues along the way. These meetings were always outside, often in the early evenings, and so they sit in my mind with the lovely luster of nice spring weather in the northeast, with the sun glinting in at just the right angle.

One meeting took place on a little sloping patch of grass in a backyard in Arlington, where I chatted with a former postdoc who now works for a consulting company tightly intertwined with the US government. When he was in my lab, he and I learned Bayesian statistics (and Stan) together, and I asked him how much he was using Bayesian approaches now. He smiled slyly at me and told me a story about a recent meeting where one of the senior people said:

“No regulatory body uses Bayesian statistics to make decisions.”

He quickly added that he’s not at all sure this is true, but that it encapsulates a perspective that is not uncommon in his world.

The next meeting was beside the Connecticut River, with a senior ecologist who works on issues with some real policy implications: how to manage beetle populations as they take off for the north with warming (hello, or should I say goodbye, New Jersey pine barrens), the thawing Arctic, and more. I asked him whether he thought this statement was true. He didn’t answer, but instead set off on a different declaratory statement:

“The problem with Bayesian statistics is their emphasis on uncertainty.”

Ah. Uncertainty. Do you think uncertainty is the most commonly used word in the title of blog posts here? (Some recent posts here, here and here.)

In response to my colleague I may have blurted out something like ‘but I love uncertainty!’ or ‘that is a great thing about Bayesian!’ and so the conversation veered deeply into a ditch, from which I am not sure it ever recovered. I said something along the lines of: isn’t it better to have all that uncertainty out in the middle of the room, rather than trying to stuff it under the cushions of the sofa, as I feel so many ecologists do when they build their models in sequential steps, dropping off uncertainty along the way (often using p-values or delta-AIC values of 2 or …) to drive ahead to their imaginary land of near-certainty? (I know at some point I also poorly steered the conversation toward my thoughts on whether climate change scientists have done themselves a service or disservice in shying away from communicating uncertainty; I regret that.)

We left mired in the muck that so many of the ecologists around me feel about Bayesian — too much emphasis on uncertainty, too little concrete information that could lead to decision making.

So I pose this back to you all: what should I have said in response to either of these remarks? I am looking for excellent information, and persuasive viewpoints.

I’ll open the floor with what I thought was a good reply from Michael Betancourt to the first quote: fisheries, and the point that Bayesian approaches give better options to steer policy. For example, if you want maximum sustainable yield without crashing a fish stock, you can more easily suggest a quantile of catch that puts you a little more firmly in ‘non-crashing’ territory.
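To make that concrete, here is a minimal sketch of the idea in code. The “posterior draws” below are a lognormal stand-in I invented for illustration, not output from any real stock assessment:

    import numpy as np

    rng = np.random.default_rng(42)

    # Stand-in for posterior draws of maximum sustainable yield
    # (tonnes/year), as if from a fitted stock-assessment model.
    msy_draws = rng.lognormal(mean=np.log(1000), sigma=0.3, size=4000)

    # Set the quota at a low quantile of the posterior, so the chance
    # that the quota exceeds the true MSY (and crashes the stock) is small.
    quota = np.quantile(msy_draws, 0.10)
    p_crash = np.mean(msy_draws < quota)

    print(f"posterior mean MSY: {msy_draws.mean():.0f} t/yr")
    print(f"quota at the 10% quantile: {quota:.0f} t/yr")
    print(f"Pr(quota > true MSY): {p_crash:.0%}")

A point estimate gives a regulator one number and no dial to turn; the posterior quantile is the dial.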

56 thoughts on “‘No regulatory body uses Bayesian statistics to make decisions’”

  1. > We left mired in the muck that so many of the ecologists around me feel about Bayesian — too much emphasis on uncertainty, too little concrete information that could lead to decision making.

    This just seems like ill-informed nonsense. (data, prior) -> full Bayes -> posterior over states of the world -> expected utility -> decision.

    The problem I’ve seen is that defining a utility function often involves some kind of intractable political debate between stakeholders on what the actual priorities are, while point estimation procedures allow me to pre-define a false dichotomy, which is essentially equivalent to me smuggling in my own decisions before the analysis without anyone noticing.
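    A bare-bones sketch of that pipeline (everything here is invented for illustration; the “posterior” is a stand-in, as if it came from a fitted model):

      import numpy as np

      rng = np.random.default_rng(7)

      # Stand-in posterior over states of the world: draws of outbreak
      # severity (0 = none, 1 = total loss), as if from a fitted model.
      severity = rng.beta(2, 5, size=4000)

      def utility(action, s):
          # Made-up utilities: "treat" costs 2 units but caps losses;
          # "wait" is free but losses scale with severity.
          return -2.0 - 2.0 * s if action == "treat" else -10.0 * s

      # Expected utility of each action over the posterior, then decide.
      expected = {a: float(np.mean([utility(a, s) for s in severity]))
                  for a in ("treat", "wait")}
      print(expected, "->", max(expected, key=expected.get))

    All the political content lives in utility(); the posterior just reports what the world might do.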

  2. I suppose I would have communicated that inference and decision analysis are separate steps, and that a method which claims to give you an actionable conclusion in one step (p<0.05, therefore we should do this) is conflating the two.

    I'd rather infer the posterior distribution as best I can, and then send it (or an adequate summary thereof) to a distinct decision analysis which plugs in costs and benefits over that distribution to get something a decision-maker can use.

    Though this seems like something any author at this blog might say, so maybe you already said it and it led into a ditch.

  3. Lizzie:

    Your discussion of decision making reminds me of this post from last year, where I wrote:

    John Cook writes, “statistics is all about reasoning under uncertainty.”

    I agree, and I think this is a good way to put it.

    Statistics textbooks sometimes describe statistics as “decision making under uncertainty,” but that always bothered me, because there’s very little about decision making in statistics textbooks. “Reasoning” captures it much more than “decision making.”

    • I meant to link that post also (there were so many to choose from) and I like it. But do you think you can take ‘decision making’ away from folks who think that’s what their statistics are for (and replace it with reasoning)?

      • Are the people who think their statistics are for decision making working as “statisticians”, and thus reasoning under uncertainty about what they can infer from the data; or are they applied researchers trying to make decisions based on the interpretation of the statistics they got from the statistician?

        I mean they are probably both but I think those are two different hats to wear. When you are a statistician you maybe should be reasoning under uncertainty and quantifying uncertainty. When you are an applied researcher or policy maker, you should be taking the statistical uncertainty and turning it into decision making in the presence of uncertainty.

        I think that gets at something…

  4. 1) If you’re not making decisions, there’s no need for Bayes. If you are, you’re Bayesian whether you like it or not – there is no decision theory that isn’t Bayesian.

    2) Informing decisionmakers with data is great, misinforming them is not. If you’re hiding actual uncertainties, you’re not doing your job.

    • I haven’t tried (1)! I’ll try it out as I like it.

      For (2), I have struggled to get my colleagues to see their approaches as hiding uncertainties … I am not sure if it’s willful ignorance or they really think it’s fine.

    • I think these points can be made even more compelling by using specific examples where frequentists use prior information. The most obvious examples relate to the standard lit review. Quantitatively, we use it for power estimation and research design. Qualitatively, we only propose a study when prior evidence supports the experimental hypothesis. Wouldn’t you rather make this determination at least partially quantitatively? Aren’t you adding (invisible) uncertainty by not doing so? Might you be able to make more effective arguments in funding proposals by putting actual numbers on the current state of knowledge and the potential impact of your study on that knowledge (in terms of distributions)? Of course, it’s taken forever to get people to do, and learn how to do, power analyses, so this might be seen as a negative!

      • Michael:

        When the group I worked with did a cost-effectiveness analysis of funding a randomized clinical trial in 1986, we used empirically informed priors for the treatment effects and for the rates of adoption of the treatment in practice given the study results.

        I don’t remember any pushback or criticism of quantifying prior information for that. Unfortunately, the only thing the group could get published from it was a short overview note that the primary investigator wrote.

        We did write an interactive software program to undertake a meta-analysis to get the prior for treatment effects and do the cost-effectiveness analysis, which the NIH made publicly available, though I don’t know if anyone used it.

        Though we do need to keep in mind that not all Bayesian analyses are well done and appropriate (and my prior on that fraction is much lower than many hope).

    • David:

      I don’t understand why you say, “If you’re not making decisions, there’s no need for Bayes.” I use Bayesian analysis all the time to study public opinion, and it’s not for making decisions. I know that some political professionals use insights from polling to make decisions, and that’s fine, but when I’m doing it, it’s to understand opinion, not to take actions based on it.

  5. > “No regulatory body uses Bayesian statistics to make decisions.”

    Wasn’t the primary analysis of one of the two biggest and most impactful clinical trials of the past decade (Pfizer’s) Bayesian?

  6. Lizzie said,
    “… ‘The problem with Bayesian statistics is their emphasis on uncertainty.’ … So I pose this back to you all: what should I have said in response to either of these remarks? I am looking for excellent information, and persuasive viewpoints.”

    I can’t really answer your question, but will add a gripe of my own: I’ve been trying a new primary care physician, and it really bugs me when she asks “What exactly …?” questions. Or when we spend most of the appointment discussing one thing, and then just before I leave, she says she wants me to start taking a particular drug not related to what we have been discussing, but doesn’t say why. So it’s not just ecologists who are uncertainty-avoidant. It just seems like she’s off in a fairy-tale land (and maybe she thinks the same of me??)

      • David said,
        “I think you need to find a new primary care doctor.”

        As I mentioned, the one I was talking about IS my new primary care doctor. She’s not as bad as the old one, but still not very good. (Regrettably, many doctors don’t take on new patients over age 65, so it’s especially hard to find a good one if you’re old.)

  7. Just to let you know where I stand, the job talk I gave for my current position at the Flatiron Institute was titled “Taking Uncertainty Seriously.” In my experience, ecologists are the most Bayes-happy group in academia. Maybe that’s because I hang out with too many ecologists like Lizzie, but I’d say even the international conferences in ecology are very Bayes-heavy.

    P.S. Here’s the U.S. Food and Drug Administration’s guidance on using Bayesian stats in medical device trials.

    • Bob said, “In my experience, ecologists are the most Bayes-happy group in academia.”

      In my experience, a lot of phylogenetic folks are also Bayes-happy.

      • Is it though? I don’t have any particular insight, but those guidelines are not even from the same division (Center for Devices and Radiological Health vs. Center for Biologics Evaluation and Research).

    • Yes, I would say perhaps ecologists who want to drive policy are less Bayesian (I don’t particularly know, because that has never been the focus of the people I’ve worked with). But my department started dipping its toes into Bayesian models 15 years ago, and now every lab has at least one person using them, whether on their own data or to explore data sets that have been collected annually for a couple of decades. And yes, also using them for phylogenetic data.

      I think part of the attractiveness of Bayesian methods in ecology is that we study systems that are infinitely complex (in theory), and it is difficult to know exactly what we don’t know. We often have a good handle on the general relationships and major factors that influence a system of study, even when we don’t know exactly HOW those work. So modeling those relationships can help narrow down where to look for the parts we haven’t accounted for… and in the predictive sense it is *really* useful for figuring out how inadequate your theoretical model is.

  8. Could this be treated as an empirical question? Critics seem to believe that decisions are somehow better when based on frequentist methods. Disproving that claim is a pretty tall order, but you don’t have to start out by comparing real-life cases, which are likely difficult to find in an apples-to-apples way. A simulation-based study might be better: design a decision tree that mimics policymakers’ decision making (they surely exist in the literature), then simulate results of both types across many circumstances and see when and how decisions are different. Yet another design might take actual policymakers and randomly assign them to read versions of a white paper that give the same results in frequentist or Bayesian terms.

    In the meantime, you could just point out to critics that their argument is unsupported by empirical evidence, even the frequentist kind!
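    A toy version of the simulation idea (my own setup, not from any paper): let the “right” action be to intervene exactly when a true effect theta > 0, and compare a p < 0.05 rule against a posterior-probability rule under a known prior.

      import random
      from math import erf, sqrt

      def norm_cdf(x, mean=0.0, sd=1.0):
          return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

      random.seed(0)
      n_sims, se, prior_sd = 20_000, 0.2, 0.5  # se = sd of the estimate
      right_f = right_b = 0
      for _ in range(n_sims):
          theta = random.gauss(0, prior_sd)  # true state of the world
          est = random.gauss(theta, se)      # observed estimate

          # Rule F: act if the one-sided p-value is below 0.05.
          act_f = 1 - norm_cdf(est / se) < 0.05

          # Rule B: act if Pr(theta > 0 | est) > 0.95, using the
          # conjugate normal posterior.
          post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
          post_mean = post_var * est / se**2
          act_b = 1 - norm_cdf(0, post_mean, sqrt(post_var)) > 0.95

          right_f += act_f == (theta > 0)
          right_b += act_b == (theta > 0)

      print(f"p < 0.05 rule right:   {right_f / n_sims:.1%}")
      print(f"posterior rule right:  {right_b / n_sims:.1%}")

    Swapping in explicit costs of acting vs. not acting (instead of a raw agreement rate) would get closer to the decision trees policymakers actually face.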

    • [1] A decision is taken after consideration of explicit prior experience which is manifestly relevant to the question at hand.
      [2] A decision is taken after consideration of implicit prior experience, the experience itself and its relevance not easily made manifest, but which, in circumstances where one is being forced to decide, seems to be the best evidence which can be mustered — and admittedly it may not be very good.
      [3] The decision is put off until better evidence can be assembled.
      [4] A decision is made by some device: tossing a coin, a die; the barrel of a revolver is spun; the pattern of the seeds scattered on a board; the serial order in the alphabet of the surname of the captain of a ferry crossing somewhere in the Greek islands; and so on.

      In [1] and [2] one supposes that there is some rational basis for choice. In [3] one declines to act unless at some future point [1] or [2] obtain. In [4] one is under some compulsion to act; but does not pretend he has any grounds at all to prefer one path or another.

      In [1],[2] the role of a “reference class” in some guise is acknowledged. In [3] it is acknowledged we need one, but do not have it; and we will wait. In [4] we acknowledge we haven’t got it, and cannot or will not wait.

      If I need to know how to plan for Martian Christmas as it is celebrated beneath the surface of Europa, at the summer solstice of Jupiter, and have to make arrangements in a great hurry, I’m sorely lacking in prior experience and do not know what gifts will be appreciated, and which will be insulting to my hosts. If I am stubborn enough, perhaps I’ll decline the invitation outright [3]. Perhaps I’ll say, well I cannot go wrong if I bring them a Haggis from Kincardine [2]. Or I’ll just consult a fortune-teller, who doesn’t claim to know any Martians on Europa, but perhaps knows more about me than I know myself [4].

    • Michael Nelson said, “Critics seem to believe that decisions are somehow better when based on frequentist methods.”

      A possible reason for this: Frequentist methodology has a decision-making format (“reject or don’t reject”), whereas using Bayesian methods for decision-making requires you to put a lot more thought into your methodology – which is exactly why I think Bayesian methods ought to be used — less “off the shelf”, more “really think about the problem/situation”.

        • The Bayesian paradigm asks me to entertain assigning numerical levels of credence to any and all propositions, from the mundane to the fantastical, and in such a way that these assignments are arithmetically consistent with one another. I think this expectation borders on the fantastical.

  9. People don’t like uncertainty. I think the Ellsberg paradox illustrates this really well. Give people a choice between two urns: urn A, which contains 50 red balls and 50 black balls, and urn B, which contains 100 balls with an unknown red/black ratio, with the instructions that if they draw a red ball they get $10 and if they draw a black ball they lose $10. The vast majority of people will choose to draw from urn A, even though the expected outcome is the same.

    Statistics (done properly) goes against the basic human aversion to uncertainty, & you can see the results of that in scientific journals where dynamite plots and p-value asterisks reign supreme, whereas good uncertainty visualizations are much less popular. We like to be fooled by noise & abstraction, even at the cost of being kicked in the head by reality every now and then. I don’t think that’s going to change anytime soon, & so statisticians have a long & hard road ahead of them.
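    A quick simulation of the “expected outcome is the same” claim (toy code, and note that it bakes in a uniform prior over urn B’s composition, which is exactly the assumption questioned in the replies below):

      import random

      random.seed(0)
      trials, pay_a, pay_b = 200_000, 0, 0
      for _ in range(trials):
          # Urn A: exactly 50 red balls out of 100.
          pay_a += 10 if random.random() < 0.5 else -10
          # Urn B: red count uniform on 0..100, then draw one ball.
          reds = random.randint(0, 100)
          pay_b += 10 if random.random() < reds / 100 else -10

      print(f"urn A mean payoff: {pay_a / trials:+.3f}")  # ~ $0
      print(f"urn B mean payoff: {pay_b / trials:+.3f}")  # ~ $0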

    • >even though the expected outcome is the same.

      I don’t think this is true. You’re assuming people’s prior makes 50% the expected value for the ratio. But certainly people have experience that if you have control over the contents of the bag, and you’re looking to make money, you’d make the second bag all black balls.

      • Good criticism, however that’s something you could easily control for (not sure whether the original authors did), e.g. randomize the contents of the ambiguous urn. You don’t think that people fundamentally dislike uncertainty?

        • I think people do dislike uncertainty — indeed I know they do — but I agree with Daniel that that particular experiment doesn’t show that, especially when taken on its own.

          Let me put it this way: if I go to a carnival and there’s a game where I can pay $10 to draw a ball from an urn, and if the ball is red I get my $10 back plus another $10, but I am given no information about how many black balls and red balls are in the urn, there’s no way in hell I’d play it, and if you say you would play it I will mock you. Dude, it’s a carnival game, you think they’re going to give you an even break?

  10. Well, in policy decisions, what is the purpose of doing stats?

    1) The correct decision is unknown and the stats are to help make a more informed decision. In this case, uncertainty helps quantify the risk associated with a decision. This feels worthwhile.

    2) Stats are used to prop up a preferred decision, or break some deadlock. In this case, I guess it would be useful to sweep uncertainty under the rug. I wonder if there’s some easier alternative to doing stats in that case, though.

  11. This topic is kind of near to my heart. Some common reasons that have been given for emphasizing uncertainty by default: without considering it, decision makers are unlikely to diversify their strategies or to seek out more information. The question of why we should communicate uncertainty as a default in settings like policy making is the topic of this Manski paper: https://www.nber.org/papers/w24905 He goes through various rationales that are often given for downplaying uncertainty in favor of point estimates, including the bounded-rationality argument asserting that downplaying uncertainty is useful because it simplifies decision making.

    I wrote a related paper a few years ago about the absence of uncertainty expressions from many visualizations of data in the public sphere (like government data, data in the media, etc.): http://users.eecs.northwestern.edu/~jhullman/Value_of_Uncertainty_Vis_CR.pdf.
    My impression from talking to people like graphic editors at well-known media publications is that the belief that emphasizing uncertainty is a bad idea can be pervasive, even if many people who communicate data for policy can appreciate on a theoretical level why uncertainty is important for decision making. The argument in my paper is that if one’s goal is to visualize data to make some recommendation relevant to policy, presenting uncertainty should be the default strategy, because a visualization is more persuasive the more it makes the implicit reference distribution that drives the “message” obvious to the viewer (borrowing Andrew’s theory of EDA from this paper: http://www.stat.columbia.edu/~gelman/research/published/p755.pdf). I.e., you need uncertainty to imply the reference distribution that the data deviate from, which defines what is interesting/worth considering about the signal in the first place. But this argument was more theoretical than practical, as it assumes that people know what to do with uncertainty.

    • What I find strange (ironic?) is that without uncertainty, there are really no decisions to be made. More precisely, I would argue that without uncertainty, machines can be programmed to make all decisions, and are likely to make them better (less bias and more consistency) than humans. So, the same decision makers that want to make decisions fail to realize that it is uncertainty that gives them that role. Perhaps I’ve overstated this, but I don’t think by much.

      • Well, the role of the utility function is being ignored here. Without uncertainty, and if you let the decision maker be king, there’s nothing to do, since the decision maker just chooses “whatever they like best” from the certain outcomes. I think this is actually the typical point of people becoming decision makers. They want to be king and lord it over others. It’s a rare person in politics who is otherwise, at least at high levels. At the local school board or whatnot, maybe not so much.

  12. Dear Lizzie,
    the International Whaling Commission uses a Catch Limit Algorithm (CLA), devised in the ’90s, that uses the Bayesian framework to decide on the allowable catch of whales (Cooke 1999: https://academic.oup.com/icesjms/article/56/6/797/658119, ungated). It’s part of a management procedure called the Revised Management Procedure (RMP: https://iwc.int/rmp). The catch is that the RMP has yet to be used in practice, because of the political gridlock between whaling and anti-whaling constituencies within the IWC. NAMMCO does use a CLA for setting its whale quotas in the North Atlantic.

  13. In some ways, this issue has nothing to do with Bayes. I’ve written 2 books on simulation analysis in spreadsheets – the same resistance happens. Decision makers prefer a model that provides a single answer rather than a distribution of possible answers. They sort of know the single answer is wrong, but they will try to influence the analysis until it provides a single answer they like. They never like the distributional approach. The answer seems clear to me – most decision makers do not want the responsibility and accountability that comes with having to make a choice. But they do like the authority to make that choice.

    Of course, this is an over-generalization, and some decision makers are better than that. But not most, in my experience.

    • > most decision makers do not want the responsibility and accountability that comes with having to make a choice. But they do like the authority to make that choice.

      As Martha would say, +1.

      I’ll add that I think this statement applies not just to decision makers, but to most people who use/apply statistics. I think this fear underlies much of the resistance not just to Bayesian methods, but to any attempt to do explicit model building rather than use off-the-shelf or “default” methods. Building models, choosing priors, these all involve a series of choices that must be defended where there is often not a single “correct” answer. People worry about making the “wrong” choice, and one way to avoid doing so is to act like you never made any choices at all.

      But as someone aptly quoted on another thread, “if you choose not to decide, you still have made a choice.”

    • From my experience, 90-plus percent prefer to have an authority make that choice.

      By the way, I looked at your books’ tables of contents. Curious how you explain generating random numbers. (I use digits of pi to get univariate and multivariate uniform(0,1) draws and then rejection sampling to get any desired distribution. Inefficient but easy to grasp.)
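      Concretely, the trick looks something like this (a toy sketch, with the block width and the triangular target picked arbitrarily for illustration):

        # Fixed-width blocks of pi's decimal digits, read as uniform(0,1).
        PI_DIGITS = ("1415926535897932384626433832795028841971"
                     "6939937510582097494459230781640628620899")

        def pi_uniforms(width=4):
            return [int(PI_DIGITS[i:i + width]) / 10**width
                    for i in range(0, len(PI_DIGITS) - width + 1, width)]

        def tri_pdf(x, mode=0.3):
            # Triangular density on [0, 1] with the given mode; max is 2.
            return 2 * x / mode if x <= mode else 2 * (1 - x) / (1 - mode)

        # Rejection sampling: propose u1, accept it with probability
        # tri_pdf(u1) / M, where M = 2 bounds the density. Inefficient
        # (and the digits run out fast), but easy to grasp.
        us = pi_uniforms()
        M = 2.0
        samples = [u1 for u1, u2 in zip(us[0::2], us[1::2])
                   if u2 <= tri_pdf(u1) / M]
        print(samples)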

      • We really didn’t explain in any detail how (pseudo)random numbers are generated. There was an old book by Evans and Olson that did a very nice job of showing how to go from the random uniform (cumulative) to the density function, using the triangular distribution as a concrete example. I will always remember their example, as it made the idea tangible.

    • Are you the same Dale Lehman who wrote the “Howard County” mysteries? That’s right next to where I (and Andrew) grew up! Gonna have to put those on the reading list as well as your simulation books…

        • Ah, well, just a funny coincidence I guess.

          The first in the series (“The Fibonacci Murders”) is math-related, which is why I thought there might be a connection. Plus I’ve always had a dream of a side-gig writing detective novels—I guess there was some vicarious hope there.

        • Just ordered up “The Fibonacci Murders”…
          Now to figure out why Keith O’Rourke uses digits of pi to generate pseudorandom numbers. Seems like this was dealt with somewhere in Knuth, if I can only find the volume. I know it’s just an efficiency-of-generation issue…
          Then I have to look up Dale Lehman’s methods mentioned by Keith. I’m sure they are right and tight, but it will be fun to revisit.

    • Yeah, I was thinking when reading this post and the discussion that a frequentist can just as well despair at the unwillingness of decision makers to face uncertainty and its communication. (Contrary to a somewhat popular opinion here, frequentist statistics is not about hiding uncertainty either.)

  14. Although I doubt Bayesian decision theory is widespread in regulatory bodies, we did put this (not so well-known among Bayesians) quote from Alan Greenspan, who was chair of the Federal Reserve Board of Governors at the time, into a grant application:

    > As a consequence, the conduct of monetary policy in the United States has come to involve, at its core, crucial elements of risk management. This conceptual framework emphasizes understanding as much as possible the many sources of risk and uncertainty that policymakers face, quantifying those risks when possible, and assessing the costs associated with each of the risks. In essence, the risk-management approach to monetary policymaking is an application of Bayesian decision-making. (p.37)

    https://www.jstor.org/stable/3592853?refreqid=excelsior%3A36869ce5178ed988ea4feaba5f22b563&seq=1#metadata_info_tab_contents

  15. Just curious: do people have an easier time of accepting uncertainty in queueing situations?

    If you have a fixed arrival rate of 1/minute and a fixed service time of 0.999999 minutes, all is well, right? If those are the means of the underlying Poisson and exponential processes, you have a real problem and an awful queue.

    My sense is that people are somewhat more accepting there. Is it because most of us have stood in line at the grocery store? Do we need more real-world examples to point at or more simple simulations (something as simple as the Deming red-bead experiment?) to demo?
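    A toy check of that contrast (my own code): deterministic arrivals and service give zero waits, while the same means with Poisson arrivals and exponential service (an M/M/1 queue at utilization 0.999999) give waits that average about a million minutes in steady state.

      import random

      random.seed(1)
      ARRIVAL_RATE = 1.0       # customers per minute
      MEAN_SERVICE = 0.999999  # minutes
      N = 500_000              # customers; nowhere near steady state

      wait = total = 0.0
      for _ in range(N):
          a = random.expovariate(ARRIVAL_RATE)        # interarrival time
          s = random.expovariate(1.0 / MEAN_SERVICE)  # service time
          wait = max(0.0, wait + s - a)               # Lindley recursion
          total += wait

      # Theory: mean wait = rho / (mu - lambda), about 1e6 minutes here.
      print(f"mean wait so far: {total / N:,.0f} minutes (still climbing)")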

    • If this is an example of people accepting uncertainty, then I don’t think they would get so upset over lines, waiting on consumer help lines, traffic holdups, etc. While the lost time would still upset them, they would more likely see it as a natural result of the random process; instead, I believe most people feel unfairly victimized in such circumstances, along the lines of Taleb’s “Fooled by Randomness.”

  16. I’m a bit baffled by this apparently common notion that frequentism produces more actionable results/inference than Bayes (at least among anyone with a credible 101-level understanding of both).

    Especially given the tortuous near-nonsense of valid statements of frequentist inference & NHST (“fail to reject the Null … an effect at least as extreme as the one observed arising by chance alone over many trials …” etc.).

    I guess:

    1. Credible 101-level understandings of both (or either) Freq or Bayes are indeed rare
    2. The ossified arbitrary conventions around Frequentist NHST thresholds (alpha level = 0.05, power = 0.8 etc) appear to folks as simple consensus decision-rules (p-value < 0.05 = DO IT)

    But it doesn’t seem to take a leap of ingenuity to come up with similarly simple & arbitrary decision rules in Bayes (a la Kruschke’s package https://jkkweb.sitehost.iu.edu/BEST/ etc.) (just don’t tell Andrew).

    I guess the difference in Bayes is the relative lack of ossified conventions & veneer of ‘objectivity’ / plausible deniability they provide.

    Maybe the best response is to ask folks to articulate what makes a frequentist finding (& associated arbitrary conventions) so ‘certain’, which might set them down the path of self-discovery…
