Glenn Shafer: “The Language of Betting as a Strategy for Statistical and Scientific Communication”

Glenn Shafer writes:

I have joined the immense crowd writing about p-values. My proposal is to replace them with betting outcomes: the factor by which a bet against the hypothesis multiplies the money it risks. This addresses the desideratum you and Carlin identify: embrace all the uncertainty. No one will forget that the outcome of a bet is uncertain. See Working Paper 54 here.

And here’s the two-minute version, on a poster:

I sent this to Anna Dreber who suggested using prediction market prices as priors, to which Shafer replied:

See the paragraph entitled “Bayesian interpretation?” on page 7 of the paper I called to your attention, which I attach this time.

My proposal is to report the outcome of a bet instead of a p-value or odds for a proposed bet. As I say on my poster, a 5% significance test is like an all-or-nothing bet: you multiply your money by 0 or 20. People want to report a p-value as the outcome instead of “reject at 5%” or “do not reject at 5%” because they want a more graded report. We can get this with bets that have many possible payoffs.
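
To make the arithmetic in that reply concrete, here is a minimal Python sketch. The function names and the particular graded payoff are illustrative assumptions, not something taken from the working paper. It assumes a p-value that is uniformly distributed under the hypothesis being tested, so each bet costs one unit and has expected payoff 1 if the hypothesis is true.

```python
def all_or_nothing_score(p_value, alpha=0.05):
    """A level-alpha test read as a bet: money is multiplied by 1/alpha on rejection, by 0 otherwise."""
    return 1.0 / alpha if p_value <= alpha else 0.0


def graded_score(p_value, kappa=0.5):
    """One possible graded bet (an assumption chosen for illustration).

    The payoff kappa * p**(kappa - 1) integrates to 1 over p in (0, 1],
    so it is a fair bet when p is uniform under the hypothesis.
    """
    return kappa * p_value ** (kappa - 1.0)


# Compare the two reports for a few hypothetical p-values.
for p in (0.5, 0.05, 0.01, 0.001):
    print(p, all_or_nothing_score(p), round(graded_score(p), 2))
```

The all-or-nothing bet reports only 0 or 20, while the graded bet reports a different multiple for each p-value, which is the kind of graded report the reply describes.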

I think this whole 95% confidence attitude is a bad idea and I think that Shafer’s in a dead end here; I don’t see these methods being useful now or in the future. But, hey, I could be wrong—it’s happened lots of times before!—so I’m sharing it all with you here. I think Dan Kahan might like this stuff.

32 thoughts on “Glenn Shafer: “The Language of Betting as a Strategy for Statistical and Scientific Communication”

  1. The essential problem with this is the lack of a common currency. Individuals have different loss/gain sensitivity because it enters into their entire context of risk (i.e. other bets in their lives, values, serotonin/dopamine levels, etc). How likely am I to be wrong – and how much I care. With p values, like them or not, we are trying to separate those questions.

    • As explained in the paper, the currency does not matter, because the betting score is the amount by which the money risked is multiplied. The idea is that you can risk so little (a penny or a tiny fraction of a penny, if you like) that you do not care about losing it and do not care about any amount you could possibly win. So loss/gain sensitivity does not come into the picture. The bet is just to make a point. If you multiply the money you risk by a large factor, you have discredited the probabilities.

  2. I can’t say I read the Working Paper properly, but it seems to me that this (like much discussion of tests and p-values, and also some Bayesian methodology) is firmly rooted in the “true model” paradigm, i.e. the idea that there is in fact a true parameter, which is to say it implicitly assumes that the underlying model is true, or at least that there is a more general true model from which a “true parameter” can somehow be “projected”. Only in that case does it make sense to talk about betting on one parameter against others, P against Q etc. But in fact the models are thought constructs and there is no such thing as an underlying true parameter (and even if there were, one couldn’t observe it), so these bets can never actually be evaluated and winnings paid out. If my (very superficial) reading is correct, I don’t like the idea for this reason.

    Good old de Finetti, very aware of this issue, wouldn’t in his Bayesian account have bets about “parameters” or “true models” but only about future observations.

    • In my working paper, the statistician testing the hypothesis P is not betting on P. She is betting on the observation y for which P gives probabilities. The betting score is a function of y, just like a p-value is. So my proposal is completely in line with de Finetti in this respect, and the criticism is a complete misunderstanding.

  3. A number of people on this blog have suggested betting interpretations before – and I have always found them unsatisfying. As I do for this one. I don’t think it makes things clearer for most people. And, if the purported benefit is that everyone knows a bet is uncertain… well, that is a small benefit for a cumbersome interpretation. In fact, I’m not sure there is any benefit at all. While everyone knows a bet is uncertain, people tend to focus on the outcome – win or lose – which is precisely the problem with p-values.

    • Dale:

      I agree. As I wrote above, “I think that Shafer’s in a dead end here.” But as I also wrote, “I could be wrong—it’s happened lots of times before!—so I’m sharing it all with you here.”

    • There are people who think in terms of betting, and those who don’t. Shafer’s proposal presumably appeals to those who do, but not to those who don’t. I’m in the latter category.

      • “It’s some guy thing mostly” – so is there a huge proportion of the population that the gambling industry is missing? Surely there must be a way to get women to throw away their cash.

        • A substantial number of my friends who follow Twitter are perplexed that more female researchers take a debate backseat to the males: males can get into knockdown arguments with each other. Are women less intellectual risk-takers? I haven’t paid sufficient attention to the observation.

          I don’t know what proportion of women gamble. Would bore me no end. Women like to shop though.

        • I think women tend to be better at conflict management and generally don’t have their self esteem tied up in winning arguments. I think it’s more “why bother with this” than “less intellectual risk-taking”.

        • Daniel,

          Thanks for responding. As I wrote earlier, my friends have claimed that women appear less apt to land a decisive argument when one could be made. On Linked.com I was more likely to debate with 6 or 7 guys descending on a forum where I was posting. Twitter a bit less conducive to it. I am more willing to wait out an argument until I sense a timely opportunity. I have no time to troll experts who I disagree with.

          A few very confident experts welcome debate. Some, though, want to debate on their terms.

          I want to amend that comment about ‘less intellectual risk-taking’. Today, men and women are more guarded because careers can be jeopardized.

          Hobbyists may take more intellectual risks.

          Susan Haack is a fine debater. I would like to see her here on Andrew’s blog.

        • More data. This is from the Mayo clinic:

          Compulsive gambling is more common in men than women. Women who gamble typically start later in life and may become addicted more quickly. But gambling patterns among men and women have become increasingly similar.

  4. Example 1 from that paper reads:

    “suppose P says that Y is normal with mean 0 and standard deviation 10, Q says that Y is normal with mean 1 and standard deviation 10, and we observe y = 30. Consider how this information is handled by three different statisticians:

    Statistician A simply calculates a p-value using Y as the test statistic: p(30) = P (Y ≥ 30) ≈ 0.00135. She concludes that P is strongly discredited.

    Statistician B uses the Neyman-Pearson test that rejects when y > 29.68. This test has significance level α = 0.0015, but its power under Q is only about 6%.

    […] the low p-value and the Neyman-Pearson rejection of P give a misleading verdict in favor of Q.“

    Statistician A and statistician B wouldn’t say that the low p-value / rejection are evidence in favour of Q.

    And it is important to note that statistician B fixed the threshold for the test, assuming he wanted alpha=0.0015, before observing the data. The example seems to suggest that the threshold was chosen to be just below the already observed data, but statistician B would never do that.

    • I agree with this criticism. In the latest revision of the paper, posted on October 16, I have revised the example so that Statistician B uses a conventional level of significance. This does not change the point the example is making. I hope that I have made the point clearer in the revision.
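
The numbers quoted from Example 1 can be checked with a short Python sketch using only the standard library. The helper names are hypothetical. The likelihood ratio at the end is an assumption added for illustration: it is one natural bet against P in favor of Q, with expected value 1 under P, and is not part of the passage quoted above.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def upper_tail(y, mean, sd):
    """P(Y >= y) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((y - mean) / (sd * math.sqrt(2.0)))


y = 30.0

# Statistician A: p-value under P (mean 0, standard deviation 10).
p_value = upper_tail(y, mean=0.0, sd=10.0)

# Statistician B: significance level of the test that rejects when y > 29.68.
level = upper_tail(29.68, mean=0.0, sd=10.0)

# Illustrative bet (assumption): payoff q(y) / p(y) for each unit risked.
likelihood_ratio = normal_pdf(y, 1.0, 10.0) / normal_pdf(y, 0.0, 10.0)

print(round(p_value, 5), round(level, 4), round(likelihood_ratio, 2))
```

With these inputs the p-value is about 0.00135 and the level about 0.0015, matching the quoted figures, while the likelihood ratio is exp(0.295), about 1.34. Under this particular bet, y = 30 multiplies the money risked only modestly, even though the p-value is very small.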

  5. I don’t take too much away from things like ‘ever since Kolmogorov, probability theory has been measure theoretic’.

    The reason is that, in going from the mathematical to the real world, Kolmogorov himself noted the contribution of von Mises in his ‘On Tables of Random Numbers’:

    “…the basis for the applicability of the results of the mathematical theory of probability to real ‘random phenomena’ must depend on some form of the frequency concept of probability, the unavoidable nature of which has been established by von Mises in a spirited manner.”

    As well as in his ‘Foundations of the Theory of Probability’:

    “In establishing the premises necessary for the applicability of the theory of probability to the world of actual events, the author has used, in large measure, the work of R. v. Mises”

    As far as betting, I wouldn’t personally bet on a single p-value, no matter how small (see Fisher about 80 years ago). I would consider betting based on several statistically significant p-values obtained, even if alpha=.05, because the chances of H0 not being ‘truth’ are significantly lessened with replication.

    Justin

    • “As far as betting, I wouldn’t personally bet on a single p-value, no matter how small (see Fisher about 80 years ago). I would consider betting based on several statistically significant p-values obtained, even if alpha=.05, because the chances of H0 not being ‘truth’ are significantly lessened with replication.”

      The first p-value gives you an idea of how much you need to spend to ensure the second one is likely to also be below your threshold. This depends on the order of magnitude of the “effect size” and the variability of the measurements.

      If betting on an attempt to exactly repeat the same procedure, then a good rule of thumb would be that using 1 million times the sample size of the first study would yield a “significant” result 100% of the time (about 50% of the time in the same direction though).

      • P-values need to be calculated from measurements? Shocked, shocked I tell you!

        I’d love to have n = 1,000,000. Then n/N would probably be super small and variances would be nice and ‘truth’ could be known. It would also most likely make the likelihood swamp all priors.

        Justin

    • The comment about Kolmogorov does not appear in the revised version of the paper. I am not sure why I removed it, but it is more about my 2016 book with Vovk than about the topic of statistical testing by betting. But I do agree with Justin Smith’s comments about Kolmogorov’s views. I have written about them at length in The sources of Kolmogorov’s Grundbegriffe (with Vladimir Vovk). Statistical Science Vol. 21, No. 1, pp. 70-98, 2006, and the related working paper http://www.probabilityandfinance.com/articles/04.pdf.

      As for “betting on a single p-value”, this is not what my paper suggests. It suggests using the result of a bet as a measure of the evidence provided by each replication.

  6. Andrew—can you explain how to make decisions with probability without it looking like betting? Is it just that tacky piles of cash are replaced with elegant utils when we move from bookmaking to economic theory?

    • Bob:

      See chapter 9 of BDA3. We have lots of examples of Bayesian decision analysis, and some of them even involve money. But none of them involve betting. There are lots of decision problems which are not wagers.

      • On p. 238, you define a utility function mapping outcomes to real numbers (3) then recommend making the decision that optimizes expected utility (4). Decisions feel a lot like bets to me and utility a lot like payoff. People even talk that way informally. For instance, a quick Googling found “three defenders shifted to the right side of the infield, betting the burly left-handed slugger would pull any pitch he hit” and “A pharmaceutical firm bets big on a cancer drug”.

        • Bob:

          If you want to label every probabilistic decision a “bet,” then, yes, it’s all bets. To me, the word “bet” suggests particular contexts which don’t fit the three examples of that chapter, which are medical diagnostics, survey incentives, and radon measurement and remediation.

        • The problem is that any *realistic* utility function is multidimensional, and therefore not orderable in any useful way…

          …unless you are willing to postulate some form of “exchangeability” between utility dimensions. That is the crux of economic theory.

          Furthermore, in order to make that theory more useful, most people are willing to postulate linearity and some form of choice transitivity (i. e. consistency…). Pulling on that string gives you the whole hawg of economic theories, which can, of course, use probability theory as a tool for decision making.

          The mere fact that the “exchangeability” of utility dimensions is always highly questionable (lots of counter-examples…) and that the transitivity of choices is not universal (again, counter-examples abound) seems to concern almost nobody in the distinguished circle of economists…

          Therefore, what should be questioned is less the probability assignment method(s) than the use of the objects probabilities are assigned to. Correctly assigning probabilities to stupid descriptions of the world will lead to stupid decisions…

        • Emmanuel:

          I agree. Utility is just a mathematical model. As with other mathematical models, it can help organize decision making, but we should be aware of the model’s limitations.

        • Suppose you want to design a feedback engine controller. The controller should decide several thousand times a minute when to ignite each cylinder. You build a model of how much power will be sent to the wheels as a function of all the current control inputs and the timing. The controller can evaluate this function for 100 timings each cycle and choose the predicted optimum, but it can’t do this for a probabilistic ensemble: it needs some fixed coefficients that describe the powerplant. So you put one on a test harness, collect a lot of data, and come up with a posterior distribution for the coefficients. You must now choose *ONE* sample from your posterior.

          For each posterior sample, you calculate the actual performance you would get averaged over the entire posterior of possible “true” values. You then pick the sample that did best at supplying power efficiently.

          Can you shoehorn this into a betting story? Sure, but it feels a lot less like roulette and a lot more like doing engineering including realistic uncertainty.

          As a linguist, I guess you probably realize it’s just all in how you define the word “bet”… but the common definition has more to do with outcomes out of your control than with engineering an outcome to do its best possible job given what you know.
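
The selection procedure in the engine-controller comment can be sketched in Python as follows. Everything specific here is a hypothetical stand-in: the one-coefficient quadratic power model, the simulated posterior draws, and the grid of 100 timings are assumptions chosen to keep the example small, not a description of a real controller.

```python
import random


def predicted_power(timing, coeff):
    """Toy model (assumption): power peaks when the timing equals the coefficient."""
    return -(timing - coeff) ** 2


def controller_timing(coeff, timings):
    """The controller trusts one fixed coefficient and picks the timing it predicts is best."""
    return max(timings, key=lambda t: predicted_power(t, coeff))


def average_performance(coeff, posterior, timings):
    """Actual performance of the controller built on `coeff`,
    averaged over every posterior draw treated in turn as the true value."""
    t = controller_timing(coeff, timings)
    return sum(predicted_power(t, true_coeff) for true_coeff in posterior) / len(posterior)


def choose_coefficient(posterior, timings):
    """Pick the single posterior draw whose controller does best on average."""
    return max(posterior, key=lambda c: average_performance(c, posterior, timings))


rng = random.Random(0)
posterior = [rng.gauss(20.0, 2.0) for _ in range(200)]  # stand-in posterior draws
timings = [40.0 * i / 99 for i in range(100)]           # 100 candidate timings
best = choose_coefficient(posterior, timings)
print(best)
```

In this toy setting the chosen draw lands near the posterior mean, because the loss is quadratic. With a less symmetric performance model the best single draw could sit elsewhere, which is why the comment averages over the whole posterior rather than simply plugging in a point estimate.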

  7. This last string of comments is about making decisions. My new book with Vovk, Game-Theoretic Foundations for Probability and Finance (Wiley, 2019), has a chapter about decision-making. But my working paper is not about this topic.
