The comment about Kolmogorov does not appear in the revised version of the paper. I am not sure why I removed it, but it is more about my 2016 book with Vovk than about the topic of statistical testing by betting. But I do agree with Justin Smith’s comments about Kolmogorov’s views. I have written about them at length in “The Sources of Kolmogorov’s Grundbegriffe” (with Vladimir Vovk), Statistical Science, Vol. 21, No. 1, pp. 70–98, 2006, and in the related working paper http://www.probabilityandfinance.com/articles/04.pdf.

As for “betting on a single p-value”, this is not what my paper suggests. It suggests using the result of a bet as a measure of the evidence provided by each replication.

I agree with this criticism. In the latest revision of the paper, posted on October 16, I have revised the example so that Statistician B uses a conventional level of significance. This does not change the point the example is making. I hope that I have made the point clearer in the revision.

Am I suggesting a betting interpretation? I thought I was proposing that we test by betting.

In my working paper, the statistician testing the hypothesis P is not betting on P. She is betting on the observation y for which P gives probabilities. The betting score is a function of y, just like a p-value is. So my proposal is completely in line with de Finetti in this respect, and the criticism is a complete misunderstanding.

As explained in the paper, the currency does not matter, because the betting score is the amount by which the money risked is multiplied. The idea is that you can risk so little (a penny or a tiny fraction of a penny, if you like) that you do not care about losing it and do not care about any amount you could possibly win. So loss/gain sensitivity does not come into the picture. The bet is just to make a point. If you multiply the money you risk by a large factor, you have discredited the probabilities.
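A minimal numerical sketch of this idea (my own illustration; the only numbers taken from the discussion are the running example of P = Normal(0, 10), Q = Normal(1, 10), and y = 30): one standard betting score is the likelihood ratio q(y)/p(y), which has expected value 1 under P, so it is a fair bet against P, and the realized score is the factor by which the money risked is multiplied.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a Normal(mean, sd) distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def betting_score(y, p_mean=0.0, q_mean=1.0, sd=10.0):
    """Likelihood-ratio bet against P in favor of Q.

    The payoff S(y) = q(y) / p(y) has expected value 1 under P, so it
    is a fair bet against P; the realized S(y) is the factor by which
    the money risked is multiplied."""
    return normal_pdf(y, q_mean, sd) / normal_pdf(y, p_mean, sd)

stake = 0.01  # a penny: the amount risked plays no role in the score
score = betting_score(30.0)
print(f"betting score: {score:.3f}")   # about 1.34
print(f"payout: {stake * score:.4f}")
```

With these numbers the score is only about 1.34, so this particular bet multiplies the stake very modestly; only a large multiplication factor would discredit P.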

Daniel,

Thanks for responding. As I wrote earlier, my friends have claimed that women appear less apt to land a decisive argument when one could be made. On LinkedIn I was more likely to debate with 6 or 7 guys descending on a forum where I was posting; Twitter is a bit less conducive to that. I am more willing to wait out an argument until I sense a timely opportunity. I have no time to troll experts I disagree with.

A few very confident experts welcome debate. Some, though, want to debate on their terms.

I want to amend that comment about ‘less intellectual risk-taking’. Today, men and women are more guarded because careers can be jeopardized.

Hobbyists may take more intellectual risks.

Susan Haack is a fine debater. I would like to see her on Andrew’s blog.

More data. This is from the Mayo Clinic:

“Compulsive gambling is more common in men than women. Women who gamble typically start later in life and may become addicted more quickly. But gambling patterns among men and women have become increasingly similar.”

Casinos have plenty of elderly women playing low-stakes slot machines. Bingo also attracts a lot of women.

I don’t know what this means, I’m just reporting the results of my field studies.

Bingo hall pic (not cherry-picked by me): https://duckduckgo.com/?q=bingo+hall&iax=images&ia=images&iai=http%3A%2F%2Fkiwanishallbingo.com%2Fimages%2Fhudson-bingo-hall-1.jpg

I think women tend to be better at conflict management and generally don’t have their self-esteem tied up in winning arguments. I think it’s more “why bother with this” than “less intellectual risk-taking.”

A substantial number of my friends who follow Twitter are perplexed that so many female researchers take a back seat in debates to the males; males can get into knockdown arguments with each other. Are women less inclined to intellectual risk-taking? I haven’t paid sufficient attention to the observation.

I don’t know what proportion of women gamble. It would bore me no end. Women like to shop, though.

Thank you for so clearly pointing out the problems with utility functions.

P-values need to be calculated from measurements? Shocked, shocked I tell you!

I’d love to have n = 1,000,000. Then n/N would probably be super small and variances would be nice and ‘truth’ could be known. It would also most likely make the likelihood swamp all priors.

Justin
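Justin’s large-n point is easy to see numerically. A toy sketch (my own, using coin flips with conjugate Beta priors so the posterior has a closed form): with n = 1,000,000 observations, two sharply different priors land on essentially the same posterior mean.

```python
# Two analysts hold very different Beta(a, b) priors on a coin's
# heads probability, then observe the same large data set.
n, heads = 1_000_000, 300_000            # data; empirical rate 0.3

priors = {"flat":     (1, 1),            # Beta(1, 1): uninformative
          "stubborn": (500, 5)}          # Beta(500, 5): favors rates near 0.99

for name, (a, b) in priors.items():
    # Conjugacy: the posterior is Beta(a + heads, b + n - heads)
    post_mean = (a + heads) / (a + b + n)
    print(f"{name:8s} prior -> posterior mean {post_mean:.4f}")
```

Both posterior means come out within about 0.0004 of the empirical rate 0.3: the likelihood has swamped the priors.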

Emmanuel:

I agree. Utility is just a mathematical model. As with other mathematical models, it can help organize decision making, but we should be aware of the model’s limitations.

The criminal typo strikes again: in the first line, please read “… and therefore NOT orderable…”

The problem is that any *realistic* utility function is multidimensional. And therefore not orderable in any useful way…

…unless you are willing to postulate some form of “exchangeability” between utility dimensions. That is the crux of economic theory.

Furthermore, in order to make that theory more useful, most people are willing to postulate linearity and some form of choice transitivity (i.e., consistency…). Pulling on that string gives you the whole hawg of economic theories, which can, of course, use probability theory as a tool for decision making.

The mere fact that the “exchangeability” of utility dimensions is always highly questionable (lots of counter-examples…) and that the transitivity of choices is not universal (again, counter-examples abound) seems to concern almost nobody in the distinguished circle of economists…

Therefore, what should be questioned is less the probability assignment method(s) than the use of the objects probabilities are assigned to. Correctly assigning probabilities to stupid descriptions of the world will lead to stupid decisions…
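The non-orderability point can be made concrete with a toy sketch (mine, not the commenter’s): with two utility dimensions and no postulated exchange rate between them, Pareto dominance is the only uncontroversial comparison, and it is merely a partial order.

```python
def pareto_dominates(u, v):
    """True if u is at least as good as v in every dimension and
    strictly better in at least one. This is only a partial order:
    many pairs of outcomes are simply incomparable."""
    return all(a >= b for a, b in zip(u, v)) and \
           any(a > b for a, b in zip(u, v))

# Hypothetical (health, wealth) utilities of three outcomes
a, b, c = (10, 1), (1, 10), (2, 0)

print(pareto_dominates(a, c))   # True: a beats c in both dimensions
print(pareto_dominates(a, b))   # False: a is better on health only...
print(pareto_dominates(b, a))   # False: ...so a and b are incomparable
```

Postulating an exchange rate, say total = health + wealth, collapses this into a full order; that linear collapse is exactly the “exchangeability” being questioned.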

“It’s some guy thing mostly” – so is there a huge proportion of the population that the gambling industry is missing? Surely there must be a way to get women to throw away their cash.

Suppose you want to design a feedback engine controller. The controller should decide, several thousand times a minute, when to ignite each cylinder. You build a model of how much power will be sent to the wheels as a function of all the current control inputs and the timing. The controller can evaluate this function for 100 timings each cycle and choose the predicted optimum, but it can’t do that for a probabilistic ensemble; it needs some fixed coefficients that describe the powerplant. So you put one on a test harness, collect a lot of data, and come up with a posterior distribution for the coefficients. You must now choose *ONE* sample from your posterior.

For each posterior sample you calculate the performance you would get, averaged over the entire posterior of possible “true” values. You then pick the sample that did best at supplying power efficiently.

Can you shoehorn this into a betting story? Sure, but it feels a lot less like roulette and a lot more like doing engineering that includes realistic uncertainty.

As a linguist, I guess you probably realize it’s just all in how you define the word “bet”… but the common definition has more to do with outcomes out of your control than with engineering an outcome to do its best possible job given what you know.
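The recipe described above can be sketched in a few lines (a toy stand-in of my own, with a made-up quadratic performance function in place of a real powerplant model): for each candidate posterior sample, average its performance over the whole posterior, then keep the best candidate.

```python
import random

random.seed(0)

# Posterior samples of one (hypothetical) powerplant coefficient
posterior = [random.gauss(2.0, 0.3) for _ in range(500)]

def performance(chosen, true):
    """How well a controller tuned to `chosen` performs if the true
    coefficient is `true` (toy quadratic loss; best when they match)."""
    return -(chosen - true) ** 2

def pick_coefficient(samples):
    """Pick the single sample whose performance, averaged over the
    entire posterior, is best."""
    def expected_perf(c):
        return sum(performance(c, t) for t in samples) / len(samples)
    return max(samples, key=expected_perf)

best = pick_coefficient(posterior)
print(f"chosen coefficient: {best:.3f}")
```

With quadratic loss the winner is simply the sample closest to the posterior mean; with an asymmetric or multimodal performance function the chosen coefficient can differ considerably from the mean, which is the point of averaging over the posterior.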

Bob:

If you want to label every probabilistic decision a “bet,” then, yes, it’s all bets. To me, the word “bet” suggests particular contexts which don’t fit the three examples of that chapter, which are medical diagnostics, survey incentives, and radon measurement and remediation.

On p. 238, you define a utility function mapping outcomes to real numbers (3) then recommend making the decision that optimizes expected utility (4). Decisions feel a lot like bets to me and utility a lot like payoff. People even talk that way informally. For instance, a quick Googling found “three defenders shifted to the right side of the infield, betting the burly left-handed slugger would pull any pitch he hit” and “A pharmaceutical firm bets big on a cancer drug”.

I veer to Dale’s view. I admit, though, I scratch my head when there has been any reference to betting. It’s some guy thing mostly, it seems.

Bob:

See chapter 9 of BDA3. We have lots of examples of Bayesian decision analysis, and some of them even involve money. But none of them involve betting. There are lots of decision problems which are not wagers.

“As far as betting, I wouldn’t personally bet on a single p-value, no matter how small (see Fisher about 80 years ago). I would consider betting based on several statistically significant p-values obtained, even if alpha=.05, because the chances of H0 being ‘truth’ are significantly lessened with replication.”

The first p-value gives you an idea of how much you need to spend to ensure the second one is likely to also be below your threshold. This depends on the order of magnitude of the “effect size” and the variability of the measurements.

If betting on an attempt to exactly repeat the same procedure, then a good rule of thumb would be that using 1 million times the sample size of the first study would yield a “significant” result 100% of the time (about 50% of the time in the same direction though).
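A quick check of that rule of thumb (my own sketch, using a one-sided z-test and made-up numbers): the “100% of the time” part holds whenever the true effect is nonzero, however tiny, while an exactly zero effect still rejects only at rate alpha no matter how large the sample.

```python
import math

def norm_sf(x):
    """Upper-tail probability of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def power_one_sided(n, effect, sd=1.0):
    """Power of a one-sided z-test of mean 0, at alpha = 0.05,
    when the true mean is `effect` and observations have sd `sd`."""
    z_alpha = 1.6449   # ~95th percentile of the standard normal
    return norm_sf(z_alpha - effect * math.sqrt(n) / sd)

n1 = 100                       # sample size of the "first study"
for effect in (0.05, 0.0):     # a tiny real effect vs. none at all
    p_small = power_one_sided(n1, effect)
    p_huge = power_one_sided(1_000_000 * n1, effect)
    print(f"effect {effect}: power {p_small:.3f} at n={n1}, "
          f"{p_huge:.3f} at n=10^6*{n1}")
```

With effect 0.05 the power jumps from about 0.13 to essentially 1; with effect exactly 0 it stays at 0.05 regardless of sample size.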

The reason is that, in going from the mathematical theory to the real world, Kolmogorov himself noted the contribution of von Mises. In his ‘On Tables of Random Numbers’ he wrote:

“…the basis for the applicability of the results of the mathematical theory of probability to real ‘random phenomena’ must depend on some form of the frequency concept of probability, the unavoidable nature of which has been established by von Mises in a spirited manner.”

As well as in his ‘Foundations of the Theory of Probability’:

“In establishing the premises necessary for the applicability of the theory of probability to the world of actual events, the author has used, in large measure, the work of R. v. Mises”

As far as betting, I wouldn’t personally bet on a single p-value, no matter how small (see Fisher about 80 years ago). I would consider betting based on several statistically significant p-values obtained, even if alpha=.05, because the chances of H0 being ‘truth’ are significantly lessened with replication.

Justin

“suppose P says that Y is normal with mean 0 and standard deviation 10, Q says that Y is normal with mean 1 and standard deviation 10, and we observe y = 30. Consider how this information is handled by three different statisticians:

Statistician A simply calculates a p-value using Y as the test statistic: p(30) = P(Y ≥ 30) ≈ 0.00135. She concludes that P is strongly discredited.

Statistician B uses the Neyman-Pearson test that rejects when y > 29.68. This test has significance level α = 0.0015, but its power under Q is only about 6%.

[…] the low p-value and the Neyman-Pearson rejection of P give a misleading verdict in favor of Q.”

Statistician A and statistician B wouldn’t say that the low p-value / rejection are evidence in favour of Q.

And it is important to note that statistician B fixed the threshold for the test, assuming he wanted alpha=0.0015, before observing the data. The example seems to suggest that the threshold was chosen to be just below the already observed data, but statistician B would never do that.
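The quoted tail probabilities are easy to check (a quick sketch of mine, using only the standard normal survival function):

```python
import math

def norm_sf(x, mean=0.0, sd=1.0):
    """P(Y >= x) for Y ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

# Statistician A's p-value under P: Y ~ Normal(0, 10), observed y = 30
p_value = norm_sf(30.0, mean=0.0, sd=10.0)
print(f"p-value: {p_value:.5f}")      # about 0.00135, as quoted

# Statistician B's significance level for the cutoff y > 29.68
alpha = norm_sf(29.68, mean=0.0, sd=10.0)
print(f"alpha:   {alpha:.5f}")        # about 0.0015, as quoted
```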

There are people who think in terms of betting, and those who don’t. Shafer’s proposal presumably appeals to those who do, but not to those who don’t. I’m in the latter category.

Dale:

I agree. As I wrote above, “I think that Shafer’s in a dead end here.” But as I also wrote, “I could be wrong—it’s happened lots of times before!—so I’m sharing it all with you here.”

Good old de Finetti, very aware of this issue, wouldn’t in his Bayesian account have bets about “parameters” or “true models” but only about future observations.
