Phil Nelson writes in the context of a biostatistics textbook he is writing, “Physical models of living systems”:
There are a number of classic statistical problems that arise every day in the lab, and which are discussed in any book:
1. In a control group, M untreated rats out of 20 got a form of cancer. In a test group, N treated rats out of 20 got that cancer. Is this a significant difference?
2. In a control group of 20 untreated rats, their body weights at 2 weeks were w_1,…, w_20. In a test group of 20 treated rats, their body weights at 2 weeks were w’_1,…, w’_20. Are the means significantly different?
3. In a group of 20 rats, each given dose d_i of a drug, their body weights at 2 weeks were w_i. Is there a significant correlation between d and w?
I would like to ask: What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impede the science? (No doubt both approaches agree if the 20 rats are replaced by 20000.)
That is, there must be cautionary case studies in which the assumptions of classical statistics proved unhelpful for some real experiment. Such case studies are, in my opinion, invaluable for focusing students’ attention, particularly if they have already been subjected to a cookbook statistics course.
I’ll always answer a question from a physicist! So here’s what I told him:
Yes, I have an example for you. It is a study with n=3000, looking at the attractiveness of parents and the sexes of their children.
The published analysis compared the proportion of girl births among the parents who were labeled “very attractive” with the proportion among the other parents. The difference was 0.08, with a standard error of 0.03, and was thus statistically significant.
However, there is a lot of prior information on this topic. It would be implausible for the true difference in the population to be as large as 0.01. A reasonable prior distribution might have a mean of 0 and a standard deviation of 0.003. Under such a prior, the Bayesian inference is that the population difference is very close to 0.
To be precise, the posterior mean is 0.0008 (that is, less than 1/10 of one percentage point, for example Pr(girl) changing from 0.488 to 0.489) with a posterior standard deviation of 0.003. Thus, in the Bayesian analysis, the result is not anything close to statistically significant.
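The posterior quantities above follow from the standard precision-weighted combination of a normal prior and a normal likelihood. A minimal sketch, using the estimate (0.08), standard error (0.03), and prior (mean 0, sd 0.003) given in the text:

```python
import math

# Normal prior combined with a normal likelihood:
# posterior precision is the sum of the two precisions, and the
# posterior mean is the precision-weighted average of the two means.
est, se = 0.08, 0.03            # data: estimated difference and its SE
prior_mean, prior_sd = 0.0, 0.003

post_prec = 1 / se**2 + 1 / prior_sd**2
post_mean = (est / se**2 + prior_mean / prior_sd**2) / post_prec
post_sd = math.sqrt(1 / post_prec)

print(round(post_mean, 4), round(post_sd, 4))  # prints: 0.0008 0.003
```

The prior is so much more precise than the data (sd 0.003 vs. standard error 0.03) that it dominates, pulling the estimate from 0.08 nearly all the way to 0.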
In this case, the bad, non-Bayesian, answer impeded the science, at least in the sense that a spurious finding was published in a reputable journal (Journal of Theoretical Biology, impact factor 3) and was also used as the basis of a pop-science book.
Further background is here.
This is the example that keeps on giving. A wonderful illustration of the principle that God is in every leaf of every tree.