William Perkins, Mark Tygert, and Rachel Ward write:
If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ² statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . .
The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution. A reference distribution can always be computed by simulation, either from the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment.
Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise.
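As a minimal sketch of the point-estimate route, the Python below treats the expected cell counts as fixed, draws many multinomial tables with the same total, and reports the share whose statistic is at least as large as the observed one. The function names are hypothetical, no degrees-of-freedom-style adjustment is attempted, and the posterior predictive variant (drawing fresh cell expectations from the posterior for each replicate) is not shown. Any statistic can be plugged in through `stat`.

```python
import random


def chi_squared(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_table(probs, n, rng):
    """One multinomial draw: n counts spread over the cells with the given probabilities."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts


def monte_carlo_p_value(observed, expected, stat=chi_squared, n_sims=2000, seed=0):
    """Simulated p-value conditional on the expected counts (a point estimate).

    Model parameters are not re-estimated in each replicate, so this sketch
    ignores the uncertainty in the fitted expectations.
    """
    rng = random.Random(seed)
    n = sum(observed)
    total = sum(expected)
    probs = [e / total for e in expected]
    fitted = [p * n for p in probs]  # expected counts rescaled to the observed total
    t_obs = stat(observed, fitted)
    hits = sum(
        stat(simulate_table(probs, n, rng), fitted) >= t_obs for _ in range(n_sims)
    )
    # Counting the observed table as one of the draws keeps the p-value above zero.
    return (hits + 1) / (n_sims + 1)
```

Because the reference distribution comes from simulation rather than from the χ² approximation, swapping in another statistic only means passing a different `stat`; the expected counts must all be strictly positive for the Pearson version.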
And this is not merely a theoretical problem. It comes up in real examples.
Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson:
And here are the expected frequencies from the Guo and Thompson model:
The p-value of the chi-squared test is 0.693. That is, nothing going on.
But it turns out that if you do an equally weighted mean-square test (rather than chi-square, which weights each cell’s squared discrepancy in inverse proportion to its expected count), you get a p-value of 0.039. (Perkins, Tygert, and Ward compute the p-value via simulation.) Rejection!
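To see what the two weightings do, the sketch below computes both statistics on a made-up table (not the Guo and Thompson data): three large cells, one of them off by 30 counts, plus 20 nearly empty cells with expected count 0.05, one of which happens to hold a single observation. Writing the equally weighted statistic as the plain average of (O − E)² over cells is an assumed normalization; multiplying by a constant would not change a simulated p-value. All names are illustrative.

```python
def squared_deviations(observed, expected):
    """(O - E)^2 for every cell."""
    return [(o - e) ** 2 for o, e in zip(observed, expected)]


def chi_squared_terms(observed, expected):
    """Pearson terms: each squared deviation divided by E (a weight of 1/E)."""
    return [d / e for d, e in zip(squared_deviations(observed, expected), expected)]


def share_from(cells, terms):
    """Fraction of a summed statistic contributed by the given cells."""
    return sum(terms[i] for i in cells) / sum(terms)


# Hypothetical table: 3 large cells, then 20 nearly empty cells with E = 0.05.
expected = [200, 150, 100] + [0.05] * 20
observed = [170, 165, 115] + [1] + [0] * 19
near_empty = range(3, len(expected))

chi_terms = chi_squared_terms(observed, expected)
ms_terms = squared_deviations(observed, expected)  # equal weight on every cell

chi_sq = sum(chi_terms)
mean_sq = sum(ms_terms) / len(ms_terms)
chi_share = share_from(near_empty, chi_terms)
ms_share = share_from(near_empty, ms_terms)
print(f"chi-squared {chi_sq:.2f}: {chi_share:.0%} from near-empty cells")
print(f"mean square {mean_sq:.2f}: {ms_share:.2%} from near-empty cells")
```

Checking by hand: the near-empty cells supply 19 of the chi-squared total of 27.25 (about 70%), nearly all of it from the one stray count, but only 0.95 of the 1350.95 summed squared deviations (under 0.1%), so the equally weighted statistic is driven by the large cells, chiefly the one that is off by 30.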
This is no trick. All those zeroes and near-zeroes in the data give you a chi-squared test that is so noisy as to be useless. If people really are going around saying their models fit in such situations, it could be causing real problems.
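A little arithmetic, with hypothetical cells, shows where that noise comes from. A 10-count miss in a cell with E = 100 adds 10²/100 = 1 to the chi-squared sum, while a single count in a cell with E = 0.01 adds 0.99²/0.01 ≈ 98, and the same cell adds only 0.01 when it is empty. Whether one rare event happens to occur moves the statistic by roughly 98, far more than a sizable miss in a well-populated cell. The function name is illustrative.

```python
def pearson_term(observed, expected):
    """One cell's contribution to the chi-squared sum: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


big_cell_miss = pearson_term(110, 100)    # 10-count miss where E = 100
rare_cell_hit = pearson_term(1, 0.01)     # one stray count where E = 0.01
rare_cell_empty = pearson_term(0, 0.01)   # same rare cell, no count

print(big_cell_miss, rare_cell_hit, rare_cell_empty)
```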
Here’s what’s going on. The following graph shows the discrepancies, (Observed – Expected)/sqrt(Expected), which are squared and summed to form the chi-squared statistic:
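In code, the plotted discrepancies are the Pearson residuals, and squaring and summing them reproduces the chi-squared statistic. The counts below are hypothetical and the function name is illustrative.

```python
from math import sqrt


def pearson_residuals(observed, expected):
    """Standardized discrepancy (O - E) / sqrt(E) for each cell."""
    return [(o - e) / sqrt(e) for o, e in zip(observed, expected)]


observed = [12, 8, 1, 0]
expected = [10, 10, 0.5, 0.5]

residuals = pearson_residuals(observed, expected)
chi_squared = sum(r ** 2 for r in residuals)  # same as sum((O - E)^2 / E)
print([round(r, 3) for r in residuals], round(chi_squared, 3))
```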
Only one of the cells is really bad, the point on the lower right, but it has a high expected value (it’s one of the largest cells in the table). When you take the equally weighted mean square (which is equivalent to weighting the contributions to the chi-square in proportion to the expected count), that one cell gives you a big total value. In the chi-squared statistic, all the noise in the empty cells dilutes the signal.
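The equivalence in parentheses can be written out, with notation introduced here: O_i and E_i are the observed and expected counts in cell i of m cells, and r_i = (O_i − E_i)/√E_i is the plotted discrepancy.

```latex
X^2 = \sum_{i=1}^{m} r_i^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i},
\qquad
\frac{1}{m} \sum_{i=1}^{m} (O_i - E_i)^2 = \frac{1}{m} \sum_{i=1}^{m} E_i \, r_i^2 .
```

Both statistics are built from the same squared residuals. The equally weighted one multiplies each by E_i, so a bad residual in a large cell dominates, while the near-empty cells, with E_i close to zero, contribute almost nothing.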