Ron Bloom wrote in with a question:

The following pseudo-conundrum is “classical” and “frequentist” — no priors involved; only two PDFs (completely specified) and a “likelihood” inference. The conundrum however may be interesting to you in its simple scope; and perhaps you can see the resolution. I cannot; and it is causing me to experience something along the lines of what Kendall says somewhere (about something else entirely): that “… the problem has that aspect of certain optical illusions, giving different appearances depending upon how one looks at it…”

Suppose I have p(x|mu0) and p(x|mu1), both weighted Gaussian sums with stipulated standard deviations and stipulated weights; for definiteness say both are three-term sums; moreover, all three constituent Gaussians have the common mean named in the expression p(x|mu). So they look like “heavy-tailed” Gaussians, at least from a distance.

Suppose mu0 < mu1 are both stipulated too; in fact everything is stipulated, so this is *not* an estimation problem; nothing to do with "EM" or maximum likelihood. Just a classical test between two simple alternatives. A single datum is acquired: x. The classical procedure for deciding between "H0" and "H1" is to choose the test "size". Put down the threshold cut T on the right tail of p(x|mu0) so the area above that cut is the test size; the power of that test against the stipulated alternative H1 is of course the area above T under p(x|mu1). When the PDFs are Gaussian, or in an exponential family, or when "a sufficient statistic is available," this procedure is identical to what one does if he uses the Neyman-Pearson likelihood criterion: which amounts to putting a cut with the same "size" on the more complicated random variable L(x) = p(x|mu0)/p(x|mu1). When the PDFs are nicely behaved, or more generally when L is *monotonic*, the probability statement about a rejection test on the variate L(x) translates into a statement about a rejection test on the variate x simpliciter.

But in the case of this "nice" Gaussian mixture I discover that for mu1 sufficiently close to mu0 (and certain combinations of weights and standard deviations) the likelihood ratio L(x) is *not* monotonic, and so I am suddenly faced with an unexpected perplexity: it seems (to the eye anyway) that there's only one way to set up a right-tailed rejection test for such a pair of simple hypotheses, and yet the Neyman-Pearson argument seems to say that making that cut using the PDF of L(x) and making that cut using p(x|mu0) itself will not yield the same "x" --- for the same test size. Can you see the resolution of this (pseudo)-conundrum?
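A small numerical sketch makes the non-monotonicity easy to see. The weights, standard deviations, and means below are my own illustrative choices, not Bloom's: two three-term Gaussian scale mixtures with common means mu0 = 0 and mu1 = 0.5, with the likelihood ratio checked on a grid.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    # Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, mu, weights, sigmas):
    # three-term Gaussian sum with a common mean mu, as in the question
    return sum(w * norm_pdf(x, mu, s) for w, s in zip(weights, sigmas))

# illustrative weights and standard deviations: a "heavy-tailed" Gaussian from a distance
weights = [0.80, 0.15, 0.05]
sigmas = [1.0, 3.0, 10.0]
mu0, mu1 = 0.0, 0.5  # close alternatives

x = np.linspace(-10.0, 15.0, 5001)
L = mixture_pdf(x, mu1, weights, sigmas) / mixture_pdf(x, mu0, weights, sigmas)

d = np.diff(L)
print("ratio increases somewhere:", bool((d > 0).any()))  # True
print("ratio decreases somewhere:", bool((d < 0).any()))  # True -> not monotone
```

The ratio climbs to a local peak a little above the means, dips, and only eventually rises again in the far right tail, so a cut on L(x) is not equivalent to a cut on x.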

I replied: Yes, I can see how this would happen. Whether Neyman-Pearson or Bayes, if you believe the model, the relevant information is the likelihood ratio, which I can well believe in this example is not a monotonically increasing, or even a unimodal, function of x. That’s just the way it is! It doesn’t seem like a paradox to me, as there’s no theoretical result that would imply that the ratio of two unimodal functions is itself unimodal.

Bloom responded:

I finally was able to see what is obvious: that indeed there are many alternative “rejection regions of the same size,” and if the PDF of the “alternative” is bumpy (as in this example), or more generally if the likelihood ratio is not monotone (and this is *not* “easy to see” for ratios of “simple” Gaussian mixtures all of whose kernels have a common mean), then indeed the best (most powerful) test is not necessarily the upper-tail rejection test. See my badly drawn diagram. This, by the way, can be filed under your topic of how the Gaussianity ansatz, sufficiently well-learned, can really impede insights that would otherwise be patently obvious (to the unlearned).
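Bloom's point can also be checked numerically: build the upper-tail test and the Neyman-Pearson region of the same size on a grid and compare their power. The sketch below uses my own illustrative mixture parameters (not Bloom's); the NP region simply keeps the grid points with the largest likelihood ratio p(x|mu1)/p(x|mu0) until size alpha is used up, and the Neyman-Pearson lemma guarantees its power is at least that of the tail cut.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, mu, weights=(0.80, 0.15, 0.05), sigmas=(1.0, 3.0, 10.0)):
    # illustrative three-term Gaussian scale mixture with common mean mu
    return sum(w * norm_pdf(x, mu, s) for w, s in zip(weights, sigmas))

mu0, mu1, alpha = 0.0, 0.5, 0.05
x = np.linspace(-60.0, 60.0, 48001)   # wide grid so the sigma = 10 term is covered
dx = x[1] - x[0]
p0 = mixture_pdf(x, mu0) * dx         # discretized probabilities under H0
p1 = mixture_pdf(x, mu1) * dx         # ... and under H1

# Upper-tail test: cut T with P0(X > T) approximately alpha.
tail = np.cumsum(p0[::-1])[::-1]      # tail[i] ~ P0(X >= x[i])
T_index = int(np.argmax(tail <= alpha))
tail_size = p0[T_index:].sum()
tail_power = p1[T_index:].sum()

# Neyman-Pearson test of the same size: keep grid points with the largest
# likelihood ratio p1/p0 until mass alpha under H0 is used up.
order = np.argsort(p1 / p0)[::-1]
mass = np.cumsum(p0[order])
np_region = order[mass <= alpha]
np_size = p0[np_region].sum()
np_power = p1[np_region].sum()

print(f"tail test: size {tail_size:.4f}, power {tail_power:.4f}")
print(f"NP test:   size {np_size:.4f}, power {np_power:.4f}")
```

With these parameters the NP region of size 0.05 is an interval near the common means, not a right tail, and it is noticeably more powerful than the tail cut, which is exactly the "many alternative rejection regions of the same size" observation.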

Isn’t this why the gods invented the monotone likelihood ratio condition, to protect us from this sort of thing?

Is there a collection of examples like this to help retrain our intuition?

A related non-paradox is that inverting likelihood ratio tests in discrete distributions (e.g., a two-sided test for a binomial proportion) can (and does) lead to confidence regions with holes. See, for example, Blaker, H. (2000). Confidence curves and improved exact confidence intervals for discrete distributions. Canadian Journal of Statistics 28 (4), 783–798.
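A sketch of how such holes can arise (the choices of n, x, and alpha below are my own, and whether a gap actually appears depends on them; Blaker's paper gives the careful treatment): invert the exact likelihood-ratio test for a binomial proportion by scanning p0 and keeping the values not rejected at level alpha, then check whether that set is a single interval.

```python
import math
import numpy as np

def pmf(k, n, p):
    # binomial probability mass function (0**0 evaluates to 1, handling p in {0, 1})
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def lr_pvalue(x, n, p0):
    # Exact p-value of the likelihood-ratio test of H0: p = p0, ordering
    # outcomes k by lambda(k) = L(p0; k) / L(phat(k); k), phat(k) = k/n.
    def lam(k):
        return pmf(k, n, p0) / pmf(k, n, k / n)
    lam_obs = lam(x)
    return sum(pmf(k, n, p0) for k in range(n + 1) if lam(k) <= lam_obs + 1e-12)

n, x, alpha = 20, 6, 0.05
grid = np.linspace(0.01, 0.99, 981)   # step 0.001
accepted = np.array([lr_pvalue(x, n, p0) >= alpha for p0 in grid])

kept = grid[accepted]
gaps = np.diff(kept) > 1.5 * (grid[1] - grid[0])
print("confidence set spans:", kept.min(), "to", kept.max())
print("number of interior gaps (holes):", int(gaps.sum()))
```

The discreteness is what does the damage: as p0 moves, outcomes jump in and out of the rejection ordering, so the p-value function need not be monotone on either side of the point estimate.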

Probabilities calculated from discrete distributions must be rational numbers. Simultaneously assuming they can be irrational leads to weird behavior and all sorts of ad hoc adjustments.

I don’t think that’s right.

In theory you could have a binomial random variable where the probability of success was say sqrt(2)/2. Of course the estimate of the probability you get from a finite sample would be a rational number, but the actual infinite-sample frequency could be irrational.
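To make that concrete (sqrt(2)/2 is just the irrational value mentioned above; the sample size and seed are my own picks): any finite-sample frequency is a ratio of two integers, whatever the true probability is.

```python
import math
import random
from fractions import Fraction

random.seed(12345)

p = math.sqrt(2) / 2          # irrational "true" success probability
n = 10_000
k = sum(random.random() < p for _ in range(n))

estimate = Fraction(k, n)     # any finite-sample estimate is rational by construction
print(p)                      # 0.7071067811865476
print(estimate, "=", float(estimate))
```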

At some practical level, irrationals are the most impractical thing ever. To send someone a message containing the true value of a single draw from Uniform(0,1) would take until past the end of the universe, since the value would be irrational almost surely, and would have no symbolic representation (like pi/4) almost surely. So you’d just have to send all the binary digits, and there are infinitely many of them.

Isn’t this an oxymoron? Relatedly, I came across this nice set of quotes the other day:

Cardano 1564:

Leibniz 1710:

Bernoulli 1713:


…if the integral and absolute certainty, which we designate by letter a or by unity 1, will be thought to consist, for example, of five probabilities, as though of five parts, three of which favor the existence or realization of some events, with the other ones, however, being against it, we will say that this event has 3/5a, or 3/5, of certainty.

De Moivre 1711:

Laplace 1774:

https://beckassets.blob.core.windows.net/product/readingsample/10272974/9781118063255_excerpt_001.pdf

Infinitely thin Buffon’s needles seem to be the first place this idea of “actual infinities” (as opposed to “convenient infinities”) shows up in the history of probability.