## The anthropic principle in statistics

The anthropic principle in physics states that we can derive certain properties of the world, or even the universe, based on the knowledge of our existence. The earth can’t be too hot or too cold, there needs to be oxygen and water, etc., which in turn implies certain things about our solar system, and so forth.

In statistics we can similarly apply an anthropic principle to say that the phenomena we can successfully study in statistics will not be too small or too large. Too small a signal and any analysis is hopeless (we’re in kangaroo territory); too large and statistical analysis is not needed at all, as you can just do what Dave Krantz calls the Inter-Ocular Traumatic Test.

You never see estimates that are 8 standard errors away from 0, right?

### An application of the anthropic principle to experimental design

I actually used the anthropic principle in my 2000 article, Should we take measurements at an intermediate design point? (a paper that I love; but I just looked it up and it’s only been cited 3 times; that makes me so sad!), but without labeling the principle as such.

From a statistical standpoint, the anthropic principle is not magic; it won’t automatically apply. But I think it can guide our thinking when constructing data models and prior distributions. It’s a bit related to the so-called method of imaginary results, by which a model is understood based on the reasonableness of its prior predictive distribution. (Any assessment of “reasonableness” is, implicitly, an invocation of prior information not already included in the model.)

### Using the anthropic principle to understand the replication crisis

Given that (a) researchers are rewarded for obtaining results that are 2 standard errors from zero, and (b) there’s not much extra benefit to being much more than 2 standard errors from zero, the anthropic principle implies that we’ll be studying effects just large enough to reach that level. (I’m setting aside forking paths and researcher degrees of freedom, which allow researchers on ESP, beauty and sex ratios, etc., to attain statistical significance from what is essentially pure noise.) This implies high type M errors, that is, estimated effects that are overestimates (see Section 2.1 of this paper).
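To make the type M point concrete, here is a minimal sketch, not taken from the paper mentioned above. It assumes the estimate is normally distributed around the true effect with a known standard error, and that a result is "rewarded" only when it lands more than 1.96 standard errors from zero. The function name `power_and_exaggeration` and its parameters are hypothetical, chosen for illustration. It computes in closed form the probability of clearing the threshold and the expected absolute estimate among results that clear it, divided by the true effect.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_and_exaggeration(true_effect, se=1.0, z_crit=1.96):
    """Return (power, exaggeration ratio) for a positive true effect.

    Assumption (illustrative, not from the post): estimate ~ Normal(true_effect, se),
    and only estimates with |estimate| > z_crit * se are reported.
    """
    if true_effect <= 0:
        raise ValueError("true_effect must be positive for the ratio to be defined")
    mu = true_effect / se  # true effect in standard-error units

    # Probability that the estimate clears the threshold in either direction.
    upper_tail = 1.0 - normal_cdf(z_crit - mu)
    lower_tail = normal_cdf(-z_crit - mu)
    power = upper_tail + lower_tail

    # E[|estimate| * 1{significant}] in standard-error units
    # (truncated-normal means for the two tails).
    mean_abs_sig = (
        mu * (upper_tail - lower_tail)
        + normal_pdf(z_crit - mu)
        + normal_pdf(z_crit + mu)
    )

    # Expected |estimate| among significant results, relative to the truth.
    exaggeration = mean_abs_sig / power / mu
    return power, exaggeration


if __name__ == "__main__":
    for effect_in_se in (0.5, 1.0, 2.0, 3.0, 8.0):
        power, ratio = power_and_exaggeration(effect_in_se)
        print(f"true effect = {effect_in_se:>3} se: power = {power:.3f}, "
              f"exaggeration ratio = {ratio:.2f}")
```

Under these assumptions, a true effect sitting right around 2 standard errors yields significant estimates that overstate it on average, and the overstatement grows as the true effect shrinks. An effect 8 standard errors from zero is reported essentially without exaggeration.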

### A discussion with Ken Rice and Thomas Lumley

I was mentioning this idea to Ken Rice when I was in Seattle the other day, and he responded:

Re: the anthropic idea, that (I think) statisticians never see really large effects relative to standard errors, because problems involving them are too easy to require a statistician, this sounds a lot like the study of local alternatives, e.g., here and here.

This is old news, of course, but an interesting (and unpublished) variation on it might be of interest to you. When models are only locally wrong from what we assume, it’s possible that we can never reliably detect the model mis-specification based on the data, and yet the mis-specification really does matter for inference. Thomas Lumley writes about this here and here.

At first this was highly non-intuitive to me and others; perhaps we were all too used to thinking in the “classic” mode where the estimation problem stays fixed regardless of the sample size, and this phenomenon doesn’t arise. I suspect Thomas’ work has implications for “workflow” ideas too – that for sufficiently nuanced inferences we can’t be confident that the steps that led to them being considered are reliable, at least if they were based on the data alone. Some related impossibility results are reviewed here.

And Lumley points to this paper, Robustness of semiparametric efficiency in nearly-true models for two-phase samples.

1. Bill Jefferys says:

I thought “inter-ocular traumatic test” had been around for quite a long while. I can’t find the source; I thought it might have been Jimmie Savage, but that doesn’t turn anything up. But Dave Krantz isn’t a name I associate with it.

2. Oliver says:

I know that you are not claiming otherwise, but I think that there are quite a few exceptions to this principle in statistics. Sometimes statistical thinking is needed to determine how best to define an effect, even if the fact that an effect exists is obvious. Statistical modelling can also be required to quantify variation, even if the overall average differences between groups are obvious.

In large observational cohorts it is fairly common to see effect estimates that at least approach the magnitude that you describe, but the problem is to judge whether your model is really measuring what you think it is measuring and whether the result is due to unmeasured confounding or some other bias (which is generally impossible to determine with certainty).

3. Carlos Ungil says:

> Given that (a) researchers are rewarded for obtaining results that are 2 standard errors from zero, and (b) There’s not much extra benefit to being much more than 2 standard errors from zero, the anthropic principle implies that we’ll be studying effects just large enough to reach that level.

And what effect size is “just large enough to reach that level”? Zero effects are also large enough to reach that level if one is prepared to repeat the experiment until the level is reached.
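As a rough illustration of this point, here is a minimal sketch. It assumes a true effect of exactly zero, independent repetitions of the experiment, and a fixed two-sided threshold of 1.96 standard errors. The function name `prob_significant_within` is hypothetical. It computes the chance that at least one of `k` attempts clears the threshold.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_significant_within(k, z_crit=1.96):
    """Chance that at least one of k independent null experiments has |z| > z_crit.

    Assumption (illustrative): the true effect is zero and the attempts are independent.
    """
    alpha = 2.0 * (1.0 - normal_cdf(z_crit))  # per-attempt false positive rate
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 14, 20, 50):
        print(f"{k:>2} attempts: P(at least one significant) = "
              f"{prob_significant_within(k):.3f}")
```

Under these assumptions, a single attempt succeeds about 5% of the time, and the chance passes one half after roughly 14 attempts.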

4. Keith O'Rourke says:

> You never see estimates that are 8 standard errors away from 0, right?
Well, I once had an investigator who came to see me after the director of the statistical consulting service at U of T told him to leave his office and not come back until he had an experiment with at least 20 to 30 animals per group.

With 6 animals per group, the means were 8 standard _deviations_ away (or maybe it was 6). Rather than a direct replication, I suggested moving on to the next stage of experimental work, which was successful – though not as dramatic.

When I later mentioned this to the director, he maintained he had done the right thing, using the anthropic principle that such large effects were just so rare that his message would almost always have been correct.

• Keith O'Rourke says:

Thinking about this a bit more – “that (I think) statisticians never see really large effects relative to standard errors,” this may largely be the result of statisticians agreeing to work on studies selectively brought to them rather than working more upstream.

Two instances come to mind. A gastroenterology researcher I once worked with sliced a tissue in a different way and increased the amount of information dramatically. They could easily see the advantage the first time they did it.

A surgical researcher I worked with was doing a pilot survey prior to an RCT on a method to reduce surgery cancellations, which were high. The phone survey asked patients about to have surgery about their willingness to be in a trial if the trial was already ongoing – to get a better sense of recruitment. None of those surveyed missed their surgery.

So they started using phone reminders just before surgery and the problem went away. The researcher actually returned the funds for the RCT to the granting agency.

So maybe this is mostly a problem of inference failing to consider selectivity? We should be asking/tracking researchers upstream.

By the way, CS Peirce leaned fairly hard on anthropic principles to justify methods of inquiry – the universe was profitably representable in ways that facilitate acting without being frustrated by reality – as that gives intelligence a survival advantage. I think that would rule out a large number of important interactions but not, for instance, the size of a nuclear explosion.

5. gwern says:

As far as the nonlinear argument goes: can we consider this a version of the ‘bet on sparsity’ principle?

• Keith O'Rourke says:

gwern
“Hedging against slightly more complexity” would seem more apt.

The idea is to use a less complex model that you believe is not complex enough but have some assurance that if the problem is truly much more complex you will get to bail out, whereas if you don’t need to bail out, some good (good for what?) performance measure will be actually better than if you used the just complex enough model.

The idea goes way back, I am guessing to astronomers in the 1800s, but to get the actual mathematics in particular applications, you will need to read Lumley’s posts and papers.

• gwern says:

But per the paper, you don’t get to bail out because you aren’t allocating any samples to intermediate values to check for nonlinearity…

To expand a little: ‘bet on sparsity’ means to do your design & analysis under the assumption that a relatively few variables will account in simple linear ways for a decent chunk of variance, as opposed to a billion tiny variables operating nonlinearly (eg GWAS, where we know a large fraction of human traits in general can be accounted for by a few thousand SNPs acting in an insultingly simple additive fashion, as opposed to the full panel of 500k-1.5m SNPs acting in complicated patterns of dominance & epistasis, and so the lasso works great), because if the latter, you’re doomed anyway. In this Gelman paper, the argument seems identical: you should assume the relationship is relatively linear and has few variables, because if it does have any nonlinearity like quadraticness, your sample size is small enough that you are doomed anyway – especially if you waste samples trying to measure it at all (unless the nonlinearity is so enormous you can’t possibly miss it).

• Keith O'Rourke says:

gwern

OK, by sparsity you mean not too many variables operating nonlinearly.

I was actually thinking about an almost proof I was working on many years ago in a simple setting of one variable that might be non-linear but was monotonic. So no need for a sparsity bet as it is very sparse.

> unless the nonlinearity is so enormous you can’t possibly miss it
That was what I meant by “you will get to bail out”.

In those specific applications you refer to I would agree there is a sparsity bet going on.

There are others that would not, and there is a bit of discussion of that here, if you’re interested: “From a slightly different direction Tibshirani (2014) argues that enforcing sparsity is not primarily motivated by beliefs about the world, but rather by benefits such as computability and interpretability, indicating how considerations other than correspondence to reality often play an important role in statistics and more generally in science.” http://www.stat.columbia.edu/~gelman/research/unpublished/Amalgamating6.pdf

6. Thomas B says:

Do these statements (assumptions) imply that extreme value theory in statistics has no relevance or utility?

7. Paul Alper says:

Andrew wrote: “In statistics we can similarly apply an anthropic principle to say that the phenomena we can successfully study in statistics will not be too small or too large. Too small a signal and any analysis is hopeless (we’re in kangaroo territory); too large and statistical analysis is not needed at all, as you can just do what Dave Krantz calls the Inter-Ocular Traumatic Test.”

From https://todayinsci.com/R/Rutherford_Ernest/RutherfordErnest-Quotations.htm we have some musings of Sir Ernest Rutherford:

“The only possible conclusion the social sciences can draw is: some do, some don’t.”

“All science is either physics or stamp collecting.”

“If your result needs a statistician then you should design a better experiment.”