## Doomsday! Problems with interpreting a confidence interval when there is no evidence for the assumed sampling model

Mark Brown pointed me to a credulous news article in the Washington Post, “We have a pretty good idea of when humans will go extinct,” which goes:

A Princeton University astrophysicist named J. Richard Gott has a surprisingly precise answer to that question . . . to understand how he arrived at it and what it means for our survival, we first need to take a brief but fascinating detour through the science of probability and astronomy, one that begins 500 years ago with the Polish mathematician Nicholas Copernicus. . . .

Assuming that you and I are not so special as to be born at either the dawn of a very long-lasting human civilization or the twilight years of a short-lived one, we can apply Gott’s 95 percent confidence formula to arrive at an estimate of when the human race will go extinct: between 5,100 and 7.8 million years from now. . . . But for either of those scenarios to be true we must be observing humanity’s existence from a highly privileged point in time: either at the dawn of a technologically advanced, galaxy-hopping supercivilization, or at the end of days for an Earthbound civilization on the brink of extinguishing itself. According to the Copernican Principle, neither one of those scenarios is likely.

That’s old-school, uncritical, gee-whiz science writing for ya. The particular claim is not new; indeed we discussed it right here 13 years ago (see also here and here). I guess we can use the same reasoning to suggest that we’ll be busy debunking this story for many decades to come!

I replied to Mark by sending him a link to my paper with Christian Robert along with this quote:

Mark replied:

The method has a built-in bias for residual age to be stochastically increasing in current age. Apply the method to a 90-year-old man and the 50% prediction confidence interval for his residual life will be (30,270). Apply it to his 9-year-old great-grandson and you get (3,27). They would argue that we know something about human lifetimes, and this method is to be used when you essentially know nothing.

As I read it there is a random lifetime of interest, L. We observe A = LU, where U is independent of L and is uniform on (0,1). The residual life is R = L(1-U) = A(1-U)/U. From the percentiles of U/(1-U) we get a 100(1-2b) percent prediction confidence interval for R, (A b/(1-b), A (1-b)/b). If the assumption A = LU is correct then the coverage probability is correct without assumptions on L. That being said, if the method is used on a large sample of individuals as in the example above ((A/3, 3A) being a 50% prediction interval), its error rate will probably be a lot larger than 50%.
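To make Mark's description concrete, here is a minimal Python sketch of the interval and a simulation check of its coverage. The function names `gott_interval` and `simulated_coverage` are mine, and the lognormal distribution for L is an arbitrary choice for illustration; nothing here comes from Gott's paper or the news article. The point is only that the coverage comes out near 1-2b when A = LU really holds with U uniform, which is exactly the assumption in dispute.

```python
import random


def gott_interval(age, b):
    """Prediction interval for residual life R given current age A.

    Returns (A*b/(1-b), A*(1-b)/b), the 100*(1-2b)% interval described above.
    """
    if not 0 < b < 0.5:
        raise ValueError("b must be strictly between 0 and 0.5")
    return age * b / (1 - b), age * (1 - b) / b


def simulated_coverage(b, n=100_000, seed=0):
    """Fraction of simulated cases in which the interval contains R.

    Assumes the sampling model A = L*U with U uniform and independent of L.
    The lognormal lifetime distribution is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        lifetime = rng.lognormvariate(0.0, 1.0)  # L, arbitrary distribution
        u = 1.0 - rng.random()                   # U in (0, 1], avoids age == 0
        age = lifetime * u                       # A = L*U
        residual = lifetime - age                # R = L*(1-U)
        lo, hi = gott_interval(age, b)
        hits += lo <= residual <= hi
    return hits / n


if __name__ == "__main__":
    print(gott_interval(90, 0.25))   # roughly (30, 270)
    print(gott_interval(9, 0.25))    # roughly (3, 27)
    print(simulated_coverage(0.25))  # close to 0.5 under the assumed model
```

With b = 0.025 the interval is (A/39, 39A). If you plug in a current age of 200,000 years for the human species (my assumption; the excerpt above does not state the figure used), you get roughly 5,100 to 7.8 million years, in line with the numbers quoted in the article.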

The method is hard to update. In the Berlin Wall example if you return 5 years later and the wall is still standing what’s the prediction interval for residual life, given this information? The prediction intervals are log symmetric (the prediction interval for log R is symmetric about log(A)). That doesn’t seem reasonable to me.

Hey, living to the age of 270—that’d be cool! I just have to make it to 90, then I’m most of the way there.

In all seriousness, this is a nonparametric frequentist statistical procedure, which is fine—but then this puts a big burden on the sampling model. The key quote from the above news article is, “Assuming that you and I are not so special as to be born at either the dawn of a very long-lasting human civilization or the twilight years of a short-lived one.” That’s an assumption that pretty much assumes the answer already.