A computer program can be completely correct, it can be correct except in some edge cases, it can be approximately correct, or it can be flat-out wrong.

A statistical model can be kind of ok but a little wrong, or it can be a lot wrong. Except in some rare cases, it can’t be correct.

An iterative computation such as a Stan fit can have approximately converged, or it can be far from convergence. Except in some rare cases, it will never completely converge.

Completely correct is defined by matching the spec, not by matching pure mathematics. For example, 1e-20 + 1 is exactly equal to 1 with double-precision floating point. That’s not a bug; it’s just rounding. A statistician might say that floating point is correct except in some edge cases, but it’s never truly right. At best we can store an approximation of pi in the computer, because pi is irrational and we have only finite precision.
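To see the rounding concretely, here is a small Python sketch. It assumes Python’s float is IEEE 754 double precision, which is true on essentially all current platforms. The helper name `absorbs` is hypothetical, chosen for illustration.

```python
import math
import sys
from fractions import Fraction

def absorbs(big: float, tiny: float) -> bool:
    """Hypothetical helper: True if adding tiny to big leaves big unchanged after rounding."""
    return big + tiny == big

# The gap between 1.0 and the next larger double is sys.float_info.epsilon (about 2.2e-16).
# Anything smaller than half that gap is rounded away when added to 1.0.
half_gap = sys.float_info.epsilon / 2
print(1e-20 < half_gap)      # True
print(absorbs(1.0, 1e-20))   # True: 1 + 1e-20 rounds back to exactly 1.0
print(absorbs(1.0, 1e-15))   # False: 1e-15 spans several gaps, so the sum changes

# math.pi is the double nearest to pi. As a Fraction it is an exact finite binary
# fraction (denominator a power of two), so it cannot equal the irrational pi.
pi_as_fraction = Fraction(math.pi)
d = pi_as_fraction.denominator
print(d & (d - 1) == 0)      # True: the denominator is a power of two
```

Nothing here is a bug: each result is what the floating-point spec says should happen.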

As to MCMC chains not converging, see “Unbiased Markov chain Monte Carlo with couplings” by Pierre E. Jacob, John O’Leary, and Yves F. Atchadé.

The more important point is that HMC gets close enough to convergence, quickly enough, that we have no tools fine enough to detect that it hasn’t technically forgotten its starting state.

Running a well-designed iterative sampling procedure is like doing numerical integration: the longer it runs, the closer it gets to the right answer. Whether it’s close enough depends entirely on what you’re planning to use the result for. For a lot of things, pi ≈ 3 is good enough.
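As a deterministic analogy (not an MCMC sampler), here is a minimal Python sketch of numerical integration converging toward pi. It assumes the midpoint rule applied to the integral of 4 / (1 + x^2) over [0, 1], which equals pi; the function names and tolerances are hypothetical choices for illustration. More panels give a smaller error, and how many you need depends only on the tolerance you care about.

```python
import math

def midpoint_pi(n: int) -> float:
    """Approximate pi as the integral of 4 / (1 + x^2) over [0, 1]
    using the midpoint rule with n equal-width panels."""
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n))

def steps_needed(tol: float, max_n: int = 1 << 20) -> int:
    """Smallest power-of-two panel count whose estimate is within tol of math.pi."""
    n = 1
    while abs(midpoint_pi(n) - math.pi) > tol:
        n *= 2
        if n > max_n:
            raise RuntimeError("tolerance not reached within max_n panels")
    return n

# The error shrinks as the computation "runs longer" (more panels).
for n in (1, 4, 16, 64, 256):
    est = midpoint_pi(n)
    print(f"{n:4d} panels: estimate {est:.10f}, error {abs(est - math.pi):.2e}")

# pi ≈ 3 is off by about 0.14, so a tolerance of 0.15 is met with a single panel.
print(steps_needed(0.15))
print(steps_needed(1e-6))
```

The answer to “has it converged?” is replaced by “is it within the tolerance my use case needs?”, which is the point of the paragraph above.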

Bob:

When I said that a computer program can be completely correct, I didn’t mean mathematically correct; I meant correct in the sense of doing what we want it to do. Think, for example, of a sorting algorithm.
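One way to make “doing what we want” precise for sorting is to write the spec as a checkable property: the output is a rearrangement of the input, in nondecreasing order. Below is a minimal Python sketch under that assumption; the function names are hypothetical. The deduplicating “sort” is an example of a program that is correct except in an edge case (repeated values). Note that passing random tests is evidence of correctness, not proof.

```python
import random
from collections import Counter
from typing import Callable, List

def meets_sort_spec(inp: List[int], out: List[int]) -> bool:
    """Spec: out contains exactly the elements of inp (with multiplicity), in nondecreasing order."""
    same_elements = Counter(inp) == Counter(out)
    in_order = all(a <= b for a, b in zip(out, out[1:]))
    return same_elements and in_order

def insertion_sort(xs: List[int]) -> List[int]:
    """A simple sort to check against the spec; returns a new list."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]  # shift larger elements one slot right
            j -= 1
        out[j + 1] = key
    return out

def check(sort_fn: Callable[[List[int]], List[int]], trials: int = 1000, seed: int = 0) -> bool:
    """Run sort_fn on random small lists and report whether every output meets the spec."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 12))]
        if not meets_sort_spec(xs, sort_fn(xs)):
            return False
    return True

print(check(insertion_sort))                # True
print(check(sorted))                        # True
print(check(lambda xs: sorted(set(xs))))    # False: drops duplicates, an "edge case" bug
```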

Regarding so-called perfect sampling etc.: yes, that’s why I said, “except in some rare cases.”

A recent reviewer asked me to put “approximately equals” signs by all my mean differences that were estimated using maximum likelihood estimation.

… Technically correct, but then maybe I should also be putting approximately-equals signs beside all my standard errors and the statistics calculated from them? Maybe also beside the descriptive statistics, since I rounded them to a finite number of digits?

I just would have thought that what you’re saying here (i.e., no statistical model is technically “correct”) was always assumed by the reader, so EVERY number is approximate. I guess not!

Andrew,

Two bits from a computer scientist …

In computer science, “correctness” turns out to be a subtle issue …

“A computer program can be completely correct, it can be correct except in some edge cases, it can be approximately correct, or it can be flat-out wrong.”

[This is very good, but fails some utility tests. For example, is the operating system on my laptop “correct”? Of course not, but it may fall into your “except in some edge cases” bucket. Similarly, Google Translate is “approximately correct”, as you put it. Finally, in the old days, some compilers generated code that was “flat out wrong” in some cases, though some might argue that it happened only on “edge cases”. My point is that “correctness” may be orthogonal to “utility”. Maybe this was implicit in your assertions.]

“A statistical model can be kind of ok but a little wrong, or it can be a lot wrong. Except in some rare cases, it can’t be correct.”

[Love this; it should be part of every introductory statistics class. However, as I see it, most of statistics fails on edge cases, though I eventually came to understand that real statisticians know how to use the methods in ways that avoid those edge cases.]

“An iterative computation such as a Stan fit can have approximately converged, or it can be far from convergence. Except in some rare cases, it will never completely converge.”

[A related issue, which I assume you’ve dealt with somewhere on this blog, is the challenge posed by the notion of “random”. For example, early random number generators could be shown not to generate certain elements of desired enumerations (think analyses of gambling: “When will the house go broke?”). More recently, the Mathematica random number generator may be too good in this respect: it seems to cover all enumerations of some set up to a surprising length. This raises the really difficult question: “What do we want from a random number generator?”]
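A minimal Python sketch of one well-known way a generator can miss elements of an enumeration, assuming its entire internal state fits in a fixed number of bits: a deterministic generator with m possible states can start at most m distinct streams, so it can produce at most m distinct shuffles. When n! exceeds m, some orderings of n items can never appear. The toy generator and its constants are hypothetical, chosen only to make counting easy; this is not a model of any particular historical generator or of Mathematica’s.

```python
import math

def toy_lcg(seed: int, m: int = 16, a: int = 5, c: int = 3):
    """Hypothetical toy linear congruential generator with only m possible states."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state

def shuffle_with(gen, items):
    """Fisher-Yates shuffle driven by gen (modulo reduction is biased, but fine for counting)."""
    xs = list(items)
    for i in range(len(xs) - 1, 0, -1):
        j = next(gen) % (i + 1)
        xs[i], xs[j] = xs[j], xs[i]
    return tuple(xs)

def smallest_unreachable_deck(state_bits: int) -> int:
    """Smallest deck size n with n! > 2**state_bits, i.e. more orderings than states."""
    n = 1
    while math.factorial(n) <= 2 ** state_bits:
        n += 1
    return n

# Every possible seed of the 16-state toy generator, shuffling 4 items (24 orderings).
reachable = {shuffle_with(toy_lcg(s), range(4)) for s in range(16)}
print(len(reachable), "of", math.factorial(4), "orderings reachable")

print(smallest_unreachable_deck(32))  # 13: 13! exceeds 2**32
print(smallest_unreachable_deck(64))  # 21: 21! exceeds 2**64
```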

> Mathematica random number generator may be too good in this respect

This sounds unusual. Do you know of a link to something describing this?

Mark said: ‘…“correctness” turns out to be a subtle issue…’

Yes, but perhaps an understatement! :)

After all, the big issue in data science is whether or not we need to know which factors contribute to, for example, the acceptance or rejection of a loan application. In that case, given 5000 loan applications, the only way to know whether the output is “right” or “correct” is to make loans to the approved applicants, track how those loans perform, and assess that performance at various times in the future.

So I guess the upshot is that when you’re doing modelling or complex programming of any type, often no one knows what the output will be, so in turn no one knows what’s “right” and what’s “wrong”.

I recall sitting in the computer center (this was some time ago) across a work table from an MBA student who was gloating that his R-squared was 1.3! “The [professor] said it could not go above 1.” I looked at the output: he was doing stepwise regression with more variables than cases, and the software just kept on crunching.