In his article (http://wmbriggs.com/post/23244/), Briggs writes:

“No probability can be defined in frequentist theory unless infinite samples are available.”

I believe this is incorrect. The Law of Large Numbers has a finite-sample form: by Chebyshev’s inequality, for any e > 0, the relative frequency of heads in m flips of a coin with heads probability p will lie in [p - e, p + e] with probability at least 1 - p*(1-p)/(m*e^2), for any sufficiently large finite m. It does not say anything about “infinity”. In practice, the relative frequency of heads settles down rather fast.
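A minimal simulation makes the finite-sample point concrete. The coin probability, sample sizes, tolerance, and helper name below are my own illustrative choices, not from Briggs’s post:

```python
import random

def relative_frequency(p, m, seed=0):
    """Flip a p-coin m times and return the relative frequency of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p for _ in range(m))
    return heads / m

p, e = 0.5, 0.05
for m in (100, 1000, 10000):
    freq = relative_frequency(p, m)
    # Chebyshev tail bound: P(|freq - p| > e) <= p*(1-p)/(m*e^2)
    bound = min(p * (1 - p) / (m * e * e), 1.0)
    print(f"m={m:6d}  relative frequency={freq:.3f}  Chebyshev tail bound={bound:.3f}")
```

Even at m = 100 the bound is already nontrivial, and by m = 10000 it is tiny; no appeal to an infinite sequence is needed.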


I think the phrase is such an obvious idea (following the famous Keynes line, “In the long run we are all dead”) that it’s no surprise that people keep independently coming up with it.

Once you restrict yourself to classes of nonparametric priors (like a Gaussian process or a Dirichlet process), consistency becomes much more likely.

Or, to say it shorter, I think Wasserman is being disingenuous here.

http://www.stat.cmu.edu/~larry/=sml/Bayes.pdf

His last two paragraphs of section 12.7 are:

This means that, in a topological sense, consistency is rare for Bayesian procedures. From this result, it can also be shown that most pairs of priors lead to inferences that disagree. (The agreeing pairs are meager.) Or as Freedman says in his paper: “… it is easy to prove that for essentially any pair of Bayesians, each thinks the other is crazy.”

Now, it is possible to choose a prior that will guarantee consistency in the frequentist sense. However, Freedman’s theorem says that such priors are rare. Why would a Bayesian choose such a prior? If they choose the prior just to get consistency, this suggests that they are realy (sic) trying to be frequentists. If they choose a prior that truly represents their beliefs, then Freedman’s theorem implies that the posterior will likely be inconsistent.

Which, in my view, actually helps make a person “cool”.

Yes, some discussion here, here, and here (where I slam Dan Simpson! for running only one chain, but also that post has a Man on the Moon title, and I have a horrible feeling that Dan would consider REM hopelessly uncool).

I think you meant “nuisance parameter” instead of “nuance parameter” near the start.

@Keith O’Rourke: Geyer is also the one who has been arguing that you can get away with running only a single chain, e.g., in section 1.11.3 of the intro chapter to the *Handbook of MCMC*. This can be dangerous advice in practice: autocorrelation estimates from one chain can look consistent with convergence even when multiple chains would reveal that the entire posterior isn’t being explored. Even though no set of initializations can cover all regions of the space, I see no reason not to try a few; it helps avoid false positives in convergence assessment based only on autocorrelation estimates.
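As a rough sketch of why multiple chains catch what a single chain misses, here is a toy version of the Gelman–Rubin diagnostic applied to simulated draws. The pure-stdlib implementation and the two-mode example are my own illustration, not from the Handbook; real diagnostics (e.g., split R-hat) are more refined:

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)   # within-chain variance
    B = n * variance(chain_means)           # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return (var_hat / W) ** 0.5

rng = random.Random(0)
# Two chains exploring the same mode: R-hat should be near 1.
good = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
# One chain stuck in a second, well-separated mode: R-hat is far above 1,
# even though each chain on its own looks perfectly "converged".
stuck = [[rng.gauss(0, 1) for _ in range(1000)],
         [rng.gauss(5, 1) for _ in range(1000)]]
print(gelman_rubin(good))   # near 1
print(gelman_rubin(stuck))  # well above 1
```

The stuck chain has unremarkable autocorrelation, which is exactly the failure mode a single-chain diagnostic cannot see.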

Very much agree. The usual brief limitations section, which can be interpreted (or which the author can interpret) as ruling out whatever users will likely think they can (or really, really want to) use the methods for, does not cut it.

Also, for those who may not be aware, Charlie Geyer (http://www.stat.umn.edu/geyer/lecam/simple.pdf) argued for the following attitude toward asymptotics:

• Asymptotics is only a heuristic. It provides no guarantees.

• If worried about the asymptotics, bootstrap!

• If worried about the bootstrap, iterate the bootstrap!
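In that spirit, the first bootstrap step is easy to sketch. The function name, sample size, and the skewed exponential example below are my own illustrative assumptions, not Geyer’s:

```python
import random
from statistics import median

def percentile_bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

rng = random.Random(1)
# Skewed data, where normal-approximation asymptotics are shaky.
data = [rng.expovariate(1.0) for _ in range(200)]
lo, hi = percentile_bootstrap_ci(data, median)
print(f"95% bootstrap CI for the median: ({lo:.3f}, {hi:.3f})")
```

Iterating the bootstrap, in this sketch, would mean bootstrapping again inside each replicate to calibrate the interval’s coverage, at roughly squared computational cost.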

I don’t think it’s the case that the MLE must be unbiased, which is important because unbiasedness rarely holds, while consistency of the MLE is much more common.
