Erik van Zwet writes:
I saw you re-posted your Bayes-solves-multiple-testing demo. Thanks for linking to my paper in the PPS! I think it would help people’s understanding if you explicitly made the connection with your observation that Bayesians are frequentists:
What I mean is, the Bayesian prior distribution corresponds to the frequentist sample space: it’s the set of problems for which a particular statistical model or procedure will be applied.
Recently Yoav Benjamini criticized your post (the 2016 edition) in section 5.5 of his article/blog “Selective Inference: The Silent Killer of Replicability.”
Benjamini’s point is that your simulation results break down completely if the true prior is mixed ever so slightly with a much wider distribution. I think he has a valid point, but I also think it can be fixed. In my opinion, it’s really a matter of Bayesian robustness; the prior just needs a flatter tail. This is a much weaker requirement than needing to know the true prior. I’m attaching an example where I use the “wrong” tail but still get pretty good results.
In his document, Zwet writes:
This is a comment on an article by Yoav Benjamini entitled “Selective Inference: The Silent Killer of Replicability.”
I completely agree with the main point of the article that over-optimism due to selection (a.k.a. the winner’s curse) is a major problem. One important line of defense is to correct for multiple testing, and this is discussed in detail.
In my opinion, another important line of defense is shrinkage, and so I was surprised that the Bayesian approach is dismissed rather quickly. In particular, a blog post by Andrew Gelman is criticized. The post has the provocative title: “Bayesian inference completely solves the multiple comparisons problem.”
In his post, Gelman samples “effects” from the N(0,0.5) distribution and observes them with standard normal noise. He demonstrates that the posterior mean and 95% credible intervals continue to perform well under selection.
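Gelman's simulations are in R; here is a rough Python sketch of the setup as described above (the variable names and simulation size are mine, and the interval is the standard normal–normal conjugate update):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Gelman's setup: true effects theta ~ N(0, 0.5), observed with
# standard normal noise.
theta = rng.normal(0.0, 0.5, n)
y = theta + rng.normal(0.0, 1.0, n)

# Normal-normal conjugacy with prior N(0, tau^2), tau = 0.5:
# posterior mean = y * tau^2 / (tau^2 + 1); posterior sd is constant.
tau2 = 0.25
post_mean = y * tau2 / (tau2 + 1.0)
post_sd = np.sqrt(tau2 / (tau2 + 1.0))
lo = post_mean - 1.96 * post_sd
hi = post_mean + 1.96 * post_sd

# Selection: keep only the intervals that exclude zero.
selected = (lo > 0) | (hi < 0)
coverage = np.mean((lo[selected] <= theta[selected])
                   & (theta[selected] <= hi[selected]))
print(selected.sum(), round(coverage, 3))
```

Because the analysis prior matches the data-generating prior, the conditional coverage of the interval is 95% for every y, so it remains about 95% among the selected intervals — which is Gelman's point.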
In section 5.5 of Benjamini’s paper the N(0,0.5) is slightly perturbed by mixing it with N(0,3) with probability 1/1000. As a result, the majority of the credible intervals that do not cover zero come from the N(0,3) component. Under the N(0,0.5) prior, those intervals are shrunk so much that they miss the true parameter.
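Benjamini's perturbation is easy to simulate. The sketch below (in Python; names and simulation size are mine) draws effects from the contaminated prior but analyzes them under the unperturbed N(0,0.5) prior:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Benjamini's perturbation: with probability 1/1000 the effect comes
# from the much wider N(0, 3) instead of N(0, 0.5).
wide = rng.random(n) < 1e-3
theta = np.where(wide, rng.normal(0.0, 3.0, n), rng.normal(0.0, 0.5, n))
y = theta + rng.normal(0.0, 1.0, n)

# The analyst still (wrongly) assumes the prior N(0, 0.5).
tau2 = 0.25
post_mean = y * tau2 / (tau2 + 1.0)
post_sd = np.sqrt(tau2 / (tau2 + 1.0))
lo = post_mean - 1.96 * post_sd
hi = post_mean + 1.96 * post_sd

selected = (lo > 0) | (hi < 0)
miss = (theta < lo) | (theta > hi)

# Most selected intervals come from the wide component, and the heavy
# shrinkage toward zero makes them miss the true effect.
print(selected.sum(), np.mean(wide[selected]), np.mean(miss[selected]))
```

Even though the contamination probability is only 1/1000, the wide component dominates the selected intervals, and the noncoverage among them is far above the nominal 5%.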
It should be noted, however, that those effects are so large that they are very unlikely under the N(0,0.5) prior. Such “data-prior conflict” can be resolved by having a prior with a flat tail. This is a matter of “Bayesian robustness” and goes back to a paper by Dawid which can be found here.
Importantly, this does not mean that we need to know the true prior. We can mix the N(0,0.5) with almost any wider normal distribution with almost any probability, and then very large effects will hardly be shrunken. Here, I demonstrate this by using the mixture 0.99*N(0,0.5)+0.01*N(0,6) as prior. This is quite far from the truth, but nevertheless, the posterior inference is quite acceptable. We find that among one million simulations, there are 741 credible intervals that do not cover zero. Among those, the proportion that do not cover the parameter is 0.07 (CI: 0.05 to 0.09).
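Zwet's calculation can be sketched as follows. Under a two-component normal mixture prior, the posterior is itself a mixture of two normals, and the 95% credible interval can be found by bisection on the mixture CDF. This Python version (helper names are mine; Zwet's own code is in R, and I use 300,000 rather than one million draws, so the counts scale accordingly) uses Benjamini's true mixture for the data and Zwet's deliberately "wrong" flat-tailed mixture for the analysis:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 300_000

# Data-generating process (Benjamini's perturbation):
# theta ~ 0.999*N(0,0.5) + 0.001*N(0,3), observed with N(0,1) noise.
wide = rng.random(n) < 1e-3
theta = np.where(wide, rng.normal(0.0, 3.0, n), rng.normal(0.0, 0.5, n))
y = theta + rng.normal(0.0, 1.0, n)

# Analysis prior, deliberately "wrong" in the tail:
# 0.99*N(0, 0.5^2) + 0.01*N(0, 6^2).
pis = np.array([0.99, 0.01])
tau2 = np.array([0.25, 36.0])

# The posterior is a two-component normal mixture: component k has
# mean y*tau2_k/(tau2_k+1), sd sqrt(tau2_k/(tau2_k+1)), and weight
# proportional to pi_k * N(y; 0, tau2_k + 1).
post_m = y[:, None] * tau2 / (tau2 + 1.0)          # shape (n, 2)
post_s = np.sqrt(tau2 / (tau2 + 1.0))              # shape (2,)
w = pis * norm.pdf(y[:, None], 0.0, np.sqrt(tau2 + 1.0))
w /= w.sum(axis=1, keepdims=True)

def mix_cdf(x):
    return (w * norm.cdf((x[:, None] - post_m) / post_s)).sum(axis=1)

def mix_quantile(p, iters=50):
    # Vectorized bisection for the posterior p-quantile.
    lo_b = np.full(n, y.min() - 20.0)
    hi_b = np.full(n, y.max() + 20.0)
    for _ in range(iters):
        mid = 0.5 * (lo_b + hi_b)
        below = mix_cdf(mid) < p
        lo_b = np.where(below, mid, lo_b)
        hi_b = np.where(below, hi_b, mid)
    return 0.5 * (lo_b + hi_b)

lo, hi = mix_quantile(0.025), mix_quantile(0.975)
selected = (lo > 0) | (hi < 0)
miss_rate = np.mean((theta[selected] < lo[selected])
                    | (theta[selected] > hi[selected]))
print(selected.sum(), round(miss_rate, 3))
```

The flat tail means that an observation far out in the tail of N(0,0.5) gets almost all its posterior weight from the wide component and is therefore barely shrunk, so the noncoverage among the selected intervals stays close to the nominal 5% even though neither mixture weight nor tail width matches the truth.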
The point is that the procedure merely needs to recognize that a particular observation is unlikely to come from N(0,0.5), and then apply very little shrinkage.
My own [Zwet’s] views on shrinkage in the context of the winner’s curse are here. In particular, a form of Bayesian robustness is discussed in section 3.4 of a preprint by myself and Gelman here. . . .
He continues with some simulations that you can do yourself in R.
The punch line is that, yes, the model makes a difference, and when you use the wrong model you’ll get the wrong answer (i.e., you’ll always get the wrong answer). This provides ample scope for research on robustness: how wrong are your answers, depending on how wrong is your model? This arises with all statistical inferences, and there’s no need in my opinion to invoke any new principles involving multiple comparisons. I continue to think that (a) Bayesian inference completely solves the multiple comparisons problem, and (b) all inferences, Bayesian included, are imperfect.