My (coauthored) books on Bayesian data analysis and applied regression are like almost all the other statistics textbooks out there, in that we spend most of our time on the basic distributions such as normal and logistic and then, only as an aside, discuss robust models such as t and robit.
Why aren’t the t and robit front and center? Sure, I can see starting with the normal (at least in the Bayesian book, where we actually work out all the algebra), but then why don’t we move on immediately to the real stuff?
This isn’t just (or mainly) a question of textbooks or teaching; I’m really thinking here about statistical practice. My statistical practice. Should t and robit be the default? If not, why not?
Some possible answers:
10. Estimating the degrees of freedom in the error distribution isn’t so easy, and throwing this extra parameter into the model could make inference unstable.
9. Real data usually don't have outliers. In practice, fitting a robust model costs you more in efficiency than you gain in robustness. It might be useful to fit a contamination model as part of your data-cleaning process, but it's not necessary once you get serious.
8. If you do have contamination, better to model it directly rather than sweeping it under the rug of a wide-tailed error distribution.
7. Inferential instability: t distributions can yield multimodal likelihoods, which are a computational pain in their own right and also, via the folk theorem, suggest a problem with the model.
6. To make that last argument in reverse: the normal and logistic distributions have various excellent properties which make them work well even if they are not perfect fits to the data.
5. As Jennifer and I discuss in chapters 3 and 4 of our book, the error distribution is not the most important part of a regression model anyway. To the extent there is long-tailed variation, we’d like to explain this through long-tailed predictors or even a latent-variable model if necessary.
4. A related idea is that robust models are not generally worth the effort; it would be better to place our modeling efforts elsewhere.
3. Robust models are, fundamentally, mixture models (see the scale-mixture representation sketched after this list), and fitting such a model in a serious way requires a level of thought about the error process that is not necessarily worth it. Normal and logistic models have their problems, but they have the advantage of being more directly interpretable.
2. The problem is 100% computational. Once Stan is up and running, you'll never see me fit a normal model again. (A sketch of what that could look like appears at the end of this post.)
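To spell out the mixture point in answer 3: the t distribution can be written as a scale mixture of normals, so fitting t errors amounts to giving each observation its own error variance,

\[ y_i \sim \mathrm{N}\!\left(\mu_i,\ \sigma^2/\lambda_i\right), \qquad \lambda_i \sim \mathrm{Gamma}\!\left(\tfrac{\nu}{2},\ \tfrac{\nu}{2}\right) \ \Longrightarrow\ y_i \sim t_\nu(\mu_i, \sigma). \]

Observations that draw a small \(\lambda_i\) are effectively downweighted, which is where the robustness comes from, and also why taking the model seriously means thinking about what process generates those low-\(\lambda\) cases.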
I don’t know what to think. Right now I’m leaning toward answer #2 above, but at the same time it’s hard for me to imagine such a massive change in statistical practice. It might well be that in most cases the robust model won’t make much of a difference, but I’m still bothered that the normal is my default choice. If computation weren’t a constraint, I think I’d want to estimate the t (with some innocuous prior distribution to average over the degrees of freedom and get a reasonable answer in those small-sample problems where the df would not be easy to estimate), or, if I had to pick a fixed value, maybe I’d go with a t with 7 degrees of freedom. Infinite degrees of freedom (that is, the normal) doesn’t seem like a good default choice to me.
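To make answer #2 concrete, here is a minimal sketch of what such a robust default could look like as a Stan program: a linear regression with t-distributed errors and a prior on the degrees of freedom. The gamma(2, 0.1) prior on nu is just one reasonable choice I'm assuming for illustration, not a recommendation.

```stan
// A minimal robust regression: linear model with t-distributed errors.
data {
  int<lower=1> N;            // number of observations
  int<lower=0> K;            // number of predictors
  matrix[N, K] X;            // predictor matrix
  vector[N] y;               // outcome
}
parameters {
  real alpha;                // intercept
  vector[K] beta;            // regression coefficients
  real<lower=0> sigma;       // scale of the error distribution
  real<lower=1> nu;          // degrees of freedom of the t errors
}
model {
  // Weakly informative prior on the degrees of freedom (an assumption,
  // not something prescribed above); the data can push nu up toward
  // "effectively normal" or down toward heavy tails.
  nu ~ gamma(2, 0.1);
  y ~ student_t(nu, alpha + X * beta, sigma);
}
```

As written, the prior on nu is the "average over the degrees of freedom" option; replacing the estimated nu with a fixed value of 7 would give the "t with 7 degrees of freedom" default mentioned above.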