I came across this article on the philosophy of statistics by University of Michigan economist John DiNardo. I don’t have much to say about the substance of the article because most of it is an argument against something called “Bayesian methods” that doesn’t have much in common with the Bayesian data analysis that I do.
If a quantitative, empirically minded economist at a top university doesn’t know about modern Bayesian methods, then it’s a pretty good guess that similar confusion holds in many other quarters as well, so I thought I’d try to clear a couple of things up. (See also here.)
In the short term, I know I have some readers at the University of Michigan, so maybe a couple of you could go over to Prof. DiNardo’s office and discuss this with him? For the rest of you, please spread the word.
My point here is not to claim that DiNardo should be using Bayesian methods or to claim that he’s doing anything wrong in his applied work. It’s just that he’s fighting against a bunch of ideas that have already been discredited, within Bayesian statistics. It’s all well and good to shoot at arguments that are already dead, but I think it’s also a good idea to be aware of the best and most current work in a field that you’re criticizing.
To be specific, see pages 7-8 of DiNardo’s article:
1. DiNardo thinks that Bayesians believe that the data generating mechanism is irrelevant to inference. To which I reply, (a) I’m a Bayesian and I believe that the data generating mechanism is relevant to inference, and (b) we discuss this in detail in chapter 7 of BDA.
2. DiNardo thinks that stopping rules are irrelevant to Bayesians. Nope. See the example starting in the middle of page 163 in BDA (all references here are to the second edition).
3. DiNardo thinks that Bayesians think that the problem of “how to reason” has been solved. Nope. A Bayesian inference is only as good as its models. Bayesians are busy developing, applying, checking, and extending new classes of models. Consider, for example, the explosion of research in nonparametric Bayes in recent years. Given that “how to reason Bayesianly” includes model specification, and given that model specification is wide open, we would not at all claim that the problem of reasoning has been solved.
4. DiNardo thinks that Bayesians think that “usual (non-Bayesian) practice is very badly wrong.” Sometimes it is, sometimes not. See BDA and ARM for lots of examples. And see here for some inferences that would be difficult to do using classical inference. (I’m not saying it can’t be done, just that it would require a lot of effort to reproduce something that can be done straightforwardly using Bayesian methods.)
5. DiNardo thinks that Bayesians think that “randomization rarely makes sense in those contexts where it is most often employed.” Nope. Forget Urbach (1985), whoever he is. Instead, check out section 7.6 of BDA.
6. DiNardo thinks that Bayesians think that “probability does not exist.” Nope. Check out chapter 1 of BDA, in particular the examples of calibrations of record linkage and football point spreads.
The paradox of philosophizing
DiNardo remarks, perhaps accurately, that the literature on the philosophy of statistics is dominated by Bayesians with extreme and often nutty views. And this frustrates him. But here’s the deal: A lot of the good stuff is not explicitly presented as philosophy.
When we wrote Bayesian Data Analysis, we were careful not to include the usual philosophical arguments that were at that time considered standard in any Bayesian presentation. We decided to skip the defensiveness and just jump straight to the models and the applications. This worked well, I think, but it has led the likes of John DiNardo and Chris Burdzy (as discussed earlier on this blog) to not notice the philosophical content that is there.
And this is the paradox of philosophizing. If we had put 50 or 100 pages of philosophy into BDA (rather than discussing model checking, randomization, the limited range of applicability of the likelihood principle, etc., in separate places in the book), that would’ve been fine, but then we would’ve diluted our message that Bayesian data analysis is about what works, not about what is theoretically coherent. Many people find philosophical arguments to be irritating and irrelevant to practice. Thus, to get to the point, it can be a good idea to avoid the philosophical discussions. But, as the saying goes, if philosophy is outlawed, only outlaws will do philosophy. DiNardo is responding to the outlaws. I hope this blog will wake him up and make him see the philosophy that is all around him every day.
P.S. DiNardo does recognize the diversity of Bayesian approaches:
In what follows, when I [DiNardo] describe something as “Bayesian” I do not mean to suggest any writer in particular holds all the views so attributed here. There is considerable heterogeneity: some view concepts like “the weight of evidence” as important, others do not. Some view expected utility as important, others do not. This is not intended to be a “primer” on Bayesian statistics. Neither is it intended to be a “critique” of Bayesian views. . . . My purpose is not to do Bayesian ideas justice (or injustice!) but rather, to try to selectively choose some implications of various strands of Bayesianism and non-Bayesianism for actual statistical practice that highlight their differences so as to be clear to a non-Bayesian perspective.
But . . . it’s all about what goes into the “various strands” that DiNardo selects. If he were to compare the applied relevance of, say, BDA and ARM to that of a classical text such as LeCam’s, I think he’d see quite a different picture of the relative usefulness of the Bayesian and non-Bayesian approaches.
What I suspect–any readers who know DiNardo can ask him directly–is that he is simply unaware of the modern approach to Bayesian data analysis which is based on modeling and active model checking (“severe testing,” to use the phrase of Deborah Mayo). I don’t expect that seeing my books would make DiNardo a convert to the Bayesian approach, but it might make him realize that practical Bayesians such as myself are not quite as silly as he might imagine.
P.P.S. I cite my own writings because that’s what I’m most familiar with. It should be easy enough to form a similar reply using the words of others.
Andrew, I am missing something, I think. Looking at BDA p. 163, the example is an alleged iid binomial sample that is manifestly not iid because of serial correlation. It's a nice example, but I don't see the application to stopping rules.
Jeffreys: While not so explicit in that example, I think Gelman is pointing out that posterior predictive p-values can depend on stopping rules even if inference about the model parameters does not. For instance, assume that the stopping rule is to stop sampling after 8 zeros. That rule is irrelevant to inference about theta, by the likelihood principle, but it will enter into the posterior predictive p-value because the predictive distribution may or may not model the stopping rule correctly. If you ignore the stopping rule and make the test statistic T the mean of the last 8 values, then T will look very extreme compared to Trep values that ignore the stopping rule, but perfectly normal compared to Trep values that incorporate the stopping rule. Rubin emphasizes this point in his discussion of Gelman's 1996 paper as well.
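To make this concrete, here is a minimal simulation sketch (my own toy version, not the example in BDA), reading the hypothetical rule above as "keep sampling until the last 8 draws are all zero" and taking T to be the mean of the last 8 values. The posterior for theta is the same either way; only the replicated data, and hence the p-value, change:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_until_8_zeros(theta, max_n=10_000):
    """Draw Bernoulli(theta) values until the last 8 draws are all zero."""
    y = []
    while len(y) < 8 or any(y[-8:]):
        y.append(rng.random() < theta)
        if len(y) >= max_n:
            break
    return np.array(y, dtype=int)

# "Observed" data generated under the stopping rule
theta_true = 0.5
y_obs = sample_until_8_zeros(theta_true)
n, s = len(y_obs), int(y_obs.sum())
T_obs = y_obs[-8:].mean()  # equals 0 by construction

# Posterior for theta with a uniform prior; by the likelihood principle this
# same Beta posterior applies whether or not the stopping rule is modeled.
n_rep = 2000
theta_post = rng.beta(1 + s, 1 + (n - s), size=n_rep)

# Replications that IGNORE the stopping rule: n iid Bernoulli draws
T_rep_ignore = np.array([(rng.random(n) < th)[-8:].mean() for th in theta_post])

# Replications that INCORPORATE the stopping rule
T_rep_stop = np.array([sample_until_8_zeros(th)[-8:].mean() for th in theta_post])

# Posterior predictive p-values Pr(T_rep <= T_obs): T_obs looks extreme when
# the stopping rule is ignored, but entirely typical when it is modeled.
print("ignoring stopping rule:", np.mean(T_rep_ignore <= T_obs))
print("modeling stopping rule:", np.mean(T_rep_stop <= T_obs))
```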
Bill: See Exercise 6.6 on page 193.
“our message that Bayesian data analysis is about what works, not about what is theoretically coherent.”
I find this statement appealing but it does undersell Bayesian methods a bit. Theoretical coherence is very helpful. Thinking clearly about data analysis, particularly when confronted with new types of data, is an important skill. When I read Rubin/Gelman style papers, it’s clear that the theoretical coherence of their Bayesian perspective is central to their ability to think about data analysis. While lots of frequentist econometrics works, its lack of theoretical coherence can leave students lost.
Compare BDA to a book like Venables and Ripley or Hastie et al. From a short-run perspective, most of the methods in the latter books work, often better than the basic models presented in BDA. You can use the latter to analyze many types of data successfully. But from a long-run perspective they don’t work that well. They don’t give you a coherent framework to think about and conduct data analysis. As a value investor, I put my money with BDA.
Andrew, John,
Thanks, that makes it very clear.
Bill
I am not applying Bayesian methods, but I read your well-written article with Shalizi and finally have some sort of sense of how you work with, test, reject, and improve models. The problem lies somewhere else:
Working as an economist at an empirically oriented chair of sociology, you encounter numerous scholars at our faculty and at other institutions who hate statistics, do not use it, cannot cope with numbers and models, and sometimes even despise statistics in their classes. So students are often confronted with some sort of general criticism of statistical methods that we have to dispel on a daily basis. And here come the frequentists, as you called them.
The most modest argument that someone can make to explain statistical reasoning, convince students to learn the methods, and overturn stupid arguments against working empirically is that the influence of variable x on variable y is, with a probability of …percent, not zero.
I do not see a similarly convincing argument in BA that would draw students away from determinism. I know this is a sort of very specific problem, but we face it on a daily basis. Do you have a similar argument that would do the job?
I think DiNardo's article is a very interesting one, but not really a Bayes-Frequentist comparison, for he rarely presents both the Bayesian and non-Bayesian solutions to the same problem. This omission happens constantly in these debates, by Bayesians and non-Bayesians alike. It is a classic political tactic to advocate your method with "ideology" (e.g. "confidence intervals always guarantee long-run performance" or "Bayesian inference automatically leads to the correct conclusions" are very unhelpful comments) and then look for apparently bad examples in the other camp. You say "look, I've found a defect in your method in this problem," but do not then go and say why it is defective, nor show how your "ideology" would actually fix this "defect." Edwin Jaynes's 1976 "Bayesian intervals vs confidence intervals" is the only article I have read which actually shows you BOTH "standard" solutions, and he also attempts to explain why they differ. Unfortunately, Jaynes couldn't help himself in bagging the non-Bayesian methods for what he sees as obvious stupidity (or at least that's how I read it).
This article seems more focused on the "pre-data" situation, where both the data and the parameters are uncertain. Thus, from a "probability as extended logic" view (which is more general than "Bayesian"), both the data and the parameters of the model should be formulated as part of the "hypothesis space," with only the prior information being a part of the conditions. So "Bayes' theorem" should read P(Data, parameters | I) = P(Data | I) P(parameters | Data, I). So the standard "posterior" should be modified by P(Data | I), which represents a "sampling weight," and I represents the knowledge about the design of the experiment and the functional form of the model. But this is all PRE-DATA. Another thing: from the "frequentist" viewpoint, nothing stops one from considering the "sampling distribution" of the Bayesian posterior – it can be thought of as just another statistic.
The likelihood principle discussed in this article is tackled from the "pre-data" viewpoint, and it does seem silly in that context. But maybe to put it another way, the likelihood principle is like saying "once you are certain of something, do not speak of its probability." Once something which was uncertain becomes certain, it makes no sense to speak of its "probability." This is exactly what happens with data: before collection, it makes sense to speak of what data sets you may observe; after collection, you know what data was observed, so it is not "random" anymore. This is precisely why the design of the experiment is irrelevant for INFERENCE ABOUT PARAMETERS. However, the experimental design IS IMPORTANT FOR PREDICTION OF FUTURE DATA. Bayesian posterior predictive densities do depend on the design of the experiment (for the constants in the likelihood function do not cancel out in predictives). I think this may explain why the frequentist sees the LP as odd, because they are interested in repetitions of the experiment, not in inference about the experiment observed. It may be useful to compare posterior predictive densities for future data, as opposed to posterior densities for the parameters of interest. These may have a closer connection to what frequentists are actually interested in. Although, seeing as I haven't actually done any comparisons in this space myself, this could make things more confusing and convoluted!
One of the things I absolutely LOVED about this article is the irony of DiNardo's "desiderata" for severe tests. For why shouldn't this "severe testing" procedure apply in the "metastatistics" world of Bayes vs frequentist inference? Find a "pathological" statistical problem (this is a severe test, I would assume), and demand the best solution from each method which makes use of the same information and is subject to the same criterion of performance. The Cauchy distribution provides one potential "pathology": all its moments are either undefined or infinite, and the sample mean has the same distribution as any one observation.
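As a quick check of that last claim (my own sketch, not anything from DiNardo's article), the mean of 100 standard Cauchy draws should be indistinguishable in distribution from a single draw:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 100, 20_000

means = rng.standard_cauchy((reps, n)).mean(axis=1)  # sample means of n draws each
singles = rng.standard_cauchy(reps)                  # single draws

# Two-sample KS test should not reject: the two empirical distributions agree,
# and both interquartile ranges sit near the standard Cauchy's (-1, 1).
print(stats.ks_2samp(means, singles))
print(np.percentile(means, [25, 75]), np.percentile(singles, [25, 75]))
```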
Apologies for the lengthy comment, but I couldn't help myself! Whether you agree or disagree with DiNardo's view, I think his article is very thought provoking (but then, I like the philosophy/foundations of statistics – so this may seem like drivel to someone else).