Joshua Vogelstein points me to this paper by Gerd Gigerenzer and Julian Marewski, who write:
The idol of a universal method for scientific inference has been worshipped since the “inference revolution” of the 1950s. Because no such method has ever been found, surrogates have been created, most notably the quest for significant p values. This form of surrogate science fosters delusions and borderline cheating and has done much harm, creating, for one, a flood of irreproducible results. Proponents of the “Bayesian revolution” should be wary of chasing yet another chimera: an apparently universal inference procedure. A better path would be to promote both an understanding of the various devices in the “statistical toolbox” and informed judgment to select among these.
I agree, although I might change “select among” to “combine” in the final sentence.
I think your readers might like this paper. Most statisticians I know prefer (and recommend) a very small set of tools in the toolbox, much like most doctors use a very small subset of the diagnostic codes available in the DSM-5 and ICD-10. This makes sense, because we know a certain set better than others. Perhaps we could do better at explicitly acknowledging that the reason we use our preferred methods is familiarity rather than superiority, possibly referencing Hoadley's:
I coined a phrase called the “Ping-Pong theorem.” This theorem says that if we revealed to Professor Breiman the performance of our best model and gave him our data, then he could develop an algorithmic model using random forests, which would outperform our model. But if he revealed to us the performance of his model, then we could develop a segmented scorecard, which would outperform his model.
Regarding the toolbox, yes, that’s the topic of my paper, “How do we choose our default methods?”, which I recommend to all statistics students.
Regarding the “ping-pong theorem,” I prefer the term “leapfrog” which I think better characterizes the forward progress that comes from building upon and improving the ideas of others. See footnote 1 of this paper:
Progress in statistical methods is uneven. In some areas the currently most effective methods happen to be Bayesian, while in other realms other approaches might be in the lead. The openness of research communication allows each side to catch up: any given Bayesian method can be interpreted as a classical estimator or testing procedure and its frequency properties evaluated; conversely, non-Bayesian procedures can typically be reformulated as approximate Bayesian inferences under suitable choices of model. These processes of translation are valuable for their own sake and not just for communication purposes. Understanding the frequency properties of a Bayesian method can suggest guidelines for its effective application, and understanding the equivalent model corresponding to a classical procedure can motivate improvements or criticisms of the model which can be translated back into better understanding of the procedures. From this perspective, then, a pure Bayesian or pure non-Bayesian is not forever doomed to use out-of-date methods, but at any given time the purist will be missing some of the most effective current techniques.
Again, note the idea of seeking to understand and improve methods that come from alternative perspectives, not merely choosing between static alternatives.
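A textbook instance of the translation described in the quoted footnote is that ridge regression, a non-Bayesian shrinkage estimator, coincides with the posterior mean (and mode) of a linear model with independent Normal(0, τ²) priors on the coefficients, with penalty λ = σ²/τ². A minimal sketch of that correspondence (the data and parameter values here are made up for illustration, not from the quoted paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a small linear-regression dataset
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

tau = 0.5                  # prior sd on each coefficient
lam = sigma**2 / tau**2    # equivalent ridge penalty

# Classical ridge estimate: argmin ||y - Xb||^2 + lam * ||b||^2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian posterior mean under Normal(0, tau^2) priors
# (the posterior is Gaussian, so mean = mode)
post_mean = np.linalg.solve(X.T @ X / sigma**2 + np.eye(p) / tau**2,
                            X.T @ y / sigma**2)

print(np.allclose(ridge, post_mean))  # the two estimates coincide
```

Seen from either side, the same formula carries different lessons: the frequentist can evaluate the Bayesian estimator's sampling properties, while the Bayesian can criticize the implied prior behind the penalty.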
For a different take:
You have conceded the point that there is no universal method for scientific inference by shifting focus from that to a “factory” that generates methods “for specific problem classes.”
Our follow-up paper was related, and perhaps more about choosing rather than developing methods: http://www.stat.columbia.edu/~gelman/research/published/authorship2.pdf
Recently I came across this — "Knowledge translation in biostatistics: a survey of current practices, preferences, and barriers to the dissemination and uptake of new statistical methods" — http://onlinelibrary.wiley.com/doi/10.1002/sim.6633/abstract
Knowing some of the authors, I sent off this comment:
“It comes from my experience with other statisticians being way too eager to try out new and non-standard methods when collaborating with clinicians. My experience is surely biased, but it makes it difficult for me to interpret your paper. I certainly agree with the dearth of professional development for practicing statisticians and with your suggestions to improve that.
Ideally perhaps, I wish the question was more of the form of “how often were you unable to undertake an analysis that you judged would be likely to be more appropriate in important ways” – rather than new or non-standard. Perhaps non-standard was meant to get at something similar?”
(The first author replied, agreeing about the over-eagerness and about the possibly better phrasing of the question.)
Unfortunately this paper by Gerd Gigerenzer and Julian Marewski is behind a paywall for me.
Perhaps nothing should be published without having been reproduced.
The thing I fear is when I read bloggers who view themselves as hybrids who use whatever method is most appropriate. Bayesian This, Frequentist That, Some Other Thing… But when they switch up methodologies, are they really aware of the subtleties that can make or break their analysis?
I’ve interviewed a consultant who was smart enough to pick up R and a survival analysis package quickly for an engagement. Unfortunately he didn’t understand how to test his model rigorously and so his analysis was misleading. I’ve sat in presentations of survival analysis where the presenter — who was new to the technique — had a leak from the future and wouldn’t acknowledge it when challenged. I still don’t fully understand the circumstances under which apparently non-informative Bayesian priors turn out to be informative in ways that taint the model, though at least I’m aware that it can happen.
So I fear the results of those who are too eager to switch between methodologies. Sticking with the toolkit you know isn’t always an option, and it doesn’t guarantee that you will master that toolkit, but at least there’s a good chance that you will properly apply the tools you know well.
That’s all at an individual level. At a team level, it makes a lot of sense to apply different methods to the same problem, to sanity-check things. And at the advancement-of-science level, schools, methodologies, and techniques will leapfrog each other and advance the state of the art. I worry about the “hybrid” individual practitioner.
And p-hacking your way to false beliefs accomplishes what exactly?
Not sure what you’re talking about here.
‘new methods=not p-hacking’
– Sticking with ‘p-hacking’ and pretending that ‘doing it well’ results in better decisions than embracing and applying new methods is a recipe for continuing decision processes that demonstrably result in more bad decisions than the ‘new methods’.
– Labeling someone who is in the process of learning new methods as a “hybrid individual practitioner” and in turn denigrating their attempts to improve their analytic skill and the analytic processes where they work is shortsighted and provides a roadblock to needed change.
It appears I misunderstood your point.
Holding tight to bad methodology is a recipe for continuing to do the same thing despite the results telling you it does not work.
When p-hacking results converge with appropriately designed and executed studies, what does that tell you?
That “even a blind chicken finds a kernel on occasion.”
Being stubbornly attached to worse methods is a far greater risk to a business than is eagerness to improve analytic methods.
Agree that you are raising serious concerns, though they are not necessarily about Bayesian versus frequentist, which is what my comment above was trying to get at.
> I still don’t fully understand the circumstances under which apparently non-informative Bayesian priors turn out to be informative in ways that taint the model, though at least I’m aware that it can happen.
I do know that some experienced Bayesians have been caught by that, and it took them a while to realize what was going on (e.g., Peter Thall, by his own admission).
Strangely, though, plots of implied or marginal priors (simulating from just the prior) should allow one to notice the potential problem (given one can adequately sample from the prior), though not to fix it or understand it well. My guesstimate of the percentage of Bayesian analyses where this is done is 5% (which may be OK for the real experts).
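To make the prior-simulation idea concrete, here is a minimal sketch (the model and numbers are hypothetical, not from any analysis mentioned above): a seemingly non-informative Normal(0, 10) prior on a logistic-regression coefficient, pushed through the inverse logit, piles most of its implied prior mass near probabilities of 0 and 1 — informative in exactly the way that can taint a model.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Non-informative" Normal(0, sd=10) prior on a logistic-regression intercept
beta = rng.normal(0.0, 10.0, size=100_000)   # draws from just the prior
p = 1.0 / (1.0 + np.exp(-beta))              # implied prior on Pr(y = 1)

# Where does the implied prior put its mass on the probability scale?
extreme = np.mean((p < 0.01) | (p > 0.99))   # near-certain outcomes
middle = np.mean((p > 0.25) & (p < 0.75))    # moderate probabilities
print(f"mass with p < 0.01 or p > 0.99: {extreme:.2f}")
print(f"mass with 0.25 < p < 0.75:      {middle:.2f}")
```

Most of the mass lands in the extremes and very little in the middle, which a histogram of `p` would show at a glance — the kind of plot that lets one notice the problem before fitting anything.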
I gave some references on this in another comment http://statmodeling.stat.columbia.edu/2016/03/16/lack-of-free-lunch-again-rears-ugly-head/#comment-266372
My apologies if I misunderstood your comment. I saw your comment as equally focused on ‘eagerness to embrace new methods’ which is often a debate within business organizations and often rife with what I consider to be a fallacious argument: that applying bad methods well is better than putting in the effort to learn and apply new methods.
OK, I get your objection now. And you’re correct that I wasn’t saying stick with an inferior method. I’m just worried about a single individual who thinks they’ve mastered both Frequentist and Bayesian methodologies, for example, and switches between them on a question-by-question basis. (Based on various blog postings I’ve read about not being Bayesian or Frequentist.)
I disagree with the quote that Andrew highlights, which warns against Bayesian inference as a generalized mechanism. In fact, that’s why BUGS, JAGS, and Stan exist: Bayesian inference is a machine that can run a vast number of model types in a fairly modular way. Of course, there are all kinds of details to the design, implementation, and interpretation of specific models — I use survival models as an example. But I think that mastering the Bayesian inferential machinery — priors, likelihoods, posteriors, convergence, posterior predictive checks, etc. — would serve someone well in a wide variety of applications; wider, I would think, than spending a similar amount of time mastering a similar breadth of, say, frequentist knowledge. Just my opinion.
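As a minimal sketch of that modular machinery (a hypothetical beta-binomial example, not anything from this thread): specify a prior and a likelihood, obtain the posterior, then check the fit with a posterior predictive draw.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior and data (hypothetical): Beta(a, b) prior on a success probability,
# y successes observed in n trials.
a, b = 1.0, 1.0          # uniform prior
y, n = 7, 10             # observed data

# Conjugate update: posterior is Beta(a + y, b + n - y)
post_a, post_b = a + y, b + (n - y)
theta = rng.beta(post_a, post_b, size=50_000)   # posterior draws

# Posterior predictive check: replicate the experiment under each draw
# and compare the replicated counts to the observed count.
y_rep = rng.binomial(n, theta)
print(f"posterior mean of theta: {theta.mean():.3f}")   # ~ 8/12 = 0.667
print(f"P(y_rep >= {y}): {np.mean(y_rep >= y):.2f}")
```

The same four steps — prior, likelihood, posterior, predictive check — carry over unchanged to survival models or anything else one can write down in Stan, which is the modularity being claimed.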
I agree fully.
Thanks for the pointer to the other thread. I’ve gotten both of the articles you link to and they’re excellent!
If a better method agrees with a worse method, what can you conclude from their convergence?