Can we use Bayesian methods to resolve the current crisis of unreplicable research?

In recent years, psychology and medicine have been rocked by scandals of research fraud. At the same time, there is a growing awareness of serious flaws in the general practices of statistics for scientific research, to the extent that top journals routinely publish claims that cannot be replicated. All this is occurring despite (or perhaps because of?) statistical tools such as Type 1 error control that are supposed to restrict the rate of unreliable claims. We consider ways in which prior information and Bayesian methods might help resolve these problems.

Here are the details, and here are the slides from the last time I gave this talk.

I can’t attend (wrong continent). Are the slides being posted? I hope you are not taking the position some Bayesians took during the Bem incident: that any hypothesis they don’t like should be priored out of existence.

Ian:

I posted a link to the slides, which can also be found here. Last time I gave the talk at a psychology department. This time the audience is more general, so I’ll have to think about whether I need to adapt it.

Regarding your last sentence above: That one’s worth a post all its own!

Thanks for the slides. On a related note, I was about to point you to this paper, in which the authors attempt to estimate the number of false discoveries in all of science, to ask your opinion (http://biostatistics.oxfordjournals.org/content/15/1/1.abstract). But it appears you beat me to the punch: you are a discussant!

I agree with your commentary. The main weakness in science is not Type 1 error slipping false discoveries through the cracks, but rather the degrees of freedom in data collection, cleaning, and exploratory data analysis, which allow us to optimize significance like an objective function. On the other hand, perhaps you could have been a bit easier on them: the only way to get an analytic handle on the problem is to make the kind of simplifying assumptions they made.
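To make the “optimizing significance like an objective function” point concrete, here is a small simulation of my own (not from the paper, and the particular analyst choices are hypothetical): with no true effect at all, an analyst who tries a few defensible-looking analyses, say a robustness trim and an early look at half the data, and reports the best p-value will see “significant” results well above the nominal 5% rate.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def p_two_sided(z):
    """Two-sided p-value under a standard normal approximation."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def best_p(n=50):
    """One null 'study': two groups with NO true difference.
    The analyst tries several reasonable-looking choices, keeps the best p."""
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    ps = []
    for ta, tb in [(a, b),                                   # raw data
                   (np.clip(a, -2, 2), np.clip(b, -2, 2))]:  # 'robust' trimming
        for ma, mb in [(ta, tb), (ta[: n // 2], tb[: n // 2])]:  # 'early look'
            se = math.sqrt(ma.var(ddof=1) / len(ma) + mb.var(ddof=1) / len(mb))
            ps.append(p_two_sided((ma.mean() - mb.mean()) / se))
    return min(ps)  # report only the most favorable analysis

rate = np.mean([best_p() < 0.05 for _ in range(4000)])
print(f"nominal alpha = 0.05, realized rate of 'significant' nulls = {rate:.3f}")
```

Even with only four mildly different (and correlated) analyses per study, the realized false-positive rate roughly doubles the nominal 5%, with no single step that looks like cheating.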

I’d seriously love a blog post on this.

To me, research fraud and non-replicability seem like pretty different problems, so I’m curious how you think any one tool can fix them both. Even more, I’d love to know why you think Bayesian reasoning is that hammer.

Finally, I find it a bit intriguing that you assign blame to “Type 1 error control.”

Rahul:

When I say “despite (or perhaps because of?) statistical tools such as Type 1 error control that are supposed to restrict the rate of unreliable claims,” I’m not trying to assign blame to Type 1 error control; rather, I’m saying that these methods for restricting the rate of unreliable claims do not seem to have worked. If Type 1 error control, etc., deserves blame, it’s only in that it could be giving people a false sense of security.

The analogy I’ve sometimes given is that Type 1 error control (and other, functionally equivalent approaches such as Bayes factor thresholds under conventional priors) is like building a fence around your house, with more restrictive rules such as p=.001 corresponding to higher fences. But now suppose that (a) trusting the fence, you decide not to lock your house or secure any of your belongings inside, but, meanwhile, (b) thieves have figured out how to climb the fence. Then you’re in big trouble. Substitute “statistical significance + publication in a prestigious journal” for “fence,” and “decide to trust publications in Psychological Science etc.” for “decide not to lock your house,” and you see where we stand.

Andrew:

I like your fence analogy. Can you now extend it to illustrate what a Bayesian fence would look like? I’m curious to know.

PS. Research fraud is akin to a contractor who told you he constructed an 8-foot-high electrified fence with razor wire but in reality just planted a tiny hedge. Or, even more apt, a contractor who gets paid purely by the number of fences (journal articles) he claims to have constructed every day.

IMHO getting rid of fences themselves is not the natural solution to this problem.

Rahul:

Rather than a fence, I’d prefer a series of hedges. With some guard dogs and some locks on the jewelry, just in case. To put it another way, no, I don’t think there’s a Bayesian fence. The problem with the “fence” is that it is an attempt to divide messy continuous reality into discrete Yes and No decisions. Sometimes, of course, we do need to make decisions. But I don’t think it’s the role of the scientific process to make a yes/no decision on hypotheses about social priming or whatever. I prefer to accept that these phenomena exist in some form but that the effects vary by person and by circumstance and not try so hard to get to a position of certainty.

Andrew:

I think the crux is whether you like or hate yes/no decisions. In practical applications those are terribly important, and there really isn’t a good way to circumvent them.

Imagine I’m Underwriters Laboratories or some such. Most of my guidelines are currently of the form “if storing valuables, need a fence of 6 feet” or similar. I don’t know if one could frame Bayesian codes specifying which combinations of hedges, dogs, and locks are recommended and which are not.

In other words, I can see how one could live with it for a purely academic study on, say, “hypotheses about social priming,” but for a lot of more practical matters I’m still stumped as to how one could adopt your prescriptions.

Rahul:

I would like scientific papers and research reports more generally to acknowledge uncertainty and variation rather than (1) implying that a single study or set of small experiments has established the truth and (2) implying that this truth is eternal.

When it is time to make decisions, corporate or otherwise, I think decision makers should use all available information rather than treating a single claim as true, just because it happens to be statistically significant and published.

In BDA, we have a chapter on decision analysis; this should give some insight on how my approach would work there. But the papers in Psychological Science etc are not about decision making, and I don’t see the usefulness of yes/no decisions there.
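The decision-analysis point can be sketched in a few lines. This is a toy illustration of mine, not an example from BDA, and all the numbers (posterior draws, costs, utilities) are made up: instead of acting on a yes/no significance verdict, the decision maker carries the full posterior for the effect and picks the action with the highest expected utility.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: posterior draws for an intervention's effect, in
# dollars of benefit per customer, as would come from a fitted Bayesian
# model. Here we simply fake the draws for illustration.
effect_draws = rng.normal(loc=0.8, scale=1.5, size=10_000)

cost_per_customer = 0.5  # assumed rollout cost (also hypothetical)

# Expected utility of each action, averaged over the posterior --
# note there is no p-value threshold anywhere in the calculation.
eu_rollout = np.mean(effect_draws - cost_per_customer)
eu_status_quo = 0.0

decision = "roll out" if eu_rollout > eu_status_quo else "status quo"
print(f"E[utility | roll out] = {eu_rollout:.2f} -> decision: {decision}")
```

Note that uncertainty about the effect flows straight into the decision: changing the assumed cost, or widening the posterior, can flip the choice even though nothing resembling “significance” was ever computed.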

> divide messy continuous reality into discrete Yes and No decisions

that are almost completely delegated to statistical machinery that investigators are not expected to understand themselves but are coached to carry out in ways that hide their lack of understanding (e.g., using the correct degrees of freedom).

Much more insidious: the “police” direct “winners” (judged to have not broken any rules) through the fences of unsuspecting families’ homes, assuring the winners that they are the real owners of these homes and that anyone who thinks otherwise just does not understand “due process of law” and will not be allowed to make claims of ownership in any court.


Hi Andrew,

Thanks for posting those slides! Regarding the last two slides, though: are those data just hypothetical, or did they come from somewhere? The data seem to be partitioned into the same groups on both slides, but still appear quite different. Am I missing something here?