John Talbott points me to this, which I briefly mocked a couple months ago. I largely agree with the critics of this research, but I want to reiterate my point from earlier that all the statistical sophistication in the world won’t help you if you’re studying a null effect. This is not to say that the actual effect is zero—who am I to say?—just that the comments about the high-quality statistics in the article don’t say much to me.
There’s lots of discussion of the lack of science underlying ESP claims. I can’t offer anything useful on that account (not being a psychologist, I could imagine all sorts of stories about brain waves or whatever), but I would like to point out something that usually doesn’t seem to get mentioned in these discussions, which is that lots of people want to believe in ESP. After all, it would be cool to read minds. (It wouldn’t be so cool, maybe, if other people could read your mind and you couldn’t read theirs, but I suspect most people don’t think of it that way.) And ESP seems so plausible, in a wish-fulfilling sort of way. It really feels like if you concentrate really hard, you can read minds, or predict the future, or whatever. Heck, when I play squash I always feel that if I really really try hard, I should be able to win every point. The only thing that stops me from really believing this is that I realize that the same logic holds symmetrically for my opponent. But with ESP, absent a controlled study, it’s easy to see evidence all around you supporting your wishful thinking. (See my quote in bold here.) Recall the experiments reported by Ellen Langer, that people would shake their dice more forcefully when trying to roll high numbers and would roll gently when going for low numbers.
When I was a little kid, it was pretty intuitive to believe that if I really tried, I could fly like Superman. In that case, of course, there was abundant evidence (many crashes in the backyard) that it wouldn't work. For something as vague as ESP, that sort of simple test isn't available. And ESP researchers know this (they use good statistics), but good statistics don't remove the element of wishful thinking. And, as David Weakliem and I have discussed, classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.
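One way to see how classical methods "fall apart" with small effects is a quick simulation. The sketch below assumes, purely for illustration, that each study produces an unbiased, normally distributed estimate with a known standard error, and that only results passing a two-sided test at the 5% level get noticed. The function name `significance_filter` and the labels "exaggeration" and "wrong sign" are illustrative choices, not anything from the studies under discussion.

```python
import random
import statistics


def significance_filter(true_effect, se, n_sims=100_000, z_crit=1.96, seed=1):
    """Simulate estimates ~ Normal(true_effect, se), keep only the 'significant' ones,
    and summarize how those survivors compare to the true effect."""
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    # Keep estimates a two-sided z-test would declare significant.
    significant = [est for est in estimates if abs(est) > z_crit * se]
    power = len(significant) / n_sims
    # Average size of significant estimates relative to the true effect.
    exaggeration = statistics.mean(abs(e) for e in significant) / true_effect
    # Share of significant estimates with the wrong sign (true effect is positive here).
    wrong_sign = sum(e < 0 for e in significant) / len(significant)
    return power, exaggeration, wrong_sign


# Hypothetical scenarios: a large effect (3 standard errors) and a small one (0.2).
for label, effect in [("large effect", 3.0), ("small effect", 0.2)]:
    power, exaggeration, wrong_sign = significance_filter(effect, se=1.0)
    print(f"{label}: power={power:.2f}, "
          f"exaggeration={exaggeration:.1f}x, wrong sign={wrong_sign:.2f}")
```

Under these assumptions, the significant estimates of the large effect land close to the truth, while the significant estimates of the small effect overstate it by roughly an order of magnitude and point the wrong way a substantial fraction of the time.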
I think it’s naive when people implicitly assume the following dichotomy: either a study’s claims are correct, or that study’s statistical methods are weak. Generally, the smaller the effects you’re studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics.
To put it another way: whatever methodological errors happen to be in the paper in question probably also occur in lots of papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be as damaging as they are in a study of a fragile, possibly zero effect.
In some ways, there’s an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large, as discussed here.
Now to criticize the critics: some so-called Bayesian analysis that I don’t really like
I agree with the critics of the ESP paper that Bayesian analysis is a good way to combine this not-so-exciting new finding (people in the study got 53% correct instead of the expected 50%) with the long history of research in this area.
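As a rough illustration of what "combining" can mean, here is a generic normal-normal update, not the method the critics used and not a measurement-error model. Everything numeric is hypothetical: the study is assumed to have 1000 trials, and "the long history of research" is stood in for by a prior on the effect (success rate minus 50%) centered at zero with a standard deviation of one percentage point.

```python
import math


def combine(prior_mean, prior_sd, estimate, se):
    """Normal-normal update: the posterior mean is a precision-weighted
    average of the prior mean and the data estimate."""
    w_prior = 1 / prior_sd**2
    w_data = 1 / se**2
    post_mean = (w_prior * prior_mean + w_data * estimate) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    return post_mean, post_sd


# Hypothetical data: 53% correct in 1000 trials; effect measured as rate - 0.50.
n_trials = 1000
rate = 0.53
estimate = rate - 0.50
se = math.sqrt(rate * (1 - rate) / n_trials)

# Hypothetical prior: effects near zero, rarely more than a couple of points.
post_mean, post_sd = combine(prior_mean=0.0, prior_sd=0.01, estimate=estimate, se=se)
print(f"posterior: {0.5 + post_mean:.1%} correct, sd {post_sd:.1%}")
```

With these made-up inputs the estimate is pulled from 53% down to roughly 51%, with a posterior standard deviation under one percentage point. A different prior would give a different answer, which is exactly why the prior deserves scrutiny.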
But I wouldn’t use the Bayesian methods that these critics recommend. In particular, I think it’s ludicrous for Wagenmakers et.al. to claim a prior probability of 10^-20 for ESP, and I also think that they’re way off base when they start talking about “Bayesian t-tests” and point null hypotheses. I think a formulation based on measurement-error models would be far more useful. I’m very disturbed by purportedly Bayesian methods that start with meaningless priors which then yield posterior probabilities that, instead of being interpreted quantitatively, have to be converted to made-up categories such as “extreme evidence,” “very strong evidence,” “anecdotal evidence,” and the like. This seems to me to be taking some of the most arbitrary aspects of classical statistics. Perhaps I should call this the “no true Bayesian” phenomenon.
And, if you know me at all (in a professional capacity), you’ll know I hate statements like this:
Another advantage of the Bayesian test is that it is consistent: as the number of participants grows large, the probability of discovering the true hypothesis approaches 1.
The "true hypothesis," huh? I have to go to bed now (no, I'm not going to bed at 9am; I set this blog up to post entries automatically every morning). If you happen to run into an experiment of interest in which psychologists are "discovering a true hypothesis" (in the statistical sense of a precise model), feel free to wake me up and tell me. It'll be newsworthy, that's for sure.
Anyway, the ESP thing is pretty silly, and so there are lots of ways of shooting it down. I'm only picking on Wagenmakers et al. because we're often full of uncertainty about more interesting problems, for example, new educational strategies and their effects on different sorts of students. For these sorts of problems, I don't think that models of null effects, verbal characterizations of Bayes factors, and reassurances about "discovering the true hypothesis" are going to cut it. These issues are important, and I think that, even when criticizing silly studies, we should think carefully about what we're doing and what our methods are actually purporting to do.