That silly ESP paper and some silliness in a rebuttal as well

Posted on January 6, 2011 8:52 AM by Andrew

John Talbott points me to this, which I briefly mocked a couple months ago. I largely agree with the critics of this research, but I want to reiterate my point from earlier that all the statistical sophistication in the world won’t help you if you’re studying a null effect. This is not to say that the actual effect is zero—who am I to say?—just that the comments about the high-quality statistics in the article don’t say much to me.

There’s lots of discussion of the lack of science underlying ESP claims. I can’t offer anything useful on that account (not being a psychologist, I could imagine all sorts of stories about brain waves or whatever), but I would like to point out something that usually doesn’t get mentioned in these discussions: lots of people want to believe in ESP. After all, it would be cool to read minds. (It wouldn’t be so cool, maybe, if other people could read your mind and you couldn’t read theirs, but I suspect most people don’t think of it that way.) And ESP seems so plausible, in a wish-fulfilling sort of way. It really feels like if you concentrate really hard, you can read minds, or predict the future, or whatever. Heck, when I play squash I always feel that if I really really try hard, I should be able to win every point. The only thing that stops me from really believing this is that I realize that the same logic holds symmetrically for my opponent. But with ESP, absent a controlled study, it’s easy to see evidence all around you supporting your wishful thinking. (See my quote in bold here.) Recall the experiments reported by Ellen Langer, in which people shook their dice more forcefully when trying to roll high numbers and more gently when going for low numbers.

When I was a little kid, it was pretty intuitive to believe that if I really tried, I could fly like Superman. In that case, of course, there was abundant evidence (many crashes in the backyard) that it wouldn’t work. For something as vague as ESP, that sort of simple test isn’t available. ESP researchers know this (they use good statistics), but that doesn’t remove the element of wishful thinking. And, as David Weakliem and I have discussed, classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.
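One way to see what goes wrong with small effects is the idea of a type M (magnitude) error: when the true effect is small relative to the standard error, the estimates that happen to reach statistical significance must overstate it. The sketch below is a minimal simulation, assuming a normally distributed estimate with known standard error and a two-sided 5% threshold; the effect sizes are hypothetical, and this is not the analysis from the Weakliem work.

```python
import random
import statistics


def power_and_exaggeration(true_effect, se, n_sims=100_000, seed=1):
    """Simulate estimates ~ Normal(true_effect, se) and keep the 'significant' ones.

    Returns (power, exaggeration): power is the share of estimates with
    |estimate| > 1.96 * se; exaggeration is the mean |estimate| among those
    significant estimates divided by the true effect.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > 1.96 * se:
            significant.append(abs(estimate))
    power = len(significant) / n_sims
    return power, statistics.mean(significant) / true_effect


for effect in (2.0, 0.5, 0.1):  # hypothetical true effects, in units of the standard error
    power, ratio = power_and_exaggeration(effect, se=1.0)
    print(f"true effect {effect}: power {power:.2f}, significant estimates exaggerate by {ratio:.1f}x")
```

With a large effect, the significance filter inflates estimates only modestly; with an effect a tenth of a standard error, the few significant estimates are many times too large.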

I think it’s naive when people implicitly assume the following dichotomy: either a study’s claims are correct, or that study’s statistical methods are weak. Generally, the smaller the effects you’re studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics.

To put it another way: whatever methodological errors happen to be in the paper in question probably occur in lots of papers in “legitimate” psychology research. The difference is that when you’re studying a large, robust phenomenon, little statistical errors won’t be as damaging as they are in a study of a fragile, possibly zero effect.

In some ways, there’s an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large, as discussed here.
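A minimal sketch of that survey analogy, assuming a fixed sensitivity (share of true "yes" cases reported as yes) and a fixed false-positive rate; the 95% and 1% values are hypothetical. The same small error rate that barely moves a 30% estimate nearly doubles a 1% one.

```python
def observed_rate(true_prevalence, sensitivity, false_positive_rate):
    """Expected share of 'yes' answers when responses are misclassified.

    True 'yes' cases are reported as yes with probability `sensitivity`;
    true 'no' cases are wrongly reported as yes with `false_positive_rate`.
    """
    return true_prevalence * sensitivity + (1 - true_prevalence) * false_positive_rate


for prevalence in (0.30, 0.01):
    observed = observed_rate(prevalence, sensitivity=0.95, false_positive_rate=0.01)
    print(f"true {prevalence:.2f} -> observed {observed:.4f} "
          f"(relative error {observed / prevalence - 1:+.0%})")
```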

Now to criticize the critics: some so-called Bayesian analysis that I don’t really like

I agree with the critics of the ESP paper that Bayesian analysis is a good way to combine the results of this not-so-exciting new finding (that people in the study got 53% correct instead of the expected 50%) with the long history of research in this area.
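To see how little a 53%-versus-50% difference says on its own, here is a normal-approximation test for a binomial proportion at a few trial counts. The counts are hypothetical, not the sample sizes in the ESP paper; the point is that the same three-point edge moves from unremarkable to "highly significant" purely as n grows, so any small systematic bias in the setup would do the same.

```python
import math


def z_and_p(prop_correct, n_trials, chance=0.5):
    """Normal-approximation z statistic and two-sided p-value for a binomial proportion."""
    se = math.sqrt(chance * (1 - chance) / n_trials)
    z = (prop_correct - chance) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of a standard normal
    return z, p


for n in (100, 1000, 5000):  # hypothetical numbers of trials
    z, p = z_and_p(0.53, n)
    print(f"n={n}: z={z:.2f}, two-sided p={p:.3g}")
```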

But I wouldn’t use the Bayesian methods that these critics recommend. In particular, I think it’s ludicrous for Wagenmakers et al. to claim a prior probability of 10^-20 for ESP, and I also think that they’re way off base when they start talking about “Bayesian t-tests” and point null hypotheses. I think a formulation based on measurement-error models would be far more useful. I’m very disturbed by purportedly Bayesian methods that start with meaningless priors which then yield posterior probabilities that, instead of being interpreted quantitatively, have to be converted to made-up categories such as “extreme evidence,” “very strong evidence,” “anecdotal evidence,” and the like. This seems to me to carry over some of the most arbitrary aspects of classical statistics. Perhaps I should call this the “no true Bayesian” phenomenon.
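For reference, the arithmetic linking a prior probability to a Bayes factor is posterior odds = prior odds × BF10. The sketch below (the Bayes factor values are hypothetical) shows why a prior like 10^-20 dominates everything downstream: only a Bayes factor on the order of 10^20 would bring the posterior up to even odds.

```python
def posterior_probability(prior_prob, bayes_factor):
    """Posterior P(H1 | data) from prior P(H1) and BF10 = p(data | H1) / p(data | H0)."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


for bf in (10, 1e6, 1e20):  # hypothetical Bayes factors in favor of ESP
    print(f"BF10 = {bf:g}: posterior P(ESP) = {posterior_probability(1e-20, bf):.3g}")
```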

And, if you know me at all (in a professional capacity), you’ll know I hate statements like this:

Another advantage of the Bayesian test is that it is consistent: as the number of participants grows large, the probability of discovering the true hypothesis approaches 1.

The “true hypothesis,” huh? I have to go to bed now (no, I’m not going to bed at 9am; I set this blog up to post entries automatically every morning). If you happen to run into an experiment of interest in which psychologists are “discovering a true hypothesis” (in the statistical sense of a precise model), feel free to wake me up and tell me. It’ll be newsworthy, that’s for sure.
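The consistency property quoted above is a statement about the model's parameter, not about ESP. A minimal sketch, assuming binomial trials, a point null at exactly 50%, and a uniform prior under the alternative: if the true hit rate is off by even a hypothetical one percentage point, whether from ESP or from, say, a slight bias in how targets were randomized, enough data will "discover" that the null is false either way.

```python
import math


def log_bf01(k, n):
    """Log Bayes factor for H0: theta = 0.5 vs H1: theta ~ Uniform(0, 1), binomial data."""
    log_m0 = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
              + n * math.log(0.5))
    log_m1 = -math.log(n + 1)  # beta-binomial marginal under a uniform prior is 1 / (n + 1)
    return log_m0 - log_m1


for n in (1_000, 10_000, 100_000, 1_000_000):
    exact = log_bf01(n // 2, n)            # data sitting right at the null value
    tiny = log_bf01(round(0.51 * n), n)    # a hypothetical one-point deviation
    print(f"n={n}: log BF01 at 50% = {exact:.1f}, at 51% = {tiny:.1f}")
```

At n = 1,000 the two data sets are scored almost the same; by n = 100,000 the 51% data give overwhelming evidence against the null, which says nothing about where the one-point deviation came from.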

Anyway, the ESP thing is pretty silly, and so there are lots of ways of shooting it down. I’m only picking on Wagenmakers et al. because we’re often full of uncertainty about more interesting problems, for example, new educational strategies and their effects on different sorts of students. For these sorts of problems, I don’t think that models of null effects, verbal characterizations of Bayes factors, and reassurances about “discovering the true hypothesis” are going to cut it. These questions of method are important, and I think that, even when criticizing silly studies, we should think carefully about what we’re doing and what our methods are actually purporting to do.

22 thoughts on “That silly ESP paper and some silliness in a rebuttal as well”

Rob McQueen on January 6, 2011 at 5:56 am said:

I agree, this is ludicrous. Instead of performing error-prone experiments and applying complex statistical analysis to reveal a 3% anomaly in the data, why don't they actually look for the supposed brain waves that predict the future?

Einstein didn't invent relativity by "chance" (I can't say the same for Quantum, heh). It was done using careful measurements of the spherical orbs and lots of calculation.

While it would be fun to believe that our brains unconsciously communicated with one another at a hidden frequency, nothing has been observed to prove such a hypothesis. I'd read this paper if it told me that it had found traces of previously unknown electrical signals within the brain that are resonant with all human brains. Now that would be cool!

Manoel Galdino on January 6, 2011 at 6:04 am said:

Excellent.

You said:

"whatever methodological errors happen to be in the paper in question, probably occur in lots of researcher papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be so damaging as in a study of a fragile, possibly zero effect"

This connects to another post of yours in which you discussed a paper by Schrodt and commented on two conflicting pieces of advice: use simple methods, and build complex models.

As I see it, the quoted passage may resolve this conundrum: when the effects are large, use simple methods; when they are small, use complex methods.

Andrew Clegg on January 6, 2011 at 7:27 am said:

Could you elaborate a little on your preferred way of doing Bayesian significance testing please?

Justin Smith on January 6, 2011 at 8:49 am said:

Great post!

I have personally noticed that the 'psi' model has moved from a "Force" interpretation to an "Information" interpretation over the years.

One of the main candidates in the latter camp, at least as of several years ago, is Decision Augmentation Theory (DAT), which states, I believe, something like: the brain somehow knows when to enter a bitstream.

I link to several papers on it (co-authored by the statistician Jessica Utts), along with a critical analysis of DAT I wrote a while ago, at:

http://www.statisticool.com/dat.htm

To me, DAT is a great example of statistical sophistication applied towards something that is probably silly.

Justin

Nick Cox on January 6, 2011 at 9:22 am said:

The late and great Martin Gardner repeatedly exposed poor science in this area. His main points were similar. It was not a matter of being sure that there is nothing to discover, but rather that researchers and many others were repeatedly deluding themselves that they had found (evidence for) effects for which there was no other explanation. Widespread fraud and fakery of many kinds do not make the problem any easier…

Patrizio Tressoldi on January 6, 2011 at 10:03 am said:

You wrote: "I'd read this paper if it told me that it had found traces of previously unknown electrical signals within the brain that are resonant with all human brains."

Here are some papers related to your curiosity. If necessary, I can send you more references:

Wackermann, J., Seiter, C., Keibel, & Walach, H. (2003). Correlations between brain electrical activities of two spatially separated human subjects. Neuroscience Letters, 336, 60–64.

Wackermann, J. (2004). Dyadic correlations between brain functional states: Present facts and future perspectives. Mind and Matter, 2(1), 105–122.

Radin, D. (2004). Event-related electroencephalographic correlations between isolated human subjects. Journal of Alternative and Complementary Medicine, 10, 315–323.

Richards, T. L., Kozak, L., Johnson, L. C., & Standish, L. J. (2005). Replicable functional magnetic resonance imaging evidence of correlated brain signals between physically and sensory isolated subjects. Journal of Alternative and Complementary Medicine, 11(6), 955–963.

BenSix on January 6, 2011 at 11:00 am said:

On the other hand, Martin Gardner could also stuff up his critiques of supposed paranormal shenanigans pretty badly. While he was indeed great, and I've no wish to retrospectively psychoanalyse him, I think it's also true that some people who've invested themselves in a particular view of "skepticism" are inclined to embrace just about any explanation that rules out the otherwise (at least temporarily) unexplainable.

Joseph Wilson on January 6, 2011 at 11:56 am said:

E. T. Jaynes had an interesting Bayesian take on this subject in his famous book. Basically, his argument was that our prior probability for ESP was so low that any typical study which showed statistically significant evidence for ESP would "resurrect a dead hypothesis" such as "H_1: there were errors in collecting the data" or "H_2: the data was faked".

In other words, these statistical studies won't convince a Bayesian who holds a very low prior probability for ESP, since they have the unintended effect of convincing that Bayesian of some more reasonable explanation like H_1 or H_2.

All of which means statistical tests won’t prove ESP unless either the effect is huge and obvious, or a causal mechanism is found.
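A minimal sketch of the Jaynes argument as a discrete Bayesian update over competing explanations. All numbers are hypothetical, chosen only to show the structure: when "errors or fakery" has a far higher prior than ESP and explains a significant result about as well, the update lands almost entirely on it.

```python
def posterior(priors, likelihoods):
    """Normalize prior x likelihood over a finite set of mutually exclusive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical prior probabilities (they sum to 1).
priors = {"no ESP, clean data": 0.99, "errors or fakery": 0.009999, "ESP": 0.000001}
# Hypothetical probability of seeing a 'significant' result under each hypothesis.
likelihoods = {"no ESP, clean data": 1e-4, "errors or fakery": 0.5, "ESP": 0.5}

for hypothesis, prob in posterior(priors, likelihoods).items():
    print(f"{hypothesis}: {prob:.4g}")
```

ESP's probability does rise about a hundredfold here, but "errors or fakery" absorbs nearly all of the update.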

EJ Wagenmakers on January 6, 2011 at 12:22 pm said:

Dear Andrew,

Since it's my paper you discussed, I felt the need to respond. Now, I know you don't like Bayes factors, but some people do (Jeffreys, Berger, Raftery, O'Hagan, etc.). We could discuss the pros and cons for a while, but I think we will never agree. Experimental psychologists want to test hypotheses: did my experimental manipulation work or did it not work? This requires a Bayes factor approach. Usually, experimental psychologists are not interested in parameter estimation. I know you have argued that null hypotheses are never completely true, but this does not hold for experimental research.

Also, you appear to have misread the section where we use a prior of 10^-20 for ESP: as we indicate quite clearly, this was only meant to illustrate the maxim that extraordinary claims require extraordinary evidence; this prior did not enter our Bayes factor calculations. Also, we have an online appendix, referenced in the paper, where we do a robustness analysis and look at the impact of our prior on effect size.

I am also interested in what concrete Bayesian analysis you would propose to answer the question of whether or not the participants show psi.

Cheers,
E.J.

Andrew Gelman on January 6, 2011 at 1:25 pm said:

E. J.:

I appreciate your taking the time to reply. By writing that article you were sticking your neck out, and I think it's important that we can all engage with criticism.

To respond briefly:

– You say that testing hypotheses "requires a Bayes factor approach." Tell that to R. A. Fisher and a zillion other non-Bayesian statisticians! You may be comfortable with a Bayes factor approach but I hardly think it's required!

– I don't think the 10^-20 probability makes sense even as an illustration. Seeing it anywhere in the paper makes me uneasy.

– I like the comment of Joseph (just above yours): This is what I was trying to get at when I talked about measurement error models.

– I agree that the ESP problem is a difficult one to analyze using any methods, my own included. I don't think that any method of analyzing these sorts of data will work unless you explicitly allow for measurement error. Rejecting the null hypothesis of zero correlation is not the same as rejecting the null hypothesis of no ESP. And that holds whether your methods are Bayesian or otherwise.

– As I noted in my blog post above, my concern is with Bayes factor methods being used in more important areas such as education research.

Bill Jefferys on January 6, 2011 at 7:01 pm said:

Here's Jim Berger's take on all of this.

http://www.stat.duke.edu/%7Eberger/papers/02-01.p…

Jim discusses the takes that Fisher, Neyman, and Jeffreys had on hypothesis testing. He also points out that "point null testing," in the Bayesian context, need not mean "exact" point nulls (see the reference to his paper with Mohan Delampady).

I think that E. J. was thinking of Bayesian hypothesis testing when he wrote "hypothesis testing." Obviously there is no Bayes factor in the approaches of Fisher or Neyman or many other non-Bayesians. So I think it is a little unfair to complain about E. J.'s comment here. But I may be wrong, and E. J. can clarify.

EJ Wagenmakers on January 6, 2011 at 11:40 pm said:

Hi Bill, hi Andrew,

Yes, I was firmly in Bayesian world when I mentioned hypothesis testing. I really like Jim Berger's work, and also the particular paper you refer to. As I recall, the slight difficulty there is that the non-exactness of the null needs to change with the sample size. Something to do with one-half of a standard error from 0, but I may be wrong.

Peter Gruenwald also has some recent papers in which he discusses some challenges for Bayes factors when the true model is not in the set of candidate models. See for instance http://homepages.cwi.nl/~pdg/presentations/Helsin…

Cheers,
E.J.

Bill Jefferys on January 7, 2011 at 5:52 am said:

E.J.

The link you provided doesn't work. Gotta trim off the period at the end:

http://homepages.cwi.nl/~pdg/presentations/Helsin…

Bill Jefferys on January 7, 2011 at 5:56 am said:

E.J. is correct about the approximation to a point null in Berger and Delampady. If the approximation isn't good because of too much data, you can always replace the exact point null with an appropriately skinny beta (for example, if the data are supposed to be Bernoulli trials).
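A minimal sketch of the "skinny beta" idea, assuming Bernoulli trials: the null is θ ~ Beta(a, a) with large a (concentrated near 0.5), the alternative is uniform, and both marginal likelihoods are beta-binomial. The data (530 hits in 1,000 trials) and the values of a are hypothetical. As a grows, the result converges to the exact point-null Bayes factor; for smaller a, the "null" itself allows hit rates noticeably away from 0.5, and the Bayes factor shifts accordingly.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, via lgamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(k, n, a, b):
    """Log beta-binomial marginal likelihood of k hits in n trials under theta ~ Beta(a, b).

    The binomial coefficient is omitted because it cancels in every Bayes factor below.
    """
    return log_beta(k + a, n - k + b) - log_beta(a, b)


def log_bf01_skinny(k, n, null_a, alt_a=1.0, alt_b=1.0):
    """Log BF for H0: theta ~ Beta(null_a, null_a) vs H1: theta ~ Beta(alt_a, alt_b)."""
    return log_marginal(k, n, null_a, null_a) - log_marginal(k, n, alt_a, alt_b)


k, n = 530, 1000  # hypothetical data: 53% hits in 1,000 trials
point_null = n * math.log(0.5) - log_marginal(k, n, 1.0, 1.0)  # exact point null, same alternative
for null_a in (1e3, 1e5, 1e7):
    print(f"H0: Beta({null_a:g}, {null_a:g}) -> log BF01 = {log_bf01_skinny(k, n, null_a):.3f}")
print(f"exact point null -> log BF01 = {point_null:.3f}")
```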

Jeff Rouder on January 7, 2011 at 6:28 am said:

Perhaps this is too simplistic, but I think it's important to separate truth and Bayes factors. I believe I can learn from comparing models without any of them being true, and, in that vein, use Bayes factors as a comparison metric. If I am judicious and wise, the BF comparison of select models may point me toward the next step in understanding the phenomenon at hand.

Andrew Gelman on January 7, 2011 at 7:12 am said:

Bill:

My problem with these Bayes factors is not the model near zero, so much as the choice of prior distribution for the alternative hypothesis.
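To make the prior-choice problem concrete, here is a minimal sketch, assuming Bernoulli trials, a point null at 0.5, and a Beta(a, a) prior under the alternative whose width is set by the hypothetical parameter a. With the same hypothetical data (530 hits in 1,000 trials), the Bayes factor moves from favoring the null to favoring the alternative purely as the alternative's prior is narrowed.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, via lgamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_bf10(k, n, alt_a):
    """Log BF for H1: theta ~ Beta(alt_a, alt_a) against the point null H0: theta = 0.5.

    Both marginal likelihoods share the binomial coefficient, so it is dropped.
    """
    log_m1 = log_beta(k + alt_a, n - k + alt_a) - log_beta(alt_a, alt_a)
    log_m0 = n * math.log(0.5)
    return log_m1 - log_m0


k, n = 530, 1000  # hypothetical data: 53% hits in 1,000 trials
for alt_a in (1, 10, 100, 1000):
    prior_sd = 0.5 / math.sqrt(2 * alt_a + 1)  # sd of theta under Beta(alt_a, alt_a)
    print(f"H1 prior sd {prior_sd:.3f}: BF10 = {math.exp(log_bf10(k, n, alt_a)):.2f}")
```

Nothing about the data changes across rows; only the analyst's choice of how large an effect to expect under the alternative does.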

Bill Jefferys on January 7, 2011 at 7:41 am said:

Andrew: I agree, the choice of prior for the alternative hypothesis is not easy and can be problematic.

When I analyzed an ESP paper by some Princeton guys (citation of my paper can be found in E.J.'s paper) I looked at a number of priors. Amongst others, I considered a prior that matched what an earlier researcher had claimed to have obtained as an effect size. That one was to me the most defensible of the several I investigated.

[The real point of the paper was to give a real-world example of the Jeffreys-Lindley "paradox."]
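Jefferys's paper used its own priors; as a generic, minimal sketch of the Jeffreys-Lindley "paradox", assume Bernoulli trials, a point null at 0.5 and a uniform alternative, and hold the result at exactly two standard errors above chance while n grows. The classical p-value stays at about 0.046 while the Bayes factor increasingly favors the null.

```python
import math


def log_bf01_uniform(k, n):
    """Log BF for H0: theta = 0.5 against H1: theta ~ Uniform(0, 1), binomial data."""
    log_m0 = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
              + n * math.log(0.5))
    return log_m0 + math.log(n + 1)  # subtracting log(1 / (n + 1)), the uniform-prior marginal


for n in (100, 10_000, 1_000_000):
    k = round(n / 2 + math.sqrt(n))  # two standard errors (0.5 * sqrt(n) each) above chance
    z = (k - n / 2) / (0.5 * math.sqrt(n))
    p = math.erfc(z / math.sqrt(2))  # two-sided p-value
    print(f"n={n}: z={z:.2f}, two-sided p={p:.3f}, BF01={math.exp(log_bf01_uniform(k, n)):.1f}")
```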

YK on January 8, 2011 at 9:13 pm said:

Hi Andrew,

Thanks for the insightful post and discussion. I'm still unclear about what you mean when you say you'd prefer a "measurement error" model approach to this. You wrote that you don't think the "10^-20 [prior] probability makes sense even as an illustration", but you said you liked Joseph's comment regarding E. T. Jaynes.

It sounds to me like the argument Jaynes makes about an extraordinary result like ESP resurrecting a "dead" hypothesis, such as measurement error, rests on the notion of an extremely low prior probability. If you can accept this, then EJ Wagenmakers is incorporating that intuition formally, by using this prior in a Bayes-factor/hypothesis testing framework, which doesn't seem unreasonable if you buy Jaynes's original argument. It's hard (for me) to think of other Bayesian frameworks where this highly unlikely prior can be incorporated, to have a formal correlate of the idea that "ESP is incredibly unlikely".

In any case, would love to hear what are some of your proposed methods for this case, since it comes up a lot. EJ Wagenmakers wrote that "Experimental psychologists want to test hypotheses: did my experimental manipulation work or did it not work?…Usually, experimental psychologists are not interested in parameter estimation." In my view, this statement holds equally well for experimental biologists who try to use statistics for their own experiments or in genomics. They want to do (or think they want to do) the sort of things the ESP paper does. If the Bayes factor approach is not the one you prefer, I'd be very curious to hear about any alternatives — even if you think they don't fully solve the problem, which probably no current models do.

Thanks very much. –YK

Andrew Gelman on January 9, 2011 at 6:27 am said:

YK:

I'll have to think more about problems like the ESP study. Some of my ideas on inference for small effects are in this article and this presentation.

Paul Alper on January 10, 2011 at 12:25 pm said:

But there is another issue regarding Bem’s paper which is outside of the domain of statistics. Why do so many people passionately believe in ESP even though there has never been any credible evidence for it outside of a low p-value? Perhaps the answer lies in a weird perversion of the notion of democratic opinion. If ESP exists then physical laws, the specialty of the scientifically and mathematically educated, no longer hold and everyone has an equal say. Beauty may lie in the eyes of the beholder, but it is incontestable that the speed of light is approximately 299,792,458 meters per second, the harmonic series diverges and the planet on which we reside is considerably older than a few thousand years. Such items are not up for a vote and should not be subject to the ballot box of public estimation.

Bill Jefferys on January 11, 2011 at 9:04 am said:

Hi Paul, long time no see!

Slight clarification, which you know for sure. The speed of light in the SI system is EXACTLY 299,792,456 m/s, by definition. (The meter is defined by this constant).

Bill Jefferys on January 11, 2011 at 9:05 am said:

Oops, typo. 299,792,458 m/s. Fat finger syndrome.
