A new Bem theory

The other day I was talking with someone who knows Daryl Bem a bit, and he was sharing his thoughts on that notorious ESP paper that was published in a leading journal in the field but then was mocked, shot down, and repeatedly put to replication attempts that failed. My friend said that overall the Bem paper had positive effects in forcing psychologists to think more carefully about what sorts of research results should or should not be published in top journals, the role of replications, and other things.

I expressed agreement and shared my thought that, at some level, I don’t think Bem himself fully believes his ESP effects are real. Why do I say this? Because he seemed oddly content to publish results that were not quite conclusive. He ran a bunch of experiments, looked at the data, and computed some post-hoc p-values in the .01 to .05 range. If he really were confident that the phenomenon was real (that is, that the results would apply to new data), then he could’ve easily run the experiments on a bunch more students, gathering enough data so that nobody could doubt his claims. But Bem didn’t do that. Instead, once he felt he’d reached the statistical significance plateau, he stopped and submitted to the journal. This behavior is consistent with the idea that he did not want to push his claims further, instead wanting to get into print before any new data could reveal problems with his study.

Ironically, I told my friend, Bem’s strategy didn’t work. Yes, the paper was published in a top journal. But, rather than this publication making the result more plausible, the reverse happened: the implausible claims reduced the perceived validity of psychology studies more generally. The journal didn’t establish the truth of the finding; instead, the finding dragged the journal down.

My friend then unleashed an amazing theory: that Bem really really doesn’t believe these ESP claims, that he did this whole project with a straight face to demonstrate problems with our current system of statistical/scientific research and publishing. Never breaking character, Bem will take this secret to his grave.

I don’t know, but my friend is the one who knows Bem, and that’s what he tells me.
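
The stopping behavior described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a reconstruction of Bem's actual procedure: the function names, the 0.5 chance hit rate, the maximum of 200 subjects, and the look every 20 subjects are all made up for the example. It simulates studies with no true effect and compares testing once at the end with checking the p-value at every interim look and counting the study as a success the first time p < .05.

```python
import math
import random


def two_sided_p(hits, n, p0=0.5):
    """Normal-approximation p-value for a hit rate that differs from p0."""
    z = (hits - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=2000, max_n=200, look_every=20, alpha=0.05, seed=1):
    """Simulate studies with no true effect (hit probability exactly 0.5).

    Returns (fixed_rate, peeking_rate): the share of simulated studies that
    reach p < alpha when tested once at max_n, versus the share that reach
    p < alpha at any interim look (hypothetical look schedule).
    """
    rng = random.Random(seed)
    fixed = peeking = 0
    for _ in range(n_sims):
        hits = 0
        ever_significant = False
        for n in range(1, max_n + 1):
            hits += rng.random() < 0.5  # one more subject, pure chance
            if n % look_every == 0 and two_sided_p(hits, n) < alpha:
                ever_significant = True  # a researcher who peeks would stop here
        peeking += ever_significant
        fixed += two_sided_p(hits, max_n) < alpha
    return fixed / n_sims, peeking / n_sims


if __name__ == "__main__":
    fixed_rate, peeking_rate = false_positive_rates()
    print(f"test once at n=200:     {fixed_rate:.3f}")
    print(f"peek every 20 subjects: {peeking_rate:.3f}")
```

Because both rates are computed from the same simulated data, the peeking rate can never be lower than the fixed-sample rate. Any gap between them is the extra false positives that come purely from stopping when the result looks good.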

28 thoughts on “A new Bem theory”

  1. I had hoped that Bem was pulling a Sokal as well, but I knew Bem from my time as a graduate student at Cornell. He was working on studies like these then (more than 20 years ago). He had published a paper on similar studies in Psychological Bulletin years before the JPSP paper. If it is a prank, it is one that has lasted for decades and occupied most of his empirical research program. My sense from talking with him about these studies way back then (and the opinion of my colleagues who know him better than I do; I asked them in the hope he was pranking everyone) is that he truly believes the ESP claims. He doesn’t claim to fully understand them, but he thinks they’re real.

    I agree that the methods and results scream “p-hacked,” that a more powerful demonstration would have used substantially larger samples and would not have stopped when p < 0.05, etc. I wished it had been a hoax that he could then reveal to the betterment of research practices and journal standards. Unfortunately, I don't think it is a hoax. He was just applying the standard criteria for publication used at that time (and still too much now) — was p less than .05. JPSP accepted the paper because it met those standards. For what it's worth, I think his paper unintentionally contributed substantially to the impetus to improve research practices in psychology (e.g., pre-registration as a way to minimize p-hacking, emphasis on replication, etc). It was a powerful illustration of how flawed practices can lead to nonsensical results. Yes, it hurt the reputation of the journal, but it might actually help move the field toward improved methods and standards.

    • “He was just applying the standard criteria for publication used at that time (and still too much now) — was p less than .05. JPSP accepted the paper because it met those standards”

      Some argue that the Journal of Personality and Social Psychology (JPSP) did in fact contradict their own rules of only publishing “groundbreaking” and “new and surprising” results (which are of course valid and logical rules from a scientific perspective, just think about it):

      “The primary failure in publication of Bem (2011) is that JPSP did not follow its evaluation standard – to publish evidence that advances novel theoretical ideas. What is new in the Bem article? Nothing. Bem’s article is a weak replication of a well-established phenomenon ”

      http://www.projectimplicit.net/arina/B2012.pdf

      JPSP did follow their own rules better by not wanting to publish any replications of the Bem article/findings. Why would they need to/ do that of course: from a scientific perspective that would not make any sense, just think about it.

      I wonder if a few rounds of replications of articles in journals like JPSP will provide information as to whether social psychology will come to the conclusion that the majority of its published findings are non-replicable and probably have all been a waste of money, time, and energy.

      If that will be the case, I don’t find it strange at all that journals like JPSP only publish “surprising and new” findings: that’s the only way to keep the publishing-machine (i.c. money-machine) going and keep believing the fairytales!

      • “I wonder if a few rounds of replications of articles in journals like JPSP will provide information as to whether social psychology will come to the conclusion that the majority of its published findings are non-replicable and probably have all been a waste of money, time, and energy. ”

        I think there is a saying that goes something like “psychology is the study of psychology students, by psychology students”. Has anyone ever investigated the percentage of papers in which only psych students were used as participants? You could take an entire year’s worth of articles published in “top tier” journals like PsychScience or JPSP for instance, and just determine the percentage of papers that only used psych students as participants.

        Maybe that could provide some interesting information pertaining to the saying “psychology is the study of psychology students, by psychology students”. Maybe you could even write a nice paper about the results !

        • http://connection.ebscohost.com/c/opinions/51882847/most-people-are-not-weird

          “In this article, the author discusses the need for behavioural scientists to halt doing most of their experiments to understand human psychology. The author states that much research on human psychology and behaviour presumes that everyone shares most fundamental affective and cognitive processes, and that results from one population use across the board. The author says that experimental findings from various disciplines show considerable variation among human populations in various domains.”

          http://www.readcube.com/articles/10.1038/466029a?locale=en

          “So the fact that the majority of studies used WEIRD participants presents a challenge to the understanding of human psychology and behaviour. A 2008 survey of the top psychology journals found that 96% of subjects were from Western industrialized countries, which house just 12% of the world’s population. Strange, then, that research articles routinely assume that their results are broadly representative, rarely adding even a cautionary footnote on how far their findings can be generalized”

        • And to think, I was attacked for suggesting that there would be a problem generalizing from Mechanical Turk participants and UBC students to the general population of women!

    • If there was a long-term interest, perhaps this is slightly reminiscent of the studies at PEAR, although its leader, Robert Jahn, was also involved in the Society for Scientific Exploration, and published often in the Journal of Scientific Exploration.
      While it covers ESP, UFOs, reincarnation, etc … it has more novel items like dog astrology, too. Most of the articles are on-line.

  2. The sense I got from talking to statisticians who worked on meta-analysis of ESP studies was similar to Daniel’s: they really did seem to believe there is ESP-like stuff and that it’s important to pin it down (and they are a bit sensitive to other people being so dismissive.)

    (OK, I won’t suggest that they might have been reading my mind at the time.)

  3. It is sad, I always assumed the Bem paper was a Sokal-type hoax — “this is obviously false, but if you reject this and accept other publications on the basis of identical criteria then you’re accepting/rejecting papers based on outcome which is a breach of the scientific method” — but I suppose prev. commenters know more about Bem and are probably right. It _worked_ like a Sokal hoax at least…

  4. You wrote: ” This behavior is consistent with the idea that he did not want to push his claims further, instead wanting to get into print before any new data could reveal problems with his study. Ironically, I told my friend, Bem’s strategy didn’t work.”

    Well, if his objective were simply top-tier journal publication then he succeeded… and the weakness of the paper may have generated citation counts too….

  5. So Bem’s the Colbert of psych? Great idea, wish I’d thought of it. On the other hand, maybe he’s the Kardashian of psych. How will we know?

    • -So Bem’s the Colbert of psych?

      Of course he is! That’s why he even appeared on Colbert:

      http://www.colbertnation.com/the-colbert-report-videos/372474/january-27-2011/time-traveling-porn—daryl-bem

      Also see this:

      http://www.youtube.com/watch?v=0Tdiu5kwjKs

      and this article of Bem might also be interesting to read with a slightly different viewpoint:

      http://dbem.ws/WritingArticle.pdf

      “Your overriding purpose is to tell the world what you have learned from your study. If your results suggest a compelling
      framework for their presentation, adopt it and make the most instructive findings your centerpiece. Think of your dataset
      as a jewel. Your task is to cut and polish it, to select the facets to highlight, and to craft the best setting for it.
      Many experienced authors write the results section first. But before writing anything, Analyze Your Data!”

      Concerning the “new Bem theory”: I will pre-register the hypothesis that Bem could be one of the few real psychological scientists left in today’s academia among the many “manager-types”/fake scientists (you know those “brilliant” highly successful scientists who always publish “surprising new” stuff in “high impact” journals [and actually think that they are doing science that way], but really don’t know what science is about, nor possess simple basic scientific characteristics and capabilities).
      At a certain point, maybe that’s what you get then: the few true scientists left find the only way they know to try and improve things. Maybe it’s the only real and useful science possible in today’s academia…

      My bet is that there are a few more: Sokal is already mentioned. Maybe there is Fiedler with his “the long way from alpha-error control” – article (http://pps.sagepub.com/content/7/6/661.abstract) and maybe even fraudulent scientists like Stapel (“a new Stapel theory”). But don’t tell the “manager-types”, and fake scientists/ journals/ editors/ institutions that, they probably wouldn’t understand… This is just for the few real scientists left…

  6. I absolutely agree with your friend. I find it hard to believe Bem wanted to push belief in ESP as his goal here.

    You are wrong in thinking “Bem’s strategy didn’t work”. I think it worked spectacularly, Andrew & others just mistake what the strategy was here.

  7. Running more subjects would very likely never have found a really compelling p-value over and above that found. Running far fewer subjects might have. But then he would have been dismissed because the N was too small. He hit the sweet spot for what he wanted to accomplish.

    • I don’t agree. There are virtually no exact point null hypotheses, experimental defects are invariably going to render what may have been thought of as a believable exact point null not exact. Jim Berger and Mohan Delampady give the example of “my plants will grow better if I talk to them.” But if you talk to your plants and you are too close, you may breathe excess carbon dioxide onto them that will make them grow better.

      And, if you don’t have a point null hypothesis that is exactly zero, more data will find that it is not exactly zero with smaller and smaller p-values.
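
The claim that more data drives the p-value down whenever the null is not exactly true can be checked with a short calculation. This is a sketch under assumptions that are not in the comment itself: a binary hit/miss outcome, a null hit rate of 0.5, a made-up true hit rate of 0.51 standing in for a tiny systematic bias, and an observed hit rate that sits exactly at its expected value. The function name is hypothetical.

```python
import math


def expected_p_value(true_rate, n, null_rate=0.5):
    """Two-sided normal-approximation p-value when the observed hit rate
    equals true_rate exactly (its expected value), tested against null_rate."""
    se = math.sqrt(null_rate * (1 - null_rate) / n)
    z = (true_rate - null_rate) / se
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # 0.51 is an assumed hit rate, chosen only to represent a very small bias.
    for n in (100, 10_000, 1_000_000):
        print(n, expected_p_value(0.51, n))
```

With the bias held fixed, the z-statistic grows in proportion to the square root of n, so the p-value shrinks without limit as subjects are added. With a true rate of exactly 0.5 the same calculation returns a p-value of 1 at every sample size.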

      • +1

        people are barking up the wrong alley if they think the problem is solved by erecting more stringent barriers to achieving statistical significance.

        As I think Andrew has mentioned in the past, even pre-registration doesn’t solve the multiplicity problem because there exists an ensemble of researchers. Statistics doesn’t somehow begin and end with the intent of the individual scientist either.

        • So which is the right alley to bark at, then? That’s the million dollar question. If not statistical significance what?

          Ultimately people need an easily communicated, easily understood metric to understand questions such as “Does ESP exist?”

        • I would say that the right approach is to emphasize study designs which isolate sources of causal variation. Concerns of causal identifiability are far more important than statistical significance. Perhaps wider adoption of JP’s graph nomenclature would be useful in formalizing the discourse.

          With respect to multiplicity issues in low/marginally powered analyses, I would say regularization via shrinkage estimation (bayesian or otherwise) is a more productive route than trying to impose “no-peeking” rules and pretending that multiplicity across researchers doesn’t exist.

          I agree with the sentiment that it’s better to banish this notion that quantitative analysis means that there’s a magic formula for absolute certainty. However, if a discrete policy decision is needed, then it should be based on decision theory, not hypothesis testing.

          With respect to ESP specifically, the solution isn’t entirely statistical, part of the issue is the file-drawer problem that’s been discussed a few times here. If there were a larger number of studies with less of a filter for positive results, a meta-analysis with shrinkage estimates would get you a direct answer.

        • Thanks.

          As an industrial non-statistician who has need to know some statistics I can tell you this much: A lot of decisions are indeed discrete and we are lucky if the decision makers understand some p-values, statistical significance and hypothesis testing.

          But beyond that, it’s really rare to find people who grok stuff like causal identifiability, shrinkage estimation, decision theory etc. Ok, I speak for maybe Engineers or even specifically process engineers so maybe not your target audience.

          But I expect it’d be quite challenging to find ways to communicate these alternatives to decision makers. I’d love to see a policy paper, project report etc. that explains its conclusions on these techniques. So far, I haven’t come across any.

  8. When I was an undergraduate (over 20 years ago) Bem was added as co-author to (something like) the 10th edition of a pretty well used textbook (it was called “Introduction to Psychology”, which doesn’t narrow it down much). That edition mentioned ESP studies, and how it was possible to find ESP effects if you used the correct techniques. The previous edition didn’t.

  9. I don’t know anything about Bem personally, but as a recent social psych PhD his article seemed only slightly more ridiculous to me than others that I’ve seen published in top journals. It could have been written by the infamous Arina K. Bones:
    http://arinabones.com

  10. I was in a small psych seminar with Bem as an undergraduate at Cornell about 10 years ago. This was not the focus of the course, but I do remember him talking up various ESP-like studies and preliminary results that he was working on. Though I don’t think he referred to them as ESP. I would be very surprised if he didn’t to some extent believe in so called “ESP.”

  11. I have known Daryl since 1978, when he became the chairman of my PhD dissertation committee. He had just moved from Stanford to Cornell. He was already a superstar, and I felt very lucky to snag him. He was always a little quirky and I frequently got the impression that he usually left something unspoken after each of our conversations. It felt as if he pointed me in a particular direction, then tacitly challenged me to see what there was to see. I could never be sure whether he already knew what I would find, or whether he was genuinely curious and trusted me to help answer questions that were important to him.

    He was also a professional stage magician, with a twist. As he performed truly world-class illusions, he narrated in a pedagogical way about the lessons we could learn if we paid careful attention to how he was exploiting our psychological vulnerabilities. As he performed an illusion, he would constantly remind us that it was only an illusion, and challenged us to figure out how he did it, often leaving hints for us to consider. Sometimes he would fully explain an illusion, but the audience was nonetheless thrilled as he performed it. This, in direct opposition to the belief common among magicians that revealing the “secret” would take all the fun out of illusions.

    I could go on, but this is just a prelude to my agreement (which I expressed some years ago in a published response to Shermer in Scientific American) with the speculation that at some level, Bem has never really, truly believed in most of this precognition stuff. I have always thought that Bem was just being Bem, having a little fun by compelling us to re-examine the orthodoxy of experimental psychology we in the field had been taught to revere. But it’s more than fun; I think he is attempting to teach us to be better and more careful thinkers and self-critics without actually saying that some of the accepted conventions in experimental social psychology are flawed and can be seriously misleading. If he did that, it would just start a flame war and offend many people whose careers, like his own, were built on the foundation of those flaws.

    If I’m right, then my belief is consistent with my many observations of his humility and dislike for conflict. If I’m right, then he was criticizing some of the foundations of his own success as well as those of his peers. I think that takes some courage, because he surely knew that he was putting at risk his legacy, which was monumental and unblemished prior to his foray into psi and related subjects.

    Just one man’s opinions.

    • David:

      So, he was a professional stage magician, huh? They’re the absolute worst. So smug. Just because they can “fool” others with storytelling and sleight of hand, they think they can’t be fooled themselves. Magic tricks are cool, yeah. But I’m soooo sick of the magician shtick: Penn and Teller, the Amazing Randi, Houdini, etc., the whole business about magicians being the ultimate empiricists. They’re actors. They can be smart, savvy, etc., but that has nothing to do with being a magician. I know about 2 magic tricks but that doesn’t get in the way of me realizing that Bem’s published paper on ESP is horrible.
