“Less Wow and More How in Social Psychology”

Fritz Strack sends along this article from 2012 which has an interesting perspective. Strack’s article begins:

But, he continues, things changed in 2011 with the scandals of Diederik Stapel (a career built upon fake data), Daryl Bem (joke science getting published in a real journal), and a seemingly unending series of prominent studies that failed to replicate.

Strack continues:

. . .

I agree with most of the above—I, too, think that preregistration is overrated, that it is better to perform and display all comparisons of potential interest, and that p-values don’t provide much information.

I would change three things in the above passage. First, if we want a measure of reliability, I think it makes much more sense to look at estimates and standard errors than to look at p-values. P-values are just too damn noisy. Second, I would emphasize that there’s not much of a difference between p=0.01 and p=0.20. Any field of science or decision making that leans too heavily on this distinction is in trouble. Third, I disagree with the emphasis on sample size. As we’ve discussed many times in this space, I think researchers should be focusing more on design and measurement, rather than taking all those factors as given and just playing around with N.
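A rough numerical illustration of the second point, under assumptions that are not in the passage itself: two-sided normal-theory p-values, and two independent estimates that happen to share the same standard error. The function names are made up for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def z_from_two_sided_p(p):
    """Magnitude of the z-score whose two-sided tail area equals p."""
    return std_normal.inv_cdf(1 - p / 2)


def two_sided_p_from_z(z):
    """Two-sided tail area for a given z-score."""
    return 2 * (1 - std_normal.cdf(abs(z)))


# Hypothetical study A reports p = 0.01, hypothetical study B reports p = 0.20.
z_a = z_from_two_sided_p(0.01)
z_b = z_from_two_sided_p(0.20)

# Assumption: both estimates have standard error 1 and are independent,
# so the difference between them has standard error sqrt(2).
z_diff = (z_a - z_b) / 2 ** 0.5
p_diff = two_sided_p_from_z(z_diff)

print(f"z for p=0.01: {z_a:.2f}")
print(f"z for p=0.20: {z_b:.2f}")
print(f"z for the difference: {z_diff:.2f}, two-sided p: {p_diff:.2f}")
```

Under these assumptions the two estimates are less than one standard error of the difference apart, so the data do not strongly distinguish the "significant" study from the "non-significant" one.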

To return to Strack’s article:

. . .


But then I think he goes astray, falling into a common mistake, which is to believe the flawed first study rather than the careful replication.

Here’s Strack:

Let’s set aside the bit about the “fervent anonymous attacks” and focus on Strack’s mistake, which is to just assume that the claims of the 1996 study are correct. Remember the time-reversal heuristic? Pretend the large, careful study with its null finding came first, followed by the small, uncontrolled study from which a statistically significant p-value was extracted. In that case everyone would clearly see the possibility that nothing is going on—or, to put it more carefully, that any effects are context-dependent and unpredictable. In the context of the embodied cognition example discussed above, I agree with Strack’s point that conditions have changed, and there’s no reason to expect a treatment effect to be the same, twenty years later in a different country. The place where I get off the bus is where Strack just assumes that Bargh et al.’s original claim is correct. Given everything I’ve seen, I’d guess that if you were to go back in time to 1996 and try to replicate that study in that same population, it would fail. We can’t know, but we should at least consider this as a possibility, and Strack in the above paragraph isn’t.

Anyway, I agree with a lot of Strack’s message; I just think he was too credulous, at least back in 2012, a bit too willing to believe that various published claims were correct.


  1. Z says:

    “1970ies” and “80ies” is so weird! I’ve never seen decades written that way. Does he also refer to the 2000ands? Makes him very suspect all around in my book

  2. Hernan Bruno says:

    One really needs to read (too much) between the lines to conclude that Strack is assuming Bargh et al. (1996) to be correct. According to the passage you quote, Strack seems to be concerned about the iconoclastic vitriol that authors of past studies receive when new (better!) studies either overturn the key result or simply fail to replicate it. Admittedly, Strack does not refer to Bargh et al. (1996) as incorrect, as it would be referred to in this blog. I also don’t believe the effects of Bargh et al. (1996) are as strong as reported. But I somehow believe that that study was performed under the right standards of the time (the 90s!), i.e., under “scientific sincerity”. And so calling it “incorrect” goes too far.

    I read Strack’s passage as a call for open mindedness when looking at these older studies. Even if these older studies described an effect that now we know does not exist, they might have opened the path for discovering other effects that are real (at least according to our current statistical understanding). Or provided us with some further understanding of related phenomena.

    • Z says:

      “that study was performed under the right standards of the time (the 90s!), i.e., under “scientific sincerity”. And so calling it “incorrect” goes too far.”

      I think sincerity and correctness are orthogonal concepts.

    • psyoskeptic says:

      Hernan, Strack talks about a phenomenon being demonstrated. The demonstration was Bargh et al. Therefore, he’s supporting the Bargh finding. Further, I’m not sure Andrew would characterize Bargh et al. as false, simply that there’s very little support for the effect. The balance of evidence is leaning the other way at this point.

      And while you’re right that Strack is focusing on unwarranted attacks, the solution isn’t to defend the finding but to portray a better understanding of science. Bargh could have just had an odd sample. He didn’t need to do anything untoward for this to happen.

  3. Dale Lehman says:

    We recently had a discussion on this blog about the facial feedback effect in which Strack was impressed by the replication, but some others (at least me) were not so impressed. In the paper quoted above, it seems to me that Strack is evaluating evidence on the basis of whether or not it comports with the particular theory being posited. ESP is bogus, therefore the research investigating it has hampered social psychology; the “elderly stereotype” is real, therefore the research along these lines is valuable, and therefore “P-values are indicators of reliability.” He appears to put a lot of faith into the particular theories being examined – I guess that’s fair, given that it is his field and not mine. But, I thought the whole idea of empirical work was to gain insight into the validity of those theories, not to evaluate the empirical work on the basis of whether it is consistent with a favored theory.

    To be fair, I don’t disagree with much of what Strack says above. However, it seems that when he uses his approach, we differ tremendously. I still find the facial feedback work suspect and I am not convinced by the evidence. He appears to find the same evidence supportive of the effect. I don’t see how operationalizing the “universal underlying mechanisms” is going to bridge that gap – how do we know what those universal underlying mechanisms are?

    • gec says:

      I tend to agree that there wasn’t much reason to believe the original Bargh effect, but in fairness to Strack it is possible (though I don’t know for sure) that the “universal underlying mechanisms” he refers to are not those of “attitudes towards the elderly” as much as they are mechanisms of priming in general.

      “Priming”, in the broad sense of a recent exposure to a stimulus having a detectable effect on the performance of an ostensibly unrelated task, is not controversial and, though the *mechanisms* of priming across different settings are far from understood, it is at least reasonable to ask the question, “can exposure to elderly concepts affect behavior on some other task?”

      But at that point, though the question is reasonable, it is also so broad that the answer *has* to be “yes”, getting into the realm of Andrew’s “no effect is strictly zero” territory. So to me, that suggests two future routes: One, which Andrew has advocated for, is to switch the question from the presence/absence of an effect to the *size* of the effect and the conditions under which it takes a certain size. A different route, though, would be to focus more on the mechanisms of “priming” in general, e.g., the importance of semantic versus perceptual features of the prime, the relationships between the context of exposure and those of the subsequent task, etc.

      So in trying to defend one interpretation of Strack’s point I think I actually undermined it, since as you say an appeal to “universal underlying mechanisms” is empty unless those mechanisms are also well understood. Even if “priming” might be considered “universal” we certainly don’t know all the mechanisms that make it happen, whether social or not.

      • Dale Lehman says:

        +1 very well put

      • jim says:

        What I think is interesting here is that people seem to want to revolutionize the universe with every study. In the physical sciences, even small new discoveries often get the living crap beat out of them before they’re accepted as fact. When a new phenomenon is observed, people dive in a) to verify it over and over again; and b) to discover the mechanism.

        So in the case Andrew discusses above, there is one study and one verification (or verification failure, as the case turns out). But really there should be several more verification attempts. If it fails several times then ultimately the proponents have to give up. But with only one verification attempt, they can keep claiming something’s not right.

        Also missing is the effort to uncover the mechanism. I don’t know the particular research, but I’d think that after getting a result like priming you’d first replicate, then start trying to work out the mechanism, presumably first by carefully working out the scope of the effect with repeated experiments broadening the time span over which priming remains effective, group composition, etc.

    • Martha (Smith) says:

      Dale said,
      “I don’t see how operationalizing the “universal underlying mechanisms” is going to bridge that gap – how do we know what those universal underlying mechanisms are?”

      Or even if there are any universal underlying mechanisms?

  4. Hans says:

    Social psychology’s problem is that the field can’t develop strong, testable theory. It has vague verbal models but you can’t get from there to a testable hypothesis without a large number of hard-to-test auxiliary hypotheses. Of course, all sciences rely on some auxiliary hypotheses for testing theories but social psychology theories normally do not seem to be falsifiable in practice.

    The most important thing the replication debate has shown us is that social psychologists normally can’t agree on whether an experimental procedure has been replicated. In a proper science that can happen from time to time but when you almost never see agreement about what procedures are valid there must be a serious problem with how theory is constructed.

    On top of this, much of social psychology is “empirical science” about questions that are not really empirical, as Jan Smedslund has often pointed out. Much of social psychology research is based on confusion between causes and reasons of behavior.

    I’m not saying that social psychology would be completely hopeless if you got rid of the sloppiness, the conceptual confusion, pseudo-measures and bad statistics, but I’m not really sure there would be anything left.

  5. Gelman’s central criticism of Strack is right on target. Strack assumes there’s a universal underlying mechanism (due to this type of priming), and that’s what is open to question. To test it, he should test an instance of this general mechanism that he deems sufficiently timely to be operative.

    However, I’m surprised to hear Gelman say “there’s not much of a difference between p=0.01 and p=0.20.” Within the same test, that would be a difference between less than 1 SE (~.85 SE) and 2.5 SE (in 1-sided Normal testing). In a 2013 paper with Robert (p. 72), they say:

    “[C]onsider what would happen if we routinely interpreted one-sided P values as posterior probabilities. In that case, an experimental result that is 1 standard error from zero – that is, exactly what one might expect from chance alone – would imply an 83% posterior probability that the true effect in the population has the same direction as the observed pattern in the data at hand. It does not make sense to me to claim 83% certainty – 5 to 1 odds [to H1]”.

    This is on p. 256 of my book SIST. Putting aside the point being made (about using a matching prior), they’re clearly dismissing a 1 SE difference as “exactly what one might expect from chance alone”. They are right to say this, and a .8 difference is even more frequent under chance alone. I assume Gelman would not say the same thing about a 2.5 SE difference.

    The reason to use a standardized statistic like z, as I’ve always understood it, is to make distinctions that remain vague just looking at whether differences seem sort of big, pretty similar etc.
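As a rough check on the numbers in this exchange, here is a minimal sketch that converts one-sided p-values to z-scores and reproduces the "1 standard error" calculation from the quoted passage. It assumes a standard normal test statistic and, for the second part, a flat prior (so that the one-sided tail area is read as a posterior probability); the variable names are chosen only for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def z_from_one_sided_p(p):
    """z-score whose upper-tail area equals p."""
    return std_normal.inv_cdf(1 - p)


# One-sided p-values discussed in the thread.
z_p20 = z_from_one_sided_p(0.20)
z_p01 = z_from_one_sided_p(0.01)

# An estimate 1 SE from zero: under the flat-prior reading, the probability
# that the true effect has the same sign as the estimate is Phi(1).
prob_same_sign = std_normal.cdf(1.0)
odds = prob_same_sign / (1 - prob_same_sign)

print(f"one-sided p=0.20 -> z = {z_p20:.2f}")
print(f"one-sided p=0.01 -> z = {z_p01:.2f}")
print(f"1 SE -> probability {prob_same_sign:.2f}, odds about {odds:.1f} to 1")
```

With these assumptions, one-sided p=0.20 sits a little under 1 SE and one-sided p=0.01 a little above 2.3 SE, while the 1 SE case gives a probability close to the 83% and roughly 5 to 1 odds quoted above.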

  6. Fritz Strack says:

    My reasoning is based on the assumption that there exists no direct way from data to truth. Instead, data (including statistical parameters) are arguments in a persuasive communication (or, critical discussion). To draw conceptual inferences, the proposed interpretation must be a special case of a “universal underlying mechanism” whose validity has been demonstrated in many other domains. This is the case for the Bargh et al. study, but not for Bem’s research. Based on numerous findings in the domain of cognitive and social psychology, I believe that the mechanisms of priming deserve a universal quantifier and can be invoked to justify the original authors’ inferences. But I hasten to add that this is my personal conviction on the basis of my knowledge of the literature.

    • gec says:

      Thank you for dropping in and helping to correct my misreading above.

      I certainly agree that priming is ubiquitous enough to call “universal”, and I share your conviction that there are indeed universal underlying mechanisms at work, even if we don’t know what they are.

      So would it be fair to say that what you are calling for is more work on those mechanisms, rather than focusing on “effects”? For example, most of the “wow” in social psych over the last couple decades has been showing that people’s behavior can be affected by exposure to ostensibly irrelevant stimuli, but very little of this work has shed light on the “how” of priming, that is, the psychological mechanisms by which this exposure is turned into observable behavior.

      And as you say, priming mechanisms have been extensively studied in various branches of cognitive psychology that are not even directly tied to social psych (e.g., attention, memory, psycholinguistics), so that provides an independently-supported foundation on which Bargh can explain his results. This is in contrast to Bem who has no mechanistic ground to stand on.
