The Psychological Science stereotype paradox

Posted on October 19, 2016 9:21 AM by Andrew

Lee Jussim, Jarret Crawford, and Rachel Rubinstein just published a paper in Psychological Science that begins,

Are stereotypes accurate or inaccurate? We summarize evidence that stereotype accuracy is one of the largest and most replicable findings in social psychology. We address controversies in this literature, including the long-standing and continuing but unjustified emphasis on stereotype inaccuracy . . .

I haven’t read the paper in detail but I imagine that a claim that stereotypes are accurate will depend strongly on the definition of “accuracy.”

But what I really want to talk about is this paradox:

My stereotype about a Psychological Science article is that it is an exercise in noise mining, followed by hype. But this Psychological Science paper says that stereotypes are accurate. So if the article is true, then my stereotype is accurate, and the article is just hype, in which case stereotypes are not accurate, in which case the paper might actually be correct, in which case stereotypes might actually be accurate . . . now I’m getting dizzy!

P.S. Jussim has a long and interesting discussion in the comments. I should perhaps clarify that my above claim of a “paradox” was a joke! I understand about variability.

30 thoughts on “The Psychological Science stereotype paradox”

aby on October 19, 2016 10:02 AM at 10:02 am said:

The second point is related to the first. If a stereotype means “All X are Y,” then the paradox exists. If a stereotype means “X are Y more often than other groups” then there is no paradox. Obviously, the paper must use a definition closer to the latter.

MANOEL GALDINO on October 19, 2016 11:03 AM at 11:03 am said:

One of the definitions of accuracy in the paper is a correlation greater than .4 between beliefs and objective measurements. I’m no expert on the matter, but it seems a pretty low standard to me.

- Martha (Smith) on October 19, 2016 10:54 PM at 10:54 pm said:
  
  Seems pretty low to me, particularly in light of one of my favorite websites: https://tylervigen.com/spurious-correlations (See link at bottom of page to try your hand at discovering new correlations!)
  
  - Diana Senechal on October 20, 2016 7:15 AM at 7:15 am said:
    
    That’s great. Thank you for the link. I see that money spent on movie theater admissions correlates strongly with Apple iPhone sales.
    
    Of course one can explain this. Those “turn off your phone” announcements serve as subliminal advertisements. People probably go to their Apple stores right after the movies, so that next time they will have something to turn off.
    
    In addition, the more expensive the movie ticket, the greater the embarrassment if you don’t have a phone to turn off (the rich people around you are all turning off theirs), and the greater your desire to get one right away.
    
    - Keith O'Rourke on October 20, 2016 7:30 AM at 7:30 am said:
      
      Manoel, Martha, Diana:
      
      If you have not, you may wish to read up on the method Lee referred to, which contextualizes the correlation coefficient as an effect size: Rosenthal, R. & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166–169.
      
      (The second author seems to have a good grasp of statistical issues.)
    - Carol on October 20, 2016 11:57 AM at 11:57 am said:
      
      Hi Keith:
      
      “The second author seems to have a good grasp of statistical issues.” I should hope so. Donald Rubin has a PhD in statistics from Harvard.
    - Sam Gross on October 20, 2016 4:02 PM at 4:02 pm said:
      
      He is also a professor of Statistics at Harvard, which is probably a more significant credential.
    - Andrew on October 20, 2016 4:12 PM at 4:12 pm said:
      
      Also a coauthor of BDA, which is an even more relevant credential!
    - Martha (Smith) on October 20, 2016 6:09 PM at 6:09 pm said:
      
      From a quick skimming of the Rosenthal and Rubin paper, it seems to be saying that a correlation coefficient can be a reasonable effect size for evaluating a treatment. However, this is not what the Jussim et al paper is studying — i.e., I don’t see how R and R’s argument applies to Jussim et al.
Lee Jussim on October 19, 2016 11:57 AM at 11:57 am said:

I. The Accuracy of Stereotypes Is an Empirical Statement, Not a Declaration of an Absolute
Our conclusion that stereotypes are mostly accurate is an empirical statement, based on what has been found in the literature. It is not a declaration of an absolute. All stereotypes are definitely not perfectly accurate, and no one has ever made that claim. One cannot go from our review to a knee-jerk assumption that any belief they hold about any group is therefore accurate. One always needs the data.

So it is an empirical question whether any particular stereotype is accurate. That most of the lit so far has found the stereotypes that have been studied to be mostly pretty accurate is not equivalent to, “therefore, you can assume any belief or concept or category or generalization that you make is perfectly, or even moderately, accurate.” You need the damn data.

II. How Hypocritical Do You Want to Get?
Yes, one standard we use is a correlation of .4 between belief and criteria. Is this a “low” standard? It is a “high” standard for psychological research. First, vanishingly few studies in social psych produce effect sizes higher than r=.40. Second, when one is not sure, Cohen’s longstanding recommendation is to consider an effect large when it exceeds r=.40. Last, via Rosenthal’s binomial effect size display, it means people are right more than 2/3 of the time. Bottom line: Our standard for “accurate” holds people to the very same standard psych scientists hold themselves to. Given that laypeople do not have the expertise or training of scientists, that seems like quite a high standard to me.
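
As a rough check on the “more than 2/3” figure, here is a minimal sketch of the binomial effect size display arithmetic, in which a correlation r is shown as a pair of success rates, .50 + r/2 and .50 - r/2. The function name `besd_rates` is made up for illustration and does not come from the paper or from Rosenthal and Rubin.

```python
def besd_rates(r: float) -> tuple[float, float]:
    """Return the two rates of a binomial effect size display for correlation r.

    The display re-expresses r as the difference between two proportions
    centered on .50, so the rates are .50 + r/2 and .50 - r/2.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return 0.5 + r / 2.0, 0.5 - r / 2.0


# The r = .40 threshold discussed in the thread.
high, low = besd_rates(0.40)
print(f"r = .40 -> {high:.2f} vs {low:.2f}")
```

For r = .40 this gives .70 versus .30, which is where “right more than 2/3 of the time” comes from. As far as I understand it, the display assumes a 50/50 split on both variables, so it is an interpretive device and not a direct count of how often any individual judgment is correct.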

Which leaves anyone claiming “hey, .4 ain’t so hot” in the following position:
1. You can conclude .4 ain’t so hot, and then, yes, your view of the stereotype accuracy lit would be different from mine; you would also have to conclude that the psych literature, even if you assume it is perfect, with no methods or statistics problems whatsoever, is also not very accurate.
OR
2. You can conclude that .4 is actually pretty damn good, and you would be consistent to conclude that a problem-free lit with effect sizes of .4 is doing a damn good job predicting whatever outcome it is predicting AND that people’s stereotypes are quite accurate.

The one thing you cannot do, is declare “hey, stereotypes where r>.40 are not so hot, but the psych lit? That’s great stuff.” At least not if you wish to be logically coherent (and logical coherence is a minimum condition for considering any claim scientific — as opposed, to, say, religious or political).

III. So few stereotypes** are of the form “All of THEM are X” that it is not worth further discussion. It is an irrelevant straw argument, a caricaturization, rather than characterization, of people’s beliefs about groups.

** “So few”: As in “none have ever been found in any empirical study of stereotypes.” Now, maybe they do exist out there. I challenge any reader of this blog to identify a single empirical study identifying a single person who believes All of Them are X. Until you do, it is a claim or assumption not worth taking seriously. I will change my claim that this has never been found, when someone can produce the damn data.

IV. Andrew’s “Paradox.”
First, there is no paradox. Let me count the ways. 1. This article was a review, not a Psych Science piece. I suppose you could argue The Entire Literature is Garbage, it is Not Just Psych Science, and Nothing At All Can Be Believed, but I suspect that is just you Methodological Terrorist** types going way over the top. Or, at least, I do not see it in that extreme a manner, and for some reasons, you can go here:
https://www.psychologytoday.com/blog/rabble-rouser/201602/are-most-published-social-psychology-findings-false
** Most of my most interesting colleagues are methodological terrorists…

2. No one knows how many published articles are true versus false. Thus, no “stereotype” about the articles is currently falsifiable. Not the claim that they are all hooey; not the Gilbert-esque claim that everything is fine. Without a criterion, it is impossible to assess the validity of Andrew’s strong claim. Claims about particular articles may be falsifiable, but not about the population of articles.

3. So, taking Andrew at his literal word, “most articles are noise mining” this leaves the following possibilities, none of which involve any paradox at all:

i. His stereotype is mostly right, and it even applies to the underlying stereotype lit we reviewed. I doubt it, but that is possible. Show me the evidence. In fact, in the paper Andrew referred to, we had to alter our conclusions somewhat compared to those we reached a few years ago, because, over the last few years, there has been an explosion of research showing national character stereotypes are usually inaccurate; and that political stereotypes are both accurate (high correlations) and biased (partisans exaggerate the other side’s views). Show me the data…
ii. His stereotype is mostly right, and the stereotype accuracy literature is an exception.
iii. His stereotype is mostly wrong; though there are more problems than we all once believed, declaring “most” of the lit to be noise is incorrect.

IDK which is true, but there is no paradox.

—-

And here is more stuff for your edification and amusement. It is actually the story of the roots of my own interest in scientific integrity issues. (Andrew and I have butted heads over this term before; I use “scientific integrity” in its common second meaning, to refer to a set of conclusions that are robust, intact, and not about to collapse around our ears, just as a building with structural integrity is not about to collapse around our ears.)

V. The Black Hole at the Bottom of Most Declarations that “Stereotypes are Inaccurate”
Despite the large lit showing stereotypes are mostly accurate, broad reviews and books largely ignore it. Even if one disagrees with my conclusions, scientists should not be in the business of simply *ignoring* the data. Don’t believe it? Fine, refute it. But you cannot call yourself a scientist and just ignore it. We declare “stereotypes are inaccurate” on the basis of ZERO evidence demonstrating that, but when the 50 or so studies addressing the issue overwhelmingly provide evidence of accuracy, what do we (mostly) do? Ignore it as if it just never happened.

Which reminds me of Winston Churchill, on Stanley Baldwin, the main architect of appeasement:
“He occasionally stumbled on the truth, but hastily picked himself up and hurried on as if nothing had happened.”

When researchers in papers declare or define stereotypes to be “inaccurate,” they almost always do one of two things:
a. Cite nothing at all. AKA making crap up.
b. Cite some article. If one reads that article, one will find that *that article* provides no evidence of stereotype inaccuracy.
This is incredibly easy to discover for yourself — just find any paper that declares stereotypes to be inaccurate (lots in our review paper).

We have taken to referring to this bizarre phenomenon of scientists making strong claims with zero evidence as The Black Hole at the Bottom of Most Declarations of Stereotype Inaccuracy.

VI. The Value of Skepticism for Producing Robust Science
Stereotype accuracy is an exquisite example of the value of skepticism in producing a robust science. Do you think your overwhelmingly leftist/egalitarian/feminist/social justice colleagues WANTED to find that stereotypes were accurate? There is indirect but obvious evidence that this is not the case (such as the ongoing denial of the evidence; see point 2 above).

So all the usual incentives work in reverse for stereotype accuracy, because most of our field DOES NOT want to find such evidence. There is little or no sociopathological questing for p<.05. There is a desperate search for bias and error.

We recently performed a pcurve on the stereotype accuracy literature, and, because the effect sizes were so high, the pcurve was a thing of beauty, massively right skewed.

Even better, and this should not have been but was surprising even to me: we performed Schimmack's incredibility index on the stereotype accuracy studies. Given how large the effect size is, the average power (as per pcurve) of the studies was around 99%. (As I said, all the usual incentives reverse for this topic.) Consequently, only about 1% of the literature should have found nonsignificant correlations between stereotypes and criteria. There were far more than 1%, so much so that a binomial test produced a p-value near zero, reflecting the fact that there are far more studies in the published lit failing to find stereotype accuracy than there should be.
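
The binomial comparison just described can be sketched as follows. This is only an illustration under stated assumptions: the 50 studies echo the “50 or so” mentioned earlier, the count of 5 nonsignificant results is invented for the example, and `binom_upper_tail` is a hypothetical helper, not the code actually used for the analysis.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_studies = 50    # assumption: roughly the "50 or so" studies mentioned above
n_nonsig = 5      # assumption: invented count of nonsignificant results
miss_rate = 0.01  # 1 - average power, using the ~99% power figure

# Probability of seeing at least this many nonsignificant results
# if every study really had 99% power.
p_value = binom_upper_tail(n_nonsig, n_studies, miss_rate)
print(f"P(at least {n_nonsig} nonsignificant of {n_studies}) = {p_value:.2e}")
```

With these made-up inputs the tail probability is well below .001, which is the sense in which even a handful of nonsignificant results is “too many” when average power is near 99%.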

Our conclusion stands in the face of a research environment biased against it. Which kinda reminds me of the overwhelming evidence for hereditary bases of intelligence and the predictive validity of standardized tests. And the great message for the science reform movement is that, crybullying accusations that the reformers are harming the field notwithstanding, skepticism produces robust science. Stereotype accuracy is a prime example.

Lee

- Sam Gross on October 19, 2016 1:16 PM at 1:16 pm said:
  
  Hi Lee,
  
  Thanks for the comments. If you came here on a Google alert you might not be aware that Andrew frequently writes about a lack of good methodological practices in social psych. While I don’t want to put words in his mouth (his own are all over the site), I doubt he views a .4 correlation as all that indicative of anything. He might be more interested in the posterior distribution of the correlation coefficient.
  
  The point is, this post wasn’t really about your article. It was just a joke based on the abstract.
  
- mark on October 19, 2016 2:40 PM at 2:40 pm said:
  
  Lee,
  Thanks for the detailed response. I agree that an r>.40 is practically very meaningful and stronger than most of the effects that we observe in psychology. I am probably being dense but how are these correlations computed? On page 492 you outline the three-step process but it is unclear to me how the criterion can exhibit any variance. Supposedly the criterion (e.g., the proportion of Asian Americans who complete college) is a constant. Is it the case that participants make judgments about a whole host of characteristics of a particular group and these judgments are then compared to the criteria? If this is the case, does each participant have a correlation or are the judgments of all participants averaged? If the latter is the case then the nested nature of the data can produce artificial correlations.
  
  - mark on October 19, 2016 3:10 PM at 3:10 pm said:
    
    I see from Ryan (2002) that the correlations reported in the article are likely to be means of within-subject correlations. I wonder what the distribution of the within-subject correlations looks like.
    
- Vilgot Huhn on October 19, 2016 2:49 PM at 2:49 pm said:
  
  Hi Lee,
  I’ve seen some posts you’ve written on psychologytoday, and I think some other places. I haven’t had time to dig into the stereotype accuracy literature, but I find it very intriguing and hope to do so some time in the future.
  
  However, if you don’t mind me asking:
  I’m never able to wrap my mind around belief-criterion correlation as a measure of stereotype accuracy. Everything I’ve happened to read about this has been of the pop-article/review variety, where no single instance of an “accurate stereotype” is thoroughly explained.
  If I were to imagine a stereotype accuracy study I would think it’d look something like this: We have a stereotype, say women are more emotional than men, we operationalize it and we measure it and then we look at how much more emotional than men women are. And then we get a Cohen’s d/standard deviation/some other dimensional measure. But in all articles I’ve seen you’ve been talking about correlations? What’s up with that? Are you converting ds to rs for some reason or is this belief-criterion correlation something completely different?
  
  Kind regards,
  Vilgot
  
  - mark on October 19, 2016 3:13 PM at 3:13 pm said:
    
    I think what they are doing is that they assess a whole series of beliefs about women and then, for each participant, compute a correlation between the perceptions and the correct values for these beliefs. What is reported in the article would then appear to be the average of all these correlations across participants.
    If I understood this correctly then this strikes me as odd for two reasons: 1) stereotypes about a group often revolve around one or more particular characteristics rather than a long list of them and 2) it supposedly ignores the large variability between participants in how well their beliefs correspond to reality.
    
    - Vilgot Huhn on October 20, 2016 12:35 PM at 12:35 pm said:
      
      Thank you Mark!
    - isopar on September 2, 2017 2:29 PM at 2:29 pm said:
      
      Jussim is looking at the criteria actually used in the “stereotype” studies, so, technically, the inadequacy of the criteria is not _his_ problem
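
The procedure mark describes above, one belief-criterion correlation per participant and then an average across participants, can be sketched like this. All numbers are invented for illustration, the participant labels and the `pearson` helper are hypothetical, and this is one reading of the method, not the paper's actual analysis.

```python
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical criterion values (e.g. true percentages) for five attributes.
criteria = [30, 55, 12, 70, 45]

# Hypothetical beliefs: each participant estimates the same five attributes.
beliefs = {
    "p1": [35, 50, 20, 65, 40],
    "p2": [50, 45, 40, 55, 50],
    "p3": [60, 30, 50, 20, 45],
}

# One within-subject correlation per participant...
per_person = {pid: pearson(b, criteria) for pid, b in beliefs.items()}

# ...then the mean across participants, plus the spread the mean hides.
rs = list(per_person.values())
mean_r = mean(rs)
spread = stdev(rs)

for pid, r in per_person.items():
    print(f"{pid}: r = {r:.2f}")
print(f"mean r = {mean_r:.2f}, sd = {spread:.2f}")
```

In this toy data the average sits well below the best participant's correlation and one participant's correlation is negative, which is the kind of between-participant variability a single mean would hide.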
Bob on October 19, 2016 12:30 PM at 12:30 pm said:

“But what I really want to talk about . . .”

That reminds me. I was thinking of going to the barber today.

Bob

veblen on October 19, 2016 12:40 PM at 12:40 pm said:

Andrew,

The article writes “stereotype accuracy refers to the extent to which beliefs about groups correspond to those groups’ characteristics”. The key term here is “extent to which”. Your seeming paradox stems from a deterministic view, one of hasty generalization. Still, it’s entertaining.

Not to put a downer on things, but I like to think that your stereotype about Psych Science articles is that _most_ (but not all) of them are exercises in noise mining, followed by hype, and I agree that the characteristics of most articles probably correspond to that stereotype (you are my favorite statistician after all). So if the article is true, you’d have to decide if it belongs to “most” articles in Psych Science or not.

In one sense, your behavior is perfectly rational in the context of the article’s argument. You use your Psych Science stereotype to judge this article: 1. when you lack information about the article’s characteristics (you haven’t read it in detail). 2. your Psych Science stereotype is pretty informative. Don’t blame ya; I share and use your stereotype. Who wants to spend their time mucking around in dirt to find a gem? I look at the table of contents and most times, go “huh?!” That said, gems do exist, but it’s less fun to say “Wow, shiny!” than “Eww, yuck!”.

But hey, who doesn’t like paradoxes? I know I do!

Oh, and Bob, can you ask the barber if he shaves himself?

- gdanning on October 19, 2016 1:31 PM at 1:31 pm said:
  
  I am more interested in knowing whether the barber ever completes a shave, given that before he completes one, he must complete half a shave, etc, etc.
  
  - Diana Senechal on October 19, 2016 1:55 PM at 1:55 pm said:
    
    Yes, and how many hairs must a beard let fall, before it stops being a beard?
    
    (From “Paradox,” the nonexistent followup to “Imagine”)
    
    - jrc on October 19, 2016 3:51 PM at 3:51 pm said:
      
      All I know is that if the bearded guy and the anti-bearded guy both come out of the barber shop, one of them is doomed.
      
      https://theinfosphere.org/Time-paradox_duplicate
- Andrew on October 19, 2016 4:04 PM at 4:04 pm said:
  
  We’ve discussed Russell’s paradox on this blog before.
  
zbicyclist on October 19, 2016 12:53 PM at 12:53 pm said:

“a claim that stereotypes are accurate will depend strongly on the definition of ‘accuracy.’”

Pretty much true of all of social science. Without operational definitions, we are just wandering in the wilderness.

The definition here is r>.4, so we’re talking about tendencies. If we take a stereotype such as “Scandinavian women are beautiful”, we don’t expect it to be universally true, we just expect it to be generally true, or true more often than for other women. Which, of course, it is.

Disclosure statement: My wife is 3/4 Scandinavian ancestry.

Josh Warren on October 19, 2016 7:48 PM at 7:48 pm said:

Sounds almost…Godelian.

Martha (Smith) on October 20, 2016 12:45 AM at 12:45 am said:

Stereotypes bring to mind bias, which brings to mind the following two articles I came across independently today that point out the problem of making predictions from data that may be (in this case, seem to be) biased:

https://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2016.00960.x/full

https://www.sciencemag.org/news/2016/09/can-predictive-policing-prevent-crime-it-happens

psyoskeptic on October 20, 2016 1:04 AM at 1:04 am said:

The first issue that irked me in the paper was that base rates aren’t discussed. The definition of an accurate stereotype seems simplified to the point that it’s useless. When you do a survey and see that the rating of a personality trait for one sex is similar to the measured personality trait for that sex, is it really a stereotype if the other sex also shows a similar effect?

Jeff Walker on October 20, 2016 8:11 AM at 8:11 am said:

wow. and I thought this post was just a really clever and funny version of the “all Cretans are liars” paradox.

ralmond on October 20, 2016 2:27 PM at 2:27 pm said:

Dizziness when presented with paradox can be a serious problem. The solution is, of course, to use your power pose before attempting to resolve it. I recommend these highly effective Ninja Power Poses:
https://www.animenewsnetwork.com/interest/2016-10-18/beat-test-stress-by-channeling-your-ninja-power-poses/.107799

- Martha (Smith) on October 20, 2016 5:39 PM at 5:39 pm said:
  
  Curiosity got the best of me: n = 15, no comparison group.
  

Statistical Modeling, Causal Inference, and Social Science
