Aversive statistical methods explain differences in “dark” publication in PNAS across subject areas

Posted on October 19, 2025 9:58 AM by Andrew

Bill B. points to this recently published article, Aversive societal conditions explain differences in “dark” personality across countries and US states, which begins:

Humans differ in their levels of aversive (“dark”) personality traits such as egoism or psychopathy. Building upon theories suggesting that socioecological factors coshape the development of personality traits, it can be predicted that prior aversive societal conditions (ASC) (herein assessed via corruption, inequality, poverty, and violence) explain individuals’ levels of aversive personality (assessed via the Dark Factor of Personality, the common core underlying all aversive traits). Results considering individuals from 183 countries (N = 1,791,542) and 50 US states (N = 144,576) support the idea that ASC coshape individuals’ levels of aversive personality.

He writes:

I read the stuff on the link and I have to say that this requires a lot more thought and investigation to figure out if the model makes any sense.

That said, I think they got their data from website questionnaires to cross with ASC index matching to responses. Is this a legit way to do things?

This is way beyond my understanding about how to do research and build models. I think they equate the validity on developing a D statistic with that in developing a g statistic.

Don’t get me wrong, reading the abstract and contents felt pretty good, all sounded good, but I don’t have time to go down the rabbit hole of their citations. (Meager content though). But it looks like bullshit to me…

From the paper:

Recent advances in personality research have provided strong evidence for the existence of a single disposition underlying all aversive traits. The Dark Factor of Personality (D) offers a clear conceptualization of this disposition, defined as “the general tendency to maximize one’s individual utility—disregarding, accepting, or malevolently provoking disutility for others—accompanied by beliefs that serve as justifications”. Much like the g factor of intelligence, D represents the “aversive essence of personality,” from which all aversive traits arise as specific manifestations.

They give some references, and I wouldn’t call it “bullshit” exactly, but I’m skeptical. Over the years I’ve dealt with a lot of assholes (I think that’s the colloquial term for high-D people), and asshole behavior seems very context-dependent. People are assholes in some settings and not in others (as discussed in detail here), so it’s hard for me to think of D as a universal trait. I mean, sure, you can measure it–their paper links to a survey form with options from Strongly Disagree to Strongly Agree for questions such as, “It is hard for me to see someone suffering” and “All in all, it is better to be humble and honest than important and dishonest”–but, even setting aside the issue of insincere responses (something that could be an issue with assholes, no?), I just don’t know how much this construct would predict individual attitudes and behavior.

There seems to be some literature on the topic; from the above-linked paper:

Put simply, D is the underlying trait that predisposes individuals to engage in all kinds of aversive behaviors, as dictated by the idea of a reflective construct. This conceptualization has been supported by several findings, including D predicting behavioral outcomes such as dishonesty, selfishness, or outgroup harm, with specific aversive traits typically not yielding any incremental validity beyond D for self-serving behavior at the cost of others.

OK, maybe. I don’t know. I haven’t looked into this; I’ll have to ask some psychometrician colleagues. As I said, I’m skeptical but I’m open to being convinced.

But the point of the paper is not to talk about D; there’s already a literature on it. Rather, their contribution is to correlate the national average D (as collected from an online survey in which “the country in which participants were located was assessed via geo-locating their IP addresses”) with a country-level measure of aversive societal conditions (corruption, inequality, poverty rate, and homicide rate).

So their basic story is:
1. People in poor countries are assholes.
2. It’s because they’re living in bad, low-trust environments.

They also do a statewide analysis in the U.S. Amusingly enough, Louisiana and Mississippi have the highest level of aversive social conditions, but the states with the most assholes per capita are Nevada, New York, and Texas (also South Dakota, but we can’t do much with that, so we’ll chalk that one up to small sample size). The nicest people are in Vermont and Utah, which sounds about right. So at least the measure has face validity.

Setting aside any data issues, there’s a big big problem with this paper: it doesn’t have any causal identification at all! They don’t even attempt causal identification. It’s entirely a correlational story–and the correlations are not so high.

I’ll tell you one thing. There’s absolutely zero chance this paper could ever have been published in an econ journal. Say what you want about economists (and I’ll say a lot), they’ll at least try to identify their causal claims. This one is just . . . here are some correlations and we’re gonna tell a causal story. All righty.

I also have some statistical concerns with the analysis. They adjust for age and sex of respondents, but this seems to occur only in robustness checks, not in the main analysis. That doesn’t make sense to me: you’d want to adjust for these things in your primary analysis, no? This is not a big deal given the larger issues of measurement, sampling and causality, but since I’m here, I’ll mention it.

In summary, it all seems plausible to me that there are such correlations between states and countries. But to say the societal conditions “explain” the differences across personalities doesn’t seem right.

Not at all! Not one bit! It’s not even like they did some causal inference and they have some shaky assumptions. They didn’t even try to do causal inference, and their causal story just seems entirely made up. Nor for that matter does it jibe with my own anecdotal experiences with assholes. I can accept that different countries and different states have different “cultures” (for that matter, different groups within a state or country will have different rates of asshole behavior); the problem is in trying to pin it to this variable that happens to be very weakly correlated at an aggregate level. This ain’t it, chief.

But, it got published in PNAS, so it got some press, including this ludicrous headline from the (London) Independent: These are the states where psychopaths are most likely to live.

That’s what I call bad reporting. There was nothing in the study about psychopaths. It was a survey form, for Christ’s sake.

Oh well, as long as this sort of research gets shoved into the world’s most prestigious scientific journals (along with air rage, ages ending in 9, himmicanes, and other PNAS classics), and as long as this stuff gets uncritical press coverage, we’ll keep seeing more of it.

It’s almost like the culture of certain scientific subfields enables and encourages certain dark characteristics in their research.

Just to be clear, I’m not trying to say that the authors of this paper are bad people or are trying to do bad research. I assume they don’t know any better: they’re living within an environment in which this sort of behavior is celebrated, and so it makes sense that they do more of it.

And we can hardly blame the staff of the Independent for that stupid article and its incredibly inaccurate headline: journalism today is at a zero budget, and this was probably one of 100 articles and headlines that someone was tasked to write in one day, or maybe they had a chatbot do it.

But . . . maybe we could blame the National Academy of Sciences, or whoever reviewed and edited this paper. Somebody should’ve seen the problem, no? I’d love to see the referee reports on this one.

P.S. On the plus side, they make use of my statistical method:

The multilevel Bayesian model with random loadings and intercepts was estimated in Mplus 8.3 (16) using Gibbs sampling with two MCMC chains over a minimum of 5,000 iterations each, using a potential scale reduction criterion of a maximum of 1.01 to indicate convergence.

How cool is that? This little thing I figured out one day sitting at my desk 35 years ago in Murray Hill, New Jersey, and it’s still being used! A little bit of immortality.
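
For readers who haven’t seen it, here’s a minimal sketch of the basic potential scale reduction calculation in Python with NumPy. The function name and the simulated chains are made up for illustration, and this is the simple textbook version (between-chain versus within-chain variance), not necessarily the exact formula Mplus computes. Values near 1 suggest the chains agree with each other; the 1.01 cutoff is the one quoted above.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    `chains` is an (m, n) array: m chains, n post-warmup draws each.
    Hypothetical helper for illustration only.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average of the within-chain sample variances
    W = chains.var(axis=1, ddof=1).mean()
    # B: n times the sample variance of the chain means
    B = n * chain_means.var(ddof=1)
    # Pooled estimate of the marginal posterior variance
    var_plus = (n - 1) / n * W + B / n
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(0)
# Two chains drawing from the same distribution (well mixed)
mixed = rng.normal(0.0, 1.0, size=(2, 5000))
# Same draws, but the second chain is shifted by 1 (chains disagree)
stuck = mixed + np.array([[0.0], [1.0]])

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
print(f"well-mixed chains: {rhat_mixed:.3f}")
print(f"shifted chains:    {rhat_stuck:.3f}")
```

The well-mixed pair should land below 1.01 and the shifted pair well above it, which is the whole idea: if the chains haven’t found the same distribution, the between-chain variance inflates the ratio.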

11 thoughts on “Aversive statistical methods explain differences in “dark” publication in PNAS across subject areas”

Olaf Zimmermann on October 19, 2025 at 11:08 am said:

“From the paper:

Recent advances in personality research have provided strong evidence for …”

Might as well stop reading then and there. Sorry.

Dale Lehman on October 19, 2025 at 11:40 am said:

If I were younger (and had more D genes), I’d embark on a career publishing studies like: D measures of politicians, political parties, academic disciplines, ethnic groups (why should the eugenicists have all the fun?), etc. And the fact that their measure appears to correlate negatively with wealth (if that is, in fact, true) suggests to me that it is a bad measure – it seems likely that certain survey responses are heavily influenced by context and situation, with the resulting D measure representing that rather than innate personality characteristics. Questions such as the one mentioned above (It is hard for me to see someone suffering) must surely have a different meaning for someone living in an environment where suffering is in your face and rampant compared with someone living in a gated community (I’m surely overgeneralizing there).

- Olaf Zimmermann on October 19, 2025 at 12:03 pm said:
  
  There’s always the North Dakota Null Hypothesis Brain Inventory (Buchwald, 1965). Which is just about as valid.
  
  - Max Shepsi on October 20, 2025 at 11:42 am said:
    
    I had never heard of this before, but it is hilarious! Thanks for mentioning it.
    
Mathias Berggren on October 19, 2025 at 5:06 pm said:

Sigh. “it can be predicted that prior aversive societal conditions […] explain individuals’ levels of aversive personality”. This is a prediction about how individuals are influenced by societal conditions. Psychology is typically interested in such effects within individuals, so it is strange that assessing group differences between countries has become so popular. This will just confound those differences with a bunch of other stuff between countries! Here are some things I thought about when reading this:
– As Dale writes, answers could be affected by the societal environment, so that one’s answers do not just depend on one’s own D-levels, but also on the levels one observes in the society.
– There is a clear skew in the scale – all country-means are below the scale midpoint of 3. Consequently, the “most D”-countries are the ones closest to the scale midpoint. If there are invalid responders, who just answer randomly around the scale midpoint (or with the midpoint), say because they do not care about that item, or find it confusing or unrelatable, then countries with more such responders/responses would be closer to the midpoint. It seems quite likely that such responses are more common in countries with more “aversive societal conditions”.
– Building on the last point, previous results in psychology have found less variance in scores in Asian countries, that is, scores staying closer to the midpoint, which could reflect a cultural way of answering scales. If I read Fig. 1 correctly, then the “most D”-countries or regions – the ones closest to the midpoint – are Japan, Hong Kong, (South) Korea, Taiwan, Myanmar, Indonesia, and China.
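
The midpoint argument in the last two points can be checked with a toy simulation, sketched here in Python with NumPy. Everything in it is an assumption made for illustration (the function name, the true mean of 2.0, the spread, the shares of careless responders); none of the numbers come from the paper. Two hypothetical countries are given the same distribution of sincere answers on the 1 to 5 scale and differ only in how many respondents answer at random around the midpoint.

```python
import numpy as np

def simulated_country_mean(careless_share, n=100_000, seed=0):
    """Mean scale score for a hypothetical country.

    Sincere respondents answer around an assumed true mean of 2.0;
    careless respondents answer uniformly on the 1-5 scale (mean 3).
    All parameter values are invented for illustration.
    """
    rng = np.random.default_rng(seed)
    # Sincere answers: same distribution in every country
    sincere = np.clip(rng.normal(2.0, 0.6, size=n), 1.0, 5.0)
    # Careless answers: random around the scale midpoint of 3
    careless = rng.uniform(1.0, 5.0, size=n)
    # Each respondent is careless with probability `careless_share`
    is_careless = rng.random(n) < careless_share
    return float(np.where(is_careless, careless, sincere).mean())

mean_few_careless = simulated_country_mean(careless_share=0.05)
mean_many_careless = simulated_country_mean(careless_share=0.30)
print(f"5% careless responders:  {mean_few_careless:.2f}")
print(f"30% careless responders: {mean_many_careless:.2f}")
```

Under these assumptions the country with more careless responders shows the higher mean, that is, it looks “more D”, even though the sincere respondents are identical in both.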

I also noted the authors’ references (14) and (15), which they use to argue that “even relatively small effects can be cumulative in nature (14, 15), so the relation between ASC and D may have important consequences at scale”. These references have racked up a bunch of citations as authors use them to argue that their seemingly weak effects could be very meaningful – and you should lean towards thinking of them that way! – even if the authors have not studied them in a way that lets them demonstrate that they are so. This appears to be the new way to ignore effect sizes (in favor of only looking at p-values). I’m currently writing a critique of this, and previous critiques exist by Anvari et al. (2023), Primbs et al. (2023), and Sauer & Drummond (2020). I’d love to see some discussion around this on this blog as well!

- Andrew on October 19, 2025 at 8:21 pm said:
  
  Mathias:
  
  Yes on all that. But it was published by the U.S. National Academy of Sciences so it gets instant respect.
  
  What a scam.
  
Anon on October 20, 2025 at 1:59 am said:

Sigh…, edited by Susan Fiske. I wish I wasn’t joking, but every time I see a crazy looking paper in PNAS, I look up the editor, and almost always, she is the editor.

- Olaf Zimmermann on October 20, 2025 at 10:34 am said:
  
  I’m afraid I have to second this. Gimme Shelley Taylor anytime. (Those were the days, my friend …)
  
Oliver C. Schultheiss on October 20, 2025 at 6:32 am said:

Talking about causal inference: This is the reason why I neither do research with such scales nor have much interest in studies like the recent paper in PNAS. In the end, no one really knows through what process a person comes to endorse an item on a dark-personality scale (or most of the other personality trait scales). Is it veridical introspective insight? Then I suggest re-reading Nisbett & Wilson (1977) in conjunction with Gazzaniga’s work or any basic textbook on neuropsychology. Is it some memory retrieval process through which a certain episode is remembered in which one behaved like a narcissist, machiavellist, or psychopath? If so, how representative is that episode for one’s behavior overall? This brings us back to your (Andrew’s) point on the context-dependency of behavior. In addition, our memory simply isn’t a good integrator of remembered behavior over time — that’s not what it was designed to do. So even if I recall 1 salient episode, it may not at all be representative, and an answer to an item like “I like to manipulate people” therefore shouldn’t be taken as a good indicator of one’s actual free-ranging behavior. But here we’re already getting into some finer points of speculating about what drives the scores on such inventories in a causal manner. In the end, there’s too little research into the generation, and too much belief in the meaningfulness, of such scale scores. That’s why I am not a personality psychologist.

- Lurking Psychologist on October 20, 2025 at 5:14 pm said:
  
  It’s not like self-report is the only way to assess personality. See Funder’s work.
  
John N-G on October 20, 2025 at 10:30 pm said:

How about some of us get together, analyze the same data, write a paper, and submit it to PNAS claiming that certain countries have lots of corruption, homicides, etc. because they’re filled with assholes?


Statistical Modeling, Causal Inference, and Social Science
