Controversy about average personality differences between men and women

Blogger Echidne pointed me to a recent article, “The Distance Between Mars and Venus: Measuring Global Sex Differences in Personality,” by Marco Del Giudice, Tom Booth, and Paul Irwing, who find:

Sex differences in personality are believed to be comparatively small. However, research in this area has suffered from significant methodological limitations. We advance a set of guidelines for overcoming those limitations: (a) measure personality with a higher resolution than that afforded by the Big Five; (b) estimate sex differences on latent factors; and (c) assess global sex differences with multivariate effect sizes. . . . We found a global effect size D = 2.71, corresponding to an overlap of only 10% between the male and female distributions. Even excluding the factor showing the largest univariate ES [effect size], the global effect size was D = 1.71 (24% overlap).

Echidne quotes a news article in which one of the study’s authors goes overboard:

“Psychologically, men and women are almost a different species,” said study researcher Paul Irwing, of the University of Manchester, in the United Kingdom.
The new findings may explain why some careers are dominated by men (such as engineering) and others by women (such as psychological sciences), Irwing said.
“People self-select in terms of their personality… and what they think is going to be suitable in terms of the fit,” for their career, Irwing said.

It’s too bad that men and women are almost a different species. It would be so convenient if they could mate with each other and produce fertile offspring! Oh well. In any case, Irwing himself is, like Sigmund Freud, that rare oddity, the male psychologist. He must be in that elusive 10% area of overlap!

Where I’m coming from

As the above mockery (along with occasional blog entries such as this and this) should make clear, I don’t have a lot of patience for this sort of boys-do-this, girls-do-that flavor of schoolyard evolutionary biology. (Although I hate it even more when people just make stuff up and then give their non-facts a pseudo-populist political spin.) In this particular case, any observed differences between men and women can be explained in any number of ways so I see the connection to evolution as being pretty weak.

That said, based on my glance at the Del Giudice et al. paper, their analysis seems like a good idea to me, if you separate out the politics and the evolutionary speculation and treat the analysis as entirely descriptive. (Descriptive statistics is just fine, remember?) If you pick the dimensions in which men and women differ the most, you can find a large separation. In my paper with Delia Baldassarri, we did something similar (but simpler) with political attitudes and it was hard to find much, but I suspect that personality profiles are more detailed and have more repeatability than political questionnaires (at least for the general population). It makes a lot of sense to look at differences in many dimensions in addition to studying distributions for single measurements or traits.

And the results don’t seem obviously wrong to me. From my subjective judgment, there certainly appear to be some traits and behaviors for which men are much different from women, and of course there are big statistical differences in crime rates and in some opinion items. Once you interpret these findings descriptively, I can well believe that it is possible to choose a set of items for which the difference between the average man and the average woman is much larger than the average difference between two people within either sex.

What do the numbers mean?

OK, so what about that idea that the distributions of men and women overlap by only 10%? Echidne found this response from Janet Hyde, a researcher whose work was criticized in the above-linked paper. Hyde writes:

The main innovation in the Del Giudice paper is to introduce the use of Mahalanobis D to the measurement of the magnitude of gender differences. A staple of multivariate statistics for decades, D in this application measures the distance between 2 centroids in multivariate space . . . computed by taking the linear combination of the original variables that maximizes the difference between groups. What they have shown is that, if one takes a large enough set of personality measures and then takes a linear combination to maximize gender differences, one can get a pretty big gender difference. . . .

Another important point to note is that Del Giudice and colleagues’ methods rely on subjective self-reports of personality. . . . As an example, Feingold’s meta-analysis found gender differences in anxiety ranging in magnitude between d = -.15 and -.32. That is, the differences were small, with females scoring higher. . . . In a meta-analysis of research on gender differences in temperament – some of it based on parent or other adult report, some of it based on behavioral measures – the effect size for the gender difference in fear was d = -0.12, i.e., a smaller difference . . . Moreover, a behavioral study measuring children’s distress to the insertion of an intravenous needle showed no significant gender difference. That is, the boys were as anxious and fearful as the girls. Too much of the research on gender differences has relied on subjective self-reports, when objective, behavioral measures may show much different results. . . .

As Hyde says, the mathematics of multivariate distributions are such that if you have two different distributions, as you go into high dimensions the overlap goes down.

Here’s the story in statistical notation. Consider the following simple example in d dimensions with two distributions, x (the continuous personality traits for the population of women) and y (for men). Assume x and y have different centers but the same scale, and let each distribution have the identity covariance matrix. (Actual distributions would have correlations; I’m just assuming the identity for simplicity.) As for the centers, assume that the mean of x and the mean of y differ by some small amount “a” in each dimension.

What, then, is the overlap between the two distributions, as defined along the axis that separates them? The distributions live in d dimensions and are separated by “a” in each dimension, thus the distance between their centers is a*sqrt(d)—think of the distance between opposite vertices of a d-dimensional cube. Along this dimension, x and y still separately each have variance 1. I’m not quite sure how they’re defining “overlap,” but for fixed a, if you let d get bigger and bigger, the distance can get as big as you want. For example, suppose a=.2 and d=30. Then a*sqrt(d)=1.1. Or if a=.5 and d=15, you get 1.9.
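Here is a minimal sketch of that calculation in Python, under the same simplifying assumptions as above (identity covariance, the same shift a in every dimension). The function name is made up for illustration and is not anything from the paper.

```python
import math

def centroid_distance(a, d):
    """Distance between two centers that differ by `a` in each of `d` dimensions.

    With identity covariance this is just the Euclidean length of the
    difference vector (a, a, ..., a), i.e. a * sqrt(d).
    """
    deltas = [a] * d
    return math.sqrt(sum(delta * delta for delta in deltas))

# The two examples from the text, plus one-dimensional cases for comparison.
for a, d in [(0.2, 1), (0.2, 30), (0.5, 1), (0.5, 15)]:
    print(f"a={a}, d={d}: distance = {centroid_distance(a, d):.1f}")
```

The point of the comparison rows is that a per-dimension difference of 0.2 or 0.5 looks small on its own but adds up once it is spread over many dimensions.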

I guess what I’m saying here is that I don’t find it easy to interpret the published values of 2.7 and 1.7 (or, as the authors quaintly put it, 2.71 and 1.71). What I really want to see are some graphs. I’d like to get a sense of what these distributions are, and who are the people in the overlap.

And I’d like to see them replicate the method comparing other groups. They’ve done men/women. They could do young/old, Northerners/Southerners, hi/lo education, Democrats/Republicans, parents/non-parents, etc. These calculations would provide a calibration, a basis of comparison. I’m very supportive of work connecting personality to political attitudes, and if I think there can be big differences between Democrats and Republicans, then it certainly seems plausible that there are big differences between men and women. If, as I expect, the average differences between men and women are much larger than the differences between those other groups, this would strengthen the published findings.

Finally, Echidne asks:

1. Would the same overall findings inevitably be produced by different researchers using the same data and the same method and would it matter if they tried to maximize or minimize gender differences?

2. When the researchers say that only 18% of men and women have the same personalities, what do they mean? That all values on all dimensions are the same or roughly so? And if we move away from this 18%, how large are the differences?

My reply:

1. Yes, I assume that if others tried to replicate, they’d get similar results (with various small differences involving choices of data and model). The nature of this method is that it maximizes differences. As noted by Hyde, the more dimensions you have, the more ways you can find differences.

2. I’m not quite sure how they define overlap. I’m picturing two univariate distributions—little bell-shaped curves—with some area of overlap that is shaded. If the distributions are identical, the overlap is 100%, if they are far apart, the overlap is near 0. Again, these low numbers such as 18% make sense to me if you look at enough dimensions, but I don’t really have much intuition on the survey responses, the statistical models used to estimate underlying traits, or the summary of the multidimensional comparison.
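To make the picture of two overlapping bell curves concrete, here is a rough sketch in Python. It assumes two univariate normal distributions with unit variance whose means are D apart, and it tries two common ways of defining overlap. Both definitions and the function names are assumptions for illustration; nothing here is confirmed to be the paper's own calculation.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def overlapping_coefficient(D):
    """Shaded area shared by two unit-variance normal densities D apart.

    The curves cross at the midpoint, so the shared area is two tails,
    each of size Phi(-D/2).
    """
    return 2 * normal_cdf(-D / 2)

def overlap_of_joint(D):
    """Shared area as a fraction of the total area covered by both curves."""
    ovl = overlapping_coefficient(D)
    return ovl / (2 - ovl)

for D in (0.0, 1.71, 2.71):
    print(f"D={D}: shared area = {overlapping_coefficient(D):.3f}, "
          f"share of joint area = {overlap_of_joint(D):.3f}")
```

Under these assumptions, D = 2.71 gives about 18% by the first definition and about 10% by the second, and D = 1.71 gives about 24% by the second. That is at least consistent with the 10% and 24% in the abstract and the 18% in Echidne's question being two summaries of the same distance, though that is a guess based on the arithmetic.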

25 thoughts on “Controversy about average personality differences between men and women”

  1. My biggest problem with these studies is the mistaken causal attribution of the observed differences to gender. Even if they do a reasonable job measuring the quantity of interest, they are never measuring the direct effect of gender. Yet, they always seem to imply to the popular press that the two are the same.

    Every scientist knows how to recite that correlation != causation, but when it comes to their own research, it often seems to go out the window.

    • Gender is more or less randomly assigned at birth, hence the case that gender differences are causal is a pretty good one. The real questions are (i) what are the mechanisms that account for differences and (ii) what about external validity.

      • Suppose I want to measure the effect of drug X, which actually has no effect. I randomly assign patients to X vs. (another) placebo. However, I’m not blinded to which patients are on X and I give those patients drug Y, which actually has an effect.

        X may have an effect in the counterfactual sense, but in terms of a policy intervention this would lead us to the wrong conclusion because X only has an effect through our decision to give those patients drug Y. So it’s fine to do this if you are comprehensive about the channels through which the effect acts, but this relies on the dubious assumption that you can measure every single intervening consequence of gender, which is pretty much intractable. Also, the way they advertise this (and the way the popular press tends to interpret these studies) implies a direct effect of gender, which is not the case.

        • Yup, X has an effect in the counterfactual sense, which was my point. Given that you – correctly – point out that it’s practically impossible to track all the channels through which gender may operate, you seem pretty certain that biology plays no role (if I interpret your comments correctly).

        • It bugs me when they say men and women are “almost a different species”, that implies that these differences are intrinsic to their sex. Also (again this isn’t my area of literature), but when they appeal to evolutionary selection, that also implies an intrinsic, biological, difference. Given the degree of confounding and the social cost of the inefficient allocation of human capital, the onus is on proponents of these intrinsic differences to establish causation. I’ve yet to hear one of these evolutionary arguments that convincingly demonstrates that they aren’t mistaking external factors for intrinsic characteristics of sex.

          On a personal note, I’m lucky not to be sheltered from women who’ve had upbringings that encouraged STEM-type occupations. These women certainly don’t seem like genetic abnormalities to me, yet succeed in doing tasks that require stereotypically male characteristics. I don’t really see much that connects these women other than their upbringing, so from my experience, that’s the parsimonious explanation.

      • gender is not randomly assigned: sex is randomly assigned, gender is socially constructed. they are clearly related but to revo11’s point, there is a lot of psychological room for the knowledge of biological sex (X) to influence the perception of gender (giving them Y).

        • Anon: Yes – I used gender because revo used it, but I should have used sex instead.

          revo: I have yet to see a good methodological argument as to why, when two aspects are confounded, one is a priori more likely than the other to be the real cause of variance in the dependent variable. Generally, the analytical problem is that biology is always expressed in a social context, so asking questions about the “direct” effect of sex is not particularly meaningful. What you can do, however, is look for similarities and differences across social contexts, which reveals both. This suggests that there is more room for social influences to work with with regard to some factors than with regard to others.

          As for your anecdote about women in STEM-type occupations, I’ll counter with anecdotal evidence to the effect that children tend to be “encouraged” to do stuff they’re good at, holding perceived desirability of the behaviour constant.

          Generally, I’m quite baffled to see many proponents of the environment-only view be so certain that they’re right, when the evidence does not seem to warrant it.

        • If you’re including both direct and indirect effects of sex (including social/cultural/other environmental effects), then what does it mean to attribute a difference to sex? My impression is when someone says that the outcome is attributable to sex, then they’re referring to an intrinsic property of the individual that is coupled to their sex. Otherwise, I don’t see what the competing hypothesis would be (if you include both direct and indirect effects).

          Looking at similarities and differences across social contexts is worthwhile, but there’s still a pretty big inferential leap between characteristics that are conserved among current cultures and an evolutionary explanation. Especially when the mechanism/genes involved in the latter are not identified and there are strong cross-cultural spillover effects across cultures even if you look at trends on a global scale.

          “anecdotal evidence to the effect that children tend to be “encouraged” to do stuff they’re good at, holding perceived desirability of the behaviour constant.” I don’t agree with this. Outside pressures are pretty strong influences on what children are encouraged to do, especially when it’s very unclear what they actually would be good at as adults. Sure math/music geniuses may be identifiable early on, but that’s not the usual case. Also, social desirability of the behavior is strongly dependent on sex, so to me that’s one of the sources of confounding.

          I’m not certain I’m right – this isn’t even my research area. I’m just stating my opinion and my impression of the evidence as an outsider. I’m open to being wrong if someone points me to some convincing evidence.

  2. I think your example of criminals is a great one. Yes, the number of men committing crimes is orders of magnitude higher than women. But both of these are very tiny sub-populations of men and women. If your statement is about differences among men and women, in general, then you must include the fact that well over 90% of the population overlap in staying out of prison.

    I could say that they’re very different because the ratio of men to women committing crimes is 10:1, or I could say they’re very similar because the ratio of women to men not committing crimes is 10:9. I believe that if you want to truly generalize you have to conclude the latter. I’m certainly going to be using it when I want to predict if individual X committed a crime and the only information I have is sex. Once I have information a crime was committed by either X or Y, I give a different answer, but that’s a sub population and not men and women in general.

    • I could say that they’re very different because the ratio of men to women committing crimes is 10:1, or I could say they’re very similar because the ratio of women to men not committing crimes is 10:9.

      Yet another argument for the odds ratio!

  3. It seems like it might be a good idea to re-run the analysis (assuming they provide the underlying data) with comparisons of different categories. If the overlap for men-women is similar to white-asian, old-young, and USA-UK then there’s nothing special about gender differences. Of course, they may have done this. I didn’t bother to read the paper.

  4. As a biologist, the “different species” comment is really bizarre. Sex differences in humans have got to be comparatively small by any reasonable measure. Male anglerfish don’t even have brains for most of their lives, let alone personalities similar to females. Even among mammals or primates, I think it’s pretty obvious that our sex differences are small.

    Anyway, let’s think about what D of 1.5 means in this case (I’m using the raw scores instead of latent scores because I want to apply this to individuals).

    They asked people 185 questions and calculated 16 summary statistics from them. If they used an optimal linear combination of those values to predict someone’s sex, they’d be right about three quarters of the time.

    Am I the only one that doesn’t think that sounds very impressive?

    • I guess it depends on the questions. If the questions don’t have any obvious cultural/gender biases in them (How often do you have sex with men? Do you prefer Oprah or Bill O’Reilly?) then, yeah, I’d be impressed. Actually, the next step would be to see what questions contributed most to these differences in some metric and try alternative versions of those questions which sought to remove some heretofore unnoticed cultural bias.

  5. I am surprised that this paper is new and controversial. Psychologists have been doing these personality tests for decades, and they have also been looking for statistical correlations in order to screen for disorders and other traits. I would have thought that the sex-differences had been analyzed 50 years ago.

    • Put simply (and mentioned in the article) it is difficult to find stable differences in personality measures that can be unambiguously attributed to sex. On any given measure the differences tend to be small unless you pick something obviously hugely confounded with cultural or other factors. A further problem is that much of ‘personality’ is situational (arguably all – according to some theorists) and difficult to define in a context-free way.

      Also, as David J. Harris pointed out, there are good reasons for thinking that sex differences are small a priori.

    • Difficult to find sex differences in personality? The study says that men are more Dominant, Reserved, Utilitarian, Vigilant, Rule-conscious, Emotionally stable. While women are more Deferential, Warm, Trusting, Sensitive, Emotionally reactive. These differences have been known for centuries and are consistent with my experience. Surely this study is not the first to analyze these differences.

      • Dominant, emotionally stable, deferential, sensitive… Nothing “loaded” about those descriptors, is there? But, oh well, if those differences are consistent with YOUR experience, I guess there’s no need for any further discussion.

      • Psychology has a long history of studying personality traits. If you think that it is all bogus, then you probably think that the field is not a social science at all. There is no need to discuss my experience with personality traits. If the research is wrong, just demonstrate the error.

      • While there are indeed often measurable differences between the behavior of males and females, the issue of contention is what portion of these can be attributed to the direct effect of gender difference. The reason why this is a question is because there are massive confounding factors, such as culture and upbringing, that are impossible to avoid unless you plan on studying children raised from birth in a controlled setting, which would be quite difficult and very unethical.

        Evidence suggesting that large portions of the difference are cultural rather than biological includes studies such as this one, in which small psychological effects can have a large impact on the performance gender gap. (The same effect has also been shown to apply to other culturally-fed differences, such as the racial achievement gap, so the study’s results are unlikely to be caused by a biological male/female difference.)

        (And being ‘known for centuries’ does not support a theory as much as one might think. Many people in the past were wrong about many things.)

      • EveryZig, you seem to want to turn this into some sort of nature v. nurture debate that is not the subject of the paper. That Science magazine paper says that “values affirmation” can affect physics test scores, but that is really off the subject.

  6. Global differences between men and women are going to be amplified because in much of the world women are still treated like slaves/cattle.

    • @krissy Are you suggesting that if someone is being persistently treated as “slaves/cattle” they will develop personalities which can be characterized as “Deferential, Warm, Trusting, Sensitive, Emotionally reactive” ?
