My problems with Superior

This post is by Lizzie.

I was once in a faculty meeting where a colleague attempted to explain that people are complicated. They are rarely simply good or bad. The same person can be intellectually brilliant in some regards and a revolting racist at the same time (Jim Watson was the subject in this case). I am not sure she made a dent in the perceptions of everyone who needed perhaps more than a dent in their perceptions, but it was an excellent try.

This sort of complexity is sometimes a major pain. It means there are no easy answers where it would be really handy if we had them, so that we could organize things as simply good or bad, right or wrong. Take research in human genetics, with its revolting and recurring history of eugenics, alongside its power to help us better diagnose and potentially develop gene therapy treatments for a suite of diseases (such as Cystic Fibrosis).

And this sort of complexity is a major part of what was missing for me from the book Superior: The Return of Race Science by Angela Saini (reviewed by Andrew here).

I have a couple disclaimers about my feelings towards this book.

  • I read this book many months ago (in the spring of 2021 I believe) and am only now, at the end of the year, getting around to writing this up. So my memory is not as fresh as I would like. Part of the delay was feeling busy at work and part was not wanting to be negative about a book that in many ways takes on a hard subject and makes a compelling case, sharing information we should all know along the way.
  • I am a biologist. I am a community ecologist, so fairly removed from the genetics realm, but I find genetics work fascinating and my partner works in ecophylogenetics, so I am not as far removed as some. This makes me biased, as the book takes aim at biologists, but it also means I understand the science behind some of the claims.

I recommend Superior. It’s well-written and easy to read. It’s a super quick read of a topic we all should think and talk more about: race science, and how science can slip into supporting racism, nationalism and a suite of other evils that society cannot seem to rid itself of. It also captures a good dose of the sadder side of the process of science, including how the power of publication pushes people to over-reach and over-state.

I also disliked Superior for a few key reasons:

1. It seems to skip over all the really difficult questions in the topic of how we do science on human genetics, with no attempt to acknowledge that they are difficult questions or provide any answers. I spent a good half the book waiting for this to crop up. But it never did. My best guess is that Saini thinks we should simply stop doing 90% of the research we’re doing in human genetics.

There’s a related illogical mix of how the science is presented. She explains that the greatest human genetic diversity is found in Africa (which I presume all biologists know and I hope most people learn this early in school nowadays), then explains human migration patterns and that they show how often different populations of people in human history have interwoven. This is cool science! And stuff we learned through studying human genetics of different groups of people in different places. But then she condemns recent studies using similar methods, without fully defining the difference that made the old studies useful and the new studies evil. She seems to decry studies of ethnic groups without ever mentioning the utility of, say, studies of the Ashkenazi Jews, which helped biologists find the genes linked to Cystic Fibrosis, Tay-Sachs and many other diseases. Science does good and bad. That’s the messy, tricky problem that I thought this book seemed to mostly ignore.

There’s actually an interesting issue here to me in the evolution of genetic/genomic science. From sometime in the 80s through sometime in the 2000s genetics was really focused on finding single genes that did important things: causing Huntington’s disease, for example, or changing the hair color of certain rodent groups. But at some point we ran out of finding those and I would say we didn’t find all that many. Most phenotypic traits might be more like height, which genetically is controlled by at least 50 genes, probably many more, in complicated ways that interact with the environment you’re raised in. And it takes a lot of data to find this all out. So finding out the genetic component of most traits is going to require a lot of data (and better methods) and wading far deeper into this tricky subject of how to do it ethically. But polygenic inheritance doesn’t appear much in the book and it’s not in the index.

Back to good and bad, scientists seem to be mainly good or bad also in this book. They are either out telling the world we’re all one people, or slipping rapidly and ignorantly into race science. The ones who find human history is a complicated story of migration and intermixing are never those charging for publications or doing anything scientifically questionable. David Reich “surprises [the author] with his gentleness. … He is unfailing polite, pausing only to message his wife.” He is never a researcher running a powerhouse lab potentially in part by colluding with other well-funded labs to outcompete the labs that won’t collaborate with him (as detailed here). I think he’s likely both, but scientists don’t seem to be so complicated as that in the book.

There’s no blurry line the author ever leads the readers to look at and wonder what to do about.

2. I thought statistics were not consistently reported. They were explained in depth when they supported the author’s argument and glossed over otherwise. For example, the author (who is British-Indian) writes “as much as 95% of [human genetic] variation is within population groups. Statistically this means that while I look nothing like the white British woman who lives next door to me in my apartment building, it’s perfectly possible for me to have more in common with her than with my Indian-born neighbor who lives downstairs.” Sure, it’s perfectly possible (for many reasons, given how little info we’re given here), but statistically it’s more likely that Saini will have more in common genetically with her Indian-born neighbor because of that other 5% which is never discussed. She introduces uncertainty intervals only for a recent study painfully trying to link a gene variant, which appeared to sweep through some European, North African, Middle-Eastern and Asian groups at a mean date of 5800 years ago, to the brain, but we never discuss the uncertainty for past human migration that mixed populations. I get that she’s making an argument, but I wanted it more carefully made. Complaining that scientists warp statistics for their own agenda is less convincing when an author does the same.

3. I am biased as a biologist myself, but I felt like the author had a holier-than-thou tone, looking down on us racist little biologists who could not understand the complexity of the social construct of race. I sort of liked this for my own personal benefit: it struck home for me the idea of why we need to all acknowledge our privilege and bias. I was annoyed with the author for not making it clear to me that she knew she was as much part of the problem, and that we’re all in it together.

At the end of the book she writes of biologists who use racial categories in their research: “They should at least know what race is” (author’s emphasis). This comes just after she writes about a:

…fairly young, diverse, international team, not all stuffy or old fashioned. And [the anthropologist studying them] noticed that all the scientists were routinely using racial categories not only to select their subjects but also to confidently pick out statistical differences between these racial groups.

So as [the anthropologist] observed them, she asked each scientist she interviewed one simple question ‘How would you define race?’

Not one of them could answer her question confidently or clearly.

I bet a bunch of them bombed answering this in a way that I’d be horrified by, but I didn’t trust the author at this point to think much more than that this is a damn hard question that would require a long answer about the social science of it, with a dash of biology (including the current inadequacy of both genetic data and statistics), and the mess of check-boxes on forms.

It’s a question that I wish Saini had given me a straight answer to in the book, and I don’t feel she did.

One answer is that race is a social construct and cannot be defined genetically. Saini seems to suggest the ills of racism spring from this attempt. But the ills of racism to me do not come from whether we define race socially or genetically, but from the idea that there could ever be a hierarchy of races. Even if we could define races well genetically, I doubt we would rid ourselves of this inane idea.

Superior seems to want to take down the basic ideas of racism by showing that we are all one people, that we are highly intermixed and genetically similar. We are! But we are also genetically diverse. And, like we value cultural diversity, we should also value — not try to obscure — that diversity. In my work and the work of many researchers, genetic diversity is often what saves us; it’s the monocultures that fail us. I think Superior fails by not acknowledging how fantastic genetic diversity is, how much it has given us (and will in the future), and instead offers the false promise that if we all recognized race isn’t genetic then we wouldn’t be racist anymore.

22 thoughts on “My problems with Superior”

  1. Lizzie:

    Thanks for the review, which motivated me to reread the long comment thread from a couple years ago. I guess the difference is that I was reading the book as a political scientist, and you’re reading it as a biologist. As I wrote in my review, I thought that in her book Saini did a good job at taking apart some ever-popular ideas of racial essentialism, but I take your point that, in following a good-guys / bad-guys framework, she missed an opening to talk about challenging open questions.

  2. A recent book that discusses polygenic indices and takes on the complexity of the inter-related questions of biology and society is _The Genetic Lottery_ by Kathryn Paige Harden, at Amazon here: https://www.amazon.com/Genetic-Lottery-Matters-Social-Equality-ebook/dp/B091MQ771M/. From the site, the author “is a professor in the Department of Psychology at UT, where she leads the Developmental Behavior Genetics lab and co-directs the Texas Twin Project.”

  3. “routinely using racial categories not only to select their subjects but also to confidently pick out statistical differences between these racial groups.
    So as [the anthropologist] observed them, she asked each scientist she interviewed one simple question ‘How would you define race?’
    Not one of them could answer her question confidently or clearly.”

    Interesting methodological divide. The author thinks the purpose of the scholar is to define things; the scientists think it is to measure things and make verifiable numerical predictions.

    • It’s also as old as Socrates and as recent as (at least) Wittgenstein. People use words all the time quite effectively that, when pressed, they can’t fully define (i.e., eliminate boundary problems). Sometimes they are justified in doing so and sometimes not. (Look up any definition of “species,” for example.)

      • I go with the scientists generally here. They say, “If A happens then B happens”, and that’s valid even if they put the wrong label (the wrong definition) on A, so long as they measure it right.

        When we get into what “the true meaning” of a word is, on the other hand, it’s easy to equivocate and project internal biases.

  4. Thanks for writing this.

    One of the concerns I had working on this study, “APC I1307K increases risk of transition from polyp to colorectal carcinoma in Ashkenazi Jews” (https://pubmed.ncbi.nlm.nih.gov/11159880/), was that if a community accepts that they have a lot of genetic overlap, and they can organize and cooperate, then they can expect a possible improvement in their health outcomes. So is that an unfair advantage? Almost the opposite of harmful “racism” in some sense.

  5. “as much as 95% of [human genetic] variation is within population groups. Statistically this means that while I look nothing like the white British woman who lives next door to me in my apartment building, it’s perfectly possible for me to have more in common with her than with my Indian-born neighbor who lives downstairs.”

    Is this actually the case? I’d expect the curse of dimensionality to prevent it across the tens of millions of common SNPs segregating across human populations (since rare variants would not contribute much to total variance). Just as the whole “Lewontin’s fallacy” thing says that a ratio of multivariate densities can be quite distinct from the ratio of marginal, univariate densities (in the context of, say, a naive Bayes classifier), I feel the same extends to distance & similarity metrics in lots of dimensions, too.

    It’s too early for real math, but not too early for a quick simulation experiment! Suppose we have an even mixture of two multivariate normal distributions with diagonal correlation structure and marginal variances = 1, and with the means of each differing by (0.45, 0.45, 0.45, …, 0.45). Just over 95% of the variance is found within groups, in this case — the means of each marginal normal are less than half a standard deviation apart. But for mere 20,000-dimensional MVNs, the distributions of euclidean distances between individuals within groups vs. between groups are basically completely non-overlapping, e.g. see https://i.imgur.com/3i5uFpS.png (quick & dirty R code here: https://pastebin.com/Bmn7giaa)

    Should also apply to e.g. binomial random variables (under panmixia) as much as it does to normals, or to linear combinations of those variables under a complex polygenic model with plausibly thousands of causal loci of comparably small effect working in concert to structure phenotypic variation (can also verify pretty easily w/ Howell’s or Goldman’s osteometric etc. data, but the quote specifically refers to genetic variation, so). I think within-pop non-independence between sites due to LD should also exacerbate the difference, too? Thinking about separating lines in 2D space. Though it would also reduce the effective dimensionality of the data so not actually sure. 🤔
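
    The toy setup described above can be sketched in a few lines. This is a minimal illustration in Python (standard library only) rather than the R code linked in the comment, which is not reproduced here; the function names, group size, and seed are assumptions made for the example. It draws two groups of independent unit-variance normals whose means differ by 0.45 in every dimension, then compares the range of within-group Euclidean distances to the range of between-group distances.

```python
import math
import random
from itertools import combinations, product


def within_group_variance_fraction(mean_gap, sd=1.0):
    """Per-dimension share of variance lying within groups (even two-group mixture)."""
    between = (mean_gap / 2.0) ** 2  # variance of the two group means around the grand mean
    return sd ** 2 / (sd ** 2 + between)


def simulate_group(rng, n_individuals, n_dims, mean):
    """Each individual is a list of independent normal coordinates with unit variance."""
    return [[rng.gauss(mean, 1.0) for _ in range(n_dims)] for _ in range(n_individuals)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ranges(n_dims, mean_gap=0.45, n_individuals=6, seed=1):
    """Return ((min, max) within-group distance, (min, max) between-group distance)."""
    rng = random.Random(seed)
    group_a = simulate_group(rng, n_individuals, n_dims, 0.0)
    group_b = simulate_group(rng, n_individuals, n_dims, mean_gap)
    within = [euclidean(p, q) for g in (group_a, group_b) for p, q in combinations(g, 2)]
    between = [euclidean(p, q) for p, q in product(group_a, group_b)]
    return (min(within), max(within)), (min(between), max(between))


if __name__ == "__main__":
    print("within-group variance share:", round(within_group_variance_fraction(0.45), 4))
    for n_dims in (10, 1000, 20000):
        within, between = distance_ranges(n_dims)
        print(n_dims, "dims | within:", within, "| between:", between)
```

    With a gap of 0.45 the within-group share of variance is 1 / (1 + 0.225^2), or about 95.2%, in every single dimension. In a handful of dimensions the two distance ranges overlap heavily, while at 20,000 dimensions the largest within-group distance should fall well below the smallest between-group distance. This only illustrates the idealized independent-normal case; it says nothing about real genomes, where sites are correlated and populations are admixed.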

    • Saini is wrong because she is reusing Lewontin’s fallacy and making an argument on a per-allele basis. Yes, it’s true that on a *single* allele, she might be more ‘related’ to a white woman than a fellow Indian woman because ‘there’s more genetic variance within groups than across groups’, but this is accumulated across millions of alleles, and worse, it’s all correlated due to phylogeny. Needless to say, none of the PCAs will ever put Saini closer to her white neighbor than her fellow Indians, nor need she worry about waking up to 23AndMe trying to link her to a hitherto unknown Scottish cousin.

      It’s yet another instance of the ‘multivariate fallacy’. Sort of like arguing that ‘male’ and ‘female’ don’t exist because there’s more variation in, say, height within gender than across gender, and ignoring all of the other variables on which there are systematic differences as well.

      She’s also wrong about a lot of other parts. (Lizzie could’ve highlighted a lot of other errors in it. I recall a single sentence where she managed to make 3 different major errors about PGSes, heritability, and proving evolution can’t happen.)

      Why all the positive reviews? Well, her heart is in the right place, innit? And no one wants to wake up and find themselves named in, say, a Nature article about “CRISPR’s willing executioners”…

      • Enide:

        I don’t understand this. Whether Saini is closer (on some DNA-matching scale) to her white neighbor than to her Indian neighbor . . . this must depend on who these neighbors are, no? Lizzie’s statement that “statistically it’s more likely that Saini will have more in common genetically with her Indian-born neighbor” seems more reasonable than your absolute statement that “none of the PCAs will ever put Saini closer to her white neighbor than her fellow Indians.”

    • Yeah, this is basically right. It’s slightly more complicated because India is a huge place with a lot of diversity, and so you can find extreme examples of populations in India that aren’t that closely related to each other. But even when I included that I couldn’t really get the numbers to work out to make her statement reasonable.

    • “Is this actually the case? I’d expect the curse of dimensionality to prevent it across the tens of millions of common SNPs segregating across human populations (since rare variants would not contribute much to total variance). Just as the whole “Lewontin’s fallacy” thing says that a ratio of multivariate densities can be quite distinct from the ratio of marginal, univariate densities (in the context of, say, a naive Bayes classifier), I feel the same extends to distance & similarity metrics in lots of dimensions, too.”

      Shouldn’t the ratio of the minimum to the maximum distance between any two query points approach 1 as the number of independent dimensions grows? My understanding was that clustering in high dimensions is an inherently unstable procedure for that reason.

      • Hmm, so thinking a bit further in the normal case (though now burdened by jetlag in addition to general earliness lol), Euclidean distances between samples from an MVN random variable should be chi distributed, right (since each is the square root of the sum of squares of a bunch of normals; could probably think of a Mahalanobis distance in this way, too, since it’d be equivalent to transforming the MVN)? Don’t have a good intuition for whether the df parameter outpaces the squared mean as df increases there (since the mean is propto a ratio of gamma functions), but your observation that the ratio of high-to-low distances approaches one does seem to hold, e.g. see (in R):

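        library(chi)  # provides qchi(), the chi-distribution quantile function
        dfs <- round(1.5^(2:40))  # degrees of freedom (number of dimensions), log-spaced
        p <- c(0.001, 0.999)  # lower and upper quantiles to compare
        rats <- qchi(p[2], dfs) / qchi(p[1], dfs)  # ratio of high to low distance quantiles
        plot(dfs, rats, type = "l", log = "xy",
             xlab = "degrees of freedom", ylab = "ratio high-to-low")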

        to produce https://i.imgur.com/3r0r2jo.png

        However, I think the gulf between individuals from different groups will still increase with increasing df, since the difference between normals is still normal, but now with non-zero mean (but the same variance, i.e. the sum of the variances of the original distribution), which continues to grow with increasing dimensionality even as the contribution from the random normal bit plateaus:

        par(mar = c(5,6,4,3))
        diff_in_means <- 0.45
        # reuses p and dfs from the snippet above
        # x: length of the mean-difference vector; y: breadth of the chi quantile interval
        plot(sapply(dfs, function(ndf) sqrt(sum(rep(diff_in_means, ndf)^2))),
             qchi(p[2], dfs) - qchi(p[1], dfs),
             type = "l", log = "xy", xlab = "contribution of difference in means",
             ylab = "contribution of random normal component\n(99.8% quantile interval breadth)")

        to get: https://i.imgur.com/7hhTB9P.png

        I'll have to look at the paper you linked in closer detail, but my prior intuition was that high-D clustering was tricky because of how sparse the space is. But there are probably lots of other considerations, too!

        • Nikolai:

          The thing I’m concerned about is the idea of treating “white people living in Saini’s apartment building” and “Indian-born people living in Saini’s apartment building” as samples from two distinct populations. White people living in Britain are a pretty diverse group, as are Indian-born people in Britain, and it seems to me that in your analysis there’s a hidden assumption that these two groups represent samples from nonmixing populations.

        • Agreed. The technical question of distance metrics in high dimensions is interesting, but second order to the social one. White people in, say, London may include your classic Anglo-Saxons, Germans, Poles, and depending on who you ask, clean shaven Turks and agoraphobic North Africans. On the Indian side, there isn’t a whole lot of mixing between Gujarati folks and, say, South Indians from Chennai, and an “Indian person” living in London can easily be a second or third generation immigrant who’s just dark enough to be considered Indian. It’s too easy to assume we’re talking about some kind of representative agent or “normal” subpopulation.

        • Hi Andrew:

          I’m totally with you that the population history of “white people” and “Indian people” is gonna be full of all sorts of complex reticulation, ILS, swells and dips in size, etc. etc. The ancestry of any real individual from those folk categories will be some admixed tapestry comprising lots of sources, yielding surprising genetic similarities that contrast against starker phenotypic differences. I was just trying to structure my intuitions about what the original quote could imply by drawing up a little toy example, unburdened by any tricky nuance of reality, specifically wrt the argument that because some large proportion of variance lies within and not between groups, distances in high dimension will behave in this or that way.

          But I think maybe a better arbiter here would be data (and a better-specified model!), and I have the 1000G files dloaded at home. After winter travels I could probably just subsample the genomes of GIH / ITU individuals and GBR individuals (or whoever), and get a sense of empirical distances under a GTR+Γ model (or w/e, dunno if straight substitution would differ from trying to incorporate the coalescent wrt a distance metric?), and see if the general idea holds.

  6. Lizzie –

    > Sure, it’s perfectly possible (for many reasons, given how little info we’re given here), but statistically it’s more likely that Saini will have more in common genetically with her Indian-born neighbor because of that other 5% which is never discussed.

    I’m hoping you can unpack the logic of that comment a bit.

    You might benefit from listening to this interview between Robert Wright and (anthropologist) Augustin Fuentes. I think it addresses your reaction to the book in the sense that it includes back-and-forth dialog on many issues (or issues related to those) you raise:

    https://meaningoflife.tv/videos/43515

  7. “statistics were not consistently reported. They were explained in depth when they supported the author’s argument and glossed over otherwise.”

    Standard journalism fare. Krugman does this all the time. He repeatedly cites a single study of fast-food stores from the early 70s regarding min wage which he claims proves min wage doesn’t cost jobs, despite surely thousands of studies since. Recently someone uncovered the methodology used to conduct Krug’s favorite study: the authors called up stores and asked them!

    So much for the ballyhooed journalism as “Protector of Democracy.” Journalists are just making a living, like everyone else.
