
Iceland education gene trend kangaroo

Someone who works in genetics writes:

You may have seen the recent study in PNAS about genetic prediction of educational attainment in Iceland. The authors report, in a very concerned fashion, that with every generation the educational attainment predicted from genetics decreases by 0.1 standard deviations.

This sounds bad. But consider that the University of Iceland was founded in 1911, right at the beginning of the period 1910-1990 studied by the authors (!). So there is a many-thousand-percent increase in actual educational attainment at the same time as there is an ominous 0.005 SD/year decrease in ‘genetic’ educational attainment. Over this period, educational attainment in the developed world seems to have exploded beyond a reasonable doubt, as the paper’s appendix also shows for Iceland. This genetic effect seems like a kangaroo feather to me.

My reply:

I’m not quite as skeptical as you—after all, it doesn’t seem unreasonable that different groups in the population are having children at different rates—but there are some things about the paper that I don’t completely follow:

1. The last sentence of the summary: “Another important observation is that the association between the score and fertility remains highly significant after adjusting for the educational attainment of the individuals.” This comes up again at the end of the paper: “It is also clear that education attained does not explain all of the effect. Hence, it seems that the effect is caused by a certain capacity to acquire education that is not always realized.” I guess they mean it’s not just that more educated people are having fewer (or later) children, it’s that this fertility differential is predicted by the genes. But I don’t see why that matters for their story.

2. The last sentence of the abstract: “Most importantly, because POLYEDU only captures a fraction of the overall underlying genetic component the latter could be declining at a rate that is two to three times faster.” This is what Eric Loken and I call the “backpack argument.” N is huge here so we don’t have to worry about statistical significance; still, I’m concerned about measurement error and there’s something about this “two to three times faster” claim that I find suspicious, even though it’s hard for me to pin down exactly what’s bothering me about it.

3. Saying P < 10^-100 is kinda silly. They say ∼0.010 standard units per decade, which is fine, but then it would make sense to perform a separate estimate for each decade and see how this varies.

4. I also don’t quite understand “the genetic propensity for educational attainment.” Doesn’t the definition of this propensity vary over time? In the U.S., having XY chromosomes (that is, being male) used to be strongly predictive of educational attainment, but not so much anymore. Similarly for various ethnic groups. Iceland might be different because it’s a more homogeneous society, but I’d still think these conditions would vary over time.
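Point 3’s suggestion can be sketched as follows: group probands by birth decade and look at the mean score of each cohort and the change between adjacent cohorts, rather than a single pooled slope. This is a minimal illustration assuming the data come as parallel lists of birth years and standardized polygenic scores; the function names and the toy input are hypothetical, and a real analysis would also attach an uncertainty to each decade.

```python
from collections import defaultdict
from statistics import mean


def per_decade_means(birth_years, scores):
    """Mean standardized score per birth decade (e.g. 1910 covers 1910-1919)."""
    groups = defaultdict(list)
    for year, score in zip(birth_years, scores):
        groups[(year // 10) * 10].append(score)
    return {decade: mean(values) for decade, values in sorted(groups.items())}


def decade_changes(decade_means):
    """Change in the mean score from each observed decade to the next."""
    decades = sorted(decade_means)
    return {
        later: decade_means[later] - decade_means[earlier]
        for earlier, later in zip(decades, decades[1:])
    }


# Toy input for illustration only (not data from the paper).
years = [1912, 1917, 1923, 1928, 1931, 1936]
scores = [0.10, 0.30, 0.05, 0.15, -0.10, 0.10]
means = per_decade_means(years, scores)
print(means)
print(decade_changes(means))
```

Roughly equal changes from decade to decade would support summarizing the trend as a single rate per decade; large swings would suggest the pooled slope hides real variation.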

This is not to say the research is all wrong—I actually know the first author of the paper, we were in grad school together and he’s a smart guy. It seems like the topic is worth studying as long as people recognize the uncertainty and variation here. Calling it “nonsense” seems a bit strong, no?

My correspondent responded:

Thinking it over, I agree that I was being too harsh, though I remain skeptical. Certainly the editor is highly respected, and the last author is also very prominent. I would tend to agree with you and the authors that different people have different fertilities that are associated with demographic factors like education (this appears to be a very old result). Furthermore, it would not be surprising if there were an indication of this in the genetic makeup of the population, but this could be true whether the trait was genetically determined or not.

I expand on my criticism at some length below; I apologize for going on a bit. First I’ll respond directly to your points:

1) I think you are right. The way I would put it is that these variables (education, fertility, and POLYEDU) are each somewhat independently correlated with the others.

2) I think that this refers to the fact that most of the heritable variance of educational attainment (~90%) is not explained by POLYEDU. The argument seems to be that this unobserved genetic component probably behaves the same way as POLYEDU (see the results on page 4). I doubt this, based on my view of complex trait genetics, but making this assumption allows them to make the estimate. That said, I couldn’t reconstruct where the 2-3 multiplier comes from. I wasn’t able to find anything on your backpack argument via Google, but I would be curious to know more.

3) Agreed.

4) For me, this is the weakest point (see below). All of these genetic effects are context dependent, and the context in which they occur is one in which the environment is pushing overwhelmingly in the opposite direction from the (measured) genetics. For instance, imagine that educational attainment and fertility both stabilize in the 21st century. It is possible that the selective effect goes away even if the demographic pattern persists, because what POLYEDU measured was a demographic group that was transitioning from high fertility to low fertility as it attained more education. Moreover, all of this is based on gene-education associations that were measured only in the last generation, and they account for only 4% of the variance of the trait (a trait that is 20-40% heritable in total).
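One simple way to get a factor of two to three, offered as a back-of-the-envelope reconstruction rather than the paper’s own derivation: assume selection acts through fertility on the full genetic component G, and that POLYEDU is an imperfect proxy with correlation r to G. By the Robertson-Price identity, the change in any trait’s mean per generation is its covariance with relative fitness, so in standardized units the score moves r times as fast as G, and the full trend is the observed one divided by r. Here r² is taken as the 4% of phenotypic variance the score explains, divided by a heritability in the 20-40% range from point 4. The function name is hypothetical.

```python
import math


def full_trend_multiplier(score_r2, heritability):
    """Ratio of the trend in the full genetic component G to the score's trend.

    Hypothetical assumptions: relative fitness is linear in G (plus terms
    independent of G and the score), and the score predicts the phenotype only
    through G, so corr(score, G)**2 == score_r2 / heritability.
    """
    if not 0 < score_r2 <= heritability <= 1:
        raise ValueError("need 0 < score_r2 <= heritability <= 1")
    r = math.sqrt(score_r2 / heritability)  # correlation between score and G
    # Robertson-Price: change in mean = cov(trait, fitness), so in SD units
    # delta_score = r * delta_G, hence delta_G = delta_score / r.
    return 1.0 / r


for h2 in (0.2, 0.3, 0.4):
    print(f"h2 = {h2:.1f}: multiplier ~ {full_trend_multiplier(0.04, h2):.2f}")
```

With the 4% figure this gives about 2.2 at 20% heritability and about 3.2 at 40%, the same size as the “two to three times” in the abstract. The factor is only as good as the assumption that the uncaptured genetic variance is selected in the same way as the captured part, which is exactly the assumption doubted in point 2.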

I should admit that the source of my strong reaction to the paper, and particularly to the publicity, is that when this kind of finding about human genetics gets oversold, it immediately gets shared on white supremacist websites as incontrovertible proof of whatever (as has already happened in this case). This is something that I have a problem with in my chosen field (and why I don’t work on humans): there are consequences to overstating the role of genetics in human variation.

This paper seems to be playing into an “Idiocracy” narrative based on its data; this is certainly how the paper seems to be getting played in the press (“We’re evolving stupid: Icelandic study finds gradual decline in genes linked to education, IQ”). However, I don’t see how they could reliably measure that effect with this data. For instance, towards the end of the paper they translate their POLYEDU effect into IQ because education increases IQ, and conclude that IQ has decreased by ~0.3 points overall based on POLYEDU. At the same time, educational attainment in the study population has increased by an average of 2-4 years over the same period, which, if you look at other studies, suggests a historical increase in IQ of 8-16 points over the same period (one measured increase during the study interval was 13.8 points). So what they’re measuring is a countercyclical genetic trend in the face of a much larger environmental trend. It isn’t wrong to try to do this, but the presentation obscures the larger picture, which is that genetics probably isn’t doing much here. A more parsimonious hypothesis is that the demographic shift caused some mostly meaningless fluctuations in standing genetic variation (which would be expected anyway in a population model). This seems more plausible to me, and I believe it could be modeled without too much difficulty.
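The size comparison in the paragraph above is simple arithmetic. The sketch below assumes about 4 IQ points per additional year of education, which is the rate implied by the quoted 8-16 points for 2-4 extra years (an assumption, not a figure from the paper), and sets that against the ~0.3-point decline attributed to POLYEDU.

```python
POINTS_PER_YEAR = 4.0  # assumed: the rate implied by 8-16 points for 2-4 years
GENETIC_DECLINE = 0.3  # ~0.3 IQ points attributed to POLYEDU over the period


def environmental_gain(extra_years, points_per_year=POINTS_PER_YEAR):
    """IQ gain implied by extra schooling at a constant per-year effect."""
    return extra_years * points_per_year


for years in (2, 4):
    gain = environmental_gain(years)
    print(f"{years} extra years: ~{gain:.0f} points, "
          f"~{gain / GENETIC_DECLINE:.0f}x the genetic decline")

# The single measured increase quoted in the text, for comparison.
print(f"13.8 measured points: ~{13.8 / GENETIC_DECLINE:.0f}x the genetic decline")
```

Under these assumptions the environmental trend is roughly 27 to 53 times the size of the genetic one.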

The authors are also certainly correct that 0.01 SU/decade would be a significant trend in evolution, but this then also assumes that the conditions of the 20th century in which this change is taking place are held constant for evolution. Presumably this includes the increasing educational attainment that is the context for the decreasing genetic propensity for educational attainment. I think this is similar to your question (4).

To their credit, the last author more or less agrees with all this in an interview, and they talk about it at the end of the Results section. But then, why describe something as meaningful on an evolutionary timescale unless you believe it actually will be? Their extrapolation just makes me uncomfortable, especially because they don’t account for the effects of demography. What is particularly worrisome is your question (1), which might mean that the score is actually somehow measuring signatures of declining fertility via the mediator variable of education. Personally, I mistrust polygenic scores in humans (animals are better, I think). They tend to rely on large numbers of features to explain rather small effects (4% of variance in this case). However, mine is a fringe position in this field.


  1. gwern says:

    1. It’s an attempt at a mediation analysis: showing that the dysgenics is not purely through the measured education credentials. This is interesting because some of the dysgenics analyses argue that the selection against intelligence is not against intelligence per se but only against education, and it selects against intelligence as well simply because intelligence causes/is genetically correlated with education.
    2. Think of it as a correction for measurement error. To get the absolute trend, you scale the observed trend up to account for the genetic variance the score doesn’t capture.
    3. It doesn’t vary much. IIRC, Kong includes a graph of per-decade cohorts. You can also look at other papers measuring dysgenics to see that it doesn’t change much in the available ranges. We also do not see the predictive power of polygenic scores changing all *that* much over decades across a variety of traits (thanks to the UKBB measuring a bunch of traits).

    2. I don’t know why you think that the current PGS would change wildly with some larger sample sizes ‘based on your view of complex trait genetics’; can you give a single example of a complex trait where the PGS for the first few percent looked very different from much better PGSes? (Certainly didn’t happen for height or schizophrenia…) Additive SNPs are pretty much additive SNPs, regardless of whether your sample size is n=10k or n=500k. Also, they give their derivation of the estimate inside the paper – that’s what the whole section “Determining the Rate of Change of the Polygenic Score As a Result on Its Impact on Fertility Traits” is about, isn’t it?
    4. Again, the polygenic scores are not super fragile and ‘context dependent’. They were not measured just in the last generation – the cohorts used in intelligence and education by UKBB and SSGAC etc. range in age from 9 to the 70s across many countries. And again, if they really did vary hugely and were super ‘context dependent’, you would not see smooth curves over time like Kong’s! They would jump up and down based on the changing ‘context’, not look like smooth responses to steady selection. Also, are you sure you want to try to criticize it for being ‘just’ 4% of variance? What theoretical reason do you have to believe that a result using a PGS with 4% variance is unconvincing but 5% or 6% would be? How much variance does it *need* to be? (Keeping in mind that 4% is already obsoleted by the next SSGAC paper, which will have something like 9%?) Also, education is not 20% heritable; you can’t explain this away as ‘ignoring demography’ (where do you think the selection is coming from?), and given the ending of the Flynn effect across the West, it’s not obvious that this is ‘some mostly meaningless fluctuations in standing genetic variation’ with no phenotypic implications. If something can’t go on forever, it won’t.

    • max says:

      I’m the correspondent mentioned by Andrew. Sorry about the delay in responding, the workweek intervened.

      I infer that the second set of responses in this comment is addressed to me, so I’ll respond to them in turn. I apologize for going into some detail; my tl;dr is: if you have faith in biology to be statistically well-behaved, there is perhaps something in the paper. I have little such faith, and this seems to be my difference with @gwern.

      (2). This doesn’t really have anything to do with sample size, from my point of view (130K Icelanders plus a bunch of Brits [or whatever it is they end up using] seems like a pretty good sample size!), but rather with the generalizability of the results and the problems with the additive model generally. If everything is perfect and additive, you of course expect that as power and sample size tend to infinity you will find infinitely many contributing SNPs, collectively accounting for 100% of the heritable variation (up from 10%-20% now). This is more or less what @gwern means (I think) when he says that an additive SNP is an additive SNP: if you believe that heritability is the sum of additive SNPs, just find all the additive SNPs and you’re golden. My POV is that additive SNPs are only rarely really additive SNPs. Certainly there is some proportion of the heritable variance that can be easily represented as additive, and that is probably the 10-20% that’s already been found. The rest is going to be a lot harder to find*. Additivity is a very nice convenience for making our models work and keeping our heads from exploding, but that doesn’t mean that we can trust the genetic architecture to behave that way (Andrew actually has a post saying something similar about the social sciences, which I’m not able to track down currently). So while I take polygenic scores and similar constructs as possibly useful tools, such tools have breaking points.

      (4). Perhaps I missed something, but my recollection of the paper was that only one group was genotyped. The authors write: “Probands used for the genetic analyses here are limited to those with both parents and all four grandparents listed in the genealogy.” It looks like there are multiple generations represented in the Icelandic group (starting with year of birth 1910), but Icelanders are pointedly excluded from the GWAS used to estimate the polygenic score. They say some other things in the Materials and Methods that make this a little hard to interpret, but it’s at best the combination of several contemporary cohorts. This is fine, given the practicalities, but given the context-dependence (yup) of genetic association studies, it is not obvious to me that if you went and estimated the score in 1920 in Iceland you would find the same variants, or, if you found the same variants, that they would have the same sign of effect, because the circumstances of measurement are so completely different.
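For reference, the additive model behind point (2), and behind the polygenic score itself, can be written as follows. Here x_ij is the number (0, 1, or 2) of effect alleles person i carries at SNP j and β_j is that SNP’s effect; the notation is ours rather than the paper’s.

```latex
y_i = \mu + \sum_{j=1}^{M} \beta_j x_{ij} + \varepsilon_i,
\qquad
h^2_{\mathrm{SNP}} = \frac{\operatorname{Var}\!\left(\sum_{j=1}^{M} \beta_j x_{ij}\right)}{\operatorname{Var}(y_i)},
\qquad
\mathrm{PGS}_i = \sum_{j=1}^{M} \hat{\beta}_j x_{ij}
```

In this form, bigger GWAS samples only sharpen the estimates \hat{\beta}_j, so the score’s predictive R² can rise at most to h²_SNP; the disagreement in this thread is over whether the true genotype-phenotype map is close enough to this sum for that to be the whole story.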

      There are a few more criticisms here that run together; I will try to cover them.

      a) It is true that I would be more convinced by a higher % variance, yes: not necessarily to the point of believing the results, but enough to take the study more seriously. I have no theoretical reason or threshold per se (beyond the “closer to estimated heritability” theoretical reason discussed above), but rather a gut feeling that if you make a measurement with a model with known inadequacies (again, above + footnote), and the model doesn’t work out all that well, maybe there could be a problem with the model.

      b) Maybe there is an estimate of heritability >20% for education; I try not to keep up with these things because I don’t find them very interesting and they’re usually not very meaningful. Recall that heritabilities vary from measurement to measurement and population to population, so it’s hard to take any one estimate very seriously.

      c) Potentially I overstated a role for demography in what I wrote previously; it was just an idea I had for where the pattern came from. Granted, the pattern could also be something else or just noise; that is very possible. But, just for fun, let’s imagine a situation in a structured population (like all pops), where educational attainment is 0% heritable, 100% environment. Some people get learnin’ and teach it to their chillun, and so forth, who go on to make fewer babies. That will, naturally, lead to a population genetic signature, because there are always signatures of demographic stuff happening in the alleles, because we’re working outside Hardy-Weinberg equilibrium. You will still find variants that look like they’re associated with the trait, because population structure also shows up in the genetics. The authors use genomic control to check for this, which is very legit and clear, and it doesn’t seem to matter, and they also perform their regressions after adjusting for the 20 leading PCs: “Adjusting for 20 principal components reduces, but does not eliminate, this effect.” So they’ve nominally covered their bases. Is that good enough? Dunno. Probably not (usually the case). I’m not claiming I have a better suggestion; it’s just a hard problem, and the only way to deal with it is with heuristics. Once you stack enough heuristics on top of one another, I get worried.

      d) Flynn effect: It seems to me that the ending of the Flynn effect, if anything, more or less renders anything this paper has to say about the future uninterpretable. (Had it ended? I thought it was still going on in some places, coincident with economic development, but I’m often wrong.) Consider it this way: the general increase in the standard of living gives easy gains in education and cognitive ability in the 20th century, which reduces selection on their genetic bases, but now that that’s over and everyone’s pretty well-fed and has socialized health care, the hammer will come back down and selection on the genes will resume. I don’t believe this, because I think there’s a general tendency to overestimate the amount of selection going on in humans for behavioral traits, but it’s at least as likely a hypothesis as what @gwern seems to be saying. Bigger-picture, I would say that Flynn shows that the important fluctuations and insults that affect these cognitive phenotypes are principally environmental, as various people have argued. If that’s the case, I mostly don’t care about this kind of study.
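The thought experiment in point (c) can be sketched with a small simulation, under assumptions chosen only for illustration: two subpopulations that differ in the frequency of one SNP and, for purely environmental reasons, in mean educational attainment. The SNP has no causal effect, yet a naive regression finds an “association”; adding the subpopulation label as a covariate (a stand-in for principal components, which in real data are estimated rather than known) removes it. The allele frequencies, environmental shift, and function names are hypothetical.

```python
import numpy as np


def simulate_stratified(n_per_pop=5000, freqs=(0.2, 0.6), env_shift=1.0, seed=0):
    """Two subpopulations; the trait depends on subpopulation (environment) only."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)          # subpopulation label
    allele_freq = np.asarray(freqs)[pop]        # SNP frequency differs by group
    genotype = rng.binomial(2, allele_freq)     # 0, 1, or 2 copies per person
    trait = env_shift * pop + rng.normal(0.0, 1.0, pop.size)  # no genetic effect
    return pop, genotype, trait


def snp_slope(genotype, trait, covariates=()):
    """OLS coefficient on the genotype, optionally adjusting for covariates."""
    X = np.column_stack([np.ones(trait.size), genotype, *covariates])
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return beta[1]


pop, genotype, trait = simulate_stratified()
print(f"naive slope:    {snp_slope(genotype, trait):.3f}")
print(f"adjusted slope: {snp_slope(genotype, trait, covariates=(pop,)):.3f}")
```

In real populations, structure is continuous and the leading principal components only approximate it, so the adjustment is a heuristic rather than a guarantee.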

      Then again, I don’t really believe that objective knowledge is possible (could you tell?), so maybe I’m wrong.
      cheers, max

      *For anyone with a very high boredom tolerance, this is called the “missing heritability” problem and there’s a very large literature of people arguing about why they think it happens. There are 3 principal viewpoints on this:

      1. Everything is fine, we just need more grant funding to get larger sample sizes and find more common genetic variants because we just don’t have good enough power to ascertain ALL the variants yet. For a sample, see This is very popular with the people who did well with additive models in the early 2000s.
      2. Most things are fine, except that the additive model is wrong because organisms are made of biology, and biology is complicated and doesn’t do things additively, so we need more grant funding to understand some weird piece of cell biology in fruit flies. For a sample, see Very popular with the people who do genetics with non-human organisms.
      3. “heritability is an unstable trinket”**. It doesn’t measure what you think it’s measuring, because it depends on a ton of other stuff that you aren’t accounting for and was basically invented for (and has had its best success with) breeding hogs. For a relatively old but widely read sample, see Popular with the crusty, cantankerous old geezers who invented what we think of as quantitative genetics and their younger but equally unsociable, poorly-funded friends.

      Some of the work that I’ve found most instructive on this subject was first Leonid Kruglyak’s work with yeast (Ehrenreich et al. 2013, Nature; Bloom et al. 2015, Nature Communications), which came out a few years ago claiming to show that most genetic variance in a big huge cross was additive (80% or something, as opposed to 20% epistatic). So, great, genetics solved. However, just recently Örjan Carlborg (in a collaboration with Kruglyak) took the same yeast cross and, whoops, showed that a big chunk of that additive variance is really epistasis that can be represented as additive at the allele frequencies of the cross (Forsberg et al. 2017). I don’t think they come up with a single number for how much additive variance is actually epistatic (there are multiple phenotypes involved), but generally it seems to be most of the variance. More specifically, these interactions resulted in big prediction errors for an additive-only model. Notably, the Kruglyak group’s initial work found that they were able to explain basically 100% of the narrow-sense heritability with the loci they found, so they have nearly total ascertainment! And yet at least a big gob of that is actually interactions.
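A toy version of the Forsberg et al. point, assuming a purely multiplicative two-locus genotype-phenotype map chosen for illustration (it is not the yeast model): the trait equals the product of the allele counts at two unlinked loci in Hardy-Weinberg proportions. Fitting the best additive model by frequency-weighted least squares over the nine genotypes shows how much of this purely interactive variance shows up as “additive”, and how the fitted per-locus effects depend on allele frequencies. Function names are hypothetical.

```python
import itertools

import numpy as np


def hwe_genotype_freqs(p):
    """Frequencies of 0, 1, 2 copies of an allele at frequency p (Hardy-Weinberg)."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def additive_share(p1, p2, genotype_value):
    """Share of genetic variance captured by the best additive fit to allele counts.

    Two unlinked loci in Hardy-Weinberg and linkage equilibrium; the additive
    model is a frequency-weighted least-squares regression on the allele counts.
    Returns (V_A / V_G, fitted per-locus additive effects).
    """
    f1, f2 = hwe_genotype_freqs(p1), hwe_genotype_freqs(p2)
    rows, weights, values = [], [], []
    for x1, x2 in itertools.product((0, 1, 2), repeat=2):
        rows.append([1.0, x1, x2])          # intercept + allele counts
        weights.append(f1[x1] * f2[x2])     # genotype frequency
        values.append(genotype_value(x1, x2))
    X = np.array(rows)
    w = np.array(weights)
    g = np.array(values, dtype=float)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], g * sw, rcond=None)
    mean_g = np.sum(w * g)
    total_var = np.sum(w * (g - mean_g) ** 2)
    additive_var = np.sum(w * (X @ beta - mean_g) ** 2)
    return additive_var / total_var, beta[1:]


def multiplicative(a, b):
    """Purely interactive map: the trait is the product of the allele counts."""
    return a * b


for p in (0.9, 0.5, 0.1):
    share, effects = additive_share(p, p, multiplicative)
    print(f"p = {p}: additive share {share:.2f}, per-locus effects {np.round(effects, 2)}")
```

With both allele frequencies at 0.9, 0.5, and 0.1, the “additive” share comes out at about 97%, 80%, and 31%, and each locus’s fitted effect equals twice the other locus’s allele frequency. The same interaction therefore looks like different additive SNP effects in populations with different allele frequencies.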

      So… the take-home message there is that I don’t find polygenic scores, or arguments based on extrapolations from additive SNPs, to be very convincing.

      **actual title of a slide in a talk I saw at the Genetics Society of America meeting in 2016, given by William Valdar. I immediately googled it and was VERY angry at him for not writing a paper with that title or containing that phrase, because then I could cite it all over the place and that would be lovely. Note that this hasn’t stopped me.
