
Are GWAS studies of IQ/educational attainment problematic?

Nick Matzke writes:

I wonder if you or your blog-colleagues would be interested in giving a quick blog take on the recent studies that do GWAS (Genome-Wide-Association Studies) on “traits” like IQ, educational attainment, and income?

Matzke begins with some background:

The new method for these studies is to claim that a “polygenic score” can be constructed — these postulate that there are thousands of SNPs (single-nucleotide polymorphisms) that have tiny independent effects on the trait, and that by adding these up, the trait can be predicted to some degree. (The SNPs could themselves be causal, or perhaps in linkage disequilibrium (LD) with causal SNPs.)

I am an evolutionary biologist/phylogeneticist, but I do not work in GWAS. However, my sense of it is that the main way these studies work is that they construct hundreds of thousands of individual linear models, one for each (non-linked) SNP, do something like a Bonferroni correction, and then take all the SNPs beyond the p-value cutoff (something like 5×10⁻⁸, although even within a paper there seem to be multiple cutoffs used) as the interesting ones. Then the individual effects are summed to produce a polygenic score for educational attainment, IQ, etc.
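The pipeline described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Lee et al. method. The data are simulated, every function name and parameter is hypothetical, and the p-values use a normal approximation. Real analyses also adjust for covariates such as ancestry principal components, which this sketch omits. The Bonferroni cutoff is 0.05 divided by the number of tests. With roughly a million independent tests that gives the familiar 5×10⁻⁸. Here, with 200 simulated SNPs, it is 0.05/200.

```python
import math
import random


def simulate(n_people, n_snps, causal, effect, seed=0):
    """Hypothetical toy data: genotypes coded as 0/1/2 copies of an allele;
    the trait is the sum of the causal SNPs' effects plus Gaussian noise."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    genotypes = [[int(rng.random() < p) + int(rng.random() < p) for p in freqs]
                 for _ in range(n_people)]
    trait = [sum(effect * g[j] for j in causal) + rng.gauss(0, 1) for g in genotypes]
    return genotypes, trait


def snp_test(x, y):
    """Regress y on a single SNP; return (slope, two-sided p-value).
    The p-value uses a normal approximation, reasonable for large samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, math.erfc(abs(beta / se) / math.sqrt(2))


def gwas_then_pgs(genotypes, trait, alpha=0.05):
    """One test per SNP, a Bonferroni cutoff, then sum the surviving effects."""
    n_snps = len(genotypes[0])
    cutoff = alpha / n_snps  # with ~1e6 independent tests this is the usual 5e-8
    hits = {}
    for j in range(n_snps):
        beta, p = snp_test([g[j] for g in genotypes], trait)
        if p < cutoff:
            hits[j] = beta
    # Polygenic score: each person's allele counts weighted by the estimated effects.
    scores = [sum(beta * g[j] for j, beta in hits.items()) for g in genotypes]
    return hits, scores


genotypes, trait = simulate(n_people=2000, n_snps=200, causal=[3, 17, 42], effect=0.5)
hits, scores = gwas_then_pgs(genotypes, trait)
print("SNPs passing the cutoff:", sorted(hits))
```

Only three simulated SNPs truly affect the trait, so the cutoff should usually recover those three. The resulting score is a weighted allele count that uses the estimated effects.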

This work today has received a huge publicity boost in the New York Times:

– An editorial by a psychologist arguing that progressives should welcome these new results, with few hints about the limitations and problems with these kinds of studies:

Why Progressives Should Embrace the Genetics of Education
By Kathryn Paige Harden
Dr. Harden is a psychologist who studies how genetic factors shape adolescent development.
July 24, 2018

– A news report by Carl Zimmer on the results, which seems much more responsible and mentions some of the limitations stated in the paper, but not what I think are possible bigger statistical issues:

Years of Education Influenced by Genetic Makeup, Enormous Study Finds
More than a thousand variations in DNA were involved in how long people stayed in school, but the effect of each gene was weak, and the data did not predict educational attainment for individuals.
By Carl Zimmer
July 23, 2018

Here’s the referenced paper:

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals
James J. Lee, Robbee Wedow, […] David Cesarini
Nature Genetics (2018)
Published: 23 July 2018
https://www.nature.com/articles/s41588-018-0147-3

There was another round of this a few months back in various publications, about race and intelligence, that also involved GWAS, and seemed to try to prepare/repair the ground for people to accept the idea of genetic differences in IQ between races.

– How Genetics Is Changing Our Understanding of ‘Race’
By David Reich
March 23, 2018

– DNA is not our destiny; it’s just a very useful tool
Ewan Birney
Yes, our genes affect everything we do, from educational attainment to health, but they are only a contributing factor
https://www.theguardian.com/science/2018/apr/05/dna-sequencing-educational-attainment-height

– Denying Genetics Isn’t Shutting Down Racism, It’s Fueling It
By Andrew Sullivan
http://nymag.com/daily/intelligencer/2018/03/denying-genetics-isnt-shutting-down-racism-its-fueling-it.html

Some pushback:

– Genetic Intelligence Tests Are Next to Worthless
And not just because one said I was below average.
Carl Zimmer, May 29, 2018
https://www.theatlantic.com/science/archive/2018/05/genetic-intelligence-tests-are-next-to-worthless/561392/

He then expresses some concerns:

I [Matzke] am worried that (a) these GWAS statistical methods might be fundamentally flawed, despite their widespread popularity, leading to wrong or largely wrong conclusions both about the genetics of intelligence/education and perhaps many other traits (medical traits etc.), and (b) if flawed methods are contributing to the same bad old narratives about genetic causes of inequality (going back to eugenics, anti-immigration propaganda, genetic racism, etc.), we really need to know that!

Things that make me worry:

* The Lee et al. study reports that their polygenic score, derived from ~1 million individuals, still explains only 11% of the variance in educational attainment, and the median effect for an individual SNP was ~1 week of education.

* The effect sizes go down by 40% when family-level variation is used (e.g. siblings where one has the SNP and one doesn’t).

* The polygenic score’s predictive ability, such as it was for a European-derived population, didn’t work well for an African-American population. Another case of GWAS predictions outside of the training population being problematic is this one on schizophrenia:

https://www.biorxiv.org/content/early/2018/03/23/287136
=========================
Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia
David Curtis
doi: https://doi.org/10.1101/287136

Key quote: “There are striking differences in the schizophrenia PRS between cohorts with different ancestries. The differences between subjects of European and African ancestry are much larger, by a factor of around 10, than the differences between subjects with schizophrenia and controls of European ancestry. . . . Two kinds of explanation suggest themselves. The most benign, from the point of view of the usefulness of the PRS, is that the PRS does indeed indicate genetic susceptibility to schizophrenia and that the contributing alleles are under stronger negative selection in African than non-African environments. The least benign would be to say that the PRS is basically an indicator of African ancestry and that for some reason, perhaps through mechanisms such as social adversity, subjects in the PGC with schizophrenia have a higher African ancestry component than controls. It does not seem that the latter can be a full explanation, because it does seem that the PRS is associated with schizophrenia risk in a homogeneous sample after correction for principal components. On the other hand, it is difficult to accept that the PRS does not index ancestry to at least some extent. . . . Whatever the explanation, these results have important implications for the interpretation of the PRS. . . .”

Much of the GWAS data comes from sources like the UK Biobank. We know that, even if all the samples are from “European” individuals, there will be genetic structure in the data due to ancestral geography and isolation-by-distance. All of these social “traits” (education, income, and IQ, which correlates with both; Ken Richardson [https://scholar.google.co.nz/scholar?q=%22Ken+Richardson%22++IQ&hl=en&as_sdt=0%2C5&as_ylo=1990&as_yhi=2018] argues that IQ may be nothing more than an index of these middle/upper-class attributes) also have geographically structured variation, simply due to the history of economic development (among many other things). It seems to me that all it would take would be some regional historical variation in wealth/education, and some spatial structure in the genetics, to lead to weak correlations between certain alleles and educational attainment. That would apply in the UK, with its deep ancestral genetic structure, or in the USA, where the history of immigration (even just European immigration) has been highly nonrandom, as has the accumulation of wealth and status by ethnic group. I think it is not a stretch to say that there might be a difference in wealth and educational attainment between people in the USA with different European ethnicities: say, classic WASP populations in New England that date back to before the American Revolution, versus southern and eastern European populations that came later. This difference in wealth/average education would not have a genetic cause, but it would definitely have genetic correlations.
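A toy simulation can make the stratification worry concrete. Everything in it is a hypothetical assumption rather than an estimate from any study. Two regions differ both in the frequency of one allele and, for purely non-genetic reasons, in average years of schooling. The allele itself has no effect on schooling at all. A pooled regression across the whole population still finds a positive per-allele “effect.” Siblings share parents and region but inherit alleles at random through Mendelian segregation, so comparing them removes the regional confound, and the within-family estimate hovers near zero.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate_families(n_families, seed=1):
    """Two hypothetical regions differ in allele frequency AND, for non-genetic
    reasons, in schooling. The allele has no causal effect on schooling."""
    rng = random.Random(seed)
    pooled_g, pooled_y, sib_dg, sib_dy = [], [], [], []
    for _ in range(n_families):
        region_a = rng.random() < 0.5
        p = 0.7 if region_a else 0.3          # allele frequency by region
        bonus = 1.0 if region_a else 0.0      # extra years of school, not genetic
        mom = [int(rng.random() < p) for _ in range(2)]
        dad = [int(rng.random() < p) for _ in range(2)]
        sibs = []
        for _ in range(2):
            # Mendelian segregation: one allele drawn at random from each parent.
            g = rng.choice(mom) + rng.choice(dad)
            y = 12 + bonus + rng.gauss(0, 2)  # genotype does not enter the trait
            sibs.append((g, y))
            pooled_g.append(g)
            pooled_y.append(y)
        sib_dg.append(sibs[0][0] - sibs[1][0])
        sib_dy.append(sibs[0][1] - sibs[1][1])
    return slope(pooled_g, pooled_y), slope(sib_dg, sib_dy)


population_beta, within_family_beta = simulate_families(10_000)
print(f"pooled estimate per allele:         {population_beta:+.2f} years")
print(f"within-sibship estimate per allele: {within_family_beta:+.2f} years")
```

This is one reason within-family designs serve as a check on population-level GWAS estimates. The sketch does not say how much of any real attenuation comes from this mechanism.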

Matzke concludes:

I wonder if these GWAS studies for wealth/IQ/education are mostly picking up accidental correlations due to ancestry (perhaps with a moderate proportion of genuinely causal alleles, perhaps mostly ones with a pathological effect). This would be a ready explanation of why polygenic scores can be nonpredictive or pathological outside of the training population, and why the effect sizes decrease dramatically when studying variants within families.

In other words, are GWASes on education and perhaps many other social traits mostly bunk? Are we perhaps going to see another great statistical crisis (like the crises in small-data psychology, or the p-value/replicability crises), but in the “big data” arena of Genome-Wide Association Studies?

And, is Harden’s essay in the New York Times, “Why Progressives Should Embrace the Genetics of Education”, thus wildly misguided, expressing confidence about statistical results that we shouldn’t be confident in, and discouraging skepticism about the Very Long And Bad history of people trying to explain systematic inequalities through genetics, when in fact we should be maintaining or increasing our skepticism in the modern world of genomics and GWAS?

PS: There is an extensive FAQ from the authors of the Nature Genetics study, which makes me feel somewhat better about the population stratification issue:
https://www.thessgac.org/faqs

Also this from Graham Coop about the general topic of between-population differences:
Polygenic scores and tea drinking

This stuff is so technical, and I have not tried to follow the details. But the topic seems important enough that I thought I’d share with all of you. Speaking generally, I can see the appeal of both sides of the argument. On one hand, even noisy data can provide some insight, and it seems reasonable to start by drawing hypotheses and tentative conclusions based on what we have; on the other, when a variable or set of variables explains only a small percentage of the variation in the outcome, you have to be concerned that selection biases will overwhelm any effect of interest. We can draw an analogy here to surveys with 10% response rates: for many purposes this is just fine, as long as we adjust for relevant differences between sample and population, but there will be questions for which the results of any comparisons are driven by biases that are hard to adjust for.

142 Comments

  1. Anoneuoid says:

    What is the goal of these studies? To predict IQ from genetics data? Then simply do it and check the results as future data comes in; there is no reason to generate so many words about this.

  2. Dogen says:

    No, that’s not “the” goal. One goal, among many others, is to make causal statements about genes.

    • Anoneuoid says:

      Assuming that’s a response to my post above… Then that is just a waste of time.

      Every attribute is collectively caused by every gene. The end.

      It’s the modern-day equivalent of arguing about how many angels can dance on the head of a pin, an endless discussion of conclusions derived from a false premise. Watch how this discussion about causality fails to come to any kind of satisfactory resolution (just like every one before and after).

      • Worse, every attribute is collectively caused by every gene, and everything about the environment.

        It’s only worth studying causality if the size of the causal effect of some particular stuff is dramatically more than typical. So, for example, suppose there’s a small group of genes which are involved in causing a resistance to HIV infection… then you study how that works, biochemically. Or if there’s a gene variant that causes a small group of people to have a 4th type of cone cell in their retina (there is), you go and you find those unusual people and you study their vision…

        But getting a big grant to study genetic factors related to education, and throwing genes into a blender, finding some cutoff for “association” and declaring victory with another Nature paper should by all rights get you jailed for defrauding the govt, and the people on the grant review board who are probably your best friends too.

        • Specifically, think about it, what are the actionable consequences of the knowledge that there are say 1000 genes a variant in each of which provides some effect on your brain that on average induces you to stay in school about a week longer (or a week less in the alternative case)? (just assuming it’s true, which is already questionable)

          This is like the statement “there are a lot of cars on the freeway, each one you get within a quarter mile of provides about a 1/10000 chance of causing you to have an accident on your commute per week”

          So, what? Specifically, everyone with a pulse already knows this, and the only way we can use this information is the way we already use this information, which is to have broad requirements for people to take driving classes, learn things about defensive driving, take eye tests, update their license, re-test on the laws every 10 years etc. etc.

          On the other hand, if someone has been caught drunk driving they lose their license. If they have had several speeding tickets we up their insurance costs… Those people are responsible for outsized risks. If you could identify another outsize risk, like say a willingness to wear socks with sandals and listen to the Doobie Brothers, then you could charge those people more for insurance or send them to reeducation camps where they are made to break in a pair of loafers, and force them to listen to James Brown for 4 hours a day…

          This study takes what should be the default position, “lots of things have small effects”, and applies a methodology which is essentially guaranteed to produce evidence that confirms that hypothesis independent of what you’re studying… (filter things based on p-value-based evidence that they have a small effect) and then declares success. It’s like someone handed these guys a coin sorting machine and now they’re going around declaring that a large fraction of the coins you find under people’s sofas are pennies, dimes, or nickels, but that relatively few of them are gold doubloons…

          duh

          • ElJefe says:

            “Specifically, think about it, what are the actionable consequences of the knowledge that there are say 1000 genes a variant in each of which provides some effect on your brain that on average induces you to stay in school about a week longer (or a week less in the alternative case)? (just assuming it’s true, which is already questionable)”

            When people buy stocks they have to make significant financial decisions based on correlational data. How significant a “weak” correlation is will depend on how much money you want to invest.

            • Anoneuoid says:

              Why would you care about “causality” in that case though? Or even the individual correlations?

              You would throw everything you can into the model and it synthesizes the information for you into a buy/sell signal. If your training data generalizes and you didn’t overfit the model (both are easier said than done) then you should be right the expected proportion of the time.

              • jim says:

                “Why would you care about ‘causality’ in that case though?”

                Well obviously people do, because investment houses spend a lot of money on analysts who put out a tremendous effort to find the likely drivers of the future price of a stock, combing through every detail of management plans on capital spending, cost-savings plans, marketing plans, acquisitions and everything else.

              • Anoneuoid says:

                “Why would you care about ‘causality’ in that case though?”

                “Well obviously people do, because investment houses spend a lot of money on analysts who put out a tremendous effort to find the likely drivers of the future price of a stock, combing through every detail of management plans on capital spending, cost-savings plans, marketing plans, acquisitions and everything else.”

                Where is causality (vs degree of correlation) important here?

            • gec says:

              My interpretation of Lakeland’s point (which, of course, he should feel free to correct!) was that, to the extent that these GWA studies are detecting anything, it is only weakly correlated with outcomes of interest and, given that lots of things are weakly correlated with outcomes of interest, the genetic information does not deserve a Privileged Position within the Pantheon of Possible Predictors. You might as well use another weak correlate that is either easier/cheaper to measure or spend time looking for better correlates that are more likely to steer you toward discovering important causal factors.

              In other words, “investment” happens on both ends—making decisions about what to measure AND what to do with that information. Lakeland seems to be arguing that investing in collecting GWA information is not worthwhile.

              • Right, why should we have ever funded GWAS after the first few of them? The whole premise is to find some uncontrollable aggregated effects that have basically no actionable consequences… Like knowing that driving a car exposes you to accident risk, and that driving a safer car design using techniques of defensive driving reduces your risk of serious injury or death… those are useful things. How useful is it to know that in aggregate, minor nicks in the paint, small divots in the tread of your tires, a piece of hanging plastic under the trunk, some minor rust on the exhaust pipe, sun fading on the dashboard, a little tear in the rear seat upholstery, and 100 other minor issues that occur on your car all in aggregate increase your risk of death from 100/100k to 118/100k per year? (made up those numbers, but it’s the kind of thing we’re talking about)

                There’s no way to “fix” all those things, because doing so would be more expensive than buying a new car, and buying a new car would get you more safety… so knowing that *if* you could somehow fix all those things your risk rate would go down by dx amount is not that helpful. It would be HUGELY helpful to know that say putting some anti-shine matte finish on your dashboard is going to reduce glare in a way that cuts your risk of death from 100/100k to 3/100k but you know, that’s not what GWAS is looking for, nor has it ever found any such thing. So… what’s the point?

                IMHO the point is that once guys have put together a pipeline of analysis and databases and things, they need to keep busy doing stuff or someone will notice and they’ll stop getting paid.

                There are useful ways to use biological data to study things, and doing analysis of biological data is useful in those contexts, but GWAS doesn’t seem like one of them.

          • Martha (Smith) says:

            Daniel said, “It’s like someone handed these guys a coin sorting machine and now they’re going around declaring that a large fraction of the coins you find under people’s sofas are pennies, dimes, or nickels, but that relatively few of them are gold doubloons…”

            Good metaphor.

            • Clyde Schechter says:

              I’m actually replying to Dan Lakeland’s comment here, but there was no “Reply to this comment” link under it.

              Even though I agree that GWAS studies are not nearly as useful for understanding causality as some people would like to think they are, there are plenty of reasons to fund more such studies. Their predictive value can be good and actionable. Several GWAS studies have been done on risk of breast cancer. In some women, their polygenic risk scores are sufficiently extreme (high or low) that one has grounds to recommend that they deviate from conventional breast cancer screening recommendations (screen more or less intensively, respectively). Some might also consider recommending chemoprophylaxis with tamoxifen to women with sufficiently high breast cancer risk as estimated from a polygenic risk score. Whether the SNPs used in the scoring are causal or merely incidentally associated with breast cancer risk doesn’t matter for these purposes. But it does emphasize some of the points made in the original post here: risk scores developed in databases of information about women of European ancestry tend to have poor predictive value in women of African or Asian ancestry. And I suspect that if we had more data at more fine grained geographic ancestry levels, we would find similar deviations. Nevertheless, these predictors can have practical uses and we are just beginning to sort out which do and which don’t.

              • Martha (Smith) says:

                Good point.

              • Anoneuoid says:

                Why isn’t ancestry used as a feature in these models?

              • Where there’s a hope of an actionable decision being made on a risk score, I can see funding it… But I wonder, how much better is such a risk score based on GWAS than based on say asking people about their family history and looking at some environmental factors (smoking, diet, weight, etc)? I could imagine for some things they’d be much better, but I could also imagine for some things they’re no better but certainly much fancier seeming.

              • Clyde Schechter says:

                Replying to Anoneuoid and Daniel Lakeland–again there is no “Reply to this comment” link at this level.

                Ancestry isn’t used as a feature in these models at this time because the existing data is overwhelmingly from women of European ancestry. There are a few small GWAS’s for breast cancer risk in women of African ancestry–but the PRS scores from those databases do not separate high and low risk nearly as well. One imagines that with time there will be more data and better ability to apply ancestry-specific polygenic risk scores effectively.

                There aren’t that many Asian women in the United States or Europe, where most of this research is done. Obviously there are plenty of Asian women in China, Japan, Korea, and Taiwan, countries which have the technology to carry out this kind of research. But breast cancer has a pretty low incidence among Asian women living in Asia (they acquire higher risk when they migrate to the West), so this has not been as high a priority in those countries.

                The information provided by polygenic risk scores is certainly not orthogonal to what can be gleaned from a good family history and assessment of some other risk factors (e.g. age at menarche, number of pregnancies, breast radiologic density, obesity, smoking). But it does add some additional discrimination ability. Here’s the thing: it is only for populations whose risk of breast cancer differs from that of the general US population by a factor of about 2 (in either direction) that the tradeoff between harms and benefits of the general recommendations for screening changes enough that a materially different approach makes sense. The non-genetic risk factors, when put together in the best way we know at this time, classify only a tiny handful of women as being at that high or low a relative risk. When we add polygenic risk to the mix, even though it doesn’t add a lot of independent information, it does increase the number of women who can be designated as high or low enough risk to warrant a different approach. It’s not a huge number of additional women, but it’s something.
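The arithmetic of the factor-of-2 threshold can be sketched under a deliberately simple, hypothetical model. Log relative risk from the non-genetic factors is taken to be normally distributed around 0. The polygenic score adds an independent normal component, standing in for the part of its information that the other factors do not already capture. The spreads below are invented for illustration, not taken from any breast cancer model. Independent components add in variance, so even a modest extra spread pushes more of the population into the tails beyond a relative risk of 2 or below 1/2.

```python
import math


def fraction_beyond(sd_log_rr, factor=2.0):
    """Share of people with relative risk above `factor` or below 1/`factor`,
    assuming log relative risk ~ Normal(0, sd_log_rr**2)."""
    z = math.log(factor) / sd_log_rr
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail beyond +/- z


sd_clinical = 0.25   # hypothetical spread of log RR from non-genetic factors
sd_prs_extra = 0.15  # hypothetical extra spread contributed by the polygenic score
# Independent components add in variance, so the combined SD is sqrt(a^2 + b^2).
sd_combined = math.hypot(sd_clinical, sd_prs_extra)

print(f"non-genetic factors only: {fraction_beyond(sd_clinical):.2%}")
print(f"plus polygenic score:     {fraction_beyond(sd_combined):.2%}")
```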

                I don’t know where the future is going to lead on all of this. Over the past 50 years, and particularly in the last 25 years, improvements in treatment for breast cancer have contributed more to breast cancer mortality reduction than have improvements in screening. If treatment becomes sufficiently effective and treatment toxicity is sufficiently reduced, then screening will become less important, so strategies to target screening most efficiently may become less important as well. But it’s anybody’s guess whether this will happen.

                All of this is a long way of saying that this is one example where polygenic risk scoring has some real utility, and improved polygenic risk scoring holds the potential for further value added in health care. Whether that gain will be realized remains to be seen. But I think it would be premature to shut the door on this kind of research at this point.

              • Martha (Smith) says:

                Thanks, Clyde, for this explanation that helps put things in perspective.

        • Joe Nadeau says:

          Indeed, every gene contributes. But they do not contribute equally; hence the rationale for most genetic studies, where the fundamental question is which genes matter, and under which genetic and environmental conditions.

        • While it may seem obvious that “everything causes everything,” it isn’t obvious to many people. Much of biology is dominated by the opposite extreme from the thinking criticized here: acting as if a gene has a single function, and as if studying that “known” function in excruciating detail tells us something useful. (I’ve lost count of how many microbiome studies, for example, find an abundance of transcripts of gene X in some sequencing study, which some poorly annotated list says is involved in pathway Y, from which we conclude that Y explains the phenotype of the studied group.) Even when multiple genes are involved, there’s a strange belief that the number must be small, and there are “higher-order” interactions between them that must be invoked. The notion that *lots* of small effects can add up, while it should be obvious, apparently isn’t. (Maybe it’s too painful to acknowledge?) Even if just as an educational tool, I think these studies are valuable.

          Separate from this, there’s the criticism written in many comments that knowing how small effects add up isn’t interesting or useful. This I really don’t understand — don’t we want to know things? But I should save that for another comment…

          • Garnett says:

            “…knowing how small effects add up isn’t interesting or useful.”

            Following Daniel and Anon’s logic, ALL effects are the sum of many effects.

            • Anoneuoid says:

              I never liked the phrase “sum of the evidence”, or the use of “sum” or “add up” here. It sounds too much like a linear model of the form y = ax + by + cz + ….

              Instead I say every event is collectively caused by everything that came before. The goal should be to synthesize the various evidence into a model.

          • Martha (Smith) says:

            “Even when multiple genes are involved, there’s a strange belief that the number must be small, and there are “higher-order” interactions between them that must be invoked. The notion that *lots* of small effects can add up, while it should be obvious, apparently isn’t. (Maybe it’s too painful to acknowledge?)”

            Well put.

      • Garnett says:

        Daniel and Anon:
        Despite your best efforts, a person trained to believe that “cause” is discovered through suitable data collection and analysis will never agree that “everything causes everything”. A better argument is needed to convince that mindset, and I think it’s worth the effort.

        One approach that I think *might* work (paraphrased from Daniel) is that accepting “everything causes everything” may completely explain the notion of randomness; i.e., we don’t need an aleatory definition of randomness, because an epistemological definition is entirely sufficient.

        • Anoneuoid says:

          I think it is such a fundamental principle, convincing someone would be like telling a monk not to worry about God’s role in whatever phenomenon he is concerned with. It just isn’t going to happen like that.

  3. Adede says:

    I find problematic the intermingling between intelligence and educational attainment. I’ve observed only a weak correspondence between the two.

  4. jim says:

    “On one hand…and it seems reasonable to start by drawing hypotheses and tentative conclusions based on what we have;”

    Absolutely.

    The real question at this point isn’t so much about the science. The science seems highly speculative, but that’s fine; scientists can work that out. The real issue is the way it’s played – or more likely overplayed – in the press.

    • But not just in the press, also in the grant review boards and funding decisions. Why are we funding this? It is more or less the equivalent of dowsing rods.

      • jim says:

        We try things because the outcome isn’t always obvious at first. If we fail at first, we keep trying, because neither the path to success nor what counts as success is necessarily obvious. That’s basic research.

        It’s critical to do wildly speculative things – even things we don’t think will work – as long as we understand the relative value of the work and don’t overplay the results. At some point obviously we stop funding flat earth research – that’s a judgement that’s up to the funding agencies and boards.

        • But GWAS studies are not new, nor have they ever really discovered that “x gene is the key” nor are they even trying to do that… They’re trying to find out what we know already… which is that lots of things matter a little bit. It’s as if people were throwing 12 dice and trying to figure out “why” they average so close to the same thing each time… because that’s the overwhelmingly most likely thing for them to do.
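The dice arithmetic can be checked directly with a short simulation. The number of throws and the seed are arbitrary choices. One fair die has variance 35/12, so the average of 12 independent dice has expected value 3.5 and standard deviation √(35/12)/√12 ≈ 0.49. Many small independent contributions give a stable aggregate without any single die being special.

```python
import math
import random


def spread_of_average(n_dice=12, n_throws=20_000, seed=0):
    """Throw n_dice fair dice n_throws times; return mean and SD of the per-throw average."""
    rng = random.Random(seed)
    averages = [sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
                for _ in range(n_throws)]
    mean = sum(averages) / n_throws
    sd = math.sqrt(sum((a - mean) ** 2 for a in averages) / (n_throws - 1))
    return mean, sd


# One fair die has variance 35/12; averaging 12 independent dice divides it by 12.
theory_sd = math.sqrt(35 / 12 / 12)
mean, sd = spread_of_average()
print(f"simulated mean {mean:.3f}, simulated SD {sd:.3f}, theoretical SD {theory_sd:.3f}")
```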

  5. Daniel Weissman says:

    This post mixes together two somewhat distinct things, genome-wide association studies (GWAS) and polygenic scores (PGS, sometimes “polygenic risk scores”, PRS). At the extreme GWAS end, you’re looking for special genes that have an unusually strong influence on a trait. At the extreme PGS end, you’re looking to predict an individual’s trait value from their genome. So in the former, you want to apply a Bonferroni correction and zoom in on outlier genes, while in the latter you’ll use many more genes and not worry so much about whether you’ve accurately estimated individual effect sizes.

    As Daniel Lakeland points out in these comments, why you’d want to do either one of these things is generally a very good question. There are good answers in a few cases, but in most it seems to be that this is a way to get funding and glam publications for pretty sterile research.

  6. somebody says:

    I feel like I have to be missing something. I really don’t understand how these GWAS approaches are any different from the old social science/epidemiology regression monkey-ing. It feels like building wide predictive models on a very large set of variables and naively assigning a causal interpretation to the coefficients. The approach seems fundamentally unable to distinguish between correlation and causation. Worse, causation doesn’t even really seem well defined; causal according to what counterfactual? Some kind of hypothetical gene editing? Assuming there are identifiable super-smart engineer genes, giving someone the super-smart engineer genes by some genetic therapy that doesn’t exist yet in a country with no universities or manufacturing infrastructure won’t make them an engineer. So even assuming well-identified unbiased estimates of the causal effects of the linear sum of some subset of genes, those causal effects are only meaningful with respect to the population from which the subjects are drawn, no? But I never read a description of what population researchers are talking about except implicitly where the data source is described. And I’ve never seen even an attempt at describing an actual causal pathway here.

    To summarize, I haven’t been able to identify a reason why any of GWAS isn’t vulnerable to the zip code argument; zip code is highly predictable from your genome, and zip codes are highly heritable. The correlations are real and even meaningful, but it’s obvious that doesn’t mean genomes cause zip codes. I don’t see why any of the GWAS regressions are any different, despite the fact that they’re claiming causality. But I’ve never read a practitioner in the field express any of this skepticism, only pure statisticians. I’m disinclined to assume they just don’t understand or care about these issues. It’s a big field with a lot of people, so there has to be something more to it, right? The emperor has clothes?

    • somebody says:

      Addendum: I’m not denying that genes have an effect on everything, but I’m not really sure how researchers claim to estimate the causal effect of some particular subset of genes, or what it means when they say “these genes have been linked to higher rates of schizophrenia”. It seems like if we had run these models in the 1850s, they would have identified, in the same way, genes for African ancestry as linked to lower life expectancy. I’m sure that some genes do cause lower life expectancy too, but I don’t see how we can distinguish between the two with these methods.

      • Anoneuoid says:

        The thing with any type of code is that there are always multiple ways to accomplish the same thing. Let’s say I need my code to tell me the proportion of “True” in a vector, which doesn’t seem that different from what a biological system may have to do (signal integration).

        Starting with the input x = c(T, F, T, T), the simplest way would just be mean(x). But then you tell me I can’t use the “mean” function (it was mutated), so I do sum(x)/length(x); then you tell me I can’t use the “sum” function, so I count the number of trues in a “for” loop; then you tell me I can’t use the “length” function, so I need to use a “while” loop, etc.

        Some methods are more memory-efficient, others are faster, others are easier for a novice to the language to understand, etc. So the optimal solution depends on your constraints and use case, but these differences may be negligible until the code needs to be used under some stressful situation.
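The comment’s snippets are R; here is a Python sketch (the language and the function names are assumptions, not from the comment) of the same idea: several routes to the proportion of True values in x, each of which keeps working after a different built-in is taken away.

```python
# Several ways to compute the proportion of True values in a list.
x = [True, False, True, True]

def prop_sum_len(v):
    # Like R's sum(x) / length(x): True counts as 1, False as 0.
    return sum(v) / len(v)

def prop_count(v):
    # A different route to the same answer: count the Trues directly.
    return v.count(True) / len(v)

def prop_for_loop(v):
    # Without sum(): tally Trues and items one at a time.
    trues = n = 0
    for item in v:
        n += 1
        if item:
            trues += 1
    return trues / n

def prop_while_loop(v):
    # Without len() or sum(): advance an index until the list runs out.
    trues = i = 0
    while True:
        try:
            item = v[i]
        except IndexError:
            break
        trues += 1 if item else 0
        i += 1
    return trues / i

for f in (prop_sum_len, prop_count, prop_for_loop, prop_while_loop):
    print(f.__name__, f(x))  # each prints 0.75
```

All four return the same value for this input; they differ only in which primitives they depend on, which is the analogy the comment draws.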

        A real-life example could be the inactivation of L-gulonolactone oxidase, which leaves the organism dependent on ascorbate in the diet. As long as the organism gets enough of the vitamin in its diet, the gene is irrelevant. But when it does not, the organism gets scurvy.

        However, if the organism also starts expressing GLUT-1 transporters on its red blood cells (RBCs), then it can reduce dehydroascorbate (oxidized vitamin C) back to ascorbate with glutathione. Basically, by recycling the same vitamin C molecules, it can go for months without any vitamin C in the diet. This will work as long as it is getting enough glucose to maintain sufficient reduced glutathione in the RBCs.

        Then you can throw in more factors. Say you get a malaria parasite living in the RBCs that oxidizes the glutathione faster than it can be regenerated. Then too much vitamin C can be undesirable. It will react with the glutathione, which regenerates the ascorbate, but will also further oxidize the glutathione until it falls below minimum levels and trigger death of the RBCs. That will kill the organism far faster than scurvy, so then you can get a whole other set of thousands of genes involved in reducing the appetite for fruit when sick with malaria.

        tl;dr:
        The entire “genes cause attributes” way of thinking about what is going on is fatally flawed. There are always multiple ways of accomplishing a very similar thing, each with its own advantages/disadvantages that often differ depending on the circumstance.

        • somebody says:

          > The entire “genes cause attributes” way of thinking about what is going on is fatally flawed. There are always multiple ways of accomplishing a very similar thing, each with its own advantages/disadvantages that often differ depending on the circumstance.

          Right, but I feel like the GWAS literature frequently and explicitly claims that the “effect” of genes is the sum of small, independent genetic effects. The microbiology and biophysics literature, on the other hand, generally reads like your example, and these two descriptions feel fundamentally incompatible.

          • Anoneuoid says:

            I’m more saying the “effect” of a gene depends on what other genes are around along with the circumstances. It makes no sense to look at the effect of a gene in isolation.

            • Martha (Smith) says:

              It makes sense to me that some effect might come from (for example) the presence of particular variants in two genes, where neither variant alone would make a difference. So it’s neither “this gene” nor “an additive effect”.
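A toy sketch of this point, with made-up numbers: a trait that shifts only when variants at two genes are present together. The “effect of variant A” is then 0 in one genetic background and 5 in the other, so the average per-allele effect an additive model would report depends on how common variant B happens to be in the sample. The genes, trait values, and function names are all hypothetical.

```python
def trait(a, b):
    # a, b: 1 if the person carries the variant at hypothetical gene A / gene B.
    # The trait moves only when BOTH variants are present (pure interaction).
    return 10.0 + (5.0 if (a and b) else 0.0)

# Effect of carrying variant A, computed separately in each B background.
for b in (0, 1):
    print(f"B={b}: effect of A = {trait(1, b) - trait(0, b)}")

def marginal_effect_of_a(freq_b):
    # Population-average effect of A when a fraction freq_b of people carry B.
    effect_with_b = trait(1, 1) - trait(0, 1)
    effect_without_b = trait(1, 0) - trait(0, 0)
    return freq_b * effect_with_b + (1 - freq_b) * effect_without_b

print(marginal_effect_of_a(0.1))  # small "effect" where B is rare
print(marginal_effect_of_a(0.9))  # large "effect" where B is common
```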

            • somebody says:

              Right, you need a set of circumstances and a counterfactual, which makes causal inference about genetics possible only in very specific circumstances, and probably near useless. But then GWAS-style analyses make claims like

              “However, the newly identified SNPs also lead to several new findings. For example, they strongly implicate genes involved in almost all aspects of neuron-to-neuron communication.”

              “We found that the median effect size of the lead SNPs corresponds to 1.7 weeks of schooling per allele; at the 5th and 95th percentiles, 1.1 and 2.6 weeks, respectively.”

              I’ve never heard the counterfactual or circumstances defined. My thinking is that the literature must have some standard meaning that I’m missing, or else researchers are just using causal language to describe a fundamentally predictive model without thinking about it (or, worse, because it makes the research sound more exciting).

    • Daniel Weissman says:

      If you have a bunch of siblings you can do a pretty good job of testing causality.

    • Jim Dannemiller says:

      There is a case to be made that GWAS results allow causal inference in a way that correlational studies do not, because the phenotype cannot, strictly speaking, “cause” the genotype. It would be odd to interpret a correlation between genotypic variation and phenotypic variation as implying that the variation in the phenotype caused the variation in the genotype. Whether someone has an A base at a particular place in the genome or a C base cannot be “caused” by their phenotype.

      • Martha (Smith) says:

        You seem to be forgetting that correlation doesn’t necessarily mean that one variable “causes” the other (e.g., there might be a common factor influencing both variables, or the correlation might even be spurious; see, e.g., https://www.tylervigen.com/spurious-correlations).

        • Jim Dannemiller says:

          That’s certainly true, but in this case, it’s hard to imagine what that third variable would be. The alleles that are inherited to comprise the genome 1) are determined well before the phenotype even exists, and 2) as far as we can tell, are a random sample (with constraints imposed by LD) of the two possible alleles at each polymorphic locus that could be inherited from each parent. I was just trying to make the case that the causal arrow, which is clearly ambiguous for many reasons with most correlations, is not as ambiguous in the case of a genotype/phenotype correlation because of the nature of the processes involved and temporal ordering issues. I am speaking strictly about the base sequence that one inherits; epigenetics is another matter.
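The “random sample” of parental alleles is Mendel’s first law. A short sketch, assuming a single biallelic locus and no mutation, that enumerates the four equally likely parental transmissions and returns exact offspring genotype probabilities (the function name is illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_distribution(mother, father):
    """Exact offspring genotype probabilities at one biallelic locus.

    mother, father: two-allele tuples such as ("A", "C").
    Each parent passes on one of its two alleles with probability 1/2,
    independently of the other parent (Mendel's first law); no mutation.
    """
    # The four equally likely (maternal allele, paternal allele) combinations;
    # sorting makes ("C", "A") and ("A", "C") the same unordered genotype.
    counts = Counter(tuple(sorted(pair)) for pair in product(mother, father))
    total = sum(counts.values())  # always 4
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

print(offspring_distribution(("A", "C"), ("A", "C")))
# {('A', 'A'): Fraction(1, 4), ('A', 'C'): Fraction(1, 2), ('C', 'C'): Fraction(1, 4)}
```

For two heterozygous parents, each child is an independent draw from this distribution, which is why full siblings can differ at such a locus.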

          • As someone else pointed out, zip-code is highly heritable. The truth is that both zip-code causes DNA and DNA causes zip-code…

            1) zip-code causes DNA: living in a particular location provides an environment in which a couple feels comfortable having children. If you causally moved them to some much more expensive zip-code, or forced them to live in the middle of nowhere, they would undoubtedly consider children differently.

            2) DNA causes zip-code: DNA affects people’s lives in many ways. If you live in a high-end zip code, it’s in part because you came from a family that had some money, you had a decent education, you lived to be 35 years old… Any number of mutagens given to you at age 2 could have given you a cancer that killed you well before 35…

            so… both.

            • Jim Dannemiller says:

              Zip code might “cause” them to have children, but it doesn’t cause the particular genotypes those children have. If it did, then every child born to that couple in that zip code would have the same genotypes. GWAS studies aim to relate genetic VARIATION (SNPs) to phenotypic variation. A constant zip code cannot cause genetic variation in a set of siblings, so, saying that zip code causes DNA is true in the sense that you described, but that’s not the level at which the DNA matters in a GWAS.

              • > then every child born to that couple in that zip code would have the same genotypes

                No, every child born to a couple in this zip code would have a different genotype than they would have if they were born to the same couple in a different zip code, which is trivially true. I mean, once you’re conceived you can’t change your genotype substantially by having your pregnant mother move zip codes, but since it takes time to move zip codes, being conceived in one zip code results in a different conception than being conceived in another. So, in the counterfactual where your mother was forced into another zip code, you would have had a different genotype.

          • somebody says:

            > That’s certainly true, but in this case, it’s hard to imagine what that third variable would be.

            Maybe I’m pretty imaginative, but that doesn’t seem hard to me.

            Height in China and Mexico is substantially lower than in the US. It’s easy to predict one’s height using a PGS-style model on data from the three countries, and such a model could probably identify that the “effect” on height of genes that predict Chineseness/Mexicanness is negative. But children of immigrant families from China and Mexico are closer in height to white Americans than to their parents.

            It’s not as easy to do this in general, but that doesn’t mean confounders don’t exist, just that it’s harder to wrap your mind around the problem.
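A minimal simulation of this kind of confounding (population stratification), and of the sibling comparison Daniel Weissman suggests above. It assumes two hypothetical ancestry groups that differ both in allele frequency at one SNP and in family environment, while the SNP has no causal effect at all. All parameter values and names are made up for illustration.

```python
import random

random.seed(1)

# Hypothetical parameters: two ancestry groups that differ in allele frequency
# at one SNP and in family environment. The SNP itself has NO causal effect.
ALLELE_FREQ = {0: 0.2, 1: 0.8}
ENV_EFFECT = {0: 0.0, 1: 2.0}
N_FAMILIES = 5000

def make_parent(p):
    # Two alleles, each the counted allele with probability p.
    return (int(random.random() < p), int(random.random() < p))

def make_child(mother, father):
    # Mendelian transmission: one randomly chosen allele from each parent.
    return random.choice(mother) + random.choice(father)

def ols_slope(x, y):
    # Least-squares slope of y on x (with an intercept).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

geno, trait, diff_geno, diff_trait = [], [], [], []
for _family in range(N_FAMILIES):
    group = random.randint(0, 1)
    mom = make_parent(ALLELE_FREQ[group])
    dad = make_parent(ALLELE_FREQ[group])
    sibs = []
    for _sib in range(2):
        g = make_child(mom, dad)
        y = ENV_EFFECT[group] + random.gauss(0.0, 1.0)  # genotype not used
        geno.append(g)
        trait.append(y)
        sibs.append((g, y))
    # Sibling differences cancel everything the family shares.
    diff_geno.append(sibs[0][0] - sibs[1][0])
    diff_trait.append(sibs[0][1] - sibs[1][1])

population_slope = ols_slope(geno, trait)
within_family_slope = ols_slope(diff_geno, diff_trait)
print(f"pooled slope:        {population_slope:.2f}")     # clearly positive
print(f"within-family slope: {within_family_slope:.2f}")  # close to zero
```

With these inputs the pooled regression should report a per-allele “effect” near 0.9 even though the genotype never enters the trait, while the sibling-difference slope hovers around zero: siblings share ancestry and family environment, and segregation randomizes which alleles each one inherits.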

      • Anoneuoid says:

        I think you are drastically underestimating the mutation rate in human cells. You probably have no two cells with the same genome, and many that have been selected for your specific environment.

        E.g., lung cells in smokers vs. nonsmokers will have been selected for the ability to survive excess oxidative stress. I’m not talking about cancer here, just normal, beneficial evolution of the tissue in response to its environment.

        A person does not have a single genome; some studies even report that substantial numbers of (normally functioning) cells are actually aneuploid. They take an average of many cells and report it as your “genotype”, which I think has led to a lot of confused thought.

        • Jim Dannemiller says:

          If that were true, then genetic paternity tests would be worthless. My “average” genotype at all of the markers would look nothing like my father’s “average” genotype if genomes were nearly as chimeric as you are claiming. Instead, my alleles at those markers on which my father is heterozygous exactly match half of my father’s two alleles across those markers, and they match all of my father’s alleles at the markers on which my father is homozygous, the exception being very rare germline de novo mutations.
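A minimal sketch of the exclusion logic described here: under Mendelian inheritance, and ignoring rare de novo mutations, a child should carry one of the father’s two alleles at every marker. Real paternity testing weighs the evidence statistically (a paternity index) and tolerates occasional mutations; this sketch only lists exclusions, and the function name and marker values are hypothetical.

```python
def excluding_loci(child, alleged_father):
    """Indices of markers where the child shares no allele with the alleged father.

    child, alleged_father: one (allele, allele) tuple per marker, same marker order.
    A true biological father should yield an empty list, barring mutations.
    """
    return [
        i
        for i, (c, f) in enumerate(zip(child, alleged_father))
        if not set(c) & set(f)  # no shared allele at this marker
    ]

# Hypothetical repeat-length genotypes at three markers.
father = [(12, 14), (9, 9), (7, 10)]
child = [(14, 15), (9, 11), (8, 8)]
print(excluding_loci(child, father))  # [2]
```

At the homozygous marker (9, 9) the child must carry a 9; at the heterozygous markers the child must carry one of the two alleles, matching the pattern described in the comment.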

          • Anoneuoid says:

            Look up somatic mosaicism, e.g.:

            Genomic changes are likely to be derived from disturbances in genome maintenance and cell cycle regulation pathways along with environmental influences (genetic-environmental interactions) [7, 27, 28, 32-47].

            https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850503/

            Also, the regions used in paternity tests were specifically chosen due to lack of selective pressure; they are not representative of DNA that would be relevant to the cell’s survival/behavior/etc.

          • Anoneuoid says:

            Another good one:

            To obtain a more comprehensive picture of the mutational landscape of normal cells, we performed whole genome sequencing to 147× depth on a biopsy containing this clone. This identified 73,904 base substitutions and 2,248 small indels, with a mutation signature largely dominated by UV light exposure (Fig. S12B,C). About 14,000 of these were clonal (~4.6/Mb), presumably hitchhiking with the FGFR3 and TP53 mutations, but the rest were subclonal, often in <20% cells (Fig. S12A). Integrating the allele frequencies, we estimate an average of 21,102 mutations per genome per cell (~7/Mb) in this sample. The mutation rate was found to vary along the genome, with higher rates in lowly expressed genes and in repressed chromatin (Fig. S13), as observed in cancer (47) and human evolution

            https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4471149/
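As a quick consistency check on the quoted figures (this arithmetic is not from the paper), dividing each mutation count by its per-megabase rate recovers roughly 3,000 Mb, about the size of the human genome, so the per-cell and clonal figures describe the same genome-wide scale:

```python
# Figures quoted in the excerpt above.
per_cell_total, per_cell_rate = 21_102, 7.0  # mutations per cell; per Mb
clonal_total, clonal_rate = 14_000, 4.6      # clonal mutations; per Mb

# count / (count per Mb) = implied length of sequence, in Mb.
implied_mb_all = per_cell_total / per_cell_rate
implied_mb_clonal = clonal_total / clonal_rate
print(round(implied_mb_all), round(implied_mb_clonal))  # 3015 3043
```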

            • Jim Dannemiller says:

              I’m aware of those studies and others that show that it is mostly retrotransposition of mobile elements that causes neuronal mosaicism, but my reading of that literature suggests that the jury is still out on the physiological impact of those events. Whether insertion of a retrotransposon has an impact depends to a large extent on where it gets inserted (exonic vs. intergenic), and in turn on whether it affects expression. So the presence of heterogeneous neuronal indels by itself does not necessarily imply functional/neurophysiological effects large enough or pervasive enough to manifest in phenotypic differences.

              This is getting pretty esoteric and far afield of the original question. I just thought that it was important to ground some of the claims about GWAS studies in some biology/genetics. Thanks for the references.

              • Anoneuoid says:

                Why wouldn’t selective pressure on tissue stem cells be taken advantage of to make the organism more tuned to the environment? We should assume as a default that all populations of cells are evolving at all times.

                I don’t find this esoteric at all, it is a basic fact of life.

  7. Max says:

    It seems to me that a lot depends on context. How well supported are the various alternative wide-range explanations for observed phenomena? Are the alternative theories more noise-resistant and causally sound than whatever GWAS etc. outputs? Usually the political pressure is to “do something right away”, and school administrations have no time to wait for conclusive evidence.

  8. Joe Nadeau says:

    From a genetics perspective, these studies have several serious (fatal) problems that should be, but apparently are not, obvious to practitioners:

    Trait measurement. How does one measure IQ reliably? A messy trait at best. And an ethically questionable trait that has often done more harm than good.

    Association versus causality. GWAS is great for association (hence the A in GWAS) across often heterogeneous populations, but poor for causality, which is an attribute of individuals. Sometimes population measures are relevant to individuals, but often they are not, and the consequences of applying population information to individuals are not always pretty, or ethical. Taking heterogeneity into account is hard because one needs to understand its nature and origins, and what taking it into account would even involve. (The growing evidence that ancestral genetics and environment affect current phenotypes is an example of the complexity.)

    Context-dependence. Most traits depend on an often inextricable combination of genes and environment. Context is often everything.

    An additive model in a non-additive world. Polygenic risk scores are based on an additive model. But considerable evidence shows that non-additivity is deep and pervasive.

    The considerable evidence for context-dependence and non-additivity is usually ignored because it is inconvenient – context-dependent, non-additive models are thought to be hard, at least in the genetics community.

    Relevance. Even if all of these issues could be resolved, what have we learned, and how is any result useful? Geneticists have a responsibility to do good genetics, good research. Traits like IQ, sexual preference partners, social status need special justification if only to avoid the obvious social harm. In what world can this information be used in a beneficial way? This should be the first question that any geneticist asks before undertaking GWAS and PRS analyses.

    • Steve says:

      “Traits like IQ, sexual preference partners, social status need special justification if only to avoid the obvious social harm. In what world can this information be used in a beneficial way?” I think there is one way to answer this question. The GWAS studies have actually cast doubt on the “genes cause X” stories by demonstrating that, under an additive model, traits like IQ and mental illness are the result of the interplay of dozens of genes, each contributing tiny effects. So having these studies done is helpful in demolishing the “genes cause X” stories. The prior twin studies were (and still are) used to argue that “genes cause approximately 50% of trait X” (which, curiously, is true for almost any trait X you pick). So, in that sense, these studies have been helpful. Behavioral geneticists were supposed to find a “gene for X”, and couldn’t. Then they were supposed to find “genome-wide associations for X”, and found tiny effects from large numbers of genes. So maybe this evidence will push people to stop thinking that additive models can provide “genes cause X” explanations.

    • Martha (Smith) says:

      +1 to Joe. Steve also raises some good points.

  9. Terry says:

    I can see being very skeptical of these studies. Indeed it would be surprising if this field did not suffer from over-reaching and all the other problems other research areas suffer from.

    But, I can’t see ignoring this area. Brand new technology is giving us brand new data on something very real and important. How can we not study it?

    • Andrew says:

      Terry:

      Good point. To put it another way, the data are available so people will study this stuff, hence it’s good to understand what can be learned from such analyses.

      • Joe Nadeau says:

        Agreed, in general. But too often work on IQ and social traits has reinforced bias, prejudice and stereotypes. Presumably we have an obligation to chase our curiosities while doing no harm.

        • Andrew says:

          Joe:

          I’m saying it somewhat differently. Whether or not we are personally interested in studying IQ and genes and social traits, there are people who will be interested in studying these topics, whether out of careerism, intellectual curiosity, a social or political agenda, or some other reasons. Given that the data are available, we know that people will be studying these things, so we should understand what can be learned from such analyses.

          • Joe Nadeau says:

            Andrew, I’m pretty sure I understood your original and second points. I guess my concern is what is meant by ‘we should understand what can be learned from such analyses’. It is human nature to ask questions that seem questionable – we can of course pursue many questions, but we ought to have the good sense and conscience to let some questions lie. If you mean ‘learning from mistakes’, then I agree with you. Otherwise I disagree, knowing for example a little about the eugenics movement in genetics, and the persisting harm that was done. This is not a ‘politically correct’ argument, but rather an argument to avoid harm, real harm.

            • Andrew says:

              Joe:

              OK, then I guess I don’t understand your point . . . So let me try again.

              Suppose that researcher X is going to do an analysis Y of genes and social data that, if crudely interpreted, could lead to racist implications that could be used to justify policies that could hurt people. Then it seems valuable to understand what analysis Y really implies scientifically, and to point out the gaps in, or hidden assumptions of, researcher X’s logic that lead to these problematic policy recommendations.

              That is, suppose your goal is to avoid harm. You can’t stop X from doing that analysis. But you can understand the science to be able to reveal gaps and biases in X’s scientific and policy conclusions. If you just step back and refuse to pursue the questions, that won’t solve the problem, because X is still going to pursue them.

              • Steve says:

                Andrew:

                Given no budget constraints, I agree. But given budget constraints, we shouldn’t waste money on studies that are unlikely to result in real discoveries but likely to be used to justify morally dubious ideologies. That said, as I posted above, I think the behavioral GWAS studies have actually undermined the “Genes cause X” stories that have been used politically to defund interventions to alleviate poverty, inequality, etc. But, at some point, why should we keep spending money on this?

              • Vinayak Bhardwaj says:

                This is a tired old argument.

                We can keep on doing this until we find out every flaw in every racist argument on earth.

                This argument which tries to link polygenic traits to social effects is an old one, trotted out by every supposed champion of free speech and fearless inquiry (almost all of whom are, coincidentally, white and male).

                It is based on logic that is truly useless. The amount of genetic variation across humans is painfully small. The Fulani tribe in Nigeria has more in common genetically with Swedes in Sweden than with other Nigerians. So what possible value can be derived from these GWAS analyses? In any case, epigenetic effects trigger gene expression. And epigenetic effects are boundless in their scope. So ultimately this line of inquiry is wrong-headed, with dangerous consequences, with really zero scientific or even logical basis. Free speech might be the last bastion of the otherwise epistemically useless.

              • RogerSweeny says:

                Vinayak, I went outside to test your statement that “The amount of genetic variation across humans is painfully small.” I found an amazing difference in the way people look. In fact, I could distinguish almost all of the thousands of people I saw. But I suppose genetic variation only shows up in looks.

              • Ben says:

                > If you just step back and refuse to pursue the questions, that won’t solve the problem, because X is still going to pursue them.

                If we’re going down this path, forget genetics — let’s hit up the ESP guys, run some cold fusion experiments, and jump off a few cliffs.

                I liked this review of the first article: http://unwashedgenes.blogspot.com/2018/07/why-this-is-nicely-wrapped-eugenics.html

                From: https://statmodeling.stat.columbia.edu/2020/01/24/the-latest-perry-preschool-analysis-noisy-data-noisy-methods-flexible-summarizing-big-claims/

                > Remember Gresham’s Law of bad science? Every minute you spend staring at some bad paper, trying to figure out reasons why what they did is actually correct, is a minute you didn’t spend looking at something more serious.

              • Curious says:

                Roger Sweeney,

                You are making a common mistake by conflating highly specific genetic variation with our crude psychometric measures. Simply because variation may exist in reality does not mean our crude attempts to measure it are accurate. I defy anyone to demonstrate a consistent difference in performance between people who score 120 on an IQ test and those who score 121. Any research claiming such would be laughable given what we know about the types of error that occur in the measurement of this construct, which is logically “whatever the test measures”. It is a logical absurdity to believe that such a measure can adequately capture genetic variation.

              • Andrew says:

                Ben:

                You’re missing the point. The correct analogy is not that I’m saying, “let’s hit up the ESP guys . . .” The correct analogy is, “let’s understand why reputable scientists think the Bem paper showed evidence for ESP, and let’s understand how random noise could look like evidence for ESP.” Psychologists will be studying ESP, embodied cognition, ovulation and clothing, himmicanes, etc., so, yes, it’s worth spending some effort understanding the (mistaken) statistical appeal of, and errors in, such studies.

              • RogerSweeny says:

                Curious, I completely agree that there is a big difference between “genes cause significant differences in human beings” and “we know what genes cause how much difference.” I was simply criticizing Vinayak’s idea that since “genetic variation across humans is painfully small”, that small genetic variation could not have any significant effects.

                “I refute it thus.”

              • Anoneuoid says:

                > The amount of genetic variation across humans is painfully small. The Fulani tribe in Nigeria has more in common genetically with Swedes in Sweden than with other Nigerians.

                How do you know this when even the human reference genome is only 90% complete? https://reasons.org/explore/blogs/theorems-theology/read/theorems-theology/2019/08/15/a-post-genomic-era-ludicrous-with-an-incomplete-human-genome

              • Ben says:

                Andrew:

                You say:

                > it’s worth spending some effort understanding the (mistaken) statistical appeal of, and errors in, such studies.

                But when Joe said:

                > If you mean ‘learning from mistakes’, then I agree with you.

                You said you didn’t understand. And now we’re in a loop.

            • jim says:

              Vinayak: genetics isn’t about race unless you make it that way.

              • Stephen Richards says:

                For Roger Sweeny and everyone else:
                What is meant by the comment “The amount of genetic variation across humans is painfully small” is that it is in comparison to other species.
                In general, if you compare DNA haplotype sequences between people, there will be about 1 difference in every 1,000 base pairs (A, C, G, or T). This is a relatively low number.
                In other species, such as fruit flies, the number is about 1 in 200 bp, or 5× more, and it can go as high as 2% depending on the species.
                For humans this means that, of the 3 billion base pairs in each of our haplotypes, there are only ~3 million single nucleotide variants (called SNVs in the trade). (In fruit flies it is about 0.5 million SNVs, because the genome is smaller, at only 180 million base pairs.)
                When you look at a lot of people with different changes, you end up with a massive multiple testing problem that no one has really solved, so it will need a different approach.
                If you look at a simple phenotype and a small location in the genome (for example, gene expression) and just look for SNVs that affect gene expression in a small region of the genome around that gene, then it works fine, as there are no power problems.
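The arithmetic above, plus the multiple-testing step it alludes to, in a few lines. The figure of about one million effectively independent tests is an assumption here (it is the conventional rationale for the 5×10⁻⁸ genome-wide threshold mentioned in the post), and the variable names are illustrative.

```python
# Human figures from the comment above.
HUMAN_HAPLOID_BP = 3_000_000_000  # ~3 billion bp per haplotype
BP_PER_DIFFERENCE = 1_000         # ~1 difference per 1,000 bp between two haplotypes

snvs_between_two_haplotypes = HUMAN_HAPLOID_BP // BP_PER_DIFFERENCE
print(f"{snvs_between_two_haplotypes:,}")  # 3,000,000

# Bonferroni correction: split the family-wise error rate across all tests.
FAMILY_WISE_ALPHA = 0.05
EFFECTIVE_INDEPENDENT_TESTS = 1_000_000  # assumption: conventional figure for common variants
per_test_threshold = FAMILY_WISE_ALPHA / EFFECTIVE_INDEPENDENT_TESTS
print(f"{per_test_threshold:.0e}")  # 5e-08
```

The stricter the per-test threshold, the larger the sample needed to detect each small effect, which is the power problem the comment contrasts with the small, targeted expression analyses.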

                Overall I think these large experiments have been valuable, but more by driving down the cost of data generation and by identifying Mendelian diseases, which will likely be important for 5-10% of the population and may, for some diseases, be treatable in the future with CRISPR. They will also be useful for identifying genes of large effect that can be targeted by pharmaceutical companies.
                Finally, the failure to predict a phenotype from a genome as well as we could by measuring the phenotype of an identical twin has not escaped the field, and a nice paper discussing that failure is: https://www.sciencedirect.com/science/article/pii/S0092867417306293

                Hope all this helps!
                fringy

  10. oncodoc says:

    In 1875 many believed that the physical world was ruled by strict mechanical determinism but that living things had some other underlying principles; Pasteur believed that fermentation was only possible in living things. Now physicists think that random quantum processes are important, and we have biologists searching for the chemical basis of why some people are good at math or playing the violin.

  11. Vinayak says:

    Surprised that the word ‘epigenetics’ has not been mentioned once either in the blogpost or in the discussion below.

  12. gregor says:

    I wouldn’t dismiss 11% explained variance. That would mean the actual and expected values are correlated at about 0.33. Similarly, the SAT in isolation explains something like 13% of the variance in freshman GPA, and some people insist it is therefore a “poor predictor”, which is nonsense. The difference between, say, a 450 and a 650 math score is massive. If you were to put the full range of students in the same calculus course, the predictive power would be so obvious you would not even need to bother with formal statistical analysis. Some measures don’t look great in terms of linear correlation when compared against a noisy dependent variable (grades, educational attainment, income, etc.), especially with restriction-of-range issues, but if you look at expected outcomes for high and low scores, the large effects should become apparent. What is the likelihood of someone with a -2 SD genetic score completing a degree in engineering from a good school? What is it for +2 SD?
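A small sketch of both points in this comment. The conversion from explained variance to correlation is exact for a single linear predictor. The tail comparison additionally assumes that score and outcome are standard bivariate normal, and the cut-off (outcome more than 1 SD above the mean) is a hypothetical choice, not from the comment.

```python
import math
from statistics import NormalDist

def correlation_from_r2(r2):
    # With a single linear predictor, R^2 is the squared correlation.
    return math.sqrt(r2)

def prob_outcome_above(threshold_sd, predictor_sd, r):
    # Assumes standard bivariate normal predictor and outcome with correlation r:
    # given predictor z, the outcome has mean r*z and SD sqrt(1 - r^2).
    conditional = NormalDist(mu=r * predictor_sd, sigma=math.sqrt(1 - r * r))
    return 1 - conditional.cdf(threshold_sd)

r = correlation_from_r2(0.11)
print(round(r, 2))                          # 0.33
print(round(correlation_from_r2(0.13), 2))  # 0.36

# Hypothetical cut-off: outcome more than 1 SD above the mean.
p_high = prob_outcome_above(1.0, +2.0, r)
p_low = prob_outcome_above(1.0, -2.0, r)
print(f"+2 SD score: {p_high:.2f}   -2 SD score: {p_low:.2f}")
```

Under these assumptions the two probabilities come out at roughly 0.36 and 0.04, about a ninefold difference from a predictor that explains only 11% of the variance.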

  13. Rick G says:

    The summary of how polygenic scores are computed is not accurate at all. It is one linear model per trait, not per gene. Also, there are good theoretical and experimental reasons for believing that linear additivity is the best model for most genes, most traits, and most of the time. Claims that this interacts with that, etc., as if this were the first time anybody had thought of this, are not helpful. If one wants to look at the models on which this kind of research is actually based, they are not hard to find and engage with.

    Low explained variance is now widely believed to be due to rare variants; i.e., prediction is hard because a lot of variance is determined by mutations, each of which only a very small fraction of individuals carry, but each individual has many such mutations. The prediction problem is sparse, and even GWAS with a million individuals is not enough to crack it. This also explains why broad-sense heritability for things like IQ is high (and shared environment continues to explain roughly 0% of the variance) while actually getting a PGS to explain such a high share of the variance may prove impossible with realistic amounts of data.
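A minimal sketch of the scoring step that both descriptions share. However the per-allele weights are estimated (per-SNP regressions with a p-value cutoff, as described in the post, or a single joint model per trait, as Rick G says), the score itself is a weighted sum of allele counts. The weights below are invented for illustration, loosely in the range of the 1.1 to 2.6 weeks-per-allele figures quoted earlier in the thread, and the function name is hypothetical.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum over SNPs of (allele count x per-allele weight).

    dosages: counts (0, 1 or 2) of the effect allele at each SNP for one person.
    weights: per-allele effect estimates for the same SNPs, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical weights, in weeks of schooling per effect allele.
weights = [1.7, 1.1, 2.6]
person = [2, 1, 0]  # two copies at SNP 1, one at SNP 2, none at SNP 3
print(polygenic_score(person, weights))  # about 4.5 weeks
```

The additivity criticized elsewhere in the thread is visible here: no term depends on more than one SNP at a time.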

    • gregor says:

      Interesting comment, Rick. These rare variants, is that the same as “mutational load”? And it’s usually bad? So your GWAS is sort of like a ceiling and the mutations deduct from there? Are smart people smart partly (primarily?) because they have fewer of these mutations? And this isn’t really captured by GWAS studies?

    • Steve says:

      Rick G writes “There are good theoretical and experimental reasons for believing that linear additivity is the best model for most genes, most traits, and most of the time. Claims that this interacts with that, etc., as if this were the first time anybody had thought of this, are not helpful.”

      This is an argument from authority. If there are good reasons, state them. If there is good evidence, cite it or explain it. People here are open to evidence.

      • Anoneuoid says:

        At the very least every gene strongly interacts with every embryonic lethal gene. That is apparently about 30% of all genes in mice. Actually, it seems 43% of genes can be mutated to lead to death before birth:

        In addition, 57 % of the lines were viable, 13 % subviable, 30 % embryonic lethal, and 7 % displayed fertility impairments.

        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463797/

        So the claim that linear additivity is the best model must be restricted to some very limited subset of genes.

        • Max says:

          If additivity is not the rule when it comes to genetic differences, what explains correlations between relatives and their proportionality to relatedness? See this table. Even two-factor interactions contribute relatively little to covariances between relatives and higher-order interactions contribute approximately nothing. If higher-order interactions were a predominant explanation of genetic variation, genetic correlations between parents and offspring would be ~0.
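Max's point can be checked against the standard textbook expressions for covariances between relatives, the kind of table found in Falconer & Mackay. The sketch below assumes only additive and dominance variance, with no epistasis and no shared environment. The variance components are made up for illustration:

```python
# Expected correlations between relatives under the classical additive +
# dominance model: cov = (2*theta)*V_A + delta*V_D, where theta is the
# kinship coefficient and delta the probability that both alleles at a
# locus are shared identical by descent. No epistasis, no shared environment.
RELATIVES = {
    # name: (2*theta, delta)
    "MZ twins": (1.0, 1.0),
    "full sibs": (0.5, 0.25),
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "first cousins": (0.125, 0.0),
}


def expected_correlations(v_a: float, v_d: float, v_e: float) -> dict:
    """Map each relative pair to its expected phenotypic correlation."""
    v_p = v_a + v_d + v_e
    return {name: (r * v_a + d * v_d) / v_p for name, (r, d) in RELATIVES.items()}


# Hypothetical variance components (fractions of phenotypic variance)
for name, corr in expected_correlations(0.4, 0.1, 0.5).items():
    print(f"{name:>16}: {corr:.3f}")
```

Under this model the correlation tracks the additive relatedness coefficient, and dominance adds only to MZ twins and full sibs. Epistatic terms enter with products of the same coefficients (for example (2θ)² for additive-by-additive variance). That is why they shrink quickly for more distant relatives.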

          • Anoneuoid says:

            I’d have to look into where those numbers came from (is that data or model output?), all I can tell you is it makes little sense when you consider what a gene is.

            It looks to me like people are trying to explain “variance explained” which I don’t see as a number that reflects the reality of what is happening. I suspect there is/are probably some averaging artifact(s) involved too.

            • Max says:

              It’s from Falconer & Mackay’s Introduction to Quantitative Genetics. The covariances can be derived from Mendel’s laws. Your perspective on what genes are is that of a molecular geneticist, which is a completely different perspective from that of an animal breeder or a theoretical population geneticist (who figured most of these things out before molecular genetics even existed). When you look at individual ontogeny, it’s just tangled interactions upon tangled interactions, but that’s not the case when you simultaneously model the effect of all variants on differences between individuals. If that wasn’t the case, evolution, whether natural or artificial, would not be possible.

              • Anoneuoid says:

                I don’t see what exactly was figured out from that table. Is there some data to compare it to?

              • Max says:

                Quantitative and behavior genetics is all about testing data against expectations derived from quantitative genetic theory. For example, here are correlations for various traits in different kinds of sibling pairs in Sweden. As usual, there’s a roughly linear relation between phenotypic and genetic resemblance, with some effect from the shared environment.

      • harry says:

        Steve: Rick G may have in mind theoretical results like this, where the additive part of the underlying associations is best preserved under linkage disequilibrium, i.e. it’s typically going to be the main part of any signal you’re likely to be able to find and/or make much use of in a prediction.

      • Stevec says:

        Steve asks, “This is an argument from authority. If there are good reasons, state them. If there is good evidence cite to it or explain it. People here are open to evidence.”

        You could start with:
        Hill, W. G., M. E. Goddard, and P. M. Visscher, 2008, Data and theory point to mainly additive genetic variance for complex traits.
        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2265475/

        • Anoneuoid says:

          Why is it framed as a problem about “variance explained”?

          Say you have a system of genes that code for a synthesizing enzyme, a degrading enzyme, and a transporter that work together to maintain some compound at a certain concentration (the phenotype) in the cytosol.

          Now a mutation in the gene for the synthesizing enzyme cuts the rate of synthesis in half. To maintain the necessary concentration the cell can:

          1) produce half as much degrading enzyme or one mutated to be half as efficient
          2) produce twice as much transporter (to pump the compound out of the cell) or one that is half as efficient
          3) some combination of the above

          I just can’t map “variance explained” to this. I mean, when things work correctly there should be no phenotypic variance to be explained, yet you can get all sorts of genetic variability.
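Anoneuoid's enzyme example can be turned into a toy model to show what non-additivity looks like. The sketch below makes a hypothetical assumption: the steady-state concentration is simply the synthesis rate divided by the degradation rate. It is an illustration, not a model of any real pathway:

```python
# Toy steady-state model (hypothetical): compound made at rate k_syn and
# removed by first-order degradation at rate k_deg, so the steady-state
# concentration is k_syn / k_deg.
from itertools import product


def steady_state(k_syn: float, k_deg: float) -> float:
    return k_syn / k_deg


# Two loci, each with a "normal" allele (rate 1.0) and a half-rate allele.
genotypes = list(product([1.0, 0.5], repeat=2))  # tuples of (k_syn, k_deg)
phenotypes = {g: steady_state(*g) for g in genotypes}
for g, p in phenotypes.items():
    print(g, p)

# Additive (no-interaction) prediction built from single-locus effects
base = phenotypes[(1.0, 1.0)]
effect_syn = phenotypes[(0.5, 1.0)] - base  # halving synthesis alone
effect_deg = phenotypes[(1.0, 0.5)] - base  # halving degradation alone
additive_guess = base + effect_syn + effect_deg
print("double mutant: actual", phenotypes[(0.5, 0.5)], "additive", additive_guess)
```

Halving both rates restores the original concentration, which is the compensation Anoneuoid describes, while the additive prediction overshoots. On a log scale, though, log(k_syn / k_deg) = log k_syn - log k_deg is exactly additive. That is one reason quantitative geneticists note that how much interaction you find depends on the scale the trait is measured on.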

          • Anoneuoid says:

            Typo in #2: “or one that is *twice* as efficient”

          • Max says:

            GWAS, polygenic scores and heritability are about differences between individuals. You seem to be talking about mechanisms within individuals. These are entirely different problems.

            • Anoneuoid says:

              No, one individual has gene variant A that produces compound at rate x. The other individual has the variant B that produces it at rate x/2.

              Also, I see my point #2 is still wrong. The transporter is pumping into the cell.

              • Max says:

                The question of interest in GWAS etc. is the average effect of an allelic substitution on some phenotype and the sum total of such effects. Whatever compounds different variants produce does not enter into it.

              • Anoneuoid says:

                Whatever compounds different variants produce does not enter into it.

                Well, genes code for RNA and proteins that primarily function to transfer electrons and ions around, generate/maintain various aggregate molecular structures, and generate/control concentrations of various signaling molecules.

                That is what genes do. So GWAS studies are apparently studying something else that ignores the actual functions of genes.

              • Max says:

                The quantitative/population genetic conception of genes used in GWAS is older and, from a practical and economic point of view, vastly more important than the mainly academic conception you are talking about. For example, here’s a chart showing genetic (EBVs, estimated breeding values) and environmental improvements in milk yield in Holstein cattle since 1957. It’s from a paper by Wray et al. (2019), who comment on the chart:

                In 1957 the SD in milk yield was 600 kg (600 liters). The genetic value for milk production of an average dairy cow today is >6.5 genetic SD above the mean milk production in 1957 (achieved despite the relatively long generation intervals). In 1957, only 0.1% of cows would have produced >9600 kg of milk, now >50% of cows achieve this!

                How are such monumental gains possible? It’s because of the great amount of additive genetic variance that exists for all sorts of traits in animals and plants. The EBV is simply the aggregate additive genetic value of an individual which responds to selection in accordance with the breeder’s equation, namely:

                R = h^2*S

                where R is the response to selection, h^2 is additive heritability and S is the selection differential. For example, if heritability is say 40% and you select midparents that are 2 SD above the mean, the phenotypic mean of the offspring generation will be .40*2 = .80 SD above the parental generation (in constant environments).
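The breeder's equation here is simple enough to write out directly. This minimal sketch reproduces the 0.40 × 2 = 0.80 example. It adds a naive multi-generation extension that assumes, unrealistically, that h² and S stay constant:

```python
def response_to_selection(h2: float, s: float) -> float:
    """Breeder's equation R = h^2 * S, with S and R in phenotypic SD units."""
    return h2 * s


def naive_cumulative_response(h2: float, s: float, generations: int) -> float:
    # Assumes h2 and S stay fixed every generation. Real selection erodes
    # additive variance over time, so treat this only as a rough sketch.
    return generations * response_to_selection(h2, s)


print(response_to_selection(0.40, 2.0))
print(naive_cumulative_response(0.40, 2.0, generations=5))
```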

              • Anonymous says:

                the actual production average for all U.S. Holstein herds enrolled in production-testing programs in 2015 was 24,958 pounds of milk
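To compare this figure with the kilogram thresholds in the Wray et al. quote, the pounds need converting. Here is a quick check using the exact definition of the avoirdupois pound (1 lb = 0.45359237 kg):

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound

avg_lb = 24_958        # 2015 U.S. Holstein production average quoted above
avg_kg = avg_lb * LB_TO_KG
print(f"{avg_kg:,.0f} kg")
print("above the 9600 kg threshold:", avg_kg > 9600)
```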
                https://www.farmanddairy.com/news/holstein-cow-sets-national-milk-record-of-77480-pounds/392855.html

  14. Stevec says:

    I recommend that interested people read “Behavioral Genetics”, 7th edition, by Knopik et al. It’s a textbook, and it’s well worth reading. Earlier editions had Robert Plomin as principal author; he’s just a contributing author now in the latest edition. Most of the confused ideas in the comments here would be put to rest by reading the book.

    Why fund GWAS studies when they “only” explain 11% (or whatever) of the variance of a given trait? Most science doesn’t have utility until it’s made a certain amount of progress. Should we stop doing science unless we know in advance it will produce the results we require? It seems bizarre to suggest this.

    Many of the academic criticisms of behavioral genetics use the idea “it only explains 5%, 10% of the variance” to pour scorn on the existence of behavioral genetics.

    But no one doubts that height is explained mostly by genetics. And until very recently height also had the “missing heritability” problem, with only a small fraction of its variance explained by genetics in GWAS studies. Lello et al. (including co-author Stephen Hsu) published a paper in 2017, “Accurate Genomic Prediction Of Human Height”, with genomic prediction accurate to 2 cm. What changed? A large enough dataset. This is explained in “Determination of Nonlinear Genetic Architecture using Compressed Sensing”, Chiu Man Ho, Stephen D. H. Hsu.

    I recommend Stephen Hsu’s blog, https://infoproc.blogspot.com, for keeping up to date with some research in genetics.

    Value of recent genetics research:

    1. We know that some diseases have no one gene that is largely responsible. That in itself is valuable knowledge. We can see which pathways are involved giving clues to researchers looking for cures or lifestyle improvements to help mitigate effects of a disease.

    2. We know that 50% of personality traits are genetic (strictly speaking 50% of the variance in a given population is typically explained by heritability) – we know this from quantitative genetics, i.e. family studies. This helps throw much social science in the bin because it ignored genetics. Good result, not popular though. GWAS gives insights into pathways that shape personality, iq, mental health problems.

    Is this research “problematic”?

    Yes, it will upset lots of people. Finding out that the earth went around the sun was problematic too. But generally we got over it. Was it really useful? Not especially. Genetics will be way more beneficial than the theory of gravity.

    So let’s not fund progress in scientific fields unless in advance we can see clear benefits and “the right people” agree it’s “not problematic”.

    • Steve says:

      Stevec writes: “We know that 50% of personality traits are genetic (strictly speaking 50% of the variance in a given population is typically explained by heritability) – we know this from quantitive genetics, i.e. family studies. This helps throw much social science in the bin because it ignored genetics. Good result, not popular though. GWAS gives insights into pathways that shape personality, iq, mental health problems.”

      We “know” no such thing. It is utter nonsense. It is a claim that a personality trait is the sum of genetic causes (whatever that means) and some other causes. What is the evidence for such a claim? Data that has been analyzed using a linear model. That is circular. It is not that some of us are snowflakes who don’t want to deal with the “fact” that 50% of what we call personality (or intelligence or whatever) is caused by genetics, and you’re Galileo. It’s that the claim “50% of trait X is caused by genes” is not well-formed. It cannot be made into a logically well-formed claim about causation. And every time someone points that out, knowledgeable advocates reformulate “50% of trait X is caused by genes” into a precise claim about a heritability score, which is just a description of the data analyzed under the assumption of additivity. Not evidence, just circularity. And, “genetics is useful” is also no argument. I believe in quantum mechanics, but if someone started a branch of science to use quantum mechanics to predict the next election, I would be equally skeptical. That does not make me a science denier. It makes me a clear thinker.

      • Martha (Smith) says:

        “What is the evidence for such a claim? Data that has been analyzed using a linear model. That is circular.”

        Yup

        “And, “genetics is useful” is also no argument. I believe in quantum mechanics, but if someone started a branch of science to use quantum mechanics to predict the next election, I would be equally skeptical. That does not make me a science denier. It makes me a clear thinker.”

        Yup

      • RogerSweeny says:

        Twin studies and adoption studies do not rely on assumptions about additivity. They are where the 50% figure comes from. Read Plomin’s Blueprint and know your enemy. (You can even skip the second half, which is overly bullish on polygenic scores; those do indeed assume additivity.)

      • Stevec says:

        Steve said “We “know” no such thing. It is utter nonsense”

        Maybe you don’t know it. It’s not popular.

        Compare identical twins with fraternal twins. Compare step-siblings. Compare adopted children with the biological children of the household. Compare children of identical twins (cousins who are as genetically related as half-siblings).

        The results tell a consistent story. Growing up in the same household accounts for something like 0-10% of the variance in a given trait once the child is a young adult. Genetics explains about 50%. The balance is unexplained and is often referred to as the “unique environment”.
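The comparison of identical and fraternal twins is usually turned into percentages with Falconer's formulas. Below is a minimal sketch of that classical calculation. It assumes the simple ACE model: additive genes, equal environments for MZ and DZ pairs, and no assortative mating. The twin correlations are hypothetical. They were chosen only to show how a pattern like "about 50% genetic, about 0% shared household" falls out:

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Classical twin-design decomposition (Falconer's formulas).

    Assumes an additive genetic model, equal environments for MZ and DZ
    pairs (the EEA), no assortative mating, and DZ twins sharing on
    average half of their segregating genetic variation.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz    # shared (household) environment share
    e2 = 1 - r_mz           # unique environment plus measurement error
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical twin correlations, not taken from any study
print(falconer_ace(r_mz=0.50, r_dz=0.25))
```

These formulas build in the additive-model and equal-environment assumptions, which is where the objections raised elsewhere in this thread apply.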

        You can read:
        Meta-analysis of the heritability of human traits based on fifty years of twin studies, Polderman et al, Nature Genetics 2015
        Top 10 Replicated Findings From Behavioral Genetics, Plomin et al, Perspectives on Psychological Science 2016

        Or if you prefer a book, read “The Genome Factor” by Dalton Conley & Jason Fletcher. Conley is a professor of sociology and it sounds from his commentary that he didn’t believe it originally.

        The essence of the sociologists’ argument against this heresy is “these dangerous idiots base everything on an untested assumption, the EEA (equal environment assumption)”, along with the usual ad hominem smears: we know who else thought genetics was important...

        The sociologists’ idea is to explain the much higher correlation of personality traits between identical twins than between fraternal twins as “they were treated differently”. Well, the EEA is well tested. You can go read up on the 50+ studies that evaluate it.

        Instead, the Conley book describes some very interesting research of his own. A significant proportion of “identical twins” are actually fraternal twins, so they are not 100% genetically identical but share on average 50% of their genetic variation. Some “fraternal twins” are actually identical twins; this is surprising, but it has something to do with placement in the womb and some other factors in utero.

        Now, if we place twins in the correct category and redo the calculation of contribution of genetics, what happens? Here’s the book (p.26):

        “The results held across all of the methods and samples. We, the social scientists who questioned the veracity of the EEA and assumed the behavioral geneticists were making a fundamental error, ended up confirming their “naive” ACE models”

        • Steve says:

          As usual in this debate regarding “genes cause X”, advocates of “genes cause X” ignore the argument, namely that the evidence for their model is circular, and proceed to cite the same evidence as if more of it should persuade anyone. Ice cream consumption is associated with an increase in crime. The problem with saying that ice cream causes crime is not that the association doesn’t replicate. In fact, it replicates. Likewise, the problem is not that the twin studies don’t replicate. We know they do. They just don’t prove “genes cause X”. We need the theory to make mathematical sense (it doesn’t), and we need it to predict something independent of the assumptions used to analyze the data (it doesn’t).

          • Stevec says:

            Why are identical twins more alike in their personality traits than fraternal twins? Just an unexplained correlation?

            Why, by the time they reach adulthood, are adopted children not like their adoptive parents in their personality traits, but like their biological mother (data on the biological father are usually unavailable), even if they were adopted in the first few months of life?

            Why are identical twins physically very similar? Is this just an unexplained correlation? Or is it genetic?

            I’m interested to hear your explanation.

            • Steve says:

              I don’t know how to make you understand what you clearly don’t understand. Twins are similar because they are twins. They are freaking identical. Does being identical influence your environment? Probably. Also, I have an adoptive son. He is like me. He is also completely different from me. It all depends on how you measure “likeness.” Rhetoric aside, you seem to think that because twins are alike, that proves that “genes cause X.” That is nonsense. Suppose it is true that all of the Sneetches with stars on their bellies get treated better than the Sneetches without stars on their bellies. If you conclude from that, because having a star on your belly is genetic, that getting treated better is genetic, you have committed a very obvious logical fallacy. Does it matter to you that your theory is totally illogical?
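Steve's Sneetches argument can be made concrete with a toy simulation, and everything in it is hypothetical. An inherited marker has no direct effect on the outcome. Bearers of the marker are treated better, however, and treatment does affect the outcome. Regressing the outcome on the marker still "explains" variance. The association disappears once treatment no longer depends on the marker:

```python
import random
from statistics import fmean


def pop_var(xs):
    """Population variance computed with fmean (fast for large lists)."""
    m = fmean(xs)
    return fmean((x - m) ** 2 for x in xs)


def marker_r2(n: int, treat_by_star: bool, seed: int = 0) -> float:
    """Share of outcome variance 'explained' by an inherited star marker.

    Hypothetical model: the star affects the outcome only through how its
    bearer is treated (an environmental path), never directly.
    """
    rng = random.Random(seed)
    stars, outcomes = [], []
    for _ in range(n):
        star = rng.random() < 0.5  # inherited marker
        treated = star if treat_by_star else (rng.random() < 0.5)
        outcome = 1.0 * treated + rng.gauss(0, 1)  # no direct marker effect
        stars.append(float(star))
        outcomes.append(outcome)
    ms, mo = fmean(stars), fmean(outcomes)
    cov = fmean((s - ms) * (o - mo) for s, o in zip(stars, outcomes))
    return cov * cov / (pop_var(stars) * pop_var(outcomes))  # squared correlation


print("treatment follows the star:", round(marker_r2(100_000, True), 2))   # roughly 0.2
print("treatment assigned at random:", round(marker_r2(100_000, False), 2))  # roughly 0
```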

              • Terry says:

                Twins are similar because they are twins. They are freaking identical.

                And what has caused them to be identical?

              • Stevec says:

                As Terry’s comment points out, you haven’t answered the question. If being “freaking identical” means genetics explains why they look more similar than fraternal twins, but somehow doesn’t explain why their personalities are more similar than fraternal twins’ because that’s “circular reasoning”, then I’ll leave it there.

                For other readers hopefully the point is clear.

              • Bob says:

                Ah, come on, you can’t claim someone’s argument is circular and then claim twins are identical because they are identical.

        • somebody says:

          The problem isn’t that there isn’t evidence from which you can construct a robust, replicable number which is equal to 50%. The problem is that the relationship between that number of 50% and the statement “50% of traits are caused by genetics” isn’t well defined. It’s not just “based on a naive” assumption; it’s not defined.

          Saying that one thing determines another thing invites a counterfactual. What is the counterfactual, and how is the effect being estimated from these studies? Without at least defining that hypothetical, there’s no way to even know what you’re actually claiming. I don’t really think your claim is false so much as too vague to even be assigned a truth value.

          Example:

          > But no one doubts that height is explained mostly by genetics.

          I doubt this. In the first sense of “explained”, as in “by knowing one’s genome, a person can understand why someone is short or tall”, it’s obviously false. There’s no understanding these massive, wide models. But that’s a bit of a dirty trick on my part; “fraction of variance explained” is typical in statistics and also has nothing to do with explanation.

          In the second sense of explained, as in, causally determined by, I could also say it’s false. If I take identical twins, and I starve one twin at random while raising the other as I would my own child, I would find obviously that height is caused primarily by how they’re raised. You could say I’m being intentionally obtuse, but I’ll at least claim that I’m not. If I can control the height to an arbitrary degree by how I raise these children, then in what sense is height “determined mostly by genes.”

          In the third sense of explained, as in the statistical sense of “residual variance under some model / total population variance”, there are some caveats. Saying that some trait is predictable from genes is a much weaker claim than saying that some trait is determined by genes. I’m not sure the idea that traits are predictable from genes, clearly stated, is all that troublesome to anyone.

          Furthermore, height isn’t even very heritable from genes when you look on a global scale. All of the polygenic/GWAS/heritability estimators come with the caveat that they cannot be transplanted across populations. As an example of what goes wrong when you don’t take that seriously, it’s well known that children of immigrants to the U.S. from other countries exhibit drastically increased adult height. Typically, what’s said is that heritability is defined with respect to some range of environmental variation over which it’s measured. So there really is no such thing as “the” heritability of a particular trait, or “the” effect of some genes. Your statement “height is explained mostly by genetics” doesn’t actually mean anything as it stands; it needs to be modified to “height is mostly from genetics with respect to some range of environmental variation under discussion”, which is much less impressive: “I can predict trait t with some accuracy, after removing most of the environmental variation in the world.”

          Furthermore, there’s something to be said about the fact that your comments have been consistently misleading. The Hsu paper on a polygenic model for height claims “The actual heights of most individuals are within ∼3 cm of the predicted value” where you say “within 2 cm”. The Hsu paper also says that the predictor explains 40% of the variance, whereas you claim that “no one doubts that height is explained mostly by genetics.” You claim “we know that personality is 50% genetic”. I think you’re pulling that from your source “Top 10 Replicated Findings from Behavioral Genetics” which states “For personality, heritabilities are usually 30% -50%.” Reporting the very top of an interval and claiming that’s just “what we know” is pretty disingenuous. And you claim “GWAS gives insights into pathways that shape personality, iq, mental health problems.” This is just false. A pathway is something like “this gene is expressed in this circumstance, synthesizing this protein, which is carried here and absorbed by this which causes this effect.” GWAS has certainly identified nothing of the sort for personality, iq, and mental health problems, and has had pretty limited success in doing so for much easier problems in pathogenesis.

          I’m not as bearish on GWAS and polygenic models as some others on this page. Assuming the cost will come down enough to be competitive, I think predictive models based on genomes could be pretty useful. It’s a ton of data all in one place, currently very expensive to access but maybe not so much in the future. But I’m not for dressing up predictive models and risk factors with causal language like “explained”, “effect” and “determined” to make them seem more scientific. I’m also not for the use of genetic arguments for antiprogressive politics, because it makes no sense. The genetic statistics cited are only defined and meaningful with respect to some range of variation in the environment; the whole point of the policies under debate is usually to alter the range of variation in the environment, so the descriptive genetic statistics have nothing to say there. And I’m definitely not in favor of lying about or misrepresenting citations to buttress your point, hoping no one will check the details.

          • Olav says:

            ‘Your statement “height is explained mostly by genetics” doesn’t actually mean anything as it stands; it needs to be modified to “height is mostly from genetics with respect to some range of environmental variation under discussion”, which is much less impressive: “I can predict trait t with some accuracy, after removing most of the environmental variation in the world.”’

            This is true of essentially all explanatory and causal claims. Striking a match will cause the match to light on fire, but not if you do it underwater. Shooting a person in the head will cause the person to die, but not if he’s wearing a bulletproof helmet. Etc. If someone says that “genetics explains most of the variation in intelligence between people,” I think it’s most charitable to understand the claim as being relative to the range of variation in the environment that you typically find in the United States and other developed countries.

            • somebody says:

              > This is true of essentially all explanatory and causal claims

              The issue is that, in a formal sense, no explanatory or causal claims are actually being made. Words like “explained” and “effect” are being used on things that are summary statistics on predictive models. It’s really the statisticians’ fault, since that’s where the language comes from. But it feels almost intentionally misleading at this point.

              “If we choose our industry-standard terminology to sound like a causal claim, we get to make strong causal claims from weak evidence. Then, when we’re accused of overstating our results, we can back down and say that it’s just the terminology; we weren’t actually saying anything about causality.”

          • Terry says:

            I’m not seeing a lot of daylight between somebody and stevec.

            Genes matter. Genes aren’t everything.

          • Stevec says:

            Which is why my first comment in this post was recommending a book so that people can read the basics.

            Most of the confusion on this subject can be cleared up that way. It’s easy to be confused about a technical subject if you haven’t read a textbook on it.

            Onto somebody’s point:

            Of course, if you starve one of the twins, and not the other, then heritability contributes nothing. One is dead, wasting away. One is alive and 6′ tall, 180 pounds.

            Guess what: geneticists agree with you, understand this point, and explain it in all their books and some of their papers (you don’t usually explain the bare basics in papers written for specialists).

            For a given population, heritability explains 50% of a typical trait. It’s 90% for height, 80% for BMI, 80% for IQ in adulthood, 50% for personality traits.

            If you bash someone over the head enough this IQ or personality score won’t hold. Just like if you starve someone the height or BMI score won’t hold. This doesn’t mean that heritability has no meaning.

            Here’s Wikipedia on the topic:

            Studies of heritability ask questions such as how much genetic factors play a role in differences in height between people. This is not the same as asking how much genetic factors influence height in any one person.
            Heritability is a statistic used in the fields of breeding and genetics that estimates the degree of variation in a phenotypic trait in a population that is due to genetic variation between individuals in that population.[1] In other words, the concept of heritability can alternately be expressed in the form of the following question: “What is the proportion of the variation in a given trait within a population that is not explained by the environment or random chance?”

            • Steve says:

              Stevec states: “For a given population, heritability explains 50% of a typical trait. It’s 90% for height, 80% for BMI, 80% for IQ in adulthood, 50% for personality traits.”

              The problem here is the ambiguity of the word “explains.” I can understand that removing a variable reduces the variance of trait X in the sample, but that does not tell us anything about that variable’s impact on an individual. This is why I and others keep saying that statements like “50% of personality traits are explained by genes” are ill-defined and meaningless. What would “90% of height is explained by genes” mean at the individual level? Does that mean that if I’m 6 feet tall, my genes made me 5 feet 5 inches tall and the other 7 inches are up to the environment? Of course not. But advocates of “genes cause X” theories based on twin and GWAS studies won’t give a clear meaning to their claims at the individual level, and instead retreat to a technical explanation of heritability, which only applies to variance in the population. Associations and correlations are not transitive. The fact that having the same genetic inheritance explains a certain amount of variance of a trait in a population does not mean that those genes cause that trait in any individual. Of course, genes “play a role”, but that is vague hand-waving until someone lays out the mechanism that explains how the gene leads to the trait. And, in the meantime, people like Plomin argue that parents don’t make a difference. These are causal claims based on nothing but a descriptive statistic that is meaningful only at the population level.

            • somebody says:

              I know that’s the standard definition. The point is that you’re saying things about “the” effect and “the” heritability which don’t mean anything. Not because heritability doesn’t have a rigorous mathematical definition, but because it doesn’t mean anything without defining a population/range of environment of interest, and what it does mean in the best of cases doesn’t lead to the implications you say it does. It’s little more than an R^2 value; it has no bearing on social science or environmental interventions. Don’t take this as me being offended and defending social science; I agree that most of it can be safely consigned to the dustbin.

              People like to do this “nature vs nurture” discussion. “Nature is more important!” “Nurture is more important!” I feel like you’re arguing in the former camp, assuming everyone else here is arguing in the latter camp. In fact, the mood here seems to be “the question is so vague as to be meaningless.”

              Suppose we compute the heritability of SAT scores with respect to the population of New York City. The instinct is to say “improving schools is pointless because genes matter most.” Except that doesn’t follow; the computed heritability of SAT scores is relative to the range of environmental variation in the current schooling system, and if you change the range of environments, you don’t expect it to hold. So what has it accomplished? What does it mean for any intervention or observable fact, beyond reproducing heritability statistics?
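A small simulation shows how heritability is tied to a range of environments. It assumes a purely additive trait: phenotype = genetic value + independent environmental deviation, with the genetic SD fixed at 1. Under that assumption, widening the spread of environments lowers h² = V_G / V_P even though nothing about the genes changes:

```python
import random
from statistics import fmean


def pop_var(xs):
    """Population variance computed with fmean (fast for large lists)."""
    m = fmean(xs)
    return fmean((x - m) ** 2 for x in xs)


def simulated_h2(n: int, env_sd: float, seed: int = 1) -> float:
    """h^2 = V_G / V_P for a hypothetical purely additive trait.

    Genetic values ~ N(0, 1); environmental deviations ~ N(0, env_sd),
    drawn independently of the genes.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]
    p = [gi + rng.gauss(0, env_sd) for gi in g]
    return pop_var(g) / pop_var(p)


for env_sd in (0.5, 1.0, 2.0):  # narrow -> wide range of environments
    print(f"environmental SD {env_sd}: h2 = {simulated_h2(100_000, env_sd):.2f}")
```

In this model the theoretical value is 1 / (1 + env_sd²), i.e. 0.8, 0.5 and 0.2 for the three settings, and the simulated values land close to those.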

              Heritability tells us, for individuals drawn from some specific distribution, how much better you can predict some variable using their genes than by picking randomly from the distribution of that variable. Like R^2, it’s a rescaled measure of mean prediction error, and like R^2, the horrible, no-good choice of the word “explained” makes it seem more meaningful than it is.
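The R² analogy can be written out. Below is a minimal sketch of R² as one minus the ratio of the prediction's mean squared error to the variance. In this usual formulation the baseline is predicting the mean for everyone, not a random draw from the distribution. Loosely, narrow-sense heritability is the R² you would get from the ideal additive genetic predictor in that population, assuming genes and environment are uncorrelated:

```python
from statistics import fmean, pvariance


def r_squared(y, y_hat):
    """1 - MSE(prediction) / MSE(predicting the mean for everyone).

    The denominator, the population variance of y, is exactly the mean
    squared error of the constant prediction mean(y).
    """
    mse = fmean((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1 - mse / pvariance(y)


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.5, 1.5, 3.5, 3.5]))  # a coarse predictor
print(r_squared(y, [2.5] * 4))             # predicting the mean gives 0
```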

              • Terry says:

                When someone assumes environmental factors largely or completely explains something, I rarely hear about all these subtleties. Any idea why?

              • somebody says:

                Because nobody makes those claims? I’ve never heard someone say “criminality is x% environmental” as if that’s a meaningful statement; it transparently isn’t. Claims of environmental effects are typically more like “legality of abortion reduces crime rate by y%”. The analogous genetic claim isn’t “personality is 50% heritable”; it’s “genes cause Down syndrome and Williams syndrome”. Specific effects, specific causes, falsifiable claims and clear implications: simply not vulnerable to the same line of criticism.

                That said, you shouldn’t take those environmental claims at face value. A lot of that work is shoddy. If you want to see people criticizing the subtleties of these kinds of environmental explanations, you should see… this blog, where people are extremely critical of some environmental explanation or effect-size estimate at least once a week.

                I feel like you’re insinuating that there’s an asymmetry of standards where people are totally credulous for environmental explanations but not genetic ones because it fits with what they want to believe. The problem is that the genetic explanations under discussions aren’t explanations at all, whereas the environmental ones at least are, and at least here everyone is very critical of them even so.

              • Terry says:

                somebody said:

                Because nobody makes those claims? I’ve never heard someone say “criminality is x% environmental” as if that’s a meaningful statement; it transparently isn’t.

                In serious academia, you are probably right.

                But in the grievance industry, it is the standard assumption, and in parts of academia, it is heresy to suggest otherwise.

                And, I am (gently I hope) suggesting there is a double standard. Not as egregious as your characterization, but real nonetheless.

            • Dogen says:

              1-Thanks for the pointer to Hsu’s blog. It’s got some interesting stuff.

              2-“For a given population, heritability explains 50% of a typical trait. It’s 90% for height, 80% for BMI, 80% for IQ in adulthood, 50% for personality traits.”

              Last I looked it was 50% for IQ based on twin studies, do you have a reference for the 80%?

              Also, it seems to me that the garbage sociological claims based on sentences like the quote above are at least as common as the garbage sociology that denies any role of genetics.

              For example, the general public, and many smart, educated people seem to take “a given population” to mean something fixed, like “people with black skin” and “people with white skin”.

              My favorite example of the conundrum this poses is the height differences of Koreans in the North and South. North Koreans are 2-3” shorter, despite being genetically the same population. There are hundreds of other examples, but this one is particularly easy to follow.

              What’s really difficult is to take that quote and explain what it means by “explains”, and how that information might be used accurately and how it gets misused. My observation is that, in the general press, it is pretty much 95% misused. It’s even grossly misused in the comments section of Hsu’s blog…
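
              The Korea example can be sketched as a toy simulation: two groups drawn from the same genetic distribution, with small environmental variation inside each group but a shifted environmental mean between them. All numbers (in standard-deviation units) are arbitrary assumptions, not real height data. Within-group heritability comes out high in both groups, yet the entire gap between the groups is environmental by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Assumption: both groups draw genetic values from the SAME distribution.
g_north = rng.normal(0.0, 1.0, n)
g_south = rng.normal(0.0, 1.0, n)

# Small environmental spread within each group, but a shifted environmental
# mean between groups (e.g. nutrition). The shift of 2.0 is arbitrary.
height_north = g_north + rng.normal(-2.0, 0.5, n)
height_south = g_south + rng.normal(0.0, 0.5, n)

def within_group_h2(g, p):
    """Share of phenotypic variance due to genetic variance, within one group."""
    return g.var() / p.var()

print("h2 north:", round(within_group_h2(g_north, height_north), 2))  # about 0.8
print("h2 south:", round(within_group_h2(g_south, height_south), 2))  # about 0.8
print("mean gap:", round(height_south.mean() - height_north.mean(), 2))  # about 2.0
```

              A high heritability within each group says nothing about the cause of a difference between the groups.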

              • Terry says:

                Also, it seems to me that the garbage sociological claims based on sentences like the quote above are at least as common as the garbage sociology that denies any role of genetics.

                So what’s the right way to say it? (To quantify or estimate the importance of genetics).

              • Stevec says:

                I should have been more specific. Here’s Knopik et al “Behavioral Genetics”, 7th ed, p.181:

                Twin studies also show increases in heritability from childhood to adulthood… from about 40% in childhood (age 9) to 55% in adolescence (age 12) and to 65% in young adulthood (age 17)… although the trend of increasing heritability appears to continue throughout adulthood to about 80% at age 65, some research suggests that heritability declines in later life.

          • JonjoShelvey says:

            This is a joke. Of course height is caused by genetics, and the fact that environmental variables affect height as well is trivial. Lower variation in environmental variables yields better estimates of the genetic component, given a fixed sample. In Western countries most people have enough food. Assuming, on an individual level, that environmental variables are not affecting the individual’s genetics is often reasonable, given the phenotype.

      • Terry says:

        So what is the proper way to quantify heritability?

        And what is the proper way to think about genetic causation?

  15. Ed Hagen says:

    It’s worth considering *why* most genes have a (small) effect on most traits. It’s not that genes don’t have very specific, large effects, it’s that many of these effects are at the cellular level, and just about everything in your body, including your brain, is made of cells. Hence, any genetic variation in these cellular-level functions will impact just about every tissue and organ in your body, including the 86 billion neuron cells in your brain. Similarly, any genetic variation in somewhat higher-level functions, e.g., in cellular adhesion, intercellular signaling, and other functions related to tissue assembly, will also affect almost every organ in your body. It’s this (likely) hierarchical organization of genetic functionality that (probably) explains why most genes are involved in most phenotypic traits of interest to social scientists.

    Further, due to purifying selection, the only genetic variants in these lower level functions that we will usually find in living humans are those that don’t have much effect, because if they did have a large effect, this effect would usually prevent the embryo from developing successfully in the first place.
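
    A toy illustration of the purifying-selection argument, under loudly stated assumptions: new variants get normally distributed effects on some cellular-level function, and the chance that a carrier embryo develops successfully falls off as a Gaussian in the size of the effect (the 0.3 width is arbitrary). This is a one-step filter, not a population-genetics model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical effect sizes of new variants on a cellular-level function.
effects = rng.normal(0.0, 1.0, 200_000)

# Assumed fitness model: survival probability drops sharply with |effect|.
survival_prob = np.exp(-(effects ** 2) / (2 * 0.3 ** 2))
survives = rng.random(effects.size) < survival_prob
survivors = effects[survives]

print("mean |effect|, new variants:      ", round(np.abs(effects).mean(), 2))
print("mean |effect|, surviving variants:", round(np.abs(survivors).mean(), 2))
```

    Under these assumptions the variants we would observe in living individuals have much smaller effects than the variants that arise, which is the pattern the comment describes.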

    To illustrate the importance of cellular-level genetic functionality: yeast is a single-celled eukaryotic organism (so its genes obviously provide cellular-level functions). Homo sapiens is a multicellular eukaryotic organism. Yeast and humans diverged about a billion years ago, yet 23% of yeast genes have homologs in humans (i.e., the genes derive from a common ancestor). Of these genes, the amino acid sequences overlap, on average, by about 32%. Even more remarkable, after replacing 414 critical yeast genes with their human homologs, 47% of the human genes functioned and enabled the yeast to survive (Kachroo et al. 2015).

    Computer software provides a nice analogy: any variation in the microcode of almost any CPU instruction, say popping a value off the stack (analogous to a cellular-level gene function), will impact almost every software package on the computer (analogous to, e.g., our psychological traits). Any major “mutation” in the microcode of the POP instruction will usually prevent the computer from working at all (and hence we will never see such variants in working computers).

    • Terry says:

      It’s worth considering *why* most genes have a (small) effect on most traits. It’s not that genes don’t have very specific, large effects, it’s that many of these effects are at the cellular level, and just about everything in your body, including your brain, is made of cells. Hence, any genetic variation in these cellular-level functions will impact just about every tissue and organ in your body, including the 86 billion neuron cells in your brain.

      Now THAT’S interesting.

      I’m skeptical of this GWAS stuff because it seems too simple a model. I suspect there is layer upon layer of structure in genetics and interactions are important. To use an analogy, letters are the building blocks of language, but studying letters alone doesn’t get you far. You have to know how letters combine into words that have far more information in them, and words form paragraphs with a whole other layer of meaning, and so on and on.

      Warning: this is rank amateur speculation.
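
      One way to make the "too simple a model" worry concrete is a sketch where two hypothetical SNPs act only through an interaction. The XOR-style effect is a deliberately extreme assumption chosen to make the point visible; it says nothing about how much interaction matters in real GWAS data. An additive linear model finds essentially nothing, while a model with an interaction term recovers the signal.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Two hypothetical, independent SNPs coded 0/1 (carrier or not).
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)

# Assumed pure interaction: trait is raised only when exactly one SNP is present.
trait = (a ^ b).astype(float) + rng.normal(0.0, 0.5, n)

def fit_r2(X, y):
    """Ordinary least squares (with intercept) via numpy.linalg.lstsq; in-sample R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

r2_additive = fit_r2(np.column_stack([a, b]), trait)
r2_interaction = fit_r2(np.column_stack([a, b, a * b]), trait)
print(round(r2_additive, 2), round(r2_interaction, 2))  # near 0.0 vs near 0.5
```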

  16. RogerSweeny says:

    from a comment by Walter Sobchak:

    Measures of IQ in early childhood are not only unreliable but already contaminated by environmental effects before birth and during infancy. Polygenic scores for IQ are free of that contamination. They can then be compared with actual IQ scores.

    Suppose that, a few years from now, it has been solidly established that adolescents from disadvantaged backgrounds have IQ scores that average 10 points lower than their genetic potential would have led us to expect. Confident new knowledge of that kind will energize the search for effective childhood interventions in ways we can scarcely imagine.

    Suppose instead it is found that the adolescent IQ scores of children from disadvantaged backgrounds are about the same as we’d expect from their polygenic scores. That will provide an incentive to foster human flourishing for people with lesser abilities—an issue that has been criminally ignored in our era’s insistence that all the children can be above average….
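
    The comparison described in the quote can be sketched as follows: build a polygenic score as a weighted sum of allele counts, predict the outcome from the score, and average the prediction error by group. Everything here is simulated and hypothetical: the genotypes, the per-SNP weights (standing in for GWAS effect estimates), and the assumed environmental penalty of 0.5 units for a "disadvantaged" group.

```python
import numpy as np

rng = np.random.default_rng(5)
n_people, n_snps = 5_000, 1_000

# Hypothetical genotypes: allele counts 0/1/2 at each SNP.
freqs = rng.uniform(0.1, 0.9, n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps))

# Hypothetical per-SNP weights standing in for GWAS effect estimates.
weights = rng.normal(0.0, 0.05, n_snps)

def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts: one score per person."""
    return genotypes @ weights

pgs = polygenic_score(genotypes, weights)

# Hypothetical outcome: score + noise, minus an assumed environmental penalty.
disadvantaged = rng.random(n_people) < 0.3
observed = pgs + rng.normal(0.0, 1.0, n_people) - 0.5 * disadvantaged

# Predict the outcome from the score alone, then compare errors by group.
predicted = np.polyval(np.polyfit(pgs, observed, 1), pgs)
residual = observed - predicted
gap = residual[disadvantaged].mean() - residual[~disadvantaged].mean()
print(round(gap, 2))  # close to the assumed -0.5 penalty
```

    The sketch only works because the simulated score is unrelated to group membership; with real scores, ancestry and environment are entangled in the GWAS weights themselves, which is part of what the original post questions.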

  17. Stevec says:

    Vinayak Bhardwaj says, January 28, 2020 at 2:37 pm

    “..The amount of genetic variation across humans is painfully small. The Fulani tribe in Nigeria has more in common genetically with Swedes in Sweden than with other Nigerians. So what possible value can be derived from these GWAS analyses?…”

    Your confidence is only matched by how completely wrong you are.

    Have you read any papers published in the last 10 years on genomic analysis?

    If you haven’t, I’d recommend as a starting point the YouTube talks on the 1000 Genomes Project, or the papers from the 1000 Genomes Project.

    Here’s one I dug out for you, Lynn Jorde:
    https://youtu.be/PzG99bexziA

    Watch the whole thing. There are some PCA plots at around 30 minutes that you can also find in Xing et al. 2010, “Toward a more uniform sampling of human genetic diversity”. I’d include the graphic, but I don’t think that works here.
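
    For readers unfamiliar with how such PCA plots are made, here is a minimal sketch on simulated genotypes: two hypothetical populations whose allele frequencies differ modestly at each SNP, standardized per SNP, with principal components taken via SVD. It shows the technique only; it does not reproduce Xing et al.'s data or test the specific Fulani/Swedes claim.

```python
import numpy as np

rng = np.random.default_rng(6)
n_per_pop, n_snps = 200, 2_000

# Hypothetical allele frequencies: two populations diverging from a shared base.
base = rng.uniform(0.1, 0.9, n_snps)
freq_a = np.clip(base + rng.normal(0.0, 0.05, n_snps), 0.01, 0.99)
freq_b = np.clip(base + rng.normal(0.0, 0.05, n_snps), 0.01, 0.99)

geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
labels = np.array([0] * n_per_pop + [1] * n_per_pop)

# Standardize each SNP column, then take principal components via SVD.
centered = geno - geno.mean(axis=0)
scale = centered.std(axis=0)
scale[scale == 0] = 1.0  # guard against SNPs that happen to be monomorphic
u, s, vt = np.linalg.svd(centered / scale, full_matrices=False)
pc1 = u[:, 0] * s[0]     # each individual's coordinate on PC1

# Separation between population means on PC1 vs spread within each population.
gap = abs(pc1[labels == 0].mean() - pc1[labels == 1].mean())
within = max(pc1[labels == 0].std(), pc1[labels == 1].std())
print(round(gap, 1), round(within, 1))
```

    Small per-SNP frequency differences add up across thousands of SNPs, which is why individuals cluster by population on PC1 even though any single SNP barely distinguishes them.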

    • Anoneuoid says:

      Please just tell us the point?

      • Stevec says:

        Anoneuoid says, January 29, 2020 at 10:42 pm

        “Please just tell us the point”

        If your question is to me: the point is that Vinayak Bhardwaj is factually completely wrong. I find this blog interesting, and people here seem interested in knowledge and data rather than invented ideas.

        The point – accurate understanding rather than popular inventions.

        • Anoneuoid says:

          Yes, you did not explain what is in these sources you want us to check. No one is going to start watching a bunch of YouTube videos or reading all the papers from a project just because you vaguely say they prove someone wrong.

          • Stevec says:

            If someone says the earth doesn’t revolve around the sun in an elliptical orbit, or that DNA isn’t the mechanism of inheritance, it’s hard to know where to start. Likewise with nonsense about human population genetics.

            I didn’t vaguely say it. I said, non-vaguely, his idea is completely wrong. I gave a specific video and at what point to start watching.

            Not everyone has access to papers. Maybe you do. Read Xing et al, “Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping”, Genomics 2010.

            Or provide a reference for the claim by Vinayak Bhardwaj.

            Not sure what else I can do. No proof was offered for a claim that is contrary to genetics basics. I’ve provided a YouTube clip, I’ve referenced the 1000 Genomes project, now I’ve provided a paper.
