
P-value of 10^-74 disappears

Nick Matzke writes:

Given the recent discussion of p-values, you or colleagues might find this interesting:

Population Genetics: Why structure matters
Nick Barton, Joachim Hermisson, Magnus Nordborg

One possibility is to compare the population estimates with estimates taken from sibling data, which should be relatively unbiased by environmental differences. In one of many examples of this, Robinson et al. used data from the GIANT Consortium (Wood et al., 2014) together with sibling data to estimate that genetic variation contributes significantly to height variation across Europe (Robinson et al., 2015). They also argued that selection must have occurred, because the differences were too large to have arisen by chance. Using estimated effect sizes provided by Robinson et al., a more sophisticated analysis by Field et al. found extremely strong evidence for selection for height across Europe (p = 10^−74; Field et al., 2016). Several other studies reached the same conclusion based on the GIANT data (reviewed in Berg et al., 2019; Sohail et al., 2019).

Berg et al. (who are based at Columbia University, Stanford University, UC Davis and the University of Copenhagen) and Sohail et al. (who are based at Harvard Medical School, the Broad Institute, and other institutes in the US, Finland and Sweden) now re-examine these conclusions using the recently released data from the UK Biobank (Sudlow et al., 2015). Estimating effect sizes from these data allows possible biases due to population structure confounding to be investigated, because the UK Biobank data comes from a (supposedly) more homogenous population than the GIANT data.

Using these new estimates, Berg et al. and Sohail et al. independently found that evidence for selection vanishes – along with evidence for a genetic cline in height across Europe. Instead, they show that the previously published results were due to the cumulative effects of slight biases in the effect-size estimates in the GIANT data. Surprisingly, they also found evidence for confounding in the sibling data used as a control by Robinson et al. and Field et al. This turned out to be due to a technical error in the data distributed by Robinson et al. after they published their paper.

This means we still do not know whether genetics and selection are responsible for the pattern of height differences seen across Europe. That genetics plays a major role in height differences between individuals is not in doubt, and it is also clear that the signal from GWAS is mostly real. The issue is that there is no perfect way to control for complex population structure and environmental heterogeneity. Biases at individual loci may be tiny, but they become highly significant when summed across thousands of loci – as is done in polygenic scores. Standard methods to control for these biases, such as principal component analysis, may work well in simulations but are often insufficient when confronted with real data. Importantly, no natural population is unstructured: indeed, even the data in the UK Biobank seems to contain significant structure (Haworth et al., 2019).
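The point about tiny biases accumulating can be made concrete with a toy simulation (my sketch, not from any of the papers; the bias, standard error, and locus count are made-up illustrative numbers). Each per-locus bias is far too small to detect on its own, but a polygenic-score-style sum over thousands of loci turns the shared bias into an enormous z-score:

```python
import math
import random

random.seed(1)

n_loci = 10_000    # number of loci summed in the score (hypothetical)
bias = 0.003       # tiny systematic bias in each effect-size estimate (hypothetical)
se = 0.01          # standard error of each per-locus estimate (hypothetical)

# True effect at every locus is zero; each estimate is noise plus the shared bias.
estimates = [random.gauss(bias, se) for _ in range(n_loci)]

# Per-locus, the bias is invisible: z of about 0.3, nowhere near significance.
print(f"per-locus z = {bias / se:.1f}")

# Summed across loci, the noise grows like sqrt(n) but the bias grows like n,
# so the aggregate z-score explodes.
z = sum(estimates) / (se * math.sqrt(n_loci))
print(f"aggregate z = {z:.1f}")
```

With these numbers the aggregate z comes out around 30, i.e., a p-value in the 10^-190s, from a bias that no single-locus test could ever see. That is the mechanism by which a p of 10^-74 can appear and then vanish when the estimation sample changes.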

Berg et al. and Sohail et al. demonstrate the potential for population structure to create spurious results, especially when using methods that rely on large numbers of small effects, such as polygenic scores. Caution is clearly needed when interpreting and using the results of such studies. For clinical predictions, risks must be weighed against benefits (Rosenberg et al., 2019). In some cases, such as recommendations for more frequent medical checkups for patients found at higher ‘genetic’ risk of a condition, it may not matter greatly whether predictors are confounded as long as they work. By contrast, the results of behavioral studies of traits such as IQ and educational attainment (Plomin and von Stumm, 2018) must be presented carefully, because while the benefits are far from obvious, the risks of such results being misinterpreted and misused are quite clear. The problem is worsened by the tendency of popular media to ignore caveats and uncertainties of estimates.

Finally, although quantitative genetics has proved highly successful in plant and animal breeding, it should be remembered that this success has been based on large pedigrees, well-controlled environments, and short-term prediction. When these methods have been applied to natural populations, even the most basic predictions fail, in large part due to poorly understood environmental factors (Charmantier et al., 2014). Natural populations are never homogeneous, and it is therefore misleading to imply there is a qualitative difference between ‘within-population’ and ‘between-population’ comparisons – as was recently done in connection with James Watson’s statements about race and IQ (Harmon, 2019). With respect to confounding by population structure, the key qualitative difference is between controlling the environment experimentally, and not doing so. Once we leave an experimental setting, we are effectively skating on thin ice, and whether the ice will hold depends on how far out we skate.

If only the p-value had been 4.76×10^−264, then I’m sure it would’ve held up.


  1. Anonymous says:

    “If only the p-value had been 4.76×10^−264, then I’m sure it would’ve held up.”

    This would be a perfect opportunity for Mayo to demonstrate the potential of her SEV function.

  2. Matt Skaggs says:

    “They also argued that selection must have occurred, because the differences were too large to have arisen by chance.”

    In certain cities in Kentucky, there are far more people with the name “Skaggs” than in other cities. The anomaly is far too large to have occurred due to chance, so the only reasonable conclusion is that we Skaggs are genetically superior to other residents of Kentucky.

    Sarcasm aside, the argument that if something looks salient, it must be adaptive, has a very weak footing. Unfortunately, this argument is used axiomatically in evolutionary psychology.

  3. paul alper says:

    Never mind all those weird p-value numbers:

    “1903, modern revival of Old English sibling (“relative, a relation, kinsman”), equivalent to sib +‎ -ling. Compare Middle English sib, sibbe (“relative; kinsman”), German Sippe. The term apparently meant merely kin or relative until the 20th century when its necessity for the study of genetics led to its specialized use. For example, the OED has a 1903 citation in which “sibling” must be defined for those who don’t know the intended meaning [!].”

    In other words, until 1903, the English language lacked a word to designate “brother or sister.” One would have thought the word goes back to antiquity. Likewise, the term “antisemitism” is not much older than “sibling.” From

    “[antisemitism] was coined in 1879 by German political agitator Wilhelm Marr to replace Judenhaß (“Jew-hatred”) to make hatred of the Jews seem rational and sanctioned by scientific knowledge. The similar term antisemitisch (“anti-semitic”) was first used in 1860”

  4. Justin says:

    “This would be a perfect opportunity for Mayo to demonstrate the potential of her SEV function.”

    This would actually be a better opportunity to demonstrate how one would do a Bayesian analysis here because none of the original or current papers use any Bayesian techniques.

    We read “effect size estimates released from their 2015 study were strongly affected by population structure due to a computational bug”. If Bayesian analyses had been used, the same thing could have happened, with a result like: “New data shows the Bayes factor went from 137 to 3”.

    The newer papers mention structure as a confounder, but in the Field paper, regarding the 10^-74 p-value, they write:

    “This is not an artifact of uncontrolled population structure in the GWAS, as the correlation is even stronger for a smaller family-based GWAS that provides stringent structure control”

    Speaking about effects of other traits, the Field paper writes:

    “Although these signals are highly intriguing, and some match known phenotypes of modern Britons, the confounding role—if any—of population structure in contributing to these signals remains to be fully determined.”

    The new paper says “The slope is ~1/3 as large as in GIANT, though still modestly significant (p = 1.2 x 10^-2)”.

    IMO things are still a bit murky. No one doubts the phenomena, just the strength of the effect.
    I’d like to see Field’s team followup.



    • Anoneuoid says:

      This would actually be a better opportunity to demonstrate how one would do a Bayesian analysis here because none of the original or current papers use any Bayesian techniques.

      You need to go far beyond “bayesian analysis” to do something useful with this data. Just replacing the arbitrarily valued coefficient of a Frequentist model with an arbitrarily valued coefficient of a Bayesian model is not going to help you.

      The p = 10^-74 was based on one set of features, and the p > 0.05 on another set of features. No surprise that it changes, since it is just the arbitrary number chosen to optimize the fit of the model…

      • Right, the point of Bayesian analysis isn’t to do basically the same analysis and get a slightly different answer with slightly different interpretation… It’s to enable sophisticated scientific analyses that involve generating processes that are much different from the kind used in typical frequentist analyses. Whether this data set can be used in such a manner is a different, subject matter specific question. You can’t just bolt on Bayes, you have to use Bayes to enable subject matter specific inquiry.

    • Anonymous says:

      So… you also think Mayo’s never going to use her ideas for a single real statistical inference either?

    • Anonymous says:

      Mayo has spent four decades telling people how statistical inference must surely be done. Written entire books telling people how to do it. Invented new methods to do it. Argued endlessly about it. Maintained a fanatical anti-Bayesianism the entire time. I can’t even remember how many times I’ve seen someone point out (with details, demonstrations, and evidence) errors in her reasoning, and she never changed her mind on anything or admitted the slightest mistake. She’s that certain she’s got the goods on “statistical inference”.

      Ok. Let’s see *her* perform, with *her* ideas, real statistical inference on a non trivial example where the answer isn’t known or obvious ahead of time. Let’s see what her powerful insights and philosophical genius can actually do.
