I think one can abstract away the population genetics details: it’s easy to calculate a test statistic for the data, and to estimate the error by resampling. We understand the underlying data-generating process well enough to have some idea about how much to trust the resampling; in fact, we understand it well enough that we can even generate fake data, but this is a lot of work, even for a single parameter set. In this case, I think it sometimes makes sense to start by just looking at whether the test statistic deviates from the null.
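The "test statistic plus resampled error" workflow above can be sketched in a few lines. This is an illustrative toy, not the actual analysis: I use a simple bootstrap of the mean as the resampling step (the comment further down uses a block jackknife instead), and the function names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def resampled_se(values, n_boot=1000, rng=rng):
    """Bootstrap standard error of the mean: one simple resampling
    choice for gauging the uncertainty of a test statistic."""
    values = np.asarray(values, dtype=float)
    boots = np.array([rng.choice(values, size=len(values)).mean()
                      for _ in range(n_boot)])
    return boots.std(ddof=1)

def z_score(values):
    """Standardized deviation of the mean from a null expectation of 0."""
    values = np.asarray(values, dtype=float)
    return values.mean() / resampled_se(values)
```

One would then ask whether |z| is large (a common informal cutoff in this literature is around 3) before investing in anything heavier.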

(PS: Nick, we met at the Simons Institute in Berkeley several years ago.)

You and I have met, more than once. I recall twice in Cheltenham and once or twice in Princeton. If you don’t remember, I might try giving you a hint. :~)

Please go to the Reich lab web page: there are tabs for publications, software, and datasets.

In my note I was referring to the “f4 test” described in “Ancient Admixture in Human History” (2012). The test is implemented in a program, qpDstat, part of a larger package, ADMIXTOOLS. Much suitable data is also available on the site.
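For readers without the paper at hand: the f4 statistic of Patterson et al. (2012) averages, over SNPs, the product of allele-frequency differences between two population pairs; under the null (the populations are related by a tree with no gene flow across the pairs) it has expectation 0. A minimal sketch, with hypothetical frequency arrays as input (qpDstat's actual interface works from genotype files):

```python
import numpy as np

def f4(pA, pB, pC, pD):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD),
    where each argument is an array of per-SNP allele frequencies
    in one population. Expectation 0 under treeness."""
    pA, pB, pC, pD = map(np.asarray, (pA, pB, pC, pD))
    return np.mean((pA - pB) * (pC - pD))
```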

**I do not know of another living statistician who has done such impressive work across academia (Broad), government (GCHQ/NSA) and business (Renaissance).**

It would be useful to have a simple example (with smaller N) and working code, in order to make the discussion more concrete . . .

That was precisely my point. My frequentist technique is basically to analyze each “SNP” (variable locus) as though independent. There can easily be 1M loci. That gives a statistic that, under the null, has mean 0. To get the standard error, we delete large blocks (about 5M genome bases) in turn and apply the jackknife. This can easily be coded up in a day or two. A Bayesian analysis, if practical at all, would in my best guess take months of work, and might be sensitive to obscure modeling assumptions about LD (linkage disequilibrium).
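The delete-one-block jackknife described above can be sketched as follows. The function and variable names are mine, the blocks are taken as equal-sized and precomputed (real code would form them from genomic coordinates, and qpDstat weights blocks by SNP count), so treat this as a toy under those assumptions:

```python
import numpy as np

def block_jackknife(per_snp_stat, block_ids):
    """Estimate the genome-wide mean of a per-SNP statistic and its
    standard error by deleting whole blocks of SNPs in turn
    (delete-one-block jackknife), which respects local correlation (LD)
    that a per-SNP resample would ignore."""
    per_snp_stat = np.asarray(per_snp_stat, dtype=float)
    block_ids = np.asarray(block_ids)
    blocks = np.unique(block_ids)
    g = len(blocks)
    overall = per_snp_stat.mean()
    # Leave-one-block-out estimates of the mean.
    loo = np.empty(g)
    for i, b in enumerate(blocks):
        loo[i] = per_snp_stat[block_ids != b].mean()
    # Equal-weight jackknife variance over the g leave-one-out means.
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return overall, se
```

Dividing the estimate by the jackknife standard error gives the Z-score on which the test is based.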

And I have run my test on perhaps 1M population pairs…