Probabilistic screening to get an approximate self-weighted sample

Sharad had a survey sampling question:

We’re trying to use Mechanical Turk to conduct some surveys, and have quickly discovered that turkers tend to be quite young. We’d really like a representative sample of the U.S., or at least to be able to recruit a diverse enough sample from Turk that we can post-stratify to adjust the estimates. The approach we ended up taking is to pay turkers a small amount to answer a couple of screening questions (age & sex), and then probabilistically recruit individuals to complete the full survey (for more money) based on the estimated Turk population parameters and our desired target distribution. We use rejection sampling, so the end result is that individuals who are invited to take the full survey look as if they came from a representative sample, at least in terms of age and sex. I’m wondering whether this sort of technique (a two-step design in which participants are first screened and then probabilistically selected to mimic a target distribution) is a known method, or if there may be something new here. It feels to me that something like this must have already been done, but I can’t seem to find anything in the literature.

Hey, they call them “turkers”? I’d’ve thought it would just be “Turks.”

Anyway, here was my response:

Yes, what you are doing is a standard idea: screening respondents in order to get what is called a self-weighted sample. You call it rejection sampling; in the survey sampling world it is called two-phase sampling. The alternative is to survey whoever you can get and then adjust via weighting or poststratification to match the sample to the population according to known characteristics. Either approach is ok; the key concerns are cost and convenience. Your approach requires more effort in the data collection stage but less effort in the analysis stage, and thus is particularly appropriate for surveys where you’re planning to release the data for others to analyze. There is also a cost issue, because you went to the effort of recruiting various participants and then screened them away, but this is less of a concern for a survey such as yours with volunteers. Thus, the reason screening (including probabilistic screening as a special case) is not done more often is that the cost and effort of getting someone to participate in a survey up to the screening stage is often a significant fraction of the cost and effort required to get them to just complete the damn survey.
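To make the rejection step concrete, here is a minimal Python sketch, assuming you have an estimate of the Turker pool’s demographic distribution and a target (say, Census-based) distribution. All the cell labels and proportions below are invented for illustration; they are not from Sharad’s study.

```python
import random

# Hypothetical cell proportions, invented for this sketch.
# turk_dist: estimated share of each (age, sex) cell among turkers.
# target_dist: desired share of each cell in the recruited sample.
turk_dist = {
    ("18-29", "M"): 0.30, ("18-29", "F"): 0.25,
    ("30-44", "M"): 0.15, ("30-44", "F"): 0.15,
    ("45+",   "M"): 0.07, ("45+",   "F"): 0.08,
}
target_dist = {
    ("18-29", "M"): 0.11, ("18-29", "F"): 0.11,
    ("30-44", "M"): 0.13, ("30-44", "F"): 0.13,
    ("45+",   "M"): 0.25, ("45+",   "F"): 0.27,
}

# Standard rejection sampling: accept with probability proportional to
# target/turk, rescaled so the largest ratio maps to probability 1.
max_ratio = max(target_dist[c] / turk_dist[c] for c in turk_dist)

def invite(cell):
    """Return True if a screened respondent in this cell is invited
    to take the full survey."""
    p_accept = (target_dist[cell] / turk_dist[cell]) / max_ratio
    return random.random() < p_accept
```

Accepted respondents then arrive in proportion to the target distribution: each cell’s acceptance rate cancels its overrepresentation (or underrepresentation) in the screening pool.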

Still, once you’ve done this, I think you’ll want to adjust for other variables in addition to age and sex. Depending on what you’re interested in, a few other natural adjustment factors are ethnicity, education, and region of the country. You could screen on these variables too, or you could just gather this information in the second phase and then poststratify on them. But if you’re doing the screening anyway, I recommend you screen on more variables. And then, if you’re doing the work anyway, you might ask a few other questions of these people…!
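For the poststratification route, the adjusted estimate is just the population-share-weighted average of the within-cell sample means. A minimal sketch, with cells, shares, and responses invented for the example:

```python
# Hypothetical population shares for each age cell (made up here).
pop_share = {"18-29": 0.22, "30-44": 0.26, "45+": 0.52}

# Survey outcomes observed within each cell (also made up).
responses = {
    "18-29": [1, 0, 1, 1],
    "30-44": [0, 1, 0],
    "45+":   [1, 0],
}

# Poststratified estimate: weight each cell mean by its population share.
cell_mean = {c: sum(ys) / len(ys) for c, ys in responses.items()}
estimate = sum(pop_share[c] * cell_mean[c] for c in pop_share)
print(round(estimate, 3))
```

The same arithmetic extends to cross-classified cells (age by sex by education, and so on), subject to having enough respondents in each cell to estimate the cell means.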

In any case, you’re already ahead of the game, in that most experimental data are not adjusted to match the population. Experimental researchers often naively think that all they care about is the treatment effect in their group of subjects, forgetting that the goal is almost always to learn about the population.

To which Sharad replied:

Do you happen to know any good references on the topic? I’m curious to learn the specifics of the probabilistic screening method. There’s a whole literature on active learning in the CS world that seems closely related, and there may be some useful connections to be made there.

And I wrote:

It’s such a standard idea I don’t know where to start. I googled *screening survey sampling* and this paper (written by a bunch of people I’ve never heard of) came up. Here’s another I found via Google; I can only see the abstract, but it might be what you’re looking for.

To reflect on the larger point: the literature and practice of survey sampling is full of good stuff which indeed is often rediscovered by others. Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software.

5 thoughts on “Probabilistic screening to get an approximate self-weighted sample”

  1. I agree this is a known and common method. It’s important to note that this will produce a sample which “matches” on demos such as age. I know the common term is “representative”, but that tends to be confused (in our clients’ minds at least) with “unbiased”, which this procedure most definitely is NOT.

    See, for example, this web definition, which implies that representative samples are unbiased:

    “A subset of a statistical population that accurately reflects the members of the entire population. A representative sample should be an unbiased indication of what the population is like.”

    Read more: http://www.investopedia.com/terms/r/representative-sample.asp

    So what’s the population under this method? It’s not “the total U.S.,” because they’re Turkers, and as Sharad notes they skew young and likely differ from the general population on a whole variety of other characteristics (such as being much more involved with the kinds of tasks that occur on Mechanical Turk).

    The population isn’t Turkers either: if it were, the sample should skew young relative to the U.S., but the screening forces it not to.

    I still like the method and use it, but it’s no panacea.

  2. On the issue of experimental data, does anybody know any literature on weighting and variance estimation via weighted bootstrapping? My intuition is to bootstrap by sampling with weights equal to population weights, but I’m looking for some literature on it.
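    One concrete version of the commenter’s intuition, sketched in Python: resample units with probability proportional to their population weights, take the plain mean of each resample, and use the spread of those means as a variance estimate. The function name and toy data are made up for illustration; this is a sketch of the idea, not a reference from the literature.

    ```python
    import random
    import statistics

    def weighted_bootstrap_se(y, w, n_boot=2000, seed=1):
        """Bootstrap standard error of the weighted mean, resampling
        units with probability proportional to their survey weights."""
        rng = random.Random(seed)
        n = len(y)
        estimates = []
        for _ in range(n_boot):
            idx = rng.choices(range(n), weights=w, k=n)  # weighted resample
            estimates.append(sum(y[i] for i in idx) / n)
        return statistics.stdev(estimates)

    # Invented toy data: outcomes y with survey weights w.
    y = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    w = [2.0, 0.5, 1.2, 0.8, 1.5, 0.7, 1.1, 0.9, 1.3, 1.0]
    print(weighted_bootstrap_se(y, w))
    ```

    Note that the plain mean of each resample already estimates the weighted mean, since the weights enter through the resampling probabilities; that is exactly the “sampling with weights equal to population weights” intuition above.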

  3. Pingback: Model checking and model understanding in machine learning « Statistical Modeling, Causal Inference, and Social Science
