Generalizing from sample to population

Andrew Gelman, Department of Statistics, Columbia University

We’ve been hearing a lot about “data” recently, but data are generally a means to an end, with the goal being to learn about some population of interest. How do we generalize from sample to population? The process seems a bit mysterious, especially given that our samples are far from random (unless you count a sample of 24 psychology undergraduates and 100 Mechanical Turk participants as a random sample). Nonetheless we can use data to learn about populations of interest, in part using Bayesian reasoning. We discuss principles, methods, and open problems, and issues of sample size, effect size, and interactions, in the context of examples in political science and psychology.

The talk will be at 368 ISR located at 426 Thompson St.

Some relevant recent articles are here:

[2013] Deep interactions with MRP: Election turnout and voting patterns among small electoral subgroups. American Journal of Political Science. (Yair Ghitza and Andrew Gelman)

[2014] Forecasting elections with non-representative polls. International Journal of Forecasting. (Wei Wang, David Rothschild, Sharad Goel, Andrew Gelman)

The mythical swing voter. (David Rothschild, Sharad Goel, Andrew Gelman, Douglas Rivers)

Statistical graphics for survey weights. (Susanna Makela, Yajuan Si, and Andrew Gelman)

Bayesian nonparametric weighted sampling inference. (Yajuan Si, Natesh Pillai, and Andrew Gelman)

And some relevant recent notes are here:

Can you please post the slides? Thanks.

Great talk, very interesting and thought-provoking! There are two comments/questions I would like to ask you:

(1) Some of us in survey methodology, especially here in Michigan, work with a framework we call “Total Survey Error”, in which we like to separate the sources of error of a survey (coverage, sampling, nonresponse, measurement, etc.) so that we can understand them better and, ideally, minimize them in both design and analysis. In this framework, we also talk about trade-offs and interactions among these sources of error. For example, one would be willing to decrease the sample size (and thus increase the sampling error) in order to use more cost-effective methods to decrease nonresponse error during the fieldwork. At some point in your talk (and elsewhere in this blog) you made a statement that a probability sample with a 90% nonresponse rate is not as much of a random sample as a non-probability sample. I interpret this statement as something like “a probability sample with such a high nonresponse rate is just as good as a non-probability sample”. Although I think I understand your point from an analysis perspective, it sounds as if you are saying that the errors we have in non-probability samples (including selection bias and nonresponse) are just as large as the nonresponse error of probability samples, in such a way that the trade-off between sampling and nonresponse error favors non-probability samples (especially when we include cost in the equation). However, I am not aware of any evidence that this is true. Also, this statement seems to ignore that, while nowadays probability samples suffer a lot from nonresponse, non-probability samples might suffer just as much, with the only difference being that in the latter nonresponse is “hidden” and we don’t even know things like a response rate. On the other hand, I feel that you still think that probability samples (or randomization) are important to take into account possible differences in variables you can’t control, either in the design or in the analysis.
So, I guess my question here is – at the end of the day, if it is up to you, given a budget constraint, what would you prefer: a very large non-probability sample or a smaller probability sample?

(2) It seems that the notion of “representative sample” is very important for you. However, this is a term that we try to avoid in survey sampling, especially because it is ill-defined. For example, even with a probability sample, you can end up with, just by chance, a sample with only elements of a certain type (females, for example). Of course, if you are a smart sampling statistician and if that variable is important to explain your outcome variables, you would control for that, but that’s not always possible (going back to my first point, in a concise form of what Box, Hunter and Hunter would state: “Control what you can, and randomize what you cannot”). “Representative sample” is such a weird concept in sampling that there is even a series of four papers by Kruskal and Mosteller trying to figure out the different meanings this term has in different areas. Anyway, my question here is: what would be a “representative sample” for you?
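The budget trade-off in question (1) can be made concrete with a small calculation. The sketch below is only an illustration: it treats each survey as a simple random sample with an added fixed bias, and the sample sizes and the 3-point bias are hypothetical numbers, not estimates from any real survey. Assuming zero bias for the probability sample is itself optimistic when nonresponse is high.

```python
import math

def rmse_of_proportion(p, n, bias=0.0):
    """Root mean squared error of a sample proportion.

    Assumes simple random sampling of size n from a population with true
    proportion p, plus a fixed (hypothetical) bias from selection/nonresponse.
    """
    variance = p * (1.0 - p) / n
    return math.sqrt(bias ** 2 + variance)

# Hypothetical numbers, chosen only for illustration.
true_p = 0.5
small_probability_sample = rmse_of_proportion(true_p, n=1_000, bias=0.0)
large_optin_sample = rmse_of_proportion(true_p, n=100_000, bias=0.03)

print(f"n=1,000, no bias:         RMSE = {small_probability_sample:.4f}")
print(f"n=100,000, 3-point bias:  RMSE = {large_optin_sample:.4f}")
```

Under these made-up inputs the bias term dominates: once the sample is large, extra respondents barely reduce the error, so the comparison hinges on how much bias remains after adjustment.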

+1. I would add the question: what is the definition of “the general population”, another seemingly important notion around these parts?

Mark:

The “general population” is the population you are interested in. It depends on the context. It could be all voters in a particular country, all adults, all people, etc. In many studies the general population is not clearly defined (for example, in a psychology experiment on 100 sophomores, is the general population supposed to be all people, all adults, all American adults, all American college students, all American college sophomores, all American college sophomores who might take a psychology class at this sort of university, etc.?). Typically the mathematics is done under a very narrow implied population but the inferences are presented as being applicable much more broadly. As I discussed in my talk, this implication can typically be derived from an assumption of zero interactions. In general I think it’s good to think about what is the general population and what assumptions are being used to draw inferences about it.
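The role of the zero-interaction assumption can be sketched numerically. In the toy example below, all group names, shares, and effect sizes are hypothetical. When the effect is the same in every group, the sample average matches the population average no matter how unbalanced the sample is. When the effect varies by group, the two diverge.

```python
def average_effect(effects, shares):
    """Average treatment effect as a share-weighted mean of group effects."""
    return sum(effects[g] * shares[g] for g in effects)

# Hypothetical group shares: the sample is almost all students,
# while the target population is mostly non-students.
sample_shares = {"students": 0.9, "others": 0.1}
population_shares = {"students": 0.1, "others": 0.9}

# Case 1: zero interaction, the effect is identical in both groups.
constant_effect = {"students": 2.0, "others": 2.0}
# Case 2: the effect interacts with group membership.
varying_effect = {"students": 2.0, "others": 0.5}

for label, effects in [("zero interaction", constant_effect),
                       ("with interaction", varying_effect)]:
    in_sample = average_effect(effects, sample_shares)
    in_population = average_effect(effects, population_shares)
    print(f"{label}: sample {in_sample:.2f}, population {in_population:.2f}")
```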

Raphael:

1. As I said in response to a question at the end of my talk, I think that both design and analysis are important. A 91% nonresponse rate is too bad, but I’d still like to have random sampling when I can. For example, I’d prefer calling 10,000 random phone numbers in order to get 900 responses, as compared to going out on the street and grabbing the first 900 people I can find. In the statistical model underlying MRP, the assumption is that the data are equivalent to a random sample within each post-stratum. Regarding telephone polls and the Xbox survey, my point is not to recommend that everyone switch to the Xbox (yes, I did get paid a little bit by Microsoft Research but not enough to get me to say that!) but rather that different modes of sampling have different advantages. Xbox is cheap, and we got a huge sample size and panel data, all of which allowed us to learn something new and important about public opinion. This doesn’t mean that I’d prefer Xbox (or, more generally, opt-in samples) for all purposes, just that it does have value and I disagree with people such as that AAPOR guy who was taking an absolutist stance against it (because I really do think his statement about “little grounding in theory” is essentially meaningless). Under any survey method, whether it be f2f or RDD or internet or whatever, I think we should be thinking about non-sampling error of various sorts.

2. A “representative sample” is one that matches the population in dimensions of interest to the researcher. I don’t think the term can be defined precisely but I still think it is useful. I’d rather have a representative sample that is nonrandom than a random sample that is nonrepresentative. The tricky part is that in some settings you can really know a sample is random, but it’s generally really hard to know that a sample is representative; in real life pretty much all we can do is check representativeness along some dimensions.
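The post-stratum assumption in point 1 can be illustrated with plain poststratification, which is only the "P" step of MRP. This is a minimal sketch with hypothetical cells and counts. Real MRP would replace the raw cell means below with estimates from a multilevel regression, which matters most when cells are small.

```python
# Hypothetical survey cells: (number answering "yes", number of respondents).
# The sample over-represents the "young" cell.
sample_cells = {"young": (60, 80), "old": (8, 20)}

# Hypothetical known population counts for the same cells (e.g. from a census).
population_counts = {"young": 300, "old": 700}

def raw_mean(cells):
    """Unadjusted sample proportion, ignoring cell structure."""
    yes = sum(y for y, n in cells.values())
    total = sum(n for y, n in cells.values())
    return yes / total

def poststratified_mean(cells, pop_counts):
    """Average of cell proportions, weighted by population cell counts.

    Valid if respondents behave like a random sample within each cell.
    """
    pop_total = sum(pop_counts.values())
    return sum((y / n) * pop_counts[c] for c, (y, n) in cells.items()) / pop_total

print(f"raw sample mean:     {raw_mean(sample_cells):.3f}")
print(f"poststratified mean: {poststratified_mean(sample_cells, population_counts):.3f}")
```

With these invented numbers the raw mean is 0.680 and the poststratified mean is 0.505, because the adjustment reweights the two cells to their population shares.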

Can you give an actual quantitative example of a representativeness check along some dimension?

Yes, see the Xbox paper for an example.
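For readers who want the general shape of such a check, here is a minimal sketch that compares sample shares with known population shares along one dimension. The age categories and all numbers are hypothetical and are not taken from the Xbox paper.

```python
def share_gaps(sample_counts, population_shares):
    """Sample share minus population share for each category."""
    n = sum(sample_counts.values())
    return {k: sample_counts.get(k, 0) / n - population_shares[k]
            for k in population_shares}

# Hypothetical opt-in sample, skewed toward younger respondents.
sample_counts = {"18-29": 650, "30-44": 250, "45-64": 90, "65+": 10}
# Hypothetical population shares, e.g. from census or exit-poll margins.
population_shares = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.35, "65+": 0.20}

gaps = share_gaps(sample_counts, population_shares)
for category, gap in gaps.items():
    print(f"{category:>6}: sample minus population = {gap:+.2f}")
```

A check like this only covers the dimensions for which population figures exist, which is the limitation described in point 2 above.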

Andrew,

The concept of “representative sample” seems central to survey analysis and, still, it is hard to define precisely. Can you point us to some serious attempts to define it, be it by statisticians, philosophers or lay-persons?

Judea

The series of papers from Kruskal and Mosteller I was referring to is one of the few attempts that I know of to understand what different fields (in scientific/non-scientific and in statistics/non-statistics literature) understand by “representative sample”. You probably have heard about it, but in case you have not, here are the references:

Kruskal, W. and Mosteller, F. (1979a). Representative sampling, I: Nonscientific literature. International Statistical Review, 47, 13–24.

Kruskal, W. and Mosteller, F. (1979b). Representative sampling, II: Scientific literature, excluding statistics. International Statistical Review, 47, 113–127.

Kruskal, W. and Mosteller, F. (1979c). Representative sampling, III: the current statistical literature. International Statistical Review, 47, 245–265.

Kruskal, W. and Mosteller, F. (1980). Representative sampling, IV: the history of the concept in statistics, 1895–1939. International Statistical Review, 48, 169–195.

Schouten, B., Cobben, F. & Bethlehem, J. (2009), Indicators of Representativeness of Survey Nonresponse. Survey Methodology 35, pp. 101-113.

I especially like the fact that they’re using it to benchmark various methods of data-collection.

Thank you for your answers! I couldn’t agree more with you on the first point, especially with the idea that we can’t select probability samples all the time, and in those instances, as statisticians, we should try to make the best use of the data we have using appropriate models.

Let me just insist on the second point: I think that what you defined as “representative sample” is what we have been calling “balanced sample” in the sampling literature. I know, “you say potato, I say potato”, but I believe the lexical difference here does matter, for two reasons:

(i) “Representativeness” seems to present an argument of validity, that is, a good sample must resemble the population in such a way that each one of its categories/subgroups appears in the same proportion in the sample as in the population. However, we know this is not true: disproportionate stratified samples or unequal probability sampling (such as PPS) are often more desirable than proportionate stratified samples or equal probability sampling.

(ii) As Kruskal and Mosteller pointed out in their first paper on this subject, the notion of “representative sample” is sometimes used as a seal of approval and it might lead the layman to believe that the sample has a property or quality that we can’t actually guarantee, that is, that the sample is a miniature of the population in every possible variable or statistic. This is what worries me the most, because it works almost as a disservice to the general public and consumers of surveys. One would just need to tell that the sample is “representative” and that’s it, nobody would have to worry about anything, no need to question or understand the details about how the sample was selected.

Also, you are only able to tell that the sample matches the population in dimensions that you actually observe in the population (the actual study’s dimensions of interest I imagine you don’t observe in the population, otherwise, you wouldn’t be selecting a sample, right?). In that case, even if you have a “nonrepresentative sample”, you can always post-stratify/calibrate it, right?
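Point (i) can be sketched with a toy disproportionate stratified design. All stratum sizes and means below are hypothetical. The sample deliberately does not mirror the population (equal allocation to very unequal strata), yet weighting each sampled unit by N/n, the inverse of its inclusion probability, recovers the population mean.

```python
# Hypothetical strata: population size N, sample size n, observed sample mean.
strata = {
    "A": {"N": 9000, "n": 100, "sample_mean": 10.0},
    "B": {"N": 1000, "n": 100, "sample_mean": 50.0},
}

def unweighted_mean(strata):
    """Naive mean that treats every sampled unit equally."""
    total_n = sum(s["n"] for s in strata.values())
    return sum(s["n"] * s["sample_mean"] for s in strata.values()) / total_n

def design_weighted_mean(strata):
    """Mean with each sampled unit weighted by N/n for its stratum."""
    weighted_sum = sum(s["n"] * (s["N"] / s["n"]) * s["sample_mean"]
                       for s in strata.values())
    weight_total = sum(s["n"] * (s["N"] / s["n"]) for s in strata.values())
    return weighted_sum / weight_total

print(f"unweighted mean:      {unweighted_mean(strata):.1f}")
print(f"design-weighted mean: {design_weighted_mean(strata):.1f}")
```

If the stratum sample means equal the true stratum means, the population mean is (9000 × 10 + 1000 × 50) / 10000 = 14, which the weighted estimate matches while the unweighted one gives 30.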

Anyway, I don’t want to be so critical about this point, but I honestly don’t see how this term can be any more useful than harmful for survey sampling.

Raphael,

Very illuminating, especially the last one.

Thanks for sharing.

Judea

The title of Andrew’s talk is so close to the work I have been doing lately that I could not resist posting some of my solutions to the problem of “sample selection bias”.

I hope some day a real smart fellow would be able to tell if we are duplicating or supplementing each other’s efforts.

E. Bareinboim, J. Tian, and J. Pearl “Recovering from Selection Bias in Causal and Statistical Inference”

In Proceedings of the Twenty-eighth AAAI Conference on Artificial Intelligence (AAAI), 2014.

http://ftp.cs.ucla.edu/pub/stat_ser/r425.pdf

E. Bareinboim and J. Pearl “Controlling Selection Bias in Causal Inference”

In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

http://ftp.cs.ucla.edu/pub/stat_ser/r381.pdf