So . . . the scheduled debate on using margin of error with non-probability panels never happened. We got it started, but there was some problem with the webinar software and nobody but the participants could hear anything.
The 5 minutes of conversation we did have was pretty good, though. I was impressed. The webinar was billed as a “debate” which didn’t make me happy—I wasn’t looking forward to hearing a bunch of pious nonsense about probability sampling and statistical theory—but the actual discussion was very reasonable.
The first thing that came up was, Are everyday practitioners in market research concerned about the margins of error for non-probability samples? The consensus among the market researchers on the panel was: No, users pretty much just take samples and margins of error as they are, without worrying about where the sample came from or how it was collected.
I pointed out that if you’re concerned about non-probability samples and if you don’t trust the margin of error for non-probability samples, then you shouldn’t trust the margin of error for any real sample from a human population, given the well-known problems of nonavailability and nonresponse. When the nonresponse rate is 91%, any sample is a convenience sample.
Sampling and adjustment
The larger point is that just about any survey requires two steps:
1. Sampling.
2. Adjustment.
There are extreme settings where either 1 or 2 alone is enough.
If you have a true probability sample from a perfect sampling frame, with 100% availability and 100% response, and if your sampling probabilities don’t vary much, and if your data are dense relative to the questions you’re asking, then you can get everything you need—your estimate and your margin of error—from the sample, with no adjustment needed.
From the other direction, if you have a model for the underlying data that you really believe, and if you have a sample with no selection problems, or if you have a selection model that you really believe (which I assume can happen in some physical settings, maybe something like sampling fish from a lake), then you can take your data and adjust, with no concerns about random sampling. Indeed, this is standard in non-sampling areas of statistics, where people just take data and run regressions and that’s it.
In general, though, it makes sense to be serious about both sampling and adjustment, to sample as close to randomly as you can, and to adjust as well as you can.
Remember: just about no sample of humans is really a probability sample or even close to a probability sample, and just about no regression model applied to humans is correct or even close to correct. So we have to worry about sampling, and we have to worry about adjustment. Sorry, Michael Link, but that’s just the way things are. No “grounding in theory” is going to save you.
What’s the point of the margin of error?
Where, then, does the margin of error come in? (Note to outsiders: to the best of my knowledge, “margin of error” is not a precisely-defined term, but I think it is usually taken to be 2 standard errors.)
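As a rough illustration of that convention, here is a minimal Python sketch that takes the margin of error to be 2 standard errors of a sample proportion. It assumes simple random sampling, which is rarely true of real surveys of humans, and the function name is made up for this example.

```python
import math

def margin_of_error(p_hat, n):
    """Two standard errors of a sample proportion.

    Assumes simple random sampling with no nonresponse (an idealization).
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return 2 * se

# A 50% estimate from 1000 respondents gives roughly 3 percentage points.
moe = margin_of_error(0.5, 1000)
print(round(moe, 3))
```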
What I said, during our abbreviated 5-minute panel discussion, is that, in practice, we often don’t need the margin of error at all. Anything worth doing is worth doing multiple times, and once you have multiple estimates from different samples, you can look at the variation between them to get an external measure of variation that is more relevant than an internal margin of error, in any case.
I also said that the margin of error is an approximate lower bound on the expected error of an estimate from a sample, and that such a lower bound can be useful, but that in most cases I’d get more out of the between-survey variation (which includes sampling error as well as variation over time, variation between sampling methods, and variation in nonsampling error).
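To make the comparison between internal and external measures of variation concrete, here is a small Python sketch. The five survey estimates and the sample size are invented purely for illustration; they are not real data. The sketch compares the internal margin of error (2 standard errors, assuming simple random sampling) with twice the standard deviation across surveys.

```python
import math
import statistics

# Hypothetical estimates of the same proportion from five separate surveys
# (made-up numbers for illustration only, not real data).
estimates = [0.52, 0.47, 0.55, 0.49, 0.51]
n_per_survey = 1000  # assumed sample size of each survey

p_bar = statistics.mean(estimates)

# Internal measure: 2 standard errors under simple random sampling.
internal_moe = 2 * math.sqrt(p_bar * (1 - p_bar) / n_per_survey)

# External measure: 2 standard deviations of the estimates across surveys.
external_spread = 2 * statistics.stdev(estimates)

print(f"internal margin of error: {internal_moe:.3f}")
print(f"2 x between-survey sd:    {external_spread:.3f}")
```

With these invented numbers the between-survey spread comes out larger than the internal margin of error, which is the pattern you would expect if the internal number is acting as a lower bound.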
Where the margin of error often is useful is in design, in deciding how large a sample size you want to estimate a quantity of interest to some desired precision.
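For the design use, a minimal sketch: invert the 2-standard-error formula for a proportion to get the sample size needed for a target margin of error. This again assumes simple random sampling and full response, so in practice it gives an optimistic number; the function name and the default p = 0.5 (the worst case for a proportion) are choices made for this example.

```python
import math

def required_sample_size(target_moe, p=0.5):
    """Smallest n such that 2 * sqrt(p * (1 - p) / n) <= target_moe.

    Assumes simple random sampling; p = 0.5 is the most conservative choice.
    """
    return math.ceil(p * (1 - p) * (2 / target_moe) ** 2)

# Sample size for a margin of error of 3 percentage points.
print(required_sample_size(0.03))  # 1112
```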
In an email discussion afterward, John Bremer pointed out that in tracking studies you are interested particularly in measuring change, and in that case it might not be so easy to get an external measure of variance. Indeed, if you only measure something at time 1 and time 2, then the margin of error is relevant to assessing the evidence. To get an external measure of uncertainty and variation you need a longer time series. I just wanted to emphasize the point that the margin of error is a lower bound and, as such, can be useful if it is interpreted in that way. Even if sampling is perfect probability sampling and there is 100% response, the margin of error is still an underestimate because the sample is only giving a snapshot, and attitudes change over time.