Comparing people from two surveys, one of which is a simple random sample and one of which is not

Juli writes:

I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the population of adults in LA who attended LA public high schools at some point, so that is the same for both studies. Study #1 uses random digit dialing, so I consider that one to be SRS. Study #2, however, is a convenience sample in which all participants were involved with one of eight community-based organizations (CBOs).

Of course, both studies can be analyzed independently, but she was hoping for there to be some way to combine/compare the two studies. Specifically, I am working on looking at the civic engagement of the adults in both studies. In study #1, this means looking at factors such as involvement in student government. In study #2, this means looking at involvement in CBOs…but they were all involved in those.

I know I can’t blindly combine the two studies. I also know that not having a control group (i.e., not in CBOs) in study #2 is a problem, as is the convenience sampling, but I can’t change those things. I was trying to see if I could somehow use study #1 (or part of it – participants who look similar based on a variety of factors) to act as the control group for study #2 and do some sort of matching, but I’m not sure that’s okay. Then I was trying to see if I could combine the studies and act as though they are different strata, one with SRS and one with quota sampling (I think – per Lohr’s book, chapter on stratified sampling). But I’m still not sure if it’s okay to compare them that way.

I know that overall, generalizability is going to be nearly impossible here. But it would be really nice to come up with a creative way to make this work. I have a sneaking suspicion that this might be useful for others – which then made me wonder if this has been tackled before. Any thoughts?

My reply:

It’s funny this comes up, because we were just having a discussion on the blog with a student at UCLA who was asking about the use of hierarchical models for causal inference, combining different data sources.

My generic advice is to set up a regression model controlling for as many background variables as possible. Then, within each poststratification cell, the two groups can perhaps be considered equivalent to a natural experiment in which one group is involved with the CBO and the other isn’t. Since you can’t control for everything, the next step is to include in the model an unobserved variable representing unknown differences (that is, selection effects). How exactly to do this, though, I don’t know. On this subject, I’m all talk and no action.
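To make the cell-by-cell comparison concrete, here is a minimal sketch in Python. It is not a worked-out method, just one way the idea could look. Everything in it is an assumption for illustration: the record layout `(cell, in_cbo, outcome)`, the function names, the choice to weight cells by how many non-CBO respondents they contain, and the toy data. The second function does nothing clever about selection effects; it only shows how the estimate would move under a range of assumed biases, since the size of the true bias is unknown.

```python
from collections import defaultdict

def cell_adjusted_difference(records):
    """Compare CBO and non-CBO respondents within cells of background variables.

    records: iterable of (cell, in_cbo, outcome) tuples (hypothetical layout).
    Returns the within-cell difference in mean outcome, averaged over cells
    with weights proportional to the number of non-CBO respondents per cell.
    Cells that do not contain both groups are dropped.
    """
    # Per cell: [sum of CBO outcomes, n CBO, sum of non-CBO outcomes, n non-CBO]
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for cell, in_cbo, outcome in records:
        s = sums[cell]
        if in_cbo:
            s[0] += outcome
            s[1] += 1
        else:
            s[2] += outcome
            s[3] += 1

    usable = [s for s in sums.values() if s[1] > 0 and s[3] > 0]
    if not usable:
        raise ValueError("no cell contains both groups")

    total_ref = sum(s[3] for s in usable)
    return sum((s[0] / s[1] - s[2] / s[3]) * s[3] / total_ref for s in usable)

def sensitivity(estimate, assumed_biases):
    """Subtract each assumed selection bias from the estimate.

    The biases are guesses supplied by the analyst, not quantities
    estimated from the data.
    """
    return {b: estimate - b for b in assumed_biases}

# Toy data, made up for illustration: cells are (age group, education).
toy = [
    (("18-34", "college"), True, 1), (("18-34", "college"), True, 1),
    (("18-34", "college"), False, 0), (("18-34", "college"), False, 1),
    (("35+", "no college"), True, 1),
    (("35+", "no college"), False, 0), (("35+", "no college"), False, 0),
    (("35+", "no college"), False, 0), (("35+", "no college"), False, 1),
    (("35+", "college"), True, 1),  # no non-CBO match, so this cell is dropped
]

estimate = cell_adjusted_difference(toy)
for bias, adjusted in sensitivity(estimate, [0.0, 0.1, 0.2, 0.3]).items():
    print(f"assumed selection bias {bias:.1f}: adjusted difference {adjusted:.3f}")
```

If plausible values of the assumed bias are enough to wipe out or reverse the difference, that is a sign the comparison cannot support much of a conclusion.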

My more constructive suggestion would be to talk with Jennifer or, since you’re at UCLA, with Sander Greenland in the epidemiology department. This sort of thing is right up his alley.

4 thoughts on “Comparing people from two surveys, one of which is a simple random sample and one of which is not”

  1. “Specifically, I am working on looking at the civic engagement of the adults in both studies”

    Is belonging to a Community Based Organization an example of “civic engagement?”

  2. Juli: I believe the hazards (of being misled) given the costs (involved to directly address all the problems) are just not worth it – if a less wrong study (study #2 is a problem) can be done.

    I think the reason Sander Greenland (and other epidemiologists) work so hard to get something out of epi studies is that it’s not clear how a less wrong study can be done (you can’t get randomized comparisons or random samples of subjects). That hard work is absolutely required to identify which less wrong studies are most helpful and can be done (based on assessed size and direction of selection, misclassification and confounding biases), and things stop only when you can’t do this or have to make a decision.

    So my guess is that second study might help you suggest a good next study to do – but will likely just sour any value in the first study if you try to mix them together (problem of combining sweet and sour apples).

    (A paragraph on some formalization of this “Combining Biased and Unbiased Estimates in a Meta-Analysis” at http://www.ssc.ca/archive/main/meetings/Program2000_e.pdf )
