Hey! Here’s what to do when you have two or more surveys on the same population! (Combining survey data obtained using different modes of sampling)

Posted on August 12, 2025 9:37 AM by Andrew

This has come up before (in 2011 and in 2018) but someone just asked me again today so I thought I’d give this advice again, since I haven’t seen it written anywhere.

Here was the question:

I’m reaching out to seek your advice on how to integrate two probability samples for the new Poverty Tracker cohort.

In the Poverty Tracker 2024 cohort, we finally got the chance to include an Address-Based Sampling (ABS) sample. Our sample design has half of the sample coming from the Random Digit Dial frame and the other half recruited through Address-Based Sampling. I’m trying to map out how we’d be able to integrate the samples from the two frames but am having a hard time finding resources.

And here’s my reply:

The right thing to do is to simply pool the data together from both samples into a single dataset. Also in the dataset include an indicator that says, for each data point, which sample it came from. Then we do all analysis (including construction of weights, if we want to do that) using the combined dataset. When fitting models, running regressions, we include the indicator for which sample, just in case it is predictive of anything.
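
As a rough illustration of the pooling step, here is a minimal Python sketch. Everything in it is made up for the example: the toy records, the single adjustment variable `educ`, and the population shares are assumptions, and simple poststratification on one variable stands in for whatever weighting or modeling you would actually use.

```python
from collections import Counter

# Hypothetical toy records: (education category, binary outcome).
rdd_sample = [("college", 1), ("college", 0), ("no_college", 1)]
abs_sample = [("no_college", 0), ("no_college", 1), ("college", 1), ("no_college", 0)]

# Pool both samples into one dataset, keeping an indicator for the source.
pooled = (
    [{"source": "RDD", "educ": e, "y": y} for e, y in rdd_sample]
    + [{"source": "ABS", "educ": e, "y": y} for e, y in abs_sample]
)

# Assumed population shares for the poststratification cells (invented numbers).
population_share = {"college": 0.4, "no_college": 0.6}

# Construct weights on the pooled sample, ignoring any original design weights:
# weight = population share of the cell / pooled-sample share of the cell.
n = len(pooled)
cell_counts = Counter(r["educ"] for r in pooled)
for r in pooled:
    r["weight"] = population_share[r["educ"]] / (cell_counts[r["educ"]] / n)

total_weight = sum(r["weight"] for r in pooled)
weighted_mean = sum(r["weight"] * r["y"] for r in pooled) / total_weight

# Quick check of whether the source indicator seems to matter:
# compare the raw outcome mean within each source.
mean_by_source = {}
for src in ("RDD", "ABS"):
    ys = [r["y"] for r in pooled if r["source"] == src]
    mean_by_source[src] = sum(ys) / len(ys)
```

In a regression you would carry `source` along as a predictor in the same way; the by-source comparison here is only a crude stand-in for that.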

You can do something similar if you want to combine more than two samples; just include an indicator for each sample. And the same idea applies when combining raw data from multiple surveys (although then you might need to do some work to line up relevant poststratification variables, for example if the two surveys use different categories or different question wordings when asking about education or ethnicity or party identification or whatever).
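
Lining up the poststratification variables might look something like the following sketch. The two education codings, the common two-category scheme, and the function name `harmonize` are all hypothetical; real surveys will need their own mappings and some judgment about categories that do not line up cleanly.

```python
# Hypothetical codings of education in two surveys, mapped to a common scheme.
SURVEY_A_EDUC = {
    "HS or less": "no_college",
    "Some college": "no_college",
    "BA+": "college",
}
SURVEY_B_EDUC = {1: "no_college", 2: "no_college", 3: "college", 4: "college"}


def harmonize(records, mapping, source):
    """Recode educ to the common categories and tag each record with its source."""
    return [
        {"source": source, "educ": mapping[r["educ"]], "y": r["y"]}
        for r in records
    ]


# Invented raw records from each survey.
survey_a = [{"educ": "BA+", "y": 1}, {"educ": "HS or less", "y": 0}]
survey_b = [{"educ": 4, "y": 1}, {"educ": 1, "y": 1}, {"educ": 2, "y": 0}]

# After recoding, the surveys can be pooled with one indicator value per survey.
pooled = harmonize(survey_a, SURVEY_A_EDUC, "A") + harmonize(survey_b, SURVEY_B_EDUC, "B")
```

Using a plain dictionary lookup means an unmapped category raises a `KeyError` rather than slipping silently into the pooled data.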

7 thoughts on “Hey! Here’s what to do when you have two or more surveys on the same population! (Combining survey data obtained using different modes of sampling)”

Raphael Nishimura on August 12, 2025 at 10:12 am said:

Strictly from a sampling perspective (i.e., not accounting for differential nonresponse and measurement error across modes), this is a multiple frame survey estimation problem. There is a pretty large literature on that in survey sampling, with various approaches to integrating the samples. It’s almost 20 years old now, but Lohr and Rao in JASA (https://www.jstor.org/stable/27590779) is a great review paper on this topic.
At ANES this time, we integrated the various sample sources (FtF, Web and Panel) using a Hartley estimator minimizing the MSE of the overall vote choice estimate.

Dale Lehman on August 12, 2025 at 11:28 am said:

Please elaborate on “simply pool the data together from both samples into a single dataset.” If each survey has a complex stratified design with different weights, how are they “simply” pooled? What do you do regarding the weights from each design?

- Andrew on August 12, 2025 at 11:29 am said:
  
  Dale:
  
  I would pool the data together without weights and then construct weights on the pooled sample.
  
  - Dale Lehman on August 12, 2025 at 1:05 pm said:
    
    So, if the surveys each have complex but different designs, constructing the weights after pooling sounds somewhat complex to me. So, I am questioning your claim to “simply” pool the data – is it still simple in such cases?
    
    - Andrew on August 12, 2025 at 2:12 pm said:
      
      Dale:
      
      Sure, real life can get complicated! Even with just one survey and not two, if you have a complicated design it can be difficult to generalize to the population, and simple weighting will often not do the job.
  - Chris on August 12, 2025 at 1:25 pm said:
    
    “I would pool the data together without weights and then construct weights on the pooled sample.”
    
    being careful, of course, not to construct so many weights that the sample sinks to the bottom of the pool… :)
    
Jintao on December 4, 2025 at 3:18 pm said:

Hi Team,

If the respondents in the two surveys are sampled from one large population, but the two surveys have different sets of covariates depending on the system each was sampled from, what would be the best way to maximize the bias correction if I also want to include those system-specific covariates?

Sample 1 has A * B * C * D
Sample 2 has A * B * E * F


Statistical Modeling, Causal Inference, and Social Science
