
Should we be concerned about MRP estimates being used in later analyses? Maybe. I recommend checking with fake-data simulation.

Someone sent in a question (see below). I asked if I could post the question and my reply on the blog, and the person responded:

Absolutely, but please withhold my name because this is becoming a touchy issue within my department.

The boldface was in the original.

I get this a lot. There seems to be a lot of fear out there when it comes to questioning established procedures.

Anyway, here’s the question that the person sent in:

CDC has recently been using your multilevel estimation with post-stratification method to produce county, city, and census tract-level disease prevalence estimates. The data source is the annual phone-based Behavioral Risk Factor Surveillance System (n=450k). CDC is not transparent about the covariates included in the models used to construct the estimates, but as I understand it they are mostly driven by national individual-level associations between sociodemographic factors and disease prevalence. Presumably, the random effects would not influence a unit’s estimated prevalence much if the sample size from that unit is small (as is true for most cities/counties, and for many census tracts the sample size is zero).

I am wondering if you are as troubled as I am by how these estimates are being used. First, websites like County Health Rankings and City Health Dashboard are providing these estimates to the public without any disclaimer that they are not based on random samples of cities/counties/tracts and may not reflect reality. Second, and more problematically, researchers are starting to conduct ecologic studies that analyze the association between, for example, census tract socioeconomic composition and obesity prevalence. (It seems quite likely that such a study is actually just recovering the individual-level association between income and obesity that was used to produce the estimates.)

I’ve now become involved in a couple of projects that are trying to analyze these estimates so it seems as though their use will increase over time. The only disclaimer that CDC provides is that the estimates shouldn’t be used to evaluate policy.

Are you more confident about the use of these estimates than I am? I am also wondering if CDC should be more explicit in disclosing their limitations to prevent misuse.

My reply:

Wow, N = 450K. That’s quite a survey. (I know my correspondent called it “n,” but when it’s this big, I think the capital letter is warranted.) And here’s the page where they mention Mister P! And they have a web interface.

I’m not quite sure why you say the website provides the estimate “without any disclaimer.” Here’s one of the displays:

It’s not the prettiest graph in the world—I’ll grant you that—but it’s clearly labeled “Model-based estimates” right at the top.

I agree with you, though, in your concern that if these model-based estimates are being used in later analyses, there’s a risk of reification, in which county- or city-level predictors used in the model can automatically look like good predictors of the outcomes. I’d guess this would be more of a concern with rare conditions than with something like coronary heart disease, where the sample size will be (unfortunately) so large.

The right thing to do next, I think, is some fake-data simulation to see how much of a concern this should be. CDC has already done some checking (from their methodology page: “CDC’s internal and external validation studies confirm the strong consistency between MRP model-based SAEs and direct BRFSS survey estimates at both state and county levels.”), and I guess you could do more.
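To give the flavor of such a fake-data check, here is a stylized sketch (not the CDC’s actual model; all numbers are invented): simulate counties where a predictor z truly explains only part of the variation in prevalence, form partially pooled estimates that lean on a regression on z, and then compare how well z appears to “explain” the estimates versus the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
J = 500                                   # counties
n = rng.integers(5, 30, size=J)           # small county sample sizes
z = rng.normal(size=J)                    # county-level predictor in the model

# truth: prevalence depends on z plus county-level noise the model can't see
truth = np.clip(0.2 + 0.03 * z + rng.normal(0, 0.05, size=J), 0.01, 0.99)

# raw survey estimates: binomial sampling within each county
raw = rng.binomial(n, truth) / n

# stylized MRP-like step: partial pooling toward a regression on z
fit = np.polyval(np.polyfit(z, raw, 1), z)
pbar = raw.mean()
se2 = pbar * (1 - pbar) / n                          # approx. sampling variance
tau2 = max(np.var(raw - fit) - se2.mean(), 1e-4)     # between-county variance
w = tau2 / (tau2 + se2)                              # shrinkage weights
est = w * raw + (1 - w) * fit

def r2(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

print(r2(truth, z), r2(est, z))
```

The shrinkage manufactures the extra R-squared: the estimates are partly built from z, which is exactly the reification risk in the downstream ecologic regressions.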

Overall, I’m positively inclined toward these MRP estimates because I’d guess they’re much better than the alternatives, such as raw or weighted local averages or some sort of postprocessed analysis of weighted averages. I think those approaches would have lots more problems.

In any case, it’s cool to see my method being used by people who’ve never met me! Mister P is all grown up.
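For readers who haven’t seen the mechanics, the poststratification step in MRP is simple: once the model produces cell-level prevalence estimates (or posterior draws of them), the small-area estimate is just a population-weighted average over the cells. A toy sketch with entirely made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical area with 12 cells (e.g., 2 races x 2 genders x 3 age groups)
S, C = 1000, 12
cell_draws = rng.beta(2, 8, size=(S, C))  # stand-in for posterior draws of cell prevalence
pop = rng.integers(500, 5000, size=C)     # made-up census counts per cell
w = pop / pop.sum()                       # population shares

# poststratification: population-weighted average, applied draw by draw
area_draws = cell_draws @ w
print(area_draws.mean(), area_draws.std())
```

Applying the weighted average once per posterior draw, rather than to a single point estimate, is what carries the model’s uncertainty through to the final small-area estimate.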

P.S. My correspondent provides further background:

The CDC generates prevalence estimates for various diseases at the county level (or smaller) by applying MRP to the national Behavioral Risk Factor Surveillance System. Unlike for other diseases, they’ve documented their methods for diabetes. Their model defines 12 population strata per county (2 races x 2 genders x 3 age groups) and incorporates random effects for stratum, county, and state. There are no other variables at any level in the model.

A number of papers use the MRP-derived data to estimate associations between, for example, PM2.5 and diabetes prevalence. Do you think this is a valid approach? Would it be valid if all of the MRP covariates are included in the model?

My response:

1. Regarding the MRP model, it is what it is. Including more demographic factors is better, but adjusting for these 12 cells per county is better than not adjusting, I’d think. One thing I do recommend is to use group-level predictors. In this case, the group is county, and lots of county-level predictors will be available that will be relevant for predicting health outcomes.

2. Regarding the postprocessing using the MRP estimates: Sure, it should be better to fold the two models together, but the two-stage approach (first use MRP to estimate prevalences, then fit another model) could work ok too, with some loss of efficiency. Again, I’d recommend using fake-data simulation to estimate the statistical properties of this approach for the problem at hand.


  1. Daniel H. says:

    I’m wondering: if I want to pass estimate uncertainty to a second model, is there a pragmatic way to do this? Like passing not only the mean estimate but two points, mean+se and mean-se?
    Any suggestions here?

    • Björn says:

      Is that not kind of the same as doing Bayesian inference after multiple imputation? If so, I believe there’s a decent amount of explanation of how to do that (see, e.g., BDA).

    • The best thing to do is build a joint model with both the first model with MRP and the second model depending on the first. With MCMC, for each draw, you compute MRP estimates which can be used as, say, predictors in another model. In Stan, that MRP computation would happen in the transformed parameters or model block, depending on whether or not you want to save the posterior for the MRP.

      As for Björn’s comment, using multiple imputation is kind of like cut in BUGS—the information doesn’t properly flow from the second model back to the first. Sometimes people want to do this, as in PK/PD models where the PK model is very well calibrated but the PD model is poorly specified. If you build a joint model, the PD model can use the degrees of freedom in the PK model to distort the PK posterior.

      • I think Daniel H’s question gets at a slightly different point, which is that often you’ll have as your *input data* some output from some other person’s analysis, whether that’s the MRP estimates from the Census or some other researcher or whatever.

        How can you use this MRP output, especially in the form of a point estimate, as input to your model while taking into account the fact that the MRP output isn’t uncertainty free?

        I think the answer is to build a measurement error model. An MRP output is basically no different from any other measurement. All measurements are based on models; even, say, a voltage measurement from a multimeter is based on knowledge of how the circuit works. And yes, if you can get the source to output either several values for their estimates or even just a point estimate and an rms error estimate (standard error), it can help you with this measurement error model quite a bit. Otherwise you might have to put a relatively wide Bayesian uncertainty on the size of the measurement error.
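A minimal sketch of the simplest version of this idea, assuming the source reports standard errors: method-of-moments deattenuation of a regression slope (made-up data; a full Bayesian measurement-error model would instead treat the true values as parameters with a prior).

```python
import numpy as np

rng = np.random.default_rng(2)
J = 1000
truth = rng.normal(0, 1, size=J)            # true county-level quantity
se = np.full(J, 0.5)                        # reported standard errors
x_obs = truth + rng.normal(0, se)           # point estimates we actually receive
y = 2.0 * truth + rng.normal(0, 1, size=J)  # outcome driven by the truth

naive = np.polyfit(x_obs, y, 1)[0]          # attenuated by measurement error
lam = 1 - np.mean(se**2) / np.var(x_obs)    # estimated reliability ratio
corrected = naive / lam
print(naive, corrected)
```

The naive slope is biased toward zero by the reliability ratio; dividing by the estimated ratio recovers the true slope, at the cost of wider effective uncertainty.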

        • If you don’t have access to the original model and data, then you can’t build a proper joint model and have to work out how best to factor the problem.

          With standard multiple imputation, you run each imputation independently and then combine all the outputs, which provides limited propagation of uncertainty (limited in that it still flows only one way).

          Creating a measurement model should be a better approach. It doesn’t give you direct joint modeling, but it does allow information to flow from the second model at least back into the measurement error model. I’d be very careful to do posterior predictive checks on the measurement errors—they may not turn out to be realistic for the known measurement procedure if they work out like the PK/PD case.

        • Keith O’Rourke says:

          If you are stuck with just the MRP output, it is interesting to think about how to get least stuck.

          You could do ABC on the MRP output, and in principle that should be the best you can do, as you only observed the MRP output and you are fully conditioning on just that observed (at step one). Unfortunately the MRP output is not a sufficient statistic, and you can’t condition on the second model’s output – but you are stuck.

    • Daniel H. says:

      Thanks everyone for the advice. I should probably specify a little more: I’ve read most of the ARM book and am able to fit models with lmer or stan_lmer, but writing full Stan models is something I’m not able to do yet. My background is engineering, and from my very-much-applied perspective, something like this sounds promising:
      A) Fit a multilevel model on test data (say, smartphone battery lifetimes for different battery types, usage modes, charging modes, and environments). In such a scenario, a multilevel model appears to be just the right thing.
      B) Maybe use poststratification, depending on the underlying question. From this perspective, that’s just a loop of simulations for a target population, so it’s very easy to implement.
      C) In most scenarios, interesting questions could be answered by analyzing the output from A or B. A politics-related example would be to compare the MRP estimates of different elections (say 2016 and 2018); the phone battery example above might be an investigation of the effect of battery chemistry modifications or whatever. This happens all the time manually as part of a normal discussion of results (see the recent “what really happened…” post on the 2018 election), but if I wanted to formalize this in a model, I’d definitely want the uncertainty to be there as well. For step A->B, simulations do that in a very user-friendly way.
      I could imagine that a simple, approximate solution might be helpful to a large user base (while making actual experts cringe), even as a first step of preparation for a joint model. Then again, I should probably just try fake-data simulations to get an idea of what might work… Also, I’ll read up on multiple imputation and measurement error models!
      Cheers, Daniel

  2. As for the plot, I’d propose putting a legend on it so we knew what the dots meant. Also, grouping 18+ year olds together for coronary disease seems like it’s going to be too tangled with population age distributions (in either totals or per capita measures).

    I couldn’t find that web interface on their page.

  3. Peter Ould says:

    I’m wondering if 12 cells per county is too low. Here in the UK, YouGov have pioneered “constituency level predictions” for elections using MRP where the sample per constituency is around 80 to 85 records. That seems robust (and their track record is very good, even predicting a dramatic swing in my own constituency of Canterbury).
