When it rains it pours . . .
John Transue writes:
I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties.
However, the paper (see here) includes a pretty interesting methods section. This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations).”
They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates.
Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests?
I haven’t had a chance to look in detail, but it sounds like they’re on the right track. I like that they cross-validated; that’s what we did to check that our county-level radon estimates were ok.
Regarding your question about the small county problem: no matter what you do, all maps of parameter estimates are misleading. A map shows one number per county, and even the best point estimate can’t convey the uncertainty around it. As noted above, cross-validation (at the level of the county, not of the individual observation) is a good way to keep checking.
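To make the "at the level of the county" point concrete, here is a minimal sketch in Python. It is not the paper's model (theirs is a mixed-effects Poisson regression with time and geospatial components); it is a plain Poisson regression with a single made-up covariate, fit to simulated county-year death counts. Every name and number in it (`county_folds`, `fit_poisson`, the populations, the rates) is an assumption for illustration. The only thing it is meant to show is that folds are assigned to whole counties, so every row from a held-out county is predicted without that county's own data.

```python
import numpy as np

def county_folds(county_id, n_folds, seed=0):
    """Assign every row of a county to the same fold (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    counties, inverse = np.unique(county_id, return_inverse=True)
    fold_of_county = rng.permutation(len(counties)) % n_folds
    return fold_of_county[inverse]

def fit_poisson(X, y, offset, n_iter=50):
    """Log-link Poisson regression by Newton's method.
    Assumes the first column of X is the intercept."""
    beta = np.zeros(X.shape[1])
    # Start the intercept at the log of the pooled rate for stability.
    beta[0] = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def poisson_deviance(y, mu):
    """Poisson deviance; rows with y == 0 contribute 2 * mu."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def county_cv_deviance(X, y, offset, county_id, n_folds=5, seed=0):
    """Mean held-out deviance per row, holding out whole counties."""
    folds = county_folds(county_id, n_folds, seed)
    total = 0.0
    for k in range(n_folds):
        test = folds == k
        beta = fit_poisson(X[~test], y[~test], offset[~test])
        mu = np.exp(X[test] @ beta + offset[test])
        total += poisson_deviance(y[test], mu)
    return total / len(y)

# Simulated data (all values are assumptions): 200 counties, 5 years each.
rng = np.random.default_rng(1)
n_counties, n_years = 200, 5
county_id = np.repeat(np.arange(n_counties), n_years)
population = np.repeat(rng.lognormal(9.0, 1.0, n_counties), n_years)
x = np.repeat(rng.normal(size=n_counties), n_years)  # county-level covariate
county_effect = np.repeat(rng.normal(0.0, 0.1, n_counties), n_years)
rate = np.exp(np.log(0.008) + 0.3 * x + county_effect)
deaths = rng.poisson(population * rate)

offset = np.log(population)
X_null = np.ones((len(deaths), 1))                   # intercept only
X_cov = np.column_stack([np.ones(len(deaths)), x])   # intercept + covariate

cv_null = county_cv_deviance(X_null, deaths, offset, county_id)
cv_cov = county_cv_deviance(X_cov, deaths, offset, county_id)
print("held-out deviance per county-year, intercept only:", cv_null)
print("held-out deviance per county-year, with covariate:", cv_cov)
```

If you instead split the county-year rows at random, a held-out row's siblings from the same county would stay in the training set. For a model with county-level effects, that can make the predictions look better than they would for a county the model has never seen.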