How about the following story:

Mister P established his fame by winning a series of statistical duels. He was honored with a baronetcy by the Royal (Statistical) Society, after which he became Baronet P, or Bart. P.

https://arxiv.org/abs/1805.08233

As you can probably tell, a lot of the themes in the writeup come from this blog.

MrP:

The way I framed it is that, in small area estimation applications, you typically follow a three-step process:

1) Calibration 2) MLM prediction 3) Benchmarking

I presented MrP as sitting somewhere between 2 and 3, say 2.5.

I think it’s a viable option to do 1, 2.5, and 3. If you have individual units i, strata cells s, ‘small groups’ c, and larger groups L, then you can fit a regularized MLM prediction for each cell s and post-stratify the cells into the small groups c. This results in c-many MrP predictions. Then you can optionally benchmark the behavior of the c-many MrP predictions at the larger groupings L. I think this would need to respect the nesting of the various resolutions.
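To make steps 2.5 and 3 concrete, here is a minimal sketch in Python. It assumes the cell-level predictions have already come out of some fitted multilevel model, and it uses simple ratio benchmarking as one possible choice for step 3. All cell names, population counts, predictions, and the target value are made-up toy numbers, and the function names are hypothetical.

```python
from collections import defaultdict

def poststratify(cell_pred, cell_pop, cell_to_group):
    """Population-weighted mean of cell predictions within each small group c."""
    num, den = defaultdict(float), defaultdict(float)
    for s, pred in cell_pred.items():
        c = cell_to_group[s]
        num[c] += cell_pop[s] * pred
        den[c] += cell_pop[s]
    estimates = {c: num[c] / den[c] for c in num}
    return estimates, dict(den)

def ratio_benchmark(group_est, group_pop, group_to_large, large_target):
    """Rescale group estimates so that, within each larger group L, their
    population-weighted mean equals the external target for L."""
    num, den = defaultdict(float), defaultdict(float)
    for c, est in group_est.items():
        big = group_to_large[c]
        num[big] += group_pop[c] * est
        den[big] += group_pop[c]
    factor = {big: large_target[big] / (num[big] / den[big]) for big in num}
    return {c: est * factor[group_to_large[c]] for c, est in group_est.items()}

# Toy inputs (all values are illustrative assumptions, not real data).
cell_pred = {"s1": 0.4, "s2": 0.6, "s3": 0.5, "s4": 0.7}   # MLM predictions per cell
cell_pop = {"s1": 100, "s2": 300, "s3": 200, "s4": 200}    # population counts per cell
cell_to_group = {"s1": "c1", "s2": "c1", "s3": "c2", "s4": "c2"}
group_to_large = {"c1": "L1", "c2": "L1"}                  # c nested within L
large_target = {"L1": 0.60}                                # external benchmark for L1

mrp_est, group_pop = poststratify(cell_pred, cell_pop, cell_to_group)
benchmarked = ratio_benchmark(mrp_est, group_pop, group_to_large, large_target)
print(mrp_est)
print(benchmarked)
```

The nesting requirement shows up in the two lookup tables: each cell maps to exactly one small group, and each small group to exactly one larger group. Other benchmarking rules (for example, additive adjustments) would slot into the same place as `ratio_benchmark`.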

Composite estimators:

I would also be curious to hear about ‘composite’ predictions.

On this blog, I’ve seen the idea of folding previous predictions into a secondary MLM floated a few times. I feel it falls under the category of ‘composite’ estimators, which actually motivated SAE models.

Not a lost cause at all! It’s just best to do so using some prior information.

Obviously, inferences would be very tentative in such cases, but is it worth the effort of applying a fairly regularizing prior, say half-normal, on the SD? At least in my limited experience, these parameters seem pretty sensitive to specification (or just lead to a bunch of divergent transitions).

Yes, one of the advantages of doing a preregistered replication is that we had to make our code super-clear!

…

while some people use the term to mean regularizing between the outcome MLE (LHS term) and a model (set of RHS terms),

as in the comment above, which focuses on the fused lasso for mean components, in contrast to the MLM structure that regularizes the outcome.

Of course the two concepts are related; it’s just helpful to think about which aspect you’re regularizing.

I’m about to post the summary review writeup to arXiv or someplace, and people’s discussion of this would be great.

See this paper and this one for examples of MRP with many factors.

If we are investigating “likelihood of voting Dem based on voter’s income,” and we also have information about the voter’s state, it’s easy for me to state a model where the income coefficients for {Kansas, Nebraska, Wisconsin, …} voters can all be shrunk towards each other. I can adjust my model hyperparameters to require more shrinkage or allow less shrinkage, checking which results seem to make the most sense given the data we have. It’s a great way to encode my belief that voter behavior probably isn’t *wildly* different, state-to-state.

But if I have state, income, and education, I get tripped up. I’d like to have the coefficient {middle-income, some college, Nebraska} be shrunk towards {mid-income, some college, Kansas}. But I also want to shrink {mid-income, some coll, NE} towards {high-inc, some coll, NE}. I’ve never figured out how to express that in a multilevel model. I believe that rich, no-college Nebraskans behave a lot like rich, no-college Kentuckians, but also that rich, no-college Nebraskans behave a lot like rich, college-degree Nebraskans.

To think about it in this post’s terms — that the multilevel aspect is a special case of regularization — maybe what I’m after is to add a bunch of “fused Lasso” regularizers, like:

min_theta loss(y, x, theta) +
    lambda_{same state} * |theta_{rich, college, NE} - theta_{low, college, NE}| +
    lambda_{same income} * |theta_{rich, college, NE} - theta_{rich, college, KY}| +
    ...

But I haven’t been able to express that as a graphical model. Are there any tried-and-true examples of this kind of “partially pool across several dimensions” I should look at?
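For concreteness, the penalty term above can be sketched in a few lines of Python. This is only an illustration of the “partially pool across several dimensions” idea, not an established recipe: it penalizes every pair of cells that differ in exactly one factor, with one weight per factor that differs (so the weight keyed `"state"` plays the role of lambda_{same income} above, since those pairs share income and education). The coefficient values and weights are made up.

```python
from itertools import combinations

FACTORS = ("income", "education", "state")

def fused_penalty(theta, lam):
    """Sum of lam[f] * |theta_a - theta_b| over all pairs of cells (a, b)
    that differ in exactly one factor f and agree on the other two."""
    total = 0.0
    for a, b in combinations(sorted(theta), 2):
        differing = [k for k in range(len(FACTORS)) if a[k] != b[k]]
        if len(differing) == 1:
            total += lam[FACTORS[differing[0]]] * abs(theta[a] - theta[b])
    return total

# Toy coefficients keyed by (income, education, state); values are illustrative.
theta = {
    ("rich", "college", "NE"): 0.30,
    ("low", "college", "NE"): 0.10,
    ("rich", "college", "KY"): 0.25,
    ("low", "college", "KY"): 0.05,
}
# One shrinkage weight per factor that is allowed to differ within a pair.
lam = {"income": 1.0, "education": 1.0, "state": 2.0}

penalty = fused_penalty(theta, lam)
print(penalty)
```

For ordered factors such as income, a classic fused lasso would penalize only adjacent levels rather than all pairs; that is a one-line change to the pair filter. The full objective would add this penalty to `loss(y, x, theta)` and hand the sum to an optimizer.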

Thanks much for this post!

(Also: We have a copy of this book sitting around my house and every time it catches my eye I think, “oh nice, Mr. P finally finished his degree.” https://smile.amazon.com/Drop-Ball-Achieving-More-Doing/dp/1250071739/ )

In MRP, researchers often use one macro variable, say Republican support at the state level. In the multilevel regression, you would simply add that macro variable to the model.

Now, you would still predict on the same 4000 rows of new data, but with one added column, Republican support, that takes 50 different values, each repeated 4*5*4=80 times in the prediction dataframe. Does that make sense?
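A small sketch of that prediction frame, in plain Python, may help. The three demographic factors are given hypothetical names (`factor_a`, `factor_b`, `factor_c`) with 4, 5, and 4 levels, since the comment does not say which variables they are, and the Republican-support values are placeholders rather than real data.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: 50 states and three demographic factors (4 x 5 x 4 levels).
states = [f"state_{i:02d}" for i in range(1, 51)]
factor_a = [f"a{i}" for i in range(1, 5)]
factor_b = [f"b{i}" for i in range(1, 6)]
factor_c = [f"c{i}" for i in range(1, 5)]

# Placeholder state-level macro variable: one made-up value per state.
rep_support = {s: round(0.30 + 0.008 * i, 3) for i, s in enumerate(states)}

# One row per poststratification cell; the macro column is constant within a state.
rows = [
    {"state": s, "factor_a": a, "factor_b": b, "factor_c": c,
     "rep_support": rep_support[s]}
    for s, a, b, c in product(states, factor_a, factor_b, factor_c)
]

rows_per_state = Counter(r["state"] for r in rows)
print(len(rows), rows_per_state["state_01"])
```

The model then predicts on `rows` exactly as before; the only change is the extra column.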

1. Yes, in the past we’ve been too casual about using noninformative prior distributions for the group-level scale parameters. I think it makes more sense to use informative priors. The same information that suggests using certain factors as predictors should also give us a sense of how much their coefficients should vary. Also we can think of all of this as an approximation to fitting a large model to data from several years, in which case there would be lots more information on the hyperparameters based on the variation in earlier years. The whole fit-each-dataset-from-scratch thing doesn’t generally make sense, and this is a point that Jennifer and I weren’t so clear on in our earlier book. One advantage of Stan and rstanarm is that it’s really easy to add informative priors.

2. One could do spatial correlations of states. It always seemed clearer to me to just include relevant state-level predictors and groupings.

This post also reminded me to ask you if you’d ever fit the 50 states with a spatial model? We could use some kind of GP or ICAR prior for it rather than defining regional groupings. (I was never sure how you came up with the regional groupings; that seems like another modeling degree of freedom that’s combinatorially intractable.)
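As a rough illustration of what an ICAR prior would replace the regional groupings with, here is a minimal Python sketch of the unnormalized ICAR log density in its pairwise-difference form, with unit scale. The four-node chain graph and the effect values are toy assumptions, not a real state adjacency structure.

```python
def icar_logdensity(phi, edges):
    """Unnormalized ICAR log density with unit scale:
    -0.5 * sum over neighboring pairs (i, j) of (phi[i] - phi[j])**2."""
    return -0.5 * sum((phi[i] - phi[j]) ** 2 for i, j in edges)

# Toy adjacency: four hypothetical areas in a chain 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
# Toy spatial effects, chosen to sum to zero.
phi = [0.1, 0.2, 0.0, -0.3]

logp = icar_logdensity(phi, edges)
shifted = icar_logdensity([p + 5.0 for p in phi], edges)
print(logp, shifted)
```

The density only sees differences between neighbors, so adding a constant to every effect leaves it unchanged. That is why ICAR effects are usually paired with a sum-to-zero constraint and a separate intercept. Here the only structural input is the neighbor list, in place of a hand-picked set of regions.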
