Elliott Morris writes:

– I want to run an MRP model predicting 4 categories of response options to a question about gun control (multinomial logit)

– I want to control for demographics in the standard hierarchical way (MRP)

– I want the coefficients to evolve in a random walk over time, as I have data from multiple weeks (dynamic)

Do you know of any example Stan code that does this? So far I have been accomplishing this by interacting linear and quadratic terms for time with all my demographic controls, but that seems like a hack given what we could do with a Gaussian process. The folks who made the dgo package for R seem to have figured this out, but not for multinomial models!

My reply:

I’d recommend a linear model rather than multinomial logit. With 4 categories, I think the linear model should work just fine, and then you can focus on the more important parts of the model rather than having the link function be a hangup. You can always go back later and look into the discreteness of the data if you’d like.

Also, if you’re modeling attitudes about gun control, think hard about what state-level predictors to include. My colleagues and I thought about this a bunch of years ago when doing MRP for gun-control attitudes. Two natural state-level predictors are Republican vote share and percent rural. These variables are also highly correlated. But look at Vermont: it’s one of the most Democratic states and also the most rural. Vermont also is a small state, so the MRP inference for Vermont will depend strongly on the fitted model, which in turn will depend strongly on the coefficients for R vote and %rural. I think you’ll need some strong priors here to get a stable answer. Default flat priors could mess you up. You might not realize the problem when fitting to one particular dataset, but you’ll be getting a really noisy answer.
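The instability from two highly correlated state-level predictors can be seen in a small simulation. This is an illustrative Python sketch (not from the original analysis; the data and penalty value are made up): with near-collinear predictors, unpenalized least squares — the analogue of a flat prior — gives poorly pinned-down coefficients, while even a modest ridge penalty, which acts like an informative normal prior, stabilizes them.

```python
import random

random.seed(1)

# Simulate two highly correlated "state-level" predictors, standing in for
# Republican vote share and percent rural (made-up data, not real states).
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.05) for a in x1]          # near-collinear copy
y  = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ beta = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - c * e) / det)

def fit(lam):
    """Least squares on (x1, x2); lam > 0 adds a ridge penalty
    (equivalent to a zero-centered normal prior on the coefficients)."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    sy1 = sum(a * c for a, c in zip(x1, y))
    sy2 = sum(b * c for b, c in zip(x2, y))
    return solve_2x2(s11, s12, s12, s22, sy1, sy2)

ols   = fit(0.0)    # flat prior: individual coefficients barely identified
ridge = fit(10.0)   # informative prior: stable, similar-sized coefficients

print("OLS:  ", ols)
print("Ridge:", ridge)
```

Only the sum of the two coefficients is well identified by the data; the prior is what keeps the individual coefficients — and hence the prediction for an unusual state like Vermont — from bouncing around.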

**P.S.** These issues would arise for any survey estimation procedure, not just MRP. MRP just makes the whole thing more transparent.

**P.P.S.** More here.

I doubt a random walk on the coefficients is the best idea, since it will tend to put all of the probability on one of the extreme answer categories (and a linear model will put positive probability on values much less than 1 or greater than 4). But AR(1) like in

https://arxiv.org/abs/1908.06716

https://github.com/alexgao09/structuredpriorsmrp_public (under simulation/proposedarN01_v3.stan)

could work. That Stan code uses a centered AR(1) parameterization, which I have always found to sample worse than the non-centered one. So, a non-centered AR(1) would be something like this in the transformed parameters block:

U_age[1] = sigma_age * epsilon_age[1] / sqrt(1 - square(rho));

for (j in 2:N_groups_age) U_age[j] = rho * U_age[j - 1] + sigma_age * epsilon_age[j];

where epsilon_age is declared as a vector of size N_groups_age in the parameters block and you have

epsilon_age ~ std_normal();

in the model block. You might also want to have an intercept in the AR(1) process, which is best parameterized as mu * (1 - rho).
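As a sanity check on the non-centered parameterization above, here is a direct Python transcription of that transform (illustrative only; the values of rho, sigma, and the epsilon draws are made up): it maps iid standard-normal draws to an AR(1) series whose first element is given the stationary marginal scale sigma / sqrt(1 - rho^2).

```python
import math
import random

random.seed(0)

def ar1_noncentered(epsilon, rho, sigma):
    """Map iid standard-normal draws to a stationary AR(1) series,
    mirroring the Stan transformed-parameters block above."""
    u = [0.0] * len(epsilon)
    # First element gets the stationary marginal scale sigma / sqrt(1 - rho^2).
    u[0] = sigma * epsilon[0] / math.sqrt(1 - rho ** 2)
    # Remaining elements follow the AR(1) recursion.
    for j in range(1, len(epsilon)):
        u[j] = rho * u[j - 1] + sigma * epsilon[j]
    return u

# Toy example with made-up hyperparameters.
rho, sigma = 0.8, 0.5
epsilon = [random.gauss(0, 1) for _ in range(6)]
print(ar1_noncentered(epsilon, rho, sigma))
```

The sampler then only ever sees the standard-normal epsilon vector, which is what makes the non-centered version easier to explore than the centered one.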

Ben,

Thanks for the comment. This is quite helpful, especially the example Stan code.

How would you approach a multinomial logit version of this? I.e., predicting the individual response categories as discrete options, rather than a 1–4 linear scale?

I think Rob Trangucci has some multinomial logit Stan code. Or maybe it was Jim Savage. Anyway, I don’t remember it being any different from the textbook log-likelihood function for a multinomial logit model. You could use the mlogit package in R to set up the design matrix.
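For reference, that textbook log-likelihood is just a softmax over the category-specific linear predictors followed by a categorical log density. A minimal Python sketch (illustrative only — this is not Rob Trangucci's or Jim Savage's code, and the linear predictors are made up):

```python
import math

def softmax(scores):
    """Convert K linear predictors into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def multinomial_logit_loglik(linear_predictors, outcomes):
    """Sum over respondents of log P(y_i = k_i), where the probabilities
    come from a softmax over respondent i's K linear predictors.
    Outcomes are 0-based category indices."""
    total = 0.0
    for scores, k in zip(linear_predictors, outcomes):
        total += math.log(softmax(scores)[k])
    return total

# Toy example: 2 respondents, 4 answer categories (made-up linear predictors).
eta = [[0.2, -0.1, 0.5, 0.0],
       [1.0, 0.3, -0.2, 0.1]]
y = [2, 0]
print(multinomial_logit_loglik(eta, y))
```

In Stan the same thing is a one-liner with categorical_logit; for identifiability one category's linear predictor is conventionally pinned to zero.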

Wouldn’t ordinal statistics be the way to go? A linear model assumes that 4 = 3 + 1 or 4 = 2 * 2, etc., which is not the case for these types of data.

Yes, I’m wondering about this particularly from the point of view of the prediction/imputation step that MRP requires.

In practice I don’t think it will make a difference. The advantage of using a linear model is that it allows you to focus on the more important aspects of the model.

I actually think a lot can go wrong assuming the ordinal data is metric. See this paper by Kruschke and Liddell, for example: https://www.sciencedirect.com/science/article/abs/pii/S0022103117307746

If the goal is NHST, then I doubt reviewers allow the linear model. They’ll demand multinomial logit, or whatever they perceive the state of the art to be.

Granted, I don’t do MRP and I’m not even a political scientist, but I’d like to ask about this:

> I think you’ll need some strong priors here to get a stable answer.

If you use a strong prior then the stability of the answer is somewhat an illusion since you are propagating your belief. I don’t have a problem with that and I’m comfortable with the idea that objective claims don’t really exist.

My concern is whether it becomes easy to fool yourself (and others) into thinking that results are more precise than what the data really support. In theory, one can inspect the code and equations and know *exactly* what prior has been used. But in practice that may require so much expertise that in complex situations not even the author could understand what’s happening. Maybe I’m more comfortable presenting a noisy estimate and advise relying on the prior for further decisions. In other words, if I see a Bayesian estimate, how do I know to what extent it is supported by the data as opposed to a strong prior?

Unless you just completed the research in the post above, you are only using your prior to come to your conclusion. I don’t think that alone undermines your reasoned analysis, just as I don’t see why using prior information that is true or reasonable but not in the data would undermine an analysis either.

Dario:

The analysis should be transparent. In particular, for all inferences one can present the posterior (that’s the usual summary) and also the prior, and see how things changed with addition of the data. One can also try inferences under different models. Just speaking generally, more research needs to be done on methods for understanding and communicating inferences and sensitivity to assumptions.

Andrew:

Agree, but then as a final step something that more directly reflects underlying scales?

Keith:

Yeah, sure, at some point it would be good to go back in and model the discrete data. It’s hardly the first priority, but, eventually, yes.

In this case, the role of the priors is to avoid being fooled into a conclusion that is more precise than the data can support. As Andrew says a bit later, the paucity of data in this situation (or rather, the granularity with which the data are being modeled) means that without any priors “you’ll be getting a really noisy answer”.

What this means is that the parameter estimates without a prior will converge strongly on those that best fit the limited data at hand. This apparent certainty is the illusion that strong priors will dispel. So when you say “the stability of the answer is somewhat an illusion”, this statement applies even better to the estimate without priors.

Somewhat ironically, the role of the Bayesian prior in this case is the same as that of the null hypothesis, something you are willing to believe until the data force you to do otherwise. But that means it is also incumbent on the modeler to define their priors in a precise and reasonable way, just as it would be for a null hypothesis.

You sort of answer your own question in the end when you say, “I’m more comfortable presenting a noisy estimate and advise relying on the prior for further decisions”. If you want to convince someone to trust your prior (null hypothesis) rather than your noisy data, the best way to do this is to show the noisy estimate then show the estimate from the model with the prior. This shows that the data don’t give you any reason to doubt that particular prior belief, and so you should stick with it instead.

Off topic, but relevant to

>But look at Vermont: it’s one of the most Democratic states and also the most rural.

which echoes Slovenia in your other “capitalist countries never fight” example.

The general framework is causal inference with multiple causes. From the perspective of survey sampling, small-area estimation, or minimax prediction (the Vermont example), Vermont is an outlier that results in inference instability or large variance. From the perspective of causal estimation (the capitalist example), these overlapping regions are so precious that they grant causal identifiability.

Yuling:

It’s Serbia, Croatia, and Bosnia! But, yes, that’s the idea.

Presumably the responses are logically ordered, so the OP meant “ordinal” not “multinomial,” but if he really meant multinomial, you definitely don’t want to predict the outcome as linear!

Dl:

For reasons indicated above, I disagree; I do recommend predicting the outcome as linear.

But not if the responses are not logically ordered, right? Suppose your outcome is a respondent’s favorite color (1: red, 2: blue, 3: yellow) — you don’t want to model that as linear! Implicitly, here, the outcomes *are* ordered, but I don’t think the post itself ever comes out and says it (other than the title).

Dl:

Agreed. In this case the responses are ordered. I’d misunderstood your comment above.