Survey Statistics: exploded logit !

Posted on April 28, 2026 4:00 PM by shira

Two weeks ago we modeled vote choice with candidates C = {Left, Right, Other} as a multinomial logit:

$$P[\text{voter } i \text{ chooses candidate } c \text{ from } C] = \frac{\exp(f(X_{ic}))}{\sum_{c' \in C} \exp(f(X_{ic'}))}$$
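Here is a minimal Python sketch of this formula for one voter. It assumes the values of f(X_ic) are already computed. The function name `mnl_probs` and the numbers are hypothetical and chosen only for illustration.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit probabilities for one voter.

    `utilities` maps each candidate c to the value f(X_ic); the numbers used
    below are hypothetical, since the post leaves f unspecified.
    """
    # Subtracting the maximum before exponentiating avoids overflow; the
    # constant cancels between numerator and denominator.
    m = max(utilities.values())
    weights = {c: math.exp(u - m) for c, u in utilities.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical values of f(X_ic) for one voter
utilities = {"Left": 0.5, "Right": 0.2, "Other": -1.0}
probs = mnl_probs(utilities)
print(probs)
```

The max-subtraction is only a numerical safeguard. It does not change the probabilities.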

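We saw this model implies independence from irrelevant alternatives (IIA). The odds of choosing Left over Right do not depend on whether Other, or any other alternative, is in the choice set:

$$\frac{P[i \text{ chooses Left from } C]}{P[i \text{ chooses Right from } C]} = \frac{\exp(f(X_{i\,\text{Left}}))}{\exp(f(X_{i\,\text{Right}}))} = \frac{P[i \text{ chooses Left from } \{\text{Left}, \text{Right}\}]}{P[i \text{ chooses Right from } \{\text{Left}, \text{Right}\}]}$$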
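A quick numerical check of IIA, again with hypothetical values of f(X_ic). The Left/Right odds come out the same whether or not Other is in the choice set. `mnl_probs` is a hypothetical name for the softmax in the multinomial logit formula above.

```python
import math


def mnl_probs(utilities):
    # Multinomial logit: softmax of the values f(X_ic) over the choice set
    total = sum(math.exp(u) for u in utilities.values())
    return {c: math.exp(u) / total for c, u in utilities.items()}


# Hypothetical values of f(X_ic) for one voter
full = {"Left": 0.5, "Right": 0.2, "Other": -1.0}
reduced = {c: full[c] for c in ("Left", "Right")}  # drop Other from C

p_full = mnl_probs(full)
p_reduced = mnl_probs(reduced)

ratio_full = p_full["Left"] / p_full["Right"]
ratio_reduced = p_reduced["Left"] / p_reduced["Right"]

# Both ratios equal exp(0.5 - 0.2): the denominators cancel in each case
print(ratio_full, ratio_reduced, math.exp(0.5 - 0.2))
```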

Another consequence of the multinomial logit model is a simple expression for ranked data:

$$P[i \text{ ranks Other, then Left, then Right}] = \frac{\exp(f(X_{i\,\text{Other}}))}{\sum_{c' \in C} \exp(f(X_{ic'}))} \cdot \frac{\exp(f(X_{i\,\text{Left}}))}{\exp(f(X_{i\,\text{Left}})) + \exp(f(X_{i\,\text{Right}}))}$$
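The next sketch computes the exploded logit for an arbitrary ranking, assuming the f(X_ic) values are given. `exploded_logit_prob` is a hypothetical helper and does not come from Train (2009). Each step is a multinomial logit choice among the alternatives not yet ranked, so the probabilities over all 3! = 6 rankings of C should sum to 1.

```python
import math
from itertools import permutations


def exploded_logit_prob(ranking, utilities):
    """Probability that a voter ranks alternatives in the order `ranking`.

    `utilities` maps each alternative c to f(X_ic) (hypothetical values).
    Each step is a multinomial logit choice of the next-ranked alternative
    among those not yet ranked; the step probabilities multiply.
    """
    remaining = list(utilities)  # start from the full choice set C
    prob = 1.0
    for c in ranking:
        denom = sum(math.exp(utilities[r]) for r in remaining)
        prob *= math.exp(utilities[c]) / denom
        remaining.remove(c)
    return prob


# Hypothetical values of f(X_ic) for one voter
utilities = {"Left": 0.5, "Right": 0.2, "Other": -1.0}
p = exploded_logit_prob(["Other", "Left", "Right"], utilities)

# Summed over all 3! = 6 full rankings, the probabilities should equal 1
total = sum(exploded_logit_prob(list(r), utilities) for r in permutations(utilities))
print(p, total)
```

Because `remaining` starts from all alternatives rather than from the ranking itself, the same function also scores partial (top-k) rankings.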

Train (2009) Chapter 7 calls this an exploded logit.

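To derive the exploded logit:

1) Train (2009) Chapter 3 explains that the multinomial logit model is equivalent to a latent-utility model. In that model, voter i chooses the candidate with the highest utility U_ic = f(X_ic) + ε_ic, where the errors ε_ic are independent standard Gumbel (extreme value type I) random variables.
2) Powell (2023)* notes that “The exponentials of the negated Gumbel random variables are Exponential random variables” and uses the memoryless property of the Exponential distribution to derive the exploded logit.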
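Spelled out, the memoryless argument runs roughly as follows. This is a sketch in the post's notation rather than a quote from Train (2009) or Powell (2023). It assumes the latent-utility form U_ic = f(X_ic) + ε_ic, with the ε_ic independent standard Gumbel. The symbols T_ic (the transformed utility) and λ_ic (its rate) are introduced here only for illustration.

$$
\begin{aligned}
P(\varepsilon_{ic} \le x) &= \exp(-e^{-x}) && \text{(standard Gumbel)} \\
P(e^{-\varepsilon_{ic}} > t) &= P(\varepsilon_{ic} < -\log t) = e^{-t}, \quad t > 0 && \Rightarrow\ e^{-\varepsilon_{ic}} \sim \text{Exponential}(1) \\
T_{ic} = e^{-U_{ic}} &= e^{-f(X_{ic})}\, e^{-\varepsilon_{ic}} \sim \text{Exponential}(\lambda_{ic}), && \lambda_{ic} = \exp(f(X_{ic}))
\end{aligned}
$$

A higher utility means a smaller T_ic. Ranking the candidates by utility is therefore the same as ordering independent Exponential “clocks” by when they ring. Clock c rings first with probability λ_ic / Σ_{c'∈C} λ_ic'. By memorylessness, once the first clock has rung, the waiting times of the remaining clocks start over as independent Exponentials with the same rates. The next position is therefore again a multinomial logit among the candidates still unranked:

$$
P[i \text{ ranks Other, then Left, then Right}] = \frac{\lambda_{i\,\text{Other}}}{\sum_{c' \in C} \lambda_{ic'}} \cdot \frac{\lambda_{i\,\text{Left}}}{\lambda_{i\,\text{Left}} + \lambda_{i\,\text{Right}}}
$$

Substituting λ_ic = exp(f(X_ic)) recovers the exploded logit expression.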

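The exploded logit form implies that a ranking of 3 alternatives can be expressed as 2 pseudo-observations, each an ordinary multinomial logit choice. For the ranking Other, then Left, then Right, these are: 1) choosing Other from C, and 2) choosing Left from {Left, Right}. The last-ranked alternative, Right, is “chosen” from {Right} with probability 1, so it adds no third pseudo-observation.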
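One way to build these pseudo-observations from raw rankings is sketched below. `explode_ranking` is a hypothetical helper that returns (choice set, chosen) pairs and drops the final single-alternative step.

```python
def explode_ranking(ranking, alternatives):
    """Split one ranking into (choice set, chosen) pseudo-observations.

    Hypothetical helper: step k records the k-th ranked alternative as a
    choice from the alternatives not yet ranked. A choice from a single
    remaining alternative has probability 1, so that step is dropped.
    """
    remaining = list(alternatives)
    pseudo_obs = []
    for chosen in ranking:
        if len(remaining) == 1:
            break  # only one alternative left: no information
        pseudo_obs.append((tuple(remaining), chosen))
        remaining.remove(chosen)
    return pseudo_obs


C = ["Left", "Right", "Other"]
for choice_set, chosen in explode_ranking(["Other", "Left", "Right"], C):
    print(chosen, "chosen from", choice_set)
# Other chosen from ('Left', 'Right', 'Other')
# Left chosen from ('Left', 'Right')
```

Stacked across voters, these rows form an ordinary choice data set. If your logit software accepts a different choice set on each row, fitting it maximizes the exploded logit likelihood, because that likelihood is the product of the pseudo-observation probabilities.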

* I got Powell (2023) from White Rose Research, not to be confused with Blue Rose Research, where I work. The paper’s subtitle “why endurance is better than speed” caught my eye. They study competitions like Backyard Ultras, where the goal is to outlast your competition.

8 thoughts on “Survey Statistics: exploded logit !”

Anonymous on April 28, 2026 at 5:17 PM said:

Who is Shira?

- Andrew on April 28, 2026 at 10:45 PM said:
  
  Here’s Shira.
  
  - shira on April 29, 2026 at 2:32 PM said:
    
    Thanks, Andrew ! Hi Anonymous !
    
Zach on April 29, 2026 at 2:28 PM said:

Michael Betancourt just released a massive discrete choice modeling chapter on his Patreon. I think it’ll be paywalled for a while, but it’s worth a read if you’re willing to subscribe! https://www.patreon.com/posts/new-discrete-154918610

- shira on April 29, 2026 at 2:32 PM said:
  
  Oh, thank you for the reminder to take a look, Zach ! I’m already a long-time subscriber to Michael’s Patreon materials.
  
Andrew on May 4, 2026 at 2:29 PM said:

Shira:

I’ve used and recommended these sorts of models for a long time. I’ve never heard it called an “exploded” model. It’s not a bad term. “Hierarchical” or “nested” or “tree” models would also make sense, but they could be confused with existing models with those names.

- shira on May 20, 2026 at 5:09 PM said:
  
  Cool ! Can you link to places you’ve recommended these ? Thank you, Andrew !
  
  - Andrew on May 20, 2026 at 8:32 PM said:
    
    We discuss the idea in chapter 15 of Regression and Other Stories:
    
    Page 277:
    
    Ordered categorical data can be modeled in several ways, including:
    • Ordered logit model with K−1 cutpoint parameters, as we have just illustrated.
    • The same model in probit form.
    • Simple linear regression (possibly preceded by a simple transformation of the outcome values). This can be a good idea if the number of categories is large and if they can be considered equally spaced. This presupposes that a reasonable range of the categories is actually used. For example, if ratings are potentially on a 1 to 10 scale but in practice always equal 9 or 10, then a linear model probably will not work well.
    • Nested logistic regressions—for example, a logistic regression model for y=1 versus y=2,…,K; then, if y≥2, a logistic regression for y=2 versus y=3,…,K; and so on up to a model, if y≥K−1 for y=K−1 versus y=K. Separate logistic (or probit) regressions have the advantage of more flexibility in fitting data but the disadvantage of losing the simple latent-variable interpretation of the cutpoint model we have described.
    . . .
    
    Page 278:
    
    As discussed at the beginning of Section 15.5, it is sometimes appropriate to model discrete outcomes as unordered. An example that arose in our research was the well-switching problem. As described in Section 13.7, households with unsafe wells had the option to switch to safer wells. But the actual alternatives are more complicated and can be summarized as: (0) do nothing, (1) switch to an existing private well, (2) switch to an existing community well, (3) install a new well yourself. If these are coded as 0, 1, 2, 3, then we can model Pr(y≥1), Pr(y≥2|y≥1), Pr(y=3|y≥2). Although the four options could be considered to be ordered in some way, it does not make sense to apply the ordered multinomial logit or probit model, since different factors likely influence the three different decisions. Rather, it makes more sense to fit separate logit (or probit) models to each of the three components of the decision: (a) Do you switch or do nothing? (b) If you switch, do you switch to an existing well or build a new well yourself? (c) If you switch to an existing well, is it a private or community well? More about this important category of model can be found in the references at the end of this chapter.
    
    There’s a lot of good stuff in Regression and Other Stories!
    
Statistical Modeling, Causal Inference, and Social Science
