## Take logit coefficients and divide by approximately 1.6 to get probit coefficients

[See update at end of this entry.]

Jeff Lax pointed me to the book "Discrete Choice Methods with Simulation" by Kenneth Train as a useful reference for logit and probit models as they are used in economics. The book looks interesting, but I have one question. On page 28 of his book, Train writes, "the coefficients in the logit model will be √1.6 times larger than those for the probit model . . . For example, in a mode choice model, suppose the estimated cost coefficient is −0.55 from a logit model . . . The logit coefficients can be divided by √1.6, so that the error variance is 1, just as in the probit model. With this adjustment, the comparable coefficients are −0.43 . . ."
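Train's numbers can be reproduced with a line of arithmetic. The R sketch below (the variable names are illustrative only) divides his example logit coefficient of −0.55 by √1.6, which gives his −0.43. For comparison, it also divides by 1.6 itself, which gives about −0.34.

```r
# Illustrative names; -0.55 is Train's example logit cost coefficient.
b_logit <- -0.55

# Train's adjustment: divide by sqrt(1.6) so the error variance is 1
b_train <- b_logit / sqrt(1.6)
print(round(b_train, 2))   # -0.43

# Dividing by 1.6 instead gives a smaller probit-scale value
b_alt <- b_logit / 1.6
print(round(b_alt, 2))     # -0.34
```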

This confused me, because I’ve always understood the conversion factor to be 1.6 (i.e., the variance scales by 1.6^2, so the coefficients themselves scale by 1.6). I checked via a little simulation in R:

```r
library(arm)  # provides invlogit() and display()

n <- 100
x <- rnorm(n)
a <- 1.3
b <- -0.55
y <- rbinom(n, 1, invlogit(a + b*x))
M1 <- glm(y ~ x, family=binomial(link="logit"))
display(M1)
```

```
glm(formula = y ~ x, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept)  0.88     0.22
x           -0.44     0.24
n = 100, k = 2
residual deviance = 118.6, null deviance = 122.2 (difference = 3.6)
```

```r
M2 <- glm(y ~ x, family=binomial(link="probit"))
display(M2)
```

```
glm(formula = y ~ x, family = binomial(link = "probit"))
            coef.est coef.se
(Intercept)  0.54     0.13
x           -0.26     0.14
n = 100, k = 2
residual deviance = 118.6, null deviance = 122.2 (difference = 3.5)
```

```r
-0.44 / -0.26  # about 1.69
```

I did it a few more times and got different results, but always between 1.6 and 1.8 (which is consistent with the literature, e.g., Amemiya, 1981).
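Repeating the simulation "a few more times" can be automated. This is a sketch in base R that swaps arm's invlogit() and display() for the base functions plogis() and coef(); the 200 replications, the fixed seed, and the larger sample size n = 1000 are arbitrary choices for illustration, not part of the runs described above.

```r
# Repeat the logit-vs-probit comparison and collect the ratio of slope estimates.
# Assumptions for illustration: 200 replications, n = 1000 per replication, fixed seed.
set.seed(123)
n_sims <- 200
n <- 1000
a <- 1.3
b <- -0.55

ratios <- numeric(n_sims)
for (s in seq_len(n_sims)) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(a + b * x))   # plogis() is base R's inverse logit
  fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
  fit_probit <- glm(y ~ x, family = binomial(link = "probit"))
  ratios[s] <- coef(fit_logit)["x"] / coef(fit_probit)["x"]
}

# Distribution of the logit/probit slope ratio across replications
print(summary(ratios))
```

Because both models are fit to the same simulated data in each replication, the ratio is much more stable than either coefficient on its own.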

Train also refers to a factor of π²/6 ≈ 1.64, the source of his 1.6: this is the variance of the extreme-value error in a single utility in the logit model, so that the difference of two utilities has variance π²/3 (see p. 39 of his book). Since π²/3 is a variance, we need its square root, π/√3 ≈ 1.8, which is indeed the sd of the unit logistic distribution. However, as Amemiya (1981) and others have noted, the logistic distribution function actually fits the normal better, over most of the range of the curve, if we scale by 1.6 rather than 1.8. In any case, it's 1.6, not √1.6. Anyway, I think that's right.
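A quick simulation illustrates the variance facts in this paragraph. The base-R sketch below draws standard extreme-value (Gumbel) errors by the inverse-CDF method, −log(−log(U)), using a helper function defined here rather than taken from any package; the number of draws and the seed are arbitrary. It compares the variance of one such error with π²/6, the variance of the difference of two independent ones with π²/3, and the empirical cdf of that difference with the logistic cdf.

```r
# Simulate standard Gumbel (type I extreme-value) errors via the inverse CDF.
set.seed(1)
n_draws <- 1e6
rgumbel_std <- function(n) -log(-log(runif(n)))   # helper defined here, not from a package

e1 <- rgumbel_std(n_draws)
e2 <- rgumbel_std(n_draws)
d <- e1 - e2                                      # should follow a logistic(0, 1) distribution

# Sample variances next to their theoretical values
print(c(var_single = var(e1), theory = pi^2 / 6))
print(c(var_diff = var(d), theory = pi^2 / 3))

# Standard deviations implied by the theoretical variances
print(c(sd_single = sqrt(pi^2 / 6), sd_diff = pi / sqrt(3)))

# The difference should match the logistic cdf, e.g. at d = 1
print(c(empirical = mean(d <= 1), plogis = plogis(1)))
```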

P.S.

I talked with Dr. Train and we realized that we're talking about two different (although related) models. I'm working with logit/probit for binary outcomes, or ordered logit/probit for multinomial outcomes, in which there's a single latent variable (with a logistic(0,1) or normal(0,1) error term). Train is working with a utility model in which each alternative has its own independent error term (extreme-value or normal(0,1)), so that the difference in two utilities is either logistic(0,1) or normal with variance 2. Hence the factor of sqrt(2) difference in our sds. The parameterization/model I use is more common in statistics and, I believe, in econometric analysis of discrete data (e.g., Maddala's book), but I can see that Train's parameterization/model would make sense in settings with a different random utility for each person and each outcome.
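The sqrt(2) reconciliation can be checked with a few lines of arithmetic. In the sketch below (variable names are illustrative), the single-latent-variable model compares a logistic(0,1) error with a normal(0,1) error, while Train's model, at the level of the utility difference, compares logistic(0,1) with a normal of variance 2. Using the exact logistic sd π/√3 rather than the empirical 1.6, Train's factor comes out as π/√6 = √(π²/6), his √1.6, and the two factors differ by exactly √2.

```r
# Standard deviations of the error terms in the two setups
sd_logistic <- pi / sqrt(3)        # logistic(0, 1): the latent error, or a difference of two Gumbel errors
sd_normal_single <- 1              # normal(0, 1): single latent variable (binary probit)
sd_normal_diff <- sqrt(2)          # difference of two independent normal(0, 1) utility errors

# Coefficient scale factor (logit / probit) implied by matching error sds
ratio_single_latent <- sd_logistic / sd_normal_single   # about 1.81
ratio_train <- sd_logistic / sd_normal_diff             # pi / sqrt(6), about 1.28

print(c(single_latent = ratio_single_latent, train = ratio_train,
        sqrt_pi2_over_6 = sqrt(pi^2 / 6)))

# The two factors differ by exactly sqrt(2)
print(ratio_single_latent / ratio_train)
```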

Train clarifies:

These are not two different parameterizations of the same model, with one parameterization being more common than the other. They are two different models, each with its own parameterization that is common for that model.

### One Comment

1. Russell Almond says:

The similarity of the normal ogive and the logistic function is well known in psychometrics, where it plays heavily into Item Response Theory (IRT). Birnbaum (1968) notes that the logistic function differs by less than 0.01 from the normal cdf with mean zero and standard deviation 1.7.

Consequently, the usual parameterization for the logistic response model in IRT is:

1 / [1 + exp(-1.7 a (theta - b))]

Thus, the a (discrimination) and b (difficulty) parameters can be interpreted relative to either the logistic or the probit model.

This is consistent with the between 1.6 and 1.8 result above.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord and M. R. Novick (Eds.), Statistical Theories of Mental Test Scores (pp. 395-479). Addison-Wesley.
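Birnbaum's bound quoted in this comment is easy to check numerically. The base-R sketch below computes the largest absolute gap between the logistic cdf at k·x and the standard normal cdf at x, over an arbitrary grid, for k = 1.6, 1.7, and 1.8; for k = 1.7 the gap should come out below 0.01, as Birnbaum states. It then evaluates the IRT form 1 / [1 + exp(-1.7 a (theta - b))] next to the normal ogive for hypothetical item parameters a = 1.2 and b = 0.5.

```r
# Largest absolute difference between the scaled logistic cdf and the standard normal cdf
max_gap <- function(k, x = seq(-6, 6, by = 0.001)) {
  max(abs(plogis(k * x) - pnorm(x)))
}

scales <- c(1.6, 1.7, 1.8)
gaps <- sapply(scales, max_gap)
names(gaps) <- paste0("k = ", scales)
print(round(gaps, 4))

# IRT logistic response with the 1.7 scaling, hypothetical item parameters a = 1.2, b = 0.5
a <- 1.2
b <- 0.5
theta <- seq(-3, 3, by = 0.5)
irt_logistic <- 1 / (1 + exp(-1.7 * a * (theta - b)))
normal_ogive <- pnorm(a * (theta - b))
print(round(cbind(theta, irt_logistic, normal_ogive), 3))
```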