Coding ordinal input variables in a regression

Posted on October 6, 2009 12:32 PM by Andrew

Denis Cote writes:

I am reviewing a paper using logistic regression and I am uncertain about the way they coded their inputs.

They have different ordinal variables coming from self-report questions. For example, “self-perceived health,” with its answer choices: excellent, very good, good, fair, poor.

Or weight, coded as underweight, normal, overweight, and obese. They entered the answers as categorical binary variables (I’m unsure about the precise coding).

Shouldn’t they have kept a single ordinal variable? What would be the best practice with ordinal variables?

I think I would not have asked this question if I hadn’t read and applied your technique of scaling by two standard deviations!

My reply:

Yes, I agree that it would make sense to code “excellent, very good, good, fair, poor” as an ordinal variable taking on values such as +2, +1, 0, -1, -2, and, similarly, to code weight as a single variable with four levels.
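A minimal sketch of that numerical coding, assuming the responses arrive as text labels. The health scores are the ones given above; the weight scores (equally spaced and centered at zero) are an illustrative assumption, since only "four levels" is specified.

```python
# Scores for the health scale, as suggested in the reply (best = +2, worst = -2).
HEALTH_SCORES = {"excellent": 2, "very good": 1, "good": 0, "fair": -1, "poor": -2}

# Assumption: equally spaced, zero-centered scores for the four weight classes.
WEIGHT_SCORES = {"underweight": -1.5, "normal": -0.5, "overweight": 0.5, "obese": 1.5}


def code_ordinal(responses, scores):
    """Map each text response to its numeric score (unknown labels raise KeyError)."""
    return [scores[r.strip().lower()] for r in responses]


print(code_ordinal(["Excellent", "fair", "good"], HEALTH_SCORES))  # [2, -1, 0]
print(code_ordinal(["obese", "normal"], WEIGHT_SCORES))            # [1.5, -0.5]
```

The resulting single column enters the regression as one predictor with one coefficient, which is what makes the monotonicity assumption explicit.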

Sometimes it makes sense to use binary codings (for example, when modeling survey responses on political opinions, we usually code age as 18-29, 30-44, 45-64, and 65+, assuming no ordering), but in your example, I expect that the predictors would behave monotonically, so a numerical coding would be fine.

On the other hand, if you have tons of data, you could code them as binary and you’d be fine. You can also compromise, for example by coding the main effects as binary predictors (for flexibility) but using the continuous coding when the variables enter interactions. You can even use the estimated coefficients from the main effects to inform your continuous coding.
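One way to sketch that compromise, under stated assumptions: "good" is taken as the reference category, and the main-effect coefficients are hypothetical numbers made up for illustration, not estimates from any fitted model. The indicator columns carry the main effects, while a single score, read off those coefficients, is used to build the interaction.

```python
LEVELS = ["poor", "fair", "good", "very good", "excellent"]
REFERENCE = "good"  # assumed omitted category; its coefficient is fixed at 0


def indicator_row(response):
    """One 0/1 column per non-reference level (dummy/treatment coding)."""
    return [1 if response == level else 0 for level in LEVELS if level != REFERENCE]


# Hypothetical main-effect coefficients for the four non-reference indicators.
main_effects = {"poor": -1.1, "fair": -0.4, "very good": 0.3, "excellent": 0.5}


def score_from_main_effects(response):
    """Continuous coding informed by the main effects; the reference level scores 0."""
    return 0.0 if response == REFERENCE else main_effects[response]


def interaction_term(response, other_input):
    """Interaction built from the one-dimensional score instead of four indicators."""
    return score_from_main_effects(response) * other_input


print(indicator_row("fair"))          # [0, 1, 0, 0]
print(interaction_term("poor", 2.0))  # -2.2
```

The design choice is that the interaction costs one parameter instead of four, while the main effects keep full flexibility.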

Yet another option is to include the linear term as a predictor and then model departures from linearity with a multilevel model. We have an example or two of this in ARM (Data Analysis Using Regression and Multilevel/Hierarchical Models, with Jennifer Hill).
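A rough sketch of the idea, under assumptions that go beyond the post: a continuous outcome rather than a logistic one, and a known ratio `lam` of residual variance to the variance of the level-specific departures. Rather than fitting a full multilevel model, it fits the linear trend by least squares and then shrinks each level's mean residual toward zero by n_j / (n_j + lam), the partial-pooling weight for a group mean when both variances are known. The data are made up for illustration.

```python
from collections import defaultdict


def linear_fit(x, y):
    """Ordinary least squares for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def shrunken_departures(x, y, lam):
    """Linear trend plus partially pooled per-level departures from it.

    lam is the assumed ratio sigma_y^2 / sigma_alpha^2; a larger lam pulls the
    departures harder toward 0 (i.e., toward the purely linear coding).
    """
    a, b = linear_fit(x, y)
    resid = defaultdict(list)
    for xi, yi in zip(x, y):
        resid[xi].append(yi - (a + b * xi))
    departures = {
        level: (len(r) / (len(r) + lam)) * (sum(r) / len(r))
        for level, r in resid.items()
    }
    return a, b, departures


# Toy data: scores -2..2 with a bump at the middle level (made up for illustration).
x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
y = [-2.1, -1.9, -1.0, -1.2, 0.6, 0.4, 1.1, 0.9, 2.0, 2.2]
a, b, dep = shrunken_departures(x, y, lam=2.0)
print(round(b, 3), {k: round(v, 3) for k, v in sorted(dep.items())})
```

Here the slope is 1.03 and the middle level keeps a departure of 0.2 (half its raw mean residual of 0.4, since n_j = lam = 2); in a real analysis the variance ratio would itself be estimated, which is what the multilevel model does.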

These are interesting questions, and it’s funny how there’s so much statistical literature on ordinal outcomes, but not so much on how to code inputs.

7 thoughts on “Coding ordinal input variables in a regression”

Peter on October 6, 2009 at 9:11 am said:

I have been taught to code these types of ordinal response variables as a series of binary variables as well. As I was told, the reason is that it is difficult to interpret a change in ordinal health status, say from -1 to +2, or -2 to +2, and simpler to interpret the effects of the binary variables. Perhaps this is because, in the binary case, the omitted category becomes the reference for comparison with the remaining ones? I would be interested to hear your thoughts.

Maarten Buis on October 6, 2009 at 9:36 am said:

One way to think about this dummy coded ordinal variable is that it simultaneously estimates a single "effect" of the ordinal variable together with a scaling of the categories that is optimal for this model. The resulting effect is sometimes called a "sheaf coefficient" and was proposed in:

Heise, David R. (1972). Employing nominal variables, induced variables, and block variables in path analysis. Sociological Methods & Research, 1(2): 147–173.
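A sketch of that decomposition, assuming the latent scale is identified by giving the induced variable (each observation's dummy-coded contribution) unit standard deviation in the sample, with the sign of the effect taken as positive by convention. This is one plausible reading of the idea, not a reproduction of Heise's or any package's exact procedure, and the coefficients are hypothetical.

```python
import statistics


def sheaf(coefs, sample):
    """Split dummy coefficients into one effect times a category scaling.

    coefs: coefficient per category (the reference category maps to 0.0).
    sample: observed categories, used to standardize the induced variable.
    Returns (effect, scores) with effect * scores[k] == coefs[k].
    Raises ZeroDivisionError if all observed categories share one coefficient.
    """
    induced = [coefs[c] for c in sample]  # induced variable, one value per observation
    effect = statistics.pstdev(induced)   # scale fixed so the induced variable has SD 1
    scores = {k: v / effect for k, v in coefs.items()}
    return effect, scores


# Hypothetical dummy coefficients and observed categories, for illustration only.
coefs = {"normal": 0.0, "underweight": 0.4, "overweight": 0.3, "obese": 0.9}
sample = ["normal"] * 5 + ["underweight"] + ["overweight"] * 3 + ["obese"]
effect, scores = sheaf(coefs, sample)
print(round(effect, 3), {k: round(v, 2) for k, v in scores.items()})
```

Multiplying the single effect back into the scores reproduces every dummy coefficient, which is the sense in which the dummy fit "simultaneously" estimates an effect and a scaling.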
Bob Carpenter on October 6, 2009 at 10:07 am said:

You can code ordinal values (1, 2, 3) as three binary features either directly (=1, =2, =3) or cumulatively (>=1, >=2, >=3). Specifically, 1 is coded as (1,0,0) both ways, 2 is coded as (0,1,0) directly or (1,1,0) cumulatively, and 3 is coded as (0,0,1) directly or (1,1,1) cumulatively.

There's no change in predictive power. If you fit coefficients (a1, a2, a3) for the direct ("equals") coding, you get the same predictions from coefficients (a1, a2-a1, a3-a2) for the cumulative coding.

Estimation's a different matter, because the change of scale affects the prior. For instance, if the direct coefficients are (1,1,1), the cumulative ones are (1,0,0); if the direct coefficients are (16,5,2), the cumulative ones are (16,-11,-3). So the advantage can go either way for priors centered at 0. Your prior mileage will also vary if the predictors are normalized first.
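The equivalence is easy to check numerically. This minimal sketch assumes, as in the comment, three features for three levels and no separate intercept; the function names are illustrative.

```python
def direct(level, k=3):
    """Direct coding: indicator for level == j, j = 1..k."""
    return [1 if level == j else 0 for j in range(1, k + 1)]


def cumulative(level, k=3):
    """Cumulative coding: indicator for level >= j, j = 1..k."""
    return [1 if level >= j else 0 for j in range(1, k + 1)]


def to_cumulative_coefs(a):
    """(a1, a2, a3) on direct features -> (a1, a2 - a1, a3 - a2) on cumulative ones."""
    return [a[0]] + [a[j] - a[j - 1] for j in range(1, len(a))]


def predict(coefs, features):
    """Linear predictor without an intercept."""
    return sum(c * f for c, f in zip(coefs, features))


a = [16, 5, 2]
b = to_cumulative_coefs(a)
print(b)  # [16, -11, -3]
for level in (1, 2, 3):
    # Same linear predictor under either coding.
    assert predict(a, direct(level)) == predict(b, cumulative(level))
```

A prior centered at 0 shrinks `a` toward "all levels equal zero" under the direct coding but shrinks `b` toward "adjacent levels equal" under the cumulative one, which is why the choice matters for estimation.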
Manolo Romero on October 6, 2009 at 11:45 am said:

I was also thinking about dummy coding (which, hopefully, is what they did to enter the ordinal variables as a series of binary ones). But then I remembered the use of polychoric correlations in CFA (confirmatory factor analysis) modeling. Could it be advisable to use some sort of "hetcor" matrix in regression models that have a mix of inputs? If it is not implemented that way, how could it be implemented in lm()?
If I am completely off I would welcome anyone to tell me so.
Peter Flom on October 8, 2009 at 2:09 am said:

I think the whole "nominal, ordinal, interval, ratio" thing has gotten calcified in ways that are not helpful. Variables don't fit so neatly into these categories.

Is "political party" nominal or ordinal? Aren't independents "in between" D and R? Aren't socialists "past" D?

What about hair color? In some sense, it's ordinal.

The weight classes that prompted this post are certainly not purely ordinal: while we may not know EXACTLY what the intervals are, we know (intuitively) that a coding like

1, 2, 19999, 120200010010

would not be right.

I think this could lead to an interesting discussion of statistical methods for different kinds of variables in different contexts.
Jim B. on October 8, 2009 at 6:17 am said:

I'm skeptical of the whole thing. For the weight variable I can see a fair argument for a single ordinal variable, but what does "excellent" mean relative to "very good"? Are people well-informed enough to make this judgment? Is there any sort of benchmarking to determine inter-rater reliability?

For example, my previous dean considered "excellence" a minimum standard for evaluating faculty: if you were rated "commendable" or "unsatisfactory", something was wrong. The current dean takes the view that commendable is a positive thing: you're doing "ok" by his expectations if you're commendable. Does that change anything about how I'm teaching?
Adam on February 13, 2010 at 12:44 pm said:

A late comment (I stumbled across this post while Googling for "ordinal predictors"): Maarten's comment reminds me of developments in optimal scaling that have been incorporated into, for instance, SAS's PROC TRANSREG. Intuitively, finding optimal monotonic transformations of variables in a model seems fishy to me, I suppose because I'm concerned that this "best" transformation might represent reality poorly. For instance, what if the "true" scaling (if we could know it) is more like the worst scaling, or somewhere between the best and the worst but far from the best?
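To make "optimal monotonic transformation" concrete, here is a sketch of the simplest case only: one ordinal predictor and a continuous outcome. There, the least-squares nondecreasing score per category is the count-weighted isotonic regression of the category means, computed below with the pool-adjacent-violators algorithm. This is a swapped-in textbook method, not PROC TRANSREG's actual algorithm, and the data are made up for illustration.

```python
from collections import defaultdict


def pava(values, weights):
    """Weighted isotonic (nondecreasing) regression by pool-adjacent-violators."""
    blocks = []  # each block: [weighted mean, total weight, number of merged levels]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the nondecreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted


def monotone_scores(levels, y):
    """Best nondecreasing score per ordered level for predicting y (least squares)."""
    groups = defaultdict(list)
    for level, yi in zip(levels, y):
        groups[level].append(yi)
    order = sorted(groups)
    means = [sum(groups[k]) / len(groups[k]) for k in order]
    counts = [len(groups[k]) for k in order]
    return dict(zip(order, pava(means, counts)))


scores = monotone_scores([1, 1, 2, 3, 4, 4], [0.0, 0.2, 1.0, 0.6, 2.0, 2.2])
print({k: round(v, 2) for k, v in scores.items()})  # {1: 0.1, 2: 0.8, 3: 0.8, 4: 2.1}
```

Levels 2 and 3 get tied scores because their raw means were out of order; the fit is "best" only for this sample, which is exactly the worry raised above.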
