Francesca Vandrola writes,
Reading several papers published recently in Political Science journals (i.e. Journal of Politics, Political Behavior, etc), I find quite consistently that:
I) Authors have discrete variables such as
– Income, say, measured as follow: (1 = $15,000 or under, 2 = $15,001Â–$25,000, 3 = $25,001Â–$35,000, 4 = $35,001Â–$50,000, 5 = $50,001Â–$65,000, 6 = $65,001Â–$80,000, 7 = $80,001Â– $100,000, 8 = over $100,000)
– External efficacy, say, measured as follow: an index that sums responses from four questions: Â‘Â‘People like me donÂ’t have any say about what the government doesÂ’Â’, Â‘Â‘I donÂ’t think public officials care much what people like me thinkÂ’Â’, Â‘Â‘How much do you feel that having elections makes the government pay attention to what the people think?Â’Â’, and Â‘Â‘Over the years, how much attention do you feel the government pays to what the people think when it decides what to do?Â’Â’. The first two questions are coded 0 = agree, 0.5 = neither, and 1 = disagree. The third and fourth questions are coded 1 = a good deal, 0.5 = some, and 0 = not much.
– Church attendance, say, measured as an index of religious attendance, 1 = never/no religious preference, 2 = a few times a year, 3 = once or twice a month, 4 = almost every week, and 5 = every week.
And so on and so forth.
II) Authors include the above variables in their models (as explanatory variables) as if they were continuous. Why? I see the point of not including some of the variables as categorical predictors (say, if the variable has 9 categories), but I am less clear on there being a good rationale for some other cases. Wouldn’t it be preferable, especially if there are sufficient observations, to include some of those predictors in a categorical fashion? Maybe they will indeed behave like an index and have a linear effect… but maybe they won’t.
Variables commonly behave monotonically, in which case linearity can be a good approximation–even if the scale of the predictor is somewhat arbitrary. But then it makes sense to check the residuals to see if the linearity is strongly violated. The alternative–modeling everything categorically–is fine, but that requires work in building and interpreting the model, effort that might be better spent elsewhere.