Can pseudo-R-squareds from logistic regressions be compared and used as a measure of fit?

Jay Kaufman writes:

This article by Nicholas Wade, “Genes Tied to Gap in Treatment of Hepatitis C,” notes that 55% of European Americans respond favorably to standard hepatitis C treatment, but only 25% of African-Americans. The authors of the new article in Nature assert that this is due in part to an allele they discovered, which is more common in European Americans (based on a survey of Duke University students!). The authors state that 58% of the ethnic difference is due to the differential distribution of this one allele. An interesting part of this story from the point of view of statisticians would be the basis for this 58% number. In the supplementary material available with the article, the authors explain their statistical analysis as follows:

Logistic regression does not have a direct equivalent to the R2 that is found in ordinary least squares (OLS) regression that represents the proportion of variance explained by the predictors. However, it is possible to use an analog, so-called a pseudo-R2, to mimic the OLS-R2 in evaluating the goodness-of-fit and the variability explained, which is the approach we used (ref 14). Using this approach we estimated that rs12979860 could account for 58% of the ethnicity-explained variability by estimating the difference between the expected variability if the IL28B SNP does not account for the variability explained by ethnicity at all, and the observed variability explained by both ethnicity and rs12979860.

While these details are somewhat vague, I trust that you will join me in finding this very suspicious.

The pseudo-R2 simply compares the log-likelihood from the null model (only an intercept) to the log-likelihood from the full model (all covariates included). I would call neither the R2 nor the pseudo-R2 a measure of “goodness of fit”, but at least the R2 in a linear model does mean something straightforward. The pseudo-R2 in a logistic model, however, seems to me to have no straightforward interpretation at all, and I was under the impression that no serious statistician uses this statistic.

The authors wrote that they used this statistic to ascertain that the exposure (allele) “could account for 58% of the ethnicity-explained variability,” and yet the pseudo-R2 does not measure variability at all. They claim to have come up with the 58% number “by estimating the difference between the expected variability if the IL28B SNP does not account for the variability explained by ethnicity at all, and the observed variability explained by both ethnicity and rs12979860.” I am not sure I quite follow that, but it sounds like they are computing the pseudo-R2 statistic twice, once for a model that contains ethnicity and the allele of interest, and once for the model that contains only ethnicity. Perhaps one of the resulting numbers is 58% as big as the other, or something like that. If this is indeed what they did, I see no logical connection between this analysis and their claim that 58% of the ethnic disparity is due to the differential distribution of the allele. I have to assume that Nature has careful statistical review, but this doesn’t make a lot of sense to me, based on what I can glean from this description.
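Kaufman’s guess — two pseudo-R2s computed against the same null model, one compared against the other — can be sketched in a few lines. Everything below is hypothetical: the group, allele, and response probabilities are made up, and for simplicity the fits are saturated models (fitted probabilities are cell means, the MLE for a logistic model with all interactions), standing in for whatever regressions the authors actually ran:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# made-up data: a binary group indicator and a binary allele
# that is more common in group 1 (all numbers are invented)
group = rng.random(n) < 0.5
allele = rng.random(n) < np.where(group, 0.7, 0.3)
y = (rng.random(n) < 0.2 + 0.3 * allele + 0.1 * group).astype(float)

def log_lik(y, p):
    # Bernoulli log-likelihood at fitted probabilities p
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fitted_probs(y, *covs):
    # MLE fitted probabilities for a saturated model:
    # the mean of y within each cell of the binary covariates
    key = sum(c.astype(int) * (2 ** i) for i, c in enumerate(covs))
    p = np.empty(len(y))
    for k in np.unique(key):
        p[key == k] = y[key == k].mean()
    return p

ll_null = log_lik(y, np.full(n, y.mean()))       # intercept only
ll_group = log_lik(y, fitted_probs(y, group))    # group only
ll_both = log_lik(y, fitted_probs(y, group, allele))

# McFadden's pseudo-R2: 1 - LL(model) / LL(null)
pr2_group = 1 - ll_group / ll_null
pr2_both = 1 - ll_both / ll_null

# one possible reading of the authors' 58%: the ratio of the two
print(pr2_group, pr2_both, pr2_group / pr2_both)
```

Whatever this ratio turns out to be, it is a comparison of log-likelihood improvements, not a decomposition of the between-group difference in response rates — which is Kaufman’s complaint.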

My reply: I’ve never used pseudo-R-squared myself, but I can’t speak for the general population of “serious statisticians” here. I know that a lot of statisticians don’t like regular R-squared, but I find it helpful sometimes (see graphs on page 42 of ARM) and even wrote a research article (with Iain Pardoe) on the topic. So I’d be wary about slamming pseudo-R-squared in general terms.

I also don’t know enough about genetics to try to interpret the 58%. But, yes, my guess is that this difference isn’t really 58% of something. Maybe it makes sense as some approximation, though. My recommendation in this sort of situation is to forget about log-likelihoods and R-squares and just attack the comparison more directly, perhaps through an ROC curve or some similar approach.
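A more direct comparison of predictive discrimination, along the ROC lines suggested above, can start from the area under the ROC curve, which equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. A minimal sketch (the scores and labels below are made up):

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via its Mann-Whitney interpretation: the fraction of
    (positive, negative) pairs where the positive case scores higher,
    with ties counting half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# toy check: perfectly separating scores give AUC = 1
scores = np.array([0.9, 0.8, 0.3, 0.2])
labels = np.array([1, 1, 0, 0])
print(roc_auc(scores, labels))  # -> 1.0
```

Comparing the AUC of a model with ethnicity alone against one with ethnicity plus the allele would address discrimination directly, without leaning on a pseudo-R2.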

6 thoughts on “Can pseudo-R-squareds from logistic regressions be compared and used as a measure of fit?”

  1. Pseudo-R2, in my experience, creates more confusion than enlightenment. Typically, using a logit or probit GLM, I'll measure the outcome in terms of odds.

    Odds themselves can be difficult to explain, but are visually simple to represent and illustrate the finding much better than pR2.

  2. From the above description I also find it odd how they came up with the 58%. They mention "ref 14" – is that a paper?

  3. In my experience, most ecologists really do want to know how well your model explains the data, and so demand an R2 or similar statistic. For generalized linear models, I've been fairly dissatisfied by the pseudo-techniques. What I've seen used several times, and have used myself on occasion, is an R2 of the observed versus predicted values using OLS. This seems to offer an intuitive measure of fit that reviewers of mine have liked. I haven't yet encountered a good reason why this is a bad idea, although I'd be intrigued to hear any critiques of the technique.

  4. Aren't fitted probabilities the best approach for this? That is, the difference in probability of success between individuals in different groups (given median covariates) is X% but the difference given two individuals in different groups of the same genotype is Y%?

  5. jebyrnes' practice is, I take it, to use the square of the correlation between observed and predicted values, whenever predicted values can be calculated in the same units as the observed. (Thus, there is no need to drag OLS or even regression into the description.)

    If people must have a single measure, it is as versatile a measure as most. A paper discussing its virtues is

    Zheng, B. and A. Agresti. 2000. Summarizing the predictive power of a generalized linear model.
    Statistics in Medicine 19: 1771-1781.

    It can and usually should be linked to a plot of observed versus predicted, what Andrew and some others call a calibration plot. On a variety of grounds, it seems less natural or useful for categorical responses, however.

    In my experience the simplest message has yet to be universally appreciated. Once you move away from the simplest regression models, then there are lots of ways to define R-squared or an analogue, and most of them aren't really much use.

    Also, R-squared in a correlation-squared sense is a measure of linearity, not agreement, although in practice the difference does not seem to bite very much for this purpose.
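The observed-versus-predicted measure described in comments 3 and 5 — the squared correlation between outcomes and fitted values, in the spirit of Zheng and Agresti (2000) — is a one-liner. A minimal sketch with made-up fitted probabilities for a binary outcome:

```python
import numpy as np

def corr_r2(observed, predicted):
    # squared Pearson correlation between observed outcomes and
    # model-predicted values; a "linearity, not agreement" measure
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2

# hypothetical binary outcomes and fitted probabilities
y = np.array([1, 1, 0, 1, 0, 0, 1, 0], float)
p_hat = np.array([0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.9, 0.1])
print(corr_r2(y, p_hat))
```

As noted above, this rewards any monotone linear relationship between observed and predicted, so it should be read alongside a calibration plot rather than in place of one.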

Comments are closed.