Logistic regression and ensemble statistical procedures

Still at the UCLA site, I found another paper by Richard Berk (with Brian Kriegler and Jong-Ho Baek), this time on “ensemble methods” for predicting binary outcomes. Here’s the abstract:

In this paper, we attempt to forecast which prison inmates are likely to engage in very serious misconduct while incarcerated. Such misconduct would usually be a major felony if committed outside of prison: drug trafficking, assault, rape, attempted murder and other crimes. The binary response variable is problematic because it is highly unbalanced. Using data from nearly 10,000 inmates held in facilities operated by the California Department of Corrections, we show that several popular classification procedures do no better than the marginal distribution unless the data are weighted in a fashion that compensates for the lack of balance. Then, random forests performs reasonably well, and better than CART or logistic regression. Although less than 3% of the inmates studied over 24 months were reported for very serious misconduct, we are able to correctly forecast such behavior about half the time.
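The abstract's point about imbalance can be illustrated with a small simulation. This is only a sketch under assumptions of my own, not the authors' procedure: the risk score, the intercept and slope, and the function names `simulate` and `evaluate` are all hypothetical, and instead of refitting a model on weighted data it uses a cost-weighted cutoff on known probabilities, which has a similar effect. With a base rate near 3%, almost no one has a predicted probability above 0.5, so the unweighted rule flags nobody and its accuracy is just the marginal rate.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=10_000, intercept=-4.0, slope=1.0, seed=0):
    """Simulate n individuals; return (true probabilities, binary outcomes).

    The intercept and slope are hypothetical values chosen so that
    roughly 3% of outcomes are positive.
    """
    rng = random.Random(seed)
    probs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical standardized risk score
        p = sigmoid(intercept + slope * x)
        probs.append(p)
        ys.append(1 if rng.random() < p else 0)
    return probs, ys


def evaluate(probs, ys, positive_weight=1.0):
    """Flag a case when its probability exceeds a cost-weighted cutoff.

    If missing a positive costs `positive_weight` and a false alarm
    costs 1, the expected-cost cutoff on p is 1 / (1 + positive_weight).
    """
    cutoff = 1.0 / (1.0 + positive_weight)
    preds = [1 if p > cutoff else 0 for p in probs]
    true_pos = sum(1 for yhat, y in zip(preds, ys) if yhat == 1 and y == 1)
    correct = sum(1 for yhat, y in zip(preds, ys) if yhat == y)
    positives = sum(ys)
    return {
        "accuracy": correct / len(ys),
        "recall": true_pos / positives if positives else 0.0,
        "flagged": sum(preds) / len(ys),
    }


probs, ys = simulate()
base_rate = sum(ys) / len(ys)

# Equal costs: cutoff of 0.5, the usual default.
unweighted = evaluate(probs, ys, positive_weight=1.0)

# Weight positives by the inverse odds of the base rate,
# which moves the cutoff down to the base rate itself.
weighted = evaluate(probs, ys, positive_weight=(1 - base_rate) / base_rate)

print(f"base rate:  {base_rate:.3f}")
print(f"unweighted: {unweighted}")
print(f"weighted:   {weighted}")
```

The weighted rule catches many more of the rare positives, but only by flagging a much larger share of the population, so overall accuracy drops. That tradeoff is the price of doing better than the marginal distribution on the rare class.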

I’ve only skimmed the paper, and I can’t say anything about the particular criminology example, but I am very interested in the evaluation of prediction accuracy. I’ve been doing a lot of work with hierarchical logistic regressions, where it’s possible to make reasonable predictions about groups of people even though individual prediction errors remain high.
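Here is a minimal sketch of that gap between group-level and individual-level accuracy. It is not a hierarchical model fit: the group labels, rates, and sample sizes are hypothetical, and the "prediction" for each group is simply its true rate, which is the best case any model could reach.

```python
import random


def simulate_groups(group_rates, n_per_group=2000, seed=1):
    """Draw binary outcomes for each group from its (hypothetical) true rate."""
    rng = random.Random(seed)
    return {
        group: [1 if rng.random() < rate else 0 for _ in range(n_per_group)]
        for group, rate in group_rates.items()
    }


# Hypothetical groups and rates, chosen only for illustration.
group_rates = {"A": 0.20, "B": 0.35, "C": 0.45}
outcomes = simulate_groups(group_rates)

results = {}
for group, ys in outcomes.items():
    rate = group_rates[group]
    observed = sum(ys) / len(ys)

    # Group-level error: predicted proportion vs. observed proportion.
    group_error = abs(observed - rate)

    # Individual-level error: the best single guess for each person is the
    # more likely outcome, and it is wrong whenever the other one occurs.
    guess = 1 if rate > 0.5 else 0
    individual_error = sum(1 for y in ys if y != guess) / len(ys)

    results[group] = {
        "group_error": group_error,
        "individual_error": individual_error,
    }
    print(
        f"group {group}: group-level error {group_error:.3f}, "
        f"individual error rate {individual_error:.3f}"
    )
```

Even with the true rates in hand, the group proportions are predicted to within a couple of percentage points while the individual error rate stays near the group rate itself.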

I wonder whether a multilevel (hierarchical) logistic regression could perform well in this example?