Josh Menke writes:

I saw that you had commented on adjusted plus/minus statistics for basketball in a few of your blog entries [see also here]. I’ve been working on a Bayesian version of the model used by Dan Rosenbaum, and wondered if I could ask you a question.

I wanted to be able to update the posterior after each sequence of game play between substitutions, so I decided to use the standard exact inference update for a normal-normal Bayesian linear regression model. If you’re familiar with Chris Bishop’s recent book, Pattern Recognition and Machine Learning, the updating equations for this are 3.50 and 3.51 on page 153. I felt OK with using a normal prior based on some past research I did in multiplayer game match-making with Shane Reese at BYU. The tricky part comes with using exact inference for updating the posterior. The updating method is very sensitive to the prior covariance matrix. I start with a diagonal covariance matrix, and if the initial player variances I choose are too high, the +/- estimates can go to infinity after several updates. I thought this was related to the data sparsity causing an ill-conditioned update matrix, but I thought I’d ask in case you’d had any experience with this type of problem.
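To make the update concrete, here is a minimal sketch of a one-stint-at-a-time normal-normal update in plain Python. It is not Josh's code. It uses the rank-one (Kalman-style) form, which is algebraically equivalent to Bishop's equations 3.50 and 3.51 when a single observation arrives, and it works directly on the covariance matrix rather than inverting a precision matrix. The function name `update_posterior`, the +1/-1/0 coding of the design row, and all the numbers are assumptions made for illustration.

```python
def update_posterior(mean, cov, x, y, noise_var):
    """Update a normal posterior over coefficients with one observation.

    mean, cov : current posterior mean (list) and covariance (list of lists)
    x         : design row for the stint (+1 home player, -1 away, 0 off court)
    y         : observed point margin for the stint
    noise_var : assumed observation variance (1 / beta in Bishop's notation)
    """
    n = len(mean)
    # cov @ x, reused by the gain and by the covariance downdate
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Predictive variance of y under the current posterior
    pred_var = noise_var + sum(x[i] * cov_x[i] for i in range(n))
    residual = y - sum(x[i] * mean[i] for i in range(n))
    gain = [c / pred_var for c in cov_x]
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov_x[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Hypothetical toy setup: 4 players, prior variance 4, noise variance 100.
prior_var, noise_var = 4.0, 100.0
mean = [0.0] * 4
cov = [[prior_var if i == j else 0.0 for j in range(4)] for i in range(4)]

# One stint: players 0 and 1 against players 2 and 3, home side wins by 6.
mean, cov = update_posterior(mean, cov, [1, 1, -1, -1], 6.0, noise_var)
print(mean)
```

The only division is by a scalar that is at least as large as the noise variance, so a single update cannot blow up by itself. Whether that has anything to do with the divergence described above is not something the email settles.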

Have you dealt with an issue like this before? If I set the prior variances low enough, I get reasonable results, and the ordering of the final ranking is fairly robust to changes in the prior. It’s just the estimation process itself that doesn’t “feel” as robust as I’d prefer, so I don’t know that I trust the adjusted values (final coefficients) to be meaningful.

I don’t think I can use MCMC in this situation either because trying to get 100,000 samples using 38,000+ data points and 400+ parameters feels intractable to me. I could be wrong there as well since I suppose I only need to include the current players in each match-up within the log likelihood. But it would still take quite a bit of time.

It would also be nice to go with the sequential updating version if possible since I could provide adjusted +/- values instantly after each game, if not after each match-up.

My reply:

1. I’d try the scaled inverse Wishart prior distribution as described in my book with Hill. This allows the correlations to be estimated from data in a way that still allows you to provide a reasonable amount of information about the scale parameters.

2. I’d go with the estimation procedure that gives reasonable estimates, then do some posterior predictive checks, as described in chapter 6 of Bayesian Data Analysis. (Sorry for always referencing myself; it’s just the most accessible reference for me!) This should give you some sense of the aspects of the data that are not captured well by the model.

3. Finally, you can simulate some fake data from your model and check that your inferential procedure gives reasonable estimates. Cook, Rubin, and I discussed a formal way of doing this, but you can probably do it informally and still build some confidence in your method.
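An informal version of the fake-data check in point 3 might look like the sketch below. It draws "true" player ratings from the prior, simulates stint margins from the model, refits with a sequential normal-normal update, and counts how often the 95% posterior intervals contain the truth. This is a rough coverage check, not the formal procedure of Cook, Gelman, and Rubin, and every setting (12 players, 200 stints, the prior and noise standard deviations, the function names) is a hypothetical choice.

```python
import random


def update_posterior(mean, cov, x, y, noise_var):
    """Rank-one normal-normal update for a single observation."""
    n = len(mean)
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    pred_var = noise_var + sum(x[i] * cov_x[i] for i in range(n))
    residual = y - sum(x[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + cov_x[i] * residual / pred_var for i in range(n)]
    new_cov = [[cov[i][j] - cov_x[i] * cov_x[j] / pred_var for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


def simulate_and_fit(rng, n_players=12, n_stints=200, prior_sd=2.0, noise_sd=10.0):
    """Draw true ratings from the prior, simulate stints, and refit."""
    truth = [rng.gauss(0.0, prior_sd) for _ in range(n_players)]
    mean = [0.0] * n_players
    cov = [[prior_sd ** 2 if i == j else 0.0 for j in range(n_players)]
           for i in range(n_players)]
    for _ in range(n_stints):
        # Ten players on court: the first five are "home", the rest "away".
        on_court = rng.sample(range(n_players), 10)
        x = [0] * n_players
        for p in on_court[:5]:
            x[p] = 1
        for p in on_court[5:]:
            x[p] = -1
        margin = sum(x[i] * truth[i] for i in range(n_players))
        y = margin + rng.gauss(0.0, noise_sd)
        mean, cov = update_posterior(mean, cov, x, y, noise_sd ** 2)
    return truth, mean, cov


def interval_coverage(n_reps=20, seed=1):
    """Fraction of 95% posterior intervals that contain the simulated truth."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_reps):
        truth, mean, cov = simulate_and_fit(rng)
        for i, t in enumerate(truth):
            half_width = 1.96 * cov[i][i] ** 0.5
            hits += abs(t - mean[i]) <= half_width
            total += 1
    return hits / total


coverage = interval_coverage()
print(f"95% interval coverage: {coverage:.2f}")
```

If the fitting code is correct and the data really come from the model, coverage should land near 0.95. A value far from that points to a bug or a numerical problem in the update rather than to anything about basketball.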

A quick suggestion if the exact inference doesn't work – why not try sequential importance sampling (aka particle filtering) instead? If you generate your draws (e.g. by MCMC) before the game starts, then at each play you update their weights by the new likelihood. I think there are also methods for regenerating particles once the old ones' weights become too small to be useful, and I guess you could run that during the plays.
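A minimal sketch of the reweighting idea, in plain Python, under some loudly stated assumptions: the particles here are drawn from a normal prior rather than by MCMC, the likelihood is normal with a known noise variance, and "regenerating" is reduced to plain multinomial resampling when the effective sample size drops. A resample-move scheme would add an MCMC step after resampling to restore diversity; that part is not shown. All names and numbers are hypothetical.

```python
import math
import random


def reweight(particles, log_w, x, y, noise_var):
    """Add the log-likelihood of one new stint to each particle's log weight.

    The normalizing constant of the normal density is dropped because it is
    the same for every particle and cancels when the weights are normalized.
    """
    out = []
    for theta, lw in zip(particles, log_w):
        pred = sum(xi * ti for xi, ti in zip(x, theta))
        out.append(lw - 0.5 * (y - pred) ** 2 / noise_var)
    return out


def normalize(log_w):
    """Convert log weights to weights that sum to 1, avoiding underflow."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(w):
    """Equals len(w) for uniform weights and 1 when one particle dominates."""
    return 1.0 / sum(wi * wi for wi in w)


def resample(particles, w, rng):
    """Multinomial resampling; returns copies of the picks and reset log weights."""
    n = len(particles)
    picks = rng.choices(particles, weights=w, k=n)
    return [list(p) for p in picks], [0.0] * n


rng = random.Random(0)
n_particles = 5000
# Hypothetical: 4 players, particles drawn from a N(0, 2^2) prior.
particles = [[rng.gauss(0.0, 2.0) for _ in range(4)] for _ in range(n_particles)]
log_w = [0.0] * n_particles

# Hypothetical stints: (design row, observed margin).
stints = [([1, 1, -1, -1], 6.0), ([1, -1, 1, -1], -2.0)]
for x, y in stints:
    log_w = reweight(particles, log_w, x, y, noise_var=100.0)
    if effective_sample_size(normalize(log_w)) < n_particles / 2:
        particles, log_w = resample(particles, normalize(log_w), rng)

w = normalize(log_w)
post_mean = [sum(wi * p[j] for wi, p in zip(w, particles)) for j in range(4)]
print(post_mean)
```

With 400+ parameters, weights from a fixed set of draws can be expected to degenerate quickly, so the effective sample size is the number to watch.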

Bob

Thanks for the suggestions. We definitely need to do checks along the lines of #2 and #3, both to see how the numbers line up and to see whether the resulting values assigned to the players fit intuition. That said, you would hope that applying a new method to analyzing player performance would turn up some surprises. I may try the suggested prior as well for comparison.

The particle filtering suggestion also sounds interesting. I haven't coded up a particle filter in a few years, but I remember enjoying it. It also may handle the noise better than a direct fit.

I haven't tried simulating at the sequence level using the models I've fit, but I have taken the +/- values I've fit per player and then predicted how much a team will outscore or be outscored by its opponents over the course of a season. The current results show that for most teams, the predicted point differential is substantially higher than the actual one. It is, however, usually in the right direction (the correlation is 0.9).
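A high correlation and inflated predictions can coexist, and one way to separate the two is to look at the slope of predicted on actual alongside the correlation. The sketch below does this with made-up numbers (five hypothetical teams, not real season data): a slope well above 1 means the predictions are stretched relative to reality even though the ordering is nearly right.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation_and_slope(actual, predicted):
    """Pearson correlation, and least-squares slope of predicted on actual."""
    ma, mp = mean(actual), mean(predicted)
    sxy = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sxx = sum((a - ma) ** 2 for a in actual)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical season point differentials for five teams.
actual = [-300, -100, 0, 150, 400]
predicted = [-520, -130, 40, 260, 610]

corr, slope = correlation_and_slope(actual, predicted)
print(f"correlation = {corr:.2f}, slope = {slope:.2f}")
```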

Coming from a CS background, I don't have as much formal training in regression analysis, but would I be correct in thinking that inflated coefficients suggest I'm leaving something out of the model that is getting spread over the coefficients? For example, it could be a result of ignoring player-player interactions, which are probably important in basketball.

The mean +/- value for a player also comes out greater than 0, which is odd since I'd expect the average player, when weighted by time played, to be closer to 0.
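For reference, the time-weighted average mentioned here is a one-liner; the point of writing it out is that the weighted and unweighted means can differ a lot when playing time is uneven. The ratings and minutes below are hypothetical.

```python
def weighted_mean(values, weights):
    """Average of values with non-negative weights (e.g. minutes played)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical fitted +/- values and minutes played for four players.
ratings = [4.0, 1.5, -0.5, -2.0]
minutes = [3000, 2000, 1000, 500]

unweighted = sum(ratings) / len(ratings)
by_minutes = weighted_mean(ratings, minutes)
print(unweighted, by_minutes)
```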

Thanks again!