Hi-tech hoops: Characterizing the spatial structure of defensive skill in professional basketball

[Screenshot from the paper]

Joshua Vogelstein points me to this article by Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry and writes:

For some reason, I feel like you’d care about this article, and the resulting discussion on your blog would be fun.

[Screenshot of a figure from the paper]

Hey—label your lines directly!

[Screenshot of a figure from the paper]

Cool!

[Screenshot of a figure from the paper]

Ummm . . . no.

[Screenshot of a figure from the paper]

No.

[Screenshot of a table from the paper]

Really, really, really, really no. “−25,474.93.” What were they thinking???

I have nothing to offer on the substance of the paper because, hey, I know next to nothing about basketball! One thing that interested me, though, is that the claims of the paper are entirely presented in basketball terms. I guess that’s a difference between stat and econ/management. A stats paper about sports can just be about sports. An econ or management paper about sports will make the claim of relevance based on general principles of motivation, organization, training, or whatever.

I’m not saying one way or the other is better, it’s just interesting how the two fields differ.

16 thoughts on “Hi-tech hoops: Characterizing the spatial structure of defensive skill in professional basketball”

  1. You should also mention that we used Stan to fit one of the models! However, when we wrote the paper, there were some issues with implementing a large (e.g. hundreds of params, about 2 dozen categories) multinomial logistic regression in Stan, which is why we used variational inference as a computationally tractable alternative. I’m curious if things have improved with the new Stan releases. It would be cool to go back and try again to fit the multinomial model with Stan and compare it to the variational results.
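     For readers who want to see the shape of the model being discussed, here is a minimal numpy sketch of a multinomial (softmax) logistic regression log-likelihood. The dimensions are hypothetical stand-ins for the "hundreds of params, about 2 dozen categories" setup Alex describes, not the paper’s actual design.

     ```python
     import numpy as np

     def softmax(z):
         # Numerically stabilized softmax over the last axis.
         z = z - z.max(axis=-1, keepdims=True)
         ez = np.exp(z)
         return ez / ez.sum(axis=-1, keepdims=True)

     def multinomial_logit_loglik(beta, X, y):
         """Log-likelihood of a multinomial logistic regression.

         beta : (p, K) coefficient matrix (hundreds of parameters in total)
         X    : (n, p) design matrix
         y    : (n,)   integer category labels in 0..K-1
         """
         probs = softmax(X @ beta)  # (n, K) category probabilities
         return np.sum(np.log(probs[np.arange(len(y)), y]))

     # Hypothetical dimensions: 20 predictors x 24 categories = 480 coefficients.
     rng = np.random.default_rng(0)
     n, p, K = 5000, 20, 24
     X = rng.normal(size=(n, p))
     beta_true = rng.normal(scale=0.5, size=(p, K))
     y = np.array([rng.choice(K, p=pr) for pr in softmax(X @ beta_true)])

     print(multinomial_logit_loglik(beta_true, X, y))
     ```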

    • Hey Alex, I had no idea you used Stan! More updated versions of Stan are faster for some models. You should post to the Stan users group and maybe there will be some useful suggestions. Also, Stan now has black-box variational inference (“ADVI”) so you could try that too.

      • This is great stuff. I’ve been reading and re-reading the work put out by this group for a while.

        I like the use of spatial basis when you have point referenced spatial data.

        When it comes to lattice/areal-referenced spatial data, there’s a similar basis method, the Moran’s I basis of Hughes and Haran.

        To echo Alex: I tried the areal basis using full Bayes in Stan and it works for a moderate number of spatial units. Once you reach 3000+ units, things get more challenging, and analysts might pursue variational methods, as the authors do here. (A rough sketch of the basis construction is below.)
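        Since the Hughes and Haran construction comes up here, the following is a rough numpy sketch of the Moran’s I basis as I understand it: project the adjacency matrix off the column space of the fixed-effects design and keep the leading eigenvectors of the resulting Moran operator. The grid, design matrix, and number of basis vectors are all made-up illustration values.

        ```python
        import numpy as np

        def moran_basis(A, X, q):
            """Moran's I basis (Hughes & Haran style) for areal spatial data.

            A : (n, n) binary adjacency matrix of the areal units
            X : (n, p) fixed-effects design matrix
            q : number of basis vectors to keep
            Returns the n x q eigenvectors of the Moran operator
            P_perp A P_perp with the largest eigenvalues.
            """
            n = A.shape[0]
            P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # project off fixed effects
            M = P_perp @ A @ P_perp                                  # Moran operator
            eigvals, eigvecs = np.linalg.eigh(M)                     # symmetric, so eigh
            order = np.argsort(eigvals)[::-1]                        # most positive first
            return eigvecs[:, order[:q]]

        # Toy example: a 10 x 10 grid of areal units with rook adjacency (hypothetical data).
        side = 10
        n = side * side
        A = np.zeros((n, n))
        for i in range(side):
            for j in range(side):
                k = i * side + j
                if i + 1 < side:
                    A[k, k + side] = A[k + side, k] = 1
                if j + 1 < side:
                    A[k, k + 1] = A[k + 1, k] = 1
        X = np.column_stack([np.ones(n), np.random.default_rng(1).normal(size=n)])
        B = moran_basis(A, X, q=25)  # reduces 100 spatial effects to 25 basis coefficients
        print(B.shape)
        ```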

    • Stan didn’t work so the analysis jumped to VB with no verification of the fit*? UNVALIDATED USE OF APPROXIMATE METHODS MAKES MIKE ANGRY.

      Seriously, though, this is an important issue. Even if you did a predictive check it would only be sensitive to the overall combination of the model and the fit, which is insufficient given the substantial importance placed on the model in the subsequent analysis. The importance of MCMC is that we can verify the fit which then allows us to criticize the model with an array of carefully chosen posterior predictive checks.

      *I didn’t see any verifications in the paper or the supplemental materials — not that there’s any great way of verifying a VB fit.

    • Will:

      At the very least, I’d round to -25,475, as the numbers after the decimal point are conveying essentially zero information. But really the question is why these numbers are there at all. If the issue is model fit, I’d prefer assessing fit on some more interpretable scale. I say this not just for the purpose of communication to outsiders but also for the researchers’ goals in understanding and comparing the models they fit.

      • I think our goal was just to show that more complex models are actually “worth the effort” because they yield gains in terms of out-of-sample log-loss. I agree the numbers themselves are essentially uninterpretable and meaningless, but I also thought that these deviance criteria were fairly standard. How else might we demonstrate that the out-of-sample deviance is decreasing across model iterations? Would it be enough to simply state that this is the case without reporting (hard-to-interpret) log-likelihood values? We probably should have at least standardized by dividing the log-loss by the number of held-out samples.

        • Posterior predictive checks! Identify the critical components of the model and see just how well those components model the data. There are sooooo many cool checks that can be done and, given the depth of the interpretation placed onto the model, this kind of verification is crucial.
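          As an illustration of the kind of check being suggested (not anything from the paper), here is a minimal posterior predictive check in Python: draw replicated datasets from posterior draws and compare an interpretable test statistic to its realized value. The Poisson model and the zero-count statistic are placeholders.

          ```python
          import numpy as np

          rng = np.random.default_rng(0)

          # Observed data and posterior draws are placeholders; in practice the
          # draws would come from the fitted model (MCMC or, more cautiously, VB).
          y_obs = rng.poisson(4.0, size=200)
          lam_draws = rng.gamma(shape=y_obs.sum() + 1, scale=1.0 / (len(y_obs) + 1), size=1000)

          def test_stat(y):
              # A statistic the model should reproduce: the proportion of zero counts.
              return np.mean(y == 0)

          # Replicate a dataset for each posterior draw and compute the statistic.
          stat_rep = np.array([test_stat(rng.poisson(lam, size=len(y_obs))) for lam in lam_draws])
          stat_obs = test_stat(y_obs)

          # Posterior predictive p-value: extreme values flag a mismatch on this feature.
          p_value = np.mean(stat_rep >= stat_obs)
          print(stat_obs, p_value)
          ```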

        • @Andrew: Do you have a rule for reporting decimal places and then rules for how to round (I see here you’re “rounding” to 5)? Why not round the presentation to -25,500, or even just -25,000? I’d think some combination of standard error and posterior standard deviation would be a guide for presenting MCMC results.

          @Alex: -41,460 vs. -41,650 is a very tiny difference in log prediction on a percentage basis. I wonder if this is related to Wei’s and Andrew’s findings for log predictive density, which showed pretty high predictive differences with very small log loss differences. Could you somehow reduce this to a predictive accuracy measure?

          @Alex: I think everyone finds the log error rate (divide by N) easier to understand when there’s a simple population size N to divide by. And usually the machine learning people negate it and treat it as a loss. Deviance multiplies by 2 just to make things confusing. (A small worked conversion is sketched below.)

          @Alex: By all means send us the Stan model to see if we can help speed it up.
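          To make the divide-by-N point concrete, here is a tiny arithmetic sketch; the held-out sample size is invented purely for illustration, since the actual N isn’t given in this thread.

          ```python
          # Hypothetical held-out log-likelihood totals (the two values quoted above)
          # and an invented number of held-out observations.
          total_loglik_a = -41_650.0
          total_loglik_b = -41_460.0
          n_heldout = 50_000  # made-up N, just to show the conversion

          for name, total in [("model A", total_loglik_a), ("model B", total_loglik_b)]:
              per_obs_logloss = -total / n_heldout  # mean log loss (nats per observation)
              deviance = -2 * total                 # the "multiply by 2" convention
              print(f"{name}: log loss {per_obs_logloss:.4f} nats/obs, deviance {deviance:,.0f}")

          # The difference per observation is tiny on this scale:
          print((total_loglik_b - total_loglik_a) / n_heldout)
          ```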

        • Bob:

          Yes, really I wouldn’t report these numbers at all. But there is a logic to rounding to the nearest integer (that’s what I did here, rounding -25,474.93 to -25,475; I wasn’t rounding to the nearest 5, it just happened that the nearest integer was a multiple of 5). The logic is that when comparing these deviance values, the standard error can be of order 1. Really it depends on the model, but under the simple scenario of comparing normal models that are nested, with model A having one more parameter than model B, the sampling distribution of the difference in deviance if the new parameter is pure noise is something like a chi-squared with 1 df. That is, you’d expect an improvement in deviance of about 1, just from noise. So there seems no point to reporting anything after the decimal point.
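          A quick simulation, not from the post, of the nested-normal-models scenario described above: add one pure-noise predictor, refit, and look at the in-sample deviance improvement, which should behave roughly like a chi-squared draw with 1 degree of freedom (mean about 1).

          ```python
          import numpy as np

          rng = np.random.default_rng(0)
          n, n_sims = 200, 2000
          improvements = np.empty(n_sims)

          for s in range(n_sims):
              y = rng.normal(size=n)              # data with no real signal
              x = rng.normal(size=n)              # a pure-noise predictor
              rss0 = np.sum((y - y.mean()) ** 2)  # null model: intercept only
              X = np.column_stack([np.ones(n), x])
              beta, *_ = np.linalg.lstsq(X, y, rcond=None)
              rss1 = np.sum((y - X @ beta) ** 2)  # model with the noise predictor
              improvements[s] = n * np.log(rss0 / rss1)  # in-sample deviance improvement

          # Mean should be close to 1, matching the chi-squared(1) intuition.
          print(improvements.mean())
          ```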

  2. Are you familiar with Jim Faller’s paper, The Physics of Basketball? I heard his talk in person while a gradual student at JILA. I play volleyball, not basketball, but he helped my performance as both a VB player and a physicist. ;-)

  3. I am a netballer, not a basketballer, but it’s similar enough for me to say that defenders don’t just defend people, they also defend space – especially in basketball, where the players are very big. So, in picture (c), defender 6 is not only defending against player 2 but also the space that player 4 could move into to take an easier shot. And defender 7 is also nearly guarding player 4, as he can drop back to defend player 4 if need be. The broader point is that they are individual actors working as a team, so do Chris Paul’s stats really reflect his ability alone, or do they also include the ability of others on the team to give him the opportunity to apply the pressure he does?

    • Great point. Similarly, the Pacers may have a strategy of funneling players to Hibbert, accounting for the difference between his and Dwight Howard’s shot charts. So many confounders make it difficult to address the causal question of assessing defensive skill.
