Estimating basketball shooting ability while it varies over time

Someone named Brayden writes:

I’m an electrical engineer with interests in statistics, so I read your blog from time to time, and I had a question about interpreting some statistics results relating to basketball free throws.

In basketball, free throw shooting has some variance associated with it. Suppose player A is a career 85% free throw shooter on 2000 attempts and player B is a career 85% free throw shooter on 50 attempts, and suppose that in a new NBA season, both players start out their first 50 free throw attempts shooting 95% from the line. Under ideal circumstances (if it were truly a binomial process), we could say that player A is probably just on a lucky streak, since we have so much past data indicating his “true” FT%. With player B, however, we might update what we believe his “true” FT% is, and be more hesitant to conclude that he’s just on a hot streak, since we have very little data on past performance.

However, in the real basketball world, we have to account for “improvement” or “decline” of a player. With improvement being a possibility, we might have less reason to believe that player A is on a hot streak, and more reason to believe that they improved their free throw shooting over the off-season. So I guess my question is: when you’re trying to estimate a parameter, is there a formal process defined for how to account for a situation where your parameter *might* be changing over time as you observe it? How would you even begin to mathematically model something like that? It seems like you have a tradeoff between sample size being large enough to account for noise, but not too large such that you’re including possible improvements or declines. But how do you find the “sweet spot”?

My reply:

1. Yes, this all can be done. It should not be difficult to write a model in Stan allowing measurement error, differing player abilities, and time-varying abilities. Accuracy can vary over the career and also during the season and during the game. There’s no real tradeoff here; you just put all the variation in the model, with hyperparameters governing how much variation there is at each level. I haven’t done this with basketball, but we did something similar with time-varying public opinion in our election forecasting model.
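
To make the “hyperparameters governing how much variation there is at each level” idea concrete, here is a purely illustrative forward simulation in Python. All the names and numbers below are made up; the real analysis would run in the other direction, fitting these hyperparameters to data (e.g., in Stan):

import numpy as np

rng = np.random.default_rng(0)

# hypothetical scales of variation at each level, on the logit scale
mu, sigma_player = 1.7, 0.3   # league-average logit FT% (~85%) and player-to-player spread
sigma_season = 0.10           # season-to-season drift within a player
sigma_game = 0.05             # game-to-game wobble within a season

n_players, n_seasons, n_games, shots_per_game = 50, 5, 80, 4

# player baselines, a random walk across seasons, and game-level noise
player = rng.normal(mu, sigma_player, size=n_players)
season = player[:, None] + rng.normal(0, sigma_season,
                                      size=(n_players, n_seasons)).cumsum(axis=1)
game = season[:, :, None] + rng.normal(0, sigma_game,
                                       size=(n_players, n_seasons, n_games))

p = 1 / (1 + np.exp(-game))               # per-game true FT probability
makes = rng.binomial(shots_per_game, p)   # observed makes per game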

2. Even the non-time-varying version of the model is nontrivial! Consider your above example, just changing “50 attempts” to “100 attempts” in each case so that the number of successes becomes an integer. With no correlation and no time variation in ability, you get the following data:
player A: 1795 successes out of 2100 tries, a success rate of 85.5%
player B: 180 successes out of 200 tries, a success rate of 90%.
But then you have to combine this with your priors. Let’s assume for simplicity that our priors for the two players are the same. Depending on your prior, you might conclude that player A is probably better, or you might conclude that player B is probably better. For example, if you start with a uniform (0, 1) prior on true shooting ability, the above data would suggest that player B is probably better than player A. But if you start with a normal prior with mean 0.7 and standard deviation 0.1 then the above data would lead you to conclude that player A is more likely to be the better shooter.
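
For the uniform-prior case this is easy to check numerically: with a uniform(0, 1) prior, the posterior for each player’s true FT% is a Beta distribution, so a quick simulation settles who is more likely to be the better shooter. A minimal sketch in Python (for a non-conjugate prior such as the normal one above, you would use a grid approximation or MCMC instead):

import numpy as np

rng = np.random.default_rng(0)

# totals from the example above
yA, nA = 1795, 2100   # player A
yB, nB = 180, 200     # player B

# uniform(0, 1) prior => Beta posterior for each player's true FT%
thetaA = rng.beta(1 + yA, 1 + nA - yA, size=100_000)
thetaB = rng.beta(1 + yB, 1 + nB - yB, size=100_000)

print("posterior mean, player A:", thetaA.mean())   # about 0.85
print("posterior mean, player B:", thetaB.mean())   # about 0.90
print("Pr(player B is the better shooter):", (thetaB > thetaA).mean())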

3. Thinking more generally, I agree with you that it represents something of a conceptual leap to think of these parameters varying over time. With the right model, it should be possible to track such variation. Cruder estimation methods that don’t model the variation can have difficulty catching up to the data. We discussed this a while ago in the context of chess ratings.

P.S. Brayden also shares the above adorable photo of his cat, Fatty.

13 thoughts on “Estimating basketball shooting ability while it varies over time”

  1. This is a perennial question in baseball analytics. I suggest checking out Tom Tango’s work (here’s one post on the topic with more questions than answers: http://tangotiger.com/index.php/site/comments/established-new-level-of-talent — definitely read the comments), another post from Fangraphs about whether a particular catcher improved his ability to frame pitches for strikes: https://blogs.fangraphs.com/the-two-things-chris-iannetta-represents/, and finally a post from saberist Phil Birnbaum: http://blog.philbirnbaum.com/2019/03/true-talent-levels-for-individual.html

  2. My suggestions:

    a) Use a quantitative approach. This paper:

    https://www.degruyter.com/document/doi/10.1515/1559-0410.1414/html

    quantifies the factors that are needed for success in free throw shooting. This may be useful for determining whether a player is improving. (It’s behind a paywall, so if you’re not at a university I assume you’ll have to use Sci-Hub, though I haven’t checked whether it’s available there.)

    b) Use a multivariate fixed effects model.

    Begin by dividing each player’s season data into equal divisions. The purpose, then, is to determine how correlated the divisions are with each other. For instance, if there are ten divisions for player Kyrie Kwazy, a player who is improving should show that the last two divisions, 9 and 10, are more highly correlated than divisions 1 and 2. (How one divides the season’s data is, I suspect, the problem with this model.)

    The reason for a multivariate model is to be able to test the model using different factors: home games, away games, back-to-back games, etc.

  3. Another fun aspect of these models is that your estimates of the past values of the parameters will also keep changing over time. That is, at time t1 you might estimate that skill at t1 is x, but then with new data at t2 you’d estimate that skill at t1 was actually y > x. Intuitively, the moment you observe the player was on a hot streak, you might think “well, they’re having a good night, a lucky break”, but then when the lucky break doesn’t stop, you might look back at the first game in the hot streak and think “I guess that wasn’t all luck after all.” This is behavior that can’t be recovered by simple recursive filtering.

    I think the key to identifiability in this kind of time-varying model is a prior regularizing how quickly a player’s skill can reasonably change. Otherwise, any fluctuation in performance can be explained as oscillations in skill, which would be obviously spurious to a human observer.
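
    To see the filtering-versus-smoothing point concretely, here is a toy sketch (my own illustration, with made-up numbers, not taken from any of the linked work) of a scalar random-walk model for one player’s ability, approximating each game’s empirical logit FT% as a Gaussian observation. The forward (filtering) pass only uses data up to time t; the backward (smoothing) pass revises the early estimates once the later games are in, which is exactly the “I guess that wasn’t all luck after all” revision. The drift scale q plays the role of the regularizing prior on how fast skill can change:

    import numpy as np

    def kalman_filter(y, q, r, m0, p0):
        # local-level model: a_t = a_{t-1} + N(0, q),  y_t = a_t + N(0, r)
        n = len(y)
        m, p = np.zeros(n), np.zeros(n)   # filtered means/variances of a_t given y_1..t
        m_pred, p_pred = m0, p0
        for t in range(n):
            if t > 0:
                m_pred, p_pred = m[t - 1], p[t - 1] + q
            k = p_pred / (p_pred + r)     # Kalman gain
            m[t] = m_pred + k * (y[t] - m_pred)
            p[t] = (1 - k) * p_pred
        return m, p

    def rts_smoother(m, p, q):
        # backward pass: means/variances of a_t given ALL the data
        ms, ps = m.copy(), p.copy()
        for t in range(len(m) - 2, -1, -1):
            g = p[t] / (p[t] + q)
            ms[t] = m[t] + g * (ms[t + 1] - m[t])
            ps[t] = p[t] + g**2 * (ps[t + 1] - (p[t] + q))
        return ms, ps

    # toy data: per-game FT% on the logit scale; every game of the streak looks ~95%
    y = np.array([2.9, 2.8, 3.0, 2.9, 3.1, 3.0, 2.9, 3.0])
    q, r = 0.05**2, 0.3**2                             # slow skill drift, noisy per-game observations
    m, p = kalman_filter(y, q, r, m0=1.7, p0=0.2**2)   # prior centered near an 85% shooter
    ms, ps = rts_smoother(m, p, q)
    print("skill at game 1, estimated after game 1:  ", m[0])
    print("skill at game 1, estimated after all games:", ms[0])

    (With these numbers the smoothed estimate of game-1 skill comes out well above the filtered one. A full analysis would use the binomial likelihood directly, as in the Stan sketch further down the thread.)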

    • Here’s a paper from Microsoft on Xbox’s skill based matchmaking system

      https://www.microsoft.com/en-us/research/uploads/prod/2018/03/trueskill2.pdf

      They model changes in true skill level by assuming skill changes between games are drawn from N(previous skill, sigma). I don’t quite understand this: I would expect skill changes to be mostly increasing with games played, and so in posterior predictive checks they’d see skewness in the realized data relative to the inferences, but maybe this is not identifiable in a purely ordinal ranking system? Though I’d still expect people’s win percentages to start low, then generally increase up to a point, until they get bored and stop playing.

    • I wrote a case study on partial pooling for repeated binary trials that recreates the original Efron and Morris (1975) empirical Bayes paper on baseball batting averages. A couple of caveats from Andrew: “empirical Bayes” is a dumb name for this technique because it’s no more empirical than full Bayes, and batting averages aren’t clean binary trials because the denominator (at bats) is also a random variable, and most importantly one that’s correlated with batting average (better batters get more at bats).

      I left some comments to that effect on Robinson’s blog on his original set of posts (not the ones announcing the book).

      But if you really want to see some good baseball stats, you should head to Jim Albert’s blog, Exploring Baseball with R. This is what modern baseball analytics looks like.

      It’s typically assumed in serious sports models that a player’s ability varies over their career, increasing until the combination of physical condition and skill peaks, then declining. That’s often done per season, though, not per shot or per game. Having said that, every time something like this came up in something I did over the last 10 years, Andrew always said to generalize to a time series. Then it just becomes a problem of fitting, which is why a hierarchical model is really necessary here.

  4. Hello, I’m a student and I’m actually pretty lost on how to replicate this, but it seems like a great homework problem. Isn’t this a Bayesian proportion test? I’m just trying to use rstanarm; hopefully I’m not going about this in the wrong way.

  5. So I guess my question is: when you’re trying to estimate a parameter, is there a formal process defined for how to account for a situation where your parameter *might* be changing over time as you observe it? How would you even begin to mathematically model something like that?

    I think you want this:

    https://en.wikipedia.org/wiki/Poisson_binomial_distribution

    You can then assume some level of streakiness. E.g., during a game it may take a while to warm up, but later in the game the player is tired and under pressure. So maybe p(success) is highest near halftime. The same goes over the course of a season or career.

    Also note this useful fact from the wikipedia page:

    For fixed values of the mean (μ) and size (n), the variance is maximal when all success probabilities are equal and we have a binomial distribution.

    So you can use underdispersion relative to binomial as a proxy for streakiness.
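
    As a quick illustration of that fact (my own toy numbers): take two players with the same average FT% over a 20-shot game, one with a constant p and one whose p varies over the game as in the profile below. The varying-p player’s game totals come out underdispersed relative to the binomial:

    import numpy as np

    rng = np.random.default_rng(0)
    n_shots, n_games = 20, 100_000

    # "streaky" profile: cold early and late, hot in the middle, averaging 0.85
    p_streaky = np.array([0.75] * 5 + [0.95] * 10 + [0.75] * 5)
    # steady profile: the same mean, but constant over the game
    p_steady = np.full(n_shots, p_streaky.mean())

    makes_streaky = (rng.random((n_games, n_shots)) < p_streaky).sum(axis=1)
    makes_steady = (rng.random((n_games, n_shots)) < p_steady).sum(axis=1)

    # Poisson binomial variance is sum of p_i (1 - p_i), maximal when the p_i are equal
    print("variance of makes, varying p :", makes_streaky.var())   # about 2.35
    print("variance of makes, constant p:", makes_steady.var())    # about 2.55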

    • This is very easy. You do it this way in Stan:

      data {
        int P;  // number of players
        int T;  // number of times
        int y[P, T];  // number of successes for player at time
        int N[P, T];  // number of trials for player at time
      }
      parameters {
        matrix[P, T] ability;  // player ability at time
      }
      model {
        for (p in 1:P) {
          ability[p, 1:T] ~ ... hierarchical time series model ...
        }
        for (p in 1:P) {
          for (t in 1:T) {
            y[p, t] ~ binomial(N[p, t], inv_logit(ability[p, t]));
          }
        }
      }
      

      Stan has much more compact syntax for this, but this should be more generally understandable for non-Stan users.

      The time series can be a simple first-order random walk or something more elaborate with expected career or season trends. The point is that it needs to use a hierarchical model to partially pool players somehow.

  6. So I guess my question is: when you’re trying to estimate a parameter, is there a formal process defined for how to account for a situation where your parameter *might* be changing over time as you observe it? How would you even begin to mathematically model something like that?

    You’d model that with a simple function of time, f(t) = some_basis_expansion, and then learn the basis coefficients.
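
    For instance, here is a made-up sketch in Python, with simulated data and a cubic polynomial basis (a spline basis, or a Bayesian fit with priors on the coefficients, would work the same way):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    rng = np.random.default_rng(0)

    # simulated season: the true FT% drifts slowly upward, 10 attempts per game
    n_games, attempts = 82, 10
    t = np.linspace(0, 1, n_games)
    makes = rng.binomial(attempts, expit(1.4 + 0.8 * t))

    # basis expansion of time: columns 1, t, t^2, t^3
    X = np.vander(t, 4, increasing=True)

    def neg_log_lik(beta):
        p = np.clip(expit(X @ beta), 1e-9, 1 - 1e-9)
        return -(makes * np.log(p) + (attempts - makes) * np.log(1 - p)).sum()

    fit = minimize(neg_log_lik, x0=np.zeros(4))
    print("estimated FT% in game 1: ", expit(X[0] @ fit.x))
    print("estimated FT% in game 82:", expit(X[-1] @ fit.x))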
