Meer Patel writes:
I’m a rising junior in high school and recently completed a study titled “Beyond Averages: Measuring Consistency and Volatility in NBA Player and Team Offense.” I used game-level data to build a new metric for offensive impact and analyzed how player performance fluctuates over time.
From the abstract of his paper:
While traditional player evaluation metrics focus almost exclusively on season-long averages, I propose a framework that incorporates both the magnitude and consistency of offensive impact normalized per minute of play using game-level data from the 2024-25 NBA season. . . . I introduce the Net Offensive Impact (NOI) statistic and use the coefficient of variation of NOI per minute over a fixed, randomly sampled set of games totaling approximately 400 minutes per player to quantify each player’s volatility using a standardized approach. . . . while the league’s top offensive performers tend to be both productive and stable, many players–especially role players and high-variance “wildcard” scorers–display far greater fluctuation. . . . Offensive consistency is closely linked to individual playing time and can also help predict team success to an extent, but falls short in the playoffs. . . . Offensive consistency is a valued but not singularly decisive attribute; both steady and volatile offensive contributors play important roles in shaping NBA outcomes depending on the situation.
I took a quick look at the first version of his paper and wrote that it’s hard for me to evaluate the basketball stuff because I don’t know so much about basketball. But I did have a statistical comment, which is that when your sample size is smaller you will see more variation in the average, and I think that’s what you’re seeing here. So I don’t know that you’re really finding evidence that some players are much more variable than others; you could just be noticing in a different way that some players play many more minutes than others. One way to look at this would be to randomly subset your data so that you have the same number of data points for each player.
Patel responded with a revision, writing:
Now, I randomly select the same number of games (20) for each player when calculating volatility. I describe this approach in the updated Data and Methodology section. All results, tables, and figures are based on these random subsets. There were some changes, particularly in the top 5 most consistent and volatile player rankings, where the list changed drastically. That being said, I found that the main findings remained consistent when using equal-sized samples, which gives me even more confidence in the results.
I replied that I’m still concerned about sample size issues. Maybe you should sample an equal number of minutes, rather than games, for the selected players? My intuition is still that when you measure volatility, you will find more volatility for players with a smaller sample size per game. If you’re measuring game-to-game variability, then players who play fewer minutes will show more variability even with no differences between players, just because fewer minutes is a smaller sample size per game.
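To make the sample-size concern concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: each minute independently produces one unit of offense with the same probability for every player, and the names `simulate_cv` and `average_cv` are hypothetical. The only thing that differs between the two simulated player types is minutes per game.

```python
import random
import statistics

def simulate_cv(minutes_per_game, n_games=20, p_score=0.5, rng=None):
    """CV of per-minute production across games for one simulated player.

    Assumption: every minute yields one unit of production with
    probability p_score, so all players share the same true rate.
    """
    rng = rng or random.Random()
    per_minute = []
    for _ in range(n_games):
        total = sum(rng.random() < p_score for _ in range(minutes_per_game))
        per_minute.append(total / minutes_per_game)
    return statistics.stdev(per_minute) / statistics.mean(per_minute)

def average_cv(minutes_per_game, n_players=500, seed=0):
    """Average the game-to-game CV over many simulated players."""
    rng = random.Random(seed)
    return statistics.mean(
        simulate_cv(minutes_per_game, rng=rng) for _ in range(n_players)
    )

bench_cv = average_cv(12)    # hypothetical bench player, 12 minutes per game
starter_cv = average_cv(36)  # hypothetical starter, 36 minutes per game
print(f"12 min/game: CV ~ {bench_cv:.3f}")
print(f"36 min/game: CV ~ {starter_cv:.3f}")
```

Under these assumptions the 12-minute player shows the larger game-to-game CV even though both players have the identical underlying rate and the same number of games. Real basketball production is not independent minute to minute, so this is only a way to see the direction of the effect.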
Patel replied:
After reading your last email, I revisited my methodology to make sure I’m not unintentionally overstating player-to-player differences in consistency, and updated my paper, which is attached to this email.
For each player, I randomly sample games (from their full season log) until I reach 400 total minutes, then compute NOI per minute for each sampled game, which normalizes all player contributions on the same standardized scale. All volatility statistics, such as the coefficient of variation (CV), are then calculated from this per-minute series, rather than from raw game-to-game NOI.
This adjustment removes the bias where players with fewer minutes per game could appear artificially more volatile, since all rate and variability statistics now reflect offensive production per minute played, regardless of a player’s role or average playing time.
In addition, to further check robustness, I repeated the random sampling with different seeds and confirmed that the volatility and consistency results remain stable across samples.
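The procedure described in the email above (sample games until roughly 400 minutes, compute NOI per minute for each sampled game, take the CV, repeat with different seeds) could be sketched along the following lines. This is not Patel’s code. The game-log layout (`minutes` and `noi` fields), the function names, and the made-up season log are all assumptions for illustration.

```python
import random
import statistics

def sample_games_to_minutes(game_log, target_minutes=400, seed=0):
    """Draw games at random, without replacement, until minutes reach the target."""
    rng = random.Random(seed)
    shuffled = list(game_log)
    rng.shuffle(shuffled)
    sampled, total = [], 0.0
    for game in shuffled:
        if total >= target_minutes:
            break
        sampled.append(game)
        total += game["minutes"]
    return sampled

def cv_of_noi_per_minute(games):
    """Coefficient of variation of per-game NOI per minute.

    Only meaningful when the mean rate is clearly positive.
    """
    rates = [g["noi"] / g["minutes"] for g in games]
    return statistics.stdev(rates) / statistics.mean(rates)

# Made-up season log: 70 games with hypothetical minutes and NOI totals.
log_rng = random.Random(42)
season_log = []
for _ in range(70):
    minutes = log_rng.uniform(20, 36)
    season_log.append({"minutes": minutes, "noi": minutes * log_rng.gauss(0.5, 0.1)})

# Repeat the sampling with different seeds to see how stable the CV is.
cvs = [
    cv_of_noi_per_minute(sample_games_to_minutes(season_log, seed=s))
    for s in range(5)
]
print([round(cv, 3) for cv in cvs])
```

Because whole games are sampled, the total lands a little above 400 minutes rather than exactly on it, which matches the “approximately 400 minutes” wording in the abstract. Note that a low-minute player needs more games to reach the target, so the number of per-game data points still differs across players in this sketch.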
Enjoy. I have not read the paper in any sort of detail so have no comment on the findings there. (That’s not a negative statement on my part, it’s just the literal truth.) I’m posting here because the general idea sounds cool, issues of implementation aside. Studies of variation in sports are always challenging.
The last time I can recall sharing someone’s NBA analysis was in this post from 2008. The guy who sent me that earlier item, Eli Witus, had written:
I don’t have any formal statistical training, so I am learning as I go. . . . I am very interested in multilevel modeling–I think it could be very useful in basketball since the game is much more interactive than baseball, and player statistics are heavily dependent on the context of the player’s teammates and coach. I think multilevel modeling could help answer questions about how a player’s statistics are likely to change if he changes teams.
I haven’t heard from Witus in a long time and his blog is no longer around, so I googled his name, and . . . it seems that he’s now the Executive Vice President of Basketball Operations & Assistant General Manager at the Houston Rockets. So all things are possible! In the meantime, if anyone near Edison, New Jersey, has interest in some basketball analytics, there’s nothing stopping you from contacting Meer Patel directly.
I heard somebody on a podcast recently saying that the analytics guys say that for players that don’t get starter’s minutes, you can’t really assess a player’s 3-point average with less than one year’s worth of data. Just not enough shots. Too much variance.
It’s funny to me how hard it is to accept that. I instinctively don’t want to wait that long, even if the argument makes sense to me.
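A rough back-of-the-envelope version of that argument, using the normal approximation to the binomial. The 36% true shooting rate and the attempt counts are assumptions picked for illustration, not figures from the podcast, and `three_point_margin` is a hypothetical helper.

```python
import math

def three_point_margin(attempts, true_pct=0.36, z=1.96):
    """Approximate 95% margin of error for an observed make rate.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(true_pct * (1 - true_pct) / attempts)

# Hypothetical attempt counts, from a short stretch to a fuller season.
for attempts in (50, 150, 400):
    margin = three_point_margin(attempts)
    print(f"{attempts} attempts: +/- {100 * margin:.1f} percentage points")
```

The margin shrinks only with the square root of the number of attempts, which is why a low-volume shooter’s percentage stays noisy for so long.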
As an American living abroad, I really love the way we have an obsession for sports stats. It’s fun to nerd out on this stuff! That said, I always think sports stats view each player as an individual when the reality is that they’re generally under a highly variable set of instructions depending on where they play. It’s the same for a lot of other sports (soccer/football being the clearest to me).
This is, accordingly, the kind of situation where I would probably suggest a hierarchical component. A lot of these players are not operating on their own but rather as part of a system. Yes, you pick your team based on player characteristics (with trades, substitutions, etc) but ultimately most coaches and franchises will have these broader tactical agendas that can affect a player’s volatility and productivity. For example, Shaun Livingston bounced around like 10 teams with pretty patchy numbers until he joined the Warriors and became a rock solid option, and his FG% jumped big time (and then stayed there). The area I think that gets consistently overlooked is extracting a player’s key skillset from their data no matter who they play for, so that you can make good inference about how they’d perform elsewhere. Happy to be disagreed with here, sports stats is a multi-million dollar industry for a reason I presume!
Hello Mr. Blythe, I really appreciate your perspective. I think you’re absolutely right that players operate within systems, and those systems can massively influence how their performance shows up in the stats (I’ve been a diehard Warriors fan since 2013, I’m quite the Shaun Livingston fan!). That’s actually one of the key things I tried to address in my paper. I introduced a stat called Net Offensive Impact that looks at what a player contributes on a per-minute basis, taking into account not just scoring, but also assists, turnovers, and shooting efficiency. To get at consistency, I used the coefficient of variation to measure how much a player’s performance fluctuates game to game. Then I created something called Volatility Weighted Contribution, or VWC, which takes a player’s average impact and adjusts it based on how consistent they are. So a player who puts up strong numbers but does it unpredictably will have a lower VWC than someone who delivers a similar impact more steadily. What surprised me was how strongly consistency correlated with playing time, even across different roles and teams. So while I agree that systems matter a lot, I think metrics like VWC help us start separating what’s truly player-driven from what might be more situational. Would definitely love to hear how you’d approach modeling tactical structure too. I think that’s a really exciting direction for the next layer of analysis.
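The comment above does not give the VWC formula, so the following is only one hypothetical way to build a metric with the stated property (same average impact, lower score when less consistent). The form mean / (1 + penalty × CV), the `penalty` parameter, and the toy numbers are all assumptions, not the definition used in the paper.

```python
import statistics

def vwc(noi_per_minute, penalty=1.0):
    """Hypothetical volatility-weighted contribution: mean / (1 + penalty * CV).

    This functional form is an assumption for illustration only.
    """
    mean = statistics.mean(noi_per_minute)
    cv = statistics.stdev(noi_per_minute) / mean
    return mean / (1 + penalty * cv)

# Made-up per-game NOI-per-minute values with the same average (0.50).
steady = [0.50, 0.52, 0.48, 0.51, 0.49]
streaky = [0.20, 0.85, 0.35, 0.75, 0.35]

print(f"steady  VWC: {vwc(steady):.3f}")
print(f"streaky VWC: {vwc(streaky):.3f}")
```

Any adjustment of this kind inherits whatever sample-size effects are present in the CV itself, so the caveats about minutes per game apply here too.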
Hi Meer, thanks for the reply. I get where you’re coming from on the consistency front. I just wonder whether consistency itself is a function of the role a player is asked to take, and this might show up differently depending on whether the coach’s tactics allow certain types of players to thrive.
Say, for example, you have a 3&D guy who goes through big streaky periods (e.g. DLo) depending on both who he plays for and his own form. He might thrive under coaches who spread the court real wide, with lots of cutters and less screens, compared to a coach who consistently runs pick and rolls. Your model would do a good job of picking up his form, but the coaching factor (pick and rolls vs more spacing) would go unaccounted for.
What you could try is basically just adding another variable to your dataset, which is the coach (or team, or whatever, up to you) as a random intercept. This would essentially look at how some of those individual factors might be explained by a higher-level factor, which is the coach’s style. Most software accommodates multilevel models.
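As a rough illustration of what a coach-level random intercept does, here is a sketch that avoids any mixed-model library and instead uses a simple method-of-moments fit for a balanced design (the same number of players per coach). The data are simulated, the outcome is taken to be each player’s volatility, and the function name, coach labels, and variance values are all assumptions. In practice one would use the multilevel-model routine in whatever software is at hand.

```python
import random
import statistics

def fit_random_intercepts(groups):
    """Method-of-moments fit of y_ij = mu + a_j + e_ij for a balanced design.

    groups maps a coach label to a list of player-level values (equal lengths).
    Returns (mu, between-coach variance, within-coach variance, intercepts),
    where the intercepts are coach means shrunk toward the grand mean.
    """
    n = len(next(iter(groups.values())))
    means = {c: statistics.mean(v) for c, v in groups.items()}
    mu = statistics.mean(list(means.values()))
    within = statistics.mean([statistics.variance(v) for v in groups.values()])
    # Variance of coach means = between + within / n, so subtract the noise part.
    between = max(statistics.variance(list(means.values())) - within / n, 0.0)
    shrink = between / (between + within / n) if between > 0 else 0.0
    intercepts = {c: shrink * (m - mu) for c, m in means.items()}
    return mu, between, within, intercepts

# Simulated data: 8 hypothetical coaches, 12 players each.
rng = random.Random(7)
coach_effects = {f"coach_{j}": rng.gauss(0, 0.05) for j in range(8)}
data = {
    c: [0.30 + a + rng.gauss(0, 0.10) for _ in range(12)]
    for c, a in coach_effects.items()
}

mu, between, within, intercepts = fit_random_intercepts(data)
print(f"grand mean {mu:.3f}, between-coach var {between:.4f}, within var {within:.4f}")
```

The ratio of between-coach variance to total variance gives a first read on how much of the player-level variation might be attributable to the higher-level factor.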
Anyway, totally up to you. That’s just my intuition. Regardless, this is very impressive work for someone in high school!
Would be curious how the 92 selected were actually chosen. Why not just do every player, or every player who had over x minutes in the season?
Would love to see the same scatterplots broken out or color coded by player type (the four types mentioned that form the basis of the 92). Related, something that relates the player’s variation to their % of games started or % of minutes played.
Sorry, you do have the stats vs playing time! Just show the breakout by player type (elite, role, etc.)
Quibbling a bit, but you’ve shared a fair amount of hot hand research since 2008.
Alex:
I was thinking that too, but I decided that the hot hand stuff was more about statistics than basketball so it didn’t seem so relevant here. There was also that study we discussed several years ago claiming that it was better to be down a point than up a point at halftime. I have no idea if anyone followed that one up with more data.
Very impressive effort especially from a rising junior – responding and adapting to the suggestions. Good show
Hi @Meer – I’m the creator of the DARKO system in the paper (misattributed to D. Kostic btw).
I think this is an interesting idea, and something people in the NBA analytics space pay attention to. A few thoughts:
1) How were the coefficients for NOI chosen? They seem somewhat arbitrary, and may exacerbate some of what you’ve found here. Maybe I didn’t understand this. I may suggest using something like DRE for this however.
2) My instinct is that players *do* vary greatly in terms of game to game impact, but this is mostly a function of shooting performance. Put another way, I think what you’re capturing here is “what percent of a player’s value is tied to shooting percentage”. Better players tend to have a broader base of skills, so even if they shoot a lot, the shooting is a smaller percentage of their overall offensive impact, so the game to game variance is smaller. Meanwhile the high variance NOI guys tend to lean disproportionately on shooting.
3) I’d probably suggest trying to drill in on the game to game variation (or possession to possession) for the various component stats that go into NOI, to untangle #2 above. It would be interesting to see if there are players who are “inherently” higher variance while controlling for the rest of their skillset and usage profile. For instance, is there actually an archetype that gets 10 rebounds per 100 possessions, with a standard deviation of +3, while there’s another set of players with the same mean, but a higher standard deviation? That would be a pretty interesting finding, if there were inherently higher variance players that teams could target. High variance is generally a bad thing for the very best teams (e.g., OKC), but is maybe a good thing for worse teams. But you need to control for the underlying offensive profile to untangle this.
Good work here. Eager to see what comes of this.
Hello Mr. Medvedovsky, I really appreciate you taking the time to read my work and share your thoughts, and I also want to apologize for the misattribution of DARKO in the paper — that was my oversight and I will make sure it is corrected in the next version. The NOI weights came from efficiency theory, which values each stat by its average contribution to points per possession. Points get full credit, assists partial credit based on the points they create, turnovers are negative, and offensive rebounds are positive. I agree these are just a starting point, and I plan to test weights learned from possession-level data as well as rerun the analysis with a metric like DRE to see if the volatility patterns hold, since these values are likely not fully representative.
Your point about shooting driving much of the game-to-game variance makes a lot of sense. I am planning to break NOI/min into components like shooting, playmaking, and turnovers to measure how much each contributes to variance, and to control for usage and shot quality so we can separate true shot-making variance from role-based effects. I also like your suggestion of looking for players who remain high-variance even after controlling for their profile, and will explore that in my next round of analysis.
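One simple way to attribute game-to-game variance to components, assuming NOI per minute can be written as a sum of parts, is to use the identity Var(total) = Σ Cov(component, total), so each component’s share is Cov(component, total) / Var(total). The component names and the six games of numbers below are made up for illustration, and `variance_shares` is a hypothetical helper.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance_shares(components):
    """Share of total variance attributed to each additive component.

    components maps a name to a per-game list of that part of NOI per minute.
    Shares sum to 1 and can be negative for a component that offsets the rest.
    """
    n_games = len(next(iter(components.values())))
    total = [sum(series[i] for series in components.values()) for i in range(n_games)]
    var_total = covariance(total, total)
    return {
        name: covariance(series, total) / var_total
        for name, series in components.items()
    }

# Made-up per-game contributions for one hypothetical player.
player = {
    "shooting": [0.40, 0.10, 0.55, 0.20, 0.45, 0.15],
    "playmaking": [0.12, 0.10, 0.14, 0.11, 0.13, 0.10],
    "turnovers": [-0.05, -0.04, -0.06, -0.05, -0.04, -0.06],
}
shares = variance_shares(player)
for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

In this toy example nearly all the variance comes from shooting, which is the pattern suggested in the comment above. Whether real players look like this is exactly what the planned analysis would have to show.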
Thank you again for sharing your perspective. I have been very interested in DARKO for a while and really admire the depth and predictive power it brings to player evaluation. It would be great to learn more about how you’ve developed and refined it over time, and I look forward to seeing how my own work can complement and build on ideas from your system.
Hi Meer (and Prof. Gelman),
I feel qualified to comment on this paper because I work in basketball analytics – I’m a Basketball Analytics Associate for the Mexico City Capitanes of the NBA G-League and write about and do sports analytics for Nate Silver’s newsletter, Silver Bulletin. I’m also from Edison, NJ, and likely graduated from the same high school you attend (you may, in fact, be in the same grade as my younger brother, who helps me write my own newsletter).
Second, I have worked on game by game metrics before (feel free to check my website, thezonemaster.substack.com), and they are particularly finicky, so don’t take any of my criticism in a bad way – I think it’s impressive you approached a pretty tough problem with a decently elegant solution.
There are many things I liked in your paper: I think your methodology after building the metric was nice – I do think there are better ways to cluster players though, and the change you made to per-minute NOI is good, but your coefficient of variation idea would be better under the assumption that your metric is actually calculating impact correctly.
That’s where much of my feedback lies: you’re right to point out most current metrics we have right now don’t measure game performance well. However, I think that you are missing a key assumption – players with counting stats that vary aren’t necessarily inconsistent – rather, there are a number of scheme-related adjustments and opponent adjustments that would result in a player having a higher or lower box score total.
So you can label NOI as a box score metric, and this study could be measuring which roles are “most stable” (i.e. which players, even if they have lower box score stats than a higher volatility player, tend to occupy the same role in their offense) rather than “most consistent”. Because by assigning impact to counting stats, a large portion of your “consistency” has to do with shooting consistencies, and then you’d just be making a proxy to something we already have direct data for.
It’s important you make that distinction before proceeding further. If you don’t understand why the distinction matters, it’s pretty simple:
Things like injuries, opponents, and even the player’s own teammates can adjust how that player is utilized game to game. And unfortunately, since you’re only using a randomized 400 minute sample, it’s unlikely to produce the results you want. So while you might directionally get some results (you found All-Stars were more consistent than bench players, which points to the distinction I made above), it doesn’t get you anything further.
Second, the weights you used in your metric are quite ambiguous. This could be an area to work on. I can understand why you chose those weights, but it’s important to note that the difference between 1.5 and 1 is quite large, and a player’s actual weight could reasonably be anywhere between a whole number, which is also concerning. I would suggest looking into pbpstats.com as a source for tracking data, which could help you determine more specific, positional weights for players. This is an area I’d be happy to provide advice on.
Feel free to reach out to me, and I’d be happy to discuss my thoughts on basketball as a whole and review any future work of yours.
Thanks,
Joseph