Somebody points me to this by Benjamin Morris.

I haven’t read this so I have no idea, but it does seem to have a lot of statistics!

The one part I’m suspicious of is item 3(c), where he says, “The statistical community over-values Margin of Victory and under-values raw winning percentages.” As I wrote a few years ago, “Don’t model the probability of win, model the expected score differential.” But I’d have to take a careful look to evaluate the claim.

Yep, that’s mine.

It’s more than 9 years old and some of the earliest work I ever did in data-viz and deep-dive statistical analysis, and it definitely feels like that at points. But overall I’m still pretty proud of it: if not for every statistical argument, then for the broader logic and strategy involved.

As for winning, hard disagree, especially for that time period. I fully understand the value of using score differentials as modeling inputs and outputs, but winning has a lot of predictive value in basketball (especially when applied to the playoffs, something I’ve since written about at length). The correct approach is really to use both, as I tried to do then. (Albeit in a very juvenile form, as most of that analysis was. If I were writing this today, instead of combining the two in a regression I’d just discount the value of each subsequent point of MOV with a power function like MOV^k, where k < 1.)
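To make the MOV^k idea concrete, here is a minimal Python sketch. The exponent k = 0.8 is an arbitrary illustrative choice, not a value from the original analysis, and keeping the sign with `math.copysign` (so that losses stay negative) is an assumption about how negative margins would be handled.

```python
import math


def discounted_mov(mov: float, k: float = 0.8) -> float:
    """Shrink large margins: sign(mov) * |mov| ** k, with 0 < k < 1.

    k = 0.8 is an illustrative assumption, not a fitted value.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    # Keep the sign so a 10-point loss mirrors a 10-point win.
    return math.copysign(abs(mov) ** k, mov)


def season_rating(margins: list[float], k: float = 0.8) -> float:
    """Average discounted margin over a season's game margins."""
    return sum(discounted_mov(m, k) for m in margins) / len(margins)


if __name__ == "__main__":
    # One 30-point blowout vs. three 10-point wins: same raw total,
    # but the blowout earns less credit once margins are discounted.
    print(discounted_mov(30.0), 3 * discounted_mov(10.0))
    print(season_rating([30.0, -5.0, 2.0]))
```

Because the function is concave in |MOV|, each additional point of margin counts for a little less than the one before it.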

Benjamin:

You could well be right. It’s ultimately an empirical question, and you’ve looked at the numbers and I haven’t. I’ve spent lots more time looking at election statistics, and there you have a much smaller N, so the advantage of the less noisy estimate is clearer. And a lot of my “Just point differential, baby” attitude is coming in reaction to various work (see discussion here, for example) that’s based purely on wins, thus throwing away tons of information and turning everything into a festival of overfitting.
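The “less noisy estimate” point can be illustrated with a toy simulation. This sketch assumes each team has a fixed true strength and plays games against an average opponent, with every game margin equal to strength plus Gaussian noise. The standard deviations and game counts are made-up values, and the model deliberately has no garbage time, which is exactly the assumption Benjamin questions. It compares how well a season-average margin and a season win fraction track true strength.

```python
import math
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_estimators(n_teams=5000, n_games=10, strength_sd=5.0,
                       noise_sd=12.0, seed=1):
    """Return (corr(strength, mean margin), corr(strength, win fraction)).

    All parameter defaults are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    strengths, margin_est, win_est = [], [], []
    for _ in range(n_teams):
        s = rng.gauss(0.0, strength_sd)
        # Toy model: each game margin = true strength + Gaussian noise.
        margins = [s + rng.gauss(0.0, noise_sd) for _ in range(n_games)]
        strengths.append(s)
        margin_est.append(statistics.fmean(margins))
        win_est.append(sum(m > 0 for m in margins) / n_games)
    return pearson(strengths, margin_est), pearson(strengths, win_est)


if __name__ == "__main__":
    r_margin, r_wins = compare_estimators()
    print(f"margin: {r_margin:.2f}  wins: {r_wins:.2f}")
```

The noise model is the key assumption: if substitutions compressed or scrambled blowout margins, average margin would lose some of its edge, which is one way to frame the disagreement here.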

In sports you have a lot more data, and also the game unfolds in real time. In a presidential election, there’s rarely any sort of “garbage time” where you can put in your subs: there’s always the fear of losing, you also have down-ballot races to worry about, and there’s no advantage to be gained by resting your starters.

So, with that in mind, two questions for you:

1. Where do you think the extra information from winning, as compared to point differential, is coming from? One possibility is that there’s some particular ability of teams to win close games; another possibility is that there’s less information than might appear in non-close games, once teams put in their subs.

2. Baseball makes a lot less use of substitution than basketball. So would you expect wins to provide relatively less information, beyond score differential, in baseball compared to basketball? I guess soccer falls somewhere in between.

Not to reply for Benjamin, but I think the answer to #1 is a bit of both. MOV loses some value because of substitutions in blowouts and also run-of-the-mill cruising (even if you don’t substitute, the leading team plays differently and maybe less hard when they have a solid lead). There does also seem to be an ‘ability to win close games’, defined at least partially by having more really good players. If you take two teams that are equal on MOV over the season but one has three great players and the other only has two (having made up the MOV difference by having better players further down the bench, maybe), I would take the three player team in a head-to-head playoff or close game. In those situations the great players play more and harder, leading to a better chance of winning.

You can see an example of this by tracking how 538 has changed its NBA predictions over the years. They started with a straight Elo measure, which is basically just MOV. Then they added an adjustment for playoff experience after it became obvious that teams with a good playoff history won more in the playoffs than you would expect from their regular-season Elo. Now they’ve more or less ditched Elo and use player rankings with minutes-played estimates, because even the playoff-games-played adjustment wasn’t strong enough.
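To see how an Elo rating can end up being “basically just MOV”, here is a minimal sketch of an Elo update in which the size of each rating change grows with the margin of victory. The K-factor of 20 and the log-shaped MOV multiplier are illustrative assumptions; this is not claimed to be 538’s actual formula.

```python
import math


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo win expectancy for team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_winner: float, r_loser: float, mov: float,
               k: float = 20.0) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one game.

    The multiplier log(|mov| + 1) is a hypothetical choice that makes
    bigger wins move ratings more; it is not any site's published formula.
    """
    mov_multiplier = math.log(abs(mov) + 1.0)
    delta = k * mov_multiplier * (1.0 - expected_score(r_winner, r_loser))
    # Zero-sum: the loser drops by exactly what the winner gains.
    return r_winner + delta, r_loser - delta


if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, mov=2))   # narrow win, small change
    print(elo_update(1500.0, 1500.0, mov=25))  # blowout, large change
```

In this sketch a 2-point win and a 25-point win both count as wins, but they move the ratings by very different amounts.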

I’m less familiar with baseball, so I won’t hazard a guess at #2, but as far as 538 goes, they still use Elo with only an adjustment for the starting pitcher. I assume that’s because the rest of a team’s lineup changes slowly enough that its margin of victory can be taken at face value.

I agree with Alex’s assessment. However, the fact that basketball teams are not always playing to maximize point differential (which baseball teams, for all intents and purposes, are) *also* means that a player’s appearance in a win vs. a loss, and even his on/off court data, has much more potential for bias. So it’s harder to know if and when you are measuring real causation.

Item 1(b) (“The trade-off between offensive and defensive rebounding exists”) looks like an instance of Berkson’s paradox.
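A quick way to see how Berkson’s paradox could manufacture such a trade-off: in the sketch below, offensive and defensive rebounding skill are drawn independently, but only players whose combined skill clears a cutoff make the “league”. The normal distributions and the cutoff are arbitrary assumptions; the point is only that selecting on the sum induces a negative correlation among the players who are selected.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def berkson_demo(n=100_000, cutoff=2.0, seed=0):
    """Return (corr in full population, corr among selected, n selected)."""
    rng = random.Random(seed)
    # Independent skills: no true trade-off in the population.
    oreb = [rng.gauss(0.0, 1.0) for _ in range(n)]
    dreb = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Selection on overall quality: only strong combined rebounders are kept.
    kept = [(o, d) for o, d in zip(oreb, dreb) if o + d > cutoff]
    kept_o = [o for o, _ in kept]
    kept_d = [d for _, d in kept]
    return pearson(oreb, dreb), pearson(kept_o, kept_d), len(kept)


if __name__ == "__main__":
    full_r, selected_r, n_kept = berkson_demo()
    print(f"population r = {full_r:.2f}, selected r = {selected_r:.2f} (n = {n_kept})")
```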

This brings up the anti-hot hand. I recall that if an opposing player was doing a lot of damage, the Bulls would put Rodman on him, and often the result was that he shut him down. But was that just a hot-hand illusion being dispelled, with the player cooling off on his own? It sure didn’t look like it.
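The statistical worry here is regression to the mean. If there were no hot hand at all, a player who had just made several shots in a row would, on average, shoot at his usual rate afterward, so any defender switched onto him would look like a stopper. The sketch below simulates a shooter with a fixed make probability; the 45% rate and the four-make “hot” threshold are arbitrary assumptions.

```python
import random


def next_shot_after_streak(p=0.45, n_shots=200_000, streak=4, seed=2):
    """Make rate on shots that immediately follow `streak` straight makes.

    The shooter has a constant make probability p, i.e. no hot hand.
    Returns (conditional make rate, number of qualifying shots).
    """
    rng = random.Random(seed)
    shots = [rng.random() < p for _ in range(n_shots)]
    after_hot = [shots[i] for i in range(streak, n_shots)
                 if all(shots[i - streak:i])]
    return sum(after_hot) / len(after_hot), len(after_hot)


if __name__ == "__main__":
    rate, n = next_shot_after_streak()
    print(f"make rate after a hot streak: {rate:.3f} over {n} shots")
```

In this null model the post-streak rate sits near p, so a drop from a perfect streak back to roughly 45% is what “shutting him down” would look like even if the defender did nothing. That is the baseline a real Rodman effect would have to beat.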

The larger problem with the analysis, in my view, is that it doesn’t really grapple with the big analytic challenge in evaluating top rebounders: the very uncertain relationship between an individual player’s rebound percentage and that of his team. There is overwhelming evidence that this relationship is very weak in general, starting with the fact that there is huge variation in TRB% among players but relatively little among teams. As a result, at the team level rebounding ability explains only about 15% of wins, less than turnovers and far less than shooting efficiency. If players’ rebounding talent (the ability to actually secure rebounds the team would not otherwise have obtained) varied by anything like the variation we see in rebounding statistics, then team rebounding rates would differ a lot and rebounding would be a hugely important part of winning in the NBA. But that’s not what we observe.

Rodman was a great rebounder, but we can still be confident that the large majority of his extraordinary rebound totals were effectively taken from his teammates rather than his opponents. How many? That’s what you have to figure out (and then balance against the fact that he contributed almost nothing else on offense). But here’s a simple illustration of the problem: Rodman led the NBA in DRB% (defensive rebound percentage) every season he was with the Bulls, and each of those seasons is one of the 50 best DRB% performances of all time. Do you know what the *Bulls’* DRB% was over those 3 seasons? About 70.3%, just 1.3 percentage points better than league average. That’s only half a rebound per game for the Bulls (worth about 0.5 points). Rodman also put up excellent ORB% numbers, and the Bulls fared better there. But the large majority of Rodman’s rebounds came on defense, and these likely had very little value to his teams.
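The “half a rebound per game” figure is simple arithmetic: the team’s DRB% edge times the number of defensive-rebound opportunities per game. The comment does not say how many opportunities it used, so the value of 40 below is a round number assumed for illustration, as is valuing one extra possession at about one point.

```python
def extra_def_rebounds_per_game(team_drb_pct: float, league_drb_pct: float,
                                opportunities_per_game: float) -> float:
    """Extra defensive rebounds per game relative to a league-average team."""
    return (team_drb_pct - league_drb_pct) * opportunities_per_game


# Bulls ~70.3% vs. a league average 1.3 points lower (69.0%), per the comment.
# 40 opportunities per game is an assumed round number, not a sourced figure.
extra = extra_def_rebounds_per_game(0.703, 0.690, 40.0)
points_per_possession = 1.0  # assumed value of one extra possession
print(f"{extra:.2f} extra rebounds/game = about {extra * points_per_possession:.2f} points/game")
```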

We can’t possibly evaluate Rodman without understanding what his actual contribution was on the boards, and that hasn’t been done here.

It’s been about 8 years since I read this, but my memory is that this is precisely the strength of this analysis.

Usually it’s assumed that if a rebounder is “taking rebounds away” from his teammates, then that isn’t benefiting the team.

But that doesn’t make much sense: if Rodman’s teammates are confident that he can get a rebound, that lets them, well, not get the rebound. That could mean hustling down the floor to get set up for the offensive possession, or simply not working as hard at rebounding. Either way it should benefit the team by some amount that’s hard to see in the defensive statistics.

So measuring Rodman’s effect on winning seems a good way around this.
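One simple version of measuring a player’s effect on winning is a with/without comparison: team win percentage in games he played versus games he missed, with a standard error for the difference. The counts below are made-up toy numbers, not Rodman’s record, and the normal-approximation standard error is an assumption that gets shaky with few missed games. Missed games are also not randomly assigned, which is the bias concern raised above.

```python
import math


def win_pct_and_se(wins: int, games: int) -> tuple[float, float]:
    """Win fraction and its binomial (normal-approximation) standard error."""
    p = wins / games
    return p, math.sqrt(p * (1.0 - p) / games)


def with_without(wins_with: int, games_with: int,
                 wins_without: int, games_without: int) -> tuple[float, float]:
    """Difference in win fraction (with minus without) and its standard error."""
    p1, se1 = win_pct_and_se(wins_with, games_with)
    p0, se0 = win_pct_and_se(wins_without, games_without)
    return p1 - p0, math.sqrt(se1 ** 2 + se0 ** 2)


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only.
    diff, se = with_without(wins_with=60, games_with=80,
                            wins_without=10, games_without=20)
    print(f"difference = {diff:+.3f} +/- {se:.3f}")
```

With only 20 games in the “without” group, the standard error here is about 0.12, which shows how wide the uncertainty in such comparisons can be.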

The NBA average is interesting, but not an enlightening comparison. What might be enlightening are:

1. Bulls averages for seasons pre-Rodman.

2. Jordan’s DRB% pre-Rodman (i.e., did this reduce Jordan’s workload?)

3. These same metrics for the last 5 minutes of close games.

4. Game winning defensive rebounds/blocks/tips, etc.

In basketball, as in a lot of sports, the impact of the event on the outcome is often determined by when it occurs.