How did Bill James get this one wrong on regression to the mean? Here are 6 reasons:

I’m a big fan of Bill James, but I think he might be picking up the wrong end of the stick here.

The great baseball analyst writes about what he calls the Law of Competitive Balance. His starting point is that teams that are behind are motivated to work harder to have a chance of winning, which moves them to switch to high-variance strategies such as long passes in football (more likely to score a touchdown, also more likely to get intercepted), etc. Here’s Bill James:

Why was there an increased chance of a touchdown being thrown?

Because the team was behind.

Because the team was behind, they had an increased NEED to score.

Because they had an increased need to score points, they scored more points.

That is one of three key drivers of The Law of Competitive Balance: that success increases when there is an increased need for success. This applies not merely in sports, but in every area of life. But in the sports arena, it implies that the sports universe is asymmetrical. . . .

Because this is true, the team which has the larger need is more willing to take chances, thus more likely to score points. The team which is ahead gets conservative, predictable, limited. This moves the odds. The team which, based on their position, would have a 90% chance to win doesn’t actually have a 90% chance to win. They may have an 80% chance to win; they may have an 88% chance to win, they may have an 89.9% chance to win, but not 90.

I think he’s mixing a correct point here with an incorrect point.

James’s true statement is that, as he puts it, “there is an imbalance in the motivation of the two teams, an imbalance in their willingness to take risks.” The team that’s behind is motivated to switch to strategies that increase the variance of the score differential, even at the expense of lowering its expected score differential. Meanwhile, the team that’s ahead is motivated to switch to strategies that decrease the variance of the score differential, even at the expense of lowering its expected score differential. In basketball, it can be as simple as the team that’s behind pushing up the pace and the team that’s ahead slowing things down. The team that’s trailing is trying to have some chance of catching up—their goal is to win, not to lose by a smaller margin; conversely, the team that’s in the lead is trying to minimize the chance of the score differential going to zero, not to run up the score. As James says, these patterns are averages and won’t occur from play to play. Even if you’re behind by 10 in a basketball game with 3 minutes to play, you’ll still take the open layup rather than force the 3-pointer, and even if you’re ahead by 10, you’ll still take the open shot with 20 seconds left on the shot clock rather than purely trying to eat up time. But on average the logic of the game leads to different strategies for the leading and trailing teams, and that will have consequences on the scoreboard.

James’s mistake is to think, when it comes to the probability of winning, that this dynamic on balance favors the team that’s behind. When strategies are flexible, the team that’s behind does not necessarily increase its probability of winning relative to what that probability would be if team strategies were constant. Yes, the team that’s behind will use strategies to increase the probability of winning, but the team that’s ahead will alter its strategy too. Speeding up the pace of play should, on average, increase the probability of winning for the trailing team (for example, increasing the probability from, I dunno, 10% to 15%), but meanwhile the team that’s ahead is slowing down the pace of play, which should send that probability back down. On net, will this favor the leading team or the trailing team when it comes to win probability? It will depend on the game situation. In some settings (for example, a football game where the team that’s ahead has the ball on first down with a minute left), it will favor the team that’s ahead. In other settings it will go the other way.

James continues:

That is one of the three key drivers of the Law of Competitive Balance. The others, of course, are adjustments and effort. When you’re losing, it is easier to see what you are doing wrong. Of course a good coach can recognize flaws in their plan of attack even when they are winning, but when you’re losing, they beat you over the head.

I don’t know about that. As a coach myself, I could just as well argue the opposite point, as follows. When you’re winning, you can see what works while having the freedom to experiment and adapt to fix what doesn’t work. But when you’re losing, it can be hard to know where to start or have a sense of what to do to improve.

Later on in his post, James mentions that, when you’re winning, part of that will be due to situational factors that won’t necessarily repeat. The quick way to say that is that, when you’re winning, part of your success is likely to be from “luck,” a formulation that I’m OK with as long as we take this term generally enough to refer to factors that don’t necessarily repeat, such as pitcher/batter matchups, to take an example from James’s post.

But James doesn’t integrate this insight into his understanding of the law of competitive balance. Instead, he writes:

If a baseball team is 20 games over .500 one year, they tend to be 10 games over the next. If a team is 20 games under .500 one year, they tend to be 10 games under the next year. If a team improves by 20 games in one year (even from 61-101 to 81-81) they tend to fall back by 10 games the next season. If they DECLINE by 10 games in a year, they tend to improve by 5 games the next season.

I began to notice similar patterns all over the map. If a batter hits .250 one year and .300 the next, he tends to hit about .275 the third year. Although I have not demonstrated that similar things happen in other sports, I have no doubt that they do. I began to wonder if this was actually the same thing happening, but in a different guise. You get behind, you make adjustments. You lose 100 games, you make adjustments. You get busy. You work harder. You take more chances. You win 100 games, you relax. You stand pat.

James’s description of the data is fine; his mistake is to attribute these changes to teams “making adjustments” or “standing pat.” That could be, but it could also be that teams that win 100 games “push harder” and that teams that lose 100 games “give up.” The real point is statistical, which is that this sort of “regression to the mean” will happen without any such adjustment effects, just from “luck” or “random variation” or varying situational factors.
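To see the purely statistical point, here is a minimal simulation (my own sketch with made-up numbers, not anything from James’s post): every team’s true strength is held completely fixed across two seasons, so there are no adjustments, no extra effort, and no standing pat, yet the teams that did best in season 1 still fall back on average in season 2, and the worst teams improve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up setup: many hypothetical teams, each with a true win probability
# drawn once (between .400 and .600) and held fixed for both seasons.
n_teams, n_games = 30_000, 162
true_p = rng.uniform(0.40, 0.60, n_teams)
wins1 = rng.binomial(n_games, true_p)   # season 1 = skill + luck
wins2 = rng.binomial(n_games, true_p)   # season 2 = same skill, fresh luck

top = wins1 >= np.quantile(wins1, 0.9)  # the teams that did best in season 1
print(wins1[top].mean(), wins2[top].mean())
# The season-2 average for those teams is noticeably lower: regression to the
# mean with zero adjustment effects, because part of what selected them in
# season 1 was luck that does not repeat.
```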

Here’s a famous example from Tversky and Kahneman (1973):

The instructors in a flight school adopted a policy of consistent positive reinforcement recommended by psychologists. They verbally reinforced each successful execution of a flight maneuver. After some experience with this training approach, the instructors claimed that contrary to psychological doctrine, high praise for good execution of complex maneuvers typically results in a decrement of performance on the next try.

Actually, though:

Regression is inevitable in flight maneuvers because performance is not perfectly reliable and progress between successive maneuvers is slow. Hence, pilots who did exceptionally well on one trial are likely to deteriorate on the next, regardless of the instructors’ reaction to the initial success. The experienced flight instructors actually discovered the regression but attributed it to the detrimental effect of positive reinforcement.

“Performance is not perfectly reliable and progress between successive maneuvers is slow”: That describes pro sports!

As we write in Regression and Other Stories, the point here is that a quantitative understanding of prediction clarifies a fundamental qualitative confusion about variation and causality. From purely mathematical considerations, it is expected that the best pilots will decline, relative to the others, while the worst will improve in their rankings, in the same way that we expect daughters of tall mothers to be, on average, tall but not quite as tall as their mothers, and so on.
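In symbols, for the simplest idealized case (standardized before-and-after measurements x and y that are jointly normal with correlation ρ; this is my gloss on the standard result, not a formula from James or from the flight-school story):

$$E(y \mid x) = \rho\, x, \qquad |E(y \mid x)| = |\rho|\,|x| < |x| \quad \text{whenever } |\rho| < 1 \text{ and } x \neq 0,$$

so the predicted follow-up measurement is pulled toward the mean by a factor of ρ, with no behavioral mechanism required.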

I was surprised to see Bill James make this mistake. All the years I’ve read him writing about the law of competitive balance and the plexiglass principle, I always assumed that he’d understood this as an inevitable statistical consequence of variation without needing to try to attribute it to poorly-performing teams trying harder etc.

How did he get this one so wrong? How could such a savvy analyst make such a basic mistake? I have six answers:

1. Multiplicity. Statistics is hard, and if you do enough statistics, you’ll eventually make some mistakes. I make mistakes too! It just happens that this “regression to the mean” fallacy is a mistake that James made.

2. It’s a basic mistake and an important mistake, but it’s not a trivial mistake. Regression to the mean is a notoriously difficult topic to teach (you can cruise over to chapter 6 of our book and see how we do; maybe not so great!).

3. Statistics textbooks, including my own, are full of boring details, so I can see that, whether or not Bill James has read any such books, he wouldn’t get so much out of them.

4. In his attribution of regression to the mean, James is making an error of causal reasoning and a modeling error, but it’s not a prediction error. The law of competitive balance and the plexiglass principle give valid predictions, and they represent insights that were not widely available in baseball (and many other fields) before James came along. Conceptual errors aside, James was still moving the ball forward, as it were. When he goes beyond prediction in his post, for example making strategy recommendations, I’m doubtful, but I’m guessing that the main influence on readers of his “law of competitive balance” comes from the predictive part.

5. Hero worship. The man is a living legend. That’s great—he deserves all his fame—but the drawback is that maybe it’s a bit too easy for him to fall for his own hype and not question himself or fully hear criticism. We’ve seen the same thing happen with baseball and political analyst Nate Silver, who continues to do excellent work but sometimes can’t seem to digest feedback from outsiders.

6. Related to point 5 is that James made his breakthroughs by fighting the establishment. For many decades he’s been saying outrageous things and standing up for his outrageous claims even when they’ve been opposed by experts in the field. So he keeps doing it, which in some ways is great but can also lead him astray, by trusting his intuitions too much and not leaving himself more open to feedback.

I guess we could say that, in sabermetrics, James was on a winning streak for a long time so he relaxes. He stands pat. He has less motivation to see what’s going wrong.

P.S. Again, I’m a big fan of Bill James. It’s interesting when smart people make mistakes. When dumb people make mistakes, that’s boring. When someone who’s thought so much about statistics makes such a basic statistical error, that’s interesting to me. And, as noted in item 4 above, I can see how James could have this misconception for decades without it having much negative effect on his work.

P.P.S. Just to clarify: Bill James made two statements. The first was predictive and correct; the second was causal and misinformed.

His first, correct statement is that there is “regression to the mean” or “competitive balance” or “plexiglas” or whatever you want to call it: players or teams that do well in time 1 tend to decline in time 2, and players or teams that do poorly in time 1 tend to improve in time 2. This statement, or principle, is correct, and it can be understood as a general mathematical or statistical pattern that arises when correlations are less than 100%. This pattern is not always true—it depends on the joint distribution of the before and after measurements (see here), but it is typically the case.

His second, misinformed statement is that this is caused by players or teams that are behind at time 1 being more innovative or trying harder and players or teams that are ahead at time 1 being complacent or standing pat. This statement is misinformed because the descriptive phenomenon of regression-to-the-mean or competitive-balance or plexiglas will happen even in the absence of any behavioral changes. And, as discussed in the above post, behavioral changes can go in either direction; there’s no good reason to think that, when both teams perform strategic adjustments, these adjustments on net will benefit the team that’s behind.

This is all difficult because it is natural to observe the first, correct predictive statement and from this to mistakenly infer the second, misinformed causal statement. Indeed this inference is such a common error that it is a major topic in statistics and is typically covered in introductory texts. We devote a whole chapter to it in Regression and Other Stories, and if you’re interested in understanding this I recommend you read chapter 6; the book is freely available online.

For the reasons discussed above, I’m not shocked that Bill James made this error: for his purposes, the predictive observation has been more important than the erroneous causal inference, and he figured this stuff out on his own, without the benefit or hindrance of textbooks, and given his past success as a rebel, I can see how it can be hard for him to accept when outsiders point out a subtle mistake. But, as noted in the P.S. above, when smart people get things wrong, it’s interesting; hence this post.

Comments

  1. I liked your description of the confusion between regression to the mean and the inappropriate causal reasoning James is applying to teams that are behind vs ones that are ahead. I think there is another mistake being made here – of a psychological nature. As a sports fan, I have always regretted it when my team was ahead and started playing conservatively – and that often (at least it seems often) results in the team behind catching up (or at least coming close to that). But there are counterexamples as well, and there is a well-known confirmation bias at work here. Perhaps the best counterexample is Tiger Woods. In his prime, he was often far ahead and was known for playing cautiously when ahead. He won many tournaments by not making mistakes when he was leading. But what I tend to remember is when my favorite football, hockey, or basketball team got cautious when ahead and then blew their lead. Selective memory at work.

    Another example comes to mind, although it is fraught with some other (extraneous to this theme) issues. Lance Armstrong won a lot of races by protecting his lead once he was ahead. He did not appear to try to win stages, content to make sure that he retained his lead. The strategy of not taking too many chances sometimes works better than being aggressive.

    I think (and I believe this is one of your points) that the issues of effort and strategy are of a different nature than the statistical issue of regression to the mean. The latter exists without any reference to intentions, effort, or strategy. Those issues are relevant to the empirical facts, but there we have other issues at work and I’m not sure what direction the evidence points regarding probability of successful strategies when ahead or behind.

  2. Bill James has never incorporated regression into any of his projections. Just a weight of previous seasons and aging. It’s interesting to see him see the regression but not comprehend it.

    One issue not addressed is how the Win Expectancy (WE) is created. Most WE are from historical examples (e.g., the team behind by 10 has the ball on the 40 with 3:00 to go). If teams are already taking chances, the improved WE is already baked in.

  3. I wrote in to his Q&A column “Hey Bill” as a result of the same article you refer to, for the same reason. Having read him for 40 years, I thought I could present the main point in a form he would appreciate:

    “I ran an experiment to get a sense of how much of the Law of Competitive Balance can be explained by luck alone. (A 100-62 team is more likely to have had good luck than bad luck, so will likely regress next season just from that.) Trying to keep this short but clear… I made 10000 pairs of 162-game seasons with a 10-team league. For each pair of seasons, each team’s “skill” was a random number from 50 to 150. When two teams met, their chances of winning were proportional to their skills (this is basically log5 with the simplification that you didn’t have to compute “skill” from observed results).

    “Anyway, on average a team would be 80% as many games over (or under) .500 in season 2 as they were in season 1. For example, a team 10 games over .500 in season 1 would on average be 8 games over .500 in season 2. Not super scientific but it gives a sense of how much of the observed effect of competitive balance is explained by randomness and how much remains to be explained by other means.”

    His response was “Thank you. I understand what you did, and I appreciate your research.” So he did seem to get it, although I’m not sure how much it affected his thinking about the weighting of the various components of his Law of Competitive Balance.
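    (For concreteness, here is one way the experiment described above could be coded up. This is a sketch, not the commenter’s actual code; details the comment leaves unstated, such as 18 meetings per opponent pairing and summarizing the regression with a least-squares slope, are guesses.)

```python
import numpy as np

rng = np.random.default_rng(1)

def season(skill, games_per_pair=18):
    """One 162-game season for a 10-team league: each pair of teams meets
    18 times, and P(i beats j) = skill_i / (skill_i + skill_j)."""
    n = len(skill)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            p = skill[i] / (skill[i] + skill[j])
            w = rng.binomial(games_per_pair, p)
            wins[i] += w
            wins[j] += games_per_pair - w
    return wins

margin1, margin2 = [], []
for _ in range(10_000):
    skill = rng.uniform(50, 150, 10)    # drawn once, fixed for both seasons
    margin1.append(season(skill) - 81)  # wins above 81 ("over .500"), season 1
    margin2.append(season(skill) - 81)  # wins above 81, season 2
margin1, margin2 = np.concatenate(margin1), np.concatenate(margin2)

# Least-squares slope of season-2 margin on season-1 margin: the fraction
# of a team's season-1 margin that carries over, on average, to season 2.
slope = np.sum(margin1 * margin2) / np.sum(margin1 ** 2)
print(f"carryover fraction: {slope:.2f}")  # should land near the ~80% reported above
```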

    • At some point you have to conclude that Bill James is deliberately refusing to understand. He has been corrected on this point many times (this is probably the third I’ve seen, and I have almost zero interest in baseball or James), and every time he both conspicuously omits the name of the criticism and proceeds to ignore it.

      For example, in his 2011 _Solid Fool’s Gold_ (which I got around to reading in part because I assumed that if Andrew Gelman likes James so much, James can’t be *that* bad), he reprints his 1983 article, where he introduced all this ‘whirlpool’ and ‘law of competitive advantage’ stuff, and includes an odd footnote presumably written ~2010:

      “I have no doubt, 28 years later, that the “separate and unequal strategies” distinction that I drew here is real, and I could cite now many, many more studies in which this has been evident. However, the tendency of teams to move toward the center is caused by this phenomenon and *by many other causes*. I should have been more clear about that.”

      Many other causes, eh? Do any of them happen to have a name that readers who don’t already know what regression to the mean is might use to find out more and decide for themselves? If someone brings up regression to the mean, would James simply say, ‘I clearly said there were other causes and that’s one of them’ and proceed to do everything exactly as before? (https://en.wikipedia.org/wiki/Motte-and-bailey_fallacy) Well, let’s move to this post, in 2022, where, having had an additional 12 years to reflect as a full-time sabermetrician and statistical analyst, and ripe with wisdom at age 73, he writes:

      I did some small studies, which I won’t detail here, which confirmed to my satisfaction that this was what was happening. I generalized these observations as the Plexiglass Principle and the Law of Competitive Balance. I used those concepts for several years.

      But some time, I don’t know when, maybe early 1990s, I got a letter or an e-mail from somebody which caused me to put them on a shelf. I do not remember exactly who it was from or exactly what was said, but it was an intelligent letter from a thoughtful person known to me. The essence of it was that I was merely lumping together unrelated phenomenon under a common name. A batter falling back halfway toward his previous average has nothing to do with a team rallying from a halftime deficit.

      By chance, I think, this missive reached me at a time when I was involved with something else—writing a book, no doubt—and I did not have time or energy to think through the issue. I had received similar communications from other people. Despite my stubbornness and contrarian nature, I just set aside the concepts related to the Law of Competitive Balance, and did not write about them, as I recall, for a couple of decades. Probably somebody can find a place where I DID write about them, but. . .you know; good luck with that snipe hunt.

      But a few years ago I was thinking about this again, and I realized that whoever had written me that had just missed the boat. What I SHOULD have done was to write back to explain to him that the fact that he did not SEE the common cause behind these phenomenon does not mean that it was not there. It meant that he was not as smart as he thought he was. It meant that he had not trained himself to see what was beneath of the surface. Of course I would not have sent him that letter; I might have written it, but not sent it. My point is that I should have stuck to my guns.

      This reader obviously understood that it’s regression to the mean for the batter, even if there is a different explanation for teams ‘rallying’. James again refuses to use the term, only referring indirectly to ‘batters falling back to their average’ and in fact, he doubles down: he decides he was wrong to take regression to the mean seriously, that it doesn’t exist, that he should keep doing everything the way he always has, and in fact they are all real phenomenon & even the *same* phenomenon!

      And then you emailed him for what is probably the third such publicly-known correction (‘early 1990s’ doesn’t line up with 2010), and you got what you got (still circumlocutions like ‘what you did’ as opposed to a name), and yet, without your comment, no one would know, because I do not see any amendment or correction or addition to his post.

      How many corrections does it take before one concludes that he is not making an innocent common mistake, but refuses to acknowledge it? If it were a psychologist in the other sort of Gelman post…

      • Gwern:

        I’d put what you’re saying in item 6 of my above post. James made lots of progress (not just fame and fortune, but making real breakthroughs in understanding) by sticking to his guns, disregarding the experts, etc. So it makes sense that he continues with this strategy.

        I agree that it would be better for him to engage with criticism directly: that’s the academic way, but outside of academia (and the early years of the blogosphere) I don’t see much of this. In the world of journalism (and I guess we’d call James a sort of journalist), standard practice seems to be to do the absolute minimum of acknowledgement.

        Comparing people to Malcolm Gladwell isn’t so nice . . . but did Gladwell ever acknowledge he was wrong to say that airplanes take off in a tailwind? And, if he did, did he thank the people who pointed this out to him?

        This is not to say that academics are perfect in this regard. See for example Raghuram Rajan. I’d just say that at this point, Rajan is more of a big shot than an academic, so he’s behaving like a big shot, which is to say that he doesn’t want to acknowledge that he might be able to learn from others, outside of some narrow band of friends and colleagues.

  4. > The real point is statistical, which is that this sort of “regression to the mean” will happen without any such adjustment effects, just from “luck” or “random variation” or varying situational factors.

    By saying it is “statistical”, don’t you mean we don’t know the exact reason(s) why? Adjustment effects can be part of the reason.

    Someone with allergies will put up with it until one time they have the worst bout ever. Only then do they finally seek out treatment. Due to regression to the mean, whatever treatment was chosen will likely seem effective. That doesn’t mean there weren’t real reasons the allergies got so bad, then improved.

    • Anon:

      Yes, there can be adjustment effects, but they can go in either direction. It’s possible that good teams are empowered by their success and then they innovate and improve beyond what you might otherwise expect, and it’s possible that bad teams are constrained by their failures and then fail to innovate, thus performing worse than expected. But in either case this is with respect to expectations, which will follow a regression-to-the-mean pattern.

      Regression to the mean is a ubiquitous statistical pattern. Strategic adjustments are also important, and the right way to study the effects of strategic adjustments is to compare outcomes with expectations under the default model. But the default model is not that nothing changes from one time point to the next; rather, the default model is regression to the mean. It is an error to observe regression to the mean and attribute this to adjustment effects.

      • > Regression to the mean is a ubiquitous statistical pattern

        Indeed, but there is some real reason things went up too far then back down (or vice versa) in every individual case. We just don’t know what that is, so call it “random”. In the past they said “the gods” decided. Same thing.

        A coin flip is “random”, unless you know the relevant forces on the coin. Then it is no longer “random”.

  5. I like the ROS Chapter 6 description of how regression to the mean can lead to faulty causal reasoning. It is a real pit and you can fall into it. That being said, the basic epistemology of the concept is still very difficult for me.

    For example, regression to the mean does not apply universally to women’s height. It applies to tall and short women, because the height of women who begin very close to the mean will on average “wander” away from the mean. They must in order to maintain the variation. The same is true of test takers who score very close to the mean on the midterm: on average their grades on the final will wander away from, not regress closer to, the mean. From this perspective, regression to the mean is a phenomenon only associated with predictions about statistical outliers, not about full data sets.

    So in women’s height and test-taking comparisons, in addition to the population made up of outliers that will show a predicted slope less than 1, there is an equally well-defined population that begins near the mean and wanders from it. It seems the directionality of the entire population will always be towards the (+/-) standard error, with the high and low outliers regressing closer from the outside, and the population that begins near the mean wandering out to it. From this perspective, regression to the mean becomes an interesting aspect of predictions about outliers, and certainly something to watch for, but not really a fundamental aspect of how the world works.

    • Matt:

      No, regression to the mean holds for average-height women too. Considering women who are 1 cm taller than the mean, on average their daughters will be approx 0.5 cm taller than the mean.

      What you’re talking about is the absolute value of the difference from the mean. Let x = mother’s height relative to mean and y = daughter’s height relative to mean. E(y|x) = 0.5x (approximately); this is regression to the mean. If x = 1cm, then E(y|x) = approx 0.5cm. But E(abs(y)|x) is larger. In particular if x = 1cm, then E(abs(y)|x) is actually greater than 1cm.
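      A quick Monte Carlo check of both statements (my sketch; the 6 cm standard deviation for heights is just an assumed illustrative number, not something from the comment thread):

```python
import numpy as np

rng = np.random.default_rng(2)

# Heights measured as deviations from the population mean, in cm.
# Assumed numbers: sd = 6 cm in both generations, correlation = 0.5.
n, rho, sd = 2_000_000, 0.5, 6.0
x = rng.normal(0, sd, n)                                   # mothers
y = rho * x + rng.normal(0, sd * np.sqrt(1 - rho**2), n)   # daughters

sel = np.abs(x - 1.0) < 0.25    # mothers roughly 1 cm taller than the mean
print(y[sel].mean())            # about 0.5 cm: E(y|x) regresses toward the mean
print(np.abs(y[sel]).mean())    # about 4 cm: E(abs(y)|x) is well above 1 cm
```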

  6. I’ve always intuited the “Regression to the Mean” phenomenon as being straightforwardly implied by geometry: a sample from a (symmetric, univariate) distribution will always have more mass on the side that contains its mean vs. the side that does not. So subsequent draws from the same distribution will tend to come from that mean-containing side rather than from further into whichever tail (with 50% mass going to regression “past” the mean).

    Though under that description, I guess it would better be called “regression towards the median”? Like, if a random variable is log-normal(0,1) distributed, and a draw is made within the interval (1, sqrt(e)), you’d regress *away* from the (arithmetic) mean? Though I guess a median is a type of mean — namely, a subtype of the Fréchet mean. And maybe it’s not about ranks but expectations.

    I do think (agree?) that it’s valuable to distinguish between “stochastic” and “deterministic” forms of mean reversion. Like, the reverting force of an OU process — maybe that’s a “deterministic” effect, like a sports team buckling down when behind or resting on its laurels when ahead. Or maybe generalizing to random fields — can imagine “mean avoiding” effects, if a team gets so far behind they give up. And maybe “deterministically stochastic” effects, if teams increase the “riskiness” of plays, recognizing that conservative play will guarantee their loss?

    Maybe another point of confusion is which mean exactly you’re regressing towards, when dealing with multilevel mixtures of values. A child’s expected height under constant environment will regress away from the parental midpoint towards: the mean of their siblings? the mean of their polity? of their ethnic group? of all extant humans? of all extant primates? etc. (across deep time, however, these may ofc converge). Or with test-scores: graded performance is distributed within students within teachers within schools, and students differentiate themselves along relevant test-taking factors. When the resident whiz kid gets a B on exam whose class average is a C, their next exam performance is less likely to be a B- than an A (but maybe some of their “inherent qualities” will also be mean reverting — their charmed upbringing resulted from their parents’ having gambled and won, but that luck may run dry and return them to some default state of nutritional stress, idk).

    • Nik,

      Yes, it depends on the distribution—and you don’t always get regression to the median either. For a simple example, consider a joint distribution, p(x,y) = 0.5*MVN((x,y) | (-2,-2), I) + 0.5*MVN((x,y) | (2,2), I), i.e. a mixture of two well-separated bivariate normal distributions. Here, if you observe x=1.8, say, then E(y|x) will be approximately 2. It’s regression toward the local mean, not the global mean.

      This sort of thing can happen in baseball settings if you analyze players from different levels (A, AA, AAA, MLB) without adjusting for the level.

      More generally, as you say, the regression is toward an individual mean, not to a population mean. In the simple case of joint normal distribution with no other information on the individual, the individual and population means are the same.
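      A quick simulation check of that conditional mean, in case it helps to see it numerically (my sketch, not part of the original comment):

```python
import numpy as np

rng = np.random.default_rng(3)

# The mixture in question: 0.5 * MVN((-2,-2), I) + 0.5 * MVN((2,2), I).
n = 2_000_000
center = np.where(rng.random(n) < 0.5, -2.0, 2.0)   # which component each draw is from
x = rng.normal(center, 1.0)
y = rng.normal(center, 1.0)

sel = np.abs(x - 1.8) < 0.05    # condition on x near 1.8
print(y[sel].mean())            # about 2: regression toward the local mean,
                                # not toward the global mean of 0
```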

      • > Yes, it depends on the distribution—and you don’t always get regression to the median either.

        Sure, but that’s just because you’re conditioning on x, and draws from the latter mixture component are much more likely to produce x=1.8 than from the former (ie, updating from the flat prior given by the mixture proportion, the odds are exp(dnorm(1.8, 2, log = T) - dnorm(1.8, -2, log = T)) ≈ 1,339 to 1 that you’re in component #2). If you re-draw both x & y from the mixture, y will regress towards (and almost certainly past) the median. I think it’s the same notion as the multilevel structure thing.

        But I’d also say that even in the conditional of a bivariate normal mixture example, you’re still regressing towards the local median and not the local mean, they just coincide for normals. If you’re in a mixture of two bivariate log-normals that you get from exponentiating your above mixture, and you draw an x of exp(2.25), then you’ll get more draws in the direction of the local median exp(2) than the local mean exp(2.5). But ofc E(y|x) will be almost exactly exp(2.5), but that’s cos there’s an E() written right there. I guess I just find “regression towards the mean” to more intuitively mean “which direction, up or down, will the next draw probably be in?” vs. “what is the expectation of the next draw”. Like, if I draw a value of x = 2 from p(x) = 0.9999*Normal(x | 1, 1) + 0.0001*Normal(x | 1E10, 1), I would say (for practical purposes) my next draw will “regress me” towards the median outcome of 1 and not the mean outcome of 1E6.

        So I think this is a question of language and not probability theory. And I’ll readily admit to ignorance of historical usage of the expression — it may be that my intuitive preference above is just something I made up, and should correspond to a different phrase! (like “movement at the median” idk… sounds like a groovy dance)

  7. “The team which, based on their position, would have a 90% chance to win doesn’t actually have a 90% chance to win.”

    I don’t know what “based on their position” specifically means there, but it seems like it should mean that teams with that position have won historically X% of the time–which would include any mean-regression or psychological effects already. I agree there are those effects, but unless there is a way of quantifying them so their relative effects could be compared I don’t see much point in hand-waving about it. Perhaps random mean-regression could be estimated and compared to historical data and the difference, if significant, could be called the James effect.

  8. 1. “James’s mistake is to think, when this comes to probability of winning, that this dynamic on balance favors the team that’s behind. When strategies are flexible, the team that’s behind does not necessarily increase its probability of winning relative to what that probability would be if team strategies were constant. Yes, the team that’s behind will use strategies to increase the probability of winning, but the team that’s ahead will alter its strategy too.”

    There’s a psychological issue to consider though. If the team that’s behind is faster to change strategy or more aggressive in their new strategy, on average that could create an asymmetry that favors the team that’s behind. It’s an empirical claim, maybe someone here knows if there’s any truth to it.

    2. “I don’t know about that. As a coach myself, I could just as well argue the opposite point, as follows. When you’re winning, you can see what works while having the freedom to experiment and adapt to fix what doesn’t work. But when you’re losing, it can be hard to know where to start or have a sense of what to do to improve.”

    I suspect this is exactly right. Often, there are many more ways a thing can go wrong than right. Sort of falls out of the 2nd law of thermodynamics.

    3. I have a very basic question about regression to the mean that I’m hoping someone can shed some light on. The common example is that of parent-child heights correlating to some degree, say r = 0.5. So if dad is 6’0”, and the average height in the population is 5’8”, we might predict the son’s height to be 5’10”, midway between the population average and dad. Does this not suggest that over time the variance in height of the population should get smaller such that in the limit everyone’s height in some future generation should approach 5’8”? I’ve seen comments on stackexchange about this making reference to genetic variance acting as a counterforce, but I find that not so helpful. Appreciate any thoughts people might have.

    • Adam:

      Regarding your point 3: this is a point that is often confusing, so it’s typically discussed in detail in textbook discussions of regression to the mean. The answer to your question is that, yes, the average height of the son in that example would be 2 inches taller than the mean, but there’s also variation. You don’t get everyone’s height approaching the mean, because of the variation. See also my comment here.

      • Had a chance to flip through ROS and think some more about this.

        If the correlation between father-son heights was 1, we’d have perfect prediction. If dad were 5’8”, we expect son to be 5’8”, and if dad were 6’0”, we expect son to be 6’0”. So if the distribution of dad heights had variance = sigma, and dad heights mapped perfectly to son heights, we should expect the son distribution to also have variance = sigma, so variance stays the same generation to generation.

        If the correlation were 0, we’d have no predictive power. Whether dad is 5’8” or 6’0”, we’re merely reaching into the urn of normally distributed heights with variance = sigma to determine the son’s height. No matter dad’s height, our best guess is 5’8”, but other heights are possible with some probability that falls off according to the Normal as we move away from 5’8”. So variance in height remains at sigma, generation to generation, but for a very different reason.

        If the correlation was between 0 and 1, say 0.5, both forces are at play. If dad is 6’0”, we expect the son to be 5’10”. But this is on average, b/c as you point out, unlike when r = 1, this isn’t perfect prediction, instead there’s random variation pushing height around our best guess of 5’10”. Together the predictor (dad’s height) and noise contribute to the variance of the predicted variable (son’s height), and without going through the maths, I’ll have to trust that it works out so that variance remains constant generation to generation.

        It’s a non-technical, intuitive take, but hopefully the way I’m thinking about it accurately captures regression to the mean.
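        Filling in the maths referred to above, under the usual idealization of equal variance in the two generations and heights measured as deviations from the population mean (a standard textbook identity, not something worked out in the comment thread): writing son = r · dad + ε with ε ~ N(0, (1 − r²)σ²),

        $$\operatorname{Var}(\text{son}) = r^2 \sigma^2 + (1 - r^2)\,\sigma^2 = \sigma^2,$$

        so the pull toward the mean in the conditional expectation is exactly offset by the unexplained variation, and the population variance is preserved from one generation to the next.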

  9. Seems to me (albeit as a statistics illiterate) that there’s a missing piece here (well, there is for me, anyway).

    I’ve always thought that reversion to the mean can’t be applied in some indiscriminate sense, but that it applies when a particular performance is anomalous.

    In other words, there’s a difference between this:

    > Hence, pilots who did exceptionally well on one trial are likely to deteriorate on the next, regardless…

    And this:

    > From purely mathematical considerations, it is expected that the best pilots will decline, relative to the others,

    In that the first example suggests an anomalous performance whereas the second example suggests particular pilots have skills that are above the mean.

    • IOW, seems to me you’d expect “the best pilots” to continue to perform exceptionally well, and NOT revert to the mean. But you’d expect a pilot that performed far above their typical performance to not maintain that high level (at least for very long, and unless they had some kind of step change attributable to some specific feature, like they got fit with new glasses).

      • “But you’d expect a pilot that performed far above their typical performance to not maintain that high level”

        Yes, but there is another dimension to it. When one is learning, the standard of performance is naturally lower, and luck plays a bigger part. As the pilot (in this example) learns how to do the maneuver, his standard of performance naturally increases and the variance gets lower, because the pilot can now recognize more factors that influence the situation, and has more neuro-muscular ability to operate the controls as needed. Yes, luck is always involved – and so regression to the mean will apply – but to a lesser extent as more learning and experience take place.

        • Tom –

          Makes sense to me.

          The other aspect that puzzles me about this post is that I don’t quite get how regression to the mean would work for a team, especially year over year.

          Teams comprise individuals, and it seems to me the functioning of a team – in the sense of being a unit as opposed to a collection of independent actors – in one year isn’t likely to have much relationship to the functioning of the team as a unit in the next year – except by virtue of things like injury (e.g., an exceptionally large or small amount one year compared to another) or difficulty of schedule, etc.

          This is especially true since team makeup changes year to year. Of course there are some patterns that link teams across years – like teams with good (or poor) player personnel management or teams with huge budgets – that will likely have trends across different years (think the Patriots). But I see no reason why those factors would regress to the mean.

          The extent to which a team would regress to the mean, it seems to me, would mostly reflect changes in the performance of individual players. In other words, if a large % of a team had career years one year, you’d expect regression the next year. But it’s not really that the team, per se, is regressing to the mean. Seems to me saying that is like trying to create an average among independent trends – which doesn’t make a whole lot of sense to me unless they’re all affected by the same causal variable.

          Or maybe ’cause I don’t know nothin’ ’bout stats, that doesn’t make any sense?

        • One more comment on this and then I’ll drop it.

          Thinking more about the teams-regressing-to-the-mean thing, as opposed to groups of individuals regressing to the mean.

          Seems it would also vary a lot by the sport. High variance factors would seem to apply much more to football – with the onsides kick, going for it on 4th and long instead of kicking a field goal or punting, two-point conversions, blitzing, etc. Baseball? I don’t see much there in the way of “high variance” changes. Hockey – maybe pulling the goalie. Basketball? I dunno – with the shot clock I don’t really see slowing the game down as much of a variance strategy – not like it used to be in the college game where Upenn guards could dribble out the clock for the last 3 minutes of the game.

  10. Interesting discussion.

    The “regression to the mean” as Andrew expresses it here applies strictly to situations in which the variation is purely random. If, for example, I roll a die six times and get three fives, in my next 30 rolls I will probably get only about 5 more fives. With dice, every roll is effectively the same.

    But in many situations in sports, it just ain’t like that.

    If Judge hits home runs in six consecutive games, then over the next nine games hits none, this yields about his average HR production per game, which is ~2hr / 5 games. Did he “regress to the mean” over the last nine games? What if the first six games were against OAK & SEA, the two teams that gave up the most HR in the AL, and next nine were against HOU, CHI, DET, the teams that gave up the least HR in the AL?

    If you didn’t know the schedule or the pitchers in this situation, you might view this as regression to the mean. But if you know the schedule and the pitchers, it’s probably not “regression to the mean” – the causal factor is the opposing pitching.

    So to some extent – in some cases probably a large extent – whether you consider some behavior to be “regression to the mean” or not is dependent on the resolution you have of potential causal factors.

    • Chipmunk:

      No. If the variation is purely random, the correlation between x and y is zero, and you get complete regression to the mean. If the variation is not purely random, the correlation between x and y will be some number between 0 and 1. For example if the correlation is 0.5 then you will get 50% regression to the mean. The only time you’ll get no regression to the mean is if the correlation is 1. Regression to the mean is a general phenomenon, not at all restricted to the correlation=0 case.

      • There would still be regression to the mean after conditioning on the strength of the opposition.

        You only wouldn’t see regression to the mean if you have perfect information about the causal mechanisms in a system, such that you can know with certainty what the outcome of every game will be (i.e., a deterministic model). Unfortunately, we don’t have perfect foresight so we will see some degree of regression to the mean regardless of the amount of information you condition on.

    • chipmunk –

      When I first read your comment it fit with my thinking. Obviously, Andrew says “no” but I can’t understand his comment. But along with Harry’s response:

      > What if the first six games were against OAK & SEA, the two teams that gave up the most HR in the AL, and next nine were against HOU, CHI, DET, the teams that gave up the least HR in the AL?

      You can ballpark averages in something like that, but where do you really get a foothold? You’d really have to look at where the Yankees meshed with their opponents in their pitching staff rotation, whether any pitchers were injured, how the pitchers they faced matched up and lefty/righty splits, which ballparks they played in and home run rates, etc.

      • Suppose I have a machine that shoots a paper airplane along a linear measuring tape, and I see how far it goes before it touches the ground. There will be some probability distribution of locations where the paper airplane will land, with a median m1. If on the first attempt we measure x1 > m1, the probability that the second attempt x2 will be less than or equal to x1 is 0.5 + Integral(PDF from m1 to x1) > 0.5. More colloquially, the probability that x2 is less than m1 is 0.5 because m1 is the median, and the probability that x2 is less than x1 is more than that because x1 is even further than m1. So it’s more likely than not that my second attempt is less than the first.

        The essential point is this. Given a point in a probability distribution, you are more likely to be on the side containing the median.

        Suppose my friend has the same machine, but starts 20 feet ahead of me. Then his shots will be distributed 20 feet ahead of mine; no mystery, fully explainable variation. But his shots still have their own median m2, and if his first shot goes further than m2 then his second attempt will more likely than not be shorter than his first. Even though the difference from my shots is explainable, as long as his shots have any randomness, he will regress towards his own median.

        Conditional on all explanatory or predictive factors, if there is residual uncertainty, there is regression to the mean/median.

        You can argue that, so long as you believe everything follows the laws of physics, there is no true randomness save for quantum randomness which is irrelevant at our measurement scale. So if you truly condition for *everything* like the currents of the air and the folds in the paper, there is no residual uncertainty and hence no regression to the mean. But in that case, there is also no probability and no statistics–we just use probability as a convenient model, but the appropriateness of a model depends on the state of your knowledge.

      • Joshua said:

        “You’d really have to look at where the Yankees meshed with their opponents in their pitching staff rotation..”

        Yes, sure, I agree. Team ERA is just a broad measure of the competence of the pitching staff. Individual pitchers vary, the types of pitches they throw vary, how they pitch depends on the situation on the field – with runners on they would pitch different – which in turn depends on the preceding hitters in the lineup to some degree. How far the ball flies depends on weather conditions and time of day; whether it goes out or stays in depends on the park dimensions (which could also be reflected in the home team’s pitchers’ ERAs).

        I brought this up because it always seemed to me that ARod would eat up shitty pitching but was a bit of a bust against top pitchers.

        • IIRC, analytics guys say “clutch” hitting isn’t really a thing. Just an outgrowth of sample size. But you couldn’t ever convince me that Pat Burrell didn’t suck in the clutch.

        • ‘analytics guys say “clutch” hitting isn’t really a thing’

          I’ve thought that for a long time. Metrics like average with runners in scoring position seem to miss the fact that crappy pitchers allow more runners in scoring position, if you’re hitting with RISP you’re more likely to be hitting against a crappy pitcher, and your batting average *should* be higher under those circumstances.

          It’s when you’re behind 2-1 in the ninth against a Cy Young winner and someone ekes out a single that “clutch” hitting comes into play.

        • Chipmunk –

          > and your batting average *should* be higher under those circumstances.

          I think that misses the point, which isn’t whether, in aggregate, average with RISP is going to be different from overall average, or whether overall RISP averages increase across all hitters (because they’re facing pitchers who allowed baserunners).

          What I recall (and I could be wrong) is that it isn’t likely that some hitters are good with RISP and some aren’t. It just appears that way as an artifact of sample size.

          Of course that’s hard to believe, because the intuitive experience of watching leaves the viewer convinced that some players are “clutch” hitters and some aren’t.

          That’s what makes baseball a great game even if it’s kinda boring action-wise.

          As I recall those stats were assembled before the extreme shifts became so popular and I wonder if that might have affected the RISP #s – as better hitters are more likely to face shifts, and only a subgroup of the better hitters are more likely to face the extreme shifts (heavy pull hitters).

  11. Somewhat to the side of the main discussion, but there are many situations where coaches who are losing do _not_ pursue an optimal high-variance strategy, due to perverse personal incentives. If a football team is down 14 in the 4th quarter, and has a 5% chance of winning and a 95% chance of losing by 28 if they play a high variance strategy, many coaches will play conservatively to keep the score “competitive”, even if it guarantees a loss. Losing big is the kind of thing that draws a lot of media/fan scrutiny and can get a coach fired, while a 10 point loss is viewed as a sign that the team was “right there” and may buy a coach a little more time.

    (I’m an Iowa football fan, and Kirk Ferentz’s career can be viewed as a demonstration of the career-preserving power of this approach. Ferentz teams play a very slow, conservative style, so almost all their games are “close”, even though this is somewhat of an illusion due to pace and a suboptimal lack of risk-taking. Ferentz is in year 22 or so at Iowa.)

    In really high-leverage situations, like elimination games or playoff qualification games, you do see high-variance desperation, but in regular season games, you often see a kind of shadow boxing between teams in the last quarter/period to keep the margin reasonable while not altering the outcome.

  12. Stigler, S. (1996). The History of Statistics in 1933. Statistical Science, 11, 244-252. It offers some amusing further examples of this sort of regression-to-mediocrity “thinking”.

  13. You point out that “regression to the mean” has predictive power, which is something that I hadn’t thought about before. If you have a model of future data that includes a “regression force” term, that model would tend to fit better than one without such a term, and I wonder how difficult it would be not to be “fooled” by this predictive power.

    To make this concrete, we could write a true generating model like this:
    Model 1: y[i] ~ normal(mean, sd)
    and a regression to the mean model like this:
    Model 2: y[i] ~ normal(y[i - 1] - r * (y[i - 1] - mean), sd)
    where parameter “r” represents the strength of the “regression” force.

    Obviously model 2 is more complex than model 1. My question is whether this increased complexity would typically be enough to counter-act the increased predictive power of the model. Most common model comparison metrics (cross-validation, Bayes factors, AIC/BIC, etc.) are predictive criteria, in the sense that they are based on how well the model does predict unseen data (cross-validation) or would predict unseen data that looks like the data we’ve already seen (BF, AIC/BIC, etc.). Would these metrics consistently prefer the true model (1) over the regression-to-the-mean model (2)? I’m genuinely not sure!
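    A rough way to check, as a sketch of my own rather than anything definitive: simulate data from model 1, fit both models by maximum likelihood, and count how often AIC picks the model without the regression-force term. (Note that model 1 is model 2 with r = 1, so this is a comparison of nested models differing by one parameter.)

```python
import numpy as np

rng = np.random.default_rng(4)

def loglik_normal(resid, sigma):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2))

def aic_prefers_model1(n=200):
    y = rng.normal(0.0, 1.0, n)     # data truly generated by model 1
    z, x = y[1:], y[:-1]            # score both models on y[1:] for comparability

    # Model 1: y[i] ~ normal(mean, sd); 2 parameters.
    aic1 = 2 * 2 - 2 * loglik_normal(z - z.mean(), z.std())

    # Model 2: y[i] ~ normal(y[i-1] - r*(y[i-1] - mean), sd); 3 parameters.
    # Its conditional mean is linear in y[i-1], so fit it by least squares.
    b, a = np.polyfit(x, z, 1)
    resid = z - (a + b * x)
    aic2 = 2 * 3 - 2 * loglik_normal(resid, resid.std())

    return aic1 < aic2              # True when the lower (better) AIC is model 1's

prefs = [aic_prefers_model1() for _ in range(1000)]
print(f"AIC prefers the no-regression-force model in {np.mean(prefs):.0%} of runs")
```

    Under the usual asymptotics the spurious extra parameter only beats the AIC penalty when its chance improvement in fit is unusually large, so the true model should win most of the time but not every time; whether that counts as “consistently” is exactly the question being asked.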

    • I think by “true model” you mean maximum likelihood. We’ve known theoretically since the 1950s that it’s better predictively to regularize. One of my favorite historical papers is this one, which uses a baseball example of regression to the mean to predict rest-of-season performance based on the first 50 at bats.

      Efron, B., & Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. JASA.

      It’s expressed through a hierarchical prior on what you call the “mean” rather than by changing the likelihood. Their shrinkage plot is a really nice illustration of what’s going on.

      P.S. I reproduced Efron and Morris’s paper using full Bayes (instead of the empirical Bayes of the paper) in a Stan case study, Hierarchical Partial Pooling for Repeated Binary Trials.

      • Thanks for the links–I hadn’t seen your recent example, and it’s really cool! While it doesn’t directly get at my question, which was about model comparison rather than estimation, I think it reframes (and possibly answers) my original question in a revealing way.

        My question was, “given a model with a ‘regression force’ term and one without, would quantitative model comparison methods consistently prefer the model without that term in situations where there truly was no ‘regression force’, despite its obvious predictive advantage?” This question was inspired by Andrew’s remark that James remains convinced of a model with such a regressive force because that model keeps making good predictions. As we both pointed out, there is a long history in statistics of methods that try to avoid being “fooled” by a model’s predictive power. So another way of putting my question is, “would Bill James have been able to disabuse himself of the notion of a ‘regression force’ if he had used any of these methods?”

        Regularization is closely related to the model comparison question I posed. But I feel I should make clear how I used my terms. By “true model”, I do not mean maximum likelihood, I refer to the structure of the process by which the data were generated. In my example, that was model 1, which specified that data were generated by drawing independent samples from a normal distribution with two parameters representing its mean and standard deviation. So when I say “model”, I am referring to this structure, not to a specific choice of parameter values within that structure. When I say “model comparison”, I refer to comparing two putatively different structures for the process that generates data to assess which best approximates the true data generating process.

        But because the models in my example are nested, one could reframe the model comparison in terms of estimation, and perhaps this is what you meant. If we take the more complex model (#2, with the “regression force” term), place a regularizing prior on the ‘regression force’ parameter, and estimate the joint posterior distribution over parameter values within that model, that distribution will be “shrunk” toward the mode of the prior. One could then conceive of the model comparison in terms of how strong the degree of shrinkage is. After all, the estimate for the regression-force parameter can be thought of as the result of estimating parameters for both models, then marginalizing over each model’s posterior probability. We could place one of Kruschke’s “regions of practical equivalence” around the value of zero for the regression-force term and see how much of the posterior distribution fell within that region.

        Such a model would also embody the idea that there really is a “regression force” in the true data generating process. It is just that there is uncertainty about the strength of that force. And given all the examples above, maybe that’s true! So maybe the answer to my question is, “model 1 is never really the true model, so we should instead focus on estimating the strength of a regression force.”

  14. Perhaps in sports there can usefully be a two factor model that includes both the randomness inherent in classic regression toward the mean, but also effort.

    I have a vague impression that historically great years for baseball players often follow a disappointing season. Here are two examples from the late 1940s National League. Stan Musial’s monster 1948 (.376 with 39 homers) followed a disappointing 1947 (.312 with 13 HR, following a terrific 1946 of .365 with 20 HR).

    Or in 1949, Jackie Robinson won the MVP in his third season following a sophomore slump 1948 after being Rookie of the Year in 1947.

    For Robinson, we actually have a plausible explanation of why he suffered the sophomore slump besides just statistical regression toward the mean: Following his historic 1947 season, practically every black organization in the country invited him to a testimonial dinner in his honor, so he came to spring training in 1948 overweight. Chastened by his lack of progress in 1948, Robinson then upped his effort. He spent the next off-season getting in shape for 1949 and had a huge year.

    Perhaps (just speculating to make a theoretical point) Musial’s off year in 1947 vs. 1946 wasn’t caused by lack of effort but by pure bad luck — e.g., more of his line drives went right at fielders than in 1946. But that still might have encouraged him to put more effort into 1948 than he put into 1947 by teaching him a lesson about baseball being harder on average than it must have seemed to him in 1946.
