Following my skeptical discussion of their article on the probability of a college basketball team winning after being ahead or behind by one point at halftime, Jonah Berger and Devin Pope sent me a long and polite email (with graph attached!) defending their analysis. I’ll put it all here, followed by my response. I’m still skeptical about some details, but I think much of the confusion can be dispelled with a minor writing change, in which they make clear that their 6.6% estimate is a comparison to a model.

**Berger and Pope’s first point** was a general discussion about their methods:

We are interested in whether losing can be motivating and lead people to win more often. You are definitely right that teams down by one do not win significantly more than teams up by one, but this was not really what we are arguing because that comparison is somewhat flawed. Directly comparing the winning percentage of teams down by one with teams up by one is incorrect because these are different types of teams in different situations. First, teams are not randomly assigned to be up or down by one. On average, teams down by one tend to be worse (they have a lower season winning percentage). Second, mechanically speaking, it is harder for teams down by one to win. They have to score at least 2 more points than their opponent to win, while their opponents can win even if the teams trade baskets the rest of the game. Taken together, this means that it is incorrect to expect teams down by one to win 50% of games.

What percentage of games should teams down by one be expected to win? Different types of polynomials (cubic, quartic, quintic, etc.), window sizes, or even a linear specification give slightly different answers, but in all cases, the actual winning percentage (51.3%) is higher (and significantly so in almost every instance). The method you suggested indicates the same results. More broadly, while the regression discontinuity methods we use in the paper (including the 5th degree polynomial) are standard in economics (see, for example, the 2009 working paper on regression discontinuity designs by David Lee and Thomas Lemieux), we respect that you may find a different approach to the problem more useful. Importantly, however, the results are similar regardless of which method we use.

We hope this clears up the misunderstanding. We believe these results support the notion that being slightly behind can be motivating and leads people to win more often than they would have otherwise.

**My response:** I agree that teams are not randomly assigned to be up or down by one point at halftime. Maybe my point has to do with the way the article is written as much as anything else. Recall that Berger and Pope wrote, “Teams behind by a point at halftime, for example, actually win more often than teams ahead by one. This increase is between 5.5 and 7.7 percentage points . . .” This sounds like a descriptive statement, and, first, the “actually win more often” part is not statistically significant (that’s the 51% +/- 2% thing) and, second, the difference is not 6.6 percentage points or anything like it. The 6.6 percentage points is a comparison to how many games they should be “expected to win,” which is a different story entirely. And I agree that this is interesting.

Using the “expected to win under a continuous model” framing would make things much better for me (and perhaps for other literal-minded readers such as myself).

Regarding the 5th-degree polynomial: first off, David Lee is great, but I don’t see the complicated model as being relevant here. I’m wary of high-degree polynomials because of their sometimes dramatic swings at the extremes. I’d prefer a linear model (on the logit scale) or maybe a spline. A 5th-degree polynomial just seems like asking for trouble; you’re playing with fire when you fit this sort of model. In particular, that strong curve near zero does not seem to be demanded by the data; to me it looks more like an artifact of the 5th-degree fit.
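As a small illustration of the extrapolation worry, here’s a simulated sketch (everything is invented: an assumed logistic “truth” with slope 0.12 and 500 games per margin bin). A 5th-degree polynomial fit on the probability scale is unconstrained outside the observed margins, while a straight line on the logit scale recovers the truth almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
margins = np.arange(-10, 11)                # halftime point differential
p_true = 1 / (1 + np.exp(-0.12 * margins))  # assumed smooth logistic truth
n = 500                                     # hypothetical games per margin bin
p_hat = rng.binomial(n, p_true) / n         # simulated observed win proportions

# 5th-degree polynomial fit on the probability scale
c5 = np.polyfit(margins, p_hat, 5)

# Linear fit on the logit scale (a crude stand-in for logistic regression)
logit_hat = np.log(p_hat / (1 - p_hat))
c1 = np.polyfit(margins, logit_hat, 1)

# Extrapolate a little beyond the fitted range and compare
x = 14.0
pred5 = np.polyval(c5, x)                      # polynomial: unconstrained
pred1 = 1 / (1 + np.exp(-np.polyval(c1, x)))   # logit-linear: stays in (0, 1)
truth = 1 / (1 + np.exp(-0.12 * x))
print(pred5, pred1, truth)
```

The logit-linear fit stays in (0, 1) by construction and tracks the truth closely; the polynomial has no such guarantee once it leaves the data.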

Also, the “5.5% to 7.7%” figure is a bit misleading, because it is a range of point estimates under different models. I’d prefer the uncertainty to come from the standard error of a fitted model rather than from the spread of point estimates, which can understate the true uncertainty. In particular, the model misfit is not trivial at a score differential of 4 points either; a fitting procedure that recognized this model error would yield a bigger standard error overall.

**Berger and Pope’s second point** involves some particular issues with the model fit:

In our sample of NCAA basketball games, teams that were losing by 1 point at halftime were more likely to win the game than teams winning by 1 point. This is indisputable. However, is this finding statistically significant given the noise in the data?

**My response:** It all depends on the comparison point. *No,* the difference in probability of winning is not statistically significantly lower if you’re up by 1 than if you’re down by 1. But, *Yes*, the difference is significant between the data and their continuous model. And, given that a continuous model makes sense (given the nature of basketball), this latter difference is interesting.
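To make the two comparisons concrete, here is a back-of-the-envelope sketch with made-up counts (the bin size `n` and the model prediction of 46.3% are my assumptions, not numbers from the paper): the same observed 51.3% is unremarkable against a 50% benchmark but clearly different from a model prediction several points lower.

```python
import math

# All counts here are hypothetical, for illustration only.
n = 2500          # assumed number of games with a 1-point halftime margin
p_down1 = 0.513   # observed win rate for teams down by 1

# Comparison 1 (descriptive): down-by-1 vs. the 50% benchmark.
# This is the "51% +/- 2%" version: not statistically significant.
se_desc = math.sqrt(0.5 * 0.5 / n)
z_desc = (p_down1 - 0.5) / se_desc

# Comparison 2 (vs. a model): down-by-1 vs. a continuous-model
# prediction (the 46.3% figure is an assumption, not the paper's number).
p_model = 0.463
se_model = math.sqrt(p_model * (1 - p_model) / n)
z_model = (p_down1 - p_model) / se_model
print(z_desc, z_model)
```

The standard error is essentially the same in both cases; what changes is the reference point, which is exactly why the two statements differ in significance.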

Berger and Pope continue:

We shouldn’t expect teams down by one to win 50% of games. What should be expected? This is where reasonable people may begin to differ on the right way to construct a counterfactual. Many different curves can be fitted to the data. One may argue (as many did in the comments) that a straight line should be fitted; Andrew Gelman suggests a logistic function. It turns out that it doesn’t really matter which curve is fit.

My reply: Exactly. The key is to state clearly that it’s a comparison to a model rather than that being behind “increases a team’s chance of winning,” which is a more dramatic statement that is not so clearly supported by the data.

Berger and Pope:

For example, consider the figure below (the exact figure requested by Andrew Gelman) which indicates the winning percentage for the home team as the halftime point difference for the home team ranges from -10 to 10. Also, note the inclusion of standard error bars [+/- 1 se’s] for sophisticated readers. The dotted line represents the fitted curve from a simple logistic function when including the halftime score difference linearly. Focus on the winning percentage when either the away team was losing by a point, or the home team was losing by a point. In both of these situations, the losing team did better than expected. For example, when the home team is ahead by 1 point, they end up only winning 57.5% of games while we would have expected them to win 65.6% of games. This difference in actual versus expected performance (8.1%) is statistically significant at conventional levels and provides evidence in favor of our hypothesis that losing can be motivating.

This difference persists when controlling for home-team advantage, possession arrow to start the second half, prior season winning percentage, and team fixed effects (see Table 1 of the paper).

Further, supplementary analyses show that teams losing by 1 point closed the gap the most in the first few minutes after halftime (supporting our motivation hypothesis). Laboratory studies, using random assignment, also demonstrate that merely telling people they are slightly behind halfway through a competition leads them to exert more effort.

My final comments: First off, thanks for making the graph! It wasn’t *exactly* what I’d suggested – I’d actually fit separate functions for the + margins and the – margins (that is, I’d do a discontinuity analysis too, just using a linear function on the logit scale, with separate functions for the + and – halves).

In any case, looking at the newly plotted data, it’s striking how symmetric the two halves are: in addition to the patterns when behind by 1 or 2 points, there are also little jumps in both directions at 4 points and 7 points. Surely just noise, but amusing nonetheless.

In summary, I much appreciated Berger and Pope’s response–a little clarification goes a long way. Being behind by 1 point (or, for that matter, 2 points) appears to be “better than expected,” even if not better than being ahead.

P.S. Justin Wolfers wrote more on the subject here and links to a related paper on tennis that Lionel Page sent to both of us (with tons and tons of data, so that statistical significance is not a concern at all), on the effect of winning or losing the first set of a tennis match.

This whole comparison of teams down by one to teams up by one is rather odd. The win% for the second group of teams will always be 1 minus the win% of the first group — it's the same games! So we want to compare -1 outcomes to ties and -2 outcomes (and other deficits), but it's meaningless to compare -1 and +1 outcomes.

I'd think this was too obvious even to mention, except that the authors often make this comparison as though it's meaningful, and include a graph in their paper showing both leads and deficits without seeming to recognize that they must be mirror images. So I'm not sure they get this….
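The mirror-image point can be verified on a toy game log (the data below are made up): count each game once from each team’s perspective, and the down-by-1 and up-by-1 winning percentages sum to 1 by construction.

```python
# Toy game log, one row per game: (home halftime margin, home team won?)
games = [(-1, 1), (-1, 0), (-1, 0), (+1, 1), (+1, 1), (+2, 0)]

# Team-perspective rows: every game appears twice, once per team, mirrored
rows = games + [(-m, 1 - w) for m, w in games]

def win_pct(margin):
    outcomes = [w for m, w in rows if m == margin]
    return sum(outcomes) / len(outcomes)

# "Down by 1" and "up by 1" describe the same games from opposite sides,
# so the two winning percentages sum to exactly 1 by construction.
print(win_pct(-1), win_pct(+1))
```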

I too was skeptical about their hedging their bets on the topic of winning. The title of their paper, after all, is “When Losing Leads to Winning.”

Also note the significant difference between halftime ties and 1-point games. I guess that when the paper is finally published, they’ll change the title to “When Losing Leads to Doing Better than Expected.” Or better yet, “If You Can Make Halftime Prop Bets and the Score Is Tied and the Money Line Is Close to Even, Sock It In on the Home Team.”

Could this be an example of regression to the mean? Teams one point ahead at halftime seem more likely to perform better in the second half, and vice versa.

Looking at the graph, those teams either tied or one point behind are systematically above the fitted curve, while those one or two points ahead are below. Beyond this narrow range around zero, the data oscillate fairly evenly around the fitted curve, as would be expected.
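The regression-to-the-mean idea is easy to simulate (the skill and noise scales below are invented): conditioning on the halftime margin shows that teams up by one really are better on average, but part of their lead is first-half luck.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Hypothetical model: a per-game skill gap plus independent half-by-half noise
skill = rng.normal(0, 4, n)                        # home minus away, per game
half1 = np.rint(skill / 2 + rng.normal(0, 5, n))   # halftime margin (integer)
half2 = skill / 2 + rng.normal(0, 5, n)            # second-half margin
final = half1 + half2

up1, down1 = half1 == 1, half1 == -1
p_up1 = (final[up1] > 0).mean()
p_down1 = (final[down1] > 0).mean()
# Teams up by one are, on average, genuinely better, but their expected
# skill edge is much smaller than a one-point lead naively suggests.
print(p_up1, p_down1, skill[up1].mean(), skill[down1].mean())
```

Under this toy model the up-by-one teams win more than half the time, but only because the small lead plus a small true skill edge both point the same way; nothing motivational is needed to generate the basic pattern.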

I have 25,000 college basketball games in my database, with scores for each half as well as pointspread lines. I do not get the same results as the authors of this working paper. Here are my results (no home/away breakdown):

| Halftime margin | Win % |
| --- | --- |
| Down 5 | 27.4% |
| Down 4 | 34.9% |
| Down 3 | 38.1% |
| Down 2 | 42.5% |
| Down 1 | 48.0% |
| Tied | 50.0% |

There were an average of 975 games in each line.

Then I looked at games where the pointspread for the game was 7.5 or less. There were fewer games – only about 640 games per line.

| Halftime margin | Win % |
| --- | --- |
| Down 5 | 29.7% |
| Down 4 | 38.1% |
| Down 3 | 40.9% |
| Down 2 | 45.0% |
| Down 1 | 48.1% |
| Tied | 50.0% |

I think I have more data and games than the authors of the paper, but they have more information on each game. It is possible my data is flawed; I am not in academics, so I am not as worried about accuracy as academics may be. However, I do my best to gather accurate data, and I have no reason to believe my data is not accurate.

For the academics out there: what are the chances that the data in the working paper and the data I have can be reconciled by the difference in sample size?
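A rough answer, treating the two figures as independent binomial proportions (Critz reports about 975 games in his down-by-1 line; the paper’s bin size is a guess here):

```python
import math

# Can sampling variation reconcile 48.0% (Critz) with 51.3% (the paper)?
n1, p1 = 975, 0.480    # Critz's down-by-1 bin
n2, p2 = 1000, 0.513   # the paper's figure; its bin size is an assumption
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p2 - p1) / se
print(se, z)
```

A z-score of roughly 1.5 means the gap is within ordinary sampling variation under these assumed bin sizes, though systematic differences (years covered, divisions included) could matter too.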

Andrew's method of comparing home vs. away allows you to use each game once and breaks the symmetry, eliminating Guy's concern; the original graph didn't do this, and that made the analysis extremely weird.

I would be tempted to do a Fourier series or wavelet analysis on the home-vs-away games. Is this effect localized to the region around 0, or is it just some kind of non-localized noise?

Critz's post is very interesting, and prompts thoughts of how his data may be different: a longer time period? inclusion of more small colleges?

I took a look at the Lionel Page paper on momentum in tennis, linked by Justin Wolfers in his second Freakonomics post on the NCAA study. It uses a discontinuity analysis similar to that of Berger and Pope to show that winning a set in tennis improves a player’s performance in the following set by one game. But if I’ve followed the methodology correctly, I believe it’s wrong in critical respects.

The biggest problem is that the regressions run on either side of the cutoff – an infinitely long tiebreaker – appear to be duplicative. One regression estimates that a player who wins set #1 will outscore his opponent by about .5 games in the second set. But Page then also calculates the impact of losing the first set (–.5, of course) and adds the two together to say a set win is worth one full game. The first regression already captured the impact of losing as well as the impact of winning, because the dependent variable is the second-set differential between set #1’s winning and losing players. The result is a series of perfectly symmetrical but redundant graphs in the paper, and any deviation from zero at the hypothetical cutoff will necessarily create a “discontinuity” that is twice the real magnitude of the treatment effect.
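The double-counting claim is just arithmetic. In a hypothetical world where winning set 1 shifts the winner’s second-set game margin by `e` (and, symmetrically, the loser’s by `-e`), adding the two sides’ estimates reports the single effect twice:

```python
# Hypothetical treatment effect: winning set 1 is worth +e games in set 2.
e = 0.5
winner_side = +e   # regression fit on the winners' side of the cutoff
loser_side = -e    # the same matches, seen from the losers' side
# Adding the two "discontinuity" estimates double-counts the one effect
jump_at_cutoff = winner_side - loser_side   # = 2e
print(jump_at_cutoff)
```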

Beyond that, it’s not clear that his device of looking only at long tiebreakers succeeds in creating a pool of contests between two equal players (such that we can infer any carryover from a first set victory is the result of that victory, not the player’s superior skill). In fact, the average difference in seeding between the winner and loser in long tiebreaks of 14 points or more (about 2 seeds) is the same as we see in 7-5 tiebreaks. And so is the disparity in second set games won. So it appears that the success of first set winners can be explained by their talent difference. (See Table 5).

Moreover, Page’s assumption that longer tiebreaks = more equal players is largely mistaken. Once a tiebreak has reached 6-6, its duration is largely a function of luck, not the players’ talent. The chance of each two-point segment ending in another tie is .50 if the players are evenly matched, but still .495 if one player truly has the ability to win 55% of the points. Even a 60/40 talent split – far more than is realistic – reduces the chance of a 2-point tie only to .48. A superior player is much more likely to win even a long tiebreak – the 55% player is 50% more likely to win a long tiebreak than his opponent – but the disparity has little impact on tiebreak length. And thus even in long tiebreaks, the winner will tend to be the better player.
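The tiebreak arithmetic here is easy to check: the chance a two-point segment ends 1-1 is 2p(1-p), and the odds that the better player wins a decided segment (and hence a long tiebreak, which goes to whoever first takes a segment 2-0) are p²/(1-p)².

```python
def tie_prob(p):
    """Chance a two-point tiebreak segment ends 1-1 (p = per-point win prob)."""
    return 2 * p * (1 - p)

def segment_win_odds(p):
    """Odds the better player wins a decided segment (2-0 vs. 0-2)."""
    return p**2 / (1 - p)**2

print(tie_prob(0.50))          # 0.50  -- evenly matched players
print(tie_prob(0.55))          # 0.495 -- a 55/45 talent split
print(tie_prob(0.60))          # 0.48  -- a 60/40 split
print(segment_win_odds(0.55))  # ~1.49: the 55% player is ~50% likelier
                               # to win any decided segment
```

So tiebreak length is nearly independent of the talent gap, while the long-tiebreak winner is still noticeably more likely to be the better player, exactly as the comment argues.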

None of this precludes the possibility that winning the first set has a real effect. Page provides some other intriguing evidence later in the paper. For example, when players split the first two sets of a match, the player who won the second is more likely to win the third. But even here the evidence is far from conclusive. And he doesn’t really address the most plausible (to me) explanation for the patterns he sees: the players are learning about their opponent’s strengths, weaknesses, and patterns of play over the course of the match, and one player thus develops an advantage that carries over into the following sets.