The “hot hand” and problems with hypothesis testing

Gur Yaari writes:

Anyone who has ever watched a sports competition is familiar with expressions like “on fire”, “in the zone”, “on a roll”, “momentum” and so on. But what do these expressions really mean? In 1985 when Thomas Gilovich, Robert Vallone and Amos Tversky studied this phenomenon for the first time, they defined it as: “. . . these phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record”. Their conclusion was that what people tend to perceive as a “hot hand” is essentially a cognitive illusion caused by a misperception of random sequences. Until recently there was little, if any, evidence to rule out their conclusion. Increased computing power and new data availability from various sports now provide surprising evidence of this phenomenon, thus reigniting the debate.

Yaari goes on to cite some studies that have found time dependence in basketball, baseball, volleyball, and bowling.

Here are my quick thoughts:

1. The effects are certainly not zero. We are not machines, and anything that can affect our expectations (for example, our success in previous tries) should affect our performance.

2. The effects I’ve seen are small, on the order of 2 percentage points (for example, the probability of a success in some sports task might be 45% if you’re “hot” and 43% otherwise). Yaari presents his results in terms of p-values but I think it’s best to look at effect sizes directly in terms of probabilities (see the simulation sketch below).

3. There’s a huge amount of variation, not just between players but also within a single player from one occasion to the next. Sometimes if you succeed you will stay relaxed and focused; other times you can succeed and become overconfident.

4. Whatever the latest results on particular sports, I can’t see anyone overturning the basic finding of Gilovich, Vallone, and Tversky that players and spectators alike will perceive the hot hand even when it does not exist and dramatically overestimate the magnitude and consistency of any hot-hand phenomenon that does exist.

In summary, this is yet another problem where much is lost by going down the standard route of null hypothesis testing. Better, in my view, to start with the admission of variation in the effect and go from there.
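
To make point 2 and this summary concrete, here is a little simulation sketch; the numbers are made up purely for illustration. Each player gets his own hot-hand effect, centered at 2 percentage points but spread widely enough that some players shoot worse after a make, and the pooled result is reported as a difference in probabilities rather than as a p-value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_shots = 200, 500
base = 0.43                                  # baseline probability of success
effects = rng.normal(0.02, 0.04, n_players)  # player-specific "hot" bumps; some are negative

after_make, after_miss = [], []              # pooled outcomes following a make / a miss
for eff in effects:
    shots = np.empty(n_shots, dtype=int)
    shots[0] = rng.random() < base
    for t in range(1, n_shots):
        p = base + eff if shots[t - 1] else base  # "hot" here just means: the last shot went in
        shots[t] = rng.random() < p
    after_make.extend(shots[1:][shots[:-1] == 1])
    after_miss.extend(shots[1:][shots[:-1] == 0])

print(f"P(make | previous make) = {np.mean(after_make):.3f}")
print(f"P(make | previous miss) = {np.mean(after_miss):.3f}")
print(f"pooled difference       = {np.mean(after_make) - np.mean(after_miss):+.3f}")
print(f"players with a negative effect: {(effects < 0).mean():.0%}")
```

Under these made-up settings the pooled difference hovers around 2 percentage points even though roughly 30 percent of the simulated players have effects going the other way, which is exactly the kind of situation where a single yes/no hypothesis test is least informative.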

14 thoughts on “The “hot hand” and problems with hypothesis testing”

  1. Is this a problem with hypothesis testing or with looking at things on average? To me they are similar issues, but making claims about what the hot hand is on average seems to be the more dangerous presumption. Am I wrong in thinking this? A lot of studies seem fixated on the average treatment effect (ATE), understandably, but is it always the relevant quantity?

  2. Point 3 really sticks with me.

    I expect this is true in many domains but especially medicine.

    When we do the conventional double-blind experiment we pick up the treatment effect as an average difference, but we can often see that the within-group variation is substantial. We might actually prefer ten treatments, each with a low average treatment effect, to one treatment with a larger average effect, if the treatments had enough variability of effect and didn’t co-vary too much (a rough simulation of this trade-off follows below).

    How would we study a problem like that?
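
    Not an answer, but here is a rough simulation of that trade-off, with completely made-up numbers: one treatment with a larger average effect versus ten treatments with smaller average effects, a lot of patient-level variability, and (in this toy version) no correlation between them. If the best of the ten could be matched to each patient, the portfolio wins easily.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n_patients = 100_000

    # One treatment: larger average effect, with patient-level spread.
    single = rng.normal(loc=0.30, scale=0.50, size=n_patients)

    # Ten treatments: smaller average effects, drawn independently per patient.
    ten = rng.normal(loc=0.10, scale=0.50, size=(n_patients, 10))

    print(f"single treatment, mean effect:           {single.mean():.2f}")
    print(f"ten treatments, mean effect of each:     {ten.mean():.2f}")
    print(f"ten treatments, best-per-patient effect: {ten.max(axis=1).mean():.2f}")
    ```

    The comparison of averages favors the single treatment, but the best-per-patient column comes out far larger. The catch is the question in the comment: studying this requires designs that estimate variation in effects (crossover or repeated-measures studies, hierarchical models) plus some way of telling which treatment suits which patient.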

  3. “2. The effects I’ve seen are small, on the order of 2 percentage points (for example, the probability of a success in some sports task might be 45% if you’re “hot” and 43% otherwise). Yaari presents his results in terms of p-values but I think it’s best to look at effect sizes directly in terms of probabilities.”

    I recall a paper in the mid-1990s (?) in The American Statistician comparing Joe DiMaggio’s hitting streak with Ted Williams’s .406 season in 1941. As I recall, the odds of the hitting streak were astronomically small (how’s that for an oxymoron? “microscopically small” doesn’t have the same ring, does it?).
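
    For a sense of the scale (this is not the paper’s calculation; the batting average and at-bats per game are assumptions), a back-of-the-envelope version of the streak odds looks like this:

    ```python
    # Odds of a 56-game hitting streak over one specific 56-game window,
    # treating games as independent; assumed numbers, for illustration only.
    p_hit_ab = 0.350      # assumed batting average
    ab_per_game = 4       # assumed at-bats per game
    p_hit_game = 1 - (1 - p_hit_ab) ** ab_per_game   # at least one hit in a game
    p_streak = p_hit_game ** 56                      # hits in all 56 consecutive games
    print(f"P(hit in a given game)          = {p_hit_game:.3f}")
    print(f"P(56-game streak, fixed window) = {p_streak:.1e}")
    ```

    That works out on the order of a couple in a hundred thousand for a single fixed window, before crediting the many possible starting points, seasons, and players, which is why careful comparisons like the one in the paper you recall take more work than this.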

  4. The ‘hot hand’ discussion always makes me wonder whether anyone has studied or disputes the notion of a ‘slump.’ Certainly baseball and basketball players and fans believe in those. And in baseball, at least, it looks to me like there’s statistical evidence for batting slumps.

    If slumps are real then I think the ‘hot hand’ discussion becomes even more confusing: does ‘on a roll’ simply mean ‘not in a slump’?

    • Right. “Cold hands” definitely exist: a player might be partly injured, suffering from the flu, upset over his wife filing for divorce, struggling with bad mechanics, stumped by the defense, not getting the ball at the right moment, nagged by a lack of confidence, and so on. Maybe hot hands are when few of those problems are operating.

  5. Thanks for this. I’ve always had a problem with the “no hot hand” claim because I’ve seen the actual shooting or hitting. A guy hitting .333 or shooting .500 doesn’t go 1 hit, 2 outs or 1 basket, 1 miss. In that .333 or .500, he may go hitless 10 straight times or 1 for 20 (called a “slump”) or shoot 0-10 from the field (“cold hand”). He may also get 4 hits in a row (even 4 home runs in a row) or drain 6 consecutive baskets. If after 3 of the 6 shots, his teammates decide he’s “hot” and feed him the ball, they’re not wrong for 4 more shots. In my experience, that’s what players are talking about when they talk about streaks, and players can be among the most probabilistic people in the world. I know this doesn’t really refute Kahneman’s assertion if you really look at it from his perspective, but from the players’ perspective, hot and cold do exist and show up in stat lines.

  6. Can the methods used for these studies distinguish between hot hands and cold hands? I played pickup basketball (poorly) until age 52, and when I got fatigued I shot very poorly. This is the opposite of a hot hand, but might look the same statistically if you were just looking at streaks.

    • No, the methods used in the Gilovich, Vallone and Tversky article have very little statistical power (see the quick power check below).

      Properly analyzed, their own data display clear evidence of non-stationarity, both for “streaking” (e.g., players ‘heat up’ at the free throw line, hitting a higher percentage of second attempts than first attempts) and for “alternation” (not merely statistical ‘regression to the mean’ but a push of the success rate beyond it; for example, in field goal shooting, where the defense can make adjustments and some shooters may become overconfident and attempt more difficult shots).

      One implication is that, as Andrew notes in his summary, the standard null hypothesis is the wrong baseline for identifying a hot hand.

      The only reliable finding is, as Andrew states in his 4th point, that (most) people have greatly exaggerated expectations about the hot hand. Unfortunately, Tversky and Gilovich (in a follow-up article in _Chance_ magazine) made their own exaggerated claim that “this misconception of chance has direct consequences for the conduct of the game. … Like other cognitive illusions, the belief in the hot hand could be costly.” Instead, viewing human activity as a sequence of statistically independent trials of identical probability is itself a costly illusion.
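
      To put a number on the power point, here is a quick check (my sketch, not anything from the GVT paper; the block length and the size of the hot/cold swing are assumptions): simulate a genuinely streaky shooter whose hit probability alternates between hot and cold blocks, and ask how often a Wald-Wolfowitz runs test on 100 shots flags it.

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      def runs_z(x):
          """Normal-approximation z statistic for the runs test (fewer runs => streakier)."""
          n1 = x.sum()
          n2 = len(x) - n1
          if n1 == 0 or n2 == 0:
              return 0.0
          runs = 1 + (x[1:] != x[:-1]).sum()
          mean = 2 * n1 * n2 / (n1 + n2) + 1
          var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
          return (runs - mean) / np.sqrt(var)

      n_sims, n_shots, block = 5000, 100, 10
      rejections = 0
      for _ in range(n_sims):
          # hit probability alternates between 0.6 (hot) and 0.4 (cold) every `block` shots
          p = np.where((np.arange(n_shots) // block) % 2 == 0, 0.6, 0.4)
          shots = (rng.random(n_shots) < p).astype(int)
          if runs_z(shots) < -1.645:   # one-sided 5% test for "too few runs"
              rejections += 1

      print(f"rejection rate against this streaky shooter: {rejections / n_sims:.2f}")
      ```

      With these settings the rejection rate comes out only modestly above the 5% false-alarm rate, so a shooter whose true hit probability really does swing by 20 percentage points would usually slip past a test like this on a sample of 100 shots.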

  7. What I’ve been consistently confused about in the “hot hand” argument is this. If I’m a believer in the “hot hand,” presumably I don’t believe that every single player who hits six shots in a row has the hot hand; I think the large majority of them are just lucky, and only a few — say, 1% — have the hot hand. If that’s the case, I wouldn’t expect the hot hand to produce a measurable correlation overall between the success of consecutive shots, right? Maybe it would be enough to measurably increase the number of long streaks of baskets, but maybe not. Such a hot hand effect wouldn’t provide very much predictive power to a spectator who has access only to the actual shots scored — but it might have much more predictive power for the player, if players KNOW when they have the hot hand.
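
    A quick sketch of that intuition, with every number an assumption: suppose only 1% of players have a real hot hand, and even for them the boost after a make is 10 percentage points. To a first approximation (ignoring the fact that hot players generate slightly more post-make attempts), the pooled conditional difference is just the product of those two numbers.

    ```python
    # Expected pooled "hot hand" signal when only a small share of players have one.
    share_hot = 0.01   # assumed fraction of players with a real hot hand
    boost = 0.10       # assumed bump in make probability for those players when hot
    pooled = share_hot * boost
    print(f"approx. pooled P(make | made last) - P(make | missed last) = {pooled:.3f}")
    ```

    That is about a tenth of a percentage point, which no realistically sized shooting dataset would distinguish from zero, even though the effect is real for the few players who have it and, as you say, potentially much more useful to a player who knows he is one of them.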

  8. It’s a variation on the old con: for a 64-team tourney, send out 64 letters, each predicting that a different team will win. Then send out 32, then 16, etc. When you get down to eight teams, ask for money for the next game. Rinse, repeat.

    The only thing you can look at is the effect of confidence, or the lack of it.
