Molyneux expresses skepticism on hot hand


Guy Molyneux writes:

I saw your latest post on the hot hand too late to contribute to the discussion there. While I don’t disagree with your critique of Gilovich and his reluctance to acknowledge past errors, I do think you underestimate the power of the evidence against a meaningful hot hand effect in sports. I believe the balance of evidence should create a strong presumption that the hot hand is at most a small factor in competitive sports, and therefore that people’s belief in the hot hand is reasonably considered a kind of cognitive error. Let me try to explain my thinking in a couple of steps.

I think everyone agrees that evidence of a hot hand (or “momentum”) is extremely hard to find in actual game data. Across a wide range of sports, players’ outcomes are just slightly more “streaky” than we’d expect from random chance, implying that momentum is at most a weak effect (and even some of that streakiness is accounted for by player health, which is not a true “hot hand”). This body of work I think fairly places the burden of proof on the believers in a strong hot hand, or those still open to the idea (like you), to show why this evidence shouldn’t end the debate. Broadly speaking, two serious objections have been raised to accepting the empirical evidence from actual games.

First, you argue that “Whether you made or missed the last couple of shots is itself a very noisy measure of your ‘hotness,’ so estimates of the hot hand based on these correlations are themselves strongly attenuated toward zero.” If most of the allegedly hot players we study were really just lucky, and thus quickly regress to their true mean, the elevated performance by the subset of truly ‘hot’ players will be masked in the data. I take your point. Nonetheless, given the absence of observed momentum in games, one of two things must still be true: A) the hot hand effect is large but rare (your hypothesis), or B) the hot hand effect is small but perhaps frequent. This may be an important distinction for some analytic purpose, but from my perspective the two possibilities are effectively the same thing: the game impact (I won’t say ‘effect’) of the hot hand is quite small. By “small” I mean both that the hot hand likely has a negligible impact on game outcomes, and that teams and athletes should largely ignore the hot hand in making strategic decisions.

And since the actual impact on games is quite small *even if* your hypothesis is correct (because true hotness is rare), it follows that belief in a strong hot hand by players or fans still represents a kind of cognitive failure. The hot hand fallacy held by most fans, at least in my experience, is not that a very few (and unknowable) players sometimes get very hot, but rather that nearly all athletes sometimes get hot, and we can see this from their performance on the field/court.

(An important caveat: IF it proved possible to identify “true” hot hands in real time, or even to identify specific athletes who consistently exhibit true hot hand behavior, then my argument fails and the hot hand might have legitimate strategic implications. But I have not seen evidence that anyone knows how to do this in any sport.)

The second major objection made to the empirical studies is that the hot hand is disguised as a result of players’ very knowledge of it. As Miller and Sanjurjo suggest, “the myriad confounds present in games actually make it impossible to identify or rule out the existence of a hot hand effect with in-game shooting data, despite the rich nature of modern data sets.” Two main confounds are usually cited: hot players will take more difficult shots, and opposing athletes will deploy additional resources to combat the hot player. Some have argued (including Miller and Sanjurjo) that these factors are so strong that we must ignore real game data in favor of experimental data. But I think it is a mistake to dismiss the game data, for three reasons:

  • The theoretical possibility that players’ shot selections and defensive responses could perfectly – and with astonishing consistency – mask the true hot hand effect is only a possibility. Before we dismiss a large body of inconvenient studies, I’d argue that hot hand believers need to demonstrate that these confounds regularly operate at the necessary scale, not just assume it.
  • A sophisticated effort to control for shot selection and defensive attention to hot basketball shooters concludes that the remaining hot hand effect is quite modest. Conversely, as far as I know no one has shown empirically that the enhanced confidence of hot players and/or opponents’ defensive responses can account for the lack of observed momentum in a sport.
  • Efforts to detect a hot hand effect in baseball have invariably failed. And that’s important, because in baseball the players cannot choose to take on more arduous tasks when they feel “hot,” and opposing players have virtually no ability to redistribute defensive resources in a way that disadvantages players perceived to be hot. So even if you reject the Sloan study and think confounds explain the lack of momentum in basketball, they cannot explain what we observe in baseball.

I would also note that this “confounds” objection is in fact a strong argument *in favor* of the notion that the hot hand is a cognitive failure, given your argument that in-game streaks are a very poor marker of true hotness. If the latter is true, then it would still be a cognitive error for a player or his opponents to act on this usually-false indicator of enhanced short-term talent. If players on a streak take more difficult shots, they are wrong to do so, and teams that change defensive alignments in response are also making a mistake.

So, these are the reasons I remain unpersuaded that I should believe in a hot hand in the wild, or even consider it an open question. That leaves us, finally, with the experimental data that some feel should be privileged as evidence. I haven’t read enough of the experimental research to form any view on its quality or validity. But for answering the question of whether belief in the hot hand is a fallacy, I don’t see how the results of these experiments much matter. Fans and athletes believe they see the hot hand in real games. If a pitcher has retired the last nine batters he faced, many fans (and managers!) believe he is more likely than usual to get the next batter out. If a batter has 10 hits in his last 20 at bats, fans believe he is “on a tear” and more likely to be successful in his 21st at bat (and his manager is more likely to keep him in the lineup). But we know these beliefs are wrong.

Even if experiments do demonstrate real momentum for some repetitive athletic tasks in controlled settings, this would not challenge either of my contentions: that the hot hand has a negligible impact on competitive sports outcomes, and fans’ belief in the hot hand (in real games) is a cognitive error. Personally, I find it easy to believe that humans may get into (and out of) a rhythm for some extremely repetitive tasks – like shooting a large number of 3-point baskets. Perhaps this kind of “muscle memory” momentum exists, and is revealed in controlled experiments. But it seems to me that those conducting such studies have ranged far from the original topic of a hot hand in competitive sports — indeed, I’m not sure it is even in sight.

I don’t know that I have anything new to say after a few zillion exchanges in blog comments, but I wanted to put Guy’s reasoning out there, because (a) he expresses it well, and (b) he’s arguing that I’m a bit off in my interpretation of the data, and that’s something I should share with you.

The only thing I will comment on in Guy’s above post is that I do think baseball is different, because a hitter can face different pitches every time he comes to the plate.  So it’s not quite like basketball where the task is the same every time.

P.S. Yeah, yeah, I know, it seems at times that this blog is on an endless loop of power pose, pizzagate, and the hot hand. Really, though, we do talk about other things! See here, for example. Or here. Or here, here, here.

P.P.S. Josh (coauthor with Sanjurjo of those hot hand papers) responds in the comments. Lots of good discussion here.

87 thoughts on “Molyneux expresses skepticism on hot hand”

  1. Guy’s argument conflates an increased within-player probability of making the next shot with external dynamic changes to the local system that can reduce this probability. Those are two different phenomena, and they get conceptually muddled with the arguments about “belief” in a hot hand and “real game” data.

    1. If an increased probability can be shown in a more controlled setting, then the phenomenon exists (e.g., in a shooting competition).
    2. The ability to alter an effect with interventions that nullify this within-player change is not evidence of a lack of effect.

    • Just to summarize the discussion Andrew refers to in the P.P.S.

      First, the tl;dr version: What the commenters have shown in the discussion below is that the three central assertions of the author of this blog post (first paragraph) are invalid, as stated. Further, the phrasing and tone used by the author throughout the post may mislead the average reader into thinking there is controversy on issues that are settled. Importantly, the average reader will likely not know that results of previous analyses are cited incorrectly and do not support the argument.

      While these misleading and incorrect assertions have served to obfuscate the issues, I make no claim that this was done deliberately. Rather, the problematic aspects of this post appear to reflect an unawareness of some of the details (and results) of previous studies by someone who hasn’t read all the relevant papers carefully, and hasn’t performed the kinds of testing (e.g. simulations) one needs to do in order to understand what these analyses can and cannot say. I am not saying this to fault the author. This post was made in order to facilitate discussion, and no one here is under any obligation to be an active researcher in this area to join in.

      Importantly, the author’s blog post does not challenge the following 4 facts, while a few statements at the end of the post are (implicitly) at odds with these facts (partially explained in this recent general interest write-up of our work):
      (1) The main results from the landmark hot hand study by Gilovich, Vallone and Tversky [GVT] (1985) are invalid.
      (2) The re-analysis of the data from what GVT considered to be their critical test of hot hand shooting (Study 4, with Cornell players) reveals statistically significant evidence of a hot hand effect, with effect size estimates that are meaningfully large, as shown in our paper. This contrasts sharply with the author’s assertion that “conducting such studies have ranged far from the original topic of a hot hand in competitive sports — indeed, I’m not sure it is even in sight.”
      (3) Evidence for the hot hand has replicated in every extant (and available) study of hot hand shooting that eliminates defensive confounds entirely. This includes other controlled shooting studies as in GVT, as well as the NBA’s Three point shooting contest. Importantly, these shooting tasks have similar features to the shooting performed in games, and do not all involve simple repetitive muscle memory/calibration mechanisms (some involve moving around after each shot, incentives, crowds, and NBA players). This contrasts with the author’s assertion that data from these controlled studies don’t “much matter.”
      (4) On its own, this constitutes evidence that the hot hand is neither a myth nor a cognitive illusion. The belief in the hot hand can be justified. But we can go further. GVT concluded that when people feel like a player is more likely to make a shot, they are wrong, and the data from their betting task shows that they bet correctly at chance rates. This conclusion is incorrect. We have re-analyzed the data and shown that their bettors bet correctly at rates substantially better than chance would predict. In addition, in another study, we have some evidence that players have a reasonably accurate sense of which of their teammates tend to get hot.

      The three main claims made by the author of this post are:
      (1) There is powerful evidence against a meaningful hot hand effect in games
      (2) The evidence should create a strong presumption that the hot hand is at most a small factor in games
      (3) People’s belief in the hot hand can reasonably be considered a kind of cognitive error

      These claims cannot *currently* be backed up with evidence or analysis.

      For (1), there is *zero* evidence against a meaningful hot hand effect in games. Previous studies of game data were cited as evidence that the hot hand is small, but as discussed in more detail below, a simple simulation (e.g. Daniel Lakeland’s code, or our code, or a modification) shows that this evidence is not informative with regard to the size of the hot hand effect in games. There could be a meaningfully large and meaningfully frequent hot hand, and yet the methods employed by previous studies would never pick this up.

      For (2), there is no evidence to support the presumption that the hot hand is at most a small factor in games. Previous studies are not inconsistent with the presence of a hot hand that is frequent enough, and strong enough, so as to be a meaningful factor in games.

      For (3), given the *current* formal evidence, there is simply not enough to consider people’s beliefs in the hot hand to be generally in error.

      Further detail can be found in the discussion below.

      It is important to emphasize that I am only characterizing the assertions as stated above by the author. The discussion below has brought out additional interesting nuance, and plenty of room for future study—especially with regard to a modified version of (3). In particular, based on the way people talk about the hot hand, it seems likely that many players and coaches will often over-react to the signals they believe are indicative of hot hand shooting (some may under-react). If the original study had shown something like this, it would not have come as a surprise to practitioners; they already believe that themselves. This is an important issue, both psychologically and with regard to decision making in games. This issue may have been at risk of getting lost in all the recent discussion surrounding the hot hand. I suspect the motivation for the author’s post mostly reflects the concern that people may get the wrong message from our work, and are at risk of getting carried away with their hot hand beliefs. While it is difficult to check whether the *operational* beliefs of players and coaches are well-calibrated, this is nevertheless a valid concern.

      • I fear we have a bit of a culture clash here, between academic and non-academic perspectives. I don’t say this in order to assume a faux populist stance and dismiss “merely academic concerns” — the issues Josh pursues are no less valid than mine — but just to acknowledge that people bring very different assumptions about what’s important to this discussion. Consequently, I am largely asking different questions than the ones Josh (and some other commenters) are choosing to answer. Two related examples seem most important:

        1a. Is the hot hand a myth? vs.
        1b. Does the hot hand matter, practically speaking, in athletic competition?
        (Commenter MJT identified this important difference early in the discussion)

        2a. Is the belief in a hot hand effect false? vs.
        2b. Are people’s perceptions of the hot hand correct (to any significant degree)?
        (For the record, my own answers are no, no, no, no.)

        Obviously, asking different questions will complicate a discussion and lead to different answers. It also flips the burden of proof. Several commenters have observed that I can’t prove the absence of a hot hand effect. That’s true (I also haven’t tried), and that may be a legitimate rebuttal to Gilovich et al. (who made overly strong claims). But on the question of whether the hot hand effect has any meaningful game impact, or should be considered when making strategic game decisions, why is it the obligation of hot hand skeptics to prove the negative? Step back and consider the big picture: Hot hand believers acknowledge that they have no idea how often this happens, the magnitude of the effect, or how long it lasts, and they have only very limited, experimental evidence suggesting anyone can identify the hot hand in a way that makes the information actionable. In that context, why would the burden of proof fall on hot hand skeptics to prove it *isn’t* important?

        I happily acknowledge the theoretical possibility the hot hand will someday prove to be important in “real world” terms. But someone needs to show me evidence that the hot hand impacts game outcomes in a material way — or that teams, athletes, or bettors can benefit from acting on belief/knowledge of a hot hand — before I will change my “prior” of skepticism.

        Asking different questions also bears on the question of the value of game data vs. experimental data. As Dana Carvey used to say, it is very conveeenient for a hot hand advocate like Josh to privilege experimental data while largely ignoring game data. Still, that may be a valid approach if one is trying simply to prove that a non-zero hot hand effect exists for some set of discrete athletic skills. But if you want to demonstrate that the hot hand really matters in organized sports, you need to show that’s true under game conditions. And if you want to argue that sports fans’ perception of the hot hand bears any resemblance at all to reality, then again you need to engage with game data, because fans’ beliefs are about (and influenced by) what happens on the court, not in the laboratory. (And honestly, I can save researchers a lot of time here: if you just listen to sports radio for a couple of hours, or read some comments on a sports website, you will quickly conclude that fans routinely and wildly exaggerate the predictive significance of streaks in sports.)

        It’s interesting to consider how the hot hand’s revival in recent years will be looked back on in future years. Perhaps it will be seen as an important intellectual breakthrough, when the myth of the hot hand’s own mythness was overthrown. Maybe this is just the dawn of a period of major hot hand discoveries. If so, at some point I’d expect NBA teams to start knocking on Josh’s door and trying to hire him, because the commercial value of detecting a hot hand would be vast (especially in comparison to academic salaries). And if so, I will gladly congratulate Josh and others who led the way (I mean that seriously, not ironically).

        But I fear the story ends differently. My guess is people will look back at the current “hot hand is real after all” fad with some bewilderment, if not embarrassment: “Remember that time when people started believing in the hot hand again. Wasn’t that odd?” A few will recall that the exposure of Gilovich’s mathematical errors fed a large over-correction, based almost entirely on data from small, controlled experiments. A consensus will have formed that although there are infrequent and hard-to-detect hot hand effects, the gulf between the reality and the public perception is so large that the hot hand should still be considered a cognitive failure, and teams and athletes are strategically much better off ignoring the hot hand. Some may even point to the 2010s hot hand intellectual boomlet as further evidence for the power of the hot hand illusion: “even some mathematicians joined in!” And who knows, maybe the whole cycle will be repeated again in the 2040s or 2050s — our impulse to ascribe meaning to streaks is very powerful indeed.

        • Guy:

          It’s more than a culture clash.

          You have shifted the goal posts, and are now making different and much weaker points than the ones you made in the post above. That would be fine if you were responsible and clear that this is what you are doing, while also acknowledging the numerous incorrect statements you made in your original post and in the discussion below.

          You haven’t done this, and you haven’t introduced any new facts or analyses; instead you have gone for the straw-man set-ups that are easy to mock, while sneaking in some of your conclusions that have already been discredited below. This is again misleading to a casual reader. Because of this I have to assume that I am writing to someone who is not discussing matters in good faith and who will have no problem muddying the waters for the sake of rhetorical advantage. This is a sad conclusion to come to, but it is consistent with your approach in 2015 in the comments section of Andrew Gelman’s Guess What? blog post, where it seemed like you were looking for holes everywhere. You even made the claim that we made an error in calculating the bias, which you eventually acknowledged (after a lot of work). This attitude of throwing everything at the wall and seeing what sticks is lazy, and puts the time burden on us to explain everything to you in order to clarify, so that others don’t get the wrong message from your over-confident and incorrect statements. Further, in this post you are coming with the same facts and evidence that you came with in our discussion in 2015, and I feel like I am repeating myself again and again, because you never acknowledge the issue. At some point this starts looking like troll behavior.

          Given that, I will respond as briefly as possible; we have already spent enough time going down these blind alleys.

          You say:

          But on the question of whether the hot hand effect has any meaningful game impact, or should be considered when making strategic game decisions, why is it the obligation of hot hand skeptics to prove the negative?

          No one asked you to prove the negative. In case you have forgotten, in the post you said that there is “powerful evidence against a meaningful hot hand effect”.

          This is a false statement, full stop. It would be better if you acknowledged this, rather than dodge it.

          Hot hand believers acknowledge that they have no idea how often this happens, the magnitude of the effect, or how long it lasts, and they have only very limited, experimental evidence suggesting anyone can identify the hot hand in a way that makes the information actionable.

          First, this isn’t about forcing you to believe in the hot hand; it is about putting the evidence out there and letting people come to their own conclusions, not obfuscating the issues as you have done by citing evidence that does not support your claims. Second, you overlooked practitioner beliefs, i.e. experiential evidence. It doesn’t make sense to dismiss this entirely. All you have as a researcher is 0s and 1s to look at; they have a much richer experience. That doesn’t make them right, but have some humility before you say “people’s belief in the hot hand is reasonably considered a kind of cognitive error.”

          You say:

          As Dana Carvey used to say, it is very conveeenient for a hot hand advocate like Josh to privilege experimental data while largely ignoring game data.

          1. It’s not about advocating that game data should be ignored in forming your beliefs; it’s about *demonstrating* that it must be ignored, by showing that the current measurements simply can’t answer the question. Please do the work and go through the simulations so you can convince yourself of this. You have dodged this point 2-3 times already. Look, in theory, all else equal, game data is great, and better than other data. In practice, it just doesn’t answer the question one way or the other.

          2. I never argued that the experimental data, or the data from the NBA’s Three Point contest, should be privileged over game data in principle. I only argued that there is information to be had there; that’s it.

          You say:

          And honestly, I can save researchers a lot of time here: if you just listen to sports radio for a couple of hours, or read some comments on a sports website, you will quickly conclude that fans routinely and wildly exaggerate the predictive significance of streaks in sports.

          I made this point already above, so you aren’t saying anything new here, and you aren’t somehow doing us lowly researchers a favor by pointing this out. If you want to do a formal study of how fans’ utterances—which are made based on diverse motives, and are often not belief-driven—don’t match reality, go ahead. It’s not that important. Remember, coaches and players, who have experience with their team and the game, have a lot more info. My point is that you should have some humility and respect that they may be “measuring” things you can’t measure in the data. You don’t have to accept their conclusions, but your default mode here of dismissing it out of hand is just puzzling.

          You say:

          Maybe this is just the dawn of a period of major hot hand discoveries. If so, at some point I’d expect NBA teams to start knocking on Josh’s door and trying to hire him, because the commercial value of detecting a hot hand would be vast (especially in comparison to academic salaries).

          This rhetorical device is pretty dirty, man. We have said time and again that there is no evidence for or against the ability of players to detect the hot hand in real time. Importantly, if you had been following the measurement error story, which I have re-told too many times, for most models of the hot hand (non-feedback models) there is no way you would ever detect it with hit/miss data. Maybe someday with improved biometric data this will be possible, who knows!

          You say:

          But I fear the story ends differently. My guess is people will look back at the current “hot hand is real after all” fad with some bewilderment, if not embarrassment: “Remember that time when people started believing in the hot hand again. Wasn’t that odd?” A few will recall that the exposure of Gilovich’s mathematical errors fed a large over-correction, based almost entirely on data from small, controlled experiments. A consensus will have formed that although there are infrequent and hard-to-detect hot hand effects, the gulf between the reality and the public perception is so large that the hot hand should still be considered a cognitive failure, and teams and athletes are strategically much better off ignoring the hot hand. Some may even point… yada yada yada

          Let’s distinguish the fad from the substance. There is plenty of fad out there, we can’t control what reporters and fans say. Our recent general interest write-up tries to avoid this, though some of the syndicated headlines were changed without our permission (it was creative commons).

          If your statements are meant as a characterization of what has come out of recent research, rather than what is in the popular press, then they are absurd. You are ignoring many of the important contributions that have been brought up recently, along with the bias. There is much more, if you would bother to read more than 2-4 pages of our papers, which you have already admitted (in the comments below) is too much for you to do. You wouldn’t be able to say what you are saying with a straight face if you read a bit more, and did some hard work, rather than make nicely arranged sentences with almost zero content.

          On: “But I fear the story ends differently.” That’s your prediction; based on what evidence, I have no idea. Anyway, good on you. I’d wager a lot that with better biometric data, especially if we advance to real-time wearable brain monitoring, we will at some point be able to know when a player’s probability of success is higher than usual. At that point the explanations for why people’s ability varies across time will no longer be magical. Of course this is all speculative, but since you’d rather engage in speculation, it is important to note that your speculations are not privileged.

  2. The thing is, as we said last time, binomial hit/miss data is very, very poor at detecting time-varying effects. So the statement “I do think you underestimate the power of the evidence against a meaningful hot hand effect in sports” is just a wrong idea if you’re basing it on the lack of ability to detect the hot hand in hit/miss game data.

    In fact, what’s going on is people are taking measurements that have almost no hope of detecting an effect, and using the lack of detection as evidence that the effect doesn’t exist or is small.

    If you use hit-miss data to try to detect something like a 10 or 20 percentage point increase in hit probability, you’re gonna need something like a few thousand attempts to detect it reliably. But the hypothesis is that the effect lasts maybe half a game, or on the order of 10 to 50 shots. If you gain confidence and feel smooth and in control for an hour, and it takes your hit probability from 30% to 50% for 20 shots in a row, this is a substantial effect on your team, but it’s totally undetectable, because the run is only 20 shots and the concept of a binomial distribution is an asymptotic one (i.e. what is the *long run* frequency), so it acts like a low-pass filter that is only sensitive to very slow changes in hit probability.

    It’s like a tele-seismometer with a bandpass filter for 0.1 to 1 Hz. If a small localized event occurs with energy in the range 10 to 30 Hz, you can’t use lack of detection to argue that the event didn’t occur.
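
    To make this concrete, here is a minimal power simulation (my own sketch, not anyone’s published code; the 30%/50% rates, the 20-shot hot stretch, and the 1000-shot season are all illustrative assumptions):

    ```python
    # Sketch: a shooter with a 30% base rate gets one 20-shot stretch at 50%
    # somewhere in a 1000-shot season. How often does the standard
    # conditional test, P(hit | previous hit) vs P(hit | previous miss),
    # flag anything at the 5% level?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def season(n=1000, base=0.30, hot=0.50, hot_len=20):
        p = np.full(n, base)
        start = rng.integers(0, n - hot_len)
        p[start:start + hot_len] = hot  # one transient hot stretch
        return (rng.random(n) < p).astype(int)

    def detects(shots):
        after_hit = shots[1:][shots[:-1] == 1]
        after_miss = shots[1:][shots[:-1] == 0]
        table = [[after_hit.sum(), len(after_hit) - after_hit.sum()],
                 [after_miss.sum(), len(after_miss) - after_miss.sum()]]
        return stats.fisher_exact(table)[1] < 0.05

    power = np.mean([detects(season()) for _ in range(2000)])
    print(f"detection rate: {power:.2%}")  # barely above the 5% false-positive rate
    ```

    A stretch that takes a 30% shooter to 50% for an hour matters a lot to his team, yet this test is nearly powerless against it.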

    • To quote:

      “… this would not challenge either of my contentions: that the hot hand has a negligible impact on competitive sports outcomes…

      …Perhaps this kind of “muscle memory” momentum exists, and is revealed in controlled experiments. But it seems to me that those conducting such studies have ranged far from the original topic of a hot hand in competitive sports — indeed, I’m not sure it is even in sight.”

      From my perspective, it is Guy who has ‘ranged’ away from the original question of the hot hand.

      I simply frame the question at hand as ‘does the hot hand exist?’
      Whereas Guy just posed (eloquently or not) a different question of ‘Does the hot hand impact the outcome of the game?’

      Guy’s new question is of a higher order. Guy does seem to buy in on the simpler, lower-order question of ‘does the hot hand exist?’ with a Yes.

      The Yes, in agreement with many, is a ‘Yes, but it’s weak.’

      • It’s not really possible that the hot hand effect, however defined, could be 0.0000000000000000000%. It could be positive or it could be negative but it can’t be identically zero. It’s even possible — indeed, very possible — that it could be positive by some definitions and negative by others. For instance, suppose a performer can tell that they’re ‘hot’ in the sense that, conditional on the difficulty of the shot, they are more likely to make it than they usually are…but suppose they overestimate the magnitude of this effect, so they take more difficult shots. In such a case, their conditional probability of success will be higher than usual but their success rate will be lower than usual. By one definition, they’re hot; by another, they’re cold.

        To the extent that we can tell, it seems that in many sports there is a weak positive ‘hot hand’. But that’s interesting: it had to be either positive or negative, after all. To be interesting, the hot hand has to be of practical significance.
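
        To put made-up numbers on this (purely illustrative, not from any study): suppose a player who normally takes shots he makes half the time gets a 10-point conditional boost when hot, but responds by taking shots that are 15 points harder.

        ```python
        # Made-up numbers: "hot" by the conditional definition, "cold" by the
        # marginal one.
        usual_rate = 0.50         # normal shot mix, normal skill
        hot_boost = 0.10          # hotness adds 10 points on any given shot
        difficulty_shift = -0.15  # but the hot player takes harder shots

        hot_observed = usual_rate + hot_boost + difficulty_shift
        print(usual_rate, hot_observed)  # 0.50 vs 0.45: conditional skill up,
                                         # observed success rate down
        ```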

    • “If you use hit-miss data to try to detect something like a 10 or 20 percentage point increase in hit probability, you’re gonna need something like a few thousand attempts to detect it reliably.”

      GS: Let me ask this again here: if you look across a large number of games, within an individual, is there any evidence whatsoever that hits/opportunity increases (or is other than flat) as a function of streak length? This should probably be looked at over periods of time in which overall accuracy/game is stable (no systematic increases or decreases across games) or approximately so. For players around a long time, do arbitrarily selected segments, consisting of many games, more-or-less consistently show a non-flat hits/opportunity X streak length function?

        • So…your answer is “no.” But you claimed that the sinusoid could be detected with ~1000 instances. Anyway, I’m talking about rather large samples from individual subjects over years. Has anybody ever done that? Relatedly, are you suggesting in another post (probably a couple posts) that the sole reason the HH is thought to exist (or “exist” as the case may be) is that people claim to observe (or “observe” as the case may be) “it” in the first- and third-person?

  3. I have several observations, not necessarily coherent.

    1. If it’s so hard to establish cases of hot handedness even after-the-fact, then during a game it must be essentially impossible. Therefore if a team adjusts its tactics to counter a perceived hot hand, it must usually be doing so based on a cognitive illusion, whether or not hot handedness actually exists. It seems to me that this should lead to less effective tactics during a game.

    2. Based on my own experience in baseball and ping-pong, I would claim that when a player experiences a sensation of hot-handedness (and I certainly did), he feels a more tangible connection between intent and outcome – I used to feel a real, nearly physical connection to where the ball would go when I felt hot, and less so when I didn’t. This definitely affected my own tactics. It may have been an illusion, but I played differently because of it.

    3. I suggest that ping-pong would be a very good sport for studying the hot hand. It has way more plays per minute than any other I can think of, and only one other player to respond to perceived hot hands. So more data and fewer confounds all in a very confined arena.

    • “If it’s so hard to establish cases of hot handedness even after-the-fact, then during a game it must be essentially impossible. ”

      No, because the instrument we’re using after the fact (hit/miss data) is essentially blind to what is really going on. It’s like saying that because a dancer doesn’t break his ankles doing high leaps, it’s impossible to detect when one dancer is more graceful than another.

      If you *measured gracefulness* you could detect it easily; but you don’t have that measurement, so (the argument goes) gracefulness must not exist.

      The same issue occurs all over science, and this is why the hot hand is important and gets so many blog comments (from me at least).

      Here are some examples:

      1) Since the productivity of a stay at home spouse isn’t compensated by accounting level dollar transactions, it doesn’t contribute to GDP so it must not exist.

      2) Since the effect of (chronic lower back pain / allergies / depression / arthritis …) isn’t detectable by surveys on lost work productivity it must not exist.

      3) Since people who smoke cigarettes die earlier, they have lower end of life costs, so smoking is good for society (here we’re detecting one thing and not another, a fallacy of a one-sided-bet)

      etc.

      Lack of detection with careful measurement is evidence that an effect doesn’t exist. Lack of detection with measurements that can’t possibly do a good job of detecting the thing of interest… implies nothing.

      If p(Detect | Exists) ≈ 0 and p(Detect | NotExists) ≈ 0, then p(NotDetect) ≈ 1, and

      p(Exists) = p(Exists | NotDetect) p(NotDetect) + p(Exists | Detect) p(Detect) ≈ p(Exists | NotDetect) = p(NotDetect | Exists) p(Exists) / p(NotDetect) ≈ p(Exists).

      Non-detection doesn’t change your prior on existence.
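
      As a toy numeric check of that algebra (all probabilities invented for illustration):

      ```python
      # If the test almost never fires whether or not the effect exists,
      # non-detection barely moves the prior.
      p_exists = 0.5                 # prior (invented)
      p_detect_given_exists = 0.02   # near-powerless test
      p_detect_given_not = 0.01

      p_not_detect = ((1 - p_detect_given_exists) * p_exists
                      + (1 - p_detect_given_not) * (1 - p_exists))
      posterior = (1 - p_detect_given_exists) * p_exists / p_not_detect
      print(f"{posterior:.3f}")  # ~0.497: essentially the 0.5 prior
      ```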

      • Daniel – “Now if a coach/player could see these hidden variables, it would be in their interests to use this information. Can some coaches/players see these hidden variables and anticipate the hot hand?”

        It’s a nice point. It seems to me, though, that if players and coaches can detect a hot hand in progress during a game, then researchers should be able to do so after the fact. There’s no reason to limit the methods to just applying the binomial distribution, for example.

        OTOH, there’s no doubt that players and coaches *think* that they see or have a hot hand from time to time. gdanning related (below) that tale of watching Dave Kingman, and it at least appeared that the pitcher/catcher took his performance that day into account.

        So maybe the research question should be reframed as “When a team changes its tactics because of a perceived hot hand, does that improve its chance of winning?”

        • My understanding from the last time we discussed this, a week or so back, is that Joshua Miller’s paper shows that even with the noisy data of binomial sequences, using teammates’ bets on whether a player in a shooting contest will make his next shot improves the prediction in a detectable way. But I have to admit that I haven’t read the paper; I’m relying on blog comments about it.

          I really have no interest in the hot hand per se; what I really care about is the incorrect reasoning that you see all over the place, where some analysis looks for something, has no hope of finding it, doesn’t find it, and then claims proof that the thing doesn’t exist. This is pervasive, and it’s made worse by the error of saying “NHST failed to reject the null, therefore the null is true” or even “therefore we act as if the null is true”

          this is actually the (usually inadvertent) application of a VERY STRONG prior, p(Null) ≈ 1, which is rarely justified. Combine that with poor NHST-based decision making and you have a recipe for bad policy, bad science, bad medicine, etc.

        • Dear Tom & Daniel

          Can coaches & players “detect a hot hand in progress during a game”?

          We re-analyzed GVT’s Study 4 betting data and found that the estimated difference P(hit | bet hit) − P(hit | bet miss) = 0.077, i.e. shooters were 7.7 p.p. more likely to make a shot when a bettor bet on the shot. This is not negligible. Does this show that bettors are seeing hidden state variables? Maybe. It could also be that bettors bet heuristically, e.g. for streaks to continue, and this happens to do well on GVT’s data because streaks are more likely to continue in GVT’s data.

          Our data reveal that players know which of their teammates tend to get hot, which is useful in principle for detecting the hot hand in a game situation, but certainly not sufficient. If one wanted to test for the existence of a skill in detecting the hot hand in real time, one would need the cooperation of experienced coaches and players, because, as we know, if the skill exists it must use more than binary hit/miss data.

          “When a team changes its tactics because of a perceived hot hand, does that improve its chance of winning?”

          This surely depends. As a spectator, it seems like being more aggressive and physical with Curry when he appears to be flowing can be effective for the defense — disrupting the rhythm of a shooter is important. This may open up things for other players on the offense, of course.

          Offensive adjustments are less clear. When the defense plays you more aggressively, you should be more willing to pass, which players do, but you still need to shoot to keep the defense honest, even if the shot opportunities are not as good. This is not obviously a mistake. Players and coaches know that the offense can get carried away, but the underlying reason for this may not be purely driven by beliefs; for example, a shooter can selfishly exploit the heat check to get more shots, or to display his dominance.

        • Josh Miller: “We re-analyzed GVT’s Study 4 betting data and found that the estimated difference P(hit | bet hit) − P(hit | bet miss) = 0.077, i.e. shooters were 7.7 p.p. more likely to make a shot when a bettor bet on the shot. This is not negligible.”

          We can’t tell if it’s negligible or not until we know the uncertainty in the estimate …

        • Tom-

          Sorry, when I said 7.7 p.p. was “not negligible” I meant in the magnitude sense; I did not mean “not negligibly different” from zero (I took that for granted: p < .001, S.E. = 1.8 p.p.). For a reference, see pp. 24-25 of our paper.
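
          For anyone who wants to check the order of magnitude, the unpooled standard error for a difference in two proportions is sqrt(p1(1-p1)/n1 + p2(1-p2)/n2). A quick sketch (the counts are invented placeholders chosen to roughly reproduce the reported 7.7 p.p. and 1.8 p.p.; the real betting data are in the paper):

          ```python
          import math

          def diff_in_proportions(hits1, n1, hits2, n2):
              """Difference p1 - p2 and its unpooled standard error."""
              p1, p2 = hits1 / n1, hits2 / n2
              se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
              return p1 - p2, se

          # Invented counts, not GVT's actual betting data.
          d, se = diff_in_proportions(820, 1500, 705, 1500)
          print(f"difference = {d:.3f}, SE = {se:.3f}")  # ~0.077, ~0.018
          ```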

  4. A concrete example of both “baseball is different” and “the hot hand can be disguised by players’ response thereto”: When I was a lad, I saw Dave Kingman hit 3 homers in one game versus the Dodgers. When Kingman came to bat in the 9th with a chance to tie the record with 4 HRs in one game, the Dodgers’ Al Downing (a crafty veteran at that point if ever there was one) threw Kingman nothing but curveballs in the dirt, and he struck out. Had he not hit 3 HRs, Downing might well have pitched him differently, and Kingman might not have swung at bad pitches.

  5. Hi Guy

    Interesting thoughts, sorry if I don’t have time to write a complete response or avoid typos, as I am traveling.
    First, let’s be clear about where we were circa 2015 in terms of the consensus. We have a recent write-up which covers some of this. You have quotes in major psychology texts like this, and this consensus led to papers like this. The hot hand was a “cognitive illusion” because people (a) believed in the hot hand, and (b) it didn’t exist. There was no evidence that players over-react to the hot hand, or any mistake like that. If the original paper had been about players sometimes over-reacting to the hot hand, it would not have been such a surprising result. Players believe that themselves.
    Now on to your points
    First on the hot hand
    I don’t understand why you think “given the absence of observed momentum in games, one of two things must still be true: A) the hot hand effect is large but rare (your hypothesis), or B) the hot hand effect is small but perhaps frequent.”
    The main sources of attenuation in games are three-fold:
    1) heterogeneity of response – e.g. mixing shots from streak shooters with non-streak shooters
    2) measurement error – weakness of using *only* binary data as a signal
    3) strategic confounds – e.g. *costly* defensive adjustments.
    If you try really hard, and you have really nice game data, you can pick up a signal of the hot hand despite (1), (2), and (3). That is what Andrew Bocskocsky, John Ezekowitz, and Carolyn Stein were able to do. You cite their paper as if it is intended to be an effect size estimate, rather than a signal detection. Simulations are useful here. If you play with our code here, which eliminates (3), you will see that you can have both versions of the hot hand in games, but the effect size estimates using the “Complex heat” approach of Bocskocsky, Ezekowitz & Stein will say little on this. If you spike that code with within-player variation in shot location, it’s even worse. What you get with that code is *better* than any data you would get from games.
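
    For readers who cannot reach the linked code, here is a minimal sketch in the same spirit (not our linked code; the parameters are invented for illustration): a two-state Markov shooter with a genuinely large hot-state boost, no defense and no shot selection at all, measured with the usual streak-conditional statistic.

    ```python
    # Sketch of the masking effect from (1) and (2) alone (invented
    # parameters): a two-state Markov shooter with a 15-point hot-state
    # boost, evaluated by P(hit | 3 straight hits) - P(hit | 3 straight misses).
    import numpy as np

    rng = np.random.default_rng(7)

    def simulate(n=100_000, p_cold=0.45, p_hot=0.60, p_enter=0.05, p_exit=0.25):
        """Shooter who occasionally enters, and quickly leaves, a hot regime."""
        hot, shots = False, []
        for _ in range(n):
            hot = rng.random() < ((1 - p_exit) if hot else p_enter)
            shots.append(rng.random() < (p_hot if hot else p_cold))
        return np.array(shots, dtype=int)

    def streak_gap(shots, k=3):
        """P(hit | k straight hits) - P(hit | k straight misses)."""
        after_hits, after_misses = [], []
        for t in range(k, len(shots)):
            window = shots[t - k:t]
            if window.all():
                after_hits.append(shots[t])
            elif not window.any():
                after_misses.append(shots[t])
        return np.mean(after_hits) - np.mean(after_misses)

    shots = simulate()
    print(f"true hot-minus-cold gap: {0.60 - 0.45:.3f}")       # 0.150 by construction
    print(f"streak-based estimate:   {streak_gap(shots):.3f}")  # a small fraction of it
    ```

    Note the estimate is badly attenuated even before any strategic confound is added; spiking in shot-location variation or defensive response only makes it worse.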

    Now, should we really restrict ourselves to analyses of game data when informing our beliefs about the hot hand in game data? Aren’t we trying to be scientists here? Why can we not perform inductive inference? Gilovich, Vallone and Tversky certainly thought that we could, and considered that data their critical test. Notice we have: (1) a strong estimated effect of a prior streak on subsequent success in relatively controlled settings, and (2) some evidence regarding underlying mechanisms, e.g. variation in motor control (steadiness & coordination), which, btw, can be influenced by amphetamines (and likely by their endogenous analogues); self-efficacy; and flow. This should inform our beliefs.

    Beliefs
    Now if a coach/player could see these hidden variables, it would be in their interests to use this information. Can some coaches/players see these hidden variables and anticipate the hot hand?
    We don’t have definitive answers to this. Gilovich, Vallone and Tversky concluded that while players believe they can see when they or another player is more likely to make a shot, if you look at the data, they can’t. This is the second main result of GVT (the Cornell shooter prediction task in Study 4). This is also wrong, as we have shown in our paper (last year’s revision). Players can predict shot outcomes successfully at rates significantly better than chance. We have evidence in another paper that players know who has a tendency to get the hot hand, and this predicts performance out of sample. This does not mean players can detect the hot hand effect in real time, or respond proportionately, but at least it shows there is some basis in their beliefs.
    We have mentioned elsewhere that the low signal value of binary data means anyone who uses binary data *alone* as a signal of the hot hand—whether it be an econometrician, fan, player or coach—is making a mistake. This on its own shows that people’s intuitions about binary data are wrong, which is an important point in the spirit of the heuristics and biases program.

    The plain and simple fact is that the central results of Gilovich, Vallone and Tversky 1985 do not hold up. This doesn’t mean the spirit of what they were doing was wrong. GVT’s project shouldn’t have been about the mis-perception of random sequences, or about people’s inability to predict. Instead it should have been about how people over-interpret weak binary signals, how people are often unmoored from their priors while interpreting recent information, and how people over-react. It seems natural to put this in the general category of over-inference, using the nomenclature of Matthew Rabin. This is the important, generalizable, psychologically relevant point. Based on my personal experience it is almost certainly true that people often over-react to information (sometimes they under-react, e.g. discounting polls on Donald Trump). NBA players and coaches believe that people over-react as well; as mentioned above, this conclusion would not have been a surprise to them, and the study would not have made the same splash. This is not meant to minimize the importance of over-inference. A (hypothetically) well-done study with a clean result should not have to be surprising for it to be interesting and consequential for decision making more generally. In my view this is the road that should have been taken. (To be fair to Tversky and Gilovich, they were careful, at least initially, to limit their no-hot-hand result to basketball [they don’t do this in the original paper, but they do it in the 1989 Chance paper].) Further, basketball is not the ideal domain to study this, as over-inference has nothing to do with basketball per se. Instead, a good study would find a domain with less noise, and it would involve different measures and different tests.

    • Following up on what I didn’t get a chance to respond to, and to be a bit more clear: some of your points are definitely valid, while others do not appear to be supported by evidence or analysis.

      Across a wide range of sports, players’ outcomes are just slightly more “streaky” than we’d expect from random chance, implying that momentum is at most a weak effect (and even some of that streakiness is accounted for by player health, which is not a true “hot hand”).

      I don’t see why this follows. As the simple simulation code we have shows, the former does not imply the latter.

      And since the actual impact on games is quite small *even if* your hypothesis is correct (because true hotness is rare), it follows that belief in a strong hot hand by players or fans still represents a kind of cognitive failure. The hot hand fallacy held by most fans, at least in my experience, is not that a very few (and unknowable) players sometimes get very hot, but rather that nearly all athletes sometimes get hot, and we can see this from their performance on the field/court.

      (1) Impact is not a linear function of hotness. Going from zero to non-negligible can make a big difference at the margins, even if rare; (2) I am not sure what you mean by rare—in our simulation that I linked to above, there are big hot hands on 15% of shots for some players, and it doesn’t show up with standard analyses of the data. In principle, if these states could be identified even a fraction of the time they occur, it could have a large impact. This is not to say fans would be good at it, or even that most players would be good at it. But who the heck knows.

      (An important caveat: IF it proved possible to identify “true” hot hands in real time, or even to identify specific athletes who consistently exhibit true hot hand behavior, then my argument fails and the hot hand might have legitimate strategic implications. But I have not seen evidence that anyone knows how to do this in any sport.)

      I haven’t seen any unambiguous evidence of real-time detection, but as I mention in this comment, there is evidence that bettors can bet successfully at rates better than chance. Also, in that same link, I reference some evidence that players can identify specific athletes that tend to get the hot hand.

      The theoretical possibility that players’ shot selections and defensive responses could perfectly – and with astonishing consistency – mask the true hot hand effect is only a possibility. Before we dismiss a large body of inconvenient studies, I’d argue that hot hand believers need to demonstrate that these confounds regularly operate at the necessary scale, not just assume it.

      The confounds do not need to operate at a large scale to mask this in game data, as the simple simulation code we have shows. This code is easy to modify, and you can easily demonstrate to yourself that it doesn’t take much to completely mask the hot hand. There is nothing particularly innovative here.

      A sophisticated effort to control for shot selection and defensive attention to hot basketball shooters concludes that the remaining hot hand effect is quite modest. Conversely, as far as I know no one has shown empirically that the enhanced confidence of hot players and/or opponents’ defensive responses can account for the lack of observed momentum in a sport.

      The first part of this is addressed in this comment. The second part I don’t quite understand.

      I would also note that this “confounds” objection is in fact a strong argument *in favor* of the notion that the hot hand is a cognitive failure, given your argument that in-game streaks are a very poor marker of true hotness. If the latter is true, then it would still be a cognitive error for a player or his opponents to act on this usually-false indicator of enhanced short-term talent.

      First, in agreement: in models of the hot hand with a hidden parameter—e.g. the regime change models that we investigate (here and here) or the time-varying continuous parameter models that Andrew Gelman and Daniel Lakeland investigate—recent success is not that diagnostic of a player’s current state. This has not been appreciated by people who study the hot hand, and Daniel Stone was the first person to point this out, in the context of autocorrelation estimates. Researchers were making a cognitive error on this one, whether they be psychologists, economists, or sports analytics folks. If this is the true model of the hot hand, then any fan, coach, or player who uses shot outcomes *alone* is also making a mistake. This is an important point, but we don’t know what the true model is! Whether or not streaks are *always* a poor indicator of current success probability (holding everything else constant) depends on the model. In a positive feedback model, if you can control for shot difficulty *perfectly*, it’s not so bad. The reality is probably a mix of both, and combine that with (1) & (3), and you see the issue.
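
      As a toy illustration of how weakly diagnostic a short streak is under a regime-change model (all numbers invented):

      ```python
      # Invented regime-change numbers: what does a 3-hit streak say about
      # the hidden state, ignoring state switches within the streak?
      p_hot_prior = 0.10
      p_hit_hot, p_hit_cold = 0.65, 0.50

      lik_hot, lik_cold = p_hit_hot ** 3, p_hit_cold ** 3
      posterior = (lik_hot * p_hot_prior
                   / (lik_hot * p_hot_prior + lik_cold * (1 - p_hot_prior)))
      print(f"P(hot | 3 straight hits) = {posterior:.2f}")  # ~0.20
      ```

      Three straight hits only move a 10% prior to about 20%; most streaks still come from the ordinary state.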

      If players on a streak take more difficult shots, they are wrong to do so, and teams that change defensive alignments in response are also making a mistake.

      This is not obviously a mistake. When the defense plays you more aggressively, you should be more willing to pass, all else equal, but you still need to shoot to keep the defense honest, even if the shot opportunities are not as good.

      So, these are the reasons I remain unpersuaded that I should believe in a hot hand in the wild, or even consider it an open question.

      Per the comments above, this simply doesn’t follow. Of course it’s an open question! I really have a difficult time understanding your closed attitude. As you can verify for yourself with Andrew Gelman’s code (can’t find it), or Daniel Lakeland’s code, or our code, the question happens to be a hopeless one if you tie your hands and either: (1) refuse to inform your beliefs based on studies in which shots are taken without defense (these tasks are not always repetitive), or (2) don’t acquire more granular measures of shot outcomes, such as distance from the rim, or more granular measures of a player’s current state, such as expert observers who have experience with a player (or bio-measures).

      That leaves us, finally, with the experimental data that some feel should be privileged as evidence. I haven’t read enough of the experimental research to form any view on its quality or validity. But for answering the question of whether belief in the hot hand is a fallacy, I don’t see how the results of these experiments much matter. Fans and athletes believe they see the hot hand in real games. If a pitcher has retired the last nine batters he faced, many fans (and managers!) believe he is more likely than usual to get the next batter out. If a batter has 10 hits in his last 20 at bats, fans believe he is “on a tear” and more likely to be successful in his 21st at bat (and his manager is more likely to keep him in the lineup). But we know these beliefs are wrong.

      I agree, controlled and semi-controlled basketball shooting data can say little about the hot hand in baseball, and even less about the beliefs of baseball fans. The psychological point GVT were making was about basketball data, and they, and others, were careful to make that clear (here and here). They didn’t care about whether the hot hand existed in other sports; they found that it didn’t exist in basketball, yet people believed it did, which made the hot hand an illusion.
      For baseball, I haven’t studied the data, but I can easily imagine a hitter making an adjustment that takes a while for pitchers to pick up on. The streak that fans label the “hot hand” could have a perfectly prosaic explanation. But I digress. Anyway, I agree with your psychological point, and the poor diagnostic value of such small samples. But, in principle, a player hiding an injury, or exhibiting newly inconsistent mechanics (e.g. overthinking), will probably be doing worse, and be more likely to be pulled… so, by symmetry, someone on a tear must also be more likely to stay in. I doubt coaches are perfectly calibrated in this, but I bet some pay attention to these things better than others. The game can be aided greatly by an analysis of data, but it cannot be run based on an analysis of data—some important variables are not measured. Red Auerbach and Bobby Knight were correct to dismiss GVT’s findings.

      Even if experiments do demonstrate real momentum for some repetitive athletic tasks in controlled settings, this would not challenge either of my contentions: that the hot hand has a negligible impact on competitive sports outcomes, and fans’ belief in the hot hand (in real games) is a cognitive error. Personally, I find it easy to believe that humans may get into (and out of) a rhythm for some extremely repetitive tasks – like shooting a large number of 3-point baskets. Perhaps this kind of “muscle memory” momentum exists, and is revealed in controlled experiments. But it seems to me that those conducting such studies have ranged far from the original topic of a hot hand in competitive sports — indeed, I’m not sure it is even in sight.

      In conducting controlled studies, researchers have not “ranged far” from the original topic. GVT’s paper was the first paper on the topic; they had a controlled study, and they considered it to be their critical test of hot hand shooting. Importantly, GVT’s task was not a “repetitive” muscle memory momentum task, as you label it; in fact their task had players changing positions after each shot. In any event, if you need plausible mechanisms to consider the possibility that streak shooting exists in games, there are plenty you can cook up. The funny thing is, once you start thinking about it, the hot hand seems less magical and more prosaic. Here are a few: (1) a player makes a slight (unconscious) adjustment to his shooting mechanics in a game, which is easy to repeat that day because of muscle memory, but doesn’t carry over to the next game because the coach doesn’t point it out; (2) a player has certain preferred spots on the court where he has shot 1000s of times, but some nights the muscle memory for these shots just isn’t as available; (3) we know that muscle coordination and control can be moderated by drugs (e.g. Adderall), and there may be fluctuating endogenous analogues; (4) the player hits a few shots in a row, the crowd gets excited, and the player puts more attention, or effort, into getting the timing right; (5) there is something called the “winner effect” that is related to dominance behavior in animal contests. We can speculate forever on this. Who knows?!

      • sorry, many links don’t work because I copy-pasted from MS Word, which uses different characters for quote marks (curly “quote” instead of straight "quote").

        I tried copy-pasting a fix, but I think the spam filter caught it; my comments don’t even get to the “awaiting moderation” stage now.

  6. Baseball players talk a lot about when they’re “seeing the ball well”. Example: Manny Ramirez would warm up with a trainer who would throw him balls to different places, and Manny would, in his stance, have to grab them. On some days, Manny could identify and grab the ones low and outside, which is of course where a pitcher would concentrate. The team and Manny believed he hit better on those days – I don’t have any stats for this, but the point is that the body is a complex system, and of course the eyes and body don’t work exactly the same every day. The differences in our individual behavior average out, and groups average out further when people are aggregated, so the more data you look at, the higher your vantage point, the smaller the effect. Maybe it’s as simple (or as difficult) as concentration: we’re as unable to maintain high levels of focus as we are to stare at an object without our vision moving, or to think about one subject without our minds wandering.

    I know I have hot hand periods. When I’m playing piano, there are stretches when it’s absolutely effortless even though I’m improvising with complex melody, rhythm and with the exact tonality that’s in my head. It’s the same when you’re playing music that’s written down: sometimes it comes easy and sometimes it’s harder and you practice so the harder times sound the same even though in your head it may feel like a mess of work to get through it. I’ve never met a person who doesn’t think in melodies of thought, who hasn’t engaged in conversations which absolutely clicked, who hasn’t connected with something or someone in such a flowing manner that it makes the rest of the day seem dull or perhaps even laborious. Musically, for example, I can play at speed and my fingers go where they need to go with almost no lag time between the thought and the playing – so in jazz terms I’m right on top of it.

    Practice allows you to go to certain spots that your muscles can handle: just as practice creates certain areas on the floor where your body knows how to launch a good jump shot, so your mind and fingers go to the riffs you can finger and fit into a key or other tonality where you have options you can process in real time quickly enough that you don’t lose focus. In fact, I’d say a lot of hot hand stuff – and a problem with analyzing it – is that we can’t directly measure the practice that went into things like being able to hit a pitch in this region, with the skills of recognizing it, etc., or getting to this spot and being able to turn and rise to get off a clean shot. So what I often see as a “hot hand” is that someone has practiced and the other side is feeding that: by throwing those pitches, by letting you get to that spot (and teammates can help by setting you and the defense up so you can get to your spots). Ray Allen was a master of this, setting up just close enough to the corner that his defender had to play off it a bit, and then Ray could step there and his man wouldn’t be able to reach him if he took the pass and instantly rose to shoot. The original Isiah Thomas spent the night before the championship game practicing 3’s from a few specific spots at the top of the circle because he knew he could get those shots – and the other team let Zeke get to those spots, probably expecting him to keep the ball or pass. We measure results, but they’re imperfect renditions of the contexts that generated them. Like Joe Dumars’s “Jordan rules” for guarding Michael. Or Bill Belichick’s focus on taking away tendencies.

    I find an interesting relation to math in pauses while playing. Or in playing at a slow tempo. That is, I see the mass of choices playing out – often literally in my head – while I wait for them to come to or to resolve to the one that fits or which reaches across the time to connect that musical bit to this. It’s hard to see this when playing at speed of thought but it’s in there too – and it’s even in there when you lock to a key signature or specific tonality, sort of like the set theories which create a universal set through artifice. This range of musical choices extending off to the “side” in the space between music is a complex zeta function. (It’s obviously complex if you consider any branching.) This is one of the great joys of running plays, of playing music: that you count across time, that you start something over here with a bang of some sort – like a chord or you go to these positions or you arrange your pieces thusly on the board – and then the plan, the play comes together. It’s not action at a distance – which has a specific meaning and which orders events – but rather it’s connecting actions over time.

    To get back to the hot hand – you can tell I’m avoiding work by the length of this – we’d really have to evaluate how often a play worked. So when you feed player x the ball, that’s likely not because he’s “hot” on his own but rather because you’re able to run the plays for him that work given what the other team is doing. Like when Vinnie Johnson would come into games for the Pistons, they’d run specific plays meant to give him his shot. He never shot from all over the floor; like any sensible player he grooved a few specific locations so they were deeply built into his muscle memory. His teammates knew those spots. The other teams did too, but not as well, and his teammates would set the floor so he could more easily get to his spots – which typically were to the side of the key off the dribble. This happened enough times that people thought of him as the Microwave, but it was really a team effort. If the other team defended those spots, then this didn’t happen, which became a problem in extended playoff series. And it helps to remember that an NBA player should be quick enough to get off a shot in a very small amount of time. I mean specifically that the games adjust so they run at the margins of skills – with certain exceptions, like Wilt when dunking was allowed. So Ray Allen practiced given that the defender would need to step off him so he could take that step to the corner, or could run to it and turn, rise, and shoot, in the amount of time that fits into a fast-paced NBA game. Or as Wynton Marsalis said about playing really, really fast: you start playing slow and speed it up until you go as fast as you can.

    I sometimes think an issue with stats in general is that it focuses so much on the results that can be measured that the argument itself becomes about what can be measured. Thus the hideousness of ESP “studies”: can it be measured… so then you’re forced to argue about whether something can or can’t be measured, when you should bleeping start with a model that says how this could conceivably happen.

      • Bug if you “let the data speak” without thinking about how the data is generated and how your measures relate to the underlying parameters of the data generating process.

        Bug if you haven’t put any effort into finding new measures, and new data sources.

        Feature after that.

  7. I agree with what Molyneux says and thus implicitly disagree with what some of the other commenters are saying… except that to some extent I think there is some tendency to talk past each other because the same definition of ‘hot hand’ is not necessarily agreed on by everyone.

    Does a “hot hand” mean (1) the player is more likely to make her next shot? Or does it mean (2) she is more likely to make her next shot, CONDITIONAL on some aspects of game play (distance from the basket, closeness of defenders, time remaining on shot clock, etc.)? The original Gilovich paper used the former definition, if I recall correctly.

    But also, suppose (as is surely the case) there is slight day-to-day or week-to-week variation of player performance due to things like minor injuries, amount of sleep the night before, jet lag, etc., etc., so that the player’s “true ability” varies somewhat. The player really is slightly better in some games than in others. Sometimes favorable stochastic variability and higher true ability will align and the player will be especially ‘hot’, other times they will go the other way and be especially ‘cold.’ Is this what we mean when we talk about a ‘hot hand’?

    Gilovich made a simple statistical error that nobody (including me, or Andrew) noticed for decades, and that was repeated by many subsequent ‘hot hand’ researchers. Re-analysis of Gilovich’s own data reveals a very weak ‘hot hand’ effect, even by definition 1. Since players and coaches (and fans) believe in the hot hand, I’d be surprised if the ‘hot’ player isn’t more likely to take a difficult shot, and if opposing players don’t guard her more carefully, so I suspect there is a slightly stronger ‘hot hand’ effect by definition 2 than by definition 1. I know there has been lots of work on this effect, in different sports and settings, and although I have only read a tiny fraction of it I think the gist of it is: in many sports there is slight autocorrelation of successes — consistent with a weak ‘hot hand’ but of course that could be due entirely to factors discussed in the previous paragraph.

    One of the great things about Gilovich’s original paper is that it didn’t just look into whether or not a ‘hot hand’ effect was apparent in the shooting of a pro basketball team, Gilovich also looked at why people believe there is a large ‘hot hand’ effect. An answer, which will surprise no one, is that people take a series of consecutive successes or failures as strong evidence of autocorrelation. My recollection is that Gilovich presented people with several sequences of heads and tails (e.g. H T T T H T H…) and asked them to pick out the one that was produced by flipping coins. In addition to the coin-flippy one, there were some that were designed to be streaky (positive lag-1 autocorrelation) and some designed to be anti-streaky (negative lag-1 autocorrelation). Most people picked one of the anti-streaky ones as the uncorrelated ones. People expect a series of coin flips to look like H T T H H T H T H H T H, so when they see something like T T H T H H H T T T T H they think it’s “too streaky to be random.” Andrew used to (perhaps still does) exploit this fact in a classroom demonstration sometimes, asking students to try to make up a fake series of 30 or so coin flips, and to also generate one by actually flipping a coin, and then Andrew picks out which is which by simply (as I recall) counting the number of switches between H and T, and the length of the longest sequence of consecutive H or T.
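    For concreteness, here is a minimal R sketch of those two summary statistics (my reconstruction of the idea, not Andrew’s actual classroom code; the sequence length of 30 follows the description above):

    n_switches <- function(x) sum(x[-1] != x[-length(x)])
    longest_run <- function(x) max(rle(x)$lengths)

    # Null distribution under genuine fair-coin flipping:
    set.seed(1)
    sims <- replicate(1e4, {
      x <- sample(c("H", "T"), 30, replace = TRUE)
      c(switches = n_switches(x), longest = longest_run(x))
    })
    rowMeans(sims)  # about 14.5 switches and a longest run of 4-5, on average

    Made-up sequences typically have too many switches and too-short longest runs, which is what makes them easy to pick out.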

    Putting it all together, I think there is no question that people overestimate the strength of the ‘hot hand’ effect. Our cognitive bias pushes us that way. I also think it’s been shown that in at least several sports there is a hot hand of type 1, as I discussed above, but it’s quite weak. I think there’s slight evidence of a hot hand of type 2.

    In general I agree with Molyneux: the hot hand effect is probably small enough that it can be ignored in making most coaching decisions and playing decisions… maybe not all of them, but most of them.

    And now, a story. Last year I went to the deciding game between the Giants and the Cubs for the NLCS wildcard. A friend and I were watching the game while listening to the radio coverage. Gillaspie got hits in his first three at-bats. One of the coverage guys said ‘Gillaspie is really hot right now.’ I said to my friend “NOW he tells us. If he told us BEFORE Gillaspie got that hit, THAT would be informative.”

    So the next time Gillaspie comes up, the guy says “OK, watch out, Gillaspie is really hot!” And Gillaspie struck out. I said to my friend “dude, there’s no such thing as ‘hot’ in baseball.”

    No, I lie. That’s what I _would_ have said, if Gillaspie had failed to get a hit. But actually, Gillaspie got his fourth consecutive hit. Of course! He was hot!

    • Hi Phil

      I have some comments above that touch on this, and agree with what you say a few places.

      Below I’ll highlight four areas where the evidence doesn’t agree with what you say:

      You say: “The original Gilovich paper used the former definition, if I recall correctly.” Actually they discuss both positive feedback notions of hot hand, and regime change notions.

      You say: “Re-analysis of Gilovich’s own data reveals a very weak ‘hot hand’ effect, even by definition 1.” but the point estimates are large.

      You say: “Putting it all together, I think there is no question that people overestimate the strength of the ‘hot hand’ effect.” While I believe this is almost certainly true for some observers, the evidence for this does not come from GVT’s study, unless you take the rather limited evidence of people’s recollections of play-by-play data vs. real play-by-play data, which, again, has all the attenuation issues we have discussed. I am not sure what evidence you draw upon to have “no question” in your assessment.

      You discuss the auxiliary experiment that GVT reported in the discussion section of their original paper. This study wasn’t part of their main results (Studies 1-4), presumably because the ability of people to detect “chance” sequences had been tested in many previous studies—for example, the sweet-spot alternation rate for what people consider a “chance” sequence is .6 (rather than the expected .5). Keep in mind, though, there are some real limitations even with this experiment: (1) first, the obvious one: each of the 6 sequences that GVT presented to subjects is consistent with chance shooting, in that each sequence has the same probability of occurrence under an i.i.d. Bernoulli process with p = .5; (2) GVT used “alternation rate” as their objective measure of streakiness relative to chance, which, as you appropriately note, is a lag-1 measure. GVT did not consider how representative these sequences were of streaky and anti-streaky shooting when you condition on actual streaks. In that case the expected alternation rate for a random sequence is not .5 but higher, so an observed alternation rate of .5 is actually consistent with streakiness. The bias is even worse if people are thinking about hot hand shooting, because in that case they concentrate on streaks of hits, so you have to look at the alternation rate following hits. This all follows from the bias we found.
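      A quick simulation of that last point, for the simplest case of conditioning on a single hit (my own sketch, with an illustrative sequence length of 21; this is not GVT’s or our code):

      set.seed(1)
      alt_after_hit <- replicate(1e5, {
        x <- sample(0:1, 21, replace = TRUE)  # one "chance" shooting sequence
        prev <- x[-21]; nxt <- x[-1]
        if (sum(prev) == 0) return(NA)        # no hits to condition on
        mean(nxt[prev == 1] == 0)             # within-sequence alternation rate after a hit
      })
      mean(alt_after_hit, na.rm = TRUE)       # comes out above .5, not equal to it

      So judging displayed sequences against a .5 benchmark stacks the deck toward calling random sequences “streaky.”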

    • >>>except that to some extent I think there is some tendency to talk past each other because the same definition of ‘hot hand’ is not necessarily agreed on by everyone.<<<

      That’s the whole problem. If you take an ill-posed problem with fuzzy definitions and throw in a bunch of noisy big data, what you have is a recipe for endless talking past each other.

      • Rahul:

        it doesn’t have to be endless, unless those involved aren’t familiar with the details, or want to intentionally muddy the waters. There are some clear alternative (and complementary) definitions. We can answer what can be said, and what can’t be said, with those definitions.

  8. To quantify the difficulty of detecting a time varying signal in hit/miss data I put together a simulation study with Stan:

    models.street-artists.org/2017/04/10/under-stan-ding-the-hot-hand/

    Even with a relatively large, perfectly periodic sinusoidal signal and informative priors on the model parameters, you can’t detect this signal in fewer than a few hundred hits/misses.

    In real world games, there is no periodic signal and no strongly informative priors. But this doesn’t mean that the “hotness” is not there. You just don’t know how to measure it. See Jonathan’s comment above for several hundred words on the subject that I mostly agree with.
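    As a rough non-Stan illustration of the detection problem (my own sketch; the actual Stan model is at the link above), simulate a large, perfectly periodic swing in hit probability and look at how imprecise the recovered coefficients are even when the period and phase are handed to the model:

    set.seed(42)
    n <- 300
    t <- 1:n
    p <- plogis(sin(2 * pi * t / 50))  # hit probability swings between ~.27 and ~.73
    y <- rbinom(n, 1, p)
    fit <- glm(y ~ sin(2 * pi * t / 50) + cos(2 * pi * t / 50), family = binomial)
    summary(fit)$coefficients  # true amplitude is 1 on the logit scale; the
                               # standard errors are wide even with the period known

    With an unknown period and phase, as in real games, the problem is far harder still.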

    • It doesn’t mean the hotness IS there either!!! The only thing driving the idea of “hotness” is a prior on what it means to be human. We experience what seems like fluidity, ease of accomplishment, being “in the zone”. It’s entirely possible this is some kind of hallucination and doesn’t affect objective measures of performance… But as an objective measure of performance hit/miss data is so evidence-free that it just doesn’t count.

    • The reason behind all this is really interesting. I’m not talking human context but the reasoning underlying: when you push measurement out to the edges of what can be measured with those tools then you are examining an aspect of where points and waves become hard if not impossible to distinguish. Take a basketball player and put him in all these varying contexts of actions and we treat his shots within all these contexts as a wave and yet if we look closely in any number of ways like detecting a pattern related to 4 shots back or whatever then we start to treat the player like a point which happens to describe a wave. Or with a bit more abstraction put into the words, you take so many object defining statistics away that you’re not looking at an actual point but at a generality which acts more as a wave, so sometimes you see it and sometimes you don’t depending on which view you take. I think much of what AG’s blog does is discuss that fact: effects which disappear or appear depending on which path you take or how much power is in the n or which appears in a model that may or may not relate coherently to the space actually analyzed and the effects measured.

      That and the interest is the difference between stuff we all experience and stuff we never experience. So we all experience the sense of getting stuff really right but no one ever actually hears another person’s thoughts. We can share conclusions and feel like we’re really close together and can even finish each other’s sentences but all that says is we share so many attributes in the same contexts that the same words fit the occasion like we’ve practiced that play so much we have it wired. So on one side of the “impossible line” we investigate what we all experience and that hides in the measurement problem and on the other side we have people trying to show that what can’t happen actually does. Measurement problem versus model problem.

    • >>>In real world games, there is no periodic signal and no strongly informative priors. But this doesn’t mean that the “hotness” is not there. You just don’t know how to measure it. <<<

      I think that's an operationally irrelevant distinction.

      Suppose you suspect that your drinking water has fugutoxin in it. I get your water analyzed on the best instrument in the world, and it shows, repeatably, that fugutoxin is absent down to my instrument's limit of detection of 1 part per quadrillion (and say the LD50 is 100 ppm).

      Now you could still tell me: no, technically that still does not mean the toxin is not there. We just don't know how to measure it.

      Fair enough. As an exercise in pure logic you are indeed right. But pragmatically, I don't think that is a sensible tack to take.

      • Rahul:

        This is not an exercise in pure logic.

        The key to making your fugutoxin analogy fit, is to specify why you suspect its presence.

        Is it (1) because you just watched the most recent episode of “Wild Ways to Die”? or (2) because a physician experienced with fugutoxin poisoning concluded that you have the symptoms and etiology consistent with exposure, and his somewhat reliable lab test says you have it?

        If it is (2), then if the current state-of-the-art measures aren’t precise enough, I am still going to update my beliefs based on the information I have.

        I still don’t think this more specific analogy fits, but it’s closer.

      • This is more like you measure fugutoxin to 1 part in 2, so if your water was 50% toxin you could detect it. It comes up negative and you assert that the water is totally safe, even though 1 part per million would kill you in 3 minutes.

  9. It’s a cognitive sport rather than a physical one, but might chess provide some more data-rich tests of the hot hand hypothesis? Competitive chessplayers tend to believe in “hotness” and among players I know it is quite common to withdraw from a tournament or take a bye if you feel you are “cold.”

    Recently there has been interest in judging individual chess moves by how much worse they are than a computer’s best choice. The computer is treated as a gold standard here, which is not entirely correct but probably close enough, especially for players below the master level. An average tournament game might have 30 moves of which about 20 are not routine, so a five-round tournament would provide about 100 moves worth of data per player.

    Unfortunately there is an ugly confound. It is known to be harder to find good moves in a bad position than in a good one. As a result your scores will tend to be closer to the computer’s when you have a superior position, or when your opponent is weak. So successive moves in a given game are quite non-independent. Furthermore, if you adopt a foolhardy plan it is easy to make five or six sub-optimal moves in a row.

    (There are also some positions where it is next to impossible to make a bad move; I had a game with 29 computer-perfect moves in a row, but on examination that was because literally every legal move had the same evaluation! The game was so badly blocked that nothing whatsoever could change the outcome. These would need to be filtered out somehow.)

    If we use only a few moves per game we’ll be back in the basketball/baseball problem of inadequate data. If we use all of them, we’ll see a false “hot hand” that has to do with whether the player’s position in that game was easy to play or not.

    It’s a surprisingly difficult thing to test, for a phenomenon so widely and commonly perceived! (When a 13-year-old girl of my acquaintance beat three masters in a row, everyone treated this as an undeniable “hot hand”, especially when she could not repeat the feat next tournament.)

    Hm. Another thought about chess: all serious tournament players have statistical ratings in one or more systems, so there is an objective measure of the probability that a player will beat a given opponent. So there is more chance that we could know the difficulty of the task in a given chess game than in a given basketball game. If you wanted to use win/lose/draw data in this way, though, you would need to limit yourself to round-robin tournaments where the roster of opponents is fixed; in the common Swiss-system tournaments, success in early games yields more difficult opponents in later games, and I bet this muddies the statistics badly.
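    For reference, that objective measure is essentially the standard Elo expected score (a sketch; real rating systems add wrinkles such as K-factors and draw handling):

    elo_expected <- function(r_player, r_opponent) {
      1 / (1 + 10^((r_opponent - r_player) / 400))
    }
    elo_expected(1700, 1500)  # ~0.76: a 200-point favorite scores about 76%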

    • Funny that you mention this, Mary: I have just experienced one of my occasional cold streaks in chess! I am certainly a believer in the hot hand theory when it comes to my own chess play!

      I am a rather poor player. When I started playing online “blitz” games, a little over two years ago, I was rated about 1400. Over the next couple of years I played over 10,000 games, and my rating verrrry graduallllly increased until I was solidly mid-1500s. But superimposed on the trend there would be these enormous excursions in which I would lose, say, 30 out of 40 games… and of course, as my rating declined, I would be pitted against worse opponents, on average. If you look at a plot of my rating as a function of time, there are occasional sharp valleys in which I drop 150 or 200 points in two or three days, then gradually climb back up over the next six or seven days. At times I am very aware of at least some of the cognitive mechanisms at work here: my game gets out of balance in the sense that I move too slowly and end up losing on time or in severe time pressure, or I move too quickly and make blunders. Once I become aware of the problem, I sometimes overcorrect. I find it very hard to preserve that balance.

      But as for your larger point about the “objective probability” of beating a given opponent, I think we could actually do better than this: have the computer evaluate each position, and treat its evaluation as the ‘true’ value of the position. Computers are by far the strongest chess players, so I think we can take their evaluations as close to truth. With that kind of data, you can look at things like the average quality of each move, compared to the best move that could have been played. You are right, though, that there is a lot of autocorrelation because some positions are easier to play than others… but perhaps it is possible to adjust for that, at least partly, by finding a metric for it (such as the difference between the best and second-best or n-th best move in the position). Indeed, this basic approach has been used to compare grandmasters from different eras.

      Anyway, I agree with your observation that we can quantify the difficulty of a given chess situation more accurately than in basketball or baseball or whatever other active sport. (Hmm, I wonder…what about tennis? Speed and location of each shot, position of each player when the shot was hit, and a few other parameters could be determined from video and might be enough to fairly accurately quantify difficulty).

  10. What about the mirror image of the hot hand? I know an inveterate gambler who believes in the “I’m due” phenomenon. His strategy when losing money is to increase the size of wagers since he is certain to win based on the laws of chance. He believes that one should double the bet after losing on a coin flip.
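    (That doubling scheme is the classic “martingale.” A toy R simulation of my own, with a made-up bankroll and bet size, shows it has no expected edge on a fair coin; the many small wins are paid for by occasional ruin.)

    set.seed(7)
    martingale <- function(bankroll = 100, base_bet = 1, n_flips = 500) {
      bet <- base_bet
      for (i in 1:n_flips) {
        bet <- min(bet, bankroll)              # can't wager more than you have
        win <- runif(1) < 0.5                  # fair coin
        bankroll <- bankroll + if (win) bet else -bet
        bet <- if (win) base_bet else 2 * bet  # double after every loss
        if (bankroll <= 0) return(0)           # ruined
      }
      bankroll
    }
    out <- replicate(5000, martingale())
    mean(out)       # about 100: no expected gain
    mean(out == 0)  # but a non-trivial share of runs end in total ruin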

  11. oncodoc: That is known as the “gambler’s fallacy.”

    The same phenomena occur in demography studies. Some people believe that, after they’ve had, say, 5 girl babies in a row, they are due a boy. Other people believe that, after they’ve had 5 girl babies in a row, they are just girl producers.

  12. A lot of good comments here – far more than I can reply to. Looking at the big picture, I think there is actually more agreement than disagreement. I made three main claims:
    1) “the hot hand likely has a negligible impact on game outcomes,”
    2) “teams and athletes should largely ignore the hot hand in making strategic decisions,”
    3) “people’s belief in the hot hand is reasonably considered a kind of cognitive error.”

    Working from the bottom, I don’t think there is really a significant dispute about #3. Fans, athletes, and coaches all perceive a hot hand effect that is much larger and/or more frequent (likely both) than can possibly be true. Josh suggests this is evidence that “people over-interpret weak binary signals” rather than “mis-perception of random sequences.” Perhaps that is an important distinction in the field, I don’t know — I come to this discussion as a sports fan, not an economist. My claim is simply that people’s intuition about the hot hand is frequently wrong, and wrong by a lot, not a little. As best I can tell, we all agree on that.

    On #2, it seems we all agree that there is limited evidence of an ability to correctly identify a hot hand in real time with enough accuracy to usefully inform decisions on the court or field. However, Josh suggests the Gilovich data demonstrates a modest but non-trivial ability by observers to predict a shooter’s next shot. I can’t find a working link for Josh’s paper at the moment, but I wonder whether he has accounted for the variance in the true shooting ability of Gilovich’s subjects. If not, betting aggressively after made shots (as the observers did, presumably because they believe in a hot hand) will cause the bettors to make more accurate predictions than random guesses. In any case, if there is non-experimental evidence that sports bettors can discern “truly hot” players, I would count that as very serious evidence for Josh’s claims (but I don’t know of any).
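    That worry is easy to see in a toy simulation (entirely made-up numbers, not GVT’s data or Josh’s analysis): give shooters different fixed abilities, no hot hand at all, and a bettor who bets “high” exactly when the last shot went in; pooled across shooters, bets and outcomes correlate anyway.

    set.seed(99)
    skills <- runif(20, 0.3, 0.7)  # 20 shooters with different fixed abilities
    bets <- outcomes <- c()
    for (p in skills) {
      x <- rbinom(100, 1, p)       # 100 i.i.d. shots: no hot hand by construction
      bets <- c(bets, x[-100])     # "high" bet iff the previous shot was made
      outcomes <- c(outcomes, x[-1])
    }
    cor(bets, outcomes)            # positive, driven purely by skill differences
    # (Computing the correlation within each shooter and then averaging makes
    # this artifact disappear.)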

    On issue #1 – “the hot hand likely has a negligible impact on game outcomes” – I will concede what several commenters have argued here: the in-game evidence cannot rule out a non-trivial impact. Even if only a fraction of shooters on a streak are truly “hot,” such players could theoretically affect win/loss outcomes. I see little evidence that this is true, but it also seems impossible to disprove the hypothesis. Which is why this debate can continue indefinitely, I suppose.

    But unless and until someone can identify the truly ‘hot’ players, I think fans, players, and coaches are all better off making decisions on the assumption that players have a constant short-term true talent (constantly updating our estimates of that talent with new information), and suppressing our intuition to respond to streaks. It’s analogous to investing: maybe there are a couple of money managers out there who can regularly beat the market, but since we can’t know who they are, you are much better off as an investor believing in a random walk and buying index funds.

    Finally, I guess I do disagree somewhat with Josh’s argument that experimental data establishes a strong prior in favor of the hot hand. To me, the two activities are not similar enough to make that leap. Josh notes that Gilovich et al. used experimental data, but it seems to me there is a certain asymmetry in the implications of experimental data. If one cannot detect a substantial hot hand *even in the laboratory*, then it’s very unlikely it exists in the “real world” of competitive games. This was the argument of Gilovich et al. However, the reverse argument doesn’t follow: researchers’ ability to create a hot hand effect in laboratory conditions is not, to me, a convincing reason to assume we will discover one of comparable size in real games.

    • You seem to have summarized your beliefs and assigned them to the larger group. There is substantial disagreement about all of the issues you mention in terms of both conceptualization as well as the operationalization.

    • Guy

      there definitely seems to be some agreement, but perhaps not as much convergence as we would probably hope for. On your three points:

      1) “the hot hand likely has a negligible impact on game outcomes,”

      We agreed that this is not obvious, so it seems like there has been some convergence. Also agreed that it’s a bit of a hopeless question with current measurements.

      2) “teams and athletes should largely ignore the hot hand in making strategic decisions,”

      We would be more narrow on #2. If teams and athletes are using binary hit/miss data exclusively (or primarily) to inform their decisions, they are making a mistake. I am not sure how much weight they put on hit/miss data, and how much weight they put on other contextual factors that they can pick up because they are experienced.

      I would partially disagree with the investment analogy, and this is actually quite an important point. I agree that ignorant investors are probably better off believing there is no skill, but not all investors. In theory, skilled managers should capture all the returns on their skill in equilibrium, accumulating funds under management until the marginal return from an additional investment equals index returns. By analogy back to basketball: if a relatively ignorant fan were to coach a basketball team, he would be better off not believing in the hot hand—but for a knowledgeable and experienced coach/player, that is not so obvious. Just as in the stock market, there are measurement issues (and hidden information).

      btw: Here is the link to our re-analysis of GVT’s prediction data, pp. 24-25. A 7.7 percentage-point difference isn’t modest, and it cannot be accounted for by the variance in true shooting ability. But, as I noted above, this doesn’t guarantee real-time detection, as a simple betting heuristic would also be successful.

      3) “people’s belief in the hot hand is reasonably considered a kind of cognitive error.”

      We should be careful to acknowledge that believing in the hot hand is not a cognitive error and believing that the hot hand may sometimes be large is not a cognitive error.

      You say: “I don’t think there is really a significant dispute about #3. Fans, athletes, and coaches all perceive a hot hand effect that is much larger and/or more frequent (likely both) than can possibly be true”

      I don’t think there is strong formal *evidence* that this is the case for players and coaches, for all the reasons I described in the comments above. I would guess, just based on my priors from behavioral research, that more players over-react to the hot hand than under-react. I would also imagine some react appropriately. Again, if you want to tie yourself to the evidence we have: who knows. For fans, sure, the surveys of their recollections of patterns in play-by-play data (or what they believe is in there) do not come close to matching reality. But they are having fun, and they aren’t making decisions. What kind of other evidence were you thinking of?

      • 1. I don’t see how one can logically conclude that there is little impact on game outcomes if it cannot be measured accurately.

        2. Teams and coaches can certainly ignore it, but that is rather irrelevant to the question itself. The real question is whether gamblers and sports betting companies should ignore it. I am not ready to concede this point at all. It only takes a small change in probabilities and a big bank roll to take advantage of it.

        3. I again do not see how one comes to the conclusion that the perceived effect is an error if one cannot accurately measure the effect in actual games.

        • Curious: On #1, I have acknowledged that I cannot prove there is little impact — that is my judgment, based only on the countless failed attempts over the years to document a meaningful hot hand effect in game conditions. If the hypothetical hot hand effect is rare but large, then it appears to be impossible to disprove. You seem to consider the non-falsifiable nature of the theory a problem for my argument; I see it as more of a problem for the hot hand believers. Hence our impasse.

          On #2, please share your betting strategy once you develop it. I’d be happy to take the other side of those bets….

        • Guy said:

          …that is my judgment, based only on the countless failed attempts over the years to document a meaningful hot hand effect in game conditions.

          Honestly, Guy, you don’t strike me as someone who has an agenda, or anything to gain by muddying the waters for political reasons to try to revive the hot hand fallacy (you aren’t an academic). So I will continue to assume the most charitable version of your motives, which means I have to conclude that you either haven’t yet read or haven’t yet processed all the comments that I (and others) made to you. Frankly, if you are still making this judgment, you haven’t yet done the hard work of benchmarking these game data analyses (e.g., with the code provided). Thinking hard and writing words isn’t gonna cut it here. Just because dozens of older game data studies didn’t find anything doesn’t mean anything; you have to look at the quality of those studies. As Jordan Ellenberg nicely put it in “How Not to Be Wrong”:

          If you look at Mars with a research-grade telescope, you’ll see moons; if you look with binoculars, you won’t. (Jordan Ellenberg)

          That analogy fits.

      • Josh, in your re-evaluation of the Gilovich bettor data, I’m not understanding how you controlled for the players’ respective shooting abilities when measuring the correlations. Indeed, in footnote 63 you note one hypothetical scenario in which bettors bet more aggressively in response to past made shots, and this yields a correlation nearly as high as what you report. I’m sure I’m just missing something, but can you clarify that?

        • Guy:

          thanks for reading our paper.

          I have to assume that if we are down to talking about footnote #63, that my work is done here. :-)

          As mentioned in the text, it’s an average of coefficients, just as the correlation is. More details in footnote 60.

          btw: I think it would be worthwhile for you to play with the code that I mentioned above and on the previous post that you are responding to here (our code here). You don’t seem to have acknowledged that the study which you cite to support your belief in the email to Andrew above—“A sophisticated effort to control for shot selection and defensive attention”—is designed to pick up a signal, and it cannot be treated as an effect-size estimate. If you simulate with that code, and perhaps spike it with more variation, you will see that you can have a massive hot hand in games and yet, with that approach, detect estimates that are a fraction of a percentage point. It is simply not informative.

        • Well, I read the two pages you directed me to, so your work may not yet be complete (though I did read an earlier working paper version of this paper some time ago). But I still don’t see how you are controlling for the talent of the shooter. For example, if the bettors placed more aggressive bets on the male shooters than the female shooters, then you will find a correlation between bets and outcomes. This might reflect sexism, or simply betting progressively more aggressively as the shooter demonstrates skill, rather than a hotness-detection skill.

        • Now you’ve doubled my reading assignment! Let’s make this simple. Can you tell us the average expected success rate after a “high” bet (weighted to reflect the respective success rates of the shooters) and also the actual average success rate for these shots? Presumably the latter percentage is 7-8 points higher than the former. What are those figures?

        • Guy:

          replying here because the comment thread won’t indent further.

          In order to evaluate the evidence it is important to read about how the experiment was designed, and who bet on what, so please check that.

          If people bet randomly, there should be no difference, because $latex E[\hat{\beta}]=E[\hat{p}(\text{hit}\mid\text{bet hit})-\hat{p}(\text{hit}\mid\text{bet miss})]=0$ for any given player under random betting. As mentioned in the text, the stat is an average across bettors $i$: $latex (1/n)\sum_{i=1}^n \hat{\beta}_i$.

        • After reading this thread, pp. 24-25, and footnote 63, I’m still a bit unclear on one point. As I understand it, the bettors i =1..n bet on two shooters each, him/herself, and one other shooter. So, the correlation of high bets with success rates is a combination of two things: (a) possibly recognizing a hot hand based on recent success and other unobserved factors, and (b) possibly recognizing a good shooter based on success rates.

          Is that correct?

        • Phil:

          sorry, I stopped reading the comments after I (regrettably) lost patience with Guy, and we got away from the content.

          to answer your questions:
          1. Yes, the bettor bets on self and one other.
          2. The bettor takes either a high stakes bet or a low stakes bet — wins money for a hit, loses money for a miss.
          3. The positive average correlation between bets and outcomes occurs for both bets on own shots, and bets on other shots, separately and jointly.

          For your two possibilities, we can rule out (b): for the correlation test we average the correlation across bettors and generate the null sampling distribution for this stat by stratifying the randomization at the sequence level. Another way to look at this: being able to recognize a good shooter won’t help any individual bettor do well in correlating bets with outcomes. The same story works for our conditional probability test, which is more intuitive to interpret because the measure isn’t unit-less the way a correlation is.

          I would add a third possibility, (c), which I mention above: instead of “recognizing a hot hand based on recent success and other unobserved factors,” the bettors bet heuristically, based on previous outcomes, and they do well not because of any skill in recognition, but because their heuristics tend to work when there is positive serial correlation in the data.
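          In rough outline, the stratified test works like this (a bare R sketch under simplifying assumptions, not our actual code):

          # bets_list, shots_list: for each bettor, a 0/1 vector of bets (1 = high)
          # aligned with the 0/1 outcomes of the shots he or she bet on
          avg_stat <- function(bets_list, shots_list) {
            mean(mapply(cor, bets_list, shots_list))  # average per-bettor correlation
          }
          null_dist <- function(bets_list, shots_list, n_perm = 2000) {
            replicate(n_perm,
              avg_stat(lapply(bets_list, sample), shots_list))  # shuffle within sequence
          }
          # p-value: share of permuted averages at least as large as the observed one

          Shuffling each bettor’s bets only within his or her own sequence is what removes possibility (b): a bettor’s overall tendency to bet high on a good shooter is preserved under the null.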

        • OK, so the high correlation (.07) exists even when the sequence is one bettor predicting one shooter. What I was getting at is, if you put two shooters in one sequence, and one shoots .3 and one shoots .7, and the bettor notices and puts more “high” bets on the .7 shooter than on the .3 shooter, there will be a correlation based on proficiency, independent of any hot hand effect.

          But if the .07 occurs on a single bettor/shooter sequence, there’s no proficiency effect. It’s all being able to distinguish differing levels of success among attempts from a single shooter.

          And .07 is a large effect! Would it be fair to say that if bettors can produce .07 intuitively, then a more specific algorithm could produce even higher? It sounds like, maybe, the shooters might vary by as many as 10 percentage points when hot compared to cold.

  13. Guy:

    oh, and on the laboratory conditions… yes, there is some *degree* of asymmetry, but keep in mind there are 4 controlled or semi-controlled studies that we were able to obtain data for: Jagacinski et al. (1979), GVT (1985), the NBA Three-Point Contest (1986-2017), and our Spain data (2013)… heck, even free throws (Arkes, 2011). The shooting tasks all vary; some have more repetitive “muscle memory”/calibration issues than others, but many have different aspects of real game situations (moving around, crowds, incentives, etc.). They cannot all be attributed to simple “muscle memory,” and yet they all have the same message. I think there is room to disagree on how informative this data is, but I don’t really understand why someone would ignore this data completely; it doesn’t strike me as very scientific, especially when your relatively non-informative prior is that there is little to no hot hand.

  14. If you reinforce a pigeon’s key-pecking on a variable-interval (VI) schedule (a response results in food delivery after a period of time has elapsed since the last reinforced response, and the time period changes from reinforcer to reinforcer), you obtain a complex, multi-modal distribution of inter-response times (IRTs). Now, if you make a nice 3-d plot of the distribution over consecutive experimental sessions, you get a very interesting picture. The distributions tend to maintain their general shape, but different peaks wax and wane. And “wax and wane” was chosen carefully – the modes don’t jump around randomly (loosely defined). A given mode may shrink gradually over days while another one grows. Modes representing shorter IRTs in general may become more prevalent slowly across days while longer IRTs shrink a bit, only to have the state of affairs slowly return to approximate the previous configuration, from which it flows away again. Now… what has this to do with the HH?

    Well…the pigeons are “streaky.”

    Their behavior changes across time in orderly (the peaks change progressively… they “head in a direction”) but not-easily-described ways* (there are irregular oscillations in measures, often over different time scales). Now, you could measure other aspects of behavior, like the force of pecks and the angle at which the beak strikes the key, etc. Pigeons on VI schedules, remarkably, tend to miss the key quite frequently (while on variable-ratio schedules they almost never miss). So they even have “hits” and “misses.” The point is, if it isn’t already clear, that all the measures of behavior I just described change in complex (but apparently non-random) ways over time. And remember that this takes place while there is an attempt to hold everything constant (but, of course, this is the real world we are talking about… even if it is that small part called a laboratory). And the description I gave would be similar if it was rats or monkeys pressing levers, mice nose-poking, humans pulling plungers in a big operant chamber, or humans shooting baskets in a stadium. It is the norm, these orderly-but-complex oscillations… and they are largely due to the feedback loops inherent in a complex organism-environment interaction (and, yes, even a pigeon and the environment of an operant chamber form a breathtakingly complex dynamic system). So… many aspects (often rather subtle) of a player’s shooting will be the continually varying elements that determine whether the ball goes in or just barely misses. All of these various properties of behavior alluded to WILL undergo complex aperiodic oscillations, and they will do so even if the variables other than the dynamically-interacting system variables are held constant.

    That is, something like the hot-hand MUST exist in the sense that all of the properties of behavior that determine whether the ball goes in or not WILL vary in time simply under the impetus of the dynamically-interacting system variables. The measurement of some properties will, every now and then, be “extreme” – streaky.

    The operant behavior of pigeons is streaky. So, too, is that of rats, monkeys, and people – and that includes athletes.

    Now…as to the verbal behavior of coaches, players and spectators when they “report hotness” – that is a whole other kettle of fish. Gotta be simple stuff, though, hey? All we are trying to do is offer a functional account of an individual’s utterances in a highly complex environment. Probably no big deal!

    *One can do Fourier transforms, power spectra, autocorrelation etc. of course. This hasn’t been done too much though because it wasn’t of sufficient interest and some of the oscillations are on the order of months – and who knows…maybe longer. You have to be real interested in a phenomenon to maintain a pigeon’s key-pecking on a particular VI schedule parameter for two or three years so you can look at data in the frequency domain.
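    To make the frequency-domain idea concrete, here is a toy R sketch (entirely simulated numbers, no pigeon data): a slowly drifting latent response rate produces exactly the long, orderly autocorrelation described above.

    set.seed(3)
    latent <- as.numeric(arima.sim(list(ar = 0.99), n = 2000))  # slow, orderly drift
    irt <- rexp(2000, rate = exp(latent / 4))                   # inter-response times
    acf(log(irt), lag.max = 200)    # long, slowly decaying autocorrelation
    spectrum(log(irt), spans = 15)  # power piled up at the low frequencies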

    • This is insightful. Non-humans displaying a hot hand is a strong indicator for its existence. Might be a good way to get an idea of its possible magnitude as well.

  15. From the original post: “Across a wide range of sports, players’ outcomes are just slightly more “streaky” than we’d expect from random chance, implying that momentum is at most a weak effect.”

    I’ve never followed that line of reasoning. Outcomes could look exactly like a “chance” distribution, but that says nothing whatsoever about whether “hot hands” occasionally exist (no one ever said that “hot hand” means that all players have an increased chance of making a second shot after making the first), and it says nothing whatsoever about the actual causal factors at play. By analogy, the distribution of human height might look exactly like a chance distribution, but that in no way allows us to say, “Genetics plays no role in determining whether someone is 7 feet tall, because 7-footers are about as common as you’d expect by chance.” No.

    Watch this record-setting performance by Klay Thompson in an NBA game from 2 years ago. http://www.youtube.com/watch?v=Sc3m3BwfylA Is there a statistical way of “proving” that Klay Thompson wasn’t experiencing a “hot hand” in those moments?

  16. I haven’t seen this addressed. If you’re interested in the hot hand per se, and not only in pro sports, the situation may look very different. Pro players are obviously a heavily selected bunch, and the qualities selected for may almost completely destroy the possibility of detecting hot hands. These players are playing much closer to the top of their game much more of the time than average players like you and me. This “ceiling effect” leaves much less variability in which to detect hot hand effects. Yes, average players, you and me, boring, I know. But if it’s about the detection of a potentially important psychological effect, why not do studies in inexperienced players?

    • This would predict that centers and power forwards, players who aren’t as skilled at shooting the basketball, would be the ones thought to get the hot hand. No one believes that. Maybe you are thinking about cold hands? The challenge to studying this isn’t professional vs. amateur; it’s the messiness of games. You aren’t going to find it in high school statistics either. I think people have studied the 3-point shootout with NBA players.

      • “The challenge to studying this isn’t professional/amateur it’s the messiness of games.”

        GS: When it comes to, it seems to me, everybody who has looked at this issue, and most folks around here, the challenge is to do the necessary conceptual work in order to carry out any scientifically-coherent studies. Ya’ll don’t seem to know what you want to investigate. What is the domain into which you put the phenomenon or phenomena that you want to investigate? “Hot-handedness”? Can’t say that with a straight face? OK…what IS the domain into which you put the phenomenon or phenomena you want to investigate?

      • “No one really believes that” doesn’t really matter. Anyway, these players are still pro players and I was suggesting looking at an entirely different population. This also connects with what Glen says below. If the phenomenon is supposed to be a general human psychological effect that could manifest in all sorts of contexts, then restricting attention on pro sports is unreasonable.

  17. Like the original poster here, I am also getting to this topic very late. It bothers me that the finding by Miller and Sanjurjo often seems to be explained in such a complicated and unclear way, especially in media outlets that have picked up on it. My understanding is that it’s just like Simpson’s paradox, in that it is just a matter of averaging among people with different sample sizes. Everyone knows about the problems that can come about when you average among people with different sample sizes, and I think that’s exactly the problem here. A person with HHHHHHHH has a large sample of times when his toss was preceded by HHH. A person with HHHTHTHT has only one time when a toss was preceded by HHH. But if you ignore sample size and just average everyone’s score as if they all had the same sample size, then you get a biased estimate of the overall average. Isn’t that clearer?

    I saw that in an earlier post, Andrew included code he used to check that the average was indeed less than 50%, and he got 41%. But if you modify your code to take a weighted average, weighting by sample size, then you will get extremely close to 50%.

    rep <- 1e6
    n <- 4
    data <- array(sample(c(0,1), rep*n, replace=TRUE), c(rep,n))
    prob <- rep(NA, rep)
    freq <- rep(0, rep)
    for (i in 1:rep){
      heads1 <- data[i,1:(n-1)]==1
      heads2 <- data[i,2:n]==1
      if (sum(heads1) > 0) prob[i] <- sum(heads1 & heads2)/sum(heads1)
      freq[i] <- sum(heads1)
    }
    mean(prob, na.rm=TRUE) ## simple, unweighted average (about .41)
    sum(prob*freq, na.rm=TRUE)/sum(freq) ## weighted average (about .50)

    Try it. I get 0.5000766. To me this resolves the paradox more clearly. Yours,
    Rick

    • That certainly should be true of the example given in Table 1 of that paper… the 5/12 comes from (1+1+1/2)/6, but the actual estimate should be (1+2*1/2+2*1)/8 = 1/2. Seems like a pretty egregious methodological error in the field, actually, if that is the way they had been computing the frequencies.

      • Rick:

        The weighted-average thing indeed returns the correct answer if the hot hand is zero, but it can’t easily be adapted to give an estimate of the hot hand if it exists. This is a point that Miller and Sanjurjo discuss. So: yes, the weighting idea can help people understand the difficulty, but it doesn’t provide an immediate method for correcting the bias if the true effect is nonzero.

        Josh:

        Calling it an “egregious error” is pretty strong. It’s a subtle mistake that nobody noticed for almost 30 years after the famous Gilovich et al. paper came out. Let’s just accept that probability calculations can be tricky!

        • Really? Off the top of my head I could think of several ways to estimate a hot hand effect without taking simple averages over people with different numbers of streaks.

          For example, and again this is after thinking about this for like 30 seconds so I apologize if this is stupid, if you’re interested in the effect as a function of n, where n is the number of consecutive shots the player has just made, then you could take each player i, find their percentage of made shots right after having just made n shots (frac_i), record the number of such attempts (freq_i), and that player’s overall fraction of made shots in general (general_i). Then just take the freq-weighted average over all m players, i.e.,
          ∑_i freq_i (frac_i – general_i) / ∑_i freq_i .

          This would give you an estimate of the overall hot hand effect. If you want to allow the hot hand effect to vary from player to player, you could just take frac_i – general_i as your estimate for each player. If you don’t want to assume the hot hand is additive, then you could adjust things accordingly. I don’t really see the difficulty.
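          If it helps, here is that recipe spelled out in R (a sketch only: the simulated data are made up, the streak length is written k to avoid clashing with the sequence length, and this ignores the selection subtleties discussed elsewhere in this thread):

          shots_after_streak <- function(x, k = 3) {
            t <- (k + 1):length(x)
            t <- t[sapply(t, function(j) all(x[(j - k):(j - 1)] == 1))]
            x[t]  # outcomes immediately after k straight hits
          }

          set.seed(10)
          players <- replicate(50, rbinom(200, 1, runif(1, .4, .6)), simplify = FALSE)
          num <- den <- 0
          for (x in players) {
            a <- shots_after_streak(x)  # freq_i = length(a), frac_i = mean(a)
            if (length(a) > 0) {
              num <- num + length(a) * (mean(a) - mean(x))  # freq_i * (frac_i - general_i)
              den <- den + length(a)
            }
          }
          num / den  # near zero here, since the simulated shots are i.i.d.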

        • Hi Rick:

          In GVT 1985, a Cornell player takes 100 shots along an arc, from a distance at which GVT estimate he or she will hit half of the shots. There is no general_i.

          On page 19, footnote 34 of the 11/15/2016 version of our paper we explain an alternative approach, which is in the spirit of what you suggest.

          Btw: yeah, the bias can be related to Simpson’s paradox, but it’s different. It’s closer to Berkson’s paradox and Monty Hall; here is a general-interest explanation.
