A couple of thoughts regarding the hot hand fallacy fallacy

For many years we all believed the hot hand was a fallacy. It turns out we were all wrong. Fine. Such reversals happen.

Anyway, now that we know the score, we can reflect on some of the cognitive biases that led us to stick with the “hot hand fallacy” story for so long.

Jason Collins writes:

Apart from the fact that this statistical bias slipped past everyone’s attention for close to thirty years, I [Collins] find this result extraordinarily interesting for another reason. We have a body of research that suggests that even slight cues in the environment can change our actions. Words associated with old people can slow us down. Images of money can make us selfish. And so on. Yet why haven’t these same researchers been asking why a basketball player would not be influenced by their earlier shots – surely a more salient part of the environment than the word “Florida”? The desire to show one bias allowed them to overlook another.

Also I was thinking a bit more about the hot hand, in particular a flaw in the underlying logic of Gilovich etc (and also me, before Miller and Sanjurjo convinced me about the hot hand): The null model is that each player j has a probability p_j of making a given shot, and that p_j is constant for the player (considering only shots of some particular difficulty level). But where does p_j come from? Obviously players improve with practice, with game experience, with coaching, etc. So p_j isn’t really a constant. But if “p” varies among players, and “p” varies over the time scale of years or months for individual players, why shouldn’t “p” vary over shorter time scales too? In what sense is “constant probability” a sensible null model at all?

I can see that “constant probability for any given player during a one-year period” is a better model than “p varies wildly from 0.2 to 0.8 for any player during the game.” But that’s a different story. The more I think about the “there is no hot hand” model, the more I don’t like it as any sort of default.

In any case, it’s good to revisit our thinking about these theories in light of new arguments and new evidence.

1. Peter says:

Hasn’t the “body of research that suggests that even slight cues in the environment can change our actions” repeatedly failed to replicate? I thought at this point a lot of these types of things were mostly accepted as likely false (at least in the more skeptical side of the research community). I definitely recall that at least the “words associated with old people can slow us down” experiment has been widely discredited.

2. Tom Passin says:

Suppose that the probability p_j did vary over the course of a game. Could we ever possibly get enough statistics to provide strong support for that idea? Maybe by averaging variability over a lot of games and players… But then we get into inter-player variability vs intra-player variability. This sounds hard.

It seems like a good place for simulations to learn better ways to study the data.
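A minimal sketch of the kind of simulation suggested here, under purely illustrative assumptions: a hidden hot/cold state that can switch between shots within a game, hit probabilities of 0.6 (hot) and 0.4 (cold), and a 0.9 chance of keeping the same state from one shot to the next. None of these numbers or names come from data; they are hypothetical and only show how one might check what a "hit rate after a hit vs. after a miss" comparison recovers when p really does vary within a game.

```python
import random


def simulate_conditional_rates(n_games=20000, shots_per_game=20,
                               p_hot=0.6, p_cold=0.4, stay=0.9, seed=0):
    """Simulate shots driven by a hidden hot/cold state that can switch mid-game.

    All parameter values are illustrative assumptions, not estimates.
    Returns (hit rate after a hit, hit rate after a miss), pooled over games.
    """
    rng = random.Random(seed)
    hits_after_hit = shots_after_hit = 0
    hits_after_miss = shots_after_miss = 0
    for _ in range(n_games):
        hot = rng.random() < 0.5      # each game starts hot or cold at random
        prev = None                   # no previous shot at the start of a game
        for _ in range(shots_per_game):
            hit = rng.random() < (p_hot if hot else p_cold)
            if prev is True:
                shots_after_hit += 1
                hits_after_hit += hit
            elif prev is False:
                shots_after_miss += 1
                hits_after_miss += hit
            prev = hit
            if rng.random() > stay:   # state flips with probability 1 - stay
                hot = not hot
    return hits_after_hit / shots_after_hit, hits_after_miss / shots_after_miss


after_hit, after_miss = simulate_conditional_rates()
print(f"hit rate after a hit:  {after_hit:.3f}")
print(f"hit rate after a miss: {after_miss:.3f}")
print(f"observed gap: {after_hit - after_miss:.3f} (true hot/cold gap: 0.200)")
```

Under these assumptions the observed gap comes out at a few percentage points even though the true hot/cold gap is 20 points, which suggests why very large samples would be needed to pin down within-game variation from hit/miss sequences alone.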

3. Kevin Dick says:

I think it’s doubly ironic that Collins cites, “Words associated with old people can slow us down.” as evidence suggesting that hot hand might be real. A questionable finding could suggest that another finding is questionable, but not in the way that I believe Collins intends here.

• Andrew says:

Kevin:

Collins is writing conditionally. He’s not saying that he believes that words associated with old people can slow us down. He’s saying that, if you believe that words associated with old people can slow us down, that you should not think that hot hand effects are so ridiculous. His point is that the same people who were completely ready to believe that words associated with old people can slow us down, were also dismissive of the hot hand phenomenon.

4. Roger says:

My guess is that people stuck with the hot hand fallacy story for so long because they could pretend it was a lesson on elementary probability. If a sportscaster said that a player was in a hot streak, the statistician could smugly reply that the sportscaster does not understand that getting heads on a coin toss 5 straight times says nothing about the next toss.

• someone says:

I think this may be right.

Similarly, I’ve noticed in a completely different context, people with no technical expertise telling others, “You’ve missed the point,” when they are the ones missing the real point.

5. Phil says:

I’m not so interested in the question of why some group of researchers who believe so-and-so weren’t more skeptical about such-and-such.

One of the things about the original Hot Hand paper was that it had such a clear story based on something that really is easily demonstrated to be true: people see patterns in random numbers. As I recall, Tversky et al. didn’t just look at the stats of the Philadelphia 76ers, they also had people try to pick out which of several strings of ‘hits’ and ‘misses’ was generated by coin flips, and they found that most people didn’t pick the right answer because it had too many longish strings of consecutive hits and misses. Four hits in a row and people think “nah, that’s not ‘random’”. This effect is very well known, and I know that Gelman and Nolan use it as the basis for a fun exercise in their book of stats demonstrations. Anyway, armed with the knowledge that people see patterns in strings of uncorrelated hits and misses, it’s easy to see that you can’t trust people’s _perception_ of a hot hand. You have to do the stats.
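To make the "four hits in a row" intuition concrete, here is a small sketch that computes exactly how often a string of fair coin flips contains a run of at least k hits. The function name and the sequence lengths printed are illustrative choices, not anything from the original paper.

```python
from fractions import Fraction


def prob_run_at_least(n, k):
    """Exact probability that n fair coin flips contain a run of >= k hits."""
    # no_run[i] counts the length-i sequences with no run of k hits.
    no_run = [0] * (n + 1)
    for i in range(n + 1):
        if i < k:
            no_run[i] = 2 ** i  # too short to contain a run of k hits
        else:
            # A run-free sequence starts with j hits (0 <= j < k), then a
            # miss, then any run-free sequence of the remaining length.
            no_run[i] = sum(no_run[i - j - 1] for j in range(k))
    return 1 - Fraction(no_run[n], 2 ** n)


for n in (10, 20, 50):
    print(n, float(prob_run_at_least(n, 4)))
```

Runs of four become quite likely as the string gets longer, which is the point of the exercise: strings that look "too streaky" to be random are often exactly what randomness produces.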

I never believed there was no hot hand effect at all, just because no real-world effect like this can really be exactly zero. I did think the hot hand effect could be very small, and could be either positive or negative. I also remember noting that in my own sports experience I was sometimes very convinced that I was ‘hot’ (or ‘cold’), even though I knew about the tendency to think that way even when faced by random successes and failures.

Anyway, it did seem plausible that the hot hand effect could be very small. And the stats seemingly showed that if it existed it had to be very small. To me, at least, there was no reason to believe that was wrong. Indeed, isn’t it still thought to be small, just not _very_ small?

• Keith O’Rourke says:

> I’m not so interested in the question of why some group of researchers who believe so-and-so weren’t more skeptical about such-and-such.
Well sometimes that will be us and so I would be interested in knowing what to look for to suspect that.

• Andrew says:

Phil:

As you wrote, “the stats seemingly showed that if it existed it had to be very small.” The key is the word “seemingly.” Actually, the conclusion that any hot hand had to be small is a mistake, because of bias and attenuation. That is the key point of the papers by Miller, Sanjurjo, and others. The data (including the data in the original hot hand paper of Gilovich et al.) are consistent with large hot hand effects.
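One piece of the bias can be checked by brute force. The sketch below assumes the setup as I understand it from Miller and Sanjurjo's argument: for each finite hit/miss sequence, compute the proportion of hits among shots that immediately follow a hit, then average that proportion across all equally likely sequences (dropping sequences with no such shots). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_prop_hit_after_hit(n):
    """Average within-sequence proportion of hits among shots following a hit.

    Enumerates all 2**n equally likely hit/miss sequences of a 50% shooter
    and drops sequences in which no shot follows a hit.
    """
    props = []
    for seq in product([0, 1], repeat=n):
        followers = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if followers:
            props.append(Fraction(sum(followers), len(followers)))
    return sum(props) / len(props)


for n in (3, 4, 10):
    print(n, float(expected_prop_hit_after_hit(n)))
```

For a 50% shooter with no hot hand at all, the average comes out below 0.5 for short sequences (5/12 for three shots), so finding "no difference" after hits and misses in such data is actually consistent with a positive hot hand effect.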

• Phil says:

I used the word ‘seemingly’ to indicate that I know the statistics didn’t actually point to a ‘hot hand’ effect very close to zero, but only seemed to.

I understand that the method that was used would make even a large hot hand effect appear small. But of course, a small hot hand effect would also appear small! You just have to use a different method, which people have now done.

But the new looks at the ‘hot hand’ that I remember seeing were consistent with an effect that I would consider small, e.g. a free throw percentage 3 to 5 points higher following a hit than following a miss. …or, to put it another way, if you know nothing about the previous shot you’d assign the player a percentage equal to his season average, but if you knew whether he hit or missed his previous free throw you’d change your estimate by 3 percentage points. I’d call that a small effect, or at least a smallish effect, but some might argue that it’s big, I dunno.

Or, perhaps more likely, I am either misremembering the result that I saw, or what I saw did find a small effect but people have looked at other datasets and found bigger ones.

Here https://www.youtube.com/watch?v=bPZFQ6i759g is a pretty good Numberphile video (though many readers of this blog will find that it moves slowly in the first half), featuring UC Berkeley stats professor Lisa Goldberg, saying she and her co-authors looked at a bunch of basketball games played by the Golden State Warriors and found no evidence for a hot hand, using a method that does not have a bias towards ‘no hot hand.’

• Hi Phil

The second half of the Numberphile video that you refer to has a serious oversight that is well-known to the literature. If you look at their references they don’t cite the relevant literature. Their analysis cannot say anything one way or another because they fail to control for the quality of the defense, or the offensive shot selection. Even the original Gilovich, Vallone, and Tversky paper addressed this issue with their follow-up studies.

Regarding effect size, in every dataset that does not have defense, e.g. Three Point Contest, controlled shooting (but not FT foul shots), the effect size is on the order of 5pp-13pp, which is substantial.

• Phil says:

Joshua,
For uncontested 3 point shots I’d call a 5 percentage point difference a modest effect, but 10 or greater is certainly large.

But controlling for the quality of defense, and for offensive shot selection… If I typically shoot 50%, but I get hot so I take riskier shots such that my percentage stays at 50%, but I’m now shooting 50% on harder shots, would we really want to call that the ‘hot hand’? It is, I think, not what most fans mean when they say someone is hot. This gets into the philosophy of what it means to be ‘hot’, an issue that was raised in the original paper. I suppose there are different definitions of ‘hot hand’ and people can argue about which one is right.

• Hi Phil

There are two ways of addressing the issue of 5pp being a modest effect.

1. The 5 percentage point effect is an attenuated estimate due to measurement error, so it is consistent with a much higher percentage point effect (Terry makes this point below).

2. Even if we assume that they are the same, 5 percentage points is not obviously “modest.” We need to define what modest means in terms of impact. First, 5pp is around half the difference between the median and best three point shooter. This is an *average* effect, which is a bit absurd if you think about it because you are including players that don’t shoot that often in the category. Of course, this doesn’t measure impact. To measure impact we would need to write down a model of basketball and calculate how many wins you gain over the course of the season if the hot hand is of that size (on average!), and you try to create open looks when it makes sense. I haven’t done the calculation, but my intuition is that the impact would be large.

You also make the point of the defense neutralizing the hot hand *in the data* by making shots more difficult. That’s fine, as long as Curry still has the hot hand in reality (probability), the defensive attention on him can open up opportunities for other players.

• Terry says:

Good points.

Tversky’s logic is quite simplistic and weak: the data looks random using a simplistic analysis, so you are wrong to see patterns in it.

But you need to carefully lay out a model with real numbers to see whether the effect is important. My guess is that a small hot hands effect (virtually undetectable by Tversky’s analysis), skillfully acted on, might make a difference of a couple points a game, which would substantially improve a team’s performance over a season.

Further, coaches and players aren’t looking only at the hit rate of a player to identify a hot hand. General playing skill gives other clues, and the player’s perceptions of hotness can play a role too.

The more you think about it, the more Tversky’s argument depends on the clumsiness of his analysis. He can’t find it in his very crude statistical analysis, so it can’t possibly be true.

• Guy says:

“My guess is that a small hot hands effect….skillfully acted on, might make a difference of a couple points a game, which would substantially improve a team’s performance over a season.”

This is a very poor guess. Two points a game would indeed be a huge gain for an NBA team (approximately 6 wins per season). But there is no evidence I’m aware of that any team has — or could — find a way to take advantage of a HH effect to anything like this degree. The best attempt to measure the HH effect while correcting for difficulty of shot and defensive pressure found a 2% boost in shooting efficiency — not enough to be actionable (this doesn’t account for the attenuation problem, but in terms of practical effect that is immaterial).

And that’s why smart sports decision-makers will continue to treat the hot hand as a fallacy — because practically speaking, it is. We don’t need to explain this via any “cognitive bias,” because discarding the hot hand is absolutely the correct strategic approach in the vast majority of situations — certainly far more correct than the public’s intuitive belief in hot hand effects (which are vastly larger than the reality). That is, a model that says each player will have a constant talent level throughout today’s game — regardless of short-term changes in performance — is an *excellent* model for sports decision-makers to use, and far better than what many have used (even though we know it’s not quite correct). Maybe that will change at some point, when someone discovers a way to identify a HH effect in real time that is large enough to impact decision-making. But so far, no one has done that.

• Andrew says:

Guy:

Regarding the “2% boost in shooting efficiency” thing . . . That’s what I used to think too, before I thought about the attenuation bias; see for example the last paragraph of Terry’s comment here. You say the attenuation factor is immaterial but it’s a factor of 5 (in that simple calculation), which is a big deal. The point is that players and coaches have more information available than just whether the last shot was a hit or a miss.

Regarding your last paragraph: I agree that the model of constant probability is not a bad default, and indeed I think it’s the model that just about everyone uses as a starting point. The hot hand is on top of that default model. It would be ridiculous to play the hot hand instead of using long-term frequencies; the point of using the hot hand is to gain an incremental advantage.

• Guy says:

Andrew: I understand that the attenuation issue means the HH effect could theoretically be large, but in practice this is immaterial unless/until there is some way (other than recent streakiness) to identify the “true” hot hand. You say players have other information, but AFAIK there is virtually no evidence that such other information predicts true HHs — despite the fact that coaches and players have every incentive to find it, and an intuitive inclination to believe the effect is real.

As for incremental advantage over the constant P model, yes I agree that’s the potential next step. It may be possible, in some situations in some sports, to improve slightly on that model. But I think more weight should be given to the fact no one has yet found a way to do this — which is strong evidence that it’s hard to do, and not likely to lead to a model that’s significantly better. I’m also not sure folks here — perhaps because I approach this from the perspective of sports analytics rather than economics/academia — fully appreciate what a huge advance the constant P model has been over a crude belief in a hot hand effect. Indeed, even today there are cases where sports decision-makers give far too much weight to short-term performance and would be better-served by sticking to a constant-P model (and I’m not aware of any situations where the reverse is true).

So I do worry when someone of your standing announces a “reversal” or says that the HH fallacy is itself a “fallacy.” Yes, the specific claim that P is always constant has been knocked down. It’s wrong. But the common understanding of the “hot hand” is also wrong. Gilovich’s poll found players believed a 50% shooter had a 61% chance of success after making a basket, and just 42% after a miss. This is wildly, fantastically wrong — much more wrong than the claim of a constant P. So while the hot hand effect may be “real,” the Hot Hand Myth also remains a myth. Both can be true….

• Andrew says:

Guy:

What makes you say, “no one has yet found a way to do this”? Isn’t it possible that basketball players and coaches make use of the hot hand all the time in useful ways?

• Guy says:

I suppose it’s possible that coaches successfully exploit hot hand effects. This would be a very hard thing to study, but perhaps not impossible. But I’ve never heard of a team using a decision rule built on the HH idea. For example, an NBA team might say “player X is generally our 3rd choice to take shots (in a lineup of 5 players), but when he has made 4 of his last 5 shots he moves up to 2nd choice.” Or an NFL coach might plan to pass about 65% of the time, but increase that to 75% any time his QB completes 4 of 5 passes. Or a baseball manager might decide “I will hit player Z 7th in the lineup, but will move him up to 5th any time his 10-game trailing batting average exceeds .400.” I have not heard of any sports decision-maker using such a strategy successfully, nor seen evidence that one has — perhaps others have? A successful coach would have incentive to keep this quiet, I suppose, but over many years and many sports — and with so many players and coaches switching teams — I think we would likely hear about this.

Of course coaches could have other, non-streak methods of identifying hot players. It would be hard to disprove that….

• Kyle C says:

Guy: Honestly — you’ve never heard a player say, “So and so was on fire, so we kept feeding him the ball”? Do you follow basketball?

• Guy says:

I have no doubt that teammates feed the ball to hot players, and likely with the approval of coaches at least some of the time (the less talented coaches, for the most part). But note that I said “successfully” exploit the HH. Where is the evidence that this happens? I think it could be studied, for example by looking at periods of time where a player’s usage rate increases substantially above his norm in the wake of a streak of successful shooting. Then compare his performance to that of the players who lost shots in that time interval. But it will be hard to control for all confounds, such as whether the player is getting more shots because he is matched with a weak defender. My guess is that a good study will find there’s little or no payoff to feeding the ball to ostensibly “hot” players (in part because of the attenuation issue Andrew frequently cites). But if someone wants to prove otherwise, I’d love to see the study.

In the meantime, we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions). So on a scale of 0 (the HH is a “fallacy”) to 100 (the perceived HH is “real”), where are we now? I’d say the evidence suggests maybe 6 or 7. It’s likely there, but tiny compared to the perception. Andrew calls this a “reversal” on the order of the South switching allegiance from the Democrats to the Republicans (honestly, the Cavs losing LeBron would have been a more apt analogy!). I’d say it’s more analogous to the record Houston set last night for 3-point baskets made in a game (26, beating the old record of 25).

• Kyle C says:

Guy, you wrote: “I’ve never heard of a team using a decision rule built on the HH idea.” But OK, new points taken.

• Andrew says:

Guy:

1. You ask, “where’s the evidence that this happens?” I’d flip this around and say, Why are you so sure that players and coaches are systematically doing the wrong thing? They have lots of experience and also motivation to win games.

2. You write, “we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions).” That’s not true. Miller and Sanjurjo and others analyzed data from real games.

3. You write that it is “the less talented coaches” who would use the hot hand. Why would you say that? Red Auerbach and Phil Jackson are both on record as saying they think the hot hand is real.

• Interesting convo between Guy, Andrew and Terry, saw this a bit late.

I agree with many of the responses to Guy’s points.

Guy writes (here):

The best attempt to measure the HH effect while correcting for difficulty of shot and defensive pressure found a 2% boost in shooting efficiency — not enough to be actionable (this doesn’t account for the attenuation problem, but in terms of practical effect that is immaterial).

I am not sure what the criterion for “best” is, but if you look at studies that completely control for difficulty of shot and defensive pressure (e.g. 3-point contest, shooting studies), the boosts are much larger than that. Regarding correcting for difficulty in game situations, the statistical controls are a strict subset of what you would want to control for. The missing controls relate to quality of defense (e.g. defender identity!), which one would expect to make the shot more difficult.

There is no way you can rule out the practical effect of the attenuation issue. The entire point is that it is researcher measurement error, not necessarily practitioner measurement error.

Guy writes:

And that’s why smart sports decision-makers will continue to treat the hot hand as a fallacy — because practically speaking, it is.

Phil Jackson is a sports decision-maker; is he not smart? (I see Andrew makes this point as well.)

Guy writes:

We don’t need to explain this via any “cognitive bias,” because discarding the hot hand is absolutely the correct strategic approach in the vast majority of situations — certainly far more correct than the public’s intuitive belief in hot hand effects (which are vastly larger than the reality).

Whether or not people overestimate depends on how you ask the question, and whom you ask. As you acknowledge above, there is attenuation. This implies that many intuitive notions of the hot hand are not testable with data.

Guy writes:

That is, a model that says each player will have a constant talent level throughout today’s game — regardless of short-term changes in performance — is an *excellent* model for sports decision-makers to use, and far better than what many have used (even though we know it’s not quite correct).

As written, this is a silly statement. I am sure it wasn’t intended to be taken literally. At the very least we all agree that there are situations in which sports decision-makers can ignore what the model says and pull out players that are visibly tired, emotional, etc (of course there may be models of the effect of minutes played that are better than intuition!). Beliefs regarding the hot hand are tricky to evaluate.

Guy writes, in response to Andrew (here):

You say players have other information, but AFAIK there is virtually no evidence that such other information predicts true HHs — despite the fact that coaches and players have every incentive to find it, and an intuitive inclination to believe the effect is real.

This is a very difficult thing to measure and test. Lack of formal evidence is not so informative when measurement is so difficult. The question is what to believe in the interim? I wouldn’t be so quick to dismiss practitioners entirely. I see further below that you appear to have clarified what you mean: “Of course coaches could have other, non-streak methods of identifying hot players. It would be hard to disprove that….”

Guy writes:

So I do worry when someone of your standing announces a “reversal” or says that the HH fallacy is itself a “fallacy.” Yes, the specific claim that P is always constant has been knocked down. It’s wrong.

Actually what was knocked down is much more than that. There are results on effect size and predictability, see this paper and the references therein.

Guy writes:

But the common understanding of the “hot hand” is also wrong. Gilovich’s poll found players believed a 50% shooter had a 61% chance of success after making a basket, and just 42% after a miss. This is wildly, fantastically wrong — much more wrong than the claim of a constant P. So while the hot hand effect may be “real,” the Hot Hand Myth also remains a myth. Both can be true….

Does Gilovich, Vallone and Tversky (1985)’s poll indicate that players are fantastically wrong? It is important to consider the nature of this evidence. There were five survey questions:

1. Most of the players (six out of eight) reported that they have on occasion felt that after having made a few shots in a row they “know” they are going to make their next shot-that they “almost can’t miss.”
2. Five players believed that a player “has a better chance of making a shot after having just made his last two or three shots than he does after having just missed his last two or three shots.” (Two players did not endorse this statement and one did not answer this question.)
3. Seven of the eight players reported that after having made a series of shots in a row, they “tend to take more shots than they normally would.”
4. All of the players believed that it is important “for the players on a team to pass the ball to someone who has just made several (two, three, or four) shots in a row.”
5. Five players and the coach also made numerical estimates. Five of these six respondents estimated their field goal percentage for shots taken after a hit (mean: 62.5%) to be higher than their percentage for shots taken after a miss (mean: 49.5%).

Nothing fantastically wrong about #1-#4, though we may want to inquire further on #4 to see what they mean by important. If we take #5 seriously, that is obviously wrong. But how seriously should we take this? Sampling error aside (n=6!), they were asking professional athletes for their numerical estimates of a conditional shooting percentage (not to mention the fact that this was in an era in which play-by-play data didn’t even exist). Do we really believe that these estimates relate at all to operational beliefs? Rao (2009) has a nice and on-point discussion of why inferring player beliefs from these kinds of questionnaires is problematic.

Guy writes (here):

I have no doubt that teammates feed the ball to hot players, and likely with the approval of coaches at least some of the time (the less talented coaches, for the most part)

“less talented coaches”? This sounds like a strong and authoritative statement. Is it based on evidence, or experience? Experience I presume. What is your experience? Can you name names? It would be an interesting test. Get a measure of coach talent correlate it with belief in hot hand. Do it with collegiate and NBA coaches. Collecting the data may be tricky.

Guy writes:

In the meantime, we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions). So on a scale of 0 (the HH is a “fallacy”) to 100 (the perceived HH is “real”), where are we now? I’d say the evidence suggests maybe 6 or 7. It’s likely there, but tiny compared to the perception.

I am not sure what evidence this quote is based upon: “It’s likely there, but tiny compared to the perception,” so I will not address it directly, though below it is referred to.

By “meaningless”, I presume what is meant is “not a game.” Clearly shooters in the 3 point contest care about their performance, and they are engaging in the same act used in the game, so in that sense it is meaningful. While I cannot speak for Guy, it is interesting how the goalposts for evidence have shifted for many folks that insist on maintaining the fallacy view. Over the past decades Gilovich, Vallone and Tversky (1985)’s study was viewed as having overwhelming evidence that the hot hand was a myth because it had 3 different shooting studies, including a controlled study to address the problems with game data. This controlled study was analogous to the NBA’s 3 point shooting contest. Almost 20 years after the original study Koehler & Conley (2003) collected data from the three point shooting contest and replicated GVT’s results. Thaler wrote: “Without any defense or alternative shots, this [3 point contest] would seem to be an ideal situation in which to observe the hot hand.” Researchers clearly knew that they could not use game data to conclude that the hot hand is small, or that it is a myth. The fallacy view was built on these foundations.

There are no longer any foundations to justify the original conclusion, and the consensus that emerged. The evidence from these studies has been invalidated, as one can see by reading this paper and the references therein. Moreover, if you play by the same rules as previous studies, using the same data, there is significant evidence of a substantial hot hand effect. Not tiny by any metric.

Now, I am aware that Guy views this entire literature as irrelevant with regards to the main hot hand question (our work included). In his view the tests that they contain are asymmetric. In particular, he has stated elsewhere something along the lines of “if you don’t find shooting performance X in these tests, then X does not occur in games, but if you do find shooting performance X in these tests, that means nothing regarding whether X occurs in games.” This statement seems pretty strong. One can debate the external validity of any specific study, to be sure. But, if X represents simple shooting performance, we know this statement must be wrong. I believe it goes without saying that shooting better in practice does predict shooting better in games (on average!). Conversely, we know that players who shoot better in games also shoot better in the three point contest. Why should it be any different if the type of shooting performance we are referring to is the hot hand? This debate about external validity could go on, and probably isn’t worth it, as there is no way to resolve it.

The question that remains: what should we do now that the foundations of our beliefs have been stripped away? I would argue that our beliefs should be what our beliefs would have been had Gilovich, Vallone and Tversky (1985)’s study never existed. What would we believe now if we were to look at the evidence with fresh eyes, without the “hot hand fallacy” idea in our head? It is hard to say, but I’d bet we’d look at the evidence and combine it with our priors (“hot hand” for just about anyone normal). We’d immediately realize that game data can’t address the question because of the well-known issues (control, attenuation from several sources). We might look for evidence in studies that eliminate shot selection and defensive pressure. I find it difficult to imagine that with the natural prior and this evidence that we would conclude the hot hand is a myth, and that players and coaches are in error. That would be very silly. In fact, we might feel more justified with our prior beliefs. On the other hand, we may recognize that in general (i.e. outside of basketball), people tend to confuse performance with competence, and often over-react to recent information. We may recognize that these tendencies are a recipe for error, which should inform our beliefs, and make us more humble.

• Terry says:

But the new looks at the ‘hot hand’ that I remember seeing were consistent with an effect that I would consider small, e.g. a free throw percentage 3 to 5 points higher following a hit than following a miss

Whether 3-5% is small or not depends on what you think is happening.

If you think that all players flip from hot to cold hands every few shots, then yes, these are small numbers.

But, if you think that a small fraction of players (say 10%) are having a hot hand at any given time, then those numbers are pretty large.

Also, if you think a hot hand lasts all day, then these numbers are also very large. This is because players make a lot of free throws even when cold, so the probability of making a free throw given you made the last basket is a mixture of the hot and cold probabilities. (To put this another way, the fact that you made the last free throw is a weak indicator of whether you are hot or cold at that moment.)

To see this, say a player is equally likely to be hot or cold on a given day, and makes free throws 60% of the time when hot and 40% when cold. Then, the probability of making a free throw given you made the last free throw is .60(.60) + .40(.40) = .52, while the probability given you missed the last free throw is .40(.60) + .60(.40) = .48. Thus, while players are much better when hot, the probability of making the next free throw given you made the last free throw is little changed because making the last free throw is a weak indicator of whether you are hot.
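Terry's arithmetic can be checked with a few lines of exact computation. The sketch below assumes his setup (a player is hot or cold for the whole day with equal probability, 60% when hot, 40% when cold); the function name and its arguments are made up here for illustration.

```python
def next_shot_probs(p_hot=0.60, p_cold=0.40, prior_hot=0.50):
    """Return (P(make | last made), P(make | last missed)) for a day-level hot/cold state."""
    # Bayes' rule: how likely is "hot" after seeing a single make or a single miss?
    p_make = prior_hot * p_hot + (1 - prior_hot) * p_cold
    hot_given_make = prior_hot * p_hot / p_make
    hot_given_miss = prior_hot * (1 - p_hot) / (1 - p_make)
    # The next shot mixes the hot and cold rates, weighted by those posteriors.
    after_make = hot_given_make * p_hot + (1 - hot_given_make) * p_cold
    after_miss = hot_given_miss * p_hot + (1 - hot_given_miss) * p_cold
    return after_make, after_miss

after_make, after_miss = next_shot_probs()
true_gap = 0.60 - 0.40                  # hot vs. cold
observed_gap = after_make - after_miss  # after a make vs. after a miss
print(round(after_make, 2), round(after_miss, 2))   # 0.52 0.48
print(round(true_gap, 2), round(observed_gap, 2))   # 0.2 0.04
```

Under these assumptions a 20-point difference between hot and cold shows up as only a 4-point difference between "after a make" and "after a miss".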

• Terry says:

Thinking about it this way makes it sound like Tversky is straining to find irrationality. He wants to equate the hot hands fallacy with the gambler’s fallacy that thinks a flipped coin can be hot. Maybe a few basketball fans think this way sometimes, but I don’t think many do. (It sounds too much like magic.) I think most basketball fans think more along the lines of “Shaq is really hot today” or this week or this season. That sounds much more consistent with the numbers cited.

I can’t believe I’m the only one who has thought of this. The papers probably discuss this in detail.

• Andrew says:

Terry:

Yes, that 60/40, 52/48 example is exactly how I explain this to students. In econometrics it’s called the attenuation of the regression coefficient when the predictor variable is measured with error. It’s a well known problem but the practical applications are not well understood, as Eric Loken and I discuss in the context of a different example here.

• Terry says:

Makes sense. Another way to think of it is that hotness is an unobserved state variable and recent hits are a poor proxy of it, so you can’t just apply the coefficient on the proxy to the state variable.

• Terry, Andrew

Yeah that simple example is nice and is perfect for seminars and teaching. The proxy-thinking was how we initially discussed attenuation bias due to measurement error in our 2014 “Cold Shower” paper by using a Hidden Markov Model and noting that streaks are a poor proxy for the state (now here). While discrete states may be unrealistic, our motivation for approaching it this way was to follow the approach of an influential 2009 review paper by Oskarsson, Van Boven, McClelland, and Hastie, in which they use Hidden Markov Models to think about biases in judgment of random sequences.

In the second version of our “Surprised Paper” we elaborate upon it in the context of under-adjusting for bias (now here, in the Appendix).

Interestingly, this attenuation can also explain why GVT’s betting task reveals more predictability than they thought (now here).

The first paper to discuss measurement error in the context of the hot hand literature was Daniel Stone’s paper. His example uses an AR process for a player’s shooting ability and estimates the autocorrelation in ability with the autocorrelation in outcomes.
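A toy simulation makes the gap between autocorrelation in ability and autocorrelation in outcomes concrete. This is not Stone's specification: the sketch assumes an AR(1) process on the logit of the make probability, and the values `phi = 0.9` and `sigma = 0.3`, like the function names, are chosen here purely for illustration.

```python
import math
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def simulate(n_shots=100_000, phi=0.9, sigma=0.3, seed=0):
    """Latent ability follows an AR(1) on the logit scale; each shot is a Bernoulli draw."""
    rng = random.Random(seed)
    z = 0.0
    abilities, outcomes = [], []
    for _ in range(n_shots):
        z = phi * z + rng.gauss(0.0, sigma)   # slowly drifting latent state
        p = 1.0 / (1.0 + math.exp(-z))        # make probability for this shot
        abilities.append(p)
        outcomes.append(1 if rng.random() < p else 0)
    return abilities, outcomes

abilities, outcomes = simulate()
rho_ability = lag1_autocorr(abilities)
rho_outcome = lag1_autocorr(outcomes)
print(f"autocorrelation in ability:  {rho_ability:.3f}")
print(f"autocorrelation in outcomes: {rho_outcome:.3f}")
```

In this toy setup the ability series is strongly autocorrelated while the hit/miss series is only weakly so, because each outcome is a noisy one-bit measurement of the underlying probability.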

6. Anoneuoid says:

I can see that “constant probability for any given player during a one-year period” is a better model than “p varies wildly from 0.2 to 0.8 for any player during the game.”

In general it should increase during warm up, then decrease when the player gets tired (if they get to that point). But I mean these are statistical models, the point is to usefully quantify uncertainty. It isn’t supposed to be a physical model of reality.

7. Terry says:

Andrew:

There is something wrong with how the website handles comments. When using the Chrome browser, it can take days before you see new comments. (Reloading the page does not fix the problem.) I am not the only one who has had this problem.

Firefox seems to not be affected. Safari also seems to be ok. I don’t know about Internet Explorer because I live in the Twenty First century.

• Martha (Smith) says:

A few minutes ago, I wrote a comment that appeared right away. A couple of minutes later, I tried writing another comment, which did not appear. I tried posting it again, assuming I had done something wrong. But the second try also didn’t appear. I wonder if this one will.

• Bob says:

I have similar problems using Firefox. For example, I will open andrewgelman.com and not see the most recent post.

Bob

8. Terry says:

why shouldn’t “p” vary over shorter time scales too? In what sense is “constant probability” a sensible null model at all?

It was never a sensible null even over short time scales. It couldn’t possibly be true.

If only one player in the entire league is injured but doesn’t immediately realize it, p will not be constant for all players over the entire period. The only question that remains is how large the variation is. It may be so small we can ignore it for some purposes, but we know it is not zero. And there are a million other reasons players’ performance will vary over time: sickness, amount of sleep, personal setbacks or triumphs, hangovers, blisters, coaching/learning/experience, incomplete recovery from injuries, etc.

Tiger Woods’s performance over the past few years is enough to disprove the null all by itself.

9. jim says:

I bought the fallacious fallacy arg at first too. But if you’ve ever had a hot hand shooting hoops (and I suppose also at the plate or on the links) you know it’s real. There are times when you’re locked in. It’s like you’re on autopilot. Everything you do works. You see everything in slow motion. You anticipate everything before it happens. It’s really a beautiful feeling.

10. Hey Andrew

You didn’t drink as much cool-aid are your post suggests; remember this quote?

A better framing is to start from the position that the effects are certainly not zero. Athletes are not machines, and anything that can affect their expectations (for example, success in previous tries) should affect their performance—one way or another. To put it another way, there is little debate that a “cold hand” can exist: It is no surprise that a player will be less successful if he or she is sick, or injured, or playing against excellent defense. Occasional periods of poor performance will manifest themselves as a small positive time correlation when data are aggregated.

However, the effects that have been seen are small, on the order of 2 percentage points (for example, the probability of a success in some sports task might be 45% if a player is “hot” and 43% otherwise). These small average differences exist amid a huge amount of variation, not just among players but also across different scenarios for a particular player. Sometimes if you succeed, you will stay relaxed and focused; other times you can succeed and get overconfident.

• ugh, typo: “You didn’t drink as much cool-aid *as* your post suggests”

• Andrew says:

Josh:

In that quote, I made the same mistake that Phil did in his comments above: I’d considered Pr(success | previous shot was a success) – Pr(success | previous shot was a miss) to be an estimate of Pr(success | hot) – Pr(success | cold), without recognizing the huge bias toward zero of that estimate.

• Jonathan (another one) says:

I think this comment gets to the heart of the matter, and inclines me (paradoxically) towards Phil’s comment above. The layman’s estimate of Pr(success|hot) is really, really close to 1. That’s nothing more than the layman’s *definition* of hot, as seen in jim’s comment just above. Similarly, Pr(success|cold) is essentially 0. So the probability you (Andrew) are trying to estimate is, to most of the world, close to 1. The downward bias of this estimate when one substitutes the real-world conditions of “making a shot” for the metaphysical condition “hot” follows immediately. But I think, like Phil, that the original Gilovich point was that it was the metaphysical condition that was a cognitive error. It was unfortunate that it turned into a statistical point, because of the bias later uncovered by Miller et al. You (Andrew and other statisticians) clearly find the metaphysical point so patently absurd that you turn it into a statistics point which finds, unsurprisingly, that real-world shooting probabilities move around, sometimes substantially. But that can happen even when the metaphysical existence of a hot hand is completely false, and was ever so, no matter how psychologically satisfying it is to jim and to spectators who channel him psychologically.

• Jonathan (another one):

You write: “But I think, like Phil, that the original Gilovich point was that it was the metaphysical condition that was a cognitive error.”

I often see (and hear) people assert that the main point of the original hot hand paper by Gilovich, Vallone, and Tversky (1985) was that people believe too much in the hot hand, rather than that the hot hand doesn’t exist. This is not correct.

Primary sources can help answer this question, and I happen to have some quotes lying around:

First the title of the original hot hand paper:

The Hot Hand in Basketball: On the Misperception of Random Sequences

That seems clear enough, but let’s take a look at their opening paragraph:

In describing an outstanding performance by a basketball player, reporters and spectators commonly use expressions such as “Larry Bird has the hot hand” or “Andrew Toney is a streak shooter.” These phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record. The belief in “the hot hand” and in “streak shooting” is shared by basketball players, coaches, and fans, and it appears to affect the selection of plays and the choice of players. In this paper we investigate the origin and the validity of these beliefs.

They are clearly talking about the existence of the hot hand. Let’s see what they conclude:

This account explains both the formation and maintenance of the erroneous belief in the hot hand: If random sequences are perceived as streak shooting, then no amount of exposure to such sequences will convince the player, the coach, or the fan that the sequences are in fact random. The more basketball one watches and plays, the more opportunities one has to observe what appears to be streak shooting. In order to appreciate the sequential properties of basketball data, one has to realize that coin tossing produces just as many runs. If people’s perceptions of coin tossing are biased, it should not be surprising that they perceive sequential dependencies in basketball when none exist.

And they make their main point stronger here:

Neither coin tossing nor basketball are inherently random, once all the relevant parameters are specified. In the absence of this information, however, both processes may be adequately described by a simple binomial model. A major difference between the two processes is that it is hard to think of a credible mechanism that would create a correlation between successive coin tosses, but there are many factors (e.g., confidence, fatigue) that could produce positive dependence in basketball. The availability of plausible explanations may contribute to the erroneous belief that the probability of a hit is greater following a hit than following a miss.

In their final paragraph they highlight their main contribution:

The present data demonstrate the operation of a powerful and widely shared cognitive illusion… If the present results are surprising, it is because of the robustness with which the erroneous belief in the “hot hand” is held by experienced and knowledgeable observers. This belief is particularly intriguing because it has consequences for the conduct of the game. Passing the ball to the player who is “hot” is a common strategy endorsed by basketball players. It is also anticipated by the opposing team who can concentrate on guarding the “hot” player. If another player, who is less “hot” on that particular day, is equally skilled, then the less guarded player would have a better chance of scoring. Thus the belief in the “hot hand” is not just erroneous, it could also be costly

The key contribution here is not that people misperceive random sequences. That result was well known in casino gambling, discussed since the days of Laplace, and explored more carefully under controlled conditions by psychologists in the second half of the 20th century. No, the contribution here was that expert practitioners who were motivated to do well were getting it wrong. What made the bias so powerful was that expert practitioners famously rejected the scientific results; they were never convinced.

Tversky highlights the stubbornness of experts below (quote via Gilovich 2002):

I’ve been in a thousand arguments over this topic, won them all, but convinced no one

Kahneman highlights the consensus that emerged (Thinking Fast & Slow, 2011):

The hot hand is entirely in the eye of the beholders, who are consistently too quick to perceive order and causality in randomness. The hot hand is a massive and widespread cognitive illusion.

Thaler & Sunstein show how researchers have been beguiled by the hot hand (Nudge, 2008):

It turns out that the “hot hand” is just a myth…. Many researchers have been so sure that the original Gilovich results were wrong that they set out to find the hot hand. To date, no one has found it

• Jonathan (another one) says:

Thanks for the extensive quoting, Joshua, but I am still not quite convinced. Several of your quotes apply equally to metaphysical certitude as they do to misperception of random sequences. And many of the quotes are really about the misperception of randomness. (Certainly the title, definitely Kahneman, Thaler and Sunstein as well.) You’re certainly correct that the misperception of random sequences in dice and roulette wheels and decks of cards was a well-understood phenomenon at the time Gilovich et al. wrote. But simply from the title of the work you can tell that Gilovich is expanding this notion. People know that random sequences don’t have memory, but they absolutely know that people do. But what Gilovich points out (still correctly, in my opinion) is that when sequences of made and missed shots are reduced to ones and zeros, the very powerful illusion can be at least partly dissipated, albeit not enough to displace powerfully held notions of human agency. That said, I want to acknowledge your tremendous two-fold advance: (a) their statistical proof was wanting; and (b) better statistical analysis shows that there is in fact probably a hot hand. Against this strong form of the hypothesis you have undoubtedly prevailed. But let me propose a weak-form Turing Test: give people series of 1’s and 0’s generated by a random number generator with some underlying p. Now give them some actual set of basketball shots derived from actual data any way you want (same players, same team, every fifth shot of some player, shots taken closest to 9 PM across the league on some set of days, whatever) with the same observed p. Will people be able to reliably discern, even armed with all of your results, the random sequences from the NBA sequences? I doubt it. And I further would bet that NBA players and coaches couldn’t do any better…. or worse.
I think of your results as analogous to those that showed that early RNG’s weren’t really random in that there were high dimensional tests that showed they weren’t, but that for fooling humans, they were perfectly good at that task.

• Jonathan (another one) says:

Arggh… I just had a long comment on this swallowed up, I think. I’ll wait a couple of days to see if it exists and, if not, try to recreate it.

• Jonathan (another one) says:

And it appears! The hot hand in commenting!

• Indeed comments here are buggy. When I comment and submit, the comment disappears, but if I refresh a minute later, it re-appears. I think the blog has a latency issue.

Regarding the quotes, they were partially for you, and partially just to post them, because many folks have ideas about what the paper claims, but don’t check.

Regarding your notion of the hot hand fallacy as “metaphysical certitude,” I am not sure if you are introducing a new conception of the hot hand, or referring to something specific in the original hot hand paper. I see two interpretations based on the various ways you have explained what you mean: (i) People are too confident when they believe that they have detected the hot hand (even if they are able to detect it to some degree), (ii) People believe that Pr(success|hot) is really, really close to 1.

Regarding (i), the original hot hand paper, GVT (1985), has no measure of confidence, so it cannot address that point. Regarding (ii), GVT do not measure people’s beliefs in Pr(success|hot) so it cannot address that question either. I doubt that practitioners (players and coaches) actually believe that Pr(success|hot) is close to 1. GVT do measure fan beliefs in Pr(success|success) and Pr(success|failure) and if fans interpret Pr(success|success) as Pr(success|hot), their beliefs are reasonable and not close to 1.

Regarding your challenge of discerning random data from real shooting data, I believe people would successfully distinguish between the two if you juxtaposed random data with the data of a truly streaky shooter. GVT do not address this issue directly, though in their discussion they discuss their lab results on perception of randomness/streakiness in sequences and find that people have a bias towards expecting alternation in random sequences (a known result at the time), so sequences with the expected rate of alternation (.5) get labeled as “streak shooting.”

Anyway, the questions relating to the hot hand that you are hinting at—calibration, detection, confidence, reaction—are interesting empirical questions that should be addressed. Basketball may not be the best domain in which to study this. In basketball I have no reason to expect people to be super great in terms of calibration, detection, confidence, and their response to it, given that the underlying reality of the data generating process is so difficult to measure.

• Terry says:

“The layman’s estimate of Pr(success|hot) is really, really close to 1.”

I doubt this is true of players and coaches who believe in hot hands. Basketball games are often very close and won by accumulating many little advantages, so you only need the hot hand phenomenon to be a little true to justify acting on it.

• Jonathan (another one) says:

Terry, and Joshua: For my example of metaphysical certitude, I refer to the phenomenon known as The Zone. See Jim’s comment above. “Everything works.” I grant coaches need to be smarter than that.

• Andrew:

Oh, yeah, the attenuation bias. What I was referring to in my comment was that in this blog post you wrote about revisiting your thinking. In particular:

Also I was thinking a bit more about the hot hand, in particular a flaw in the underlying logic of Gilovich etc… In what sense is “constant probability” a sensible null model at all?

This suggests that at one point you thought the null model was a sensible model. I am not sure you ever drank this much cool-aid.

11. Let’s not forget that arguing about the variation in the value of p during a game is arguing about the color of magic fairy wings: ****there is no p; humans are not random number generators****

• Jonathan (another one) says:

Sure, but as I say above, the real question is whether shots taken by basketball players are meaningfully distinguishable from RNGs. Gilovich chose one dimension (last shot memory) and tried to show that the effect of last shot memory was nil. Miller et al showed both that the Gilovich test was flawed and that first order dependence seemed to actually exist, or at least not be rejected. But people thought flawed random number generators were fine for a long time before tests in other dimensions showed they weren’t. I take the Gilovich result (as Miller doesn’t) to say that something you thought was *obviously* true turns out to require a very large degree of subtle analysis to demonstrate in *any* degree.

• Consider the following sequence of Hit or Miss

HHHMHMHHHMMMMH

Can you *ever* distinguish between the following two RNG models:

1) This is the output of a binomial RNG with p = 0.5

2) This is the output of a binomial RNG with time varying p such that p was equal to 11101011100001

Can you distinguish between these two models for N = 10k shots (and an appropriate choice of constant p = whatever)?

Fundamentally the question about basketball players shouldn’t be “what is their time varying p” because there is not a time varying p, all there is is a human animal playing basketball. But as soon as you allow time varying p, if you don’t somehow provide regularization of what sequences are allowable for p, then it’s always possible to give a sequence of 0 and 1 values for p that precisely replicates the exact output, and to say “inevitably, this was the only sequence that could have possibly occurred”. Which since it really is true that no other sequence could have occurred during that time period… is post-facto probably the “correct” answer.

• Interestingly, P(data|Model1) = 1/2^14 whereas p(data|Model2)=1 so unless the probability of Model2 declines exponentially fast with length, you will certainly conclude that the model where nothing but the given sequence could have been observed is the correct model. In some sense, when physics is what’s going on, and your state of knowledge is to look in hindsight, that’s absolutely correct.

• Jonathan (another one) says:

That’s exactly the point Richard Royall memorably (at least to me) makes in Statistical Evidence, where he compares the likelihoods, on observing the Queen of Spades drawn from a deck, of the dueling hypotheses: this is a normal deck vs. this is a trick deck consisting entirely of Queens of Spades. People who think (in the absence of a prior over the two hypotheses) that one doesn’t have 52 times the likelihood of the other just don’t understand likelihood.

• Phil says:

I probably have the story slightly wrong, but as I remember it: when someone would talk about some amazing coincidence they had experienced, Feynman would say “On the way over here I saw a car with license plate CHX 189. Can you believe it? Almost 200 million possibilities and I happened to see that one!”

• Charles Twardy says:

“unless the probability of Model2 declines exponentially fast with length”

It does. About 1/2^N, plus or minus some coding tricks.
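The arithmetic in this sub-thread can be checked exactly for the 14-shot sequence. The sketch below uses exact fractions; the prior odds of 1/2^N for Model 2 are an assumption taken from the comment just above (one p-path among 2^N equally plausible 0/1 paths), not something derived here.

```python
from fractions import Fraction

shots = "HHHMHMHHHMMMMH"
n = len(shots)

# Model 1: every shot is an independent coin flip with p = 1/2.
lik_model1 = Fraction(1, 2) ** n

# Model 2: p_t is 1 on every hit and 0 on every miss, so the observed sequence is certain.
p_path = [1 if s == "H" else 0 for s in shots]
lik_model2 = Fraction(1)
for s, p in zip(shots, p_path):
    lik_model2 *= p if s == "H" else 1 - p

# Assumed prior odds: Model 2's particular p-path is one of 2**n equally plausible 0/1 paths.
prior_odds_2_vs_1 = Fraction(1, 2 ** n)
posterior_odds_2_vs_1 = prior_odds_2_vs_1 * lik_model2 / lik_model1

print(lik_model1, lik_model2, posterior_odds_2_vs_1)
```

With that prior, the 2^N likelihood advantage of the "it had to happen this way" model is cancelled exactly, leaving posterior odds of 1; any prior that shrinks more slowly would favor Model 2.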

• Just linking back to the above. Given a long enough sequence, I’d bet that people can distinguish between (i) real data from a truly streaky shooter with an overall 50% hit rate, and (ii) randomly generated data from i.i.d p=.5 Bernoulli trials. Of course this is on average—sometimes the randomly generated sequence will be identical, or more streaky.

Also “Jonathan (another one)” writes:

Gilovich chose one dimension (last shot memory) and tried to show that the effect of last shot memory was nil. Miller et al showed both that the Gilovich test was flawed and that first order dependence seemed to actually exist, or at least not be rejected

Minor correction: they also tested how shooting percentage depended on streaks of recent success, which had the same flaw.

• Jonathan (another one) says:

“Given a long enough sequence…” Is this a sequence longer than, say, four shots taken and made in a single quarter? Because that’s all required for the psychological feeling to take hold, but ought to happen to a p=0.5 shooter who shoots a lot a few times every month. I’m saying take sequences no longer than it takes people to infer streakiness without any statistical background in the concept. And what I’m betting is that given streaks of these modest lengths, no one can statistically infer streakiness.

And your correction of my prior post is dutifully noted and accepted…

• Hi Jonathan (another one)

Thanks. Just to be clear my “Given a long enough sequence…” discrimination task was tailored as a response to this quote from your comment above (please note that my response just below your comment has additional detail):

Now give them some actual set of basketball shots derived from actual data any way you want (same players, same team, every fifth shot of some player, shots taken closest to 9 PM across the league on some set of days, whatever) with the same observed p. Will people be able to reliably discern, even armed with all of your results, the random sequences from the NBA sequences? I doubt it. And I further would bet that NBA players and coaches couldn’t do any better…. or worse. I think of your results as analogous to those that showed that early RNG’s weren’t really random in that there were high dimensional tests that showed they weren’t, but that for fooling humans, they were perfectly good at that task.

If you prefer to ask if people can reliably discern between a single 4-shot sequence generated by a shooter with a hot hand and one randomly generated, I agree that human observers would not be able to reliably tell the difference even if there is a true difference in the underlying process. This would not be a failure of the human; the task itself is impossible, there simply is not enough information to be had in a single sequence of 4 shots.
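How little information a short sequence carries can be quantified under a toy model. The sketch below assumes a streaky shooter of the day-level 60%/40% kind used earlier in the thread (50% overall), compares it with i.i.d. p = 0.5 shots, and computes the best accuracy any observer could achieve when guessing which process produced a single n-shot sequence. The function name and the choice of model are illustrative assumptions, not something taken from the papers discussed.

```python
from math import comb

def best_guess_accuracy(n, p_hot=0.6, p_cold=0.4):
    """Best possible accuracy at telling the streaky shooter from a fair coin using n shots."""
    tv = 0.0
    for k in range(n + 1):
        # Probability of one particular sequence containing k makes, under each process.
        streaky = (0.5 * p_hot**k * (1 - p_hot)**(n - k)
                   + 0.5 * p_cold**k * (1 - p_cold)**(n - k))
        coin = 0.5**n
        # There are comb(n, k) such sequences, all with the same probabilities.
        tv += comb(n, k) * abs(streaky - coin)
    tv /= 2  # total variation distance between the two distributions
    # With a 50/50 prior over the two processes, the optimal guess is right 1/2 + TV/2 of the time.
    return 0.5 + tv / 2

for n in (4, 25, 100):
    print(n, round(best_guess_accuracy(n), 3))
```

In this setup even an ideal observer is barely better than a coin flip with 4 shots, while accuracy rises well above chance by 100 shots, which is consistent with both halves of the exchange above.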

• Daniel

on the magic fairy wings: all models are wrong, some are useless.

But we all knew this would end in a debate about how many angels can dance on the head of a pin. Meanwhile, 2019 approaches…

• Charles Twardy says:

You start taking shots. Periodically, I give you shots of alcohol. Your ability to make the shot will decrease in a fairly predictable, but noisy fashion. Until it stabilizes at 0, I’m integrating over more variables than I can name, let alone measure. Random seems right.

Oh sure, it might be deterministic underneath. But that’s even more true of coins, and we regard them as pretty good RNGs.

You’re using “random” to mean “uncertain,” meaning your knowledge doesn’t let you predict what will happen, which is a Bayesian concept. But that’s not what random in terms of frequency means, and random in terms of frequency is what “tests” check for. “Random binomial with p = p_0” in terms of frequency means long sequences from this “system” pass a series of stringent computational tests for randomness. Mathematical sequences aren’t just random because you don’t know what they’re going to do, they’re random because they pass all the tests you can think of for idealized random sequences.

That’s simply *not* going to be true for basketball shots.

• Matt says:

Just like flipping a coin, basketball shots can be thought of from a frequentist perspective: the probability of making the shot is the long run frequency of all shots taken that have the same characteristic that we can observe about the shot at hand (player identity, previous shot outcome, distance from hoop, etc). This logic can equally be applied to elections: the probability of trump winning is the long-run frequency of elections with the same characteristics as the one we currently observe with trump. Same logic goes through with a coin flip. The underlying process is deterministic, but most things can be framed from a frequentist perspective, no? You can disagree with this philosophically but I think this is a useful way to think about things. It seems like you have an axe to grind with frequentists even though the point you make is of zero practical consequence.

• Andrew says:

Matt:

Your description of the frequentist model of probability is a good one because it makes clear that this model is based on strong assumptions, as is the Bayesian model. The frequentist model depends on the definition of the reference set, which is mathematically very similar to the Bayesian prior distribution; in either case there is some assumption about the set of problems being studied.

• Chris Wilson says:

Andrew, to be clear, by the frequentist “reference set” in this case you mean ‘set of all shots taken with the exact same characteristics as the shot at hand’?

• Andrew says:

Chris:

Yes, I was referring to Matt’s phrases, “all shots taken that have the same characteristic that we can observe about the shot at hand,” and “elections with the same characteristics as the one we currently observe with trump.”

See here, here, and here for more on this general point.

• Chris Wilson says:

Thanks Andrew! I think I’m still trying to wrap my head around this statement from your second link: “What I mean is, the Bayesian prior distribution corresponds to the frequentist sample space: it’s the set of problems for which a particular statistical model or procedure will be applied.”

So you conceive of the prior as a distribution over a population of effects, and in particular not restricted to the case at hand? I still find it easier to think of the full Bayes model as a double use of probability, first to model the sampling procedure, and second to quantify uncertainty in parameters.

To me it *seems* more useful to think of that ‘second’ use as about a distribution over plausibility for parameter values *in* the problem at hand. That kind of reasoning comes up all the time for me in trying to scientifically model things. By contrast, thinking about the set of problems for which the model might be applied seems more obscure to me.

Not trying to nit-pick, just curious where you think my view might be limited or off-base here.

• Matt says:

I think this is very flexible though, as it will vary with the information set of the observer. So as I accumulate more information the probability tends towards 1. So initially I have p = .6 when all I know is the shot is from a certain distance. Then suppose I know who shot it (Curry) so p moves to .7, and so on. It’s nice because obviously there are many things that really don’t seem to fit the frequentist perspective, but this allows for it. You use your current state of information to refine or expand the reference set of events from which you define probability by the long run frequencies.

• Andrew says:

Matt:

Yes, we also discuss this connection between frequencies and conditional probability in chapter 1 of BDA.

• Keith O’Rourke says:

Chris:

> By contrast, thinking about the set of problems for which the model might be applied seems more obscure to me.
Opposite for me as only repeated use matters. As David Spiegelhalter once put it, when predicting a football championship game, one is always at risk of losing to the octopus, tortoise, etc.

That is, I believe we want to think in (inexhaustible) collectives and so a distribution over plausibility for parameter values *in* the problem at hand is made more concrete as a distribution of parameter values in a set of problems where this Bayesian analysis _would_ be used. (The _would be used_ makes it conditional on the information set of the observer.)

I think this discussion has arisen many times before on this blog and never seems to make much impact.

• Chris Wilson says:

Hi Keith,
Thanks for engaging! I agree this discussion has arisen before, but I never felt any sense of intellectual “closure” on the issue :) I will have to ponder some more. The subjunctive still bothers me somewhat. I want to do the “best” analysis, in a decision-theoretic sense, for the set of data and models I have at hand. So I’m not sure about the *only care about repeated use* part.
I agree that for scientific *research programs* something like their behavior over time is all that really matters, so I can imagine evaluating whole sets of procedures, including modeling choices, based on their repeated-use properties. I am certainly not one of those people who is philosophically opposed to small-f frequentism in this sense.
Another question. Practically speaking, how does imagining a ‘distribution of parameter values in a set of problems where this Bayesian analysis _would_ be used’ help with prior specification? To me, the idea of maximizing the information leveraged *for this one analysis* helps tremendously in thinking about the goals for prior modeling.

• Chris Wilson says:

Keith, I think I see a little more what you (and Andrew) mean here. But, how broadly does one define ‘the set of problems where this Bayesian analysis would be used’? I mean, I might apply a structurally identical linear regression model in a whole bunch of different instances, but in each case I would write out priors over, say, scale parameters based on plausibility *in that particular application*, or arbitrary choices like how I re-scaled data.

For example, my priors might differ if I am setting a hyper-prior on a model to partially pool closely related cultivars of Maize, versus interspecific comparisons of, say, nitrogen response curves. So, is ‘the set of problems where this Bayesian analysis would be used’ as specific as “closely related maize cultivars”, or is it everything for which the same linear model might be written out? How does thinking about this Bayesian reference set help me clarify my prior modeling choices compared to the more intuitive notion of plausibility?

• Keith O’Rourke says:

Chris:

> but I never felt any sense of intellectual “closure” on the issue :)
That was my sense – maybe I should do a post on it (maybe check if Andrew already has one in the queue – I can only see the titles though).

> How does thinking about this Bayesian reference set help me clarify my prior modelling choices compared to the more intuitive notion of plausibility?
For one, it provides a sense of error rates in settings where the reference set is appropriate.

> how broadly does one define ‘the set of problems where this Bayesian analysis would be used’?
I referred to that as the Goldilocks challenge/opportunity – intuitively treat same as same, different as different and related as exchangeable.

And no reason not to be sharky about it – Shared kernel Bayesian screening. https://arxiv.org/abs/1311.0307

• Chris, I’m with you in that I don’t see how a frequency interpretation of a “population of effects” over which this kind of analysis could be used is so helpful, but here’s where I think it could meld with a more “probability/credibility” interpretation:

In social science especially, we often have *groups* of people over which we are interested in aggregating; the point here is that there is a *discrete set* of values for the group variable. In this context, we’d like to specify the plausible value for a given parameter across *all groups mixed together*. We can think of a “meta prior” where we aren’t yet in the state of information where we know which group is selected. Modeling the plausibility here requires thinking about an explicit mixture of sub-populations, and we may have some state of information about the relative frequency of the groups in the population (say from background data in the census, or health status surveys or whatever). Using this background knowledge about the population frequency mixture when developing the “state of knowledge” for the prior can be very helpful.
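
The “meta prior” idea can be sketched as a draw from a mixture: pick a group according to its population share, then draw the parameter from that group's prior. This is only an illustration. The group names, shares, and normal priors below are hypothetical placeholders, not values from any census or survey, and `sample_meta_prior` and `mixture_mean` are made-up helper names.

```python
import random

# Hypothetical groups: population share and a normal prior (mean, sd)
# for the parameter within each group. All numbers are made up.
GROUPS = {
    "A": (0.5, (0.0, 1.0)),
    "B": (0.3, (2.0, 0.5)),
    "C": (0.2, (-1.0, 2.0)),
}

def sample_meta_prior(groups, n, seed=0):
    """Draw n values from the mixture prior over all groups mixed together."""
    rng = random.Random(seed)
    names = list(groups)
    shares = [groups[g][0] for g in names]
    draws = []
    for _ in range(n):
        # First pick a group in proportion to its population share...
        g = rng.choices(names, weights=shares, k=1)[0]
        # ...then draw the parameter from that group's own prior.
        mu, sd = groups[g][1]
        draws.append(rng.gauss(mu, sd))
    return draws

def mixture_mean(groups):
    """Analytic mean of the mixture: share-weighted average of group means."""
    return sum(share * mu for share, (mu, _sd) in groups.values())

if __name__ == "__main__":
    draws = sample_meta_prior(GROUPS, 20000)
    print("simulated mean:", sum(draws) / len(draws))
    print("analytic mean: ", mixture_mean(GROUPS))
```

Once the group is known, the same structure collapses to that group's prior alone, which is the change in state of information described above.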

• Keith O’Rourke says:

Daniel:

I think you are trying to discern a population of existents (that is needed for this “repeated use model” to refer to) whereas by (inexhaustible) collectives I simply mean possibilities.

• Keith, right, I don’t really buy into that. Anything where we merely imagine possibilities doesn’t do it for me. I see Bayesian probability distributions as expressions of specific states of information. It’s fine to create some imagined state of information out of curiosity for what calculation that would lead to, but I don’t see why it should be used in any actual analysis about real world things. When doing a real world analysis I should use information that best approximates my knowledge about the real world. Imagining the frequency in the future that someone would reuse my analysis throughout the rest of human existence seems to lead to zero actual information about the average size of Maize kernels or the thermal efficiency of a particular model of internal combustion engine, or the actual range of annual income of black people in Arkansas this year….

• Keith O’Rourke says:

Daniel:

> When doing a real world analysis I should use information that best approximates my knowledge about the real world.

Using information that best approximates my knowledge about the real world, I can work out (or simulate) the frequency with which someone who reuses my analysis throughout the rest of human existence would be wrong…

OK, let’s disagree until the new year, but thanks for anticipating an ugly perspective that might kill my beautiful theory ;-)

• Matt, I have no problem with you modeling basketball shots by person X as having a particular long term frequency distribution. What I have a problem with is specifying that it’s a binomial with p=x for any x based on *zero actual information*.

This model says certain kinds of sequences are probable and other kinds of sequences are improbable. For example, sequences of 200 shots that have 107 hits, 36 of them in a row, are for practical purposes entirely ruled out by the binomial p=0.5 model.

A *frequency assumption* is a *strong assumption* that a Bayesian “probability distribution” does not make. The Bayes assumption is “for all you know these sequences are reasonable”; the Frequentist assumption is “the future of the actual world will behave in a way that verifiably has certain mathematical properties.”

My complaint is that Frequentist “likelihoods” *specify physical behavior* that simply *isn’t true*.
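
To put a rough number on the 36-in-a-row example, here is a minimal sketch, assuming the “binomial p=0.5 model” means independent shots with a fixed hit probability. The function name `prob_run_at_least` is made up for illustration. It ignores the 107-total-hits condition, so it gives an upper bound on the probability of the exact sequences described above.

```python
def prob_run_at_least(n, k, p):
    """Probability that n independent Bernoulli(p) shots contain a run of >= k hits.

    state[j] holds the probability that no run of length k has occurred yet
    and the sequence currently ends in exactly j consecutive hits (j < k).
    """
    state = [0.0] * k
    state[0] = 1.0
    reached = 0.0  # probability mass that has already seen a run of length k
    for _ in range(n):
        new_state = [0.0] * k
        for j, mass in enumerate(state):
            if mass == 0.0:
                continue
            # A miss resets the current run to zero.
            new_state[0] += mass * (1.0 - p)
            if j + 1 < k:
                # A hit extends the current run.
                new_state[j + 1] += mass * p
            else:
                # A hit completes a run of length k; absorb this mass.
                reached += mass * p
        state = new_state
    return reached

if __name__ == "__main__":
    print(prob_run_at_least(200, 36, 0.5))
```

Under these assumptions the result is on the order of one in a billion, so the constant-p model treats such a streak as essentially impossible rather than merely unusual.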

12. Perhaps there’s an existing term for this, or someone can come up with one. …

One thing that contributed to the survival of the hot hand fallacy fallacy, in addition to the cognitive biases and inadequate modeling, was the belief that any applicable statistical test sufficed for assessment, without recognizing that statistically more powerful tests might be performed on the same data.

Gilovich, Vallone & Tversky relied on serial correlation, while in the study of the outbreak of war as a Poisson process, Lewis Fry Richardson and subsequent international relations researchers relied on chi-square analysis of annual frequencies.

13. Jellyjuke says:

I wrote an illustration that I think gives an intuitive visualization of why smaller sample sizes tend to consistently skew results in one direction: http://www.jellyjuke.com/a-conceptual-explanation-of-the-hot-hand-fallacy.html
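
The small-sample skew can also be checked by brute force. The sketch below is not taken from the linked illustration. It assumes fair, independent shots, enumerates every equally likely sequence of length n, computes the proportion of hits among shots that immediately follow a hit, and averages that proportion over the sequences where it is defined. The function name `expected_prop_after_hit` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def expected_prop_after_hit(n):
    """Average, over all 2**n equally likely sequences, of the within-sequence
    proportion of hits on shots that immediately follow a hit.

    Sequences with no hit among the first n-1 shots are skipped, because the
    proportion is undefined for them.
    """
    total = Fraction(0)
    counted = 0
    for seq in product((0, 1), repeat=n):
        # Outcomes of the shots taken right after a hit.
        followers = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if not followers:
            continue
        total += Fraction(sum(followers), len(followers))
        counted += 1
    return total / counted

if __name__ == "__main__":
    for n in (3, 4, 10):
        value = expected_prop_after_hit(n)
        print(n, value, float(value))
```

For n = 4 this gives 17/42, about 0.405, even though every shot is a fair coin flip. Averaging per-sequence proportions is what pulls the estimate below 0.5 in short sequences.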

14. Jordan Anaya says:

This post is popular on NBA reddit:
https://i.imgur.com/M6ASWml.png

Obviously I don’t think it proves the hot hand exists, but it did get me thinking that when analyzing the data certain shots might be affected while others won’t be. For example, dunks and layups probably won’t be affected by whether you previously made a shot, while 3-pointers might be more affected.