https://i.imgur.com/M6ASWml.png

Obviously I don’t think it proves the hot hand exists, but it did get me thinking that when analyzing the data certain shots might be affected while others won’t be. For example, dunks and layups probably won’t be affected by whether you previously made a shot, while 3-pointers might be more affected.

Interesting conversation between Guy, Andrew, and Terry; I saw this a bit late.

I agree with many of the responses to Guy’s points.

I’ll add some details here to a few of Guy’s comments.

Guy writes (here):

The best attempt to measure the HH effect while correcting for difficulty of shot and defensive pressure found a 2% boost in shooting efficiency — not enough to be actionable (this doesn’t account for the attenuation problem, but in terms of practical effect that is immaterial).

I am not sure what the criterion for “best” is, but if you look at studies that completely control for difficulty of shot and defensive pressure (e.g. 3-point contest, shooting studies), the boosts are much larger than that. Regarding correcting for difficulty in game situations, the statistical controls are a strict subset of what you would want to control for. The missing controls relate to quality of defense (e.g. defender identity!), which we would expect to make the shot more difficult.

There is no way you can rule out the practical effect of the attenuation issue. The entire point is that it is researcher measurement error, not necessarily practitioner measurement error.

Guy writes:

And that’s why smart sports decision-makers will continue to treat the hot hand as a fallacy — because practically speaking, it is.

Phil Jackson is a sports decision-maker; is he not smart? (I see Andrew makes this point as well.)

Guy writes:

We don’t need to explain this via any “cognitive bias,” because discarding the hot hand is absolutely the correct strategic approach in the vast majority of situations — certainly far more correct than the public’s intuitive belief in hot hand effects (which are vastly larger than the reality).

Whether or not people over-estimate depends on how you ask the question, and whom you ask. As you acknowledge above, there is attenuation. This implies that many intuitive notions of the hot hand are not testable with data.

Guy writes:

That is, a model that says each player will have a constant talent level throughout today’s game — regardless of short-term changes in performance — is an *excellent* model for sports decision-makers to use, and far better than what many have used (even though we know it’s not quite correct).

As written, this is a silly statement. I am sure it wasn’t intended to be taken literally. At the very least, we all agree that there are situations in which sports decision-makers can ignore what the model says and pull out players who are visibly tired, emotional, etc. (of course, there may be models of the effect of minutes played that are better than intuition!). Beliefs regarding the hot hand are tricky to evaluate.

Guy writes, in response to Andrew (here):

You say players have other information, but AFAIK there is virtually no evidence that such other information predicts true HHs — despite the fact that coaches and players have every incentive to find it, and an intuitive inclination to believe the effect is real.

This is a very difficult thing to measure and test. Lack of formal evidence is not so informative when measurement is so difficult. The question is what to believe in the interim. I wouldn’t be so quick to dismiss practitioners entirely. I see further below that you appear to have clarified what you mean: “Of course coaches could have other, non-streak methods of identifying hot players. It would be hard to disprove that….”

Guy writes:

So I do worry when someone of your standing announces a “reversal” or says that the HH fallacy is itself a “fallacy.” Yes, the specific claim that P is always constant has been knocked down. It’s wrong.

Actually what was knocked down is much more than that. There are results on effect size and predictability, see this paper and the references therein.

Guy writes:

But the common understanding of the “hot hand” is also wrong. Gilovich’s poll found players believed a 50% shooter had a 61% chance of success after making a basket, and just 42% after a miss. This is wildly, fantastically wrong — much more wrong than the claim of a constant P. So while the hot hand effect may be “real,” the Hot Hand Myth also remains a myth. Both can be true….

Does Gilovich, Vallone and Tversky (1985)‘s poll indicate that players are fantastically wrong? It is important to consider the nature of this evidence. There were five survey questions:

1. Most of the players (six out of eight) reported that they have on occasion felt that after having made a few shots in a row they “know” they are going to make their next shot, that they “almost can’t miss.”

2. Five players believed that a player “has a better chance of making a shot after having just made his last two or three shots than he does after having just missed his last two or three shots.” (Two players did not endorse this statement and one did not answer this question.)

3. Seven of the eight players reported that after having made a series of shots in a row, they “tend to take more shots than they normally would.”

4. All of the players believed that it is important “for the players on a team to pass the ball to someone who has just made several (two, three, or four) shots in a row.”

5. Five players and the coach also made numerical estimates. Five of these six respondents estimated their field goal percentage for shots taken after a hit (mean: 62.5%) to be higher than their percentage for shots taken after a miss (mean: 49.5%).

Nothing fantastically wrong about #1-#4, though we may want to inquire further on #4 to see what they mean by important. If we take #5 seriously, that is obviously wrong. But how seriously should we take it? Sampling error aside (n=6!), they were asking professional athletes for numerical estimates of a conditional shooting percentage (not to mention that this was in an era in which play-by-play data didn’t even exist). Do we really believe that these estimates relate at all to operational beliefs? Rao (2009) has a nice and on-point discussion of why inferring player beliefs from these kinds of questionnaires is problematic.

Guy writes (here):

I have no doubt that teammates feed the ball to hot players, and likely with the approval of coaches at least some of the time (the less talented coaches, for the most part)

“Less talented coaches”? This sounds like a strong and authoritative statement. Is it based on evidence, or experience? Experience, I presume. What is your experience? Can you name names? It would be an interesting test: get a measure of coach talent and correlate it with belief in the hot hand. Do it with collegiate and NBA coaches. Collecting the data may be tricky.

Guy writes:

In the meantime, we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions). So on a scale of 0 (the HH is a “fallacy”) to 100 (the perceived HH is “real”), where are we now? I’d say the evidence suggests maybe 6 or 7. It’s likely there, but tiny compared to the perception.

I am not sure what evidence this quote is based upon: “It’s likely there, but tiny compared to the perception,” so I will not address it directly, though I refer to it below.

By “meaningless”, I presume what is meant is “not a game.” Clearly shooters in the 3-point contest care about their performance, and they are engaging in the same act used in the game, so in that sense it is meaningful. While I cannot speak for Guy, it is interesting how the goalposts for evidence have shifted for many folks who insist on maintaining the fallacy view. Over the past decades, Gilovich, Vallone and Tversky (1985)’s study was viewed as having overwhelming evidence that the hot hand was a myth because it had 3 different shooting studies, including a controlled study to address the problems with game data. This controlled study was analogous to the NBA’s 3-point shooting contest. Almost 20 years after the original study, Koehler & Conley (2003) collected data from the three-point shooting contest and replicated GVT’s results. Thaler wrote: “Without any defense or alternative shots, this [3 point contest] would seem to be an ideal situation in which to observe the hot hand.” Researchers clearly knew that they could not use game data to conclude that the hot hand is small, or that it is a myth. The fallacy view was built on these foundations.

There are no longer any foundations to justify the original conclusion, and the consensus that emerged. The evidence from these studies has been invalidated, as one can see by reading this paper and the references therein. Moreover, if you play by the same rules as previous studies, using the same data, there is significant evidence of a substantial hot hand effect. Not tiny by any metric.

Now, I am aware that Guy views this entire literature as irrelevant with regards to the main hot hand question (our work included). In his view the tests that they contain are asymmetric. In particular, he has stated elsewhere something along the lines of: “if you don’t find shooting performance X in these tests, then X does not occur in games, but if you do find shooting performance X in these tests, that means nothing regarding whether X occurs in games.” This statement seems pretty strong. One can debate the external validity of any specific study, to be sure. But if X represents simple shooting performance, we know this statement must be wrong. I believe it goes without saying that shooting better in practice does predict shooting better in games (on average!). Conversely, we know that players who shoot better in games also shoot better in the three-point contest. Why should it be any different if the type of shooting performance we are referring to is the hot hand? This debate about external validity could go on, and probably isn’t worth it, as there is no way to resolve it.

The question that remains: what should we do now that the foundations of our beliefs have been stripped away? I would argue that our beliefs should be what they would have been had Gilovich, Vallone and Tversky (1985)’s study never existed. What would we believe now if we were to look at the evidence with fresh eyes, without the “hot hand fallacy” idea in our heads? It is hard to say, but I’d bet we’d look at the evidence and combine it with our priors (“hot hand” for just about anyone normal). We’d immediately realize that game data can’t address the question, because of the well-known issues (control, attenuation from several sources). We might look for evidence in studies that eliminate shot selection and defensive pressure. I find it difficult to imagine that, with the natural prior and this evidence, we would conclude the hot hand is a myth, and that players and coaches are in error. That would be very silly. In fact, we might feel more justified in our prior beliefs. On the other hand, we may recognize that in general (i.e. outside of basketball), people tend to confuse performance with competence, and often over-react to recent information. We may recognize that these tendencies are a recipe for error, which should inform our beliefs, and make us more humble.

I decided to add dots at a 45-degree angle because it looked cleaner than a line.

Jellyjuke

luv it!

That is really nice.

One thing: since it isn’t square, it is easy to mentally super-impose the wrong 45-degree line… a reference line could help :)

Daniel:

> When doing a real world analysis I should use information that best approximates my knowledge about the real world.

Using information that best approximates my knowledge about the real world, I can work out (or simulate) the frequency with which someone who would reuse my analysis throughout the rest of human existence would be wrong…

OK, let’s disagree until the new year, but thanks for anticipating an ugly perspective that might kill my beautiful theory ;-)

Keith, right, I don’t really buy into that. Anything where we merely imagine possibilities doesn’t do it for me. I see Bayesian probability distributions as expressions of specific states of information. It’s fine to create some imagined state of information out of curiosity for what calculation that would lead to, but I don’t see why it should be used in any actual analysis about real world things. When doing a real world analysis I should use information that best approximates my knowledge about the real world. Imagining the frequency in the future that someone would reuse my analysis throughout the rest of human existence seems to lead to zero actual information about the average size of Maize kernels or the thermal efficiency of a particular model of internal combustion engine, or the actual range of annual income of black people in Arkansas this year….

Daniel:

I think you are trying to discern a population of existents (that is needed for this “repeated use model” to refer to) whereas by (inexhaustible) collectives I simply mean possibilities.

Good idea. Ask and you shall receive:

http://www.jellyjuke.com/hot-hand-fallacy-joint-distribution.html

Guy:

1. You ask, “where’s the evidence that this happens?” I’d flip this around and say: why are you so sure that players and coaches are systematically doing the wrong thing? They have lots of experience and also motivation to win games.

2. You write, “we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions).” That’s not true. Miller and Sanjurjo and others analyzed data from real games.

3. You write that it is “the less talented coaches” who would use the hot hand. Why would you say that? Red Auerbach and Phil Jackson are both on record as saying they think the hot hand is real.

Guy, you wrote: “I’ve never heard of a team using a decision rule built on the HH idea.” But OK, new points taken.

Chris, I’m with you in that I don’t see how a frequency interpretation of a “population of effects” over which this kind of analysis could be used is so helpful, but here’s where I think it could meld with a more “probability/credibility” interpretation:

In social science especially, we often have *groups* of people over which we are interested in aggregating; the point here is that there is a *discrete set* of values for the group variable. In this context, we’d like to specify the plausible values for a given parameter across *all groups mixed together*. We can think of a “meta prior” where we aren’t yet in the state of information where we know which group is selected, and so we’d like to model the plausibility in this context, which requires thinking about an explicit mixture of sub-populations. We may have some state of information about the relative frequency of the groups in the population (say from background data in the census, or health status surveys, or whatever). Using this background knowledge about the population frequency mixture when developing the “state of knowledge” for the prior can be very helpful.

Chris:

> but I never felt any sense of intellectual “closure” on the issue :)

That was my sense – maybe I should do a post on it (maybe check if Andrew already has one in the queue – I can only see the titles though).

> How does thinking about this Bayesian reference set help me clarify my prior modelling choices compared to the more intuitive notion of plausibility?

For one, it provides a sense of error rates in settings where the reference set is appropriate.

> how broadly does one define ‘the set of problems where this Bayesian analysis would be used’?

I referred to that as the Goldilocks challenge/opportunity – intuitively treat same as same, different as different and related as exchangeable.

And no reason not to be sharky about it – Shared kernel Bayesian screening. https://arxiv.org/abs/1311.0307

I have no doubt that teammates feed the ball to hot players, and likely with the approval of coaches at least some of the time (the less talented coaches, for the most part). But note that I said “successfully” exploit the HH. Where is the evidence that this happens? I think it could be studied, for example by looking at periods of time where a player’s usage rate increases substantially above his norm in the wake of a streak of successful shooting. Then compare his performance to that of the players who lost shots in that time interval. But it will be hard to control for all confounds, such as whether the player is getting more shots because he is matched with a weak defender. My guess is that a good study will find there’s little or no payoff to feeding the ball to ostensibly “hot” players (in part because of the attenuation issue Andrew frequently cites). But if someone wants to prove otherwise, I’d love to see the study.

In the meantime, we are left with evidence of some streakiness in repetitive physical tasks in experimental settings (or meaningless exhibitions). So on a scale of 0 (the HH is a “fallacy”) to 100 (the perceived HH is “real”), where are we now? I’d say the evidence suggests maybe 6 or 7. It’s likely there, but tiny compared to the perception. Andrew calls this a “reversal” on the order of the South switching allegiance from the Democrats to the Republicans (honestly, the Cavs losing LeBron would have been a more apt analogy!). I’d say it’s more analogous to the record Houston set last night for 3-point baskets made in a game (26, beating the old record of 25).

Keith, I think I see a little more what you (and Andrew) mean here. But, how broadly does one define ‘the set of problems where this Bayesian analysis would be used’? I mean, I might apply a structurally identical linear regression model in a whole bunch of different instances, but in each case I would write out priors over, say, scale parameters based on plausibility *in that particular application*, or arbitrary choices like how I re-scaled data.

For example, my priors might differ if I am setting a hyper-prior on a model to partially pool closely related cultivars of Maize, versus interspecific comparisons of, say, nitrogen response curves. So, is ‘the set of problems where this Bayesian analysis would be used’ as specific as “closely related maize cultivars”, or is it everything the same linear model might be written out for? How does thinking about this Bayesian reference set help me clarify my prior modeling choices compared to the more intuitive notion of plausibility?

Guy: Honestly — you’ve never heard a player say, “So and so was on fire, so we kept feeding him the ball”? Do you follow basketball?

I suppose it’s possible that coaches successfully exploit hot hand effects. This would be a very hard thing to study, but perhaps not impossible. But I’ve never heard of a team using a decision rule built on the HH idea. For example, an NBA team might say “player X is generally our 3rd choice to take shots (in a lineup of 5 players), but when he has made 4 of his last 5 shots he moves up to 2nd choice.” Or an NFL coach might plan to pass about 65% of the time, but increase that to 75% any time his QB completes 4 of 5 passes. Or a baseball manager might decide “I will hit player Z 7th in the lineup, but will move him up to 5th any time his 10-game trailing batting average exceeds .400.” I have not heard of any sports decision-maker using such a strategy successfully, nor seen evidence that one has — perhaps others have? A successful coach would have incentive to keep this quiet, I suppose, but over many years and many sports — and with so many players and coaches switching teams — I think we would likely hear about this.

Of course coaches could have other, non-streak methods of identifying hot players. It would be hard to disprove that….

Guy:

What makes you say, “no one has yet found a way to do this”? Isn’t it possible that basketball players and coaches make use of the hot hand all the time in useful ways?

Andrew: I understand that the attenuation issue means the HH effect could theoretically be large, but in practice this is immaterial unless/until there is some way (other than recent streakiness) to identify the “true” hot hand. You say players have other information, but AFAIK there is virtually no evidence that such other information predicts true HHs — despite the fact that coaches and players have every incentive to find it, and an intuitive inclination to believe the effect is real.

As for incremental advantage over the constant P model, yes I agree that’s the potential next step. It may be possible, in some situations in some sports, to improve slightly on that model. But I think more weight should be given to the fact no one has yet found a way to do this — which is strong evidence that it’s hard to do, and not likely to lead to a model that’s significantly better. I’m also not sure folks here — perhaps because I approach this from the perspective of sports analytics rather than economics/academia — fully appreciate what a huge advance the constant P model has been over a crude belief in a hot hand effect. Indeed, even today there are cases where sports decision-makers give far too much weight to short-term performance and would be better-served by sticking to a constant-P model (and I’m not aware of any situations where the reverse is true).

So I do worry when someone of your standing announces a “reversal” or says that the HH fallacy is itself a “fallacy.” Yes, the specific claim that P is always constant has been knocked down. It’s wrong. But the common understanding of the “hot hand” is also wrong. Gilovich’s poll found players believed a 50% shooter had a 61% chance of success after making a basket, and just 42% after a miss. This is wildly, fantastically wrong — much more wrong than the claim of a constant P. So while the hot hand effect may be “real,” the Hot Hand Myth also remains a myth. Both can be true….

Guy:

Regarding the “2% boost in shooting efficiency” thing . . . That’s what I used to think too, before I thought about the attenuation bias; see for example the last paragraph of Terry’s comment here. You say the attenuation factor is immaterial, but it’s a factor of 5 (in that simple calculation), which is a big deal. The point is that players and coaches have more information available than just whether the last shot was a hit or a miss.
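A quick simulation can make the attenuation point concrete. All the numbers below (a hidden hot/cold state that persists from shot to shot, a 15-point FG% gap, a 90% persistence probability) are my own hypothetical choices, not taken from any study; the point is only that conditioning on the observable proxy (“made the previous shot”) recovers just a small fraction of the true hot-state effect:

```python
import random

random.seed(2)

# Hypothetical numbers, chosen only to illustrate attenuation.
fg_hot, fg_cold = 0.60, 0.45   # FG% in the hidden hot vs. cold state
p_stay = 0.90                  # probability the hidden state carries over

n = 500_000
hot = random.random() < 0.5
states, shots = [], []
for _ in range(n):
    if random.random() >= p_stay:
        hot = not hot          # hidden state occasionally flips
    states.append(hot)
    shots.append(random.random() < (fg_hot if hot else fg_cold))

# True effect: FG% conditional on the hidden state itself.
n_hot = sum(states)
fg_when_hot = sum(s for s, h in zip(shots, states) if h) / n_hot
fg_when_cold = sum(s for s, h in zip(shots, states) if not h) / (n - n_hot)
true_effect = fg_when_hot - fg_when_cold      # ~0.15 by construction

# Proxy effect: FG% conditional on making the previous shot --
# the observable a researcher is stuck with.
prev, cur = shots[:-1], shots[1:]
n_hit = sum(prev)
fg_after_hit = sum(c for c, p in zip(cur, prev) if p) / n_hit
fg_after_miss = sum(c for c, p in zip(cur, prev) if not p) / (len(cur) - n_hit)
proxy_effect = fg_after_hit - fg_after_miss   # a small fraction of true_effect

print(round(true_effect, 3), round(proxy_effect, 3))
```

Under these made-up parameters the proxy understates the true effect several-fold; the exact ratio depends entirely on the assumed persistence and hit rates, which is the attenuation point.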

Regarding your last paragraph: I agree that the model of constant probability is not a bad default, and indeed I think it’s the model that just about everyone uses as a starting point. The hot hand is on top of that default model. It would be ridiculous to play the hot hand *instead* of using long-term frequencies; the point of using the hot hand is to gain an incremental advantage.

“My guess is that a small hot hands effect….skillfully acted on, might make a difference of a couple points a game, which would substantially improve a team’s performance over a season.”

This is a very poor guess. Two points a game would indeed be a huge gain for an NBA team (approximately 6 wins per season). But there is no evidence I’m aware of that any team has — or could — find a way to take advantage of a HH effect to anything like this degree. The best attempt to measure the HH effect while correcting for difficulty of shot and defensive pressure found a 2% boost in shooting efficiency — not enough to be actionable (this doesn’t account for the attenuation problem, but in terms of practical effect that is immaterial).
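For what it’s worth, here is a back-of-envelope version of that comparison. Every round number below is my own assumption (not from any study), and the “2% boost” is applied only to the share of shots taken right after a streak:

```python
# Back-of-envelope, with hypothetical round numbers (all assumed):
fga_per_game = 85        # team field-goal attempts per game
pts_per_made = 2.1       # average points per made shot
boost = 0.02             # the "2% boost in efficiency" from the quote
streak_share = 0.10      # share of shots taken right after a streak

# Extra points per game if every post-streak shot gets the full boost:
extra = fga_per_game * streak_share * boost * pts_per_made
print(round(extra, 2))   # → 0.36
```

Under these (debatable) assumptions the gain is about a third of a point per game, an order of magnitude short of “a couple points a game”; different assumptions move the number, but not easily by a factor of ten.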

And that’s why smart sports decision-makers will continue to treat the hot hand as a fallacy — because practically speaking, it is. We don’t need to explain this via any “cognitive bias,” because discarding the hot hand is absolutely the correct strategic approach in the vast majority of situations — certainly far more correct than the public’s intuitive belief in hot hand effects (which are vastly larger than the reality). That is, a model that says each player will have a constant talent level throughout today’s game — regardless of short-term changes in performance — is an *excellent* model for sports decision-makers to use, and far better than what many have used (even though we know it’s not quite correct). Maybe that will change at some point, when someone discovers a way to identify a HH effect in real time that is large enough to impact decision-making. But so far, no one has done that.

Hi Keith,

Thanks for engaging! I agree this discussion has arisen before, but I never felt any sense of intellectual “closure” on the issue :) I will have to ponder some more. The subjunctive still bothers me somewhat. I want to do the “best” analysis, in a decision-theoretic sense, for the set of data and models I have at hand. So I’m not sure about the *only care about repeated use* part.

I agree that for scientific *research programs* something like their behavior over time is all that really matters- so I can imagine evaluating whole sets of procedures, including modeling choices, based on their repeated-use properties. I am certainly not one of those people who is philosophically opposed to small-f frequentism in this sense.

Another question. Practically speaking, how does imagining a ‘distribution of parameter values in a set of problems where this Bayesian analysis _would_ be used’ help with prior specification? To me, the idea of maximizing the information leveraged *for this one analysis* helps tremendously in thinking about the goals for prior modeling.

Chris:

> By contrast, thinking about the set of problems for which the model might be applied seems more obscure to me.

Opposite for me as only repeated use matters. As David Spiegelhalter once put it, when predicting a football championship game, one is always at risk of losing to the octopus, tortoise, etc.

That is, I believe we want to think in (inexhaustible) collectives, and so a distribution over plausibility for parameter values *in* the problem at hand is made more concrete as a distribution of parameter values in a set of problems where this Bayesian analysis _would_ be used. (The _would be used_ makes it conditional on the information set of the observer.)

I think this discussion has arisen many times before on this blog and never seems to make much impact.

Terry and Joshua: For my example of metaphysical certitude, I refer to the phenomenon known as The Zone. See Jim’s comment above. “Everything works.” I grant coaches need to be smarter than that.

Good points.

Tversky’s logic is quite simplistic and weak: the data looks random using a simplistic analysis, so you are wrong to see patterns in it.

But you need to carefully lay out a model with real numbers to see whether the effect is important. My guess is that a small hot hands effect (virtually undetectable by Tversky’s analysis), skillfully acted on, might make a difference of a couple points a game, which would substantially improve a team’s performance over a season.

Further, coaches and players aren’t looking only at the hit rate of a player to identify a hot hand. General playing skill gives other clues, and the player’s perceptions of hotness can play a role too.

The more you think about it, the more Tversky’s argument depends on the clumsiness of his analysis. He can’t find it in his very crude statistical analysis, so it can’t possibly be true.

No, it doesn’t. It only does so *if your prior is that all possible sequences are equally probable*.

Matt, I have no problem with you modeling basketball shots by person X as having a particular long term frequency distribution. What I have a problem with is specifying that it’s a binomial with p=x for any x based on *zero actual information*.

This model says certain kinds of sequences are probable and other kinds of sequences are improbable. For example, sequences of 200 shots that have 107 hits, but with 36 of the hits in a row, are all but ruled out by the binomial p=0.5 model.
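To put a number on “ruled out”: a small exact calculation (the helper below is hypothetical, just dynamic programming over the current run length) gives the probability that the binomial p=0.5 model produces a run of 36 or more hits somewhere in 200 shots:

```python
def prob_run_at_least(n, L, p=0.5):
    """Probability that n independent shots with hit rate p contain
    at least one run of L or more consecutive hits (exact DP)."""
    # state[k] = probability of currently being on a hit-run of length k
    # without ever having reached a run of length L
    state = [0.0] * L
    state[0] = 1.0
    reached = 0.0
    for _ in range(n):
        new = [0.0] * L
        for k, pr in enumerate(state):
            if pr == 0.0:
                continue
            if k == L - 1:
                reached += pr * p      # run reaches length L: absorbed
            else:
                new[k + 1] += pr * p   # a hit extends the run
            new[0] += pr * (1 - p)     # a miss resets the run
        state = new
    return reached

print(prob_run_at_least(200, 36))      # on the order of 1e-9
```

So the model doesn’t literally forbid such a sequence, but it assigns it a probability of roughly one in a billion, which is Daniel’s point about what the binomial likelihood quietly asserts.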

A *frequency assumption* is a *strong assumption* that a Bayesian “probability distribution” does not make. The Bayes assumption is “for all you know these sequences are reasonable” the Frequentist assumption is “the future of the actual world will behave in a way that verifiably has certain mathematical properties”

My complaint is that Frequentist “likelihoods” *specify physical behavior* that simply *isn’t true*

Matt:

Yes, we also discuss this connection between frequencies and conditional probability in chapter 1 of BDA.

I think this is very flexible though, as it will vary with the information set of the observer. So as I accumulate more information the probability tends towards 1. So initially I have p = .6 when all I know is the shot is from a certain distance. Then suppose I know who shot it (curry) so p moves to .7, and so on. It’s nice because obviously there are many things that really don’t seem to fit the frequentist perspective, but this allows for it. You use your current state of information to refine or expand the reference set of events from which you define probability by the long run frequencies.

Thanks Andrew! I think I’m still trying to wrap my head around this statement from your second link: “What I mean is, the Bayesian prior distribution corresponds to the frequentist sample space: it’s the set of problems for which a particular statistical model or procedure will be applied.”

So you conceive of the prior as a distribution over a population of effects, and in particular not restricted to the case at hand? I still find it easier to think of the full Bayes model as a double use of probability, first to model the sampling procedure, and second to quantify uncertainty in parameters.

To me it *seems* more useful to think of that ‘second’ use as about a distribution over plausibility for parameter values *in* the problem at hand. That kind of reasoning comes up all the time for me in trying to scientifically model things. By contrast, thinking about the set of problems for which the model might be applied seems more obscure to me.

Not trying to nit-pick, just curious where you think my view might be limited or off-base here.

Chris:

Yes, I was referring to Matt’s phrases, “all shots taken that have the same characteristic that we can observe about the shot at hand,” and “elections with the same characteristics as the one we currently observe with trump.”

Andrew, to be clear, by the frequentist “reference set” in this case you mean ‘set of all shots taken with the exact same characteristics as the shot at hand’?

Matt:

Your description of the frequentist model of probability is a good one because it makes clear that this model is based on strong assumptions, as is the Bayesian model. The frequentist model depends on the definition of the reference set, which is mathematically very similar to the Bayesian prior distribution; in either case there is some assumption about the set of problems being studied.

Just like flipping a coin, basketball shots can be thought of from a frequentist perspective: the probability of making the shot is the long run frequency of all shots taken that have the same characteristic that we can observe about the shot at hand (player identity, previous shot outcome, distance from hoop, etc). This logic can equally be applied to elections: the probability of trump winning is the long-run frequency of elections with the same characteristics as the one we currently observe with trump. Same logic goes through with a coin flip. The underlying process is deterministic, but most things can be framed from a frequentist perspective, no? You can disagree with this philosophically but I think this is a useful way to think about things. It seems like you have an axe to grind with frequentists even though the point you make is of zero practical consequence.

]]>You’re using “random” to mean “uncertain,” i.e., your knowledge doesn’t let you predict what will happen, which is a Bayesian concept. But that’s not what random means in terms of frequency, and it’s randomness in terms of frequency that statistical “tests” check for. What “random binomial with p = p_0” means in frequency terms is that long sequences from this “system” pass a series of stringent computational tests for randomness. Mathematical sequences aren’t random just because you don’t know what they’re going to do; they’re random because they pass all the tests you can think of for idealized random sequences.

That’s simply *not* going to be true for basketball shots.
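To make the “computational tests” point concrete, here is a toy sketch (pure Python, with invented parameters) of one such test: a simple count of runs separates an i.i.d. Bernoulli sequence from a streaky Markov sequence of the same length.

```python
import random

def count_runs(seq):
    """Number of maximal blocks of identical outcomes in the sequence."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def iid_shooter(n, p=0.5, rng=random):
    """i.i.d. Bernoulli(p) shots."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def streaky_shooter(n, p_stay=0.7, rng=random):
    """Markov shooter: repeats the previous outcome with probability p_stay."""
    seq = [rng.randint(0, 1)]
    for _ in range(n - 1):
        seq.append(seq[-1] if rng.random() < p_stay else 1 - seq[-1])
    return seq

random.seed(1)
n = 10_000
# Under i.i.d. p = 0.5 the expected number of runs is about n/2 + 1;
# the streaky Markov process produces markedly fewer runs.
print(count_runs(iid_shooter(n)))      # near 5,000
print(count_runs(streaky_shooter(n)))  # near 3,000
```

A battery of tests like this is exactly what a long sequence from a genuine “random binomial” system passes, and what a genuinely streaky process fails.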

]]>You start taking shots. Periodically, I give you shots of alcohol. Your ability to make the shot will decrease in a fairly predictable, but noisy fashion. Until it stabilizes at 0, I’m integrating over more variables than I can name, let alone measure. Random seems right.

Oh sure, it might be deterministic underneath. But that’s even more true of coins, and we regard them as pretty good RNGs.

]]>“unless the probability of Model2 declines exponentially fast with length”

It does. About 1/2^N, plus or minus some coding tricks.

]]>Jellyjuke-

nice.

Since Prop(H|HHH) = #HHHH/(#HHHH + #HHHT), another interesting graph would be the joint distribution of (#HHHH, #HHHT).
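For what it’s worth, a quick simulation sketch (pure Python, arbitrary parameters) of those counts for a fair-coin shooter shows why the graph would be interesting: the sequence-average of Prop(H|HHH) sits noticeably below .5.

```python
import random

def streak_counts(seq, k=3):
    """Count (#HHHH, #HHHT): outcomes immediately following k straight hits."""
    hhhh = hhht = 0
    for i in range(len(seq) - k):
        if all(seq[i:i + k]):
            if seq[i + k]:
                hhhh += 1
            else:
                hhht += 1
    return hhhh, hhht

random.seed(0)
props = []
for _ in range(20_000):
    seq = [random.randint(0, 1) for _ in range(100)]  # 100-shot fair shooter
    hhhh, hhht = streak_counts(seq)
    if hhhh + hhht:  # condition on at least one HHH occurring
        props.append(hhhh / (hhhh + hhht))

# Averaging Prop(H|HHH) across finite sequences from a fair-coin shooter
# gives noticeably less than 0.5: the streak-selection bias.
print(sum(props) / len(props))
```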

]]>Terry, Andrew

Yeah, that simple example is nice and is perfect for seminars and teaching. That proxy thinking is how we initially discussed attenuation bias due to measurement error in our 2014 “Cold Shower” paper: we used a Hidden Markov Model and noted that streaks are a poor proxy for the underlying state (now here). While discrete states may be unrealistic, our motivation for approaching it this way was to follow the approach of an influential 2009 review paper by Oskarsson, Van Boven, McClelland, and Hastie, in which they use Hidden Markov Models to think about biases in judgments of random sequences.

In the second version of our “Surprised” paper we elaborate upon it in the context of under-adjusting for bias (now here, in the Appendix).

Interestingly, this attenuation can also explain why GVT’s betting task reveals more predictability than they thought (now here).

The first paper to discuss measurement error in the context of the hot hand literature was Daniel Stone’s. His example uses an AR process for a player’s shooting ability and estimates the autocorrelation in ability from the autocorrelation in outcomes.

]]>Daniel

on the magic fairy wings: all models are wrong, some are useless.

But we all knew this would end in a debate about how many angels can dance on the head of a pin. Meanwhile, 2019 approaches…

]]>Hi Jonathan (another one)

Thanks. Just to be clear my “Given a long enough sequence…” discrimination task was tailored as a response to this quote from your comment above (please note that my response just below your comment has additional detail):

Now give them some actual set of basketball shots derived from actual data any way you want (same players, same team, every fifth shot of some player, shots taken closest to 9 PM across the league on some set of days, whatever) with the same observed p. Will people be able to reliably discern, even armed with all of your results, the random sequences from the NBA sequences? I doubt it. And I further would bet that NBA players and coaches couldn’t do any better… or worse. I think of your results as analogous to those that showed that early RNGs weren’t really random in that there were high-dimensional tests that showed they weren’t, but that for fooling humans, they were perfectly good at that task.

If you prefer to ask whether people can reliably discern between a single 4-shot sequence generated by a shooter with a hot hand and one randomly generated, I agree that human observers would not be able to reliably tell the difference even if there is a true difference in the underlying process. This would not be a failure of the human; the task itself is impossible, because there simply is not enough information to be had in a single sequence of 4 shots.

]]>I probably have the story slightly wrong, but as I remember it: when someone would talk about some amazing coincidence they had experienced, Feynman would say, “On the way over here I saw a car with license plate CHX 189. Can you believe it? Almost 200 million possibilities and I happened to see that one!”

]]>“Given a long enough sequence…” Is this a sequence longer than, say, four shots taken and made in a single quarter? Because that’s all that’s required for the psychological feeling to take hold, yet that ought to happen to a p=0.5 shooter who shoots a lot a few times every month. I’m saying take sequences no longer than it takes people to infer streakiness without any statistical background in the concept. And what I’m betting is that, given streaks of these modest lengths, no one can statistically infer streakiness.

And your correction of my prior post is dutifully noted and accepted…

]]>Just linking back to the above. Given a long enough sequence, I’d bet that people can distinguish between (i) real data from a truly streaky shooter with an overall 50% hit rate, and (ii) randomly generated data from i.i.d. p = .5 Bernoulli trials. Of course this is on average—sometimes the randomly generated sequence will be identical, or more streaky.
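As a rough illustration of how discriminability grows with sequence length, here is a simulation sketch (Python; the “streaky” Markov shooter and all parameters are invented for illustration), using the longest run as the decision statistic:

```python
import random

def longest_run(seq):
    """Length of the longest block of identical outcomes."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

def streaky(n, p_stay=0.7, rng=random):
    """Markov shooter: repeats the previous outcome with probability p_stay."""
    seq = [rng.randint(0, 1)]
    for _ in range(n - 1):
        seq.append(seq[-1] if rng.random() < p_stay else 1 - seq[-1])
    return seq

def iid(n, rng=random):
    """Fair-coin shooter."""
    return [rng.randint(0, 1) for _ in range(n)]

random.seed(2)
for n in (4, 50, 500):
    correct = 0
    trials = 2000
    for _ in range(trials):
        # Guess that the sequence with the longer longest-run is the
        # streaky one; break ties with a coin flip.
        ls, li = longest_run(streaky(n)), longest_run(iid(n))
        if ls > li or (ls == li and random.random() < 0.5):
            correct += 1
    print(n, correct / trials)
```

With 4-shot sequences the classifier barely beats chance; with hundreds of shots it is nearly perfect, which matches the intuition that the 4-shot version of the task is close to impossible while the long-sequence version is not.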

Also “Jonathan (another one)” writes:

Gilovich chose one dimension (last shot memory) and tried to show that the effect of last shot memory was nil. Miller et al showed both that the Gilovich test was flawed and that first order dependence seemed to actually exist, or at least not be rejected

Minor correction: they also tested how shooting percentage depended on streaks of recent success, and that test had the same flaw.

]]>Hi Phil

There are two ways of addressing the issue of 5pp being a modest effect.

1. The 5 percentage point effect is an attenuated estimate due to measurement error, so it is consistent with a much larger percentage-point effect (Terry makes this point below).

2. Even if we assume that they are the same, 5 percentage points is not obviously “modest.” We need to define what modest means in terms of impact. First, 5pp is around half the difference between the median and the best three-point shooter. Second, it is an *average* effect, which is a bit absurd if you think about it, because the average includes players who don’t shoot that often. Of course, this still doesn’t measure impact. To measure impact we would need to write down a model of basketball and calculate how many wins you gain over the course of a season if the hot hand is of that size (on average!) and you try to create open looks when it makes sense. I haven’t done the calculation, but my intuition is that the impact would be large.
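A small simulation sketch of point 1 (Python; the hidden two-state shooter and all parameters are invented, not estimated from data) shows how conditioning on streaks, a noisy proxy for the hidden hot state, attenuates a large true effect:

```python
import random

def simulate(n_shots, p_cold=0.45, p_hot=0.70, p_enter=0.05, p_exit=0.30,
             rng=random):
    """Hidden two-state shooter; the true hot-state boost is p_hot - p_cold."""
    hot, shots = False, []
    for _ in range(n_shots):
        hot = (rng.random() > p_exit) if hot else (rng.random() < p_enter)
        shots.append(1 if rng.random() < (p_hot if hot else p_cold) else 0)
    return shots

def streak_diff(shots, k=3):
    """P(hit | k hits in a row) minus P(hit | k misses in a row)."""
    after_h, after_m = [], []
    for i in range(len(shots) - k):
        window = shots[i:i + k]
        if all(window):
            after_h.append(shots[i + k])
        elif not any(window):
            after_m.append(shots[i + k])
    return sum(after_h) / len(after_h) - sum(after_m) / len(after_m)

random.seed(3)
shots = simulate(200_000)
# The true hot-state boost is 25 percentage points, but conditioning on
# observed streaks (a noisy proxy for the hidden state) recovers far less.
print(round(streak_diff(shots), 3))
```

Under these made-up parameters the streak-conditional difference comes out at only a few percentage points, even though the hot-state boost is 25pp, which is the sense in which a measured 5pp is consistent with a much larger underlying effect.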

You also make the point that the defense can neutralize the hot hand *in the data* by making shots more difficult. That’s fine: as long as Curry still has the hot hand in reality (in probability), the defensive attention on him can open up opportunities for other players.

]]>Indeed comments here are buggy. When I comment and submit, the comment disappears, but if I refresh a minute later, it re-appears. I think the blog has a latency issue.

Regarding the quotes, they were partially for you, and partially just to post them, because many folks have ideas about what the paper claims, but don’t check.

Regarding your notion of the hot hand fallacy as “metaphysical certitude,” I am not sure if you are introducing a new conception of the hot hand, or referring to something specific in the original hot hand paper. I see two interpretations based on the various ways you have explained what you mean: (i) people are too confident when they believe that they have detected the hot hand (even if they are able to detect it to some degree); (ii) people believe that Pr(success|hot) is really, really close to 1.

Regarding (i), the original hot hand paper, GVT (1985), has no measure of confidence, so it cannot address that point. Regarding (ii), GVT do not measure people’s beliefs about Pr(success|hot), so it cannot address that question either. I doubt that practitioners (players and coaches) actually believe that Pr(success|hot) is close to 1. GVT do measure fan beliefs about Pr(success|success) and Pr(success|failure), and if fans interpret Pr(success|success) as Pr(success|hot), their beliefs are reasonable and not close to 1.

Regarding your challenge of discerning random data from real shooting data: I believe people would successfully distinguish between the two if you juxtaposed random data with the data of a truly streaky shooter. GVT do not address this issue directly, though in their discussion they report lab results on the perception of randomness/streakiness in sequences and find that people expect alternation in random sequences (a known result at the time), so sequences with the expected rate of alternation (.5) get labeled as “streak shooting.”

Anyway, the questions relating to the hot hand that you are hinting at—calibration, detection, confidence, reaction—are interesting empirical questions that should be addressed. Basketball may not be the best domain in which to study this. In basketball I have no reason to expect people to be super great in terms of calibration, detection, confidence, and their response to it, given that the underlying reality of the data generating process is so difficult to measure.

]]>That’s exactly the point Richard Royall memorably (at least to me) makes in Statistical Evidence, where he compares the likelihoods, on observing the Queen of Spades drawn from a deck, of the dueling hypotheses: this is a normal deck vs. this is a trick deck consisting entirely of Queens of Spades. People who think (in the absence of a prior over the two hypotheses) that one doesn’t have 52 times the likelihood of the other just don’t understand likelihood.

]]>Interestingly, P(data|Model1) = 1/2^14 whereas P(data|Model2) = 1, so unless the probability of Model2 declines exponentially fast with length, you will certainly conclude that the model where nothing but the given sequence could have been observed is the correct one. In some sense, when physics is what’s going on and your state of knowledge is hindsight, that’s absolutely correct.
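Spelling out the arithmetic (a trivial sketch, just restating the numbers above for the 14-shot sequence):

```python
# Model 1: every shot is Bernoulli(0.5).
# Model 2: p_t equals the observed outcome at each step, so the data
# are reproduced with certainty.
data = "HHHMHMHHHMMMMH"

lik_model1 = 0.5 ** len(data)   # (1/2)^14
lik_model2 = 1.0                # p_t = outcome_t makes the data certain

# The likelihood ratio grows as 2^N, so Model 2 dominates unless its
# prior probability shrinks at least that fast with sequence length --
# the regularization point made in the comment below.
print(lik_model1, lik_model2, lik_model2 / lik_model1)  # ratio = 2**14
```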

]]>Joshua,

For uncontested 3 point shots I’d call a 5 percentage point difference a modest effect, but 10 or greater is certainly large.

But controlling for the quality of defense, and for offensive shot selection… If I typically shoot 50%, but when I get hot I take riskier shots such that my percentage stays at 50%, then I’m now shooting 50% on harder shots. Would we really want to call that the ‘hot hand’? It is, I think, not what most fans mean when they say someone is hot. This gets into the philosophy of what it means to be ‘hot’, an issue that was raised in the original paper. I suppose there are different definitions of ‘hot hand’ and people can argue about which one is right.

]]>Consider the following sequence of Hit or Miss

HHHMHMHHHMMMMH

Can you *ever* distinguish between the following two RNG models:

1) This is the output of a binomial RNG with p = 0.5

2) This is the output of a binomial RNG with time varying p such that p was equal to 11101011100001

Can you distinguish between these two models for N = 10k shots (and an appropriate choice of constant p = whatever)?

Fundamentally the question about basketball players shouldn’t be “what is their time-varying p,” because there is no time-varying p; all there is is a human animal playing basketball. But as soon as you allow a time-varying p, if you don’t somehow regularize which sequences are allowable for p, then it’s always possible to give a sequence of 0 and 1 values for p that precisely replicates the exact output, and to say “inevitably, this was the only sequence that could have possibly occurred.” And since it really is true that no other sequence could have occurred during that time period… that is, post facto, probably the “correct” answer.

]]>