## Gilovich doubles down on hot hand denial

[cat picture]

A correspondent pointed me to this Freakonomics radio interview with Thomas Gilovich, one of the authors of that famous “hot hand” paper from 1985, “Misperception of Chance Processes in Basketball.” Here’s the key bit from the Freakonomics interview:

DUBNER: Right. The “hot-hand notion” or maybe the “hot-hand fallacy.”

GILOVICH: Well, everyone who’s ever played the game of basketball knows you get this feeling where the game seems to slow down. It becomes easier, or you almost don’t even have to aim that carefully. The ball’s going to go in. It’s one of the most compelling feelings that you can have. And it turns out if you statistically analyze people’s shots — whether it’s professional games, college basketball players shooting in a gym, although the feeling exists when you make several shots in a row — you will feel hot. That feeling very surprisingly doesn’t predict how you’re going to do in the next shot or the next several shots — the distribution of hits and misses in the game of basketball looks just like the distribution of heads and tails when you’re flipping a coin. although of course, not every player shoots 50%. Very few of them do.

That’s wrong. The distribution of hits and misses in the game of basketball does not look just like the distribution of heads and tails when you’re flipping a coin.

In their 1985 paper, Gilovich et al. thought they found that the distribution of hits and misses in the game of basketball looked just like the distribution of heads and tails when you’re flipping a coin. But they’d made a couple mistakes. Subtle mistakes, but mistakes nonetheless. Miller and Sanjurjo explained the problems:

1. The simple estimate of correlation, computing for each player the empirical frequency of hits right after a sequence of hits, minus the frequency of hits right after a sequence of misses, is biased. And the bias is large enough to make a difference in data sequences of realistic length. (Hence Gilovich was making a mathematical error when he said, “Because our samples were fairly large, I don’t believe this changes the original conclusions about the hot hand.”)

2. Whether you made or missed the last couple of shots is itself a very noisy measure of your “hotness,” so estimates of the hot hand based on these correlations are themselves strongly attenuated toward zero.

Combining 1 and 2, even large hot-hand effects will show up as zero in Gilovich-like studies.
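Point 1 is easy to check by simulation. Here is a minimal sketch, under assumed settings that do not come from any of the papers discussed here (100 shots per player, streaks of length 3, a 50% shooter, a few thousand simulated players). The function names are made up for illustration. Every simulated player is a pure coin flipper with no hot hand, so an unbiased estimate would average out to zero; the per-player difference computed below should instead average out clearly negative.

```python
import random

def streak_diff(shots, k):
    """Hit rate right after k straight hits minus hit rate right after k straight misses.

    Returns None when either kind of streak never occurs in the sequence.
    """
    after_hits, after_misses = [], []
    for i in range(k, len(shots)):
        window = shots[i - k:i]
        if all(window):
            after_hits.append(shots[i])
        elif not any(window):
            after_misses.append(shots[i])
    if not after_hits or not after_misses:
        return None
    return sum(after_hits) / len(after_hits) - sum(after_misses) / len(after_misses)

def average_streak_diff(n_shots=100, k=3, p=0.5, n_players=5000, seed=1):
    """Average the per-player estimate over many simulated players with no hot hand."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_players):
        shots = [rng.random() < p for _ in range(n_shots)]  # independent shots
        d = streak_diff(shots, k)
        if d is not None:  # skip players for whom the estimate is undefined
            diffs.append(d)
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    print("average estimated hot-hand effect:", average_streak_diff())
```

Raising `n_shots` should shrink the bias, and raising `k` should make it worse, which is why "our samples were fairly large" is not a safe defense at realistic sequence lengths.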

None of this is new, and Gilovich is aware of these criticisms. Yet in this Freakonomics interview, he chose to do a full Cuddy, as it were, and not even acknowledge the demonstrated problems with his work.

That’s too bad, especially given that his 1985 paper has an important place in the history of our understanding of cognitive illusions, and its errors are subtle. There’s no shame in making a mistake.

What’s with the refusal to acknowledge error? Is it something in the water at Cornell University? Is there some bar in Ithaca where Daryl Bem, Brian Wansink, and Thomas Gilovich go to complain about the fickle nature of the science press?

This might be unfair to Gilovich, though. All I have to work with is the transcript of the interview, and for all I know he went on like this:

GILOVICH: Actually, though, we were wrong! Miller and Sanjurjo showed that our data were consistent with a hot hand all along. The problem was that our correlation-based estimate, which seemed so reasonable, was actually a biased and noisy estimate of the hot hand. My bad for not noticing this back in 1985. But, hey, both problems—the bias and the variance—were subtle. In my defense, the brilliant Amos Tversky didn’t catch these problems either.

DUBNER: Yup, almost everyone missed it. As late as 2014, Gelman was pooh-poohing the idea that the hot hand could amount to much. Anyway, it’s great to have the opportunity to correct the record now, here on Freakonomics radio!

And then maybe that last exchange got cut, for lack of space.

What’s goin on here?

Seriously, though, I see two problems here.

First, Gilovich has seen that serious scholars have criticized his hot-hand claims. The criticisms are real, and they’re spectacular, and it’s not like Gilovich has any refutation, he’s just bobbing and weaving. Then a reporter comes to him for a feature story and he presents it completely straight, as if it’s 1985 again, Run DMC on the beatbox, the Cosby Show on TV, Oliver North sending weapons to the Ayatollah, and New Coke in every 7-11 in the country. What kind of attitude is that?

Second, the Freakonomics team didn’t think to even check.

OK, this thing jumped out at me because I’m a statistician and I’ve spent a lot of time thinking about the hot hand. But you don’t need to be an expert to know about this.

Suppose you’re running a radio show and you’re going to interview someone about a particular topic. (And this one didn’t come as a surprise: if you check the transcript, you’ll see that it’s Dubner, not Gilovich, who first brings up the “hot hand.”)

What do you do? You quickly research the topic? How? First step is Google, right?

(I ran the search in anonymous mode so I don’t think it’s using my own search history.)

– First item is the Wikipedia entry which right away describes the hot hand as an “allegedly fallacious belief” and includes an entire section on recent research in support of the hot hand (i.e., disagreeing with Gilovich’s claims).
– The second item is a news article saying that the hot hand is real, and featuring the work of Miller and Sanjurjo.
– The third item is the Gilovich et al. paper from 1985.
– The fourth item is another news article saying the hot hand is real.

So, it would be hard to google the topic and not come to the conclusion that Gilovich’s claims are, at best, controversial.

Yet this did not come up in the interview. The Freakonomics team was not doing their job. Either they showed up for an interview without doing the simplest Google search, or they did know about the controversy but let their interviewee get away with misrepresenting the state of knowledge.

Too bad. It would’ve been easy for the interviewer to follow up with something like,

DUBNER: That all sounds good, but in preparation for this interview, I looked up the hot hand and I saw that your claims are controversial. Nowadays a lot of people are saying that the hot hand is real, and there seems to be general agreement that the conclusions from your 1985 paper were a mistake, turning on a subtle probability error.

Then Gilovich could reply, maybe something like this:

GILOVICH: Yeah, I’ve heard about this. I’m not a stats expert myself so, sure, I could believe that there were some subtleties in the estimation and the power analysis that we missed. Still, when you look at our data, players and spectators seem sooo convinced that the hot hand is huge, and even if it’s real, it’s hard for me to believe that it’s as big as people think. So I’ll hold to my larger point that people overinterpret random events. That’s the real message of our project.

DUBNER: OK, now let’s talk for just a minute about your work in happiness or hedonic studies. . . .

OK, having written this, I can see how Dubner might not have wanted to include such an exchange in the interview, as it’s a bit of a distraction from the main point. But, really, I think you have to do it. The exchange doesn’t make Gilovich look so good, as it forces him to backpedal, but ultimately that’s Gilovich’s fault as he was the one to overstate his conclusions in the interview.

I guess this happens in political and celebrity interviews all the time: the interviewee says something false, and then the interviewer has to choose between letting a false statement go by unchallenged, or else blowing the whistle and losing the trust of the interviewee.

But when you’re interviewing a scientist, it should be different.

Look. When I talk about my own research, I try to be complete and open but I’m sure that I do some hype, I don’t dwell on my failures, and there must be times that I don’t get around to mentioning some serious criticisms. I should do better. If I’m being interviewed, I’d appreciate the interviewer calling me on these things. Not that I want to be hassled—I’m not planning to go on Troll TV anytime soon—but if I say something false or incomplete, that’s on me. I’d like to have a chance to explain, in the way that the (hypothetical) Gilovich did above.

1. Isaac says:

It would take more than one study and team to overturn the large literature (not just Gilovich’s research) failing to detect a hot-hand effect using different methods and approaches. Yes, future studies should address weaknesses of prior studies, including any biases in estimates, and attempt to replicate Miller and Sanjurjo’s findings. At the same time, even more recent methods than Miller and Sanjurjo’s (including Bayesian ones) fail to easily detect a hot hand: https://www.sciencedirect.com/science/article/pii/S0022249615000814. Moreover, the more important question may be, if such an effect exists, how consequential is it? The hot hand, if it exists at all, likely has a very small effect size.

• Andrew says:

Isaac:

See my point #2 above. Low correlations do not imply that the probabilities don’t vary much; you get low correlations because if you simply model hits and misses, you don’t have an accurate measure of the current state of hotness. In short, I’ve not seen any evidence for your claim that “The hot hand, if it exists at all, likely has a very small effect size.”
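To make the attenuation point concrete, here is a rough sketch with entirely made-up numbers: a shooter who flips between a hidden "hot" state (60% shooter) and a "not hot" state (40% shooter), staying in the current state with probability 0.9 from one shot to the next. These values and the function names are assumptions for illustration, not estimates from any study. The sequence is long, so the small-sample bias from point 1 is negligible and what remains is the measurement-error problem.

```python
import random

def simulate(n_shots=200_000, p_hot=0.6, p_cold=0.4, stay=0.9, seed=2):
    """Simulate shots from a two-state shooter; all parameters are hypothetical."""
    rng = random.Random(seed)
    hot = rng.random() < 0.5
    states, hits = [], []
    for _ in range(n_shots):
        if rng.random() > stay:  # occasionally switch the hidden state
            hot = not hot
        states.append(hot)
        hits.append(rng.random() < (p_hot if hot else p_cold))
    return states, hits

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

def compare(states, hits):
    """Return (gap using the true hidden state, gap using only the previous shot)."""
    true_gap = (rate([h for s, h in zip(states, hits) if s])
                - rate([h for s, h in zip(states, hits) if not s]))
    after_hit = [hits[i] for i in range(1, len(hits)) if hits[i - 1]]
    after_miss = [hits[i] for i in range(1, len(hits)) if not hits[i - 1]]
    return true_gap, rate(after_hit) - rate(after_miss)

if __name__ == "__main__":
    true_gap, observed_gap = compare(*simulate())
    print("gap by true state:   ", round(true_gap, 3))
    print("gap by previous shot:", round(observed_gap, 3))
```

Under these assumptions the gap conditional on the previous shot comes out as a small fraction of the gap conditional on the true state, even though the underlying probabilities differ by 20 percentage points.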

• Isaac says:

Even the studies that find a hot-hand effect show a small effect size:

An effect with a small effect size can still be important and meaningful; however, the evidence suggests that little (if any) of the variance in performance is attributable to a hot-hand effect.

• Andrew says:

Isaac:

1. It looks like Andrew Bocskocsky, John Ezekowitz, and Carolyn Stein are measuring current hotness status using the previous 4 shots, which is fine but it is still inherently a noisy measure of hotness, hence estimated results will be attenuated toward zero (that was my point #2 above).

2. Czienskowski, Raab, and Bar-Eli are doing a meta-analysis, which means their results are subject to all the biases in the original studies. For example, they use the Gilovich et al. (1985) results straight-up, even though we know those estimates are waaay biased because of points #1 and 2 above. So I consider this meta-analysis to be essentially valueless.

• Curious says:

This is an example of why much of psychology finds itself in the current replication predicament. When confronted with logical flaws in the methods used, they produce meta-analyses of studies that used these flawed methods to support their argument. It is as if there is a widespread belief that incorrect evidence in sufficient quantity overcomes its inherent incorrectness.

• Isaac says:

Yes, studies of behavior use noisy measures that can suppress associations. The presence of measurement error isn’t sufficient, by itself, to establish that a predictor has a large effect size. I haven’t seen evidence of the hot-hand effect being a large effect (with any measure), despite considerable evidence to the contrary, using different measures (with varying degrees of measurement error), methods, and approaches–many studies used a different approach than Gilovich. I agree that future studies should improve upon the measurement, address biased estimates, etc. However, the totality of the currently available evidence suggests that the effect, if it exists, is small.

• Curious says:

For the sake of reason and discussion, let’s agree to accept the reality that Andrew actually proved the existence of the effect and that alternative measures do not contradict this fact, despite a seeming desire by some to hold out for the possibility that illogic will defeat logic at some point. Let’s further accept that the causal effect is small in terms of points scored.

That said, if the variation observed in this small effect can mean the difference between a win and a loss and when that win occurs at the single elimination point in a season, then the small effect actually provides large returns which arguably could be described as a large substantive effect.

• Isaac

You are right that measurement error isn’t sufficient to establish effect size.

You say there is “considerable evidence to the contrary.” While it is true that there are abundantly many basketball studies that find small effect sizes (or none), this doesn’t constitute *evidence* that the effect size is small. These studies are typically done on game data, which is messy on top of the measurement error issue. At best they can pick up a signal of hot hand, but the estimates should not be treated as effect size. If you are doubtful about this, I will copy paste below some STATA code that we have lying around and adapted it a little to address your point.

In controlled studies (NBA 3point, Cornell shooters of GVT, etc) the effect size estimates are conservative and still meaningfully large (e.g. >10pp). The task of shooting a basketball in a three point contest, while different than in a game in some respects, is not that far away. Since this is the only real evidence we have, the totality of the currently available evidence strongly indicates that the effect exists, and moreover, the point estimates are meaningfully large.

• Joshua B. Miller says:

Isaac (and others)

For whatever reason I couldn’t preserve formatting in the comments, so here is a pdf which shows exactly why these estimates of effect size you cite do not constitute evidence that the effect size is small.

• Kim Kaivanto says:

My understanding of the Miller & Sanjurjo result is that much, if not the entirety, of the hot-hand literature has been employing an incorrect operationalization of the null hypothesis of ‘no streakiness’. Hence calculations of effect size also need to be revisited. Given that M&S’s paper came out only in 2015, no meta-analysis from before 2015 can possibly be cited as evidence against M&S’s result either.
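One way to operationalize the null of 'no streakiness' that does not rely on the statistic being centered at zero is a permutation test: shuffle the player's own shots many times and see where the observed statistic falls among the shuffled values. The sketch below is only an illustration of that general idea, not the procedure from any particular paper; the function names, the streak length of 3, and the artificially streaky example sequence are all assumptions.

```python
import random

def streak_diff(shots, k):
    """Hit rate after k straight hits minus hit rate after k straight misses (None if undefined)."""
    after_hits, after_misses = [], []
    for i in range(k, len(shots)):
        window = shots[i - k:i]
        if all(window):
            after_hits.append(shots[i])
        elif not any(window):
            after_misses.append(shots[i])
    if not after_hits or not after_misses:
        return None
    return sum(after_hits) / len(after_hits) - sum(after_misses) / len(after_misses)

def permutation_test(shots, k=3, n_perm=2000, seed=3):
    """Return (one-sided p-value, mean of the statistic under shuffling)."""
    rng = random.Random(seed)
    observed = streak_diff(shots, k)
    if observed is None:
        raise ValueError("statistic is undefined for this sequence")
    shuffled = list(shots)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same hits and misses, random order
        d = streak_diff(shuffled, k)
        if d is not None:
            null.append(d)
    p_value = sum(d >= observed for d in null) / len(null)
    return p_value, sum(null) / len(null)

if __name__ == "__main__":
    streaky = ([1] * 10 + [0] * 10) * 5  # hypothetical, deliberately streaky shooter
    print(permutation_test(streaky))
```

The second returned value is the point of interest here: the shuffled statistic averages below zero, so comparing an observed difference against zero rather than against the shuffled distribution understates the evidence for streakiness.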

• Anonymous says:

Curious: +1

• Rahul says:

>>>I’ve not seen any evidence for your claim that “The hot hand, if it exists at all, likely has a very small effect size.”<<<

Andrew:

What's your best estimate for the effect size of the hot hand?

Actually, this is funny because last time I dug into this it seems there isn't even any agreement about a rigorous definition of what constitutes a "hot hand", right?

• Andrew says:

Rahul:

I agree that the hot hand can be defined in different ways. Roughly speaking, I’d define it as the difference in probability of success, before taking the shot, comparing two otherwise-identical shots, but one is taken when you’re “hot” and one when you’re not. “Hot” would be defined based on some latent variable, not simply based on whether you made your last shot or last two shots.

What’s my best estimate? I don’t have a great estimate. I think it must vary a lot by player and by scenario. I’d guess that it could easily be 0.1 or more, but that in some situations it could be a lot less. The data I’ve seen, seem to be consistent with a hot hand, but it’s hard to get a sense of how large it is, because the estimates of the latent “hotness” status are themselves so noisy. I don’t think it’s been proven that the hot hand effect is large, nor do I think there’s any good evidence that it’s small. It’s just hard to measure.

• Rahul says:

The entire definition seems very ambiguous and fuzzy. Isn’t what you term as “some latent variable” too broad?

This is why I feel the whole “hot hands” debate is very unscientific. You cannot argue productively about a concept which is imprecisely defined, highly variable from player to player and at the same time possibly a tiny effect.

How can we even argue whether or not “Phenomenon X has a small effect size” when it is very likely that depending on who / which latent variable etc. the effect size is going to vary all over the place?

• Rahul

Players, fans and coaches certainly don’t have an agreed-upon definition. Gilovich, Vallone, and Tversky called attention to this and didn’t take a firm position, but they didn’t need to. Their idea was to show the hot hand was a cognitive illusion, i.e. people were believing in something that wasn’t there. They used a variety of tests that were intended to detect the different patterns that they expected to be related to what they called “non-stationarity” (streaks) and “positive association” (conditional probability of success).

What GVT call “non-stationarity” refers to what players and fans call zone, flow, rhythm, and groove. These descriptions evoke the idea that a player’s underlying ability can shift, i.e. $latex Pr(\text{success})$ is moving around. Here one wants to estimate $latex Pr(\text{success} |\text{hot})$ and compare it to the estimate of $latex Pr(\text{success} |\text{not hot})$. This seems naturally modeled with a time-varying ability parameter, whether it be continuous or discrete (e.g. a Markov chain). Precisely estimating this seems impossible, and it depends on structural assumptions.

The subsequent literature in basketball mostly focuses on measures of positive association between shot outcomes, in particular estimates of $latex Pr(\text{success} |\text{recent success})$. The fact that researchers used this as a measure of hot hand offers support for the idea that people’s intuitive approach to data leads them to over-interpret noise. Recent success is a noisy measure of one’s current state (i.e. current probability of success). In this case researchers over-interpreted noise as evidence against the hot hand.

Now estimating this data-driven notion of hot hand, $latex Pr(\text{success} |\text{recent success})- Pr(\text{success} |\text{recent failure})$, isn’t completely silly. It captures one notion of hot hand, and that is positive feedback, i.e. “success breeds success.” Unfortunately game data is too messy to get much out of these as *effect size* estimates. BUT, if you try really hard, and you have really nice data, you can pick up a signal of hot hand despite all the noise, and that is what Andrew Bocskocsky, John Ezekowitz, and Carolyn Stein were able to do.

For controlled studies, measurement error, while still bad, is less bad because shots are taken closer together in time, and attenuation from strategic confounds is non-existent. As long as one corrects for the bias, you can get an estimate, and this estimate is *conservative*. Our point estimate of this effect for the *average* player is around 8 percentage points in the Three point contest with $latex \approx$30 shooters, 7 percentage points in Jagacinski et al. (1979) with 6 shooters, 4 percentage points in the two-phased study we ran with 8 Spanish players (some players were higher, and this was predicted out of sample). Interestingly, the highest (bias-corrected) point estimate we found was 13 percentage points in GVT’s original study with 26 players (recent success = streaks of 3+). We discuss this in Section 3 of our paper, and the bias correction in Section E. Of course these are point estimates and the data is binary, so it is not as precisely estimated as one might like. For example, the standard error in GVT’s study for the 13pp average effect is 4.7pp.

• Rahul says:

Thanks. You mentioned “recent success = streaks of 3+”.

How robust are your point estimates to the definition of “recent success”? If instead of 3, you used streaks of 2, 4 etc. how stable is your structural estimate?

• Rahul

It was robust. Quickly checking footnote 36 of our paper, for GVT it was +10pp after 4 or more (which 3 or more includes), and +5.4pp after 2 or more. There is a trade-off: the higher your cut-off for streak, the less the measurement error, but you have less data, and the bias is stronger.

Not sure what you mean by our structural estimate–do you mean the bias correction? If so, the correction is conservative, it can only go up with the most reasonable alternative DGPs (see Appendix E).

• Rahul says:

The fundamental problem here is to take some metric that’s fuzzily defined (“hot hand”) & highly variable from game to game, person to person & time to time and then to make a blanket statement about it (“hot hand exists” or “there is no evidence that the effect size is small”).

It’s a recipe for disaster and no wonder it breeds the kind of interminable academic arguments we are seeing.

• Curious says:

Rahul:

What is ill-defined about *the probability of making the next shot*?

• Glen M. Sizemore says:

“What is ill-defined about *the probability of making the next shot*?”

But Andrew and some others appear to want to talk about something unobservable (e.g., “hotness”) that is the cause of temporally- or sequentially-local accuracies. Now, “hotness” (and presumably “coldness”) is a function of variables that are unspecified. The “hotness” is inferred from, in part, the occurrence of locally high accuracies, and then the inferred “hotness” is offered as the cause of that from which it was inferred. This strategy is everything that is wrong with mainstream psychology (a bit of hyperbole – there is SO much wrong with mainstream psychology). And statistics won’t fix this. But perhaps I have inaccurately characterized Andrew’s position…

• Corey says:

You missed the part where the time-varying hotness model is compared to the time-constant shot success probability model.

2. Dale Lehman says:

If he had said this in a courtroom, it is likely that he would have been forced to address the controversy. Courtrooms are far from perfect, but we can’t seem to create the same kinds of checks and balances in either academia or the media.

• Curious says:

Courts don’t seem very different to academia:

1. An assertion from an expert — similar to a top tier journal article.
2. A contradictory assertion from an opposing expert — similar to a critique of that article.
3. A judge coordinates the presentation of evidence — similar to editors and reviewers.
4. “Truth” is determined by juries largely lacking the knowledge and skills to analyze the data and inferences presented and thus which side is right is decided primarily based on conventional wisdom — similar to the majority of any discipline who were trained in and continue to believe flawed methods and logic embedded in the conventional wisdom of the field.

• Rahul says:

I see two big differences: Vigorous Representation & Adversarial nature.

In most academic publishing no one really cares. You can usually say a lot of crap and it will pass without very close examination.

In courts, there’s usually a very motivated counter-party looking to tear down any gaps in your argument.

• Martha (Smith) says:

Rahul:”In courts, there’s usually a very motivated counter-party looking to tear down any gaps in your argument.”

“Usually” is too strong. It might fit if you add “when the opposing party has enough money to hire a good attorney”. But when the opposing party is a poor person — they rarely have any real expectation of having good opposition to a prosecutor or someone bringing a dubious claim against them.

• Dale Lehman says:

Martha
That is certainly true and worth noting (perhaps even emphasizing). I was speaking from my experience in regulatory proceedings where everyone has enough money to contest any testimony. Even there, things are far from perfect and I don’t want to misrepresent the court proceedings as some sort of model. But compared to academia, it is light years ahead. Having referees and editors act the part of judges is not the same thing as having a legal process where decisions generally need to be justified and there is always the threat of appeal.

• Andrew says:

Dale:

I was a juror in a slip-and-fall case once. It was terrible. The two sides could’ve been making up just about anything they were saying—we had no way of checking. Even after it was all over I have no idea what to believe about the plaintiff’s case. It was really creepy, kind of like the whole fake-news thing we’ve been hearing about lately.

• Rahul says:

*You*, as jurors, may have no way of checking, yes. But the other side is who’s supposed to do the checking, right? If Party-A makes up some absolute bull-shit Party-B is supposed to be the one that catches them at it?

• Keith O'Rourke says:

Part of the challenge is that adversarial court proceedings are not trying to cause understanding of what happened but rather just causing the jurors/judges to agree with one case being put forward.

That displaces the value of any statistical/scientific argument that jurors/judges would have difficulty understanding and apparently also statistical/scientific arguments about how to best modify current court proceedings https://clp.law.harvard.edu/clp-research/access-to-justice/

• Rahul says:

@Martha

Aren’t you confusing competence / ability with motivation?

i.e. There usually is a very motivated counter party but it may not always be very skilled or have resources etc. e.g. Your example of a poor person vs an expensive legal firm.

I think this is still different than academic publishing: There just isn’t any counter party very often, forget about motivation or not. You write some crap and essentially no-one has the motivation even to question it critically.

3. Ibn says:

This seems to be a very fundamental problem, as eliminating a published result typically creates huge existential problems for someone, either the PI, but often some postdoc or student. It should not necessarily be so: without the original hot hand negative result, the more nuanced analysis probably would not have come about, and there’d be no progress in understanding either the phenomenon or the methodology. Naive first approaches are often needed to trigger deeper thinking about a question. So Gilovich’s original 1985 paper should be a valid CV item. The problem gets ugly when we are talking about people who entered research without any kind of formal training (e.g. almost everyone in psychology), and they produce fake results with studies designed so poorly that they don’t have a chance to begin with, as Andrew often mentions. In these cases, like Wansink’s, there is obviously no other way out than leaving research, leaving behind the battered bodies of postdocs and grad students along the way. This is ugly, and it might be the case that some people fear an outcome like this, and they don’t want to try with the whole ‘admitting errors’ roulette.

I think the situation could only be made better by significantly increasing the training requirements of getting into a research position in all areas of inquiry that are lagging in this regard. One other problem with this is that there are legions of people working in (mostly experimental, biomedical) research whose compensation is in large part being called a scientist. If you take that away, research becomes much more expensive, as no one will do the gruelling part when being called a technician. But this way we have to let basically anyone in. I don’t know the solution to the problem. (Well, another aspect is that glorified technicians often become PIs, as there are many of them, and they burn funding on misguided studies, using e.g. fMRI, which is expensive as hell. So reducing this kind of behaviour would save a lot of money. Ok, I’ll stop now, this issue is complicated.)

4. Jordan Anaya says:

I’m pleasantly surprised by this post. Prior to pizzagate I didn’t follow any psychology news or blogs, but I do consume a lot of NBA news/blogs/podcasts, so I had actually heard that there was research that showed the hot hand was just a fallacy, but I had also heard that the topic was controversial, maybe from someone at Nylon Calculus.

As someone who has played a lot of basketball, and watches a lot of basketball, I can’t be convinced the hot hand does not exist. I mean look at this clip. Dwyane Wade is literally yelling at his hand it is so hot.

5. Glen M. Sizemore says:

Wouldn’t the first thing to do, with large samples from individual players, be to examine the goals per opportunity function? Is it obviously different than “flat”? Am I missing something? Wouldn’t be the first time…

6. MJT says:

glad you’re fighting the good fight on this

any expert still claiming the hot hand does not exist does a disservice to the credibility of the stats metrics analysis field.

from a pro athlete angle, (even retired pro athletes) these guys work on their jumpshot daily in order to tune every minute detail of shooting mechanics.

https://www.instagram.com/p/BSSa2SEARUS/

even at the amateur level, you can physically feel it in your forearm if your shooting form was slightly off. on the flip side, the positive feedback you get from making a shot will reinforce you to preserve that shooting mechanic the next trip down.

in the future, a reinforcement learning type of analysis would be interesting to re examine this.

in the present, i really like the exposition of how this specific bias leeches out based on the sequential DGP

7. Peter Ellis says:

Is there a publicly available large dataset for exploring this interesting issue?

8. I think this is a really important point about the way science should work.

There’s no harm in being wrong; we all make mistakes. The harm comes in not admitting an honest mistake. Heck, sometimes it’s not even a mistake but noisy or limited data. We should be able to acknowledge new evidence without feeling threatened.

• Z Basehore says:

+1

I think that speaks in part to the ‘incentives’ problem.

I’ve observed that many times, people are naturally defensive when shown to be wrong. I’m not exempt from behaving this way myself; none of us are! Then, compound that tendency with the fact that a retracted/error-filled publication can endanger someone’s tenure (and thereby their career stability), and you have a recipe for a refusal to accept the possibility of error in one’s own work.

The English clergyman Matthew Henry made the famous (and completely on-point!) observation in the 1600s that “…it is the corrupt bias of the will that bribes and besots the understanding: none so blind as those that will not see.” [See: https://www.biblegateway.com/resources/matthew-henry/Jer.5.20-Jer.5.24]

9. Elin says:

As far as I know, none of these studies has done anything like study this issue “That feeling very surprisingly doesn’t predict how you’re going to do in the next shot or the next several shots —”.

• Elin:

GVT had Cornell shooters predict their own shots and those of their teammates (with a betting task). GVT found that players couldn’t predict successfully. Unfortunately their analysis was carried out incorrectly, and when we correct for it, there is compelling evidence that Cornell players can predict at better-than-chance rates, and meaningfully better. See Section 3.4 of our paper. There is a surprising fact associated with this part of GVT’s study, which we shared last year, and you can find it in an earlier blog post here: “Low correlation of predictions and outcomes is no evidence against hot hand”. Essentially it’s measurement error in the opposite direction. A predictor can detect the hot hand without error, but you measure the predictor’s accuracy with error, because you only see success/failure, not hot/cold. Even if a predictor is perfect at identifying the hot hand, there is a ceiling on how accurate a predictor can be in terms of outcomes.

• Rahul says:

>>> A predictor can detect the hot hand without error, but you measure the predictor’s accuracy with error, because you only see success/failure, not hot/cold. <<<

I'm not sure I'm understanding this correctly: You have a "predictor" here (hot/cold) that you can only measure indirectly via the outcome you are trying to predict (success/failure)?

How do you use such a predictor to predict the same outcome (which is the only way to measure the predictor in the first place)?

• Glen M. Sizemore says:

Rahul: I’m not sure I’m understanding this correctly: You have a “predictor” here (hot/cold) that you can only measure indirectly via the outcome you are trying to predict (success/failure)?

How do you use such a predictor to predict the same outcome (which is the only way to measure the predictor in the first place)?

GS: I’m pretty sure “predictor” here pertains to the shooter – Elin is talking about the shooter being able to observe the inner hotness and predict his or her accuracy.

• Rahul, Glen

sorry, I wasn’t clear. Bettors bet on the shot outcome of a shooter (or his or her own shot). In the toy example, the bettor can see perfectly when the shooter is hot and bets hit, and when the shooter is not hot and bets miss. In the example linked it was Pr(hit|hot)=.55 and Pr(hit|not hot)=.45, and Pr(hot)=.13. So bettor is good at predicting hot state, which is relevant, but not so good at predicting hits.
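To make the toy example concrete, here is a minimal sketch of the arithmetic, using only the three numbers quoted above. The function name `bettor_summary` is made up for illustration, and the bettor is assumed to bet “hit” exactly when the shooter is hot.

```python
from math import sqrt

def bettor_summary(p_hot, p_hit_hot, p_hit_cold):
    """Bettor sees the hot state perfectly and bets 'hit' iff the shooter is hot."""
    # Overall hit rate, averaging over hot and not-hot shots.
    p_hit = p_hot * p_hit_hot + (1 - p_hot) * p_hit_cold
    # Share of bets that turn out right: hits when hot, misses when not hot.
    accuracy = p_hot * p_hit_hot + (1 - p_hot) * (1 - p_hit_cold)
    # Correlation between the bet (1 = bet hit) and the outcome (1 = hit).
    cov = p_hot * p_hit_hot - p_hot * p_hit
    corr = cov / sqrt(p_hot * (1 - p_hot) * p_hit * (1 - p_hit))
    return p_hit, accuracy, corr

p_hit, accuracy, corr = bettor_summary(p_hot=0.13, p_hit_hot=0.55, p_hit_cold=0.45)
print(f"hit rate {p_hit:.3f}, bet accuracy {accuracy:.3f}, correlation {corr:.3f}")
```

Under these assumptions the bettor is right on only 55% of bets and the correlation between bets and outcomes is well below 0.1, even though the hot state is identified without any error.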

• Rahul says:

>>>when we correct for it, there is compelling evidence that Cornell players can predict at better-than-chance rates, and meaningfully better. <<<

"Meaningfully better" meaning "statistically significant"? Or……?

In any case, can you summarize *how much* better than pure chance?

• Rahul

on “meaningfully better”

sorry lazy/sloppy.

In our paper, on p.24-25 (Nov. 15 2016 version), we report the result. The correlation between hits and predictions of hit is $\hat{\rho}=.07$ ($p<.001$, permutation test). Correlations are dimensionless, so it is not obvious on the surface that this is meaningfully large, but because the variances of hits and of predictions are close, this correlation is nearly equal to the coefficient in the regression of hits on predictions of hits, i.e. the difference in proportions $\hat{p}(\text{hit}|\text{predict hit})-\hat{p}(\text{hit}|\text{predict miss})$. This difference happens to be 7.7 percentage points with a standard error of 1.8pp, which has a nice interpretation, and is meaningfully large. More details in the footnotes.
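The link between the correlation and the difference in proportions can be checked with a small sketch. The counts below are hypothetical, chosen only for illustration (they are not the Cornell data), and `corr_and_diff` is a made-up helper name. For two 0/1 variables, the difference in proportions equals the correlation times sd(hit)/sd(prediction), so the two agree when the variances are close.

```python
from math import sqrt

def corr_and_diff(n11, n10, n01, n00):
    """2x2 counts: n11 = predicted hit & hit, n10 = predicted hit & miss,
    n01 = predicted miss & hit, n00 = predicted miss & miss."""
    n = n11 + n10 + n01 + n00
    p_pred = (n11 + n10) / n          # share of 'hit' predictions
    p_hit = (n11 + n01) / n           # share of hits
    cov = n11 / n - p_pred * p_hit
    var_pred = p_pred * (1 - p_pred)
    var_hit = p_hit * (1 - p_hit)
    corr = cov / sqrt(var_pred * var_hit)
    # Regression slope of hits on predictions = difference in proportions.
    diff = n11 / (n11 + n10) - n01 / (n01 + n00)
    sd_ratio = sqrt(var_hit / var_pred)
    return corr, diff, sd_ratio

# Hypothetical counts, for illustration only.
corr, diff, sd_ratio = corr_and_diff(n11=280, n10=220, n01=240, n00=260)
print(f"correlation {corr:.4f}, difference in proportions {diff:.4f}, sd ratio {sd_ratio:.4f}")
```

When `sd_ratio` is near 1, as it is when predictions and hits are both close to 50/50, the correlation can be read directly as a difference in hit rates.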

• Elin says:

Yes, this difference between success/failure and hot/cold is the issue. Couldn’t you have a string of baskets without being in the hot hand zone? To me, what the sentence I quoted implies is that they interviewed or in some other way gathered data from players mid game and mid streak about if they were in the zone and that this zone/no zone variable was not predictive. How would you even do that?

I think that your work is still saying that the hot hand is a cognitive illusion, just a different one. Meaning, that it’s an artifact of probability not a state that someone gets into.

• Elin, Rahul: You could imagine hotness is a mental state about reliability and accuracy of motor coordination. In other words, it’s a very real internal state, and is perceivable by the player or by teammates who are very familiar with the player. In Joshua’s example, if I understand it correctly, teammates are asked to bet based on their perception of another teammate’s “hotness”. The thing is that hotness could dramatically affect the probability of a hit, say from 0.45 to 0.55, and you still wouldn’t be able to detect it very well because in a short string of trials 0.55 isn’t that noticeably different from 0.45 in terms of the kinds of sequences of Hit/Miss you’d see (say in a string of 5 “hot” shots).

Now, over a full game, if you have 2 or 3 players who are “hot” at least half of the time… we’re talking about say 100 total shots, and maybe 50 of them are hot, and 10% more are being sunk when hot… that’s 5 baskets in a game, or 10 points, which is a substantial advantage compared to if they were “not hot” the entire game.

So I don’t think this is an illusion (in other words, “hotness” could be very real), it’s just that observing Hit/Miss is a poor way to determine if someone is “hot”.
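A rough sketch of both halves of this argument, assuming “10% more” means 10 percentage points (0.45 versus 0.55) and assuming two points per basket. It compares the distribution of hits in a string of 5 shots under each rate, then redoes the full-game arithmetic.

```python
from math import comb

def binom_pmf(n, p):
    """Probability of k hits in n independent shots, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

cold = binom_pmf(5, 0.45)
hot = binom_pmf(5, 0.55)
for k, (c, h) in enumerate(zip(cold, hot)):
    print(f"{k} hits in 5: not hot {c:.3f}, hot {h:.3f}")

# Total variation distance: how separable the two distributions are (0 = identical, 1 = disjoint).
tv = 0.5 * sum(abs(h - c) for h, c in zip(hot, cold))
print(f"total variation distance: {tv:.3f}")

# Full-game arithmetic from the comment: 50 hot shots, 10 points of accuracy each.
extra_baskets = 50 * (0.55 - 0.45)
extra_points = 2 * extra_baskets  # assumes two-point baskets
print(f"extra baskets {extra_baskets:.1f}, extra points {extra_points:.1f}")
```

The two distributions over 5 shots overlap heavily, which is the sense in which a short Hit/Miss string says little about the underlying state, while the same gap still adds up over a game.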

As Corey said above, the idea is that you could test the time-varying hotness model against a time-constant probability of success model, and in long sequences you could find that the time-varying model predicts the sequence better. Using additional information from a teammate observer could help as well. You’d want to compare the teammate observer’s prediction of hotness against a model in which they automatically predict hot after some number of hits in a row. It’s possible the teammate could predict hot even from just say one shot beautifully executed, it’s also possible that the observer is having a cognitive illusion based on just “multiple hits randomly in a row with constant hotness”
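One way such a comparison could look is sketched below: a two-state hidden Markov model (hot / not hot) scored against a constant-probability model on the same sequence. Everything here is an assumption for illustration. The 0.45 / 0.55 hit rates come from the toy example in this thread, while the state-persistence probabilities, the sequence length, and all function names are made up. The HMM is evaluated at the simulation’s own parameters rather than fitted, so this is not a fair model-selection procedure; a real analysis would fit both models and account for the extra parameters.

```python
import math
import random

def bernoulli_loglik(shots, p):
    """Log-likelihood of a 0/1 sequence under a constant hit probability p."""
    hits = sum(shots)
    return hits * math.log(p) + (len(shots) - hits) * math.log(1 - p)

def hmm_loglik(shots, p_cold, p_hot, stay_cold, stay_hot):
    """Forward algorithm for a 2-state HMM; state 0 = not hot, state 1 = hot."""
    emit = [p_cold, p_hot]
    trans = [[stay_cold, 1 - stay_cold], [1 - stay_hot, stay_hot]]
    # Start from the stationary distribution of the hidden chain.
    pi_hot = (1 - stay_cold) / ((1 - stay_cold) + (1 - stay_hot))
    belief = [1 - pi_hot, pi_hot]
    loglik = 0.0
    for s in shots:
        joint = [belief[k] * (emit[k] if s else 1 - emit[k]) for k in (0, 1)]
        norm = joint[0] + joint[1]          # P(this shot | earlier shots)
        loglik += math.log(norm)
        post = [j / norm for j in joint]    # state posterior after seeing the shot
        belief = [post[0] * trans[0][k] + post[1] * trans[1][k] for k in (0, 1)]
    return loglik

def simulate(n, p_cold, p_hot, stay_cold, stay_hot, rng):
    """Simulate n shots from the time-varying model, starting not hot."""
    hot, shots = False, []
    for _ in range(n):
        shots.append(1 if rng.random() < (p_hot if hot else p_cold) else 0)
        if rng.random() > (stay_hot if hot else stay_cold):
            hot = not hot
    return shots

rng = random.Random(0)
params = dict(p_cold=0.45, p_hot=0.55, stay_cold=0.95, stay_hot=0.90)  # persistence values are hypothetical
shots = simulate(5000, rng=rng, **params)
p_const = sum(shots) / len(shots)  # maximum-likelihood constant rate
gap = hmm_loglik(shots, **params) - bernoulli_loglik(shots, p_const)
print(f"log-likelihood gap (time-varying minus constant): {gap:.2f}")
```

Playing with the sequence length and the size of the hot/not-hot gap gives a feel for how much data is needed before the time-varying model pulls clearly ahead.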

An interesting experiment would be to show a teammate a video recording of the game, but whenever the player takes a shot, cut the video out so that the teammate doesn’t see the outcome or the few seconds afterwards (where they could infer the outcome). Then ask the teammate if they perceive a “hotness” in the actions of the player and see if their perception of hotness improves your prediction of the shot over a long game. Without the knowledge of whether the shot was hit or missed, you couldn’t have a “cognitive illusion” related to multiple hits, it’d just be whether the fluidity and mastery evidenced by the motion of the player up to the shot is valuable information.

• Glen M. Sizemore says:

“So I don’t think this is an illusion (in other words, ‘hotness’ could be very real), it’s just that observing Hit/Miss is a poor way to determine if someone is ‘hot’.”

GS: Yeah! If the measurements you are making (you know…of actual observable stuff) are inconvenient (they are “noisy” or don’t show what you want them to) just make up something unobservable. After all, doesn’t physics just make unobservable stuff up?

• Corey says:

Missing the point!

…which is that the aim is to show that the constant probability model cannot account for some features of the observed data and that the time-varying model can.

• It’s not unobservable to the player or to the teammates. The question is whether what the player or the player’s teammate observes (“flow state” or “hotness”) is in fact predictive of hits over the long run, or is a cognitive illusion (player “looks hot” but doesn’t actually do better during periods of “hotness”).

I give an example of an experiment in which the teammate is allowed to *observe* the motion of the player up to release of the ball but not the outcome. If “flow state” is observable through things like fluidity, quickness, or accuracy of motion, then the teammate will in fact *tell you* what they observe, and you can then see if their observations are associated with increased frequency of hits.

And, yes, physics does make up unobserved stuff all the time… Later they come up with some specific experimental techniques that can help to observe predicted effects, and this confirmation of the predicted effect is in fact evidence increasing the posterior probability of the model. The thing itself is essentially never observed (have you “seen” a nucleus recently?) but the effect of the thing is measurable.

In 1905 one of Einstein’s “miracle year” papers was about Brownian motion. At this point the atom was not a definite theory of matter, and Einstein used the unobserved atomic structure of matter to show that it could predict jiggling around of larger particles; he also used this idea to derive concepts related to osmotic pressure and to viscosity of fluids.

To explain the radiation of a black body, Max Planck assumed what was essentially a quantization of electromagnetic radiation. In the same year as his Brownian motion paper, Einstein posited that a quantum of electromagnetic radiation, now called a photon, could interact with electrons in a metal and knock free an electron, and that this radiation must have a particular sufficient energy to allow the electron to escape the matrix of the metal. This was known as the photoelectric effect, and the posited unobserved thing was a quantum of light called the photon. You can’t observe an individual photon directly, but you can observe that it has certain effects that are different from what would occur if light came as a continuum of energy. (You can’t observe hotness directly, but you can observe that it would have consequences for hits and misses.)

Rutherford’s experiment involving bombarding gold foil resulted in strange rebounds of alpha particles. This caused them to “make up” the nucleus as an explanation for the rebounds. Of course, you can’t observe a nucleus directly, but you can observe the consequences of a nucleus. (You can’t observe flow state directly, but you can observe the consequences for hits and misses.)

In the 1960’s Higgs “made up” the Higgs Boson to explain why some particles have mass. It wasn’t until recently in LHC collider experiments that sufficient evidence was collected of the existence of the Higgs Boson. Of course you can’t observe the Higgs Boson directly, but you can observe that it has consequences for the kinds of decays you’d expect (the Diphoton decay, and the 4-lepton decay https://en.wikipedia.org/wiki/Higgs_boson#Discovery_of_candidate_boson_at_CERN )

So, yes, exactly, that is what physics does. That is *exactly* how science proceeds, it posits unobserved causes that regularize the predictions of myriad effects.

• Andrew says:

Daniel:

Also, if we want to talk social and behavioral science, here’s a short list of important concepts that are not precisely defined, or whose definitions have changed in response to events:
– Democracy
– War
– Happiness
– Anger
– Musical ability
The list could go on forever.

• Right, just look at Miles Davis or Bill Evans and tell me “musical ability” isn’t a thing…. But as far as observable effects go, apparently my cell phone has as much “musical ability” as either of them, as it can play exact copies of everything they ever played!

• Anoneuoid says:

In the 1960’s Higgs “made up” the Higgs Boson to explain why some particles have mass.

I thought it was because they were calculating probabilities greater than 1 and less than zero, and people didn’t like that:

If you try to take the Higgs field out of the mathematics but keep the W and Z particles and the other heavy particles (such as the top quark) that we have already discovered and know are present in nature, you will find that the mathematics of the Standard Model simply makes no sense. You get a theory that predicts that certain processes (including ones that the LHC can study) occur with a probability bigger than one. Sorry, that can’t happen; it’s logically unsound. The probability of anything obviously cannot be bigger than one or less than zero. It might surprise you that it is very hard to write down logically sound theories. Most theories that you can imagine predict negative probabilities or probabilities bigger than one. Only a very, very few make sense. To restore the theory of the Standard Model to working order, you must add a Higgs field, or something like it, to the fields that we have already discovered experimentally.

https://profmattstrassler.com/articles-and-posts/the-higgs-particle/360-2/higgs-faq-1-0/

• Anoneuoid: I don’t pretend to know the details of Higgs bosons or particle physics at this level, but your link does describe how various particles (electrons, W, Z bosons etc) have mass precisely because of the Higgs field. My guess is that the nonsensical probabilities you mention are closely tied to the mass of certain particles.

• Glen M. Sizemore says:

“just look at Miles Davis or Bill Evans and tell me ‘musical ability’ isn’t a thing.”

GS: I know, hey! And just look at pancake syrup and tell me there’s no such thing as viscosity! Otherwise why would the damn stuff flow so slowly? And…just a word to the wise…watch out which medications you take…some will put you to sleep because they possess a “dormitive virtue.”

https://en.wiktionary.org/wiki/dormitive_virtue

• Glen: viscosity is an observed fact (things flow more or less slowly), to Einstein atoms were not an observed fact. But, at the same time, established models of how large particles like billiard balls move (Newton’s laws), together with assumptions he made about the existence of atoms showed a prediction of… you guessed it, observed facts about viscosity.

You know what? Causing sleepiness is an observed fact (that is, we give people say Benadryl, and the vast majority of them become sleepy). An H1 receptor in the brain is more or less unobservable (not directly at least) but you know what? Drugs which are observed to cause certain chemical assays to come out in certain ways which are consistent with our assumptions about the existence of H1 receptors and which enter the brain as an observed fact through experiments on animals using radioactive tagging assays, consistently cause sleepiness. So, I guess making up the idea of an H1 receptor wasn’t such a bad thing for the purpose of advancing the state of understanding of biochemistry and explaining why something has a “dormitive virtue”.

Doctors are particularly prone to the disease you seem to be railing against (A doctor friend of mine went to an orthopedist complaining of pain in his foot…. Oh, says the orthopedist, you have metatarsalgia.)

But renaming a thing in an abstract way and pretending it’s an explanation is not the same thing as saying “there are states of the world that we can’t see, but they are real, and they predict the outcomes of stuff”.

In fact, this is essentially 100% of Bayesian statistics, which could potentially be defined as the use of the algebra of probability for the estimation of the values of parameters (by definition unobservable) through the observable changes they are associated to regarding outcomes (I don’t want to say causes here, because sometimes it’s a causal model and sometimes it’s just associational). Bayes is 100% model driven, that’s what p(Data | Parameters) means. It’s an assumption about how unobserved things (parameters) would affect our knowledge of what Data values might obtain in the world.

• Glen M. Sizemore says:

DL: Glen: viscosity is an observed fact (things flow more or less slowly),[…]

GS: Err…no. Rate of flow is observed (well…calculated from observable stuff). There is nothing wrong with naming that property “viscosity.” But to think that something like the amount of “viscosity” determines the rate of flow is just plain silly. Ahh…welcome to the world of mainstream psychology – and I’m guessing, to the world of most statisticians.

DL: […]to Einstein atoms were not an observed fact. But, at the same time, established models of how large particles like billiard balls move (Newton’s laws), together with assumptions he made about the existence of atoms showed a prediction of… you guessed it, observed facts about viscosity.

GS: You mean “observed facts about flow rates,” right? Einstein did not validate “viscosity” as a cause of flow rate, which is where you’re really coming from. Of course, “viscosity” was never entertained as a cause like any number of “mental things” are in psychology (whether or not they are said to reside in the alleged mind or the brain). Like “hotness,” for example. Accuracy is related to, but not the same as, “hotness” – but “hotness” is the cause of local accuracy increases according to you. Right? Variables may “influence hotness,” but it is the degree of “hotness” that mostly determines whether or not there is an unusual surge in accuracy. The relationship is not perfect – or else the measured accuracies would suffice. No physicist would say that “viscosity exists,” but you are quite willing to say that, for example, “musical ability” exists when it is, quite clearly, a reification – a figment of circular reasoning.

DL: You know what? Causing sleepiness is an observed fact (that is, we give people say Benadryl, and the vast majority of them become sleepy).

DL: An H1 receptor in the brain is more or less unobservable (not directly at least) but you know what?

GS: Let’s just say that I’m prepared for what you are about to declare.

DL: Drugs which are observed to cause certain chemical assays to come out in certain ways which are consistent with our assumptions about the existence of H1 receptors and which enter the brain as an observed fact through experiments on animals using radioactive tagging assays, consistently cause sleepiness.

GS: Thanks for the heads up. I didn’t say that receptors were reifications – I said that “hotness” is and so is “musical ability” and “dormitive virtue” and so are a variety of “things” when they are described as causes of behavioral phenomena…that includes stuff like “knowledge,” “beliefs,” “expectations,” “intentions,” etc. etc. etc.

DL: So, I guess making up the idea of an H1 receptor wasn’t such a bad thing for the purpose of advancing the state of understanding of biochemistry and explaining why something has a “dormitive virtue”.

GS: First, you really need to read this paper:
http://meehl.umn.edu/sites/g/files/pua1696/f/013hypotheticalcconstructs.pdf

Then you have to actually think about the differences between how physics goes about positing hypothetical constructs and how psychology does. Finally, the issue is not whether or not something has a “dormitive virtue” – the issue is about claiming “dormitive virtue” is a thing or collection of things and then (LOL) claiming that “it” was remarkably prescient! But “dormitive virtue” was never a theory or hypothesis – it was simply a placeholder for whatever events occurred between cause and effect observed at one level of analysis. “It” specifies nothing other than that which has to be explained – not any aspects of the cause (like occupation or blockade of a receptor).

DL: Doctors are particularly prone to the disease you seem to be railing against (A doctor friend of mine went to an orthopedist complaining of pain in his foot…. Oh, says the orthopedist, you have metatarsalgia.)

GS: Gosh, I sure hope that that word means more than “pain in your foot.” Otherwise…you know…it can’t be a cause because…well…“it” would just be a name. But, unlike something like “dysgraphia,” “metatarsalgia” is probably a reference to actual structures. As for what I am railing against, there seems to be little indication that you know what that is.

DL: But renaming a thing in an abstract way and pretending it’s an explanation is not the same thing as saying “there are states of the world that we can’t see, but they are real, and they predict the outcomes of stuff”.

GS: The devil is in the details – superficially, hypostatisations resemble, formally, legitimate postulations of unobservable stuff. Just because there are atoms and receptors doesn’t mean that “beliefs” are things that can cause behavior.

DL: In fact, this is essentially 100% of Bayesian statistics, which could potentially be defined as the use of the algebra of probability for the estimation of the values of parameters (by definition unobservable) through the observable changes they are associated to regarding outcomes (I don’t want to say causes here, because sometimes it’s a causal model and sometimes it’s just associational). Bayes is 100% model driven, that’s what p(Data | Parameters) means. It’s an assumption about how unobserved things (parameters) would affect our knowledge of what Data values might obtain in the world.

GS: But the “unobservables” are, quite frankly, NOT THINGS – unless, say, a “mean” is a thing. You really need to read this paper:
http://meehl.umn.edu/sites/g/files/pua1696/f/013hypotheticalcconstructs.pdf

and then actually think about what you have been taught.

• Glen: I don’t do psychology, I have a PhD in Engineering and a BS in Mathematics, and I model real world causes of things, and I know the difference between an actual cause, and a renamed effect posing as a pseudocause, that’s why I made the quip about “metatarsalgia” which just means “a pain in the metatarsal bones of the foot”. That’s not a cause, that’s a symptom posing as a cause, just like “dormitive virtue” just means “it makes you sleepy” whereas “it binds to H1 receptors in the brain which makes you sleepy” is a causal explanation.

We seem to be talking past each other. As an Engineer I understand the concept of feedback control, and I see that a human playing basketball is a system under feedback control, and I see that the organism goes through a time-varying state because I am one of those organisms and I realize that my golf game isn’t as good if I have a hangover or whatever. There *is no* “binomial probability of success”, this is a fiction that helps us quantify Y/N outcomes, and it’s a poor fiction, and it isn’t sensitive to things like “periods of increased accuracy during which less than several thousand shots are taken” because the binomial “p” is an asymptotic concept (ie. the fraction of successes in a *very long* sequence of trials).

to me “hotness” isn’t a cause, “hotness” is a name for when the internal feedback control system of a basketball playing human is operating in a relatively more accurate region of phase space for a relatively prolonged period of time (like a whole game or a half a game or something… not just one shot).

So, I think we’re in agreement, “hotness” isn’t a cause. But, this doesn’t mean that there isn’t “hotness” it’s like saying because there is iron oxide in a paint and iron oxide reflects long wavelength visible light that there isn’t “redness”.

So, I don’t posit “hotness” as a cause of long sequences of high accuracy, I posit time-varying internal state of the brain’s feedback control systems that results in more fluidity, more grace, more quickness, more accuracy, more quick recovery, etc. And when a teammate sees all that stuff happening, I assume they say “Joe is really hot” just like when you pour a bunch of iron oxide into a paint base people look at it and say “the paint is really red”.

• Glen M. Sizemore says:

DL: Glen: I don’t do psychology,[…]

GS: You do when the dependent-variable is human behavior – or, at least, one can make a good argument that that is so.

DL: […]I have a PhD in Engineering and a BS in Mathematics, and I model real world causes of things, and I know the difference between an actual cause, and a renamed effect posing as a pseudocause, that’s why I made the quip about “metatarsalgia” which just means “a pain in the metatarsal bones of the foot”. That’s not a cause, that’s a symptom posing as a cause, just like “dormitive virtue” just means “it makes you sleepy” whereas “it binds to H1 receptors in the brain which makes you sleepy” is a causal explanation.

GS: But pointing to receptors doesn’t validate previous talk about “dormitive virtue.” That remains a silly reification.

DL: We seem to be talking past each other. As an Engineer I understand the concept of feedback control, and I see that a human playing basketball is a system under feedback control,[…]

GS: Actually, and importantly, the “feedback system” in question is not the human but, rather, the human embedded in an environment. No, that’s not exactly right…the specific feedback system consists of the response class and its consequences.

DL: […]and I see that the organism goes through a time-varying state because I am one of those organisms and I realize that my golf game isn’t as good if I have a hangover or whatever.

GS: Well…it is actually what is measured that “goes through the time-varying state” then, no? It isn’t the “organism” or its insides that assumes various loci in state-space.

DL: There *is no* “binomial probability of success”, this is a fiction that helps us quantify Y/N outcomes, and it’s a poor fiction, and it isn’t sensitive to things like “periods of increased accuracy during which less than several thousand shots are taken” because the binomial “p” is an asymptotic concept (ie. the fraction of successes in a *very long* sequence of trials).

GS: I’d have to think some about this just to make sure I know what you are saying and what I think about it. It almost sounds like you are attacking the use of accuracy measures in psychology…and…the “binomial probability of success” may not exist, but there is a sense in which what is represented by hits/total trials does. I’m probably missing your point…I do suspect more about what you are saying but I’ll keep it to myself for now.

DL: to me “hotness” isn’t a cause, “hotness” is a name for when the internal feedback control system of a basketball playing human is operating in a relatively more accurate region of phase space for a relatively prolonged period of time (like a whole game or a half a game or something… not just one shot).

GS: Now, see, I’m mighty suspicious about terms like “…the internal feedback control system of a basketball playing human.” Mighty suspicious indeed. It sounds like standard mentalism to me. You are measuring accuracy still, no matter what you say, and you are attributing increases in accuracy over some time frame to an unmeasured, unobserved inner “thing” that is the temporally-proximate cause of characteristics of behavior. How is it that you think that you are saying anything fundamentally different than: “…to me ‘hotness’ isn’t a cause, ‘hotness’ is a name for when the MIND is operating in a relatively more accurate region of phase space for a relatively prolonged period of time”?

DL: So, I think we’re in agreement, “hotness” isn’t a cause.

GS: Are we? I’m guessing that you are sort of saying that “hotness” is just a shorthand for the combined effects of several continuous variables that affect behavior and that the relevant behavioral measures are continuous as well. In any event, you are taking the “shorthand” route, right? But if so, then what you are really talking about are the combined effects of several independent-variables on aspects of behavior that ultimately result in the dichotomous hit or miss. But that means that, whatever the didactic utility of “hotness,” it isn’t a thing (hypothetical construct; you’re not gonna read that M&M paper are you?) but it can function like a made-up thing and curtail investigation of the actual variables. We do this in ordinary-language and it is why it is done in ostensibly-scientific endeavors. We ask why John did X, and we are told “because he believes that Y.” Now…that sounds like an explanation but the “belief” is either a thing (according to some people) or it is shorthand for other stuff (you know, like a person’s history?) – in any event, in neither case is the connection made between historical events and the behavior. This is, in part, also driven by the quest for talking about temporally-contiguous causation which fuels the reification of behavior as mental stuff. We infer a “belief” or “knowledge” from behavior and then say that the inferred thing is the cause.

DL: But, this doesn’t mean that there isn’t “hotness”[…]

GS: Actually, that’s exactly what it means.

DL: […]it’s like saying because there is iron oxide in a paint and iron oxide reflects long wavelength visible light that there isn’t “redness”.

GS: I could go along with that. There isn’t any such thing as redness. But “red” is still a name emitted under stimulus control of certain variables (and not just wavelength, though that is an important one). So is “belief.” So is “hot.”

DL: So, I don’t posit “hotness” as a cause of long sequences of high accuracy, I posit time-varying internal state of the brain’s feedback control systems that results in more fluidity, more grace, more quickness, more accuracy, more quick recovery, etc.

GS: Sorry…to my eye, you are saying exactly that hotness is a thing “…that results in more fluidity, more grace, more quickness, more accuracy, more quick recovery, etc.”

DL: And when a teammate sees all that stuff happening, I assume they say “Joe is really hot” just like when you pour a bunch of iron oxide into a paint base people look at it and say “the paint is really red”.

GS: OK…agreed. But right now we aren’t really talking about what a teammate says…we are talking about what you say and, with all due respect, it sounds like standard mentalism. Didn’t you say in a previous post somewhere that “hotness” is “mental” or “in the mind” or something like that?

• Rahul says:

@Daniel

I’m still struggling to understand how this is different from the Himmicanes stuff Andrew regularly criticizes.

The signal is tiny and the noise in the data seems to be swamping the signal in the actual data but we still cling to some tiny signal we can discern in an artificial control and brandish it as evidence of the phenomenon?

• In this case, statistics *predicts* that the observable signal would be very noisy even for “large” (say 10 percentage points) increases in the internal state. So, the Bayesian thing to do is to hold on to the possibility until you’ve accumulated a *very large* body of evidence that eventually allows you to distinguish what’s going on.

The “typical” frequentist NHST type analysis looks at a smallish dataset, assumes a null of no effect, finds that it can’t reject the null given that say p = 0.22 or whatever, and then falsely assumes the null is true and *there is no effect*.
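To put rough numbers on how noisy the signal is, here is a back-of-the-envelope power sketch. It makes a generous assumption that is not true of real data: every shot is perfectly labeled hot or not hot, with equal numbers of each, and a two-sided two-proportion z-test at the 5% level is used with a normal approximation. The 0.45 / 0.55 rates are the ones used earlier in this thread; `approx_power` is a made-up helper name.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(n_per_group, p_cold=0.45, p_hot=0.55, z_crit=1.96):
    """Approximate power of a two-sided z-test for a difference in hit rates."""
    se = sqrt(p_hot * (1 - p_hot) / n_per_group + p_cold * (1 - p_cold) / n_per_group)
    z = (p_hot - p_cold) / se
    # Probability the test statistic lands beyond either critical value.
    return normal_cdf(z - z_crit) + normal_cdf(-z - z_crit)

for n in (25, 50, 100, 400, 1000):
    print(f"{n:5d} shots per group: approximate power {approx_power(n):.2f}")
```

Even with perfect labels, a 10 percentage point gap is usually missed at a few dozen shots per group, so a non-rejection at that sample size says little about whether the effect exists.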

• @Rahul (and maybe @Daniel)

In what you refer to as the “artificial control” the signal is not small. Also, I see no relationship to himmicanes, I mean the two principal issues with himmicanes are: (1) forking paths, and (2) selection (outside of NHST). These aren’t big issues here, because: (a) we use statistics previous researchers used, (b) selection has plausibly operated in the opposite direction, and (c) the results replicate out-of-sample.

In any event, I agree that if you tie your hands and restrict yourself to game data, current studies either fail to find a signal (Gilovich, Vallone & Tversky (1985, Study 2), Huizinga & Weil (2009), Rao (2009)), or it appears small (Bocskocsky, Ezekowitz & Stein (2014)). As mentioned above, even if there existed game data that is cleaner and more controlled than you could ever expect, and the players had big hot hands, current methods of analysis wouldn’t pick up much, as this bit of code that we have illustrates so clearly (you are welcome to play with the parameters).

Now in settings where defense is not an issue—Gilovich, Vallone & Tversky (1985, Study 4-Cornell data), our Spain semi-pro data and Jagacinski et al., and the NBA’s Three Point Shooting contest—the effect size estimates are meaningfully large by basketball standards (5 p.p. to 13 p.p.). The degree of control here varies, as well as the degree of artificiality, yet the results are consistent.

Should we really restrict ourselves to analyses of game data when forming beliefs about the hot hand in game data? Aren’t we trying to be scientists here? Why can we not perform inductive inference to game data here? We have: (1) a strong estimated effect of a prior streak on subsequent success in relatively controlled settings, and (2) there is some evidence regarding underlying mechanisms, e.g. variation in motor control (steadiness & coordination) which, btw, can be influenced by amphetamines (and likely by their endogenous analogues), self-efficacy, flow.

10. I had a feeling, one of the most compelling feelings that I can have, that we were finally done with posts about hot hands. That feeling surprisingly didn’t predict what was going to happen with this post. It’s disappointing, on many levels.

11. rtd says:

Dubner has repeatedly shown himself to be an absolute hack. Levitt should’ve stopped attaching his name to the ‘Freakonomics’ franchise after book #2 – the podcast is a lot of Dubner hacking away & riding Levitt’s success.

12. Kit Joyce says:

I think you’re all missing the more important question which is: is the hot hand caused by incidentally assuming power poses in the course of the regular athletic movements made during the game?

• Alex says:

+100

• Jonathan (another one) says:

In the basketball games I watch, the power poses all come *after* the shot.

• Andrew says:

The real question is, are you more likely to make a 3-point shot if your name begins with 3?

• Jordan Anaya says:

Actually, a question that came up recently in the NBA was whether having a father who was a superstar prevents you from becoming a superstar.

LaVar Ball claimed LeBron’s kids will feel too much pressure to live up to his legacy so they won’t become superstars. I tend to think it’s just really rare to be a superstar, so the fact that a superstar has never had a son that is also a superstar is just basic probability, but it would be interesting to get some Stats Pros on this problem.

• Jonathan (another one) says:

The fact that a superstar has never had a son who is also a superstar is also an artifact of regression to the mean as well as the fact that few NBA players of any caliber are the offspring of multimillionaires for a number of well-understood environmental reasons. Baseball, on the other hand, has the Griffeys at least (although Griffey Sr is not in the Hall, or particularly close to superstar caliber) and a number of multigenerational good talents, if not a superstar line. Football has the Mannings, though, and that’s pretty close.

• Steve Sailer says:

Golf is surprisingly lacking in dynasties since Old and Young Tom Morris in the 19th Century.

Baseball used to have few dynasties, but then had a remarkable number in the late 20th Century with the Bonds, the Griffeys, etc. I haven’t checked in the last ten years to see if this is continuing, but I did notice about 5 years ago that 4 of the 25 Dodgers were sons of MLB players.

Basketball seems to be filling up with guys who are the offspring of both male and female college basketball players.

One question is whether any famous athletes are the secret illegitimate children of famous athletes. Probably some, but maybe not as many as we might think off the top of our heads.

There apparently haven’t been all that many famous people in history who were secret sons of other famous people. For example, there’s a reasonable but not proven theory that the painter Delacroix was the illegitimate son of the statesman Talleyrand, but most of the evidence (Talleyrand backed Delacroix’s career) is also congruent with Talleyrand just being a friend of the family.

I’ve long been on the lookout for other examples like this, but there don’t seem to be all that many.

• Steve Sailer says:

Auto racing has lots of dynasties because the job is kind of like being a political candidate or a movie producer: talent is involved, but also having name recognition and being a plausible leader for other talented people to work for.

13. Glen M. Sizemore says:

How would a natural science of behavior ask questions about the alleged “hot hand”?

First, what is the general nature of the problem? Shooting (and sometimes making) baskets is operant behavior that is (thus, by definition) affected by its consequences. Clearly, aspects of the behavior’s topography (how the behavior looks, as opposed to how it affects the environment) is a function of the effects that the form of responses has on the trajectory of the ball. This is how skilled (and sometimes not-so-skilled) operant response classes that are relevant to many sports (and other areas – the issue pertains to all operant behavior including, importantly, rats pressing levers) get produced. Initial responses are wildly variable but are not without effect upon the environment. A self-taught golfer hits the ball every which way, but the differences in trajectories differentially select forms of the response that produce those trajectories – “more toward the green” makes members of the class of responses that produced them more frequent (“positive reinforcement”), while “not toward the green” probably makes members of another class of responses occur less frequently (“positive punishment”). This is not rocket science – it is much harder, as the study of the behavior of animals is probably the most complex scientific endeavor of all time. Now…reinforcers, by definition, make more probable a class of responses where the class is defined by the “endpoints” of the members – by the effect of the members on the environment. Such responses may look different (sometimes considerably so – for example “opening a door” is an operant response class whose members may be of quite different topographies) but instances of members of the class are defined in terms of their consequences. It is important to note that a definition of “a response” can never be in terms of its form – it can never be strictly in terms of “physical movements” since each “occurrence” is different. 
Here “occurrence” is in quotes because, in terms of physical movements, there is no “response” that can be counted as “an occurrence” as no segment of the behavior stream is exactly like any other segment in terms of form. Just watch a rat emit food-reinforced lever-presses – no two are identical. But members (by definition) all have one thing in common – they close the microswitch mounted on the lever. And, of course, that is the definitional operant class. There is a lot of behavior that occurs because of reinforcement of members of the definitional operant that is not, itself, in the definitional response class (i.e., the “functional response class”). The rat will, for example, sometimes touch and depress the lever with insufficient force to close the microswitch. Such a “response” exists because of the reinforcement of lever-presses but is not, itself, a member of the definitional response class (which is defined by the closure of the microswitch). So, instead of the complexities of the basketball court, turn to the relative simplicity of the rat pressing a lever. Note I said “relative simplicity.” Now, if there was such a thing as the “hot hand,” it could be very, very complex. One need not see, for example, that hits per opportunity increase in general. It might be a rare but real phenomenon. But if it is real, it is something that takes place in the context of a complex dynamic system where certain movements result in certain consequences (and the probability is thus “strengthened” or “weakened”) but reinforcement also produces behavior that does not, in fact, have the same consequence. Behavior that results in the ball going in also results in the strengthening of behavior that results in hitting the rim. But “hitting the rim” likely exerts some punishment effect on the probability of similar responses…but similar responses include those that result in the ball going in! That is the nature of the scientific problem that we’re talking about here. 
So…say that in addition to looking at (counting, keeping track of, measuring) behavior that closes the microswitch we record the arc through which “failed lever-presses” depress the lever. Now…how do such responses change in terms of the arc over time? Imagine a 3d frequency graph where the axes are frequency, time, and arc. Now…look at that graph for, say, 30 experimental sessions. You would see order but you would also see the hills and valleys of the distribution of arcs drift around. If the “hot paw” is located anywhere, it is in such distributions over time. And, keep in mind, we have only been considering the arc. Any given rat presses levers with one paw, two paws, one on top one on the bottom, one paw and the snout, two paws while biting on the lever. All these features of behavior change over time and do so because of the dynamic system that results from a simple dependency between microswitch closures and food deliveries. How do such aspects of responding change over time and how do we describe the general features of the dynamic system wrought by the “simple” contingency of reinforcement in order to have a science relevant to lever-presses, key-pecks, jump-shots, and guitar riffs etc. etc. etc. If the “hot hand” exists, it exists somewhere in the realm of the complex dynamics of operant conditioning. OK…I’ll stop here. Apologies for any typos or unclear writing that I missed.

14. Rahul says:

Just because game data is too noisy to detect evidence of a “hot hand” the studies seem to use an artificial setting as a surrogate.

Doesn’t this raise big questions about external validity? When the effect itself is so small can we be confident that what we detect in the surrogate games is a good indicator of what happens in the real games?

• Yes, it does.

I think it’s wrong to assume that the “hot hand” is a definitely *proven* effect at this time. But it’s also very very wrong to assume that it’s definitely “proven” to be an illusion.

• Rahul: True that there is no guarantee that results will extrapolate. On the other hand one must consider (1) what one’s priors should be that the hot hand exists (i.e. pre-GVT), and what should inform them (I’d weight practitioner perspectives), and (2) that there are degrees of artificiality. In some controlled shooting situations the task is quite close, and the main differences from the game are (a) the defender and (b) the incentives. Even in games there are unguarded shots. In some semi-controlled shooting situations, like the Three Point Contest, the incentives are high.

Daniel: I think “proven” is too strong. The question is what should we believe given the available evidence. We know that there is no scientific way to measure the strength of the hot hand using game data (e.g., see our code here), so our beliefs have to come from somewhere else. Two things are true: (1) there is no justification for the “hot hand fallacy” view, (2) the best available evidence says that the hot hand exists, and it is meaningfully large, in the settings where it can be estimated. This is using data that GVT considered the critical test of hot hand shooting.

• I was saying they *weren’t* proven. Basically I am agreeing with you that we have reasons to believe that there are “hot hands” and also that there is no real justification of the “fallacy” notion, but we don’t have a lot of evidence to tell us much about hotness or its variability. See below where I discuss how the “fallacy” notion implies basically a “constant p” which is in essence an extremely precise and implausible prior on shot-taking accuracy as measured by regions of phase space for the initial conditions of the trajectory. It’s much more reasonable to believe that at various times players will have more or less precise and accurate ball control just because…. with humans stuff varies through time.

In fact, what is really the case is that Hit/Miss data gives an extremely imprecise tiny-bit of information about the time-averaged value of p and the “fallacy” people have taken “absence of evidence” as “evidence of absence” of time variation in p. In other words, they have effectively a very strong prior of no variation that they’ve hidden inside an implausible “null hypothesis”.

• Glen M. Sizemore says:

“…use an artificial setting as a surrogate.”

GS: LOL. Yeah…in another context such an “artificial setting” might be called a “laboratory.” Now, I be naught but a simple lad, but I done heard tell that them thar fancy laboratories can sometimes be mighty useful to them high-falootin’ scientists. I guess there’s no guarantee WRT “external validity” though. You pays your money and you takes your chances.

15. Dave Meyer says:

Miller & Sanjurjo? That’s Reggie Miller; right?!

16. Hot under the collar says:

Every physicist I know starts from the premise that the hot hand is a fallacy and refuses to listen to the contrary until presented with a decent explanation why it might not be. Players get tired, so it might be reasonable to expect the operative p to decline during a game or a season, but basically shooting a basketball is as close to flipping the famous nonexistent Bernoulli p coin as anything macroscopic comes. If your statistics is showing a hot hand effect, the operative premise should be that you are surely doing something wrong. The problem is apparently statisticians are also almost surely doing something wrong when they don’t find a hot hand effect. Since none of them can be trusted, and none of them can give a decent mechanism for the purported effect, best to assume it’s nonsense until they do.

All sarcasm aside, since no one can define “hot hand” in any kind of clear operational way, isn’t this just the usual statistical test cherry picking that so much of this blog goes on about?

• Rahul says:

>>>isn’t this just the usual statistical test cherry picking that so much of this blog goes on about?<<<

Exactly my thought. What puzzles me is how this is any different from himmicanes or Tracy & Beall's red-garment-fertility fable?

Just coz' there's no p-value waving in our face, do we let down our crap defenses? We all have our blind spots, eh?

To me this seems the same old story of obsessing after some subliminal, ephemeral "effect" signal in a mountain of noisy data. When the effect is elusive just keep shifting the goal posts. Sooner or later you are bound to smoke the artifact out. Just make sure you have enough data to throw at your data mining machine.

• I see this as the difference between people who think about causality, physics, feedback control systems, and bio-organisms as machines and people who think like “every outcome in science is secretly the output of a random number generator”.

I don’t disagree with the basic idea that time of the month could influence wearing red etc. In fact I’ve come on the record here at the blog before about believing it probably does… What I, and Andrew and others here find problematic is *whether the kinds of results obtained from those studies prove the effect and correctly estimate its magnitude*

Here we have a situation where the complexity of what it means to play basketball as a Homo sapiens almost HAS to ensure that your performance varies through time, and yet because performance is inconsistent (any given shot is not perfectly predictable) the RandomNumberGenerator-istas assert that it HAS to be the case that we assume that basketball players are Bernoulli RNGs with constant p until proven otherwise through extensive data collection.

Why do I HAVE to assume a clearly biological-physical-feedback-controlled organism *is* a Bernoulli RNG? This is a disease of people trained in frequentist statistics.

• Elin says:

And from a qualitative perspective, the people who know the most–players, coaches, dedicated fans– tell you over and over that there is a hot hand. (And what they mean is really that p changes.) There is a kind of patronizing normal “people are foolish about the things they think they are experts in” argument in much of this work. The point of the RNG argument is not really that there is or is not a hot hand, it is that people misperceive reality in basketball and most other contexts.

I think it is always not a bad idea to ask the question “could this be the result of a simple random process?” but even if it “could” be that does not mean it is. But when the point of it becomes to make a larger argument about people misperceiving random events that becomes a different issue. To me, the larger argument about human misperception of probabilities (risks and benefits) is one that I have always found problematic. I don’t think that argument has a lot to do with frequentist versus Bayesian statistics (in fact they’ll happily use Bayes when it serves their purposes). Though as I started to press the submit button I think that actually, that’s not quite right, in that this perspective certainly has a kind of implicit bias toward the idea of a very literal “no difference” model and fixed, known probabilities.

• Hot under the collar & Rahul-

I don’t see why it is cherry picking—it is the same stat everyone else used, and it replicates out-of-sample.

• Lots of people don’t want to believe in IQ either, but you know what? SAT tests do better than flipping coins at predicting who will and who won’t do well in a university education, and high SAT math scores do well in predicting who will do well in Engineering for example.

Sure, GPA + SAT may not be much better than GPA alone, but if all you had were SAT scores you’d be stupid to generate random numbers and pick the “winners” when betting against someone who uses the SAT information.

So, regardless of what you’d like to attribute to IQ (it may be oversold for example) it’s still clear that the SAT measures something that persists through time about students and which is predictive of their success at doing stuff related to say an Engineering education.

Now, suppose I have a robot with a hand, and it has a camera and force sensors and some fairly sophisticated software, and there are two of them. One has software that always does exactly the same thing… and one has software that randomly perturbs its internal parameters and then uses feedback on its success to converge on operational success (stochastic optimization).

The first one will always have the same success rate for free-throws for example, the second one will have successes that vary through time because it intentionally varies its internal algorithm to try to get into the best regime possible… The second one will have “hot hands” during time periods when the internal parameters are more successful. End of story. Physicists, not spending much time writing code for feedback control systems, generally shouldn’t be trusted to opine about feedback control systems.

The idea that a professional basketball player would be “as close to flipping the famous nonexistent Bernoulli p coin as anything macroscopic comes” is the most laughable thing I can think of. Let me give you some real world issues to consider:

1) Rhinoviruses
2) Minor injuries
3) Fatigue
4) Marital Distress
5) Recent birth of a child
6) Contract negotiations
7) Medications
8) Discouragement
9) Encouragement
10) Team coordination through successful training of new members
11) Recent success on deep sea fishing expeditions
12) Color of opponents’ jerseys
13) Quality of Breakfast
14) Date of last vacation

I mean really.

17. Bryan says:

I don’t think any of these studies can do a good job of detecting the hot hand effect because there are so many confounding factors such as teammates, opponents, and strategy. If my team goes to a small lineup with multiple good passers, they will create good opportunities for me and I will shoot well, so I will appear hot. If the player guarding me is bad at defense, I will shoot well, so I will appear hot. If the other team starts double teaming my teammate, I will get open a lot and shoot better, and appear hot. If I’m shooting well, the other team may guard me with a better defender or start double teaming me and I will shoot worse, which would negate the hotness effect. I could go on.

• David says:

You should ask Zach Lowe (NBA writer at The Ringer), who has led a lot of the mainstream discussion around NBA data statistics, to suggest a better data-set to run this test against, if the idea is really to learn about the hot hand and not just illustrate a bigger point about human intuition.

18. For the randomnumbergenerator-istas….

Suppose I generate a random number uniformly between 0 and 1 and then use it as p to generate a Bernoulli random number… and I give you the output of this Bernoulli stream.

Now, objectively p was different EVERY SINGLE TIME. Can you detect this fact?

We could look at autocorrelation for example, but we won’t find any unusual levels of autocorrelation… but, there’s lots of stuff you can do. I invite you to start with this code and try to find out from just the output of the 100 numbers whether p varied each time, or not:

http://www.r-fiddle.org/#/fiddle?id=Q7GA8GXf&version=1

Let’s face it, the binary yes/no output just doesn’t tell us much about the internal state of the human shooting the basketball, and this is precisely WHY statisticians tell people NOT to artificially dichotomize results that are otherwise continuous.
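For readers who can’t open the fiddle, here is a minimal Python sketch of the same setup. It is not the linked R code; the function names and the sample size are assumptions made for illustration. It generates one stream with constant p = 0.5 and one where p is redrawn uniformly before every shot, then compares the hit rate and lag-1 autocorrelation of the two.

```python
import random

def constant_p_stream(n, p, rng):
    """n Bernoulli draws with the same success probability p every time."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def varying_p_stream(n, rng):
    """n Bernoulli draws where p is redrawn from Uniform(0, 1) for every draw."""
    out = []
    for _ in range(n):
        p = rng.random()
        out.append(1 if rng.random() < p else 0)
    return out

def mean(xs):
    return sum(xs) / len(xs)

def lag1_autocorr(xs):
    """Sample autocorrelation between each outcome and the next one."""
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs)
    cov = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    return cov / var if var else 0.0

rng = random.Random(0)
n = 100_000  # far more than the 100 outcomes in the challenge, to reduce noise
fixed = constant_p_stream(n, 0.5, rng)
varying = varying_p_stream(n, rng)
for name, xs in [("constant p", fixed), ("varying p", varying)]:
    print(f"{name}: hit rate {mean(xs):.3f}, lag-1 autocorr {lag1_autocorr(xs):+.3f}")
```

Both summaries should land near 0.5 and 0 for either stream. When each p is drawn independently, every outcome is a hit with probability E(p) = 0.5 regardless of what came before, so the hit/miss record alone carries no trace of the variation in p.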

• Another way to think about this is, there *IS NO* objectively correct p value for a given shooter’s performance. This is because *HUMANS PLAYING BASKETBALL ARE NOT BERNOULLI RNGS*

The real question is whether some of the factors that affect real world time-variation in performance are in fact detectable by the players themselves and their teammates.

Now, I don’t play basketball very often, but when I do shoot a basket, I very frequently have the feeling “ooh I missed that one” long before the ball gets to the rim. This is because I know what it is supposed to feel like when the ball goes in the direction I wanted it to go in. I can totally imagine that there are times when a player consistently feels “that one is going to be in, or at least very close” and other times when they often feel like “gee that’s an airball” well before the ball gets anywhere close to the basket.

That internal knowledge that the forces on the ball and the timing of the release, and the subtle placement of the hands, and the wrist action and timing etc are all “right” (that is, close to what was really needed) would be predictive of success for a person sufficiently trained with sufficient experience to know what success “feels like”. And “feels like” is exactly the thing, it’s feedback of reaction forces on the ball that is the key source of knowledge for the player. But, no-one else can possibly feel those forces… so they’re not “observable” outside the player’s head.

On the other hand, the gracefulness or smoothness or timing or whatever are potentially observable with your eyes, and you can guess “gee that one is going in” if you’re familiar with what the physics of a good basketball shot look like… and to say that people don’t go through periods where they consistently get “closer” to perfect shots vs other times where they are “farther” just because a sequence of Y/N answers is too noisy to help you detect it… that’s crazy.

Now, suppose for example that you were to calculate the “perfect” trajectory from the point where the ball was released. That is, the trajectory a ball would have to follow to be a perfect “nothing but net” shot. And then you time integrate the Euclidean distance from that perfect trajectory for the ball from the time it leaves the hands to the time it first hits any object… Would it be so hard to believe that there are periods of time where players consistently get smaller integrated deviations from perfect, whereas other times they have larger integrated deviations from perfect?
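To make that measure concrete, here is a rough Python sketch. It assumes a drag-free projectile, positions sampled at known times, and a trapezoid-rule integral; the function names, release point, and velocities are all hypothetical and chosen only for illustration.

```python
import math

def trajectory(p0, v0, times, g=9.81):
    """Ball positions (x, y, z) at the given times, ignoring air resistance."""
    x0, y0, z0 = p0
    vx, vy, vz = v0
    return [(x0 + vx * t, y0 + vy * t, z0 + vz * t - 0.5 * g * t * t)
            for t in times]

def integrated_deviation(actual, ideal, times):
    """Trapezoid-rule time integral of the Euclidean distance between paths."""
    dists = [math.dist(a, b) for a, b in zip(actual, ideal)]
    return sum(0.5 * (dists[i] + dists[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

times = [i / 100 for i in range(101)]              # one second of flight
release = (0.0, 0.0, 2.0)                          # hypothetical release point (m)
ideal = trajectory(release, (3.0, 0.0, 6.0), times)
off_target = trajectory(release, (3.1, 0.0, 6.0), times)  # 0.1 m/s too fast

print(integrated_deviation(ideal, ideal, times))       # a perfect shot scores 0
print(integrated_deviation(off_target, ideal, times))  # grows with the error
```

Unlike a hit/miss label, this score is continuous, so two misses (or two makes) can be told apart by how far off they were.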

Thinking like “all my data comes out of an RNG and I need strong evidence to assume otherwise” is a disease.

• Let’s suppose a basketball shot outcome (swish or not) is completely determined by 3 position coordinates, 3 velocity coordinates, and 3 angular velocity coordinates (9 degrees of freedom). We’re ignoring air currents, and pre-existing vibrations in the basket frame and blablabla.

Now let’s define an epsilon neighborhood of an initial condition as all initial conditions within 5mm for the location coordinates, within 5mm/s for the velocity coordinates, and within 2*pi/100 radians per second for the angular coordinates. This is more or less the accuracy with which we could extract the initial conditions using several laser scanners or something.

Now, imagine breaking the hyper-volume of the epsilon neighborhood into say 1 million sub hyper-cubes and suppose that regardless of what the location is within the sub-cubes you always get the same Y/N output for whether the basket was made. If not, sub-divide more than 1M times until you have a constant outcome in each cube.

The assumption “a basketball player is as good as a Bernoulli random number generator with a time-constant p” is in essence the same as the **very strong** physical assumption that **every time** a given professional basketball player shoots the ball, there is the same fraction of sub-hyper-cubes in the epsilon neighborhood of the shot that have “Swish” labeled on them to within say 0.001 (I think a tenth of a percentage variation is probably irrelevant for game outcomes where only on the order of a hundred shots are taken).

This assertion is patently ridiculous on its face and should require enormous amounts of data to convince us… and yet, it’s asserted that this has to be the default NULL hypothesis and we should require enormous amounts of data to reject it!

Rahul: this is why I don’t think we’re talking about “noise mining” here. In the ovulation and voting / himmicanes type case researchers are using what is essentially “no information” (very poor measurements) to assert strong evidence of an effect and its magnitude, and in this case, they are using “no information” (ie. very poor measurements using only Y/N outcomes) to assert strong evidence of no effect (or an effect with nearly precisely zero magnitude). In both cases, they are vastly over-playing their information hand and arriving at incorrect conclusions.

• In fact, the translation invariance of the laws of physics, together with the existence of airballs completely invalidates this constant p assumption stated as I’ve stated it here on the face of it. So, again, we’re being asked to take “absence of evidence from a bernoulli trial analysis” as “evidence of absence of variation in p” even though airballs show that p must vary significantly such that p~0 has some nontrivial density, possibly even a point mass for some shots.

You might object to my definition of p in terms of the epsilon neighborhood, but it will require a very careful description of some alternative meaning for p to be satisfactory, and the Frequentist “the default is bernoulli trials with constant p” is not one of those careful descriptions of alternative meanings.

• Hot under the collar says:

You need to convince me shooting a basketball is not well modeled as a simple Poisson process and that every such process (or similar ones) does not exhibit what you consider to be hot hand effects.

• Corey says:

Suppose I generate a random number uniformly between 0 and 1 and then use it as p to generate a Bernoulli random number… and I give you the output of this Bernoulli stream.

Now, objectively p was different EVERY SINGLE TIME. Can you detect this fact?

Too easy. How about you simulate p from a Beta distribution with given shape parameters?

• Corey says:

Just noticed the curious fact that if you assume this model and unbounded data, then eventually you learn everything about (alpha / beta) and nothing about (alpha + beta).

• Right, I think this was the intuition behind my idea. The uniform distribution is alpha = beta = 1, so alpha/beta = 1 of course; the distribution that is a delta function at 0.5 is alpha = beta = N for N nonstandard, so alpha/beta = 1 as well.

From an enormous sample, you can tell that alpha = beta but you can’t tell what either alpha or beta were by themselves. There is an infinite family of possible beta distributions from which your Bernoulli sequence could have been generated.

• Intuitively, in the Bernoulli sequence p is defined as the long-run average of n/N. In the case where we assume p is randomly and independently chosen from any distribution on [0,1] with a given average p*, the value n/N will be the same regardless of the distribution for p. It’s not just for the beta distribution: I suspect *any* distribution with a given average p* will let you learn, in the long-run average, what p* was without learning *anything else* about the shape of the distribution of p.

• Do you have the algebra on this? I tried to put it together but got muddled up and I don’t have the time and sufficient motivation to sort out where I went wrong.

• Corey says:

theta ~ anything supported on (0,1)
X|theta ~ Bernoulli(theta)

Then Pr(X = 1) marginal of theta is:

integral w.r.t theta of Pr(X = 1| theta)p(theta)

and since Pr(X = 1| theta) = theta, the integral is the same as E(theta).
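A quick numerical check of this argument, as a Python sketch. The three Beta shape pairs are arbitrary choices that share the same ratio alpha/beta (and so the same mean, 0.4) while differing widely in alpha + beta; the sample size and names are likewise assumptions for illustration.

```python
import random

def hit_rate(alpha, beta, n, rng):
    """Draw theta ~ Beta(alpha, beta) afresh for each trial, then X ~ Bernoulli(theta)."""
    hits = 0
    for _ in range(n):
        theta = rng.betavariate(alpha, beta)
        if rng.random() < theta:
            hits += 1
    return hits / n

rng = random.Random(1)
n = 100_000
# Same mean alpha / (alpha + beta) = 0.4, very different spreads.
shapes = [(0.4, 0.6), (2.0, 3.0), (20.0, 30.0)]
rates = {shape: hit_rate(*shape, n, rng) for shape in shapes}
for (alpha, beta), rate in rates.items():
    print(f"Beta({alpha}, {beta}): observed hit rate {rate:.3f}")
```

All three hit rates should sit close to 0.4, even though the first distribution piles theta up near 0 and 1 and the last concentrates it near 0.4. The observed frequency pins down E(theta) and nothing else about its distribution.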

• Nice, I was working with long sequences and the beta distribution and priors on alpha/beta, etc., and that got really messy, but your more abstract version is also much clearer: you can’t learn anything about the shape of p(theta) other than the mean from binomial trials. That would seem to apply as well if you used something like a gaussian process type construct on a sequence of thetas. If you do that and consider the values of theta as samples of a discrete time process interpolated by a smooth function, then the mean value of the function theta(t) will be determined, but the shape of the function theta(t) will not without additional outside information such as “estimates of hotness” by a teammate, and a lot of assumptions about what an estimate of hotness does to the theta.

On the other hand, with a more continuous measure, such as “time integrated euclidean deviation of the ball trajectory from the ‘swish’ trajectory” you could learn a lot about time varying accuracy of shots. If you had that kind of kinematic data on ball trajectories, I can’t imagine you would see that every game had exactly the same distribution of accuracy.

• Corey says:

Well, correlation in the sequence of thetas will induce correlation in the sequence of X values, and given sufficient data that correlation will be detectable. Move out of the basketball context and consider a very very long sequence of Bernoulli variables and slow drift in the success probability from zero-ish to one-ish and back again. You’ll be able to roughly track that drift.
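A minimal Python sketch of that thought experiment. The sinusoidal drift, its period and amplitude, and the block size are all assumptions picked for illustration: theta swings slowly between about 0.1 and 0.9, and block averages of the 0/1 outcomes are compared with the block averages of the true theta.

```python
import math
import random

def drifting_theta(t, period, amplitude=0.4):
    """Success probability that drifts slowly and smoothly around 0.5."""
    return 0.5 + amplitude * math.sin(2 * math.pi * t / period)

def block_means(xs, block):
    """Average of each consecutive, non-overlapping block of length `block`."""
    return [sum(xs[i:i + block]) / block
            for i in range(0, len(xs) - block + 1, block)]

rng = random.Random(2)
n, period, block = 100_000, 20_000, 2_000
thetas = [drifting_theta(t, period) for t in range(n)]
shots = [1 if rng.random() < th else 0 for th in thetas]

estimated = block_means(shots, block)   # what we see from hits and misses
truth = block_means(thetas, block)      # what we would like to recover
max_err = max(abs(e - t) for e, t in zip(estimated, truth))
print(f"block estimates range from {min(estimated):.2f} to {max(estimated):.2f}")
print(f"largest gap between estimate and true block mean: {max_err:.3f}")
```

The tracking works here only because the drift is slow relative to the block length and the sequence is very long. Shrink the amplitude or shorten the sequence and the block estimates dissolve into sampling noise.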

• Corey says:

(In the Beta distribution example we can’t learn anything about (alpha + beta) because we’ve assumed the sequence of theta values is IID.)

• Corey, Yes you’re right, slow trends will be deducible. There’s probably a theorem in here using sinusoidal decomposition of theta(t) about detectability given the length of the sequence vs the frequency of the sinusoid.

• Frequency and amplitude of course. You’ll “never” detect epsilon * sin(2*pi*i/k) for epsilon infinitesimal with a limited length sequence.

• Kit Joyce says:

This is an interesting point which I hadn’t come across before. I guess the consolation is that any theta distribution with the right mean also gives the same posterior predictive distribution. So if you’re in this situation (you only have binary outcomes and an iid model for theta is the best you can come up with that fits the data) 1) Commiserations. 2) Just use constant theta = n/N

19. Steve Sailer says:

Shouldn’t there be a distinction between free throws and field goals in thinking about the hot hand? Making a field goal tends to lead to tighter defense the next time, making the analytical problem difficult. But free throws are almost all taken under similar circumstances, so free throws should be more amenable to statistical analysis of whether hot hands exist.

20. Jordan Anaya says:

I don’t know if this post is serious or not, but it appears Daryl Bem is still defending his work:
http://www.dailygrail.com/Mind-Mysteries/2017/4/Precognition-Researcher-Daryl-Bem-Responds-Criticism-His-Famous-Experiments

• Andrew says:

Jordan:

Sure, Bem’s been consistent on this position. See, for example, here: Study published in 2011, followed by successful replication in 2003 [sic]. Bem’s work is just terrible, its problems are clear for all to see. But there’s no feedback mechanism to push him to recognize this. It’s as if the 2015-2016 76ers wanted to insist that they were actually a championship-quality basketball team.

• Rahul says:

>>>its problems are clear for all to see<<<

All, but not the referees apparently. Nor the editor.

Any idea, who the editor / referees were?

• Andrew says:

Rahul:

As I discussed here, our awareness of these problems has changed during the past several years. The serious problems with the Bem analysis were in plain sight all along, but we (that is, the editor, reviewers, and lots of other people, including me) didn’t really know where to look. We’d all heard about the file-drawer effect but most of us had no sense of the importance of the garden of forking paths in driving Bem-style research.

• Andrew says:

Coincidentally, I wrote a Bem-themed post a while ago that just happens to be appearing today. It was not written in response to this comment thread.

21. Steve Sailer says:

One possibility is that Hot Hands are merely the absence of various causes of Cold Hands.

There are definitely Cold Hands. For example, Wilt Chamberlain, who bored easily, experimented with various ways to shoot free throws, some of which worked poorly and others of which worked terribly. When I watched him in 1971-72, he made only 42.2% of his free throws, in part because he had decided that, due to all his weightlifting, he had gotten too strong to shoot from 15 feet, so he moved back to 18 feet. Not surprisingly, he suffered a year-long Cold Hand.

On the other hand, most professional players who aren’t Wilt Chamberlain or Shaq can’t afford to have comic stuff like this happen to their free throw shooting.

Then again, free throw shooting is the least difficult and most boring part of basketball, so it could be that free throw shooting is less susceptible to small cold hands than field goal shooting is.

For example, it would appear that Stephen Curry had a hot hand at extreme long range shooting for most of last season (at least the regular season), but doesn’t have it this season. But is a Career Peak different from a Hot Hand?

• Jordan Anaya says:

Steph Curry is definitely a good example; he’s what people call a “heat check” type of player. The problem with heat checks is that players start to take more and more ridiculous shots, which results in some misses.

I think one of the best data sets for this might be the 3 point shooting contest. It definitely seems players either miss a bunch in a row or make a bunch in a row, but there is also the factor of rack location to take into account.

With regard to Steph Curry, his 3-point shooting percentage last year was basically the same as in the years prior, so I don’t know that I would call it a career peak. His poor percentage this year could be explained by him not having the ball in his hands as much and therefore not getting into rhythm, i.e., not having the opportunities to get a hot hand and go on some hot streaks.

The ‘hot hand’ hypothesis might actually just be a result of having recently shot the ball. Imagine not taking a shot for ten minutes and then having to shoot a 3. If you just shot the ball on the prior possession, you are “primed” to do better on your next shot, since you have a feel for what just happened. If you made the prior shot, do the exact same thing. If you were short, use some more legs, etc.

22. Alex says:

Even if it is a small effect, that doesn’t mean it’s not a useful one. Sport is an agonistic activity where there is a winner and a loser; as a result, small advantages are worth having. See Dave Brailsford’s concept of the aggregation of marginal gains. A slightly better judgment of teammates’ or indeed your own performance is such an advantage.

23. Lee Jussim says:

This pattern of simply ignoring studies, analyses, and critiques of one’s cherished findings and claims is so common in social psychology that it is probably a norm. I could write a whole paper exposing example after example. In fact, I did:

http://www.rci.rutgers.edu/~jussim/Interps%20and%20Methods,%20JESP.pdf

And that barely scratches the surface of a vast field of icebergs.

Lee

24. Josh says:

My coworker pointed me to this post, though coincidentally I first introduced him to your blog. I recently submitted a paper on selecting the degree of memory for multistep Markov chains, using free throw shooting as an illustrative example (LeBron James’ 2016-2017 season and postseason). The method is actually based on the class of cross-validation methods in your recent review paper. Anyway, I did not find “hot-handedness” per se, but I did find evidence of statistical dependencies. I submitted to PRL; no idea if they will choose to review it: https://github.com/joshchang/markovmemory/blob/master/markovpaths_nba.pdf

• Josh says:

Though now I am wondering whether the effect I saw was really the bias reported in Miller and Sanjurjo.

• Josh says:

After a set of simulations, I found the bias to be negligible for the string sizes in my data. The bias is on the order of a few tenths of a percentage point, some of which is attributable to using a prior. I think the main result in my example still holds.
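
For readers who want to see how the size of the Miller and Sanjurjo bias depends on sequence length, here is a minimal sketch. It is not the code behind the comment above: the function names are made up for illustration, there is no prior, and it enumerates every sequence exactly rather than simulating, so it only works for short sequences. It assumes independent shots with a fixed hit probability p and computes the expected value of the per-sequence proportion of hits among shots that follow k hits in a row.

```python
from itertools import product


def streak_proportion(seq, k):
    """Share of hits among shots that immediately follow k straight hits.

    Returns None when the sequence has no shot preceded by k hits.
    """
    followers = [seq[i] for i in range(k, len(seq)) if all(seq[i - k:i])]
    if not followers:
        return None
    return sum(followers) / len(followers)


def expected_streak_proportion(n, k, p):
    """Exact expectation of streak_proportion over all length-n sequences
    of independent shots with hit probability p, conditional on the
    proportion being defined. Enumerates 2**n sequences, so keep n small.
    """
    weight_total = 0.0
    weighted_sum = 0.0
    for seq in product((0, 1), repeat=n):
        prop = streak_proportion(seq, k)
        if prop is None:
            continue  # no shot follows k hits, so the sequence is dropped
        hits = sum(seq)
        weight = p ** hits * (1 - p) ** (n - hits)
        weight_total += weight
        weighted_sum += weight * prop
    return weighted_sum / weight_total


# Bias = expected per-sequence proportion minus the true p.
for n in (4, 8, 12, 16):
    bias = expected_streak_proportion(n, 1, 0.5) - 0.5
    print(n, round(bias, 4))
```

With n = 4, k = 1 and p = 0.5 the expectation is 17/42, about 0.405 rather than 0.5. For sequences in the hundreds or thousands, enumeration is not feasible and a simulation with the same `streak_proportion` function would be the natural replacement.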

• Andrew says:

Josh:

I think that some sort of latent model would be more appropriate, in that whether you just made a shot is not a perfect measure of hotness. It’s possible for a lucky shot to go in, and it’s possible for a pretty good shot to not quite make it. But it’s not so easy to fit a latent model to a short sequence of 0/1 data because the information is so sparse. So . . . it’s possible to get a sense that simple Markov estimates of the hot hand will be attenuated (biased toward zero) even while it’s difficult to get a precise estimate of the underlying effect, given 0/1 data alone.
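
To make the attenuation point concrete, here is a minimal sketch under assumptions that are purely illustrative and not from any fitted model: a player switches between a latent "hot" and "cold" state according to a symmetric two-state Markov chain, with hit probabilities `p_hot` and `p_cold` and a probability `stay` of remaining in the same state from one shot to the next. The function compares the gap you would see in 0/1 data, P(hit | previous hit) minus P(hit | previous miss), with the true gap between states.

```python
def observed_vs_latent_gap(p_hot, p_cold, stay):
    """Return (observed_gap, latent_gap) for a symmetric two-state chain.

    observed_gap: P(hit | previous hit) - P(hit | previous miss),
                  computed exactly at the chain's stationary distribution.
    latent_gap:   p_hot - p_cold, the true difference between states.
    """
    hit_prob = (p_hot, p_cold)
    stationary = (0.5, 0.5)  # a symmetric chain spends half its time in each state

    def transition(i, j):
        return stay if i == j else 1.0 - stay

    p_hit = sum(stationary[i] * hit_prob[i] for i in range(2))

    # Joint probabilities of (outcome at shot t, hit at shot t+1),
    # summing over the latent states at both shots.
    hit_then_hit = 0.0
    miss_then_hit = 0.0
    for i in range(2):
        for j in range(2):
            path = stationary[i] * transition(i, j) * hit_prob[j]
            hit_then_hit += path * hit_prob[i]
            miss_then_hit += path * (1.0 - hit_prob[i])

    after_hit = hit_then_hit / p_hit
    after_miss = miss_then_hit / (1.0 - p_hit)
    return after_hit - after_miss, p_hot - p_cold


observed, latent = observed_vs_latent_gap(p_hot=0.6, p_cold=0.4, stay=0.9)
print(observed, latent)
```

With these made-up numbers, a 20-percentage-point difference between the hot and cold states shows up as only about a 3-point difference between shots after a hit and shots after a miss, because the previous shot is a noisy indicator of the latent state.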

• Josh says:

I completely agree. I think hot hands are difficult to find when looking at free throws due to:

1) the limited number shot by any single player;
2) relatively high accuracy (about 80 percent), with few exceptions, for the shooters who take the most free throws;
3) nonstationarity due to latent factors such as training, fatigue, and psychological pressure. It is hard, for instance, to pool data across seasons for a given player like LeBron, as his percentages have fluctuated annually.

However, free throws are shot in a more controlled manner, and shooters are able to go into the exact same routine each time.

The Markovian-type models without latent variables are not great, but of all the types of shooting I would expect them to work best for free throws.

25. Jordan Anaya says:

I’ve got good news: the Houston Rockets proved the existence of the hot hand last night (or rather the cold hand). They missed 27 3’s in a row!

In all seriousness, people have been doing calculations on how unlikely that is given their average 3-point percentage. I think you should take into account that it’s a Game 7 and everyone’s nervous. Just look at Boston, who also had a horrific Game 7 shooting performance.

Also, the basketball Gods don’t like floppers, so there’s that.
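
The kind of calculation mentioned above can be sketched as follows. This is a rough illustration that assumes independent shots with a constant make probability, which is exactly the assumption a Game 7 might violate. The function name and the inputs (a 36% make rate and 40 attempts) are hypothetical placeholders, not figures from the game. It computes the chance of seeing at least one run of consecutive misses of a given length somewhere in a fixed number of attempts, which is larger than the chance that one particular stretch of 27 shots all miss.

```python
def prob_miss_streak(n_shots, streak, p_make):
    """Probability of at least one run of `streak` consecutive misses
    in `n_shots` independent attempts, each made with probability p_make.

    Tracks the distribution of the current trailing miss-run length and
    moves probability into `reached` the first time the run hits `streak`.
    """
    p_miss = 1.0 - p_make
    run_dist = [0.0] * streak  # run_dist[r] = P(trailing run is r, streak not yet reached)
    run_dist[0] = 1.0
    reached = 0.0
    for _ in range(n_shots):
        new_dist = [0.0] * streak
        for run_length, prob in enumerate(run_dist):
            new_dist[0] += prob * p_make  # a make resets the run
            if run_length + 1 == streak:
                reached += prob * p_miss  # this miss completes the streak
            else:
                new_dist[run_length + 1] += prob * p_miss
        run_dist = new_dist
    return reached


# Hypothetical inputs for illustration only.
print(prob_miss_streak(n_shots=27, streak=27, p_make=0.36))  # one specific stretch of 27
print(prob_miss_streak(n_shots=40, streak=27, p_make=0.36))  # anywhere in 40 attempts
```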