
I don’t like discrete models (hot hand in baseball edition)

Bill Jefferys points us to this article, “Baseball’s ‘Hot Hand’ Is Real,” in which Rob Arthur and Greg Matthews analyze a year of pitch-by-pitch data from Major League Baseball.

There are some good things in their analysis, and I think a lot can be learned from these data using what Arthur and Matthews did, so my overall impression is positive. But here I want to point to two aspects of their analysis that I don’t like, that I think they could do better.

First and most obviously, their presentation is incomplete. I don’t know exactly what they did or what model they fit. They say they fit a hidden Markov model but they don’t say how many states the model had. From context I think they were fitting 3 states—hot, cold, or normal—for each player, but I’m not sure. The problem is . . . they shared no code. It would be the simplest thing in the world for them to have shared their Stan code, or R code, or Python code, or whatever—but they didn’t. This doesn’t make Arthur and Matthews uniquely bad—I’ve published a few hundred papers not sharing my code too—it’s commonplace. But it’s still a flaw. It’s hard to understand or evaluate work when you can’t see the math or the code or the data.

Second, I don’t think the discrete model makes sense. I do believe that pitchers have days when they throw harder or less hard, and I’m sure a lot more can be learned from these data too, but I would not model this as discrete states. Rather I’d say that the max pitch speed varies continuously over time.
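To make the distinction concrete, here is a minimal sketch of the two kinds of latent process (my own toy simulation with made-up velocities; it is not Arthur and Matthews's model): a two-state chain jumps between two fixed velocity levels, while an AR(1) process lets velocity drift continuously around a mean.

```python
import random

random.seed(1)

# Hypothetical numbers for illustration only; not fitted to any data.
N = 1000                  # pitches
HOT, COLD = 94.0, 91.0    # mean max velocity (mph) in each discrete state
STAY = 0.95               # probability of remaining in the current state

# Discrete latent process: a two-state Markov chain
state = HOT
discrete = []
for _ in range(N):
    if random.random() > STAY:
        state = COLD if state == HOT else HOT
    discrete.append(state)

# Continuous latent process: AR(1) drift around an overall mean
phi, mean, sd = 0.95, 92.5, 0.3
v = mean
continuous = []
for _ in range(N):
    v = mean + phi * (v - mean) + random.gauss(0, sd)
    continuous.append(v)

# The discrete path only ever takes two values; the continuous path takes many.
print(len(set(discrete)))     # 2
print(len(set(continuous)) > 500)
```

Both processes have serial correlation, which is why a hidden Markov model can still "find" states in continuously varying data; the question is whether the two-level description is a feature of the pitcher or of the parameterization.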

I can see how a discrete model could be easier to fit—and I can certainly see the virtue of a discrete model in a simulation study, indeed I’ve used simulations of discrete models to understand the “hot hand fallacy fallacy” in basketball—but I think that any discrete model here should be held at arm’s length, as it were, and not taken too seriously.

In particular, I don’t see how we can get much out of statements such as “the typical pitcher goes through 57 streaks in a season, jumping between hot and cold every 24 pitches,” which seems extremely sensitive to how the model is parameterized.
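For what it's worth, the "every 24 pitches" figure is essentially a re-expression of the fitted transition probabilities: in a two-state Markov chain the number of pitches spent in a state before switching is geometric, so a mean streak of 24 corresponds to a per-pitch staying probability of 23/24. A quick check of that arithmetic (the 23/24 is my back-calculation from the quoted number, not a figure from the article):

```python
import random

random.seed(0)

mean_streak = 24                 # streak length quoted in the article
p_stay = 1 - 1 / mean_streak     # implied per-pitch probability of staying put

# Simulate streak lengths: geometric waiting time until a state switch
runs = []
for _ in range(20000):
    length = 1
    while random.random() < p_stay:
        length += 1
    runs.append(length)

print(sum(runs) / len(runs))     # simulated mean streak, close to 24
```

So the streak count follows mechanically from the transition matrix, which is exactly why it inherits all the sensitivity of that parameterization.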

Again, I say this not to slam Arthur and Matthews (hey, they linked to this blog! We’re all on the same side here!) but rather to point to a couple places where I think their analysis could be improved.

Also, let me emphasize that my comments above do not reflect any particular baseball knowledge on my part; I’d say the same thing if the analysis were done for football, or golf, or tennis, or any other sport.

Speaking generally, it should be much easier to study hotness using continuous measurements (such as pitch speed in baseball or ball speed and angle in basketball) than using discrete measurements (such as strikeouts in baseball or successful shots in basketball). With continuous data you just have so much more to work with. Remember the adage that the most important aspect of a statistical method is not what it does with the data but what data it uses.
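One way to see the point is a toy simulation (mine, with made-up velocities; nothing to do with Arthur and Matthews's data) comparing how clearly a given shift in underlying velocity shows up when you keep the raw pitch speeds versus when you reduce each pitch to a yes/no outcome:

```python
import random
import statistics

random.seed(2)

def z_cont(a, b):
    # difference in mean speed, scaled by its standard error
    se = (statistics.pvariance(a) / len(a) + statistics.pvariance(b) / len(b)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / se

def z_bin(a, b, cut=92.4):
    # same data reduced to "fast pitch yes/no", like a strikeout indicator
    pa = sum(x > cut for x in a) / len(a)
    pb = sum(x > cut for x in b) / len(b)
    p = (pa + pb) / 2
    se = max((2 * p * (1 - p) / len(a)) ** 0.5, 1e-9)
    return (pb - pa) / se

# Hypothetical: 100 pitches on a "normal" day vs a slightly "hot" day,
# averaged over many replications to smooth out simulation noise.
zc, zb = [], []
for _ in range(500):
    normal = [random.gauss(92.0, 1.0) for _ in range(100)]
    hot = [random.gauss(92.8, 1.0) for _ in range(100)]
    zc.append(z_cont(normal, hot))
    zb.append(z_bin(normal, hot))

print(sum(zc) / len(zc))   # signal from continuous speeds
print(sum(zb) / len(zb))   # smaller: binarizing discards information
```

The continuous measurements detect the same underlying shift more sharply, which is the general argument for studying hotness with pitch speeds rather than with discrete outcomes.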


  1. Jon says:

    They’re preparing a proper paper on it, and intend to throw it on a preprint server (I think). I imagine the code will become available then.

    My understanding is they fit a one-state, two-state, and three-state model. The two-state model fit substantially better than the others, so all the results in the 538 article are from that one. Presumably the others will be in the academic paper.

    As freelancers, they’ve gotta publish or not-make-rent, hence the news article appearing first, before all of the technical details have been written up for an academic article.

    I, for one, am really curious if they did any sort of regularization for the hot/cold states…. But we’ll find out soon enough!

    • Andrew says:


      Regarding publication on a news site: Yes, good point. The authors have no obligation to share their data, model, and code, and I respect the journalistic motivation to get their results out right away and even to keep some details secret so that they won’t get scooped as they do further analyses. I recognize that not everyone has a comfortable job such as mine where I’m free to publish at will. It’s hard for me to follow what they’re doing when I can’t see the details, but the authors are free to tell us more on their schedule.

    • Andrew says:


      Also, regarding the one-state, two-state, three-state models: Despite my distaste for the discrete models, I can certainly believe that much can be learned from them, as long as the inferences are interpreted with care.

      And, if it’s really true that the two-state model fits substantially better than the three-state model, this suggests to me that the three-state model in question has major problems. See this quote from Radford Neal.

  2. It’s quite an honor to be linked to here! Thanks so much for the (generally) positive write-up.

    I wanted to address the two points you made here.

    1) It’s true that we didn’t explain the whole model in this piece. We were writing for a broad audience, and we wanted to make sure our presentation was accessible and not too intimidating. That said, I’m sure we could have done better at communicating the technical particulars of the model without scaring people off. For the record, we fit a two state model. (A three state model we tried at one point proved to be a worse fit.)

    We have posted the code for the HMM to GitHub, and it can be viewed here:
    (Side note: The delay in posting this code was due to the fact that my wife attended a James Taylor concert at Fenway this weekend, and I was alone for 72 hours with an adorable 11-month-old baby girl who refuses to let me do anything other than pay full attention to her…)

    We are in the process of preparing an academic paper, and will release all code, data, and a full specification of the model when it is at a preprint stage. In the meantime, both Greg and I are happy to answer any methodological questions readers/fellow statisticians might have.

    2) We intend to also try a continuous model at some point. In regards to your point about the sensitivity of our findings to parameterization, we agree. We wanted to show that the hot hand exists (in some form) and can provide meaningful information about the skill of individual pitchers. But we didn’t intend for this to be the last word or the final parameterization, so we tried not to focus on these details.

    We do think there is some reason to believe that at least some pitchers might have discrete states, in particular with regards to injuries (the two states being injured and not injured). That we found significant correlations between our two states and injuries is probably no accident.

    If you have any questions feel free to contact us on twitter at @statsinthewild (Greg) or @no_little_plans (Rob).

  3. Angus says:

    Waiting for someone to do a hot-hand analysis of professional darts players.

  4. Matthieu says:

    I agree with you that, as far as statistical analysis goes, continuous data are usually better than discrete data. But in this particular example, what if the question at stake is not whether the “hot hand” is real per se, but whether our perception of a “hot hand” effect exists? Within that framework, wouldn’t you say that the labeling of a “hot” or “cold” hand should be discrete rather than continuous, given that this is how we, as spectators, label a pitch when we see one?
