I wish him well. My son was a classmate of Giancarlo/Mike in school and says he was by far the nicest guy of the top jocks. (I hope he’s got an honest accountant looking out for his money.)

Here’s something to consider … Stanton has typically been an “unlucky” player so far in his career. He gets a lot of muscle-pull type injuries that may be related to how muscular he is.

And here’s what happened to him in September 2014 during his one previous totally healthy year, when he was cruising toward the NL most valuable player award:

https://www.youtube.com/watch?v=HlbXtg_-31c

He missed the last few weeks of the season and finished second in the MVP voting.

It would be interesting to test perceptions of a player as “unlucky” — does that add utility to predictions?

To make it even more work… I wonder if similar examples could be collected and disseminated through mc-stan.org. Maybe some “Simple Tutorial Exercises” — something to help get your feet wet with Stan syntax and to get some practice wrapping your head around problems.

And as has come up in other discussions, once you start regressing (usually on the log odds), you can start doing things like regressing on at bats—more at bats is correlated with being a better hitter.
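A minimal simulation can illustrate that point (all numbers here are hypothetical, not real player data): if playing time increases with ability, then regressing the empirical log odds of a hit on at-bat counts recovers a positive slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n_players = 500

# Hypothetical population: true hitting ability on the log-odds scale.
ability = rng.normal(-1.0, 0.15, n_players)   # logit of hit probability
p_hit = 1 / (1 + np.exp(-ability))

# Assumed link: better hitters get more playing time, hence more at bats.
at_bats = rng.poisson(300 + 800 * (p_hit - p_hit.min()))

hits = rng.binomial(at_bats, p_hit)
# Empirical log odds, with a small continuity correction to avoid 0/1.
emp_logit = np.log((hits + 0.5) / (at_bats - hits + 0.5))

slope, intercept = np.polyfit(at_bats, emp_logit, 1)
print(f"slope of log odds on at bats: {slope:.5f}")  # positive under this setup
```

The positive slope is baked in by the assumed ability-to-playing-time link; the point is only that at-bat counts then carry information about hitting skill that a regression on the log odds can exploit.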

+1

As someone with only an elementary understanding of Bayesian models, I find discussions like this illuminating.

I used a similar example of hits in the case study on binary trials, where I highlighted the difference between plugging in an MLE for batting average and using a Bayesian posterior that accounts for uncertainty in the estimate of hit rates. Plugging in an MLE gives you a binomial, which is underdispersed compared to what you actually see. The Bayesian posterior in the beta-prior case gives you a beta-binomial posterior predictive, which is better calibrated at realistic data sizes; the difference approaches zero asymptotically as the beta posterior concentrates around the MLE. The case study also contrasted complete pooling, no pooling, and partial pooling, and also contrasted beta priors on the probability scale with normal priors on the log odds scale.
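A quick sketch of that dispersion difference (with made-up numbers, not the case study’s data), using a uniform Beta(1, 1) prior so the Bayesian predictive works out to a beta-binomial:

```python
from scipy.stats import binom, betabinom

# Hypothetical data: 25 hits in 100 at bats.
hits, at_bats = 25, 100
mle = hits / at_bats

# Plug-in predictive for the next 100 at bats: a plain binomial.
plug_in = binom(100, mle)

# Bayesian predictive with a uniform Beta(1, 1) prior:
# posterior is Beta(1 + 25, 1 + 75), predictive is beta-binomial.
bayes = betabinom(100, 1 + hits, 1 + at_bats - hits)

print(f"plug-in sd:  {plug_in.std():.2f}")  # ~4.3
print(f"Bayesian sd: {bayes.std():.2f}")    # wider: propagates uncertainty in the rate
```

The beta-binomial’s extra spread is exactly the uncertainty in the hit rate that the plug-in throws away; with more at bats the posterior concentrates and the two predictives converge.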

And I agree that using 154 games rather than the 149 he played in would’ve made more sense. I’d probably want to model whether he played (something like an injury occurring and then persisting or healing) separately from the number of at-bats per game.

This is great. We can crowdsource stats models on Andrew’s blog!

And yes, you’d definitely want a better prior on home run rate. A hierarchical model would be obvious if you have other player data. I was just working from the very simple data presented. The hierarchical model itself is tricky in that the population of home run rates is very skewed, with a pileup at the top and a long tail.

The way I’d really start with this would be to plot (a) at bats (for this individual and other players), and (b) home run rates across players. Then I’d start thinking about better distributions.

I think that in this sort of example the real benefits of Bayesian inference start coming when you start adding more information, such as what sorts of pitches are thrown to Stanton and aspects of his swing compared to those of other players. And also when there is interest in predicting more granular outcomes, not just the number of home runs that will be hit.

So, my question is: what is gained by moving to a Bayesian framework over this straightforward calculation?

My hunch would be that as long as no subject-matter information about plausible values for lambda & theta enters the model, one could just as well forget the at-bats and model the Poisson distribution of home runs directly.

With a “scale-free” p(param) propto param^(-1) prior for the home-run rate, the posterior predictive for the home runs during the last 10 games is Neg-bin(56, 149/10) [BDA2, p. 52–53], with which I get a probability of 0.326 for the case event (compare to the 33% I got with random sampling from Bob’s model). Of course (I suppose) the two-level approach starts to be useful as soon as some kind of informative priors are used.
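As a sanity check, here is that closed-form predictive in scipy’s parameterization. Note two assumptions on my part: the BDA Neg-bin(alpha, beta) with beta = 149/10 corresponds to scipy’s nbinom(56, 149/159), and I’m reading the “case event” as hitting at least 5 more home runs (56 → 61); neither is spelled out above.

```python
from scipy.stats import nbinom

# Prior p(lambda) ∝ 1/lambda plus 56 HRs in 149 games gives the
# posterior lambda | data ~ Gamma(56, 149) for the per-game HR rate.
# Predictive count over the next 10 games is then Neg-bin(56, 149/10)
# in BDA's (alpha, beta) parameterization, i.e. nbinom(56, 149/159).
pred = nbinom(56, 149 / (149 + 10))

# Assumed event: at least 5 more home runs in the last 10 games.
p_event = 1 - pred.cdf(4)
print(f"P(>= 5 HRs in last 10 games) = {p_event:.3f}")  # ~0.32-0.33
```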

1. The “b” parameter of the beta posterior distribution for theta should be 1+555-56, not 1+555; with this correction I get an event probability of 33%.

2. The last factor of the integrand in the “event probability” integral should be the beta posterior density, not the binomial likelihood (the constant matters here).
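A minimal Monte Carlo sketch of the two-level model with that corrected Beta(1+56, 1+555-56) posterior. Two assumptions of mine, since the original model isn’t reproduced here: at-bats over the remaining 10 games are Poisson with the plug-in rate 555/149 per game, and the event is again at least 5 more home runs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 200_000

# Corrected posterior for the per-at-bat HR rate: Beta(1 + 56, 1 + 555 - 56).
theta = rng.beta(1 + 56, 1 + 555 - 56, n_sims)

# Assumed: at-bats over the last 10 games ~ Poisson with plug-in rate 555/149 per game.
at_bats = rng.poisson(10 * 555 / 149, n_sims)

# Home runs given at-bats and rate, then the event frequency across simulations.
hrs = rng.binomial(at_bats, theta)
p_event = np.mean(hrs >= 5)   # assumed event: at least 5 more HRs
print(f"P(event) ≈ {p_event:.3f}")
```

Under these assumptions the simulation lands close to the 33% quoted above; the slight extra spread relative to the one-level negative-binomial answer comes from the beta uncertainty in theta.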

There are some things I don’t like about your model. First, the prior of a 50% home run rate per at bat is too high. Second, the Poisson number of at-bats doesn’t look right either.

Lots of other stuff is going on too, like opposing pitching, travel, rest, and whether the opposing teams have been eliminated from playoff contention.
