Bill James is back

I checked Bill James Online the other day and it’s full of baseball articles! I guess now that he’s retired from the Red Sox, he’s free to share his baseball thoughts with everyone. Cool!

He has 8 posts in the past week or so, which is pretty impressive given that each post has some mixture of data, statistical analysis, and baseball thinking. It’s hard for me to imagine he can keep this up—sure, I do a post a day or so, but most of my posts don’t include original statistical analysis!—but he should go for it as long as he can. Keep the momentum going.

James’s most recent post (at the time of this writing) begins:

Double Plays and Stolen Base Prevention; these things keep the game under control. Our first task today is to estimate how many runs each team has prevented by turning the Double Play. . . .

The 1941 Yankees turned 196 Double Plays. Had they been just average at turning the double play we would have expected them to turn 151, which is an above-average average; the average over time is 139. (The team which would have been expected to turn the most double plays, for whatever this is worth, is the 1983 California Angels, who could have been expected to turn 202 Double Plays, since (a) the team gave up a huge number of hits, and (b) they had an extreme ground ball staff. The Angels actually turned 190 Double Plays, only six fewer than the 1941 Yankees, but 12 below expectation in their case.) . . .

I made a decision earlier that I would use three standard deviations below the norm as the zero-standard in an area in which higher numbers represented excellence, and four standard deviations below the norm as the zero-standard in an area in which higher numbers represented failure. . . .

This was a questionable decision, in the construction of the system, and we’ll revisit it at an appropriate point, but for now, I’m proceeding with 3 standard deviations below the norm as the zero-value standard for double plays. The standard deviation for the 1940s is 16.12—another questionable choice in there, by the way—so three standard deviations below the norm would be 52 double plays. . . .

I just looove this, not so much the baseball and the statistical analysis—that’s all fine—what I really love is the style. It’s just sooo Bill James. I’m reminded not so much of previous Bill James things I’ve read, but of Veronica Geng’s affectionate parody of the Bill James abstracts from back in the 1980s. Reading Geng’s story takes me back to what it felt like then, seeing the new Abstract appear every spring. The Bill James Abstract was pretty much the only statistics out there, period. There was no Freakonomics, there were no data journalists, etc. And that style! It’s hard to pick out exactly what James is doing here, but the style is unmistakably his. Good to see that some things never change.

Further reading

Also relevant:

A Statistician Rereads Bill James

Jim Albert’s blog on baseball statistics

Bill James does model checking

“Faith means belief in something concerning which doubt is theoretically possible.”

A collection of quotes from William James that all could’ve come from Bill James

P.S. I came across this post. Dude should learn about Bayes and partial pooling!
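
For readers who haven’t run into partial pooling: the idea is to pull each noisy individual estimate toward the group average, pulling harder when there is less data behind it. Here is a minimal sketch. It is not taken from James or from the linked post; the player numbers are made up, the function name `partial_pool` is hypothetical, and it uses a crude normal approximation with a method-of-moments variance estimate rather than a full Bayesian model.

```python
from statistics import mean, variance


def partial_pool(successes, trials):
    """Shrink each raw rate toward the overall rate (normal approximation).

    Hypothetical helper for illustration; needs at least two units.
    """
    raw = [s / n for s, n in zip(successes, trials)]
    grand = sum(successes) / sum(trials)  # complete-pooling estimate

    # Sampling variance of each raw rate, evaluated at the overall rate.
    samp_var = [grand * (1 - grand) / n for n in trials]

    # Method-of-moments estimate of the between-unit variance, floored at 0.
    between = max(0.0, variance(raw) - mean(samp_var))

    pooled = []
    for r, v in zip(raw, samp_var):
        # Weight on the raw rate: close to 1 with lots of data, close to 0 with little.
        w = between / (between + v) if between + v > 0 else 0.0
        pooled.append(w * r + (1 - w) * grand)
    return pooled


# Made-up data: three hitters with 20, 100, and 500 at-bats.
hits = [10, 30, 150]
at_bats = [20, 100, 500]
for h, ab, est in zip(hits, at_bats, partial_pool(hits, at_bats)):
    print(f"{h}/{ab}: raw {h / ab:.3f}, partially pooled {est:.3f}")
```

The hitter with 20 at-bats gets moved a long way toward the group rate, while the one with 500 at-bats barely moves. A real analysis would fit a hierarchical model instead of plugging in moment estimates, but the shrinkage pattern is the same.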

9 thoughts on “Bill James is back”

  1. He’s been active on that site for years, dating back to while he was still working for the Red Sox! You can spend months in the backlog of articles (subscription required).

    I was a teenager when the original Abstracts were coming out and I read them over and over, learning a ton about how to think about and study issues quantitatively. The amazing thing about James is that he is 1) very uninformed about anything we call “statistics” (I’m sure he knows far less than a single undergrad semester on the mathematical subject), and also 2) incredibly astute about how to analyze and interpret data, given the tools that he uses. He somehow almost always gets to the heart of the matter, and of course he has usually gotten there before anyone else. He uses these naïve tools and writes an essay on the subject that crystallizes the heart of the matter; then later I read about some modern approach and it’s filled with tables to three decimal places of various moments and p-values and I fall asleep. It’s like reading a genius 18th century mathematician who didn’t have any modern tools. He’s as responsible as anyone for the way that I solve problems in math and computer science these days, always trying to step back and reason about what’s going on instead of just diving in and throwing data into a meat grinder. I appreciate having had the pleasure of reading and learning from him for almost 40 years.
