Super Sam Fuld Needs Your Help (with Foul Ball stats)

I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld of the Tampa Bay Rays was the subject of a Ben McGrath profile in the 4 July 2011 issue of the New Yorker, in an article titled Super Sam. After quoting a minor-league trainer who described Fuld as “a bit of a geek” (who isn’t these days?), McGrath gets into that lovely New Yorker detail:

One could have pointed out the more persuasive and telling examples, such as the fact that in 2005, after his first pro season, with the Class-A Peoria Chiefs, Fuld applied for a fall internship with Stats, Inc., the research firm that supplies broadcasters with much of the data and analysis that you hear in sports telecasts.

After a description of what Stats had him doing (reviewing and cataloguing footage of games), McGrath quotes Fuld:

“I thought, They have a stat for everything, but they don’t have any stats regarding foul balls.”

Fuld’s Conjecture

Fuld went on to tell McGrath that “he’d explained that he’d conceived a study to test the received wisdom that good hitters are able to foul off difficult pitches deliberately. If this were true, he reasoned, there ought to be a measurable correlation between over-all batting success and the distribution of foul balls within counts. Skilled hitters, adept at protecting the plate, might tend to produce a greater proportion of foul balls late in counts than weaker hitters, whose fouls would skew earlier — evidence of poor contact.”

Turns out Fuld had taken Stats 50, the “math of sports” class at Stanford, taught by none other than the eminent information theorist Thomas Cover! Fuld is also on leave from the master’s program in statistics at Stanford (not that you’d know it from his enrollment in Stats 50). Alas, as McGrath relays

Fuld never completed the degree (although he intends to), because the next spring his other dream seemed suddenly to be coming true, as he was promoted up the developmental ladder from AA to AAA and then, just as the fall academic calendar was beginning, to the Show [Major League Baseball].

You Can Help

While Stats, Inc. may not care about foul balls, the great baseball data site Retrosheet does (just follow the link).

In fact, Dan Fox of Baseball Analysts presented a zero-order analysis using the Retrosheet data several years ago, though not of the exact question Fuld was asking.

Hierarchical Models Away

Now this is going to be a great problem for hierarchical modeling because the data in each count cell is going to be sparse. With 500 at-bats in a year, how many instances do we get of all of the possible pitch counts (0-0, 0-1, 0-2, 1-0, 1-1, 1-2, 2-0, 2-1, 2-2, 3-0, 3-1, 3-2)?
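To make the sparsity point concrete, here is a minimal sketch of the kind of partial pooling a hierarchical model would do for a single count cell. It is not a full hierarchical model: it just shrinks each batter's raw foul rate toward the pooled rate using a beta-binomial posterior mean. The function name and the prior_strength constant are hypothetical; a real model (in Stan, say) would estimate the prior from the data rather than fix it by hand.

```python
def shrunken_rates(fouls, pitches, prior_strength=50.0):
    """Shrink per-batter foul rates in one count cell toward the pooled rate.

    fouls[i] and pitches[i] are one batter's foul balls and pitches seen at a
    given count. prior_strength is an assumed tuning constant: the number of
    "pseudo-pitches" at the pooled rate added to every batter.
    """
    total = sum(pitches)
    if total == 0:
        raise ValueError("no pitches observed in this count cell")
    pooled = sum(fouls) / total
    # Posterior mean under a Beta(prior_strength * pooled,
    # prior_strength * (1 - pooled)) prior with binomial data.
    return [
        (k + prior_strength * pooled) / (n + prior_strength)
        for k, n in zip(fouls, pitches)
    ]
```

A batter with only a handful of pitches at 3-0 ends up close to the pooled rate, while a batter with hundreds of pitches at 0-0 stays close to his own raw rate, which is roughly the behavior you want when the cells are this thin.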

But, Please, Start with a Scatterplot

I’d be happy to see a set of scatter plots, one for each count, with a simple hitting stat on the x axis (like on-base percentage) and the observed percentage of foul balls on the y axis.
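Something along these lines would produce those plots. It is only a sketch under assumptions: the input layout (a dict per batter with an "obp" value plus per-count "seen" and "fouls" tallies) is made up for illustration, and plot_grid needs matplotlib installed, while scatter_points is plain Python.

```python
# The twelve possible counts, ordered by balls then strikes.
COUNTS = [f"{b}-{s}" for b in range(4) for s in range(3)]


def scatter_points(batters):
    """Return {count: [(obp, foul_fraction), ...]}, one point per batter.

    batters is a hypothetical structure:
        {name: {"obp": float, "seen": {count: n}, "fouls": {count: k}}}
    Batters with no pitches at a count contribute no point for that count.
    """
    points = {c: [] for c in COUNTS}
    for rec in batters.values():
        for c in COUNTS:
            n = rec["seen"].get(c, 0)
            if n > 0:
                points[c].append((rec["obp"], rec["fouls"].get(c, 0) / n))
    return points


def plot_grid(points, path="fouls_by_count.png"):
    """Draw a 4x3 grid of scatter plots (rows = balls, columns = strikes)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(4, 3, sharex=True, sharey=True, figsize=(9, 12))
    for ax, c in zip(axes.flat, COUNTS):
        ax.scatter([x for x, _ in points[c]], [y for _, y in points[c]], s=8)
        ax.set_title(c)
    axes[-1, 1].set_xlabel("on-base percentage")
    axes[1, 0].set_ylabel("fraction of pitches fouled off")
    fig.savefig(path)
```

Sharing the axes across panels makes it easier to compare the early counts with the two-strike counts at a glance.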

I’d do it myself, but Andrew has me chained to the C++ compiler working on Stan [just kidding, of course — we share priorities here]. But maybe if I need more time to procrastinate than this blog entry afforded, I might do it myself. The only tricky part would be writing the Python (or whatever) data munger to get the counts out of their character-sequence based pitch encoding.
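For what it's worth, the munger might look roughly like the sketch below. The pitch codes are my reading of the Retrosheet event-file pitch field (B for ball, C for called strike, F for foul, X for ball in play, and so on) and should be checked against the Retrosheet documentation before trusting any output. The function name and the decision to leave bunt fouls out of the foul tally are my own assumptions.

```python
from collections import Counter

# Assumed Retrosheet pitch codes; verify against the event-file docs.
BALLS = set("BIPV")      # ball, intentional ball, pitchout, automatic ball
STRIKES = set("CSKMQT")  # called, swinging, unknown strike, missed bunt,
                         # swing on pitchout, foul tip
FOULS = set("FR")        # foul, foul on pitchout
BUNT_FOULS = set("LO")   # foul bunt, foul tip on bunt (always add a strike)
ENDS_PA = set("XYH")     # ball in play, in play on pitchout, hit by pitch
PITCHES = BALLS | STRIKES | FOULS | BUNT_FOULS | ENDS_PA


def fouls_by_count(pitch_seq):
    """Tally pitches seen and (non-bunt) fouls at each count in one plate appearance.

    Returns two Counters keyed by count strings such as "1-2". Pickoff
    throws, markers like ">" and "*", and unknown pitches ("U") are skipped,
    so a sequence containing "U" may yield counts that are off.
    """
    seen, fouls = Counter(), Counter()
    balls = strikes = 0
    for code in pitch_seq:
        if code not in PITCHES:
            continue
        count = f"{balls}-{strikes}"
        seen[count] += 1
        if code in BALLS:
            balls += 1
        elif code in STRIKES or code in BUNT_FOULS:
            strikes += 1
        elif code in FOULS:
            fouls[count] += 1
            strikes = min(strikes + 1, 2)  # a foul cannot be strike three
    return seen, fouls
```

For example, fouls_by_count("CSFFB1>FX") reports two fouls at 0-2 and one at 1-2. Summing these Counters over a batter's plate appearances gives the per-count cells discussed above.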

7 thoughts on “Super Sam Fuld Needs Your Help (with Foul Ball stats)”

  1. I feel the hunger in that last paragraph. Remember: if you get anything out of it, it's not procrastination.

    "Schraw, Wadkins, and Olafson have proposed three criteria for a behavior to be classified as procrastination: it must be counterproductive, needless, and delaying."

    It's none of the above.

  2. I think I already have the code to do most of this – especially the parsing of the pitch sequence field – which is really the hard part.

    It's been a few years, but I'll take a look at it when I get home and see if I can modify it to get what you/Sam are looking for.

  3. Here's some shorter term, very preliminary, very basic, very ugly plots:

    Keep in mind the cleaning of the data is rather minimal and only spans from 2007 through 2010 (with only partial data from 2007). I think there's a lot more to account for here, especially the fact that we have very different hitters in the different counts…but there doesn't seem to be much enlightenment from my ugly scatter plots. (be sure to click and zoom in once, they're too large just to stick on the page).

  4. This is based on my softball knowledge from a long time ago but…

    It seems to me that if you are thinking
    "Do I hit this or not?" and then
    "I am going to have to hit so should I foul it or not?"
    then those are split second decisions that are going to reduce the time in which you can hit the ball. Therefore you lose swing time so, assuming a right-hander, the ball will go right.

    An intended hit that is mishit will more likely go straight back (if undercut) or to the left.

    I'd also expect that the distance would be less on an intended foul. The batter only needs to get it over the foul line, whereas a mishit foul would have been intended for out of the park and so have more power.

    Therefore, it seems that if better players can foul on purpose, they'll have a higher count of shorter right-side fouls than of straight-back, left-side, or long right-side fouls.

    I think I would consider only X-2 counts because players can bunt foul, which seems easier, at strike counts lower than 2. A bunt foul after the second strike is a strike-out.

  5. That's awesome. Thanks. Exactly the graphs I was talking about. Andrew's blog's like magic!

    You should send them to Sam Fuld. Maybe through Thomas Cover, who may be more likely to read his e-mail.

  6. Bob is my brother and he mentioned this blog entry to me as being "not that technical." lol. I can't understand it. I feel like I'm reading the first page of your book again. Maybe Bradley and Dylan will get this one day.
