Going for it on 4th down: What’s striking is not so much that we were wrong, but that we had so little imagination that we didn’t even consider the possibility that we might be wrong. (I wonder what Gerd Gigerenzer, Daniel Kahneman, Josh “hot hand” Miller, and other experts on cognitive illusions think about this one.)

In retrospect, it’s kind of amazing how narrow our sports thinking used to be. As a kid, I always loved when teams would go for it on 4th down or try an onside kick or run trick plays like fake punts, double reverses, etc., but I just assumed that the standard by-the-book approach was the best. The idea that going for it on 4th down was not just fun but also a smart move . . . I had no idea, and I don’t recall any sportswriters or TV commentators suggesting it.

That said, I know next to nothing about football analytics, and it’s possible that these unconventional plays had less of an expected-value payoff back in the 70s when field position was more important and points were harder to come by.

I guess part of the problem is, to use some psychology and statistics jargon, a cognitive bias induced by ecological correlation. There always were some teams that tried unconventional plays, but they tended to be less successful teams that tried these tactics as a last resort. The Oklahomas, the Michigans, the Vikings and Steelers didn’t need this sort of thing. The only thing at all out of the ordinary I can remember being routinely played is Dallas’s two-minute offense with Roger Staubach in the shotgun, but that was a rare exception, as I recall it.

Consider a sequence over the decades:

1. Tactics are developed during the play-in-the-mud, Army-beats-Navy-3-to-0 era.

2. Conservative coaches stick with these tactics for decades.

3. Spectators are so used to things being done that way that they don’t even question it.

4. Analytics revolution.

5. Even now, coaches shade toward the conservative choices, even when stakes are high.

We’re now in step 5. In his above-linked post, Campos expresses frustration about it. And I get his frustration, as this is similar to my frustrations about misconceptions in science, or clueless political reporting, or whatever. But what really intrigues me is step 3, the subject of this post, which is how we were so deep inside this particular framework of assumptions that we couldn’t even see out. Or, it’s not that we couldn’t see out, but that we didn’t even know we were inside all this time.

I wonder what Gerd Gigerenzer, Daniel Kahneman, Josh “hot hand” Miller, and other experts on cognitive illusions think about this one.

P.S. We discussed some of this back in 2006, but there we were focused on the question of why were teams almost always punting on 4th down. Now that it’s become routine to go for it on 4th down, the question shifts to why did it take so long and why hasn’t the new approach completely taken over.

34 thoughts on “Going for it on 4th down: What’s striking is not so much that we were wrong, but that we had so little imagination that we didn’t even consider the possibility that we might be wrong. (I wonder what Gerd Gigerenzer, Daniel Kahneman, Josh “hot hand” Miller, and other experts on cognitive illusions think about this one.)

  1. Even funnier is baseball pitching rotations are really suboptimal but you can’t be the team that changes them because your pitchers’ stats will be ruined for their next contract. (from Scorecasting by Moskowitz & Wertheim)

      • Guy:

        Wasn’t there some discussion a few years ago about getting rid of starting pitchers and just going through a bunch of relievers each game? I haven’t been following this, though. My vague impression was that batters can swing as hard as they want at each pitch but pitchers need to conserve their efforts.

        • > Wasn’t there some discussion a few years ago about getting rid of starting pitchers and just going through a bunch of relievers each game?

          My sense is that has kind of happened. They generally are taking pitchers out much earlier, and you’re much more likely to see “bullpen games.” I’m not sure that was even a thing before maybe 8 years ago?

        • The game has definitely evolved in that direction. In 2012 the average team used 22 pitchers and starting pitchers averaged 5.9 innings/game; this year teams used 29 pitchers and starters went only 5.2 innings. We have also seen a convergence in performance: in 2012 relievers had a large advantage in ERA of 3.67 vs 4.19 (0.52 runs better), but this year the gap was just 0.19 (3.86 vs. 4.05). That suggests that shifting more innings to relievers at this stage would provide tiny gains, if any.

          In any case, the issue is likely moot as MLB has for the first time established a limit on the number of pitchers that teams may carry on their roster (and may lower that number in the future). Fans like a game where starting pitchers play a prominent role, as opposed to a parade of fungible relievers throwing 97mph for 1 inning, and I think MLB will try to restore that version of the game (or at least prevent further transfer of innings to relievers).

        • As people have mentioned, there’s been movement in this direction. But the logical conclusion, a totally undifferentiated pitching staff, runs against the fact that pitching staffs contain a wide range of talent, and if you’re going to apportion innings, you’d prefer to assign the bulk of them to your best pitchers (i.e., your starters, who, by virtue of a rotation, aren’t pitching daily and thus blowing up their arms). Your second-best tier will be assigned the high-leverage innings; your third tier, the low-leverage ones (as needed). And as such, you end up with what looks like a typically structured pitching staff, determined by the nature and schedule of the season.

  2. The real issue is that, despite introducing a lot more sanity to the discussion, the analytics revolution in football is still in its infancy. There are several unobserved (some unobservable) variables that are not considered by analysts in taking these decisions. For example, if there is wind either direction, that changes the calculation for how valuable a punt can be, or if a pass play for just a couple of yards is risky or not. Or maybe you are missing a key O-line player for a 4th down play. Or maybe your defense played so well compared to your offense in a particular game that giving the ball back is not a big deal.

    Also, plays are not independent of each other, both within a singular game and in the course of the season (and across seasons!). Teams study each other, so when you have say 3rd-and-4 you may run the play like you are going for it regardless on 4th, and defenses have to account for that. So maybe the easy run play for at least a couple of yards is not there at the moment, forcing you to run a play-action that fails to convert. Now you are at 4th-and-4 instead of maybe 4th-and-1 or inches.

    Analytics are great, but there are still not that many very good tools developed taking most relevant factors into account. Ultimately coaches know better what their players can actually execute at any given play, which does not mean they always make the optimal decision (far from it!), only that our limited information is just not enough to make better decisions at the margin.

    • > so when you have say 3rd-and-4 you may run the play like you are going for it regardless on 4th, and defenses have to account for that. So maybe the easy run play for at least a couple of yards is not there at the moment, forcing you to run a play-action that fails to convert. Now you are at 4th-and-4 instead of maybe 4th-and-1 or inches.

      I’m not entirely sure what you were going for there, but if it’s what I think it is…

      Yes, lots of times I think I’d be running the ball rather than passing, say on a 3rd down, based on the knowledge that I’d be going for it if it’s 4th and short. At the very least, do an RPO there.

      It’s impossible for me to know how much that does or doesn’t go into the play calling, but from the outside it does look like that kind of counterfactual thinking is often missing. So often, it seems, they don’t think that they have two plays to make short yardage on 3rd down and instead they call a pass as if they have to get the first down on the 3rd down in a situation where they’d likely be going for it if it were 4th and short.

      Of course, just like I’d probably be a better play caller than most offensive coordinators, I’d probably make a lot of passes that Brady missed on.

      • My point (not very well explained, for sure) is that defenses prepare based on expectations. If your team often goes for 4th down (like the Ravens, for example), you defend the run in 3rd-and-4 as much as you would 2nd-and-4. So sometimes the box is loaded and then your play-calling needs to account for that, with passing becoming the best option based on what you see before the snap.

        But granted, teams often do not seem to be taking this kind of thing into account. Also granted, play-calling is definitely not an easy task. Your decision is conditioned on so many things, particularly on a play that just happened and you have only a few seconds to relay your call to the QB, who then needs to have some leeway to change the play based on what the defense is showing.

      • During the Eagles’ Super Bowl year with Doug Pederson as coach, they did this a lot, to great effect. They ran way more often than I’ve ever seen on 3rd-and-4 or similar, and it often worked, either catching the defense off guard and picking up the first down or setting up a short 4th down where they would go for it.

        Not sure if it’s because their running attack got worse or if teams just adjusted or both but it was not as successful in subsequent seasons.

    • Definitely true that this can’t be effectively analyzed (or modeled if you think everything’s a model) by simply comparing expected outcomes for single-play alternatives.

      What needs to be modeled is an optimized strategy for an entire possession under conventional old-school assumptions versus an optimized strategy for an entire possession under the assumption of generally going for every 4th down that isn’t in an extremely unfavorable situation (4th and 20 from your own 5 yard line).

      • That is fair. And I think teams have been doing a better job at it recently. I don’t have the numbers but I would assume opening-drive TDs are more frequent now than in the past, even adjusting for more scoring in general nowadays.

        What I find funny is that we seem to think there are a lot of improvements to be made on the offensive side when there is probably as much room on defense as well. Of course, defensive play-calling is very different, but the decisions to blitz, to play man coverage, to use a spy, to rush the passer from the inside, etc., do not get the same scrutiny as decisions to go for it on 4th down, and they are likely to have a big impact on win probability.

        • Vinicius:

          I agree that defensive strategy matters too, but I don’t see why we should assume that there is as much room for improvement there as in offense. Defense reacts to offense, so it seems to make sense that decision making will be more important on the offensive side of things.

        • Andrew, I think it depends on how you model it. To the extent defense is conditional on what the offense does (and one could argue both are conditional on each other all the time, despite the offense having the initiative), there are still decisions you can take, particularly for the game strategy as a whole but also play-by-play, that increase your chances of winning significantly. I do believe there is an offense-bias that makes people focus too much on how to make optimal decisions, but they fail to consider what optimized decisions on defense would mean for the game.

          For example, say you decide going on 4th is always a good option, so the offense always tries to convert. Knowing that, what is the best course of action for the defense? I assume it would be more aggressiveness, to get more sacks, more interceptions, etc., at the cost of giving up bigger plays. Some teams would be better positioned to take advantage of that (faster receivers, better pass protection, etc.), while others would suffer if they have to withstand more blitzes. So these teams would revert back to a more traditional approach.

          Bottom line is, maybe there is more of a payoff for optimization on offense, but if you only consider one side of the ball you miss on the back and forth that leads to different equilibria depending on the teams and personnel strategies they might have.

    • I would be a bit surprised if teams weren’t considering things like wind, since that seems like easily collectable data. Public data/algorithms might not have it, but the proprietary stuff should easily be better, no?

      • I am not sure if they have information at the minute-by-minute level, which is the relevant info. They might have. I would love to work for an NFL team crunching all these numbers.

  3. There are all sorts of analogous beliefs among golfers, even elite ones. For generations it was accepted as 100% obvious that when a golfer cannot get his second shot onto the putting green, the best strategy was to hit a “lay up” shot to exactly the distance that lets him make a full swing on this third shot, rather than getting it as close to the green as possible. That might be 90 yards for a typical golfer.

    For as long as people have compiled statistics on golf, the results have clearly shown that being closer to the hole is better in terms of scoring. You’ll score better on average being 60 yards rather than 90, better from 30 rather than 60 and so on. Which I personally would consider common sense.

    That idea has finally permeated the game at least at the elite professional level. Very few professional golfers with a chance to hit their second shot 30 yards from the green are going to deliberately lay up to 90 yards nowadays. Still a few throwbacks but they are vanishing.

    Here’s the interesting part. At least at the amateur level, this idea that you should leave yourself (longer) full swing shots rather than (shorter) partial swing shots causes many weekend players to simply avoid practicing anything other than full swings. So if they were to end up 30 yards from a green, they are unequipped to hit what is actually an easier shot to get close to the hole. It’s circular and self-fulfilling. Because they’ve never learned to hit that shot, they try to avoid it at all costs and by trying to avoid it they make sure they don’t get any experience hitting it.

    In a sense, it becomes true even when it’s not true. If you never even prepare or game plan for how you might go for it on 4th-and-1 from midfield (or practice how to hit 30 yard golf shots) then indeed you are better off not even trying. Even if you’re costing yourself a small increment of success every time you do that.

    • That ignores the budget constraint. Amateur players might be better off practicing full swings more and just avoiding partial swings than dividing their practice across all types of swings. Meanwhile, a professional can do it all.

  4. There’s also a principal-agent aspect underlying many coaches’ decisions. Job security concerns likely steer coaches toward more conservative options at the margin.

    • David:

      Yes, good point. I added a P.S. pointing to a 2006 post where I discussed the idea:

      In the particular example of fourth-down conversion, somebody–I think it was Bill James–pointed out a possibly rational reason for coaches to be conservative. The argument goes as follows: if a strategy succeeds and the game is won, everyone’s happy. The real issue comes when it fails. If the coach did the standard strategy and fails, then hey, it’s too bad, but everyone (well, everyone but George Steinbrenner) knows you can’t win ’em all. But if the coach does something that is perceived to be “radical” and it fails, then he looks bad and is much more easily Monday-morning-quarterbacked. Even if the probability of winning is higher under the radical strategy, the medium-term expected payoff (i.e., probability that the coach keeps his job at the end of the season) could be higher under the conservative strategy.
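A minimal sketch of that job-security argument, in Python. Every number here is made up for illustration, and the function name `prob_keeps_job` is hypothetical; the only assumption built into the model is that a coach is fired only after a loss, and more readily when the losing strategy looked "radical."

```python
def prob_keeps_job(p_win, p_fired_given_loss):
    """Chance the coach keeps his job, assuming he is only fired after a loss."""
    return 1.0 - (1.0 - p_win) * p_fired_given_loss


# Hypothetical inputs (assumptions, not estimates from any data):
# the radical strategy wins more often, but a loss under it is blamed
# on the coach much more readily.
radical = prob_keeps_job(p_win=0.55, p_fired_given_loss=0.50)
conservative = prob_keeps_job(p_win=0.50, p_fired_given_loss=0.20)

print(f"radical:      P(win)=0.55, P(keeps job)={radical:.3f}")
print(f"conservative: P(win)=0.50, P(keeps job)={conservative:.3f}")
```

Under these invented numbers the radical strategy is better for the team and worse for the coach, which is the principal-agent gap in a nutshell. Different inputs could easily reverse the ordering.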

  5. Campos’ claim that the Broncos had an 80% chance of getting it seems off. The Broncos’ run offense is bottom tier, and the 49ers’ run defense is possibly the best in the league. Additionally, since the score was 10–5 in the 4th quarter, obviously neither offense could get much going, so letting the 49ers start in field goal range seems dubious.

    Also, I agree with Vinicius that the NFL adjusts a lot. If a team always goes for it on 4th, opponents will study and practice more against 4th downs, reducing the success rate. Additionally, a lot of teams rely on an element of surprise for those short plays (Philly Special), so running 4th downs more often would require running less surprising plays. At the end of the day, we’re taking observational data and trying to predict the outcome of interventions, which is always unreliable.

    I definitely agree that outcome bias is huge in the NFL. I’ll still die on the hill that the seahawks make an okay decision going with the pass play in the super bowl.

    • I am happy you agree with me, but I will have to disagree with that Seahawks call. It was 2nd-and-goal from the 1 with 26 seconds on the clock and they still had one timeout. You should definitely run, call timeout if you don’t get it and then you have two other chances at a pass with enough time for even a relatively extended play on 3rd down (as long as it is not a sack).

      • I know I’m in the minority here, but my feeling has always been that given the time remaining (and belichick not taking a timeout), the seahawks could either do pass-whatever-whatever, or run-pass-whatever. By passing on 2nd down, they’d force the patriots to protect against both the run and pass on 3rd. Marshawn still could’ve had 2 chances. It just failed spectacularly.

    • > I’ll still die on the hill that the seahawks make an okay decision going with the pass play in the super bowl.

      That’s quite a hill. I think it was maybe the dumbest call in the history of the sport. And the excuse-making afterwards was hilarious.

    • There’s nothing wrong with a pass in that situation. The stupid part was to throw it into traffic in the center third of the field. The obvious pass play in that situation is an out route toward the sideline where the ball can be placed with absolute certainty away from the defender, so it’s either a TD or an incompletion.

    • IMO Campos’ claim that “Punting in that situation is completely idiotic” is what’s really off. His assessment of the game situation is shallow.

      He says stats show **teams** “are successful converting more than 80% of the time” – that’s not the Broncos. But regardless of Denver’s rushing game, the consequences of failure were too severe to go for it. It’s a five point game with 12 minutes remaining and Denver hasn’t scored a TD. Giving the ball up on Denver’s 34 almost guarantees it’s an eight point game. Moreover, SF converted only one third down in the entire game, so putting them deep in their own territory with plenty of time to get the ball back seems like the less risky bet. The problem here is that while there is a high probability of success on 4th down, the consequences of failure are that the game is probably lost. With 12 mins to go, that’s a stupid risk to take.
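The disagreement in this thread can be framed as a break-even calculation: going for it beats punting only if the conversion probability exceeds a threshold set by how bad failure is. Here is a rough Python sketch. The win probabilities are invented for illustration (they are not estimates for the Broncos-49ers game), and `break_even_conversion_rate` is a hypothetical helper, not something from any analytics package.

```python
def break_even_conversion_rate(wp_success, wp_fail, wp_punt):
    """Conversion probability p at which going for it equals punting.

    Solves p * wp_success + (1 - p) * wp_fail == wp_punt for p.
    """
    if wp_success <= wp_fail:
        raise ValueError("converting must be better than failing")
    return (wp_punt - wp_fail) / (wp_success - wp_fail)


# Hypothetical win probabilities (assumptions for illustration only).
threshold = break_even_conversion_rate(wp_success=0.35, wp_fail=0.15, wp_punt=0.30)

# Compare a league-wide rate with a lower, team-specific guess.
for label, p in [("league-average 80%", 0.80), ("weak run game 70%", 0.70)]:
    decision = "go for it" if p > threshold else "punt"
    print(f"{label}: break-even is {threshold:.2f}, so {decision}")
```

With these made-up inputs the break-even rate is about 75%, so the league-wide 80% figure says go and a team-specific 70% says punt. The point is only that the argument turns on two things at once: the team's real conversion rate and how costly a failure is.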

  6. “But what really intrigues me is step 3, the subject of this post, which is how we were so deep inside this particular framework of assumptions that we couldn’t even see out.”

    What I find surprising is that the game didn’t slowly evolve toward a reasonable equilibrium. I’m not surprised that coaches wouldn’t go for it on 4th-and-1 from their own 40, even when that was the optimal play. But why didn’t some coaches start going for it from mid-field, or the opponents’ 45? Or in lieu of very long field goal attempts? We should have seen some experimentation of that kind, that worked more often than not, leading to a steady (even if slow) increase in aggressiveness, and erosion of the old conventional wisdom. There was a bit of that, but it was incredibly slow. And it remained very slow for years even after David Romer’s paper.

  7. In the early 1970s NFL quarterback Virgil Carter and systems engineer Robert Machol developed expected points estimates for valuing field position, and started making use of them for decision-making. In 1978 they wrote a paper on 4th down decisions, which focused on decisions in field goal range (opposing 35 yard line or closer). Their estimates favored going for it on 4th & 3 or less, and favored punts over long field goals.

    https://www.wsj.com/articles/the-nerdy-quarterback-who-solved-football50-years-ago-11606057081
    Virgil Carter, Robert E. Machol, (1971) Technical Note—Operations Research on Football. Operations Research 19(2):541-544. https://doi.org/10.1287/opre.19.2.541
    Virgil Carter, Robert E. Machol, (1978) Note—Optimal Strategies on Fourth Down. Management Science 24(16):1758-1762. https://doi.org/10.1287/mnsc.24.16.1758

  8. I wish I could understand all the sports details discussed. But I fear I am not much help here. Where I live, football means soccer. Yet there is one general lesson from the research on cognitive biases in sports. We, the researchers, need to be more careful when we believe we know better than the practitioners. The so-called hot-hand fallacy is a case in point, where it took some 30 years and two smart economists, Joshua Miller and Adam Sanjurjo, to show, using the original free shot data, that the hot hand is not a myth. It seems that the error was in the statistical thinking of the researchers rather than in that of the coaches and players. I have used the term “bias bias” for the widespread tendency to see systematic biases in intuition even if there are none.

    • Gerd:

      I was thinking about the “bias bias” thing. But I think that the advantages of going for it on 4th down are real. One difference of the 4th-down thing, compared to, say, the purported hot-hand fallacy or the Linda paradox or “the law of small numbers” or various other famous cognitive illusion topics is that, with this football example, there’s a very clear action you can do: you can go for it on 4th down! In contrast, you can’t directly do much with the hot-hand fallacy, even to the extent it is real. Sure, you can decide to pass the ball less frequently to players who have been hot, but that’s not as clearly defined as deciding not to punt at a certain juncture in the football game.

      I guess what I’m saying is that one strength of the “sports analytics” take on 4th down is that it’s a well-defined problem, so it doesn’t involve the same issues of extrapolation as typically are involved when generalizing from laboratory studies or statistical analyses to the real world.

  9. I believe the 4th Down charts use the arithmetic mean of win probability added (WPA) to assess which scenarios are WPA positive. I am curious if the arithmetic mean is appropriate here, or geometric mean of WPA would better suit the analysis. Geometric mean typically favors more risk averse strategies favored by practitioners (i.e. coaches) and could sway the balances on some of the more interesting go for it calls – like going for it on 4th-and-1 deep in your own territory, which I have seen some charts say is a “go for it” scenario.

    • Aeforty:

      With probabilities you always want to use expectation—that is, arithmetic mean—because the expectation of an expectation is an expectation. If the goal is winning the game, you want to go with the strategy that gives you the highest probability of winning. Concepts of risk aversion should not apply. In real life, though, winning is not the only goal. For example, the coach might not want to look bad. There are other factors such as minimizing risks of player injuries and increasing your team’s chance of beating the spread.
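To make the arithmetic-versus-geometric point concrete, here is a small Python sketch. The conversion rate and win probabilities are made-up numbers, not values from any 4th-down chart, and the "geometric mean" is taken to be the probability-weighted geometric mean of the resulting win probabilities, which is one plausible reading of the question above.

```python
import math


def arithmetic_mean(outcomes):
    """Expected win probability over (probability, win_prob) pairs."""
    return sum(p * wp for p, wp in outcomes)


def geometric_mean(outcomes):
    """Probability-weighted geometric mean of the win probabilities."""
    return math.exp(sum(p * math.log(wp) for p, wp in outcomes))


# Hypothetical scenario (assumed numbers): going for it converts 60% of the
# time, leaving a 0.50 win probability on success and 0.20 on failure.
# Punting leaves 0.37 for certain.
go_for_it = [(0.60, 0.50), (0.40, 0.20)]
punt = [(1.00, 0.37)]

for name, outcomes in [("go for it", go_for_it), ("punt", punt)]:
    print(f"{name}: arithmetic={arithmetic_mean(outcomes):.3f}, "
          f"geometric={geometric_mean(outcomes):.3f}")
```

In this invented example the arithmetic mean favors going for it (0.38 vs. 0.37) while the geometric mean favors the punt (about 0.35 vs. 0.37). The arithmetic mean is the actual probability of winning the game; the geometric mean penalizes the spread between outcomes, which is why it looks more like what a risk-averse coach would choose.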
