“Dow 36,000” guy offers an opinion on Tom Brady’s balls. The rest of us are supposed to listen?

Football season is returning so it’s time for us to return to that favorite statistical topic from the past football season: Tom Brady’s deflated balls.

Back in June, Jonathan Falk pointed me to this report.

You can click through if you’d like and take a look. I didn’t bother reading it because it had no graphs, just lots of text and some very long tables. Also I happened to notice the author list.

My response to Falk: “Kevin Hassett, huh? The Dow is at 36,000 and Tom Brady is innocent.”

Falk replied:

That part is just a bonus.

More seriously, both reports focus on unknown scenarios as to which gauge was used on which balls at which time. This is almost exactly like one of those standard philosophical setups of the anomalies of classical frequentist statistics. Y’know, the ones where you randomize between more and less imperfect measuring devices and improve your p values. (References available on request.) Neither report adds a hyperparameter for the probability of each gauge, preferring instead some sort of mishmash of permutation analysis and “cases analyzed.”

A couple days later he followed up:

OK. Now having spent a lot more time with both reports than I had before, while I stand by my previous remarks, more or less, there is one issue which actually deserves (IMO) wider dispersion.

The central difference between the original result and the Hassett et al. critique is the use of a variable which adjusts for the order in which the balls were measured at halftime. The original report considered such an adjustment but rejected it because the coefficient on the order variable was statistically insignificant. (p=0.299. fn. 49 [Gotta love the three decimal places there, no?]) Hassett, using a slightly different method, agrees that it is statistically insignificant but includes it because the effect is known, even by the authors of the original report, to be important to the physics of what went on and its directionality (if not magnitude) is perfectly clear.

So, while I continue to believe that a full hierarchical Bayesian setup would have made the results much clearer, even more important is the statistical point that what needs to be included in the model or not is what is important, not whether its noise level covers an estimate of zero. Any Bayesian analysis would clearly have a strong prior on a positive order coefficient. To not do so violates PV=nRT. By including a “statistically insignificant” variable, Hassett et al. should be commended.

Sure, but I still can’t get around that Dow thing. I just can’t trust anything this guy writes. I’m not saying he’s in the category of John Gribbin or Dr. Anil Potti or Stephen J. Gould or Michael LaCour, but, still, once I lose trust in an analyst, it’s hard to motivate myself to go to the trouble to take anything he writes seriously. There are just too many ways for someone to distort an argument, if he or she has a demonstrated willingness to do so.

30 thoughts on ““Dow 36,000” guy offers an opinion on Tom Brady’s balls. The rest of us are supposed to listen?”

  1. If you read the Wells Report, it says the referee said he believes he used one gauge but the Report without giving a reason decides he used the other. That choice made the violation; if they took the referee’s memory as fact, this entire thing goes away. When the Report concludes “more probable than not”, it has included in that “probability” the choice of gauge that goes against the only testimony, but it doesn’t say that. To be clear, it doesn’t say, “more probable than not” if “either gauge was used” and doesn’t evaluate either scenario separately – which would be “more probable than not” if x gauge was used and “less (or not) probable” if y gauge was used. So when people cite this finding, it hides this manipulation. That says to me either agenda or bad work.

  2. Bad p-value definition from Exponent:
    “The convention in statistical applications is to declare a finding significant if the p-value is less than 0.05—i.e., there is less than a 5% probability of observing a finding of that magnitude by chance. In other words, if the p-value is less than 0.05, there is a statistically significant difference between the average decrease in pressure of the Patriots footballs when compared with the average decrease in pressure of the Colts footballs.”
    See page 171 of pdf: https://www.documentcloud.org/documents/2073728-ted-wells-report-deflategate.html

    Should be: “There is less than 5% probability of observing a difference of that magnitude or greater if the null model is true.”

    Those significance tests are irrelevant to the investigation anyway. They go ahead and do the scientific thing of figuring out whether different combinations of factors can explain a difference of that magnitude. I do think they missed a few possibilities though.

    What if the Patriot balls were filled with warmer (relative to Colts; say 80 F) air before being checked by the referee? Then there would be a bigger drop in pressure (due to bigger deltaT). They assume the air in the footballs was at equilibrium with the room air, but the measurements only took place about an hour later:

    “Approximately three hours before the officials arrived at Gillette Stadium, the Patriots were finalizing the preparation of the footballs that would be used during the AFC Championship Game.
    […]
    Jastremski packed the game balls in one bag and the back-up balls in another, leaving them in the equipment room for McNally to bring to the Officials Locker Room, which he did around 2:50 p.m., as can be seen on security footage from the corridor outside the Officials Locker Room
    […]
    at approximately 3:45 p.m., Anderson, with the assistance of Greg Yette, began preparing the footballs for inspection.”

    Later, in figure 21 they report:
    “the pressure inside the balls quickly drops as they are initially exposed to the colder temperature and stabilizes as they gradually approach equilibrium at the end of 2 hours.”

    Without knowing the initial air temperature and exact times the balls were filled/measured, I don’t think we can rule that one out. Maybe I missed it though.

    • > Later, in figure 21 they report: “the pressure inside the balls quickly drops as they are initially exposed to the colder temperature and stabilizes as they gradually approach equilibrium at the end of 2 hours.” Without knowing the initial air temperature and exact times the balls were filled/measured, I don’t think we can rule that one out.

      Indeed. pV=nRT. Doing a back-of-the-envelope calculation, the observed pressure drop is consistent with a temperature drop from 298 K (nominal locker room temp) to 286 K (55 deg F). What was the temperature on the field and how long after the balls were brought off the field were their pressures measured? Do a lab experiment where you inflate the balls to 12.5 psig at whatever temp you believe the facility was at before the game, cool them to whatever the field temp was, then monitor gauge pressure vs time after you bring them back into the locker room (or wherever the temp was measured). The reality check on whether you’ve got a potential scandal is whether the time window for the balls to fall into the 11.3-11.5 psig range is consistent with the time between when they were removed from the field and when the pressure was measured. And, setting aside the uncertainty in the gauges, you should be able to calculate the uncertainty in what pressure you’ll measure presuming Newton’s law of cooling (or what cooling rate you observe in your experiments) and the uncertainty in delta(time between removal from field and pressure measurement).

      • I was considering a different scenario. It could be accidental, or purposeful. Imagine filling the balls with hot air or storing them in a hot room before dropping them off at the referee before the game. Drop them off as soon before the pressure is measured as possible. Alternatively, keep them in an insulated bag. Then the referee measured pressure would be higher than expected due to room temperature (because the balls have not reached equilibration) and the observed deflation can be achieved on the field.

        I just don’t see where they address that scenario.
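The two scenarios in this thread are easy to put numbers on. What follows is a rough sketch, not anything taken from the Wells or Exponent reports. It assumes a rigid ball with a fixed amount of air, an atmospheric pressure of 14.7 psi, and the 298 K and 286 K temperatures quoted above. The warm-fill temperatures and the rewarming time constant `tau_min` are hypothetical placeholders that would have to come from measurement.

```python
import math

ATM_PSI = 14.7  # assumed sea-level atmospheric pressure, psi (not from the report)


def f_to_k(deg_f):
    """Convert degrees Fahrenheit to kelvin."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


def gauge_after_temp_change(p_gauge, t_start_k, t_end_k, atm=ATM_PSI):
    """Gauge pressure after the air goes from t_start_k to t_end_k.

    Fixed volume and fixed amount of air, so absolute pressure scales
    with absolute temperature (pV = nRT).
    """
    return (p_gauge + atm) * t_end_k / t_start_k - atm


def gauge_during_rewarming(minutes, tau_min, p_fill=12.5, t_fill_k=298.0,
                           t_field_k=286.0, t_room_k=298.0):
    """Gauge pressure `minutes` after a field-cold ball comes back indoors.

    Newton's law of cooling with a hypothetical time constant tau_min
    (in minutes) that would have to be measured in the lab experiment.
    """
    t_ball = t_room_k + (t_field_k - t_room_k) * math.exp(-minutes / tau_min)
    return gauge_after_temp_change(p_fill, t_fill_k, t_ball)


if __name__ == "__main__":
    # Scenario 1: air at room temperature (298 K) when the referee reads 12.5 psig.
    base = gauge_after_temp_change(12.5, 298.0, 286.0)
    print(f"equilibrated at check: {base:.2f} psig on the field")

    # Scenario 2: air still warm when the referee reads 12.5 psig.
    # These warm temperatures are hypothetical.
    for warm_f in (80.0, 90.0, 100.0):
        p = gauge_after_temp_change(12.5, f_to_k(warm_f), 286.0)
        print(f"air at {warm_f:.0f} F at check: {p:.2f} psig on the field")

    # Rewarming in the locker room, with a made-up 10-minute time constant.
    for minutes in (0, 5, 10, 20):
        p = gauge_during_rewarming(minutes, tau_min=10.0)
        print(f"{minutes:2d} min indoors: {p:.2f} psig")
```

With these assumptions the equilibrated case comes out at about 11.4 psig, inside the 11.3-11.5 range mentioned above, and any warm-fill case lands lower still. The rewarming loop shows why the timing of the halftime measurement matters: the reading climbs back toward 12.5 psig at a rate set entirely by the unknown time constant.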

  3. Kevin Hassett has many reasonable thoughts on a number of topics which he knows a fair bit about.

    A lot of folks’ models predicted Dow 36,000. Most just didn’t have the courage of their convictions to put the prediction out there with their name on it. I’d rather make a bold call, one that is transparently falsifiable, than retreat to vagaries with no empirical content, which is so common in the space in which he was operating. I don’t think Hassett would deny that the call turned out to be wrong. Shaming people for making predictions which ex post didn’t pan out just discourages people from making the predictions, which discourages all the great things that come from making a sincere effort.

    • > Shaming people for making predictions which ex post didn’t pan out just discourages people from making the predictions, which discourages all the great things that come from making a sincere effort.

      No. Making a sincere effort is not adequate. Sincere efforts grounded in ignorance should be savaged. Dow 36,000 was delusional. Patting a four-year-old on the head and saying, “Nice try.” may be acceptable but we need to demand that adults make an effort to be reality-based.

        • And someday the Sun will go out.

          PS I don’t hold Mankiw in high regard but +1 to him for this: “But should one be as confident as Glassman and Hassett that the process will continue until the risk premium shrinks to zero and the Dow reaches 36,000? I don’t think so. It is easy to imagine that some short-term event might shake investor confidence in the long-term stability of the market and push the equity risk premium back up.”

        • Yes, he disagrees with Hassett, as did I. But notice how, in an effort to understand where he is coming from, Mankiw sees his case as reasonable, even as it leaves something to be desired. He doesn’t compare him to a four year old, and doesn’t ignore everything he has to say on every topic because he made one bold, but bad call. Irving Fisher continues to be a major influence on current monetary economics, despite his terrible call Mankiw mentions.

        • Also, while you may not hold Mankiw in high regard, note that by every measure the economics profession does: they teach from his textbook, made him chair of the top department in the world, and regularly list him as a possible Nobel candidate. So I think his opinion on this is noteworthy, and not simply because he agrees with you that it is, all things considered, a bad call.

        • A different kind of bold but bad call – http://www.theguardian.com/world/2010/apr/05/wikileaks-us-army-iraq-attack

          Consider the decision process of the protagonists. Is it appropriate to say “They were sincere.” and leave it at that? When is sincerity sufficient and when is it essential to be correct? Let’s say Hassett’s error is acceptable and the gunner’s error is not. Where (roughly) does one draw the boundary between acceptable and unacceptable errors?

    • Ram:

      The “Dow 36,000” prediction was not just transparently falsifiable, it was transparently false!

      And I completely disagree with you on the incentives. If someone makes a bold prediction and gets it right, then, sure, he gets credit. But the flipside of this is, if his prediction is wildly, terribly wrong, then, yes, he gets mocked. If you credit people for their bold and correct predictions and don’t debit them for their mistakes, then the expected value of making a ridiculous prediction is positive, and there’s no incentive for anyone to make sense.

      To put it another way: a risky move is just that, risky. By publishing “Dow 36,000” amid a hoopla of press, Hassett was putting his reputation on the line. He did the equivalent of placing all his credibility on the “00” spot on the roulette wheel and spinning away. And he lost. His reputation is now the property of some Las Vegas casino.

      That’s fair enough. In the unlikely event that the Dow had hit 36,000 on schedule, the guy would have a huge, Taleb-like reputation. He made his gamble and lost, and we should all respect that.

      I’m not “shaming” Hassett, I’m just saying I have no reason to take anything he says seriously.

      • I’m saying that you have at least some reason to take some of the things he says seriously. Namely those things on which he is a recognized expert. Among those things is not, of course, stock market forecasting, but everyone who continues to recognize him as an expert in certain areas, namely other economists specializing in those areas, is also aware that he made a spectacularly wrong call on the stock market once, but has not been led by this to ignore everything else he has to say. I think this might be because his call, though bold, does not provide any evidence that he is a fundamentally irrational person. But you’re a busy guy, you only have so much time, etc. etc. Feel free to ignore him, but if you want to appropriately evaluate his credibility, I think you’re going to have to present more than the single worst bit of reasoning in his long and otherwise productive career.

        FWIW, the same applies to every other person who has made 1 or a few spectacularly bad calls (almost everyone), but who continues to be recognized as an expert on any number of other subjects. This isn’t about Hassett–I don’t know the guy, and I doubt he much cares what any of us think.

        • Ram:

          Lots of people who are not stock market experts have made spectacularly wrong calls on the stock market. But few have written books and staked their reputations on such calls.

          Whether Hassett cares what I think is irrelevant. John Updike and Stephen J. Gould certainly don’t care what I think about them, but I’m still gonna opine about them. More relevant in this case is that someone asked me for my opinion on this Hassett-penned piece, and I replied that I have no reason to take anything seriously that he writes. Hassett made a high-stakes gamble with his reputation as collateral, and he lost. Too bad—but that’s the kind of thing that can happen when you make a high-stakes gamble.

      • That’s kind of silly. If his prediction had been correct, does that mean you would automatically believe every prediction he made in the future? And assuming your answer is “no”, why would you then discount all his future predictions because his original one was incorrect?

        • Stephen:

          There are enough people out there who know what they’re talking about, that I don’t see the need for any bandwidth to be occupied by Hassett, someone who clearly doesn’t know what he’s talking about.

  4. Thanks for that. I’ll have a deeper read later. I really enjoyed Gould’s The Mismeasure of Man when I read it (has it really been?) 20 years ago. I also don’t know what Gribbin, Potti or Lacour did to make the list, but can it really be that Gould is in the league of Mr. Dow 36,000? Kevin Hassett is a part of a special wing of the pundocracy which specializes in preaching to the converted. I’m not a paleontologist, but Gould struck me as doing, y’know, Science. Hassett doesn’t even come close to doing Science; he’s a paid windbag.

    *Oh, look! It’s in Wiki:
    https://en.wikipedia.org/wiki/Stephen_Jay_Gould#The_Mismeasure_of_Man
