“Dow 36,000” guy offers an opinion on Tom Brady’s balls. The rest of us are supposed to listen?

Posted on September 8, 2015 9:14 AM by Andrew

Football season is returning so it’s time for us to return to that favorite statistical topic from the past football season: Tom Brady’s deflated balls.

Back in June, Jonathan Falk pointed me to this report.

You can click through if you’d like and take a look. I didn’t bother reading it because it had no graphs, just lots of text and some very long tables. Also I happened to notice the author list.

My response to Falk: “Kevin Hassett, huh? The Dow is at 36,000 and Tom Brady is innocent.”

Falk replied:

That part is just a bonus.

More seriously, both reports focus on unknown scenarios as to which gauge was used on which balls at which time. This is almost exactly like one of those standard philosophical setups of the anomalies of classical frequentist statistics. Y’know, the ones where you randomize between more and less imperfect measuring devices and improve your p values. (References available on request.) Neither report adds a hyperparameter for the probability of each gauge, preferring instead some sort of mishmash of permutation analysis and “cases analyzed.”

A couple days later he followed up:

OK. Now having spent a lot more time with both reports than I had before, while I stand by my previous remarks, more or less, there is one issue which actually deserves (IMO) wider dispersion.

The central difference between the original result and the Hassett et al. critique is the use of a variable which adjusts for the order in which the balls were measured at halftime. The original report considered such an adjustment but rejected it because the coefficient on the order variable was statistically insignificant. (p=0.299, fn. 49 [Gotta love the three decimal places there, no?]) Hassett, using a slightly different method, agrees that it is statistically insignificant but includes it because the effect is known, even by the authors of the original report, to be important to the physics of what went on, and its directionality (if not magnitude) is perfectly clear.

So, while I continue to believe that a full hierarchical Bayesian setup would have made the results much clearer, even more important is the statistical point that what needs to be included in the model or not is what is important, not whether its noise level covers an estimate of zero. Any Bayesian analysis would clearly have a strong prior on a positive order coefficient. To not do so violates PV=nRT. By including a “statistically insignificant” variable, Hassett et al. should be commended.

Sure, but I still can’t get around that Dow thing. I just can’t trust anything this guy writes. I’m not saying he’s in the category of John Gribbin or Dr. Anil Potti or Stephen J. Gould or Michael LaCour, but, still, once I lose trust in an analyst, it’s hard to motivate myself to go to the trouble to take anything he writes seriously. There are just too many ways for someone to distort an argument, if he or she has a demonstrated willingness to do so.

30 thoughts on ““Dow 36,000” guy offers an opinion on Tom Brady’s balls. The rest of us are supposed to listen?”

Dan Linden on September 8, 2015 at 9:56 am said:

The use of informative priors that conform to scientific certainties is something that does not receive nearly enough attention.
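
To make that concrete, here is a minimal sketch of how an informative prior combines with a noisy, "statistically insignificant" estimate of the order coefficient discussed in the post. Every number is made up for illustration (none comes from either report), and the normal-normal conjugate update is just a convenient assumption, not the method either report used.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, est, se):
    """Conjugate normal-normal update: a precision-weighted average."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return post_mean, post_var ** 0.5


# Hypothetical estimate of the order effect and its standard error
# (illustrative units: psi per position in the measurement order).
est, se = 0.02, 0.019

# Hypothetical informative prior: physics says the effect is positive.
prior_mean, prior_sd = 0.03, 0.02

post_mean, post_sd = normal_posterior(prior_mean, prior_sd, est, se)

# Classical two-sided p-value for the raw estimate against zero.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(est / se)))

# Posterior probability that the coefficient is positive.
prob_positive = 1 - NormalDist(post_mean, post_sd).cdf(0.0)

print(f"two-sided p-value of raw estimate: {p_two_sided:.3f}")
print(f"posterior mean and sd: {post_mean:.4f}, {post_sd:.4f}")
print(f"posterior P(coefficient > 0): {prob_positive:.3f}")
```

With these invented inputs the raw estimate is far from "significant", yet the posterior puts more than 95% of its mass on a positive coefficient. The point of the sketch is only that a noisy estimate and a physically motivated prior can agree on direction even when the p-value is unimpressive.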

jonathan on September 8, 2015 at 10:07 am said:

If you read the Wells Report, it says the referee said he believes he used one gauge, but the Report, without giving a reason, decides he used the other. That choice made the violation; if they took the referee’s memory as fact, this entire thing goes away. When the Report concludes “more probable than not”, it has included in that “probability” the choice of gauge that goes against the only testimony, but it doesn’t say that. To be clear, it doesn’t say “more probable than not” if “either gauge was used”, and it doesn’t evaluate each scenario separately, which would give “more probable than not” if x gauge was used and “less (or not) probable” if y gauge was used. So when people cite this finding, it hides this manipulation. That says to me either agenda or bad work.

Oliver on September 8, 2015 at 10:53 am said:

Forgive my ignorance, but what did Stephen J. Gould do?

- Jeremy Fox on September 8, 2015 at 1:14 pm said:
  
  Andrew could of course confirm, but I’m guessing he’s referring to this: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001071
  
- Stephen on September 11, 2015 at 6:00 pm said:
  
  Gould had a long track record of essentially just lying about any scientific area that he disliked for ideological reasons (he was very far towards the left-wing). Most notably, his disingenuous trashing of IQ research set the public image of the field back decades, and many of his misrepresentations are still recited in the mainstream media today.
  
Anoneuoid on September 8, 2015 at 5:33 pm said:

Bad p-value definition from Exponent:
“The convention in statistical applications is to declare a finding significant if the p-value is less than 0.05—i.e., there is less than a 5% probability of observing a finding of that magnitude by chance. In other words, if the p-value is less than 0.05, there is a statistically significant difference between the average decrease in pressure of the Patriots footballs when compared with the average decrease in pressure of the Colts footballs.”
See page 171 of pdf: https://www.documentcloud.org/documents/2073728-ted-wells-report-deflategate.html

Should be: “There is less than 5% probability of observing a difference of that magnitude or greater if the null model is true.”
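
A minimal sketch of what that corrected definition means in practice, assuming a permutation null in which team labels are exchangeable. The pressure drops below are invented purely for illustration and are not the halftime measurements from the report; the function name and parameters are likewise hypothetical.

```python
import random


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Estimates the probability, under the null model that group labels
    are exchangeable, of a difference at least as large as the one
    actually observed.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        # Small tolerance so floating-point summation order cannot
        # make an identical split look smaller than the observed one.
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (count + 1) / (n_perm + 1)


# Hypothetical pressure drops in psi (NOT the real data).
group_a = [1.2, 1.0, 1.4, 1.1, 1.3, 0.9, 1.5, 1.2]
group_b = [0.5, 0.4, 0.7, 0.6]

p = permutation_p_value(group_a, group_b)
print(f"permutation p-value: {p:.4f}")
```

Note what the number is and is not: it is the share of label shuffles that produce a gap at least this large, given the null model. It is not the probability that the finding arose "by chance".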

Those significance tests are irrelevant to the investigation anyway. They go ahead and do the scientific thing of figuring out whether different combinations of factors can explain a difference of that magnitude. I do think they missed a few possibilities though.

What if the Patriot balls were filled with warmer (relative to Colts; say 80 F) air before being checked by the referee? Then there would be a bigger drop in pressure (due to bigger deltaT). They assume the air in the footballs was at equilibrium with the room air, but the measurements only took place about an hour later:

“Approximately three hours before the officials arrived at Gillette Stadium, the Patriots were finalizing the preparation of the footballs that would be used during the AFC Championship Game.
[…]
Jastremski packed the game balls in one bag and the back-up balls in another, leaving them in the equipment room for McNally to bring to the Officials Locker Room, which he did around 2:50 p.m., as can be seen on security footage from the corridor outside the Officials Locker Room
[…]
at approximately 3:45 p.m., Anderson, with the assistance of Greg Yette, began preparing the footballs for inspection.”

Later, in figure 21 they report:
“the pressure inside the balls quickly drops as they are initially exposed to the colder temperature and stabilizes as they gradually approach equilibrium at the end of 2 hours.”

Without knowing the initial air temperature and exact times the balls were filled/measured, I don’t think we can rule that one out. Maybe I missed it though.
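
Here is a rough sketch of the size of that effect, using the ideal gas law at constant volume. The assumptions are mine and should be read as such: a starting gauge pressure of 12.5 psig, atmospheric pressure of 14.7 psi, a 70 F room versus the 80 F fill mentioned above, a 50 F field, and no correction for moisture or ball stretch.

```python
ATM_PSI = 14.7  # assumed atmospheric pressure, psi


def f_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def gauge_after_temp_change(p_gauge_psi, t_start_f, t_end_f, atm=ATM_PSI):
    """Gauge pressure after the air changes temperature at fixed volume.

    Ideal gas law with n and V held constant: P_abs / T is constant,
    so convert gauge to absolute, scale by the kelvin ratio, convert back.
    """
    p_abs = p_gauge_psi + atm
    p_abs_new = p_abs * f_to_kelvin(t_end_f) / f_to_kelvin(t_start_f)
    return p_abs_new - atm


START_PSIG = 12.5  # assumed pre-game gauge reading
FIELD_F = 50.0     # assumed field temperature (hypothetical)

for fill_temp_f in (70.0, 80.0):
    final = gauge_after_temp_change(START_PSIG, fill_temp_f, FIELD_F)
    print(f"air at {fill_temp_f:.0f} F when gauged -> "
          f"{final:.2f} psig at {FIELD_F:.0f} F "
          f"(drop {START_PSIG - final:.2f} psi)")
```

Under these assumptions, air that was at 80 F when the referee gauged it ends up roughly half a psi lower on the field than air that was at 70 F. That is the same order of magnitude as the differences being argued about, which is why the unaddressed scenario matters.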

- Chris G on September 8, 2015 at 10:30 pm said:
  
  > Later, in figure 21 they report: “the pressure inside the balls quickly drops as they are initially exposed to the colder temperature and stabilizes as they gradually approach equilibrium at the end of 2 hours.” Without knowing the initial air temperature and exact times the balls were filled/measured, I don’t think we can rule that one out.
  
  Indeed. pV=nRT. Doing a back-of-the-envelope calculation, the observed pressure drop is consistent with a temperature drop from 298 K (nominal locker room temp) to 286 K (55 deg F). What was the temperature on the field and how long after the balls were brought off the field were their pressures measured? Do a lab experiment where you inflate the balls to 12.5 psig at whatever temp you believe the facility was at before the game, cool them to whatever the field temp was, then monitor gauge pressure vs time after you bring them back into the locker room (or wherever the temp was measured). The reality check on whether you’ve got a potential scandal is whether the time window for the balls to fall into the 11.3-11.5 psig range is consistent with the time between when they were removed from the field and when the pressure was measured. And, setting aside the uncertainty in the gauges, you should be able to calculate the uncertainty in what pressure you’ll measure presuming Newton’s law of cooling (or what cooling rate you observe in your experiments) and the uncertainty in delta(time between removal from field and pressure measurement).
  
  - Anoneuoid on September 8, 2015 at 11:06 pm said:
    
    I was considering a different scenario. It could be accidental or purposeful. Imagine filling the balls with hot air or storing them in a hot room before dropping them off with the referee before the game. Drop them off as shortly before the pressure is measured as possible. Alternatively, keep them in an insulated bag. Then the referee-measured pressure would be higher than room temperature alone would produce (because the balls have not yet equilibrated), and the observed deflation can be achieved on the field.
    
    I just don’t see where they address that scenario.
    
Ram on September 8, 2015 at 10:19 pm said:

Kevin Hassett has many reasonable thoughts on a number of topics which he knows a fair bit about.

A lot of folks’ models predicted Dow 36,000. Most just didn’t have the courage of their convictions to put the prediction out there with their name on it. I’d rather make a bold call, one that is transparently falsifiable, than retreat to vagaries with no empirical content, which is so common in the space in which he was operating. I don’t think Hassett would deny that the call turned out to be wrong. Shaming people for making predictions which ex post didn’t pan out just discourages people from making the predictions, which discourages all the great things that come from making a sincere effort.

- Chris G on September 8, 2015 at 10:36 pm said:
  
  > Shaming people for making predictions which ex post didn’t pan out just discourages people from making the predictions, which discourages all the great things that come from making a sincere effort.
  
  No. Making a sincere effort is not adequate. Sincere efforts grounded in ignorance should be savaged. Dow 36,000 was delusional. Patting a four-year-old on the head and saying, “Nice try,” may be acceptable, but we need to demand that adults make an effort to be reality-based.
  
  - Ram on September 8, 2015 at 10:47 pm said:
    
    http://scholar.harvard.edu/mankiw/content/dow-will-hit-36000-someday
    
    - Chris G on September 8, 2015 at 10:52 pm said:
      
      And someday the Sun will go out.
      
      PS I don’t hold Mankiw in high regard but +1 to him for this: “But should one be as confident as Glassman and Hassett that the process will continue until the risk premium shrinks to zero and the Dow reaches 36,000? I don’t think so. It is easy to imagine that some short-term event might shake investor confidence in the long-term stability of the market and push the equity risk premium back up.”
    - Ram on September 8, 2015 at 10:59 pm said:
      
      Yes, he disagrees with Hassett, as did I. But notice how, in an effort to understand where he is coming from, Mankiw sees his case as reasonable, even as it leaves something to be desired. He doesn’t compare him to a four year old, and doesn’t ignore everything he has to say on every topic because he made one bold, but bad call. Irving Fisher continues to be a major influence on current monetary economics, despite his terrible call Mankiw mentions.
    - Ram on September 8, 2015 at 11:01 pm said:
      
      Also, while you may not hold Mankiw in high regard, note that by every measure the economics profession does: they teach from his textbook, made him chair of the top department in the world, and regularly list him as a possible Nobel candidate. So I think his opinion on this is noteworthy, and not simply because he agrees with you that it is, all things considered, a bad call.
    - Chris G on September 8, 2015 at 11:03 pm said:
      
      I hear you. But I’m not so charitable.
    - Chris G on September 8, 2015 at 11:28 pm said:
      
      A different kind of bold but bad call – http://www.theguardian.com/world/2010/apr/05/wikileaks-us-army-iraq-attack
      
      Consider the decision process of the protagonists. Is it appropriate to say “They were sincere.” and leave it at that? When is sincerity sufficient and when is it essential to be correct? Let’s say Hassett’s error is acceptable and the gunner’s error is not. Where (roughly) does one draw the boundary between acceptable and unacceptable errors?
- Andrew on September 8, 2015 at 11:28 pm said:
  
  Ram:
  
  The “Dow 36,000” prediction was not just transparently falsifiable, it was transparently false!
  
  And I completely disagree with you on the incentives. If someone makes a bold prediction and gets it right, then, sure, he gets credit. But the flipside of this is, if his prediction is wildly, terribly wrong, then, yes, he gets mocked. You can’t credit people for their bold and correct predictions and not debit them for their mistakes. Otherwise the expected value of making a ridiculous prediction is positive, and there’s no incentive for anyone to make sense.
  
  To put it another way: a risky move is just that, risky. By publishing “Dow 36,000” amid a hoopla of press, Hassett was putting his reputation on the line. He did the equivalent of placing all his credibility on the “00” spot on the roulette wheel and spinning away. And he lost. His reputation is now the property of some Las Vegas casino.
  
  That’s fair enough. In the unlikely event that the Dow had hit 36,000 on schedule, the guy would have a huge, Taleb-like reputation. He made his gamble and lost, and we should all respect that.
  
  I’m not “shaming” Hassett, I’m just saying I have no reason to take anything he says seriously.
  
  - Ram on September 9, 2015 at 12:18 am said:
    
    I’m saying that you have at least some reason to take some of the things he says seriously. Namely those things on which he is a recognized expert. Among those things is not, of course, stock market forecasting, but everyone who continues to recognize him as an expert in certain areas, namely other economists specializing in those areas, is also aware that he made a spectacularly wrong call on the stock market once, but have not been led by this to ignore everything else he has to say. I think this might be because his call, though bold, does not provide any evidence that he is a fundamentally irrational person. But you’re a busy guy, you only have so much time, etc. etc. Feel free to ignore him, but if you want to appropriately evaluate his credibility, I think you’re going to have to present more than the single worst bit of reasoning in his long and otherwise productive career.
    
    FWIW, the same applies to every other person who has made 1 or a few spectacularly bad calls (almost everyone), but who continues to be recognized as an expert on any number of other subjects. This isn’t about Hassett–I don’t know the guy, and I doubt he much cares what any of us think.
    
    - Andrew on September 9, 2015 at 1:13 am said:
      
      Ram:
      
      Lots of people who are not stock market experts have made spectacularly wrong calls on the stock market. But few have written books and staked their reputations on such calls.
      
      Whether Hassett cares what I think is irrelevant. John Updike and Stephen J. Gould certainly don’t care what I think about them, but I’m still gonna opine about them. More relevant in this case is that someone asked me for my opinion on this Hassett-penned piece, and I replied that I have no reason to take anything seriously that he writes. Hassett made a high-stakes gamble with his reputation as collateral, and he lost. Too bad, but that’s the kind of thing that can happen when you make a high-stakes gamble.
  - Stephen on September 11, 2015 at 6:05 pm said:
    
    That’s kind of silly. If his prediction had been correct, does that mean you would automatically believe every prediction he made in the future? And assuming your answer is “no”, why would you then discount all his future predictions because his original one was incorrect?
    
    - Andrew on September 11, 2015 at 6:18 pm said:
      
      Stephen:
      
      There are enough people out there who know what they’re talking about, that I don’t see the need for any bandwidth to be occupied by Hassett, someone who clearly doesn’t know what he’s talking about.
Oliver on September 9, 2015 at 3:18 am said:

Thanks for that. I’ll have a deeper read later. I really enjoyed Gould’s The Mismeasure of Man when I read it (has it really been?) 20 years ago. I also don’t know what Gribbin, Potti or LaCour did to make the list, but can it really be that Gould is in the league of Mr. Dow 36,000? Kevin Hassett is a part of a special wing of the pundocracy which specializes in preaching to the converted. I’m not a paleontologist, but Gould struck me as doing, y’know, Science. Hassett doesn’t even come close to doing Science; he’s a paid windbag.

*Oh, look! It’s in Wiki:
https://en.wikipedia.org/wiki/Stephen_Jay_Gould#The_Mismeasure_of_Man

- konrad on September 9, 2015 at 2:56 pm said:
  
  The dirt on Potti and Lacour is readily available:
  https://en.wikipedia.org/wiki/Anil_Potti
  https://en.wikipedia.org/wiki/When_contact_changes_minds
  
  No idea why Andrew has a problem with John Gribbin though?
  
  - Andrew on September 9, 2015 at 3:12 pm said:
    
    Konrad:
    
    Search on Gribbin here.
    
    - konrad on September 9, 2015 at 4:16 pm said:
      
      Wow, that is rather surprising – I hadn’t heard of that book before, and the rest of his career seems entirely respectable. I wonder if it makes a difference that he had a co-author for the Jupiter Effect books. Also, perhaps he should get some slack for consistently and publicly admitting his error. E.g. from https://en.wikipedia.org/wiki/The_Jupiter_Effect:
      
      “In his book, The Little Book of Science (pub. 1999), Dr. Gribbin admitted about his “Jupiter Effect” theory “…I don’t like it, and I’m sorry I ever had anything to do with it.””
Bob on September 9, 2015 at 8:26 pm said:

Andrew,
How do you like this analysis of Brady’s Balls? It seems quite solid to me—but I did not check the calculations.

http://climateaudit.org/2015/08/10/variability-of-patriot-and-colt-footballs/

Bob

- Andrew on September 9, 2015 at 8:40 pm said:
  
  Bob:
  
  I actually have no interest in the topic; I was just amazed to see that the Dow 36,000 guy was still out there!
  
  - Bob on September 9, 2015 at 11:35 pm said:
    
    Well, the topic isn’t interesting; but, the way our society deals with disputes about technology and how statistics are used in such disputes is a topic that is often reflected in the posts on this blog.
    
    Bob
    
    - Bob on September 9, 2015 at 11:51 pm said:
      
      “are topics” not “is a topic”
      
      Sorry.
- Corey on September 9, 2015 at 9:14 pm said:
  
  Geez, please stop writing “Brady’s balls” — it spins up this friggin’ earworm in my head:
  
  “Brady, your balls are the only balls I need / And my end zone is where I want you to be”
  

Statistical Modeling, Causal Inference, and Social Science
