Logical reasoning typically takes the following form:

1. I know that A is true.
2. I know that A implies B.
3. Therefore, I can conclude that B is true.

I, like Lewis Carroll, have problems with this process sometimes, but it’s pretty standard.

There is also a statistical version in which the above statements are replaced by averages (“A usually happens,” etc.).

But in all these stories, the argument can fall down if you get the facts wrong. Perhaps that’s one reason that statisticians can be obsessed with detail.

For example, David Brooks wrote the following, in a column called “Living with Mistakes”:

The historian Leslie Hannah identified the ten largest American companies in 1912. None of those companies ranked in the top 100 companies by 1990.

Huh? Could that really be? I googled “ten largest american companies 1912” and found this, from Leslie Hannah:

No big deal: two still in the top 10 rather than zero in the top 100, but Brooks’s general point still holds. As Brooks said, we have to live with mistakes. This is more a comment on how a statistician such as myself will see a number and immediately feel the urge to check it.

If you don’t have that instinct—that feeling that numbers should directly correspond to reality—then I think you’re missing part of what it takes to really do statistics. A statistician who doesn’t care about the numbers can be helpful and even make major contributions, but I still think something is missing. The analogy might be a physicist who doesn’t like to tinker with machines or a chemist who doesn’t like to play around in the lab or a psychologist who has no curiosity about human motivations or an artist who doesn’t like to doodle.

Again, this is no criticism of Brooks—as a journalist, he’s of course more interested in good stories than in getting the details right (recall the notorious \$20 dinner at Red Lobster). That’s ok. Storytelling is his job, numbers are mine.

P.S. There also might be some important part of the story that I’m missing. Brooks’s column doesn’t supply a link to his data source but I’m willing to be corrected if there’s something else going on.

1. Jonathan says:

Well done Andrew.

• Evens Salies says:

To see the problem clearly, I would use the following example (http://www.evens-salies.com/2011_TrueNotTrue.jpg). One can instead use a Venn diagram with two sets having a non-empty intersection.

Let A be the row event “took an IQ test,” which can be true or not, and B the column event “being a human,” which can be true or not (an intelligent computer, e.g.). The case described by Andrew relates to the rules of inference for sufficient and necessary conditions known as modus ponens and modus tollens. There are two premises (A true, A implies B) and one conclusion (B true). This is modus ponens in logic.

But as people who make inferences, we often interpret the facts as follows: I took an IQ test (A true), I am a human (B true), so it’s easy to conclude A implies B and possibly notB implies notA, whereas it would be important to also consider whether A implies notB is true and notA implies B is true. In the bottom table, this is precisely what I consider: all cells are non-empty.

But, as you can see, in the present example, it’s correct to behave as if the top table were correct when one is facing some inference/causation problem. If we do so, it’s because it works in many situations. Following Andrew’s argument, using the top table is not a big mistake. In fact, in this precise example, it is a good “approximation” to reality.

2. Ken says:

If you Google “David Brooks is wrong,” you get 20 million hits.

• Andrew says:

Just don’t google “Andrew Gelman proved a false theorem”!

• gpp says:

Actually, 27,400. David, is that you?

• matt w says:

It actually is 20 million if you Google without quotes — that is, if you Google “David Brooks is wrong” rather than “‘David Brooks is wrong.'”

But you shouldn’t do that.

3. Roger Peng says:

I think this is the “hacker instinct” that you need to have to be a good statistician/physicist/chemist/programmer/etc. Have you figured out a way to teach it? I haven’t….

4. Gabriel says:

I’m surprised you give Brooks such an easy time on this. In this particular case, the error doesn’t make much difference for his point, but if he’s habitually so sloppy with facts, in other cases misstating the facts may well lead him to draw conclusions not supported by evidence.

Additionally, as you suggest, it would be stunning if the statement as he made it were actually true. Doesn’t the fact that this didn’t raise a red flag for him–and thus lead him to take 30 seconds to Google it–make you doubt his judgement more generally?

• Andrew says:

Gabriel:

I already knew about the Red Lobster thing, so this new item didn’t much change my judgment about Brooks’s judgment. But I’m no expert on corporate history, so I’m willing to believe there might be part of this story that I’m missing.

• Zubon says:

I’m with Gabriel here. Confusing the US with the entire world seems common for Americans, but changing from “0 in the top 100” to “2 in the top 10” reverses the conclusion. As with the examples cited in the Red Lobster link, claims about whether something does or does not exist seem binary. Changing a statement from “I could not do this” to “I could trivially do this” is only a one-word change, but that’s an important word.

5. MAYO says:

I’m glad Andrew writes, “If you don’t have that instinct—that feeling that numbers should directly correspond to reality—then I think you’re missing part of what it takes to really do statistics”, because it is the position of the error-statistician, provided “the reality” refers to some aspect of the real world, and not merely someone’s opinion of it. Some famous Bayesians (e.g., Savage) have said that an asset of the Bayesian method is that it “reinstates opinion in statistics” (1964, 178), among many other places.
Incidentally, my favorite little paper is one by E.S. Pearson (responding to Fisher) called “Statistical Concepts in Their Relation to Reality”.

6. Epanechnikov says:

Here is another interesting example:

Professor H. Polemarchakis claimed that

“Larissa tops the list, world-wide, for the per-capita ownership of Porsche Cayennes, the pricey SUV. The proliferation of Cayennes is a curiosity, given that farming is not a flourishing sector in Greece, where agricultural output generated a mere 3.2 per cent of GNP in 2009 (down from 6.65 per cent in 2000) and transfers and subsidies from the European Commission provide roughly half of the nation’s agricultural income. A couple of years ago, there were more Cayennes circulating in Greece than individuals who declared and paid taxes on an annual income of more than €50,000, a figure only slightly above the vehicle’s list price.”

http://www2.warwick.ac.uk/knowledge/themes/02/credit_crocodile_hearts/

However, I found out that, according to D. Spinellis, professor and former General Secretary for Information Systems at the Greek Ministry of Finance:

“The claim is incorrect by more than an order of magnitude. These are the numbers according to database queries we ran at the General Secreteriat for Information Systems at the Greek Ministry of Finance. In 2010 there were 130.385 taxpayers individually declaring more than €50,000 taxable income. Adding up the numbers of the published 2009 data (tables Π6Α.09 and Π6Β.09) gives a similar figure (138.060 taxpayers). The registry used for issuing the road tax contains 5808 cars with a vehicle identification number corresponding to Porsche (WP0 or WP1).”

http://skeptics.stackexchange.com/questions/6881/are-there-more-greeks-driving-porsche-cayennes-than-paying-high-rate-tax/6883#6883
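As a quick check of the “more than an order of magnitude” claim, here is a minimal sketch that only redoes the arithmetic on the figures quoted above. The variable names are mine, and the quoted "130.385" and "138.060" are read as thousands (European digit grouping). Note that the 5808 figure counts all Porsches, so it is an upper bound on the number of Cayennes.

```python
# Figures as quoted above; the dots in "130.385" and "138.060" are read
# as thousands separators (an assumption about the quoted notation).
taxpayers_over_50k_2010 = 130_385  # taxpayers declaring more than EUR 50,000 (2010)
taxpayers_over_50k_2009 = 138_060  # sum over the published 2009 tables
porsches_registered = 5_808        # all Porsche VINs (WP0 or WP1), not only Cayennes


def taxpayers_per_car(taxpayers: int, cars: int) -> float:
    """Return the number of high-income taxpayers per registered car."""
    return taxpayers / cars


ratio_2010 = taxpayers_per_car(taxpayers_over_50k_2010, porsches_registered)
ratio_2009 = taxpayers_per_car(taxpayers_over_50k_2009, porsches_registered)

# The original claim needs a ratio below 1; "off by more than an order
# of magnitude" corresponds to a ratio above 10.
off_by_order_of_magnitude = min(ratio_2010, ratio_2009) > 10

print(f"2010: {ratio_2010:.1f} high-income taxpayers per Porsche")
print(f"2009: {ratio_2009:.1f} high-income taxpayers per Porsche")
print("More than an order of magnitude:", off_by_order_of_magnitude)
```

Either year gives a ratio above 20, so the conclusion does not depend on which taxpayer count is used.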

7. Nameless says:

I’m not sure how Brooks could misread this – Hannah does not seem to have ever made that claim.

Just for reference, the top 10 American companies in 1912:
1. U.S. Steel
2. Jersey Standard
3. Pullman
4. Anaconda
5. General Electric
6. Singer
7. American Tobacco
8. International Harvester
9. Eastman Kodak
10. Armour

Of the ten, GE, Kodak and Jersey Standard were still in the top 100 in 1990 (Jersey Standard became Exxon); Pullman, Anaconda, and Armour were acquired; and the remaining four fell on hard times (U.S. Steel and International Harvester were delisted by Dow in 1991).

In the twenty years since, Kodak dropped out of the top 100 as well. U.S. Steel and International Harvester are still around and sport annual revenues over \$5 billion. Executives of American Tobacco foresaw the drop in tobacco revenues and diversified the company into anything they could put their hands on. Last month the descendant of American Tobacco split into two companies, “Fortune Brands Home & Security” (best known as a manufacturer of cheap combination padlocks and kitchen faucets for Home Depot), and “Beam Inc” (owner of such brands as Jim Beam and Courvoisier).

8. Nameless says:

OK, here’s the source of confusion. He’s reviewing the book “Adapt: Why Success Always Starts with Failure” by Tim Harford. An excerpt from that book, page 9:

“… the economic historian Leslie Hannah, who in the late 1990s decided to trace the fortunes of every one of the largest companies in the world in 1912. These were corporate giants that had survived a merger shakedown over the preceding few years and typically employed at least ten thousand workers.
At the top of the list was US Steel, a gigantic corporation even by today’s standards, employing 221,000 workers. This was a company with everything going for it: it was the market leader in the largest and the most dynamic economy in the world; and it was in an industry that has been of tremendous importance ever since. Yet US Steel had disappeared from the world’s top hundred companies by 1995; at the time of writing, it was not even in the top five hundred.
Next on the list was Jersey Standard, which these days continues to prosper under the name Exxon. General Electric and Shell were also in the top ten both in 1912 and in 1995. But none of the other top-ten titans was in the top ten in 1995. More remarkably, none of them was even in the top hundred.”

Note that Harford is talking about the top 10 in the _world_ (which includes Shell, originally a British/Dutch company, but does not include Kodak). Brooks evidently skimmed this page, remembered the “none of them was even in the top hundred” sound bite, missed the counterexamples, and, on top of that, assumed that Harford was talking about the top American companies.

9. Again, this is no criticism of Brooks—as a journalist, he’s of course more interested in good stories than in getting the details right

As a journalism professor who does quantitative research, I feel the need to point out that this should be a criticism of Brooks: it’s bad journalism. If you’re getting the details wrong in the service of a “good” story, you’re doing it wrong, whether it’s front-page news or an op-ed. It’s particularly egregious for somebody like Brooks, who likes to pretend he’s a sociologist.

• Mark Palko says:

I was perplexed by this line too. Were you understating, being sarcastic? Otherwise, I would assume that the point of journalism is to tell a good story that’s true to the facts, something Brooks obviously failed to do here.

• Phil says:

Whenever Andrew points out on this blog that someone has done something bad, he says something like “I don’t mean to criticize” or similar. You-know-who seems to be the only exception; everyone else gets a pass in the sense that their mistakes are explicitly not to be used (according to Andrew) to judge their competence or ethics or judgment. It’s just Andrew’s way.

10. K? O'Rourke says:

[from Andrew’s link] classically-trained statisticians, who, Achilles-like, seem to expect and demand that I accept various implicit concepts … without ever making it clear why these concepts are good ideas.

They are certainly wishful ideas and certainly convenient to not question
(though to be fair, the better ones, often, even if obscurely, considered their robustness – to some degree at least).

But definitely part of checking one’s evidence (i.e. what’s being construed as evidence).

11. Steve Sailer says:

It’s important to remember stuff, like that Exxon is descended from John D. Rockefeller’s Standard Oil. If you can quickly do reality checks from memory, you are more likely to not make big sweeping statements that are wrong.

• Andrew says:

Indeed, it was the Standard Oil example that made me suspicious enough to check.

12. donald A. Coffin says:

I need to check in more often. What Brooks says is not what Hannah wrote. Brooks says that only 2 of the 10 largest *American* firms in 1912 were in the 100 largest *American* firms in 1990. Hannah says that of the 10 largest *global* firms in 1912 only 2 (both American) were in the top 100 *global* firms in 1990. Another paper by Hannah suggests that of the 100 largest *American* companies in 1912, 19 were still among the largest *American* companies in 1990. Stability in England and in Germany was *much greater* in both cases, with over 40 companies in those two countries appearing on the 1912 and 1990 lists for those countries, respectively.

The question is, given the growth in the world economy (which has been faster than the growth of the American economy, especially since WWII), what would our *expectation* be?