My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics”

I was thinking more about David Brooks’s anti-data column from yesterday, and I realized what is really bothering me.

Brooks expresses skepticism about numbers, about the limitations of raw data, about the importance of human thinking. Fine, I agree with all of this, to some extent.

But then Brooks turns around and uses numbers unquestioningly and uncritically (OK, not completely uncritically; see P.S. below). In a notorious recent case, Brooks wrote, in the context of college admissions:

You’re going to want to argue with Unz’s article all the way along, especially for its narrow, math-test-driven view of merit. But it’s potentially ground-shifting. Unz’s other big point is that Jews are vastly overrepresented at elite universities and that Jewish achievement has collapsed. In the 1970s, for example, 40 percent of top scorers in the Math Olympiad had Jewish names. Now 2.5 percent do.

But these numbers are incorrect, as I learned from Janet Mertz, a professor of oncology at the University of Wisconsin – Madison who has published a relevant article in the Notices of the American Mathematical Society on mathematics performance by gender and ethnicity on national and international mathematics competitions. Mertz found, based on her direct interviews with these students, that over 12% (her best guess is something like 16%, I think) of recent Math Olympiad participants were Jewish (and she believes the estimate of 40% for earlier years is too high). It turns out that the numbers Brooks reported had been constructed from some sloppy counting.
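
To make the size of the discrepancy concrete, here is a minimal sketch of the arithmetic. It uses only the percentages quoted above; the function name `fold_decline` and the constant names are made up for illustration, and the 16% figure is just my rough reading of Mertz's best guess.

```python
# Shares of Jewish students among top Math Olympiad participants, in percent.
# All four figures are the ones quoted in the post; none are new data.
CLAIMED_1970S = 40.0      # figure Brooks reported for the 1970s (Mertz thinks it is too high)
CLAIMED_RECENT = 2.5      # figure Brooks reported for recent years
MERTZ_LOWER_BOUND = 12.0  # Mertz's lower bound for recent years
MERTZ_BEST_GUESS = 16.0   # roughly her best guess, as I understand it


def fold_decline(earlier_pct: float, later_pct: float) -> float:
    """Return how many times smaller the later share is than the earlier one."""
    if later_pct <= 0:
        raise ValueError("later_pct must be positive")
    return earlier_pct / later_pct


if __name__ == "__main__":
    scenarios = [
        ("reported recent share", CLAIMED_RECENT),
        ("Mertz lower bound", MERTZ_LOWER_BOUND),
        ("Mertz best guess", MERTZ_BEST_GUESS),
    ]
    for label, recent_pct in scenarios:
        ratio = fold_decline(CLAIMED_1970S, recent_pct)
        print(f"{label}: {ratio:.1f}-fold decline from the 40% baseline")
```

Even if the 40% baseline is kept, the corrected recent figures imply a decline of roughly 2.5-fold to 3.3-fold rather than 16-fold, and if the baseline is itself too high, as Mertz believes, the decline would be smaller still.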

My beef here, though, is not with Ron Unz, who did the sloppy counting. Unz is a political activist and it is natural for him to interpret the data in ways that are favorable to his case. Data analysis can be tricky, and even when people are trying to do their best, it’s easy to make mistakes and to get trapped by one’s own analysis (see, for example, Daryl Bem). It’s hard to get too angry at a political activist for finding what he’s looking for.

And my beef is not with David Brooks for including some faulty numbers in his column. There’s no way he has time to check every claim in everything he reads. There’s no perfect quality control, and the New York Times does not have the resources to fact-check every one of its op-ed columns.

No, my beef is with David Brooks for not correcting his numbers. Janet Mertz contacted him and the Times to report that his published numbers were in error, and I also contacted Brooks (both directly and through an intermediary). But no correction has appeared.

The funny thing is, yesterday’s column would’ve been the perfect place for Brooks to make his correction. He could’ve just added a paragraph such as the following:

One trouble with numbers is they can be spuriously confusing. For example, I myself [Brooks] was misled just a couple months ago when reporting a claim by magazine publisher Ron Unz about a so-called “collapse of Jewish achievement.” In my column, I uncritically presented Unz’s claim that the percentage of top scorers on the American high school Math Olympiad team with Jewish names had declined to 2.5%. The actual percentage is over 12%, as I have learned from Prof. Janet Mertz of the University of Wisconsin, who has published peer-reviewed articles on the topic of high-end mathematics achievement. The actual data show evidence not of a dramatic “collapse” but rather a gradual decline, explainable by increased competition for a fixed number of slots on the Olympiad team, together with demographic changes.

OK, that’s not so pithy. I’m sure Brooks and his editor could do better. My point is that, if Brooks wants to talk about the limitations of data, he could start with himself.

The problem with Brooks, as with many “quals,” is not that he operates on a purely qualitative level. He does use data; he just doesn’t distinguish between good and bad data. He doesn’t seem to care.

To put it another way, if Brooks wants to claim, of American Jews, that “the fanatical generations of immigrant strivers have been replaced by a more comfortable generation of preprofessionals,” then, hey, go for it. The problem comes in when he supports this claim with bad data.

Just to be clear, I’m not trying to slam Brooks here. I have a beef with Brooks because I think he can do better. I think he’s right that overreliance on statistics can mislead, and I think he could make this point even better by recognizing how this problem has affected his own work.

As the great Bill James once said, the alternative to “good statistics” is not “no statistics,” it’s “bad statistics.”

P.S. I added additional sentences to the inline Brooks quote above in order to provide more context, to clarify that Brooks was presenting the numbers as coming from a particular outside source. It was not right for me to say he was presenting these numbers “unquestioningly,” as he does express some concerns. Brooks expressed some potential criticism of Unz’s conclusions but not of Unz’s numbers. The reason I still think a New York Times correction is in order is that the numbers appear to me to be presented as facts rather than as Unz’s claims. In any case, now that this 2.5% has been refuted, I think it makes sense to correct it. And, as noted above, I think such a correction is in keeping with Brooks’s larger message, which I support, that numbers can be misleading when we don’t know where they are coming from.

41 thoughts on “My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics””

    • cx,

      Unz (and Brooks) uses his bad data on % Jewish members of US IMO teams to claim “collapse” of Jewish achievement. Unz uses % Jews among National Merit Scholar semi-finalists as an indicator for Jewish achievement as a whole, comparing it to other bad data obtained from the Hillel Foundation regarding % Jews among undergraduate students attending Ivy League colleges. As the saying goes, “Garbage in, garbage out.”

      • NYC has math competitions like the Math League and the NYCIML, where both public and private schools participate. The Math League is actually nationwide. You will have a much bigger sample population compared to the NMS or the Math Olympiads etc. One can conduct interviews of participants in the contests as to who is Jewish and who is not. You can determine how many Jews end up as the top scorers in these contests and so on and so forth. Just looking at the state of New York and neighboring states will give you a better perspective of the situation. I suggest that Andrew talk to the organizers of these competitions, if it is feasible for him or anyone else who is interested, to get a picture of what is going on.

        • Biaknabato:

          “I suggest that Andrew . . .”: You gotta be kidding! If you want to do this research, do it yourself!

          I got involved in this because various people including me had made the mistake of taking certain numbers at face value and thus implicitly accepting a claim about bias in college admissions. It turned out that some of the key numbers and comparisons were wrong. So I wanted to clear things up and make it clear that what had been presented as evidence (not just by me, but also by David Brooks in the Times) was not correct. I have no desire to make this a research project of my own. If you want to study it, go for it! Interviewing kids and asking if they’re Jewish is not anything I want to do.

        • Well Andrew, I know life is very demanding on you; however, that is a point where we can all start, and I do know it would be hard on anyone to do that sort of thing. Some guy with a million bucks to spare can……………

        • biaknabato,
          Analyzing data from math competitions is a good idea. One of the problems is that one only wants to analyze the data of the top-scoring students if discussing elite college admissions. One possibility would be to examine data from the ~300 students per year who qualify for the USA Math Olympiad exam rather than just the 6 top kids who are members of each year’s Math Olympiad team. However, it really is not possible to find out by direct knowledge which ones among hundreds or thousands over a multiple-year period are Jewish, since this method is VERY labor intensive and, even then, incomplete, i.e., it only provides a definitive lower bound since some students simply don’t want to reveal this personal information. The other, bigger problem with using math competition data is that it largely informs one about the %s for students who excel in mathematics. The very top math students tend to prefer colleges such as MIT and Caltech over most of the Ivies. As already noted here, most of the students who attend HPY major in the social sciences and humanities, not engineering or the hard sciences. Preferred major together with geography could well explain, to a larger extent than the meritocracy hypothesis Unz assumes to be the primary reason, why Caltech has a much higher % of Asian-Americans than does Harvard.

        • Janet,

          Well, at least I don’t have to deal with some of the boring blogs that I post in; I just found this blog, BTW. Of course figuring out who is a Jew is problematic, just like figuring out who is a Catholic. There are more people who attend Catholic services in Africa on Sunday than in all of Western Europe, according to news reports. Are you going to call Spain a Catholic country, or Italy? I’ll give you a follow-up post later; no time right now.

    • Slugger says: “Brooks is alive. These facts support the hypothesis of a sharp decline in Jewish achievement.”
      Yes, Unz’s data do. But Prof. Gelman’s point is that these are “bad data” from which an incorrect conclusion was drawn. Unz was claiming a 17-fold “collapse” in Jewish achievement. The actual drop among US IMO team members obtained from much more accurate data is 2-3 fold over a several decade period. Furthermore, the primary source of data Unz used for high academic achievement in his article was National Merit Scholar semi-finalist status, not the extremely high achievement level required to qualify for a 4-6 member Olympiad team. Assuming his number of 6% Jewish NMS from recent data is correct (which N.B. has disputed as being slightly low), it is a trivial drop from the 8% Jewish NMS reported in the 1980s, a drop even smaller than the change in % Jews among US high school students over this time period. Thus, Unz’s claim of a “collapse” that was reiterated by David Brooks simply did not happen, i.e., they reached an incorrect conclusion because they were basing it upon very poor quality data.

      • Janet, I think you missed Slugger’s joke. Slugger is saying that the fact that Walter Lippmann (an intellectual giant who was Jewish) has been replaced by David Brooks (an intellectual…um…not an intellectual giant, but Jewish) is itself evidence that Unz’s claim is true. It’s just a joke.

  1. Your reference to Bill James, particularly in this context, revived a question I’ve often considered: Did James’ lack of formal statistical training provide him with advantages over other baseball analysts (for instance, Pete Palmer, who was a radar systems engineer) in his baseball work? I see on James’ Wiki that he had education in English and Economics, so presumably he was exposed to some quantitative methods via Econ. (This was new to me; I’d always thought of him only as an English guy.) But clearly he’s far from a formally trained quantitative analyst, and perhaps this somehow allowed him to focus more tightly on the questions he wanted to answer — as opposed to falling in love with the math or actually being constrained in question asking the way a trained analyst might by something like an availability heuristic? Anyway, and going back to the blog post, perhaps some “quals” have value to add by providing clarity in question asking, and might even have an edge in doing so, even if they lack the technical abilities to answer the questions. Let’s hope they don’t lean toward “no statistics” and instead help work with the “quants” to ask and answer important questions with “good statistics.”

    • Right, James said his goal was to quantitatively answer questions about individual topics baseball men talked about. Pete Palmer’s 1984 book attempting to find a system to rank all players of all time was prematurely ambitious.

      • Bill James did, however, always have good things to say about Pete Palmer. James expressed skepticism about Palmer’s Linear Weights method but he respected it as a method parallel to his. I think that James was following the general principle that the most important thing about a statistical method is what goes in to it. So, even if Palmer’s approach was somewhat blinkered, it included important information that was not included in batting average etc.

        • Indeed he did, and the similarities between their approaches (eg, including additional important-but-missing inputs) and mutual respect put them on the same side of the battle, to be sure. But I still wonder why James was “better” (as I think he was) than Palmer, and my conjecture is that sometimes limitations (such as James’ relative lack of quantitative sophistication) provide clarity or demand a creativity that elevates a work. We see this in the arts sometimes, with the horror genre in cinema being a notable example where the lack of a big budget often results in some of the better and/or popular works (“Halloween”, “The Blair Witch Project”, “Paranormal Activity”, etc.) It’s not identical, but there’s something in common, I suspect.

        • Palmer’s LW method won out in the end. No one cares about Win Shares these days. And everyone uses WAR which uses wOBA, which is based on LW.

  2. This reminds me of sports fans who are anti-stats, but then cite extremely simple stats like points scored per game as all they need to know who’s the best player. It’s sort of a statistics version of the “uncanny valley” effect. Too much stats or data is troublesome to people, but if you just use a little bit less than that, it’s ok apparently.

  3. Okay, but what’s going on with National Merit Scholar semifinalists?

    My reader, who is Jewish, points out the decline in Jewish names from our day back in the 1970s. For example, in the entire state of California, only one semifinalist has a name beginning with “Gold…”. Other common Jewish names also show up only rarely among the semifinalists: Cohen (1), Levy (1), and Kaplan (1).

    So, in the state of California, there was one Cohen who was a semifinalist, but 49 Wangs. Wow.

    To give some perspective, if you search at Google News, there are 14,900 press pages currently mentioning “Cohen” (e.g., Sacha Baron-Cohen) and 14,500 currently mentioning “Wang” (e.g., Vera Wang), or about 1 to 1, not 49 to 1.

    http://isteve.blogspot.com/2010/09/national-merit-semifinalists-by-school.html

    • Steve:

      As Mertz wrote regarding math Olympiad participants, there has been a decline in the Jewish percentages of these high achievers, but not the dramatic collapse claimed by Unz. I agree with Mertz that it is reasonable to explain the gradual decline as a result of increased competition (notably from Asian immigrants and children of immigrants) and, I think to a lesser extent, demographics (Jews being a smaller proportion of the U.S. population than before).

      But that’s just a guess. As I wrote in the blog, I don’t really mind Brooks making broad claims; I just don’t think he should be basing them on bad numbers. Or, more to the point in this case, I think he should correct the numbers when the mistakes are pointed out to him.

      • So Andrew, there are more Asians than whites in the fall 2012 freshman classes at nearby CCNY in Harlem and at Baruch. Asians made up 46% of the 2011 Baruch freshman class compared to 30% for whites, and 37% of the fall 2012 freshman class at CCNY is Asian. So what can you infer from that vis-à-vis Jews? And Asians made up only 15.6% of the fall 2012 Columbia freshman class. Quotas, anyone?

        • CCNY and Columbia are drawing from very different applicant pools with respect to both geography and level of academic achievement. Most of the students attending CCNY live in the NYC area; many of them live at home to save money, commuting to school. On the other hand, Columbia attracts students from the entire US and world, with diversity of its student body being one of its goals; thus, the distribution of where its students come from is very different than it is for CCNY. The income distribution of the parents of their students is probably also very different, with many being much wealthier. Just like NYC was populated by a high % of recent immigrant Jews 50-100 years ago, it is now populated by a high % of recent immigrant Asians. For example, my parents, children of Jewish immigrants growing up in poverty, attended these two CCNY colleges in the 1930s. Bronx High School of Science when I attended it in the 1960s had ~80% Jewish students; now, most of its students are Asian-Americans. Demography, geography, and wealth are much of the answer.

          Unz’s 15.6% number is probably wrong, just like many of the other numbers in his article, possibly because he is taking the data from the NCES which includes students simply taking courses at the college who are not among the elite ones officially in the class of 2016. Columbia claims their class of 2016 is 29% Asian/Asian-American (http://www.studentaffairs.columbia.edu/admissions/sites/default/files/class_of_2016_profile.pdf).
          Assuming that 29% includes ~7% foreign Asian students, that comes to ~22% Asian-American, similar to the % Asian-Americans reported by Harvard for their class of 2016. Just like the Hillel data over-estimates % Jews attending these elite colleges, the NCES data under-reports % Asian-Americans attending them. Why didn’t Unz use the % Asian-Americans data the colleges were claiming in their official publications?

    • Steve, Jews aged 16-18 represent a far smaller share of the population in CA than Asian-Americans aged 16-18. Please note that Unz himself remarks in his article, “Further evidence is supplied by Weyl, who estimated that over 8 percent of the 1987 NMS semifinalists were Jewish, a figure 35 percent higher than found in today’s results. Moreover, in that period the math and verbal scores were weighted equally for qualification purposes, but after 1997 the verbal score was double-weighted*, which should have produced a large rise in the number of Jewish semifinalists, given the verbal-loading of Jewish ability. But instead, today’s Jewish numbers are far below those of the late 1980s.” So now Unz is claiming it’s 6%, which is not really a collapse when you consider the decline in the % of Jews (or at least non-ultra-Orthodox Jews) in the US population.

      Steve, I noticed that Unz attributes to you the J1 list of names: Cohen, Gold[], Kaplan, and Levy. Can you please explain what Gold[] means? Is it a specific list of names, or all names starting with Gold?

      *Unz is mistaken – the verbal score had always been double-weighted for NMS qualification purposes.

  4. Why would anyone pay attention to David Brooks? – if nothing else (and there is lots else) he is known to make stuff up whenever it is convenient.

  5. I haven’t read Unz’s paper, but think that articles of the sort that allege Jews are vastly overrepresented at elite universities, that Jewish achievement has collapsed, etc. are anti-semitic. I doubt they would be treated as mere topics for ‘objective’ data analysis, were they to concern groups defined by other ethnic/religious criteria.

    • Jews *are* vastly overrepresented at elite universities. That’s not in doubt. Why shouldn’t people talk about that? And we certainly do talk about representation of other demographic groups in college – that’s actually a huge topic of discussion as it relates, for instance, to the legality of college affirmative action programs.
      The problem with Unz’s article on Jews was that it turned out to be bad, not that the topic should be off-limits.

      • Emily: “The problem with Unz’s article on Jews was that it turned out to be bad, not that the topic should be off-limits.”
        I absolutely agree with you. The questions Unz raises in his article are excellent, timely ones. They are: (i) whether the Ivy League colleges are using quotas in admissions that may be unfair to some ethnic/racial groups; and (ii) whether very high performing immigrant groups (e.g., Jews) lose their drive to excel within 3-4 generations as they assimilate into the larger US culture. I would be delighted to see articles that describe careful, critical analyses performed in ways that provide definitive answers to these questions via the use of appropriate methodologies and data sets. Unfortunately, Unz’s self-published article, while containing lots and lots of data, fails in numerous ways he has yet to acknowledge to even begin to meet these criteria. Importantly, as pointed out by Prof. Gelman, Unz fails to use one consistent methodology when comparing 2 sets of numbers. For example, he compares % Jews on NMS semi-finalist lists determined by his subjective direct inspection method against % Jews attending Harvard College obtained from a Hillel Foundation list rather than using the objective Weyl method to obtain both numbers. Likewise, he over-estimates % Jews on US IMO teams from the 1970s by assuming, quite incorrectly, that all of the students with German or Polish names are Jewish (even ones that are always Catholic!), yet he under-estimates % Jews from the 21st century teams by not counting as Jewish students with German (e.g., Mildorf) or Israeli-Hebrew (Nir) names, let alone students with possibly Jewish names such as Kane and Miller. By mixing data sets and varying methods even within data sets, each of which has its own significant errors, he can juggle things to obtain findings that support whatever conclusions he desires. The fact that the Hillel number for % Jews at Harvard College differs 2 1/2 fold from the % obtained by the Weyl method indicates that large errors exist in one or both of them.
        The fact that Unz identified only 2 Jewish names among the 78 21st-century US IMO team members, while I know with absolute certainty that the lower bound is at least 12% Jews and he missed names he should have known or suspected might be Jewish, indicates that Unz’s direct inspection method also has a very high error rate. Unz’s article also contains numerous other serious deficiencies (e.g., not separating out the foreign students and under-represented minorities). The overall result is that the data he presents are sufficiently error-prone that they are not useful for answering these important questions.

        • Since many Pilipinos have Spanish surnames, you can see the inherent problem that many Pilipinos face from marketers or politicians trying to cater to the Hispanic or Latino market. Janet, there would be less of a discussion of Jewish overrepresentation at UW-Madison, since admission there is heavily based on grades and SAT scores, compared to Columbia, where the admissions process is very subjective.

        • biaknabato,
          You are correct. Admission to UW-Madison is largely determined quite objectively for most of the class by ACT score and percentile rank in the student’s high school graduating class. The entire application is only 3-4 pages long, including a short essay on why one desires to attend this college. If you are a Wisconsin resident with sufficiently high ACT score and class rank, you are automatically admitted as long as there are no red flags (e.g., unexplained suspension from school). A small % of the class is admitted by other criteria to have Division I sports teams and greater racial diversity. One problem with this mechanism of admission is that there are students attending highly competitive high schools who are rejected because they are not in the top 20% of their class and who believe they would have been readily admitted if they had attended a less competitive high school. There is no perfectly fair system for admissions just like there is no perfectly fair voting system when there are 3 or more candidates.

      • It is written: “Jews *are* vastly overrepresented at elite universities. That’s not in doubt.” Of course not, assuming one even knows what that means. Nor is there doubt about the vast conspiracy of Jews manipulating the government, banks and the like. Lacking doubt, what are the inner motives of the data collectors?

      • Emily: “We certainly do talk about representation of other demographic groups in college.” Yes, can you remind me of another racial/ethnic/religious group for which data were collected* on NMS semifinalists in order to examine whether the group was overrepresented in the Ivies?
        *Using last names or a “subjective direct inspection method” (Mertz)

        • There’s no need to, because we can directly examine test score performance by race, since that information, unlike religion, is collected by ETS (as well as colleges). It’s because there’s so little information available about academic performance by religion that researchers are using more novel methods.
