Blogs > Twitter again

As we’ve discussed many times, I prefer blogs to twitter because in a blog you can have a focused conversation where you explain your ideas in detail, whereas twitter seems like more of a place for position-taking.

An example came up recently that demonstrates this point. Jennifer sent me a blurb for her causal inference conference and I blogged it. This was an announcement and not much more; it could’ve been on twitter without any real loss of information. A commenter then shot back:

Do you see how your policies might possibly negatively impact an outlier such as myself, when you arbitrarily reward contestants for uncovering effects you baked in? How do you know winners just haven’t figured out how you think about manipulating data to find effects? How far removed from my personal, actual, non-ergodic life are your statistical stories, and what policies that impede me unintentionally are you contributing to?

OK, this has more words than your typical twitter post, but if I saw it on twitter I’d be cool with it: it’s an expression of strong disagreement.

It’s the next step where things get interesting. When I saw the above comment, my quick reaction was, “What a crank!” And of course I have no duty to respond at all; responding to blog comments is something I can do for fun when I have the time for it: it can be helpful to explore the limits of what can be communicated. In a twitter setting I think the appropriate response would be some snappy comeback.

But this is a blog, not twitter, so I replied as follows:

That’s a funny way of putting things! I’d say that if you don’t buy the premise of this competition, then you don’t have to play. Kinda like if you aren’t interested in winter sports you don’t need to watch the olympics right now. I guess you might reply that our tax money is (indirectly) funding this competition, but then again our tax money funds the olympics too.

Getting to the topic at hand: No, I don’t know that the research resulting from this sort of competition will ultimately improve education policy. Or, even if it does, it presumably won’t improve everyone’s education, and it could be that students who are similar to you in some ways will be among those who end up with worse outcomes. All I can say is that this sort of question—variation in treatment effects, looking at effects on individuals, not just on averages—is a central topic of modern causal inference and has been so for a while. So, to the extent that you’re interested in evaluating policies in this way, I think this sort of competition is going in the right direction.

Regarding specifics: I think that after the competition is over, the team that constructed it will publicly release the details of what they did. So at that point in the not-so-distant future, you can take a look, and, if you see problems with it, you can publish your criticisms. That could be useful.

I’m not saying this response of mine was perfect. I’m just saying that the blog format was well suited to a thoughtful response, a deepening of the intellectual exchange and a rhetorical de-escalation, which is kind of the opposite of position-taking on twitter.

P.S. Also relevant is this post by Rob Hyndman, A brief history of time series forecasting competitions. I don’t know anything about the history of causal inference competitions, or the extent to which these were inspired by forecasting competitions. The same general question arises, of what’s being averaged over.

Quantitative science is (indirectly) relevant to the real world, also some comments on a book called The Case Against Education

Joe Campbell points to this post by economist Bryan Caplan, who writes:

The most painful part of writing The Case Against Education was calculating the return to education. I spent fifteen months working on the spreadsheets. I came up with the baseline case, did scores of “variations on a theme,” noticed a small mistake or blind alley, then started over. . . . About half a dozen friends gave up whole days of their lives to sit next to me while I gave them a guided tour of the reasoning behind my number-crunching. . . . When the book finally came out, I published final versions of all the spreadsheets . . .

Now guess what? Since the 2018 publication of The Case Against Education, precisely zero people have emailed me about those spreadsheets. . . . Don’t get me wrong; The Case Against Education drew plenty of criticism. Almost none of it, however, was quantitative. . . .

It’s hard to avoid a disheartening conclusion: Quantitative social science is barely relevant in the real world – and almost every social scientist covertly agrees. The complex math that researchers use is disposable. You deploy it to get a publication, then move on with your career. When it comes time to give policy advice, the math is AWOL. If you’re lucky, researchers default to common sense. Otherwise, they go with their ideology and status-quo bias, using the latest prestigious papers as fig leaves.

Regarding the specifics, I suspect that commenter Andrew (no relation) has a point when he responded:

You didn’t waste your time. If you had made your arguments without the spreadsheets—just guesstimating & eyeballing—you would’ve gotten quantitative criticism. A man who successfully deters burglars didn’t waste his money on a security system just because it never got used.

But then there’s the general question about quantitative social science. I actually wrote a post on this topic last year, The social sciences are useless. So why do we study them? Here’s a good reason. Here was my summary:

The utilitarian motivation for the natural sciences is that they can make us healthier, happier, and more comfortable. The utilitarian motivation for the social sciences is that they can protect us from bad social-science reasoning. It’s a lesser thing, but that’s what we’ve got, and it’s not nothing.

That post stirred some people up, as it sounded like I was making some techbro-type argument that society didn’t matter. But I wasn’t saying that society was useless, I was saying that social science was useless, at least relative to the natural sciences. Some social science research is really cool, but it’s nothing compared to natural-science breakthroughs such as transistors, plastics, vaccines, etc.

Anyway, my point is that quantitative social science has value in that it can displace empty default social science arguments. Caplan is disappointed that people didn’t engage with his spreadsheets, but I think that’s partly because he was presenting his ideas in book form. My colleagues and I had a similar experience with our Red State Blue State book a few years ago: our general point got out there, but people didn’t seem to really engage with the details. We had lots of quantitative analyses in there, but it was a book, so people weren’t expecting to engage in that way. Frustrating, but it would be a mistake to generalize from that experience to all of social science. If you want people to engage with your spreadsheets, I think you’re better off publishing an article rather than a book.

Caplan’s “move on with your career” statement is all too true, but that’s a separate issue. Biology, physics, electrical engineering, etc., are all undeniably useful, but researchers in these fields also move on with their careers, etc. That just tells us that research doesn’t operate at 100% efficiency, which is a product of the decentralized system that we have. It’s not like WW2, where the government was assigning people to projects.

Comments on The Case Against Education

This discussion reminded me that six years ago Caplan sent me a draft of his book, and I sent him comments. I might as well share them here:

1. Your intro is fine, it’s good to tell the reader where you’re coming from. But . . . the way it’s framed, it looks a bit like the “professors are pampered” attack on higher education. I don’t think this is the tack you want to be taking, for two reasons: First, most teaching jobs are not like yours: most teaching jobs are at the elementary or secondary level, and even at the college level, much of the teaching is done by adjuncts. So, while your presentation of _your_ experience is valid, it’s misleading if it is taken as a description of the education system in general. Second—and I know you’re aware of this too—if education were useful, there’d be no good reason to complain that some of its practitioners have good working conditions. Again, this does not affect your main argument but I think you want to avoid sounding like the proudly-overpaid guy discussed here: http://andrewgelman.com/2011/06/01/the_cushy_life/

This comes up again in your next chapter where you say you have very few skills and that “The stereotype of the head-in-the-clouds Ivory Tower professor is funny because it’s true.” Ummm, I don’t know about that. The stereotype of the head-in-the-clouds Ivory Tower professor is not so true, in the statistical sense. The better stereotype might be the adjunct working five jobs.

2. You write, “Junior high and high schools add higher mathematics, classic literature, and foreign languages – vital for a handful of budding scientists, authors, and translators, irrelevant for everyone else.” This seems pretty extreme. One point of teaching math—even the “higher mathematics” that is taught in high school—is to give people the opportunity to find out that they are “budding scientists” or even budding accountants. As to “authors,” millions of people are authors: you’ve heard of blogs, right? It can be useful to understand how sentences, paragraphs, and chapters are put together, even if you’re not planning to be Joyce Carol Oates. As to foreign languages: millions of people speak multiple languages, it’s a way of understanding the world that I think is very valuable. If _you_ want to say that you’re happy only speaking one language, or that many other people are happy speaking just one language, that’s fine—but I think it’s a real plus to give kids the opportunity to learn to speak and read in other languages. Now, at this point you might argue that most education in math, literature, and foreign language is crappy—that’s a case you can make, but I think you’re way overdoing it by minimizing the value of these subjects.

3. Regarding signaling: Suppose I take a biology course at a good college and get an A, but I don’t go into biology. Still, the A contributes to my GPA and to my graduation from the good college, which is a job signal. You might count this as part of the one-third signaling. But that would be a mistake! You’re engaging in retrospective reasoning. Even if I never use that biology in my life, I didn’t know that when I took the course. Taking that bio course was an investment. I invest the time and effort to learn some biology in order to decide whether to do more of it. And even if I don’t become a biologist I might end up working in some area that uses biology. I won’t know ahead of time. This is not a new idea, it’s the general principle of a “well-rounded education,” which is popular in the U.S. (maybe not so much in Europe, where their post-secondary education is more focused on a student’s major.) Also relevant on this “signaling” point is this comment: http://andrewgelman.com/2011/02/17/credentialism_a/#comment-58035

4. Also, signaling is complicated and even non-monotonic! Consider this example (which I wrote up here: http://andrewgelman.com/2011/02/17/credentialism_a/):
“My senior year I applied to some grad schools (in physics and in statistics) and to some jobs. I got into all the grad schools and got zero job interviews. Not just zero jobs. Zero interviews. And these were not at McKinsey, Goldman Sachs, etc. (none of which I’d heard of). They were places like TRW, etc. The kind of places that were interviewing MIT physics grads (which is how I thought of applying for these jobs in the first place). And after all, what could a company like that do with a kid with perfect physics grades from MIT? Probably not enough of a conformist, eh?”
This is not to say your signaling story is wrong, just that I think it’s much more complicated than you’re portraying.

5. This is a minor point, but you write, “If the point of education is certifying the quality of labor, society would be better off if we all got less.” This is not so clear. From psychometric principles, more information will allow better discrimination. It’s naive to think of all students as being ranked on a single dimension so that employers just need to pick from the “top third.” There are many dimensions of abilities and it could take a lot of courses at different levels to make the necessary distinctions. Again, this isn’t central to your argument but you just have to be careful here because you’re saying something that’s not quite correct, statistically speaking.
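
Here’s a quick simulation sketch of that point; all the numbers in it are invented, so take it as illustration only. Students have multiple ability dimensions, each course grade is a noisy measurement of a different mix of them, and a transcript with more courses does a better job of discriminating the particular combination a given job cares about:

```python
# A simulation sketch of the psychometric point above; every number here is
# invented. Students have several ability dimensions, each course grade is a
# noisy measurement of a random mix of them, and a hypothetical job values a
# particular weighted combination. More courses = better discrimination.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_dims = 10_000, 3

# Latent abilities: several dimensions per student, not one overall ranking.
ability = rng.normal(size=(n_students, n_dims))

# The job cares about an unequal mix of the dimensions (weights made up).
job_weights = np.array([0.2, 0.3, 0.5])
job_performance = ability @ job_weights

def transcript(n_courses):
    """Noisy course grades, each loading on a random mix of the abilities."""
    loadings = rng.dirichlet(np.ones(n_dims), size=n_courses)   # (courses, dims)
    noise = rng.normal(scale=1.0, size=(n_students, n_courses))
    return ability @ loadings.T + noise

for n_courses in (2, 8, 32):
    grades = transcript(n_courses)
    # Best in-sample linear combination of grades, an upper bound on what an
    # employer could extract from this transcript.
    beta, *_ = np.linalg.lstsq(grades, job_performance, rcond=None)
    r = np.corrcoef(grades @ beta, job_performance)[0, 1]
    print(f"{n_courses:2d} courses: correlation with job performance = {r:.2f}")
```

With only a couple of coarse summaries you can’t recover the job-relevant combination of abilities, no matter how many students you observe; more measurements at different levels really do allow finer distinctions.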

6. You write, “Consider the typical high school curriculum. English is the international language of business, but American high school students spend years studying Spanish, or even French. During English, many students spend more time deciphering the archaic language of Shakespeare than practicing business writing. Few jobs require knowledge of higher mathematics, but over 80% of high school grads suffer through geometry.” I think all these topics could be taught better, but my real issue here is that this argument contradicts what you said back on page 6, that you were not going to just “complain we aren’t spending our money in the right way.”

To put it another way:

7. You write, “The Ivory Tower ignores the real world.” I think you need to define your terms. Is “Ivory Tower” all of education? All college education? All education at certain departments at certain colleges? Lots of teachers of economics are engaged with the real world, no? Actually it’s not so clear to me what you mean by the real world. I guess it does not include the world of teaching and learning. So what parts of the economy do count as real? I’m not saying you can’t make a point here, but I think you need to define your terms in some way to keep your statements from being meaningless!

And a couple things you didn’t talk about in your book, but I think you should:

– Side effects of Big Education: Big Ed provides jobs for a bunch of politically left-wing profs and grad students, it also gives them influence. For example, Paul Krugman without the existence of a big economics educational establishment would, presumably, not be as influential as the actual Paul Krugman. One could say the same thing about, say, Greg Mankiw, but the point is that academia as a whole, and prestigious academia in particular, contains lots more liberal Krugmans than conservative Mankiws. Setting aside one’s personal political preferences, one might consider this side effect of Big Ed to be bad (in that it biases the political system) or good (in that it provides a counterweight to the unavoidable conservative biases of Big Business) or neutral. Another side effect of Big Ed is powerful teachers unions. Which, once again, could be considered a plus, a minus, or neutral, depending on your political perspective. Yet another side effect of Big Ed is that it funds various things associated with schools, such as high school sports (they’re a big deal in Texas, or so I’ve heard!), college sports, and research in areas ranging from Shakespeare to statistics. Again, one can think of these extracurricular activities as a net benefit, a net cost, or a washout.

In any case, I think much of the debate over the value of education and the structuring of education is driven by attitudes toward its side effects. This is not something you discuss in your book but I think it’s worth mentioning. Where you stand on the side effects can well affect your attitude toward the efficacy of the education establishment. There’s a political dimension here. You’re a forthright guy and I think your book will be strengthened if you openly acknowledge the political dimension rather than leaving it implicit.

Rich guys and their dumb graphs: The visual equivalents of “Dow 36,000”

Palko links to this post by Russ Mitchell linking to this post by Hassan Khan casting deserved shade on this post, “The Third Transportation Revolution,” from 2016 by Lyft Co-Founder John Zimmer, which includes the above graph.

What is it about rich guys and their graphs?

[Images: a graph labeled “slaves-serfs” and a screenshot dated 2016-11-30]

Or is it just a problem with transportation forecasts?

[Image: chart of vehicle miles traveled (VMT) forecasts]

I’m tempted to say that taking a silly statement and putting it in graph form makes it more persuasive. But maybe not. Maybe the graph thing is just an artifact of the PowerPoint era.

Rich guys . . .

I think the other problem is that people give these rich guys a lot of slack because, y’know, they’re rich, so they must know what they’re doing, right? That’s not a ridiculous bit of reasoning. But there are a few complications:

1. Overconfidence. You’re successful so you start to believe your own hype. It feels good to make big pronouncements, kinda like when Patrick Ewing kept “guaranteeing” the Knicks would win.

2. Luck. Successful people typically have had some lucky breaks. It can be natural to attribute that to skill.

3. Domain specificity. Skill in one endeavor does not necessarily translate to skill in another. You might be really skillful at persuading people to invest money in your company, or you might have had some really good ideas for starting a business, but that won’t necessarily translate into expertise in transportation forecasting. Indeed, your previous success in other areas might reduce your motivation to check with actual experts before mouthing off.

4. No safe haven. As indicated by the last graph above, some of the official transportation experts don’t know jack. So it’s not clear that it would even make sense to consult an official transportation expert before making your forecast. There’s no safe play, no good anchor for your forecast, so anything goes.

5. Selection. More extreme forecasts get attention. It’s the man-bites-dog thing. We don’t hear so much about all the non-ridiculous things that people say.

6. Motivations other than truth. Without commenting on this particular case, in general people can have financial incentives to take certain positions. Someone with a lot of money invested in a particular industry will want people to think that this industry has a bright future. That’s true of me too: I want to spread the good news about Stan.

So, yeah, rich people speak with a lot of authority, but we should be careful not to take their internet-style confident assertions too seriously.

P.S. I have no reason to believe that rich people make stupider graphs than poor people do. Richies just have more resources so we all get to see their follies.

Stock prices, a notorious New York Times article, and that method from 1998 that was going to cure cancer in 2 years

Gur Huberman writes:

Apropos your blogpost today, here’s a piece from 2001 that (according to a colleague) shows that I can write an empirical paper based on a single observation.

Gur’s article, with Tomer Regev, is called “Contagious Speculation and a Cure for Cancer: A Nonevent that Made Stock Prices Soar” and begins:

A Sunday New York Times article on a potential development of new cancer-curing drugs caused EntreMed’s stock price to rise from 12.063 at the Friday close, to open at 85 and close near 52 on Monday. It closed above 30 in the three following weeks. The enthusiasm spilled over to other biotechnology stocks. The potential breakthrough in cancer research already had been reported, however, in the journal Nature, and in various popular newspapers—including the Times—more than five months earlier. Thus, enthusiastic public attention induced a permanent rise in share prices, even though no genuinely new information had been presented.

They argue that this contradicts certain theories of finance:

A fundamentals-based approach to stock pricing calls for a price revision when relevant news comes out. Within this framework it is experts who identify the biotechnology companies whose pricing should be most closely tied to do the price revision. These experts follow Nature closely, and therefore the main price reaction of shares of biotechnology firms should have taken place in late November 1997, and not been delayed until May 1998.

I’m not going to disagree with their general point, which is reminiscent of Keynes’s famous analogy of stock pricing to a beauty contest.

Huberman and Regev quote from the Sunday New York Times article that was followed by the stock rise:

Kolata’s (1998) Times article of Sunday, May 3, 1998, presents virtually the same information that the newspaper had reported in November, but much more prominently; namely, the article appeared in the upper left corner of the front page, accompanied by the label “A special report.” The article had comments from various experts, some very hopeful and others quite restrained (of the “this is interesting, but let’s wait and see” variety). The article’s most enthusiastic paragraph was “. . . ‘Judah is going to cure cancer in two years,’ said Dr. James D. Watson, a Nobel Laureate . . . Dr. Watson said Dr. Folkman would be remembered along with scientists like Charles Darwin as someone who permanently altered civilization.” (p. 1) (Watson, of The Double Helix fame, was later reported to have denied the quotes.)

And more:

In the May 10 issue of the Times, Abelson (1998) essentially acknowledges that its May 3 article contained no new news, noting that “[p]rofessional investors have long been familiar with [ENMD’s] cancer-therapy research and had reflected it in the pre-runup price of about $12 a share.” . . . On November 12, King (1998), in a front page article in the Wall Street Journal, reports that other laboratories had failed to replicate Dr. Folkman’s results. ENMD’s stock price plunged 24 percent to close at 24.875 on that day. But that price was still twice the closing price prior to the Times article of May 4!

They conclude:

To the skeptical reader we offer the following hypothetical question: What would have been the price of ENMD in late May 1998 if the editor of the Times had chosen to kill the May 3 story?

I feel like the whole Nobel prize thing just makes everything worse (see here, here, here, and here), but I just wanted to make two comments regarding the effect of the news story on the stock price.

First, the article appearing more prominently in the newspaper does provide some information, in that it represents the judgment of the New York Times editors that the result is important, beyond the earlier judgment of the researchers to write the paper in the first place, the journal editors to publish the article, and the Times to run their first story on the topic. Now, you might say that the judgment of a bunch of newspaper editors should count as nothing compared to the judgment of the journal, but (a) journals do make mistakes (search this blog on PNAS), and (b) Nature and comparable journals publish thousands of articles on biomedical research each year, and only some of these make it to prime slots in a national newspaper. So some judgment is necessary there.
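
If you want to see the logic of that in miniature, here’s a toy Bayes-rule calculation; every number in it is invented, so it only shows the direction of the update, not its size:

```python
# A toy Bayes-rule calculation of the "prominent coverage is itself evidence"
# argument. Every number here is invented; the point is only the direction of
# the update, not its size.
prior = 0.05                 # prior prob. the result is a genuine breakthrough
p_frontpage_if_real = 0.50   # chance of a front-page special report if it is
p_frontpage_if_not = 0.02    # chance of the same treatment if it is not

posterior = (prior * p_frontpage_if_real) / (
    prior * p_frontpage_if_real + (1 - prior) * p_frontpage_if_not
)
print(f"posterior probability: {posterior:.2f}")   # about 0.57 with these inputs
```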

The second point is that, yeah, James Watson is kind of a joke now, but back in 1998 he was still widely respected as a scientist, so his “Judah is going to cure cancer in two years” line, whether or not it was reported accurately, again represents additional information. Even professional investors might take this quote as some sort of evidence.

So I think Huberman and Regev are leaning a bit too hard on their assumption that no new information was conveyed, conditional on the Nature article and the earlier NYT report.

Some thoughts on academic research and advocacy

An economist who would prefer to remain anonymous writes:

There is an important question of the degree of legitimacy we grant to academic research as advocacy.

It is widely accepted, and I think also true, that the adversary system we have in courts, where one side is seeking to find and present every possible fact and claim that tends to mitigate the guilt of the accused, and the other side strives to find and present every possible fact and claim that tends to confirm the guilt of the accused, and a learned and impartial judge accountable to the public decides, is a good system that does a creditable job of reaching truth. [Tell that to the family of Nicole Brown Simpson. — ed.] (Examining magistrates probably do a better job, but jurisdictions with examining magistrates also have defense attorneys and DAs; they just allow the judge also to participate in the research project.) It is also widely accepted, and I think also true, that it is important that high standards of integrity should be imposed on lawyers to prevent them from presenting false and misleading cases. (Especially DAs.) There is no question of “forking paths” here, each side is explicitly tasked with looking for evidence for one side.

I don’t think that this is a bad model for academic policy research. Martin Feldstein, like most prominent academic economists from both right and left, was an advocate and did not make a secret of his political views or of their sources. He was also a solid researcher and was good at using data and techniques to reach results that confirmed (and occasionally conditioned) his views. The same is true of someone like Piketty from the social-democratic left, or Sam Bowles from a more Marxist perspective, or the farther-right Sam Peltzman from Chicago.

All these individuals were transparent and responsible advocates for a particular policy regime. They all carried out careful and fascinating research and all were able to learn from each other. This is the model that I see as the de facto standard for my profession (policy economics) and I think it is adequate and functional and sustainable.

Romer’s whole “mathiness” screed is not mostly about “Chicago economists are only interested in models that adopt assumptions that conform to their prejudices”, it is IMHO mostly about “Chicago economists work hard to hide the fact that they are only interested in models that adopt assumptions that conform to their prejudices”. I think Romer exaggerates a bit (that is his advocacy) but I agree that he makes an important point.

I’m coming at this as an outsider and have nothing to add except to point out the converse, that honesty and transparency are not enough, and if you’re a researcher and come to a conclusion that wasn’t expected, that you weren’t aiming for, that you weren’t paid or ideologically predisposed to find, that doesn’t automatically mean that you’re right. I’ve seen this attitude a lot, of researchers thinking that their conclusions absolutely must be correct because they came as a surprise or because they’re counter to their political ideologies. But that’s not right either.

The Economic Pit and the Political Pendulum: Predicting Midterm Elections

This post is written jointly with Chris Wlezien.

We were given the following prompt in July 2022 and asked to write a response: “The Economy, Stupid: Accepted wisdom has it that it’s the economy that matters to voters. Will other issues matter in the upcoming midterm elections, or will it really be all about the economy?”

In the Gallup poll taken that month, 35% of Americans listed the economy as one of the most important problems facing the country today, a value which is neither high nor low from a historical perspective. What does this portend for the 2022 midterm elections? The quick answer is that the economy is typically decisive for presidential elections, but midterm elections have traditionally been better described as the swinging of the pendulum, with the voters moving to moderate the party in power. This may partly reflect “thermostatic” public response (see here and here) to government policy actions.

It has long been understood that good economic performance benefits the incumbent party, and there’s a long history of presidents trying to time the business cycle to align with election years. Edward Tufte was among the first to seriously engage this possibility in his 1978 book, Political Control of the Economy, and other social scientists have taken up the gauntlet over the years. Without commenting on the wisdom of these policies, we merely note that even as presidents may try hard to set up favorable economic conditions for reelection, they do not always succeed, and for a variety of reasons: government is only part of the economic engine, the governing party does not control it all, and getting the timing right is an imperfect science.

To the extent that presidents are successful in helping ensure their own reelection, this may have consequences in the midterms. For example, one reason the Democrats lost so many seats in the 2010 midterms may be that Barack Obama in his first two years in office was trying to avoid Carter’s trajectory; his team seemed to want a slow recovery, at the cost of still being in recession in 2010, rather than pumping up the economy too quickly and then crashing by 2012 and paying the political price then.

In the 1970s and 1980s, Douglas Hibbs, Steven Rosenstone, James Campbell, and others established the statistical pattern that recent economic performance predicts presidential elections, and this is consistent with our general understanding of voters and their motivations. Economics is only part of the referendum judgment in presidential elections; consider that incumbents tend to do well even when economic conditions are not ideal. The factors that lead incumbent party candidates to do well in presidential elections also influence Congressional elections in those years, via coattails. So the economy matters here. This is less true in midterm elections. There, voters tend to balance the president, voting for and electing candidates of the “out-party.” This is now conventional wisdom. There is variation in the tendency, and this partly reflects public approval of the president, which offers some sense of how much people like what the president is doing – we are more (less) likely to elect members of the other party, the less (more) we approve of the president. The economy matters for approval, so it matters in midterm elections, but to a lesser degree than in presidential elections, and other things matter, including policy itself.

Whatever the specific causes, it is rare for voters not to punish the president’s party at the midterm, and historically avoiding those losses has required very high approval ratings. The exceptions to midterm loss are 1998, after impeachment proceedings against popular Bill Clinton were initiated, and again in 2002, when the even more popular George W. Bush continued to benefit from the 9/11 rally effect, and gains in these years were slight. It has not happened since, and to find another case of midterm gain, one has to go all the way back to Franklin Roosevelt’s first midterm in 1934. This is consistent with voters seeing the president as directing the ship of state, with Congressional voting providing a way to make smaller adjustments to the course. Democrats currently control two of the three branches of government, so it may be natural for some voters to swing Republican in the midterms, particularly given Joe Biden’s approval numbers, which have been hovering in the low 40s.

Elections are not just referenda on the incumbent; the alternative also matters. This may seem obvious in presidential election years, but it is true in midterms as well. Consider that Republican attempts to impeach Bill Clinton may have framed the 1998 election as a choice rather than a “balancing” referendum, contributing to the (slight) Democratic gains in that year. We may see something similar in 2022, encouraged in part by the Supreme Court decision on abortion, among other rulings. Given that the court now is dominated by Republican appointees, some swing voters will want to maintain Democratic control of Congress as a way to check Republicans’ judicial power and future appointments to the courts. The choice in the midterms also may be accentuated by the reemergence of Donald Trump in the wake of the FBI raid of Mar-a-Lago.

Change in party control of Congress is the result of contests in particular districts and states. These may matter less as national forces have increased in importance in recent years, but the choices voters face in those contests still do matter when they go to the polls. In the current election cycle, there has been an increase in the retirements of Democratic incumbents as would be expected in a midterm with a Democratic president, but some of the candidates Republicans are putting up may not be best positioned to win. This is particularly true in the Senate, where candidates and campaigns matter more.

History, historians, and causality

Through an old-fashioned pattern of web surfing of blogrolls (from here to here to here), I came across this post by Bret Devereaux on non-historians’ perceptions of academic history. Devereaux is responding to some particular remarks from economics journalist Noah Smith, but he also points to some more general issues, so these points seem worth discussing.

Also, I’d not previously encountered Smith’s writing on the study of history, but he recently interviewed me on the subjects of statistics and social science and science reform and causal inference so that made me curious to see what was up.

Here’s how Devereaux puts it:

Rather than focusing on converting the historical research of another field into data, historians deal directly with primary sources . . . rather than engaging in very expansive (mile wide, inch deep) studies aimed at teasing out general laws of society, historians focus very narrowly in both chronological and topical scope. It is not rare to see entire careers dedicated to the study of a single social institution in a single country for a relatively short time because that is frequently the level of granularity demanded when you are working with the actual source evidence ‘in the raw.’

Nevertheless as a discipline historians have always held that understanding the past is useful for understanding the present. . . . The epistemic foundation of these kinds of arguments is actually fairly simple: it rests on the notion that because humans remain relatively constant, situations in the past that are similar to situations today may thus produce similar outcomes. . . . At the same time it comes with a caveat: historians avoid claiming strict predictability because our small-scale, granular studies direct so much of our attention to how contingent historical events are. Humans remain constant, but conditions, technology, culture, and a thousand other things do not. . . .

He continues:

I think it would be fair to say that historians – and this is a serious contrast with many social scientists – generally consider strong predictions of that sort impossible when applied to human affairs. Which is why, to the frustration of some, we tend to refuse to engage counter-factuals or grand narrative predictions.

And he then quotes a journalist, Matthew Yglesias, who wrote, “it’s remarkable — and honestly confusing to visitors from other fields — the extent to which historians resist explicit reasoning about causation and counterfactual analysis even while constantly saying things that clearly implicate these ideas.” Devereaux responds:

We tend to refuse to engage in counterfactual analysis because we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . . historians are taught when making present-tense arguments to adopt a very limited kind of argument: Phenomenon A1 occurred before and it resulted in Result B, therefore as Phenomenon A2 occurs now, result B may happen. . . . The result is not a prediction but rather an acknowledgement of possibility; the historian does not offer a precise estimate of probability (in the Bayesian way) because they don’t think accurately calculating even that is possible – the ‘unknown unknowns’ (that is to say, contingent factors) overwhelm any system of assessing probability statistically.

This all makes sense to me. I just want to do one thing, which is to separate two ideas that I think are being conflated here:

1. Statistical analysis: generalizing from observed data to a larger population, a step that can arise in various settings including sampling, causal inference, prediction, and modeling of measurements.

2. Causal inference: making counterfactual statements about what would have happened, or could have happened, had some past decision been made differently, or making predictions about potential outcomes under different choices in some future decision.

Statistical analysis and causal inference are related but are not the same thing.

For example, if historians gather data on public records from some earlier period and then make inference about the distributions of people working at that time in different professions, that’s a statistical analysis but that does not involve causal inference.
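
Here’s a minimal sketch of what that kind of purely statistical exercise looks like, with made-up data; the estimates come with sampling uncertainty, but no counterfactual is ever posed:

```python
# A sketch of statistical inference with no causal question attached; the data
# are made up. We estimate the share of each profession in a past population
# from a sample of surviving records and attach simple sampling uncertainty.
# Nothing counterfactual is being asked.
import numpy as np

rng = np.random.default_rng(2)
professions = ["farmer", "artisan", "merchant", "clergy"]
true_shares = [0.60, 0.25, 0.10, 0.05]   # invented population shares

# Pretend 400 records survive and are roughly a random sample of the population.
sample = rng.choice(professions, size=400, p=true_shares)

for p in professions:
    share = np.mean(sample == p)
    se = np.sqrt(share * (1 - share) / len(sample))   # binomial standard error
    print(f"{p:8s}: estimated share {share:.2f} (standard error {se:.2f})")
```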

From the other direction, historians can think about causal inference and use causal reasoning without formal statistical analysis or probabilistic modeling of data. Back before he became a joke and a cautionary tale of the paradox of influence, historian Niall Ferguson edited a fascinating book, Virtual History: Alternatives and Counterfactuals, a book of essays by historians on possible alternative courses of history, about which I wrote:

There have been and continue to be other books of this sort . . . but what makes the Ferguson book different is that he (and most of the other authors in his book) are fairly rigorous in only considering possible actions that the relevant historical personalities were actually considering. In the words of Ferguson’s introduction: “We shall consider as plausible or probable only those alternatives which we can show on the basis of contemporary evidence that contemporaries actually considered.”

I like this idea because it is a potentially rigorous extension of the now-standard “Rubin model” of causal inference.

As Ferguson puts it,

Firstly, it is a logical necessity when asking questions about causality to pose ‘but for’ questions, and to try to imagine what would have happened if our supposed cause had been absent.

And the extension to historical reasoning is not trivial, because it requires examination of actual historical records in order to assess which alternatives are historically reasonable. . . . to the best of their abilities, Ferguson et al. are not just telling stories; they are going through the documents and considering the possible other courses of action that had been considered during the historical events being considered. In addition to being cool, this is a rediscovery and extension of statistical ideas of causal inference to a new field of inquiry.

See also here. The point is that it was possible for Ferguson et al. to do formal causal reasoning, or at least consider the possibility of doing it, without performing statistical analysis (thus avoiding the concern that Devereaux raises about weak evidence in comparative historical studies).

Now let’s get back to Devereaux, who writes:

This historian’s approach [to avoid probabilistic reasoning about causality] holds significant advantages. By treating individual examples in something closer to the full complexity (in as much as the format will allow) rather than flattening them into data, they can offer context both to the past event and the current one. What elements of the past event – including elements that are difficult or even impossible to quantify – are like the current one? Which are unlike? How did it make people then feel and so how might it make me feel now? These are valid and useful questions which the historian’s approach can speak to, if not answer, and serve as good examples of how the quantitative or ’empirical’ approaches that Smith insists on are not, in fact, the sum of knowledge or required to make a useful and intellectually rigorous contribution to public debate.

That’s a good point. I still think that statistical analysis can be valuable, even with very speculative sampling and data models, but I agree that purely qualitative analysis is also an important part of how we learn from data. Again, this is orthogonal to the question of when we choose to engage in causal reasoning. There’s no reason for bad data to stop us from thinking causally; rather, the limitations in our data merely restrict the strengths of any causal conclusions we might draw.

The small-N problem

One other thing. Devereaux refers to the challenges of statistical inference: “we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . .” That’s not just a problem with the field of history! It also arises in political science and economics, where we don’t have a lot of national elections or civil wars or depressions, so generalizations necessarily rely on strong assumptions. Even if you can produce a large dataset with thousands of elections or hundreds of wars or dozens of business cycles, any modeling will implicitly rely on some assumption of stability of a process over time, an assumption that won’t necessarily make sense given changes in political and economic systems.
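
To give a sense of what that instability looks like in the simplest possible setting, here’s a simulation sketch with invented numbers: even when the underlying relation between the economy and the vote is perfectly stable, the slope estimated from a couple dozen elections bounces around a lot.

```python
# A simulation sketch of the small-N problem; all numbers are invented. Even
# when the relation between election-year economic growth and the incumbent
# vote is perfectly stable, the slope estimated from ~18 elections varies a lot
# from one hypothetical history to the next.
import numpy as np

rng = np.random.default_rng(1)
n_elections = 18      # roughly the number of postwar presidential elections
true_slope = 1.5      # vote-share points per point of income growth (made up)
n_histories = 2000

slopes = []
for _ in range(n_histories):
    growth = rng.normal(2.0, 1.5, size=n_elections)
    vote = 50 + true_slope * growth + rng.normal(0, 3, size=n_elections)
    slopes.append(np.polyfit(growth, vote, 1)[0])   # least-squares slope

slopes = np.array(slopes)
print(f"true slope: {true_slope}")
print(f"estimated slopes, 5th to 95th percentile: "
      f"{np.percentile(slopes, 5):.2f} to {np.percentile(slopes, 95):.2f}")
```

And that’s the optimistic case, where the process really is stable over time; in practice the stability assumption is itself in doubt, which is the point above.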

So it’s not really history versus social sciences. Rather, I think of history as one of the social sciences (as in my book with Jeronimo from a few years back), and they all have this problem.

The controversy

After writing all the above, I clicked through the link and read the post by Smith that Devereaux was arguing with.

And here’s the funny thing. I found Devereaux’s post to be very reasonable. Then I read Smith’s post, and I found that to be very reasonable too.

The two guys are arguing against each other furiously, but I agree with both of them!

What gives?

As discussed above, I think Devereaux in his post provides an excellent discussion of the limits of historical inquiry. On the other side, I take the main message of Smith’s post to be that, to the extent that historians want to use their expertise to make claims about the possible effects of recent or new policies, they should think seriously about statistical inference issues. Smith doesn’t just criticize historians here; he leads off by criticizing academic economists:

After having endured several years of education in that field, I [Smith] was exasperated with the way unrealistic theories became conventional wisdom and even won Nobel prizes while refusing to submit themselves to rigorous empirical testing. . . . Though I never studied history, when I saw the way that some professional historians applied their academic knowledge to public commentary, I started to recognize some of the same problems I had encountered in macroeconomics. . . . This is not a blanket criticism of the history profession . . . All I am saying is that we ought to think about historians’ theories with the same empirically grounded skepticism with which we ought to regard the mathematized models of macroeconomics.

By saying that I found both Devereaux and Smith to be reasonable, I’m not claiming they have no disagreements. I think their main differences come because they’re focusing on two different things. Smith’s post is ultimately about public communication and the things that academics say in the public discourse (things like newspaper op-eds and twitter posts) with relevance to current political disputes. And, for that, we need to consider the steps, implicit or explicit, that commentators take to go from their expertise to the policy claims they make. Devereaux is mostly writing about academic historians in their professional roles. With rare exceptions, academic history is about getting the details right, and even popular books of history typically focus on what happened, and our uncertainty about what happened, not on larger theories.

I guess I do disagree with this statement from Smith:

The theories [from academic history] are given even more credence than macroeconomics even though they’re even less empirically testable. I spent years getting mad at macroeconomics for spinning theories that were politically influential and basically un-testable, then I discovered that theories about history are even more politically influential and even less testable.

Regarding the “less testable” part, I guess it depends on the theories—but, sure, many theories about what happened in the past can be essentially impossible to test, if conditions have changed enough. That’s unavoidable. As Devereaux replies, this is not a problem with the study of history; it’s just the way things are.

But I can’t see how Smith could claim with a straight face that theories from academic history are “given more credence” and are “more politically influential” than macroeconomics. The president has a council of economic advisers, there are economists at all levels of the government, or if you want to talk about the news media there are economists such as Krugman, Summers, Stiglitz, etc. . . . sure, they don’t always get what they want when it comes to policy, but they’re quoted endlessly and given lots of credence. This is also the case in narrower areas, for example James Heckman on education policy or Angus Deaton on deaths of despair: these economists get tons of credence in the news media. There are no academic historians with that sort of influence. This has come up before: I’d say that economics now is comparable to Freudian psychology in the 1950s in its influence on our culture:

My best analogy to economics exceptionalism is Freudianism in the 1950s: Back then, Freudian psychiatrists were on the top of the world. Not only were they well paid, well respected, and secure in their theoretical foundations, they were also at the center of many important conversations. Even those people who disagreed with them felt the need to explain why the Freudians were wrong. Freudian ideas were essential, leaders in that field were national authorities, and students of Freudian theory and methods could feel that they were initiates in a grand tradition, a priesthood if you will. Freudians felt that, unlike just about everybody else, they treated human beings scientifically and dispassionately. What’s more, Freudians prided themselves on their boldness, their willingness to go beyond taboos to get to the essential truths of human nature. Sound familiar?

When it comes to influence in policy or culture or media, academic history doesn’t even come close to Freudianism in the 1950s or economics in recent decades.

This is not to say we should let historians off the hook when they make causal claims or policy recommendations. We shouldn’t let anyone off the hook. In that spirit, I appreciate Smith’s reminder of the limits of historical theories, along with Devereaux’s clarification of what historians really do when they’re doing academic history (as opposed to when they’re slinging around on twitter).

Why write about this at all?

As a statistician and political scientist, I’m interested in issues of generalization from academic research to policy recommendations. Even in the absence of any connection with academic research, people will spin general theories—and one problem with academic research is that it can give researchers, journalists, and policymakers undue confidence in bad theories. Consider, for example, the examples of junk science promoted over the years by the Freakonomics franchise. So I think these sorts of discussions are important.

Krueger Uber $100,000 update

Last month we discussed the controversy regarding the recent revelation that, back in 2015, economist Alan Krueger had been paid $100,000 by Uber to coauthor a research article that was published at the NBER (National Bureau of Economic Research, a sort of club of influential academic economists) and may have been influential in policy.

Krueger was a prominent political liberal—he worked in the Clinton and Obama administrations and was most famous for research that reported positive effects of the minimum wage—so there was some dissonance about him working for a company that has a reputation for employing drivers as contractors to avoid compensating them more. But I think the key point of the controversy was the idea that academia (or, more specifically, academic economics (or, more specifically, the NBER)) was being used to launder this conflict of interest. The concern was that Uber paid Krueger, the payment went into a small note in the paper, and then the paper was mainlined into the academic literature and the news media. From that perspective, Uber wasn’t just paying for Krueger’s work or even for his reputation; they were also paying for access to NBER and the economics literature.

In this new post I’d like to address a few issues:

1. Why now?

The controversial article by Hall and Krueger came out in 2015, and right there it had the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written.” The complaint is that this is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.”

So why the controversy now? Why are we getting comments like this now, rather than in 2015?

The new information is that Krueger was paid $100,000 by Uber. But we already knew he was being paid by Uber—it’s right on that paper. Sure, $100,000 is a lot, but what were people expecting? If Uber was paying Krueger at all, you can be pretty sure it was more than $10,000.

I continue to think that the reason the controversy happened now rather than in 2015 is that, back in 2015, there was a general consensus among economists that Uber was cool. They weren’t considered to be the bad guy, so it was fine for him to get paid.

2. The Necker cube of conflict of interest

Conflict of interest goes in two directions. On one hand, as discussed above, there’s a concern that being paid will warp your perspective. And, even beyond this sort of bias, being paid will definitely affect what you do. Hall and Krueger could be the most ethical researchers in the universe—but there’s no way they would’ve done that particular study had Uber not paid them to do it. This is the usual way that we think about conflict of interest.

But, from the other direction, there’s also the responsibility, if someone pays you to do a job, to try to do your best. If Hall and Krueger were to take hundreds of thousands of dollars from Uber and then turn around and stab Uber in the back, that wouldn’t be cool. I’m not talking here about whistleblowing. If Hall and Krueger started working for Uber and then saw new things they hadn’t seen before, indicating illegal or immoral conduct by the company, then, sure, we can all agree that it would be appropriate for them to blow the whistle. What I’m saying is that, if Hall and Krueger come to work for Uber, and Uber is what they were expecting when they took the job, then it’s their job to do right by Uber.

My point here is that these two issues of conflict of interest go in opposite directions. As with that Necker cube, it’s hard to find a stable intermediate position. Is Krueger an impartial academic who just happened to take some free money? Or does he have a duty to the company that hired him? I think it kind of oscillates between the two. And I wonder if that’s one reason for the strong reaction that people had, that they’re seeing the Necker cube switch back and forth.

3. A spectrum of conflicts of interest

Another way to think about this is to consider a range of actions in response to taking someone’s money.

– At one extreme are the pure hacks, the public relations officers who will repeat any lie that the boss tells them, the Mob lawyers who help in the planning of crimes, the expert witnesses who will come to whatever conclusion they’re paid to reach. Sometimes the hack factor overlaps with ideology, so you might get a statistician taking cash from a cigarette company and convincing himself that cigarettes aren’t really addictive, or a lawyer helping implement an illegal business plan and convincing himself that the underlying law is unjust.

– Somewhat more moderate are the employees or consultants who do things that might otherwise make them queasy, because they benefit the company that’s paying them. It’s hard to draw a sharp line here. “Writing a paper for Uber” is, presumably, not something that Krueger would’ve done without getting paid by Uber, but that doesn’t mean that the contents of the paper cross any ethical line.

– Complete neutrality. It’s hard for me to be completely neutral on a consulting project—people are paying me and I want to give them their money’s worth—but I can be neutral when evaluating a report. Someone pays me to read a report and I can give my comments without trying to shade them to make the client happy. Similarly, when I’ve been an expert witness, I’ve just written it like I see it. I understand that this will help the client—if they don’t like my report, they can just bury it!—so I’m not saying that I’m being neutral in my actions. But the content of my work is neutral.

– Bending over backward. An example here might be the admirable work of Columbia math professor Michael Thaddeus in criticizing the university’s apparently bogus numbers. Rather than going easy on his employer, Thaddeus is holding Columbia to a high standard. I guess it’s a tough call, when to do this. Hall and Krueger could’ve taken Uber’s money and then turned around and said something like, This project is kinda ok, but given the $ involved, we should really be careful and point out the red flags in the interpretation of the results. That would’ve been fine; I’ll just say again that this conflicts with the general attitude that, if someone pays you, you should give them your money’s worth. The Michael Thaddeus situation is different because Columbia pays him to teach and do research; it was never his job to pump up its U.S. News ranking.

– The bust-out operation. Remember this from Goodfellas? The mob guys take over someone’s business and strip it of all its assets, then torch the place for the insurance. This is the opposite of the hack who will do anything for the boss. An example here would be something like taking $100,000 from Uber and then using the knowledge gained from the inside to help a competitor.

In a horseshoe-like situation, the two extremes of the above spectrum seem the least ethical.

4. NBER

This is just a minor thing, but I was corresponding with someone who pointed out that NBER working papers are not equivalent to peer-reviewed publications.

I agree, but NBER ain’t nothing. It’s an exclusive club, and it’s my impression that an NBER paper gets attention in a way that an Arxiv paper or a Columbia University working paper or an Uber working paper would not. NBER papers may not count for academic CV’s, but I think they are noticed by journalists, researchers, and even policymakers. An NBER paper is considered legit by default, and that does have to do with one of the authors being a member of the club.

5. Feedback

In an email discussion following up on my post, economics journalist Felix Salmon wrote:

My main point the whole time has just been that if Uber pays for a paper, Uber should publish the paper. I don’t like the thing where they add Krueger as a second author, pay him $100k for his troubles, and thusly get their paper into NBER. Really what I’m worried about here is not that Krueger is being paid to come to a certain conclusion (the conflict-of-interest problem), so much as Krueger is being paid to get the paper into venues that would otherwise be inaccessible to Uber. I do think that’s problematic, and I don’t think it’s a problem that lends itself to solution via disclosure. On the other hand, it’s a problem with a simple solution, which is just that NBER shouldn’t publish such papers, and that they should live the way that god intended, which is as white papers (or even just blog posts) published by the company in question.

As for the idea that conflict is binary and “the conflict is there, whether it’s $1000 or $100,000” — I think that’s a little naive. At some point ($1 million? $10 million? more?) the sheer quantity of money becomes germane. . . .

Salmon also asked:

Is there a way of fixing the disclosure system to encompass this issue? It seems pretty clear to me that when the lead author is literally an Uber employee, Uber has de facto control over whether the paper gets published. Again, I think the solution to this problem is to have Uber publish the paper. But if there’s a disclosure solution then I’m interested in what it would look like.

I replied:

You say that Uber should just publish the paper. That’s fine with me. I will put a paper on my website (that’s kinda like a university working paper series, just for me), I will also put it on Arxiv and NBER (if I happen to have a coauthor who’s a member of that club), and I’ll publish in APSR or JASA or some lower-ranking journal. No reason a paper can’t be published in more than one “working paper” series. I also think it’s ok for the authors to publish in NBER (with a disclosure, which that paper had) and in a scholarly journal (again, with a disclosure).

You might be right that the published disclosure wasn’t enough, but in that case I think your problem is with academic standards in general, not with Krueger or even with the economics profession. For example, I was the second author on this paper, where the first and last authors worked at Novartis. I was paid by Novartis while working on the project—I consulted for them for several years, with most of the money going to my research collaborators at Columbia and elsewhere, but I pocketed some $ too. I did not disclose that conflict of interest in that paper—I just didn’t think about it! Or maybe it seemed unnecessary given that two of the authors were Novartis employees. In retrospect, yeah, I think I should’ve disclosed. But the point is, even if I had disclosed, I wouldn’t have given the dollar value, not because it’s a big secret (actually, I don’t remember the total, as it was over several years, and most of it went to others) but just because I’ve never seen anyone disclose the dollar amount in an acknowledgments or conflict-of-interest statement for a paper. It’s just not how things are done. Also I don’t think we shared our data. And our paper won an award! On the plus side, it’s only been cited 17 times so I guess there’s some justice in the world and vice is not always rewarded.

I agree that conflict is not binary. But I think that even with $1000, there is conflict. One reason I say this is that I think people often have very vague senses of conflict of interest. Is it conflict of interest to referee a paper written by a personal friend? If the friend is not a family member and there’s no shared employment or $ changing hands, I’d say no. That’s just my take, also my approximation to what I think the rules are, based on what Columbia University tells me. Again, though, I think it would’ve been really unusual had that paper had a statement saying that Krueger had been paid $100,000. I’ve just never seen that sort of thing in an academic paper.

Did Uber have the right to review the paper? I have no idea; that’s an interesting question. But I think there’s a conflict of interest, whether or not Uber had the right to review: these are Uber consultants and Uber employees working with Uber data, so no surprise they’ll come to an Uber-favorable conclusion. I still think the existing conflict of interest statement was just fine, and that the solution is for all journalists to report this as, “Uber study claims . . . ” The first author of the paper was an Uber employee and the second author was an Uber consultant—it’s right there on the first page of the paper!

As a separate issue, if the data and meta-data (the data-collection protocol) are not made available, then that should cause an immediate decline of trust, even setting aside any conflict of interest. I guess that would be a problem with my above-linked Novartis paper too; the difference is that our punch line was a methods conclusion, not a substantive conclusion, and you can check the methods conclusion with fake data.

Salmon replied:

If you want the media to put “Uber study claims” before reporting such results, then the way you get there is by putting the Uber logo on the top of the study. You don’t get there by adding a disclosure footnote and expecting front-line workers in the content mines to put two and two together.

As discussed in my original post on the topic, I have lots of conflicts of interest myself. So I’m not claiming to be writing about all this from some sort of ethically superior position.

6. Should we trust that published paper?

The economist who first pointed me to this story (and who wished to remain anonymous) followed up by email:

I’m surprised that the focus of your thoughts and of the others who commented on the blog were on whether Krueger acted correctly, not on what to do with the results they (and others) found. I guess my one big takeaway from now on is to move my priors more strongly towards being sceptical of papers using non-shared, proprietary data…

That’s a good point! I don’t think anyone in all these discussions is suggesting we should actually trust that published paper. The data were supplied by Uber; the conclusions were restricted to what Uber wanted to get; the whole thing was done in an environment where economists loved Uber and wanted to hear good things about it; etc.

I guess the reason the conversation was all about Krueger and the role of academic economics was that everyone was already starting from the position that the paper could not be trusted. If it were considered to be a trustworthy piece of research, I think there’d be a lot less objection to it. It’s not like the defenders of Krueger were saying they believed the claims in the paper. Again, part of that is political. Krueger was a prominent liberal economist, and now in 2022, liberal economists are not such Uber fans, so it makes sense that they’d defend his actions on procedural rather than substantive grounds.

Some concerns about the recent Chetty et al. study on social networks and economic inequality, and what to do next?

I happened to receive two different emails regarding a recently published research paper.

Dale Lehman writes:

Chetty et al. (and it is a long et al. list) have several publications about social and economic capital (see here for one such paper, and here for the website from which the data can also be accessed). In the paper above, the data is described as:

We focus on Facebook users with the following attributes: aged between 25 and 44 years who reside in the United States; active on the Facebook platform at least once in the previous 30 days; have at least 100 US-based Facebook friends; and have a non-missing residential ZIP code. We focus on the 25–44-year age range because its Facebook usage rate is greater than 80% (ref. 37). On the basis of comparisons to nationally representative surveys and other supplementary analyses, our Facebook analysis sample is reasonably representative of the national population.

They proceed to measure social and economic connectedness across counties, zip codes, and for graduates of colleges and high schools. The data is massive as is the effort to make sense out of it. In many respects it is an ambitious undertaking and one worthy of many kudos.

But I [Lehman] do have a question. Given their inclusion criteria, I wonder about selection bias when comparing counties, zip codes, colleges, or high schools. I would expect that the fraction of Facebook users – even in the targeted age group – that are included will vary across these segments. For example, one college may have many more of its graduates who have that number of Facebook friends and have used Facebook in the prior 30 days compared with a second college. Suppose the economic connectedness from the first college is greater than from the second college. But since the second college has a larger proportion of relatively inactive Facebook users, is it fair to describe college 1 as having greater connectedness?

It seems to me that the selection criteria make the comparisons potentially misleading. It might be accurate to say that the regular users of Facebook from college 1 are more connected than those from college 2, but this may not mean that the graduates from college 1 are more connected than the graduates from college 2. I haven’t been able to find anything in their documentation to address the possible selection bias and I haven’t found anything that mentions how the proportion of Facebook accounts that meet their criteria varies across these segments. Shouldn’t that be addressed?

That’s an interesting point. Perhaps one way to address it would be to preprocess the data by estimating a propensity to use facebook and then using this propensity as a poststratification variable in the analysis. I’m not sure. Lehman makes a convincing case that this is a concern when comparing different groups; that said, it’s the kind of selection problem we have all the time, and typically ignore, with survey data.
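To make Lehman’s point and this possible fix a bit more concrete, here is a minimal simulation sketch (all numbers and the propensity model are invented for illustration; nothing here is estimated from the actual Facebook data):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical "true" connectedness for graduates of two colleges.
college = rng.integers(0, 2, n)
connectedness = rng.normal(0.5 + 0.1 * college, 0.2, n)

# Suppose the chance of meeting the inclusion criteria (active user, 100+
# friends, etc.) rises with connectedness and also differs by college.
logit = 4 * (connectedness - 0.5) + 1.0 * (college == 0)
p_include = 1 / (1 + np.exp(-logit))
included = rng.random(n) < p_include

for c in (0, 1):
    grads = college == c
    truth = connectedness[grads].mean()
    naive = connectedness[grads & included].mean()
    # Inverse-propensity weighting, here using the true inclusion probability;
    # in practice the propensity would have to be estimated from auxiliary data.
    w = 1 / p_include[grads & included]
    adjusted = np.average(connectedness[grads & included], weights=w)
    print(f"college {c}: all graduates {truth:.3f}, Facebook sample {naive:.3f}, reweighted {adjusted:.3f}")

In this made-up world the Facebook-sample means overstate both colleges’ connectedness and exaggerate the gap between them, while the propensity-weighted estimates recover the all-graduates values. The hard part in practice would be estimating that inclusion propensity from auxiliary data.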

Richard Alba writes in with a completely different concern:

You may be aware of the recent research, published in Nature by the economist Raj Chetty and colleagues, purporting to show that social capital in the form of early-life ties to high-status friends provides a powerful pathway to upward mobility for low-status individuals. It has received a lot of attention, from The New York Times, Brookings, and no doubt other places I am not aware of.

In my view, they failed to show anything new. We have known since the 1950s that social capital has a role in mobility, but the evidence they develop about its great power is not convincing, in part because they fail to take into account how their measure of social capital, the predictor, is contaminated by the correlates and consequences of mobility, the outcome.

This research has been greeted in some media as a recipe for the secret sauce of mobility, and one of their articles in Nature (there are two published simultaneously) is concerned with how to increase social capital. In other words, the research is likely to give rise to policy proposals. I think it is important then to inform Americans about its unacknowledged limitations.

I sent my critique to Nature, and it was rejected because, in their view, it did not sufficiently challenge the articles’ conclusions. I find that ridiculous.

I have no idea how Nature decides what critiques to publish, and I have not read the Chetty et al. articles so I can’t comment on them either, but I can share Alba’s critique. Here it is:

While the pioneering big-data research of Raj Chetty and his colleagues is transforming the long-standing stream of research into social mobility, their findings should not be exempt from critique.

Consider in this light the recent pair of articles in Nature, in which they claim to have demonstrated a powerful causal connection between early-life social capital and upward income mobility for individuals growing up in low-income families. According to one paper’s abstract, “the share of high-SES friends among individuals with low-SES—which we term economic connectedness—is among the strongest predictors of upward income mobility identified to date.”

But there are good reasons to doubt that this causal connection is as powerful as the authors claim. At a minimum, the social capital-mobility statistical relationship is significantly overstated.

This is not to deny a role for social capital in determining adult socioeconomic position. That has been well established for decades. As early as the 1950s, the Wisconsin mobility studies focused in part on what the researchers called “interpersonal influence,” measured partly in terms of high-school friends, an operationalization close to the idea in the Chetty et al. article. More generally, social capital is indisputably connected to labor-market position for many individuals because of the role social networks play in disseminating job information.

But these insights are not the same as saying that economic connectedness, i.e., cross-class ties, is the secret sauce in lifting individuals out of low-income situations. To understand why the articles’ evidence fails to demonstrate this, it is important to pay close attention to how the data and analysis are constructed. Many casual readers, who glance at the statements like the one above or read the journalistic accounts of the research (such as the August 1 article in The New York Times), will take away the impression that the researchers have established an individual-level relationship—that they have proven that individuals from low-SES families who have early-life cross-class relationships are much more likely to experience upward mobility. But, in fact, they have not.

Because of limitations in their data, their analysis is based on the aggregated characteristics of areas—counties and zip codes in this case—not individuals. This is made necessary because they cannot directly link the individuals in their main two sources of data—contemporary Facebook friendships and previous estimates by the team of upward income mobility from census and income-tax data. Hence, the fundamental relationship they demonstrate is better stated as: the level of social mobility is much higher in places with many cross-class friendships. The correlation, the basis of their analysis, is quite strong, both at the county level (.65) and at the zip-code level (.69).

Inferring that this evidence demonstrates a powerful causal mechanism linking social capital to the upward mobility of individuals runs headlong into a major problem: the black box of causal mechanisms at the individual level that can lie behind such an ecological correlation, where moreover both variables are measured for roughly the same time point. The temptation may be to think that the correlation reflects mainly, or only, the individual-level relationship between social capital and mobility as stated above. However, the magnitude of an area-based correlation may be deceptive about the strength of the correlation at the individual level. Ever since a classic 1950 article by W. S. Robinson, it has been known that ecological correlations can exaggerate the strength of the individual-level relationship. Sometimes the difference between the two is very large, and in the case of the Chetty et al. analysis it appears impossible given the data they possess to estimate the bias involved with any precision, because Robinson’s mathematics indicates that the individual-level correlations within area units are necessary to the calculation. Chetty et al. cannot calculate them.

A second aspect of the inferential problem lies in the entanglement in the social-capital measure of variables that are consequences or correlates of social mobility itself, confounding cause and effect. This risk is heightened because the Facebook friendships are measured in the present, not prior to the mobility. Chetty et al. are aware of this as a potential issue. In considering threats to the validity of their conclusion, they refer to the possibility of “reverse causality.” What they have in mind derives from an important insight about mobility—mobile individuals are leaving one social context for another. Therefore, they are also leaving behind some individuals, such as some siblings, cousins, and childhood buddies. These less mobile peers, who remain in low-SES situations but have in their social networks others who are now in high-SES ones, become the basis for the paper’s Facebook estimate of economic connectedness (which is defined from the perspective of low-SES adults between the ages of 25 and 44). This sort of phenomenon will be frequent in high-mobility places, but it is a consequence of mobility, not a cause. Yet it almost certainly contributes to the key correlation—between economic connectedness and social mobility—in the way the paper measures it.

Chetty et al. try to answer this concern with correlations estimated from high-school friendships, arguing that the timing purges this measure of mobility’s impact on friendships. The Facebook-based version of this correlation is noticeably weaker than the correlations that the paper emphasizes. In any event, demonstrating a correlation between teen-age economic connectedness and high mobility does not remove the confounding influence of social mobility from the latter correlations, on which the paper’s argument depends. And in the case of high-school friendships, too, the black-box nature of the causality behind the correlation leaves open the possibility of mechanisms aside from social capital.

This can be seen if we consider the upward mobility of the children of immigrants, surely a prominent part today of the mobility picture in many high-mobility places. Recently, the economists Ran Abramitzky and Leah Boustan have reminded us in their book Streets of Gold that, today as in the past, the children of immigrants, the second generation, leap on average far above their parents in any income ranking. Many of these children are raised in ambitious families, where as Abramitzky and Boustan put it, immigrants typically are “under-placed” in income terms relative to their abilities. Many immigrant parents encourage their children to take advantage of opportunities for educational advancement, such as specialized high schools or advanced-placement high-school classes, likely to bring them into contact with peers from more advantaged families. This can create social capital that boosts the social mobility of the second generation, but a large part of any effect on mobility is surely attributable to family-instilled ambition and to educational attainment substantially higher than one would predict from parental status. The increased social capital is to a significant extent a correlate of on-going mobility.

In sum, there is without doubt a causal linkage between social capital and mobility. But the Chetty et al. analysis overstates its strength, possibly by a large margin. To twist the old saw about correlation and causation, correlation in this case isn’t only causation.

I [Alba] believe that a critique is especially important in this case because the findings in the Chetty et al. paper create an obvious temptation for the formulation of social policy. Indeed, in their second paper in Nature, the authors make suggestions in this direction. But before we commit ourselves to new anti-poverty policies based on these findings, we need a more certain gauge of the potential effectiveness of social capital than the current analysis can give us.

I get what Alba is saying about the critique not strongly challenging the article’s conclusions. He’s not saying that Chetty et al. are wrong; it’s more that he’s saying there are a lot of unanswered questions here—a position I’m sure Chetty et al. would themselves agree with!

A possible way forward?

To step back a moment—and recall that I have not tried to digest the Nature articles or the associated news coverage—I’d say that Alba is criticizing a common paradigm of social science research in which a big claim is made from a study and the study has some clear limitations, so the researchers attack the problem in some different ways in an attempt to triangulate toward a better understanding.

There are two immediate reactions I’d like to avoid. The first is to say that the data aren’t perfect, the study isn’t perfect, so we just have to give up and say we’ve learned nothing. The second, from the other direction, is the unpalatable response that all studies are flawed so we shouldn’t criticize this one in particular.

Fortunately, nobody is suggesting either of these reactions. From one direction, critics such as Lehman and Alba are pointing out concerns but they’re not saying the conclusions of the Chetty et al. study are all wrong or that the study is useless; from the other, news reports do present qualifiers and they’re not implying that these results are a sure thing.

What we’d like here is a middle way—not just a rhetorical middle way (“This research, like all social science, has weaknesses and threats to validity, hence the topic should continue to be studied by others”) but a procedural middle way, a way to address the concerns, in particular to get some estimates of the biases in the conclusions resulting from various problems with the data.

Our default response is to say the data should be analyzed better: do a propensity analysis to address Lehman’s concern about who’s on facebook, and do some sort of multilevel model integrating individual and zipcode-level data to address Alba’s concern about aggregation. And this would all be fine, but it takes a lot of work—and Chetty et al. already did a lot of work, triangulating toward their conclusion from different directions. There’s always more analysis that could be done.

Maybe the problem with the triangulation approach is not the triangulation itself but rather the way it can be set up with a central analysis making a conclusion, and then lots of little studies (“robustness checks,” etc.) designed to support the main conclusion. What if the other studies were set up to estimate biases, with the goal not of building confidence in the big number but rather of getting a better, more realistic estimate?

With this in mind, I’m thinking that a logical next step would be to construct a simulation study to get a sense of the biases arising from the issues raised by Lehman and Alba. We can’t easily gather the data required to know what these biases are, but it does seem like it should be possible to simulate a world in which different sorts of people are more or less likely to be on facebook, and in which there are local patterns of connectedness that are not simply what you’d get by averaging within zipcodes.

I’m not saying this would be easy—the simulation would have to make all sorts of assumptions about how these factors vary, and the variation would need to depend on relevant socioeconomic variables—but right now it seems to me to be a natural next step in the research.
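To give a flavor of what such a simulation could look like, here is a toy version (my own invented setup, not an attempt to mimic the Chetty et al. data) in which individuals are nested within areas and a shared area-level factor drives both connectedness and mobility:

import numpy as np

rng = np.random.default_rng(1)
n_areas, n_per = 500, 200
area = np.repeat(np.arange(n_areas), n_per)

# A shared area-level factor (say, the strength of the local economy).
area_factor = rng.normal(0, 1, n_areas)

connectedness = 0.8 * area_factor[area] + rng.normal(0, 1, area.size)
# Individual mobility: only weakly tied to one's own connectedness,
# but strongly tied to the area factor.
mobility = 0.1 * connectedness + 0.8 * area_factor[area] + rng.normal(0, 1, area.size)

ind_r = np.corrcoef(connectedness, mobility)[0, 1]
area_conn = np.bincount(area, weights=connectedness) / n_per
area_mob = np.bincount(area, weights=mobility) / n_per
eco_r = np.corrcoef(area_conn, area_mob)[0, 1]

print(f"individual-level correlation: {ind_r:.2f}")
print(f"area-level correlation:       {eco_r:.2f}")

None of these numbers mean anything in themselves; the point is just that the area-level (ecological) correlation can come out near 1 while the individual-level relationship is much weaker, which is the Robinson-style gap that a more serious simulation, with selection into Facebook added in, could try to quantify under realistic assumptions.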

One more thing

Above I stressed the importance and challenge of finding a middle ground between (1) saying the study’s flaws make it completely useless and (2) saying the study represents standard practice so we should believe it.

Sometimes, though, response #1 is appropriate. For example, the study of beauty and sex ratio or the study of ovulation and voting or the study claiming that losing an election for governor lops 5 to 10 years off your life—I think those really are useless (except as cautionary tales, lessons of research practices to avoid). How can I say this? Because those studies are just soooo noisy compared to any realistic effect size. There’s just no there there. Researchers can fool themselves because they think that if they have hundreds or thousands of data points they’re cool, and that if they have statistical significance they’ve discovered something. We’ve talked about this attitude before, and I’ll talk about it again; I just wanted to emphasize here that it doesn’t always make sense to take the middle way. Or, to put it another way, sometimes the appropriate middle way is very close to one of the extreme positions.

Bets as forecasts, bets as probability assessment, difficulty of using bets in this way

John Williams writes:

Bets as forecasts come up on your blog from time to time, so I thought you might be interested in this post from RealClimate, which is the place to go for informed commentary on climate science.

The post, by Gavin Schmidt, is entitled, “Don’t climate bet against the house,” and tells the story of various public bets in the past few decades regarding climate outcomes.

The examples are interesting in their own right and also as a reminder that betting is complicated. In theory, betting has close links to uncertainty, and you should be able to go back and forth between them:

1. From one direction, if you think the consensus is wrong, you can bet against it and make money (in expectation). You should be able to transform your probability statements into bets.

2. From the other direction, if bets are out there, you can use these to assess people’s uncertainties, and from there you can make probabilistic predictions.

In real life, though, both the above steps can have problems, for several reasons. First is the vig (in a betting market) or the uncertainty that you’ll be paid off (in an unregulated setting). Second is that you need to find someone to make that bet with you. Third, and relatedly, that “someone” who will bet with you might have extra information you don’t have, indeed even their willingness to bet at given odds provides some information, in a Newtonian action-and-reaction sort of way. Fourth, we hear about some of the bets and we don’t hear about others. Fifth, people can be in it to make a point or for laffs or thrills or whatever, not just for the money, enough so that, when combined with the earlier items on this list, there won’t be enough “smart money” to take up the slack.
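To make the first of these issues, the vig, concrete, here’s a toy calculation with invented odds:

# Toy example of the vig: a bookmaker quotes decimal odds on both sides of a
# binary outcome; the implied "probabilities" sum to more than 1, and the
# excess is the bookmaker's margin. All numbers are invented.
odds_yes, odds_no = 1.91, 1.91

implied_yes = 1 / odds_yes
implied_no = 1 / odds_no
overround = implied_yes + implied_no - 1
print(f"implied probabilities: {implied_yes:.3f} and {implied_no:.3f}; vig = {overround:.1%}")

# Expected profit per $1 staked on "yes" if your own probability is p:
p = 0.55
print(f"expected profit at p = {p}: {p * (odds_yes - 1) - (1 - p):+.3f}")

At these odds you need to believe the event has probability greater than about 0.524 just to break even, so the quoted prices only bound, rather than pin down, the probabilities that bettors actually hold.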

This is not to say that betting is a useless approach to information aggregation; I’m just saying that betting, like other social institutions, works under certain conditions and not in absolute generality.

And this reminds me of another story.

Economist Bryan Caplan reports that his track record on bets is 23 for 23. That’s amazing! How is it possible? Here’s Caplan’s list, which starts in 2007 and continues through 2021, with some of the bets still unresolved.

Caplan’s bets are an interesting mix. The first one is a bet where he offered 1-to-100 odds so it’s no big surprise that he won, but most of them are at even odds. A couple of them he got lucky on (for example, he bet in 2008 that no large country would leave the European Union before January 1, 2020, so he just survived by one month on that one), but, hey, it’s ok to be lucky, and in any case even if he only had won 21 out of 23 bets, that would still be impressive.

It seems to me that Caplan’s trick here is to show good judgment on what pitches to swing at. People come at him with some strong, unrealistic opinions, and he’s been good at crystallizing these into bets. In poker terms, he waits till he has the nuts, or nearly so. 23 out of 23 . . . that’s a great record.

Weak separation in mixture models and implications for principal stratification

Avi Feller, Evan Greif, Nhat Ho, Luke Miratrix, and Natesh Pillai write:

Principal stratification is a widely used framework for addressing post-randomization complications. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of mixture parameters, like the MLE, are known to exhibit pathological behavior. We study this behavior in a simple but fundamental example, a two-component Gaussian mixture model in which only the component means and variances are unknown, and focus on the setting in which the components are weakly separated. . . . We provide diagnostics for all of these pathologies and apply these ideas to re-analyzing two randomized evaluations of job training programs, JOBS II and Job Corps.

The paper’s all about maximum likelihood estimates and I don’t care about that at all, but the general principles are relevant to understanding causal inference with intermediate outcomes and fitting such models in Stan or whatever.
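Here’s a quick illustration of the weak-separation problem they’re studying, using simulated data and off-the-shelf EM software (this is my sketch, not anything from the paper):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
n = 500
z = rng.random(n) < 0.5
# True component means 0 and 0.5, unit variances: weakly separated.
y = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(0.5, 1.0, n)).reshape(-1, 1)

for seed in range(5):
    fit = GaussianMixture(n_components=2, init_params="random", random_state=seed).fit(y)
    means = np.sort(fit.means_.ravel())
    print(f"random start {seed}: means {means.round(2)}, weights {np.sort(fit.weights_).round(2)}")

With this little separation the estimated means and weights tend to bounce around across random starts and can land far from the true values, which is the sort of pathology the paper diagnoses; similar identification problems can show up when fitting such mixtures in Stan if the components are weakly separated.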

Still more on the Heckman Curve!

Carlos Parada writes:

Saw your blog post on the Heckman Curve. I went through Heckman’s response that you linked, and it seems to be logically sound but terribly explained, so I feel like I need to explain why Rea+Burton is great empirical work, but it doesn’t actually measure the Heckman curve.

The Heckman curve just says that, for any particular person, there exists a point where getting more education isn’t worth it anymore because the costs grow as you get older, or equivalently, the benefits get smaller. This is just trivially true. The most obvious example is that nobody should spend 100% of their life studying, since then they wouldn’t get any work done at all. Or, more tellingly, getting a PhD isn’t worth it for most people, because most people either don’t want to work in academia or aren’t smart enough to complete a PhD. (Judging by some of the submissions to PPNAS, I’m starting to suspect most of academia isn’t smart enough to work in academia.)

The work you linked finds that participant age doesn’t predict the success of educational programs. I have no reason to suspect these results are wrong, but the effect of age on benefit:cost ratios for government programs does not measure the Heckman curve.

To give a toy model, imagine everyone goes to school as long as the benefits of schooling are greater than the costs for them, then drops out as soon as they’re equal. So now, for high school dropouts, what is the benefit:cost ratio of an extra year of school? 1 — the costs roughly equal the benefits. For college dropouts, what’s the benefit:cost ratio? 1 — the costs roughly equal the benefits. And so on. By measuring the effects of government interventions on people who completed x years of school before dropping out, the paper is conditioning on a collider. This methodology would only work if when people dropped out of school was independent of the benefits/costs of an extra year of school.

(You don’t have to assume perfect rationality for this to work: If everyone goes to school until the benefit:cost ratio equals 1.1 or 0.9, you still won’t find a Heckman curve. Models that assume rational behavior tend to be robust to biases of this sort, although they can be very vulnerable in some other cases.)

Heckman seems to have made this mistake at some points too, though, so the authors are in good company. The quotes in the paper suggest he thought an individual Heckman curve would translate to a downwards-sloping curve for government programs’ benefits, when there’s no reason to believe they would. I’ve made very similar mistakes myself.

Sincerely,

An econ undergrad who really should be getting back to his Real Analysis homework

Interesting. This relates to the marginal-or-aggregate question that comes up a lot in economics. It’s a common problem that we care about marginal effects but the data more easily allow us to estimate average effects. (For the statisticians in the room, let me remind you that “margin” has opposite meanings in statistics and economics.)

But one problem that Parada doesn’t address with the Heckman curve is that the estimates of efficacy used by Heckman are biased, sometimes by a huge amount, because of selection on statistical significance; see section 2.1 of this article. All the economic theory in the world won’t fix that problem.
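For what it’s worth, Parada’s toy model is easy to simulate. Here’s a sketch with invented numbers: every individual has a downward-sloping benefit curve for an extra year of school and stays in school until the benefit:cost ratio falls to 1, and yet the observed ratio among dropouts sits just below 1 at every schooling level:

import numpy as np

rng = np.random.default_rng(3)
n = 50_000
cost = 1.0
start_benefit = rng.uniform(1.5, 4.0, n)   # benefit of the next year, before any schooling
decay = rng.uniform(0.05, 0.25, n)         # how fast that benefit declines per year

years = np.zeros(n, dtype=int)
benefit = start_benefit.copy()
for _ in range(30):                         # keep studying while the next year is worth it
    stay = benefit > cost
    years += stay
    benefit = np.where(stay, benefit - decay, benefit)

# Benefit:cost ratio of one more year, among people who stopped at each level:
for y in (8, 12, 16, 20):
    sel = years == y
    if sel.any():
        print(f"stopped after {y} years: mean benefit/cost of an extra year = {(benefit[sel] / cost).mean():.2f} (n = {sel.sum()})")

Every individual curve slopes downward by construction, but because people choose their own stopping point, the ratio you observe at each amount of completed schooling is pinned just below 1, which is exactly the collider point Parada is making.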

P.S. In an amusing example of blog overlap, Parada informs us that he also worked on the Minecraft speedrunning analysis. It’s good to see students keeping busy!

Solution to that little problem to test your probability intuitions, and why I think it’s poorly stated

The other day I got this email from Ariel Rubinstein and Michele Piccione asking me to respond to this question which they sent to a bunch of survey respondents:

A very small proportion of the newborns in a certain country have a specific genetic trait.
Two screening tests, A and B, have been introduced for all newborns to identify this trait.
However, the tests are not precise.
A study has found that:
70% of the newborns who are found to be positive according to test A have the genetic trait (and conversely 30% do not).
20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).
The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Suppose that a newborn is found to be positive according to both tests.
What is your estimate of the likelihood (in %) that this newborn has the genetic trait?

Here was my response:

OK, let p = Pr(trait) in population, let a1 = Pr(positive test on A | trait), a2 = Pr(positive test on A | no trait), b1 = Pr(positive test on B | trait), b2 = Pr(positive test on B | no trait).
Your first statement is Pr(trait | positive on test A) = 0.7. That is, p*a1/(p*a1 + (1-p)*a2) = 0.7
Your second statement is Pr(trait | positive on test B) = 0.2. That is, p*b1/(p*b1 + (1-p)*b2) = 0.2

What you want is Pr(trait | positive on both tests) = p*a1*b1 / (p*a1*b1 + (1-p)*a2*b2)

It looks at first like there’s no unique solution to this one, as it’s a problem with 5 unknowns and just 2 data points!

But we can do that “likelihood ratio” trick . . .
Your first statement is equivalent to 1 / (1 + ((1-p)/p) * (a2/a1)) = 0.7; therefore (p/(1-p)) * (a1/a2) = 0.7 / 0.3
And your second statement is equivalent to (p/(1-p)) * (b1/b2) = 0.2 / 0.8
Finally, what you want is 1 / (1 + ((1-p)/p) * (a2/a1) * (b2/b1)). OK, this can be written as X / (1 + X), where X is (p/(1-p)) * (a1/a2) * (b1/b2).
Given the information above, X = (0.7 / 0.3) * (0.2 / 0.8) * (1-p)/p

Still not enough information, I think! We don’t know p.

OK, you give one more piece of information, that p is “very small.” I’ll suppose p = 0.001.

Then X = (0.7 / 0.3) * (0.2 / 0.8) * 999, which comes to 580, so the probability of having the trait given positive on both tests is 580 / 581 = 0.998.

OK, now let me check my math. According to the above calculations,
(1/999) * (a1/a2) = 0.7/0.3, thus a1/a2 = 2300, and
(1/999) * (b1/b2) = 0.2/0.8, thus b1/b2 = 250.
And then (p/(1-p))*(a1/a2)*(b1/b2) = (1/999)*2300*250 = 580.

So, yeah, I guess that checks out, unless I did something really stupid. The point is that if the trait is very rare, then the tests have to be very precise to give such good predictive power.

But . . . you also said “the tests are not precise.” This seems to contradict your earlier statement that only “a very small proportion” have the trait. So I feel like your puzzle has an embedded contradiction!

I’m just giving you my solution straight, no editing, so you can see how I thought it through.
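(Here’s a quick numeric check of that email, assuming p = 0.001 for the “very small proportion” and picking arbitrary values for the false-positive rates a2 and b2, which, given the two stated conditions, shouldn’t matter:)

p = 0.001

def posterior_both(a2, b2):
    a1 = (0.7 / 0.3) * ((1 - p) / p) * a2   # forces Pr(trait | A positive) = 0.7
    b1 = (0.2 / 0.8) * ((1 - p) / p) * b2   # forces Pr(trait | B positive) = 0.2
    num = p * a1 * b1
    return num / (num + (1 - p) * a2 * b2)

for a2, b2 in [(0.0002, 0.003), (0.0004, 0.001), (0.0001, 0.004)]:
    print(f"a2 = {a2}, b2 = {b2}: Pr(trait | both positive) = {posterior_both(a2, b2):.4f}")

Each choice gives the same answer, about 0.998, matching the likelihood-ratio calculation above.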

Rubinstein and Piccione confirmed that my solution, that the probability is very close to 1, is correct, and they pointed me to this research article where they share the answers that were given to this question when they posed it to a bunch of survey respondents.

I found the Rubinstein and Piccione article a bit frustrating because . . . they never just give the damn responses! The paper is very much in the “economics” style rather than the “statistics” style in that they’re very focused on the theory, whereas statisticians would start with the data. I’m not saying the economics perspective is wrong here—the experiment was motivated by theory, so it makes sense to compare results to theoretical predictions—I just found it difficult to read because there was never a simple plot of all the data.

My problem with their problem

But my main beef with their example is that I think it’s a trick question. On one hand, it says only “very small proportion” in the population have the trait; indeed, I needed that information to solve the problem. On the other hand, it says “the tests are not precise”—but I don’t think that’s right, at least not in the usual way we think about the precision of a test. With this problem description, they’re kinda giving people an Escher box and then asking what side is up!

To put it another way, if you start with “a very small proportion,” and then you take one test and it gets your probability all the way up to 70%, then, yeah, that’s a precise test! It takes a precise test to give you that much information, to take you from 0.001 to 0.7.

So here’s how I think the problem is misleading: The test is described as “not precise,” and then you see the numbers 0.7 and 0.2, so it’s natural to think that these tests do not provide much information. Actually, though, if you accept the other part of the problem (that only “a very small proportion” have the trait), the tests provide a lot of information. It seems strange to me to call a test that offers a likelihood ratio of 2300 “not precise.”

To put it another way: I think of the precision of a test as a function of the test’s properties alone, not of the base rate. If you have a precise test and then apply it to a population with a very low base rate, you can end up with a posterior probability of close to 50/50. That posterior probability depends on the test’s precision and also on the base rate.

I guess they could try out this problem on a new set of respondents, where instead of describing the tests as “not precise,” they describe them as “very precise,” and see what happens.

One more thing

On page 11 of their article, Rubinstein and Piccione give an example where different referees have independent data in their private signals, when trying to determine if a defendant is guilty of a crime. This does not seem plausible in the context of deciding whether a defendant is guilty. I think it would make more sense to say that they have overlapping information. This does not change the math of the problem—you can think of their overlapping information, along with the base rate, as being a shared “prior,” with the non-overlapping information corresponding to the two data points in the earlier formulation—but it would make the story more realistic.

I understand that this model is just based on the literature. I just have political problems with oversimplified models of politics, juries, etc. I’d recommend that the authors either use a different “cover story” or else emphasize that this is just a mathematical story not applicable to real juries. In their paper, they talk about “the assumption that people are Bayesian,” but I’m bothered by the assumption that different referees have independent data in their private signals. That’s a really strong assumption! It’s funny which assumptions people will question and which assumptions they will just accept as representing neutral statements of a problem.

A connection to statistical inference and computing

This problem connects to some of our recent work on the computational challenges of combining posterior distributions. The quick idea is that if theta is your unknown parameter (in this case, the presence or absence of the trait) and you want to combine posteriors p_k(theta|y_k) from independent data sources y_k, k=1,…,K, then you can multiply these posteriors, but you need to divide by the factor p(theta)^(K-1). Dividing by the prior to a power in this way will in general induce computational instability. Here is a short paper on the problem and here is a long paper. We’re still working on this.
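In the discrete setting of the trait example above, the combination rule is simple enough to write out directly; here’s a minimal sketch (my own toy code, nothing from those papers):

import numpy as np

# Combining K = 2 independent-source posteriors for a discrete parameter:
# multiply the posteriors, divide by the prior to the power K - 1, renormalize.
prior = np.array([0.999, 0.001])      # Pr(no trait), Pr(trait)
post_A = np.array([0.3, 0.7])         # posterior after test A alone
post_B = np.array([0.8, 0.2])         # posterior after test B alone

combined = post_A * post_B / prior ** (2 - 1)
combined /= combined.sum()
print(f"Pr(trait | both tests positive) = {combined[1]:.4f}")

With only two values of theta there’s no numerical trouble (this reproduces the 0.998 from before), but with a continuous parameter the division by p(theta)^(K-1) blows up wherever the prior is thin, which is the instability mentioned above.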

This journal is commissioning a sequel to one of my smash hits. How much will they pay me for it? You can share negotiation strategies in the comments section.

I know it was a mistake to respond to this spam but I couldn’t resist . . . For the rest of my days, I will pay the price of being on the sucker list.

The following came in the junk mail the other day:

Dear Dr. Andrew Gelman,

My name is **, the editorial assistant of **. ** is a peer-reviewed, open access journal published by **.

I have had an opportunity to read your paper, “Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs”, and can find that your expertise fits within the scope of our journal quite well.
Therefore, you are cordially invited to submit new, unpublished manuscripts to **. If you do not have any at the moment, it is appreciated if you could keep our journal in mind for your future research outputs.

You may see the journal’s profile at ** and submit online. You may also e-mail submissions to **.

We are recruiting reviewers for the journal. If you are interested in becoming a reviewer, we welcome you to join us. Please find the application form and details at ** and e-mail the completed application form to **.

** is included in:
· CrossRef; EBSCOhost; EconPapers
· Gale’s Academic Databases
· GetInfo; Google Scholar; IDEAS
· J-Gate; Journal Directory
· JournalTOCs; LOCKSS
· MediaFinder®-Standard Periodical Directory
· RePEc; Sherpa/Romeo
· Standard Periodical Directory
· Ulrich’s; WorldCat
Areas include but are not limited to:
· Accounting;
· Economics
· Finance & Investment;
· General Management;
· Management Information Systems;
· Business Law;
· Global Business;
· Marketing Theory and Applications;
· General Business Research;
· Business & Economics Education;
· Production/Operations Management;
· Organizational Behavior & Theory;
· Strategic Management Policy;
· Labor Relations & Human Resource Management;
· Technology & Innovation;
· Public Responsibility and Ethics;
· Public Administration and Small Business Entrepreneurship.

Please feel free to share this information with your colleagues and associates.

Thank you.

Best Regards,

**
Editorial Assistant
**
——————————————-
**
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

Usually I just delete these things, but just the other day we had this discussion of some dude who was paid $100,000 to be the second author on a paper. Which made me wonder how much I could make as a sole author!

And this reminded me of this other guy who claimed that scientific citations are worth $100,000 each. A hundred grand seems like the basic unit of currency here.

So I sent a quick response:

Hi–how much will you pay me to write an article for your journal?
AG

I’m not expecting $100,000 as their first offer—they’ll probably lowball me at first—but, hey, I can negotiate. They say the most important asset in negotiation is the willingness to say No, and I’m definitely willing to say No to these people!

Just a few hours later I received a reply! Here it is:

Dear Dr. Andrew Gelman,

Thanks for your email. We charge the Article Processing Charge (Formatting and Hosting) of 100USD for per article.

Welcome to submit your manuscript to our journal. If you have any questions, please feel free to contact me.

Best Regards,

**
Editorial Assistant
**
——————————————-
**
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

I don’t get it. They’re offering me negative $100? That makes no sense? What next, they’ll offer to take my (fully functional) fridge off my hands for a mere hundred bucks?? In what world am I supposed to pay them for the fruits of my labor?

So I responded:

No, I would only provide an article for you if you pay me. It would not make sense for me to pay you for my work.

No answer yet. If they do respond at some point, I’ll let you know. We’ll see what happens. If they offer me $100, I can come back with a counter-offer of $100,000, justifying it by the two links above. Then maybe they’ll say they can’t afford it, they’ll offer, say, $1000 . . . maybe we can converge around $10K. I’m not going to share the lowest value I’d accept—that’s something the negotiation books tell you never ever to do—but I’ll tell you right now, it’s a hell of a lot more than a hundred bucks.

P.S. That paper on higher-order polynomials that they scraped, I mean carefully vetted, for suitability for their journal . . . according to Google Scholar it has 1501 citations, which implies a value of $150,100,000, according to the calculations referred to above. Now, sure, most of that value is probably due to Guido, my collaborator on that paper, but still . . . 150 million bucks! How hard could it be to squeeze out a few hundred thousand dollars for a sequel? It says online that Knives Out grossed $311.4 million, and Netflix paid $469 million for the rights for Knives Out 2 and 3. If this academic publisher doesn’t offer me a two-paper deal that’s at least in the mid four figures, my agent and I will be taking our talents to Netflix.

What can $100,000 get you nowadays? (Discussion of the market for papers in economics.)

Someone who would prefer to remain anonymous writes:

Before anything else and like many of the people who write to you, as an early career academic in the area of economics, I feel constrained in the criticisms that I can make publicly. So I have to kindly request that, if you do publish any of what follows, that identifying information about myself be not made public.

My correspondent continues:

The recent disclosed leaks at Uber revealed some, well, distressing behaviour by some of my peers. As the Guardian recently wrote (https://www.theguardian.com/news/2022/jul/12/uber-paid-academics-six-figure-sums-for-research-to-feed-to-the-media), several noted economists collaborated with Uber when writing some academic papers. In short, Uber shared data with a selected group of economists, paid them and had Uber economists collaborating with the authors.

The act of collaboration is not, in itself, necessarily a bad thing, as the potential access to proprietary data allows research to be done that otherwise would not be possible. But the way things were done raises several issues, many of which you have commented on before. I’d like to focus on one in particular: how do we deal with studies done with closed data shared by interested parties?

In the leaked emails, Uber staffers wrote that, concerning one economist who wanted to do a separate unpaid study using Uber’s shared data: “We see low risk here because we can work with Landier on framing the study and we also decide what data we share with him.” I.e., the issue here isn’t just one of replication, already a serious concern, but the risk that a company may omit data so as to influence academics doing a study, so as to frame things in the best light possible. It is distressing to read that executives wanted to work on the report’s message “to ensure it’s not presented in a potentially negative light.”

Perhaps I’m being a bit too naive about all this, in that the obvious question when seeing a study like this is to ask why would Uber be ready to collaborate unless they are going to get what they wanted? Indeed, I recall being a little bit sceptical about Hall and Krueger’s initial NBER paper when I read it. But the excerpts produced by The Guardian are so much worse than I’d have feared; even if we don’t know the exact extent that Uber acted as these excerpts describe, it’s hard not to fear the worst. Where does that leave us with these studies? Should we dismiss them altogether or can we salvage some of their analyses?

There is one small bright spot in all of this. Because I am not a labour economist, I only ever read Hall and Krueger’s initial NBER paper, so I missed Berg and Johnston’s later critique (https://journals.sagepub.com/doi/full/10.1177/0019793918798593?journalCode=ilra), where the issue of inadequate data was already being raised, among other, even stronger criticisms. So even before the emails, there were some people tackling these issues head on. And, to the credit of ILR Review, the journal that published Hall and Krueger’s paper, the critique was published by the journal itself, unlike what so often happens.

Anyway, that’s all. I guess I’m shocked at how much worse things seem to be, at how willing Uber was to manipulate and try to use well-regarded academics’ reputations to, it seems, launder their own reputation…

Some googling revealed this exchange between financial journalist Felix Salmon and economists Michael Strain and Justin Wolfers:

This followup wins the thread:

Lots to chew on here, so let me go through some issues one at a time:

1. Conflicts of interest

It’s easy to get all “moralistic” about this. It’s also easy to get all “It’s easy to get all ‘moralistic’ about this” about this.

So let’s be clear: the conflict of interest here is real; indeed, it’s hard to get much more “conflict of interest” than “Company pays you $100,000 to write a paper, then you write a paper favorable to that company.” At the same time, there’s nothing necessarily morally wrong about having a conflict of interest. It is what it is. Every year I fill out my conflict of interest form with Columbia. “Conflict of interest” is a description, not a pejorative.

With regard to the Hall and Krueger paper, the dispute was not whether there was a conflict of interest but rather (a) whether the conflict was sufficiently disclosed, and (b) how this conflict should affect the trust that policymakers should place in its conclusions.

I don’t have a strong feeling about the disclosure issue—Salmon holds that the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written,” is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.” I dunno. I’ve written lots of acknowledgments and I don’t recall ever mentioning the dollar value. It seems to me pretty clear that if you have one author who worked at the company and another author who was paid by the company, the conflict is there, whether it’s $1000 or $100,000.

Regarding trust: yeah, with this level of conflict, you’d want to see some data analyses by an outside team, like with that Google chip-layout paper.

News reports should be more clear on this. A headline, “Ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms,” should be replaced by something like “Uber-paid study reports that ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms.” Just add “Uber-paid study” to the beginning of every headline!

2. Morality

I don’t get this reaction:

I mean, sure, I don’t like Uber either. Some people think a company like Uber is cool, some people think it’s creepy. Views differ. But “sell their souls . . . destroy their lives . . . especially distressing”? That seems a bit strong. Consider: back in 2015, economists absolutely loved Uber, which is no surprise given that economists loved to talk about the problems with the market for taxicabs, the famous medallion system, etc. Economists hated taxi regulation about as much as they hated rent control and other blatant restrictions on free enterprise. Economists on the center-left, economists on the center-right, they all hated those regulations, so it’s no surprise that they loved Uber. The company was an economist’s dream, along with being super convenient for users.

3. Interesting data

I get Wolfers’s point that Krueger would find the Uber data interesting. I would have too! Indeed, had Uber offered me $100,000, or even $50,000, I probably would’ve worked for them too. I can’t be sure—they never approached me, and it’s possible that I would’ve said no—but if I had said no, it wouldn’t have been because their data were not sufficiently interesting. The point Wolfers seems to be missing here is that God is in every leaf of every tree. Yes, Uber data are interesting, but lots of other data are interesting too. Ford’s data are interesting, GM’s data are interesting, Bureau of Labor Statistics data are interesting. Lots of interesting data out there, and often people will choose what to look at based on who is paying them. I think one missing piece in the public discussions of this case is how much economists looooved Uber back then: it was a symbol of all that was good about the free market! So they found these data to be particularly interesting.

4. What can you get for $100,000?

A funny thing about the discussion is how little an amount of money $100,000 seems to be to commenters, and that includes people on both sides of the issue! Wolfers thinks that $100,000 is so small that it is “extremely unlikely” that Krueger would write a paper for that paltry sum. From the other direction, Dubal thinks it’s “pathetic” that he would “violate basic rules of research for just 100,000.”

I only met Krueger once, so I can’t speak to his motivations, but I will say that being the second author on a paper can sometimes be pretty easy, and $100,000 is real money! For example, suppose Krueger’s consulting rate was $2000/hour. He should be able to do the work required to be second author on a paper in less than 50 hours. The disclosure in that article says he was working as a consultant during the 2-month period when the initial draft of the paper was written. Spending 50 hours on a project during a 2-month period, that’s plausible. So I can’t really see why Wolfers thinks this is “extremely unlikely.”

There is an alternative story, though, consistent with what Wolfers hypothesizes, which is that Krueger would’ve coauthored the paper anyway but took the $100,000 because it was offered to him, and who wouldn’t take free money? I’m willing to believe that story, actually. This also works as a motivation for Uber: they offered free money to Krueger for something he would’ve done anyway, just to give him an excuse to clear his schedule to do the work. So, he didn’t coauthor a paper for $100,000; he coauthored a paper for free and then accepted the $ to motivate himself to do it. Meanwhile, from Uber’s perspective, the money is well spent, in the same way that the National Science Foundation is motivated to pay me to free up my time to do research that they think will be valuable to society.

Regarding Dubal’s comment: I don’t see what “basic rules of research” were being violated. Not sharing your data? If working with private data is violating a basic rule of research, fine, but then scientists are doing that for free every day. If you set your time machine back to 2015, and you consider Krueger as an economist who thinks that Uber is cool, then getting paid by them and working with their data, that’s even cooler, right? I imagine that for an economist in 2015, working with Uber is as cool as working with a pro sports team would be for a statistician. Getting paid makes it even cooler: then you’re an insider! Sure, Krueger was a big-name academic, he’d served at the highest levels of government, and according to Wolfers he was doing well enough that $100,000 meant nothing to him. Still, as an economist, I’ll bet he thought that working with Uber would be something special. Again, Uber in 2015 had a different aura than Uber today.

5. “Laundering” and the role of academia and the news media in a world that’s mostly run by business and government

What exactly was it about the Hall and Krueger paper that bothered people so much? I don’t think it was simply that these guys were working for Uber or that, given that Uber was paying them, they’d write a report with a pro-Uber spin. I think what is really bugging the critics is the sense that academia—in this case, the NBER (National Bureau of Economic Research), an influential club or organization of academic economists—was being used to launder this money.

If Hall and Krueger were to publish a book, The Case for Uber, published by Uber Press, arguing that Uber is great, then it’s hard to see the problem. These guys chose to take the job and they did it. But when published as an NBER preprint, and one of the authors is a respected academic, it seems different—even with the disclosure statement.

Again, it’s a problem with the news media too, to the extent that they reported this study in the way they’d report straight-up academic research, without the “Uber-funded study claims . . .” preamble to every headline and sentence describing the published findings.

This all kinda makes me think of another well-known economist, John Kenneth Galbraith, who wrote about “countervailing power.” Galbraith was talking about economic and political power, but something similar arises in the world of ideas. Government and business are the 800-pound gorillas, and we often like to think of academia and advocacy organizations as representing countervailing power. When industry or government inserts propaganda into academic channels, this is bothersome in the same way that “astroturf” lobbying seems wrong. It’s bad for its own sake—fooling people and potentially affecting policy through disinformation—and also bad in that it degrades the credibility of un-conflicted scientific research or genuine grassroots organizing.

In saying this, I recognize that there’s no pure stance in science just as there’s no such thing as pure grassroots politics: Scientists have prior beliefs and political attitudes, and they also have to be paid from somewhere, and the same goes for grassroots organizers. Those mimeograph machines don’t run themselves. So there’s a continuous range here. But getting $100,000 for two months of work to coauthor a paper, that’s near the extreme end of the range.

What I’m getting at is that, while there is indignation aimed at Krueger here, I think what’s really going on is indignation at perceived manipulation of the system. One way to see this is that nobody seems particularly angry at the Uber executives or even at Hall, the first author of that paper. If it’s bad science, you should be mad at the people who promoted it and the person who did the work, no? I think there’s this attitude that the full-time Uber employees were just doing their jobs, whereas Krueger, who was just a consultant, was supposed to have had a higher loyalty to academia.

6. Politics

There’s one other thing I wanted to get to, which was Wolfers’s attitude that Krueger needed to be defended. (Again, nobody seemed to feel the need to defend Hall for his role in the project.)

One part of the defense was the silly claim that he wouldn’t have done it for the money, but I think underlying it were two implicit defenses:

First, conflict of interest sounds like a bad thing, Krueger was a good person, and therefore he couldn’t’ve had a conflict of interest. I don’t think this argument makes sense—I see conflict of interest not as an aspect of character but as an aspect of the situation. When I write about Columbia University or any organization that is paying me or my family, I have a conflict of interest. I can still try to be objective, but even if I have pure objectivity, the conflict of interest is still there. It’s inherent in the interaction, not in me.

Second, Krueger was a political liberal, so he couldn’t have issued a report unduly favorable to Uber, because liberals are skeptical of corporations. I don’t think this argument works either, first because, as noted above, back in 2015 economists of a wide range of political stripes considered Uber to be awesome, and second because Krueger, while political, was not known as a hack. It’s easy to imagine how it goes: he works with Uber, they tell him good things about the company, he coauthors a positive report.

I’ve always wondered whether something similar is going on when the economist James Heckman hypes early childhood intervention programs. Heckman is a political conservative, and one would expect him to be skeptical of utopian social spending programs. So when he and his collaborators found (or, to be precise, thought they found) strong evidence of huge effects of these programs, it was natural for him to think that his new stance was correct—after all, he came to it despite his political convictions.

But it doesn’t work that way. Yes, you can be biased to come out with a result that confirms your preconceptions. But when you come out with a result that rocks your world, that could be a mistake too.

7. Who to credit or blame

I agree with my correspondent, who focused the blame (or, depending on your perspective, the credit) for this episode on Uber management. The online discussion seemed to be all about the economist who consulted for Uber and was the second author of the paper, but really it seems that we should think of Uber, the organization, as the leader of this endeavor.

Full disclosure: I’ve been paid real money by lots of organizations that have done bad things, including pharmaceutical companies, tech companies, and the U.S. Department of Defense.

P.S. Interesting comment here from economist Peter Dorman.

P.P.S. More here.

“Predicting consumers’ choices in the age of the internet, AI, and almost perfect tracking: Some things change, the key challenges do not”

David Gal writes:

I wanted to share the attached paper on choice prediction that I recently co-authored with Itamar Simonson in case it’s of interest.

I think it’s somewhat related to your work on the Piranha Problem, in that it seems, in most cases, most of the explainable variation in people’s choices is accounted for by a few strong, stable tendencies (and these are often captured by variables that are relatively easy to identify).

I also wrote a brief commentary based on the article in Fortune.

And here’s the abstract to the paper:

Recent technology advances (e.g., tracking and “AI”) have led to claims and concerns regarding the ability of marketers to anticipate and predict consumer preferences with great accuracy. Here, we consider the predictive capabilities of both traditional techniques (e.g., conjoint analysis) and more recent tools (e.g., advanced machine learning methods) for predicting consumer choices. Our main conclusion is that for most of the more interesting consumer decisions, those that are “new” and non-habitual, prediction remains hard. In fact, in many cases, prediction has become harder due to the increasing influence of just-in-time information (user reviews, online recommendations, new options, etc.) at the point of decision that can neither be measured nor anticipated ex ante. Sophisticated methods and “big data” can in certain contexts improve predictions, but usually only slightly, and prediction remains very imprecise—so much so that it is often a waste of effort. We suggest marketers focus less on trying to predict consumer choices with great accuracy and more on how the information environment affects the choice of their products. We also discuss implications for consumers and policymakers.

Sophisticated statistics is often a waste of effort . . . Oh no, that’s not a message that I want spread around. So please, everyone, keep quiet about this paper. Thanks!
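Joking aside, here’s a minimal simulation of the mechanism the abstract is describing (my toy example, not anything from Gal and Simonson’s paper): if a couple of stable tendencies carry most of the learnable signal and the rest of the variation comes from point-of-decision influences you can’t measure in advance, then a flexible model with many extra features barely improves on a simple model that uses only the stable ones. All of the feature names and numbers below are made up.

```python
# Toy simulation of the "few strong, stable tendencies" idea: two stable
# predictors carry most of the learnable signal, thirty weak attributes add a
# little, and unmeasured point-of-decision influences add a lot of noise.
# Everything here (features, coefficients, models) is invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
stable = rng.normal(size=(n, 2))            # e.g., price sensitivity, brand habit
weak = rng.normal(size=(n, 30))             # many weak product attributes
utility = (1.5 * stable[:, 0] + 1.0 * stable[:, 1]
           + 0.05 * weak.sum(axis=1)
           + rng.logistic(scale=1.5, size=n))   # unmeasured just-in-time influences
y = (utility > 0).astype(int)                   # 1 = buys, 0 = doesn't

X = np.hstack([stable, weak])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

simple = LogisticRegression(max_iter=1000).fit(X_tr[:, :2], y_tr)   # two stable features only
fancy = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # all 32 features

print("simple model accuracy:", round(accuracy_score(y_te, simple.predict(X_te[:, :2])), 3))
print("fancy model accuracy: ", round(accuracy_score(y_te, fancy.predict(X_te)), 3))
```

The comparison comes out close by construction here; the point is only that if the world looks anything like this simulation, piling on features and model complexity buys you very little.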

Nimby/Yimby thread

Mark Palko and Joseph Delaney share their Nimby/Yimby thread:

Monday, September 13, 2021: Yes, YIMBYs can be worse than NIMBYs — the opening round of the West Coast Stat Views cage match

Thursday, September 16, 2021: Yes, YIMBYs can be worse than NIMBYs Part II — Peeing in the River

Friday, September 17, 2021: The cage match goes wild [JAC]

Monday, September 20, 2021: Krugman then told how the ring of mountains almost kept the Challenger Expedition from finding the lost city of Los Angeles

Tuesday, September 21, 2021: Cage match continues on development [JAC]

Thursday, September 23, 2021: Yes, YIMBYs can be worse than NIMBYs Part III — When an overly appealing narrative hooks up with fatally misaligned market forces, the results are always ugly.

Monday, September 27, 2021: Did the NIMBYs of San Francisco and Santa Monica improve the California housing crisis?

Tuesday, September 28, 2021: A primer for New Yorkers who want to explain California housing to Californians

Friday, October 1, 2021: A couple of curious things about Fresno

Thursday, October 7, 2021: Does building where the prices are highest always reduce average commute times?

Friday, October 8, 2021: Housing costs [JAC]

Wednesday, October 13, 2021: Urbanism [JAC]

Monday, October 18, 2021: Either this is interesting or I’m doing something wrong

And a study we’ll want to come back to:

A spatiotemporal analysis of transit accessibility to low-wage jobs in Miami-Dade County

Also:

Tuesday, December 21, 2021: The NYT weighs in again on California housing and it goes even worse than expected

I’m no expert on this topic. I have Yimby sympathies—a few years ago I recall seeing some leaflets opposing the building of a new tower in the neighborhood, and I think I wrote a letter to our city councilmember saying they shouldn’t be swayed by the obstructionists—but I’m open to some of the arguments listed above. Palko and Delaney are pushing against conventional narratives that are often unthinkingly presented in the news media.

Don’t go back to Rockville: Possible scientific fraud in the traffic model for a highway project?

Ben Ross of the Maryland Transit Opportunities Coalition writes:

You may be interested in the attached letter I sent to the U.S. Dept. of Transportation yesterday, presenting evidence that suggests scientific fraud in the traffic model being used by the Maryland Dept. of Transportation to justify a major highway project in Maryland. We request that USDOT make an independent examination of the model and that it release the input and output data files to expert outside reviewers. (A request for the data files was already made, and the requesters were told that the request was being handled under the state’s FOIA-equivalent law and no response could be expected until after the project gets its approval.)

Ross also points to this news article by Bruce DePuyt that gives some background on the story. It seems that the state wants to add some lanes to the Beltway.

I’ve not read the documents in any sort of detail so I won’t comment on the claim of fraud except to make a meta-point. Without making any comment whatsoever about this particular report but just speaking in general, I think that projections, cost-benefit analyses, etc. are often beyond truth or fraud, in that an organization will want to make a decision and then they’ll come up with an analysis to support that goal, kind of in the same way that a turn-the-crank style scientist will say, “We did a study to prove . . .” So, sure, the analysis might be completely bogus with made-up numbers, but it won’t feel like “fraud” to the people who wrote the report, because the numbers aren’t derived from anything beyond the goal of producing the desired result. Just like all those projects that end up costing 5x what was stated in the original budget plan: those budgets were never serious, they were just lowball estimates created with the goal of getting the project moving.

In any case, it seems good that people such as Ross are looking at these reports and pointing out potential problems that can then be assessed by third parties. After all, you don’t want to waste another year.

Forking paths in the estimate of the value premium?

Mathias Hasler writes:

I have a working paper about the value premium and about seemingly innocuous decisions that are made in the research process. I wanted to share this working paper with you because I think you may find it interesting and because your statistics blog encouraged me to keep working on it.

In the paper, I study whether seemingly innocuous decisions in the construction of the original value premium estimate (Fama and French, 1993) affect our inference on the underlying value premium. The results suggest that a large part of the original value premium estimate is the result of chance in these seemingly innocuous research decisions.

Here’s the abstract of the paper:

The construction of the original HML portfolio (Fama and French, 1993) includes six seemingly innocuous decisions that could easily have been replaced with alternatives that are just as reasonable. I propose such alternatives and construct HML portfolios. In sample, the average estimate of the value premium is dramatically smaller than the original estimate of the value premium. The difference is 0.09% per month and statistically significant. Out of sample, however, this difference is statistically indistinguishable from zero. The results suggest that the original value premium estimate is upward biased because of a chance result in the original research decisions.

I’m sympathetic to this claim for the usual reasons, but I know nothing about this topic of the value premium, nor have I tried to evaluate this particular paper, so you can make of it what you will. I’d be happier if it had a scatterplot somewhere.
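For readers who want a feel for what a forking-paths analysis of this kind looks like, here’s a minimal sketch. It’s mine, not Hasler’s code: the decision dimensions, weighting schemes, and synthetic data below are made up for illustration and are not the six decisions, the HML construction, or the data in the paper. The idea is just to enumerate a few alternative construction choices, build the long-short portfolio under each combination, and look at the spread of the resulting premium estimates.

```python
# Toy forking-paths sketch: enumerate alternative portfolio-construction choices
# and see how the estimated long-short premium moves across specifications.
# Decision dimensions, weights, and data are all hypothetical.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: a fixed "value" signal per stock and monthly returns.
n_stocks, n_months = 500, 360
signal = rng.normal(size=n_stocks)                        # stand-in for book-to-market
returns = 0.002 * signal + rng.normal(scale=0.08, size=(n_months, n_stocks))
ranks = signal.argsort().argsort() + 1                    # 1 = lowest signal, 500 = highest

# Hypothetical construction decisions (three dimensions, 3 x 2 x 2 = 12 variants).
breakpoints = [(0.3, 0.7), (0.2, 0.8), (0.5, 0.5)]        # cutoffs for the short and long legs
weightings = ["equal", "rank"]                            # how stocks are weighted within a leg
samples = ["full", "drop_first_60_months"]                # which months enter the average

def leg_weights(mask, weighting):
    w = ranks[mask].astype(float) if weighting == "rank" else np.ones(mask.sum())
    return w / w.sum()

def premium_estimate(bp, weighting, sample):
    lo_cut, hi_cut = np.quantile(signal, bp)
    long_leg, short_leg = signal >= hi_cut, signal <= lo_cut
    rets = returns[60:] if sample == "drop_first_60_months" else returns
    hml = (rets[:, long_leg] @ leg_weights(long_leg, weighting)
           - rets[:, short_leg] @ leg_weights(short_leg, weighting))
    return hml.mean()

estimates = [premium_estimate(*combo)
             for combo in itertools.product(breakpoints, weightings, samples)]
print(f"{len(estimates)} specifications, monthly premium estimates:")
print(f"  min {min(estimates):.4f}, mean {np.mean(estimates):.4f}, max {max(estimates):.4f}")
```

The toy version only shows how quickly the number of specifications multiplies and how the estimate drifts with choices that sound innocuous; the interesting comparisons in the actual paper are the original specification versus the average over the alternatives, in sample and out of sample.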

Gaurav Sood’s review of the book Noise by Kahneman et al.: In writing about noise in human judgment, the authors didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here it is. This should interest some of you. Gaurav makes a convincing case that:

1. The main topic of the book—capriciousness in human judgment—is important, it’s worth a book, and the authors (Kahneman, Sibony, and Sunstein) have an interesting take on it.

2. Their recommendations are based on a selective and uncritical review of an often weak literature, for example this description of a study which seems about the closest thing possible to a Brian Wansink paper without actually being by Brian Wansink:

“When calories are on the left, consumers receive that information first and evidently think ‘a lot of calories!’ or ‘not so many calories!’ before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think ‘delicious!’ or ‘not so great!’ before they see the calorie label. Here again, their initial reaction greatly affects their choices. This hypothesis is supported by the authors’ finding that for Hebrew speakers, who read right to left, the calorie label has a significantly larger impact . . .”

Kinda stunning that they could write this with a straight face, given all we’ve heard about the Cornell Food and Brand Lab, etc.

In writing about noise in human judgment, Kahneman, Sibony, and Sunstein didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here are my comments on the book and its promotional material from last year. I was pretty frustrated with the authors’ apparent unfamiliarity with the literature on variation and noise in statistics and economics associated with very famous figures such as W. E. Deming and Fischer Black. In his review, Gaurav persuaded me that the authors of Noise were on to something interesting, which makes me even sadder that they plowed ahead without more reflection and care. Maybe in the future someone can follow up with an article or book on the topic with the virtues of point 1 above and without the defects of point 2.

Actually, maybe Gaurav can do this! A book’s a lot, but an article fleshing out point 1 in a positive way, without getting snowed by noisy evidence or bragging about “discovering a new continent,” actually linking the theme of noise in human judgment to the challenges of interpreting research results . . . this could be really useful. So I’m glad he took the trouble to read the book and write his review.