Some thoughts on academic research and advocacy

An economist who would prefer to remain anonymous writes:

There is an important question of the degree of legitimacy we grant to academic research as advocacy.

It is widely accepted, and I think also true, that the adversary system we have in courts, where one side is seeking to find and present every possible fact and claim that tends to mitigate the guilt of the accused, and the other side strives to find and present every possible fact and claim that tends to confirm the guilt of the accused, and a learned and impartial judge accountable to the public decides, is a good system that does a creditable job of reaching truth. [Tell that to the family of Nicole Brown Simpson. — ed.] (Examining magistrates probably do a better job, but jurisdictions with examining magistrates also have defense attorneys and DAs; they just allow the judge also to participate in the research project.) It is also widely accepted, and I think also true, that it is important that high standards of integrity should be imposed on lawyers to prevent them from presenting false and misleading cases. (Especially DAs.) There is no question of “forking paths” here: each side is explicitly tasked with looking for evidence for one side.

I don’t think that this is a bad model for academic policy research. Martin Feldstein, like most prominent academic economists from both right and left, was an advocate and did not make a secret of his political views or of their sources. He was also a solid researcher and was good at using data and techniques to reach results that confirmed (and occasionally conditioned) his views. The same is true of someone like Piketty from the social-democratic left, or Sam Bowles from a more Marxist perspective, or the farther-right Sam Peltzman from Chicago.

All these individuals were transparent and responsible advocates for a particular policy regime. They all carried out careful and fascinating research and all were able to learn from each other. This is the model that I see as the de facto standard for my profession (policy economics) and I think it is adequate and functional and sustainable.

Romer’s whole “mathiness” screed is not mostly about “Chicago economists are only interested in models that adopt assumptions that conform to their prejudices”, it is IMHO mostly about “Chicago economists work hard to hide the fact that they are only interested in models that adopt assumptions that conform to their prejudice”. I think Romer exaggerates a bit (that is his advocacy) but I agree that he makes an important point.

I’m coming at this as an outsider and have nothing to add except to point out the converse point, that honesty and transparency are not enough, and if you’re a researcher and come to a conclusion that wasn’t expected, that you weren’t aiming for, that you weren’t paid or ideologically predisposed to find, that doesn’t automatically mean that you’re right. I’ve seen this attitude a lot, of researchers thinking that their conclusions absolutely must be correct because they came as a surprise or because they’re counter to their political ideologies. But that’s not right either.

The Economic Pit and the Political Pendulum: Predicting Midterm Elections

This post is written jointly with Chris Wlezien.

We were given the following prompt in July 2022 and asked to write a response: “The Economy, Stupid: Accepted wisdom has it that it’s the economy that matters to voters. Will other issues matter in the upcoming midterm elections, or will it really be all about the economy?”

In the Gallup poll taken that month, 35% of Americans listed the economy as one of the most important problems facing the country today, a value which is neither high nor low from a historical perspective. What does this portend for the 2022 midterm elections? The quick answer is that the economy is typically decisive for presidential elections, but midterm elections have traditionally been better described as the swinging of the pendulum, with the voters moving to moderate the party in power. This may partly reflect “thermostatic” public response (see here and here) to government policy actions.

It has long been understood that good economic performance benefits the incumbent party, and there’s a long history of presidents trying to time the business cycle to align with election years. Edward Tufte was among the first to seriously engage this possibility in his 1978 book, Political Control of the Economy, and other social scientists have taken up the gauntlet over the years. Without commenting on the wisdom of these policies, we merely note that even as presidents may try hard to set up favorable economic conditions for reelection, they do not always succeed, and for a variety of reasons: government is only part of the economic engine, the governing party does not control it all, and getting the timing right is an imperfect science.

To the extent that presidents are successful in helping ensure their own reelection, this may have consequences in the midterms. For example, one reason the Democrats lost so many seats in the 2010 midterms may be that Barack Obama in his first two years in office was trying to avoid Carter’s trajectory; his team seemed to want a slow recovery, at the cost of still being in recession in 2010, rather than pumping up the economy too quickly and then crashing by 2012 and paying the political price then.

In the 1970s and 1980s, Douglas Hibbs, Steven Rosenstone, James Campbell, and others established the statistical pattern that recent economic performance predicts presidential elections, and this is consistent with our general understanding of voters and their motivations. Economics is only part of the referendum judgment in presidential elections; consider that incumbents tend to do well even when economic conditions are not ideal. The factors that lead incumbent party candidates to do well in presidential elections also influence Congressional elections in those years, via coattails. So the economy matters here. This is less true in midterm elections. There, voters tend to balance the president, voting for and electing candidates of the “out-party.” This now is conventional wisdom. There is variation in the tendency, and this partly reflects public approval of the president, which offers some sense of how much people like what the president is doing – we are more (less) likely to elect members of the other party, the less (more) we approve of the president. The economy matters for approval, so it matters in midterm elections, but to a lesser degree than in presidential elections, and other things matter, including policy itself.

Whatever the specific causes, it is rare for voters to not punish the president at the midterm, and historically this has taken very high approval ratings. The exceptions to midterm loss are 1998, after impeachment proceedings against popular Bill Clinton were initiated, and again in 2002, when the even more popular George W. Bush continued to benefit from the 9/11 rally effect, and gains in these years were slight. It has not happened since, and to find another case of midterm gain, one has to go all the way back to Franklin Roosevelt’s first midterm in 1934. This is consistent with voters seeing the president as directing the ship of state, with Congressional voting providing a way to make smaller adjustments to the course. Democrats currently control two of the three branches of government, so it may be natural for some voters to swing Republican in the midterms, particularly given Joe Biden’s approval numbers, which have been hovering in the low-40s.

Elections are not just referenda on the incumbent; the alternative also matters. This may seem obvious in presidential election years, but it is true in midterms as well. Consider that Republican attempts to impeach Bill Clinton may have framed the 1998 election as a choice rather than a “balancing” referendum, contributing to the (slight) Democratic gains in that year. We may see something similar in 2022, encouraged in part by the Supreme Court decision on abortion, among other rulings. Given that the court now is dominated by Republican appointees, some swing voters will want to maintain Democratic control of Congress as a way to check Republicans’ judicial power and future appointments to the courts. The choice in the midterms also may be accentuated by the reemergence of Donald Trump in the wake of the FBI raid of Mar-a-Lago.

Change in party control of Congress is the result of contests in particular districts and states. These may matter less as national forces have increased in importance in recent years, but the choices voters face in those contests still do matter when they go to the polls. In the current election cycle, there has been an increase in the retirements of Democratic incumbents as would be expected in a midterm with a Democratic president, but some of the candidates Republicans are putting up may not be best positioned to win. This is particularly true in the Senate, where candidates and campaigns matter more.

History, historians, and causality

Through an old-fashioned pattern of web surfing of blogrolls (from here to here to here), I came across this post by Bret Devereaux on non-historians’ perceptions of academic history. Devereaux is responding to some particular remarks from economics journalist Noah Smith, but he also points to some more general issues, so these points seem worth discussing.

Also, I’d not previously encountered Smith’s writing on the study of history, but he recently interviewed me on the subjects of statistics and social science and science reform and causal inference so that made me curious to see what was up.

Here’s how Devereaux puts it:

Rather than focusing on converting the historical research of another field into data, historians deal directly with primary sources . . . rather than engaging in very expansive (mile wide, inch deep) studies aimed at teasing out general laws of society, historians focus very narrowly in both chronological and topical scope. It is not rare to see entire careers dedicated to the study of a single social institution in a single country for a relatively short time because that is frequently the level of granularity demanded when you are working with the actual source evidence ‘in the raw.’

Nevertheless as a discipline historians have always held that understanding the past is useful for understanding the present. . . . The epistemic foundation of these kinds of arguments is actually fairly simple: it rests on the notion that because humans remain relatively constant situations in the past that are similar to situations today may thus produce similar outcomes. . . . At the same time it comes with a caveat: historians avoid claiming strict predictability because our small-scale, granular studies direct so much of our attention to how contingent historical events are. Humans remain constant, but conditions, technology, culture, and a thousand other things do not. . . .

He continues:

I think it would be fair to say that historians – and this is a serious contrast with many social scientists – generally consider strong predictions of that sort impossible when applied to human affairs. Which is why, to the frustration of some, we tend to refuse to engage counter-factuals or grand narrative predictions.

And he then quotes a journalist, Matthew Yglesias, who wrote, “it’s remarkable — and honestly confusing to visitors from other fields — the extent to which historians resist explicit reasoning about causation and counterfactual analysis even while constantly saying things that clearly implicate these ideas.” Devereaux responds:

We tend to refuse to engage in counterfactual analysis because we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . . historians are taught when making present-tense arguments to adopt a very limited kind of argument: Phenomenon A1 occurred before and it resulted in Result B, therefore as Phenomenon A2 occurs now, result B may happen. . . . The result is not a prediction but rather an acknowledgement of possibility; the historian does not offer a precise estimate of probability (in the Bayesian way) because they don’t think accurately calculating even that is possible – the ‘unknown unknowns’ (that is to say, contingent factors) overwhelm any system of assessing probability statistically.

This all makes sense to me. I just want to do one thing, which is to separate two ideas that I think are being conflated here:

1. Statistical analysis: generalizing from observed data to a larger population, a step that can arise in various settings including sampling, causal inference, prediction, and modeling of measurements.

2. Causal inference: making counterfactual statements about what would have happened, or could have happened, had some past decision been made differently, or making predictions about potential outcomes under different choices in some future decision.

Statistical analysis and causal inference are related but are not the same thing.

For example, if historians gather data on public records from some earlier period and then make inference about the distributions of people working at that time in different professions, that’s a statistical analysis but that does not involve causal inference.

From the other direction, historians can think about causal inference and use causal reasoning without formal statistical analysis or probabilistic modeling of data. Back before he became a joke and a cautionary tale of the paradox of influence, historian Niall Ferguson edited a fascinating book, Virtual History: Alternatives and Counterfactuals, a book of essays by historians on possible alternative courses of history, about which I wrote:

There have been and continue to be other books of this sort . . . but what makes the Ferguson book different is that he (and most of the other authors in his book) are fairly rigorous in only considering possible actions that the relevant historical personalities were actually considering. In the words of Ferguson’s introduction: “We shall consider as plausible or probable only those alternatives which we can show on the basis of contemporary evidence that contemporaries actually considered.”

I like this idea because it is a potentially rigorous extension of the now-standard “Rubin model” of causal inference.

As Ferguson puts it,

Firstly, it is a logical necessity when asking questions about causality to pose ‘but for’ questions, and to try to imagine what would have happened if our supposed cause had been absent.

And the extension to historical reasoning is not trivial, because it requires examination of actual historical records in order to assess which alternatives are historically reasonable. . . . to the best of their abilities, Ferguson et al. are not just telling stories; they are going through the documents and considering the possible other courses of action that had been considered during the historical events being considered. In addition to being cool, this is a rediscovery and extension of statistical ideas of causal inference to a new field of inquiry.

See also here. The point is that it was possible for Ferguson et al. to do formal causal reasoning, or at least consider the possibility of doing it, without performing statistical analysis (thus avoiding the concern that Devereaux raises about weak evidence in comparative historical studies).

Now let’s get back to Devereaux, who writes:

This historian’s approach [to avoid probabilistic reasoning about causality] holds significant advantages. By treating individual examples in something closer to the full complexity (in as much as the format will allow) rather than flattening them into data, they can offer context both to the past event and the current one. What elements of the past event – including elements that are difficult or even impossible to quantify – are like the current one? Which are unlike? How did it make people then feel and so how might it make me feel now? These are valid and useful questions which the historian’s approach can speak to, if not answer, and serve as good examples of how the quantitative or ’empirical’ approaches that Smith insists on are not, in fact, the sum of knowledge or required to make a useful and intellectually rigorous contribution to public debate.

That’s a good point. I still think that statistical analysis can be valuable, even with very speculative sampling and data models, but I agree that purely qualitative analysis is also an important part of how we learn from data. Again, this is orthogonal to the question of when we choose to engage in causal reasoning. There’s no reason for bad data to stop us from thinking causally; rather, the limitations in our data merely restrict the strengths of any causal conclusions we might draw.

The small-N problem

One other thing. Devereaux refers to the challenges of statistical inference: “we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . .” That’s not just a problem with the field of history! It also arises in political science and economics, where we don’t have a lot of national elections or civil wars or depressions, so generalizations necessarily rely on strong assumptions. Even if you can produce a large dataset with thousands of elections or hundreds of wars or dozens of business cycles, any modeling will implicitly rely on some assumption of stability of a process over time, an assumption that won’t necessarily make sense given changes in political and economic systems.

So it’s not really history versus social sciences. Rather, I think of history as one of the social sciences (as in my book with Jeronimo from a few years back), and they all have this problem.

The controversy

After writing all the above, I clicked through the link and read the post by Smith that Devereaux was arguing against.

And here’s the funny thing. I found Devereaux’s post to be very reasonable. Then I read Smith’s post, and I found that to be very reasonable too.

The two guys are arguing against each other furiously, but I agree with both of them!

What gives?

As discussed above, I think Devereaux in his post provides an excellent discussion of the limits of historical inquiry. On the other side, I take the main message of Smith’s post to be that, to the extent that historians want to use their expertise to make claims about the possible effects of recent or new policies, they should think seriously about statistical inference issues. Smith doesn’t just criticize historians here; he leads off by criticizing academic economists:

After having endured several years of education in that field, I [Smith] was exasperated with the way unrealistic theories became conventional wisdom and even won Nobel prizes while refusing to submit themselves to rigorous empirical testing. . . . Though I never studied history, when I saw the way that some professional historians applied their academic knowledge to public commentary, I started to recognize some of the same problems I had encountered in macroeconomics. . . . This is not a blanket criticism of the history profession . . . All I am saying is that we ought to think about historians’ theories with the same empirically grounded skepticism with which we ought to regard the mathematized models of macroeconomics.

By saying that I found both Devereaux and Smith to be reasonable, I’m not claiming they have no disagreements. I think their main differences come because they’re focusing on two different things. Smith’s post is ultimately about public communication and the things that academics say in the public discourse (things like newspaper op-eds and twitter posts) with relevance to current political disputes. And, for that, we need to consider the steps, implicit or explicit, that commentators take to go from their expertise to the policy claims they make. Devereaux is mostly writing about academic historians in their professional roles. With rare exceptions, academic history is about getting the details right, and even popular books of history typically focus on what happened, and our uncertainty about what happened, not on larger theories.

I guess I do disagree with this statement from Smith:

The theories [from academic history] are given even more credence than macroeconomics even though they’re even less empirically testable. I spent years getting mad at macroeconomics for spinning theories that were politically influential and basically un-testable, then I discovered that theories about history are even more politically influential and even less testable.

Regarding the “less testable” part, I guess it depends on the theories—but, sure, many theories about what has happened in the past can be essentially impossible to test, if conditions have changed enough. That’s unavoidable. As Devereaux replies, this is not a problem with the study of history; it’s just the way things are.

But I can’t see how Smith could claim with a straight face that theories from academic history are “given more credence” and are “more politically influential” than macroeconomics. The president has a council of economic advisers, there are economists at all levels of the government, or if you want to talk about the news media there are economists such as Krugman, Summers, Stiglitz, etc. . . . sure, they don’t always get what they want when it comes to policy, but they’re quoted endlessly and given lots of credence. This is also the case in narrower areas, for example James Heckman on education policy or Angus Deaton on deaths of despair: these economists get tons of credence in the news media. There are no academic historians with that sort of influence. This has come up before: I’d say that economics now is comparable to Freudian psychology in the 1950s in its influence on our culture:

My best analogy to economics exceptionalism is Freudianism in the 1950s: Back then, Freudian psychiatrists were on the top of the world. Not only were they well paid, well respected, and secure in their theoretical foundations, they were also at the center of many important conversations. Even those people who disagreed with them felt the need to explain why the Freudians were wrong. Freudian ideas were essential, leaders in that field were national authorities, and students of Freudian theory and methods could feel that they were initiates in a grand tradition, a priesthood if you will. Freudians felt that, unlike just about everybody else, they treated human beings scientifically and dispassionately. What’s more, Freudians prided themselves on their boldness, their willingness to go beyond taboos to get to the essential truths of human nature. Sound familiar?

When it comes to influence in policy or culture or media, academic history doesn’t even come close to Freudianism in the 1950s or economics in recent decades.

This is not to say we should let historians off the hook when they make causal claims or policy recommendations. We shouldn’t let anyone off the hook. In that spirit, I appreciate Smith’s reminder of the limits of historical theories, along with Devereaux’s clarification of what historians really do when they’re doing academic history (as opposed to when they’re slinging around on twitter).

Why write about this at all?

As a statistician and political scientist, I’m interested in issues of generalization from academic research to policy recommendations. Even in the absence of any connection with academic research, people will spin general theories—and one problem with academic research is that it can give researchers, journalists, and policymakers undue confidence in bad theories. Consider, for example, the junk science promoted over the years by the Freakonomics franchise. So I think these sorts of discussions are important.

Krueger Uber $100,000 update

Last month we discussed the controversy regarding the recent revelation that, back in 2015, economist Alan Krueger had been paid $100,000 by Uber to coauthor a research article that was published at the NBER (National Bureau of Economic Research, a sort of club of influential academic economists) and seems possibly to have been influential in policy.

Krueger was a prominent political liberal—he worked in the Clinton and Obama administrations and was most famous for research that reported positive effects of the minimum wage—so there was some dissonance about him working for a company that has a reputation for employing drivers as contractors to avoid compensating them more. But I think the key point of the controversy was the idea that academia (or, more specifically, academic economics (or, more specifically, the NBER)) was being used to launder this conflict of interest. The concern was that Uber paid Krueger, the payment went into a small note in the paper, and then the paper was mainlined into the academic literature and the news media. From that perspective, Uber wasn’t just paying for Krueger’s work or even for his reputation; they were also paying for access to NBER and the economics literature.

In this new post I’d like to address a few issues:

1. Why now?

The controversial article by Hall and Krueger came out in 2015, and right there it had the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written.” The complaint now is that this statement is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.”

So why the controversy now? Why are we getting comments like this now rather than in 2015?

The new information is that Krueger was paid $100,000 by Uber. But we already knew he was being paid by Uber—it’s right on that paper. Sure, $100,000 is a lot, but what were people expecting? If Uber was paying Krueger at all, you can be pretty sure it was more than $10,000.

I continue to think that the reason the controversy happened now rather than in 2015 is that, back in 2015, there was a general consensus among economists that Uber was cool. They weren’t considered to be the bad guy, so it was fine for him to get paid.

2. The Necker cube of conflict of interest

Conflict of interest goes in two directions. On one hand, as discussed above, there’s a concern that being paid will warp your perspective. And, even beyond this sort of bias, being paid will definitely affect what you do. Hall and Krueger could be the most ethical researchers in the universe—but there’s no way they would’ve done that particular study had Uber not paid them to do it. This is the usual way that we think about conflict of interest.

But, from the other direction, there’s also the responsibility, if someone pays you to do a job, to try to do your best. If Hall and Krueger were to take hundreds of thousands of dollars from Uber and then turn around and stab Uber in the back, that wouldn’t be cool. I’m not talking here about whistleblowing. If Hall and Krueger started working for Uber and then saw new things they hadn’t seen before, indicating illegal or immoral conduct by the company, then, sure, we can all agree that it would be appropriate for them to blow the whistle. What I’m saying is that, if Hall and Krueger come to work for Uber, and Uber is what they were expecting when they took the job, then it’s their job to do right by Uber.

My point here is that these two issues of conflict of interest go in opposite directions. As with that Necker cube, it’s hard to find a stable intermediate position. Is Krueger an impartial academic who just happened to take some free money? Or does he have a duty to the company that hired him? I think it kind of oscillates between the two. And I wonder if that’s one reason for the strong reaction that people had, that they’re seeing the Necker cube switch back and forth.

3. A spectrum of conflicts of interests

Another way to think about this is to consider a range of actions in response to taking someone’s money.

– At one extreme are the pure hacks, the public relations officers who will repeat any lie that the boss tells them, the Mob lawyers who help in the planning of crimes, the expert witnesses who will come to whatever conclusion they’re paid to reach. Sometimes the hack factor overlaps with ideology, so you might get a statistician taking cash from a cigarette company and convincing himself that cigarettes aren’t really addictive, or a lawyer helping implement an illegal business plan and convincing himself that the underlying law is unjust.

– Somewhat more moderate are the employees or consultants who do things that might otherwise make them queasy, because they benefit the company that’s paying them. It’s hard to draw a sharp line here. “Writing a paper for Uber” is, presumably, not something that Krueger would’ve done without getting paid by Uber, but that doesn’t mean that the contents of the paper cross any ethical line.

– Complete neutrality. It’s hard for me to be completely neutral on a consulting project—people are paying me and I want to give them their money’s worth—but I can be neutral when evaluating a report. Someone pays me to read a report and I can give my comments without trying to shade them to make the client happy. Similarly, when I’ve been an expert witness, I’ve just written it like I see it. I understand that this will help the client—if they don’t like my report, they can just bury it!—so I’m not saying that I’m being neutral in my actions. But the content of my work is neutral.

– Bending over backward. An example here might be the admirable work of Columbia math professor Michael Thaddeus in criticizing the university’s apparently bogus numbers. Rather than going easy on his employer, Thaddeus is holding Columbia to a high standard. I guess it’s a tough call, when to do this. Hall and Krueger could’ve taken Uber’s money and then turned around and said something like, This project is kinda ok, but given the $ involved, we should really be careful and point out the red flags in the interpretation of the results. That would’ve been fine; I’ll just say again that this conflicts with the general attitude that, if someone pays you, you should give them your money’s worth. The Michael Thaddeus situation is different because Columbia pays him to teach and do research; it was never his job to pump up its U.S. News ranking.

– The bust-out operation. Remember this from Goodfellas? The mob guys take over someone’s business and strip it of all its assets, then torch the place for the insurance. This is the opposite of the hack who will do anything for the boss. An example here would be something like taking $100,000 from Uber and then using the knowledge gained from the inside to help a competitor.

In a horseshoe-like situation, the two extremes of the above spectrum seem the least ethical.

4. NBER

This is just a minor thing, but I was corresponding with someone who pointed out that NBER working papers are not equivalent to peer-reviewed publications.

I agree, but NBER ain’t nothing. It’s an exclusive club, and it’s my impression that an NBER paper gets attention in a way that an Arxiv paper or a Columbia University working paper or an Uber working paper would not. NBER papers may not count for academic CV’s, but I think they are noticed by journalists, researchers, and even policymakers. An NBER paper is considered legit by default, and that does have to do with one of the authors being a member of the club.

5. Feedback

In an email discussion following up on my post, economics journalist Felix Salmon wrote:

My main point the whole time has just been that if Uber pays for a paper, Uber should publish the paper. I don’t like the thing where they add Krueger as a second author, pay him $100k for his troubles, and thusly get their paper into NBER. Really what I’m worried about here is not that Krueger is being paid to come to a certain conclusion (the conflict-of-interest problem), so much as Krueger is being paid to get the paper into venues that would otherwise be inaccessible to Uber. I do think that’s problematic, and I don’t think it’s a problem that lends itself to solution via disclosure. On the other hand, it’s a problem with a simple solution, which is just that NBER shouldn’t publish such papers, and that they should live the way that god intended, which is as white papers (or even just blog posts) published by the company in question.

As for the idea that conflict is binary and “the conflict is there, whether it’s $1000 or $100,000” — I think that’s a little naive. At some point ($1 million? $10 million? more?) the sheer quantity of money becomes germane. . . .

Salmon also asked:

Is there a way of fixing the disclosure system to encompass this issue? It seems pretty clear to me that when the lead author is literally an Uber employee, Uber has de facto control over whether the paper gets published. Again, I think the solution to this problem is to have Uber publish the paper. But if there’s a disclosure solution then I’m interested in what it would look like.

I replied:

You say that Uber should just publish the paper. That’s fine with me. I will put a paper on my website (that’s kinda like a university working paper series, just for me), I will also put it on Arxiv and NBER (if I happen to have a coauthor who’s a member of that club), and I’ll publish in APSR or JASA or some lower-ranking journal. No reason a paper can’t be published in more than one “working paper” series. I also think it’s ok for the authors to publish in NBER (with a disclosure, which that paper had) and in a scholarly journal (again, with a disclosure).

You might be right that the published disclosure wasn’t enough, but in that case I think your problem is with academic standards in general, not with Krueger or even with the economics profession. For example, I was the second author on this paper, where the first and last authors worked at Novartis. I was paid by Novartis while working on the project—I consulted for them for several years, with most of the money going to my research collaborators at Columbia and elsewhere, but I pocketed some $ too. I did not disclose that conflict of interest in that paper—I just didn’t think about it! Or maybe it seemed unnecessary given that two of the authors were Novartis employees. In retrospect, yeah, I think I should’ve disclosed. But the point is, even if I had disclosed, I wouldn’t have given the dollar value, not because it’s a big secret (actually, I don’t remember the total, as it was over several years, and most of it went to others) but just because I’ve never seen anyone disclose the dollar amount in an acknowledgements or conflict-of-interest statement for a paper. It’s just not how things are done. Also I don’t think we shared our data. And our paper won an award! On the plus side, it’s only been cited 17 times so I guess there’s some justice in the world and vice is not always rewarded.

I agree that conflict is not binary. But I think that even with $1000, there is conflict. One reason I say this is that I think people often have very vague senses of conflict of interest. Is it conflict of interest to referee a paper written by a personal friend? If the friend is not a family member and there’s no shared employment or $ changing hands, I’d say no. That’s just my take, also my approximation to what I think the rules are, based on what Columbia University tells me. Again, though, I think it would’ve been really unusual had that paper had a statement saying that Krueger had been paid $100,000. I’ve just never seen that sort of thing in an academic paper.

Did Uber have the right to review the paper? I have no idea; that’s an interesting question. But I think there’s a conflict of interest, whether or not Uber had the right to review: these are Uber consultants and Uber employees working with Uber data, so no surprise they’ll come to an Uber-favorable conclusion. I still think the existing conflict of interest statement was just fine, and that the solution is for all journalists to report this as, “Uber study claims . . . ” The first author of the paper was an Uber employee and the second author was an Uber consultant—it’s right there on the first page of the paper!

As a separate issue, if the data and meta-data (the data-collection protocol) are not made available, then that should cause an immediate decline of trust, even setting aside any conflict of interest. I guess that would be a problem with my above-linked Novartis paper too; the difference is that our punch line was a methods conclusion, not a substantive conclusion, and you can check the methods conclusion with fake data.

Salmon replied:

If you want the media to put “Uber study claims” before reporting such results, then the way you get there is by putting the Uber logo on the top of the study. You don’t get there by adding a disclosure footnote and expecting front-line workers in the content mines to put two and two together.

As discussed in my original post on the topic, I have lots of conflicts of interest myself. So I’m not claiming to be writing about all this from some sort of ethically superior position.

6. Should we trust that published paper?

The economist who first pointed me to this story (and who wished to remain anonymous) followed up by email:

I’m surprised that the focus of your thoughts and of the others who commented on the blog were on whether Krueger acted correctly, not on what to do with the results they (and others) found. I guess my one big takeaway from now on is to move my priors more strongly towards being sceptical of papers using non-shared, proprietary data…

That’s a good point! I don’t think anyone in all these discussions is suggesting we should actually trust that published paper. The data were supplied by Uber; the conclusions were restricted to what Uber wanted to get; the whole thing was done in an environment where economists loved Uber and wanted to hear good things about it; etc.

I guess the reason the conversation was all about Krueger and the role of academic economics was that everyone was already starting from the position that the paper could not be trusted. If it were considered to be a trustworthy piece of research, I think there’d be a lot less objection to it. It’s not like the defenders of Krueger were saying they believed the claims in the paper. Again, part of that is political. Krueger was a prominent liberal economist, and now in 2022, liberal economists are not such Uber fans, so it makes sense that they’d defend his actions on procedural rather than substantive grounds.

Some concerns about the recent Chetty et al. study on social networks and economic inequality, and what to do next?

I happened to receive two different emails regarding a recently published research paper.

Dale Lehman writes:

Chetty et al. (and it is a long et al. list) have several publications about social and economic capital (see here for one such paper, and here for the website from which the data can also be accessed). In the paper above, the data is described as:

We focus on Facebook users with the following attributes: aged between 25 and 44 years who reside in the United States; active on the Facebook platform at least once in the previous 30 days; have at least 100 US-based Facebook friends; and have a non-missing residential ZIP code. We focus on the 25–44-year age range because its Facebook usage rate is greater than 80% (ref. 37). On the basis of comparisons to nationally representative surveys and other supplementary analyses, our Facebook analysis sample is reasonably representative of the national population.

They proceed to measure social and economic connectedness across counties, zip codes, and for graduates of colleges and high schools. The data is massive as is the effort to make sense out of it. In many respects it is an ambitious undertaking and one worthy of many kudos.

But I [Lehman] do have a question. Given their inclusion criteria, I wonder about selection bias when comparing counties, zip codes, colleges, or high schools. I would expect that the fraction of Facebook users – even in the targeted age group – that are included will vary across these segments. For example, one college may have many more of its graduates who have that number of Facebook friends and have used Facebook in the prior 30 days compared with a second college. Suppose the economic connectedness from the first college is greater than from the second college. But since the first college has a larger proportion of relatively inactive Facebook users, is it fair to describe college 1 as having greater connectedness?

It seems to me that the selection criteria make the comparisons potentially misleading. It might be accurate to say that the regular users of Facebook from college 1 are more connected than those from college 2, but this may not mean that the graduates from college 1 are more connected than the graduates from college 2. I haven’t been able to find anything in their documentation to address the possible selection bias and I haven’t found anything that mentions how the proportion of Facebook accounts that meet their criteria varies across these segments. Shouldn’t that be addressed?

That’s an interesting point. Perhaps one way to address it would be to preprocess the data by estimating a propensity to use Facebook and then using this propensity as a poststratification variable in the analysis. I’m not sure. Lehman makes a convincing case that this is a concern when comparing different groups; that said, it’s the kind of selection problem we have all the time, and typically ignore, with survey data.
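Here is a minimal sketch of that poststratification idea, under made-up assumptions. Two hypothetical colleges have identical graduates, split evenly between a "high" and a "low" stratum that stands in for a propensity bin, and connectedness depends only on the stratum. The colleges differ only in who passes the Facebook inclusion filter. All numbers and function names are invented for illustration; nothing here comes from the Chetty et al. data.

```python
import random

def simulate_college(n, share_high, include_prob, rng):
    """Simulate n graduates as (stratum, connectedness, on_facebook) tuples."""
    people = []
    for _ in range(n):
        stratum = "high" if rng.random() < share_high else "low"
        # Assumption: connectedness depends on the stratum only, not on the college.
        center = 0.6 if stratum == "high" else 0.3
        connectedness = center + rng.gauss(0, 0.05)
        on_facebook = rng.random() < include_prob[stratum]
        people.append((stratum, connectedness, on_facebook))
    return people

def naive_mean(people):
    """Average connectedness among graduates who pass the inclusion filter."""
    values = [c for _, c, included in people if included]
    return sum(values) / len(values)

def poststratified_mean(people):
    """Reweight included graduates so each stratum counts by its share of all graduates."""
    estimate = 0.0
    for stratum in ("high", "low"):
        share = sum(1 for s, _, _ in people if s == stratum) / len(people)
        values = [c for s, c, included in people if s == stratum and included]
        estimate += share * sum(values) / len(values)
    return estimate

rng = random.Random(0)
# College 1 over-represents the "high" stratum among included users; college 2 does not.
college_1 = simulate_college(20000, 0.5, {"high": 0.9, "low": 0.3}, rng)
college_2 = simulate_college(20000, 0.5, {"high": 0.6, "low": 0.6}, rng)

naive_gap = naive_mean(college_1) - naive_mean(college_2)
adjusted_gap = poststratified_mean(college_1) - poststratified_mean(college_2)
print(f"naive gap: {naive_gap:.3f}, poststratified gap: {adjusted_gap:.3f}")
```

In this toy setup the naive comparison shows a gap between two colleges that are identical by construction, and reweighting by stratum removes most of it. The catch in real data is that the stratum shares among all graduates, included or not, would have to be known or estimated.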

Richard Alba writes in with a completely different concern:

You may be aware of the recent research, published in Nature by the economist Raj Chetty and colleagues, purporting to show that social capital in the form of early-life ties to high-status friends provides a powerful pathway to upward mobility for low-status individuals. It has received a lot of attention, from The New York Times, Brookings, and no doubt other places I am not aware of.

In my view, they failed to show anything new. We have known since the 1950s that social capital has a role in mobility, but the evidence they develop about its great power is not convincing, in part because they fail to take into account how their measure of social capital, the predictor, is contaminated by the correlates and consequences of mobility, the outcome.

This research has been greeted in some media as a recipe for the secret sauce of mobility, and one of their articles in Nature (there are two published simultaneously) is concerned with how to increase social capital. In other words, the research is likely to give rise to policy proposals. I think it is important then to inform Americans about its unacknowledged limitations.

I sent my critique to Nature, and it was rejected because, in their view, it did not sufficiently challenge the articles’ conclusions. I find that ridiculous.

I have no idea how Nature decides what critiques to publish, and I have not read the Chetty et al. articles so I can’t comment on them either, but I can share Alba’s critique. Here it is:

While the pioneering big-data research of Raj Chetty and his colleagues is transforming the long-standing stream of research into social mobility, their findings should not be exempt from critique.

Consider in this light the recent pair of articles in Nature, in which they claim to have demonstrated a powerful causal connection between early-life social capital and upward income mobility for individuals growing up in low-income families. According to one paper’s abstract, “the share of high-SES friends among individuals with low-SES—which we term economic connectedness—is among the strongest predictors of upward income mobility identified to date.”

But there are good reasons to doubt that this causal connection is as powerful as the authors claim. At a minimum, the social capital-mobility statistical relationship is significantly overstated.

This is not to deny a role for social capital in determining adult socioeconomic position. That has been well established for decades. As early as the 1950s, the Wisconsin mobility studies focused in part on what the researchers called “interpersonal influence,” measured partly in terms of high-school friends, an operationalization close to the idea in the Chetty et al. article. More generally, social capital is indisputably connected to labor-market position for many individuals because of the role social networks play in disseminating job information.

But these insights are not the same as saying that economic connectedness, i.e., cross-class ties, is the secret sauce in lifting individuals out of low-income situations. To understand why the articles’ evidence fails to demonstrate this, it is important to pay close attention to how the data and analysis are constructed. Many casual readers, who glance at the statements like the one above or read the journalistic accounts of the research (such as the August 1 article in The New York Times), will take away the impression that the researchers have established an individual-level relationship—that they have proven that individuals from low-SES families who have early-life cross-class relationships are much more likely to experience upward mobility. But, in fact, they have not.

Because of limitations in their data, their analysis is based on the aggregated characteristics of areas—counties and zip codes in this case—not individuals. This is made necessary because they cannot directly link the individuals in their main two sources of data—contemporary Facebook friendships and previous estimates by the team of upward income mobility from census and income-tax data. Hence, the fundamental relationship they demonstrate is better stated as: the level of social mobility is much higher in places with many cross-class friendships. The correlation, the basis of their analysis, is quite strong, both at the county level (.65) and at the zip-code level (.69).

Inferring that this evidence demonstrates a powerful causal mechanism linking social capital to the upward mobility of individuals runs headlong into a major problem: the black box of causal mechanisms at the individual level that can lie behind such an ecological correlation, where moreover both variables are measured for roughly the same time point. The temptation may be to think that the correlation reflects mainly, or only, the individual-level relationship between social capital and mobility as stated above. However, the magnitude of an area-based correlation may be deceptive about the strength of the correlation at the individual level. Ever since a classic 1950 article by W. S. Robinson, it has been known that ecological correlations can exaggerate the strength of the individual-level relationship. Sometimes the difference between the two is very large, and in the case of the Chetty et al. analysis it appears impossible given the data they possess to estimate the bias involved with any precision, because Robinson’s mathematics indicates that the individual-level correlations within area units are necessary to the calculation. Chetty et al. cannot calculate them.
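To see how large the gap between an ecological and an individual-level correlation can be, here is a small simulated sketch. It assumes each area has a shared effect that moves both variables, plus a lot of independent individual-level noise, so the within-area correlation is zero by construction. The variable names and noise scales are hypothetical and are not taken from the Chetty et al. papers.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
n_areas, n_per_area = 200, 200
ind_x, ind_y, area_x, area_y = [], [], [], []
for _ in range(n_areas):
    area_effect = rng.gauss(0, 1)  # shared by everyone in the area
    # Individual values: the area effect plus large independent noise.
    xs = [area_effect + rng.gauss(0, 3) for _ in range(n_per_area)]
    ys = [area_effect + rng.gauss(0, 3) for _ in range(n_per_area)]
    ind_x += xs
    ind_y += ys
    area_x.append(sum(xs) / n_per_area)
    area_y.append(sum(ys) / n_per_area)

area_corr = pearson(area_x, area_y)        # correlation of area averages
individual_corr = pearson(ind_x, ind_y)    # correlation across pooled individuals
print(f"area-level: {area_corr:.2f}, individual-level: {individual_corr:.2f}")
```

Averaging within areas washes out the individual noise, so the area-level correlation comes out far higher than the pooled individual-level one. That is the pattern Robinson described. It says nothing about how big the discrepancy is in the actual Facebook data.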

A second aspect of the inferential problem lies in the entanglement in the social-capital measure of variables that are consequences or correlates of social mobility itself, confounding cause and effect. This risk is heightened because the Facebook friendships are measured in the present, not prior to the mobility. Chetty et al. are aware of this as a potential issue. In considering threats to the validity of their conclusion, they refer to the possibility of “reverse causality.” What they have in mind derives from an important insight about mobility—mobile individuals are leaving one social context for another. Therefore, they are also leaving behind some individuals, such as some siblings, cousins, and childhood buddies. These less mobile peers, who remain in low-SES situations but have in their social networks others who are now in high-SES ones, become the basis for the paper’s Facebook estimate of economic connectedness (which is defined from the perspective of low-SES adults between the ages of 25 and 44). This sort of phenomenon will be frequent in high-mobility places, but it is a consequence of mobility, not a cause. Yet it almost certainly contributes to the key correlation—between economic connectedness and social mobility—in the way the paper measures it.

Chetty et al. try to answer this concern with correlations estimated from high-school friendships, arguing that the timing purges this measure of mobility’s impact on friendships. The Facebook-based version of this correlation is noticeably weaker than the correlations that the paper emphasizes. In any event, demonstrating a correlation between teen-age economic connectedness and high mobility does not remove the confounding influence of social mobility from the latter correlations, on which the paper’s argument depends. And in the case of high-school friendships, too, the black-box nature of the causality behind the correlation leaves open the possibility of mechanisms aside from social capital.

This can be seen if we consider the upward mobility of the children of immigrants, surely a prominent part today of the mobility picture in many high-mobility places. Recently, the economists Ran Abramitzky and Leah Boustan have reminded us in their book Streets of Gold that, today as in the past, the children of immigrants, the second generation, leap on average far above their parents in any income ranking. Many of these children are raised in ambitious families, where as Abramitzky and Boustan put it, immigrants typically are “under-placed” in income terms relative to their abilities. Many immigrant parents encourage their children to take advantage of opportunities for educational advancement, such as specialized high schools or advanced-placement high-school classes, likely to bring them into contact with peers from more advantaged families. This can create social capital that boosts the social mobility of the second generation, but a large part of any effect on mobility is surely attributable to family-instilled ambition and to educational attainment substantially higher than one would predict from parental status. The increased social capital is to a significant extent a correlate of on-going mobility.

In sum, there is without doubt a causal linkage between social capital and mobility. But the Chetty et al. analysis overstates its strength, possibly by a large margin. To twist the old saw about correlation and causation, correlation in this case isn’t only causation.

I [Alba] believe that a critique is especially important in this case because the findings in the Chetty et al. paper create an obvious temptation for the formulation of social policy. Indeed, in their second paper in Nature, the authors make suggestions in this direction. But before we commit ourselves to new anti-poverty policies based on these findings, we need a more certain gauge of the potential effectiveness of social capital than the current analysis can give us.

I get what Alba is saying about the critique not strongly challenging the article’s conclusions. He’s not saying that Chetty et al. are wrong; it’s more that he’s saying there are a lot of unanswered questions here—a position I’m sure Chetty et al. would themselves agree with!

A possible way forward?

To step back a moment—and recall that I have not tried to digest the Nature articles or the associated news coverage—I’d say that Alba is criticizing a common paradigm of social science research in which a big claim is made from a study and the study has some clear limitations, so the researchers attack the problem in some different ways in an attempt to triangulate toward a better understanding.

There are two immediate reactions I’d like to avoid. The first is to say that the data aren’t perfect, the study isn’t perfect, so we just have to give up and say we’ve learned nothing. In the other direction is the unpalatable response that all studies are flawed so we shouldn’t criticize this one in particular.

Fortunately, nobody is suggesting either of these reactions. From one direction, critics such as Lehman and Alba are pointing out concerns but they’re not saying the conclusions of the Chetty et al. study are all wrong or that the study is useless; from the other, news reports do present qualifiers and they’re not implying that these results are a sure thing.

What we’d like here is a middle way—not just a rhetorical middle way (“This research, like all social science, has weaknesses and threats to validity, hence the topic should continue to be studied by others”) but a procedural middle way, a way to address the concerns, in particular to get some estimates of the biases in the conclusions resulting from various problems with the data.

Our default response is to say the data should be analyzed better: do a propensity analysis to address Lehman’s concern about who’s on Facebook, and do some sort of multilevel model integrating individual and zipcode-level data to address Alba’s concern about aggregation. And this would all be fine, but it takes a lot of work—and Chetty et al. already did a lot of work, triangulating toward their conclusion from different directions. There’s always more analysis that could be done.

Maybe the problem with the triangulation approach is not the triangulation itself but rather the way it can be set up with a central analysis making a conclusion, and then lots of little studies (“robustness checks,” etc.) designed to support the main conclusion. What if the other studies were set up to estimate biases, with the goal not of building confidence in the big number but rather of getting a better, more realistic, estimate?

With this in mind, I’m thinking that a logical next step would be to construct a simulation study to get a sense of the biases arising from the issues raised by Lehman and Alba. We can’t easily gather the data required to know what these biases are, but it does seem like it should be possible to simulate a world in which different sorts of people are more or less likely to be on Facebook, and in which there are local patterns of connectedness that are not simply what you’d get by averaging within zipcodes.

I’m not saying this would be easy—the simulation would have to make all sorts of assumptions about how these factors vary, and the variation would need to depend on relevant socioeconomic variables—but right now it seems to me to be a natural next step in the research.

One more thing

Above I stressed the importance and challenge of finding a middle ground between (1) saying the study’s flaws make it completely useless and (2) saying the study represents standard practice so we should believe it.

Sometimes, though, response #1 is appropriate. For example, the study of beauty and sex ratio or the study of ovulation and voting or the study claiming that losing an election for governor lops 5 to 10 years off your life—I think those really are useless (except as cautionary tales, lessons of research practices to avoid). How can I say this? Because those studies are just soooo noisy compared to any realistic effect size. There’s just no there there. Researchers can fool themselves because they think that if they have hundreds or thousands of data points, that they’re cool, and that if they have statistical significance, they’ve discovered something. We’ve talked about this attitude before, and I’ll talk about it again; I just wanted to emphasize here that it doesn’t always make sense to take the middle way. Or, to put it another way, sometimes the appropriate middle way is very close to one of the extreme positions.

Bets as forecasts, bets as probability assessment, difficulty of using bets in this way

John Williams writes:

Bets as forecasts come up on your blog from time to time, so I thought you might be interested in this post from RealClimate, which is the place to go for informed commentary on climate science.

The post, by Gavin Schmidt, is entitled, “Don’t climate bet against the house,” and tells the story of various public bets in the past few decades regarding climate outcomes.

The examples are interesting in their own right and also as a reminder that betting is complicated. In theory, betting has close links to uncertainty, and you should be able to go back and forth between them:

1. From one direction, if you think the consensus is wrong, you can bet against it and make money (in expectation). You should be able to transform your probability statements into bets.

2. From the other direction, if bets are out there, you can use these to assess people’s uncertainties, and from there you can make probabilistic predictions.

In real life, though, both the above steps can have problems, for several reasons. First is the vig (in a betting market) or the uncertainty that you’ll be paid off (in an unregulated setting). Second is that you need to find someone to make that bet with you. Third, and relatedly, that “someone” who will bet with you might have extra information you don’t have; indeed, even their willingness to bet at given odds provides some information, in a Newtonian action-and-reaction sort of way. Fourth, we hear about some of the bets and we don’t hear about others. Fifth, people can be in it to make a point or for laffs or thrills or whatever, not just for the money, enough so that, when combined with the earlier items on this list, there won’t be enough “smart money” to take up the slack.
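The back-and-forth between bets and probabilities, and what the vig does to it, can be sketched in a few lines. The function names and dollar amounts below are hypothetical. The vig is modeled, as a simplifying assumption, as a cut taken out of the winnings.

```python
def break_even_probability(stake, payout):
    """Win probability at which a bet (risk `stake` to win `payout`) has zero expected value."""
    return stake / (stake + payout)

def expected_value(q, stake, payout):
    """Expected profit of the bet if your own probability of winning is q."""
    return q * payout - (1 - q) * stake

# Direction 1: turn a probability statement into a bet.
# If you believe q = 0.6 and are offered even odds, the bet has positive expected value.
ev_even_odds = expected_value(0.6, stake=100, payout=100)

# Direction 2: read a probability off posted odds.
# Someone accepting even odds is implying a win probability of at least 0.5.
implied_fair = break_even_probability(stake=100, payout=100)

# With a 5% cut taken from winnings, the same bet needs a higher win probability to break even.
implied_with_vig = break_even_probability(stake=100, payout=95)

print(ev_even_odds, implied_fair, round(implied_with_vig, 4))
```

The vig drives a wedge between the two directions: the probability you would back out from the posted odds is no longer the probability at which anyone is indifferent to betting.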

This is not to say that betting is a useless approach to information aggregation; I’m just saying that betting, like other social institutions, works under certain conditions and not in absolute generality.

And this reminds me of another story.

Economist Bryan Caplan reports that his track record on bets is 23 for 23. That’s amazing! How is it possible? Here’s Caplan’s list, which starts in 2007 and continues through 2021, with some of the bets still unresolved.

Caplan’s bets are an interesting mix. The first one is a bet where he offered 1-to-100 odds so it’s no big surprise that he won, but most of them are at even odds. A couple of them he got lucky on (for example, he bet in 2008 that no large country would leave the European Union before January 1, 2020, so he just survived by one month on that one), but, hey, it’s ok to be lucky, and in any case even if he had won only 21 out of 23 bets, that would still be impressive.

It seems to me that Caplan’s trick here is to show good judgment on what pitches to swing at. People come at him with some strong, unrealistic opinions, and he’s been good at crystallizing these into bets. In poker terms, he waits till he has the nuts, or nearly so. 23 out of 23 . . . that’s a great record.
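A quick back-of-the-envelope check on why a perfect record points to bet selection and not luck. It assumes, purely for illustration, that the bets are independent and that each has the same win probability. Neither assumption is claimed to hold for the real list.

```python
def prob_perfect_record(win_prob, n_bets):
    """Chance of winning every one of n_bets independent bets, each won with probability win_prob."""
    return win_prob ** n_bets

# Per-bet win probabilities to compare; 0.5 is what even odds would imply for a fair bet.
for win_prob in (0.5, 0.9, 0.97):
    print(f"win_prob={win_prob}: P(23 of 23) = {prob_perfect_record(win_prob, 23):.3g}")
```

If even-odds bets really were coin flips, a 23-for-23 streak would be close to impossible. It only becomes unsurprising when each bet is something like a 97% proposition, which is another way of saying he waits for the nuts.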

Weak separation in mixture models and implications for principal stratification

Avi Feller, Evan Greif, Nhat Ho, Luke Miratrix, and Natesh Pillai write:

Principal stratification is a widely used framework for addressing post-randomization complications. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of mixture parameters, like the MLE, are known to exhibit pathological behavior. We study this behavior in a simple but fundamental example, a two-component Gaussian mixture model in which only the component means and variances are unknown, and focus on the setting in which the components are weakly separated. . . . We provide diagnostics for all of these pathologies and apply these ideas to re-analyzing two randomized evaluations of job training programs, JOBS II and Job Corps.

The paper’s all about maximum likelihood estimates and I don’t care about that at all, but the general principles are relevant to understanding causal inference with intermediate outcomes and fitting such models in Stan or whatever.
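One intuition for why weak separation is hard, sketched with simulated data and not taken from the paper’s analysis: when the two components sit close together, the data barely distinguish the true mixture from a single Gaussian, so the likelihood carries very little information about the component means. The separation `delta`, the equal weights, and the unit variances are all assumptions chosen for illustration.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def mixture_loglik(xs, mu0, mu1, sigma, weight=0.5):
    """Log-likelihood of a two-component Gaussian mixture with a common sigma."""
    total = 0.0
    for x in xs:
        density = (weight * math.exp(normal_logpdf(x, mu0, sigma))
                   + (1 - weight) * math.exp(normal_logpdf(x, mu1, sigma)))
        total += math.log(density)
    return total

rng = random.Random(1)
n, delta = 1000, 0.5  # component means are only half a standard deviation apart
xs = [rng.gauss(-delta / 2 if rng.random() < 0.5 else delta / 2, 1.0) for _ in range(n)]

# Log-likelihood at the true mixture parameters.
ll_mixture = mixture_loglik(xs, -delta / 2, delta / 2, 1.0)

# Log-likelihood of the best-fitting single Gaussian (sample mean and variance).
mean = sum(xs) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
ll_single = sum(normal_logpdf(x, mean, sd) for x in xs)

gap_per_obs = abs(ll_mixture - ll_single) / n
print(f"mixture: {ll_mixture:.1f}, single Gaussian: {ll_single:.1f}, gap per obs: {gap_per_obs:.4f}")
```

With the likelihood this flat, it is not surprising that a point estimate of the component means can land almost anywhere. That is the general principle that carries over to fitting such models in Stan or whatever.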

Still more on the Heckman Curve!

Carlos Parada writes:

Saw your blog post on the Heckman Curve. I went through Heckman’s response that you linked, and it seems to be logically sound but terribly explained, so I feel like I need to explain why Rea+Burton is great empirical work, but it doesn’t actually measure the Heckman curve.

The Heckman curve just says that, for any particular person, there exists a point where getting more education isn’t worth it anymore because the costs grow as you get older, or equivalently, the benefits get smaller. This is just trivially true. The most obvious example is that nobody should spend 100% of their life studying, since then they wouldn’t get any work done at all. Or, more tellingly, getting a PhD isn’t worth it for most people, because most people either don’t want to work in academia or aren’t smart enough to complete a PhD. (Judging by some of the submissions to PPNAS, I’m starting to suspect most of academia isn’t smart enough to work in academia.)

The work you linked finds that participant age doesn’t predict the success of educational programs. I have no reason to suspect these results are wrong, but the effect of age on benefit:cost ratios for government programs does not measure the Heckman curve.

To give a toy model, imagine everyone goes to school as long as the benefits of schooling are greater than the costs for them, then drops out as soon as they’re equal. So now, for high school dropouts, what is the benefit:cost ratio of an extra year of school? 1 — the costs roughly equal the benefits. For college dropouts, what’s the benefit:cost ratio? 1 — the costs roughly equal the benefits. And so on. By measuring the effects of government interventions on people who completed x years of school before dropping out, the paper is conditioning on a collider. This methodology would only work if when people dropped out of school was independent of the benefits/costs of an extra year of school.
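The toy model can be simulated directly. In this sketch every person has a benefit:cost ratio for the marginal year of school that declines over time (an individual-level Heckman curve), and each person leaves school when that ratio hits a personal threshold near 1. The exponential decay, the time scale `TAU`, the distribution of starting ratios, and the threshold range of 0.9 to 1.1 are all hypothetical choices.

```python
import math
import random

TAU = 10.0  # hypothetical: years for the benefit:cost ratio to fall by a factor of e

def ratio(a, t):
    """Benefit:cost ratio of a marginal year of school at time t, for starting ratio a."""
    return a * math.exp(-t / TAU)

rng = random.Random(0)
groups = {"under 12 years": [], "12 to 18 years": [], "over 18 years": []}
for _ in range(5000):
    a = math.exp(rng.gauss(1.5, 0.4))      # people differ in how much school pays off
    threshold = rng.uniform(0.9, 1.1)      # imperfectly rational stopping rule
    t_stop = max(0.0, TAU * math.log(a / threshold))  # time at which ratio(a, t) == threshold
    marginal_ratio = ratio(a, t_stop)      # what an intervention at dropout would see
    if t_stop < 12:
        groups["under 12 years"].append(marginal_ratio)
    elif t_stop < 18:
        groups["12 to 18 years"].append(marginal_ratio)
    else:
        groups["over 18 years"].append(marginal_ratio)

group_means = {name: sum(vals) / len(vals) for name, vals in groups.items()}
for name, m in group_means.items():
    print(f"dropped out at {name}: mean marginal benefit:cost = {m:.2f}")
```

Every simulated individual has a declining curve, yet the marginal ratio measured at the point of dropout is roughly 1 in every group, so comparing programs by the age or schooling level of their participants shows no slope.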

(You don’t have to assume perfect rationality for this to work: If everyone goes to school until the benefit:cost ratio equals 1.1 or 0.9, you still won’t find a Heckman curve. Models that assume rational behavior tend to be robust to biases of this sort, although they can be very vulnerable in some other cases.)

Heckman seems to have made this mistake at some points too, though, so the authors are in good company. The quotes in the paper suggest he thought an individual Heckman curve would translate to a downwards-sloping curve for government programs’ benefits, when there’s no reason to believe they would. I’ve made very similar mistakes myself.


An econ undergrad who really should be getting back to his Real Analysis homework

Interesting. This relates to the marginal-or-aggregate question that comes up a lot in economics. It’s a common problem that we care about marginal effects but the data more easily allow us to estimate average effects. (For the statisticians in the room, let me remind you that “margin” has opposite meanings in statistics and economics.)

But one problem that Parada doesn’t address with the Heckman curve is that the estimates of efficacy used by Heckman are biased, sometimes by a huge amount, because of selection on statistical significance; see section 2.1 of this article. All the economic theory in the world won’t fix that problem.

P.S. In an amusing example of blog overlap, Parada informs us that he also worked on the Minecraft speedrunning analysis. It’s good to see students keeping busy!

Solution to that little problem to test your probability intuitions, and why I think it’s poorly stated

The other day I got this email from Ariel Rubinstein and Michele Piccione asking me to respond to this question which they sent to a bunch of survey respondents:

A very small proportion of the newborns in a certain country have a specific genetic trait.
Two screening tests, A and B, have been introduced for all newborns to identify this trait.
However, the tests are not precise.
A study has found that:
70% of the newborns who are found to be positive according to test A have the genetic trait (and conversely 30% do not).
20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).
The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Suppose that a newborn is found to be positive according to both tests.
What is your estimate of the likelihood (in %) that this newborn has the genetic trait?

Here was my response:

OK, let p = Pr(trait) in population, let a1 = Pr(positive test on A | trait), a2 = Pr(positive test on A | no trait), b1 = Pr(positive test on B | trait), b2 = Pr(positive test on B | no trait).
Your first statement is Pr(trait | positive on test A) = 0.7. That is, p*a1/(p*a1 + (1-p)*a2) = 0.7
Your second statement is Pr(trait | positive on test B) = 0.2. That is, p*b1/(p*b1 + (1-p)*b2) = 0.2

What you want is Pr(trait | positive on both tests) = p*a1*b1 / (p*a1*b1 + (1-p)*a2*b2)

It looks at first like there’s no unique solution to this one, as it’s a problem with 5 unknowns and just 2 data points!

But we can do that “likelihood ratio” trick . . .
Your first statement is equivalent to 1 / (1 + ((1-p)/p) * (a2/a1)) = 0.7; therefore (p/(1-p)) * (a1/a2) = 0.7 / 0.3
And your second statement is equivalent to (p/(1-p)) * (b1/b2) = 0.2 / 0.8
Finally, what you want is 1 / (1 + ((1-p)/p) * (a2/a1) * (b2/b1)). OK, this can be written as X / (1 + X), where X is (p/(1-p)) * (a1/a2) * (b1/b2).
Given the information above, X = (0.7 / 0.3) * (0.2 / 0.8) * (1-p)/p

Still not enough information, I think! We don’t know p.

OK, you give one more piece of information, that p is “very small.” I’ll suppose p = 0.001.

Then X = (0.7 / 0.3) * (0.2 / 0.8) * 999, which comes to 580, so the probability of having the trait given positive on both tests is 580 / 581 = 0.998.

OK, now let me check my math. According to the above calculations,
(1/999) * (a1/a2) = 0.7/0.3, thus a1/a2 = 2300, and
(1/999) * (b1/b2) = 0.2/0.8, thus b1/b2 = 250.
And then (p/(1-p))*(a1/a2)*(b1/b2) = (1/999)*2300*250 = 580.

So, yeah, I guess that checks out, unless I did something really stupid. The point is that if the trait is very rare, then the tests have to be very precise to give such good predictive power.

But . . . you also said “the tests are not precise.” This seems to contradict your earlier statement that only “a very small proportion” have the trait. So I feel like your puzzle has an embedded contradiction!

I’m just giving you my solution straight, no editing, so you can see how I thought it through.

Rubinstein and Piccione confirmed that my solution, that the probability is very close to 1, is correct, and they pointed me to this research article where they share the answers that were given to this question when they posed it to a bunch of survey respondents.
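For readers who want to play with the numbers, here is a minimal Python sketch of the likelihood-ratio calculation in my response. The function name `posterior_both_positive` is made up for illustration, and the values of p in the loop are assumptions, since the problem says only that p is "very small." With p = 0.001 the unrounded X is about 583 rather than the rounded 580 in my email, which makes no practical difference to the answer.

```python
def posterior_both_positive(ppv_a: float, ppv_b: float, p: float) -> float:
    """Pr(trait | positive on both tests).

    Assumes, as in the problem statement, that the two tests are
    conditionally independent given trait status.
    ppv_a, ppv_b: Pr(trait | positive on A) and Pr(trait | positive on B).
    p: base rate Pr(trait), which the problem does not pin down.
    """
    odds_a = ppv_a / (1 - ppv_a)      # equals (p/(1-p)) * (a1/a2)
    odds_b = ppv_b / (1 - ppv_b)      # equals (p/(1-p)) * (b1/b2)
    prior_odds = p / (1 - p)
    # The prior odds appear once in each single-test posterior,
    # so one copy has to be divided out to get X.
    x = odds_a * odds_b / prior_odds
    return x / (1 + x)


# Hypothetical base rates, to show how strongly the answer depends on p.
for p in (0.1, 0.01, 0.001, 0.0001):
    prob = posterior_both_positive(0.7, 0.2, p)
    print(f"p = {p:<7} Pr(trait | both positive) = {prob:.4f}")
```

The smaller the assumed base rate, the closer the answer gets to 1, which is the sense in which the puzzle cannot be solved without the "very small proportion" statement.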

I found the Rubinstein and Piccione article a bit frustrating because . . . they never just give the damn responses! The paper is very much in the “economics” style rather than the “statistics” style in that they’re very focused on the theory, whereas statisticians would start with the data. I’m not saying the economics perspective is wrong here—the experiment was motivated by theory, so it makes sense to compare results to theoretical predictions—I just found it difficult to read because there was never a simple plot of all the data.

My problem with their problem

But my main beef with their example is that I think it’s a trick question. On one hand, it says only “very small proportion” in the population have the trait; indeed, I needed that information to solve the problem. On the other hand, it says “the tests are not precise”—but I don’t think that’s right, at least not in the usual way we think about the precision of a test. With this problem description, they’re kinda giving people an Escher box and then asking what side is up!

To put it another way, if you start with “a very small proportion,” and then you take one test and it gets your probability all the way up to 70%, then, yeah, that’s a precise test! It takes a precise test to give you that much information, to take you from 0.001 to 0.7.

So here’s how I think the problem is misleading: The test is described as “not precise,” and then you see the numbers 0.7 and 0.2, so it’s natural to think that these tests do not provide much information. Actually, though, if you accept the other part of the problem (that only “a very small proportion” have the trait), the tests provide a lot of information. It seems strange to me to call a test which offers a likelihood ratio of 2300 as being “not precise.”

To put it another way: I think of the precision of a test as a function of the test’s properties alone, not of the base rate. If you have a precise test and then apply it to a population with a very low base rate, you can end up with a posterior probability of close to 50/50. That posterior probability depends on the test’s precision and also on the base rate.
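Here is a small Python sketch of that distinction. The likelihood ratio of 2331 is an assumption: it is the unrounded value of a1/a2 implied by p = 0.001 and the 70% figure for test A (my email rounds it to 2300). The function name `posterior` and the second base rate are likewise just for illustration.

```python
def posterior(base_rate: float, likelihood_ratio: float) -> float:
    """Pr(trait | positive test) from the base rate and the test's
    likelihood ratio Pr(positive | trait) / Pr(positive | no trait)."""
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Assumed value: 999 * (0.7 / 0.3), the a1/a2 implied by p = 0.001.
LR_A = 2331.0

# Same test, two hypothetical populations.
print(posterior(0.001, LR_A))      # base rate of 1 in 1000
print(posterior(1 / 2332, LR_A))   # rarer trait: prior odds of 1 to 2331
```

The likelihood ratio is a property of the test and stays fixed across the two calls; only the base rate changes, and the posterior moves from about 70% to about 50/50.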

I guess they could try out this problem on a new set of respondents, where instead of describing the tests as “not precise,” they describe them as “very precise,” and see what happens.

One more thing

On page 11 of their article, Rubinstein and Piccione give an example where different referees have independent data in their private signals, when trying to determine if a defendant is guilty of a crime. This does not seem plausible in the context of deciding whether a defendant is guilty. I think it would make more sense to say that they have overlapping information. This does not change the math of the problem (you can think of their overlapping information, along with the base rate, as being a shared “prior,” and the non-overlapping information corresponds to the two data points in your earlier formulation), but it would make the example more realistic.

I understand that this model is just based on the literature. I just have political problems with oversimplified models of politics, juries, etc. I’d recommend that the authors either use a different “cover story” or else emphasize that this is just a mathematical story not applicable to real juries. In their paper, they talk about “the assumption that people are Bayesian,” but I’m bothered by the assumption that different referees have independent data in their private signals. That’s a really strong assumption! It’s funny which assumptions people will question and which assumptions they will just accept as representing neutral statements of a problem.

A connection to statistical inference and computing

This problem connects to some of our recent work on the computational challenges of combining posterior distributions. The quick idea is that if theta is your unknown parameter (in this case, the presence or absence of the trait) and you want to combine posteriors p_k(theta|y_k) from independent data sources y_k, k=1,…,K, then you can multiply these posteriors but then you need to divide by the factor p(theta)^(K-1), because each of the K posteriors already includes one copy of the prior. Dividing by the prior to a power in this way will in general induce computational instability. Here is a short paper on the problem and here is a long paper. We’re still working on this.
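To see the connection concretely, here is a toy Python sketch for a discrete parameter, applied to the two-test problem above. It is not the method from either paper, just the multiply-then-divide-by-the-prior identity done on the log scale. The function name `combine_posteriors` and the prior of 0.001 are assumptions for illustration.

```python
import math


def combine_posteriors(posteriors, prior):
    """Combine posteriors over a discrete parameter, each computed from an
    independent data source under the same prior.

    posteriors: list of lists, one per data source; posteriors[j][s] is
                Pr(theta = s | y_j).
    prior:      list; prior[s] is Pr(theta = s).
    Returns the normalized combined posterior Pr(theta = s | y_1, ..., y_J).
    """
    n_sources = len(posteriors)
    log_unnorm = []
    for s, prior_s in enumerate(prior):
        # Product of posteriors divided by prior^(n_sources - 1), in logs.
        log_val = sum(math.log(post[s]) for post in posteriors)
        log_val -= (n_sources - 1) * math.log(prior_s)
        log_unnorm.append(log_val)
    # Subtract the max before exponentiating to avoid overflow.
    m = max(log_unnorm)
    weights = [math.exp(v - m) for v in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]


# States are [trait, no trait]; the prior of 0.001 is an assumed value.
prior = [0.001, 0.999]
post_a = [0.7, 0.3]   # posterior after a positive on test A
post_b = [0.2, 0.8]   # posterior after a positive on test B
combined = combine_posteriors([post_a, post_b], prior)
print(combined[0])
```

With two states the division is harmless. The trouble described above shows up when the posteriors are only known approximately (for example, from simulation draws), so that dividing by a small prior density raised to a power amplifies whatever errors are there.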

This journal is commissioning a sequel to one of my smash hits. How much will they pay me for it? You can share negotiation strategies in the comments section.

I know it was a mistake to respond to this spam but I couldn’t resist . . . For the rest of my days, I will pay the price of being on the sucker list.

The following came in the junk mail the other day:

Dear Dr. Andrew Gelman,

My name is **, the editorial assistant of **. ** is a peer-reviewed, open access journal published by **.

I have had an opportunity to read your paper, “Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs”, and can find that your expertise fits within the scope of our journal quite well.
Therefore, you are cordially invited to submit new, unpublished manuscripts to **. If you do not have any at the moment, it is appreciated if you could keep our journal in mind for your future research outputs.

You may see the journal’s profile at ** and submit online. You may also e-mail submissions to **.

We are recruiting reviewers for the journal. If you are interested in becoming a reviewer, we welcome you to join us. Please find the application form and details at ** and e-mail the completed application form to **.

** is included in:
· CrossRef; EBSCOhost; EconPapers
· Gale’s Academic Databases
· GetInfo; Google Scholar; IDEAS
· J-Gate; Journal Directory
· JournalTOCs; LOCKSS
· MediaFinder®-Standard Periodical Directory
· RePEc; Sherpa/Romeo
· Standard Periodical Directory
· Ulrich’s; WorldCat
Areas include but are not limited to:
· Accounting;
· Economics
· Finance & Investment;
· General Management;
· Management Information Systems;
· Business Law;
· Global Business;
· Marketing Theory and Applications;
· General Business Research;
· Business & Economics Education;
· Production/Operations Management;
· Organizational Behavior & Theory;
· Strategic Management Policy;
· Labor Relations & Human Resource Management;
· Technology & Innovation;
· Public Responsibility and Ethics;
· Public Administration and Small Business Entrepreneurship.

Please feel free to share this information with your colleagues and associates.

Thank you.

Best Regards,

Editorial Assistant
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

Usually I just delete these things, but just the other day we had this discussion of some dude who was paid $100,000 to be the second author on a paper. Which made me wonder how much I could make as a sole author!

And this reminded me of this other guy who claimed that scientific citations are worth $100,000 each. A hundred grand seems like the basic unit of currency here.

So I sent a quick response:

Hi–how much will you pay me to write an article for your journal?

I’m not expecting $100,000 as their first offer—they’ll probably lowball me at first—but, hey, I can negotiate. They say the most important asset in negotiation is the willingness to say No, and I’m definitely willing to say No to these people!

Just a few hours later I received a reply! Here it is:

Dear Dr. Andrew Gelman,

Thanks for your email. We charge the Article Processing Charge (Formatting and Hosting) of 100USD for per article.

Welcome to submit your manuscript to our journal. If you have any questions, please feel free to contact me.

Best Regards,

Editorial Assistant
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

I don’t get it. They’re offering me negative $100? That makes no sense? What next, they’ll offer to take my (fully functional) fridge off my hands for a mere hundred bucks?? In what world am I supposed to pay them for the fruits of my labor?

So I responded:

No, I would only provide an article for you if you pay me. It would not make sense for me to pay you for my work.

No answer yet. If they do respond at some point, I’ll let you know. We’ll see what happens. If they offer me $100, I can come back with a counter-offer of $100,000, justifying it by the two links above. Then maybe they’ll say they can’t afford it, they’ll offer, say, $1000 . . . maybe we can converge around $10K. I’m not going to share the lowest value I’d accept—that’s something the negotiation books tell you never ever to do—but I’ll tell you right now, it’s a hell of a lot more than a hundred bucks.

P.S. That paper on higher-order polynomials that they scraped, er, carefully vetted for suitability for their journal . . . according to Google Scholar it has 1501 citations, which implies a value of $150,100,000, according to the calculations referred to above. Now, sure, most of that value is probably due to Guido, my collaborator on that paper, but still . . . 150 million bucks! How hard could it be to squeeze out a few hundred thousand dollars for a sequel? It says online that Knives Out grossed $311.4 million, and Netflix paid $469 million for the rights for Knives Out 2 and 3. If this academic publisher doesn’t offer me a two-paper deal that’s at least in the mid four figures, my agent and I will be taking our talents to Netflix.

What can $100,000 get you nowadays? (Discussion of the market for papers in economics.)

Someone who would prefer to remain anonymous writes:

Before anything else and like many of the people who write to you, as an early career academic in the area of economics, I feel constrained in the criticisms that I can make publicly. So I have to kindly request that, if you do publish any of what follows, that identifying information about myself be not made public.

My correspondent continues:

The recently disclosed leaks at Uber revealed some, well, distressing behaviour by some of my peers. As the Guardian recently wrote, several noted economists collaborated with Uber when writing some academic papers. In short, Uber shared data with a selected group of economists, paid them and had Uber economists collaborating with the authors.

The act of collaboration is not, in itself, necessarily a bad thing, as the potential access to proprietary data allows research to be done that otherwise would not be possible. But the way things were done raises several issues, many of which you have commented on before. I’d like to focus on one in particular: how do we deal with studies done with closed data shared by interested parties?

In the leaked emails, Uber staffers wrote that, concerning one economist who wanted to do a separate unpaid study using Uber’s shared data: “We see low risk here because we can work with Landier on framing the study and we also decide what data we share with him.” I.e., the issue here isn’t just of replication, already a serious concern, but the risk that a company may omit data so as to influence academics doing a study, so as to frame things in the best light possible. It is distressing to read that executives wanted to work on the report’s message “to ensure it’s not presented in a potentially negative light”.

Perhaps I’m being a bit too naive about all this, in that the obvious question when seeing a study like this is to ask why would Uber be ready to collaborate unless they are going to get what they wanted? Indeed, I recall being a little bit sceptical about Hall and Krueger’s initial NBER paper when I read it. But the excerpts produced by The Guardian are so much worse than I’d have feared; even if we don’t know the exact extent to which Uber acted as these excerpts describe, it’s hard not to fear the worst. Where does that leave us with these studies? Should we dismiss them altogether or can we salvage some of their analyses?

There is one small bright spot in all of this. Because I am not a Labour economist, I only ever read Hall and Krueger’s initial NBER paper, so missed Berg and Johnston’s later critique, where the issue of inadequate data was already being raised, among other, even stronger criticisms. So even before the emails, there were some people tackling these issues head on. And, to the credit of ILR Review, the journal that published Hall and Krueger’s paper, the critique was published by the journal itself, unlike what happens so often.

Anyway, that’s all. I guess I’m shocked at how much worse things seem to be, at how willing Uber was to manipulate and try to use well-regarded academics’ reputations to, it seems, launder their own reputation…

Some googling revealed this exchange between financial journalist Felix Salmon and economists Michael Strain and Justin Wolfers:

This followup wins the thread:

Lots to chew on here, so let me go through some issues one at a time:

1. Conflicts of interest

It’s easy to get all “moralistic” about this. It’s also easy to get all “It’s easy to get all ‘moralistic’ about this” about this.

So let’s be clear: the conflict of interest here is real; indeed, it’s hard to get much more “conflict of interest” than “Company pays you $100,000 to write a paper, then you write a paper favorable to that company.” At the same time, there’s nothing necessarily morally wrong about having a conflict of interest. It is what it is. Every year I fill out my conflict of interest form with Columbia. “Conflict of interest” is a description, not a pejorative.

With regard to the Hall and Krueger paper, the dispute was not whether there was a conflict of interest but rather (a) whether the conflict was sufficiently disclosed, and (b) how this conflict should affect the trust that policymakers would hold in its conclusions.

I don’t have a strong feeling about the disclosure issue—Salmon holds that the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written,” is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.” I dunno. I’ve written lots of acknowledgments and I don’t recall ever mentioning the dollar value. It seems to me pretty clear that if you have one author who worked at the company and another author who was paid by the company, the conflict is there, whether it’s $1000 or $100,000.

Regarding trust: yeah, with this level of conflict, you’d want to see some data analyses by an outside team, like with that Google chip-layout paper.

News reports should be more clear on this. A headline, “Ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms,” should be replaced by something like “Uber-paid study reports that ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms.” Just add “Uber-paid study” to the beginning of every headline!

2. Morality

I don’t get this reaction:

I mean, sure, I don’t like Uber either. Some people think a company like Uber is cool, some people think it’s creepy. Views differ. But “sell their souls . . . destroy their lives . . . especially distressing”? That seems a bit strong. Consider: back in 2015, economists absolutely loved Uber, which is no surprise given that economists loved to talk about the problems with the market for taxicabs, the famous medallion system, etc. Economists hated taxi regulation about as much as they hated rent control and other blatant restrictions on free enterprise. Economists on the center-left, economists on the center-right, they all hated those regulations, so it’s no surprise that they loved Uber. The company was an economist’s dream, along with being super convenient for users.

3. Interesting data

I get Wolfers’s point that Krueger would find the Uber data interesting. I would have too! Indeed, had Uber offered me $100,000, or even $50,000, I probably would’ve worked for them too. I can’t be sure (they never approached me, and it’s possible that I would’ve said no), but, if I had said no, it wouldn’t have been because their data were not sufficiently interesting. The point Wolfers seems to be missing here is that God is in every leaf of every tree. Yes, Uber data are interesting, but lots of other data are interesting too. Ford’s data are interesting, GM’s data are interesting, Bureau of Labor Statistics data are interesting. Lots of interesting data out there, and often people will choose what to look at based on who is paying them. I think one missing piece in the public discussions of this case is how much economists looooved Uber back then: it was a symbol of all that was good about the free market! So they found these data to be particularly interesting.

4. What can you get for $100,000?

A funny thing about the discussion is how little an amount of money $100,000 seems to be to commenters, and that includes people on both sides of the issue! Wolfers thinks that $100,000 is so small that it is “extremely unlikely” that Krueger would write a paper for that paltry sum. From the other direction, Dubal thinks it’s “pathetic” that he would “violate basic rules of research for just 100,000.”

I only met Krueger once, so I can’t speak to his motivations, but I will say that being the second author on a paper can sometimes be pretty easy, and $100,000 is real money! For example, suppose Krueger’s consulting rate was $2000/hour. He should be able to do the work required to be second author on a paper in less than 50 hours. The disclosure in that article says he was working as a consultant during the 2-month period when the initial draft of the paper was written. Spending 50 hours on a project during a 2-month period, that’s plausible. So I can’t really see why Wolfers thinks this is “extremely unlikely.”

There is an alternative story, though, consistent with what Wolfers hypothesizes, which is that Krueger would’ve coauthored the paper anyway but took the $100,000 because it was offered to him, and who wouldn’t take free money? I’m willing to believe that story, actually. This also works as a motivation of Uber: they offered free money to Krueger for something he would’ve done anyway, just to give him an excuse to clear his schedule to do the work. So, he didn’t coauthor a paper for $100,000; he coauthored a paper for free and then accepted the $ to motivate himself to do it. Meanwhile, from Uber’s perspective, the money is well spent, in the same way that the National Science Foundation is motivated to pay me to free up my time to do research that they think will be valuable to society.

Regarding Dubal’s comment: I don’t see what “basic rules of research” were being violated? Not sharing your data? If working with private data is violating a basic rule of research, fine, but then scientists are doing that for free every day. If you set your time machine back to 2015, and you consider Krueger as an economist who thinks that Uber is cool, then getting paid by them and working with their data, that’s even cooler, right? I imagine that for an economist in 2015, working with Uber is as cool as working with a pro sports team would be for a statistician. Getting paid makes it even cooler: then you’re an insider! Sure, Krueger was a big-name academic, he’d served at the highest levels of government, and according to Wolfers he was doing well enough that $100,000 meant nothing to him. Still, working with Uber, as an economist I’ll bet he thought that would be something special. Again, Uber in 2015 had a different aura than Uber today.

5. “Laundering” and the role of academia and the news media in a world that’s mostly run by business and government

What exactly was it about the Hall and Krueger paper that bothered people so much? I don’t think it was simply that these guys were working for Uber or that, given that Uber was paying them, they’d write a report with a pro-Uber spin. I think what is really bugging the critics is the sense that academia, in this case the NBER (National Bureau of Economic Research), an influential club or organization of academic economists, was being used to launder this money.

If Hall and Krueger were to publish a book, The Case for Uber, published by Uber Press, arguing that Uber is great, then it’s hard to see the problem. These guys chose to take the job and they did it. But when published as an NBER preprint, and one of the authors is a respected academic, it seems different—even with the disclosure statement.

Again, it’s a problem with the news media too, to the extent that they reported this study in the way they’d report straight-up academic research, without the “Uber-funded study claims . . .” preamble to every headline and sentence describing the published findings.

This all kinda makes me think of another well-known economist, John Kenneth Galbraith, who wrote about “countervailing power.” Galbraith was talking about economic and political power, but something similar arises in the world of ideas. Government and business are the 800-pound gorillas, and we often like to think of academia and advocacy organizations as representing countervailing power. When industry or government inserts propaganda into academic channels, this is bothersome in the same way that “astroturf” lobbying seems wrong. It’s bad for its own sake—fooling people and potentially affecting policy through disinformation—and also bad in that it degrades the credibility of un-conflicted scientific research or genuine grassroots organizing.

In saying this, I recognize that there’s no pure stance in science just as there’s no such thing as pure grassroots politics: Scientists have prior beliefs and political attitudes, and they also have to be paid from somewhere, and the same goes for grassroots organizers. Those mimeograph machines don’t run themselves. So there’s a continuous range here. But getting $100,000 for two months of work to coauthor a paper, that’s near the extreme end of the range.

What I’m getting at here is that, while there is indignation aimed at Krueger here, I think what’s really going on is indignation at perceived manipulation of the system. One way to see this is that nobody seems particularly angry at the Uber executives or even at Hall, the first author of that paper. If it’s bad science, you should be mad at the people who promoted it and the person who did the work, no? I think there’s this attitude that the full-time Uber employees were just doing their jobs, whereas Krueger, who was just a consultant, was supposed to have had a higher loyalty to academia.

6. Politics

There’s one other thing I wanted to get to, which was Wolfers’s attitude that Krueger needed to be defended. (Again, nobody seemed to feel the need to defend Hall for his role in the project.)

One part of the defense was the silly claim that he wouldn’t have done it for the money, but I think underlying there were two implicit defenses:

First, conflict of interest sounds like a bad thing, Krueger was a good person, and therefore he couldn’t’ve had a conflict of interest. I don’t think this argument makes sense—I see conflict of interest not as an aspect of character but as an aspect of the situation. When I write about Columbia University or any organization that is paying me or my family, I have a conflict of interest. I can still try to be objective, but even if I have pure objectivity, the conflict of interest is still there. It’s inherent in the interaction, not in me.

Second, Krueger is a political liberal so therefore he couldn’t have issued a report unduly favorable to Uber, because liberals are skeptical of corporations. I don’t think this argument works either, first because, as noted above, back in 2015 economists of a wide range of political stripes considered Uber to be awesome, and second because Krueger, while political, was not known as a hack. He works with Uber, they tell him good things about the company, he coauthors a positive report.

I always wondered if something similar was going on when the economist James Heckman hypes early childhood intervention programs. Heckman is a political conservative, and one would expect him to be skeptical of utopian social spending programs. So when he and his collaborators found (or, to be precise, thought they found) strong evidence of huge effects of these programs, it was natural for him to think that his new stance was correct—after all, he came to it despite his political convictions.

But it doesn’t work that way. Yes, you can be biased to come out with a result that confirms your preconceptions. But when you come out with a result that rocks your world, that could be a mistake too.

7. Who to credit or blame

I agree with my correspondent, who focused the blame (or, depending on your perspective, the credit) for this episode on Uber management. The online discussion seemed to be all about the economist who consulted for Uber and was the second author of the paper, but really it seems that we should think of Uber, the organization, as the leader of this endeavor.

Full disclosure: I’ve been paid real money by lots of organizations that have done bad things, including pharmaceutical companies, tech companies, and the U.S. Department of Defense.

P.S. Interesting comment here from economist Peter Dorman.

P.P.S. More here.

“Predicting consumers’ choices in the age of the internet, AI, and almost perfect tracking: Some things change, the key challenges do not”

David Gal writes:

I wanted to share the attached paper on choice prediction that I recently co-authored with Itamar Simonson in case it’s of interest.

I think it’s somewhat related to your work on the Piranha Problem, in that it seems, in most cases, most of the explainable variation in people’s choices is accounted for by a few strong, stable tendencies (and these are often captured by variables that are relatively easy to identify).

I also wrote a brief commentary based on the article in Fortune.

And here’s the abstract to the paper:

Recent technology advances (e.g., tracking and “AI”) have led to claims and concerns regarding the ability of marketers to anticipate and predict consumer preferences with great accuracy. Here, we consider the predictive capabilities of both traditional techniques (e.g., conjoint analysis) and more recent tools (e.g., advanced machine learning methods) for predicting consumer choices. Our main conclusion is that for most of the more interesting consumer decisions, those that are “new” and non-habitual, prediction remains hard. In fact, in many cases, prediction has become harder due to the increasing influence of just-in-time information (user reviews, online recommendations, new options, etc.) at the point of decision that can neither be measured nor anticipated ex ante. Sophisticated methods and “big data” can in certain contexts improve predictions, but usually only slightly, and prediction remains very imprecise, so much so that it is often a waste of effort. We suggest marketers focus less on trying to predict consumer choices with great accuracy and more on how the information environment affects the choice of their products. We also discuss implications for consumers and policymakers.

Sophisticated statistics is often a waste of effort . . . Oh no, that’s not a message that I want spread around. So please, everyone, keep quiet about this paper. Thanks!

Nimby/Yimby thread

Mark Palko and Joseph Delaney share their Nimby/Yimby thread:


Yes, YIMBYs can be worse than NIMBYs — the opening round of the West Coast Stat Views cage match


Yes, YIMBYs can be worse than NIMBYs Part II — Peeing in the River


The cage match goes wild [JAC]


Krugman then told how the ring of mountains almost kept the Challenger Expedition from finding the lost city of Los Angeles


Cage match continues on development [JAC]


Yes, YIMBYs can be worse than NIMBYs Part III — When an overly appealing narrative hooks up with fatally misaligned market forces, the results are always ugly.


Did the NIMBYs of San Francisco and Santa Monica improve the California housing crisis?


A primer for New Yorkers who want to explain California housing to Californians


A couple of curious things about Fresno


Does building where the prices are highest always reduce average commute times?


Housing costs [JAC]


Urbanism [JAC]


Either this is interesting or I’m doing something wrong

And a study we’ll want to come back to:

A spatiotemporal analysis of transit accessibility to low-wage jobs in Miami-Dade County


Tuesday, December 21, 2021

The NYT weighs in again on California housing and it goes even worse than expected

I’m no expert on this topic. I have Yimby sympathies—a few years ago I recall seeing some leaflets opposing the building of a new tower in the neighborhood, and I think I wrote a letter to our city councilmember saying they shouldn’t be swayed by the obstructionists—but I’m open to some of the arguments listed above. Palko and Delaney are pushing against conventional narratives that are often unthinkingly presented in the news media.

Don’t go back to Rockville: Possible scientific fraud in the traffic model for a highway project?

Ben Ross of the Maryland Transit Opportunities Coalition writes:

You may be interested in the attached letter I sent to the U.S. Dept. of Transportation yesterday, presenting evidence that suggests scientific fraud in the traffic model being used by the Maryland Dept. of Transportation to justify a major highway project in Maryland. We request that USDOT make an independent examination of the model and that it release the input and output data files to expert outside reviewers. (A request for the data files was already made, and the requesters were told that the request was being handled under the state’s FOIA-equivalent law and no response could be expected until after the project gets its approval.)

Ross also points to this news article by Bruce DePuyt that gives some background on the story. It seems that the state wants to add some lanes to the Beltway.

I’ve not read the documents in any sort of detail so I won’t comment on the claim of fraud except to make a meta-point. Without making any comment whatsoever about this particular report but just speaking in general, I think that projections, cost-benefit analyses, etc. are often beyond truth or fraud, in that an organization will want to make a decision and then they’ll come up with an analysis to support that goal, kind of in the same way that a turn-the-crank style scientist will say, “We did a study to prove . . .” So, sure, the analysis might be completely bogus with made-up numbers, but it won’t feel like “fraud” to the people who wrote the report, because the numbers aren’t derived from anything beyond the goal of producing the desired result. Just like all those projects that end up costing 5x what was stated in the original budget plan: those budgets were never serious, they were just lowball estimates created with the goal of getting the project moving.

In any case, it seems good that people such as Ross are looking at these reports and pointing out potential problems, and these can be assessed by third parties. After all, you don’t want to waste another year.

Forking paths in the estimate of the value premium?

Mathias Hasler writes:

I have a working paper about the value premium and about seemingly innocuous decisions that are made in the research process. I wanted to share this working paper with you because I think that you may find it interesting and because your statistics blog kept me encouraged to work on it.

In the paper, I study whether seemingly innocuous decisions in the construction of the original value premium estimate (Fama and French, 1993) affect our inference on the underlying value premium. The results suggest that a large part of the original value premium estimate is the result of chance in these seemingly innocuous research decisions.

Here’s the abstract of the paper:

The construction of the original HML portfolio (Fama and French, 1993) includes six seemingly innocuous decisions that could easily have been replaced with alternatives that are just as reasonable. I propose such alternatives and construct HML portfolios. In sample, the average estimate of the value premium is dramatically smaller than the original estimate of the value premium. The difference is 0.09% per month and statistically significant. Out of sample, however, this difference is statistically indistinguishable from zero. The results suggest that the original value premium estimate is upward biased because of a chance result in the original research decisions.

I’m sympathetic to this claim for the usual reasons, but I know nothing about this topic of the value premium, nor have I tried to evaluate this particular paper, so you can make of it what you will. I’d be happier if it had a scatterplot somewhere.
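
To make the forking-paths idea concrete, here is a minimal sketch in Python of what it means to recompute a high-minus-low estimate under every combination of a few construction decisions. Everything in it is an assumption for illustration: the data are simulated, the function names (make_panel, hml_estimate, all_paths) are hypothetical, and the three decisions (breakpoint percentile, value versus equal weighting, how many early months to drop) are stand-ins, not the six decisions studied in Hasler's paper.

```python
import itertools
import random
import statistics


def make_panel(n_stocks=200, n_months=120, premium=0.3, seed=0):
    """Synthetic data only: each stock gets a fixed book-to-market score and a size.
    Monthly return (in percent) = premium * score + noise."""
    rng = random.Random(seed)
    stocks = [{"bm": rng.gauss(0, 1), "size": rng.lognormvariate(0, 1)}
              for _ in range(n_stocks)]
    returns = [[premium * s["bm"] + rng.gauss(0, 5) for s in stocks]
               for _ in range(n_months)]
    return stocks, returns


def hml_estimate(stocks, returns, cutoff_pct, value_weighted, skip_months):
    """Average monthly high-minus-low return under one combination of decisions."""
    # Rank stocks by book-to-market and take the bottom and top cutoff_pct percent.
    ranked = sorted(range(len(stocks)), key=lambda i: stocks[i]["bm"])
    k = len(ranked) * cutoff_pct // 100
    low, high = ranked[:k], ranked[-k:]

    def portfolio_return(month, members):
        # Weight by size if value_weighted, otherwise weight all members equally.
        weights = [stocks[i]["size"] if value_weighted else 1.0 for i in members]
        total = sum(w * month[i] for w, i in zip(weights, members))
        return total / sum(weights)

    # Drop the first skip_months months, then average the monthly spread.
    spreads = [portfolio_return(m, high) - portfolio_return(m, low)
               for m in returns[skip_months:]]
    return statistics.mean(spreads)


def all_paths(stocks, returns):
    """One estimate per combination of the three hypothetical decisions."""
    grid = itertools.product([20, 30, 40], [True, False], [0, 6, 12])
    return {choice: hml_estimate(stocks, returns, *choice) for choice in grid}


stocks, returns = make_panel()
estimates = all_paths(stocks, returns)
values = sorted(estimates.values())
print("paths:", len(values))
print("min, mean, max:", values[0], statistics.mean(values), values[-1])
```

The point of the exercise is just that a single reported estimate is one draw from this set of 18. If the reported one happens to sit near the top, the average over all the paths will be smaller, which is the pattern the abstract describes. With real data you'd want to plot all the estimates rather than just print the range.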

Gaurav Sood’s review of the book Noise by Kahneman et al.: In writing about noise in human judgment, the authors didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here it is. This should interest some of you. Gaurav makes a convincing case that:

1. The main topic of the book—capriciousness in human judgment—is important, it’s worth a book, and the authors (Kahneman, Sibony, and Sunstein) have an interesting take on it.

2. Their recommendations are based on a selective and uncritical review of an often weak literature, for example this description of a study which seems about the closest thing possible to a Brian Wansink paper without actually being by Brian Wansink:

“When calories are on the left, consumers receive that information first and evidently think ‘a lot of calories!’ or ‘not so many calories!’ before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think ‘delicious!’ or ‘not so great!’ before they see the calorie label. Here again, their initial reaction greatly affects their choices. This hypothesis is supported by the authors’ finding that for Hebrew speakers, who read right to left, the calorie label has a significantly larger impact.”

Kinda stunning that they could write this with a straight face, given all we’ve heard about the Cornell Food and Brand Lab, etc.

In writing about noise in human judgment, Kahneman, Sibony, and Sunstein didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here are my comments on the book and its promotional material from last year. I was pretty frustrated with the authors’ apparent unfamiliarity with the literature on variation and noise in statistics and economics associated with very famous figures such as W. E. Deming and Fischer Black. In his review, Gaurav persuaded me that the authors of Noise were on to something interesting, which makes me even sadder that they plowed ahead without more reflection and care. Maybe in the future someone can follow up with an article or book on the topic with the virtues of point 1 above and without the defects of point 2.

Actually, maybe Gaurav can do this! A book’s a lot, but an article fleshing out point 1 in a positive way, without getting snowed by noisy evidence or bragging about “discovering a new continent,” actually linking the theme of noise in human judgment to the challenges of interpreting research results . . . this could be really useful. So I’m glad he took the trouble to read the book and write his review.

Krugman in the uncanny valley: A theory about east coast pundits and California

One of Palko’s pet peeves is East Coast media figures who don’t understand California. To be more specific, the problem is that they think they know California but they don’t, which puts them in a sort of uncanny San Fernando or Silicon Valley of the mind.

He quotes New York Times columnist Paul Krugman, who first writes about Silicon Valley and the Los Angeles entertainment complex and then continues:

California as a whole is suffering from gentrification. That is, it’s like a newly fashionable neighborhood where affluent newcomers are moving in and driving working-class families out. In a way, California is Brooklyn Heights writ large.

Yet it didn’t have to be this way. I sometimes run into Californians asserting that there’s no room for more housing — they point out that San Francisco is on a peninsula, Los Angeles ringed by mountains. But there’s plenty of scope for building up.

As Palko points out (but unfortunately nobody will see, because he has something like 100 readers, as compared to Krugman’s million), “SF is not part of Silicon Valley; it’s around fifty miles away” and “Neither the city nor the county of LA is ringed with mountains.” This is not to say that Krugman is wrong about making it possible for developers to build up—but this seems more of a generic issue of building apartments where people want to live, rather than restricting construction to available land that’s far away. As Palko’s co-blogger points out, rising housing prices are a problem even in a place like London, Ontario, “a mid-sized city with a mid-ranked university and a 9-10% unemployment rate” not ringed by mountains or anything like that. I’ve heard that rents in Paris are pretty high too, and that’s not ringed by mountains either.

Here’s my theory. When East Coast media figures write about Texas, say, or Minnesota or Oregon or even Pennsylvania, they know they’re starting from a position of relative ignorance and they’re careful to check with the local experts (which leads to the much-mocked trope of the interview with locals in an Ohio diner). And when they write about NYC or Washington D.C. or whatever suburb they grew up in . . . well, then they might have a biased view of the place but at least they know where everything is. But California is the worst of both worlds: they’re familiar enough with the place to write about it but without realizing the limitations of their understanding.

The point here is not that Krugman is the worst or anything like that, even just restricting to the New York Times. I’ve complained before about pundits not correcting major errors in their columns. Krugman’s a more relevant example for the present post because his columns are often informed by data, so it’s interesting when he gets the data wrong.

As we’ve discussed before, to get data wrong, two things need to happen:
1. You need to be misinformed.
2. You need to not realize you’re misinformed.
That’s the uncanny valley—where you think you know something but you don’t—and it’s a topic that interests me a lot because it seems to be where so many problems in science arise.

P.S. It was funny that Krugman picked Brooklyn Heights, of all places. He’s a baby boomer . . . maybe he had The Patty Duke Show in mind. The family on that show was white collar, in a Father Knows Best kind of way, but my vague impression is that white collar was the default on TV back then. Not that all or even most shows had white collar protagonists—I guess that from our current perspective the most famous shows from back then are Westerns, The Honeymooners, and Lucy, none of which fit the “white collar” label, but I still think of white-collar families as representing the norm. In any case, I guess the fictional Brooklyn Heights of The Patty Duke Show was filled with salt-of-the-earth working-class types who would’ve been played for laughs by character actors. Now these roles have been gentrified and everyone on TV is pretty. Actually I have no idea what everyone on TV looks like; I know about as much about that as NYT columnists know about California geography.

P.P.S. Unrelatedly—it just happened to appear today—Palko’s co-blogger Delaney does a good job at dismantling a bad argument from Elon Musk. In this case, Musk’s point is made in the form of a joke, but it’s still worth exploring what’s wrong with what he said. Arguing against a joke is tricky so I think Delaney gets credit for doing this well.

Consequences are often intended. And, yes, religion questions in surveys are subject to social desirability bias.

Someone recommended the book, “Hit Makers: The Science of Popularity in an Age of Distraction,” by Derek Thompson. It had a lot of good things, many of which might be familiar to the readers of this blog but with some unexpected stuff too.

Here though, I want to mention a couple of things in the book that I disagreed with.

On page 265, Thompson writes:

Seems like a fair thumbnail description. But why call it an “unintentional” manslaughter? Many of the online advertising ventures were directly competing with newspapers, no? They knew what they were doing. This is not to say they were evil—business is business—but “unintentional” doesn’t seem quite right. This struck me because it reminded me of the “unintended consequences” formulation, which I think is overused, often applied to consequences that were actually intended. The idea of unintended consequences is just so appealing that it can be applied indiscriminately.

The other is on page 261, where Thompson writes that religion is an area where “researchers found no evidence of social desirability bias.” This in the context of a discussion of errors in opinion surveys.

But I’m pretty sure that Thompson’s completely wrong on this one. Religion is a famous example of social desirability bias in surveys: people say they attend church—this is something that’s socially desirable—at much higher rates than they actually do. And researchers have studied this! See here, for example. I could see how Thompson wouldn’t have necessarily heard of this research; what surprised me is that he made such a strong statement that there was no bias. I wonder what he was thinking?

Fun July 4th statistics story! Too many melons, not enough dogs.

Just in time before the holiday ends . . . A correspondent who wishes to remain anonymous points us to this:

Apparently this was written by our former ambassador to the United Nations. I googled and her bachelor’s degree was in accounting! They teach you how to take averages in accounting school, don’t they?? So I’m guessing this particular post was written by someone less well-educated, maybe a staffer with a political science degree or something like that.

But what really gets me is, who eats only 1 hot dog on July 4th? No burger, no chicken, just one hot dog?? Is this staffer on a diet, or what?? Also, get it going, dude! Throw in a burger and some chicken breasts and you can get that inflation rate over 100%, no?

Meanwhile this person’s eating an entire watermelon? The wiener/watermelon ratio at this BBQ is totally wack. I just hope these staffers are more careful with their fireworks tonight than they were with their shopping earlier today. What’re they gonna do with all those extra watermelons?