Blogs > Twitter again

As we’ve discussed many times, I prefer blogs to twitter because in a blog you can have a focused conversation where you explain your ideas in detail, whereas twitter seems like more of a place for position-taking.

An example came up recently that demonstrates this point. Jennifer sent me a blurb for her causal inference conference and I blogged it. This was an announcement and not much more; it could’ve been on twitter without any real loss of information. A commenter then shot back:

Do you see how your policies might possibly negatively impact an outlier such as myself, when you arbitrarily reward contestants for uncovering effects you baked in? How do you know winners just haven’t figured out how you think about manipulating data to find effects? How far removed from my personal, actual, non-ergodic life are your statistical stories, and what policies that impede me unintentionally are you contributing to?

OK, this has more words than your typical twitter post, but if I saw it on twitter I’d be cool with it: it’s an expression of strong disagreement.

It’s the next step where things get interesting. When I saw the above comment, my quick reaction was, “What a crank!” And of course I have no duty to respond at all; responding to blog comments is something I can do for fun when I have the time for it: it can be helpful to explore the limits of what can be communicated. In a twitter setting I think the appropriate response would be some snappy comeback.

But this is a blog, not twitter, so I replied as follows:

That’s a funny way of putting things! I’d say that if you don’t buy the premise of this competition, then you don’t have to play. Kinda like if you aren’t interested in winter sports you don’t need to watch the olympics right now. I guess you might reply that our tax money is (indirectly) funding this competition, but then again our tax money funds the olympics too.

Getting to the topic at hand: No, I don’t know that the research resulting from this sort of competition will ultimately improve education policy. Or, even if it does, it presumably won’t improve everyone’s education, and it could be that students who are similar to you in some ways will be among those who end up with worse outcomes. All I can say is that this sort of question—variation in treatment effects, looking at effects on individuals, not just on averages—is a central topic of modern causal inference and has been so for a while. So, to the extent that you’re interested in evaluating policies in this way, I think this sort of competition is going in the right direction.

Regarding specifics: I think that after the competition is over, the team that constructed it will publicly release the details of what they did. So at that point in the not-so-distant future, you can take a look, and, if you see problems with it, you can publish your criticisms. That could be useful.

I’m not saying this response of mine was perfect. I’m just saying that the blog format was well suited to a thoughtful response, a deepening of the intellectual exchange and a rhetorical de-escalation, which is kind of the opposite of position-taking on twitter.

P.S. Also relevant is this post by Rob Hyndman, A brief history of time series forecasting competitions. I don’t know anything about the history of causal inference competitions, or the extent to which these were inspired by forecasting competitions. The same general question arises, of what’s being averaged over.

Quantitative science is (indirectly) relevant to the real world, also some comments on a book called The Case Against Education

Joe Campbell points to this post by economist Bryan Caplan, who writes:

The most painful part of writing The Case Against Education was calculating the return to education. I spent fifteen months working on the spreadsheets. I came up with the baseline case, did scores of “variations on a theme,” noticed a small mistake or blind alley, then started over. . . . About half a dozen friends gave up whole days of their lives to sit next to me while I gave them a guided tour of the reasoning behind my number-crunching. . . . When the book finally came out, I published final versions of all the spreadsheets . . .

Now guess what? Since the 2018 publication of The Case Against Education, precisely zero people have emailed me about those spreadsheets. . . . Don’t get me wrong; The Case Against Education drew plenty of criticism. Almost none of it, however, was quantitative. . . .

It’s hard to avoid a disheartening conclusion: Quantitative social science is barely relevant in the real world – and almost every social scientist covertly agrees. The complex math that researchers use is disposable. You deploy it to get a publication, then move on with your career. When it comes time to give policy advice, the math is AWOL. If you’re lucky, researchers default to common sense. Otherwise, they go with their ideology and status-quo bias, using the latest prestigious papers as fig leaves.

Regarding the specifics, I suspect that commenter Andrew (no relation) has a point when he responded:

You didn’t waste your time. If you had made your arguments without the spreadsheets—just guesstimating & eyeballing—you would’ve gotten quantitative criticism. A man who successfully deters burglars didn’t waste his money on a security system just because it never got used.

But then there’s the general question about quantitative social science. I actually wrote a post on this topic last year, The social sciences are useless. So why do we study them? Here’s a good reason. Here was my summary:

The utilitarian motivation for the natural sciences is that they can make us healthier, happier, and more comfortable. The utilitarian motivation for the social sciences is that they can protect us from bad social-science reasoning. It’s a lesser thing, but that’s what we’ve got, and it’s not nothing.

That post stirred some people up, as it sounded like I was making some techbro-type argument that society didn’t matter. But I wasn’t saying that society was useless, I was saying that social science was useless, at least relative to the natural sciences. Some social science research is really cool, but it’s nothing compared to natural-science breakthroughs such as transistors, plastics, vaccines, etc.

Anyway, my point is that quantitative social science has value in that it can displace empty default social science arguments. Caplan is disappointed that people didn’t engage with his spreadsheets, but I think that’s partly because he was presenting his ideas in book form. My colleagues and I had a similar experience with our Red State Blue State book a few years ago: our general point got out there, but people didn’t seem to really engage with the details. We had lots of quantitative analyses in there, but it was a book, so people weren’t expecting to engage in that way. Frustrating, but it would be a mistake to generalize from that experience to all of social science. If you want people to engage with your spreadsheets, I think you’re better off publishing an article rather than a book.

Caplan’s “move on with your career” statement is all too true, but that’s a separate issue. Biology, physics, electrical engineering, etc., are all undeniably useful, but researchers in these fields also move on with their careers, etc. That just tells us that research doesn’t operate at 100% efficiency, which is a product of the decentralized system that we have. It’s not like WW2, where the government was assigning people to projects.

Comments on The Case Against Education

This discussion reminded me that six years ago Caplan sent me a draft of his book, and I sent him comments. I might as well share them here:

1. Your intro is fine, it’s good to tell the reader where you’re coming from. But . . . the way it’s framed, it looks a bit like the “professors are pampered” attack on higher education. I don’t think this is the tack you want to be taking, for two reasons: First, most teaching jobs are not like yours: most teaching jobs are at the elementary or secondary level, and even at the college level, much of the teaching is done by adjuncts. So, while your presentation of _your_ experience is valid, it’s misleading if it is taken as a description of the education system in general. Second—and I know you’re aware of this too—if education were useful, there’d be no good reason to complain that some of its practitioners have good working conditions. Again, this does not affect your main argument but I think you want to avoid sounding like the proudly-overpaid guy discussed here: http://andrewgelman.com/2011/06/01/the_cushy_life/

This comes up again in your next chapter where you say you have very few skills and that “The stereotype of the head-in-the-clouds Ivory Tower professor is funny because it’s true.” Ummm, I don’t know about that. The stereotype of the head-in-the-clouds Ivory Tower professor is not so true, in the statistical sense. The better stereotype might be the adjunct working five jobs.

2. You write, “Junior high and high schools add higher mathematics, classic literature, and foreign languages – vital for a handful of budding scientists, authors, and translators, irrelevant for everyone else.” This seems pretty extreme. One point of teaching math—even the “higher mathematics” that is taught in high school—is to give people the opportunity to find out that they are “budding scientists” or even budding accountants. As to “authors,” millions of people are authors: you’ve heard of blogs, right? It can be useful to understand how sentences, paragraphs, and chapters are put together, even if you’re not planning to be Joyce Carol Oates. As to foreign languages: millions of people speak multiple languages, it’s a way of understanding the world that I think is very valuable. If _you_ want to say that you’re happy only speaking one language, or that many other people are happy speaking just one language, that’s fine—but I think it’s a real plus to give kids the opportunity to learn to speak and read in other languages. Now, at this point you might argue that most education in math, literature, and foreign language is crappy—that’s a case you can make, but I think you’re way overdoing it by minimizing the value of these subjects.

3. Regarding signaling: Suppose I take a biology course at a good college and get an A, but I don’t go into biology. Still, the A contributes to my GPA and to my graduation from the good college, which is a job signal. You might count this as part of the one-third signaling. But that would be a mistake! You’re engaging in retrospective reasoning. Even if I never use that biology in my life, I didn’t know that when I took the course. Taking that bio course was an investment. I invest the time and effort to learn some biology in order to decide whether to do more of it. And even if I don’t become a biologist I might end up working in some area that uses biology. I won’t know ahead of time. This is not a new idea, it’s the general principle of a “well-rounded education,” which is popular in the U.S. (maybe not so much in Europe, where their post-secondary education is more focused on a student’s major.) Also relevant on this “signaling” point is this comment: http://andrewgelman.com/2011/02/17/credentialism_a/#comment-58035

4. Also, signaling is complicated and even non-monotonic! Consider this example (which I wrote up here: http://andrewgelman.com/2011/02/17/credentialism_a/):
“My senior year I applied to some grad schools (in physics and in statistics) and to some jobs. I got into all the grad schools and got zero job interviews. Not just zero jobs. Zero interviews. And these were not at McKinsey, Goldman Sachs, etc. (none of which I’d heard of). They were places like TRW, etc. The kind of places that were interviewing MIT physics grads (which is how I thought of applying for these jobs in the first place). And after all, what could a company like that do with a kid with perfect physics grades from MIT? Probably not enough of a conformist, eh?”
This is not to say your signaling story is wrong, just that I think it’s much more complicated than you’re portraying.

5. This is a minor point, but you write, “If the point of education is certifying the quality of labor, society would be better off if we all got less.” This is not so clear. From psychometric principles, more information will allow better discrimination. It’s naive to think of all students as being ranked on a single dimension so that employers just need to pick from the “top third.” There are many dimensions of abilities and it could take a lot of courses at different levels to make the necessary distinctions. Again, this isn’t central to your argument but you just have to be careful here because you’re saying something that’s not quite correct, statistically speaking.
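
Here’s a quick simulation sketch of that point; all the numbers in it are invented, so take it as illustration only. Students have multiple ability dimensions, each course grade is a noisy measurement of a different mix of them, and a transcript with more courses does a better job of discriminating the particular combination a given job cares about:

```python
# A simulation sketch of the psychometric point above; every number here is
# invented. Students have several ability dimensions, each course grade is a
# noisy measurement of a random mix of them, and a hypothetical job values a
# particular weighted combination. More courses = better discrimination.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_dims = 10_000, 3

# Latent abilities: several dimensions per student, not one overall ranking.
ability = rng.normal(size=(n_students, n_dims))

# The job cares about an unequal mix of the dimensions (weights made up).
job_weights = np.array([0.2, 0.3, 0.5])
job_performance = ability @ job_weights

def transcript(n_courses):
    """Noisy course grades, each loading on a random mix of the abilities."""
    loadings = rng.dirichlet(np.ones(n_dims), size=n_courses)   # (courses, dims)
    noise = rng.normal(scale=1.0, size=(n_students, n_courses))
    return ability @ loadings.T + noise

for n_courses in (2, 8, 32):
    grades = transcript(n_courses)
    # Best in-sample linear combination of grades, an upper bound on what an
    # employer could extract from this transcript.
    beta, *_ = np.linalg.lstsq(grades, job_performance, rcond=None)
    r = np.corrcoef(grades @ beta, job_performance)[0, 1]
    print(f"{n_courses:2d} courses: correlation with job performance = {r:.2f}")
```

With only a couple of coarse summaries you can’t recover the job-relevant combination of abilities, no matter how many students you observe; more measurements at different levels really do allow finer distinctions.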

6. You write, “Consider the typical high school curriculum. English is the international language of business, but American high school students spend years studying Spanish, or even French. During English, many students spend more time deciphering the archaic language of Shakespeare than practicing business writing. Few jobs require knowledge of higher mathematics, but over 80% of high school grads suffer through geometry.” I think all these topics could be taught better, but my real issue here is that this argument contradicts what you said back on page 6, that you were not going to just “complain we aren’t spending our money in the right way.”

To put it another way:

7. You write, “The Ivory Tower ignores the real world.” I think you need to define your terms. Is “Ivory Tower” all of education? All college education? All education at certain departments at certain colleges? Lots of teachers of economics are engaged with the real world, no? Actually it’s not so clear to me what you mean by the real world. I guess it does not include the world of teaching and learning. So what parts of the economy do count as real? I’m not saying you can’t make a point here, but I think you need to define your terms in some way to keep your statements from being meaningless!

And a couple things you didn’t talk about in your book, but I think you should:

– Side effects of Big Education: Big Ed provides jobs for a bunch of politically left-wing profs and grad students, it also gives them influence. For example, Paul Krugman without the existence of a big economics educational establishment would, presumably, not be as influential as the actual Paul Krugman. One could say the same thing about, say, Greg Mankiw, but the point is that academia as a whole, and prestigious academia in particular, contains lots more liberal Krugmans than conservative Mankiws. Setting aside one’s personal political preferences, one might consider this side effect of Big Ed to be bad (in that it biases the political system) or good (in that it provides a counterweight to the unavoidable conservative biases of Big Business) or neutral. Another side effect of Big Ed is powerful teachers unions. Which, once again, could be considered a plus, a minus, or neutral, depending on your political perspective. Yet another side effect of Big Ed is that it funds various things associated with schools, such as high school sports (they’re a big deal in Texas, or so I’ve heard!), college sports, and research in areas ranging from Shakespeare to statistics. Again, one can think of these extracurricular activities as a net benefit, a net cost, or a washout.

In any case, I think much of the debate over the value of education and the structuring of education is driven by attitudes toward its side effects. This is not something you discuss in your book but I think it’s worth mentioning. Where you stand on the side effects can well affect your attitude toward the efficacy of the education establishment. There’s a political dimension here. You’re a forthright guy and I think your book will be strengthened if you openly acknowledge the political dimension rather than leaving it implicit.

Rich guys and their dumb graphs: The visual equivalents of “Dow 36,000”

Palko links to this post by Russ Mitchell linking to this post by Hassan Khan casting deserved shade on this post, “The Third Transportation Revolution,” from 2016 by Lyft Co-Founder John Zimmer, which includes the above graph.

What is it about rich guys and their graphs?

[Images: a graph labeled “slaves-serfs” and a screenshot dated 2016-11-30]

Or is it just a problem with transportation forecasts?

[Image: chart of vehicle miles traveled (VMT) forecasts]

I’m tempted to say that taking a silly statement and putting it in graph form makes it more persuasive. But maybe not. Maybe the graph thing is just an artifact of the PowerPoint era.

Rich guys . . .

I think the other problem is that people give these rich guys a lot of slack because, y’know, they’re rich, so they must know what they’re doing, right? That’s not a ridiculous bit of reasoning. But there are a few complications:

1. Overconfidence. You’re successful so you start to believe your own hype. It feels good to make big pronouncements, kinda like when Patrick Ewing kept “guaranteeing” the Knicks would win.

2. Luck. Successful people typically have had some lucky breaks. It can be natural to attribute that to skill.

3. Domain specificity. Skill in one endeavor does not necessarily translate to skill in another. You might be really skillful at persuading people to invest money in your company, or you might have had some really good ideas for starting a business, but that won’t necessarily translate into expertise in transportation forecasting. Indeed, your previous success in other areas might reduce your motivation to check with actual experts before mouthing off.

4. No safe haven. As indicated by the last graph above, some of the official transportation experts don’t know jack. So it’s not clear that it would even make sense to consult an official transportation expert before making your forecast. There’s no safe play, no good anchor for your forecast, so anything goes.

5. Selection. More extreme forecasts get attention. It’s the man-bites-dog thing. We don’t hear so much about all the non-ridiculous things that people say.

6. Motivations other than truth. Without commenting on this particular case, in general people can have financial incentives to take certain positions. Someone with a lot of money invested in a particular industry will want people to think that this industry has a bright future. That’s true of me too: I want to spread the good news about Stan.

So, yeah, rich people speak with a lot of authority, but we should be careful not to take their internet-style confident assertions too seriously.

P.S. I have no reason to believe that rich people make stupider graphs than poor people do. Richies just have more resources so we all get to see their follies.

Stock prices, a notorious New York Times article, and that method from 1998 that was going to cure cancer in 2 years

Gur Huberman writes:

Apropos your blogpost today, here’s a piece from 2001 that (according to a colleague) shows that I can write an empirical paper based on a single observation.

Gur’s article, with Tomer Regev, is called “Contagious Speculation and a Cure for Cancer: A Nonevent that Made Stock Prices Soar” and begins:

A Sunday New York Times article on a potential development of new cancer-curing drugs caused EntreMed’s stock price to rise from 12.063 at the Friday close, to open at 85 and close near 52 on Monday. It closed above 30 in the three following weeks. The enthusiasm spilled over to other biotechnology stocks. The potential breakthrough in cancer research already had been reported, however, in the journal Nature, and in various popular newspapers—including the Times—more than five months earlier. Thus, enthusiastic public attention induced a permanent rise in share prices, even though no genuinely new information had been presented.

They argue that this contradicts certain theories of finance:

A fundamentals-based approach to stock pricing calls for a price revision when relevant news comes out. Within this framework it is experts who identify the biotechnology companies whose pricing should be most closely tied to do the price revision. These experts follow Nature closely, and therefore the main price reaction of shares of biotechnology firms should have taken place in late November 1997, and not been delayed until May 1998.

I’m not going to disagree with their general point, which is reminiscent of Keynes’s famous analogy of stock pricing to a beauty contest.

Huberman and Regev quote from the Sunday New York Times article that was followed by the stock rise:

Kolata’s (1998) Times article of Sunday, May 3, 1998, presents virtually the same information that the newspaper had reported in November, but much more prominently; namely, the article appeared in the upper left corner of the front page, accompanied by the label “A special report.” The article had comments from various experts, some very hopeful and others quite restrained (of the “this is interesting, but let’s wait and see” variety). The article’s most enthusiastic paragraph was “. . . ‘Judah is going to cure cancer in two years,’ said Dr. James D. Watson, a Nobel Laureate . . . Dr. Watson said Dr. Folkman would be remembered along with scientists like Charles Darwin as someone who permanently altered civilization.” (p. 1) (Watson, of The Double Helix fame, was later reported to have denied the quotes.)

And more:

In the May 10 issue of the Times, Abelson (1998) essentially acknowledges that its May 3 article contained no new news, noting that “[p]rofessional investors have long been familiar with [ENMD’s] cancer-therapy research and had reflected it in the pre-runup price of about $12 a share.” . . . On November 12, King (1998), in a front page article in the Wall Street Journal, reports that other laboratories had failed to replicate Dr. Folkman’s results. ENMD’s stock price plunged 24 percent to close at 24.875 on that day. But that price was still twice the closing price prior to the Times article of May 4!

They conclude:

To the skeptical reader we offer the following hypothetical question: What would have been the price of ENMD in late May 1998 if the editor of the Times had chosen to kill the May 3 story?

I feel like the whole Nobel prize thing just makes everything worse (see here, here, here, and here), but I just wanted to make two comments regarding the effect of the news story on the stock price.

First, the article appearing more prominently in the newspaper does provide some information, in that it represents the judgment of the New York Times editors that the result is important, beyond the earlier judgment of the researchers to write the paper in the first place, the journal editors to publish the article, and the Times to run their first story on the topic. Now, you might say that the judgment of a bunch of newspaper editors should count as nothing compared to the judgment of the journal, but (a) journals do make mistakes (search this blog on PNAS), and (b) Nature and comparable journals publish thousands of articles on biomedical research each year, and only some of these make it to prime slots in a national newspaper. So some judgment is necessary there.
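
If you want to see the logic of that in miniature, here’s a toy Bayes-rule calculation; every number in it is invented, so it only shows the direction of the update, not its size:

```python
# A toy Bayes-rule calculation of the "prominent coverage is itself evidence"
# argument. Every number here is invented; the point is only the direction of
# the update, not its size.
prior = 0.05                 # prior prob. the result is a genuine breakthrough
p_frontpage_if_real = 0.50   # chance of a front-page special report if it is
p_frontpage_if_not = 0.02    # chance of the same treatment if it is not

posterior = (prior * p_frontpage_if_real) / (
    prior * p_frontpage_if_real + (1 - prior) * p_frontpage_if_not
)
print(f"posterior probability: {posterior:.2f}")   # about 0.57 with these inputs
```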

The second point is that, yeah, James Watson is kind of a joke now, but back in 1998 he was still widely respected as a scientist, so his “Judah is going to cure cancer in two years” line, whether or not it was reported accurately, again represents additional information. Even professional investors might take this quote as some sort of evidence.

So I think Huberman and Regev are leaning a bit too hard on their assumption that no new information was conveyed, conditional on the Nature article and the earlier NYT report.

Some thoughts on academic research and advocacy

An economist who would prefer to remain anonymous writes:

There is an important question of the degree of legitimacy we grant to academic research as advocacy.

It is widely accepted, and I think also true, that the adversary system we have in courts, where one side is seeking to find and present every possible fact and claim that tends to mitigate the guilt of the accused, and the other side strives to find and present every possible fact and claim that tends to confirm the guilt of the accused, and a learned and impartial judge accountable to the public decides, is a good system that does a creditable job of reaching truth. [Tell that to the family of Nicole Brown Simpson. — ed.] (Examining magistrates probably do a better job, but jurisdictions with examining magistrates also have defense attorneys and DAs; they just allow the judge also to participate in the research project.) It is also widely accepted, and I think also true, that it is important that high standards of integrity should be imposed on lawyers to prevent them from presenting false and misleading cases. (Especially DAs.) There is no question of “forking paths” here, each side is explicitly tasked with looking for evidence for one side.

I don’t think that this is a bad model for academic policy research. Martin Feldstein, like most prominent academic economists from both right and left, was an advocate and did not make a secret of his political views or of their sources. He was also a solid researcher and was good at using data and techniques to reach results that confirmed (and occasionally conditioned) his views. The same is true of someone like Piketty from the social-democratic left, or Sam Bowles from a more Marxist perspective, or the farther-right Sam Peltzman from Chicago.

All these individuals were transparent and responsible advocates for a particular policy regime. They all carried out careful and fascinating research and all were able to learn from each other. This is the model that I see as the de facto standard for my profession (policy economics) and I think it is adequate and functional and sustainable.

Romer’s whole “mathiness” screed is not mostly about “Chicago economists are only interested in models that adopt assumptions that conform to their prejudices”, it is IMHO mostly about “Chicago economists work hard to hide the fact that they are only interested in models that adopt assumptions that conform to their prejudices”. I think Romer exaggerates a bit (that is his advocacy) but I agree that he makes an important point.

I’m coming at this as an outsider and have nothing to add except to point out the converse, that honesty and transparency are not enough, and if you’re a researcher and come to a conclusion that wasn’t expected, that you weren’t aiming for, that you weren’t paid or ideologically predisposed to find, that doesn’t automatically mean that you’re right. I’ve seen this attitude a lot, of researchers thinking that their conclusions absolutely must be correct because they came as a surprise or because they’re counter to their political ideologies. But that’s not right either.

The Economic Pit and the Political Pendulum: Predicting Midterm Elections

This post is written jointly with Chris Wlezien.

We were given the following prompt in July 2022 and asked to write a response: “The Economy, Stupid: Accepted wisdom has it that it’s the economy that matters to voters. Will other issues matter in the upcoming midterm elections, or will it really be all about the economy?”

In the Gallup poll taken that month, 35% of Americans listed the economy as one of the most important problems facing the country today, a value which is neither high nor low from a historical perspective. What does this portend for the 2022 midterm elections? The quick answer is that the economy is typically decisive for presidential elections, but midterm elections have traditionally been better described as the swinging of the pendulum, with the voters moving to moderate the party in power. This may partly reflect “thermostatic” public response (see here and here) to government policy actions.

It has long been understood that good economic performance benefits the incumbent party, and there’s a long history of presidents trying to time the business cycle to align with election years. Edward Tufte was among the first to seriously engage this possibility in his 1978 book, Political Control of the Economy, and other social scientists have taken up the gauntlet over the years. Without commenting on the wisdom of these policies, we merely note that even as presidents may try hard to set up favorable economic conditions for reelection, they do not always succeed, and for a variety of reasons: government is only part of the economic engine, the governing party does not control it all, and getting the timing right is an imperfect science.

To the extent that presidents are successful in helping ensure their own reelection, this may have consequences in the midterms. For example, one reason the Democrats lost so many seats in the 2010 midterms may be that Barack Obama in his first two years in office was trying to avoid Carter’s trajectory; his team seemed to want a slow recovery, at the cost of still being in recession in 2010, rather than pumping up the economy too quickly and then crashing by 2012 and paying the political price then.

In the 1970s and 1980s, Douglas Hibbs, Steven Rosenstone, James Campbell, and others established the statistical pattern that recent economic performance predicts presidential elections, and this is consistent with our general understanding of voters and their motivations. Economics is only part of the referendum judgment in presidential elections; consider that incumbents tend to do well even when economic conditions are not ideal. The factors that lead incumbent party candidates to do well in presidential elections also influence Congressional elections in those years, via coattails. So the economy matters here. This is less true in midterm elections. There, voters tend to balance the president, voting for and electing candidates of the “out-party.” This is now conventional wisdom. There is variation in the tendency, and this partly reflects public approval of the president, which offers some sense of how much people like what the president is doing – we are more (less) likely to elect members of the other party, the less (more) we approve of the president. The economy matters for approval, so it matters in midterm elections, but to a lesser degree than in presidential elections, and other things matter, including policy itself.

Whatever the specific causes, it is rare for voters not to punish the president’s party at the midterm, and historically avoiding those losses has required very high approval ratings. The exceptions to midterm loss are 1998, after impeachment proceedings against popular Bill Clinton were initiated, and again in 2002, when the even more popular George W. Bush continued to benefit from the 9/11 rally effect, and gains in these years were slight. It has not happened since, and to find another case of midterm gain, one has to go all the way back to Franklin Roosevelt’s first midterm in 1934. This is consistent with voters seeing the president as directing the ship of state, with Congressional voting providing a way to make smaller adjustments to the course. Democrats currently control two of the three branches of government, so it may be natural for some voters to swing Republican in the midterms, particularly given Joe Biden’s approval numbers, which have been hovering in the low 40s.

Elections are not just referenda on the incumbent; the alternative also matters. This may seem obvious in presidential election years, but it is true in midterms as well. Consider that Republican attempts to impeach Bill Clinton may have framed the 1998 election as a choice rather than a “balancing” referendum, contributing to the (slight) Democratic gains in that year. We may see something similar in 2022, encouraged in part by the Supreme Court decision on abortion, among other rulings. Given that the court now is dominated by Republican appointees, some swing voters will want to maintain Democratic control of Congress as a way to check Republicans’ judicial power and future appointments to the courts. The choice in the midterms also may be accentuated by the reemergence of Donald Trump in the wake of the FBI raid of Mar-a-Lago.

Change in party control of Congress is the result of contests in particular districts and states. These may matter less as national forces have increased in importance in recent years, but the choices voters face in those contests still do matter when they go to the polls. In the current election cycle, there has been an increase in the retirements of Democratic incumbents as would be expected in a midterm with a Democratic president, but some of the candidates Republicans are putting up may not be best positioned to win. This is particularly true in the Senate, where candidates and campaigns matter more.

History, historians, and causality

Through an old-fashioned pattern of web surfing of blogrolls (from here to here to here), I came across this post by Bret Devereaux on non-historians’ perceptions of academic history. Devereaux is responding to some particular remarks from economics journalist Noah Smith, but he also points to some more general issues, so these points seem worth discussing.

Also, I’d not previously encountered Smith’s writing on the study of history, but he recently interviewed me on the subjects of statistics and social science and science reform and causal inference so that made me curious to see what was up.

Here’s how Devereaux puts it:

Rather than focusing on converting the historical research of another field into data, historians deal directly with primary sources . . . rather than engaging in very expansive (mile wide, inch deep) studies aimed at teasing out general laws of society, historians focus very narrowly in both chronological and topical scope. It is not rare to see entire careers dedicated to the study of a single social institution in a single country for a relatively short time because that is frequently the level of granularity demanded when you are working with the actual source evidence ‘in the raw.’

Nevertheless as a discipline historians have always held that understanding the past is useful for understanding the present. . . . The epistemic foundation of these kinds of arguments is actually fairly simple: it rests on the notion that because humans remain relatively constant, situations in the past that are similar to situations today may thus produce similar outcomes. . . . At the same time it comes with a caveat: historians avoid claiming strict predictability because our small-scale, granular studies direct so much of our attention to how contingent historical events are. Humans remain constant, but conditions, technology, culture, and a thousand other things do not. . . .

He continues:

I think it would be fair to say that historians – and this is a serious contrast with many social scientists – generally consider strong predictions of that sort impossible when applied to human affairs. Which is why, to the frustration of some, we tend to refuse to engage counter-factuals or grand narrative predictions.

And he then quotes a journalist, Matthew Yglesias, who wrote, “it’s remarkable — and honestly confusing to visitors from other fields — the extent to which historians resist explicit reasoning about causation and counterfactual analysis even while constantly saying things that clearly implicate these ideas.” Devereaux responds:

We tend to refuse to engage in counterfactual analysis because we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . . historians are taught when making present-tense arguments to adopt a very limited kind of argument: Phenomenon A1 occurred before and it resulted in Result B, therefore as Phenomenon A2 occurs now, result B may happen. . . . The result is not a prediction but rather an acknowledgement of possibility; the historian does not offer a precise estimate of probability (in the Bayesian way) because they don’t think accurately calculating even that is possible – the ‘unknown unknowns’ (that is to say, contingent factors) overwhelm any system of assessing probability statistically.

This all makes sense to me. I just want to do one thing, which is to separate two ideas that I think are being conflated here:

1. Statistical analysis: generalizing from observed data to a larger population, a step that can arise in various settings including sampling, causal inference, prediction, and modeling of measurements.

2. Causal inference: making counterfactual statements about what would have happened, or could have happened, had some past decision been made differently, or making predictions about potential outcomes under different choices in some future decision.

Statistical analysis and causal inference are related but are not the same thing.

For example, if historians gather data on public records from some earlier period and then make inference about the distributions of people working at that time in different professions, that’s a statistical analysis but that does not involve causal inference.
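
Here’s a minimal sketch of what that kind of purely statistical exercise looks like, with made-up data; the estimates come with sampling uncertainty, but no counterfactual is ever posed:

```python
# A sketch of statistical inference with no causal question attached; the data
# are made up. We estimate the share of each profession in a past population
# from a sample of surviving records and attach simple sampling uncertainty.
# Nothing counterfactual is being asked.
import numpy as np

rng = np.random.default_rng(2)
professions = ["farmer", "artisan", "merchant", "clergy"]
true_shares = [0.60, 0.25, 0.10, 0.05]   # invented population shares

# Pretend 400 records survive and are roughly a random sample of the population.
sample = rng.choice(professions, size=400, p=true_shares)

for p in professions:
    share = np.mean(sample == p)
    se = np.sqrt(share * (1 - share) / len(sample))   # binomial standard error
    print(f"{p:8s}: estimated share {share:.2f} (standard error {se:.2f})")
```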

From the other direction, historians can think about causal inference and use causal reasoning without formal statistical analysis or probabilistic modeling of data. Back before he became a joke and a cautionary tale of the paradox of influence, historian Niall Ferguson edited a fascinating book, Virtual History: Alternatives and Counterfactuals, a book of essays by historians on possible alternative courses of history, about which I wrote:

There have been and continue to be other books of this sort . . . but what makes the Ferguson book different is that he (and most of the other authors in his book) are fairly rigorous in only considering possible actions that the relevant historical personalities were actually considering. In the words of Ferguson’s introduction: “We shall consider as plausible or probable only those alternatives which we can show on the basis of contemporary evidence that contemporaries actually considered.”

I like this idea because it is a potentially rigorous extension of the now-standard “Rubin model” of causal inference.

As Ferguson puts it,

Firstly, it is a logical necessity when asking questions about causality to pose ‘but for’ questions, and to try to imagine what would have happened if our supposed cause had been absent.

And the extension to historical reasoning is not trivial, because it requires examination of actual historical records in order to assess which alternatives are historically reasonable. . . . to the best of their abilities, Ferguson et al. are not just telling stories; they are going through the documents and considering the possible other courses of action that had been considered during the historical events being considered. In addition to being cool, this is a rediscovery and extension of statistical ideas of causal inference to a new field of inquiry.

See also here. The point is that it was possible for Ferguson et al. to do formal causal reasoning, or at least consider the possibility of doing it, without performing statistical analysis (thus avoiding the concern that Devereaux raises about weak evidence in comparative historical studies).

Now let’s get back to Devereaux, who writes:

This historian’s approach [to avoid probabilistic reasoning about causality] holds significant advantages. By treating individual examples in something closer to the full complexity (in as much as the format will allow) rather than flattening them into data, they can offer context both to the past event and the current one. What elements of the past event – including elements that are difficult or even impossible to quantify – are like the current one? Which are unlike? How did it make people then feel and so how might it make me feel now? These are valid and useful questions which the historian’s approach can speak to, if not answer, and serve as good examples of how the quantitative or ’empirical’ approaches that Smith insists on are not, in fact, the sum of knowledge or required to make a useful and intellectually rigorous contribution to public debate.

That’s a good point. I still think that statistical analysis can be valuable, even with very speculative sampling and data models, but I agree that purely qualitative analysis is also an important part of how we learn from data. Again, this is orthogonal to the question of when we choose to engage in causal reasoning. There’s no reason for bad data to stop us from thinking causally; rather, the limitations in our data merely restrict the strengths of any causal conclusions we might draw.

The small-N problem

One other thing. Devereaux refers to the challenges of statistical inference: “we look at the evidence and conclude that it cannot support the level of confidence we’d need to have. . . .” That’s not just a problem with the field of history! It also arises in political science and economics, where we don’t have a lot of national elections or civil wars or depressions, so generalizations necessarily rely on strong assumptions. Even if you can produce a large dataset with thousands of elections or hundreds of wars or dozens of business cycles, any modeling will implicitly rely on some assumption of stability of a process over time, an assumption that won’t necessarily make sense given changes in political and economic systems.
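
To give a sense of what that instability looks like in the simplest possible setting, here’s a simulation sketch with invented numbers: even when the underlying relation between the economy and the vote is perfectly stable, the slope estimated from a couple dozen elections bounces around a lot.

```python
# A simulation sketch of the small-N problem; all numbers are invented. Even
# when the relation between election-year economic growth and the incumbent
# vote is perfectly stable, the slope estimated from ~18 elections varies a lot
# from one hypothetical history to the next.
import numpy as np

rng = np.random.default_rng(1)
n_elections = 18      # roughly the number of postwar presidential elections
true_slope = 1.5      # vote-share points per point of income growth (made up)
n_histories = 2000

slopes = []
for _ in range(n_histories):
    growth = rng.normal(2.0, 1.5, size=n_elections)
    vote = 50 + true_slope * growth + rng.normal(0, 3, size=n_elections)
    slopes.append(np.polyfit(growth, vote, 1)[0])   # least-squares slope

slopes = np.array(slopes)
print(f"true slope: {true_slope}")
print(f"estimated slopes, 5th to 95th percentile: "
      f"{np.percentile(slopes, 5):.2f} to {np.percentile(slopes, 95):.2f}")
```

And that’s the optimistic case, where the process really is stable over time; in practice the stability assumption is itself in doubt, which is the point above.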

So it’s not really history versus social sciences. Rather, I think of history as one of the social sciences (as in my book with Jeronimo from a few years back), and they all have this problem.

The controversy

After writing all the above, I clicked through the link and read the post by Smith that Devereaux was arguing with.

And here’s the funny thing. I found Devereaux’s post to be very reasonable. Then I read Smith’s post, and I found that to be very reasonable too.

The two guys are arguing against each other furiously, but I agree with both of them!

What gives?

As discussed above, I think Devereaux in his post provides an excellent discussion of the limits of historical inquiry. On the other side, I take the main message of Smith’s post to be that, to the extent that historians want to use their expertise to make claims about the possible effects of recent or new policies, they should think seriously about statistical inference issues. Smith doesn’t just criticize historians here; he leads off by criticizing academic economists:

After having endured several years of education in that field, I [Smith] was exasperated with the way unrealistic theories became conventional wisdom and even won Nobel prizes while refusing to submit themselves to rigorous empirical testing. . . . Though I never studied history, when I saw the way that some professional historians applied their academic knowledge to public commentary, I started to recognize some of the same problems I had encountered in macroeconomics. . . . This is not a blanket criticism of the history profession . . . All I am saying is that we ought to think about historians’ theories with the same empirically grounded skepticism with which we ought to regard the mathematized models of macroeconomics.

By saying that I found both Devereaux and Smith to be reasonable, I’m not claiming they have no disagreements. I think their main differences come because they’re focusing on two different things. Smith’s post is ultimately about public communication and the things that academics say in the public discourse (things like newspaper op-eds and twitter posts) with relevance to current political disputes. And, for that, we need to consider the steps, implicit or explicit, that commentators take to go from their expertise to the policy claims they make. Devereaux is mostly writing about academic historians in their professional roles. With rare exceptions, academic history is about getting the details right, and even popular books of history typically focus on what happened, and our uncertainty about what happened, not on larger theories.

I guess I do disagree with this statement from Smith:

The theories [from academic history] are given even more credence than macroeconomics even though they’re even less empirically testable. I spent years getting mad at macroeconomics for spinning theories that were politically influential and basically un-testable, then I discovered that theories about history are even more politically influential and even less testable.

Regarding the “less testable” part, I guess it depends on the theories—but, sure, many theories about what happened in the past can be essentially impossible to test, if conditions have changed enough. That’s unavoidable. As Devereaux replies, this is not a problem with the study of history; it’s just the way things are.

But I can’t see how Smith could claim with a straight face that theories from academic history are “given more credence” and are “more politically influential” than macroeconomics. The president has a council of economic advisers, there are economists at all levels of the government, or if you want to talk about the news media there are economists such as Krugman, Summers, Stiglitz, etc. . . . sure, they don’t always get what they want when it comes to policy, but they’re quoted endlessly and given lots of credence. This is also the case in narrower areas, for example James Heckman on education policy or Angus Deaton on deaths of despair: these economists get tons of credence in the news media. There are no academic historians with that sort of influence. This has come up before: I’d say that economics now is comparable to Freudian psychology in the 1950s in its influence on our culture:

My best analogy to economics exceptionalism is Freudianism in the 1950s: Back then, Freudian psychiatrists were on the top of the world. Not only were they well paid, well respected, and secure in their theoretical foundations, they were also at the center of many important conversations. Even those people who disagreed with them felt the need to explain why the Freudians were wrong. Freudian ideas were essential, leaders in that field were national authorities, and students of Freudian theory and methods could feel that they were initiates in a grand tradition, a priesthood if you will. Freudians felt that, unlike just about everybody else, they treated human beings scientifically and dispassionately. What’s more, Freudians prided themselves on their boldness, their willingness to go beyond taboos to get to the essential truths of human nature. Sound familiar?

When it comes to influence in policy or culture or media, academic history doesn’t even come close to Freudianism in the 1950s or economics in recent decades.

This is not to say we should let historians off the hook when they make causal claims or policy recommendations. We shouldn’t let anyone off the hook. In that spirit, I appreciate Smith’s reminder of the limits of historical theories, along with Devereaux’s clarification of what historians really do when they’re doing academic history (as opposed to when they’re slinging around on twitter).

Why write about this at all?

As a statistician and political scientist, I’m interested in issues of generalization from academic research to policy recommendations. Even in the absence of any connection with academic research, people will spin general theories—and one problem with academic research is that it can give researchers, journalists, and policymakers undue confidence in bad theories. Consider, for example, the examples of junk science promoted over the years by the Freakonomics franchise. So I think these sorts of discussions are important.

Krueger Uber $100,000 update

Last month we discussed the controversy regarding the recent revelation that, back in 2015, economist Alan Krueger had been paid $100,000 by Uber to coauthor a research article that was published at the NBER (National Bureau of Economic Research, a sort of club of influential academic economists) and may have been influential in policy.

Krueger was a prominent political liberal—he worked in the Clinton and Obama administrations and was most famous for research that reported positive effects of the minimum wage—so there was some dissonance about him working for a company that has a reputation for employing drivers as contractors to avoid compensating them more. But I think the key point of the controversy was the idea that academia (or, more specifically, academic economics (or, more specifically, the NBER)) was being used to launder this conflict of interest. The concern was that Uber paid Krueger, the payment went into a small note in the paper, and then the paper was mainlined into the academic literature and the news media. From that perspective, Uber wasn’t just paying for Krueger’s work or even for his reputation; they were also paying for access to NBER and the economics literature.

In this new post I’d like to address a few issues:

1. Why now?

The controversial article by Hall and Krueger came out in 2015, and right there it had the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written.” The complaint is that this is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.”

So why the controversy now? Why are we getting comments like this now, rather than in 2015?

The new information is that Krueger was paid $100,000 by Uber. But we already knew he was being paid by Uber—it’s right on that paper. Sure, $100,000 is a lot, but what were people expecting? If Uber was paying Krueger at all, you can be pretty sure it was more than $10,000.

I continue to think that the reason the controversy happened now rather than in 2015 is that, back in 2015, there was a general consensus among economists that Uber was cool. They weren’t considered to be the bad guy, so it was fine for him to get paid.

2. The Necker cube of conflict of interest

Conflict of interest goes in two directions. On one hand, as discussed above, there’s a concern that being paid will warp your perspective. And, even beyond this sort of bias, being paid will definitely affect what you do. Hall and Krueger could be the most ethical researchers in the universe—but there’s no way they would’ve done that particular study had Uber not paid them to do it. This is the usual way that we think about conflict of interest.

But, from the other direction, there’s also the responsibility, if someone pays you to do a job, to try to do your best. If Hall and Krueger were to take hundreds of thousands of dollars from Uber and then turn around and stab Uber in the back, that wouldn’t be cool. I’m not talking here about whistleblowing. If Hall and Krueger started working for Uber and then saw new things they hadn’t seen before, indicating illegal or immoral conduct by the company, then, sure, we can all agree that it would be appropriate for them to blow the whistle. What I’m saying is that, if Hall and Krueger come to work for Uber, and Uber is what they were expecting when they took the job, then it’s their job to do right by Uber.

My point here is that these two issues of conflict of interest go in opposite directions. As with that Necker cube, it’s hard to find a stable intermediate position. Is Krueger an impartial academic who just happened to take some free money? Or does he have a duty to the company that hired him? I think it kind of oscillates between the two. And I wonder if that’s one reason for the strong reaction that people had, that they’re seeing the Necker cube switch back and forth.

3. A spectrum of conflicts of interest

Another way to think about this is to consider a range of actions in response to taking someone’s money.

– At one extreme are the pure hacks, the public relations officers who will repeat any lie that the boss tells them, the Mob lawyers who help in the planning of crimes, the expert witnesses who will come to whatever conclusion they’re paid to reach. Sometimes the hack factor overlaps with ideology, so you might get a statistician taking cash from a cigarette company and convincing himself that cigarettes aren’t really addictive, or a lawyer helping implement an illegal business plan and convincing himself that the underlying law is unjust.

– Somewhat more moderate are the employees or consultants who do things that might otherwise make them queasy, because they benefit the company that’s paying them. It’s hard to draw a sharp line here. “Writing a paper for Uber” is, presumably, not something that Krueger would’ve done without getting paid by Uber, but that doesn’t mean that the contents of the paper cross any ethical line.

– Complete neutrality. It’s hard for me to be completely neutral on a consulting project—people are paying me and I want to give them their money’s worth—but I can be neutral when evaluating a report. Someone pays me to read a report and I can give my comments without trying to shade them to make the client happy. Similarly, when I’ve been an expert witness, I’ve just written it like I see it. I understand that this will help the client—if they don’t like my report, they can just bury it!—so I’m not saying that I’m being neutral in my actions. But the content of my work is neutral.

– Bending over backward. An example here might be the admirable work of Columbia math professor Michael Thaddeus in criticizing the university’s apparently bogus numbers. Rather than going easy on his employer, Thaddeus is holding Columbia to a high standard. I guess it’s a tough call, when to do this. Hall and Krueger could’ve taken Uber’s money and then turned around and said something like, This project is kinda ok, but given the $ involved, we should really be careful and point out the red flags in the interpretation of the results. That would’ve been fine; I’ll just say again that this conflicts with the general attitude that, if someone pays you, you should give them your money’s worth. The Michael Thaddeus situation is different because Columbia pays him to teach and do research; it was never his job to pump up its U.S. News ranking.

– The bust-out operation. Remember this from Goodfellas? The mob guys take over someone’s business and strip it of all its assets, then torch the place for the insurance. This is the opposite of the hack who will do anything for the boss. An example here would be something like taking $100,000 from Uber and then using the knowledge gained from the inside to help a competitor.

In a horseshoe-like situation, the two extremes of the above spectrum seem the least ethical.

4. NBER

This is just a minor thing, but I was corresponding with someone who pointed out that NBER working papers are not equivalent to peer-reviewed publications.

I agree, but NBER ain’t nothing. It’s an exclusive club, and it’s my impression that an NBER paper gets attention in a way that an Arxiv paper or a Columbia University working paper or an Uber working paper would not. NBER papers may not count for academic CV’s, but I think they are noticed by journalists, researchers, and even policymakers. An NBER paper is considered legit by default, and that does have to do with one of the authors being a member of the club.

5. Feedback

In an email discussion following up on my post, economics journalist Felix Salmon wrote:

My main point the whole time has just been that if Uber pays for a paper, Uber should publish the paper. I don’t like the thing where they add Krueger as a second author, pay him $100k for his troubles, and thusly get their paper into NBER. Really what I’m worried about here is not that Krueger is being paid to come to a certain conclusion (the conflict-of-interest problem), so much as Krueger is being paid to get the paper into venues that would otherwise be inaccessible to Uber. I do think that’s problematic, and I don’t think it’s a problem that lends itself to solution via disclosure. On the other hand, it’s a problem with a simple solution, which is just that NBER shouldn’t publish such papers, and that they should live the way that god intended, which is as white papers (or even just blog posts) published by the company in question.

As for the idea that conflict is binary and “the conflict is there, whether it’s $1000 or $100,000” — I think that’s a little naive. At some point ($1 million? $10 million? more?) the sheer quantity of money becomes germane. . . .

Salmon also asked:

Is there a way of fixing the disclosure system to encompass this issue? It seems pretty clear to me that when the lead author is literally an Uber employee, Uber has de facto control over whether the paper gets published. Again, I think the solution to this problem is to have Uber publish the paper. But if there’s a disclosure solution then I’m interested in what it would look like.

I replied:

You say that Uber should just publish the paper. That’s fine with me. I will put a paper on my website (that’s kinda like a university working paper series, just for me), I will also put it on Arxiv and NBER (if I happen to have a coauthor who’s a member of that club), and I’ll publish in APSR or JASA or some lower-ranking journal. No reason a paper can’t be published in more than one “working paper” series. I also think it’s ok for the authors to publish in NBER (with a disclosure, which that paper had) and in a scholarly journal (again, with a disclosure).

You might be right that the published disclosure wasn’t enough, but in that case I think your problem is with academic standards in general, not with Krueger or even with the economics profession. For example, I was the second author on this paper, where the first and last authors worked at Novartis. I was paid by Novartis while working on the project—I consulted for them for several years, with most of the money going to my research collaborators at Columbia and elsewhere, but I pocketed some $ too. I did not disclose that conflict of interest in that paper—I just didn’t think about it! Or maybe it seemed unnecessary given that two of the authors were Novartis employees. In retrospect, yeah, I think I should’ve disclosed. But the point is, even if I had disclosed, I wouldn’t have given the dollar value, not because it’s a big secret (actually, I don’t remember the total, as it was over several years, and most of it went to others) but just because I’ve never seen anyone disclose the dollar amount in an acknowledgments or conflict-of-interest statement for a paper. It’s just not how things are done. Also I don’t think we shared our data. And our paper won an award! On the plus side, it’s only been cited 17 times so I guess there’s some justice in the world and vice is not always rewarded.

I agree that conflict is not binary. But I think that even with $1000, there is conflict. One reason I say this is that I think people often have very vague senses of conflict of interest. Is it conflict of interest to referee a paper written by a personal friend? If the friend is not a family member and there’s no shared employment or $ changing hands, I’d say no. That’s just my take, also my approximation to what I think the rules are, based on what Columbia University tells me. Again, though, I think it would’ve been really unusual had that paper had a statement saying that Krueger had been paid $100,000. I’ve just never seen that sort of thing in an academic paper.

Did Uber have the right to review the paper? I have no idea; that’s an interesting question. But I think there’s a conflict of interest, whether or not Uber had the right to review: these are Uber consultants and Uber employees working with Uber data, so no surprise they’ll come to an Uber-favorable conclusion. I still think the existing conflict of interest statement was just fine, and that the solution is for all journalists to report this as, “Uber study claims . . . ” The first author of the paper was an Uber employee and the second author was an Uber consultant—it’s right there on the first page of the paper!

As a separate issue, if the data and meta-data (the data-collection protocol) are not made available, then that should cause an immediate decline of trust, even setting aside any conflict of interest. I guess that would be a problem with my above-linked Novartis paper too; the difference is that our punch line was a methods conclusion, not a substantive conclusion, and you can check the methods conclusion with fake data.

Salmon replied:

If you want the media to put “Uber study claims” before reporting such results, then the way you get there is by putting the Uber logo on the top of the study. You don’t get there by adding a disclosure footnote and expecting front-line workers in the content mines to put two and two together.

As discussed in my original post on the topic, I have lots of conflicts of interest myself. So I’m not claiming to be writing about all this from some sort of ethically superior position.

6. Should we trust that published paper?

The economist who first pointed me to this story (and who wished to remain anonymous) followed up by email:

I’m surprised that the focus of your thoughts and of the others who commented on the blog were on whether Krueger acted correctly, not on what to do with the results they (and others) found. I guess my one big takeaway from now on is to move my priors more strongly towards being sceptical of papers using non-shared, proprietary data…

That’s a good point! I don’t think anyone in all these discussions is suggesting we should actually trust that published paper. The data were supplied by Uber; the conclusions were restricted to what Uber wanted to get; the whole thing was done in an environment where economists loved Uber and wanted to hear good things about it; etc.

I guess the reason the conversation was all about Krueger and the role of academic economics was that everyone was already starting from the position that the paper could not be trusted. If it were considered to be a trustworthy piece of research, I think there’d be a lot less objection to it. It’s not like the defenders of Krueger were saying they believed the claims in the paper. Again, part of that is political. Krueger was a prominent liberal economist, and now in 2022, liberal economists are not such Uber fans, so it makes sense that they’d defend his actions on procedural rather than substantive grounds.

Some concerns about the recent Chetty et al. study on social networks and economic inequality, and what to do next?

I happened to receive two different emails regarding a recently published research paper.

Dale Lehman writes:

Chetty et al. (and it is a long et al. list) have several publications about social and economic capital (see here for one such paper, and here for the website from which the data can also be accessed). In the paper above, the data is described as:

We focus on Facebook users with the following attributes: aged between 25 and 44 years who reside in the United States; active on the Facebook platform at least once in the previous 30 days; have at least 100 US-based Facebook friends; and have a non-missing residential ZIP code. We focus on the 25–44-year age range because its Facebook usage rate is greater than 80% (ref. 37). On the basis of comparisons to nationally representative surveys and other supplementary analyses, our Facebook analysis sample is reasonably representative of the national population.

They proceed to measure social and economic connectedness across counties, zip codes, and for graduates of colleges and high schools. The data is massive as is the effort to make sense out of it. In many respects it is an ambitious undertaking and one worthy of many kudos.

But I [Lehman] do have a question. Given their inclusion criteria, I wonder about selection bias when comparing counties, zip codes, colleges, or high schools. I would expect that the fraction of Facebook users – even in the targeted age group – that are included will vary across these segments. For example, one college may have many more of its graduates who have that number of Facebook friends and have used Facebook in the prior 30 days compared with a second college. Suppose the economic connectedness from the first college is greater than from the second college. But since the second college has a larger proportion of relatively inactive Facebook users, is it fair to describe college 1 as having greater connectedness?

It seems to me that the selection criteria make the comparisons potentially misleading. It might be accurate to say that the regular users of Facebook from college 1 are more connected than those from college 2, but this may not mean that the graduates from college 1 are more connected than the graduates from college 2. I haven’t been able to find anything in their documentation to address the possible selection bias and I haven’t found anything that mentions how the proportion of Facebook accounts that meet their criteria varies across these segments. Shouldn’t that be addressed?

That’s an interesting point. Perhaps one way to address it would be to preprocess the data by estimating a propensity to use facebook and then using this propensity as a poststratification variable in the analysis. I’m not sure. Lehman makes a convincing case that this is a concern when comparing different groups; that said, it’s the kind of selection problem we have all the time, and typically ignore, with survey data.
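To make Lehman’s point and this possible fix a bit more concrete, here is a minimal simulation sketch (all numbers and the propensity model are invented for illustration; nothing here is estimated from the actual Facebook data):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical "true" connectedness for graduates of two colleges.
college = rng.integers(0, 2, n)
connectedness = rng.normal(0.5 + 0.1 * college, 0.2, n)

# Suppose the chance of meeting the inclusion criteria (active user, 100+
# friends, etc.) rises with connectedness and also differs by college.
logit = 4 * (connectedness - 0.5) + 1.0 * (college == 0)
p_include = 1 / (1 + np.exp(-logit))
included = rng.random(n) < p_include

for c in (0, 1):
    grads = college == c
    truth = connectedness[grads].mean()
    naive = connectedness[grads & included].mean()
    # Inverse-propensity weighting, here using the true inclusion probability;
    # in practice the propensity would have to be estimated from auxiliary data.
    w = 1 / p_include[grads & included]
    adjusted = np.average(connectedness[grads & included], weights=w)
    print(f"college {c}: all graduates {truth:.3f}, Facebook sample {naive:.3f}, reweighted {adjusted:.3f}")

In this made-up world the Facebook-sample means overstate both colleges’ connectedness and exaggerate the gap between them, while the propensity-weighted estimates recover the all-graduates values. The hard part in practice would be estimating that inclusion propensity from auxiliary data.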

Richard Alba writes in with a completely different concern:

You may be aware of the recent research, published in Nature by the economist Raj Chetty and colleagues, purporting to show that social capital in the form of early-life ties to high-status friends provides a powerful pathway to upward mobility for low-status individuals. It has received a lot of attention, from The New York Times, Brookings, and no doubt other places I am not aware of.

In my view, they failed to show anything new. We have known since the 1950s that social capital has a role in mobility, but the evidence they develop about its great power is not convincing, in part because they fail to take into account how their measure of social capital, the predictor, is contaminated by the correlates and consequences of mobility, the outcome.

This research has been greeted in some media as a recipe for the secret sauce of mobility, and one of their articles in Nature (there are two published simultaneously) is concerned with how to increase social capital. In other words, the research is likely to give rise to policy proposals. I think it is important then to inform Americans about its unacknowledged limitations.

I sent my critique to Nature, and it was rejected because, in their view, it did not sufficiently challenge the articles’ conclusions. I find that ridiculous.

I have no idea how Nature decides what critiques to publish, and I have not read the Chetty et al. articles so I can’t comment on them either, but I can share Alba’s critique. Here it is:

While the pioneering big-data research of Raj Chetty and his colleagues is transforming the long-standing stream of research into social mobility, their findings should not be exempt from critique.

Consider in this light the recent pair of articles in Nature, in which they claim to have demonstrated a powerful causal connection between early-life social capital and upward income mobility for individuals growing up in low-income families. According to one paper’s abstract, “the share of high-SES friends among individuals with low-SES—which we term economic connectedness—is among the strongest predictors of upward income mobility identified to date.”

But there are good reasons to doubt that this causal connection is as powerful as the authors claim. At a minimum, the social capital-mobility statistical relationship is significantly overstated.

This is not to deny a role for social capital in determining adult socioeconomic position. That has been well established for decades. As early as the 1950s, the Wisconsin mobility studies focused in part on what the researchers called “interpersonal influence,” measured partly in terms of high-school friends, an operationalization close to the idea in the Chetty et al. article. More generally, social capital is indisputably connected to labor-market position for many individuals because of the role social networks play in disseminating job information.

But these insights are not the same as saying that economic connectedness, i.e., cross-class ties, is the secret sauce in lifting individuals out of low-income situations. To understand why the articles’ evidence fails to demonstrate this, it is important to pay close attention to how the data and analysis are constructed. Many casual readers, who glance at the statements like the one above or read the journalistic accounts of the research (such as the August 1 article in The New York Times), will take away the impression that the researchers have established an individual-level relationship—that they have proven that individuals from low-SES families who have early-life cross-class relationships are much more likely to experience upward mobility. But, in fact, they have not.

Because of limitations in their data, their analysis is based on the aggregated characteristics of areas—counties and zip codes in this case—not individuals. This is made necessary because they cannot directly link the individuals in their main two sources of data—contemporary Facebook friendships and previous estimates by the team of upward income mobility from census and income-tax data. Hence, the fundamental relationship they demonstrate is better stated as: the level of social mobility is much higher in places with many cross-class friendships. The correlation, the basis of their analysis, is quite strong, both at the county level (.65) and at the zip-code level (.69).

Inferring that this evidence demonstrates a powerful causal mechanism linking social capital to the upward mobility of individuals runs headlong into a major problem: the black box of causal mechanisms at the individual level that can lie behind such an ecological correlation, where moreover both variables are measured for roughly the same time point. The temptation may be to think that the correlation reflects mainly, or only, the individual-level relationship between social capital and mobility as stated above. However, the magnitude of an area-based correlation may be deceptive about the strength of the correlation at the individual level. Ever since a classic 1950 article by W. S. Robinson, it has been known that ecological correlations can exaggerate the strength of the individual-level relationship. Sometimes the difference between the two is very large, and in the case of the Chetty et al. analysis it appears impossible given the data they possess to estimate the bias involved with any precision, because Robinson’s mathematics indicates that the individual-level correlations within area units are necessary to the calculation. Chetty et al. cannot calculate them.

A second aspect of the inferential problem lies in the entanglement in the social-capital measure of variables that are consequences or correlates of social mobility itself, confounding cause and effect. This risk is heightened because the Facebook friendships are measured in the present, not prior to the mobility. Chetty et al. are aware of this as a potential issue. In considering threats to the validity of their conclusion, they refer to the possibility of “reverse causality.” What they have in mind derives from an important insight about mobility—mobile individuals are leaving one social context for another. Therefore, they are also leaving behind some individuals, such as some siblings, cousins, and childhood buddies. These less mobile peers, who remain in low-SES situations but have in their social networks others who are now in high-SES ones, become the basis for the paper’s Facebook estimate of economic connectedness (which is defined from the perspective of low-SES adults between the ages of 25 and 44). This sort of phenomenon will be frequent in high-mobility places, but it is a consequence of mobility, not a cause. Yet it almost certainly contributes to the key correlation—between economic connectedness and social mobility—in the way the paper measures it.

Chetty et al. try to answer this concern with correlations estimated from high-school friendships, arguing that the timing purges this measure of mobility’s impact on friendships. The Facebook-based version of this correlation is noticeably weaker than the correlations that the paper emphasizes. In any event, demonstrating a correlation between teen-age economic connectedness and high mobility does not remove the confounding influence of social mobility from the latter correlations, on which the paper’s argument depends. And in the case of high-school friendships, too, the black-box nature of the causality behind the correlation leaves open the possibility of mechanisms aside from social capital.

This can be seen if we consider the upward mobility of the children of immigrants, surely a prominent part today of the mobility picture in many high-mobility places. Recently, the economists Ran Abramitzky and Leah Boustan have reminded us in their book Streets of Gold that, today as in the past, the children of immigrants, the second generation, leap on average far above their parents in any income ranking. Many of these children are raised in ambitious families, where as Abramitzky and Boustan put it, immigrants typically are “under-placed” in income terms relative to their abilities. Many immigrant parents encourage their children to take advantage of opportunities for educational advancement, such as specialized high schools or advanced-placement high-school classes, likely to bring them into contact with peers from more advantaged families. This can create social capital that boosts the social mobility of the second generation, but a large part of any effect on mobility is surely attributable to family-instilled ambition and to educational attainment substantially higher than one would predict from parental status. The increased social capital is to a significant extent a correlate of on-going mobility.

In sum, there is without doubt a causal linkage between social capital and mobility. But the Chetty et al. analysis overstates its strength, possibly by a large margin. To twist the old saw about correlation and causation, correlation in this case isn’t only causation.

I [Alba] believe that a critique is especially important in this case because the findings in the Chetty et al. paper create an obvious temptation for the formulation of social policy. Indeed, in their second paper in Nature, the authors make suggestions in this direction. But before we commit ourselves to new anti-poverty policies based on these findings, we need a more certain gauge of the potential effectiveness of social capital than the current analysis can give us.

I get what Alba is saying about the critique not strongly challenging the article’s conclusions. He’s not saying that Chetty et al. are wrong; it’s more that he’s saying there are a lot of unanswered questions here—a position I’m sure Chetty et al. would themselves agree with!

A possible way forward?

To step back a moment—and recall that I have not tried to digest the Nature articles or the associated news coverage—I’d say that Alba is criticizing a common paradigm of social science research in which a big claim is made from a study and the study has some clear limitations, so the researchers attack the problem in some different ways in an attempt to triangulate toward a better understanding.

There are two immediate reactions I’d like to avoid. The first is to say that the data aren’t perfect, the study isn’t perfect, so we just have to give up and say we’ve learned nothing. The second, from the other direction, is the unpalatable response that all studies are flawed so we shouldn’t criticize this one in particular.

Fortunately, nobody is suggesting either of these reactions. From one direction, critics such as Lehman and Alba are pointing out concerns but they’re not saying the conclusions of the Chetty et al. study are all wrong or that the study is useless; from the other, news reports do present qualifiers and they’re not implying that these results are a sure thing.

What we’d like here is a middle way—not just a rhetorical middle way (“This research, like all social science, has weaknesses and threats to validity, hence the topic should continue to be studied by others”) but a procedural middle way, a way to address the concerns, in particular to get some estimates of the biases in the conclusions resulting from various problems with the data.

Our default response is to say the data should be analyzed better: do a propensity analysis to address Lehman’s concern about who’s on facebook, and do some sort of multilevel model integrating individual and zipcode-level data to address Alba’s concern about aggregation. And this would all be fine, but it takes a lot of work—and Chetty et al. already did a lot of work, triangulating toward their conclusion from different directions. There’s always more analysis that could be done.

Maybe the problem with the triangulation approach is not the triangulation itself but rather the way it can be set up with a central analysis making a conclusion, and then lots of little studies (“robustness checks,” etc.) designed to support the main conclusion. What if the other studies were set up to estimate biases, with the goal not of building confidence in the big number but rather of getting a better, more realistic estimate?

With this in mind, I’m thinking that a logical next step would be to construct a simulation study to get a sense of the biases arising from the issues raised by Lehman and Alba. We can’t easily gather the data required to know what these biases are, but it does seem like it should be possible to simulate a world in which different sorts of people are more or less likely to be on facebook, and in which there are local patterns of connectedness that are not simply what you’d get by averaging within zipcodes.

I’m not saying this would be easy—the simulation would have to make all sorts of assumptions about how these factors vary, and the variation would need to depend on relevant socioeconomic variables—but right now it seems to me to be a natural next step in the research.
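To give a flavor of what such a simulation could look like, here is a toy version (my own invented setup, not an attempt to mimic the Chetty et al. data) in which individuals are nested within areas and a shared area-level factor drives both connectedness and mobility:

import numpy as np

rng = np.random.default_rng(1)
n_areas, n_per = 500, 200
area = np.repeat(np.arange(n_areas), n_per)

# A shared area-level factor (say, the strength of the local economy).
area_factor = rng.normal(0, 1, n_areas)

connectedness = 0.8 * area_factor[area] + rng.normal(0, 1, area.size)
# Individual mobility: only weakly tied to one's own connectedness,
# but strongly tied to the area factor.
mobility = 0.1 * connectedness + 0.8 * area_factor[area] + rng.normal(0, 1, area.size)

ind_r = np.corrcoef(connectedness, mobility)[0, 1]
area_conn = np.bincount(area, weights=connectedness) / n_per
area_mob = np.bincount(area, weights=mobility) / n_per
eco_r = np.corrcoef(area_conn, area_mob)[0, 1]

print(f"individual-level correlation: {ind_r:.2f}")
print(f"area-level correlation:       {eco_r:.2f}")

None of these numbers mean anything in themselves; the point is just that the area-level (ecological) correlation can come out near 1 while the individual-level relationship is much weaker, which is the Robinson-style gap that a more serious simulation, with selection into Facebook added in, could try to quantify under realistic assumptions.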

One more thing

Above I stressed the importance and challenge of finding a middle ground between (1) saying the study’s flaws make it completely useless and (2) saying the study represents standard practice so we should believe it.

Sometimes, though, response #1 is appropriate. For example, the study of beauty and sex ratio or the study of ovulation and voting or the study claiming that losing an election for governor lops 5 to 10 years off your life—I think those really are useless (except as cautionary tales, lessons of research practices to avoid). How can I say this? Because those studies are just soooo noisy compared to any realistic effect size. There’s just no there there. Researchers can fool themselves because they think that if they have hundreds or thousands of data points they’re cool, and that if they have statistical significance they’ve discovered something. We’ve talked about this attitude before, and I’ll talk about it again; I just wanted to emphasize here that it doesn’t always make sense to take the middle way. Or, to put it another way, sometimes the appropriate middle way is very close to one of the extreme positions.

Bets as forecasts, bets as probability assessment, difficulty of using bets in this way

John Williams writes:

Bets as forecasts come up on your blog from time to time, so I thought you might be interested in this post from RealClimate, which is the place to go for informed commentary on climate science.

The post, by Gavin Schmidt, is entitled, “Don’t climate bet against the house,” and tells the story of various public bets in the past few decades regarding climate outcomes.

The examples are interesting in their own right and also as a reminder that betting is complicated. In theory, betting has close links to uncertainty, and you should be able to go back and forth between them:

1. From one direction, if you think the consensus is wrong, you can bet against it and make money (in expectation). You should be able to transform your probability statements into bets.

2. From the other direction, if bets are out there, you can use these to assess people’s uncertainties, and from there you can make probabilistic predictions.

In real life, though, both the above steps can have problems, for several reasons. First is the vig (in a betting market) or the uncertainty that you’ll be paid off (in an unregulated setting). Second is that you need to find someone to make that bet with you. Third, and relatedly, that “someone” who will bet with you might have extra information you don’t have, indeed even their willingness to bet at given odds provides some information, in a Newtonian action-and-reaction sort of way. Fourth, we hear about some of the bets and we don’t hear about others. Fifth, people can be in it to make a point or for laffs or thrills or whatever, not just for the money, enough so that, when combined with the earlier items on this list, there won’t be enough “smart money” to take up the slack.
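To make the first of these issues, the vig, concrete, here’s a toy calculation with invented odds:

# Toy example of the vig: a bookmaker quotes decimal odds on both sides of a
# binary outcome; the implied "probabilities" sum to more than 1, and the
# excess is the bookmaker's margin. All numbers are invented.
odds_yes, odds_no = 1.91, 1.91

implied_yes = 1 / odds_yes
implied_no = 1 / odds_no
overround = implied_yes + implied_no - 1
print(f"implied probabilities: {implied_yes:.3f} and {implied_no:.3f}; vig = {overround:.1%}")

# Expected profit per $1 staked on "yes" if your own probability is p:
p = 0.55
print(f"expected profit at p = {p}: {p * (odds_yes - 1) - (1 - p):+.3f}")

At these odds you need to believe the event has probability greater than about 0.524 just to break even, so the quoted prices only bound, rather than pin down, the probabilities that bettors actually hold.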

This is not to say that betting is a useless approach to information aggregation; I’m just saying that betting, like other social institutions, works under certain conditions and not in absolute generality.

And this reminds me of another story.

Economist Bryan Caplan reports that his track record on bets is 23 for 23. That’s amazing! How is it possible? Here’s Caplan’s list, which starts in 2007 and continues through 2021, with some of the bets still unresolved.

Caplan’s bets are an interesting mix. The first one is a bet where he offered 1-to-100 odds so it’s no big surprise that he won, but most of them are at even odds. A couple of them he got lucky on (for example, he bet in 2008 that no large country would leave the European Union before January 1, 2020, so he just survived by one month on that one), but, hey, it’s ok to be lucky, and in any case even if he only had won 21 out of 23 bets, that would still be impressive.

It seems to me that Caplan’s trick here is to show good judgment on what pitches to swing at. People come at him with some strong, unrealistic opinions, and he’s been good at crystallizing these into bets. In poker terms, he waits till he has the nuts, or nearly so. 23 out of 23 . . . that’s a great record.

Weak separation in mixture models and implications for principal stratification

Avi Feller, Evan Greif, Nhat Ho, Luke Miratrix, and Natesh Pillai write:

Principal stratification is a widely used framework for addressing post-randomization complications. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of mixture parameters, like the MLE, are known to exhibit pathological behavior. We study this behavior in a simple but fundamental example, a two-component Gaussian mixture model in which only the component means and variances are unknown, and focus on the setting in which the components are weakly separated. . . . We provide diagnostics for all of these pathologies and apply these ideas to re-analyzing two randomized evaluations of job training programs, JOBS II and Job Corps.

The paper’s all about maximum likelihood estimates and I don’t care about that at all, but the general principles are relevant to understanding causal inference with intermediate outcomes and fitting such models in Stan or whatever.
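Here’s a quick illustration of the weak-separation problem they’re studying, using simulated data and off-the-shelf EM software (this is my sketch, not anything from the paper):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
n = 500
z = rng.random(n) < 0.5
# True component means 0 and 0.5, unit variances: weakly separated.
y = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(0.5, 1.0, n)).reshape(-1, 1)

for seed in range(5):
    fit = GaussianMixture(n_components=2, init_params="random", random_state=seed).fit(y)
    means = np.sort(fit.means_.ravel())
    print(f"random start {seed}: means {means.round(2)}, weights {np.sort(fit.weights_).round(2)}")

With this little separation the estimated means and weights tend to bounce around across random starts and can land far from the true values, which is the sort of pathology the paper diagnoses; similar identification problems can show up when fitting such mixtures in Stan if the components are weakly separated.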

Still more on the Heckman Curve!

Carlos Parada writes:

Saw your blog post on the Heckman Curve. I went through Heckman’s response that you linked, and it seems to be logically sound but terribly explained, so I feel like I need to explain why Rea+Burton is great empirical work, but it doesn’t actually measure the Heckman curve.

The Heckman curve just says that, for any particular person, there exists a point where getting more education isn’t worth it anymore because the costs grow as you get older, or equivalently, the benefits get smaller. This is just trivially true. The most obvious example is that nobody should spend 100% of their life studying, since then they wouldn’t get any work done at all. Or, more tellingly, getting a PhD isn’t worth it for most people, because most people either don’t want to work in academia or aren’t smart enough to complete a PhD. (Judging by some of the submissions to PPNAS, I’m starting to suspect most of academia isn’t smart enough to work in academia.)

The work you linked finds that participant age doesn’t predict the success of educational programs. I have no reason to suspect these results are wrong, but the effect of age on benefit:cost ratios for government programs does not measure the Heckman curve.

To give a toy model, imagine everyone goes to school as long as the benefits of schooling are greater than the costs for them, then drops out as soon as they’re equal. So now, for high school dropouts, what is the benefit:cost ratio of an extra year of school? 1 — the costs roughly equal the benefits. For college dropouts, what’s the benefit:cost ratio? 1 — the costs roughly equal the benefits. And so on. By measuring the effects of government interventions on people who completed x years of school before dropping out, the paper is conditioning on a collider. This methodology would only work if when people dropped out of school was independent of the benefits/costs of an extra year of school.

(You don’t have to assume perfect rationality for this to work: If everyone goes to school until the benefit:cost ratio equals 1.1 or 0.9, you still won’t find a Heckman curve. Models that assume rational behavior tend to be robust to biases of this sort, although they can be very vulnerable in some other cases.)

Heckman seems to have made this mistake at some points too, though, so the authors are in good company. The quotes in the paper suggest he thought an individual Heckman curve would translate to a downwards-sloping curve for government programs’ benefits, when there’s no reason to believe they would. I’ve made very similar mistakes myself.

Sincerely,

An econ undergrad who really should be getting back to his Real Analysis homework

Interesting. This relates to the marginal-or-aggregate question that comes up a lot in economics. It’s a common problem that we care about marginal effects but the data more easily allow us to estimate average effects. (For the statisticians in the room, let me remind you that “margin” has opposite meanings in statistics and economics.)

But one problem that Parada doesn’t address with the Heckman curve is that the estimates of efficacy used by Heckman are biased, sometimes by a huge amount, because of selection on statistical significance; see section 2.1 of this article. All the economic theory in the world won’t fix that problem.
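For what it’s worth, Parada’s toy model is easy to simulate. Here’s a sketch with invented numbers: every individual has a downward-sloping benefit curve for an extra year of school and stays in school until the benefit:cost ratio falls to 1, and yet the observed ratio among dropouts sits just below 1 at every schooling level:

import numpy as np

rng = np.random.default_rng(3)
n = 50_000
cost = 1.0
start_benefit = rng.uniform(1.5, 4.0, n)   # benefit of the next year, before any schooling
decay = rng.uniform(0.05, 0.25, n)         # how fast that benefit declines per year

years = np.zeros(n, dtype=int)
benefit = start_benefit.copy()
for _ in range(30):                         # keep studying while the next year is worth it
    stay = benefit > cost
    years += stay
    benefit = np.where(stay, benefit - decay, benefit)

# Benefit:cost ratio of one more year, among people who stopped at each level:
for y in (8, 12, 16, 20):
    sel = years == y
    if sel.any():
        print(f"stopped after {y} years: mean benefit/cost of an extra year = {(benefit[sel] / cost).mean():.2f} (n = {sel.sum()})")

Every individual curve slopes downward by construction, but because people choose their own stopping point, the ratio you observe at each amount of completed schooling is pinned just below 1, which is exactly the collider point Parada is making.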

P.S. In an amusing example of blog overlap, Parada informs us that he also worked on the Minecraft speedrunning analysis. It’s good to see students keeping busy!

Solution to that little problem to test your probability intuitions, and why I think it’s poorly stated

The other day I got this email from Ariel Rubinstein and Michele Piccione asking me to respond to this question which they sent to a bunch of survey respondents:

A very small proportion of the newborns in a certain country have a specific genetic trait.
Two screening tests, A and B, have been introduced for all newborns to identify this trait.
However, the tests are not precise.
A study has found that:
70% of the newborns who are found to be positive according to test A have the genetic trait (and conversely 30% do not).
20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).
The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Suppose that a newborn is found to be positive according to both tests.
What is your estimate of the likelihood (in %) that this newborn has the genetic trait?

Here was my response:

OK, let p = Pr(trait) in population, let a1 = Pr(positive test on A | trait), a2 = Pr(positive test on A | no trait), b1 = Pr(positive test on B | trait), b2 = Pr(positive test on B | no trait).
Your first statement is Pr(trait | positive on test A) = 0.7. That is, p*a1/(p*a1 + (1-p)*a2) = 0.7
Your second statement is Pr(trait | positive on test B) = 0.2. That is, p*b1/(p*b1 + (1-p)*b2) = 0.2

What you want is Pr(trait | positive on both tests) = p*a1*b1 / (p*a1*b1 + (1-p)*a2*b2)

It looks at first like there’s no unique solution to this one, as it’s a problem with 5 unknowns and just 2 data points!

But we can do that “likelihood ratio” trick . . .
Your first statement is equivalent to 1 / (1 + ((1-p)/p) * (a2/a1)) = 0.7; therefore (p/(1-p)) * (a1/a2) = 0.7 / 0.3
And your second statement is equivalent to (p/(1-p)) * (b1/b2) = 0.2 / 0.8
Finally, what you want is 1 / (1 + ((1-p)/p) * (a2/a1) * (b2/b1)). OK, this can be written as X / (1 + X), where X is (p/(1-p)) * (a1/a2) * (b1/b2).
Given the information above, X = (0.7 / 0.3) * (0.2 / 0.8) * (1-p)/p

Still not enough information, I think! We don’t know p.

OK, you give one more piece of information, that p is “very small.” I’ll suppose p = 0.001.

Then X = (0.7 / 0.3) * (0.2 / 0.8) * 999, which comes to 580, so the probability of having the trait given positive on both tests is 580 / 581 = 0.998.

OK, now let me check my math. According to the above calculations,
(1/999) * (a1/a2) = 0.7/0.3, thus a1/a2 = 2300, and
(1/999) * (b1/b2) = 0.2/0.8, thus b1/b2 = 250.
And then (p/(1-p))*(a1/a2)*(b1/b2) = (1/999)*2300*250 = 580.

So, yeah, I guess that checks out, unless I did something really stupid. The point is that if the trait is very rare, then the tests have to be very precise to give such good predictive power.

But . . . you also said “the tests are not precise.” This seems to contradict your earlier statement that only “a very small proportion” have the trait. So I feel like your puzzle has an embedded contradiction!

I’m just giving you my solution straight, no editing, so you can see how I thought it through.
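(Here’s a quick numeric check of that email, assuming p = 0.001 for the “very small proportion” and picking arbitrary values for the false-positive rates a2 and b2, which, given the two stated conditions, shouldn’t matter:)

p = 0.001

def posterior_both(a2, b2):
    a1 = (0.7 / 0.3) * ((1 - p) / p) * a2   # forces Pr(trait | A positive) = 0.7
    b1 = (0.2 / 0.8) * ((1 - p) / p) * b2   # forces Pr(trait | B positive) = 0.2
    num = p * a1 * b1
    return num / (num + (1 - p) * a2 * b2)

for a2, b2 in [(0.0002, 0.003), (0.0004, 0.001), (0.0001, 0.004)]:
    print(f"a2 = {a2}, b2 = {b2}: Pr(trait | both positive) = {posterior_both(a2, b2):.4f}")

Each choice gives the same answer, about 0.998, matching the likelihood-ratio calculation above.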

Rubinstein and Piccione confirmed that my solution, that the probability is very close to 1, is correct, and they pointed me to this research article where they share the answers that were given to this question when they posed it to a bunch of survey respondents.

I found the Rubinstein and Piccione article a bit frustrating because . . . they never just give the damn responses! The paper is very much in the “economics” style rather than the “statistics” style in that they’re very focused on the theory, whereas statisticians would start with the data. I’m not saying the economics perspective is wrong here—the experiment was motivated by theory, so it makes sense to compare results to theoretical predictions—I just found it difficult to read because there was never a simple plot of all the data.

My problem with their problem

But my main beef with their example is that I think it’s a trick question. On one hand, it says only “very small proportion” in the population have the trait; indeed, I needed that information to solve the problem. On the other hand, it says “the tests are not precise”—but I don’t think that’s right, at least not in the usual way we think about the precision of a test. With this problem description, they’re kinda giving people an Escher box and then asking what side is up!

To put it another way, if you start with “a very small proportion,” and then you take one test and it gets your probability all the way up to 70%, then, yeah, that’s a precise test! It takes a precise test to give you that much information, to take you from 0.001 to 0.7.

So here’s how I think the problem is misleading: The test is described as “not precise,” and then you see the numbers 0.7 and 0.2, so it’s natural to think that these tests do not provide much information. Actually, though, if you accept the other part of the problem (that only “a very small proportion” have the trait), the tests provide a lot of information. It seems strange to me to call a test that offers a likelihood ratio of 2300 “not precise.”

To put it another way: I think of the precision of a test as a function of the test’s properties alone, not of the base rate. If you have a precise test and then apply it to a population with a very low base rate, you can end up with a posterior probability of close to 50/50. That posterior probability depends on the test’s precision and also on the base rate.

I guess they could try out this problem on a new set of respondents, where instead of describing the tests as “not precise,” they describe them as “very precise,” and see what happens.

One more thing

On page 11 of their article, Rubinstein and Piccione give an example where different referees have independent data in their private signals, when trying to determine if a defendant is guilty of a crime. This does not seem plausible in the context of deciding whether a defendant is guilty. I think it would make more sense to say that they have overlapping information. This does not change the math of the problem—you can think of their overlapping information, along with the base rate, as being a shared “prior,” with the non-overlapping information corresponding to the two data points in the earlier formulation—but it would make the story more realistic.

I understand that this model is just based on the literature. I just have political problems with oversimplified models of politics, juries, etc. I’d recommend that the authors either use a different “cover story” or else emphasize that this is just a mathematical story not applicable to real juries. In their paper, they talk about “the assumption that people are Bayesian,” but I’m bothered by the assumption that different referees have independent data in their private signals. That’s a really strong assumption! It’s funny which assumptions people will question and which assumptions they will just accept as representing neutral statements of a problem.

A connection to statistical inference and computing

This problem connects to some of our recent work on the computational challenges of combining posterior distributions. The quick idea is that if theta is your unknown parameter (in this case, the presence or absence of the trait) and you want to combine posteriors p_k(theta|y_k) from independent data sources y_k, k=1,…,K, then you can multiply these posteriors, but you need to divide by the factor p(theta)^(K-1). Dividing by the prior to a power in this way will in general induce computational instability. Here is a short paper on the problem and here is a long paper. We’re still working on this.
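In the discrete setting of the trait example above, the combination rule is simple enough to write out directly; here’s a minimal sketch (my own toy code, nothing from those papers):

import numpy as np

# Combining K = 2 independent-source posteriors for a discrete parameter:
# multiply the posteriors, divide by the prior to the power K - 1, renormalize.
prior = np.array([0.999, 0.001])      # Pr(no trait), Pr(trait)
post_A = np.array([0.3, 0.7])         # posterior after test A alone
post_B = np.array([0.8, 0.2])         # posterior after test B alone

combined = post_A * post_B / prior ** (2 - 1)
combined /= combined.sum()
print(f"Pr(trait | both tests positive) = {combined[1]:.4f}")

With only two values of theta there’s no numerical trouble (this reproduces the 0.998 from before), but with a continuous parameter the division by p(theta)^(K-1) blows up wherever the prior is thin, which is the instability mentioned above.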

This journal is commissioning a sequel to one of my smash hits. How much will they pay me for it? You can share negotiation strategies in the comments section.

I know it was a mistake to respond to this spam but I couldn’t resist . . . For the rest of my days, I will pay the price of being on the sucker list.

The following came in the junk mail the other day:

Dear Dr. Andrew Gelman,

My name is **, the editorial assistant of **. ** is a peer-reviewed, open access journal published by **.

I have had an opportunity to read your paper, “Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs”, and can find that your expertise fits within the scope of our journal quite well.
Therefore, you are cordially invited to submit new, unpublished manuscripts to **. If you do not have any at the moment, it is appreciated if you could keep our journal in mind for your future research outputs.

You may see the journal’s profile at ** and submit online. You may also e-mail submissions to **.

We are recruiting reviewers for the journal. If you are interested in becoming a reviewer, we welcome you to join us. Please find the application form and details at ** and e-mail the completed application form to **.

** is included in:
· CrossRef; EBSCOhost; EconPapers
· Gale’s Academic Databases
· GetInfo; Google Scholar; IDEAS
· J-Gate; Journal Directory
· JournalTOCs; LOCKSS
· MediaFinder®-Standard Periodical Directory
· RePEc; Sherpa/Romeo
· Standard Periodical Directory
· Ulrich’s; WorldCat
Areas include but are not limited to:
· Accounting;
· Economics
· Finance & Investment;
· General Management;
· Management Information Systems;
· Business Law;
· Global Business;
· Marketing Theory and Applications;
· General Business Research;
· Business & Economics Education;
· Production/Operations Management;
· Organizational Behavior & Theory;
· Strategic Management Policy;
· Labor Relations & Human Resource Management;
· Technology & Innovation;
· Public Responsibility and Ethics;
· Public Administration and Small Business Entrepreneurship.

Please feel free to share this information with your colleagues and associates.

Thank you.

Best Regards,

**
Editorial Assistant
**
——————————————-
**
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

Usually I just delete these things, but just the other day we had this discussion of some dude who was paid $100,000 to be the second author on a paper. Which made me wonder how much I could make as a sole author!

And this reminded me of this other guy who claimed that scientific citations are worth $100,000 each. A hundred grand seems like the basic unit of currency here.

So I sent a quick response:

Hi–how much will you pay me to write an article for your journal?
AG

I’m not expecting $100,000 as their first offer—they’ll probably lowball me at first—but, hey, I can negotiate. They say the most important asset in negotiation is the willingness to say No, and I’m definitely willing to say No to these people!

Just a few hours later I received a reply! Here it is:

Dear Dr. Andrew Gelman,

Thanks for your email. We charge the Article Processing Charge (Formatting and Hosting) of 100USD for per article.

Welcome to submit your manuscript to our journal. If you have any questions, please feel free to contact me.

Best Regards,

**
Editorial Assistant
**
——————————————-
**
Tel: ** ext.**
Fax: **
E-mail 1: **
E-mail 2: **
URL: **

I don’t get it. They’re offering me negative $100? That makes no sense? What next, they’ll offer to take my (fully functional) fridge off my hands for a mere hundred bucks?? In what world am I supposed to pay them for the fruits of my labor?

So I responded:

No, I would only provide an article for you if you pay me. It would not make sense for me to pay you for my work.

No answer yet. If they do respond at some point, I’ll let you know. We’ll see what happens. If they offer me $100, I can come back with a counter-offer of $100,000, justifying it by the two links above. Then maybe they’ll say they can’t afford it, they’ll offer, say, $1000 . . . maybe we can converge around $10K. I’m not going to share the lowest value I’d accept—that’s something the negotiation books tell you never ever to do—but I’ll tell you right now, it’s a hell of a lot more than a hundred bucks.

P.S. That paper on higher-order polynomials that they scraped, I mean carefully vetted, for suitability for their journal . . . according to Google Scholar it has 1501 citations, which implies a value of $150,100,000, according to the calculations referred to above. Now, sure, most of that value is probably due to Guido, my collaborator on that paper, but still . . . 150 million bucks! How hard could it be to squeeze out a few hundred thousand dollars for a sequel? It says online that Knives Out grossed $311.4 million, and Netflix paid $469 million for the rights for Knives Out 2 and 3. If this academic publisher doesn’t offer me a two-paper deal that’s at least in the mid four figures, my agent and I will be taking our talents to Netflix.

What can $100,000 get you nowadays? (Discussion of the market for papers in economics.)

Someone who would prefer to remain anonymous writes:

Before anything else and like many of the people who write to you, as an early career academic in the area of economics, I feel constrained in the criticisms that I can make publicly. So I have to kindly request that, if you do publish any of what follows, that identifying information about myself be not made public.

My correspondent continues:

The recent disclosed leaks at Uber revealed some, well, distressing behaviour by some of my peers. As the Guardian recently wrote (https://www.theguardian.com/news/2022/jul/12/uber-paid-academics-six-figure-sums-for-research-to-feed-to-the-media), several noted economists collaborated with Uber when writing some academic papers. In short, Uber shared data with a selected group of economists, paid them and had Uber economists collaborating with the authors.

The act of collaboration is not, in itself, necessarily a bad thing, as the potential access to proprietary data allows research to be done that otherwise would not be possible. But the way things were done raises several issues, many of which you have commented on before. I’d like to focus on one in particular: how do we deal with studies done with closed data shared by interested parties?

In the leaked emails, Uber staffers wrote that, concerning one economist who wanted to do a separate unpaid study using Uber’s shared data: “We see low risk here because we can work with Landier on framing the study and we also decide what data we share with him.” I.e., the issue here isn’t just one of replication, already a serious concern, but the risk that a company may omit data so as to influence academics doing a study, so as to frame things in the best light possible. It is distressing to read that executives wanted to work on the report’s message “to ensure it’s not presented in a potentially negative light.”

Perhaps I’m being a bit too naive about all this, in that the obvious question when seeing a study like this is to ask why would Uber be ready to collaborate unless they are going to get what they wanted? Indeed, I recall being a little bit sceptical about Hall and Krueger’s initial NBER paper when I read it. But the excerpts produced by The Guardian are so much worse than I’d have feared; even if we don’t know the exact extent that Uber acted as these excerpts describe, it’s hard not to fear the worst. Where does that leave us with these studies? Should we dismiss them altogether or can we salvage some of their analyses?

There is one small bright spot in all of this. Because I am not a labour economist, I only ever read Hall and Krueger’s initial NBER paper, so I missed Berg and Johnston’s later critique (https://journals.sagepub.com/doi/full/10.1177/0019793918798593?journalCode=ilra), where the issue of inadequate data was already being raised, among other, even stronger criticisms. So even before the emails, there were some people tackling these issues head on. And, to the credit of ILR Review, the journal that published Hall and Krueger’s paper, the critique was published by the journal itself, unlike what so often happens.

Anyway, that’s all. I guess I’m shocked at how much worse things seem to be, at how willing Uber was to manipulate and try to use well-regarded academics’ reputations to, it seems, launder their own reputation…

Some googling revealed this exchange between financial journalist Felix Salmon and economists Michael Strain and Justin Wolfers:

This followup wins the thread:

Lots to chew on here, so let me go through some issues one at a time:

1. Conflicts of interest

It’s easy to get all “moralistic” about this. It’s also easy to get all “It’s easy to get all ‘moralistic’ about this” about this.

So let’s be clear: the conflict of interest here is real; indeed, it’s hard to get much more “conflict of interest” than “Company pays you $100,000 to write a paper, then you write a paper favorable to that company.” At the same time, there’s nothing necessarily morally wrong about having a conflict of interest. It is what it is. Every year I fill out my conflict of interest form with Columbia. “Conflict of interest” is a description, not a pejorative.

With regard to the Hall and Krueger paper, the dispute was not whether there was a conflict of interest but rather (a) whether the conflict was sufficiently disclosed, and (b) how this conflict should affect the trust that policymakers should place in its conclusions.

I don’t have a strong feeling about the disclosure issue—Salmon holds that the statement, “Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper. Krueger acknowledges working as a consultant for Uber in December 2014 and January 2015 when the initial draft of this paper was written,” is not “an adequate disclosure that Uber paid $100,000 for Krueger to write this paper.” I dunno. I’ve written lots of acknowledgments and I don’t recall ever mentioning the dollar value. It seems to me pretty clear that if you have one author who worked at the company and another author who was paid by the company, the conflict is there, whether it’s $1000 or $100,000.

Regarding trust: yeah, with this level of conflict, you’d want to see some data analyses by an outside team, like with that Google chip-layout paper.

News reports should be more clear on this. A headline, “Ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms,” should be replaced by something like “Uber-paid study reports that ride-hailing apps have created jobs for Paris’s poorer youth, but a regulatory clampdown looms.” Just add “Uber-paid study” to the beginning of every headline!

2. Morality

I don’t get this reaction:

I mean, sure, I don’t like Uber either. Some people think a company like Uber is cool, some people think it’s creepy. Views differ. But “sell their souls . . . destroy their lives . . . especially distressing”? That seems a bit strong. Consider: back in 2015, economists absolutely loved Uber, which is no surprise given that economists loved to talk about the problems with the market for taxicabs, the famous medallion system, etc. Economists hated taxi regulation about as much as they hated rent control and other blatant restrictions on free enterprise. Economists on the center-left, economists on the center-right, they all hated those regulations, so it’s no surprise that they loved Uber. The company was an economist’s dream, along with being super convenient for users.

3. Interesting data

I get Wolfers’s point that Krueger would find the Uber data interesting. I would have too! Indeed, had Uber offered me $100,000, or even $50,000, I probably would’ve worked for them too. I can’t be sure—they never approached me, and it’s possible that I would’ve said no—but if I had said no, it wouldn’t have been because their data were not sufficiently interesting. The point Wolfers seems to be missing here is that God is in every leaf of every tree. Yes, Uber data are interesting, but lots of other data are interesting too. Ford’s data are interesting, GM’s data are interesting, Bureau of Labor Statistics data are interesting. Lots of interesting data out there, and often people will choose what to look at based on who is paying them. I think one missing piece in the public discussions of this case is how much economists looooved Uber back then: it was a symbol of all that was good about the free market! So they found these data to be particularly interesting.

4. What can you get for $100,000?

A funny thing about the discussion is how little an amount of money $100,000 seems to be to commenters, and that includes people on both sides of the issue! Wolfers thinks that $100,000 is so small that it is “extremely unlikely” that Krueger would write a paper for that paltry sum. From the other direction, Dubal thinks it’s “pathetic” that he would “violate basic rules of research for just 100,000.”

I only met Krueger once, so I can’t speak to his motivations, but I will say that being the second author on a paper can sometimes be pretty easy, and $100,000 is real money! For example, suppose Krueger’s consulting rate was $2000/hour. He should be able to do the work required to be second author on a paper in less than 50 hours. The disclosure in that article says he was working as a consultant during the 2-month period when the initial draft of the paper was written. Spending 50 hours on a project during a 2-month period, that’s plausible. So I can’t really see why Wolfers thinks this is “extremely unlikely.”

There is an alternative story, though, consistent with what Wolfers hypothesizes, which is that Krueger would’ve coauthored the paper anyway but took the $100,000 because it was offered to him, and who wouldn’t take free money? I’m willing to believe that story, actually. This also works as a motivation for Uber: they offered free money to Krueger for something he would’ve done anyway, just to give him an excuse to clear his schedule to do the work. So, he didn’t coauthor a paper for $100,000; he coauthored a paper for free and then accepted the $ to motivate himself to do it. Meanwhile, from Uber’s perspective, the money is well spent, in the same way that the National Science Foundation is motivated to pay me to free up my time to do research that they think will be valuable to society.

Regarding Dubal’s comment: I don’t see what “basic rules of research” were being violated. Not sharing your data? If working with private data is violating a basic rule of research, fine, but then scientists are doing that for free every day. If you set your time machine back to 2015, and you consider Krueger as an economist who thinks that Uber is cool, then getting paid by them and working with their data, that’s even cooler, right? I imagine that for an economist in 2015, working with Uber is as cool as working with a pro sports team would be for a statistician. Getting paid makes it even cooler: then you’re an insider! Sure, Krueger was a big-name academic, he’d served at the highest levels of government, and according to Wolfers he was doing well enough that $100,000 meant nothing to him. Still, as an economist, I’ll bet he thought that working with Uber would be something special. Again, Uber in 2015 had a different aura than Uber today.

5. “Laundering” and the role of academia and the news media in a world that’s mostly run by business and government

What exactly was it about the Hall and Krueger paper that bothered people so much? I don’t think it was simply that these guys were working for Uber or that, given that Uber was paying them, they’d write a report with a pro-Uber spin. I think what is really bugging the critics is the sense that academia—in this case, the NBER (National Bureau of Economic Research), an influential club or organization of academic economists—was being used to launder this money.

If Hall and Krueger were to publish a book, The Case for Uber, published by Uber Press, arguing that Uber is great, then it’s hard to see the problem. These guys chose to take the job and they did it. But when published as an NBER preprint, and one of the authors is a respected academic, it seems different—even with the disclosure statement.

Again, it’s a problem with the news media too, to the extent that they reported this study in the way they’d report straight-up academic research, without the “Uber-funded study claims . . .” preamble to every headline and sentence describing the published findings.

This all kinda makes me think of another well-known economist, John Kenneth Galbraith, who wrote about “countervailing power.” Galbraith was talking about economic and political power, but something similar arises in the world of ideas. Government and business are the 800-pound gorillas, and we often like to think of academia and advocacy organizations as representing countervailing power. When industry or government inserts propaganda into academic channels, this is bothersome in the same way that “astroturf” lobbying seems wrong. It’s bad for its own sake—fooling people and potentially affecting policy through disinformation—and also bad in that it degrades the credibility of un-conflicted scientific research or genuine grassroots organizing.

In saying this, I recognize that there’s no pure stance in science just as there’s no such thing as pure grassroots politics: Scientists have prior beliefs and political attitudes, and they also have to be paid from somewhere, and the same goes for grassroots organizers. Those mimeograph machines don’t run themselves. So there’s a continuous range here. But getting $100,000 for two months of work to coauthor a paper, that’s near the extreme end of the range.

What I’m getting at is that, while there is indignation aimed at Krueger here, I think what’s really going on is indignation at perceived manipulation of the system. One way to see this is that nobody seems particularly angry at the Uber executives or even at Hall, the first author of that paper. If it’s bad science, you should be mad at the people who promoted it and the person who did the work, no? I think there’s this attitude that the full-time Uber employees were just doing their jobs, whereas Krueger, who was just a consultant, was supposed to have had a higher loyalty to academia.

6. Politics

There’s one other thing I wanted to get to, which was Wolfers’s attitude that Krueger needed to be defended. (Again, nobody seemed to feel the need to defend Hall for his role in the project.)

One part of the defense was the silly claim that he wouldn’t have done it for the money, but I think underlying it were two implicit defenses:

First, conflict of interest sounds like a bad thing, Krueger was a good person, and therefore he couldn’t’ve had a conflict of interest. I don’t think this argument makes sense—I see conflict of interest not as an aspect of character but as an aspect of the situation. When I write about Columbia University or any organization that is paying me or my family, I have a conflict of interest. I can still try to be objective, but even if I have pure objectivity, the conflict of interest is still there. It’s inherent in the interaction, not in me.

Second, Krueger was a political liberal, so he couldn’t have issued a report unduly favorable to Uber, because liberals are skeptical of corporations. I don’t think this argument works either, first because, as noted above, back in 2015 economists of a wide range of political stripes considered Uber to be awesome, and second because Krueger, while political, was not known as a hack. It’s easy to imagine how it goes: he works with Uber, they tell him good things about the company, he coauthors a positive report.

I’ve always wondered whether something similar is going on when the economist James Heckman hypes early childhood intervention programs. Heckman is a political conservative, and one would expect him to be skeptical of utopian social spending programs. So when he and his collaborators found (or, to be precise, thought they found) strong evidence of huge effects of these programs, it was natural for him to think that his new stance was correct—after all, he came to it despite his political convictions.

But it doesn’t work that way. Yes, you can be biased to come out with a result that confirms your preconceptions. But when you come out with a result that rocks your world, that could be a mistake too.

7. Who to credit or blame

I agree with my correspondent, who focused the blame (or, depending on your perspective, the credit) for this episode on Uber management. The online discussion seemed to be all about the economist who consulted for Uber and was the second author of the paper, but really it seems that we should think of Uber, the organization, as the leader of this endeavor.

Full disclosure: I’ve been paid real money by lots of organizations that have done bad things, including pharmaceutical companies, tech companies, and the U.S. Department of Defense.

P.S. Interesting comment here from economist Peter Dorman.

P.P.S. More here.

“Predicting consumers’ choices in the age of the internet, AI, and almost perfect tracking: Some things change, the key challenges do not”

David Gal writes:

I wanted to share the attached paper on choice prediction that I recently co-authored with Itamar Simonson in case it’s of interest.

I think it’s somewhat related to your work on the Piranha Problem, in that it seems, in most cases, most of the explainable variation in people’s choices is accounted for by a few strong, stable tendencies (and these are often captured by variables that are relatively easy to identify).

I also wrote a brief commentary based on the article in Fortune.

And here’s the abstract to the paper:

Recent technology advances (e.g., tracking and “AI”) have led to claims and concerns regarding the ability of marketers to anticipate and predict consumer preferences with great accuracy. Here, we consider the predictive capabilities of both traditional techniques (e.g., conjoint analysis) and more recent tools (e.g., advanced machine learning methods) for predicting consumer choices. Our main conclusion is that for most of the more interesting consumer decisions, those that are “new” and non-habitual, prediction remains hard. In fact, in many cases, prediction has become harder due to the increasing influence of just-in-time information (user reviews, online recommendations, new options, etc.) at the point of decision that can neither be measured nor anticipated ex ante. Sophisticated methods and “big data” can in certain contexts improve predictions, but usually only slightly, and prediction remains very imprecise—so much so that it is often a waste of effort. We suggest marketers focus less on trying to predict consumer choices with great accuracy and more on how the information environment affects the choice of their products. We also discuss implications for consumers and policymakers.

Sophisticated statistics is often a waste of effort . . . Oh no, that’s not a message that I want spread around. So please, everyone, keep quiet about this paper. Thanks!
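Joking aside, here’s a minimal simulation of the mechanism the abstract is describing (my toy example, not anything from Gal and Simonson’s paper): if a couple of stable tendencies carry most of the learnable signal and the rest of the variation comes from point-of-decision influences you can’t measure in advance, then a flexible model with many extra features barely improves on a simple model that uses only the stable ones. All of the feature names and numbers below are made up.

```python
# Toy simulation of the "few strong, stable tendencies" idea: two stable
# predictors carry most of the learnable signal, thirty weak attributes add a
# little, and unmeasured point-of-decision influences add a lot of noise.
# Everything here (features, coefficients, models) is invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
stable = rng.normal(size=(n, 2))            # e.g., price sensitivity, brand habit
weak = rng.normal(size=(n, 30))             # many weak product attributes
utility = (1.5 * stable[:, 0] + 1.0 * stable[:, 1]
           + 0.05 * weak.sum(axis=1)
           + rng.logistic(scale=1.5, size=n))   # unmeasured just-in-time influences
y = (utility > 0).astype(int)                   # 1 = buys, 0 = doesn't

X = np.hstack([stable, weak])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

simple = LogisticRegression(max_iter=1000).fit(X_tr[:, :2], y_tr)   # two stable features only
fancy = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # all 32 features

print("simple model accuracy:", round(accuracy_score(y_te, simple.predict(X_te[:, :2])), 3))
print("fancy model accuracy: ", round(accuracy_score(y_te, fancy.predict(X_te)), 3))
```

The comparison comes out close by construction here; the point is only that if the world looks anything like this simulation, piling on features and model complexity buys you very little.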

Nimby/Yimby thread

Mark Palko and Joseph Delaney share their Nimby/Yimby thread:

Monday, September 13, 2021: Yes, YIMBYs can be worse than NIMBYs — the opening round of the West Coast Stat Views cage match

Thursday, September 16, 2021: Yes, YIMBYs can be worse than NIMBYs Part II — Peeing in the River

Friday, September 17, 2021: The cage match goes wild [JAC]

Monday, September 20, 2021: Krugman then told how the ring of mountains almost kept the Challenger Expedition from finding the lost city of Los Angeles

Tuesday, September 21, 2021: Cage match continues on development [JAC]

Thursday, September 23, 2021: Yes, YIMBYs can be worse than NIMBYs Part III — When an overly appealing narrative hooks up with fatally misaligned market forces, the results are always ugly.

Monday, September 27, 2021: Did the NIMBYs of San Francisco and Santa Monica improve the California housing crisis?

Tuesday, September 28, 2021: A primer for New Yorkers who want to explain California housing to Californians

Friday, October 1, 2021: A couple of curious things about Fresno

Thursday, October 7, 2021: Does building where the prices are highest always reduce average commute times?

Friday, October 8, 2021: Housing costs [JAC]

Wednesday, October 13, 2021: Urbanism [JAC]

Monday, October 18, 2021: Either this is interesting or I’m doing something wrong

And a study we’ll want to come back to:

A spatiotemporal analysis of transit accessibility to low-wage jobs in Miami-Dade County

Also:

Tuesday, December 21, 2021: The NYT weighs in again on California housing and it goes even worse than expected

I’m no expert on this topic. I have Yimby sympathies—a few years ago I recall seeing some leaflets opposing the building of a new tower in the neighborhood, and I think I wrote a letter to our city councilmember saying they shouldn’t be swayed by the obstructionists—but I’m open to some of the arguments listed above. Palko and Delaney are pushing against conventional narratives that are often unthinkingly presented in the news media.

Don’t go back to Rockville: Possible scientific fraud in the traffic model for a highway project?

Ben Ross of the Maryland Transit Opportunities Coalition writes:

You may be interested in the attached letter I sent to the U.S. Dept. of Transportation yesterday, presenting evidence that suggests scientific fraud in the traffic model being used by the Maryland Dept. of Transportation to justify a major highway project in Maryland. We request that USDOT make an independent examination of the model and that it release the input and output data files to expert outside reviewers. (A request for the data files was already made, and the requesters were told that the request was being handled under the state’s FOIA-equivalent law and no response could be expected until after the project gets its approval.)

Ross also points to this news article by Bruce DePuyt that gives some background on the story. It seems that the state wants to add some lanes to the Beltway.

I’ve not read the documents in any sort of detail so I won’t comment on the claim of fraud except to make a meta-point. Without making any comment whatsoever about this particular report but just speaking in general, I think that projections, cost-benefit analyses, etc. are often beyond truth or fraud, in that an organization will want to make a decision and then they’ll come up with an analysis to support that goal, kind of in the same way that a turn-the-crank style scientist will say, “We did a study to prove . . .” So, sure, the analysis might be completely bogus with made-up numbers, but it won’t feel like “fraud” to the people who wrote the report, because the numbers aren’t derived from anything beyond the goal of producing the desired result. Just like all those projects that end up costing 5x what was stated in the original budget plan: those budgets were never serious, they were just lowball estimates created with the goal of getting the project moving.

In any case, it seems good that people such as Ross are looking at these reports and pointing out potential problems that can then be assessed by third parties. After all, you don’t want to waste another year.

Forking paths in the estimate of the value premium?

Mathias Hasler writes:

I have a working paper about the value premium and about seemingly innocuous decisions that are made in the research process. I wanted to share this working paper with you because I think you may find it interesting and because your statistics blog encouraged me to keep working on it.

In the paper, I study whether seemingly innocuous decisions in the construction of the original value premium estimate (Fama and French, 1993) affect our inference on the underlying value premium. The results suggest that a large part of the original value premium estimate is the result of chance in these seemingly innocuous research decisions.

Here’s the abstract of the paper:

The construction of the original HML portfolio (Fama and French, 1993) includes six seemingly innocuous decisions that could easily have been replaced with alternatives that are just as reasonable. I propose such alternatives and construct HML portfolios. In sample, the average estimate of the value premium is dramatically smaller than the original estimate of the value premium. The difference is 0.09% per month and statistically significant. Out of sample, however, this difference is statistically indistinguishable from zero. The results suggest that the original value premium estimate is upward biased because of a chance result in the original research decisions.

I’m sympathetic to this claim for the usual reasons, but I know nothing about this topic of the value premium, nor have I tried to evaluate this particular paper, so you can make of it what you will. I’d be happier if it had a scatterplot somewhere.
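For readers who want a feel for what a forking-paths analysis of this kind looks like, here’s a minimal sketch. It’s mine, not Hasler’s code: the decision dimensions, weighting schemes, and synthetic data below are made up for illustration and are not the six decisions, the HML construction, or the data in the paper. The idea is just to enumerate a few alternative construction choices, build the long-short portfolio under each combination, and look at the spread of the resulting premium estimates.

```python
# Toy forking-paths sketch: enumerate alternative portfolio-construction choices
# and see how the estimated long-short premium moves across specifications.
# Decision dimensions, weights, and data are all hypothetical.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: a fixed "value" signal per stock and monthly returns.
n_stocks, n_months = 500, 360
signal = rng.normal(size=n_stocks)                        # stand-in for book-to-market
returns = 0.002 * signal + rng.normal(scale=0.08, size=(n_months, n_stocks))
ranks = signal.argsort().argsort() + 1                    # 1 = lowest signal, 500 = highest

# Hypothetical construction decisions (three dimensions, 3 x 2 x 2 = 12 variants).
breakpoints = [(0.3, 0.7), (0.2, 0.8), (0.5, 0.5)]        # cutoffs for the short and long legs
weightings = ["equal", "rank"]                            # how stocks are weighted within a leg
samples = ["full", "drop_first_60_months"]                # which months enter the average

def leg_weights(mask, weighting):
    w = ranks[mask].astype(float) if weighting == "rank" else np.ones(mask.sum())
    return w / w.sum()

def premium_estimate(bp, weighting, sample):
    lo_cut, hi_cut = np.quantile(signal, bp)
    long_leg, short_leg = signal >= hi_cut, signal <= lo_cut
    rets = returns[60:] if sample == "drop_first_60_months" else returns
    hml = (rets[:, long_leg] @ leg_weights(long_leg, weighting)
           - rets[:, short_leg] @ leg_weights(short_leg, weighting))
    return hml.mean()

estimates = [premium_estimate(*combo)
             for combo in itertools.product(breakpoints, weightings, samples)]
print(f"{len(estimates)} specifications, monthly premium estimates:")
print(f"  min {min(estimates):.4f}, mean {np.mean(estimates):.4f}, max {max(estimates):.4f}")
```

The toy version only shows how quickly the number of specifications multiplies and how the estimate drifts with choices that sound innocuous; the interesting comparisons in the actual paper are the original specification versus the average over the alternatives, in sample and out of sample.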

Gaurav Sood’s review of the book Noise by Kahneman et al.: In writing about noise in human judgment, the authors didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here it is. This should interest some of you. Gaurav makes a convincing case that:

1. The main topic of the book—capriciousness in human judgment—is important, it’s worth a book, and the authors (Kahneman, Sibony, and Sunstein) have an interesting take on it.

2. Their recommendations are based on a selective and uncritical review of an often weak literature, for example this description of a study which seems about the closest thing possible to a Brian Wansink paper without actually being by Brian Wansink:

“When calories are on the left, consumers receive that information first and evidently think ‘a lot of calories!’ or ‘not so many calories!’ before they see the item. Their initial positive or negative reaction greatly affects their choices. By contrast, when people see the food item first, they apparently think ‘delicious!’ or ‘not so great!’ before they see the calorie label. Here again, their initial reaction greatly affects their choices. This hypothesis is supported by the authors’ finding that for Hebrew speakers, who read right to left, the calorie label has a significantly larger impact . . .”

Kinda stunning that they could write this with a straight face, given all we’ve heard about the Cornell Food and Brand Lab, etc.

In writing about noise in human judgment, Kahneman, Sibony, and Sunstein didn’t wrestle with the problem of noise in behavioral-science research. But behavioral-science research is the product of human judgment.

Here are my comments on the book and its promotional material from last year. I was pretty frustrated with the authors’ apparent unfamiliarity with the literature on variation and noise in statistics and economics associated with very famous figures such as W. E. Deming and Fischer Black. In his review, Gaurav persuaded me that the authors of Noise were on to something interesting, which makes me even sadder that they plowed ahead without more reflection and care. Maybe in the future someone can follow up with an article or book on the topic with the virtues of point 1 above and without the defects of point 2.

Actually, maybe Gaurav can do this! A book’s a lot, but an article fleshing out point 1 in a positive way, without getting snowed by noisy evidence or bragging about “discovering a new continent,” actually linking the theme of noise in human judgment to the challenges of interpreting research results . . . this could be really useful. So I’m glad he took the trouble to read the book and write his review.