Bill James hangs up his hat. Also some general thoughts about book writing vs. blogging. Also I push back against James’s claim about sabermetrics and statistics.

Bill James folded up his blog, Bill James Online. From his announcement:

I [Bill James] want to focus on writing books. This website consumes so much of my time and energy that I have found it impossible to focus on other projects. . . .

Or to put the reasons for the shutdown into one different sentence, we have pushed the economic insanity of this as far as we can push it. I don’t know how much I have earned for the hours I have put into this over the years, but I can tell you with great confidence that it is nowhere near the minimum wage. Actually, LOTS of people who blog about baseball and other stuff aren’t making anywhere near the minimum wage either, and God bless them, but when you have better options, then the decision to put your time into THIS, rather than THAT, amounts to working for negative money. A sensible man can only do that for so long, not that I am claiming to be a sensible man. A sensible man would have done this ten years ago.

Ohhhh, this is a topic close to my heart! I write something like 400 blog posts a year. If I weren’t blogging, I could be putting out a book a year. That is, assuming I could translate the blogging effort into book-writing effort, in the same way that a high jumper converts forward momentum into upward momentum.

Also, like James, I’m making less than the minimum wage from blogging. Actually, my income from blogging is negative. No kidding. I don’t just mean negative-in-opportunity-cost, I mean negative in dollar terms. I get paid zero ($0) for this blogging, and I pay a few thousand dollars every year to host the blog. My co-bloggers, on the other hand, make exactly zero, because they are paid $0 to blog and they pay $0 to maintain the blog.

As an academic, I’m used to making negative money from writing. Some journals charge publication fees, and very few venues pay. I know what you’re thinking—I’m paid to do research, and publication is a demonstration of that, there’s also “publish or perish,” etc.—and, sure, if I published nothing every year, then I think they’d stop giving me raises. But the marginal benefit of a new publication can’t be much. I think that, say, five published papers a year would be more than enough.

In summary: I blog for the same reason I publish articles and books, which is not for the money; it’s out of some desire to express myself, to share my thoughts with the world, out of a Bill-James-like annoyance at other people’s misconceptions that I’d like to fix, as well as a Bill-James-like understanding that public writing can be a great way to work out ideas, to let them develop far beyond what they would be within the cramped confines of my head. The usual story.

Now back to what Bill James said:

1. I’m glad that he’s switching back from blogs to books. I absolutely love many of his books, and I think he does better when he has the space to fully explore his arguments. Come to think of it, his classic Baseball Abstract books from the 80s had something of the flavor of blog posts—he used each team as a starting point for some musings and data analysis; even there, though, the structure of going through all the teams in the league worked well.

As an author of several books, I recognize the importance of structure: much of what makes a book is not just its content but also how it is arranged, and I think this structure matters even for readers who jump around the book rather than going methodically from page 1 onward.

So, yes, much as it pains me as a blogger to say this, I think it’s good news that James is shifting his focus, and I look forward to reading his next baseball book.

2. James wrote, “A sensible man can only [write for much less than the minimum wage, when higher-earning options are available] for so long.” I don’t buy that! I know nothing about Bill James’s finances, but my guess is that he’s doing just fine and doesn’t need the money: he wrote a bunch of successful books, he worked for the Red Sox for many years, I recall reading that he made some money working on arbitration cases, and he seems like a sensible guy, has probably paid off his mortgage, drives an affordable car, no major gambling or drug habits, etc. Assuming that’s the case, then, no, there’s no “sensible” requirement to stop blogging.

As I just wrote above, I do see a very good reason for James to shut down the blog: it’s reason 1 above. Also, I see the logic to just stopping rather than, say, reducing the rate of posts. Readers expect a certain flow of material, and if the blog is there, then you’ll have the temptation to post things. Shut the blog down, go cold turkey, and turn that energy into book writing.

James describes his website as “a theoretically profit-making enterprise”—but why does it have to be? There’s no shame in working for free, sharing your thoughts and analyses with the world, and providing a forum for discussion. If you can make money doing all that, sure, go for it—but if you’re not set up to make it into a profit-making enterprise, so be it.

Should I stop blogging?

Should I stop blogging—or, at least, curtail it—so I can be more productive in other aspects of my job? Maybe so. My colleagues and I have several book projects lined up. Indeed, if I weren’t blogging right now, I’d be working on the Bayesian Workflow book. Another way to roll would be to continue writing something like this, but instead of typing it into the WordPress window and scheduling it for July or whenever, inserting it into some big Word document and shaping it, along with the material from various other posts, into a book. That might be a good idea!

In theory I could get the best of both worlds—first post on the blog, then reformat into a book—but in practice this would not be so easy, because writing is structuring. Even writing this blog post involves thought about structure. So if I were to decide to put everything into a book, that structuring effort would go in that direction. I’m not sure.

I’ve been blogging here for nearly 20 years. Right now, my guess is that I’ll keep doing it for another 20, but I do see the logic to stopping. Or to changing the rules, in some way, for example first putting my ideas into books and then spinning them off into blogs, rather than the other way around.

One good thing about book writing and blogging is that these are two formats where you can write whatever you want. Writing articles for scholarly journals or public-facing newspapers or magazines is much more of a pain in the ass because you have to go through reviewers and editors. Don’t get me wrong—reviews and edits typically improve my writing—for example, this journal version of our Two Truths and a Lie activity is better in all respects than the original version I’d written up for the first draft of our Active Statistics book—but this sort of thing takes a lot of work. There’s enough new material in Active Statistics for 100 or 200 journal articles, but writing a hundred journal articles and shepherding them through the review process, that would’ve killed me.

So maybe the best option would be for me to stop blogging my immediate ideas and instead put this material into vessels for new books; then I could extract material from these books-in-preparation and use it as blog posts.

I’m not quite ready to take that step, but it could be the best of both worlds, allowing me to write more books while still providing a flow of material to all of you, and still providing this space for fun and thoughtful discussion.

Yes, sabermetrics is about statistics

After discussing his approach to blogging (which is similar to my approach, except that I think I engage more with the comments here), James says a few things that, as a statistician, I find annoying. Here he is:

It frustrates me that people still imagine that sabermetrics is about statistics. Sabermetrics doesn’t have anything at all to do with statistics, and we don’t use numbers any more than a doctor does, or an architect, or an economist.

C’mon, Bill. Sabermetrics is all about statistics. Not just any statistics, it’s about baseball statistics, but, still, sabermetrics without statistics is almost nothing. In contrast, subtract numbers from architecture and you still have much of architecture; subtract numbers from medicine and you still have much of medicine. You don’t need numbers to heal a broken bone; you don’t need numbers to culture penicillin. Numbers help in all sorts of areas of architecture and medicine, but they’re not central to these fields in the way that they are to sabermetrics. Economics, though, is pretty much reliant on numbers. Without numbers, you still have lots of economic theory and storytelling and policy, from Adam Smith to Karl Marx, but the numbers are where the rubber meets the road. Similarly, you can do some sabermetrics without numbers, for example making a solid argument about batting order, but the numbers are what make it all work.

So I think Bill James is protesting too much. I get it: he’s annoyed when outsiders think of sabermetricians as pocket-protector-wearing geeks who don’t care about the sport. But to deny the centrality of statistics to sabermetrics, that just seems silly.

James follows up on his analogy:

What do you think doctors do? They create ways to measure things, things about your body, things about your health. They have reference to thousands and thousands and thousands of statistics that the medical community has created about every little part of your body, the thickness of the walls of your heart and the amount of iron in your blood and the amount of nitrogen in your blood and the amount of every other known substance that can be found in your blood or your hair or your sweat or saliva. If you want to know how many days of your life you could be expected to lose if you put on 1.7 pounds, your doctor could tell you if he wanted to, because of the hundreds of studies done by other doctors.

Ummm, no. Doctors don’t know how many days of your life you could be expected to lose if you put on 1.7 pounds. This is a tough problem of causal inference, and there’s a lot of debate about it, beyond the simple fact that this expected outcome, even if it were known, would depend on your own situation in ways that are not well addressed by those “hundreds of studies.”

Let me put it another way. The analogy would go like this. Medicine is like baseball: it’s a field of human endeavor that is not inherently statistical but which produces a stream of data which, when collected and analyzed well, can yield useful insights. Biostatistics is like sabermetrics: it’s a field of human endeavor involving the collection and analysis and interpretation of data in the relevant field. Biostatistics is “about” medicine—it’s a field that only exists because of our interest in medicine—in the same way that sabermetrics is “about” baseball.

Sabermetrics is about baseball; it’s also about statistics. That should be no more controversial than saying that biostatistics is about medicine and also about statistics.

James continues:

Yes, we use statistics a lot. A painter uses a paint brush, we might say, every time that he paints . . . but when you interview a painter, you ask him about his paintings, not about his paint brushes.

But we do care about painters’ technique! And even about their brushes. Just for example, here’s something I found from a quick google search:

Monet’s gestural approach to painting encouraged Renoir to handle paint more freely, although Renoir’s style would remain distinct from his friend’s: while Monet created harsh and clearly circumscribed brushstrokes with fat pigments and flat-ended brushes, Renoir employed thinned paint to create liquid strokes of color that shimmered across his canvases. Such fluency can be seen in Road at Wargemont, where everything quivers with movement: wind gusts through the trees, sends clouds scudding across the sky, and prevents the rain from striking the earth.

There’s a discussion of technique—and of brushes—in the context of the paintings. By analogy, we might talk about some of the techniques used by sabermetricians in the context of their statements about baseball, for example when Bill James criticized Pete Palmer’s use of the linear weights method by looking at some ratings from some specific players. Here’s another example from James:

Total Baseball has Glenn Hubbard rated as a better player than Pete Rose, Brooks Robinson, Dale Murphy, Ken Boyer, or Sandy Koufax, a conclusion which is every bit as preposterous as it seems to be at first blush.

To a large extent, this rating is caused by the failure to adjust Hubbard’s fielding statistics for the ground-ball tendency of his pitching staff. Hubbard played second base for teams which had very high numbers of ground balls, as is reflected in their team assists totals. The Braves led the National League in team assists in 1985, 1986, and 1987, and were near the league lead in the other years that Hubbard was a regular. Total Baseball makes no adjustment for this, and thus concludes that Hubbard is reaching scores of baseballs every year that an average second baseman would not reach, hence that he has enormous value.

It’s all about baseball—but the discussion gets into technique (“paint brushes”) as well.
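To see why that adjustment matters, here is a toy version of the calculation in Python. Every number below is hypothetical, chosen only to show how a raw assists surplus can mostly evaporate once you account for the team’s ground-ball rate:

```python
# Toy illustration of the adjustment James describes -- all numbers here
# are hypothetical, not Hubbard's or the Braves' actual totals.
league_avg_team_assists = 1750   # assists for an average NL team
braves_team_assists = 1950       # a ground-ball-heavy pitching staff
hubbard_assists = 500            # our second baseman's assists

# Naive comparison against a league-average second baseman:
league_avg_2b_assists = 440
naive_surplus = hubbard_assists - league_avg_2b_assists

# Adjusted comparison: scale the expectation by the team's
# ground-ball tendency before crediting "extra" plays.
gb_factor = braves_team_assists / league_avg_team_assists   # ~1.11
adjusted_expectation = league_avg_2b_assists * gb_factor    # ~490
adjusted_surplus = hubbard_assists - adjusted_expectation

print(naive_surplus, round(adjusted_surplus))  # prints: 60 10
```

With these made-up numbers, the naive method credits the fielder with 60 extra plays a year; once the expectation is scaled to his staff’s ground-ball rate, only about 10 remain.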

I don’t think I’m disagreeing with James on his main point, which is that his subject matter is baseball, not statistics. The main difference is that I’m a statistician, so I don’t find it insulting for someone to say that sabermetrics is statistical!

P.S. The above post in itself demonstrates the strength and weakness of blogging.

The strength is that some little thing got me to start writing, and then the essay took on a life of its own, moving from a discussion of Bill James’s blogs and books, to my own blogs and books, to sabermetrics, to some general points about what is statistics, to art technique, and then looping back to a point made by James in one of his Baseball Abstracts.

The weakness is that it’s all mixed together, and as a result it might be difficult to reach all the desired audiences. I have no idea if this will ever be seen by Bill James, let alone all the other people who might enjoy it if they knew where to look.

Google is violating the First Law of Robotics.

There are a few sad things about this story:

1. Google failing at its job

2. Bad behavior being rewarded

3. The conversion of the news media into empty brands, followed by the brands being sold for scrap.

Back when news media companies actually made money, their reputations were valuable for their own sake. Now these brands are being mined for whatever remaining value that they might have.

Remember that book from the early 90s, “America: What Went Wrong?” OK, probably not. Many of you weren’t even alive back then. Anyway, it was a book about how manufacturing companies (I remember the example of Simplicity Patterns) were being bought, stripped, and destroyed by corporate raiders. Creative destruction, sure, that’s the way the world works, but in this case it was more like stealing.

The same thing has gone on for a while with many media companies.

As they say in Goodfellas, it’s a bust-out operation.

Ancestor-worship in academia: Where does it happen?

I was reading this post by economist Rajiv Sethi, which began, “The celebrated economist Robert Solow has died at the age of 99,” and continued:

Solow is best known for his model of growth which is simple in the best sense . . . With over 42,000 citations, Solow’s paper on growth is among the most influential in economics. Much of the subsequent literature has built on his foundations . . . While Solow’s influence on the profession through his writings was profound, his influence through the students he advised was even greater. Take a look at his academic family tree, and you will see an extraordinary collection of economists . . . Some of his former students were themselves incredibly fertile (in generating economists), so Solow has an extraordinary collection of academic grandchildren. . . . Scholars have influence through their work, which can be tracked and traced through a thicket of citations. But they also have influence through their interpersonal guidance, which remains largely invisible and unheralded. Solow’s influence through both channels was staggering and profound. . . .

This is fine. I like to write obituaries too, and these are the places to write nice things (for example here, here, here, here, here, here, and here).

With regard to Solow, though, I was moved to reply:

I met Solow once and wasn’t impressed; see here.

But maybe he was having a bad day.

The larger problem I see is the habit of economists describing other economists in heroic terms. I guess econ takes this from math and physics. I remember as a math and physics student how we were supposed to worship Archimedes, Newton, Euler, etc. I’m sure they all deserved this, but it seems to have transferred into academics in certain fields making idols of their predecessors and even of their colleagues.

In poli sci I don’t see this ancestor-worship being so strong. Locke and Hobbes are intellectual heroes, sure (I’m not talking about their personal characteristics here), but modern political scientists don’t seem to be idolizing the political scientists of the mid and late twentieth century.

As to statistics: as students, we were taught that Fisher or Neyman were intellectual heroes, and I do think that has caused some problems. We should be able to celebrate great work without idealizing the individuals involved. They’re just people!

Rajiv responded:

Economists can be quite brutal and dismissive towards each other also, see for example Romer in the post you linked to! Solow himself was very critical of Lucas. For me there are relatively few real heroes, and Solow would not be among them. But his list of academic descendants is unmatched in terms of status and influence.

To which I replied:

I agree that economists can be negative about each other (for example, Krugman on Hayek and Galbraith). In that, they’re different from mathematicians and physicists, who wrote about their predecessors with worship or else didn’t bother writing about them at all. I can’t recall any mathematicians saying, “Cauchy wasn’t all that.”

Some more examples from economics are the extravagant and, in my impression, unreasonable adulation given to Gary Becker and Lawrence Summers. I’m sure these guys were impressive, but I don’t think the whole thing of treating them as geniuses worked out so well.

In a dark room with many of the lights covered up by opaque dollar bills, even a weak and intermittent beam can appear brilliant, if you look right at it.

I wonder if part of the problem is that economics has traditionally been taught as a history of great men as much as a history of ideas. When I took economics in high school we read a book called The Worldly Philosophers, which was all about how Adam Smith was so great, and David Ricardo was so great, and Keynes was so great . . . it’s no surprise that later economists wanted to add their personal heroes to the pantheon.

In contrast, in statistics, yes, we hear about Gauss and Laplace and Galton and Pearson and Neyman and Fisher, but it seems much more closely tied to the content. Gauss has the normal distribution, Laplace has probability theory, Galton has correlation and regression, etc. This continues to modern figures such as Rubin with causal inference, Wahba with splines, and Efron with the bootstrap. The great contributors are known by name, and sometimes by eponym (“Gaussian distribution,” “Bayesian inference,” etc.), but still it seems ultimately more about the ideas than about the genius of their promulgators.

For some reason, things have gone differently in psychology. Great names such as Freud, Piaget, and Skinner get reassessed, and modern leaders in the field are respected for what they have done, not so much for who they are. On the rare occasions that a psychologist attempts to puff himself up into a Great Man, he becomes more of a figure of derision than anything else.

It’s $ time! How much should we charge for a link?

The following came in the email, subject line “Paid Link Insertion Request”:

Hello Andrew,

I hope this message finds you well. In the course of our ongoing collaborations with clients, we’ve received feedback indicating their interest in incorporating a link on your site.

Could you please confirm if your site,, is open to link insertions, and if so, could you provide details regarding the associated charges?

If link insertions are not within your preferences, we are eager to submit a Guest Post instead. Could you kindly share any specific content requirements or guidelines for the submission?

Thank you for your time and consideration.

Best regards,

Joe Stone

I know I shouldn’t reply, but I couldn’t resist, so I sent a message back:

How much do you pay and what is in the link?

If Joe Stone responds to this, I’ll keep you informed.

P.S. It’s been a few months, and . . . no response. That’s fine—I can’t imagine that whatever they’d have paid would be enough to be worth the effort—but I’m kinda surprised I didn’t hear back from them. Kinda weird.

P.P.S. A few more months and still no reply from “Joe Stone.” As is often the case, I’m pretty sure this is a scam but I have no idea how it’s supposed to work.

Arnold Foundation and Vera Institute argue about a study of the effectiveness of college education programs in prison.

OK, this one’s in our wheelhouse. So I’ll write about it. I just want to say that writing this sort of post takes a lot of effort. When it comes to social engagement, my benefit/cost ratio is much higher if I just spend 10 minutes writing a post about the virtues of p-values or whatever. Maximizing the number of hits and blog comments isn’t the only goal, though, and I do find that writing this sort of long post helps me clarify my thinking, so here we go. . . .

Jonathan Ben-Menachem writes:

Two criminal justice reform heavyweights are trading blows over a seemingly arcane subject: research methods. . . . Jennifer Doleac, Executive Vice President of Criminal Justice at Arnold Ventures, accused the Vera Institute of Justice of “research malpractice” for their evaluation of New York college-in-prison programs. In a response posted on Vera’s website, President Nick Turner accused Doleac of “giving comfort to the opponents of reform.”

At first glance, the study at the core of this debate doesn’t seem controversial: Vera evaluated Manhattan DA-funded college education programs for New York prisoners and found that participants were less likely to commit a new crime after exiting prison. . . . Vera used a method called propensity score matching, and constructed a “control” group on the basis of prisoners’ similarity to the “treatment” group. . . . Despite their acknowledgment that “differences may remain across the groups,” Vera researchers contended that “any remaining differences on unobserved variables will be small.”

Doleac didn’t buy it. . . . She argued that propensity score matching could not account for potentially different “motivation and focus.” In other words, the kind of people who apply for classes are different from people who don’t apply, so the difference in outcomes can’t be attributed to prison education. . . .

Here’s Doleac’s full comment:

Vera Institute just released this study of a college-in-prison education program in NY, funded by the Manhattan DA’s Criminal Justice Investment Initiative. Researchers compared people who chose to enroll in the program with similar-looking people who chose not to. This does not isolate the treatment effect of the education program. It is very likely that those who enrolled were more motivated to change, and/or more able to focus on their goals. This pre-existing difference in motivation & focus likely caused both the difference in enrollment in the program and the subsequent difference in recidivism across groups.

This report provides no useful information about whether this NY program is having beneficial effects.

Now we return to Ben-Menachem for some background:

This fight between big philanthropy and a nonprofit executive is extremely rare, and points to a broader struggle over research and politics. The Vera Institute boasts a $264 million operating budget, and . . . has been working on bail reform since the 1960s. Arnold Ventures was founded in 2010, and the organization has allocated around $400 million to criminal justice reform—some of which went to Vera.

How does the debate over methods relate to larger policy questions? Ben-Menachem writes:

Although propensity score matching does have useful applications, I might have made a critique similar to Doleac if I was a peer reviewer for an academic journal. But I’m not sure about Doleac’s claim that Vera’s study provides “no useful information,” or her broader insistence on (quasi) experimental research designs. Because “all studies on this topic use the same flawed design,” Doleac argued, “we have *no idea* whether in-prison college programming is a good investment.” This is a striking declaration that nothing outside of causal inference counts.

He connects this to an earlier controversy:

In 2018, Doleac and Anita Mukherjee published a working paper called “The Moral Hazard of Lifesaving Innovations: Naloxone Access, Opioid Abuse, and Crime” which claimed that naloxone distribution fails to reduce overdose deaths while also “making riskier opioid use more appealing.” In addition to measurement problems, the moral hazard frame partly relied on an urban myth—“naloxone parties,” where opioid users stockpile naloxone, an FDA approved medication designed to rapidly reverse overdose, and intentionally overdose with the knowledge that they can be revived. The final version of the study includes no references to “naloxone parties,” removes the moral hazard framing from the title, and describes the findings as “suggestive” rather than causal.

Later that year, Doleac and coauthors published a research review in Brookings citing her controversial naloxone study claiming that both naloxone and syringe exchange programs were unsupported by rigorous research. Opioid health researchers immediately demanded a retraction, pointing to heaps of prior research suggesting that these policies reduce overdose deaths (among other benefits). . . .

Ben-Menachem connects this to debates between economists and others regarding the role of causal inference. He writes:

While causal inference can be useful, it is insufficient on its own and arguably not always necessary in the policy context. By contrast, Vera produces research using a very wide variety of methods. This work teaches us about the who, where, when, what, why, and how of criminalization. Causal inference primarily tells us “whether.”

I disagree with him on this one. Propensity score matching (which should be followed up with regression adjustment; see for example our discussion here) is a method that is used for causal inference. I will also channel my causal-inference colleagues and say that, if your goal is to estimate and understand the effects of a policy, causal inference is absolutely necessary. Ben-Menachem’s mistake is to identify “causal inference” with some particular forms of natural-experiment or instrumental-variables analyses. Also, no matter how you define it, causal inference primarily tells us, or attempts to tell us, “how much” and “where and when,” not “whether.” I agree with his larger point, though, which is that understanding (what we sometimes call “theory”) is important.

I think Ben-Menachem’s framing of this as economists-doing-causal-inference vs. other-researchers-doing-pluralism misses the mark. Everybody’s doing causal inference here, one way or another, and indeed matching can be just fine if it is used as part of a general strategy for adjustment, even if, as with other causal inference methods, it can do badly when applied blindly.
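For readers who haven’t seen the method in action, here is a minimal sketch of that general strategy in Python: propensity scores from a logistic regression, nearest-neighbor matching, then regression adjustment on the matched sample. The simulated data and all parameter values are invented for illustration; real applications (including Vera’s) involve many more covariates and diagnostics.

```python
import numpy as np

# Simulated example (all numbers invented): covariates confound
# treatment assignment, so a raw treated-vs-control comparison is biased.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))                       # observed covariates
logit = 0.8 * x[:, 0] - 0.5 * x[:, 1]             # selection depends on x
treat = rng.random(n) < 1 / (1 + np.exp(-logit))
tau = 2.0                                         # true treatment effect
y = 1.5 * x[:, 0] + x[:, 1] + tau * treat + rng.normal(size=n)

# Step 1: propensity scores from a logistic regression of treatment on x
# (fit by plain gradient ascent to keep the sketch dependency-free).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (treat - p) / n
pscore = 1 / (1 + np.exp(-X @ beta))

# Step 2: nearest-neighbor matching -- each treated unit gets the control
# with the closest propensity score (matching with replacement).
t_idx = np.where(treat)[0]
c_idx = np.where(~treat)[0]
dist = np.abs(pscore[t_idx][:, None] - pscore[c_idx][None, :])
matched = np.concatenate([t_idx, c_idx[dist.argmin(axis=1)]])

# Step 3: regression adjustment on the matched sample: regress the outcome
# on treatment plus covariates and read off the treatment coefficient.
Z = np.column_stack([np.ones(len(matched)), treat[matched], x[matched]])
coef, *_ = np.linalg.lstsq(Z, y[matched], rcond=None)
print("estimated effect:", round(coef[1], 2))     # should land near tau
```

The point of step 3 is the one made above: matching alone leaves residual covariate imbalance, and the follow-up regression on the matched sample cleans up much of it. None of this, of course, fixes selection on variables you never measured, which is exactly Doleac’s objection.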

But let’s move on. Ben-Menachem continues:

In a recent interview about Arnold Ventures’ funding priorities, Doleac explained that her goal is to “help build the evidence base on what works, and then push for policy change based on that evidence.” But insisting on “rigorous” evidence before implementing policy change risks slowing the steady progress of decarceration to a grinding halt. . . .

In an email, Vera’s Turner echoed this point. “The cost of Doleac’s apparently rigid standard is that it not only devalues legitimate methods,” he wrote, “but it sets an unreasonably and unnecessarily high burden of proof to undo a system that itself has very little evidence supporting its current state.”

Indeed, mass incarceration was not built on “rigorous research.” . . . Yet today some philanthropists demand randomized controlled trials (or “natural experiments”) for every brick we want to remove from the wall of mass incarceration. . . .

Decarceration is a fight that takes place on the streets and in city halls across America, not in the halls of philanthropic organizations. . . . the narrow emphasis on the evaluation standards of academic economists will hamstring otherwise promising efforts to undo the harms of criminalization.

Several questions arise here:

1. What can be learned from this now-controversial research project? What does it tell us about the effects of New York college-in-prison programs, or about programs to reduce prison time?

2. Given the inevitable weaknesses of any study of this sort (including studies that Doleac or I or other methods critics might like), how should its findings inform policy?

3. What should advocates’ or legislators’ views of the policy options be, given that the evidence in favor of the status quo is far from rigorous by any standard?

4. Given questions 1, 2, 3 above, what is the relevance of methodological critiques of any study in a real-world policy context?

Let me go through these four questions in turn.

1. What can be learned from this now-controversial research project?

First we have to look at the study! Here it is: “The Impacts of College-in-Prison Participation on Safety and Employment in New York State: An Analysis of College Students Funded by the Criminal Justice Investment Initiative,” published in November 2023.

I have no connection to this particular project, but I have some tenuous connection to both of the organizations involved in this debate, as many years ago I attended a brief meeting at the Arnold Foundation regarding a study being done by the Vera Institute regarding a program they were doing in the correctional system. And many years ago my aunt Lucy taught math at Sing Sing prison for a while.

Let’s go to the Vera report, which concludes:

The study found a strong, significant, and consistent effect of college participation on reducing new convictions following release. Participation in this form of postsecondary education reduced reconviction by at least 66 percent. . . .

Vera also conducted a cost analysis of these seven college-in-prison programs . . . Researchers calculated the costs reimbursed by CJII, as well as two measures of the overall cost: the average cost per student and the costs of adding an additional group of 10 or 20 students to an existing college program . . . Adding an additional group of 10 or 20 students to those colleges that provided both education and reentry services would cost colleges approximately $10,500 per additional student, while adding an additional group of students to colleges that focused on education would cost approximately $3,800 per additional student. . . . The final evaluation report will expand this cost analysis to a benefit-cost analysis, which will evaluate the return on investment of these monetary and resource outlays in terms of avoided incarceration, averted criminal victimization, and increased labor force participation and improved income.

And they connect this to policy:

This research indicates that academic college programs are highly effective at reducing future convictions among participating students. Yet, interest in college in prison among prospective students far outstrips the ability of institutions of higher education to provide that programming, due in no small part to resource constraints. In such a context, funding through initiatives such as CJII and through state and federal programs not only supports the aspirations of people who are incarcerated but also promotes public safety.

Now let’s jump to the methods. From page 13 of the report onward:

To understand the impact of access to a college education on the people in the program, Vera researchers needed to know what would have happened to these people if they had not participated in the program. . . . Ideally, researchers need these comparisons to be between groups that are otherwise as similar as possible to guard against attributing outcomes to the effects of education that may be due to the characteristics of people who are eligible for or interested in participating in education. In a fair comparison of students and nonstudents, the only difference between the two is that students participated in college education in prison while nonstudents did not. . . . One study of the impacts of college in prison on criminal legal system outcomes found that people who chose or were able to access education differed in their demographics, employment and conviction histories, and sentence lengths from people who did not choose or have the ability to access education. This indicates a need for research and statistical methods that can account for such “selection” into college education . . .

The best way to create the fair comparisons needed to estimate causal effects is to perform a randomized experiment. However, this was not done in this study due to the ethical impact of withholding from a comparison group an intervention that has established positive benefits . . . Vera researchers instead aimed to create a fairer comparison across groups using a statistical technique called propensity score matching . . . Vera researchers matched students and nonstudents on the following variables:
– demographics . . .
– conviction history . . .
– correctional characteristics . . .
– education characteristics . . .
Researchers considered nonstudents to be eligible for comparison not only if they met the same academic and behavioral history requirements as students but also if they had a similar time to release during the CIP period, a similar age at incarceration, and a similar time from prison admission to eligibility. . . . when evaluating whether an intervention influences an outcome of interest, it is a necessary but not sufficient condition that the intervention happens before the outcome. Vera researchers therefore defined a “start date” for students and a “virtual start date” for nonstudents in order to determine when to begin measuring in-facility outcomes, which included Tier II, Tier III, high-severity, and all misconducts. . . . To examine the effect of college education in prison on misconducts and on reported wages, Vera researchers used linear regression on the matched sample. For formal employment status and for an incident within six months and 12 months of release that led to a new conviction, Vera used logistic regression on the matched sample. For recidivism at any point following release, Vera used survival analysis on the matched sample to estimate the impact of the program on the time until an incident that leads to a new conviction occurs.
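
Speaking in general terms (I haven't seen Vera's actual implementation), the matching step described above can be sketched as follows. The scores and the greedy nearest-neighbor rule here are hypothetical stand-ins; in the real study the propensity scores would come from a model of program participation given the listed covariates:

```python
# Hypothetical sketch of 1:1 nearest-neighbor propensity-score matching.
# The scores below are made-up illustrations, not Vera's data.

def match_nearest(treated_scores, control_scores):
    """Greedy 1:1 matching without replacement on propensity score."""
    available = dict(enumerate(control_scores))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        # pick the not-yet-matched control with the closest score
        c_idx = min(available, key=lambda i: abs(available[i] - t_score))
        pairs.append((t_idx, c_idx))
        del available[c_idx]
    return pairs

students = [0.72, 0.55, 0.90]           # scores for program participants
nonstudents = [0.50, 0.74, 0.88, 0.30]  # scores for eligible comparisons
print(match_nearest(students, nonstudents))  # → [(0, 1), (1, 0), (2, 2)]
```

Outcome models (the linear and logistic regressions and the survival analysis mentioned above) would then be fit on the matched pairs.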

What about the concern expressed by Doleac regarding differences that are not accounted for by the matching and adjustment variables? Here’s what the report says:

Vera researchers have attempted to control [I’d prefer the term “adjust” — ed.] for pre-incarceration factors, such as conviction history, age, and gender, that may contribute to misconducts in prison. However, Vera was not able to control for other pre-incarceration factors that have been found in the literature to contribute to misconducts, such as marital status and family structure, mental health needs, a history of physical abuse, antisocial attitudes and beliefs, religiosity, socioeconomic disadvantage and exposure to geographically concentrated poverty, and other factors that, if present, would still allow a person to remain eligible for college education but might influence misconducts. Vera researchers also have not been able to control for factors that may be related to misconducts, including characteristics of the prison management environment, such as prison size, and the proportion of people incarcerated under age 25, as Vera did not have access to information about the facilities where nonstudents were incarcerated. Vera also did not have access to other programs that students and nonstudents may be participating in, such as work assignments, other programming, or health and mental health service engagement, which may influence in-facility behavior and are commonly used as controls in the literature. If other literature on the subject is correct and education does help to lower misconducts, Vera may have, by chance, mismatched students with controls who, unobserved to researchers and unmeasured in the data, were less likely to have characteristics or be exposed to environments that influence misconducts. While prior misconducts, assigned security class, and time since admission may, as proxies, capture some of this information, they may do so imperfectly.

They have plans to mitigate these limitations going forward:

First, Vera will receive information on new students and newly eligible nonstudents who have enrolled or become eligible following receipt of the first tranche of data. Researchers will also have the opportunity to follow the people in the analytical sample for the present study over a longer period of time. . . . Second, researchers will receive new variables in new time periods from both DOCCS and DOL. Vera plans to obtain more detailed information on both misconducts and counts of misconducts that take place in different time periods for the final report. . . . Next, Vera will obtain data on pre-incarceration wages and formal employment status, which could help researchers to achieve better balance between students and nonstudents on their work histories . . .

In summary: Yeah, observational studies are hard. You adjust for what you can adjust for, then you can do supplementary analyses to assess the sizes and directions of possible biases. I’m kinda with Ben-Menachem on this one: Doleac’s right that the study “does not isolate the treatment effect of the education program,” but there’s really no way to isolate this effect—indeed, there is no single “effect,” as any effect will vary by person and depend on context. But to say that the report “provides no useful information” about the effect . . . I think that’s way too harsh.

Another way of saying this is that, speaking in general terms, I don’t find adjusting for existing pre-treatment variables to be a worse identification strategy than instrumental variables, or difference-in-differences, or various other methods that are used for causal inference from observational studies. All these methods rely on strong, false assumptions. I’m not saying that these methods are equivalent, either in general or in any particular case, just that all have flaws. And indeed, in her work with the Arnold Foundation, Doleac promotes various criminal-justice reforms. So I’m not quite sure why she’s so bothered by this particular Vera study. I’m not saying she’s wrong to be bothered by it; there just must be more to the story, other reasons she has for concern that were not mentioned in her above-linked social media post.

Also, I don’t believe that estimate from the Vera study that the treatment reduces recidivism by 66%. No way. See the section “About that ‘66 percent’” below for details. So there are reasons to be bothered by that report; I just don’t quite get where Doleac is coming from in her particular criticism.

2. Given the inevitable weaknesses of any study of this sort, how should its findings inform policy?

I guess it’s the usual story: each study only adds a bit to the big picture. The Vera study is encouraging to the extent that it’s part of a larger story that makes sense and is consistent with observation. The results so far seem too noisy to be able to say much about the size of the effect, but maybe more will be learned from the followups.

3. What should advocates’ or legislators’ views of the policy options be, given that the evidence in favor of the status quo is far from rigorous by any standard?

This I’m not sure. It depends on your understanding of justice policy. Ben-Menachem and others want to reduce mass incarceration, and this makes sense to me, but others have different views and take the position that mass incarceration has positive net effects.

I agree with Ben-Menachem that policymakers should not stick with the status quo, just on the basis that there is no strong evidence in favor of a particular alternative. For one thing, the status quo is itself relatively recent, so it’s not like it can be supported based on any general “if it ain’t broke, don’t fix it” principle. But . . . I don’t think Doleac is taking a stick-with-the-status-quo position either! Yes, she’s saying that the Vera study “provides no useful information”—a statement I don’t really agree with—but I don’t see her saying that New York’s college-in-prison education program is a bad idea, or that it shouldn’t be funded. I take Doleac as saying that, if policymakers want to fund this program, they should be clear that they’re making this decision based on their theoretical understanding, or maybe based on political concerns, not based on a solid empirical estimate of its effects.

4. Given questions 1, 2, 3 above, what is the relevance of methodological critiques of any study in a real-world policy context?

Methodological critique can help us avoid overconfidence in the interpretation of results.

Concerns such as Doleac’s regarding identification help us understand how different studies can differ so much in their results: in addition to sampling variation and varying treatment effect, the biases of measurement and estimation depend on context. Concerns such as mine regarding effect sizes should help when taking exaggerated estimates and mapping them to cost-benefit analyses.

Even with all our concerns, I do think projects such as this Vera study are useful in that they connect the qualitative aspects of administrating the program with quantitative evaluation. It’s also important that the project itself has social value and that the proposed mechanism of action makes sense. I’m reminded of our retrospective control study of the Millennium Villages project (here’s the published paper, here and here are two unpublished papers on the design of the study, and here’s a later discussion of our study and another evaluation of the project): the study could never have been perfect, but we learned a lot from doing a careful comparison.

To return to Ben-Menachem’s post, I think the framing of this as a “fight over rigor” is a mistake. The researchers at the Vera Institute and the economist at the Arnold Foundation seem to be operating at the same, reasonable, level of rigor. They’re concerned about causal identification and generalizability, they’re trying to learn what they can from observational data, etc. Regression adjustment with propensity scores is no more or less rigorous than instrumental variables or change-point analysis or multilevel modeling or any other method that might be applied in this sort of problem. It’s really all about the details.

It might help to compare this to an example we’ve discussed in this space many times before: flawed estimates of the effect of air pollution on lifespan. There’s lots of theory and evidence that air pollution is bad for your life expectancy. The theory and evidence are not 100% conclusive—there’s this idea that a little bit of pollution can make you stronger by stimulating your immune system or whatever—but we’re pretty much expecting heavy indoor air pollution to be bad for you.

The question then comes up: what is learned that is policy relevant from a really bad study of the effects of air pollution? I’d say, pretty much nothing. I have a more positive take on the Vera study, partly because it is very directly studying the effect of a treatment of interest. The analysis has some omitted-variable concerns, and the published estimates are, I believe, way too high, but it still seems to me to be moving the ball forward. I guess that one way they could do better would be to focus on more immediate outcomes. I get that reduction in recidivism is the big goal, but that’s kind of indirect, meaning that we would expect smaller effects and noisier estimates. Direct outcomes of participation in the program could be a better thing to focus on. But I’m speaking in general terms here, as I have no knowledge of the prison system etc.

About that “66 percent”

As noted above, the Vera study concluded:

Participation in this form of postsecondary education reduced reconviction by at least 66 percent.

“At least 66 percent” . . . where did this come from? I searched the paper for “66” and found this passage:

Vera’s study found that participation in college in prison reduced the risk of reconviction by 66 to 67 percent (a relative risk of 0.33 and 0.34). (See Table 7.) The impact of participation in college education was found to reduce reconviction in all three of the analyses (six months, 12 months, and at any point following release). The consistency of estimated treatment effects gives Vera confidence in the validity of this finding.

And here is the relevant table:

Ummmm . . . no. Remember Type M errors? The raw estimate is HUGE (a reduction in risk of 66%) and the standard error is huge too (I guess it’s about 33%, given that a p-value of 0.05 corresponds to an estimate that’s approximately two standard errors away from zero) . . . that’s the classic recipe for bias.
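
To illustrate the general point (this is a generic simulation, not a reanalysis of the Vera data): suppose the true effect were a 20-point risk reduction, with that same 33-point standard error, and we only look at the estimates that happen to reach statistical significance:

```python
# Generic simulation of the Type M (magnitude) error argument. The "true"
# 20-point risk reduction and 33-point standard error are hypothetical
# numbers chosen for illustration, not estimates from the Vera study.
import random

random.seed(1)
true_effect, se = 20.0, 33.0
sims = [random.gauss(true_effect, se) for _ in range(100_000)]
# keep only the draws that would come out "statistically significant"
significant = [est for est in sims if abs(est) > 1.96 * se]
exaggeration = sum(abs(e) for e in significant) / len(significant) / true_effect
print(round(exaggeration, 1))
```

Under these assumed numbers, the estimates that clear the significance filter overstate the true effect by roughly a factor of four. That's the Type M error: a noisy design plus a significance filter yields inflated published estimates.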

Give it a straight-up Edlin factor of 1/2 and your estimated effect is to reduce the risk of reconviction by 33%, which still sounds kinda high to me, but I’ll leave this one to the experts. The Vera report states that they “detected a much stronger effect than prior studies,” and those prior studies could very well be positively biased themselves, so, yeah, my best guess is that any true average effect is less than 33%.

So when they say, “at least 66 percent”: I think that’s just wrong, an example of the very common statistical error of reporting an estimate without correcting for bias.

Also, I don’t buy that the result appearing in all three of the analyses represents a “consistency of estimated treatment effects” that should give “confidence in the validity of this finding.” The three analyses have a lot of overlap, no? I don’t have the raw data to check what proportion of the reconvictions within 12 months or at any point following release already occurred within 6 months, and I’m not saying the three summaries are entirely redundant. But they’re not independent pieces of information either. I have no idea why the estimates are soooo close to each other; I guess that is probably just one of those chance things which in this case give a misleading illusion of consistency.

Finally, to say a risk reduction of “66 to 67 percent” is a ridiculous level of precision, given that even if you were to just take the straight-up classical 95% intervals you’d get a range of risk reductions of something like 90 percent to zero percent (a relative risk between 0.1 and 1.0).
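
That arithmetic, under my assumption that the reported estimate is roughly two standard errors from zero on the log relative-risk scale, goes like this:

```python
# Back-of-the-envelope 95% interval behind the "relative risk between 0.1
# and 1.0" remark. Assumes (my guess, not the report's) that the estimate
# is barely significant, i.e., about two standard errors from zero.
import math

log_rr = math.log(0.33)   # point estimate on the log scale
se = abs(log_rr) / 2      # implied standard error if p is about 0.05
lower = math.exp(log_rr - 2 * se)
upper = math.exp(log_rr + 2 * se)
print(round(lower, 2), round(upper, 2))  # → 0.11 1.0
```

So the data are consistent with anything from a near-90% risk reduction down to no effect at all, which is why "66 to 67 percent" is absurd precision.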

So we’re seeing overestimation of effect size and overconfidence in what can be learned by the study, which is an all-too-common problem in policy analysis (for example here).

None of this has anything to do with Doleac’s point. Even with no issues of identification at all, I don’t think this treatment effect estimate of 66% (or “at least 66%” or “66 to 67 percent”) decline in recidivism should be taken seriously.

To put it another way, if the same treatment were done on the same population, just with a different sample of people, what would I expect to see? I don’t know—but my best estimate would be that the observed difference would be a lot less than 66%. Call it the Edlin factor, call it Type M error, call it an empirical correction, call it Bayes; whatever you want to call it, I wouldn’t feel comfortable taking that 66% as an estimated effect.

As I always say for this sort of problem, this does not mean that I think the intervention has no effect, or that I have any certainty that the effect is less than the claimed estimate. The data are, indeed, consistent with that claimed 66% decline. The data are also consistent with many other things, including (in my view more plausibly) smaller average effects. What I’m disagreeing with is the claim that the study provides strong evidence for that claimed effect, and I say this based on basic statistics, without even getting into causal identification.

P.S. Ben-Menachem is a Ph.D. student in sociology at Columbia and he’s published a paper on police stops in the APSR. I don’t recall meeting him, but maybe he came by the Playroom at some point? Columbia’s a big place.

1. Why so many non-econ papers by economists? 2. What’s on the math GRE and what does this have to do with stat Ph.D. programs? 3. How does modern research on combinatorics relate to statistics?

Someone who would prefer to remain anonymous writes:

A lot of the papers I’ve been reading that sound really interesting don’t seem to involve economics per se (e.g., . . .), but they usually seem to come out of econ (as opposed to statistics) departments. Why is that? Is it a matter of culture? Or just because there are more economists? Or something else?

And here’s the longer version of my question.

I’ve been reading your blog for a couple of years and this post of yours, “Is an Oxford degree worth the parchment it’s printed on?”, from a month ago got me thinking about studying statistics. My background is mainly in engineering (BS CompE/Math, MS EE). Is it possible to get accepted to a good stats program with my background? I know people who have gone into econ with an engineering background, but not statistics. I’ve also been reading some epidemiology papers that are really cool, so statistics seems ideal, since it’s heavily used in both econ and epidemiology, but I wonder if there’s some domain-specific knowledge I’d be missing.

I’ve noticed that a lot of programs “strongly recommend” taking the GRE math subject test; is that pretty much required for someone with an unorthodox background? I’d probably have to read a topology and number theory text, and maybe a couple others, to get an acceptable GRE math score, but those don’t seem too relevant to statistics (?). I’ve done that sort of thing before – I read and did all the exercises in a couple of engineering texts when I switched fields within engineering, and I could do it again, but, if given the choice, there are other things I’d rather spend my time on.

Also, I recently ran into my old combinatorics professor, and he mentioned that he knew some people in various math departments who used combinatorics in statistics for things like experimental design. Is that sort of work purely the realm of the math departments, or does that happen in stats departments too? I loved doing combinatorics, and it would be great if I could do something in that area too.

My reply:

1. Here are a few reasons why academic economists do so much work that does not directly involve economics:

a. Economics is a large and growing field in academia, especially if you include business schools. So there are just a lot of economists out there doing work and publishing papers. They will branch out into non-economics topics sometimes.

b. Economics is also pretty open to research on non-academic topics. You don’t always see that in other fields. For example, I’ve been told that in political science, students and young faculty are often advised not to work in policy analysis.

c. Economists learn methodological tools, in particular, time series analysis and observational studies, which are useful in other empirical settings.

d. Economists are plugged in to the news media, so you might be more likely to hear about their work.

2. Here’s the syllabus for the GRE math subject test. I don’t remember any topology or number theory on the exam, but it’s possible they changed the syllabus some time during the past 40 years, also it’s not like my memory is perfect. Topology is cool—everybody should know a little bit of topology, and even though it only very rarely arises directly in statistics, I think the abstractions of topology can help you understand all sorts of things. Number theory, yeah, I think that’s completely useless, although I could see how they’d have it on the test, because being able to answer a GRE math number theory question is probably highly correlated with understanding math more generally.

3. I am not up on the literature for combinatorics for experimental design. I doubt that there’s a lot being done in math departments in this area that has much relevance for applied statistics, but I guess there must be some complicated problems where this comes up. I too think combinatorics is fun. There probably are some interesting connections between combinatorics and statistics which I just haven’t thought about. My quick guess would be that there are connections to probability theory but not much to applied statistics.

P.S. This blog is on a lag, also sometimes we respond to questions from old emails.

When the story becomes the story

I was thinking recently about the popularity of Nudge, despite all its serious flaws, not just in presentation but in substance, not just extolling fraudulent science and then later not coming to terms with it, but also being part of a whole academic movement that relies on junk science even when you eliminate the clearly-identified fraud.

I can kinda see why this stuff would be popular with the Ted/NPR crowd, the kind of people who want to take your organization’s spare cash and spend it on management consultants, motivational speakers, and people who will organize lifeboat activities, which I guess is the modern equivalent of making people go to church every Sunday and mouth the words even if they don’t believe.

But how did it become so influential within academia? How is it that psychologists and economists (not to mention business and law professors) at top universities fell for it all?

Part of it is the whole academic-gold-rush thing: Tversky, Kahneman, and their predecessors and successors in the field of judgment and decision making really did have lots of good ideas (see here, for example), and it made sense for other researchers to follow up and for others to popularize and promote the ideas.

So far, so good. But, then, when it moved from lab experiments and studies of defaults to goofy stuff like power pose and bottomless soup bowls and signing at the bottom and himmicanes and all the rest, why has it taken so long for academic researchers to jump off the train (and, indeed, some are still on it, serenely taking drinks in the club car as it goes off the cliff)?

Again, I’ll start with the charitable explanation, which I do think has a lot of truth to it: judgment and decision making is a real area of research, don’t want to throw out the baby, let’s accentuate the positive, etc etc. This is a strategic argument to keep quiet, keep getting some use of the bathwater as it slowly drains out [ok, sorry for switching metaphors but this is just a blog post, ok? — ed.], basically the idea is to extract what value there is here and kinda keep quiet about the problems.

But . . . I think something else has been going on, not so much now as ten or fifteen years ago when these ideas were at their height, and that’s that the story became the story, which is indeed the subject of this post.

What do I mean by “the story became the story”? I mean that a big part of the appeal of the Nudge phenomenon is not just the lab studies of cognitive biases, not just the real-world studies of big effects (ok, some of these were p-hacked and others were flat-out faked, but people didn’t know that at the time), not just “nudge” as a cool unifying slogan that connected academic research to policy, not just potential dollars that flow toward a business- and government-friendly idea, but also the idea of Nudge as an academic success. The idea is that we should be rooting for Thaler and Sunstein because they’re local boys made good. The success is part of the story, in the same way that in the 1990s, Michael Jordan’s success was part of his story: people were rooting for Michael to break more records because it was all so exciting, the same way people liked to talk about how world-historically rich Bill Gates was, or about the incredible Tiger Woods phenomenon.

Sometimes when something gets big enough, its success becomes part of the story, and I think that’s what happened with Nudge and related intellectual products among much of social-science academia. One of their own had made it big.

Another example comes up in political campaigns and social movements. Brexit, Black Lives Matter, Barack Obama, Donald Trump: sometimes the story becomes the story. Part of the appeal of these movements is the story, that something big is happening.

It doesn’t have to happen that way. Sometimes we see the opposite, which is that someone or something becomes overexposed and then there’s a backlash. I guess that happened to some extent with Gladwell (follow-up here but also see here). So it’s not like I’m postulating any general laws here or anything. I just think it’s interesting how, in some cases, the story becomes the story.

Whassup with those economists who predicted a recession that then didn’t happen?

In a recent column entitled “Recession was inevitable, economists said. Here’s why they were wrong,” Gary Smith writes:

In an August 2022 CNBC interview, Steve H. Hanke, a Johns Hopkins University economics professor, predicted: ‘We’re going to have one whopper of a recession in 2023.’ In April 2023, he repeated the warning: ‘We know the recession is baked in the cake,’ he said. Many other economists also anticipated a recession in 2023. They were wrong.

I am not an expert on monetary policy or economics. Rather, this story interests me as a political scientist, in that policy recommendations sometimes rely on academic arguments, and also as a student of statistical workflow, in that I am interested in how people revise their models when they learn that they have made a mistake.

Along those lines, I sent an email to Hanke asking if he had written anything addressing his error regarding the recession prediction, and how he had revised his understanding of macroeconomics after the predicted outcome did not come to pass.

Hanke replied:

Allow me to first respond to your query of January 23rd. No, I have not written up why my longtime colleague John Greenwood and I changed our forecast concerning the timing of a likely recession. But, given your question, I now plan to do that. More on that below.

In brief, Greenwood and I employ the quantity theory of money to diagnose and predict the course of the economy (both inflation and real GDP growth). That’s the model, if you will, and we did not change our model prior to changing our forecast. So, why was our timing on the likely onset of a recession off? After the onset of the COVID pandemic, the money supply, broadly measured by M2, exploded at an unprecedented rate, resulting in a large quantity of excess money balances (see Table 2, p. 49 of the attached Greenwood-Hanke paper in the Journal of Applied Corporate Finance). We assumed, given historical patterns, etc., that this excess money would be exhausted and that a recession would commence in late 2023. Note that economic activity is typically affected with a lag of between 6 and 18 months after a significant change in the money supply. The lags are long and variable, sometimes even shorter than 6 months and longer than 18 months.

We monitored the data and realized that the excess money exhaustion was taking longer than we had originally assumed. So, we changed our forecast, but not our model. The attached Hanke-Greenwood article contains our new forecast and the reason why we believe a recession is “baked in the cake” in late 2024.

All this is very much in line with John Maynard Keynes’ quip, which has become somewhat of an adage: “When the facts change, I change my mind. What do you do, sir?”

Now, for a little context. After thinking about your question, I will include a more elaborate answer in a chapter in a book on money and banking that I am under contract to deliver by July. That chapter will include an extensive discussion of why the quantity theory of money allowed for an accurate diagnosis of the course of the economy and inflation during the Great Financial Crisis of 2008. In addition, I will include a discussion of how Greenwood and I ended up being almost the only ones that were able to anticipate the course of inflation in the post-pandemic period. Indeed, in 2021, we predicted that U.S. headline CPI would peak at 9% per year. This turned out to be very close to the 9.1% per year CPI peak in June 2022. Then, the Fed flipped the switch on its monetary printing presses. Since March 2022, the U.S. money supply has been falling like a stone. With that, Greenwood and I forecasted that CPI would end 2023 between 2% and 5% per year. With December’s CPI reading coming in at 3.4% per year, we hit the bullseye again. And, in this chapter, I will also elaborate on the details of why our initial prediction of the onset of a recession was too early, and why the data have essentially dictated that we move the onset forward by roughly a full year. In short, we have moved from the typical short end of the lag for the onset of a recession to the long end.

Again, macroeconomics is not my area of expertise. My last economics class was in 11th grade, and I remember our teacher telling us about challenges such as whether checking accounts count as “money.” I’m sure that everything is a zillion times more complicated now. So I’ll just leave the discussion above as is. Make of it what you will.

P.S. Since writing the above I came across a relevant news article by Jeanna Smialek and Ben Casselman entitled, “Economists Predicted a Recession. So Far They’ve Been Wrong: A widely predicted recession never showed up. Now, economists are assessing what the unexpected resilience tells us about the future.”

Dan Luu asks, “Why do people post on [bad platform] instead of [good platform]?”

Good analysis here. Here are Luu’s reasons why people post on twitter or do videos instead of blogging:


Just looking at where people spend their time, short-form platforms like Twitter, Instagram, etc., completely dominate longer form platforms like Medium, Blogspot, etc.; you can see this in the valuations of these companies, in survey data, etc. Substack is the hottest platform for long-form content and its last valuation was ~$600M, basically a rounding error compared to the value of short-form platforms . . . The money is following the people and people have mostly moved on from long-form content. And if you talk to folks using substack about where their readers and growth comes from, that comes from platforms like Twitter, so people doing long-form content who optimize for engagement or revenue will still produce a lot of short-form content.


A lot of people are going to use whatever people around them are using. . . . Today, doing video is natural for folks who are starting to put their thoughts online.


When people talk about [bad platform] being lower friction, it’s usually about the emotional barriers to writing and publishing something, not the literal number of clicks it takes to publish something. We can argue about whether or not this is rational, whether this “objectively” makes sense, etc., but at the end of the day, it is simply true that many people find it mentally easier to write on a platform where you write short chunks of text instead of a single large chunk of text.


And whatever the reason someone has for finding [bad platform] lower friction than [good platform], allowing people to use a platform that works for them means we get more content. When it comes to video, the same thing also applies because video monetizes so much better than text and there’s a lot of content that monetizes well on video that probably wouldn’t monetize well in text.

Luu demonstrates with many examples.

I’m convinced by Luu’s points. They do not contradict my position that Blogs > Twitter (see also here). Luu demonstrates solid reasons for using twitter or video, even if blogging results in higher-quality argumentation and discussion.

Blogging feels like the right way to go for me, but I also like writing articles and books. If I’d been born 50 years earlier, I think I just would’ve ended up writing lots more books, maybe a book a year instead of every two or three years.

As for Luu, he seems to do a lot more twitter posting than blog posting. I went on twitter to take a look, and his twitter posts are pretty good! That won’t get me to be a regular twitter reader, though, as I have my own tastes and time budget. I’ll continue to read his blog, so I hope he keeps posting there.

P.S. I was thinking of scheduling this for 1 Apr and announcing that I’d decided to abandon the blog for twitter, but I was afraid the argument might be so convincing that I’d actually do it!


I was thinking the other day about tenured faculty who don’t do their job or who do the absolute minimum: I’m talking about professors who never show up to work, do no research, and whose teaching is at the absolute minimum level of quality. Such people can actually be a negative in that they give students a substandard level of education and can make it more difficult to institute changes. Because of tenure, it’s difficult for such people to be fired. In theory, the administration should be able to take away their offices (no need for an office if you never come in) and reduce their salaries; instead, there’s a default to just keep giving everyone something close to the same annual salary increase. I can understand these constraints—performance in teaching, research, and service can be difficult to judge, and if it were too easy to take away office space and reduce salaries, then the admin could do this for all sorts of bad reasons. Indeed, universities can have non-tenured deadwood faculty too (for example, this guy): once someone’s on the inside, he can stay there for a long time.

Ultimately, we just accept this as part of the cost of doing business—just about every organization ends up with high-paid employees with negative value. Think of all the business executives who extract massive rents while at best pushing paper around and wasting people’s time, and at worst making bad decisions that drive their companies into the ground. The problem of faculty deadwood is just particularly salient to me because I’ve seen it as a student and teacher.

I’m just complaining here; I have no solutions to offer. A few years ago at Columbia we had a longtime professor of astronomy who opposed the tenure system in principle: he would’ve been tenured had he wanted to be, but he chose to be on five-year contracts. Maybe that would be a good idea for everyone. I’m not sure, though: I have a feeling that, if we were to switch to five-year contracts for the permanent faculty, it would either be a rubber stamp (so that the deadwood guys would stay around forever anyway), or a huge paperwork hassle (an endless cycle of forms and committees for the faculty), or perhaps both, the worst of both worlds. The most unproductive faculty would be impossible to get rid of, and the most productive would just quit because they wouldn’t want to deal with the reviews.

P.S. Another solution would be that the deadwood faculty would feel bad about drawing a salary while not doing their job of teaching/research/service, but it would take an exceptional person to quit a job where they pay you a lot and you don’t have to work or even show up to the office. Especially given that, if you don’t quit, you can wait for enough years and then retire with excellent pension benefits.

If I got a nickel every time . . .

Justin Savoie came across this webpage on “Abacus Data MRP: A Game-Changer for GR, Advocacy, and Advertising”:

We should get these people to write our grant proposals! Seriously, we should tap them for funding. They’re using MRP, we’re developing improvements to MRP . . . it would be a good investment.

After noticing this bit:

Abacus Data collaborated closely with the Association to design, execute, and analyze a comprehensive national survey of at least 3,000 Canadian adults. The survey results were meticulously broken down by vital demographics, political variables, and geographical locations in a easy to read, story-telling driven report. Something we are known for – ask around!

The real innovation, however, lay in Abacus Data MRP’s unique capability. Beyond the general survey analysis, it generated 338 individual reports, each tailored for a specific Member of Parliament.

Savoie commented:

Cool! But 3000/338 = 9 … so I don’t know about the tailoring.

Good point. Let’s not oversell.
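Savoie’s arithmetic is worth spelling out. With roughly 3,000 respondents spread over 338 ridings, the raw data give you only about 9 interviews per riding, and unpooled estimates at that sample size are extremely noisy. Here’s a quick simulation (my toy numbers, not Abacus Data’s survey):

```python
import random
import statistics

random.seed(1)

# Rough version of Savoie's arithmetic: ~3,000 respondents spread over
# 338 ridings leaves fewer than 9 interviews per riding. How noisy are
# raw (unpooled) per-riding estimates at that sample size?
n_ridings, n_per = 338, 3000 // 338  # 8 respondents per riding
true_support = [random.gauss(0.45, 0.05) for _ in range(n_ridings)]

raw_est = []
for p in true_support:
    votes = [random.random() < p for _ in range(n_per)]
    raw_est.append(sum(votes) / n_per)

errors = [abs(est - p) for est, p in zip(raw_est, true_support)]
print(round(statistics.fmean(errors), 3))  # typical error well over 0.1
```

An average error above 10 percentage points is huge when the quantity being estimated is around 45%. MRP’s per-riding estimates come from partial pooling in a multilevel model, not from the handful of interviews per riding alone, which is exactly why 338 “tailored” reports shouldn’t be oversold as riding-level data.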

How to think about the effect of the economy on political attitudes and behavior?

We’re familiar with the idea that the economy can and should affect elections. The connection is borne out empirically (Roosevelt’s victory in 1932 following the depression, his landslide reelection in 1936 amid economic growth, Johnson and Reagan winning huge reelections during the boom periods of 1964 and 1984, Carter losing in the 1980 recession and Bush Sr. losing in the 1992 mini-recession) and theoretically, either from the principle of retrospective voting (giving a political party credit or blame for its stewardship of the economy) or prospective voting (giving the keys of the economy to the party that you think can do the job). A good starting point here is Steven Rosenstone’s book from 1983, Forecasting Presidential Elections.

Indeed, the principle of economic voting (“It’s the economy, stupid”) has become so familiar that it was overgeneralized to apply to off-year (non-presidential) elections as well. The evidence appears to show, however, that off-year elections are decided more by party balancing; see previous discussions from 2018 and 2022:

What is “party balancing” and how does it explain midterm elections?

The Economic Pit and the Political Pendulum: Predicting Midterm Elections


But this year a presidential election is coming, and the big question is why Biden is not leading in the polls given the strong economy. There are lots of reasons to dislike Biden—or any other political candidate—so in that sense the real point is not his unpopularity but the implications for the election forecast. As the above-cited Rosenstone and others have pointed out, pre-election polls can be highly variable, and so in that sense there’s no reason to take polls from May so seriously. In recent campaigns, however, with the rise of political polarization, campaign polls have been much more stable.

As we discussed the other day, one piece of the puzzle is that perceptions of the economy are being driven by political polarization. This is not new; for example:

A survey was conducted in 1988, at the end of Ronald Reagan’s second term, asking various questions about the government and economic conditions, including, “Would you say that compared to 1980, inflation has gotten better, stayed about the same, or gotten worse?” Amazingly, over half of the self-identified strong Democrats in the survey said that inflation had gotten worse and only 8% thought it had gotten much better, even though the actual inflation rate dropped from 13% to 4% during Reagan’s eight years in office. Political scientist Larry Bartels studied this and other examples of bias in retrospective evaluations.

That said, it does seem that polarization has made these assessments even more difficult, even when people are characterizing their own personal financial situations.

Here’s the question

The above is all background. Here’s my question: how is the effect of the economy on political attitudes and behavior supposed to work? That is, what are the mechanisms? I can see two possibilities:

– Direct observation. You lose your job or find a new job, or you know people who lose their jobs or find new jobs, or you observe prices going up or down, or you get a raise, or you don’t get a raise, etc.

– The news media. You read in the news or see on TV that unemployment or inflation has gone up or down, or that the economy is growing, etc.

Both mechanisms are reasonable, both in the descriptive sense that people get information from these different sources and also in the normative sense that it seems fair, to some extent, to use economic performance to judge the party in power. Not completely fair (business cycles happen!) and sometimes they lead to bad incentives such as pro-cyclical expansionary policies, but, still, there’s a strong logic there.

The thing I’m struggling with is how the direct observation is going to work. A 2% change in economic growth, or a 4% change in the unemployment or inflation rate, is a big difference, but how will it be perceived by an individual voter? Everybody’s experience is different, and it’s not clear that any simple aggregation will apply. If you think of each voter as having an impression of the economy, which can then affect that person’s vote, then, fine, the average impression will correspondingly affect the average vote—but any bias in the impressions will lead to a bias in the average, and there’s no reason to think that people’s perceptions are unbiased or even close to that, even in the absence of political polarization.

As I wrote a couple days ago:

Wolfers’s column is all about how individuals can feel one way even as averages are going in a different direction, and that’s interesting. I will say that even the comments that are negative about the economy are much less negative than you’d see in a recession. In a recession, you see comments about people losing everything; here, the comments are more along the lines of, “It’s supposed to be an economic boom, but we’re still just getting by.” But, sure, if there’s an average growth of 2%, say, then (a) 2% isn’t that much, especially if you have a child or you just bought a new house or car, and (b) not everybody’s gonna be at that +2%: this is the average, and roughly half the people are doing worse than that. The point is that most people are under economic constraints, and there are all sorts of things that will make people feel these constraints—including things like spending more money, which from an economic consumption standpoint is a plus, but also means you have less cash on hand.

So, lots of paradoxes here at the intersection of politics, economics, and psychology: some of the components of economic growth can make people feel strapped—if they’re focusing on resource constraints rather than consumption. . . .

People have this idea that a positive economy would imply that their economic constraints will go away—but even if there really is a 2% growth, that’s still only 2%, and you can easily see that 2% disappear cos you spent it on something. From the economist’s perspective, if you just spent $2000 on something nice, that’s an economic plus for you, but from the consumer’s point of view, spending that $2000 took away their financial cushion. The whole thing is confusing, and I think it reflects some interaction between averages, variations, and people’s imagination of what economic growth is supposed to imply for them.

My point is not that people “should” be feeling one way or another, just that the link between economic conditions and economic perception at the individual level is not at all as direct as one might imagine based on general discussions of the effects of the economy and politics.

This makes me think that the view of the economy from the news media is important, as the media report hard numbers which can be compared from election to election. For example, back in the 1930s, the press leaned Republican, and they gave the news whatever spin they could, but they reported the economic news as it was; similarly for the Democratic-leaning news media in 1980 and 1984.

My current take on the economy-affecting-elections thing is that, in earlier decades, economic statistics reported in the news media served as a baseline or calibration point which individual voters could then adjust based on their personal experiences. Without the calibration, the connection between the economy and political behavior is unmoored.

The other issue—and this came up in our recent comment thread too—is what’s the appropriate time scale for evaluating economic performance? Research by Rosenstone and others supports a time horizon of approximately one year—that is, voters are responding to the relative state of the economy at election time, compared to a year earlier. So then the election turns on how things go in the next few months. Normatively, though, it does not seem like a good idea to base your vote on just one year of economic performance. So then maybe the disconnect between the economy and vote preference is a good thing?

Indeed, it is usually considered to be politically beneficial for a presidential term to start with a recession and then bounce back (as with Reagan or, to a lesser extent, Obama) than for a term to start good but end with a downturn (as with Carter)—even though up-then-down corresponds to higher economic output than down-then-up. Again, any individual voter is only experiencing part of the story, which returns us to the puzzle of why we should expect economic experiences to aggregate in a reasonable way when transformed to public opinion.


Journalists and political scientists (including me!) have a way of talking about the aggregate effect of the economy on voting, with the key predictors being measures of economic performance in the year leading up to an election. There’s a logic to such arguments, and they fit reasonably well to past electoral data, but the closer you look at this reasoning, the more confusing it becomes: Why should voters care so much about recent performance? How exactly do economic conditions map onto perceptions? What are the roles of voters’ individual experiences, their observations of local conditions, and what they learn from the news media? There’s a lot of incoherence here, not just among voters but in the connections between our macro theories and our implicit models of individual learning and decision making.

P.S. Some discussion here from Paul Campos. I remain bothered by the gap in our political science models regarding how voters aggregate their economic experiences. At some level, yeah, sure, I get it: a good economy or a bad economy will on the margin make a difference. The part that’s harder for me to get is how this is supposed to work when comparing one election to another, years later.

How to think about the claim by Justin Wolfers that “the income of the average American will double approximately every 39 years”?

Paul Campos forwards along this quote from University of Michigan economist Justin Wolfers:

The income of the average American will double approximately every 39 years. And so when my kids are my age, average income will be roughly double what it is today. Far from being fearful for my kids, I’m envious of the extraordinary riches their generation will enjoy.

I don’t know where to begin with this one! OK, let me begin with what Campos reports: “a quick glance at the government’s historical income tables shows me that the 20th percentile of household income is currently $30,000, while it was $24,000 39 years ago (constant dollars obvi) which is . . . far from doubling.”

This got me curious so I googled *government historical income tables*. Google isn’t entirely broken: the very first link was right here from the U.S. Census. . . . Scrolling down, it looks like we want “Table H-3. Mean Household Income Received by Each Fifth and Top 5 Percent,” which gives a convenient Excel file. Wolfers was writing about “the income of the average American,” which I guess is shorthand for the middle fifth of income. Household income for that category is recorded as $74,730 in 2022 and $55,390 (in 2022 dollars) in 1983, so . . . yeah, not doubling.

On the other hand, Wolfers is talking about his kids, and that’s a different story. They’re at the 99th percentile, not the 50th percentile. And the 99th percentile has done pretty well during the past 39 years! How well? I’m not quite sure. The Census page doesn’t have anything on the top 1%. They do have data on the top 5%, though. Average income in this group was $499,900 in 2022 and $230,600 (in 2022 dollars) in 1983 . . . hey, that is pretty close to that doubling reported by Wolfers.

But this won’t quite work for Wolfers’s kids either, because regression to the mean. If you’re at the top 1%, your kids are likely to be lower on the relative income ladder than you. I’m sure Wolfers’s kids will do just fine. But maybe they won’t see a doubling of income, compared to what they’re growing up with.

OK, that’s household income. The Census also has a page with trends of family (rather than household) income. Let’s again go to the middle quintile, which is the closest we have to “the average American”: It’s $93,130 in 2022 and $65,280 (in 2022 dollars) in 1983. Again, not a doubling.
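The implied growth rates behind these comparisons are easy to check. Here’s a quick sketch using the dollar figures quoted in this post (all 1983-to-2022, in 2022 dollars):

```python
import math

def annual_growth(start, end, years):
    """Implied constant annual growth rate between two real-dollar figures."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate):
    """Years to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

# Census figures quoted above, 1983 -> 2022 (39 years), in 2022 dollars:
series = {
    "household, middle fifth": (55_390, 74_730),
    "household, top 5%":       (230_600, 499_900),
    "family, middle fifth":    (65_280, 93_130),
}
for name, (y1983, y2022) in series.items():
    g = annual_growth(y1983, y2022, 39)
    print(f"{name}: {g:.2%}/yr, doubling time ~{doubling_time(g):.0f} years")
```

This gives roughly 0.8% per year for the middle fifth of households (a doubling time of about 90 years) and about 2.0% per year for the top 5% (about 35 years). Only the top group comes anywhere near a 39-year doubling.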

I did some searching and couldn’t find any Census tables for quantiles of individual income, so I guess maybe that’s what doubled in the past 39 years? I’m skeptical, but Wolfers is the economist, and I’m pretty sure his calculations are based on some hard numbers.

Beyond all that, though, there are two other things that bother me about Wolfers’s quote:

1. “Approximately every 39 years”: what kind of ridiculous hyper-precision is this? Incomes go up and down, there are booms and recessions, not to mention inflation and currency fluctuations. In what sense could you possibly make a statement with this sort of precision?

I’m reminded of the story about the tour guide who told people that the Grand Canyon was 5,000,036 years old. When asked how he came up with the number, the grizzled guide replied that, when he started in the job they told him the canyon was 5 million years old, and that was 36 years ago.

2. “The income of the average American will . . .”: Hey, you’re talkin bout the future here. Show some respect for uncertainty! Economists know about that, right? Do you want to preface that sentence with, “If current trends continue” or “According to our models” or . . . something like that?

I guess we can check back in 39 years.

I’m kinda surprised that an economist would write an article, even for a popular audience, that would imply that future income growth is a known quantity. Especially given that elsewhere he argues, or at least implies, that there are major economic consequences depending on which party is in power. If we don’t even know who’s gonna win the upcoming election, and that can affect the economy, how can we possibly know what will happen in the next 39 years?

And here’s another, where Wolfers reports, “For the first time in forever, real wage gains are going to those who need them most.” If something as important as this is happening “for the first time in forever,” then, again, how can you know what will happen four decades from now? All sorts of policy changes might occur, right?

That said, I agree with Campos that Wolfers’s column has value. It’s interesting to read the column in conjunction with the accompanying newspaper comments, as this gives some sense of the differences between averages and individual experiences. Wolfers’s column is all about how individuals can feel one way even as averages are going in a different direction, and that’s interesting. I will say that even the comments that are negative about the economy are much less negative than you’d see in a recession. In a recession, you see comments about people losing everything; here, the comments are more along the lines of, “It’s supposed to be an economic boom, but we’re still just getting by.” But, sure, if there’s an average growth of 2%, say, then (a) 2% isn’t that much, especially if you have a child or you just bought a new house or car, and (b) not everybody’s gonna be at that +2%: this is the average, and roughly half the people are doing worse than that. The point is that most people are under economic constraints, and there are all sorts of things that will make people feel these constraints—including things like spending more money, which from an economic consumption standpoint is a plus, but also means you have less cash on hand.

So, lots of paradoxes here at the intersection of politics, economics, and psychology: some of the components of economic growth can make people feel strapped—if they’re focusing on resource constraints rather than consumption.

Aaaaand, the response!

I sent the above to Wolfers, who first shared this note that someone sent to him:

It seems like you were getting a lot of hate in the comments section from people who thought you had too positive a view of the economy—which seemed to just further your point: people feel like the economy is doing awful even though it really isn’t.

I agree. That’s one of the things I was trying to get at in my long paragraph above. People have this idea that a positive economy would imply that their economic constraints will go away—but even if there really is a 2% growth, that’s still only 2%, and you can easily see that 2% disappear cos you spent it on something. From the economist’s perspective, if you just spent $2000 on something nice, that’s an economic plus for you, but from the consumer’s point of view, spending that $2000 took away their financial cushion. The whole thing is confusing, and I think it reflects some interaction between averages, variations, and people’s imagination of what economic growth is supposed to imply for them.

Next came the measurement issue. Wolfers wrote:

But let’s get to the issue you raised, which is how to think about real income growth. Lemme start by being clear that my comment was about average incomes, rather than incomes at any given percentile. After all, if I’m trying to speak to all Americans, I should use an income measure that reflects all Americans. As you know, there are many different income concepts, but I wanted to: a) Use the simplest, most transparent measure; and b) broadest possible income concept; which was c) Not distorted either by changing household composition, or changing distribution. And so that led me to real GDP per capita. (Yes, you might be used to thinking of GDP as a measure of total output, but as you likely know, it’s also a measure of total income… This isn’t an equilibrium condition, but an accounting identity.)

My guess is that if you were trained as a macroeconomist, your starting point for long-run income growth trends would have been to look at GDP per capita.

And indeed, that’s where I started. There’s a stylized fact in macro—which I suspect was first popularized by Bob Lucas many moons ago—that the US economy seems to persistently grow at around 2% per year on a per capita basis, no matter what shocks hit the economy. I went back and updated the data, here, and it’s a shame that I didn’t have the space to include it. The red line shows the trend from regressing log(GDP per capita) on time, and it yields a coefficient of 0.018, which is the source for my claim that the economy tends to grow at this rate. (My numbers are comparable to—but a bit higher than—CBO’s long-term projections, which shows GDP per person growing at 1.3% from 2024-2054.) Then it’s just a bit of arithmetic to figure out that the time it takes to double is every 39 years.

You suggest that saying that it’ll double “approximately every 39 years,” is a bit too precise. I agree! I wish we had better language conventions here, and would love to hear your suggestion. For instance, I was raised to understand that saying inflation is 2-1/2 percent was a useful way of showing imprecision, relative to writing that it’s 2.5%. But we don’t have similar linguistic terms for whole numbers. I could have written “every 40 years,” but then any reader who understands compounding would have been confused as to why I wrote 40 when I meant 39. So we added the “approximately” to give a sense of uncertainty, while reporting the (admittedly rough) point estimate directly.

Let’s pan back to the bigger picture. Yes, there’s uncertainty about the growth of average real incomes. And while we could quibble about the likely growth rate of the US economy over the next several decades, I think that for nearly every reasonable scenario, I’m still going to end up thinking about my kids and being “envious of the extraordinary riches their generation will enjoy.” That’s the thing about compounding growth — it delivers really big gains! Moreover, I think this is a point that too few understand. After all, according to one recent survey, 72 percent of Americans believe that their children will be worse off financially than they are. If you think about historical rates of GDP growth, and the narrow corridor within which we’ve observed decade-to-decade growth rates, it’s almost implausible that this could happen, even if inequality were to rise!

I replied:

Regarding what you said in your column: you didn’t say “average income”; you specifically said, “The income of the average American.” And if you’re gonna be referring to “the average American,” I think it does refer to the 50th percentile or something like that.

Regarding the specifics, if the CBO’s long-term projections are 1.3% growth per year, then you get doubling after 54 years, not 39 years. So I guess you might say something like, “Current projections are for a doubling of GDP over the next 50 years or so.”

Justin responded:

1. I understand that there’s a meaning of “average” that incorporates many measures of central tendency (mean, median, mode, etc.), but it’s also often used specifically to refer to a “mean.” See below, for the top google search. Given this, I’m not a fan of using the word “average” ever to refer to a “median” (unless there was some supporting verbiage to describe it more).

2. On 1.3% v. 1.8%: Even at 1.3% growth, in 39 years time, average income will be 65% higher. Point is, that’s “a lot” (as is 100% higher). Also, here’s the graph:

My summary

– If you say “average income,” I agree that this refers to the mean. If you say “average American” . . . well, that’s not clearly defined, as there is no such thing as the “average American,” but if you’re talking about income and you say “average American,” that does sound like the 50th percentile to me.

– I was giving Wolfers a hard time about making a deterministic statement about the future, but, given the above graph, sure, I guess he has some justification!

– I think there is a slightly cleaner phrasing that would allow him to avoid overprecision and determinism: “Current projections are for a doubling of GDP over the next 50 years or so.” Or, “If trends of the past decades continue, we can expect the income of the average American to double by the end of the century.”

– Yes, I know that this is me being picky. But I’m a statistician: being picky is part of my job! You wouldn’t want it any other way.
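For what it’s worth, the doubling-time arithmetic in the exchange above checks out, using the numbers quoted there (the 0.018 trend coefficient and the CBO’s 1.3% projection):

```python
import math

def doubling_time(rate):
    """Years to double at a constant growth rate, compounded annually."""
    return math.log(2) / math.log(1 + rate)

# Wolfers's regression of log(GDP per capita) on time gives a coefficient
# of 0.018/year; on the log scale the doubling time is ln(2)/coefficient:
print(math.log(2) / 0.018)   # about 38.5 -> "approximately 39 years"

# The CBO's 1.3% long-term projection implies a slower doubling:
print(doubling_time(0.013))  # about 54 years

# And even at 1.3%, 39 years of compounding is substantial:
print(1.013 ** 39 - 1)       # about 0.65, i.e. 65% higher
```

So both sides of the exchange are internally consistent: 1.8% growth doubles in about 39 years, 1.3% doubles in about 54, and 1.3% over 39 years still gets you a 65% increase.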

Studying causal inference in the presence of feedback:

Kuang Xu writes:

I’m a professor at the business school at Stanford working on operations research and statistics. Recently, I shared one of our new preprints with a friend who pointed out some of your blog posts that seem to be talking about some related phenomenon. In particular, our paper studies how, by using adaptive control, the states of a processing system are affected in such a way that congestion no longer “correlates” with the underlying slowdown of services.

You mentioned in the blog where you wonder if there’s some formal treatment of this phenomenon where control removes correlation in a system, and I thought you might find this to be interesting, possibly a formal example of the effect you were thinking about.

We’ve been wondering if there are other similar, concrete examples in the policy realm that resemble this.

My reply: I’m not sure. On one hand, the difficulty of causal inference with observational data is well known—it’s a strong theme of all presentations of causal inference—but it seems that most of the concerns come with selection rather than feedback.

Xu responds:

We tried to explore this connection to a small degree in the lit review – there’s some similarity to how people use inverse [estimated] probability weighting to debias selection, but these are generally one-time interventions so not so much of a feedback loop. Like you wrote in that blog post, something like monetary policy is more like a feedback loop, but it’s hard to isolate such effects in these complex systems.

As I wrote in my earlier post on the topic, I’m pretty sure there was tons of work back in the 1940s-1960s in this area of feedback in control systems. I can just picture a bunch of guys in crewcuts wearing short-sleeved button-up shirts with pocket protectors working on these problems. For some reason, though, I haven’t heard much about any of this nowadays within statistics or econometrics. Seems like there’s room for some unification, along with some communication so that the rest of us can make use of whatever has been done in this area already.

A cook, a housemaid, a gardener, a chauffeur, a nanny, a philosopher, and his wife . . .

From Ray Monk’s biography of Bertrand Russell:

Though the Russells were not especially wealthy, they employed—as was common in Britain until after the Second World War—a number of servants: a cook, a housemaid, a gardener, a chauffeur and a nanny.

Arguably this is not so much different than modern society: even if we who live in comfortable circumstances do not employ personal servants, we still benefit from the labor of thousands of people working in farms, factories, and everything in between.

What struck me about the above story regarding Russell is not so much that he had all these servants—it’s indeed hard to picture the great philosopher shopping in the supermarket or frying an egg or folding the sheets or whatever—but rather that he must have had some flexibility in his finances. Monk also said that Russell did a lot of writing just for the money, which may have been the case, but did he really need the money if that’s what he was spending it on? Without any particular knowledge of Russell, I kinda suspect it went the other direction: he wrote a lot for general audiences because he enjoyed writing, it was a way for him to work out his ideas, he was a good writer (ok, Monk also shares snippets from many of Russell’s private letters, and they are pretty uniformly cringe-worthy and unreadable, so let me just say he was a good writer when it came to his public writings), and he wanted to communicate with and, if possible, influence a broad public. But he had this aristocratic background that made all those motivations suspect. From that perspective, “I did it for the money” is a convenient excuse. I’m guessing he did the writing because he wanted to, and for good reasons, and then he kept spending that money, which helped motivate him to keep writing.

“When are Bayesian model probabilities overconfident?” . . . and we’re still trying to get to meta-Bayes

Oscar Oelrich, Shutong Ding, Måns Magnusson, Aki Vehtari, and Mattias Villani write:

Bayesian model comparison is often based on the posterior distribution over the set of compared models. This distribution is often observed to concentrate on a single model even when other measures of model fit or forecasting ability indicate no strong preference. Furthermore, a moderate change in the data sample can easily shift the posterior model probabilities to concentrate on another model.

We document overconfidence in two high-profile applications in economics and neuroscience.

To shed more light on the sources of overconfidence we derive the sampling variance of the Bayes factor in univariate and multivariate linear regression. The results show that overconfidence is likely to happen when i) the compared models give very different approximations of the data-generating process, ii) the models are very flexible with large degrees of freedom that are not shared between the models, and iii) the models underestimate the true variability in the data.

This is related to our work on stacking:

[2018] Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis 13, 917–1003. (Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman)

[2022] Bayesian hierarchical stacking: Some models are (somewhere) useful. Bayesian Analysis 17, 1043–1071. (Yuling Yao, Gregor Pirš, Aki Vehtari, and Andrew Gelman)

[2022] Stacking for non-mixing Bayesian computations: The curse and blessing of multimodal posteriors. Journal of Machine Learning Research 23, 79. (Yuling Yao, Aki Vehtari, and Andrew Gelman)

Big open problems remain in this area. For choosing among or working with discrete models, stacking or other predictive model averaging techniques seem to work much better than Bayesian model averaging. On the other hand, for models with continuous parameters we’re usually happy with full Bayes. The difficulty here is that discrete models can be embedded in a continuous space, and continuous models can be discretized. What’s missing is some sort of meta-Bayes (to use Yuling’s term) that puts this all together.
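To make the contrast concrete, here’s a toy numerical sketch (my own illustration, not the method of any of the papers above): two fixed candidate models, N(0, 1) and N(0, 1.5^2), fit to data that neither describes exactly. Because the models here have no free parameters, the marginal likelihood is just the likelihood, and in-sample log scores coincide with leave-one-out scores; real stacking uses leave-one-out predictive densities.

```python
import numpy as np

# Deterministic toy data that neither candidate model fits exactly:
# mostly small values plus a couple of outliers, replicated 10 times.
y = np.tile([-3.0, -0.5, -0.25, 0.0, 0.25, 0.5, 3.0], 10)

def logpdf(y, sigma):
    # log density of N(0, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - y**2 / (2 * sigma**2)

lp1, lp2 = logpdf(y, 1.0), logpdf(y, 1.5)

# Posterior probability of model 1 under equal prior odds: since each
# model's marginal likelihood is its likelihood here, this concentrates
# toward 0 or 1 as the data are replicated.
w_post = np.exp(lp1.sum() - np.logaddexp(lp1.sum(), lp2.sum()))

# Stacking: choose w to maximize the mean log density of the mixture
# w * p1 + (1 - w) * p2, via a grid search over [0, 1].
grid = np.linspace(0.0, 1.0, 1001)
score = [np.mean(np.logaddexp(np.log(w + 1e-300) + lp1,
                              np.log(1 - w + 1e-300) + lp2)) for w in grid]
w_stack = grid[int(np.argmax(score))]

print(f"posterior probability of model 1: {w_post:.2e}")  # tiny: all-in on model 2
print(f"stacking weight on model 1:       {w_stack:.2f}")  # interior mixture weight
```

The point of the sketch: the posterior model probability goes essentially all-in on the wider model, while stacking keeps an interior weight because the narrower model predicts the central data points better, which is the overconfidence phenomenon the paper documents.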

For that price he could’ve had 54 Jamaican beef patties or 1/216 of a conference featuring Gray Davis, Grover Norquist, and a rabbi

It’s the eternal question . . . what do you want, if given these three options:

(a) 54 Jamaican beef patties.

(b) 1/216 of a conference featuring some mixture of active and washed-up business executives, academics, politicians, and hangers-on.

(c) A soggy burger, sad-looking fries, and a quart of airport whisky.

The ideal would be to put it all together: 54 Jamaican beef patties at the airport, waiting for your flight to the conference to meet Grover Norquist’s rabbi. Who probably has a lot to say about the ills of modern consumerism.

I’d pay extra for airport celery if that’s what it took, but there is no airport celery so I bring it from home.

P.S. The above story is funny. Here’s some stuff that makes me mad.

How large is that treatment effect, really? (my talk at NYU economics department Thurs 18 Apr 2024, 12:30pm)

19 W 4th Street, Room 517:

How large is that treatment effect, really?

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

“Unbiased estimates” aren’t really unbiased, for a bunch of reasons, including aggregation, selection, extrapolation, and variation over time. Econometrics typically focuses on causal identification, with the goal of estimating “the” effect. But we typically care about individual effects (not “Does the treatment work?” but “Where and when does it work?” and “Where and when does it hurt?”). Estimating individual effects is relevant not only for individuals but also for generalizing to the population. For example, how do you generalize from an A/B test performed on a sample right now to possible effects on a different population in the future? Thinking about variation and generalization can change how we design and analyze experiments and observational studies. We demonstrate with examples in social science and public health.
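To see why the sample-average effect can mislead when generalizing, here’s a minimal numerical sketch (all numbers invented for illustration): the treatment effect varies by subgroup, and the experiment’s subgroup composition differs from the target population’s.

```python
import numpy as np

# Hypothetical subgroup-level treatment effects and compositions.
effects       = np.array([0.5, 0.1, -0.2])  # effect within each subgroup
sample_shares = np.array([0.6, 0.3, 0.1])   # subgroup shares in the experiment
pop_shares    = np.array([0.2, 0.3, 0.5])   # subgroup shares in the target population

ate_sample = effects @ sample_shares  # what the A/B test estimates
ate_pop    = effects @ pop_shares     # what we care about for the population

print(f"sample ATE:     {ate_sample:+.2f}")  # +0.31
print(f"population ATE: {ate_pop:+.2f}")     # +0.03
```

Same experiment, same within-group effects, but the reweighted average is an order of magnitude smaller: “Does the treatment work?” has a different answer than “Where and for whom does it work?”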