Supporting research because it’s cool or because it’s useful

In a recent post, I wrote, “I study social science because it’s interesting and important, not because I think there are buttons we can push.”

Sean Manning adds this:

I think this touches on a basic division. A lot of us would be happy to do and popularize cheap research which gets public funds for the same general reason that arenas and golf clubs and parks get public funds: some people enjoy using them, and more people enjoy watching people using them. This faces direct opposition from neoliberals (people who think that all institutions should be modelled on for-profit corporations) who disagree that satisfying curiosity is a good, but indirect opposition from people who want lots of money for their research and make grandiose claims about how useful their research is. The more institutions become dependent upon funds obtained by promising results, the more they spend and the more expensive that cheap research looks. Lots of people with academic jobs give up applying for grants or refunds because the paperwork created to manage million-dollar grants is so time-consuming on a thousand-dollar grant and the delay between applying and actually getting to do the work drains their energy.

One of Richard Feynman’s colleagues already felt ashamed in 1974 to say that he did research because knowing the answer would be cool. And there are books about how the American research university as we know it today grew out of research for the military during WW II.

It’s kind of political, no? Universities are disliked on the left for perpetuating inequality and disliked on the right for providing jobs to a bunch of left-wingers. On the other hand, scientific research is generally thought of as a good thing (with some exceptions such as research on offensive weapons).

Unjournal Update

A few years ago we linked to the Unjournal, a sort of recommender system for preprints. I don’t know much about it, but it seems like a good idea so I’m happy to publicize it.

Here’s David Reinstein, the organizer of the Unjournal, with an update:

The Unjournal (unjournal.org) is a nonprofit that commissions (and pays) experts to publicly evaluate and rate publicly-hosted research. It prioritizes globally-impactful work in economics and quantitative social science. It doesn’t charge for anything and it’s all open-access: you can see the evaluation output at unjournal.pubpub.org.

We are looking for people to join the pool of evaluators, and there is a range of other opportunities for involvement (with compensation and incentives). We recently introduced the Unjournal Research Affiliate role, a low-commitment opportunity to help us identify, prioritize and discuss high-impact research, and to spread the word about this new system. You can learn about this and other ways to get involved in our knowledge base here, and apply for these roles and opportunities here.

Some other news and opportunities from The Unjournal:

  • Pivotal questions project: We received a grant to work with impact-focused organizations to identify their most decision-relevant and high value-of-information questions and claims, operationalize them, source relevant research, evaluate the research and the questions and claims, and measure and track beliefs about these questions and claims, potentially engaging prediction markets. We’re looking for your suggestions and engagement.

  • Independent evaluations: We’re offering prizes and recognition for independent evaluations of potentially impactful research, including for work we have already evaluated, and work we are prioritizing (see database here). See the independent evaluations trial and kickstarter incentive.

  • New approach to our “applied stream”: We’re evaluating research from ~every domain-relevant research org that received substantial funding from an Effective Altruism-adjacent or impact-oriented funder. See forum post here.

  • We’re exploring collaborations with The Black Spatula Project

  • The Unjournal is serving as an Evaluation Service Partner for the Center for Open Science Lifecycle Journal

  • Reading/Evaluation groups: We’re planning to co-host and work with existing PhD/Faculty “Reading/Evaluation groups” in economics and other areas, to help disseminate and reward their evaluation work and feedback. Let us know if you want to help organize this or get involved. We can offer some incentives. (Notes sketched here).

Again, I haven’t followed the details, but it seems good that people are trying new things.  A few years ago we discussed another such idea, Researchers.One.  The more the merrier, I say.

Problems caused by grade inflation

Columbia math lecturer Peter Woit writes:

There has been significant grade inflation over the years, so having a transcript with a string of As isn’t worth what it once was. This is not good for the unusually talented, who now need to find other ways to distinguish themselves.

That’s a good point! I’ve typically thought of grade inflation in isolation (as in my post asking why weren’t the instructors all giving all A’s already?) with the problem being that inflated grades provide less information to future employers.

Woit’s point is related but goes further. Now that A’s are given out like candy corn in the world’s worst Halloween party, they don’t provide much signal, first because, as Woit says, non-unusually-talented students can also get strings of A’s on their transcripts, and also because if you’re competing on grades, the occasional slip can be so costly. Either way, ambitious students have to distinguish themselves in other ways—for example, by publishing articles in journals and conferences. This propagation of “publish or perish” down to the high school level just exacerbates the explosion of publications—apparently, zillions of medical students are kinda required to publish research too, and if publication is a requirement, then the quality is not gonna matter so much, and these papers just get stirred in with whatever remaining legitimate literature is being produced.

So, yeah, if we were to give out more B’s and C’s, maybe the world would be a better place.

I’m not planning to go first, though. As I wrote a few years ago, the real mystery to me is not, Why is there grade inflation?, but rather, Why is there any room left to inflate: why weren’t the instructors all giving all A’s already?

At that time, I recommended statistician Val Johnson’s plan to “make post-hoc adjustments to assigned grades to account for differences in faculty grading policies”: basically, fit a multilevel item-response model to estimate students’ latent abilities based on their grades. As I wrote at the time:

The beauty of Val’s approach is that it does three things:

1. By statistically correcting for grading practices, Val’s method produces adjusted grades that are more informative measures of student ability.

2. Since students know their grades will be adjusted, they can choose and evaluate their classes based on what they expect to learn and how they expect to perform; they don’t have to worry about the extraneous factor of how easy the grading is.

3. Since instructors know the grades will be adjusted, they can assign grades for accuracy and not have to worry about the average grade. (They can still give all A’s but this will no longer be a benefit to the individual students after the course is over.)

I still like Val’s idea, but at this point there may be too much grade inflation at some schools for it to work. At some point there is so little signal left that you can’t recover the information you want.
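To make the adjustment idea concrete, here is a minimal sketch in Python. It is not Val’s actual procedure: instead of a multilevel item-response model, it assumes a much simpler additive model, grade = ability + leniency + noise, fit by least squares. The function name adjust_grades, the toy data, and the grade scale are all made up for illustration.

```python
import numpy as np

def adjust_grades(student, course, grade, n_students, n_courses):
    """Fit grade = ability[student] + leniency[course] by least squares.

    Simplified, assumed stand-in for a multilevel item-response model.
    Returns (ability, leniency), with leniency centered at zero.
    """
    n = len(grade)
    X = np.zeros((n, n_students + n_courses))
    X[np.arange(n), student] = 1.0               # one dummy per student
    X[np.arange(n), n_students + course] = 1.0   # one dummy per course
    # Abilities and leniencies are identified only up to a common shift,
    # so lstsq returns the minimum-norm solution and we re-center below.
    coef, *_ = np.linalg.lstsq(X, grade, rcond=None)
    ability, leniency = coef[:n_students], coef[n_students:]
    shift = leniency.mean()
    return ability + shift, leniency - shift

# Toy data: course 0 grades half a point high, course 1 half a point low.
# Student 4 takes both courses, which links the two grading scales.
student = np.array([0, 1, 2, 3, 4, 4])
course = np.array([0, 0, 1, 1, 0, 1])
grade = np.array([3.5, 2.5, 2.5, 1.5, 3.0, 2.0])

ability, leniency = adjust_grades(student, course, grade, 5, 2)
print(np.round(ability, 2), np.round(leniency, 2))
```

In this toy example students 1 and 2 have the same raw grade (2.5), but the fit separates them once the easy grading in course 0 is accounted for. The fit works only because student 4 took both courses; without that overlap, nothing ties the two grading scales together. With heavily inflated grades, where nearly everyone gets an A, there would be little variation left for any such model to work with.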

OK, at this point you might say, sure, grades are B.S., whatever. But that puts us in the worse position of implicitly requiring students to have other qualifications. At best, this sends students to interesting research projects and internships, but many times it just pushes them into trying to hop on projects to get credentials. Rather than writing some crappy NeurIPS paper and then learning the tricks to get it accepted, I think they’d be better off taking interesting courses in college, working hard, doing well on exams, and writing good term papers.

This looks like an excellent new business line for Wolfram Research!

Can you catch the trick in the above letter? I was staring and staring and couldn’t figure out the scam. Yes, I get it that “Mushens and Churchill” is a fake literary agency (according to this post from Victoria Strauss, which is where I found this story, this particular scammer is taking the name of the legitimate literary agent Juliet Mushens, which is a really horrible thing to do), and they’re preying on the hopes of authors. I get that “we cannot promise the moon and the stars” is classic soft-sell. What I couldn’t figure out is what’s the motivation for the scammer. They get someone to send them their unpublished manuscript? Later they ask for money to publish the book? But that doesn’t make sense—the author is already self-publishing. (And, yes, there’s no shame in paying money to publish your own work—do you think this blog hosting comes for free?)

Strauss explains how the scam works:

Although I haven’t yet heard from anyone who has actually signed up with MCLit, and therefore don’t know what they’re charging, the fifth paragraph of the solicitation above gives away what they’re selling: an “International Literary Registration Seal and Bookstore Access Code”. Both of these are completely bogus items that scammers have invented to enable them to drain writers’ bank accounts.

Ha! I didn’t catch that at all.

Here’s another:

Strauss spells it out for us:

Story Arc Literary Groups employs an approach common to many fake literary agency scams: promising to work on commission only, with no other fees due (note especially paragraph 5, which helpfully explains that “a reputable literary agent should not charge upfront fees”). The aim of such solicitations, however, is always money, and writers who sign up with Story Arc soon discover this. In order for Story Arc to successfully pitch a book to traditional publishers, authors are told they must first “re-license” their book (a requirement that, as I’ve explained in another blog post, is completely fictional). As is typical for this type of scam, they’re referred to a “trusted” company to perform the service–in this case, an outfit called CreativeIP. The price tag: $5,000.

Ouch!

Here’s another:

Typical of fake literary agency scams, Zenith Literary is an aggressive solicitor. One writer who responded to this solicitation was told that in order to snag a traditional publisher’s interest, they needed to gather various “action items”, including “ten editorial reviews and endorsements” (hint: reviews and endorsements are nice, but they are absolutely not required by traditional publishers). To obtain these, the writer was referred to Verse Bound Solutions, a company with no apparent existence beyond a Wyoming business registration but active enough to phone the author and offer them ten book reviews for $3,000.

And another:

The author who was targeted with [a solicitation from “ImplicitPress Literary Agency”] was asked to supply a variety of necessary “documents”; note #5, which is what this scam is hoping to sell (no publisher requires or cares about a book trailer):

If you’re itching for more such stories, just go here:

What a world we live in.

P.S. In case you’re wondering about the title of this post, see here for the relevant background.

The Theory and Practice of Oligarchical Collectivism

Paul Campos shares the story of a tech zillionaire who allegedly pocketed $415,726 from an insider trading scheme. Campos asks:

Why commit a serious crime for a payoff that was, for the criminal, almost literally nothing in practical terms? Keep in mind that for a Silicon Valley legend like this guy, incredibly juicy investment opportunities are never more than a thirty-second phone call away, so the call he made to make this trade probably had an opportunity cost higher than his prospective payoff from making it, even if you ignore the potential criminal liability. . . .

So maybe this really isn’t even about money. Maybe it’s about the apparently irresistible urge to get over on the System, even when the System has rewarded you with so much money that more money should mean nothing.

On the other hand, in our world more money never means nothing, because the acquisitive habit can become as compulsive as any other perverse addiction. . . .

I have a different theory. Campos is asking, Why would someone so rich throw it all away for a trivial amount of money? My take is that the guy didn’t see himself as risking anything. From the zillionaire’s point of view, insider trading isn’t really illegal, in two senses.

First, the dude probably doesn’t think that insider trading is wrong. It’s just capitalism, friends helping friends. Indeed, he could well take the position that laws against insider trading are productivity-destroying interferences with the natural laws of the market (see here, for example).

Second, he probably doesn’t think he would personally suffer serious consequences if caught. He’s a pillar of the community, and judges are reasonable people, right? And, indeed, if you follow the link to the news article, you see this:

Last month, Mr. Bechtolsheim, 68, settled the insider trading charges without admitting wrongdoing. He agreed to pay a fine of more than $900,000 and will not serve as an officer or director of a public company for five years.

For a guy with billions of dollars, a million-dollar fine ain’t much. And, sure, he can’t serve as an officer or director of a public company for five years. . . . but there are other ways he can spend his time! And I doubt he’ll be socially shunned: he’s such a successful investor! And so rich!

I’m not saying he faced zero consequences, I’m just saying that I don’t think we need an elaborate theory of greed or irresistible urges or acquisitive habits. Dude did something that in his view might have been “technically” illegal, kind of like what you might feel if you took some pens home or used the office copier for personal items or parked illegally or whatever. I think the framing of this as a risky decision doesn’t quite match where he was coming from.

The most interesting part of the story is that the publisher went through all these steps of reviewing and revising. If they just want to make money by publishing crap, why bother engaging outside reviewers at all?

Dorothy Bishop shares this hilarious/horrible story from neurosurgery researcher René Aquarius:

Mid November 2023 I received a request to peer-review a manuscript for a special issue on subarachnoid hemorrhage for the Journal of Clinical Medicine, published by MDPI. . . . I ended up recommending rejection of the paper . . . My biggest gripes were that the authors claimed that data were collected prospectively, but their protocol was registered at the very end of the period in which they included patients. In addition, I discovered some important discrepancies between protocol and the final study. . . . The biggest problem was the lack of a control group. . . .

OK, so far so good. Maybe a mistake to review this paper for free, but this sort of thing can be a learning experience—even a really bad paper can give you some sense of what people are working on.

The paper had two reviews, and the other reviewer was positive, so the journal split the difference and asked for revisions. Aquarius tells the story:

Revisions were quite extensive . . . and arrived only two days after the initial rejection. . . . before I could start my review of the revision, just four days after receiving the invitation, I received a response from the editorial office that my review was no longer needed because they already had enough peer-reviewers for the manuscript. I politely ignored this request, because I wanted to know if the manuscript had improved. . . .

The manuscript had indeed undergone extensive revisions. The biggest change, however, was also the biggest red flag. Without any explanation the study had lost almost 20% of its participants. An additional problem was that all the issues I had raised in my previous review report remained unaddressed. I sent my newly written feedback report the same day, exactly one week after my initial rejection.

When I handed in my second review report, I understood why I initially got an email that my review was not needed anymore. One peer reviewer had also rejected the manuscript and had concerns similar to mine. Two other reviewers, however, accepted the manuscript. . . . because my vote was now also cast and the paper received two rejections, the editor couldn’t do much more than to reject the manuscript, which happened three days after I handed in my review report.

Then things got worse:

In December, about a month later, I received an invitation to review a manuscript for the MDPI journal Geriatrics. You’ve guessed it by now: it was the same manuscript. . . . again, transformed. It was now very similar to the very first version I reviewed. Almost word-for-word similar. That also meant that the number of included patients was restored to the initial number. However, the registered protocol that was previously mentioned in the methods section (which had led to some of the most difficult to refute critiques) was now completely left out. The icing on the cake was that, for a reason that was not explained, another author was added to the manuscript. . . .

There was no mention in this invitation of the previous reviews and rejections of the same manuscript. Although one might wonder whether MDPI editors were aware of this, it would be strange if they were not, since they pride themselves on their Susy manuscript submission system where “editors can easily track concurrent and previous submissions from the same authors”.

Because the same issues were still present in the manuscript, I rejected it for a third time on the same day I agreed to review it. In an accompanying message to the editor, I clearly articulated my problems with the manuscript and the review process. The week after, I received a message that the editor had decided to withdraw the manuscript in consultation with the authors.

And, the unsurprising conclusion:

Late January 2024, the manuscript was published in the MDPI journal Medicina. I was not attached to the manuscript any more as a reviewer. There was no indication on the website of the name of the acting editor who accepted it.

The most interesting part of the story is that the publisher went through all these steps of reviewing and revising.

If they just want to make money by publishing crap, why bother engaging outside reviewers at all? I’m not really sure.

My best guess is that this is an interaction of the many actors in this system. The owners and managers of the publishing company are presumably mostly motivated by the money, but maybe they are also doing this in part to advance science, as they see it: publish anything, let a thousand flowers bloom, etc. But even if the incentives for the journal are purely financial (publish as many papers as possible, collect the publication fees, cash the checks, buy yachts or NFTs), they’ll still want to preserve some legitimacy, in part because if the journals are de-listed and discredited, it will be harder for them to get masses of submissions and special issues, and in part because the publisher may have a goal of selling themselves to a bigger fish (as here), in which case, again, they don’t want to be running too transparent of a publication mill of this sort.

Soooo . . . to maintain some credibility as a journal, they want to do some editing and reviewing, which means they need the participation of some editors and reviewers. Presumably the editors know the score, but they may still feel that publication in this journal is a way to get important work out there, and it shouldn’t be too hard for these editors to find reviewers who will be sympathetic to the papers being handled.

Putting it all together, the journal has some aspects of a scam but it’s not entirely fraudulent. Some reviewing and editing is done, just in the context of the larger goal which is to publish as many papers for money as possible. But there’s a balance.

My own story

The publisher discussed above is the notorious MDPI; check out its Wikipedia article for lots of background.

I’ve published with MDPI! Twice. The first time was before I’d heard about the journal’s reputation. The paper is here. I just looked it up on Google Scholar and it has 550 citations, so it did get out there a bit. This is not to say that publishing in Entropy was the best choice; maybe if we’d done the extra work and published in a conventional journal (typically a grueling review process which leads to an improvement in the quality of the paper but at large cost in time and effort) it would’ve had a larger impact. That said, I see the appeal of the publish-in-a-journal-that-accepts-everything strategy. The many hurdles involved in conventional journals add lots of stress and uncertainty to traditional publication. On the other extreme, we could just publish everything on Arxiv and bypass the journal system entirely; that might be fine too. I don’t really have a sense of how much more a paper will get read and taken seriously if it has some sort of official journal publication.

My second MDPI publication came this year—I received a personal request to submit a paper as a favor for a special issue that was being organized in someone’s honor, and I published this article, which is a revision of an unpublished manuscript I wrote with the late Keith O’Rourke. When I posted on that paper, we had some discussion in comments regarding the MDPI thing.

I share these stories just to give an example of how legitimate papers can get published in what are sometimes considered predatory journals. It’s always a mix.

It’s Harvard time, baby: “Kerfuffle” is what you call it when you completely botched your data but you don’t want to change your conclusions.

For it’s Harvard this, an’ Harvard that, an’ “Give the debt the boot!”
But it’s “Academic kerfuffle,” when the guns begin to shoot.

— Rudyard Kipling, American Economic Review, May, 1890.

Remember the “Excel error”? This was the econ paper from 2010 by Reinhart and Rogoff, where it turned out they completely garbled their analysis by accidentally shifting a column in their Excel table, and then it took years for it to all come out? And this was no silly Psychological Science / PPNAS bit of NPR and Gladwell bait; it was a serious article with policy relevance.

At the time this story blew up, I had some sympathy for Reinhart and Rogoff. Nobody suggested that they’d garbled their data on purpose. Even aside from that, the data analysis does not seem to have been so great (see here for some discussion), but lots of social scientists are not so great with statistics, and even if you disagree with Reinhart and Rogoff’s policy recommendations, you have to give them credit for attacking a live research problem. In this post from 2013, I criticized Reinhart and Rogoff for not admitting they’d messed up (“I recommend they start by admitting their error and then going on from there. I think they should also thank Herndon, Ash, and Pollin for finding multiple errors in their paper. Admit it and move forward.”), while at the same time recognizing that researchers are not trained to admit error. I was disappointed with the behavior of the authors of that paper after they were confronted with their errors, but I was not surprised or very annoyed.

But then I read this post by Gary Smith and now I am kinda mad. Smith writes:

In 2010, two Harvard professors, Carmen Reinhart and Ken Rogoff, published a paper in the American Economic Review, one of the world’s most-respected economics journals, arguing that when the ratio of a nation’s federal debt to its GDP rises above a 90% tipping point, the nation is likely to slide into an economic recession. . . .

Reinhart/Rogoff had made a spreadsheet error that omitted five countries (Australia, Austria, Belgium, Canada, and Denmark). Three of these countries had experienced debt/GDP ratios above 90% and all three had positive growth rates during those years. In addition, some data for Australia (1946–50), Canada (1946–50), and New Zealand (1946–49) are available, but were inexplicably not included in the Reinhart/Rogoff calculations.

The New Zealand omission was particularly important because these were four of the five years when New Zealand’s debt/GDP ratio was above 90%. Looking at all five years, the average GDP growth rate was 2.6%. With four of the five years excluded, New Zealand’s growth rate during the remaining high-debt year was a calamitous -7.6%.

There was also unusual averaging. . . . The bottom line is that Reinhart and Rogoff reported that the overall average GDP growth rate in high-debt years was a recessionary -0.1% but if we fix the above problems, the average is 2.2%.

That part of the story I’d heard before. But then there was this:

In a 2013 New York Times opinion piece, Reinhart and Rogoff dismissed the criticism of their study as “academic kerfuffle.”

C’mon. You are two Harvard professors; you published an article in an academic journal, leveraged the reputation of academic economics to make policy recommendations to the U.S. Congress, and then you talk about “academic kerfuffle”! If you don’t want “academic kerfuffle,” maybe you should just write op-eds, maybe start a radio call-in show, etc.

It’s Harvard this, an’ Harvard that, when all is going well. But then when some pesky students and faculty at the University of Massachusetts check your data and find that you screwed everything up, then it’s academic kerfuffle!

UMass, can you imagine? The nerve of those people!

So, yeah, now I’m annoyed at Reinhart and Rogoff. If you don’t like academic kerfuffle, get out of goddam academia already. For a pair of decorated Harvard professors to dismiss serious criticism as “kerfuffle”—that’s a disgrace. It was a disgrace in 2013 and it remains a disgrace until they apologize for this anti-scientific, anti-scholarly attitude.

P.S. Just for some perspective on the way that work had been hyped, here’s a NYT puff piece on Reinhart and Rogoff from 2010:

Like a pair of financial sleuths, Ms. Reinhart and her collaborator from Harvard, Kenneth S. Rogoff, have spent years investigating wreckage scattered across documents from nearly a millennium of economic crises and collapses. They have wandered the basements of rare-book libraries, riffled through monks’ yellowed journals and begged central banks worldwide for centuries-old debt records. And they have manually entered their findings, digit by digit, into one of the biggest spreadsheets you’ve ever seen.

OK, you can’t fault the Times for a puff that appeared nearly three years before the error was reported. Still, it’s kinda funny that, of all things, they were praising those researchers for . . . their spreadsheet!

“Of course, this could conceivably be a case of near unbelievable luck: A flawed analysis based on wrong assumptions gave an unusually large causal effect estimate – but the misguided result just happened to be correct. We can imagine how the research team huddled nervously around the computer terminal biting their nails and silently praying as they executed their updated Stata code, only to erupt in joy and celebration as the results appeared on screen and revealed they were right all along. . . .”

An economist who desires anonymity writes:

I think you’ll find this both fun and frustrating.

A group of prominent, well-published economists from Norway published a well-cited study on the causal effects of paid maternity leave: “A flying start? Maternity leave benefits and long-run outcomes of children” (https://www.journals.uchicago.edu/doi/10.1086/679627). The paper was published in the Journal of Political Economy—one of the top economics journals—and used a regression discontinuity design to identify unusually large and important causal effects of paid maternity leave on child outcomes. As described in their abstract, “Mothers giving birth before July 1, 1977, were eligible for 12 weeks of unpaid leave, while those giving birth after that date were entitled to 4 months of paid leave and 12 months of unpaid leave. The increased time spent with the child led to a 2 percentage point decline in high school dropout rates and a 5 percent increase in wages at age 30.”

Recently, a comment on the paper was published (“Not a flying start after all?” https://www.journals.uchicago.edu/doi/10.1086/732218). The problem they note: The reform did not take place as described. “Causal identification rested on a discontinuity implying that only mothers giving birth after a specific cutoff date were entitled to paid leave. We show that the analysis relied on an incorrect description of the reform. The reform did not introduce paid maternity leave, but extended it by 5–6 weeks. The postulated discontinuity never existed as treatment and control groups had the same maternity leave conditions.”

The comment goes on to explain that 12 weeks of paid maternity leave had been in place for many years, the reform cutoff date was not strict, parts of the reform were implemented the next year, workers in the public sector experienced the reform earlier than the postulated date, the reform was not unexpected as it had been widely discussed in the media (contrary to the original paper’s claim), etc.

The response from the original authors is titled “Still Flying” (https://www.journals.uchicago.edu/doi/10.1086/732220). They take the new information in their stride and respond that: “The new facts led us to formulate a new research strategy taking the new facts into account. Our improved estimates show that the maternity leave reform in Norway had large long-term impacts on the lives of children. Quantitatively, they turn out to be similar to the original estimates in CLS.”

Especially interesting is how they point to two other papers to support their original results: “Since the publication of CLS two additional papers have used similar research strategies to examine the impacts of this maternity leave reform on different outcomes: Butikofer et al. (2021) and Schwartz (2021). Results from both these papers suggest that the July 1st, 1977 date was indeed important.”

What they do not note is that both these studies built on the original flying start paper and exploited the same non-existent sharp discontinuity at July 1st.

Of course, this could conceivably be a case of near unbelievable luck: A flawed analysis based on wrong assumptions gave an unusually large causal effect estimate – but the misguided result just happened to be correct. We can imagine how the research team huddled nervously around the computer terminal biting their nails and silently praying as they executed their updated Stata code, only to erupt in joy and celebration as the results appeared on screen and revealed they were right all along.

The cynical take would be to see this whole story as a natural experiment in itself: What happens if successful researchers believe in a non-existent reform and analyze its effects using “rich administrative data” and the standard researcher degrees of freedom? The answer: three distinct papers all finding large and robust effects of the same non-existent reform, and at least two of them to a level sufficient to convince referees in top econ journals.

In this light, it is perhaps less surprising that the new analysis with “improved estimates” found similar effects, although it appears to have taken some time: The comment notes that it first “informed the Norwegian authors of the two papers that we suspected these errors in February/March 2020”—yet it was only now (August 2024) that the journal published the comment (and its response).

Finally, for what it’s worth, it’s interesting to see that the Journal of Political Economy found it best to place the critical comment behind a paywall while making the original team’s response freely available and ungated.

I also noticed this unfortunate bit from the authors’ response, when they write, “If anything, this will dilute the impacts of the reform by including in the sample some ineligible mothers.” It’s the backpack fallacy!

No, learning that you have measurement error does not in general make your result stronger!

The above story is interesting, and we could stop right here. But I was kinda curious so I clicked through. As my correspondent said, the only paper of the three that’s immediately accessible is the authors’ response to the criticism. Rather than get lost in the details of discontinuities and differences and estimates and standard errors, I’ll follow our general approach and start from scratch, considering the problem as an observational study.

This means we need to identify three things:
1. The experimental units i and treatments z_i
2. The outcome measurement y_i
3. The pre-treatment measurements x_i.

The basic plan is that you then regress y on x and z, but depending on the quality of the pre-treatment measurements and how the treatment was assigned, you might have to do more. In any case, identifying the three components above is the starting point. It’s hard to talk about estimating the treatment effect until you’ve defined what the treatment is.
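Here's a minimal sketch of that starting point in Python, using simulated data rather than anything from the Norway study. Everything in it is an assumption for illustration: the hypothetical `simulate` function, the true effect of 0.5, and the way treatment assignment depends on x. It shows why you regress y on both x and z: when z is correlated with x, the raw difference in means is off, and adjusting for x gets you close to the simulated effect.

```python
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate(n=5000, effect=0.5, seed=1):
    """Hypothetical data: units with higher x are more likely to be treated."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                          # pre-treatment measurement x_i
        p_treat = 0.7 if x > 0 else 0.3              # assignment depends on x
        z = 1.0 if rng.random() < p_treat else 0.0   # treatment z_i
        rows.append([1.0, x, z])
        y.append(1.0 + 2.0 * x + effect * z + rng.gauss(0, 1))  # outcome y_i
    return rows, y

rows, y = simulate()
intercept, b_x, b_z = ols(rows, y)

treated = [yi for r, yi in zip(rows, y) if r[2] == 1.0]
control = [yi for r, yi in zip(rows, y) if r[2] == 0.0]
raw_diff = sum(treated) / len(treated) - sum(control) / len(control)

print(f"raw difference in means: {raw_diff:.2f}")
print(f"coefficient on z after adjusting for x: {b_z:.2f}")
```

In this made-up setup the linear model is correct by construction, so plain regression is enough. With real data you'd have to worry about whether x captures what drove the assignment, which is the "you might have to do more" part.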

So let’s get to it. They’re talking about the effects of maternity leave reform on the lives of children. So I’m guessing that the experimental units are children and the treatments are something about how much maternity leave the mother is getting. From the discussion above, I know that this will be a bit complicated . . . I’ll read the paper and see what they say. It seems that the treatment is what they did in Norway after the 1977 reform, which is to give 18 weeks of partially paid leave, and the control is what was given before the reform, which was 12 weeks of partially paid leave. Also, they say the treatment is restricted to workers in the private sector, which apparently represented 70% of Norway’s female workforce at that time.

The other challenge is that there was a twelve-week period leading up to 1 Jul 1977 when the treatment was being introduced, and it seems that maybe they can’t figure out who was getting the treatment and who was getting the control during those twelve weeks. That doesn’t seem like such a big deal—just exclude those children from the analysis, right?—I guess the relevance comes as this was a weakness of the original study, which didn’t recognize that implementation issue.

The outcomes are “dropout rates, college and the log of earnings at age 30” of the children. In a footnote they also say, “CLS also presented results for years of schooling, teenage pregnancy (females) and IQ (males). As there was no robust effect on these outcomes we do not look into these outcomes further.” So actually they looked at 6 outcomes. But maybe more than 6, right? If they looked at IQ for the boys, I imagine they looked at IQ for the girls too. We generally recommend reporting all comparisons of interest. We also recommend following up by fitting a multilevel model, but really the key thing is reporting all the results and displaying them in a single graph. Here there seems to have been a lot of selection going on:
1. Some large number of potential outcome variables in the original data.
2. Some subset of them analyzed in the original study.
3. In that original study, 6 outcomes selected based on statistical significance or some other criterion.
4. In the followup, 3 of the outcomes discarded because their results are not statistically significant (or some other criterion; I’m not quite sure what was meant by “there was no robust effect on these outcomes”).
5. The followup paper reports the remaining 3 outcomes.
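To get a feel for why this sort of screening matters, here's a small sketch. It is not a claim about the Norway data. It assumes a hypothetical study with zero true effects and independent outcomes (real outcomes would be correlated), and asks how often at least one outcome clears the conventional 5% threshold.

```python
import random
from statistics import NormalDist

def prob_any_significant(k, alpha=0.05):
    """Chance that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k

def simulate_screening(n_outcomes, n_studies=2000, seed=0):
    """Share of simulated all-null studies with at least one 'significant' outcome."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96
    hits = 0
    for _ in range(n_studies):
        # z-statistics for each outcome when the true effect is exactly zero
        z_scores = [rng.gauss(0, 1) for _ in range(n_outcomes)]
        if any(abs(z) > cutoff for z in z_scores):
            hits += 1
    return hits / n_studies

for k in (1, 6, 20):
    print(k, round(prob_any_significant(k), 3), round(simulate_screening(k), 3))
```

The point is just that the more outcomes you look at before deciding which to report, the less the surviving ones tell you on their own, which is why it helps to show all of them in one graph.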

The next step is the choice of background or control variables for the regression. I guess the relevant pre-treatment measurements will involve parents’ socioeconomic status, along with birth weight and similar variables describing the babies. Also you’d want to include economic conditions . . . here there’d be some blurring over time, as it’s not like the exact state of the economy on the day you’re born will determine your economic future; I guess that will be covered by including something like a linear time trend over the range of the data, which is from 1975 through 1979.

Their analysis also has some strong data restrictions that don’t quite make sense to me: they exclude 1976, and for each year they only count the 90 days before 9 Apr and the 90 days after 1 Jul. I get that they want to use comparable data from each year, to avoid having to worry about seasonal effects; still, just using 90 days seems like throwing away data. They also do another analysis with 180 days before and after.

I’m concerned that they don’t include pre-treatment predictors such as parents’ SES and baby’s birth weight. Maybe these variables weren’t in the data? But I’d think that if they had all those measures on the kids,

Then there’s the analysis. They show three outcomes in separate figures, which is kind of annoying (but, yeah, I know, it’s standard practice; so few people have learned the benefits of putting multiple plots in a grid); anyway, here they are:



Make of these what you will.

It would be good to see the analysis including more pre-treatment years, a longer time window in each year, adjustments for key pre-treatment variables, and looking at all the outcomes of interest, not just the three that survived many levels of screening.

The odd non-spamness of some spam comments

I checked the spam filter this morning and came across a new comment on an old post.

It was a reasonable comment. Not an amazing contribution to the discussion, but not completely nothing, either. And I would have approved it—except that the url supplied by the commenter was a spam link. Or, maybe not spam, just some business that had no connection to anything that ever appears on the blog. In any case, I shot the comment into oblivion. I don’t want to be hosting or encouraging spam—at least, not for free!

We get this kind of comment from time to time, and it always makes me wonder: When people do this, are they coming to the blog with an intent to spam (or, one could say, to advertise their wares) and then they write some minimal comment in the hope that it gets approved? Or are they coming to the blog to write a comment, and then they figure they might as well get some benefit out of it so they throw in the spam link? I have no idea.

Prediction markets and the need for “dumb money” as well as “smart money”

tl;dr. Prediction markets give good forecasts because they attract “smart money” that will fix any gaps between current odds and best available information.

The “smart money” is in turn motivated by the profits they can take from “dumb money” coming from people who are participating in the market out of a desire for action or as a passive investment. The “dumb money” is itself reassured by the presence of the “smart money” to keep prices roughly fair. At the limit of economic efficiency, the dumb money, by relying on public odds, can be almost as smart as the smart money.

One reason prediction markets have problems is the absence of sufficient dumb money to allow the smart money to overcome the vig.

Prediction markets are a way of aggregating knowledge and harnessing the wisdom of crowds.

To first order, just about any aggregation of forecasts will do better than individual guesses.

A simple method for forecasting some uncertain event (for example, the outcome of the upcoming presidential election) would be to sample a bunch of random people, ask them to each give a forecast, and then average these forecasts. Research suggests that this direct approach works pretty well, at least for problems where the quantity being forecasted is in a clearly-defined range. (But try this averaging method to estimate something that people are bad at guessing, and it won’t work so well; a classic example is total egg production in the United States in 1965, which came up in Alpert and Raiffa’s classic 1968 article, “A Progress Report on the Training of Probability Assessors,” which was reprinted in Kahneman, Slovic, and Tversky’s 1982 collection, Judgment Under Uncertainty; see also the activity on p.127-129 of our book Active Statistics. People aren’t so good at multiplying.)
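Here's a toy sketch of both halves of that paragraph, with made-up numbers. The true value of 100, the noise level, and the shared bias of 60 are all assumptions for illustration. Averaging helps a lot when individual errors are centered on the truth, and hardly at all when everyone is off in the same direction.

```python
import random
from statistics import mean

def crowd_vs_individual(truth, guesses):
    """Return (error of the averaged forecast, average error of the individual forecasts)."""
    crowd_error = abs(mean(guesses) - truth)
    typical_individual_error = mean(abs(g - truth) for g in guesses)
    return crowd_error, typical_individual_error

rng = random.Random(0)
truth = 100.0

# Case 1: individual guesses are noisy but centered on the truth.
centered = [truth + rng.gauss(0, 20) for _ in range(500)]
# Case 2: same noise, but everyone shares a large bias (all guess far too low).
shared_bias = [truth - 60 + rng.gauss(0, 20) for _ in range(500)]

for label, guesses in [("centered", centered), ("shared bias", shared_bias)]:
    crowd, individual = crowd_vs_individual(truth, guesses)
    print(f"{label}: crowd error {crowd:.1f}, typical individual error {individual:.1f}")
```

By the triangle inequality the averaged forecast can never have a larger absolute error than the average individual, but in the shared-bias case the improvement is close to nothing.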

You should be able to do better using some sort of weighted average, giving higher weights to guesses from people whose forecasts have a better track record, as for example in this classic 2004 paper from Prelec.

A betting market does the aggregation in a different way: Rather than the forecasts being averaged using some fixed rule, players in the market are financially motivated to offer bets based on their private knowledge. The idea is that this will keep the price—which can be treated as a sort of market forecast—at a reasonable value, and it will respond to new information at a rapid rate, as long as there are arbitrageurs with “smart money” who will jump in and achieve their expected gain by moving the line and taking bets when they see an opportunity.

This is related to the general point that the quality of the aggregate will depend on what information goes into the mix, as discussed in the final paragraph of this post.

It is well understood that, to do their job of aggregating information, markets need smart money. Something I hadn’t realized until Josh Miller pointed it out to me the other day was that markets also need “dumb money.” The “smart money” bettors make their money by moving faster than the “dumb money.” To put it another way, the market gets it right because the sharks move in to rectify any mispricing. But the sharks need the minnows to feed on.

Here’s an elaboration of that point from Nick Whitaker and J. Zachary Mazlish.

They focus the question by asking, why are prediction markets not more popular in areas other than sports?

very few potential prediction markets are actually banned in the US. And yet most prediction markets that could legally exist do not exist, and the ones that do exist are not very popular. . . .

Even if one argued that the threat of regulation made these markets impossible in the US, this has problems explaining the lack of prediction markets in other countries where such regulation is not present, and seems unlikely to be introduced. Prediction markets, including election markets (as well as all sports betting), are completely legal in the United Kingdom, for example, and the country clearly has the financial institutions and market size to support them. Still, non-sport markets remain few and far between: Betfair, for example, offers markets on elections but only has a few dozen markets total and rarely offers markets on other topics like economics and science. In fact, even politics is relegated to a tab within sports. There are currently around twelve million pounds in play on the US presidential election – about the same as typically gets bet on a single cricket match. Entrepreneurs have not created ‘markets on everything’ even where it is legal to do so.

People bet on cricket! Who knew?

Whitaker and Mazlish “classify people who trade on markets into three groups”:

– Savers: who enter markets to build wealth. Prediction markets are not a natural savings device. They don’t attract money from pensions, 401(k)s, bank deposits, or brokerage accounts.

– Gamblers: who enter markets for thrills. Prediction markets are not a natural gambling device, due to various factors including their long time horizons and often esoteric topics. They rarely attract sports bettors, day traders, or r/WallStreetBets users.

– Sharps: who enter markets to profit from superior analysis. Without savers or gamblers, sharps who might enter the market to profit off superior analysis are not interested in participating. They also largely don’t need prediction markets to hedge their other positions.

They continue:

In our view, much of the volume that exists on financial markets comes from money that is not attempting to beat the market by correcting pricing errors (like an asset that is underpriced compared to its likely returns), but money that wants to be in a market for other reasons, like investing in companies that will deliver a long-run return (as savers do), or making a sports event more exciting (as gamblers do). . . . inelastic participants are often willing to pay a small premium for market access. But in doing so, investors or bettors of these kinds create a pool of surplus that smart participants try to obtain, which in turn drives prices toward efficiency. . . .

Markets become efficient when making them efficient is profitable. Large markets and markets where people will ‘pay’ expected return for access create those conditions. In our view, in prediction markets, no type of market participant – savers, gamblers, or sharps – is clamoring to be in the market, so there is no strong incentive pushing the market toward efficiency.

This brings them to the economic incentive:

There is one important reason that prediction markets are not used by savers, and probably never will be. Prediction markets, unlike most asset markets, are zero-sum – in fact they are negative-sum, once you factor in platform fees. And if your money is in a prediction market, it can’t be invested in equities, or be earning interest in the bank, either. Every winner of a prediction market necessitates an equal and opposite loser. Securities investors with diversified portfolios can expect positive returns in the long term, because they are giving up their money for others to use to create output and wealth, in exchange for a share of what they create. That’s why responsible people have their pensions in stocks and bonds, rather than a diversified portfolio of sportsbooks. Positive-sum savings vehicles are far, far superior to zero-sum ones, for the simple reason that they will grow your savings in the long run.
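To make the negative-sum arithmetic concrete, here's a minimal sketch of settling one binary contract. The fee rule is an assumption for illustration (the platform keeps a fixed share of the winner's profit); actual platforms charge fees in various ways. The hypothetical `breakeven_probability` function shows how far the true probability has to be from the market price before a buyer expects to come out ahead.

```python
def settle(price, shares, fee_rate, outcome_yes):
    """Net profits (yes_buyer, no_buyer, platform) for one binary contract.

    Assumed rules: each share pays 1 to the winning side, the YES buyer pays
    `price` per share, the NO buyer pays `1 - price`, and the platform keeps
    `fee_rate` of the winner's profit.
    """
    yes_stake = shares * price
    no_stake = shares * (1 - price)
    if outcome_yes:
        fee = fee_rate * no_stake      # the winner's gross profit is the loser's stake
        return no_stake - fee, -no_stake, fee
    fee = fee_rate * yes_stake
    return -yes_stake, yes_stake - fee, fee

def breakeven_probability(price, fee_rate):
    """True probability at which a YES buyer's expected profit after fees is zero."""
    return price / (price + (1 - price) * (1 - fee_rate))

yes, no, platform = settle(price=0.60, shares=100, fee_rate=0.05, outcome_yes=True)
print(f"traders combined: {yes + no:.2f}, platform: {platform:.2f}")
print(f"break-even probability at a price of 0.60: {breakeven_probability(0.60, 0.05):.3f}")
```

Whatever the outcome, the two traders' profits sum to minus the fee, and a buyer needs the true probability to exceed the price by enough to cover that fee.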

They argue that the only way to really make prediction markets work well is to subsidize them in some way. This might sound kinda goofy—the government or some private foundation stepping in to support a gambling platform—but, to the extent that a market provides the public good of fast and accurate forecasts, why not support it externally? There’s no need for the idea of a prediction market to be ideologically attached to a no-subsidy principle. Conversely, if someone is setting up a market, it’s not a bad idea to look into who’s in charge, as in the notorious case of the convicted terrorist who was running a terrorism prediction market.

Whitaker and Mazlish continue:

Without savers or gamblers, only sharps would remain. There are a few profiles of sharps who might seek value in prediction markets. Hobbyists, like politics nerds who want to capitalize on their knowledge, may constitute one group. Because insider trading is not prohibited in prediction markets, people with inside knowledge of some organization or event may want to trade on their information there. The hope of the prediction markets on everything vision is that true sharps would emerge in the form of hedge funds or other trading firms – professionals who would spend all their time investigating the probabilities of these events. . . . But since prediction markets lack savers – who flood security markets with capital and create profit opportunities – this never happens. Prediction markets are orders of magnitude smaller than other financial markets. . . . It’s hard to imagine how prediction markets would ever find the size and liquidity necessary to pay the salaries of top sharps without savers.

As most prediction markets also lack many of the features that attract gamblers, whom sharps would prefer to trade against, sharps are left with the unappealing prospect of trading only with one another. This is analogous to turning up to a poker table and discovering that all of the other competitors are poker champions. You would much rather have been at a table of drunk tourists.

Well put!

And this next bit is particularly relevant to election forecasting:

Markets are much less liquid when sharps trade only against sharps. As we’ve pointed out, the rewards for being right are smaller. But even beyond that, traders are more worried that they might be wrong when all of the other money is smart money. Why should they trust their model of the market probability over other sophisticated traders? . . . In practice, sharps would know they were mostly trading against sharps, but might still think they were better traders than their counterparties. But a sharp would usually understand they should be worried about their counterparty getting the better of them. The counterparty too assumes that they should also be worried, so both parties would be more hesitant to trade. And that’s not to mention platform fees, which would also take a cut.

They summarize:

We think that prediction markets as they exist are probably, at their best, similarly accurate to other high quality sources of information about the future, like the best forecasters, averages of forecasters like those found on Metaculus, and poll aggregators like 538. That is to say they do reasonably well, but are not authoritative or impossible for a highly motivated individual to beat.

That sounds about right. It’s related to the reasons I gave for not betting on the presidential election.

Subsidies?

As discussed above, one way to make prediction markets work better would be to subsidize them—an idea that I don’t think is as ridiculous or objectionable as it might sound at first. To the extent that a prediction market is a public good, it could make sense to subsidize it. There’s no need to be a market purist here, as just about everything is subsidized in some way or another—including people like me and various economists who post things for free on prediction markets; somebody’s paying for our time!

That said, Whitaker and Mazlish point out some practical challenges to the subsidy idea:

We haven’t seen many examples of this actually happening. . . . One way to subsidize a prediction market would be to get all those who are interested in gleaning information from the market to share the cost of the subsidy. . . . But how exactly to charge these users is difficult. Market prices tend to be public information. . . . Thus, a free rider problem emerges: many people who value the information a market provides cannot be charged. . . .

Subsidizing prediction markets likely is a relatively expensive way of aggregating information. . . . There is a simple reason for this: a subsidy needs to pay many market participants to create a crowd from which it could glean wisdom, whereas more conventional methods simply pay one group. Even if the wisdom of crowds derived from subsidized prediction markets performed better than individuals or teams, we worry that subsidizers would be unwilling to pay, as they might quickly run into diminishing marginal returns.

Indeed, if various poll aggregators are doing the job for free, and if market prices are reflecting some combination of polls and recent news, then not much value is being added by the market, except for a certain level of objectivity and whatever legitimacy is conferred by the feeling that somebody somewhere is betting real money on these numbers.

Whitaker and Mazlish add:

The final point is there are good alternatives to subsidizing prediction markets. Financial institutions have analysts; governments use intelligence agencies; companies use consultants; NGOs partner with economists and data scientists. Institutions employ these alternatives and virtually none employ subsidies.

Why would this be, if each of these groups can be beat when it comes to predicting the future? In many cases, individuals, firms, and governments do not just wish to know the probability of a future event. They would like to know the contingent probabilities around a cluster of events and actions and the reasoning behind those probabilities.

This is related to our point that a probability isn’t just a number; it’s part of a network of conditional statements.

They conclude:

We suspect that much demand for information about the future is satisfied by existing markets and firms. If it weren’t, wouldn’t private companies have taken up forecasting and prediction markets more quickly in the first place? That’s not to say that everyone has perfect information about the future. Instead, it’s that we suspect most people are paying for information that is as accurate as they need in a form that they can use. . . .

We are arguing against the view that were it not for pesky regulators, prediction markets for everything would be ubiquitous, and that those prediction markets would be the premier way to predict the future. On the contrary, the current size of the prediction market universe reflects market demand. Even if all regulatory hurdles were abolished, we do not expect that universe to dramatically expand.

Of course, we could be proved wrong. . . . But, in our view, prediction markets are held back by the lack of savers and gamblers . . .

Gambling

But . . . gambling is fun! Couldn’t recreational gamblers provide the “dumb money” needed for prediction markets to work?

Maybe so, but for elections, maybe not. First of all, any recreational gambler with access to the internet can see the poll aggregates, and it’s not clear that the “smart money” can do much more than that. To get some real “dumb money,” you’d want people betting just for fun, comparable to sports fans who will bet on the home team without looking at the odds, presumably relying on the accuracy of the market (the existence of “smart money”) to ensure that these odds are not major ripoffs.

But Whitaker and Mazlish argue that, in real life, most people only want to bet on events with very short time horizons:

Sports betting sites’ futures bets on longer-term outcomes are far less traded than bets on single games about to happen, even when the future event (like the winner of the Super Bowl) is far higher-profile than tonight’s game. For example, in late March, there was a mere £5,190 bet on the Wimbledon 2024 winner, but £227,421 was bet on the relatively unimportant, but in-play, Francesco Maestrelli vs. Pierre-Hugues Herbert match in tennis’s Napoli Cup. For reference, Wimbledon is the single biggest event in all of tennis, while no one ranked higher than 87th in the world is playing the Napoli Cup. Quick resolutions are so valued that live, in-game betting is becoming the most popular type of sports betting, despite the fact that the house tends to widen spreads on live bets, hurting bettors’ expected returns.

US presidential elections, surely the most well-known recurring political events on Earth, create a huge amount of buzz and theatrics, which fans closely follow. . . . Yet even in these cases, gamblers’ preference for quick resolution bites: 42 percent of the volume on the 2020 election was traded in the last week before the vote . . .

3M misconduct regarding knowledge of “forever chemicals”: As is so often the case, the problem was in open sight for a long time before anything was done

Horrifying story here from Sharon Lerner on how chemical products company 3M (which has successfully branded itself as the cuddly people behind Post-it notes) polluted the world’s water supply and covered it up for decades.

It features several issues we’ve discussed in this space:

– Research misconduct, hiding of data, mischaracterizing findings, etc.

– Flat out lying.

– Bad actions by big business.

– The misconduct was sitting in open sight for a long time (as we’ve seen with Theranos, Pizzagate, the Canadian biologist, the Los Angeles tunnel, the financial fraudster, and many other cases)

– The use of the legal system to threaten people and suppress the truth (as again with Theranos)

Those last two bits go together. Keeping things quiet for a long time can require unscrupulous people to put in a lot of work.

Lerner writes:

Much of my reporting, which started in 2015, focussed on what 3M and DuPont knew, even as they continued to produce PFAS. But, as I reported on the coverup, I wondered what it meant for a sprawling multinational company to know that its products were dangerous. Who knew? How much, exactly, did they know? And how had the company kept its secret? . . .

2015! That’s almost ten years ago!

It’s everywhere in the environment

The story is about so-called “forever chemicals,” which spread to everyone in the world, and the companies that produced and distributed them even after having this knowledge, and the people who worked for these companies. No matter what you do, what food you consume, what water you drink, what products you use, you’ll be exposed to these. There’s no avoiding it.

And this got me thinking about research fraud. That’s hard to avoid too! I guess that right now if someone asked me if I’d do a job for 3M, I’d say hell no, but, then again, take a look at our list of sponsors, which among them have lots of misdeeds in their past and present histories. Just about every organization has problems. This is not at all intended to excuse those terrible people at 3M, it’s just all a big messy story. It doesn’t sound like any of those sleazes will pay any cost for what they did, which helps explain why it keeps happening. Moral hazard and all that.

Different perspectives on the claims in the paper, The Colonial Origins of Comparative Development

I was talking with an economist today about the recent prize given to the authors of the very influential 2001 article, The Colonial Origins of Comparative Development: An Empirical Investigation. According to my colleague, many economists have issues with that paper, including concerns about data quality, the weakness of the instrument, and selection bias in the analysis. The concern seems to be that those data could be used to show just about anything. Which, as usual, does not mean that their theories are wrong, just that their data are consistent with other theories.

I’ve never looked into this particular example, and a search of the blog turned up only this comment, so I’ll just pass along some references that my colleague sent to me:

Daron Acemoglu, Simon Johnson, and James Robinson (2001), The colonial origins of comparative development: An empirical investigation

David Y. Albouy (2012), The Colonial Origins of Comparative Development: An Empirical Investigation: Comment

Morgan Kelly (2019), The Standard Errors of Persistence

This recent post from Alex Tabarrok gives some sense of the importance and ideological dimensions of the work under discussion.

Some people love this work, some people don’t

From a sociology-of-science perspective, it’s interesting how this work is viewed differently in different corners of economics. As discussed by Tabarrok, “The Colonial Origins of Comparative Development” has had a huge influence within and outside the field, and it generally appears to be viewed very positively. But researchers who focus on methodology and replication don’t trust it. I wonder whether some of the popularity of that paper and subsequent work in that area is that it has something to offer to both the right and the left, unlike a lot of work in macroeconomics which will push in just one direction.

“Announcing the 2023 IPUMS Research Award Winners”

PUMS stands for Public Use Microdata Sample—it’s a subset of the U.S. Census that contains individual-level data; I think it was 1% of the Census. I don’t know the full history, but here’s the current Census website with these data.

IPUMS is a compendium of public use microdata from different sources:

In collaboration with 105 national statistical agencies, nine national archives, and three genealogical organizations, IPUMS has created the world’s largest accessible database of census microdata. IPUMS includes almost a billion records from U.S. censuses from 1790 to the present and over a billion records from the international censuses of over 100 countries. We have also harmonized survey data with over 30,000 integrated variables and 150 million records, including the Current Population Survey, the American Community Survey, the National Health Interview Survey, the Demographic and Health Surveys, and an expanding collection of labor force, health, and education surveys. In total, IPUMS currently disseminates integrated microdata describing 1.4 billion individuals drawn from over 750 censuses and surveys. . . .

Our signature activity is harmonizing variable codes and documentation to be fully consistent across datasets. This work rests on an extensive technical infrastructure developed over more than two decades, including the first structured metadata system for integrating disparate datasets. By using a data warehousing approach, we extract, transform, and load data from diverse sources into a single view schema so data from different sources become compatible. The large-scale data integration from IPUMS makes thousands of population datasets interoperable. . . .

I’m on their mailing list—maybe I requested some of their data at some point?—and this announcement came in the email:

We are thrilled to announce the winners of our annual IPUMS Research Awards competition. This competition celebrates innovative research from 2023 that uses IPUMS data to advance or deepen our understanding of social and demographic processes. . . .

IPUMS USA

  • Best published work: Zachary Ward. “Intergenerational Mobility in American History: Accounting for Race and Measurement Error.”
  • Best student work: Jonathan Tollefson. “Environmental Risk and the Reorganization of Urban Inequality in the Late 19th and Early 20th Century.”

IPUMS Spatial: IPUMS NHGIS, IPUMS IHGIS, IPUMS Terra, or IPUMS CDOH

  • Best published work: Clark Gray and Maia Call. “Heat and Drought Reduce Subnational Population Growth in the Global Tropics.”
  • Best student work: Nicolas Longuet-Marx. “Party Lines or Voter Preferences? Explaining Political Realignment.”

IPUMS CPS

  • Best published work: Kaitlyn M. Berry, Julia A. Rivera Drew, Patrick J. Brady, and Rachel Widome. “Impact of Smoking Cessation on Household Food Security.”
  • Best student work: Sungbin Park, Kyung Min Lee, and John Earle. “Death Without Benefits: Unemployment Insurance, Re-Employment, and the Spread of Covid.”

IPUMS International

  • Best published work: Seife Dendir. “Intergenerational Education Mobility in Sub-Saharan Africa.”
  • Best student work: Rita Trias-Prats. “Gender Asymmetries in Household Leadership.”

IPUMS Global Health: IPUMS DHS and/or IPUMS PMA

  • Best published work: Chad Hazlett, Antonio P. Ramos, and Stephen Smith. “Better Individual-Level Risk Models Can Improve the Targeting and Life-Saving Potential of Early-Mortality Interventions.”
  • Best student work: Sara Ronnkvist, Brian Thiede, and Emma Barber. “Child Fostering in a Changing Climate: Evidence from Sub-Saharan Africa.”

IPUMS Health Surveys: IPUMS NHIS or IPUMS MEPS

  • Best published work: Jessica Y Ho. “Lifecourse Patterns of Prescription Drug Use in the United States.”
  • Best student work: Namgyoon Oh. “Nutrition to Nurturance: The Impact of Children’s WIC Eligibility Loss on Parental Well-being.”

IPUMS Time Use: IPUMS ATUS, IPUMS MTUS, or IPUMS AHTUS

  • Best published work: Eunjeong Paek. “Workplace Computerization and Inequality in Schedule Control.”
  • Best student work: Anja Gruber. “The Impact of Job Loss on Parental Time Investment.”

Excellence in Research
This award highlights outstanding research using any of the IPUMS data collections by authors who identify as members of groups that are underrepresented in social science and health research.

  • Best published work: Samuel H. Kye and Andrew Halpern-Manners. “If Residential Segregation Persists, What Explains Widespread Increases in Residential Diversity?”
  • Best student work: Sophie Li. “The Effect of a Woman-Friendly Occupation on Employment: U.S. Postmasters Before World War II.”

That’s great. I love that they give this award to people who use their data.

Also, their slogan is “Use it for good!” How cool is that?

Good job, IPUMS.

3 levels of fraud: One-time, Linear, and Exponential

In a one-time fraud, you do it, you get it over with, and you move on. Kind of like that saying, “The secret of a great success for which you are at a loss to account is a crime that has never been found out, because it was properly executed.” One-time frauds occur where for some reason the victim of the fraud is in no position to do anything about it, and the fraudster has no need to keep doing more of it.

In a linear fraud, you do it, and then you need to keep doing it, or you need to keep covering for it. An example is if you do fraudulent research to get an academic job, and then you’re expected to continue producing amazing results if you want promotion. So what do you do? You keep on doing what it takes to get those amazing results. Remember the Armstrong Principle: if you’re pushed to promise more than you can deliver, you’re motivated to cheat. Or, for another example, Lance Armstrong himself: he didn’t need to keep doping (retirement from competitive cycling was an option at any point), but he did need to continue to lie, and to intimidate others into lying, because these doping investigations kept happening. The system of laws and rules in cycling did not allow his cheating to be grandfathered in, so he had to keep on frauding.

In an exponential fraud, as time goes on you have to do larger and larger frauds. As we discussed in the context of Dan Davies’s book, Lying for Money, this exponential property is characteristic of financial fraud, where you have to keep scamming or borrowing more money to cover your past losses. At no point can you just close out, because if you don’t keep covering what you’d already promised, your creditors would close in. This also happens with people who try to resolve their gambling debts by making it all back at the track: they need to bet more and more until eventually they go bust. It’s worse than the classic gambler’s ruin problem, because the bets get bigger and bigger.

The above-linked post discussed linear and exponential frauds, but I hadn’t thought to include one-time fraud. As I wrote at the time, Maradona didn’t have to keep punching balls into the net; once was enough, and he still got to keep his World Cup victory. If Brady Anderson doped, he just did it and that was that; no escalating behavior was necessary.

It’s martingale time, baby! How to evaluate probabilistic forecasts before the event happens? Rajiv Sethi has an idea. (Hint: it involves time series.)

My Columbia econ colleague writes:

The following figure shows how the likelihood of victory for the two major party candidates has evolved since August 6—the day after Kamala Harris officially secured the nomination of her party—according to three statistical models (Trump in red, Harris in blue):

Market-derived probabilities have fluctuated within a narrower band. The following figure shows prices for contracts that pay a dollar if Harris wins the election, and nothing otherwise, based on data from two prediction markets (prices have been adjusted slightly to facilitate interpretation as probabilities, and vertical axes matched to those of the models):

So, as things stand, we have five different answers to the same question—the likelihood that Harris will prevail ranges from 51 to 60 percent across these sources. On some days the range of disagreement has been twice as great.

As we’ve discussed, a difference of 10 percentage points in predicted probability corresponds roughly to a difference of 0.4 percentage points (that is, 0.004) in predicted vote share. So, yeah, it makes complete sense to me that different serious forecasts would differ by this much, and it also makes sense that markets could differ by this much. (As Rajiv discussed in an earlier post, for logistical reasons it’s not easy to arbitrage between the two markets shown above, so it’s possible for them to maintain some daylight between them.)
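
To see where a conversion like that can come from, here is a minimal sketch. It assumes the forecast for the vote share is normally distributed with a standard deviation of 1.6 percentage points; that number, the example vote shares, and the function name are assumptions chosen for illustration, not values taken from any of the forecasts discussed here.

```python
from statistics import NormalDist

# Assumed forecast uncertainty in vote share (1.6 percentage points).
# This is an illustrative value, not one reported by any forecaster.
FORECAST_SD = 0.016

def win_probability(expected_share, sd=FORECAST_SD):
    """Probability that the vote share exceeds 50% under a normal forecast."""
    return 1.0 - NormalDist(mu=expected_share, sigma=sd).cdf(0.5)

# Two hypothetical forecasts whose expected vote shares differ by 0.004.
p_low = win_probability(0.502)
p_high = win_probability(0.506)
gap = p_high - p_low

print(f"win probabilities: {p_low:.3f} and {p_high:.3f}, gap {gap:.3f}")
```

Under these assumptions the gap in win probability comes out close to 10 percentage points. A wider forecast distribution would shrink the gap, and a narrower one would enlarge it.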

Also, this is a minor thing but if you’re gonna plot two lines that add to a constant, it’s enough to just plot one of them. I say this in part out of general principles and in part because these lines that cross 0.5 create shapes and other visual artifacts such as the “vase” in the PredictIt plot. I think these visual artifacts get in the way of seeing and learning from the data.

OK, that’s all background. Different forecasts differ. The usual way we talk about evaluating forecasts is by comparing to outcomes. Rajiv writes:

The standard approach would involve waiting until the outcome is revealed and then computing a measure of error such as the average daily Brier score. This can and will be done, not just for the winner of the presidency but also the outcomes in each competitive state, the popular vote winner, and various electoral college scenarios.

I think the evaluation should be done on vote margin, not on the binary win/loss outcome, as the evaluation based on a binary outcome is hopelessly noisy, a point that’s come up on this blog many times and which I explained again last month. Even if you use the vote margin, though, you still have just one election outcome, and that won’t be enough for you to compare different reasonable forecasts.
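
Here is a small sketch of the average daily Brier score that Rajiv mentions, which also shows why a single binary outcome is so noisy. The two forecast series are made up for illustration, and the function name is hypothetical.

```python
def brier_score(daily_forecasts, outcome):
    """Average squared error of daily probabilities against a 0/1 outcome."""
    return sum((p - outcome) ** 2 for p in daily_forecasts) / len(daily_forecasts)

# Two made-up forecast series for "Harris wins", three days each.
model_a = [0.55, 0.55, 0.55]
model_b = [0.60, 0.60, 0.60]

for outcome in (1, 0):
    score_a = brier_score(model_a, outcome)
    score_b = brier_score(model_b, outcome)
    better = "A" if score_a < score_b else "B"
    print(f"outcome={outcome}: A={score_a:.4f}, B={score_b:.4f}, lower score: {better}")
```

With two reasonable forecasts on the same side of 50%, the ranking is determined entirely by which way the one election goes: the more confident forecast wins if the favorite wins and loses otherwise.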

Rajiv has a new idea:

But there is a method of obtaining a tentative measure of forecasting performance even prior to event realization. The basic idea is this. Imagine a trader who believes a particular model and trades on one of the markets on the basis of this belief. Such a trader will buy and sell contracts when either the model forecast or the market price changes, and will hold a position that will be larger in magnitude when the difference between the forecast and the price is itself larger. This trading activity will result in an evolving portfolio with rebalancing after each model update. One can look at the value of the resulting portfolio on any given day, compute the cumulative profit or loss over time, and use the rate of return as a measure of forecasting accuracy to date.

This can be done for any model-market pair, and even for pairs of models or pairs of markets (by interpreting a forecast as a price or vice versa).

I don’t agree with everything Rajiv does here—he writes, “The trader was endowed with $1,000 and no contracts at the outset, and assigned preferences over terminal wealth given by log utility (to allow for some degree of risk aversion),” which makes no sense to me, as I think anyone putting $1000 in a prediction market would be able to lose it all without feeling much bite—but I’m guessing that if the analysis were switched to a more sensible linear utility model, the basic results wouldn’t change.
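
To make the mechanics concrete, here is a minimal sketch of such a trader. It is not Rajiv’s code. It assumes a single contract paying $1 if the event happens, no fees or position limits, and a trader who each day picks the position that maximizes expected log wealth given the model’s probability and the market price; under those assumptions the position works out to wealth × (forecast − price) / (price × (1 − price)). The function names and the forecast and price series are hypothetical.

```python
def log_utility_position(wealth, forecast, price):
    """Number of contracts held (negative means short) by a log-utility trader.

    Assumes a contract paying $1 if the event occurs, bought or sold at `price`,
    with no transaction costs.
    """
    return wealth * (forecast - price) / (price * (1.0 - price))

def simulate_trader(forecasts, prices, initial_wealth=1000.0):
    """Mark-to-market wealth of a trader who rebalances once per day."""
    wealth = initial_wealth
    path = [wealth]
    for t in range(len(prices) - 1):
        position = log_utility_position(wealth, forecasts[t], prices[t])
        # Gain or loss from holding the position until the next day's price.
        wealth += position * (prices[t + 1] - prices[t])
        path.append(wealth)
    return path

# Made-up model probabilities and market prices for three days.
forecasts = [0.60, 0.60, 0.58]
prices = [0.50, 0.55, 0.52]

path = simulate_trader(forecasts, prices)
rate_of_return = path[-1] / path[0] - 1.0
print([round(w, 2) for w in path], f"return: {rate_of_return:.2%}")
```

Replacing `log_utility_position` with a different rule (for example, a position proportional to forecast minus price, capped by available cash) is one way to check how much the results depend on the choice of utility.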

Rajiv summarizes his empirical results:

Repeating this exercise for each model-market pair, we obtain the following returns:

Among the models, FiveThirtyEight performs best and Silver Bulletin worst against each of the two markets, though the differences are not large. And among markets, PredictIt is harder to beat than Polymarket.

I don’t take this as being a useful evaluation of the three public forecasts, because . . . these are small numbers, and this is still just N = 1. It’s one campaign we’re talking about. Another way to put it is: What are the standard errors on these numbers? You can’t get a standard error from only one data point.

This doesn’t mean that the idea is empty; we should just avoid overinterpreting the results.

I haven’t fully processed Rajiv’s idea but I think it’s connected to the martingale property of coherent probabilistic forecasts, as we’ve discussed in the context of betting on college basketball and elections.

However you look at it, the thing that will kill your time series of forecasts is too much volatility: if you anticipate having to incorporate a flow of noisy information, you need to anchor your forecasts (using a “prior” or a “model”) to stop your forecast from being bounced around by noise.
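
One way to put a number on “too much volatility” is a standard property of martingales: if a coherent forecast starts at p0 and eventually resolves to 0 or 1, the expected sum of its squared day-to-day changes equals p0(1 − p0). The simulation below checks this for one toy model, in which the outcome is whether a Gaussian random walk ends above zero and the forecast is the exact conditional probability of that event. The toy model, the function name, and the settings are assumptions for illustration only.

```python
import random
from statistics import NormalDist

def sum_squared_changes(n_steps, rng):
    """Sum of squared forecast changes along one simulated path.

    The forecast at time t is P(S_T > 0 | S_t) for a standard Gaussian
    random walk S starting at 0, so it is a martingale starting at 0.5.
    """
    phi = NormalDist().cdf
    s = 0.0
    p_prev = 0.5
    total = 0.0
    for t in range(1, n_steps + 1):
        s += rng.gauss(0.0, 1.0)
        remaining = n_steps - t
        if remaining > 0:
            p = phi(s / remaining ** 0.5)
        else:
            p = 1.0 if s > 0 else 0.0  # the event has resolved
        total += (p - p_prev) ** 2
        p_prev = p
    return total

rng = random.Random(0)
n_paths = 4000
average = sum(sum_squared_changes(30, rng) for _ in range(n_paths)) / n_paths
print(f"average sum of squared changes: {average:.3f} (theory: 0.25)")
```

A forecast series whose squared changes add up to much more than p0(1 − p0) is moving more than a coherent forecast should on average, although with a single campaign this is again only suggestive.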

Unfortunately, that goal of sensible calibrated stability runs counter to another goal of public forecasters, which is to get attention! For that, you want your forecast to jump around so that you are continuing to supply news.

I sent the above to Rajiv, who wrote:

One thing I didn’t show in the post is how the value of the portfolio (trading on PredictIt) would have evolved over time:

You will see that if I had conducted this analysis a couple of weeks ago, Silver Bulletin would have been doing well. It really suffered when the price of the Harris contract rose on the markets, since it was holding a significant short position.

The lesson here (I think) is that we need lots of events to come to any conclusion. I am working on the state level and popular vote winner forecasts, but of course these will all be correlated so doesn’t really help much with the problem of small numbers of events. This is the Grimmer-Knox-Westwood point as I understand it.

If we were to apply this sort of procedure retroactively to past time series of forecasts or betting markets that were too variable because they were chasing the polls too much, then I’d think/hope it could reveal the problems. After 2016, forecasters have worked hard to keep lots of uncertainty in their forecasts, and I think that one result of that is to keep the market prices more stable.

Arguing about bitcoin

Last year, we ran a post, Omid Malekan on why crypto is not a scam. After that, I had lunch with Malekan and fellow Columbia Business School denizen Gur Huberman, and Malekan gave me a copy of his book, “Re-Architecting Trust: The Curse of History and the Crypto Cure for Money, Markets, and Platforms.”

I flipped through the book and came to this passage on page 215:

Ironically, for all the hand-wringing about Bitcoin’s price volatility, it has been the cryptocurrency’s single greatest selling point because, for over a decade, and despite several bear markets, it has always resolved itself to the upside. Put differently, those who have believed in this monetary prowess have been handsomely rewarded. To them, bitcoin’s volatility is a feature, not a bug. Had they held dollars (or any other fiat currency) instead, then they’d be a lot less rich. That wealth creation, along with the titillating prospect of even more to come, has propelled both the coin and the underlying technology forward for over a decade.

Seems descriptively accurate. I continue to think that the medium-term plan of bitcoin, as with other tech innovations, is regulatory capture: get enough well-connected people inside the pyramid so they can ensure government support in some way. But, in any case, yeah, the price of bitcoin has made some early investors rich.

Is “wealth creation” an accurate description of the rising price of a speculative asset? I’m not so sure about that. Sometimes yes, sometimes no, right? If I discover a new talent for painting, and my paintings are popular, and they start selling for a million dollars each, and I make a thousand of them, then, sure, in some sense I really have created a billion dollars of wealth, in that the world has these thousand paintings which people really seem to value. But this argument ultimately relies on the paintings as having this value. If all of them are blank white canvases, and I’ve sold them by convincing people that beautiful paintings will eventually appear on them (with the idea being that I used some special paint that takes years to show up on the canvas), then the argument for creating a billion dollars of wealth seems shakier; we’re more in a conditional zone where the value exists only for as long as people agree that this value is there.

So I think the claim of “wealth creation” leans on the assumption that bitcoin has an inherent value, something other than “the titillating prospect of even more to come.” And Malekan does argue that bitcoin has inherent value, and not just for early investors and criminals. I did not try to evaluate those claims, but that’s really the crux. Making people rich is not by itself wealth creation; it’s just a form of transfer payment.

Another issue that comes up is energy consumption, both for its direct substitution effects (energy used in the bitcoin process is not being used for some more direct human purpose) and for its contribution to climate change.

Malekan writes something about this, on page 217:

Here we should pause for a second to consider the environmental impact of mining, as it is often presented as a negative. Indeed, other than its (highly exaggerated) use in illicit activity, the Bitcoin platform’s electrical consumption, which we should admit is significant, is the most popular argument against further adoption. As with other critiques against crypto, there is an obvious double standard. Lots of activities negatively impact the environment, especially when measured in aggregate. Tellingly, none are as controversial. The United States uses more power for air-conditioning than the UK does in total, yet there is no shortage of crypto critics waxing poetic from their comfortably cool offices.

Bitcoin’s electrical consumption is a feature and not a bug, the natural evolution of the long-running relationship between money and power. Those who have issued the former have often deployed the latter to preserve its purchasing power. Case in point, one of the primary purposes of the US military—a major energy consumer in its own right—is to protect the purchasing power of the dollar. It does this by protecting unsavory regimes that price their exports in dollars and by dropping the occasional bomb.

Bitcoin’s clever contribution is to make this previously implicit relationship explicit. People trust its coins because they require a lot of power to produce. Human beings don’t value things that are easy to make. That’s why a handmade automatic Swiss watch costs one thousand times more than a machine-made quartz one, even though the latter is better at telling time.

The complexity and cost of mining provide Bitcoin with a stronger foundation to grow from. They also allow it to become the rare monetary system that doesn’t rely on coercion, and is strictly opt-in. . . . Anyone concerned about the Environmental, Social and Governance (ESG) impact of mining should consider the social and governance benefits of a monetary system that is universally accessible, fully meritocratic, and not predicated on the threat of violence.

Ummmm . . . I disagree. I’m no fan of air conditioning myself, so I guess I’m with Malekan on that one, but otherwise the above paragraphs make no sense to me at all. Suppose Bitcoin used twice as much energy as it already did? Then would it be even better?? Also, people value many things that are easy to make. People don’t pay much for such things, but that doesn’t mean they don’t value them. I have no idea what it means for Bitcoin to be “fully meritocratic,” and I don’t see how the monetary system that supports the value of the $20 bill in my wallet is “predicated on the threat of violence,” any more than any possession or way of life is predicated on the threat of violence in the sense that if someone tries to violently rob me, some threat of violence might be necessary to deter them.

Again, it’s ultimately a cost-benefit question. Bitcoin using massive amounts of power is a cost, its use in crime is a cost, but, sure, maybe it has benefits too. To deny the cost and act as if it’s actually a positive thing just seems nuts.

It’s kind of stunning how far apart Malekan and I are on this one. To disagree on the benefits of Bitcoin is one thing. But if he’s saying “Bitcoin’s electrical consumption is a feature and not a bug” . . . is there any possibility of communication at all?

P.S. I sent the above to Malekan, who responded:

I appreciate your pushback on my arguments. That we end up agreeing is of little importance to me, it’s the debate that is valuable.

On the question of monetary coercion, here’s what I meant: US law requires you to pay your taxes in dollars. If you insisted on paying in any other currencies, be it Bitcoin or Canadian dollars, then you’d be committing a felony and can face jail time. Similarly, US law also considers the dollar legal tender, so lenders have to accept it from debtors. There are civil consequences to refusing to do so. Zooming out, maintaining dollar supremacy is a publicly disclosed goal of America’s geopolitical disposition, which includes going to war. We have deals with countries like Saudi Arabia to provide military protection in exchange for them pricing and selling their oil in dollars (with the added consequence that they buy Treasuries with the proceeds). Some people argue that America’s endless military misadventures in the MidEast are more about preserving the Petrodollar system than control of oil. America is now the world’s #1 oil producer, so we aren’t as dependent as we used to be. But the dollar losing its reserve status would have a profound impact across the economy.

I’ll leave it up to you to decide whether the above is coercion, but keep in mind America is not the rest of the world. Are there kooky people here who love Bitcoin because they think the dollar is on the verge of collapse? There are—and I spend much more time debating with crypto purists like them than I do skeptics like you—but that’s not the kind of investor that made the new Bitcoin ETFs one of the most successful Wall Street product launches in history. Surely a pension fund that buys Blackrock’s IBIT through Morgan Stanley is not doing so because it wants to invest in criminal activity or is anti-ESG.

I have this running joke that almost every major critique of Bitcoin (including yours) makes a lot of sense…on the upper west side. Many countries have capital controls that restrict how their citizens use and spend their own money. I don’t have the exact figures but China and India both do, and that’s over 1/3 of the global population. These controls are often used by governments to pick winners over losers, enabling connected (and often corrupt) elites and crushing the financial livelihoods of ordinary people. This is certainly the case in my native Iran, and was also true in Argentina until not that long ago.

This thesis is partially validated by data on per capita cryptocurrency adoption. It tends to be highest in countries with high inflation, corrupt governments, and unreliable banking (banking in every country is an extension of its monetary system). I’d posit that 99% of the (often poor and brown) people in the top countries in that list have a valid reason for embracing crypto. It’s just not obvious to us because we have the luxury of a monetary and legal system that works.

Lastly, many of the commenters in your blog seem to think my book is some screed against fiat currencies and in defense of Bitcoin. But as you’ll see if you read the rest, it also argues favorably for solutions like stablecoins, which only expand the reach of the US dollar.

Pete Rose and gambling addiction: An insight and a question

I just read “Charlie Hustle: The rise and fall of Pete Rose and the last glory days of baseball,” by Keith O’Brien, and it gave me some insight into the logic of gambling addiction, along with a question.

The insight (which is not a deep insight, or anything original with me; it’s just something that I realized after reading O’Brien’s book) is that gambling addiction cumulates. It’s related to what I called “exponential frauds” in my review of Dan Davies’s book.

It goes like this: if you’re a gambler, and you’re like the vast majority of gamblers, you lose. You’re repeatedly playing a game with negative expected value. And if your strategy is to try to make up your losses by further gambling, you’re heading into an exponential spiral.

For a compulsive gambler, a loss is not just a loss, it’s also part of an exponentially growing problem, that the more you lose, the more betting you’ll have to do to try to cover your debts, which leads to larger debts, etc., until everything blows up.
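
The arithmetic of that spiral is easy to sketch. The example below assumes even-odds bets and a gambler who, after each loss, stakes enough to win back everything lost so far plus the original stake. These are simplifying assumptions for illustration, not a description of how Rose actually bet, and the function name is hypothetical.

```python
def chase_losses(first_stake, n_losses):
    """Stake and cumulative debt after each loss in a losing streak.

    Assumes even-odds bets, with each new stake set to the current debt
    plus the first stake, so that a single win would clear everything.
    """
    debt = 0.0
    history = []
    for _ in range(n_losses):
        stake = debt + first_stake
        debt += stake  # the bet loses, so the stake is added to the debt
        history.append((stake, debt))
    return history

for i, (stake, debt) in enumerate(chase_losses(100.0, 10), start=1):
    print(f"loss {i:2d}: stake {stake:10,.0f}   debt {debt:10,.0f}")
```

Starting from a $100 bet, ten straight losses under this rule leave a debt of $102,300, with the next required stake larger still. Real losing streaks are interrupted by wins, but with negative expected value the doubling pattern dominates in the long run.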

From that perspective, the question, “Why did Pete Rose bet on baseball?” has an easy answer. Pete Rose bet on the horses, and he bet on football, and he bet on basketball; the more he bet, the more he lost; the more he lost, the more he had to cover his bets (he also seems to have stiffed a lot of people along the way); so, eventually, yeah, he bet on baseball because that would’ve seemed to give him the best chance of winning back his money. I expect that the next step would’ve been for him to start throwing games.

I suspect that one reason that some people (including, notoriously, Bill James; a good discussion of James’s position on the Pete Rose gambling issue is this post on Baseball Prospectus by Derek Zumsteg, which is all the more impressive given that it was written in 2003, before Rose’s confession that appeared a year later) didn’t want to accept that Rose bet on baseball is that: (a) Rose would be risking his livelihood and reputation by betting on baseball, and (b) he was already betting on other things, so it’s not like he needed baseball to scratch his gambling itch. But this reasoning didn’t account for the exponential spiral. With the financial walls closing around him, betting on baseball was Rose’s last shot at getting out of the hole. It was a natural progression.

O’Brien writes that, many years later, “Pete calculated that his banishment had cost him roughly $100 million since 1990, between lost earnings as a manager and lost sponsorship deals.”

I disagree! Based on Pete’s trajectory, I’m guessing that every one of those hundred million dollars (and more) would’ve gone to bookies. Or, to put it more precisely, those hundred million dollars would never have materialized, because if baseball hadn’t banished him and taken away his main streams of income, he would’ve gone to prison for tax evasion or drug dealing or some other mob-related crime. Had he been able to stop gambling, that would’ve been another story, but the most likely outcome seems to be that the necessary first step would’ve been for him to spend every dollar he could get his hands on.

The question is: How did it take so long for things to get to that point? It seems that Rose had been addicted to gambling for decades before it all started crashing down. I guess what happened is that he first gambled small enough amounts that he was able to sustain the losses. And for many years his salary was increasing, first because he joined the major leagues and became a star, then later because his later career overlapped with the beginning of the sports salary explosion associated with TV contracts and free agency.

The two exponential time series—Rose’s betting losses and his income—were roughly on pace.

But then in the early 1980s, the curve of losses kept going up and the income stayed flat. For a while he made up the difference through those notorious baseball card signings, along with a bit of tax evasion and maybe some investments in criminal businesses, but that wasn’t enough either. Next step was to bet on baseball, and you know the rest.

In his mangling of the Rose betting story, Bill James wrote, “Pete Rose, in the mid-1980s, had: 1. Become addicted to gambling, and 2. Largely lost his moral compass. Rose had entangled his life with a number of shady characters . . .” But from O’Brien’s book it seems clear that Rose was already addicted to gambling in the 1960s, and had entangled his life with a number of shady characters back then too. The “moral compass” thing isn’t so clear. I’m guessing that Rose never would’ve stiffed creditors or cheated on his taxes if he didn’t feel that he had to. It’s easier to be moral if you can live a comfortable life without needing to resort to immoral behavior. The gambling addiction was already there, as were the shady characters; the moral compass was set aside because, in a classic addicted-gambler move, Rose saw no solution to his gambling problem other than to gamble more.

To me, an interesting aspect of this case is how long it took to blow up: the exponential spiral started slowly enough that Rose’s income, plus his various cheats, were enough to keep his addiction afloat for a long time.

A good reminder that something can be on an unsustainable path but take a while to blow up.

P.S. The sad thing, of course, is that the gambling is the least interesting thing about Rose. There are millions of addicts out there but only one person who played like Pete Rose. This is in contrast, for example, to William Bennett, the mediocre but well-connected former Secretary of Education and political pundit, who was entirely uninteresting except for his gambling addiction.

P.P.S. Here’s another story making the same general point about the exponential nature of an addict’s growing problem, someone gambling more and more recklessly to cover the losses which continue to increase until the whole thing blows up.

Election prediction markets: What happens next?

Rajiv Sethi discusses a recent ruling by a U.S. regulatory agency to halt trading on election prediction markets. Here’s Rajiv:

As evidence for this assertion the agency cited some of my [Rajiv’s] writing [that] describe attempts at market manipulation, one for financial gain and the other seemingly for maintaining optimism about the prospects of a candidate.

However, I [Rajiv] feel that the agency is drawing the wrong conclusions from this work, and a proper understanding of it undermines rather than bolsters the case for prohibition.

Some people believe that attempts at manipulating prediction markets are doomed to failure—that such attempts can have no more than a modest and short-lived effect on prices before other traders see a significant profit opportunity and pounce. I do not subscribe to this view. But when there exist prediction markets that lie outside the reach of our regulators, such as crypto-based Polymarket or the British exchange Betfair, the best defense against market manipulation is not prohibition but greater competition and transparency.

That is, I favor allowing Kalshi to proceed with the listing of contracts that reference election outcomes, not because I dismiss concerns about market manipulation or election integrity, but because I take them very seriously. . . .

Visibility is sharpened by the existence of multiple competing markets, especially if they have limited participant overlap. Kalshi is a regulated exchange restricted to verified domestic accounts funded with cash. Polymarket is crypto-based, does not accept cash deposits from US residents, and lies largely outside the purview of our regulatory apparatus.

Rajiv continues:

Prediction markets are characterized by an inescapable paradox. If they are taken seriously as unbiased aggregators of distributed information, beliefs will shift when prices change, and incentives for manipulation will be significant. But if they are seen as vulnerable to manipulation and frequently biased, price changes will be largely ignored, and they will not be worth manipulating. This logic places bounds on the extent of manipulation in markets—it cannot be absent altogether, but cannot be so large as to undermine their credibility.

Another way of saying this is that there are three sorts of manipulations to be concerned about:

1. People manipulating the market to make money using classic schemes such as spreading false information and using this to pump-and-dump, or taking advantage of insider information.

2. People manipulating the market to affect voting behavior or the behavior of potential donors or endorsers. I doubt this would have much effect for major-party candidates in a general election, but it could be an issue in primary elections, where there’s a clear benefit to be perceived as being in the top tier.

3. People trying to throw the election to win a bet, or something more subtle such as point shaving.

In his post, Rajiv talks about #1 and #2 but not #3. Related to his “prediction market paradox” is that #2 can happen if the amount of money in the markets is low compared to the stakes of the election, so that the prices are affordable to manipulate given the potential gain, whereas #3 can happen if the amount of money in the markets is high compared to the stakes of the election, so that it’s worth taking the risk to throw it. Is there a sweet spot where the markets are big enough so they can’t easily be manipulated but small enough that there’s no motivation to throw the election? I guess that for U.S. presidential elections, the answer would be yes. For some other elections, maybe not.

Rajiv is not saying that prediction markets should be unregulated; indeed, he writes that derivative contracts in election markets “serve no legitimate purpose and open up rather obvious strategies for manipulation. And even if attempts at manipulation fail in the end, they still arouse suspicion and sow confusion. . . . It would be a good thing if they were discontinued.” And, even setting aside derivative contracts, even with straight-up election bets, concerns about market manipulation, insider trading, and point shaving are real. But these are issues with all markets: they’re reasons to regulate, not to ban. Or, to put it another way, regulations are about tradeoffs. There are political costs to gambling on elections, but, as Rajiv argues, there are political benefits too.

Rajiv also asks:

Can the forecasting accuracy of markets exceed that of statistical models?

My quick answer is, Yes, I think that markets can be more accurate than statistical forecasts! Statistical election forecasts are public, so market players can make use of that information for free. Indeed, as discussed in this recent post, I think the presence of different public forecasts (including ours at The Economist) does its part in stabilizing market behavior, at least for the national electoral college and popular vote.

For side bets such as individual states and tail probabilities (what’s the chance that Harris wins South Dakota?) or various trifecta-style bets (what’s the chance Trump wins states X, Y, and Z?), not so much, and I say this for two reasons. First, even the forecasts that do well at the headline numbers have problems in the tails (as with the FiveThirtyEight forecast and the Economist’s as well). Second, prediction markets also do weird things when the probabilities are near 0 or 1 (see here). As for conditional probabilities, which seem so tantalizingly available by juxtaposing prices on multiple bets . . . I think there might be something there, but it would take a bit of statistical modeling; given the noise on each price, if you try to put them together in a naive way, you’ve got nothing but trouble.

This is not to say that prediction markets are a bad thing, just that we should be understanding them empirically (or, I should say, scientifically, combining empirics and theory, since neither alone will do the job), not just blindly following pro- or anti-market ideology. As Rajiv says, the markets are out there already, and we can learn from them.

20 years of blogging . . . What have been your favorite posts?

Our first post was on 12 Oct 2004: A weblog for research in statistical modeling and applications, especially in social sciences; followed by The Electoral College favors voters in small states; Why it’s rational to vote; Bayes and Popper; and Overrepresentation of small states/provinces, and the USA Today effect.

Later that month we had our first post on the Red State/Blue State Paradox, a guest post on statistical issues in modeling social space, and a stab at partial pooling of interactions.

On 27 Oct came one of my favorite early posts, The blessing of dimensionality, and early the next month came Sam Cook’s post on her now-classic work on Bayesian software validation, which we now call SBC, for simulation-based calibration checking, and an early post on cross-validation for Bayesian multilevel modeling, a topic on which we have made lots of progress in the decades since.

Also in Nov 2004 came our first formulation of the important (to me) concept of institutional decision analysis, my discovery about correlations in before-after data, and a debunking of a claim of possible election fraud in Florida.

Those were the days!

Here’s my question for you!

What are your favorite posts? It would be fun to compile a list in commemoration of 20 years and over 12,000 posts. So post titles and links to your favorites in the comments below. You can include as many as you want.

Also if you have any favorite posts from the past 20 years on other blogs, share those too. Thanks!

Freakonomics asks, “Why is there so much fraud in academia,” but without addressing one big incentive for fraud, which is that, if you make grabby enough claims, you can get featured in . . . Freakonomics!

There was this Freakonomics podcast, “Why Is There So Much Fraud in Academia?” Several people emailed me about it, pointing out the irony that the Freakonomics franchise, which has promoted academic work of such varying quality (some excellent, some dubious, some that’s out-and-out horrible), had a feature on this topic without mentioning all the times that they’ve themselves fallen for bad science.

As Sean Manning puts it, “That sounds like an episode of the Suburban Housecat Podcast called ‘Why are bird populations declining?'”

And Nick Brown writes:

Consider the first study on the first page of the first chapter of the first Freakonomics book (Gneezy & Rustichini, 2000, “A Fine is a Price”, 10.1086/468061), in which, when daycare centres in Israel started “fining” parents for arriving late to pick up their children, the amount of lateness actually went up. I have difficulty in believing that this study took place exactly as described; for example, the number of children in each centre appears to remain exactly the same throughout the 20 weeks of the study, with no mention of any new arrivals, dropouts, or days off due to illness or any other reason. Since noticing this, I have discovered that an Israeli economist named Ariel Rubinstein had similar concerns (https://arielrubinstein.tau.ac.il/papers/76.pdf, pp. 249–251). He contacted the authors, who promised to put him in touch with the staff of the daycare centres, but then sadly lost the list of their names. The paper has over 3,200 citations on Google Scholar.

I replied: Indeed, the Freakonomics team has never backed down on many ridiculous causes they have promoted, including the innumerate claim that beautiful parents are 36% more likely to have girls and some climate change denial. But I’m not criticizing the researchers who participated in this latest Freakonomics show; we have to work with the news media we have, flawed as they are.

And, as I’ve said many times before, Freakonomics has so much good stuff. That’s why I’m disappointed, first when they lower their standards and second when they don’t acknowledge or wrestle with their past mistakes. It’s not too late! They could still do a few shows—or even write a book!—on various erroneous claims they’ve promoted over the years. It would be interesting, it would fit their brand, it could be educational and also lots of fun.

This is similar to something that occurs in the behavioral economics literature: there’s so much research on how people make mistakes, how we’re wired to get the wrong answer, etc., but then not so much about the systematic errors made in the behavioral economics literature itself. As they’d say in Freakonomics, behavior can be driven by incentives.

P.S. Some interesting discussion in comments regarding the Gneezy and Rustichini paper. I’ve not looked into this one in detail, and my concerns with Freakonomics don’t come from that example but from various other cases over the years where they’ve promoted obviously bad science; see above links.