They say that stocks go down during the day and up at night.

Bruce Knuteson writes:

Prompted by your blog post this morning, I attach a plot from Figure 3 of They Still Haven’t Told You showing overnight and intraday returns to AIG (with logarithmic vertical scale, updated with data through the end of October).

If you invested $1 in AIG at the start of 1990 and received only intraday returns (from market open to market close), you would be left with one-twentieth of a penny, suffering a cumulative return of -99.95%. If you received only overnight returns (from market close to the next day’s market open), you would have $1,017, achieving a cumulative return of roughly +101,600%.

You can easily reproduce this plot yourself. Data are publicly available from Yahoo Finance.

AIG is just one of many stocks with a suspiciously divergent time series of overnight and intraday returns.

If you have a compelling innocuous explanation for these strikingly suspicious overnight and intraday returns that I have not already addressed, I would of course be keen to hear it.

Alternatively, if you can think of a historical example of a strikingly suspicious return pattern in a financial market that turned out to clearly be fine, I would be keen to hear it.

If neither, perhaps you can bring these strikingly suspicious return patterns to the attention of your readers.

What continues to stun me is how something can be clear and unambiguous, and it still takes years or even decades to resolve.

The linked article is fun to read, but I have absolutely no idea about this, so I’m just sharing it with you. Make of it what you will. You can also read this news article from 2018 by Matt Levine, which briefly discusses Knuteson’s idea.
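
If you want to check the basic pattern yourself, here’s a minimal R sketch of the kind of calculation Knuteson describes, using the quantmod package to pull Yahoo Finance data. The ticker, date range, and use of unadjusted open/close prices are my own assumptions for a rough check; a careful analysis would handle dividends and splits the way his paper does.

  library(quantmod)
  # Daily open and close prices for AIG from Yahoo Finance
  getSymbols("AIG", src = "yahoo", from = "1990-01-01", to = "2022-10-31")
  op <- as.numeric(Op(AIG))
  cl <- as.numeric(Cl(AIG))
  intraday  <- cl / op                         # open-to-close gross return each day
  overnight <- c(1, op[-1] / cl[-length(cl)])  # prior close to next morning's open
  prod(intraday)   # value of $1 exposed only to intraday returns
  prod(overnight)  # value of $1 exposed only to overnight returns
  plot(cumprod(intraday), type = "l", log = "y")  # cumulative intraday curve, log scale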

Pssst . . . anyone interested in the email list of Modern Language Association??

This one came in the email:

On Dec 5, 2022, at 12:01 PM, ** <**@gmail.com> wrote:

Hi,

I just wanted to know if you’re interested in acquiring the email-list of Modern Language Association of America 2023

Please let me know your thoughts on this so that I can share with you the number of attendees and the cost.

Awaiting your reply.

Regards,
**
Marketing Manager

The bolding and highlighting were in the original.

I’m really tempted to reply to find out the number of attendees and the cost, but not so tempted that I want to get into an email conversation with a spammer. And, as David Owen memorably pointed out so many years ago in his report on the meeting of meeting planners, the people who go to the meetings are the attenders; it is the conference itself that is the attendee.

In any case, I just love the idea that someone’s out there hawking the MLA membership list. What’re they planning to sell to these people? Autographed copies of the complete works of Chaucer?

On the ethics of pollsters or journalists or political scientists betting on prediction markets

There’s been a bit of concern lately about political consultants or pundits offering some mix of private and public forecasting and advice, and also making side bets on elections. I don’t know enough about these stories to comment on them in any useful way. Instead I’ll share my own perspectives regarding betting on elections.

In June 2020 I wrote a post about an opportunity in a presidential election prediction market—our model-based forecast was giving Biden an 84% chance of winning the election, whereas the market’s implied odds were 53% for Biden, 40% for Trump, 2% for Mike Pence, 2% for Hillary Clinton (!), and another few percent for some other possible longshot replacements for Biden or Trump.

Just to be clear: Those betting odds didn’t correspond to Biden getting 53% of the vote, they corresponded to him having a 53% chance of winning, which in turn basically corresponded to the national election being a tossup.

I thought seriously about laying down some $ on Biden and then covering it later when, as anticipated, Biden’s price moved up.

Some people asked why I wasn’t putting my money where my mouth was. Or, to put it another way, if I wasn’t willing to bet on my convictions, did I really believe in my own forecast? Here’s what I wrote:

I agree that betting is a model for probability, but it’s not the only model for probability. To put it another way: Yes, if I were planning to bet money on the election, I would bet using the odds that our model provided. And if I were planning to bet a lot of money on it, I guess I’d put more effort into the forecasting model and try to use more information in some way. But, even if I don’t plan to bet, I can still help to create the model as a public service, to allow other people to make sensible decisions. It’s like if I were a chef: I would want to make delicious food, but that doesn’t mean that I’m always hungry myself.

Ultimately I decided not to bet, for a combination of reasons:

– I didn’t quite know how to do it. And I wasn’t quite sure it would be legal.

– The available stakes were low enough that I couldn’t make real money off it, and, if I could’ve, I would’ve been concerned about the difficulty of collecting.

– The moral issue that, as a person involved in the Economist forecast, I had a conflict of interest. And, even if not a real conflict of interest, a perceived conflict of interest.

– The related moral issue that, to the extent that I legitimately am an expert here, I’m taking advantage of ignorant people, which doesn’t seem so cool.

– Asymmetry in reputational changes. I’m already respected by the people that matter, and the people who don’t respect me won’t be persuaded by my winning some election bets. But if I lose on a public bet, I look like a fool. See the last paragraph of section 3 of this article.

Also there’s my article in Slate on 19 lessons from the 2016 election:

I myself was tempted to dismiss Trump’s chances during primary season, but then I read that article I’d written in 2011 explaining why primary elections are so difficult to predict (multiple candidates, no party cues or major ideological distinctions between them, unequal resources, unique contests, and rapidly changing circumstances), and I decided to be careful with any predictions.

The funny thing is that, in Bayes-friendly corners of the internet, some people consider it borderline-immoral for pundits to not bet on what they write about. The idea is that pundits should take public betting positions with real dollars cos talk is cheap. At the same time, these are often the same sorts of people who deny that insider trading is a thing (“Differences of opinions are what make bets and horse races possible,” etc.) It’s a big world out there.

Real-world prediction markets vs. the theoretical possibility of betting

Setting aside the practical and ethical problems in real-world markets, the concept of betting can be useful in fixing ideas about probability. See for example this article by Nassim Taleb explaining why we should not expect to see large jumps up and down in forecast probabilities during the months leading up to the event being forecast.

This is a difficult problem to wrestle with, but wrestle we must. One challenge for election forecasting comes in the mapping between forecast vote share and betting odds. Small changes in forecast vote shares correspond to big changes in win probabilities. So if we want to follow Taleb’s advice and keep win probabilities very stable until shortly before the election (see figure 3 of his linked paper), then that implies we shouldn’t be moving the vote-share forecast around much either. Which is probably correct, but then what if the forecast you’ve started with is way off? I guess that means your initial uncertainty should be very large, but how large is reasonable? The discussion often comes up when forecasts are moving around too much (for example, when simple poll averages are used as forecasts, then predictable poll movements cause the forecasts to jump up and down in a way that violates the martingale property that is required of proper probability forecasts), but the key issue comes at the starting point.

So, what about future elections? For example, 2024. Or 2028. One issue that came up with 2020 was that everyone was pretty sure ahead of time, and also correct in retrospect, that the geographic pattern of the votes was aligned so that the Democrats would likely need about 52% of the vote to win the electoral college. So, imagine that you’re sitting in 2019 trying to make a forecast for the 2020 presidential election, and, having read Taleb etc., you want to give the Democrats something very close to a 50/50 chance of winning. That would correspond to saying they’re expected to get 52% of the popular vote. Or you could forecast 50% in the popular vote, but then that would correspond to a much smaller chance of winning in the electoral college. For example, if your forecast popular vote share, a year out, is 0.50 with a standard error of 0.06 (so the Democratic candidate has a 95% chance of receiving between 38% and 62% of the two-party vote), then the probability of them winning at least 52% is pnorm(0.50, 0.52, 0.06) = 0.37. On the other hand, it’s been a while since a Republican candidate has won the popular vote . . . You could give arguments either way on this, but the point is that it’s not so clear how to express high ignorance here. To get that stable probability of the candidate winning, you need a stable predicted vote share, and, again, the difficulty here isn’t so much with the stability as with what your starting point is.
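
To make that vote-share-to-win-probability mapping concrete, here’s a small base R snippet using the same normal approximation and the assumed 52% threshold from the discussion above; the numbers are illustrative, not output from our forecasting model.

  # Win probability for the Democratic candidate, given a normal forecast for their
  # two-party vote share and an assumed 52% share needed to win the electoral college
  win_prob <- function(mean_share, se, threshold = 0.52) {
    1 - pnorm(threshold, mean = mean_share, sd = se)
  }
  win_prob(0.50, 0.06)  # 0.37, the number quoted above
  win_prob(0.52, 0.06)  # 0.50: a two-point shift in the forecast moves the probability a lot
  win_prob(0.52, 0.02)  # 0.50 again, but with a tighter forecast . . .
  win_prob(0.54, 0.02)  # . . . the same two-point shift now takes you to 0.84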

Summary

Thinking about hypothetical betting odds can be a useful way to understand uncertainty. I’ve found it helpful when examining concerns with my own forecasts (for example here) as well as identifying problems with forecast-based probability statements coming from others (for example here and here).

Actual betting is another story. I’m not taking a strong moral stance against forecasters also making bets, but I have enough concerns that I’m not planning to do it myself. On the other hand, I’m glad that some low-key betting markets are out there; they provide some information even if not as much as sometimes claimed. Rajiv Sethi discusses this point in detail here and here.

Grade inflation: Why hasn’t it already reached its terminal stage?

Paul Alper writes:

I think it is your duty to write something about this. Why? For one thing, it does not involve Columbia. For another, I presume you and your family will return to NYC and someone in your family in the future will seek medical care in hopes that the physician understands organic chemistry and what it implies for proper medical care. However, if you do not want to be recorded on this, how about getting one of your bloggers who has medical degrees or have some sort of connection with NYU? To be extra safe, have the column written in French or dialect thereof. Another idea: relate this to students taking an undergraduate statistics course where the failure rate is high.

P.S. True story: When I was an undergrad at CCNY in engineering, the administration was up-front loud and proud about the attrition rate from first to second year in engineering because that proved it was an academically good institution.

“This” is a news article entitled “At NYU, Students Were Failing Organic Chemistry. Whose Fault Was It?”, which continues, “Maitland Jones Jr., a respected professor, defended his standards. But students started a petition, and the university dismissed him.” It seems that he was giving grades that were too low.

Fitting this into the big picture: this particular instructor was a retired professor who was teaching as an adjunct, and, as most readers are aware, adjuncts have very little in the way of workplace rights.

The real question, though, which I asked ten years ago, is: Why haven’t instructors been giving all A’s already? All the incentives go in that direction. People talk lots about grade inflation, but the interesting thing to me is that it hasn’t already reached its terminal stage.

Water Treatment and Child Mortality: A Meta-analysis and Cost-effectiveness Analysis

This post is from Witold.

I thought some of you may find this pre-print (that I am a co-author of) interesting. It’s a meta-analysis of improving water quality in low- and middle-income countries. We estimated that this reduces the odds of child mortality by 30%, based on 15 RCTs. That’s obviously a lot! If true, this would have very large real-world implications, but there are of course statistical considerations of power, publication bias, etc. So I thought that maybe some of the readers will have methodological comments while others may be interested in the public health aspect of it. It also ties to a couple of follow-up posts I’d like to write here on effective altruism and finding cost-effective interventions.

First, a word on why this is an important topic. Globally, for each thousand births, 37 children will die before the age of 5. Thankfully, this is already half of what it was in 2000. But it’s still about 5 million deaths per year. One of the leading causes of death in children is diarrhea, caused by waterborne diseases. While chlorinating [1, scroll down for footnotes] water is easy, inexpensive, and proven to remove pathogens, there are many countries where most people still don’t have access to clean water (the oft-cited statistic is that 2 billion people don’t have access to safe drinking water).

What is the magnitude of impact of clean water on mortality? There is a lot of experimental evidence for reductions in diarrhea, but making a link between clean water and mortality requires either an additional, “indirect”, model connecting disease to deaths, which is hard [2], or directly measuring deaths, which are rare (hence also hard) [3].

In our pre-print [4], together with my colleagues Michael Kremer, Steve Luby, Ricardo Maertens, and Brandon Tan, we identify 53 RCTs of water quality treatments. Contacting the authors of each study resulted in 15 estimates that could be meta-analysed, covering about 25,000 children. (Why only 15 out of 53? Apparently because the studies were not powered for mortality, with each one of them contributing just a handful of deaths; in some cases the authors decided not to collect, retain, or report deaths.) As far as we are aware, this is the first attempt to meta-analyse experimental evidence on mortality and water quality.

We conduct a Bayesian meta-analysis of these 15 studies using a logit model and find a 30% reduction in odds of all-cause mortality (OR = 0.70, with a 95% interval 0.49 to 0.93), albeit with high (and uncertain) heterogeneity across studies, which means the predictive distribution for a new study has a much wider interval and slightly higher mean (OR=0.75, 95% interval 0.29 to 1.50). This heterogeneity is to be expected because we compare different types of interventions in different populations, across a few decades.[5] (Typically we would want to address this with a meta-regression, but that is hard due to a small sample.)

The whole analysis is implemented in baggr, an R package that provides a meta-analysis interface for Stan. There are some interesting methodological questions related to modeling of rare events, but repeating this analysis using frequentist methods (a random-effects model on Peto’s ORs has a mean OR of 0.72), as well as running the various sensitivity analyses we could think of, leads to similar results. We also think that publication bias is unlikely. Still, perhaps there are things we missed.
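
If you are curious about the mechanics of the frequentist check mentioned above (a random-effects model on Peto’s ORs), here is a minimal sketch using the metafor package in R; the death counts below are made up for illustration and are not the data from our 15 studies.

  library(metafor)
  # Hypothetical deaths and children per arm in three studies (illustrative only)
  dat <- data.frame(study = c("S1", "S2", "S3"),
                    d_t = c(12, 5, 8),  n_t = c(2000, 900, 1500),   # treatment arms
                    d_c = c(18, 9, 11), n_c = c(2000, 950, 1400))   # control arms
  # Peto log odds ratio and sampling variance for each study
  es <- escalc(measure = "PETO", ai = d_t, n1i = n_t, ci = d_c, n2i = n_c, data = dat)
  fit <- rma(yi, vi, data = es, method = "REML")  # random-effects pooling
  exp(coef(fit))              # pooled odds ratio
  predict(fit, transf = exp)  # pooled OR with interval, plus prediction interval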

Based on this we calculate a cost of about $3,000 per child death averted, or under $40 per DALY. It’s hard to convey how extremely cost-effective this is (a typical cost-effectiveness threshold is the equivalent of one year’s GDP per DALY; that threshold is already reached at a 0.6% reduction in mortality), but basically it is on par with the most cost-effective child health interventions such as vaccinations.

Since the cost-effectiveness is potentially so high, there are obviously big real-world implications. Some funders have been reacting to the new evidence already. For example, some months ago GiveWell, an effective altruism non-profit that many readers will already be familiar with, conducted their own analysis of water quality interventions and in a “major update” of their assessment recommended a grant of $65 million toward a particular chlorination implementation [6]. (GiveWell’s assessment is an interesting topic for a blog post of its own, so I hope to write about it separately in the next few days.)

Of course in the longer term more RCTs will contribute to the precision of this estimate (several are being worked on already), but generating evidence is a slow and costly process. In the short term the funding decisions will be driven by the existing evidence (and our paper is still a pre-print), so it would be fantastic to hear whether readers have comments on the methods and their real-world implications.

 

Footnotes:

[1] For simplicity I say “chlorination,” but this may refer to chlorinating at home, at the point from which water is drawn, or even using a device in the pipe, if households have piped water which may be contaminated. Each of these will have different effectiveness (primarily due to how convenient it is to use) and costs. So differentiating between them is very important for a policy maker. But in this post I group all of this together to keep things simple. There are also other methods of improving quality, e.g. filtration. If you’re interested, this is covered in more detail in the meta-analyses that I link to.

[2] Why is extrapolating from evidence on diarrhea to mortality hard? First, it is possible that the reduction in severe disease is higher (in the same way that a vaccine may not protect you from infection, but it will almost definitely protect you from dying). Second, clean water also has lots of other benefits, e.g. it likely makes children less susceptible to other infections and nutritional deficiencies, and also makes their mothers healthier (which could in turn lead to fewer deaths during birth). So while these are just hypotheses, it’s hard a priori to say how a reduction in diarrhea would translate to a reduction in mortality.

[3] If you’re aiming for 80% power to detect a 10% reduction in mortality, you will need RCT data on tens of thousands of children. The exact number of course depends on how high the baseline mortality rate is in the studies.
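
As a rough illustration of that claim, here is a base R power calculation assuming a hypothetical 5% baseline under-5 mortality rate in the study populations and a 10% relative reduction:

  # 80% power to detect a 10% relative reduction from a hypothetical 5% baseline rate
  power.prop.test(p1 = 0.05, p2 = 0.045, power = 0.80, sig.level = 0.05)
  # roughly 28,000 children per arm, i.e. tens of thousands in total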

[4] Or, to be precise, an update to a version of this pre-print which we released in February 2022. If you happened to read the previous version of the paper, the main methods and results are unchanged, but we added extra publication bias checks and a characterization of the sample, and rewrote most of the paper for clarity.

[5] That last aspect of heterogeneity seems important, because some have argued that the impact of clean water may diminish with time. There is a trace of that in our data (see supplement), but with 15 studies the power to test for this time trend is very low (which I show using a simulation approach).

[6] GiveWell’s analysis included their own meta-analysis and led to more conservative estimates of mortality reductions. As I mention at the end of this post, this is something I will try to blog about separately. Their grant will fund Dispensers for Safe Water, an intervention which gives people access to chlorine at the water source. GiveWell’s analysis also suggested a much larger funding gap in water quality interventions, of about $350 million per year.

“Several artists have torn, burned, ripped and cut their artwork over the course of their careers. it is still possible for them to do it, but they will be able to preserve their artwork permanently on the blockchain.”

I’m tinguely all over from having received this exclusive 5D invitation, and I just have to share it with all of you!

It’s a combination of art museum and shredder that’s eerily reminiscent of the Shreddergate exhibit at the Museum of Scholarly Misconduct.

Here’s the story:

You Are Invited: Media Event During Art Basel Week in Miami

Live demonstration of groundbreaking new NFT technology by the engineers, alongside some of Miami’s leading artists.

Two dates for the live presentations for the media:

  • Thursday, Dec. 1 at 3:00 p.m.
  • Friday, Dec. 2 at 3:00 p.m.

At the Center for Visual Communication in Wynwood, 541 NW 27th Street in Miami.

* * * Media RSVP is required at: https://www.eventbrite.com/**

This is a private event for the news media, by invitation only, and is not open to the public.

This media event will be presented at the location of the new exhibition “The Miami Creative Movement” featuring 15 of Miami’s leading artists.

Media Contacts: ** & ** 305-**-** **@**.com

– This will be the official launch of the new ** Machine, the first hardware-software architecture that creates a very detailed digital map of an artwork using a novel ultra-detailed 5D scanning technology.

– They will transform physical artworks into NFTs in real time for the audience, via this new hardware device they invented.

– The sublimated artworks will be uploaded to the blockchain live, in real-time, and will be showcased in an immersive VR environment at the event.

– After the scanning is completed, a laser-shredder “sublimates” the object, erasing the physical artwork and minting a new NFT directly on the blockchain.

The technology’s creator — ** — hails this as: “The first NFT-based technology that will allow artists and collectors to preserve works of art indefinitely in digital form, simply and without loss of information. Provenance is indisputable and traceable back to the original work of art to every brushstroke and minutiae detail.”

“As this new hardware revolutionizes art conservation around the world and attracts many artists to Web3, it also adds legitimacy to real world art on blockchains, enabling them to be traded.”

LIVE EVENT FOR THE MEDIA DURING ART BASEL WEEK IN MIAMI:

** Presents a Technology That Could Revolutionize NFTs and the World of Physical Art Forms

** – an Argentinian team of blockchain experts, technologists and artists, announces the official launch of the ** Machine, the first hardware-software architecture that creates a very detailed digital map of an artwork, using a novel ultra-detailed 5D scanning technology.

After the scanning is completed, a laser-shredder “sublimates” the object, erasing the physical artwork and minting a new NFT directly on the blockchain.

The technology’s creator hails this as the first NFT-based technology that will allow artists and collectors to preserve works of art indefinitely in digital form, simply and without loss of information.

The new artwork transcends the physical work into the blockchain as a unique NFT that can be referenced to the original sublimated artwork, and to which provenance is indisputable and traceable back to the original work of art to every brushstroke and minutiae detail.

Several artists have torn, burned, ripped and cut their artwork over the course of their careers. It is still possible for them to do it, but they will be able to preserve their artwork permanently on the blockchain.

As the hardware revolutionizes art conservation around the world and attracts many artists to Web3, it also adds legitimacy to real world art on blockchains, enabling them to be traded.

Artists who “burn” their paintings with the ** technology will get 85% of the revenue obtained from the newly created NFT and its addition to the most popular NFT marketplaces.

The artist ** completing the process of physical destruction of his painting

This process of creative destruction will be showcased for the news media during the week of Art Basel Miami at the Center for Visual Communication in Miami’s Wynwood Arts District.

There will be a live demonstration for members of the press and the sublimated artworks uploaded to the blockchain will be showcased in the art gallery in an immerse VR environment.

Argentina-based ** is a Web3 and Metaverse company dedicated to transferring the value of art to the digital world. The company was co-founded by **, **and **.

**’s first product is the ** Machine, a technology that scans and laser cuts physical artworks to produce NFTs. Read more at **.

You Are Invited: Media Event During Art Basel Week in Miami

Live demonstration of groundbreaking new NFT technology by the engineers, alongside some of Miami’s leading artists.

Two dates for the live presentations for the media:

  • Thursday, Dec. 1 at 3:00 p.m.
  • Friday, Dec. 2 at 3:00 p.m.

At the Center for Visual Communication in Wynwood, 541 NW 27th Street in Miami.

* * * Media RSVP is required at: https://www.eventbrite.com/**

This is a private event for the news media, by invitation only, and is not open to the public.

This media event will be presented at the location of the new exhibition “The Miami Creative Movement” featuring 15 of Miami’s leading artists.

Media Contacts: ** & ** 305-***-**** **@**.com

**
**
P.O. Box **
Miami Beach, FL 33239

“This is a private event for the news media, by invitation only, and is not open to the public.” . . . . wow, this makes me feel so important! There’s so much juicy stuff here, from the “5D scanning technology” onward. “This process of creative destruction” indeed. I’m assuming that anyone who showed up to this event was escorted there in a shiny new Hyperloop vehicle directly from their WeWork shared space. Unicorns all around!

But what’s with the “laser shredder”? Wouldn’t it be enough just to crumple up the original artwork and throw it in the trash?

It’s always fun to be on the inside of “a private event for the news media, by invitation only,” even if it’s not quite as impressive as the exclusive “non-transferable” invitation to hang out with Grover Norquist, Anne-Marie Slaughter, and a rabbi for a mere $16,846.


There are five ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, (4) keeping a gambling addict away from the casino, (5) refusing to promote gambling to college students

Wow, this story is horrible. It starts out innocuously enough with some boardroom shenanigans:

In September 2021, an official in Michigan State University’s athletic department sent an email to his boss with exciting news: An online betting company was willing to pay handsomely for the right to promote gambling at the university.

“Alan, if we are willing to take an aggressive position, we have a $1 M/year deal on the table with Caesar’s,” Paul Schager wrote to Alan Haller, the university’s athletic director. . . .

Unlike public universities, which are subject to government disclosure rules and freedom of information requests, the sports-marketing companies are privately held. That means the terms of the deals they strike don’t have to be publicly disclosed if the universities are not a party to the contracts.

Hey, don’t they know it’s “Caesars,” not “Caesar’s”? A bunch of ignoramuses there, is what we’ve got. In any case, can’t they just follow the path of their Big Ten rivals at Ohio State and get their ill-gotten gains via government grants for fraudulent research?

But, hey, it’s cool, all explained in Newspeak for you right here:

Mr. Schager, executive associate athletic director at Michigan State, described this benefit of the system.

“With the multimedia rights holder, public institutions like Michigan State no longer have to disclose all those sponsorship deals,” he said in an interview. “This helps with the sponsors being able to spend what they feel is appropriate without having the public or employees or stockholders question that investment.”

The Michigan State athletic department . . . What could possibly go wrong?

But then there’s this:

Some aspects of the deals also appear to violate the gambling industry’s own rules against marketing to underage people. The “Responsible Marketing Code” published by the American Gaming Association, the umbrella group for the industry, says sports betting should not be advertised on college campuses.

And this:

The University of Maryland, for example, has a partnership with the sports-gambling platform PointsBet. A university website links to a PointsBet page that entices new customers this way: “Get your first bets risk free up to $2000 + $100 in free bets.” The pitch means that if you lose your initial $2,000, PointsBet will let you make another $2,000 worth of complimentary bets. . . .

The University of Maryland! I was gonna say that they haven’t had any major scandal since 1986, but then just to check I googled *university of maryland athletic department scandal* and . . . yes, they’ve had major scandals since then.

And this doozy:

Cody Worsham, L.S.U.’s associate athletic director and chief brand officer, said in a statement that Caesars and the university “share a commitment to responsible, age-appropriate marketing.” That commitment, Mr. Worsham added, “is integral to a sustainable and responsible partnership benefiting our entire department, university, and fan base.” . . . At L.S.U., Caesars promotions downplay the risk of losing. In an email, gamblers were told they could bet “on all the sports you love right from the palm of your hand, and every bet earns more with Caesars Rewards — win or lose.”

LSU, huh? I guess they have some need of a “chief brand officer.”

This one’s pretty good too:

In 2020, Texas Christian University, in Fort Worth, joined WinStar World Casino and Resort to open a new club with suites and premium seating.

I haven’t kept up on which religions currently allow drinking, betting, and dancing. What next, rock ‘n’ roll?

I can’t wait till Columbia gets its own sports betting contract. It’s been a few years since Columbia’s been to a bowl game or the NCAA tournament, but we could always bet on movements in our U.S. News ranking or things like that. No possibilities for insider trading there, right??

P.S. That all said, I’m a Michigan football fan. Not a lot—I don’t really follow college sports at all—but a little, partly because my sister teaches at the University of Michigan and partly because my dad hated Woody Hayes. And I enjoy betting on sports from time to time. Betting is fun, in moderation. The thing that bugs me about these gambling companies is that their business model seems to be based on getting addicts to gamble more. As I wrote a few years ago, as a statistician I am pretty disgusted by articles that celebrate the use of statistics to rip people off. It’s much the same way that, if I were a programmer, I’d dislike articles that glamorize the hackers who scam people out of their passwords. Yes, statistics can be used for all sorts of bad ends and this should be vigorously reported. But not celebrated.

Update on the “burly coolie” and the “titled English woman”

The Journal of Political Economy published this correction:

There is an error in “Self-Control at Work” (Kaur, Kremer, and Mullainathan 2015), published in the October 2015 edition of this journal (vol. 123, no. 6). In section VI, on page 1274, the paper includes the following incorrect quote from a paper by Steven N. S. Cheung:

The second view—that joint production necessitates the need for monitoring (Alchian and Demsetz 1972)—is summarized in a story by Steven Cheung (1983, 8): “On a boat trip up China’s Yangtze River in the 19th Century, a titled English woman complained to her host of the cruelty to the oarsmen. One burly coolie stood over the rowers with a whip, making sure there were no laggards. Her host explained that the boat was jointly owned by the oarsmen, and that they hired the man responsible for flogging.”

While the incorrect quote also appears in other earlier sources, it does not appear in Cheung’s original article. [For example, the incorrect version of the quote also appears in Jensen et al. (1998).]

The accurate quotation from Cheung (1983, 8) is as follows:

My own favorite example is riverboat pulling in China before the communist regime, when a large group of workers marched along the shore towing a good-sized wooden boat. The unique interest of this example is that the collaborators actually agreed to the hiring of a monitor to whip them. The point here is that even if every puller were perfectly “honest,” it would still be too costly to measure the effort each has contributed to the movement of the boat, but to choose a different measurement agreeable to all would be so difficult that the arbitration of an agent is essential.

The inaccurate quote was included simply as a way to illustrate the idea that joint production might necessitate the need for monitoring. . . . However, the quote is in no way central to the core point of the paper, or even for the discussion in section VI of the paper. . . . Consequently, this incorrect quote can be omitted from the paper without any impact on the substance of the paper.

See here and here for background.

I’m glad the journal fixed the error.

Some loose ends remain. I’m still not clear how the authors of this article ended up taking an over-the-top description from an obscure collection of practice exam questions (Jensen et al., 1998). I also remain disturbed by the whole “titled English woman” thing, as it just fits in so well with economists’ attitude that they are superior to the sort of softies who think flogging is a bad thing. It’s also not clear how they attributed the story from a collection of practice exam questions to a specific page in the Cheung (1983) article. I’m not saying there was anything nefarious going on here, just that they were a bit too comfortable with a ridiculous story that confirmed their preconceptions.

To put it another way: If the original story was a good illustration of their general point, then it should make a difference that the original story was fabricated (or, to put it more charitably, was elaborated to a ridiculous degree), no? What does it mean to mistakenly use a false story as evidence? At the very least, it’s informative that the authors and others seemed to have believed the story was true. The whole thing’s still eating at me. It would be as if I wrote a political science paper and used a story from Game of Thrones to illustrate a point—but presented it as if Westeros were a real place. Using a made-up story as explanation is fine; just clearly label it as such, and maybe reflect on what it is about the story that’s so appealing to you.

Fields where it matters that “there’s no there there,” fields where you can thrive on B.S. alone, and everything in between

Seeing this desperate attempt by Tyler Cowen to cover for crypto scams (his list of “falls in status” includes silly items such as “Mrs. Jellyby,” bizarre items such as “Being unmarried (and male) above the age of 30,” and “Venture capital,” but, oddly enough, not “Crypto” itself) made me think that smart people are overrated. Let me put it this way: if you’re a smart astrologer, you’re still not gonna be able to do “real” astrology, which doesn’t exist. To say it slightly differently: it’s easy to promise things, especially if you have a good rep; you have to be careful not to promise things you can’t deliver. It doesn’t matter how smart James Watson’s friend was; he didn’t have that promised cancer cure in two years.

As the saying goes: Saying it don’t make it so. I could go around telling the world I had a solution to all the problems of MRP, and some people might believe me for a while—but I don’t have such a solution.

I can see how Cowen in his above-linked post doesn’t want to believe that crypto is fundamentally flawed—and maybe he’s right that it’s a great thing, it’s not like I’m an expert—but it’s funny that he doesn’t even consider that it might be a problem, given the scandal he was writing about.

All this got me thinking: in what fields of endeavor does it matter that you’re just B.S.-ing, and in what fields can you get away with it?

Sports: Chess cheating aside, if you don’t got it, you don’t got it. Public relations can get you endorsement contracts but not the coveted W. Yes, you can get lucky, but at the highest levels, only the best players can get lucky enough to win.

Science: You can have a successful scientific career based on a deft combination of B.S., promotion, and academic politics—just ask Trofim Lysenko, or Robert Sternberg—but you won’t be producing successful science. That said, you can do good science even with terrible theories: as I like to say, often the house is stronger than its foundations. I’ve heard that Isaac Newton learned a few real things even while trying in vain to convert lead into gold, and, at a much lower level, my colleagues and I have had spinoff successes from some of our overly-ambitious statistical failures.

Literature: Here, being smart, or inspired, will do the trick. Consider Philip K. Dick, who believed all sorts of weird things which he transmuted into wonderful literature.

Finance: This one’s somewhere in between. With a good line of B.S. you can do well for a long time, even until the end of your life (for example, Jack Welch); other times you’ll get caught out, as with the recent crypto scandal.

Often I think of this great line that Craig delivered to Phil one day in high school. They were arguing about something, and Craig concluded with, “You may be winning the argument, but that doesn’t mean you’re right. It just means you’re better at arguing.”

That was a good point. A good debater can persuade with a bad position. That doesn’t suddenly make the position correct. And sometimes it can be a bad thing to be too good a debater, or to be too insulated—personally or financially—from people who can present the opposite view. As discussed above, it depends on what field you’re working in.

The good news is that the pound is collapsing so this particular publishing scam is not as expensive as you might think!

A colleague received the following email:

*From:* ** <**@**>
*Date:* November 11, 2022 at 6:43:40 AM EST
*To:* **
*Subject:* *A podcast about – ***

Dear Dr. **,

My name is ** with **, based in the UK. I hope you do not mind my sending you a brief email referencing some of your recent work.

I’m a creative manager for **.org – a new podcast service for the research community that works with researchers, to connect their work to peers and a broad, engaged online audience.

** improves research impact and AltMetric scores. When it comes to improving impact and dissemination, podcasts released with a new publication or as part of a projects knowledge transfer strategy.

I believe our productions could have a real benefit to your work either now, or in the future.

**’s mission is covered in detail on our website:

https://www.**.org

We provide lots of information about how we work, the cost and benefits of working with us.You can also find past episodes and interviews we’ve conducted on topics ranging from heart surgery to media analysis.

Our Podcast offering is very straightforward, and requires only a little of your time:

– We request background details of the work to be covered within the podcast
– Our scientific script writers shall then create a 1500 word script based on this work, in a language the broader audience can understand and connect with
– We send the script back to you, for your approval
– We then edit, polish and record the podcast – using a range of professional voice actors
– The podcast is then released across the worlds leading podcast distributors
– We promote this content to a huge global audience, whilst linking the podcast release back to your most recent work
– You receive an Impact Report, breaking down listenership data, and how that’s translated into AltMetric/other impact scores

Would you consider this sort of service for your paper **?

As you can see, the production process is time-effective, yet professional and impactful.

Could we discuss the possibility of developing a Podcast to better help promote your ongoing work?

I would be delighted to discuss the process, along with the benefits a Podcast production will bring.

Thanks for your time, and I look forward to speaking again with you soon.

Best wishes,

**
Partnership Executive
**

T: +44 (0) **
W: https://www.**.org
E: **

IMPORTANT INFORMATION

This email may contain information that is privileged, confidential or otherwise protected from disclosure. It may not be used by, or its contents copied or disclosed to, persons other than the address(ees). If you have received this email in error please notify the sender immediately and delete the email.

Whilst ** has taken every reasonable precaution to minimise the risk of viruses, we cannot accept liability for any damage that is sustained as a result of software viruses which may be contained in this email.

If you do not wish to receive emails from ** in the future please reply to this email with ‘remove’ in the subject line.

Another colleague replied:

They forgot to mention the cost in the email:
£445 if they read the transcript and £990 if they interview you.

Oh, yeah, also the “some of my recent work” that they referred to is super-technical and could never possibly be the subject of a podcast. I guess this company is just scraping Arxiv.

I love that “Whilst” thing near the end, though. There’s something about the British language that just seems so classy!

The interaction between predatory journals and mainstream social science

From a recent post at Retraction Watch:

Authors should be very much aware of all aspects of publication ethics, which, despite their importance and career-threatening consequences, are rarely taught in any depth at even the most research-intensive universities. However, even if adequate training were given to all postgraduates as potential authors, many would still fall for predatory scams and may even be alerted to the attractiveness of guaranteed publication in a matter of days for just a few hundred dollars. . . .

The “just a few hundred dollars” thing reminded me of the ludicrously innumerate claim that scientific citations are worth $100,000 each. That statement (literally, “It’s possible to put actual monetary value on each citation a paper receives. We can, in other words calculate exactly how much a single citation is worth. . . . in the United States each citation is worth a whopping $100,000.”) was made by a mainstream social scientist—a professor at a legit U.S. university who has over 200,000 citations on Google Scholar.

If you were to literally believe the claim from that renowned scientist, then, yeah, a few hundred bucks for a publication is an absolute bargain. Of course that $100,000 number is a joke, good enough for a Ted talk or an NPR appearance or an article in PNAS but not serious science.

My point in juxtaposing these items is to point out the way in which the mainstream social science establishment provides intellectual cover, as it were, for predatory publishing.

Journalist and historian vs. economists: Is the 2022 economics Nobel prize well deserved or awful?

I don’t know anything about this at all; it’s just interesting to see some controversy:

Economist Paul Krugman: “a well-deserved Nobel that unfortunately remains relevant.”

Economist Tyler Cowen: “Ben is a broad and impressive thinker and researcher. This prize is obviously deserved. In my admittedly unorthodox opinion, his most important work is historical and on the Great Depression.”

Journalist John Authers: “Reactions so far suggest this cd be the most unpopular Nobel since Kissinger won the Peace Prize.”

Historian Adam Tooze: “It has the effrontery actually to celebrate one of the weakest dimensions of modern macroeconomic thinking . . . Not only is the award a gratuitous exercise in self congratulation on the part of a discipline that has huge difficulty in shoe-horning reality into its models. It is also untimely.”

“But shouldn’t we prefer these outside delusions . . .”: Malcolm Gladwell in a nutshell

I was reading a recent New Yorker and what should I come to but a Malcolm Gladwell article. With the same spirit that leads us to gawk at car crashes, I read it.

I gotta give Gladwell some credit for misdirection on this one. It was an article about corporate executive and financial fraudster Jack Welch, and in the magazine’s table of contents the article was listed as “General Electric’s legendary C.E.O.” The article itself was titled, “Severance: Jack Welch was the most admired C.E.O. of his day. Have we learned the right lessons from him?” And, right near the beginning of the article is the Gladwellesque line, “The great C.E.O.’s have an instinct for where to turn in a crisis, and Welch knew whom to call.”

“The great C.E.O.’s” . . . nice one! But then, as the article goes on, Gladwell ultimately gives it an anti-Welch spin, arguing that the famously ruthless executive had no values. An interesting twist, actually. As I said, a nice bit of misdirection, which the New Yorker kinda ruined in its online edition by changing the title to, “Was Jack Welch the Greatest C.E.O. of His Day—or the Worst? As the head of General Electric, he fired people in vast numbers and turned the manufacturing behemoth into a financial house of cards. Why was he so revered?” Kind of gives the game away, no?

“But shouldn’t we prefer these outside delusions . . .”

What really jumped out at me when reading this article, though, was not the details about Welch—some guy who had a talent for corporate infighting and was willing to cheat to get what he wanted—but this bit from Gladwell:

It has become fashionable to deride today’s tech C.E.O.s for their grandiose ambitions: colonizing Mars, curing all human disease, digging a world-class tunnel. But shouldn’t we prefer these outsized delusions to the moral impoverishment of Welch’s era?

This is horrible in so many ways.

First, there’s the empty, tacky, “It has become fashionable” framing. I can just imagine this dude when Copernicus came out with his ideas. “It has become fashionable to say that the Earth goes around the Sun. But . . .” Or, during the mid-twentieth century, “It has become fashionable to claim that cigarette smoking causes cancer. But . . .” Or, more recently, “It has become fashionable to claim that university officials should take responsibility when children are being sexually abused on campus. But . . .” Or, “It has become fashionable to argue that planes take off into the wind. But . . .”

I absolutely detest when writers take an idea they disagree with and label it as “fashionable,” as if it makes them adorable rogues to take the other side. What next, a hot take that Knives Out was really a bad movie? After all, it sold a lot of tickets and the critics loved it. It’s “fashionable,” right? Let me, right here, stake out the contrarian position that a take can be unfashionable, contrarian, and dumb.

And then the three examples: “colonizing Mars, curing all human disease, digging a world-class tunnel.” Which one does not belong, huh?

– “Colonizing Mars” may never happen, and it might be a bad idea even if it could happen (to the extent that such a hugely expensive project would take resources away from more pressing concerns), but it’s undeniably cool, and it’s bold. OK, the concept of colonizing Mars isn’t so bold—it’s century-old science fiction—but to actually do it, yeah, that would be awesome.

– “Curing all human disease”: that would be absolutely wonderful. I can only assume it’s an impossible goal, but it would be amazing to get just part of the way there, and there’s no logical reason that some progress can’t be made. I can see how this would appeal to a tech C.E.O., or to just about anyone.

– “Digging a world-class tunnel” . . . Huh? That’s not much of an ambition at all! World-class tunnels already exist! It’s hardly an “outsized delusion” to want to do this. All you need is a pile of money and a right-of-way. But . . . when referring to a “world-class tunnel” Gladwell couldn’t possibly be referring to this public relations stunt, could he?

Anyway, kind of revealing that he puts digging a tunnel in the same category as colonizing Mars or curing all human disease. I guess those Hyperloop press releases really worked on him!

In any case, the idea that “outsized delusions” are a good thing: it’s just kinda funny to hear this, but maybe not such a surprise coming from Gladwell. I was curious about his take on other executives with outsized delusions so I googled *gladwell theranos* and came across this interview where he answers a question about “The book I couldn’t finish”:

I [Gladwell] don’t finish books all the time. But the last book I couldn’t finish? I really, really wanted to finish John Carreyrou’s book on the Theranos scam, Bad Blood. But halfway through, I started saying to myself: “I get it! I get it! She made it all up!”

“Halfway through,” huh? I think all the other readers of that book caught on in the first few pages what was going on.

Gladwell sounds like the kind of guy who turns off the Columbo episode after 45 minutes because he’s finally figured out who the killer is.

How good are election prediction markets?

Forecasting elections using prediction markets has a theoretical appeal: people are betting their own money, so they are motivated to get things right. On the other hand, there’s been concern about the thinness of these markets, especially early in the campaign. Thin markets can be easier to manipulate, and when there’s not much action on the bets, the betting odds can be noisy. There’s also a concern that bettors just follow the news, in which case the betting odds are just a kind of noisy news aggregator.

Ultimately the question of the accuracy of betting odds compared to fundamentals-based and polls-based forecasts is an empirical question. Or, to put it another way, the results of empirical analysis will inform our theoretical understanding.

A quick summary of my understanding of past empirical work on election prediction markets:

1. For major races, markets are not perfect but they generally give reasonable results.

2. Markets fail at edge cases, consistently giving unrealistically high probabilities of extremely unlikely events.

3. It’s difficult-to-impossible to compare different forecasting approaches because the uncertainties in the outcomes of different races in a national election are highly correlated; in this sense, a national election is giving you only one data point for evaluation.

The best thing I’ve read on the topic is this article by Rajiv Sethi et al., “Models, Markets, and Prediction Performance.”

What happened in 2022?

The recent election featured some strong pre-election hype on markets, along with the usual poll-based forecasts. This time the polls were very accurate on average, while the markets were a bit off, predicting a Republican wave that did not happen. I’d be inclined to attribute this to bettors following the conventional wisdom that there would be a strong swing toward the out-party, which ultimately is a “fundamentals”-based argument that made a lot of sense a year ago or even six months ago but not so much in the current political environment with a highly partisan Supreme Court.

But I wanted to know what the experts thought, so I contacted two colleagues who study elections and prediction markets and sent them the following message:

Here are 4 stories for what happened in 2022:

1. Just bad luck. You can’t evaluate a prediction based on one random data point.

2. Overreaction to 2020. Polls overstated Democratic strength in the past election and bettors, like many journalists, did a mental adjustment and shifted the polls back.

3. Bettors in prediction markets have a distorted view of the world, on average, because they are more likely to consume conservative media sources such as Fox, 4chan, etc.

4. Prediction markets don’t really add information; bettors are regurgitating what they read in the news, and in 2022 the news media pundits were off.

What do you think?

The experts reply

David Rothschild:

– The comment section in PredictIt is dominated by the right, politically. Obviously comments are just a portion of traders, but this likely has some effect on some traders. This is especially important because PredictIt is constrained in trade size per person, so in some markets the price looks a little closer to the average trader than the marginal trader (i.e., very confident person cannot swoop in and correct biased market).

– Markets tend to converge towards polls late in the cycle, so while they provide information early in cycle and when information is breaking, final predictions in elections are heavily influenced by polls.

– Markets proved extremely good, faster than anything else, in incorporating information on Election Night.

Rajiv Sethi:

– Markets actually do very well early in the cycle; for example, they had Hobbs beating Lake for AZ GOV in early August. But anecdotes like this aren’t evidence. I also feel that the reaction to the PA debate on social media and the markets was absurd – I must have seen hundreds of tweets saying that the Fetterman staff and family should never have allowed him to debate, etc. But he was perfectly capable of making the decision himself, and made a good call that most people saw as courageous. But aside from PA I think the markets didn’t do badly; just the roll of the dice made them look bad this cycle.

From market fundamentalism to conspiracy theories

I was thinking about some of the above topics after reading this post by statistician Harry Crane, which ran a few days before the November 2022 elections:

When something doesn’t fit the official narrative: Regulate, legislate, or censor it out of existence. . . . It’s central to the Establishment’s strategy on a wide range of issues. Start looking and you’ll start to notice it just about everywhere. Here I focus on how it applies to elections. . . .

Who’s going to win the 2022 midterms? Specifically, which party will win control of the Senate?

According to the polls and pundits in the media, Democrats have the advantage. Voters are upset about Roe v. Wade. Democrats were 75% to win the Senate a couple weeks ago. Now it’s a toss up according to the forecasting website FiveThirtyEight.com.

But if you look at the prediction markets hosted at PredictIt — where savvy politicos risk real money on their opinions – you’ll see that the Republicans are 90% to win the House and 72% to win the Senate. . . .

So which is more accurate? As you’d probably expect, the markets are. . . .

Crane talked about how the prediction markets were favoring the Republican candidate from New Jersey running in Pennsylvania:

Within a few moments after Fetterman opened his mouth for the first time [in their televised debate], Oz shot up to 65% and stayed near that price for the rest of the debate and the week following . . . FiveThirtyEight has Fetterman leading the entire time. We’ll know in a week or so which was more accurate at predicting the Pennsylvania Senatorial race outcome.

We’ll know in a week, indeed. Seriously, though, for the reasons discussed earlier in this post, we shouldn’t take one year’s failure as a reason to discard markets; we should just recognize that markets are human constructs which, for both theoretical and practical reasons, can have systematic errors.

And then Crane went all-in on the conspiracy theorizing, with an image of “Thought Police” and the following text:

When I gave this example [the Pennsylvania Senate race] in a recent interview about the upcoming election, the reporter was disturbed. The interview concluded shortly thereafter. The article was never written.

These markets pose an existential threat to legacy media . . . controlling the narrative before an election is integral to controlling what happens afterwards. Could this be why the media and current administration are putting extra effort to destroy all credible alternatives to biased polling?

When someone is so committed to an idea that he posits conspiracy theories . . . that’s not good.

I conjecture that some of this represents a theoretical misunderstanding on Crane’s part, a bit of what’s called market fundamentalism, a lack of appreciation for the complexity of information flow and behavior. It’s complicated, because if you read his post, Crane is not saying that he knows that markets are better. He says that markets do better empirically, but as discussed above we don’t really have so many data points to assess that claim. So calling him a “fundamentalist” is a bit too strong. I guess it would be more accurate to say that Crane overstates the evidence in favor of the performance of betting markets and avoids looking at their problems, which puts him in the position of explaining the lack of dominance of markets in election forecasting by positing malevolent forces that suppress markets, rather than considering the alternative explanation that people are concerned about market manipulation (a topic as relevant nowadays as it’s ever been).

You might ask why I’m discussing that post at all. The short answer is that, no, I wasn’t looking to be trolled, nor was I searching the web for the most extreme thing posted by a statistician that week. Given that tens of millions of Americans believe outlandish conspiracy theories, it’s no surprise that some statistics professors are open to these ideas. I’d guess you could find quite a few believers in ghosts among the profession too, and even the occasional adherent of the hypothesis that smoking does not cause cancer.

Crane’s post interested me not so much for its conspiracy theorizing as much as for its ideological take on prediction markets. Crane loves prediction markets the way I love that Jamaican beef patty place and the way someone I know loves Knives Out. These are topics we just can’t stop talking about.

But let’s unpack this for a moment.

A prediction market, like any human institution, can be viewed as a practical instantiation of a theoretical ideal. For statisticians and economists, I think the starting point of the appeal of prediction markets comes from the theory. Betting is probability come to life, and betting on many related events induces a multivariate distribution. Real-life betting markets are too thin, too noisy, and have too many biases for this derive-the-distribution idea to really work, but it’s cool in theory. Indeed, even at the theoretical level you can’t be assured of extracting probabilities from markets, given possibilities such as insider trading and feedback. Anyway, seeing a post from someone who is such an extreme prediction-market fan gives us some sense of the appeal of these markets, at least for some segment of the technically-minded population.
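To see what “extracting probabilities” would look like in the simplest possible case, here’s a minimal sketch. The contract prices are invented, and real markets add fees, bid-ask spreads, and the thin-market problems just mentioned; the only idea being illustrated is dividing each contract’s price by the sum of the prices so the implied probabilities add to 1.

# Minimal sketch of backing implied probabilities out of prediction-market
# prices. Prices below are invented; real contracts carry fees, bid-ask
# spreads, and the thin-market biases discussed above.

def implied_probabilities(prices):
    """Normalize prices (dollars per $1 payout) so they sum to 1, removing
    the overround that comes from the market's margin."""
    total = sum(prices.values())
    return {outcome: price / total for outcome, price in prices.items()}

# Hypothetical two-way Senate-control market whose prices sum to more than $1:
prices = {"Republicans": 0.72, "Democrats": 0.31}
print(implied_probabilities(prices))  # {'Republicans': ~0.70, 'Democrats': ~0.30}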

Summary

My own views on prediction markets are mixed.

I like that there are election prediction markets and I get the points that Rothschild and Sethi make above about their value, especially when incorporating breaking news.

From the other direction, I would paradoxically say that I like markets to the extent that the bettors are doing little more than efficiently summarizing the news. I wouldn’t be so happy if market players are taking advantage of inside information; or using the markets to manipulate expectations; or, even worse, throwing elections in order to make money. I’m not saying that all these things are being done; I’m just wary of justifications of election markets that claim that bettors are adding information to the system. Efficient aggregation of public information would be enough.

I do like the idea of prediction markets for scientific replication because, why not? For me, it’s not so much about people “putting their money where their mouth is” but rather a way to get some quantification of replication uncertainty, in a world where Harvard professors are flooding the zone with B.S.

At the other extreme, no, I don’t favor the idea of a government-sponsored prediction market on terrorism run by an actual terrorist. In the abstract, I’m in favor of the rehabilitation of convicted criminals, but I have my limits.

Prediction markets are worth thinking about, and we should understand them in different contexts, not just as competition with the polls or as some idealized vision of free markets.

In research, when should you tie yourself to the mast?

In statistics, there are two things we know not to do:

1. Keep screwing around with your data and analysis until you get the answer that you want. This is called p-hacking or researcher degrees of freedom or forking paths, and it’s a known strategy for getting government grants, papers in PNAS, keynote talks at psychology conferences, Ted talks, NPR appearances, and . . . unreplicable results.

2. Use a flawed model and ride it all the way down to the inferno, never letting go of the reins even when the problems are obvious. That way lies the madness that is regression discontinuity analysis, so-called unbiased estimation, and, to be fair, Bayesian inference with really bad priors.

These came up in recent discussions of forecasting the recent congressional elections:

1. News organizations have been criticized for searching for data of any sort that would support their expectations of a Republican wave. In this election, tying yourself to the mast of the polls would’ve worked well, and “researcher degrees of freedom” allowed lots of the news media to simply reify their “vibes.”

2. From the other direction, some news organizations tied their hands too much by including problematic polls; most notably, fivethirtyeight.com was reluctant to let go of the notoriously undocumented Trafalgar polls. Not allowing yourself to change your analysis in midstream prevents some forms of p-hacking or the equivalent, but at the cost of allowing clear problems to just sit there in your analysis.
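To see why the first of those two statistical sins is such a trap, here’s a toy simulation of my own (made-up numbers, not any news organization’s actual analysis): an analyst studies pure noise, tries ten different analyses, and reports whichever one comes out “significant.” The nominal 5% false-positive rate becomes something closer to 40%.

# Toy simulation of forking paths: analyze pure noise many different ways and
# report the best-looking result. All numbers here are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, n_analyses = 2000, 50, 10
false_positives = 0

for _ in range(n_sims):
    # Null world: no true effect anywhere.
    treatment = rng.normal(size=(n_analyses, n_per_group))
    control = rng.normal(size=(n_analyses, n_per_group))
    # Try 10 "analyses" (think: subgroups, outcomes, covariate choices)
    # and keep the smallest p-value.
    pvals = [stats.ttest_ind(treatment[i], control[i]).pvalue for i in range(n_analyses)]
    if min(pvals) < 0.05:
        false_positives += 1

print(false_positives / n_sims)  # roughly 0.4, not the nominal 0.05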

No easy answers

There’s no easy answer here, and it’s something that Elliott, Merlin, and I had to deal with, back in 2020 when we found problems in our forecasting model—in the middle of the campaign, after our model’s first predictions had already been released. We bit the bullet and made some changes, deciding that, in this case, the problems with data-based model alteration were less than the problem of letting major known problems fester. Later on we found problems and more problems with our model but did not change it, not so much out of concern for forking paths as because fixing is itself not automatic and could introduce new issues.

We found a completely different set of problems for the fivethirtyeight.com forecast that year, and, as far as I know, they didn’t change their model either, a decision to stand pat which made sense for them for the same reason it made sense for us. Changing a model because it makes some bad predictions is a bit like changing a recipe if the dish doesn’t taste quite right: it can take a lot of trial and error, and if you’re not careful, the new version can be worse, so this sort of adjustment is not something you want to be doing in real time.

Statistician culture, journalism culture, and economist culture

It’s my impression that people in different fields weigh these concerns differently. In statistics we are typically concerned about fitting the data, and we’ll try out all sorts of diagnostics and model adjustments. Journalists tend to be even more flexible—for them, it’s all about the vibes!

Economists fall at the other extreme: they’re very aware of the problem of “specification searches” and they also tend to overrate naive theory (“unbiasedness”); this combination leads them to avoid touching their data even in cases when there are obvious problems (as in those curvy regression discontinuity fits that keep turning up).

We discussed another one of these examples a few years ago, comparing old-school baseball analysts Bill James and Pete Palmer. Palmer set up his formula and ran with it, whereas James followed a better (in my opinion) approach of fitting his models, taking them seriously, and then carefully considering the cases where the inferences from his models didn’t seem to make sense. Sometimes in those settings he’d stick with the model and point to reasons why natural intuitions were unfounded; other times he’d change his modeling approach.

Another area where this comes up is meta-analysis, where it’s just standard practice to include all sorts of irrelevant crap. When making chili, if you include enough different flavors, you can end up with something delicious. I don’t think this works with scientific research summaries. Two notorious examples are the ivermectin meta-analysis and the “nudge” meta-analysis that included 11 papers by the disgraced food-behavior researcher Brian Wansink. “Include everything you can find” might seem like a recipe for rigorous, unbiased science, but it doesn’t really work that way if the individual ingredients are spoiled.
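Here’s a toy version of the spoiled-ingredients problem, with invented numbers rather than the actual ivermectin or nudge data: an inverse-variance pooled estimate of five honest null studies sits near zero, but adding two biased studies with inflated effects drags the pooled estimate to something that looks like a solid finding.

# Toy fixed-effect meta-analysis with made-up numbers: five unbiased null
# studies plus two biased studies with inflated effects.
import numpy as np

def pooled_estimate(effects, ses):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its s.e."""
    w = 1.0 / np.asarray(ses) ** 2
    est = np.sum(w * np.asarray(effects)) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))

clean_effects, clean_ses = [0.02, -0.03, 0.01, 0.00, -0.01], [0.05] * 5
spoiled_effects = clean_effects + [0.45, 0.50]   # two "spoiled" studies
spoiled_ses = clean_ses + [0.05, 0.05]

print(pooled_estimate(clean_effects, clean_ses))      # near zero
print(pooled_estimate(spoiled_effects, spoiled_ses))  # pulled toward a "real" effect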

There’s no clear rule for when to accept the inferences and when to question them, but it’s part of the scientific process—just ask Lakatos.

P.S. In a recent post, Nate defends his inclusion of the notorious Trafalgar poll by saying:

It’s not quite a matter of me “taking things into consideration”. The pollster ratings are determined by a formula, not my subjective view of how much I like a pollster. But since Trafalgar had an awful 2022, they’re going to do much worse once the ratings are recalculated.

That’s fine, but, really, his decision to include Trafalgar in the first place is “subjective,” as is his choice of formula, as is his decision to use a formula here in the first place. It’s turtles all the way down, dude.

To put it another way: Yes, you can choose to tie yourself to the mast, but, if so, you’ve chosen to tie yourself to the mast, and other choices are possible—as becomes clear when you consider all the masts you’ve untied yourself from as necessary.

Which one of these will be the biggest “unicorn” failure ever?

Jeffrey Lee Funk and Gary Smith have a list of startups with $3 billion or more in cumulative losses:

Company                 Founded   Funds Raised    Cumulative Losses
Uber Technologies       2009      $25.2 billion   $31.7 billion
WeWork                  2010      $21.9 billion   $20.7 billion
Teladoc Health          2002      $0.17 billion   $11.2 billion
Rivian Automotive       2009      $10.7 billion   $11.1 billion
Snap                    2011      $4.9 billion    $9.1 billion
Lyft                    2012      $4.9 billion    $8.9 billion
Airbnb                  2008      $6.0 billion    $6.0 billion
Palantir Technologies   2003      $3.0 billion    $5.8 billion
Ginkgo Bioworks         2009      $0.8 billion    $4.8 billion
DoorDash                2013      $2.5 billion    $4.6 billion
Invitae                 2010      $2.0 billion    $4.4 billion
Nutanix                 2009      $1.1 billion    $4.3 billion
Robinhood               2013      $6.2 billion    $4.2 billion
Bloom Energy            2001      $0.83 billion   $3.3 billion
Wayfair                 2002      $1.7 billion    $3.0 billion

Funk and Smith write:

Only one of these 15 companies has ever had a profitable quarter — Airbnb had a $378 million profit on $2.1 billion in revenue in the second quarter of 2022. All of the other startups in the table have recent losses that exceed 10% of revenue and most exceed 30%.

Any hopeful arguments that profitability is just around the corner ring hollow when every company is at least nine years old and two are more than 20 years old.

As we said in our post, Theranos built a unicorn, and we just built a better horse. You can get more money for a unicorn, even though—or especially because—unicorns don’t exist.

There’s something weird about the whole “unicorns” thing, just as it’s weird when people refer to some idea they’re promoting as being “magic.” I get that expressions such as “unicorns” and “magic” are supposed to be metaphors, but all too often they come uncomfortably close to the truth in that they’re describing stories that are not, in fact, real.

Kind of like when people use the term “incredible” for stories that are literally not credible.

A more positive way of looking at a so-called unicorn is that it’s a gamble. Someone has an unproven idea that maybe has just a 1% chance of succeeding, but if it does succeed it might make it big. So it can be rational to invest in it.

But then here’s the thing—if there really is just a 1% chance of succeeding, then the vast majority of these should . . . fail! The first part of “high risk, high reward” is “high risk.” From that standpoint, what’s impressive is that the unicorns on the above list are still standing. I guess what Funk and Smith are saying is that we should think of these as companies that haven’t failed . . . yet.
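Here’s the gamble in back-of-the-envelope form, with made-up numbers: give each startup a 1% chance of a 200x payoff and the expected value of a 15-company portfolio is positive, yet in most simulated portfolios every single company fails.

# Toy venture-portfolio simulation with invented numbers: each startup costs
# 1 unit and has a 1% chance of returning 200 units, otherwise 0.
import numpy as np

rng = np.random.default_rng(1)
n_portfolios, n_startups, p_success, payoff = 100_000, 15, 0.01, 200

successes = rng.binomial(n_startups, p_success, size=n_portfolios)
returns = successes * payoff - n_startups  # net of the 15 units invested

print((successes == 0).mean())   # ~0.86: most portfolios see every startup fail
print(returns.mean())            # ~15: positive expected value even so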

I’ve always assumed that the business plan of some of these companies was to become so useful to the decision-making class (business executives, government officials, and rich people) that they’d ultimately get some sort of government support.

History of time series forecasting competitions

Here are a couple of posts from Rob Hyndman in 2017 and 2018 that remain interesting, and not just for time series:

M4 Forecasting Competition:

The “M” competitions organized by Spyros Makridakis have had an enormous influence on the field of forecasting. They focused attention on what models produced good forecasts, rather than on the mathematical properties of those models. . . .

Makridakis & Hibon (JRSSA 1979) was the first serious attempt at a large empirical evaluation of forecast methods. It created heated discussion, and was followed by the M-competition comprising 1001 time series that participants were invited to forecast. The results were published in Makridakis et al (JF 1982).

The M2 competition focused on different issues and involved a much smaller data set, but with richer contextual information about each series.

Twenty years after the first competition, the M3 competition was held, involving 3003 time series. . . .

Now, almost 20 years later again, Makridakis is organizing the M4 competition. Details are available at https://mofc.unic.ac.cy/m4/. . . . I [Hyndman] am pleased to see that this new competition involves two additions to the previous ones . . . It does not appear that there will be multiple submissions allowed over time, with a leaderboard tracking progress (as there is, for example, in a Kaggle competition). This is unfortunate, as this element of a competition seems to lead to much better results. See my paper on The value of feedback in forecasting competitions with George Athanasopoulos for a discussion. . . .

A brief history of time series forecasting competitions:

Prediction competitions are now so widespread that it is often forgotten how controversial they were when first held, and how influential they have been over the years. . . . The earliest non-trivial study of time series forecast accuracy was probably by David Reid as part of his PhD at the University of Nottingham (1969). Building on his work, Paul Newbold and Clive Granger conducted a study of forecast accuracy involving 106 time series . . .

Five years later, Spyros Makridakis and Michèle Hibon put together a collection of 111 time series and compared many more forecasting methods. They also presented the results to the Royal Statistical Society. The resulting JRSSA (1979) paper seems to have caused quite a stir, and the discussion published along with the paper is entertaining, and at times somewhat shocking. . . .

Maurice Priestley was in attendance again and was clinging to the view that there was a true model waiting to be discovered:

The performance of any particular technique when applied to a particular series depends essentially on (a) the model which the series obeys; (b) our ability to identify and fit this model correctly and (c) the criterion chosen to measure the forecasting accuracy.

Makridakis and Hibon replied:

There is a fact that Professor Priestley must accept: empirical evidence is in disagreement with his theoretical arguments.

Many of the discussants seem to have been enamoured with ARIMA models.

It is amazing to me, however, that after all this exercise in identifying models, transforming and so on, that the autoregressive moving averages come out so badly. I wonder whether it might be partly due to the authors not using the backwards forecasting approach to obtain the initial errors. — W.G. Gilchrist

I find it hard to believe that Box-Jenkins, if properly applied, can actually be worse than so many of the simple methods — Chris Chatfield

Then Chatfield got personal:

Why do empirical studies sometimes give different answers? It may depend on the selected sample of time series, but I suspect it is more likely to depend on the skill of the analyst . . . these authors are more at home with simple procedures than with Box-Jenkins. — Chris Chatfield

Again, Makridakis & Hibon responded:

Dr Chatfield expresses some personal views about the first author . . . It might be useful for Dr Chatfield to read some of the psychological literature quoted in the main paper, and he can then learn a little more about biases and how they affect prior probabilities.

Snap!

Hyndman continues:

In response to the hostility and charge of incompetence, Makridakis & Hibon followed up with a new competition involving 1001 series. This time, anyone could submit forecasts, making this the first true forecasting competition as far as I am aware. They also used multiple forecast measures to determine the most accurate method.

The 1001 time series were taken from demography, industry and economics, and ranged in length between 9 and 132 observations. All the data were either non-seasonal (e.g., annual), quarterly or monthly. Curiously, all the data were positive, which made it possible to compute mean absolute percentage errors, but was not really reflective of the population of real data.
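For reference, here’s what that accuracy measure looks like in a minimal sketch (the series values below are invented): the mean absolute percentage error divides each forecast error by the actual value, which is why it only makes sense when the data are strictly positive.

# Minimal sketch of the mean absolute percentage error (MAPE) used to score
# forecasts in the M-competitions. The series values are invented.
import numpy as np

def mape(actual, forecast):
    """MAPE = mean(|actual - forecast| / |actual|) * 100.
    Only sensible when the actuals are strictly positive, which is why the
    all-positive competition data made this measure usable."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(np.abs(actual - forecast) / np.abs(actual))

actual = [112, 118, 132, 129, 121, 135]
naive = [110, 112, 118, 132, 129, 121]   # naive forecast: last observed value
print(round(mape(actual, naive), 1))      # about 6%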

The results of their 1979 paper were largely confirmed. The four main findings (taken from Makridakis & Hibon, 2000) were:

1. Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.

2. The relative ranking of the performance of the various methods varies according to the accuracy measure being used.

3. The accuracy when various methods are being combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods.

4. The accuracy of the various methods depends upon the length of the forecasting horizon involved.

The paper describing the competition (Makridakis et al, JF, 1982) had a profound effect on forecasting research. It caused researchers to:

– focus attention on what models produced good forecasts, rather than on the mathematical properties of those models;

– consider how to automate forecasting methods;

– be aware of the dangers of over-fitting;

– treat forecasting as a different problem from time series analysis.

These now seem like common sense to forecasters, but they were revolutionary ideas in 1982.

I don’t quite understand the bit about treating forecasting as a different problem from time series analysis. They sound like the same thing to me!

In any case, both of these posts by Hyndman were interesting: lots of stuff there that I haven’t ever really thought hard about.

The discount cost paradox?

There must be some econ literature on this one . . .

Joseph Delaney writes that telehealth has been proposed as a solution to ER wait times, but it hasn’t worked so well in practice. Among other things, there are problems with call-center culture:

It made me [Delaney] think of the time this month that there were reports of the 911 number in Toronto asking for call back numbers after a medical emergency. Even if overstated, it really does bring to light the key problem with telehealth—that call center culture is famously customer-hostile.

A number of years back I had a problem with my cable company. Like many foolish persons, I called the cable company and spent 2 hours on hold. After I was told that nothing could be done, I asked if there was anybody I could speak with that had more authority to deal with the issues. I was then placed on hold again. Several hours later a message played saying that the call center was closing and disconnected me. This was an infuriating experience and there was simply no accountability even possible. So the next day I made the long trek to the customer service center, waited in line for about an hour, and then had the problem actually fixed. No part of this experience made me like the company more, but the call experience was terrible.

Recently, I have been constantly hearing “call volumes are unexpectedly high” recordings every time I call a place like a bank or the University travel agent. As a person who once worked in customer service telemarketing call volume forecasting, I even tried times and days that are notoriously light for call volume. No luck.

So the central challenge of telehealth is how to break with the cost-cutting culture that values customer wait times at zero (or even seems to see them as a good thing). You can only redirect from the Emergency Room via telehealth triage if it is relatively quick (let us say an hour, maximum). You get no triage credit at the ER for having called telehealth, so if the answer is “go to the ER” but you have lost 4 hours on the phone, that is going to quickly teach everyone not to call telehealth lines.

With pediatric ERs reporting wait times as long as 15 hours, you can see the value of telehealth if it can keep children out of the queue and free up capacity. But that really requires that it be agile (why wait 4 hours as a prelude to waiting 15?) and able to do things like prescribe. I know that RSV is in an atypically severe phase, but at some point the default needs to be that there are a lot of respiratory viruses running around and we should plan around that.

That gets me to my last pet peeve about telemedicine, which is that you need to be able to provide helpful interventions. In a recent covid burst, I had a family member use a telemedicine provider to ask about paxlovid, only to be told that it could not be prescribed by phone but required an in-person visit. Yes, the plan really was for the infectious person to sit in the waiting room of a walk-in clinic for hours so that the prescription could be written by a person able to see the patient. Now, whether or not treating covid with paxlovid was a good idea is a different question, but the issue is that these policies make calling first seem like a bad plan: you have waited hours for an answer, which makes it much less likely that you can successfully get seen at a walk-in clinic with a time-sensitive health issue.

Which is the opposite of what you want people to do, frankly.

Without solving these cultural issues of how we treat in-calls and how we treat patients, we are not going to be able to really move the demand side for ERs.
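Delaney’s timing point can be put in rough expected-wait terms; all the numbers below are invented for illustration. With a 15-hour ER wait and only a modest chance that the phone call resolves the problem, a 4-hour telehealth queue makes calling first worse in expectation than just going in, while a 1-hour triage call makes it clearly better.

# Back-of-the-envelope expected waits (hours), with invented numbers, to
# illustrate why slow telehealth triage with no ER "credit" backfires.
er_wait = 15.0     # wait if you go straight to the ER
tele_slow = 4.0    # hours to get an answer from a slow telehealth line
tele_fast = 1.0    # hours for a quick triage call
p_diverted = 0.2   # chance the call resolves things without an ER visit

print(er_wait)                                 # 15.0: go straight to the ER
print(tele_slow + (1 - p_diverted) * er_wait)  # 16.0: slow call makes it worse
print(tele_fast + (1 - p_diverted) * er_wait)  # 13.0: quick triage actually helps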

Beyond the special difficulties of medicine, this all reminded me of a well-known problem with parole and probation. Given modern technology, these should be more effective than prison and much much cheaper, but they get such low funding that parole and probation officers are overwhelmed. It’s a disturbing paradox that when an alternative solution is cheaper, it gets underfunded so as to not be effective.

That all said, my own health plan’s telehealth system has been working well for me. So I think it can work well if the financial incentives are in the right place.

With the prison thing, I guess the problem is that there are political incentives to spend lots of $ on prisons, not such incentives for parole/probation. Also there may be legal reasons why it’s easy to lock people up but not so easy to implement effective parole/probation.

What do economists say about this?

So here’s my question. Is this a general thing? The idea that when there’s a cheaper and more effective solution out there, it gets done too cheaply and then it’s no longer effective? This would seem to violate some Pareto-like principles of economics, but it happens often enough that I’m thinking there must be some general explanation for it?

Update on the fake story about the river laborers paying people to whip them

As you might remember from a few months ago, there was a story going around that some economists just looooved to tell. The story had all sorts of attributes that you might expect would make economists happy, including a paradox in which apparently bad behavior (whipping people to get them to work harder) was actually good, a subplot in which a do-gooder from the outside just doesn’t understand the real world, and an ethnic slur! What more could you possibly ask for?

Well, you could ask that, if the story is being reported as true, there be some evidence it actually happened.

Below are a few versions of the story.

From Steven Cheung in 2018:

In 1970, Toronto’s John McManus was my guest in Seattle. I chatted to him about what happened when I was a refugee in wartime Guangxi. The journey from Liuzhou to Guiping was by river, and there were men on the banks whose job was to drag the boat with ropes. There was also an overseer armed with a whip. According to my mother, the whipper was hired to do just that by the boatmen! My tale went the rounds, and it was seized by a number of neo-institutional economists. . . . However, this could be a story invented by my mother – the smartest person I have ever known – to entertain a boy of seven!

From Michael Munger in 2018:

There’s a famous example in China, where a group of coolies … have to pull a barge up the Yangtze River … There’s a trade-off … how do you make the 30 guys work hard? The insight of the team production problem is we need … division of labor. … If I’m pulling, I can’t spend my time watching you and you can’t spend your time watching me. We’ll create a new job, called the monitor. … We’ll give the monitor a whip. Now this looks like slavery. The great thing about this, and this is from an article by an economist named Steven NS Cheung. He found that this guy with a whip—and this is the most incredible thing Russ!—this guy with a whip was hired and paid by the coolies!

Interesting how, when a story is described as “incredible,” it often is literally not credible.

Clement and McCormick (1989):

[the] famous Chinese boatpullers fable [where] the monitor uses his vision, intuition, and experience to determine shirking, counseling the loafers with his whip.

Super-clever and counterintuitive to describe whipping as “counseling”: that’s the kind of outside-the-box thinking that will take you far in academia.

Supreet Kaur, Michael Kremer, and Sendhil Mullainathan (2015):

. . . in a story by Steven Cheung (1983, 8): “On a boat trip up China’s Yangtze River in the 19th Century, a titled English woman complained to her host of the cruelty to the oarsmen. One burly coolie stood over the rowers with a whip, making sure there were no laggards. Her host explained that the boat was jointly owned by the oarsmen, and that they hired the man responsible for flogging.”

This one’s notable in that it puts an entire passage in quotes, but it turns out the passage was not in Cheung (1983) at all—not on page 8 or anywhere else in that article—it actually came from a 1998 collection of old Harvard Business School exam questions.

It’s an awesome transposition in that it adds the “titled English woman,” who is a stand-in for all those soft-headed non-economists out there who aren’t willing to think the unthinkable etc., and also it changes various details; for example, the people pulling the boat became oarsmen, and of course the word “coolie” was introduced. I guess he had to be “burly,” what with all the flogging he had to do! In a particularly modern touch, the host was there to “explain” the principles of Econ 101 to the silly lady.

And what about the woman being “titled”—that particular elaboration seems gratuitous, no? But it fits in well with economists’ view of themselves as tribunes of the people, consumer sovereignty and all that. The story wouldn’t work so well if the person who “complained” was herself poor and maybe had some experience with physical pain.

On the plus side, Kaur et al. issued a correction:

While the incorrect quote also appears in other earlier sources, it does not appear in Cheung’s original article. . . . The inaccurate quote was included simply as a way to illustrate the idea that joint production might necessitate the need for monitoring. . . . However, the quote is in no way central to the core point of the paper, or even for the discussion in section VI of the paper. . . . this incorrect quote can be omitted from the paper without any impact on the substance of the paper.

I think they’re right—the paper would’ve been just fine without the quote—it’s just funny that 3 authors and some number of reviewers and journal editors all read the paper, and none of them noticed how ridiculous the story was.

And just one more thing. Let’s accept that the substance of the paper is unaffected by the whipped-boatmen story. But, the fact that many economists have presented the story as true, even though it’s actually the result of a series of wild embellishments from an initially speculative source . . . that’s interesting, right? I don’t have it in me to write an article on the topic that would pass muster at the Journal of Political Economy (or any journal at all!), but I think there’s something there!

To put it another way: if the whipped-boatmen story is so consistent with the substantive message of the paper that it was included as an example to demonstrate the theory (as the authors put it in their rejoinder, “The inaccurate quote was included simply as a way to illustrate the idea that joint production might necessitate the need for monitoring.”), and then it turns out the story is false, then maybe this should cast some doubt on said theory? Maybe joint production might not necessitate the need for monitoring as often as they think? At least not in the sense of “counseling the loafers with his whip”?

You can read for free but comments cost money . . . or is it the other way around?

A correspondent who might want to remain anonymous (if not, he can reply in comments) writes:

You really do need to have the courage of my convictions and you might make some profit on the way. I am writing about the latest business model whereby one not only has to subscribe, i.e., pay for what was once free, but that only gets one in the door and not into the inner sanctums. The NYT has features that once were available—in the distant past of a few months ago—as part of the user’s subscription but now require an additional ponying up.

Consequently, you ought to do the same with your blog and offer some sort of tiered entitlement. An ordinary contributor will be allowed to comment on Wansink or David Brooks but would need to pay a nominal/exorbitant fee to post anything about novelists or the Hoover Institution; in between might be Harvard or the misuse of priors. Perhaps I have the hierarchy upside down, but you get the idea.

If scientific citations are worth $100,000 each, how much should I be charging for blog comments? Or how much should I be paying for them? It’s never clear which direction the payment should be going.