On the ethics of pollsters or journalists or political scientists betting on prediction markets

There’s been a bit of concern lately about political consultants or pundits offering some mix of private and public forecasting and advice, and also making side bets on elections. I don’t know enough about these stories to comment on them in any useful way. Instead I’ll share my own perspectives regarding betting on elections.

In June 2020 I wrote a post about an opportunity in a presidential election prediction market—our model-based forecast was giving Biden an 84% chance of winning the election, whereas the market’s implied odds were 53% for Biden, 40% for Trump, 2% for Mike Pence, 2% for Hillary Clinton (!), and another few percent for some other possible longshot replacements for Biden or Trump.

Just to be clear: Those betting odds didn’t correspond to Biden getting 53% of the vote, they corresponded to him having a 53% chance of winning, which in turn basically corresponded to the national election being a tossup.

I thought seriously about laying down some $ on Biden and then covering it later when, as anticipated, Biden’s price moved up.

Some people asked why I wasn’t putting my money where my mouth was. Or, to put it another way, if I wasn’t willing to bet on my convictions, did I really believe in my own forecast? Here’s what I wrote:

I agree that betting is a model for probability, but it’s not the only model for probability. To put it another way: Yes, if I were planning to bet money on the election, I would bet using the odds that our model provided. And if I were planning to bet a lot of money on it, I guess I’d put more effort into the forecasting model and try to use more information in some way. But, even if I don’t plan to bet, I can still help to create the model as a public service, to allow other people to make sensible decisions. It’s like if I were a chef: I would want to make delicious food, but that doesn’t mean that I’m always hungry myself.

Ultimately I decided not to bet, for a combination of reasons:

– I didn’t quite know how to do it. And I wasn’t quite sure it would be legal.

– The available stakes were low enough that I couldn’t make real money off it, and, if I could’ve, I would’ve been concerned about the difficulty of collecting.

– The moral issue that, as a person involved in the Economist forecast, I had a conflict of interest. And, even if not a real conflict of interest, a perceived conflict of interest.

– The related moral issue that, to the extent that I legitimately am an expert here, I’m taking advantage of ignorant people, which doesn’t seem so cool.

– Asymmetry in reputational changes. I’m already respected by the people that matter, and the people who don’t respect me won’t be persuaded by my winning some election bets. But if I lose on a public bet, I look like a fool. See the last paragraph of section 3 of this article.

Also there’s my article in Slate on 19 lessons from the 2016 election:

I myself was tempted to dismiss Trump’s chances during primary season, but then I read that article I’d written in 2011 explaining why primary elections are so difficult to predict (multiple candidates, no party cues or major ideological distinctions between them, unequal resources, unique contests, and rapidly changing circumstances), and I decided to be careful with any predictions.

The funny thing is that, in Bayes-friendly corners of the internet, some people consider it borderline-immoral for pundits to not bet on what they write about. The idea is that pundits should take public betting positions with real dollars cos talk is cheap. At the same time, these are often the same sorts of people who deny that insider trading is a thing (“Differences of opinions are what make bets and horse races possible,” etc.) It’s a big world out there.

Real-world prediction markets vs. the theoretical possibility of betting

Setting aside the practical and ethical problems in real-world markets, the concept of betting can be useful in fixing ideas about probability. See for example this article by Nassim Taleb explaining why we should not expect to see large jumps up and down in forecast probabilities during the months leading up to the event being forecast.

This is a difficult problem to wrestle with, but wrestle we must. One challenge for election forecasting comes in the mapping between forecast vote share and betting odds. Small changes in forecast vote shares correspond to big changes in win probabilities. So if we want to follow Taleb’s advice and keep win probabilities very stable until shortly before the election (see figure 3 of his linked paper), then that implies we shouldn’t be moving the vote-share forecast around much either. Which is probably correct, but then what if the forecast you’ve started with is way off? I guess that means your initial uncertainty should be very large, but how large is reasonable? The discussion often comes up when forecasts are moving around too much (for example, when simple poll averages are used as forecasts, then predictable poll movements cause the forecasts to jump up and down in a way that violates the martingale property that is required of proper probability forecasts), but the key issue comes at the starting point.
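The vote-share-to-win-probability leverage is easy to see numerically. Here’s a minimal sketch in Python under a toy normal forecast model (the 50% threshold and the standard errors are illustrative, not anyone’s actual forecast): a year out, when forecast uncertainty is large, a one-point shift in expected vote share barely moves the win probability, but near the election, when uncertainty is small, the same shift swings it enormously.

```python
import math

def win_prob(mu, sd, threshold=0.5):
    """P(vote share > threshold) when the forecast is normal(mu, sd)."""
    return 0.5 * (1 + math.erf((mu - threshold) / (sd * math.sqrt(2))))

# A year out (sd = 0.06), a one-point shift barely moves the win probability:
print(round(win_prob(0.51, 0.06), 2))  # 0.57
print(round(win_prob(0.52, 0.06), 2))  # 0.63

# Election eve (sd = 0.01), the same one-point shift is dramatic:
print(round(win_prob(0.51, 0.01), 2))  # 0.84
print(round(win_prob(0.52, 0.01), 2))  # 0.98
```

This is why a stable win probability, as Taleb recommends, requires a stable vote-share forecast: near the end, even tiny vote-share wobbles translate into large probability jumps.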

So, what about future elections? For example, 2024. Or 2028. One issue that came up with 2020 was that everyone was pretty sure ahead of time, and also correct in retrospect, that the geographic pattern of the votes was aligned so that the Democrats would likely need about 52% of the vote to win the electoral college. So, imagine that you’re sitting in 2019 trying to make a forecast for the 2020 presidential election, and, having read Taleb etc., you want to give the Democrats something very close to a 50/50 chance of winning. That would correspond to saying they’re expected to get 52% of the popular vote. Or you could forecast 50% in the popular vote, but then that would correspond to a much smaller chance of winning in the electoral college. For example, say your forecast popular vote share, a year out, is 0.50 with a standard error of 0.06 (so the Democratic candidate has a 95% chance of receiving between 38% and 62% of the two-party vote); then the probability of them winning at least 52% is pnorm(0.50, 0.52, 0.06) = 0.37. On the other hand, it’s been a while since the Republican candidate has won the popular vote . . . You could give arguments either way on this, but the point is that it’s not so clear how to express high ignorance here. To get that stable probability of the candidate winning, you need a stable predicted vote share, and, again, the difficulty here isn’t so much with the stability as with what your starting point is.
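To check the arithmetic in the previous paragraph (a sketch in Python; the post’s pnorm is R’s normal CDF, reimplemented here):

```python
import math

def pnorm(x, mean=0.0, sd=1.0):
    # Normal CDF, matching R's pnorm(x, mean, sd)
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# P(vote share >= 0.52) under normal(0.50, 0.06), written via the symmetry
# used in the post:
print(round(pnorm(0.50, 0.52, 0.06), 2))      # 0.37
# The same number computed directly as an upper-tail probability:
print(round(1 - pnorm(0.52, 0.50, 0.06), 2))  # 0.37
# And the implied 95% interval for the vote share, roughly 38% to 62%:
print(0.50 - 2 * 0.06, 0.50 + 2 * 0.06)
```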


Thinking about hypothetical betting odds can be a useful way to understand uncertainty. I’ve found it helpful when examining concerns with my own forecasts (for example here) as well as identifying problems with forecast-based probability statements coming from others (for example here and here).

Actual betting is another story. I’m not taking a strong moral stance against forecasters also making bets, but I have enough concerns that I’m not planning to do it myself. On the other hand, I’m glad that some low-key betting markets are out there; they provide some information even if not as much as sometimes claimed. Rajiv Sethi discusses this point in detail here and here.

Free Bayesian Data Analysis course

We’re organizing a free Bayesian Data Analysis course targeted at the Global South and other underrepresented groups. This is the third rendition of the BDA GSU course. Please see more information and the link to the registration form on the course web page.

The course is based on the BDA3 book and the BDA course at Aalto. All course material is freely available.

This is not the easiest Bayes course. The registration form requires you to answer some prerequisite questions. The web page has recommendations for easier material.

As all the material is free, you can choose to study at your own pace. We recommend registering and following the common schedule to benefit from the support of your peers and TAs.

If you want to volunteer to be a TA for the course, the course web page also has a link to TA registration.

The head TA is Meenal Jhajharia, who also did a great job in 2022.

The course is supported by the Stan governing body and Numfocus, and Eduflow is supporting us by providing a free license for the Peergrade tool.

Disappointing lack of institutional memory at the New Yorker

I was reading an interesting book review in the New Yorker and then came across this surprising bit:

The idea that ordinary life can be the subject of great art has long been accepted when it comes to poetry and literary fiction—in these genres, its status as a worthy subject feels self-evident—but it can still raise hackles in creative nonfiction. An invented life can be ordinary, but an actual life had better be seasoned by either extraordinary suffering or particular achievement.

I was stunned to see this claim, in the New Yorker, without any reference to classic New Yorker writers A. J. Liebling and Joseph Mitchell, who built much of their careers on writing about local characters. Mitchell’s the one who had the famous line celebrating the ordinary people who were the subjects of his reporting: “There are no little people in this book. They are as big as you are, whoever you are.” We could also add St. Clair McKelway, who wrote a long report about a charming down-and-out guy who counterfeited $1 bills.

I can understand that a particular writer might not be aware of these once-celebrated but now-obscure figures in literary journalism. But I was disappointed that whoever was editing the piece at the New Yorker didn’t notice. Liebling is one of my favorite writers!

Maurice Sendak vs. Steve McQueen; Wilder advances

We got some tough takes here. Dzhaughn points out that Wilder’s reputation has literally cratered, but Raghu points out:

If you invite Stigler, you’re just going to get a talk someone else has given before.

That’s not literally true—Stigler is an excellent speaker irl—but as a Greatest Seminar Speaker argument it’s good. So Laura will move on to face Malcolm in the next round.

Today’s matchup

Two celebrated artists, each a bit of an outsider who was nonetheless accepted into the mainstream. A Brooklyn-born cartoonist or a London-born director. Your choice!

Again, here are the announcement and the rules.

Alberto Cairo’s visualization course

Alberto Cairo writes:

Every semester I [Cairo] teach my regular introduction to information design and data visualization class. Most students are data scientists, statisticians, engineers, interaction designers, plus a few communication and journalism majors.

At the beginning of the semester, many students are wary about their lack of visual design and narrative skills, and they are often surprised at how fast they can improve if they are willing to engage in intense practice and constant feedback. I’m not exaggerating when writing “intense”: an anonymous former student perfectly described the experience of taking my class in RateMyProfessors: “SO. MUCH. WORK”.

Indeed. The only way to learn a craft is to practice the craft nonstop.

My classes consist of three parts:

First month: lectures, readings, discussions, and exercises to master concepts, reasoning, and software tools. I don’t grade these exercises, I simply give credit for completion, but I hint at what grades students would receive if I did grade them.

Second month: Project 1. I give students a general theme and a client. This semester I chose The Economist magazine’s Graphic Detail section, so a requirement for the project was that students try to mimic its style. Once a week during this second month I give each student individualized advice on their progress prior to the deadline. I give most of my feedback before they turn their project in, not after.

Third month: Project 2. I give students complete freedom to choose a topic and a style. I also provide weekly feedback, but it’s briefer and more general than on Project 1.

He then shares some examples of student projects. The results are really impressive! Sure, one reason they look so good is that they’re copying the Economist’s style (see “Second month” above), but that’s fine. To have made something that looks so clean and informative is a real accomplishment and a great takeaway from a semester-long course.

When I teach, I try—not always with success—to make sure that, when the course is over, students can do a few things they could not do before, and that they can fit these new things into their professional life. Cairo seems to have done this very effectively here.

As the last days of January dawn… the 2nd International Cherry Blossom Prediction Competition arrives!

Just in time for February, it’s the return of the great annual International Cherry Blossom Prediction Competition! (This post is by Lizzie.)

Help scientists like me better understand the impacts of climate change and win cash prizes by predicting when the cherry trees will bloom in four cities across the globe. The competition is open to all with prizes for closest prediction as well as other categories.

Interested? Check out the website with all the details, including data, rules and how to enter here.

Last year over 80 contestants from across four continents entered with a variety of prediction approaches. You can read more about last year’s competition here.

A big thanks to the American Statistical Association, Caucus for Women in Statistics, Columbia University’s Department of Statistics, George Mason University’s Department of Statistics and Posit (formerly RStudio) for their support, and partnerships with the International Society of Biometeorology, MeteoSwiss, USA National Phenology Network, and the Vancouver Cherry Blossom Festival—as well as Mason’s Institute for a Sustainable Earth, Institute for Digital InnovAtion, and the Department of Modern and Classical Languages. Sponsors and partners will be updated on the website.

Organizers: Jonathan Auerbach and David Kepplinger (George Mason University) and Elizabeth Wolkovich (University of British Columbia)

Changes since the 1970s (ESP edition)

In 1979, the computer scientist Douglas Hofstadter published the book Gödel, Escher, Bach. I’ve always thought this book to be overrated, but maybe that judgment on my part wasn’t fair. Gödel, Escher, Bach is a long book, and it has some mistakes, but it also has lots of solid, thoughtful passages that stand up impressively well, more than 40 years later. Not many books of speculative science and engineering would look so reasonable after so many years.

At one point in the book, Hofstadter alludes to Alan Turing’s notorious belief in extra-sensory perception:

My [Hofstadter’s] own point of view—contrary to Turing’s—is that ESP does not exist. Turing was reluctant to accept the idea that ESP is real, but did so nonetheless, being compelled by his outstanding scientific integrity to accept the consequences of what he viewed as powerful statistical evidence in favor of ESP. I disagree, though I consider it an exceedingly complex and fascinating question.

The Turing thing we’ve already discussed: the statistical evidence that he thought existed, didn’t. The 1940s was a simpler era and people trusted what seemed to be legitimate scientific reports. Can’t hold it against him that he wasn’t sufficiently skeptical. This should just make us think harder about what are the accepted ideas that we hold without reflection nowadays. If Turing can make such misjudgments regarding statistical evidence, surely we are doing so too, all the time.

What I want to focus on is the last bit of the above quote, Hofstadter’s statement that the question of ESP is “exceedingly complex and fascinating.”

That’s a funny thing to read, because I don’t think the question of ESP is complex, nor do I think it fascinating. ESP is an intuitively appealing idea, easy to imagine, but for which there has never been offered any serious scientific theory or evidence. To the extent that there is “an exceedingly complex and fascinating question” here, it’s not about the existence or purported evidence for ESP, but rather it’s the question of how it is that so many people believe in it, just as so many people believe in ghosts, astrology, unicorns, fairies, mermaids, etc. OK, there aren’t so many believers anymore in unicorns, fairies, and mermaids, what with the lack of any direct corporeal evidence of these creatures. ESP, ghosts, and astrology are easier to believe in because any evidence would be indirect.

Anyway, here’s my guess of what was going on with Hofstadter’s statement that he considered ESP to be “an exceedingly complex and fascinating question.” Back in the 1970s, ESP seemed to be a live issue. Even though the most prominent ESP promoter was the ridiculous Uri Geller, there was some sense that ESP was an important topic. I’m not quite sure why, but I guess public understanding of science was less sophisticated back then, to the extent that even a computer science professor who didn’t think ESP existed could still regard it as an interesting question.

Things have changed since the 1970s. You can study ESP if you want, but it’s no longer in the conversation, and there’s no sense that we have to show respect for the idea.

Trends in science are interesting to me. We’re no longer talking much about ESP—in retrospect, that notorious paper published in the Journal of Personality and Social Psychology in 2010 and featured in major media was not a breakthrough but rather the last gasp of real interest in the topic—but I have a horrible feeling that the sort of schoolyard evolutionary biology exemplified by the “Big and Tall Parents Have More Sons,” “Violent Men Have More Sons,” “Engineers Have More Sons, Nurses Have More Daughters,” and “Beautiful Parents Have More Daughters” papers remains in the mix. And, as long as Ted talks are around, I don’t think embodied cognition, social priming, and other purported mind hacks will be going away any time soon.

Here I’m not just talking about ideas that have mass popularity. The ideas I have in mind may or may not be massively popular; what’s relevant here is that they are respected among educated people.

As we keep reminding you, surveys tell us that 30% of Americans believe in ghosts. I’m guessing it was around the same percentage back in the 1970s, at which time I doubt that Douglas Hofstadter believed in ghosts any more than he believed in ESP. The difference is that he respected ESP; I doubt he respected ghosts. ESP was science—possibly mistaken science, but still exceedingly complex and fascinating. Ghosts were a throwback.

Similarly, I expect that schoolyard gender essentialism will always be with us. What’s notable about some of the more ridiculous evolutionary-psychology elaborations of these ideas is that they have elite support—or, at least they did back in 2007 when they were featured uncritically in Freakonomics. I haven’t heard much about these theories recently but I guess they’re still out there. Embodied cognition is in some intermediate stage, similar to ESP in the 1970s in that it’s been shot down by the experts but still exists in the Ted/NPR/airport-business-book universe.

One difference between ESP and the popular pseudoscience of today is that ESP, as generally understood, represents a pretty specific series of assertions that can be disproved to within reasonable accuracy. No, there’s no evidence that people can read your mind, or that you can transmit thoughts to people without sensory signals, or that anyone can predict random numbers, or whatever. In contrast, evolutionary psychology and embodied cognition are broad ideas which indeed are correct in some aspects; the controversy arises from overly general applications of these concepts.

OK, we could go on and on—I guess I already have! The main point of this post was that times have changed. Back in the 1970s it was considered sensible to take ESP seriously, even if you didn’t believe it. Now you don’t have to show ESP that kind of respect. New areas of confusion have taken over. As someone who was around in the 70s myself, I feel a little bit of nostalgia around ESP. It’s pleasantly retro, bringing back images of big smelly cars with bench seats.

You wish you were the first to invent this scale transform: -50x^2 + 240x - 7

This post is written by Kaiser, not Andrew.

Harvard Magazine printed the following chart that confirms a trend in undergraduate majors that we all know about: students are favoring STEM majors at the expense of the humanities.

I like the chart. The dot plot is great for showing this dataset. They placed the long text horizontally. The use of color is crucial, allowing us to visually separate the STEM majors from the humanities majors.

Then, the axis announced itself.
I was baffled, then disgusted.
Here is a magnified view of the axis:

Notice the following features of this transformed scale:

  • It can’t be a log scale because many of the growth values are negative.
  • The interval from 0% to 25% is longer than the interval from 25% to 50%, and the interval from 0% to 50% is longer than the interval from 50% to 100%. On the positive side of the axis, the larger values are pulled in and the smaller values are pushed out.
  • The interval from -20% to 0% is the same length as the interval from 0% to 25%, so the transformation is not symmetric around 0.

I have no idea what transformation was applied. I took the growth values, measured the locations of the dots, and asked Excel to fit a polynomial function; it gave me a quadratic fit with R-squared > 99%.

This is the first time I’ve seen this transform: -50x^2 + 240x - 7.

This formula fits the values within the range extremely well. I hope this isn’t the actual transformation. That would be disgusting. Regardless, they ought to have advised readers of their unusual scale.

Using the inferred formula, I retrieved the original growth values. They are not extreme, falling between -50% and 125%. There is nothing to be gained by transforming the scale.
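For what it’s worth, the recovery step is just the quadratic formula. Here’s a sketch in Python, assuming the fitted transform above (the function names are mine, and in practice the axis positions would have to be measured off the printed chart):

```python
import math

def transform(x):
    # The fitted quadratic: axis position as a function of growth value x
    return -50 * x**2 + 240 * x - 7

def inverse_transform(pos):
    # Solve -50x^2 + 240x - 7 = pos for x, keeping the root on the rising
    # branch of the parabola (growth values below the vertex at x = 2.4)
    disc = 240**2 - 4 * 50 * (7 + pos)
    return (240 - math.sqrt(disc)) / (2 * 50)

# Round-trip check: a growth value of 50% maps out and back
print(inverse_transform(transform(0.5)))   # 0.5
# Works for negative growth too, e.g. a 20% decline (up to floating point)
print(inverse_transform(transform(-0.2)))
```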

The following chart undoes the transformation.

(I also grouped the majors by their fields.)

P.S. A number of readers have figured out the actual transformation, which is the log of the relative ratio of the number of degrees. A reader also pointed us to an article by Benjamin Schmidt who made the original chart. In several other analyses, he looked at “market share” metrics. I prefer the share metric also for this chart. In the above, a 50% increase doesn’t really balance out a 33% decrease because the 2011 values differ across majors.

P.P.S. Schmidt adds some useful information in comments:

Yeah, this is code that I run every year. Harvard Magazine asked if they could use the 2022 version, and must have done something in Illustrator. (Also they dropped a bunch of fields that don’t apply to Harvard–Harvard has no business major, so they hide business.)

I’ll switch it to label fractions rather than percentages next time, and grouping across areas as above is a solid improvement. But the real problem here is that there aren’t *enough* log scales in the world for it to be obvious what’s going on. They are better when discussing rates of change. A linear scale implies the possibility of a 150% drop. That’s worse than meaningless–it’s connected to a lot of mistaken reasoning about percentages. (E.g. people thinking that a 5% drop followed by a 5% rise would bring you back to the same point; the failure to understand compound interest; etc.) IMO charts shouldn’t reflect expectations when those expectations rest on bad mental models.
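Schmidt’s point about ratio symmetry is easy to verify numerically (a generic illustration, not his code):

```python
import math

# On a log-ratio scale, a 50% rise (ratio 3/2) and a 33% drop (ratio 2/3)
# are equal and opposite moves:
print(math.log(3/2) + math.log(2/3))  # 0, up to floating-point error

# On a linear percent-change scale they look unbalanced, which is the
# mistaken-reasoning trap: a 5% drop followed by a 5% rise does not
# bring you back to the starting point.
print(0.95 * 1.05)  # slightly less than 1
```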

Here, FWIW, is the code.

p1 = c2 %>%
  ggplot() +
  aes(color = label, y = change, x = reorder(Discipline, change, function(x) {x[1]})) +
  geom_point() +
  labs(title = paste0("Fig. 1: Change in degrees, ", y1, "-", y2),
       y = paste0("Change in number awarded ", y1, "-", y2),
       x = "Major", color = "General Field",
       caption = "Sources: NCES IPEDS data; taxonomy adapted from American Academy of Arts and Sciences.\nBen Schmidt, 2022") +
  coord_flip() +
  scale_y_continuous(breaks = c(1/2, 2/3, 4/5, 1, 5/4, 3/2, 2, 3),
                     labels = scales::percent(c(1/2, 2/3, 4/5, 1, 5/4, 3/2, 2, 3) - 1)) +
  theme_bw() +
  scale_x_discrete("Major") +
  theme(legend.position = "bottom") +
  scale_color_brewer(type = 'qual', palette = 2, guide = guide_legend(ncol = 2))

Grade inflation: Why hasn’t it already reached its terminal stage?

Paul Alper writes:

I think it is your duty to write something about this. Why? For one thing, it does not involve Columbia. For another, I presume you and your family will return to NYC and someone in your family in the future will seek medical care in hopes that the physician understands organic chemistry and what it implies for proper medical care. However, if you do not want to be recorded on this, how about getting one of your bloggers who has medical degrees or have some sort of connection with NYU? To be extra safe, have the column written in French or dialect thereof. Another idea: relate this to students taking an undergraduate statistics course where the failure rate is high.

P.S. True story: When I was an undergrad at CCNY in engineering, the administration was up-front loud and proud about the attrition rate from first to second year in engineering because that proved it was an academically good institution.

“This” is a news article entitled “At NYU, Students Were Failing Organic Chemistry. Whose Fault Was It?”, which continues, “Maitland Jones Jr., a respected professor, defended his standards. But students started a petition, and the university dismissed him.” It seems that he was giving grades that were too low.

Fitting this into the big picture: this particular instructor was a retired professor who was teaching as an adjunct, and, as most readers are aware, adjuncts have very little in the way of workplace rights.

The real question, though, which I asked ten years ago, is: Why haven’t instructors been giving all A’s already? All the incentives go in that direction. People talk lots about grade inflation, but the interesting thing to me is that it hasn’t already reached its terminal stage.

The importance of “bumblers” and “pointers” in science, and the division of labor

A few years ago, I received some angry emails from a psychology professor I’d never met, who was annoyed at me for criticizing published work in his field. In one of these, he wrote:

Of Dan Wegner’s many wonderful papers, I think I still like best his little theory of science paper in which he says there are two kinds of people in science: bumblers and pointers. Bumblers are the people who get up every morning and make mistakes, trying to find truth but mainly tripping over their own feet, occasionally getting it right but typically getting it wrong. Pointers are the people who stand on the sidelines, point at them, and say “You bumbled, you bumbled.” These are our only choices in life. If one is going to choose to do the easier of these two jobs, then one should at least know what one is talking about.

This came up a few years ago, and I remarked that this was a pretty ridiculous thing, especially coming from a psychology professor! [I refer here to my correspondent, not to Wegner, who I can only assume had a more nuanced take on all this.] I hope my correspondent doesn’t teach this sort of simplistic theory to his students. Also it was funny to see him directing this to me, given that I have personally spent so much time doing both of these things. The idea that I might do research and also criticize research . . . that was somehow beyond him.

Recently this exchange came to mind and I realized a couple other things.

The first is that my correspondent referred to standing on the sidelines and pointing as “easier” than getting up every morning, making mistakes, trying to find truth but typically getting it wrong.

But why does he say this? Even accepting his (ridiculous) idea that any person can only do one of these two things, why is he so sure that “pointing” is easier than “bumbling”?

Look at his statement more carefully. What’s so hard about “bumbling”? “Getting up in the morning,” that’s not so hard—we all do that! “Trying to find truth,” that’s not hard either. Anyone can try to find truth. “Mainly tripping over their own feet, occasionally getting it right but typically getting it wrong”: hey, that sounds pretty damn easy indeed! Admittedly, his definition of the activities of “pointers”—to “stand on the sidelines, point at them, and say ‘You bumbled, you bumbled.'”—that sounds pretty effortless too! So I’d guess the two tasks, as my correspondent described them, are equally easy.

To me, science is hard. To me, the “bumbling” part is not just about “trying to find truth but mainly tripping over their own feet.” It’s all about putting my ideas to the test, which includes doing my best to shoot down my inspirations and also learning from criticism rather than acting defensively. And “pointing” is really hard, because when I criticize others, I want to be very careful not to be wrong myself. Sometimes I make mistakes when criticizing—“pointing” itself has some aspects of “bumbling”—but I try hard to avoid them. So the pointing that I do, like the bumbling that I do, is effortful, as science should be. It’s satisfying and often enjoyable, but it’s work. And neither of these things is inherently easier or harder than the other. I do think it reveals a deep misunderstanding on the part of my correspondent to think otherwise.

The other thing that came to mind just now is division of labor. As I said, I do my own (collaborative) research and I also criticize the research of others. That works well for me—I think that my own research allows me to be a better critic, my criticism opens up new research ideas for me, and both these things improve my textbooks and other expository writing.

But my approach is hardly the only way to go. There are some excellent researchers who don’t waste any time criticizing bad work; they just focus on making progress, and they expend their critical effort on improving their own research. That’s just fine. From the other direction, there are some excellent critics who don’t do much original research but devote lots of time to careful criticism of work by others. That also is important. Good criticism (what my correspondent would call “pointing”) can take a lot of work, and also attention and persistence. You might have to make data requests over and over, continue bugging journals and other institutions, do all sorts of things that I won’t have the patience to do.

And that gets to the point about division of labor. Some people tell me I shouldn’t be wasting my time trying to understand and write about what’s wrong with bad research. Maybe they’re right; I don’t know. But, if you have that attitude, you can’t then turn around and criticize various research critics for being “pointers,” “terrorists,” “Stasi,” “second-stringers,” etc. You can’t have it both ways! It’s just fine to have prominent critics who do not themselves have active research agendas. Criticism is important, we learn from it, and someone who’s really good at criticism might not be really good at original research, just as the reverse might be the case. We need high-quality “bumbling” as well as high-quality “pointing”; that’s how science works.

P.S. Also from that above-linked post, I notice that Tyler Cowen in his blog had linked to that terrible, terrible ovulation-and-voting paper, calling it the “politically incorrect paper of the month.” This was amusing for two reasons:

1. Nobody uses the term “politically incorrect” anymore: like an old stick of gum all of whose flavor has been chewed out, this term has no rhetorical value anymore. Which suggests to me that it never had much real value, and that it was only serving as a political weapon.

2. That ovulation-and-voting paper was, I’d say, scientifically valueless, although it did have the virtue of serving as an example in our multiverse paper. A good reminder that a published article can be “politically incorrect” or serve some other political purpose and still be no good. This wasn’t so clear at the time, but now that we’ve been thinking about forking paths and small effect sizes for a while, these problems are becoming easier to spot. Just to be clear: I’m not saying the paper was bad because it was “politically incorrect”; I’m saying it was bad, and the political angle gave it some attention and approval it did not deserve. Similar things have happened with bad papers that scratch an itch on the left side of the political spectrum; see for example this article purporting to demonstrate “The Bright Side of Unionization.”

Laura Ingalls Wilder vs. Steve Stigler; Malcolm advances

Dzhaughn offers an allusive line of reasoning:

He is a very high achieving Malcolm, and there is Little prospect that he will be Pope, so we need not worry about Gladwell before.

Baez, Collins, Rivers, Blondell, Miro. She was born to a life of privilege.

Anon’s got the serious take:

I’d have to go with Malcolm X, in part because while he’s a household name, I bet very few have actually heard him speak or read his writings directly. Most people know him through popular reputation, which often provides an objectively inaccurate perspective of his beliefs and views.

Joan Didion, on the other hand – most people who know of her have a fairly accurate idea of what she wrote about. She would no doubt have wonderful things to say, and I would love to hear them, but hearing Malcolm X in his own words would be a valuable educational enrichment . . .

David says it in one line:

Hard decision, but I’ll go for Malcolm X, since he had the good sense to avoid the war.

From the other direction, Diana gives this pitch for Joan:

The NYPL has a Malcolm X collection and has just acquired Joan Didion’s papers (the NYT reported this on January 26). It’s this “just acquired” that intrigues me, because if Didion was hoping to read from some of her own notes, she suddenly will have to wait until they’ve been sorted, catalogued, etc.–which could take a while, since her archive spans 240 linear feet of material. This means that she may well have to improvise, which would be fun. Malcolm X, on the other hand, has had ample notice of his papers’ location; he would have no trouble making photocopies, if he wished. There’s nothing wrong with a prepared speech, but I can’t get rid of the yearning for surprise. Therefore I vote for Didion.

But I’m suspicious of arguments based on novelty—it’s all a bit too PNAS for me—so it’s X who advances.

Today’s matchup

I’ll let wikipedia handle this one:

In 1930, Wilder requested [her daughter, Rose Wilder] Lane’s opinion about an autobiographical manuscript she had written about her pioneering childhood. The Great Depression, coupled with the deaths of Wilder’s mother in 1924 and her older sister in 1928, seem to have prompted her to preserve her memories in a life story called Pioneer Girl. She also hoped that her writing would generate some additional income.

The original title of the first of the books was When Grandma Was a Little Girl. On the advice of Lane’s publisher, she greatly expanded the story. As a result of Lane’s publishing connections as a successful writer and after editing by her, Harper & Brothers published Wilder’s book in 1932 as Little House in the Big Woods. After its success, she continued writing.


Stigler’s law of eponymy, proposed by University of Chicago statistics professor Stephen Stigler in his 1980 publication Stigler’s law of eponymy, states that no scientific discovery is named after its original discoverer. Examples include Hubble’s law, which was derived by Georges Lemaître two years before Edwin Hubble, the Pythagorean theorem, which was known to Babylonian mathematicians before Pythagoras, and Halley’s Comet, which was observed by astronomers since at least 240 BC (although its official designation is due to the first ever mathematical prediction of such astronomical phenomenon in the sky, not to its discovery). Stigler himself named the sociologist Robert K. Merton as the discoverer of “Stigler’s law” to show that it follows its own decree, though the phenomenon had previously been noted by others.

That’s right. We have one person who got full credit and fame for a collaboration, and someone else who’s (slightly less) famous for an article about people getting credit for other people’s ideas. Fits in well with some of the themes of this blog.

Again, here are the announcement and the rules.

What are the most important statistical ideas of the past 50 years?

Many of you have heard of this article (with Aki Vehtari) already—we wrote the first version in 2020, then did some revision for its publication in the Journal of the American Statistical Association.

But the journal is not open-access so maybe there are people who are interested in reading the article who aren’t aware of it or don’t know how to access it.

Here’s the article [ungated]. It begins:

We review the most important statistical ideas of the past half century, which we categorize as: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, Bayesian multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss key contributions in these subfields, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science.

I really love this paper. Aki and I present our own perspective—that’s unavoidable; indeed, if we didn’t have an interesting point of view, there’d be no reason to write or read the article in the first place—but we also worked hard to give a balanced view, including ideas that we think are important but which we have not worked on or used ourselves.

Also, here’s a talk I gave a couple years ago on this stuff.

Erik van Zwet explains the Shrinkage Trilogy

The Shrinkage Trilogy is a set of three articles written by van Zwet et al.:

1. The Significance Filter, the Winner’s Curse and the Need to Shrink at http://arxiv.org/abs/2009.09440 (Erik van Zwet and Eric Cator)

2. A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates at http://arxiv.org/abs/2011.15037 (Erik van Zwet and Andrew Gelman)

3. The Statistical Properties of RCTs and a Proposal for Shrinkage at http://arxiv.org/abs/2011.15004 (Erik van Zwet, Simon Schwab and Stephen Senn)

To help out, van Zwet also prepared this markdown file explaining the details. Enjoy.

Labeling the x and y axes: Here’s a quick example where a bit of care can make a big difference.

I’ll have more to say about the above graph some other time—it comes from this excellent post from Laura Wattenberg. Here I just want to use it as an example of statistical graphics.

Overall, the graph is great: clear label, clean look, good use of color. I’m not thrilled with the stacked-curve thing—it makes it hard to see exactly what’s going on with the girls’ names in the later time period—I think I’d prefer a red line for the girls, a blue line for the boys, and a gray line for the total so you can see that if you want. Right now the graph shows boys and total, so we need to do some mental subtraction to see the trend for girls. Also, a very minor thing: the little squares with M and F on the bottom could be a bit bigger and labeled “boys” and “girls” or “male” and “female” rather than just “M” and “F.”

Also, I think the axis labels could be improved.

Labeling the x-axis every four years is kind of a mess: I’d prefer just labeling 1900, 1920, 1940, etc., with tick marks every 5 and 10 years.

As for the y-axis, it’s not so intuitive to have labels at 4, 8, 12, etc. I’d prefer 5, 10, 15, etc., or really just 10, 20, 30 would be fine. Also, I don’t find numbers such as 10,000 per million so intuitive; I’d prefer expressing them as percentages, so that 10,000 per million becomes 1%. Labeling the y-axis as 1%, 2%, etc., should be clearer, no? A lot less mysterious than 4000, 8000, etc. Right now, about 3% of baby names end in “i.” At the earlier peak in the 1950s-70s, it was about 2%.
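The relabeling arithmetic is trivial but worth making explicit; here’s a tiny Python sketch of the proposed tick conversion (the tick values are the ones visible on the graph):

```python
# Convert "per million" counts to percentages for axis relabeling.
# 10,000 per million = 10,000 / 1,000,000 = 1%.
def per_million_to_percent(x):
    return x / 10_000  # 1,000,000 per million = 100%

# The tick values 4000, 8000, 12000 per million become 0.4%, 0.8%, 1.2%.
ticks = [4000, 8000, 12000]
print([f"{per_million_to_percent(t)}%" for t in ticks])  # ['0.4%', '0.8%', '1.2%']
```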

Just to be clear: I like the graph a lot. The criticisms point to ways to make it even better (I hope).

The point of this post is how, with a bit of thought, we can improve a graph. We see lots of bad graphs that can be improved. It’s more interesting to see how even a good graph can be fixed up.

Joan Didion vs. Malcolm X; Rigg advances

In the contest between the fake spy and the real traitor, Manuel writes:

Diana Rigg’s exploits as a spy, albeit fictional, are memorable. The second Mrs. Arnold would stand more than a fighting chance against her, as she knew everything about cryptography, invisible ink and the like. But even the absent Mr. Peel seems more interesting than her husband.

And Raghu writes:

I re-watched “The Great Muppet Caper” a few years ago, which was better than I remembered. Diana Rigg is in it; Benedict Arnold is not. As “… the Muppets are caught up in a jewel heist while investigating a robbery in London” [Wikipedia], Ms. Rigg can no doubt comment on Anglo-American relations, so we don’t need Arnold for that. Plus, from his Wikipedia page, he seems truly unpleasant.

I’m convinced. Diana it is. Sorry, Jonathan!

Today’s matchup

A cool person versus a person known by an initial. Two intellectuals who became famous in the 60s. One of them was not played by Denzel Washington.

Again, here are the announcement and the rules.

Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models

Paul, Jonah, and Aki write:

Cross-validation can be used to measure a model’s predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into simple terms, but a lot of important models in temporal and spatial statistics do not have this property or are inefficient or unstable when forced into a factorized form. We derive how to efficiently compute and validate both exact and approximate LOO-CV for any Bayesian non-factorized model with a multivariate normal or Student-t distribution on the outcome values. We demonstrate the method using lagged simultaneously autoregressive (SAR) models as a case study.
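For intuition about why the multivariate normal case is tractable: the pointwise predictive quantity in the non-factorized setting is the conditional of one outcome given the others, which has a closed form. This is not the paper’s implementation (which works with the fitted Bayesian model and importance sampling); it’s just a pure-Python sketch of the standard conditional-normal identity the approach builds on, written for n = 3 so the only matrix inverse needed is a 2×2:

```python
# Leave-one-out conditional of a multivariate normal y ~ N(mu, Sigma):
#   y_i | y_{-i} ~ N( mu_i + s·Sinv·(y_{-i} - mu_{-i}),  Sigma_ii - s·Sinv·s )
# where s is the cross-covariance row and Sinv inverts the submatrix Sigma_{-i,-i}.

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def loo_conditional(i, y, mu, Sigma):
    """Mean and variance of y[i] given the other two outcomes (n = 3)."""
    idx = [j for j in range(3) if j != i]
    s = [Sigma[i][j] for j in idx]                          # cross-covariance
    Sinv = inv2([[Sigma[a][b] for b in idx] for a in idx])  # inverse of Sigma_{-i,-i}
    w = [s[0] * Sinv[0][k] + s[1] * Sinv[1][k] for k in range(2)]
    resid = [y[j] - mu[j] for j in idx]
    mean = mu[i] + w[0] * resid[0] + w[1] * resid[1]
    var = Sigma[i][i] - (w[0] * s[0] + w[1] * s[1])
    return mean, var

# Equicorrelated example: observing the other two outcomes at +1 pulls the
# conditional mean of y[0] up to 2/3 and shrinks its variance to 2/3.
Sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
m, v = loo_conditional(0, y=[1.0, 1.0, 1.0], mu=[0.0, 0.0, 0.0], Sigma=Sigma)
```

With a diagonal covariance the conditioning changes nothing, which is the factorized special case where standard LOO-CV already applies.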

Aki’s post from last year, “Moving cross-validation from a research idea to a routine step in Bayesian data analysis,” connects this to the bigger picture.

Water Treatment and Child Mortality: A Meta-analysis and Cost-effectiveness Analysis

This post is from Witold.

I thought some of you may find this pre-print (that I am a co-author of) interesting. It’s a meta-analysis of improving water quality in low- and middle-income countries. We estimated that this reduced the odds of child mortality by 30%, based on 15 RCTs. That’s obviously a lot! If true, this would have very large real-world implications, but there are of course statistical considerations of power, publication bias, etc. So I thought that maybe some of the readers will have methodological comments while others may be interested in the public health aspect of it. It also ties to a couple of follow-up posts I’d like to write here on effective altruism and finding cost-effective interventions.

First, a word on why this is an important topic. Globally, for each thousand births, 37 children will die before the age of 5. Thankfully, this is already half of what it was in 2000. But it’s still about 5 million deaths per year. One of the leading causes of death in children is diarrhea, caused by waterborne diseases. While chlorinating [1, scroll down for footnotes] water is easy, inexpensive, and proven to remove pathogens from water, there are many countries where most people still don’t have access to clean water (the oft-cited statistic is that 2 billion people don’t have access to safe drinking water).

What is the magnitude of impact of clean water on mortality? There is a lot of experimental evidence for reductions in diarrhea, but making a link between clean water and mortality requires either an additional, “indirect”, model connecting disease to deaths, which is hard [2], or directly measuring deaths, which are rare (hence also hard) [3].

In our pre-print [4], together with my colleagues Michael Kremer, Steve Luby, Ricardo Maertens, and Brandon Tan, we identify 53 RCTs of water quality treatments. Contacting the authors of each study resulted in 15 estimates that could be meta-analysed, covering about 25,000 children. (Why only 15 out of 53? Apparently because the studies were not powered for mortality, each contributing just a handful of deaths, and in some cases the authors decided not to collect, retain, or report deaths.) As far as we are aware, this is the first attempt to meta-analyse experimental evidence on mortality and water quality.

We conduct a Bayesian meta-analysis of these 15 studies using a logit model and find a 30% reduction in odds of all-cause mortality (OR = 0.70, with a 95% interval 0.49 to 0.93), albeit with high (and uncertain) heterogeneity across studies, which means the predictive distribution for a new study has a much wider interval and slightly higher mean (OR=0.75, 95% interval 0.29 to 1.50). This heterogeneity is to be expected because we compare different types of interventions in different populations, across a few decades.[5] (Typically we would want to address this with a meta-regression, but that is hard due to a small sample.)

The whole analysis is implemented in baggr, an R package that provides a meta-analysis interface for Stan. There are some interesting methodological questions related to modeling of rare events, but repeating this analysis using frequentist methods (a random-effects model on Peto’s ORs has a mean OR of 0.72) as well as various sensitivity analyses we could think of all lead to similar results. We also think that publication bias is unlikely. Still, perhaps there are things we missed.
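The paper’s analysis is Bayesian, via baggr/Stan in R; as a rough illustration of the frequentist sensitivity check just mentioned, here is a minimal DerSimonian-Laird random-effects pooling on the log-OR scale in Python. The inputs are made up for the example—they are not the 15 trials’ estimates:

```python
import math

def dersimonian_laird(yi, se):
    """Random-effects pooled OR from log-ORs yi and their standard errors se."""
    k = len(yi)
    w = [1 / s**2 for s in se]                        # fixed-effect (inverse-variance) weights
    ybar = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    Q = sum(wi * (y - ybar)**2 for wi, y in zip(w, yi))   # heterogeneity statistic
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)                # between-study variance estimate
    wr = [1 / (s**2 + tau2) for s in se]              # random-effects weights
    pooled = sum(wi * y for wi, y in zip(wr, yi)) / sum(wr)
    se_pooled = math.sqrt(1 / sum(wr))
    return math.exp(pooled), tau2, se_pooled

# Hypothetical log-ORs and standard errors (NOT the trials in the paper):
or_pooled, tau2, se_p = dersimonian_laird(
    yi=[math.log(0.6), math.log(0.8), math.log(0.7)],
    se=[0.25, 0.30, 0.20],
)
```

When heterogeneity is high, as in our data, the random-effects weights flatten toward equality and the predictive interval for a new study widens—which is exactly the pattern described above.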

Based on this we calculate about $3,000 cost per child death averted, or under $40 per DALY. It’s hard to convey how extremely cost-effective this is (a typical cost-effectiveness threshold is the equivalent of one year’s GDP per DALY; this is reached at a 0.6% reduction in mortality), but basically it is on par with the most cost-effective child health interventions such as vaccinations.

Since the cost-effectiveness is potentially so high, there are obviously big real-world implications. Some funders have been reacting to the new evidence already. For example, some months ago GiveWell, an effective altruism non-profit that many readers will already be familiar with, conducted their own analysis of water quality interventions and in a “major update” of their assessment recommended a grant of $65 million toward a particular chlorination implementation [6]. (GiveWell’s assessment is an interesting topic for a blog post of its own, so I hope to write about it separately in the next few days.)

Of course in the longer term more RCTs will contribute to the precision of this estimate (several are being worked on already), but generating evidence is a slow and costly process. In the short term the funding decisions will be driven by the existing evidence (and our paper is still a pre-print), so it would be fantastic to hear readers’ comments on the methods and the real-world implications.



[1] For simplicity I simply say “chlorination” but this may refer to chlorinating at home, at the point from which water is drawn, or even using a device in the pipe, if households have piped water which may be contaminated. Each of these will have different effectiveness (primarily due to how convenient it is to use) and costs. So differentiating between them is very important for a policy maker. But in this post I group all of this to keep things simple. There are also other methods of improving quality, e.g. filtration. If you’re interested, this is covered in more detail in the meta-analyses that I link to.

[2] Why is extrapolating from evidence on diarrhea into mortality hard? First, it is possible that reduction in severe disease is higher (in the same way that vaccine may not protect you from infection, but it will almost definitely protect you from dying). Second, clean water also has lots of other benefits, e.g. it likely makes children less susceptible to other infections, nutritional deficiencies, and also makes their mothers healthier (which could in turn lead to fewer deaths during birth). So while these are just hypotheses, it’s hard a priori to say how a reduction in diarrhea would translate to a reduction in mortality.

[3] If you’re aiming for 80% power to detect 10% reduction in mortality you will need RCT data on tens of thousands of children. Exact number of course depends on how high baseline mortality rate is in the studies.
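The back-of-the-envelope behind footnote [3] can be sketched with the standard two-proportion sample-size formula. The 5% baseline mortality over follow-up below is a hypothetical stand-in, not a number from the paper:

```python
import math

def n_per_arm(p1, rel_reduction):
    """Sample size per arm for a two-sample proportions z-test,
    normal approximation, two-sided alpha = 0.05, power = 0.80."""
    p2 = p1 * (1 - rel_reduction)
    z_a, z_b = 1.96, 0.8416          # z for alpha/2 = 0.025 and for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b)**2 * var / (p1 - p2)**2)

# Assume (hypothetically) 5% baseline under-5 mortality over the trial's
# follow-up; detecting a 10% relative reduction then needs tens of
# thousands of children.
n = n_per_arm(p1=0.05, rel_reduction=0.10)
print(n, "children per arm,", 2 * n, "total")
```

Halving the baseline rate roughly doubles the required sample, which is why pooling across trials is so much cheaper than any single adequately powered RCT.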

[4] Or, to be precise, an update to a version of this pre-print which we released in February 2022. If you happened to read the previous version of the paper, both main methods and results are unchanged, but we added extra publication bias checks, characterization of the sample and rewrote most of the paper for clarity.

[5] That last aspect of heterogeneity seems important, because some have argued that the impact of clean water may diminish with time. There is a trace of that in our data (see supplement), but with 15 studies the power to test for this time trend is very low (which I show using a simulation approach).

[6] GiveWell’s analysis included their own meta-analysis and led to more conservative estimates of mortality reductions. As I mention at the end of this post, this is something I will try to blog about separately. Their grant will fund Dispensers for Safe Water, an intervention which gives people access to chlorine at the water source. GW’s analysis also suggested a much larger funding gap in water quality interventions, of about $350 million per year.

Students learn more from active classrooms, but they think they’re learning less!

Aki points us to this article by some Harvard physics teachers, “Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom,” which begins:

Despite active learning being recognized as a superior method of instruction in the classroom, a major recent survey found that most college STEM instructors still choose traditional teaching methods. This article addresses the long-standing question of why students and faculty remain resistant to active learning.

Here’s what they did:

We compared students’ self-reported perception of learning with their actual learning under controlled conditions in large-enrollment introductory college physics courses taught using 1) active instruction (following best practices in the discipline) and 2) passive instruction (lectures by experienced and highly rated instructors).

And here’s what they found:

Students in active classrooms learned more (as would be expected based on prior research), but their perception of learning, while positive, was lower than that of their peers in passive environments. . . . these results suggest that when students experience the increased cognitive effort associated with active learning, they initially take that effort to signify poorer learning.

My response: Interesting. Lots of the teaching literature is done by physicists. Maybe one reason for this is that intro physics is very standard. With intro stat, we’re still arguing about what should be taught. In intro physics, that’s all agreed, so they can instead argue more about how to teach it. Regarding the paper itself, I’m skeptical, but I’d like to believe its conclusions!

Just to elaborate on my views here:

– It makes sense to me that students learn more in active classrooms. I believe in this so much that I just (with Aki) wrote an entire book on how to teach applied statistics using active learning.

– I love active learning so much—and at the same time I understand the process of learning so poorly—that I have to be careful not to automatically believe a story that makes active learning look good, and this paper fits into that category.

– Active learning indeed takes more effort from the student as well as from the teacher. I can well believe that students can get annoyed at active learning and that in the short term they would prefer a more passive experience where they can just sit in the classroom and zone out for an hour.

– I can also believe that students in active classrooms learned more. Especially in physics classes, where it’s super clear what is needed to be learned, and especially in an elite environment such as Harvard, where it’s not enough to just be able to get the right answers for standard problems but students also have to think creatively on exams.

– I’m not convinced by the authors’ speculation that students take increased cognitive effort as signifying poorer learning. I’d find it more plausible to believe that active learning involves lots of struggle, a process that makes students aware of much that they don’t know.

– This also suggests to me that active learning could be more effective if it had more reassurance, for example adding some drills at the end with straightforward tasks practicing what was learned during the class period. If increased cognitive effort is taken to signify poorer learning, then, yeah, that’s a big problem. But if the perception of poorer learning is just arising from sitting in class and getting confused over and over, then maybe we could remedy this by having the experience be less frustrating, with more positive reinforcement.

Also, a high perception of learning is not necessarily such a good thing! Especially for Harvard students, who already might have a tendency toward overconfidence. To put it another way, if one of the outcomes of the active classes is a lower average perception of learning, maybe that’s a good thing. Knowing what you don’t know is an important part of knowing.

This all reminds me of something my dad told me once, which is that you learn the most from your mistakes. Or, more specifically, you learn when you make a mistake and then you find out it was a mistake. Not quite “positive psychology” but I believe it. At least, that’s how it’s worked for me. And for Lakatos.

Diana Rigg vs. Benedict Arnold; Bechdel advances

In yesterday’s battle of two lawgivers, Anonymous Pigeon writes:

First of all I would like to start with a question: Who did more for humanity? Hammurabi did with his code. Sure Alison Bechdel did help for females and yada yada yada but just imagine having a little stone on your wall, it has a completely hilarious law and is signed by Hammurabi. THAT would be great.

Interesting. But I googled *Hammurabi pigeon* and came up with this, a recipe for “pigeons stuffed between flesh and skin, Babylonian style.”

So, if you’re a pigeon, the penalty under Hammurabi’s code is . . . death! And, yeah, I know that, according to some law professors, each executed pigeon saves 18 innocent pigeon lives, so capital punishment of pigeons is morally required—but I’m skeptical. What data do we really have on Babylonian fowl? Let’s show the birds some mercy.

And if we’re not gonna go with the law of Hammurabi, we need to follow the law of Bechdel. Next round may very well feature Diana Rigg, and for that contest to pass Bechdel’s test, it will have to feature at least two women talking to each other about something other than a man. So Alison it is.

Today’s matchup

A spy versus a traitor! Both serve the crown. Who will advance? Rigg’s got the moves, but Arnold has the plans for West Point.

Again, here are the announcement and the rules.

“Several artists have torn, burned, ripped and cut their artwork over the course of their careers. It is still possible for them to do it, but they will be able to preserve their artwork permanently on the blockchain.”

I’m tinguely all over from having received this exclusive 5D invitation, and I just have to share it with all of you!

It’s a combination of art museum and shredder that’s eerily reminiscent of the Shreddergate exhibit at the Museum of Scholarly Misconduct.

Here’s the story:

You Are Invited: Media Event During Art Basel Week in Miami

Live demonstration of groundbreaking new NFT technology by the engineers, alongside some of Miami’s leading artists.

Two dates for the live presentations for the media:

  • Thursday, Dec. 1 at 3:00 p.m.
  • Friday, Dec. 2 at 3:00 p.m.

At the Center for Visual Communication in Wynwood, 541 NW 27th Street in Miami.

* * * Media RSVP is required at: https://www.eventbrite.com/**

This is a private event for the news media, by invitation only, and is not open to the public.

This media event will be presented at the location of the new exhibition “The Miami Creative Movement” featuring 15 of Miami’s leading artists.

Media Contacts: ** & ** 305-**-** **@**.com

– This will be the official launch of the new ** Machine, the first hardware-software architecture that creates a very detailed digital map of an artwork using a novel ultra-detailed 5D scanning technology.

– They will transform physical artworks into NFTs in real time for the audience, via this new hardware device they invented.

– The sublimated artworks will be uploaded to the blockchain live, in real-time, and will be showcased in an immersive VR environment at the event.

– After the scanning is completed, a laser-shredder “sublimates” the object, erasing the physical artwork and minting a new NFT directly on the blockchain.

The technology’s creator — ** — hails this as: “The first NFT-based technology that will allow artists and collectors to preserve works of art indefinitely in digital form, simply and without loss of information. Provenance is indisputable and traceable back to the original work of art to every brushstroke and minutiae detail.”

“As this new hardware revolutionizes art conservation around the world and attracts many artists to Web3, it also adds legitimacy to real world art on blockchains, enabling them to be traded.”


** Presents a Technology That Could Revolutionize NFTs and the World of Physical Art Forms

** – an Argentinian team of blockchain experts, technologists and artists, announces the official launch of the ** Machine, the first hardware-software architecture that creates a very detailed digital map of an artwork, using a novel ultra-detailed 5D scanning technology.

After the scanning is completed, a laser-shredder “sublimates” the object, erasing the physical artwork and minting a new NFT directly on the blockchain.

The technology’s creator hails this as the first NFT-based technology that will allow artists and collectors to preserve works of art indefinitely in digital form, simply and without loss of information.

The new artwork transcends the physical work into the blockchain as a unique NFT that can be referenced to the original sublimated artwork, and to which provenance is indisputable and traceable back to the original work of art to every brushstroke and minutiae detail.

Several artists have torn, burned, ripped and cut their artwork over the course of their careers. It is still possible for them to do it, but they will be able to preserve their artwork permanently on the blockchain.

As the hardware revolutionizes art conservation around the world and attracts many artists to Web3, it also adds legitimacy to real world art on blockchains, enabling them to be traded.

Artists who “burn” their paintings with the ** technology will get 85% of the revenue obtained from the newly created NFT and its addition to the most popular NFT marketplaces.

The artist ** completing the process of physical destruction of his painting

This process of creative destruction will be showcased for the news media during the week of Art Basel Miami at the Center for Visual Communication in Miami’s Wynwood Arts District.

There will be a live demonstration for members of the press, and the sublimated artworks uploaded to the blockchain will be showcased in the art gallery in an immersive VR environment.

Argentina-based ** is a Web3 and Metaverse company dedicated to transferring the value of art to the digital world. The company was co-founded by **, **and **.

**’s first product is the ** Machine, a technology that scans and laser cuts physical artworks to produce NFTs. Read more at **.


P.O. Box **
Miami Beach, FL 33239

“This is a private event for the news media, by invitation only, and is not open to the public.” . . . . wow, this makes me feel so important! There’s so much juicy stuff here, from the “5D scanning technology” onward. “This process of creative destruction” indeed. I’m assuming that anyone who showed up to this event was escorted there in a shiny new Hyperloop vehicle directly from their WeWork shared space. Unicorns all around!

But what’s with the “laser shredder”? Wouldn’t it be enough just to crumple up the original artwork and throw it in the trash?

It’s always fun to be on the inside of “a private event for the news media, by invitation only,” even if it’s not quite as impressive as the exclusive “non-transferable” invitation to hang out with Grover Norquist, Anne-Marie Slaughter, and a rabbi for a mere $16,846.