Published or to be published articles: [2021] Reflections on Lakatos’s “Proofs and Refutations.” *American Mathematical Monthly*. (Andrew Gelman) [2021] Holes in Bayesian statistics. *Journal of Physics G: Nuclear and Particle Physics*. (Andrew Gelman and Yuling Yao) [2021] Reflections on Breiman’s Two Cultures of Statistical Modeling. *Observational Studies*. (Andrew Gelman) [2021] Bayesian statistics […]

**Miscellaneous Statistics** category.

## Retired computer science professor argues that decisions are being made by “algorithms that are mathematically incapable of bias.” What does this mean?

This came up in the comments, but not everyone reads the comments, so . . . Joseph recommended an op-ed entitled “We must stop militant liberals from politicizing artificial intelligence; ‘Debiasing’ algorithms actually means adding bias,” by retired computer science professor Pedro Domingos. The article begins: What do you do if decisions that used to […]

## You can figure out the approximate length of our blog lag now.

Sekhar Ramakrishnan writes: I wanted to relate an episode of informal probabilistic reasoning that occurred this morning, which I thought you might find entertaining. Jan 6th is the Christian feast day of the Epiphany, which is known as Dreikönigstag (Three Kings’ Day), here in Zürich, Switzerland, where I live (I work at ETH). There is […]

## 17 state attorney generals, 100 congressmembers, and the Association for Psychological Science walk into a bar

I don’t have much to add to all that’s been said about this horrible story. The statistics errors involved are pretty bad—actually commonplace in published scientific articles, but mistakes that seem recondite and technical in a paper about ESP, say, or beauty and sex ratio, become much clearer when the topic is something familiar such […]

## What are the most important statistical ideas of the past 50 years?

Aki and I wrote this article, doing our best to present a broad perspective. We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. These eight […]

## How to think about correlation? It’s the slope of the regression when x and y have been standardized.

Dave Balan writes: I am an economist at the Federal Trade Commission with a very basic statistics question, one that I have put to several fairly high-powered econometricians, and to which no one has had a satisfying answer. The question is this. Why are correlations meaningful? We know that they are ubiquitous, they get reported […]
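The claim in the title can be checked numerically. The sketch below is only an illustration on simulated data: the sample size, the coefficient 0.6, and the helper `standardize` are assumptions made for the example, not anything taken from the post. It fits a least-squares line to standardized x and y and compares the slope with the usual correlation coefficient.

```python
import numpy as np


def standardize(v):
    """Center to mean 0 and scale to standard deviation 1."""
    return (v - v.mean()) / v.std()


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = rng.normal(size=1000)
y = 0.6 * x + rng.normal(size=1000)  # hypothetical data-generating process

# Pearson correlation of the raw data.
r = np.corrcoef(x, y)[0, 1]

# Least-squares slope after standardizing both variables.
# np.polyfit returns coefficients from highest degree down, so [0] is the slope.
slope_standardized = np.polyfit(standardize(x), standardize(y), 1)[0]

# Least-squares slope on the original scale, for comparison.
slope_raw = np.polyfit(x, y, 1)[0]

print(f"correlation:           {r:.6f}")
print(f"standardized slope:    {slope_standardized:.6f}")
print(f"raw slope:             {slope_raw:.6f}")
print(f"r * sd(y) / sd(x):     {r * y.std() / x.std():.6f}")
```

The first two printed numbers agree up to floating-point error, and the last two show the same relationship on the original scale: the raw slope is the correlation rescaled by sd(y)/sd(x).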

## Is causality as explicit in fake data simulation as it should be?

Sander Greenland recently published a paper with a very clear and thoughtful exposition on why causality, logic and context need full consideration in any statistical analysis, even strictly descriptive or predictive analysis. For instance, in the concluding section – “Statistical science (as opposed to mathematical statistics) involves far more than data – it requires realistic […]

## Further formalization of the “multiverse” idea in statistical modeling

Cristobal Young and Sheridan Stewart write: Social scientists face a dual problem of model uncertainty and methodological abundance. . . . This ‘uncertainty among abundance’ offers spiraling opportunities to discover a statistically significant result. The problem is acute when models with significant results are published, while those with non-significant results go unmentioned. Multiverse analysis addresses […]

## Greek statistician is in trouble for . . . telling the truth!

Paul Alper points us to this news article by Catherine Rampell, which tells this story: Georgiou is not a mobster. He’s not a hit man or a spy. He’s a statistician. And the sin at the heart of his supposed crimes was publishing correct budget numbers. The government has brought a relentless series of criminal […]

## What went wrong with the polls in 2020? Another example.

Shortly before the election the New York Times ran this article, “The One Pollster in America Who Is Sure Trump Is Going to Win,” featuring Robert Cahaly, who on election day forecast Biden to win 235 electoral votes. As you may have heard, Biden actually won 306. Our Economist model gave a final prediction of […]

## The Pfizer-BioNTech Vaccine May Be A Lot More Effective Than You Think?

Ian Fellows writes: I [Fellows] just wrote up a little Bayesian analysis that I thought you might be interested in. Specifically, everyone seems fixated on the 90% effectiveness lower bound reported for the Pfizer vaccine, but the true efficacy is likely closer to 97%. Please let me know if you see any errors. I’m basing […]

## Bayesian Workflow

Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Paul-Christian Bürkner, Lauren Kennedy, Jonah Gabry, Martin Modrák, and I write: The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and […]

## Concerns with our Economist election forecast

A few days ago we discussed some concerns with Fivethirtyeight’s election forecast. This got us thinking again about some concerns with our own forecast for The Economist (see here for more details). Here are some of our concerns with our forecast: 1. Distribution of the tails of the national vote forecast 2. Uncertainties of state […]

## “Valid t-ratio Inference for instrumental variables”

A couple people pointed me to this recent econometrics paper, which begins: In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an […]

## Body language and machine learning

Riding on the street, I can usually tell what cars in front of me are going to do, based on their “body language”: how they are positioning themselves in their lane. I don’t know that I could quite articulate what the rules are, but I can tell what’s going on, and I know that I […]

## Between-state correlations and weird conditional forecasts: the correlation depends on where you are in the distribution

Yup, here’s more on the topic, and this post won’t be the last, either . . . Jed Grabman writes: I was intrigued by the observations you made this summer about FiveThirtyEight’s handling of between-state correlations. I spent quite a bit of time looking into the topic and came to the following conclusions. In order […]

## Reference for the claim that you need 16 times as much data to estimate interactions as to estimate main effects

Ian Shrier writes: I read your post on the power of interactions a long time ago and couldn’t remember where I saw it. I just came across it again by chance. Have you ever published this in a journal? The concept comes up often enough and some readers who don’t have methodology expertise feel more […]
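One common way to arrive at a factor of 16 is sketched below. It rests on two assumptions that do not appear in the excerpt and are labeled as such in the code: a balanced design with equal-sized cells and a common residual standard deviation, and an interaction that is half the size of the main effect. This may or may not match the argument in the post being asked about.

```python
import math


def se_main(sigma, n):
    """Standard error of a difference between two group means, each of size n/2."""
    return sigma * math.sqrt(1 / (n / 2) + 1 / (n / 2))


def se_interaction(sigma, n):
    """Standard error of a difference of differences across four cells of size n/4."""
    return sigma * math.sqrt(4 / (n / 4))


# Assumption: common residual sd and total sample size (illustrative values).
sigma, n = 1.0, 400

# With the same data, the interaction's standard error is twice the main effect's.
se_ratio = se_interaction(sigma, n) / se_main(sigma, n)

# Assumption: the interaction is half as large as the main effect.
relative_size = 0.5

# Standard errors shrink like 1/sqrt(n), so matching the main effect's
# estimate-to-standard-error ratio needs (se_ratio / relative_size)^2 times the data.
sample_size_multiplier = (se_ratio / relative_size) ** 2

print(f"SE ratio (interaction / main): {se_ratio:.2f}")
print(f"sample size multiplier:        {sample_size_multiplier:.1f}")
```

Under these assumptions the multiplier is 2 squared times 2 squared: one factor of 4 from the doubled standard error and another from the halved effect size. Changing `relative_size` shows how strongly the result depends on the assumed size of the interaction.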

## She’s wary of the consensus based transparency checklist, and here’s a paragraph we should’ve added to that zillion-authored paper

Megan Higgs writes: A large collection of authors describes a “consensus-based transparency checklist” in the Dec 2, 2019 Comment in Nature Human Behavior. Hey—I’m one of those 80 authors! Let’s see what Higgs has to say: I [Higgs] have mixed emotions about it — the positive aspects are easy to see, but I also have […]

## Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph?

How many data points are in each bar of the top graph above? (See here for background.) It’s from this article: Milewski MD, Skaggs DL, Bishop GA, Pace JL, Ibrahim DA, Wren TA, Barzdukas A. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. Journal of Pediatric Orthopaedics. 2014 Mar 1;34(2):129-33. […]

## Reasoning under uncertainty

John Cook writes, “statistics is all about reasoning under uncertainty.” I agree, and I think this is a good way to put it. Statistics textbooks sometimes describe statistics as “decision making under uncertainty,” but that always bothered me, because there’s very little about decision making in statistics textbooks. “Reasoning” captures it much more than “decision […]