Someone who wishes to remain anonymous writes: I just read your post reflecting on crappy talks . . . I’m reaching out because I’m a data scientist at a local hospital in the US and I’ve been asked to present to our physicians about communicating statistical information to patients (e.g., how to interpret the results […]

**Miscellaneous Statistics** category.

## Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.

David Kaplan writes: I have a question that comes up often when working with people who are analyzing large scale educational assessments such as NAEP or PISA. They want to do some kind of multilevel analysis of an achievement outcome such as mathematics ability predicted by individual and school level variables. The files contain the […]
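As a rough illustration of the poststratification step named in the title, here is a minimal sketch. It assumes the regression model has already produced an estimate for each design cell; the cell names, estimates, and counts below are invented for illustration and do not come from NAEP, PISA, or the post itself.

```python
# Hypothetical cells: each has a model-based estimate of the outcome,
# a known population count, and the number of sampled units.
cells = [
    {"name": "stratum_1", "estimate": 250.0, "population": 8000, "sample": 50},
    {"name": "stratum_2", "estimate": 270.0, "population": 2000, "sample": 150},
]


def poststratify(cells):
    """Average the cell estimates, weighting each by its population count."""
    total = sum(c["population"] for c in cells)
    return sum(c["estimate"] * c["population"] for c in cells) / total


def sample_weighted_mean(cells):
    """Average the cell estimates by sample size, ignoring the design."""
    total = sum(c["sample"] for c in cells)
    return sum(c["estimate"] * c["sample"] for c in cells) / total


poststratified = poststratify(cells)
unadjusted = sample_weighted_mean(cells)

print(f"poststratified estimate: {poststratified:.1f}")
print(f"sample-weighted estimate: {unadjusted:.1f}")
```

In this made-up example the second stratum is heavily oversampled, so the sample-weighted average is pulled toward its estimate, while the poststratified average reflects the population shares directly. No survey weights appear anywhere: the population counts carry the design information.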

## Weakliem on air rage and himmicanes

Weakliem writes: I think I see where the [air rage] analysis went wrong. The dependent variable was whether or not an “air rage” incident happened on the flight. Two important influences on the chance of an incident are the number of passengers and how long the flight was (their data apparently don’t include the number […]

## What is/are bad data?

This post is by Lizzie. I also took the picture of the cats. I was talking to a colleague about a recent paper, which has some issues, but I was a bit surprised by her response that one of the real issues was that it ‘just uses bad data.’ I snapped back reflexively, ‘it’s not […]

## Typo of the day

“Poststratifiction”

## What we did in 2020, and thanks to all our collaborators and many more

Published or to be published articles: [2021] Reflections on Lakatos’s “Proofs and Refutations.” *American Mathematical Monthly*. (Andrew Gelman) [2021] Holes in Bayesian statistics. *Journal of Physics G: Nuclear and Particle Physics*. (Andrew Gelman and Yuling Yao) [2021] Reflections on Breiman’s Two Cultures of Statistical Modeling. *Observational Studies*. (Andrew Gelman) [2021] Bayesian statistics […]

## Retired computer science professor argues that decisions are being made by “algorithms that are mathematically incapable of bias.” What does this mean?

This came up in the comments, but not everyone reads the comments, so . . . Joseph recommended an op-ed entitled “We must stop militant liberals from politicizing artificial intelligence; ‘Debiasing’ algorithms actually means adding bias,” by retired computer science professor Pedro Domingos. The article begins: What do you do if decisions that used to […]

## You can figure out the approximate length of our blog lag now.

Sekhar Ramakrishnan writes: I wanted to relate an episode of informal probabilistic reasoning that occurred this morning, which I thought you might find entertaining. Jan 6th is the Christian feast day of the Epiphany, which is known as Dreikönigstag (Three Kings’ Day), here in Zürich, Switzerland, where I live (I work at ETH). There is […]

## 17 state attorney generals, 100 congressmembers, and the Association for Psychological Science walk into a bar

I don’t have much to add to all that’s been said about this horrible story. The statistics errors involved are pretty bad—actually commonplace in published scientific articles, but mistakes that seem recondite and technical in a paper about ESP, say, or beauty and sex ratio, become much clearer when the topic is something familiar such […]

## What are the most important statistical ideas of the past 50 years?

Aki and I wrote this article, doing our best to present a broad perspective. We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. These eight […]

## How to think about correlation? It’s the slope of the regression when x and y have been standardized.

Dave Balan writes: I am an economist at the Federal Trade Commission with a very basic statistics question, one that I have put to several fairly high-powered econometricians, and to which no one has had a satisfying answer. The question is this. Why are correlations meaningful? We know that they are ubiquitous, they get reported […]
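The claim in the title can be checked numerically. The following is a small sketch on simulated data (the data-generating numbers and helper names are arbitrary choices for illustration): after rescaling x and y to mean 0 and standard deviation 1, the least-squares slope should match the correlation coefficient.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated data; the slope 2.0 and noise scale 1.5 are arbitrary.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * a + rng.gauss(0, 1.5) for a in x]

r = correlation(x, y)
slope_y_on_x = ols_slope(standardize(x), standardize(y))
slope_x_on_y = ols_slope(standardize(y), standardize(x))

print(f"correlation:                  {r:.4f}")
print(f"standardized slope, y on x:   {slope_y_on_x:.4f}")
print(f"standardized slope, x on y:   {slope_x_on_y:.4f}")
```

The slope on the raw scale depends on the units of x and y and differs depending on which variable is the predictor. After standardizing, both regressions give the same slope, and it equals the correlation.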

## Is causality as explicit in fake data simulation as it should be?

Sander Greenland recently published a paper with a very clear and thoughtful exposition on why causality, logic and context need full consideration in any statistical analysis, even strictly descriptive or predictive analysis. For instance, in the concluding section – “Statistical science (as opposed to mathematical statistics) involves far more than data – it requires realistic […]

## Further formalization of the “multiverse” idea in statistical modeling

Cristobal Young and Sheridan Stewart write: Social scientists face a dual problem of model uncertainty and methodological abundance. . . . This ‘uncertainty among abundance’ offers spiraling opportunities to discover a statistically significant result. The problem is acute when models with significant results are published, while those with non-significant results go unmentioned. Multiverse analysis addresses […]

## Greek statistician is in trouble for . . . telling the truth!

Paul Alper points us to this news article by Catherine Rampell, which tells this story: Georgiou is not a mobster. He’s not a hit man or a spy. He’s a statistician. And the sin at the heart of his supposed crimes was publishing correct budget numbers. The government has brought a relentless series of criminal […]

## What went wrong with the polls in 2020? Another example.

Shortly before the election the New York Times ran this article, “The One Pollster in America Who Is Sure Trump Is Going to Win,” featuring Robert Cahaly, who on election day forecast Biden to win 235 electoral votes. As you may have heard, Biden actually won 306. Our Economist model gave a final prediction of […]

## The Pfizer-Biontech Vaccine May Be A Lot More Effective Than You Think?

Ian Fellows writes: I [Fellows] just wrote up a little Bayesian analysis that I thought you might be interested in. Specifically, everyone seems fixated on the 90% effectiveness lower bound reported for the Pfizer vaccine, but the true efficacy is likely closer to 97%. Please let me know if you see any errors. I’m basing […]

## Bayesian Workflow

Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Paul-Christian Bürkner, Lauren Kennedy, Jonah Gabry, Martin Modrák, and I write: The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and […]

## Concerns with our Economist election forecast

A few days ago we discussed some concerns with Fivethirtyeight’s election forecast. This got us thinking again about some concerns with our own forecast for The Economist (see here for more details). Here are some of our concerns with our forecast: 1. Distribution of the tails of the national vote forecast 2. Uncertainties of state […]

## “Valid t-ratio Inference for instrumental variables”

A couple people pointed me to this recent econometrics paper, which begins: In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an […]

## Body language and machine learning

Riding on the street, I can usually tell what cars in front of me are going to do, based on their “body language”: how they are positioning themselves in their lane. I don’t know that I could quite articulate what the rules are, but I can tell what’s going on, and I know that I […]