I see five problems here that together form a feedback loop with bad consequences. Here are the problems: 1. Irrelevant or misunderstood statistical or econometric theory; 2. Poorly-executed research; 3. Other people in the field being loath to criticize, taking published or even preprinted claims as correct until proved otherwise; 4. Journalists taking published or […]

## Rob Tibshirani, Yuling Yao, and Aki Vehtari on cross validation

Rob Tibshirani writes: About 9 years ago I emailed you about our new significance result for the lasso. You wrote about it in your blog. For some reason I never saw that full blog until now. I do remember the Stanford-Berkeley Seminar in 1994 where I first presented the lasso and you asked that question. Anyway, […]

## What’s the biggest mistake revealed by this table? A puzzle:

This came up in our discussion the other day: It’s a table comparing averages for treatment and control groups in an experiment. There’s one big problem here (summarizing differences by p-values) and some little problems, such as reporting values to ridiculous precision (who cares if something has an average of “346.57” when its standard deviation […]

## Is he … you know…?

Today I learnt, via Sam Power from Bristol, that the legendary IJ Good and the possibly legendary (I really don’t know) RA Gaskins suggested, in their 1971 paper on density estimation, referring to the roughness penalty from density estimation (or non-linear regression) as the flamboyance functional. And it would be a crime if we, as a field, […]

## Richard Hamming’s “The Art of Doing Science and Engineering”

I bought this charming book and started flipping through and reading bits here and there. It has a real mid-twentieth-century feel, reminiscent of Richard Feynman, Martin Gardner, and Hugo Steinhaus. It gives me some nostalgia, thinking about a time when it was expected that students could do all sorts of math—it kinda made me wish […]

## Whassup with the haphazard coronavirus statistics?

Peter Dorman writes: This piece by Robinson Meyer and Alexis Madrigal on the inadequacy of Covid data is useful but frustrating. I think they could have dispensed with the self-puffery, rhetoric and sweeping generalizations and been more detailed about data issues. Nevertheless the core point is one that you and others have stressed, that too […]

## No, I don’t like talk of false positive false negative etc but it can still be useful to warn people about systematic biases in meta-analysis

Simon Gates writes: Something published recently that you might consider blogging: a truly terrible article in Lancet Oncology. It raises the issue of interpreting trials of similar agents and the issue of multiplicity. However, it takes a “dichotomaniac” view and so is only concerned about whether results are “significant” (= “positive”) or not, and suggests applying […]

## What is the landscape of uncertainty outside the clinical trial’s methods?

I live in the province of British Columbia in the country of Canada (right, this post is not by Andrew, it is by Lizzie). Recently one of our top provincial health officials, Dr. Bonnie Henry, has received extra scrutiny based on her decision to delay second doses of the vaccine. The general argument against this […]

## Bayesian methods and what they offer compared to classical econometrics

A well-known economist who wishes to remain anonymous writes: Can you write about this agent? He’s getting exponentially big on Twitter. The link is to an econometrician, Jeffrey Wooldridge, who writes: Many useful procedures—shrinkage, for example—can be derived from a Bayesian perspective. But those estimators can be studied from a frequentist perspective, and no strong […]

## My talk’s on April Fool’s but it’s not actually a joke

For the Boston chapter of the American Statistical Association, I’ll be speaking on this paper with Aki: What are the most important statistical ideas of the past 50 years? We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel […]

## Alan Sokal on exponential growth and coronavirus rebound

Alan Sokal writes: Last week Prime Minister Boris Johnson assured Britons that, come 21 June—at least, if all goes according to plan—we will “re-open everything up to and including nightclubs, and enable large events such as theatre performances.” Life will return to normal, or so he says. Alas, Johnson is fooling himself, and it takes […]

## A new approach to pandemic control by informing people of their social distance from exposure

Po-Shen Loh, a mathematician who works in graph theory, writes about a new approach he devised for pandemic control. He writes: The significance of this new approach is potentially very high, because it not only can improve the current situation, but it would permanently add a new orthogonal tool to the toolbox for pandemic control, […]

## Multivariate missing data software update

Ranjit Lall writes: In 2018 you posted about some machine learning-based multiple imputation software I was developing that works particularly well with large and complex datasets. The software is now available as a package in both Python (MIDASpy) and R (rMIDAS), and a paper describing the underlying method was just published online in Political Analysis […]

## Science reform can get so personal

This is Jessica. Lately I’ve been thinking a lot about philosophy of science, motivated by both a longtime interest in methodological reform in the social sciences and a more recent interest in proposed ethics problems and reforms in computer science. The observation I want to share is not intended to support any particular stance, but […]

## Statisticians don’t use statistical evidence to decide what statistical methods to use. Also, The Way of the Physicist.

David Bailey, a physicist at the University of Toronto, writes: I thought you’d be pleased to hear that a student in our Advanced Physics Lab spontaneously used Stan to analyze data with significant uncertainties in both x and y. We’d normally expect students to use Python and orthogonal distance regression, and Stan is never mentioned […]

## Is sqrt(2) a normal number?

In a paper from 2018, Pierpaolo Uberti writes: In this paper we study the property of normality of a number in base 2. A simple rule that associates a vector to a number is presented and the property of normality is stated for the vector associated to the number. The problem of testing a number […]
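To get a feel for what "normal in base 2" is asking, here is a minimal sketch that computes binary digits of sqrt(2) exactly and looks at the share of ones. This is not the vector-based rule from Uberti's paper, and the function name `sqrt2_binary_digits` is made up for illustration. A share of ones near one half is only a necessary condition for normality (every finite block of digits would also need the right limiting frequency), so this checks nothing more than the simplest case.

```python
from math import isqrt

def sqrt2_binary_digits(n):
    """Return the first n binary digits of sqrt(2) after the binary point."""
    # floor(sqrt(2) * 2**n) equals isqrt(2 * 4**n), computed exactly with integers.
    scaled = isqrt(2 << (2 * n))
    bits = bin(scaled)[2:]  # the leading '1' is the integer part of sqrt(2)
    return bits[1:]

digits = sqrt2_binary_digits(10_000)
share_of_ones = digits.count("1") / len(digits)
print(f"share of ones in the first {len(digits)} binary digits: {share_of_ones:.4f}")
```

Working in exact integer arithmetic avoids the floating-point limit of about 53 reliable binary digits.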

## Jordana Cepelewicz on “The Hard Lessons of Modeling the Coronavirus Pandemic”

Here’s a long and thoughtful article on issues that have come up with Covid modeling. Jordana Cepelewicz. 2021. The Hard Lessons of Modeling the Coronavirus Pandemic. Quanta. Jordana’s a staff writer for Quanta, a popular science magazine funded by the Simons Foundation, which also funds the Flatiron Institute, where I now work. She’s a science […]

## “Smell the Data”

Mike Maltz writes the following on ethnography and statistics: I got interested in ethnographic studies because of a concern for people analyzing data without an understanding of its origins and the way it was collected. An ethnographer collects stories, and too many statisticians disparage them, calling them “anecdotes” instead of real data. But stories are […]

## COVID and Vitamin D…and some other things too.

This post is by Phil Price, not Andrew. Way back in November I started writing a post about my Vitamin D experience. My doctor says I need more, in spite of the fact that I spend lots of time outdoors in the sun. I looked into the research and concluded that nobody really knows how […]

## Scaling regression inputs by dividing by two standard deviations

I just had reason to reread this article from 2009, and I think it holds up just fine! Just to emphasize, I’m not saying you have to scale predictors by dividing by two standard deviations, nor am I even saying that you should do this scaling. I’m just saying that this scaling is a useful […]
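For readers who want to see the scaling itself, here is a minimal sketch, assuming the usual recipe of centering a numeric input and dividing by two standard deviations. The function name `scale_by_two_sd` and the data are made up for illustration; this is not code from the article.

```python
import statistics

def scale_by_two_sd(values):
    """Center a numeric input and divide by two standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("cannot scale a constant input")
    return [(v - mean) / (2 * sd) for v in values]

ages = [23, 35, 41, 52, 67]  # made-up illustrative data
scaled = scale_by_two_sd(ages)
print([round(s, 3) for s in scaled])
```

After this transformation the input has mean 0 and standard deviation 0.5, so a one-unit change in the scaled input corresponds to a two-standard-deviation change in the original.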