Ranjit Lall writes: In 2018 you posted about some machine learning-based multiple imputation software I was developing that works particularly well with large and complex datasets. The software is now available as a package in both Python (MIDASpy) and R (rMIDAS), and a paper describing the underlying method was just published online in Political Analysis […]

**Miscellaneous Statistics** category.

## Science reform can get so personal

This is Jessica. Lately I’ve been thinking a lot about philosophy of science, motivated by both a longtime interest in methodological reform in the social sciences and a more recent interest in proposed ethics problems and reforms in computer science. The observation I want to share is not intended to support any particular stance, but […]

## Statisticians don’t use statistical evidence to decide what statistical methods to use. Also, The Way of the Physicist.

David Bailey, a physicist at the University of Toronto, writes: I thought you’d be pleased to hear that a student in our Advanced Physics Lab spontaneously used Stan to analyze data with significant uncertainties in both x and y. We’d normally expect students to use Python and orthogonal distance regression, and Stan is never mentioned […]
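For reference, the conventional approach mentioned in the quote, orthogonal distance regression, has a closed form when the model is a straight line and the errors in x and y are assumed to have equal variance (Deming regression with variance ratio 1). The sketch below is only that special case. It is not the lab's code and not a Stan model, and unlike a full ODR fit (for example SciPy's `scipy.odr`) it ignores per-point uncertainties. The function name `orthogonal_line_fit` is hypothetical.

```python
import math
from statistics import fmean


def orthogonal_line_fit(xs, ys):
    """Fit y = a + b*x by minimizing squared perpendicular distances.

    Assumes equal error variances in x and y (Deming regression with
    variance ratio 1); per-point uncertainties are not used.
    Returns (intercept, slope).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when sxy is zero")
    diff = syy - sxx
    # Closed-form root of the quadratic that defines the orthogonal-regression slope
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope


intercept, slope = orthogonal_line_fit([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
```

A measurement-error model in Stan generalizes this by treating the true x values as unknowns with their own uncertainties, which is presumably what made it attractive to the student.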

## Is sqrt(2) a normal number?

In a paper from 2018, Pierpaolo Uberti writes: In this paper we study the property of normality of a number in base 2. A simple rule that associates a vector to a number is presented and the property of normality is stated for the vector associated to the number. The problem of testing a number […]
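A number is normal in base 2 if every block of k bits appears in its binary expansion with limiting frequency 2^-k; in particular, the fraction of ones tends to 1/2. Whether sqrt(2) is normal is an open problem, so computing digits can only give empirical evidence, never a proof. A minimal sketch of such a check is below; it is not the vector-based method of the paper, and the function names are hypothetical. It uses exact integer arithmetic: `isqrt(2 * 4**n)` equals floor(sqrt(2) * 2**n).

```python
from collections import Counter
from math import isqrt


def sqrt2_binary_fraction_bits(n):
    """Return the first n bits of sqrt(2) after the binary point, as a string."""
    scaled = isqrt(2 << (2 * n))   # floor(sqrt(2) * 2**n): a 1 followed by n bits
    return bin(scaled)[3:]         # drop the '0b' prefix and the integer-part bit


def block_frequencies(bits, k):
    """Relative frequency of each length-k block over all overlapping windows."""
    windows = [bits[i:i + k] for i in range(len(bits) - k + 1)]
    counts = Counter(windows)
    return {block: c / len(windows) for block, c in sorted(counts.items())}


bits = sqrt2_binary_fraction_bits(100_000)
print(block_frequencies(bits, 1))  # both values should be near 0.5 if the bits look "normal"
print(block_frequencies(bits, 2))  # each of the four blocks near 0.25
```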

## Jordana Cepelewicz on “The Hard Lessons of Modeling the Coronavirus Pandemic”

Here’s a long and thoughtful article on issues that have come up with Covid modeling. Jordana Cepelewicz. 2021. The Hard Lessons of Modeling the Coronavirus Pandemic. Quanta. Jordana’s a staff writer for Quanta, a popular science magazine funded by the Simons Foundation, which also funds the Flatiron Institute, where I now work. She’s a science […]

## “Smell the Data”

Mike Maltz writes the following on ethnography and statistics: I got interested in ethnographic studies because of a concern for people analyzing data without an understanding of its origins and the way it was collected. An ethnographer collects stories, and too many statisticians disparage them, calling them “anecdotes” instead of real data. But stories are […]

## COVID and Vitamin D…and some other things too.

This post is by Phil Price, not Andrew. Way back in November I started writing a post about my Vitamin D experience. My doctor says I need more, in spite of the fact that I spend lots of time outdoors in the sun. I looked into the research and concluded that nobody really knows how […]

## Scaling regression inputs by dividing by two standard deviations

I just had reason to reread this article from 2009, and I think it holds up just fine! Just to emphasize, I’m not saying you have to scale predictors by dividing by two standard deviations, nor am I even saying that you should do this scaling. I’m just saying that this scaling is a useful […]
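The rescaling itself is one line: subtract the mean and divide by two standard deviations. After it, a one-unit change in the predictor spans two standard deviations, which is roughly comparable to the 0-to-1 change of a binary predictor with probability near 0.5 (whose standard deviation is about 0.5); that comparability is the point of the suggestion. A minimal sketch, with a hypothetical function name and made-up example values:

```python
from statistics import fmean, stdev


def scale_two_sd(values):
    """Center a numeric predictor and divide by two (sample) standard deviations."""
    m = fmean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot scale a constant predictor")
    return [(v - m) / (2 * s) for v in values]


age = [23, 35, 41, 52, 67]        # hypothetical predictor values
age_scaled = scale_two_sd(age)    # mean 0, standard deviation 0.5
```

Binary predictors are left as 0/1, so their coefficients and those of the rescaled continuous predictors can be read on a roughly common scale.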

## “Maybe we should’ve called it Arianna”

Katie Hafner wrote this obituary of Arianna Rosenbluth, original programmer of what is known as the Metropolis algorithm: Arianna Rosenbluth Dies at 93; Pioneering Figure in Data Science Dr. Rosenbluth, who received her physics Ph.D. at 21, helped create an algorithm that has become a foundation of understanding huge quantities of data. She died of […]

## The “story time” is to lull us in with a randomized controlled experiment and as we fall asleep, feed us less reliable conclusions that come from an embedded observational study.

Kaiser Fung explains. This comes up a lot, and his formulation in the above title is a good way of putting it. He also has this discussion of the AstraZeneca-Oxford vaccine trial results which makes me want to just do a damn Bayesian analysis of it already. I’ll have to find someone with the right […]

## One more cartoon like this, and this blog will be obsolete.

This post is by Phil. This SMBC cartoon seems to wrap up about half of the content of this blog. Of course I’m exaggerating. There will still be room for book reviews and cat photos.

## There is only one reality (and we cannot demand consistency from any other)

I bought The Shadow of the Torturer when it came out in paperback, I guess in response to a positive review. I found it kinda difficult to read, but I wanted to know what would happen next, so I bought volumes 2, 3, and 4 when they came out too. By the time I was […]

## My reply: Three words. Fake. Data. Simulation.

Kash Ramli writes: I am planning on running an experiment to determine whether an adaptive treatment approach to behaviour change interventions could be effective at reducing the heterogeneous treatment effects currently observed in the field. The context of the experiment is providing households with social norms based feedback of their consumption (i.e. comparing your consumption […]
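Fake-data simulation here means: pick plausible parameter values, simulate the experiment you plan to run, apply the analysis you plan to use, and check whether it recovers what you put in and how noisy the estimate is. The sketch below is a minimal version under loudly hypothetical assumptions: a completely randomized two-arm design, normally distributed baseline consumption, household-specific treatment effects drawn from a normal distribution, and a simple difference in means as the analysis. None of the numbers come from the study, and the function names are made up.

```python
import random
from statistics import fmean, stdev


def simulate_once(n, effect_mean, effect_sd, noise_sd, rng):
    """Simulate one completely randomized two-arm experiment and return the
    difference in mean outcomes (treated minus control).

    Household-level effects vary (heterogeneous treatment effects); every
    parameter value is a hypothetical placeholder.
    """
    arms = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(arms)                                    # random assignment
    treated, control = [], []
    for z in arms:
        baseline = rng.gauss(100.0, 20.0)                # hypothetical consumption level
        noise = rng.gauss(0.0, noise_sd)
        if z:
            effect = rng.gauss(effect_mean, effect_sd)   # household-specific effect
            treated.append(baseline + effect + noise)
        else:
            control.append(baseline + noise)
    return fmean(treated) - fmean(control)


def fake_data_check(n_sims=1000, n=400, effect_mean=-5.0, effect_sd=3.0,
                    noise_sd=10.0, seed=1):
    """Repeat the simulation to see whether the analysis recovers the assumed
    average effect and how variable the estimate is at this sample size."""
    rng = random.Random(seed)
    estimates = [simulate_once(n, effect_mean, effect_sd, noise_sd, rng)
                 for _ in range(n_sims)]
    return fmean(estimates), stdev(estimates)


mean_est, sd_est = fake_data_check()
```

Comparing `sd_est` with the size of effect you care about gives a quick sense of whether the design is informative. Studying whether the adaptive approach reduces heterogeneity would require simulating and estimating the variation in effects too, for example by letting the effect depend on a household covariate.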

## New textbook, “Statistics for Health Data Science,” by Etzioni, Mandel, and Gulati

Ruth Etzioni, Micha Mandel, and Roman Gulati wrote a new book that I really like. Here are the chapters:

- 1 Statistics and Health Data
  - 1.1 Introduction
  - 1.2 Statistics and Organic Statistics
  - 1.3 Statistical Methods and Models
  - 1.4 Health Care Data
  - 1.5 Outline of the Text
  - 1.6 Software and Data
- 2 Key Statistical Concepts
  - 2.1 Samples and […]

## You’re a data scientist at a local hospital and you’ve been asked to present to the physicians on communicating statistical information to patients. What should you say?

Someone who wishes to remain anonymous writes: I just read your post reflecting on crappy talks . . . I’m reaching out because I’m a data scientist at a local hospital in the US and I’ve been asked to present to our physicians about communicating statistical information to patients (e.g., how to interpret the results […]

## Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.

David Kaplan writes: I have a question that comes up often when working with people who are analyzing large scale educational assessments such as NAEP or PISA. They want to do some kind of multilevel analysis of an achievement outcome such as mathematics ability predicted by individual and school level variables. The files contain the […]
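The poststratification step in the heading is a weighted average: fit the regression with the design variables as predictors, predict the outcome in each population cell defined by those variables, and weight each cell's prediction by its population count. A minimal sketch of that last step, assuming the cell predictions and population counts are already available; the function name, cells, and numbers are hypothetical.

```python
def poststratify(cell_estimates, population_counts):
    """Combine per-cell model estimates into one population estimate,
    weighting each cell by its population count."""
    missing = set(population_counts) - set(cell_estimates)
    if missing:
        raise KeyError(f"no estimate for cells: {sorted(missing)}")
    total = sum(population_counts.values())
    return sum(cell_estimates[c] * n for c, n in population_counts.items()) / total


# Hypothetical cells defined by design variables, with predictions from a
# fitted regression and population counts from an external source.
estimates = {("public", "urban"): 250.0, ("public", "rural"): 240.0,
             ("private", "urban"): 270.0}
counts = {("public", "urban"): 600, ("public", "rural"): 300,
          ("private", "urban"): 100}
overall = poststratify(estimates, counts)
```

Because the counts carry the information that would otherwise go into survey weights, the regression itself can be fit unweighted.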

## Weakliem on air rage and himmicanes

Weakliem writes: I think I see where the [air rage] analysis went wrong. The dependent variable was whether or not an “air rage” incident happened on the flight. Two important influences on the chance of an incident are the number of passengers and how long the flight was (their data apparently don’t include the number […]
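One way to see why passengers and flight length matter is an exposure model. Assume, purely for illustration, that incidents occur as a Poisson process with a constant rate per passenger-hour; then the chance of at least one incident is 1 - exp(-rate × passengers × hours). This is not Weakliem's analysis, and the rate and flights below are hypothetical.

```python
import math


def incident_probability(rate_per_passenger_hour, passengers, hours):
    """P(at least one incident) if incidents arrive as a Poisson process with a
    constant, hypothetical rate per passenger-hour."""
    expected = rate_per_passenger_hour * passengers * hours
    return -math.expm1(-expected)   # 1 - exp(-expected), accurate for small values


rate = 1e-5  # hypothetical; the same rate on both flights
short_small = incident_probability(rate, 100, 1.0)
long_large = incident_probability(rate, 300, 6.0)
```

With an identical per-passenger-hour rate, the larger, longer flight is nearly 18 times as likely to have an incident, so any flight feature correlated with plane size or duration can look predictive unless those exposures are in the model (for example, as a log-exposure offset).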

## What is/are bad data?

This post is by Lizzie; I also took the picture of the cats. I was talking to a colleague about a recent paper, which has some issues, but I was a bit surprised by her response that one of the real issues was that it ‘just uses bad data.’ I snapped back reflexively, ‘it’s not […]

## Typo of the day

“Poststratifiction”

## What we did in 2020, and thanks to all our collaborators and many more

Published or to be published articles:

- [2021] Reflections on Lakatos’s “Proofs and Refutations.” *American Mathematical Monthly*. (Andrew Gelman)
- [2021] Holes in Bayesian statistics. *Journal of Physics G: Nuclear and Particle Physics*. (Andrew Gelman and Yuling Yao)
- [2021] Reflections on Breiman’s Two Cultures of Statistical Modeling. *Observational Studies*. (Andrew Gelman)
- [2021] Bayesian statistics […]