This came up in comments recently so I thought I’d clarify the point. Mister P is MRP, multilevel regression and poststratification. The idea goes like this: 1. You want to adjust for differences between sample and population. Let y be your outcome of interest and X be your demographic and geographic variables you’d like to […]
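
The poststratification half of that recipe is easy to sketch in code. Here is a minimal illustration with made-up cells and numbers; in real MRP the cell estimates would come from the multilevel regression, not raw sample means as here.

```python
# Minimal poststratification sketch (hypothetical numbers): estimate a
# population mean from an unrepresentative sample by reweighting cell
# estimates with known population cell shares. In full MRP the cell
# estimates would come from a multilevel regression; here we just use
# raw cell means to show the poststratification step.

# (age, location) cell -> (mean outcome in sample, population share)
cells = {
    ("young", "urban"): (0.62, 0.30),
    ("young", "rural"): (0.55, 0.20),
    ("old",   "urban"): (0.48, 0.25),
    ("old",   "rural"): (0.41, 0.25),
}

# Poststratified estimate: population-share-weighted average of cell means.
estimate = sum(mean * share for mean, share in cells.values())
print(round(estimate, 4))
```

The point of the weighting is that the sample's mix of cells drops out entirely; only the population shares matter.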

## No tradeoff between regularization and discovery

We had a couple recent discussions regarding questionable claims based on p-values extracted from forking paths, and in both cases (a study “trying large numbers of combinations of otherwise-unused drugs against a large number of untreatable illnesses,” and a salami-slicing exercise looking for public opinion changes in subgroups of the population), I recommended fitting a […]

## Sparse regression using the “ponyshoe” (regularized horseshoe) model, from Juho Piironen and Aki Vehtari

The article is called “Sparsity information and regularization in the horseshoe and other shrinkage priors,” and here’s the abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage […]
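
The "regularized" part of the construction can be sketched numerically. The following is my reading of the prior hierarchy from the abstract's description, with illustrative values for the global scale `tau` and the slab width `c` (both choices here are assumptions, not the paper's recommendations):

```python
# Sketch of a regularized-horseshoe prior hierarchy (my reading of
# Piironen and Vehtari; tau and c are illustrative, not recommended values).
# beta_j ~ N(0, tau^2 * lambda_tilde_j^2), where the local scale lambda_j
# is "regularized" so that even unshrunk coefficients are capped near c.
import numpy as np

rng = np.random.default_rng(0)
p, tau, c = 5, 0.1, 2.0            # tau: global shrinkage, c: slab scale

lam = np.abs(rng.standard_cauchy(p))                    # half-Cauchy local scales
lam_tilde2 = c**2 * lam**2 / (c**2 + tau**2 * lam**2)   # regularized scales
beta = rng.normal(0.0, tau * np.sqrt(lam_tilde2))       # coefficient draws

# Small lambda_j -> lambda_tilde ~ lambda (ordinary horseshoe shrinkage);
# huge lambda_j -> tau * lambda_tilde -> c (the slab caps the scale).
print(np.all(tau * np.sqrt(lam_tilde2) < c))
```

The cap is the fix for the second problem the abstract alludes to: under the plain horseshoe, a coefficient that escapes shrinkage gets an essentially flat prior, while here its scale saturates at `c`.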

## Avoiding boundary estimates using a prior distribution as regularization

For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this; sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, […]
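
A toy version of the boundary problem, and of how a prior can fix it: simulated group estimates whose marginal MLE of the group-level sd sits at zero, versus the MAP estimate under a gamma prior on the sd. The gamma(2, 2) choice below is an assumption for illustration only; the key property is that its log-density has infinite positive slope at zero, which pushes the mode off the boundary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical group estimates and standard errors where the groups look
# *less* variable than their standard errors suggest, so the marginal
# MLE of the group-level sd tau sits on the boundary at zero.
y = np.array([0.2, -0.1, 0.1, 0.0])
s = np.array([1.0, 1.0, 1.0, 1.0])

def neg_log_marginal(tau):
    v = tau**2 + s**2                    # marginal variance per group
    mu = np.sum(y / v) / np.sum(1 / v)   # profile out the grand mean mu
    return 0.5 * np.sum(np.log(v) + (y - mu)**2 / v)

mle = minimize_scalar(neg_log_marginal, bounds=(0.0, 5.0), method="bounded")

# Gamma(shape=2, rate=2) prior on tau: log-density log(tau) - 2*tau + const.
# Its derivative at tau = 0 is +infinity, which pushes the MAP off zero.
def neg_log_post(tau):
    return neg_log_marginal(tau) - (np.log(tau) - 2.0 * tau)

map_ = minimize_scalar(neg_log_post, bounds=(1e-8, 5.0), method="bounded")

print(round(mle.x, 3), round(map_.x, 3))  # MLE at (or near) 0, MAP positive
```

Nothing in the data forces `tau` above zero here, so the likelihood piles up at the boundary; the prior supplies the regularization that keeps the estimate interior.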

## A Fast Hybrid Algorithm for Large Scale L1-Regularized Logistic Regression

Aleks (I think) sent me this link to a paper by Jianing Shi, Wotao Yin, Stanley Osher, and Paul Sajda. They report impressive computational improvements based on “a novel hybrid algorithm based on combining two types of optimization iterations: one being very fast and memory friendly while the other being slower but more accurate.” This […]
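
For readers who want a baseline to compare such hybrid methods against, here is a plain proximal-gradient (ISTA) solver for the same l1-regularized logistic regression objective. This is not the authors' algorithm, just the standard slow-but-simple iteration, run on made-up data with fixed step size and penalty:

```python
import numpy as np

# Plain proximal-gradient (ISTA) baseline for l1-regularized logistic
# regression: minimize mean log-loss + lam * ||w||_1. Data, step size,
# and penalty are all illustrative.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -2.0, 1.5]                       # sparse ground truth
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

lam, step = 0.05, 0.1
w = np.zeros(p)
for _ in range(500):
    g = X.T @ (1 / (1 + np.exp(-X @ w)) - y) / n    # logistic-loss gradient
    z = w - step * g                                # gradient step
    w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # soft-threshold

print(int(np.sum(w != 0)))  # the l1 penalty zeroes out most coefficients
```

Each iteration is one gradient step followed by soft-thresholding, which is exactly the kind of cheap, memory-friendly update the paper's fast phase resembles; the slower, more accurate phase is what the hybrid adds on top.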

## Default informative priors for effect sizes: Where do they come from?

To coincide with the publication of our article, A Proposal for Informative Default Priors Scaled by the Standard Error of Estimates, Erik van Zwet sends along an explainer. Here’s Erik: 1. Set-up: This note is meant as a quick explainer of a set of three preprints at The Shrinkage Trilogy. All three have the same […]
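
The core normal-normal calculation behind a prior scaled by the standard error is short enough to write out. This is a toy version of the idea as I understand it, with a hypothetical scale multiplier `k` (not van Zwet's fitted values): a default prior on the true effect whose sd is `k` times the estimate's standard error gives a fixed shrinkage factor of k²/(k²+1), whatever the study.

```python
# Toy normal-normal shrinkage with a prior scaled by the standard error.
# The multiplier k below is a hypothetical choice, not a fitted value.
def shrink(estimate, se, k=1.0):
    """Posterior mean/sd for effect ~ N(0, (k*se)^2), data ~ N(effect, se^2)."""
    f = k**2 / (k**2 + 1)          # shrinkage factor toward zero
    post_mean = f * estimate
    post_sd = (f * se**2) ** 0.5   # = se * k / sqrt(k^2 + 1)
    return post_mean, post_sd

m, s = shrink(2.0, 1.0, k=1.0)     # a "2 se" result gets halved
print(m, round(s, 3))
```

Because both prior and likelihood scale with `se`, the shrinkage factor depends only on `k`, which is what makes such a prior usable as a default across studies.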

## Top 10 Ideas in Statistics That Have Powered the AI Revolution

Aki and I put together this listicle to accompany our recent paper on the most important statistical ideas of the past 50 years. Kim Martineau at Columbia, who suggested making this list, also had the idea that you all might have suggestions for other important articles and books; tweet your thoughts at @columbiascience or put them […]

## State-level predictors in MRP and Bayesian prior

Something came up in comments today that I’d like to follow up on. In our earlier post, I brought up an example: If you’re modeling attitudes about gun control, think hard about what state-level predictors to include. My colleagues and I thought about this a bunch of years ago when doing MRP for gun-control attitudes. […]

## My talk’s on April Fool’s but it’s not actually a joke

For the Boston chapter of the American Statistical Association, I’ll be speaking on this paper with Aki: What are the most important statistical ideas of the past 50 years? We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel […]

## Hierarchical stacking

(This post is by Yuling) Gregor Pirš, Aki, Andrew, and I wrote: Stacking is a widely used model averaging technique that yields asymptotically optimal predictions among linear averages. We show that stacking is most effective when the model predictive performance is heterogeneous in inputs, so that we can further improve the stacked mixture by a […]
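
For contrast with the hierarchical version, plain stacking with constant weights is easy to sketch: choose simplex weights to maximize the summed log pointwise predictive density of the mixture. Everything below (the fake log-density matrix, the softmax parameterization) is illustrative; hierarchical stacking then lets the weights vary with the input, which this sketch does not do.

```python
import numpy as np
from scipy.optimize import minimize

# Classical (non-hierarchical) stacking: constant simplex weights w chosen
# to maximize the log score of the weighted mixture of predictive densities.
rng = np.random.default_rng(2)
n, K = 100, 3
# lpd[i, k]: log predictive density of model k at held-out point i
# (made-up numbers; model 0 is deliberately better on average).
lpd = rng.normal(loc=[-1.0, -1.5, -1.6], scale=0.3, size=(n, K))

def neg_score(a):                        # softmax keeps w on the simplex
    w = np.exp(a - a.max())
    w /= w.sum()
    return -np.sum(np.log(np.exp(lpd) @ w))

a = minimize(neg_score, np.zeros(K)).x
w = np.exp(a - a.max())
w /= w.sum()
print(np.round(w, 2))
```

The "heterogeneous in inputs" point from the abstract is visible in this setup: with constant weights, a model that is best only in one region of input space gets the same weight everywhere, which is the inefficiency hierarchical stacking addresses.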

## New textbook, “Statistics for Health Data Science,” by Etzioni, Mandel, and Gulati

Ruth Etzioni, Micha Mandel, and Roman Gulati wrote a new book that I really like. Here are the chapters:

1. Statistics and Health Data
   - 1.1 Introduction
   - 1.2 Statistics and Organic Statistics
   - 1.3 Statistical Methods and Models
   - 1.4 Health Care Data
   - 1.5 Outline of the Text
   - 1.6 Software and Data
2. Key Statistical Concepts
   - 2.1 Samples and […]

## “They adjusted for three hundred confounders.”

Alexey Guzey points to this post by Scott Alexander and this research article by Elisabetta Patorno, Robert Glynn, Raisa Levin, Moa Lee, and Krista Huybrechts, and writes: I [Guzey] am extremely skeptical of anything that relies on adjusting for confounders and have no idea what to think about this. My intuition would be that because […]

## Debate involving a bad analysis of GRE scores

This is one of these academic ping-pong stories of a general opinion, an article that challenges the general opinion, a rebuttal to that article, a rebuttal to the rebuttal, etc. I’ll label the positions as A1, B1, A2, B2, and so forth: A1: The starting point is that Ph.D. programs in the United States typically […]

## “I Can’t Believe It’s Not Better”

Check out this session Saturday at NeurIPS. It’s a great idea to ask people to speak on methods that didn’t work. I have a lot of experience with that! Here are the talks: Max Welling: The LIAR (Learning with Interval Arithmetic Regularization) is Dead. Danielle Belgrave: Machine Learning for Personalised Healthcare: Why is it not […]

## What are the most important statistical ideas of the past 50 years?

Aki and I wrote this article, doing our best to present a broad perspective. We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. These eight […]

## Thinking about election forecast uncertainty

Some twitter action: Elliott Morris, my collaborator (with Merlin Heidemanns) on the Economist election forecast, pointed me to some thoughtful criticisms of our model from Nate Silver. There’s some discussion on twitter, but in general I don’t find twitter to be a good place for careful discussion, so I’m continuing the conversation here. Nate writes: […]

## The flashy crooks get the headlines, but the bigger problem is everyday routine bad science done by non-crooks

In the immortal words of Michael Kinsley, the real scandal isn’t what’s illegal, the scandal is what’s legal. I was reminded of this principle after seeing this news article about the discredited Surgisphere doctor (see here for background). The news article was fine—it’s good to learn these things—but, as with pizzagate, evilicious, and other science […]

## What can be our goals, and what is too much to hope for, regarding robust statistical procedures?

Gael Varoquaux writes: Even for science and medical applications, I am becoming weary of fine statistical modeling efforts, and believe that we should standardize on a handful of powerful and robust methods. First, analytic variability is a killer, e.g. in “standard” analysis for brain mapping, for machine learning in brain imaging, or more generally in […]

## bla bla bla PEER REVIEW bla bla bla

OK, I’ve been saying this over the phone to a bunch of journalists during the past month so I might as well share it with all of you . . . 1. The peers . . . The problem with peer review is the peers. Who are “the peers” of four M.D.’s writing up an […]

## This controversial hydroxychloroquine paper: What’s Lancet gonna do about it?

Peer review is not a form of quality control. In the past month there’s been a lot of discussion of the flawed Stanford study of coronavirus prevalence—it’s even hit the news—and one thing that came up was that the article under discussion was just a preprint—it wasn’t even peer reviewed! For example, in a NYT op-ed: […]