Archive of posts filed under the Multilevel Modeling category.

Thinking about election forecast uncertainty

Elliott Morris, my collaborator (with Merlin Heidemanns) on the Economist election forecast, pointed me to some thoughtful criticisms of our model from Nate Silver. There’s some discussion on twitter, but in general I don’t find twitter to be a good place for careful discussion, so I’m continuing the conversation here. Nate writes: […]

BMJ update: authors reply to our concerns (but I’m not persuaded)

Last week we discussed an article in the British Medical Journal that seemed seriously flawed to me, based on evidence such as the above graph. At the suggestion of Elizabeth Loder, I submitted a comment to the paper on the BMJ website. Here’s what I wrote: I am concerned that the model does not fit […]

Dispelling confusion about MRP (multilevel regression and poststratification) for survey analysis

A colleague pointed me to this post from political analyst Nate Silver: At the risk of restarting the MRP [multilevel regression and poststratification] wars: For the last 3 models I’ve designed (midterms, primaries, now revisiting stuff for the general) trying to impute how a state will vote based on its demographics & polls of voters […]
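The poststratification step that MRP adds on top of the multilevel regression can be sketched in a few lines. This is a stdlib-only illustration with invented numbers, not the model from any of the posts above: a fitted model supplies an estimate for each demographic cell, and the state-level estimate is the population-weighted average of those cells.

```python
# Hypothetical cell-level estimates, as a multilevel model might produce
# (P(vote = D | cell) for each demographic cell in one state):
cell_estimates = {
    ("young", "college"): 0.64,
    ("young", "no_college"): 0.52,
    ("old", "college"): 0.55,
    ("old", "no_college"): 0.41,
}
# Census counts for the same cells (invented for illustration):
cell_populations = {
    ("young", "college"): 120_000,
    ("young", "no_college"): 310_000,
    ("old", "college"): 150_000,
    ("old", "no_college"): 420_000,
}

# Poststratification: weight each cell's estimate by its population share.
total = sum(cell_populations.values())
state_estimate = sum(
    cell_estimates[c] * cell_populations[c] / total for c in cell_estimates
)
print(round(state_estimate, 3))  # → 0.493
```

The point of the weighting is that the survey sample can be arbitrarily unrepresentative across these cells; as long as the model estimates each cell reasonably, the census weights fix the composition.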

“To Change the World, Behavioral Intervention Research Will Need to Get Serious About Heterogeneity”

Beth Tipton, Chris Bryan, and David Yeager write: The increasing influence of behavioral science in policy has been a hallmark of the past decade, but so has a crisis of confidence in the replicability of behavioral science findings. In this essay, we describe a nascent paradigm shift in behavioral intervention research—a heterogeneity revolution—that we believe […]

The value of thinking about varying treatment effects: coronavirus example

Yesterday we discussed difficulties with the concept of average treatment effect. Part of designing a study is accounting for uncertainty in effect sizes. Unfortunately there is a tradition in clinical trials of making optimistic assumptions in order to claim high power. Here is an example that came up in March, 2020. A doctor was designing […]
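The power-calculation trap described here can be made concrete with a small stdlib-only simulation. The numbers below are invented, not from the clinical trial in the post: a design that assumes one fixed, optimistic effect reports high power, while the same average effect with realistic heterogeneity yields far less.

```python
import random
import statistics

random.seed(1)

def power(draw_effect, n=100, sims=2000, sd=1.0, z=1.96):
    """Fraction of simulated two-arm trials (n per arm) whose z-test rejects."""
    rejections = 0
    for _ in range(sims):
        effect = draw_effect()  # the true effect for this trial's population
        treated = [random.gauss(effect, sd) for _ in range(n)]
        control = [random.gauss(0.0, sd) for _ in range(n)]
        se = (2 * sd * sd / n) ** 0.5
        if (statistics.mean(treated) - statistics.mean(control)) / se > z:
            rejections += 1
    return rejections / sims

# Optimistic design assumption: a fixed, sizable effect for everyone.
p_fixed = power(lambda: 0.5)
# Varying treatment effects: lower on average, sometimes near zero.
p_varying = power(lambda: random.gauss(0.25, 0.25))
print(p_fixed, p_varying)
```

With these settings the fixed-effect design reports power above 90%, while the heterogeneous version lands near 50% — the gap is the cost of the optimistic assumption.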

“Why do the results of immigrant students depend so much on their country of origin and so little on their country of destination?”

Aleks points us to this article from 2011 by Julio Carabaña. Carabaña’s article has three parts. First is a methodological point that much can be learned from a cross-national study that has data at the level of individual students, as compared to the usual “various origins-one destination” design. Second is the empirical claim, based on […]

Resolving confusions over that study of “Teacher Effects on Student Achievement and Height”

Someone pointed me to this article by Marianne Bitler, Sean Corcoran, Thurston Domina, and Emily Penner, “Teacher Effects on Student Achievement and Height: A Cautionary Tale,” which begins: Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added […]

Improving our election poll aggregation model

Luke Mansillo saw our election poll aggregation model and writes: I had a look at the Stan code and I wondered if the model that you, Merlin Heidemanns, and Elliott Morris were implementing was not really Drew Linzer’s model but really Simon Jackman’s model. I realise that Linzer published Dynamic Bayesian Forecasting of Presidential Elections […]

Election 2020 is coming: Our poll aggregation model with Elliott Morris of the Economist

Here it is. The model is vaguely based on our past work on Bayesian combination of state polls and election forecasts but with some new twists. And, check it out: you can download our R and Stan source code and the data! Merlin Heidemanns wrote much of the code, which in turn is based on […]

Faster than ever before: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal write: Gaussian latent variable models are a key class of Bayesian hierarchical models with applications in many fields. Performing Bayesian inference on such models can be challenging as Markov chain Monte Carlo algorithms struggle with the geometry of the resulting posterior distribution and can be prohibitively slow. […]

In Bayesian priors, why do we use soft rather than hard constraints?

Luiz Max Carvalho has a question about the prior distributions for hyperparameters in our paper, Bayesian analysis of tests with unknown specificity and sensitivity: My reply: 1. We recommend soft rather than hard constraints when we have soft rather than hard knowledge. In this case, we don’t absolutely know that spec and sens are greater […]
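The soft-versus-hard distinction is easy to see in log-prior terms. A toy sketch (the cutoffs and scales below are illustrative, not the ones in the specificity/sensitivity paper): a hard constraint assigns zero density, hence log-prior of negative infinity, outside its interval, so no amount of data can move the posterior there; a soft constraint merely penalizes implausible values.

```python
import math

def log_prior_hard(spec, lower=0.8):
    # Hard constraint: uniform on [lower, 1], zero density outside.
    return 0.0 if lower <= spec <= 1.0 else -math.inf

def log_prior_soft(spec, mean=0.9, sd=0.05):
    # Soft constraint: normal prior (unnormalized log density) that
    # discourages low specificity but never rules it out.
    return -0.5 * ((spec - mean) / sd) ** 2

# A specificity of 0.78, just below the hard cutoff:
print(log_prior_hard(0.78))  # -inf: forbidden, whatever the data say
print(log_prior_soft(0.78))  # finite: strong data can still pull the posterior here
```

That is the content of "soft rather than hard knowledge": if we are not absolutely certain that specificity exceeds 0.8, the prior should not encode that certainty either.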

New report on coronavirus trends: “the epidemic is not under control in much of the US . . . factors modulating transmission such as rapid testing, contact tracing and behavioural precautions are crucial to offset the rise of transmission associated with loosening of social distancing . . .”

Juliette Unwin et al. write: We model the epidemics in the US at the state-level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the time-varying reproduction number (the average number of secondary infections caused by an infected person), the number of individuals that have been infected and […]
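A minimal sketch of the renewal process at the core of such semi-mechanistic models, with all numbers invented: today's new infections are past infections weighted by the generation interval and scaled by the time-varying reproduction number R_t.

```python
# P(transmission occurs s+1 days after infection), an invented generation interval:
generation_interval = [0.2, 0.4, 0.3, 0.1]

def simulate(r_t, seed, days):
    """Renewal equation: infections[t] = R_t * sum_s infections[t-1-s] * w_s."""
    infections = [seed]
    for t in range(1, days):
        pressure = sum(
            infections[t - 1 - s] * w
            for s, w in enumerate(generation_interval)
            if t - 1 - s >= 0
        )
        infections.append(r_t[t] * pressure)
    return infections

# R_t = 2.5 until a lockdown-style intervention on day 15 drops it to 0.8:
r_t = [2.5] * 15 + [0.8] * 15
traj = simulate(r_t, seed=10.0, days=30)
print("peak on day", traj.index(max(traj)))
```

The estimation problem the paper tackles runs this machinery in reverse: infer R_t (and the infection curve) from observed deaths, rather than simulating forward from a known R_t as here.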

This one’s important: Designing clinical trials for coronavirus treatments and vaccines

I’ve had various thoughts regarding clinical trials for coronavirus treatments and vaccines, and then I came across thoughtful posts by Thomas Lumley and Joseph Delaney on vaccines. So let’s talk, first about treatments, then about vaccines. Clinical trials for treatments The first thing I want to say is that designing clinical trials is not just […]

Simple Bayesian inference of coronavirus infection rate from the Stanford study in Santa Clara county

tl;dr: Their 95% interval for the infection rate, given the data available, is [0.7%, 1.8%]. My Bayesian interval is [0.3%, 2.4%]. Most of what makes my interval wider is the possibility that the specificity and sensitivity of the tests can vary across labs. To get a narrower interval, you’d need additional assumptions regarding the specificity […]
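The widening effect of uncertain test characteristics can be sketched with a stdlib-only Monte Carlo. This is not the model from the post — the raw positive rate and the sensitivity/specificity ranges below are illustrative — but it shows the mechanism: small uncertainty in specificity translates into large uncertainty in a low inferred infection rate.

```python
import random

random.seed(0)

raw_positive_rate = 0.015  # fraction of positive tests in the sample (invented)

def infection_rate(raw, sens, spec):
    # Standard misclassification correction: raw = p*sens + (1-p)*(1-spec)
    return (raw + spec - 1) / (sens + spec - 1)

draws = []
for _ in range(10_000):
    sens = random.uniform(0.80, 0.95)
    spec = random.uniform(0.985, 1.0)  # a 1.5% specificity band is enough
    p = infection_rate(raw_positive_rate, sens, spec)
    draws.append(max(p, 0.0))  # clamp: false positives can explain everything

draws.sort()
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"95% interval for infection rate: [{lo:.3f}, {hi:.3f}]")
```

When specificity can be as low as 98.5%, false positives alone can account for a 1.5% raw positive rate, which is why the lower end of the interval gets pulled toward zero.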

Updated Imperial College coronavirus model, including estimated effects on transmissibility of lockdown, social distancing, etc.

Seth Flaxman et al. have an updated version of their model of coronavirus progression. Flaxman writes: Countries with successful control strategies (for example, Greece) never got above small numbers thanks to early, drastic action. Or put another way: if we did [this for] China and showed % of population infected (or death rate), we’d erroneously conclude that […]

New analysis of excess coronavirus mortality; also a question about poststratification

Uros Seljak writes: You may be interested in our Gaussian Process counterfactual analysis of Italy mortality data that we just posted. Our results are in a strong disagreement with the Stanford seropositive paper that appeared on Friday. Their work was all over the news, but is completely misleading and needs to be countered: they claim […]

MRP with R and Stan; MRP with Python and Tensorflow

Lauren and Jonah wrote this case study which shows how to do Mister P in R using Stan. It’s a great case study: it’s not just the code for setting up and fitting the multilevel model, it also discusses the poststratification data, graphical exploration of the inferences, and alternative implementations of the model. Adam Haber […]

Conference on Mister P online tomorrow and Saturday, 3-4 Apr 2020

We have a conference on multilevel regression and poststratification (MRP) this Friday and Saturday, organized by Lauren Kennedy, Yajuan Si, and me. The conference was originally scheduled to be at Columbia but now it is online. Here is the information. If you want to join the conference, you must register for it ahead of time; […]

Fit nonlinear regressions in R using stan_nlmer

This comment from Ben reminded me that lots of people are running nonlinear regressions using least squares and other unstable methods of point estimation. You can do better, people! Try stan_nlmer, which fits nonlinear models and also allows parameters to vary by groups. I think people have the sense that maximum likelihood or least squares […]

“For the cost of running 96 wells you can test 960 people and accurately assess the prevalence in the population to within about 1%. Do this at 100 locations around the country and you’d have a spatial map of the extent of this epidemic today. . . and have this data by Monday.”

Daniel Lakeland writes: COVID-19 is tested for using real-time reverse-transcriptase PCR (rt-rt-PCR). This is basically just a fancy way of saying they are detecting the presence of the RNA by converting it to DNA and amplifying it. It has already been shown by people in Israel that you can combine material from at least 64 […]
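The back-of-envelope arithmetic behind the 96-wells claim can be written out directly. This sketch assumes the idealized case — pooled tests with perfect sensitivity and specificity — and the 18-positive-wells example is invented: with 10 swabs per well, a well is negative only if all 10 contributors are negative, so the fraction of negative wells pins down the prevalence.

```python
def prevalence_from_pools(positive_wells, wells=96, pool_size=10):
    """Estimate prevalence p from pooled tests, assuming a perfect test:
    P(well negative) = (1 - p) ** pool_size, so invert for p."""
    frac_negative = (wells - positive_wells) / wells
    return 1 - frac_negative ** (1 / pool_size)

# e.g. if 18 of 96 wells come back positive:
print(round(prevalence_from_pools(18), 4))
```

In this example 18 positive wells imply a prevalence around 2%, and because the estimate depends on the tenth root of the negative fraction, a handful of wells either way moves it by well under a percentage point — which is the "to within about 1%" part of the quote.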