Archive of posts filed under the Bayesian Statistics category.

Model building is Lego, not Playmobil. (toward understanding statistical workflow)

John Seabrook writes: Socrates . . . called writing “visible speech” . . . A more contemporary definition, developed by the linguist Linda Flower and the psychologist John Hayes, is “cognitive rhetoric”—thinking in words. In 1981, Flower and Hayes devised a theoretical model for the brain as it is engaged in writing, which they called […]

Conference on Mister P online tomorrow and Saturday, 3-4 Apr 2020

We have a conference on multilevel regression and poststratification (MRP) this Friday and Saturday, organized by Lauren Kennedy, Yajuan Si, and me. The conference was originally scheduled to be at Columbia but now it is online. Here is the information. If you want to join the conference, you must register for it ahead of time; […]

More coronavirus research: Using Stan to fit differential equation models in epidemiology

Seth Flaxman and others at Imperial College London are using Stan to model coronavirus progression; see here (and I’ve heard they plan to fix the horrible graphs!) and this Github page. They also pointed us to this article from December 2019, Contemporary statistical inference for infectious disease models using Stan, by Anastasia Chatzilena et al. […]

What can we learn from super-wide uncertainty intervals?

This question comes up a lot, in one form or another. Here’s a topical version, from Luigi Leone: I am writing after three weeks of lockdown. I would like to put to your attention this Imperial College report (issued on Monday, I believe). The report estimates 9.8% of the Italian population (thus, 6 mil) and […]

“For the cost of running 96 wells you can test 960 people and accurately assess the prevalence in the population to within about 1%. Do this at 100 locations around the country and you’d have a spatial map of the extent of this epidemic today. . . and have this data by Monday.”

Daniel Lakeland writes: COVID-19 is tested for using real-time reverse-transcriptase PCR (rt-rt-PCR). This is basically just a fancy way of saying they are detecting the presence of the RNA by converting it to DNA and amplifying it. It has already been shown by people in Israel that you can combine material from at least 64 […]
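
To make the arithmetic in the title concrete, here is a minimal sketch of how prevalence could be estimated from pooled tests. It is not Lakeland's own calculation: it assumes a perfect assay (no false positives or negatives), equal pools of 10 samples (960 people across 96 wells), and independent infection status, and the function names are hypothetical.

```python
import math


def pooled_prevalence(positive_pools: int, n_pools: int, pool_size: int) -> float:
    """Maximum-likelihood prevalence estimate from pooled tests.

    Assumes a perfect assay: a pool is negative only if every sample in it
    is negative, so P(pool negative) = (1 - p) ** pool_size.
    """
    if not 0 <= positive_pools <= n_pools:
        raise ValueError("positive_pools must be between 0 and n_pools")
    frac_negative = 1 - positive_pools / n_pools
    return 1 - frac_negative ** (1 / pool_size)


def approx_standard_error(prevalence: float, n_pools: int, pool_size: int) -> float:
    """Delta-method standard error of the pooled prevalence estimate."""
    q = (1 - prevalence) ** pool_size  # probability that a pool tests negative
    if q <= 0:
        raise ValueError("standard error is undefined when every pool is positive")
    sd_q = math.sqrt(q * (1 - q) / n_pools)  # binomial sd of the negative fraction
    # p = 1 - q ** (1 / k), so |dp/dq| = (1 / k) * q ** (1 / k - 1)
    return (1 / pool_size) * q ** (1 / pool_size - 1) * sd_q


# Hypothetical outcome: 9 of 96 wells test positive, 10 people per well.
p_hat = pooled_prevalence(9, 96, 10)
se = approx_standard_error(p_hat, 96, 10)
print(f"estimated prevalence: {p_hat:.4f}, approximate standard error: {se:.4f}")
```

Under these assumptions the uncertainty at low prevalence is a fraction of a percentage point, which is in line with the "within about 1%" claim. A real analysis would also need to model test sensitivity and specificity and any dilution effect from pooling.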

My best thoughts on priors

My best thoughts on priors (also the thoughts of some other contributors) are at the Prior Choice Recommendations wiki. And this more theoretical paper should be helpful too. I sent these links in response to a question from Zach Branson about priors for Gaussian processes. Jim Savage also pointed to our paper on simulation-based calibration […]

Another Bayesian model of coronavirus progression

Jon Zelner writes: Just ran across this paper [Estimating unobserved SARS-CoV-2 infections in the United States, by T. Alex Perkins, Sean Cavany, Sean Moore, Rachel Oidtman, Anita Lerch, and Marya Poterek] which I think is worth signal-boosting. I [Jon] also think that the model in here could potentially be implemented in Stan (though it might […]

Hilda Bastian and John Ioannidis on coronavirus decision making; Jon Zelner on virus progression models

1. Hilda Bastian writes: Doing nothing for which there is no strong evidence is doing something: it’s withholding public health interventions that, on the balance of what we know, could save a lot of lives and trauma – including the lives of a lot of healthcare workers. Secondly, the need for societies to be able […]

Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification

Following up on our discussions here and here of poststratified models of coronavirus risk, Jon Zelner writes: Here’s a paper [by Robert Verity et al.] that I think shows what could be done with an MRP approach. From the abstract: We used individual-case data from mainland China and cases detected outside mainland China to estimate […]

Prior predictive, posterior predictive, and cross-validation as graphical models

I just wrote up a bunch of chapters for the Stan user’s guide on prior predictive checks, posterior predictive checks, cross-validation, decision analysis, poststratification (with the obligatory multilevel regression up front), and even bootstrap (which has a surprisingly elegant formulation in Stan now that we have RNGs in transformed data). Andrew then urged me to […]

Coronavirus model update: Background, assumptions, and room for improvement

Julien Riou, coauthor of one of the models we discussed here, writes: Here is an overview of the current state of the project, so that it is easier for everyone to quickly grasp what is the potential room for improvement. Background on the epidemic: COVID-19 just passed 100,000 confirmed cases all over the world, and […]

Coronavirus age-specific fatality ratio, estimated using Stan, and (attempting) to account for underreporting of cases and the time delay to death. Now with data and code. And now a link to another paper (also with data and code).

Julien Riou writes: Stan epidemiologist here. We actually just released a preprint [estimating death rates of people infected with coronavirus, breaking down the population by age and then poststratifying] using Stan. Crude estimates of case fatality ratio obtained by dividing observed deaths by observed cases are biased in two ways: 1) Deaths are underestimated […]

Conditioning on a statistical method as a “meta” version of conditioning on a statistical model

When I do applied statistics, I follow Bayesian workflow: Construct a model, ride it hard, assess its implications, add more information, and so on. I have lots of doubt in my models, but when I’m fitting any particular model, I condition on it. The idea is we take our models seriously as that’s the best […]

The Great Society, Reagan’s revolution, and generations of presidential voting

Continuing our walk through the unpublished papers list: This one’s with Yair and Jonathan: We build a model of American presidential voting in which the cumulative impression left by political events determines the preferences of voters. This impression varies by voter, depending on their age at the time the events took place. We find […]

What can we do with complex numbers in Stan?

I’m wrapping up support for complex number types in the Stan math library. Now I’m wondering what we can do with complex numbers in statistical models. Functions operating in the complex domain: The initial plan is to add some matrix functions that use complex numbers internally: fast Fourier transforms, asymmetric eigendecomposition, and Schur decomposition. The eigendecomposition […]

How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis

I don’t have much to say about this one, as Shravan wrote pretty much all of it. It’s a study of how to apply our general advice to “accept uncertainty,” in a specific area of research in linguistics:

Deep learning workflow

Ido Rosen points us to this interesting and detailed post by Andrej Karpathy, “A Recipe for Training Neural Networks.” It reminds me a lot of various things that Bob Carpenter has said regarding the way that some fitting algorithms are often oversold because the presenters don’t explain the tuning that was required to get good […]

Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC

With Aki, Dan, Bob, and Paul: Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R-hat of Gelman and Rubin (1992) has serious flaws. R-hat will fail to correctly […]
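
For readers who have not seen the diagnostic written out, here is a minimal sketch of the classic split R-hat for a single scalar parameter. This is the traditional version that compares between-chain and within-chain variance, not the rank-normalized and folded version the paper proposes, and the function name is hypothetical.

```python
from statistics import mean, variance


def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains of scalar draws.

    Each chain is split in half so that a trend within a chain shows up
    as disagreement between its two halves.
    """
    half = len(chains[0]) // 2
    if half < 2:
        raise ValueError("each chain needs at least 4 draws")
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    within = mean(variance(h) for h in halves)          # W: average within-chain variance
    between = n * variance([mean(h) for h in halves])   # B: n times variance of chain means
    var_plus = (n - 1) / n * within + between / n       # pooled estimate of the variance
    return (var_plus / within) ** 0.5
```

Values near 1 indicate that the chains agree in location and scale, while values well above 1 indicate that they have not mixed.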

Holes in Bayesian Statistics

With Yuling: Every philosophy has holes, and it is the responsibility of proponents of a philosophy to point out these problems. Here are a few holes in Bayesian data analysis: (1) the usual rules of conditional probability fail in the quantum realm, (2) flat or weak priors lead to terrible inferences about things we care […]

How good is the Bayes posterior for prediction really?

It might not be common courtesy of this blog to make comments on a very-recently-arxiv-ed paper. But I have seen two copies of this paper entitled “how good is the Bayes posterior in deep neural networks really” left on the tray of the department printer during the past weekend, so I cannot underestimate the popularity of […]