Archive of posts filed under the Statistical computing category.

The typical set and its relevance to Bayesian computation

tl;dr The typical set (at some level of coverage) is the set of parameter values for which the log density (the target function) is close to its expected value. As has been much discussed, this is not the same as the posterior mode. In a d-dimensional unit normal distribution with a high value of d, […]
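Here’s a quick numerical illustration of the point (my own sketch, not code from the post): draws from a d-dimensional unit normal sit at radius roughly √d, and their log density sits far below the log density at the mode.

```python
import numpy as np

rng = np.random.default_rng(1)
for d in (1, 10, 100, 1000):
    x = rng.standard_normal((10_000, d))       # draws from a d-dimensional unit normal
    radius = np.linalg.norm(x, axis=1)         # distance from the mode at the origin
    log_p = -0.5 * np.sum(x**2, axis=1)        # log density, up to a constant
    print(f"d={d:5d}  mean radius={radius.mean():7.2f}  "
          f"mean log density={log_p.mean():9.1f}  (log density at the mode: 0)")
```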

Ugly code is buggy code

People have been (correctly) mocking my 1990s-style code. They’re right to mock! My coding style works for me, kinda, but it does have problems. Here’s an example from a little project I’m working on right now. I was motivated to write this partly as a followup to Bob’s post yesterday about coding practices. I fit […]

Drunk-under-the-lamppost testing

Edit: Glancing over this again, it struck me that the title may be interpreted as being mean. Sorry about that. It wasn’t my intent. I was trying to be constructive and I really like that analogy. The original post is mostly reasonable other than on this one point that I thought was important to call […]

Shortest posterior intervals

By default we use central posterior intervals. For example, the central 95% interval is the (2.5%, 97.5%) quantiles. But sometimes the central interval doesn’t seem right. This came up recently with a coronavirus testing example, where the posterior distribution for the parameter of interest was asymmetric so that the central interval is not such a […]
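For concreteness, here’s one way to compute a shortest posterior interval from simulation draws (a hypothetical helper sketching the general idea, not the code from the post): sort the draws and slide a window containing 95% of them, keeping the narrowest.

```python
import numpy as np

def shortest_interval(draws, coverage=0.95):
    """Narrowest contiguous interval containing the given fraction of draws."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    k = int(np.ceil(coverage * n))          # number of draws each candidate must contain
    widths = x[k - 1:] - x[: n - k + 1]     # widths of all candidate intervals
    i = np.argmin(widths)
    return x[i], x[i + k - 1]

draws = np.random.default_rng(0).gamma(shape=2.0, size=20_000)  # an asymmetric posterior
print("central :", np.quantile(draws, [0.025, 0.975]))
print("shortest:", shortest_interval(draws))
```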

Validating Bayesian model comparison using fake data

A neuroscience graduate student named James writes in with a question regarding validating Bayesian model comparison using synthetic data: I [James] perform an experiment and collect real data. I want to determine which of 2 candidate models best accounts for the data. I perform (approximate) Bayesian model comparison (e.g., using BIC – not ideal I […]
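The general fake-data recipe can be sketched like this (a toy version with assumptions of my own, not James’s actual models): simulate data from a known model, fit both candidates, and check how often the comparison criterion recovers the truth.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def bic(loglik, n_params, n_obs):
    return -2 * loglik + n_params * np.log(n_obs)

n, n_sims, correct = 100, 200, 0
for _ in range(n_sims):
    y = rng.normal(loc=0.3, scale=1.0, size=n)          # truth: the nonzero-mean model
    sigma1 = np.sqrt(np.mean(y**2))                     # MLE of sigma with mu fixed at 0
    ll1 = stats.norm.logpdf(y, 0.0, sigma1).sum()       # model 1: mu = 0, 1 parameter
    sigma2 = y.std(ddof=0)                              # MLE of sigma with mu estimated
    ll2 = stats.norm.logpdf(y, y.mean(), sigma2).sum()  # model 2: mu free, 2 parameters
    correct += bic(ll2, 2, n) < bic(ll1, 1, n)
print(f"BIC picked the true model in {correct / n_sims:.0%} of simulations")
```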

Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors

Yuling, Aki, and I write: When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms can have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights in the […]
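As a toy illustration of the stacking idea (my sketch, not the paper’s algorithm): if two chains are stuck in different modes, each yields its own predictive density, and stacking chooses simplex weights for the mixture that maximize the log score on held-out data.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize
from scipy.special import softmax

rng = np.random.default_rng(3)
y_holdout = rng.normal(2.0, 1.0, size=200)    # held-out data

# Predictive densities from two non-mixing chains, one stuck in each mode.
p = np.column_stack([
    stats.norm.pdf(y_holdout, -2.0, 1.0),     # chain stuck near the wrong mode
    stats.norm.pdf(y_holdout,  2.0, 1.0),     # chain stuck near the right mode
])

def neg_log_score(a):                         # unconstrained params -> simplex weights
    return -np.sum(np.log(p @ softmax(a)))

a_hat = minimize(neg_log_score, np.zeros(2)).x
print("stacking weights:", softmax(a_hat))    # ~ (0, 1): all weight on the good mode
```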

Aki’s talk about reference models in model selection in Laplace’s demon series

I (Aki) will talk about reference models in model selection in the Laplace’s Demon series on 24 June at 15:00 UTC (18:00 in Finland, 17:00 in Paris, 11:00 in New York). See the seminar series website for a registration link, the schedule of other talks, and the list of recorded talks. The short summary: 1) Why a bigger model helps inference for […]

“Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale” and my answers to their questions

Here’s the description of the online seminar series: Machine learning is changing the world we live in at a breakneck pace. From image recognition and generation to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. In this seminar series we ask […]

Challenges to the Reproducibility of Machine Learning Models in Health Care; also a brief discussion about not overrating randomized clinical trials

Mark Tuttle pointed me to this article by Andrew Beam, Arjun Manrai, and Marzyeh Ghassemi, Challenges to the Reproducibility of Machine Learning Models in Health Care, which appeared in the Journal of the American Medical Association. Beam et al. write: Reproducibility has been an important and intensely debated topic in science and medicine for the […]

Faster than ever before: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation

Charles Margossian, Aki Vehtari, Daniel Simpson, and Raj Agrawal write: Gaussian latent variable models are a key class of Bayesian hierarchical models with applications in many fields. Performing Bayesian inference on such models can be challenging, as Markov chain Monte Carlo algorithms struggle with the geometry of the resulting posterior distribution and can be prohibitively slow. […]
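For readers unfamiliar with the embedded Laplace approximation, here’s a bare-bones sketch of the inner step (my own toy version, not the paper’s adjoint-differentiated method): Newton iterations locate the mode of the latent Gaussian conditional posterior, and a Gaussian is fit there.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = np.linspace(0, 1, n)
# Squared-exponential prior covariance for the latent field, plus jitter.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 1e-4 * np.eye(n)
f_true = np.linalg.cholesky(K) @ rng.standard_normal(n)
y = rng.poisson(np.exp(f_true))            # Poisson counts, log link

K_inv = np.linalg.inv(K)                   # fine for a toy example
f = np.zeros(n)
for _ in range(20):                        # Newton iterations to the mode of p(f | y)
    grad = (y - np.exp(f)) - K_inv @ f     # gradient of log p(y | f) + log prior
    W = np.diag(np.exp(f))                 # negative Hessian of log p(y | f)
    f = f + np.linalg.solve(W + K_inv, grad)

cov = np.linalg.inv(W + K_inv)             # covariance of the Laplace approximation
print("max |gradient| at the mode:", np.abs((y - np.exp(f)) - K_inv @ f).max())
```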

Super-duper online matrix derivative calculator vs. the matrix normal (for Stan)

I’m implementing the matrix normal distribution for Stan, which provides a multivariate density for a matrix with covariance factored into row and column covariances. The motivation A new colleague of mine at Flatiron’s Center for Comp Bio, Jamie Morton, is using the matrix normal to model the ocean biome. A few years ago, folks in […]
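For reference, the matrix normal log density can be computed directly from the row and column covariance Cholesky factors. Here’s a hedged sketch (not the Stan implementation), checked against SciPy’s version:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.stats import matrix_normal        # used only to check the result

def matrix_normal_logpdf(X, M, U, V):
    """Log density of an n x p matrix X with row covariance U and column covariance V."""
    n, p = X.shape
    Lu, Lv = cho_factor(U, lower=True), cho_factor(V, lower=True)
    logdet_U = 2 * np.log(np.diag(Lu[0])).sum()
    logdet_V = 2 * np.log(np.diag(Lv[0])).sum()
    R = X - M
    quad = np.trace(cho_solve(Lv, R.T @ cho_solve(Lu, R)))   # tr(V^-1 R' U^-1 R)
    return -0.5 * (n * p * np.log(2 * np.pi) + p * logdet_U + n * logdet_V + quad)

rng = np.random.default_rng(5)
n, p = 4, 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((p, p))
U, V = A @ A.T + n * np.eye(n), B @ B.T + p * np.eye(p)      # make them positive definite
X, M = rng.standard_normal((n, p)), np.zeros((n, p))
print(matrix_normal_logpdf(X, M, U, V))
print(matrix_normal.logpdf(X, mean=M, rowcov=U, colcov=V))   # should agree
```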

Statistical Workflow and the Fractal Nature of Scientific Revolutions (my talk this Wed at the Santa Fe Institute)

Wed 3 June 2020 at 12:15pm U.S. Mountain time: Statistical Workflow and the Fractal Nature of Scientific Revolutions How would an A.I. do statistics? Fitting a model is the easy part. The other steps of workflow—model building, checking, and revision—are not so clearly algorithmic. It could be fruitful to simultaneously think about automated inference and […]

Stan pedantic mode

This used to be on the Stan wiki but that page got reorganized so I’m putting it here. Blog is not as good as wiki for this purpose: you can add comments but you can’t edit. But better blog than nothing, so here it is. I wrote this a couple years ago and it was […]

It’s “a single arena-based heap allocation” . . . whatever that is!

After getting 80 zillion comments on that last post with all that political content, I wanted to share something that’s purely technical. It’s something Bob Carpenter wrote in a conversation regarding implementing algorithms in Stan: One thing we are doing is having the matrix library return more expression templates rather than copying on return as […]

Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale

David Rohde points us to this new seminar series that has the following description: Machine learning is changing the world we live in at a breakneck pace. From image recognition and generation to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. […]

We need better default plots for regression.

Robin Lee writes: To check for linearity and homoscedasticity, we are taught in many statistics classes to plot residuals against fitted values. However, plotting residuals against fitted values has always been a practice that I know I should use but can’t quite explain why. It was not until this week that I […]
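Here’s a minimal example of the plot in question (toy data of my own): a straight-line fit to curved data, where the residuals-versus-fitted plot makes the misspecification obvious.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + 0.3 * x**2 + rng.normal(0, 2, size=x.size)  # the truth is curved

slope, intercept = np.polyfit(x, y, 1)     # deliberately misspecified straight-line fit
fitted = intercept + slope * x
resid = y - fitted

plt.scatter(fitted, resid, s=10)
plt.axhline(0, color="gray", lw=1)
plt.xlabel("fitted value")
plt.ylabel("residual")
plt.title("The U shape reveals the missing quadratic term")
plt.show()
```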

New Within-Chain Parallelisation in Stan 2.23: This One’s Easy for Everyone!

What’s new? The new and shiny reduce_sum facility released with Stan 2.23 is far more user-friendly than what came before and makes it easier to scale Stan programs across more CPU cores. While Stan is awesome for writing models, as the size of the data or the complexity of the model increases it can become impractical […]
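reduce_sum itself is a Stan feature, but the underlying idea can be sketched in Python (this is my illustration, not Stan code): split a sum of per-observation log-likelihood terms into slices, evaluate the partial sums on separate cores, and add them up.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from scipy import stats

def partial_sum(args):
    """Log likelihood of one slice of the data."""
    y_slice, mu, sigma = args
    return stats.norm.logpdf(y_slice, mu, sigma).sum()

if __name__ == "__main__":
    y = np.random.default_rng(7).normal(1.0, 2.0, size=1_000_000)
    slices = np.array_split(y, 8)                        # one slice per core
    with ProcessPoolExecutor(max_workers=8) as pool:
        total = sum(pool.map(partial_sum, [(s, 1.0, 2.0) for s in slices]))
    # Agrees with the serial sum up to floating-point rounding.
    print(total, stats.norm.logpdf(y, 1.0, 2.0).sum())
```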

Bayesian analysis of Santa Clara study: Run it yourself in Google Colab, play around with the model, etc!

The other day we posted some Stan models of coronavirus infection rate from the Stanford study in Santa Clara county. The Bayesian setup worked well because it allowed us to directly incorporate uncertainty in the specificity, sensitivity, and underlying infection rate. Mitzi Morris put all this in a Google Colab notebook so you can run […]
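The core of the model can be sketched in a few lines (illustrative numbers, not the study’s data): the observed positive-test rate mixes true prevalence with test error, so uncertainty in sensitivity and specificity propagates into the prevalence estimate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_tests, n_pos = 3000, 50                    # hypothetical field sample
sens = rng.beta(1 + 100, 1 + 20, 10_000)     # calibration: ~100 of 120 positives caught
spec = rng.beta(1 + 398, 1 + 2, 10_000)      # calibration: ~398 of 400 negatives caught

prev_grid = np.linspace(0, 0.05, 300)        # flat prior on prevalence over this range
post = np.empty_like(prev_grid)
for i, prev in enumerate(prev_grid):
    p_pos = prev * sens + (1 - prev) * (1 - spec)            # marginal positive-test rate
    post[i] = stats.binom.pmf(n_pos, n_tests, p_pos).mean()  # average over sens, spec draws
post /= post.sum() * (prev_grid[1] - prev_grid[0])           # normalize the density
print("posterior mean prevalence:",
      (post * prev_grid).sum() * (prev_grid[1] - prev_grid[0]))
```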

MRP with R and Stan; MRP with Python and Tensorflow

Lauren and Jonah wrote this case study which shows how to do Mister P in R using Stan. It’s a great case study: it’s not just the code for setting up and fitting the multilevel model, it also discusses the poststratification data, graphical exploration of the inferences, and alternative implementations of the model. Adam Haber […]
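The poststratification step itself is simple enough to show in a few lines (toy numbers, my own sketch): the fitted multilevel model gives an estimate for each demographic cell, and those estimates are reweighted by census counts.

```python
import numpy as np

# Toy cells (e.g., age group x education) with model-based estimates and census counts.
cell_estimate = np.array([0.42, 0.55, 0.48, 0.61, 0.39, 0.52])  # from the fitted model
cell_count    = np.array([120,   80,  200,   50,  150,  100])   # from census data

mrp_estimate = np.sum(cell_estimate * cell_count) / cell_count.sum()
print(f"poststratified estimate: {mrp_estimate:.3f}")
print(f"unweighted mean of cells: {cell_estimate.mean():.3f}")  # ignores the population
```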

Webinar on approximate Bayesian computation

X points us to this online seminar series which is starting this Thursday! Some speakers and titles of talks are listed. I just wish I could click on the titles and see the abstracts and papers! The seminar is at the University of Warwick in England, which is not so convenient—I seem to recall that […]
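For readers new to the topic, here’s what the simplest version of approximate Bayesian computation looks like (my own example, not from the seminar): rejection ABC simulates from the prior predictive and keeps parameter draws whose simulated summary statistic lands near the observed one.

```python
import numpy as np

rng = np.random.default_rng(9)
y_obs = rng.normal(3.0, 1.0, size=100)          # "observed" data
s_obs = y_obs.mean()                            # summary statistic

theta = rng.uniform(-10, 10, size=200_000)      # draws from a flat prior
# Shortcut: the mean of 100 unit-variance normals is N(theta, 1/sqrt(100)),
# so we can simulate the summary statistic directly for each prior draw.
s_sim = rng.normal(theta, 1.0 / np.sqrt(100))
accepted = theta[np.abs(s_sim - s_obs) < 0.05]  # keep draws that nearly match

print(f"accepted {accepted.size} draws; posterior mean ~ {accepted.mean():.2f}")
```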