Archive of posts filed under the Statistical computing category.

Concerns with our Economist election forecast

A few days ago we discussed some concerns with Fivethirtyeight’s election forecast. This got us thinking again about some concerns with our own forecast for The Economist (see here for more details). Here are some of our concerns with our forecast: 1. Distribution of the tails of the national vote forecast 2. Uncertainties of state […]

Interactive analysis needs theories of inference

Jessica Hullman and I wrote an article that begins, Computer science research has produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But assuming that identifying what’s in the data is the end goal of analysis misrepresents strong connections between exploratory and confirmatory analysis and contributes […]

Hiring at all levels at Flatiron Institute’s Center for Computational Mathematics

We’re hiring at all levels at my new academic home, the Center for Computational Mathematics (CCM) at the Flatiron Institute in New York City. We’re going to start reviewing applications January 1, 2021. A lot of hiring: We’re hoping to hire many people for each of the job ads. The plan is to grow CCM […]

“Model takes many hours to fit and chains don’t converge”: What to do? My advice on first steps.

The above question came up on the Stan forums, and I replied: Hi, just to give some generic advice here, I suggest simulating fake data from your model and then fitting the model and seeing if you can recover the parameters. Since it’s taking a long time to run, I suggest just running your 4 […]
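
The fake-data check described above can be sketched in a few lines. This is a minimal illustration, not the Stan workflow from the forum thread: it assumes a simple linear regression and uses ordinary least squares as a fast stand-in for a full model fit. The parameter values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for this illustration.
true_a, true_b, true_sigma = 1.5, -0.7, 0.5

# Step 1: simulate fake data from the assumed model y = a + b*x + noise.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = true_a + true_b * x + rng.normal(scale=true_sigma, size=n)

# Step 2: fit the same model to the fake data (least squares stand-in).
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = float(coef[0]), float(coef[1])

# Residual standard deviation, with n - 2 degrees of freedom.
resid = y - X @ coef
sigma_hat = float(np.sqrt(resid @ resid / (n - 2)))

# Step 3: compare the estimates with the values used to simulate.
print(f"a: true {true_a}, estimated {a_hat:.3f}")
print(f"b: true {true_b}, estimated {b_hat:.3f}")
print(f"sigma: true {true_sigma}, estimated {sigma_hat:.3f}")
```

If the estimates land far from the simulating values, the problem is in the model or the code rather than in the real data, which is the point of running the check first.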

Stan’s Within-Chain Parallelization now available with brms

The just-released R package brms version 2.14.0 supports within-chain parallelization of Stan. This new functionality is based on the recently introduced reduce_sum function in Stan, which allows sums over (conditionally) independent log-likelihood terms to be evaluated in parallel, using multiple CPU cores at the same time via threading. The idea of reduce_sum is to exploit […]
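
The idea of splitting a log-likelihood sum into independent partial sums can be sketched as follows. This is a rough Python analogy, not Stan's reduce_sum or the brms interface: the function names, the `grainsize` argument as used here, and the normal likelihood are assumptions for illustration. Because the work is pure Python, the threads show the slicing pattern rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def normal_lpdf(y, mu, sigma):
    """Log density of a single normal observation."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def partial_sum(y_slice, mu, sigma):
    """Log-likelihood contribution of one slice of the data."""
    return sum(normal_lpdf(y, mu, sigma) for y in y_slice)

def reduce_sum_sketch(y, mu, sigma, grainsize, workers=4):
    """Sum the log likelihood over slices of at most `grainsize` terms."""
    slices = [y[i:i + grainsize] for i in range(0, len(y), grainsize)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda s: partial_sum(s, mu, sigma), slices)
        return sum(parts)

# Toy data: the sliced sum should agree with the plain serial sum.
data = [0.1 * i for i in range(1000)]
serial_total = partial_sum(data, 50.0, 30.0)
sliced_total = reduce_sum_sketch(data, 50.0, 30.0, grainsize=128)
```

The slicing is valid only because the terms are independent given the parameters, so the partial sums can be computed in any order and added at the end.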

Everything that can be said can be said clearly.

The title, as many may know, is a quote from Wittgenstein. It is one that has haunted me for many years. As a first year undergrad, I had mistakenly enrolled in a second year course that was almost entirely based on Wittgenstein’s Tractatus. Alarmingly, the drop date had passed before I grasped I was supposed […]

From monthly return rate to importance sampling to path sampling to the second law of thermodynamics to metastable sampling in Stan

(This post is by Yuling, not Andrew, except that many of the ideas originated from Andrew.) This post is intended to advertise our new preprint Adaptive Path Sampling in Metastable Posterior Distributions by Collin, Aki, Andrew and me, where we developed an automated implementation of path sampling and adaptive continuous tempering. But I have been recently reading a writing book […]

Parallel in Stan

by Andrew Gelman and Bob Carpenter. We’ve been talking about some of the many, many ways that parallel computing is, or could be, used in Stan. Here are a few: – Multiple chains (Stan runs 4 or 8 on my laptop automatically) – Hessians scale linearly in computation with dimension and are super useful. And […]

Bayesian Workflow (my talk this Wed at Criteo)

Wed 26 Aug 5pm Paris time (11am NY time): The workflow of applied Bayesian statistics includes not just inference but also model building, model checking, confidence-building using fake data, troubleshooting problems with computation, model understanding, and model comparison. We move toward codifying these steps in the realistic scenario in which we are fitting many models […]

Cmdstan 2.24.1 is released!

Rok writes: We are very happy to announce that the next release of Cmdstan (2.24.1) is now available on Github. You can find it here: https://github.com/stan-dev/cmdstan/releases/tag/v2.24.1 New features: A new ODE interface; Functions for hidden Markov models with a discrete latent variable; Elementwise pow operator and matrix power function; Newton solver; Support for the […]

epidemia: An R package for Bayesian epidemiological modeling

Jamie Scott writes: I am a PhD candidate at Imperial College, and have been working with colleagues here to write an R package for fitting Bayesian epidemiological models using Stan. We thought this might interest readers of your blog, as it is based on work previously featured there. The package is similar in spirit to […]

More on absolute error vs. relative error in Monte Carlo methods

This came up again in a discussion from someone asking if we can use Stan to evaluate arbitrary integrals. The integral I was given was the following: where the -ball is assumed to be in dimensions so that . (MC)MC approach The textbook Monte Carlo approach (Markov chain or plain old) to evaluating such an […]

Regression and Other Stories translated into Python!

Ravin Kumar writes in with some great news: As readers of this blog likely know, Andrew Gelman, Jennifer Hill, and Aki Vehtari have recently published a new book, Regression and Other Stories. What readers likely don’t know is that there is an active effort to translate the code examples written in R and the rstanarm […]

StanCon 2020 is on Thursday!

For all who registered for the conference, THANK YOU! We, the organizers, are truly moved by how global and inclusive the community has become. We are currently at 230 registrants from 33 countries. And 25 scholarships were provided to people in 12 countries. Please join us. Registration is $50. We have scholarships still available (more […]

Somethings do not seem to spread easily – the role of simulation in statistical practice and perhaps theory.

Unlike Covid19, some things don’t seem to spread easily and the role of simulation in statistical practice (and perhaps theory) may well be one of those. In a recent comment, Andrew provided a link to an interview about the new book Regression and Other Stories by Aki Vehtari, Andrew Gelman, and Jennifer Hill. An interview that covered […]

The typical set and its relevance to Bayesian computation

[Note: The technical discussion w.r.t. Stan is continuing on the Stan forums.] tl;dr The typical set (at some level of coverage) is the set of parameter values for which the log density (the target function) is close to its expected value. As has been much discussed, this is not the same as the posterior mode. […]
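
The gap between the mode and the typical set is easy to see numerically. The sketch below assumes a 100-dimensional standard normal as a stand-in for a posterior (an assumption for illustration, not an example from the post) and compares the log density of random draws with the log density at the mode.

```python
import numpy as np

d = 100                      # dimension of the illustrative target
rng = np.random.default_rng(2)
draws = rng.normal(size=(10_000, d))

# Log density of a standard normal in d dimensions.
log_norm = -0.5 * d * np.log(2.0 * np.pi)
log_p = log_norm - 0.5 * np.sum(draws**2, axis=1)

# The mode is at the origin, where the quadratic term vanishes.
log_p_mode = log_norm

# How far below the mode each draw's log density sits.
gap = log_p_mode - log_p
mean_gap = float(gap.mean())   # expected value is d / 2 for this target
min_gap = float(gap.min())

print(f"mean gap below the mode: {mean_gap:.1f} (theory: {d / 2})")
print(f"smallest gap among draws: {min_gap:.1f}")
```

For this target the gap is half a chi-squared variable with d degrees of freedom, so the draws cluster around a log density about d/2 below the mode and essentially none of them land near the mode itself.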

Ugly code is buggy code

People have been (correctly) mocking my 1990s-style code. They’re right to mock! My coding style works for me, kinda, but it does have problems. Here’s an example from a little project I’m working on right now. I was motivated to write this partly as a followup to Bob’s post yesterday about coding practices. I fit […]

Drunk-under-the-lamppost testing

Edit: Glancing over this again, it struck me that the title may be interpreted as being mean. Sorry about that. It wasn’t my intent. I was trying to be constructive and I really like that analogy. The original post is mostly reasonable other than on this one point that I thought was important to call […]

Shortest posterior intervals

By default we use central posterior intervals. For example, the central 95% interval is the (2.5%, 97.5%) quantiles. But sometimes the central interval doesn’t seem right. This came up recently with a coronavirus testing example, where the posterior distribution for the parameter of interest was asymmetric so that the central interval is not such a […]
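
To make the contrast concrete, here is a minimal sketch that computes both intervals from simulation draws. It uses the plain empirical shortest interval (the narrowest window covering 95% of the sorted draws), which is an assumption on my part and not necessarily the estimator the post goes on to recommend. The exponential draws are a made-up stand-in for an asymmetric posterior.

```python
import numpy as np

def central_interval(draws, prob=0.95):
    """Central interval: cut equal tail mass from each side."""
    tail = (1.0 - prob) / 2.0
    lo, hi = np.quantile(draws, [tail, 1.0 - tail])
    return float(lo), float(hi)

def shortest_interval(draws, prob=0.95):
    """Narrowest window that covers a fraction `prob` of the sorted draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))           # number of draws to cover
    widths = x[k - 1:] - x[: n - k + 1]  # width of every k-draw window
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

# Illustrative asymmetric "posterior": exponential draws.
rng = np.random.default_rng(0)
draws = rng.exponential(scale=1.0, size=20_000)

central = central_interval(draws)
shortest = shortest_interval(draws)
print("central 95%: ", central)
print("shortest 95%:", shortest)
```

With a right-skewed distribution like this one, the shortest interval shifts toward zero and is narrower than the central interval, which is the situation where the central interval "doesn't seem right."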

Validating Bayesian model comparison using fake data

A neuroscience graduate student named James writes in with a question regarding validating Bayesian model comparison using synthetic data: I [James] perform an experiment and collect real data. I want to determine which of 2 candidate models best accounts for the data. I perform (approximate) Bayesian model comparison (e.g., using BIC – not ideal I […]