Reviews of our Bayesian Workflow book from Bin Yu, David Spiegelhalter, Brad Efron, Christian Robert, Rohan Alexander, and Mine Doğucu!

Posted on July 16, 2026 9:11 AM by Andrew

Roughly speaking, Bayesian Workflow is to Bayesian Data Analysis in 2026 what Bayesian Data Analysis was to earlier Bayesian books in 1995: it builds upon everything that came before.

With Bayesian Data Analysis, the big steps forward were:

Going beyond Bayesian inference to also consider Bayesian model building (as a researcher, you construct the model, it isn’t just given to you as in a textbook), model checking (breaking through the absolutely horrible attitude, common to Bayesians in the early 1990s, that the model was “subjective” and thus should not be checked), and model improvement (continuous model expansion, not the misguided idea of assigning posterior probabilities).
Going beyond simple conjugate models. BDA had lots of hierarchical models, also lots of computational tools so that you could fit the models you want by putting them together from understandable components. And I like how we had a clear separation between modeling and computing. The model comes first, then you figure out how to compute it. Or you set up a model that works within your computational constraints.
A Bayesian approach to sampling and causal inference. This was Rubin’s framework in which unobserved units in the population and unobserved causal outcomes are treated as missing data and are part of a joint probability model. We worked this out in chapter 7 of BDA (which became chapter 8 in the third edition of the book).
Lots of live examples. Not just “real-data examples,” but problems we’d directly worked on. This motivated us and I think it gave our readers a sense of how Bayesian methods worked not just in theory but in applied problems.
A pragmatic view of probability as a measurable quantity. That’s right there in chapter 1. Bayesian methods are not the product of a philosophical stance; they’re a way to connect models and data using probability.

I could go on and on, but for that I can refer you to the Bayesian Data Analysis book.

And these are the key innovations of Bayesian Workflow:

Going beyond Bayesian data analysis (model building, inference, model checking, and model expansion) to consider the larger process of statistical modeling, including comparisons of multiple models fit to a single dataset.
A fuller use of informative priors. This is a big deal. In BDA we still had a bit of the Bayesian cringe going on. One reason we’ve moved toward stronger priors is that the replication crisis has taught us that the amount of prior information available in any given problem is often approximately the same as the information coming from an experiment (see here, for example). Informative priors also fit our increased focus on generative modeling, and we’re doing a lot more prior predictive checking to understand the implications of our models.
More integration between modeling, data analysis, and computing. One way to see this is that the Bayesian Workflow webpage has the code to run all our examples. We also have lots of code snippets in the text as a way of demonstrating the way in which coding is central to our statistical workflow.
Lots more live examples. It’s been 30 years since BDA first came out. One reason that Bayesian Workflow has 11 authors is that different collaborators worked on different examples (but the three principal authors read through the entire book, so the general approach should remain coherent).
Simulation-based experimentation. This is something my colleagues have been doing more and more over the years. At its most basic, simulation-based experimentation provides a best-case baseline for statistical methods: if you can’t recover your quantities of interest with sufficient accuracy under ideal conditions (when your data are simulated from the model you’re fitting), then you know you’re in trouble. And often this is the case! Beyond that, we can simulate from one model and fit another, and see what happens. Simulation experiments aren’t always so easy to construct, as they involve specifying the entire data-generation process. But we think this is effort worth expending, as it involves thinking about the problem you’re working on.
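To make the "best-case baseline" idea concrete, here is a minimal sketch in Python. It is not taken from the book: the model (normal data with known sigma and a normal prior on mu), the function name `coverage_experiment`, and all the numerical settings are assumptions chosen so that the posterior is available in closed form. It repeatedly draws a parameter from the prior, simulates data from the model, fits that same model, and checks how often the central 50% posterior interval covers the true value. If the fitting procedure is working, the answer should be close to one half.

```python
import numpy as np

def coverage_experiment(n_reps=2000, n_obs=20, prior_sd=1.0, sigma=2.0, seed=1):
    """Fraction of replications whose central 50% posterior interval covers mu.

    Assumed model (illustrative only): mu ~ normal(0, prior_sd),
    y_i ~ normal(mu, sigma) with sigma known.
    """
    rng = np.random.default_rng(seed)
    z50 = 0.6744897501960817  # 75th percentile of the standard normal
    covered = 0
    for _ in range(n_reps):
        mu_true = rng.normal(0.0, prior_sd)                # parameter drawn from the prior
        y = rng.normal(mu_true, sigma, size=n_obs)         # data simulated from the model
        post_prec = 1.0 / prior_sd**2 + n_obs / sigma**2   # conjugate normal update
        post_mean = (y.sum() / sigma**2) / post_prec       # prior mean is zero
        post_sd = post_prec ** -0.5
        covered += abs(mu_true - post_mean) < z50 * post_sd
    return covered / n_reps

print(coverage_experiment())
```

The same loop can be reused for the "simulate from one model and fit another" experiment by changing only the line that generates `y`.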

I could go on and on, but for that I can refer you to the Bayesian Workflow book.

And now for the reviews

But you don’t have to trust me on this! Just listen to some of the eminent statisticians and educators who’ve reviewed our book:

Bin Yu (University of California):

An outstanding, protocol-driven guide for Bayesian data analysis, Bayesian Workflow by Gelman, Vehtari, McElreath and co-authors delivers a practical and comprehensive framework for iterative modeling, emphasizing simulation, diagnostic checks, and rigorous empirical validation, and with a long and impressive list of case studies. By treating data analysis as a structured, verifiable workflow, it provides an indispensable toolkit for diagnosing model failures, refining priors, and building reliable data analysis systems for reproducible conclusions, useful for beginning and veteran data analysts alike.

David Spiegelhalter (Cambridge University):

This is not a typical methods textbook, but instead it guides the reader through the whole process of fitting, critiquing and adapting statistical models to real-world problems. It is full of the accumulated wisdom of skilled practitioners, teaching through demonstration rather than theory, with both basic and highly sophisticated examples. I strongly recommend this book to statisticians who really want to understand what they can learn from their data.

Brad Efron (Stanford University):

A bravura performance…Gelman, Vehtari, McElreath and friends develop in detail a practical Bayesian data analysis workflow, from acquisition to final report, including full computational guidance.

Christian Robert (Université Paris Dauphine):

This original, thought-provoking, and transformative book is much much more than an implementation manual for Bayesian Data Analysis, even though it shares almost the same perspective. (The first sentence of the book states that the authors’ “conceptions of statistical practice, and of Bayesian statistics, have changed over the years”.) By providing a modus vivendi for undertaking Bayesian modelling from scratch in realistic settings where models are not magicked out of the blue, the authors explicit and rationalise the many steps required by such a bottom-up modelling protocol (“not a checklist, not a cookbook”, and not a flowchart!) in real situations. The contents read very well and very smoothly, with a seamless conjunction of intuition, modelling advices, computational details, and comparison tools. While unsurprisingly Bayesian, the perspective adopted therein remains both open and inclusive, with a welcome humility about the limitations and challenges of Bayesian workflows. This book should thus appeal to and profit a wide variety of readers, as providing guidance through an extensive collection of highly detailed examples, with shared code and exercises.

Rohan Alexander (University of Toronto):

Some statistics books show you how to beat an egg, others are recipe books: if this, then that style. This book teaches you how to cook. Written by authors who established so much of how we do Bayesian statistics, this new book is an indispensable guide for analyzing data in a trustworthy way. It walks you through the actual steps involved in building models to explore and understand datasets. Part 4 is particularly excellent – the authors provide many end-to-end case studies that will be useful for both practitioners and students. It highlights the value of their workflow-based approach. Filled with chatty asides, the book introduces the Bayesian workflow to a broad audience. It embraces the frustrations and complexities of actually doing Bayesian statistics and provides specific guidance throughout. Each chapter contains exercises and it could be the basis of an upper-year undergraduate course, or a first-year grad course, in applied statistics. It will be used for many years to come.

Mine Doğucu (Harvard University):

What makes Bayesian Workflow so exceptional is how it seamlessly pairs profound ideas about modeling with the adoption of modern computational practice. By centering the messy, iterative process of modeling through real-world case studies, the authors reject rigid cookbooks and checklists in favor of building deep situational awareness. Because the ideas are so clearly articulated and deeply applied, this book serves as an invaluable pedagogical resource. With its practical exercises, individual chapters or the text as a whole can easily be integrated into upper-level undergraduate or graduate courses, while also remaining accessible for self-guided readers. It is an indispensable read for anyone with foundational knowledge in Bayesian methods, regardless of whether they are applied practitioners, software developers, or methodologists.

You might also be interested in the journal issue on statistical workflow that we recently edited for the Philosophical Transactions of the Royal Society.

Again, here’s Bayesian Workflow on Amazon, here’s the publisher’s website, and here’s our website with data, code, and lots more.

Enjoy.

18 Associate Editors resign from Statistics and Computing editorial board: Problems with commercial scholarly publishing, and what does this all mean?

Posted on July 10, 2026 9:27 AM by Andrew

I was cc-ed on a message sent by 18 members of the board of the journal Statistics and Computing, quitting their posts because the publisher (Springer) has announced a new policy whereby all authors will have to pay publication charges. The soon-to-be-former associate editors write, “Statistics and Computing will no longer publish the best science, both due to financial exclusion of those researchers who cannot afford to pay, and those community-minded researchers who refuse to pay on principle.”

I’ll put the full message, with its 18 signatories, below the fold.

My reaction to all this is that it would be great if the journal could move to an open and free system such as is done by the Journal of Machine Learning Research–a journal that I believe was founded by people who had resigned from the editorial board of a commercial journal.

Even commercial journals that begin with good intentions can develop fatal problems. For example, check out the sad story of the Berkeley Electronic Press, a set of commercial journals that was founded by a friend of mine. My friend’s an economist, and I guess he might say that it was the iron logic of capitalism that reduced a once-noble endeavor to a rent-seeking enterprise.

So, yeah, I’d recommend that Statistics and Computing follow the path of JMLR, really try to imitate its structure as closely as possible. Bayesian Analysis is another free journal that appears to run with minimal overhead.

It kind of bugs me, though. Profit-making companies have done great things in publishing and communication. A quick glance at our shelves reveals lots of wonderful books, almost all of which were published privately. As were Bayesian Data Analysis, Regression and Other Stories, Active Statistics, and the rest of my books. Lots of great movies are made for money too.

On the other hand, Arxiv is nonprofit, as is lots of the web, on which I post all my published and unpublished papers. Wikipedia is nonprofit, and this blog is written using WordPress, which appears to be another nonprofit organization. I teach at Columbia University, which is private but nonprofit, not run perfectly by any means but still going strong.

Scholarly publishing is a funny industry because it’s my vague impression that it started out as a low-budget noncommercial enterprise, and then some private rent-seekers moved in. I guess these companies were doing something special or they wouldn’t have been so successful at taking over, but now it seems to have gone too far. More sites like Arxiv, JMLR, and Bayesian Analysis would be a good thing. Right now we’re always having to figure out where to publish our papers; it’s just an absolute mess. These journal submission websites make the Department of Motor Vehicles office look like a lean machine by comparison.

P.S. Retraction Watch ran a story on this, in which they reported on a conversation with Robin Ryder: “If the editors regroup elsewhere to form a new journal, they hope to publish with a society, Ryder told us, citing journals like Journal of the Royal Statistical Society and Annals of Statistics, ‘none of which force authors to pay APCs.’”

I think it would be a mistake for them to follow the Journal of the Royal Statistical Society and Annals of Statistics. Both these journals have arduous paperwork-laden submission processes and both charge for access.

If you’re going to start over, why not follow the model of JMLR and Bayesian Analysis and make it all free? Cut out the middleman entirely!

Claude builds 3D Hamiltonian Monte Carlo animation in one shot with anaglyphs

Posted on July 7, 2026 4:54 PM by Bob Carpenter

This post is from Bob.

The sausage

So as not to bury the lead (or “lede” if you want a mid-20th-century newspaper vibe), check out this 3D HMC animation generator.

It can render regular animations or produce anaglyph 3D encoding (red/blue). Unless you have 3D glasses, unclick the “Anaglyph 3D” checkbox at the bottom of the upper left corner control box.

The app lets you zoom in and rotate the visualization with obvious controls (explanation in the footer of the visualization). The app also lets you adjust the amount of correlation in the 3D normal distribution as well as step size, number of steps, and animation speed. Looking the long way down a highly correlated “cigar” shape is dramatic.

The 3D effect with glasses is strongest when you rotate the visualization (it’s the usual intuitive controls with instructions at the bottom of the web page) and zoom in a bit. I find that using low 3D depth looks the best. Don’t get your hopes up too much. This isn’t Dr. Strange creating buildings in 3D in a Marvel movie.

If you want to pop it up in an independent browser so you can go to full screen, here’s a link.

3D Hamiltonian Monte Carlo Animation

How the sausage was made

I continue to be amazed at the progress of the frontier LLMs. The demo above was the result of handing Claude Opus 4.8 (“hard” thinking mode) the following single prompt with no build up. As with the Galileo inclined plane case study I posted, which Opus one-shotted, I was expecting some back and forth and false starts.

I want to generate a 3D animation for red/blue glasses of the Hamiltonian Monte Carlo algorithm. There is a nice online visualizatuion by Chi Feng here, but it is not 3D https://chi-feng.github.io/mcmc-demo/app.html I just want the main animation—no need to calculate marginals, etc.

To start, we can use a 3D highly correlated (0.9) normal target with unit variance aligned at one end of the cigar (e.g., near (2, 2, 2) looking toward (-2, 2, 2), which will have things zoom over your shoulder and come back).

If you can generate it so that it’ll run in a web browser with controls on step size and number of steps that’d be great, but if not, choose a step size conservatively so it won’t be rejecting very often. I want it to continue multiple iterations in order to see the effect of random momentum on the trajectories. Leave balls behind wherever the sampler actually samples. When it rejects, make the ball bigger. The trajectory should be thick enough to be visible.

If it’s easier to have Python generate an animation that’s also fine. I just want to be able to render it on my desktop to show people during a talk. I just ordered 50 pairs of cardboard red/blue 3D glasses to hand out.

I was wrong. It did it in one shot. After about 10 minutes of cranking away, it produced what you are looking at. The output is a self-contained (i.e., encapsulated) HTML file of 627KB. There are some things I’d change in an iteration (smaller pipes, fewer of them lying around), but I think it’s worth sharing the output of such a simple prompt. Perhaps needless to say, a follow-up prompt gave me the HTML I needed to embed the result in this page as an iframe.

I wrote all 692 words of the blog post myself (other than the html embedding), but I’m sure Claude could have done that, too. The LLMs have fewer rhetorical tics when writing technical and scientific material. But it wouldn’t have sounded like me.
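For readers who want to see what is being animated, here is a minimal sketch of the sampler in Python. This is not the code behind the demo. It assumes an identity mass matrix, a fixed number of leapfrog steps per iteration, and a unit-variance 3D normal target in which every pair of coordinates has correlation 0.9; the function names and the default step size and step count are illustrative choices, picked so that rejections are rare. Each iteration draws a fresh random momentum, follows a leapfrog trajectory, and then accepts or rejects the endpoint, which is where the animation would leave a ball.

```python
import numpy as np

def make_target(rho=0.9, dim=3):
    """Unit-variance normal with every pairwise correlation equal to rho."""
    cov = np.full((dim, dim), rho) + (1.0 - rho) * np.eye(dim)
    prec = np.linalg.inv(cov)

    def log_density(q):
        return -0.5 * q @ prec @ q

    def grad(q):
        return -prec @ q

    return log_density, grad

def hmc_step(q, log_density, grad, step_size, n_steps, rng):
    """One HMC iteration: resample momentum, leapfrog, Metropolis accept/reject."""
    p0 = rng.standard_normal(q.shape)          # fresh random momentum
    q_new, p = q.copy(), p0.copy()
    p = p + 0.5 * step_size * grad(q_new)      # initial half step for momentum
    for i in range(n_steps):
        q_new = q_new + step_size * p          # full step for position
        if i < n_steps - 1:
            p = p + step_size * grad(q_new)    # full step for momentum
    p = p + 0.5 * step_size * grad(q_new)      # final half step for momentum
    h_start = -log_density(q) + 0.5 * p0 @ p0  # Hamiltonian at the start
    h_end = -log_density(q_new) + 0.5 * p @ p  # Hamiltonian at the end
    accept = np.log(rng.uniform()) < h_start - h_end
    return (q_new, True) if accept else (q, False)

def run_hmc(n_iter=2000, step_size=0.2, n_steps=12, seed=0):
    rng = np.random.default_rng(seed)
    log_density, grad = make_target()
    q = np.array([2.0, 2.0, 2.0])              # start at one end of the cigar
    draws, n_accept = np.empty((n_iter, 3)), 0
    for t in range(n_iter):
        q, accepted = hmc_step(q, log_density, grad, step_size, n_steps, rng)
        draws[t] = q
        n_accept += accepted
    return draws, n_accept / n_iter

draws, accept_rate = run_hmc()
print("acceptance rate:", accept_rate)
print("sample correlations:\n", np.corrcoef(draws.T).round(2))
```

The step size has to stay well below the stability limit set by the narrow directions of the cigar (standard deviation about 0.32 here), which is why cranking up the correlation in the app makes rejections more likely at a fixed step size.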

Statistical visualization in the mid 2020s?

I wonder what Andrew’s statistics visualization class would look like in 2026 with LLM-powered visualizations this easy to make. Now that the LLMs can reliably one-shot something this complex, I’m finally starting to worry about the future of programmers. Undergraduate enrollments in CS are very volatile and already going back down as they did after the dot com bubble burst. There was huge growth (a factor of two to three) from after the mortgage market bubble burst around 2007 until it started to decline again due to AI.

The high cost of split R-hat

Posted on July 2, 2026 3:00 PM by Bob Carpenter

This post is by Bob.

I’ve been thinking a lot lately about R-hat given that I’m using it for online convergence monitoring in our new Walnuts implementation. In that setting, where I use Welford accumulators to update R-hat estimates every iteration, I can’t use split R-hat without way too much buffering. So I’ve been thinking about the effect of splitting, too, and whether we need it. I asked Andrew and he said Kenny Shirley once produced an example where split R-hat diagnosed non-convergence that regular R-hat didn’t, but that example is lost to time and we’ve never seen this kind of behavior with NUTS as far as I know (please give us an example in the comments or via email to Andrew if you have one).

Relating R-hat and ESS

My intuition was that we could set a low enough R-hat threshold that it would ensure a high enough effective sample size (ESS) when we crossed it. The relation’s a little tighter than I thought, with

Rhat^2 ≈ 1 + M / ESS,

where M is the number of chains and ESS is the effective sample size of all chains combined. There’s a multivariate proof in Vats and Knudson, 2021, Revisiting the Gelman-Rubin diagnostic, Statistical Science (see page 2 and section 5 for details), but it’s pretty straightforward to get the intuition when you reduce Rhat^2 to (N-1)/N + var(chain-means) / mean(chain-variances), where N is the number of draws per chain, as Charles Margossian did in his nested R-hat paper. Vats and Knudson disapprove of Andrew and Aki’s suggested threshold of 1.1 from BDA3, because it is satisfied with a combined ESS of 20 across Andrew’s default 4 chains.

Being me, I tried to validate my intuition with simulations rather than linear algebra. Also, I like to see that things work in practice that theory entails to make sure I’ve understood all the assumptions baked into the theory (one can’t prove anything without assumptions!). When asked to code a simulation using ArviZ, Claude inserted a (2 * M) in the numerator in place of the M. Where did that come from, I asked? It told me it needed the factor of 2 because ArviZ uses split Rhat. D’oh! Of course it does, because we’ve doubled M without increasing ESS.
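Here is a minimal sketch of that kind of simulation check in Python. It is not the ArviZ-based simulation described above. It uses plain NumPy, non-split R-hat, and stationary AR(1) chains as a stand-in for MCMC output, because for AR(1) with coefficient phi the large-N effective sample size is known to be M * N * (1 - phi) / (1 + phi). The function names and the settings (500 replications, 4 chains, 1000 draws, phi = 0.9) are assumptions for illustration.

```python
import numpy as np

def rhat_squared(chains):
    """Non-split Rhat^2 = (N-1)/N + var(chain means) / mean(chain variances).

    chains has shape (..., M, N): M chains of N draws each.
    """
    n = chains.shape[-1]
    var_of_means = chains.mean(axis=-1).var(axis=-1, ddof=1)
    mean_of_vars = chains.var(axis=-1, ddof=1).mean(axis=-1)
    return (n - 1) / n + var_of_means / mean_of_vars

def simulate_ar1(n_reps, n_chains, n_draws, phi, rng):
    """Stationary AR(1) chains with coefficient phi and unit innovation sd."""
    x = np.empty((n_reps, n_chains, n_draws))
    stationary_sd = 1.0 / np.sqrt(1.0 - phi**2)
    x[..., 0] = rng.normal(0.0, stationary_sd, size=(n_reps, n_chains))
    for t in range(1, n_draws):
        x[..., t] = phi * x[..., t - 1] + rng.standard_normal((n_reps, n_chains))
    return x

def compare(n_reps=500, n_chains=4, n_draws=1000, phi=0.9, seed=0):
    """Average simulated Rhat^2 next to the prediction 1 + M / ESS."""
    rng = np.random.default_rng(seed)
    chains = simulate_ar1(n_reps, n_chains, n_draws, phi, rng)
    ess = n_chains * n_draws * (1.0 - phi) / (1.0 + phi)  # theoretical AR(1) ESS
    return rhat_squared(chains).mean(), 1.0 + n_chains / ess

simulated, predicted = compare()
print(f"mean Rhat^2: {simulated:.4f}   1 + M/ESS: {predicted:.4f}")
```

Averaging over replications matters here: with only 4 chains, the variance of the chain means in any single run is very noisy. The approximation also drops the (N-1)/N term, so it is only expected to hold when the chains are long and autocorrelated enough that M / ESS is large relative to 1 / N.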

A worked example

Suppose we have 4 chains with a combined ESS of 400. Then sqrt(1 + 4/400) ≈ 1.005 and sqrt(1 + (2 * 4) / 400) ≈ 1.01. We’ve effectively doubled the number after the 1 by splitting. Unlike Vats and Knudson, I usually don’t need an ESS >> 100, so the 400 required for split R-hat < 1.01 is perhaps a bit too conservative for my tastes.

On the other hand, we face a practical problem estimating ESS reliably with fewer than 50 or so ESS per chain. Estimation is challenging because it relies on autocorrelation estimates from the chains themselves, which become much noisier when based on shorter chains. (Side question: Do we not combine autocorrelation estimates across chains to reduce standard error because some chains might not be mixing?)

Also, we know this doubling wasn’t a coincidence of 4 chains and an ESS of 400. The Taylor expansion of sqrt(1 + x) is the series

sqrt(1 + x) = 1 + x/2 - x^2 / 8 + x^3 / 16 - ...

which converges for |x| < 1. When 0 <= x < 0.1, the first-order approximation, sqrt(1 + x) ≈ 1 + x / 2, is good, so doubling x roughly doubles the amount by which the result exceeds 1.
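The arithmetic in this example is easy to script. The sketch below just evaluates the approximation Rhat^2 ≈ 1 + M / ESS, using 2 * M in place of M for split R-hat, and inverts it to get the combined ESS implied by a threshold. The function names are made up for illustration.

```python
import math

def rhat_from_ess(ess, n_chains=4, split=False):
    """Approximate R-hat implied by a combined ESS: sqrt(1 + M / ESS)."""
    m = 2 * n_chains if split else n_chains  # splitting doubles the chain count
    return math.sqrt(1.0 + m / ess)

def ess_for_threshold(threshold, n_chains=4, split=False):
    """Combined ESS at which the approximate R-hat equals the threshold."""
    m = 2 * n_chains if split else n_chains
    return m / (threshold**2 - 1.0)

print(rhat_from_ess(400), rhat_from_ess(400, split=True))
print(ess_for_threshold(1.01), ess_for_threshold(1.01, split=True))
print(ess_for_threshold(1.1))
```

With 4 chains, a threshold of 1.01 corresponds to a combined ESS of about 199 for regular R-hat and about 398 for split R-hat, and a threshold of 1.1 on regular R-hat corresponds to a combined ESS of about 19.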

The bottom line for practitioners

We need around twice as many draws to get below a fixed threshold with split R-hat as we do with the original R-hat.

Bayesian Workflow exists as a physical book!

Posted on June 27, 2026 1:51 PM by Andrew

We’re very excited about this book. It’s the result of several years of effort. You can order from the publisher or from Amazon.

Here’s the book’s webpage, which includes the data and code for the book’s examples and case studies, of which there are many.

Here’s the table of contents:

Part 1: From Bayesian inference to Bayesian workflow
1. Bayesian theory and Bayesian practice
2. Statistical modeling and workflow
3. Computational tools
4. Introduction to workflow: Modeling performance on a multiple choice exam

Part 2: Statistical workflow
5. Building statistical models
6. Using simulations to capture uncertainty
7. Prediction, generalization, and causal inference
8. Visualizing and checking fitted models
9. Comparing and improving models
10. Statistical inference and scientific inference

Part 3: Computational workflow
11. Fitting statistical models
12. Diagnosing and fixing problems with fitting
13. Approximate algorithms and approximate models
14. Simulation-based calibration checking
15. Statistical modeling as software development

Part 4: Case studies
16. Coding a series of models: Simulated data of movie ratings
17. Prior specification for regression models: Reanalysis of a sleep study
18. Predictive model checking and comparison: Clinical trial
19. Building up to a hierarchical model: Coronavirus testing
20. Using a fitted model for decision analysis: Classification competition
21. Posterior predictive checking: Stochastic learning in dogs
22. Incremental development and testing: Black cat adoptions
23. Debugging a model: World Cup football
24. Leave-one-out cross validation model checking and comparison: Roaches
25. Model building and expansion: Golf putting
26. Model building with latent variables: Markov models for animal movement
27. Model building: Time-series decomposition for birthdays
28. Models for regression coefficients and variable selection: Student grades
29. Sampling problems with latent variables: No vehicles in the park
30. Challenge of multimodality: Differential equation for planetary motion
31. Simulation-based calibration checking in model development workflow

Appendices
A. Statistical and computational workflow for Bayesians and non-Bayesians
B. How to get the most out of Bayesian Data Analysis

One way to think of the book is that it’s all the things missing from BDA, like how to set up an informative prior, what to do when your computations aren’t converging, how to work through a series of models fit to the same data, how to design and perform simulated-data experiments . . . and all sorts of other things too.

The core of the book–parts 1 through 3–clocks in under 200 pages, and then we have another 300 pages full of case studies demonstrating different aspects of Bayesian statistical and computational workflow. The appendices should be useful to you too, first because the workflow ideas in this book apply to non-Bayesian inference too, and second because BDA still has lots of valuable material in it, so it’s good to know where to look.

This new Bayesian Workflow book could change your life (we hope), and I thank my coauthors, Aki Vehtari and Richard McElreath, with Daniel Simpson, Charles C. Margossian, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, Martin Modrák, and Vianey Leos Barajas, for all their care and effort. We thank our employers and various funding agencies for giving us the resources to be able to write this book as a side project along with all our daily responsibilities. And we thank many people for their input on earlier versions of the book, along with the Stan developers for making so much of this work possible and the Stan community of users for supplying a continuing series of challenges that have motivated many of the ideas and methods discussed in the book.

I posted this already on the blog and you can see answers to some questions in the comments there. I’m posting it again here because, hey, we don’t come out with a new book every day!

I hope you find the book readable, interesting, and useful.

Structural equation modeling (SEM) and positive definiteness

Posted on June 25, 2026 3:00 PM by Bob Carpenter

This post is from Bob.

Mitzi and I were swotting up on structural equation models (SEM) for our class this past Monday at the Modern Modeling and Methods (M3) conference at Fordham University. It was a lot of fun and now I think I understand SEM notation. I really like these applied conferences and this was a group of psychometricians, econometricians, and sociometricians. Many if not most of them thought about models in terms of SEM, so we thought we should figure it out. But I was left with a concern you may be able to help me sort out.

The example

The first worked example in Ken Bollen’s seminal 1989 textbook on SEM is a study of how industrialization relates to democracy. It comes from his paper,

Bollen, Kenneth A. (1979). “Political Democracy and the Timing of Development.” American Sociological Review, 44(4).

and was reprised in his book

Bollen, Kenneth A. (1989). Structural Equations with Latent Variables. Wiley.

I had the pleasure of sitting across from Ken at the invited speakers dinner at the conference, so I’m glad I looked into SEM before that. Good news for the SEM devotees—he released a completely revised guide to SEM a few months ago.

Bollen, Kenneth A. 2026. Elements of Structural Equation Models. Cambridge University Press.

The data and parameters

The data consists of eleven covariates (called “indicators” in SEM) for each of 75 countries. Four of the covariates are related to democracy in 1960 (y1, y2, y3, y4), the same four measurements were taken again in 1965 (y5, y6, y7, y8), and there were three measurements of industrialization in 1960 (x1, x2, x3).

The SEM model the original researcher came up with here assumes three latent scalars per country, industrialization in 1960 (IND60), level of democracy in 1960 (DEM60), and level of democracy in 1965 (DEM65). These latent parameters are related in the following way: democracy in 1960 is a regression on industrialization in 1960, and democracy in 1965 is a regression on both democracy in 1960 and industrialization in 1960.

The covariates are then modeled like a seemingly unrelated regression in econometrics. The four democracy 1965 measurements are treated as regressions on the latent level of democracy in 1965, and similarly for the democracy measurements in 1960 and the industrialization measurements in 1960.

Rather than independent errors, a SEM model explicitly indicates with arrows which pairs of observations are allowed to have non-zero correlation in the covariance matrix for the observations. The three industrialization observations are assumed to have zero correlation—there are no arrows between any of the three measurements in the SEM diagram. Each of the four measurements in 1960 is assumed to covary with the same measurement taken in 1965. In addition, the second and fourth measurement in each year are assumed to be correlated with each other, which leads to a box-like structure.

The SEM diagram

Here are the arrows in the diagram, where I’m not using their standard LISREL notation, but writing them in R expression syntax to indicate what is regressed on what. In their graphical notation, just replace ~ with <-. All three latent variables and all eleven measurements are indexed by country.

IND60
DEM60 ~ IND60
DEM65 ~ DEM60, IND60

x1, x2, x3 ~ IND60
y1, y2, y3, y4 ~ DEM60
y5, y6, y7, y8 ~ DEM65

The covariance structure is indicated by stating which pairs of measurements are modeled with non-zero correlation. The first four just pair the measurements of the same thing across 1960 and 1965.

y1 <-> y5
y2 <-> y6
y3 <-> y7
y4 <-> y8

The last pair of correlations are within 1960 and within 1965.

y2 <-> y4
y6 <-> y8

Together, these induce an odd box structure, where y2 is correlated with y6 and y4, both of which are correlated with y8, but y2 and y8 are assumed to have zero correlation.

y2 <-> y6
^      ^
|      |
v      v
y4 <-> y8
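To make the structure concrete, here is a hedged Python sketch that simulates data with this shape. Every number in it is a hypothetical placeholder rather than one of Bollen’s estimates: the regression coefficients, the unit loadings, the residual scales, and the common residual correlation of 0.3 are all made up for illustration, as are the function names. The point is only to show the three latent variables feeding the eleven measurements, and the six allowed residual correlations among y1 through y8.

```python
import numpy as np

def residual_corr(r=0.3):
    """8x8 residual correlation for y1..y8 with the six allowed pairs set to r."""
    omega = np.eye(8)
    # 1-based labels: same measurement across years, plus y2/y4 and y6/y8
    pairs = [(1, 5), (2, 6), (3, 7), (4, 8), (2, 4), (6, 8)]
    for i, j in pairs:
        omega[i - 1, j - 1] = omega[j - 1, i - 1] = r
    return omega

def simulate(n_countries=75, seed=0):
    """Simulate indicators x (n, 3) and y (n, 8) from hypothetical parameter values."""
    rng = np.random.default_rng(seed)
    ind60 = rng.standard_normal(n_countries)                               # IND60
    dem60 = 1.0 * ind60 + rng.standard_normal(n_countries)                 # DEM60 ~ IND60
    dem65 = 0.8 * dem60 + 0.5 * ind60 + rng.standard_normal(n_countries)   # DEM65 ~ DEM60, IND60
    # x1, x2, x3 ~ IND60 with independent errors
    x = ind60[:, None] + 0.5 * rng.standard_normal((n_countries, 3))
    # y1..y4 ~ DEM60 and y5..y8 ~ DEM65 with box-structured correlated errors
    latent = np.column_stack([dem60] * 4 + [dem65] * 4)
    noise = rng.multivariate_normal(np.zeros(8), residual_corr(), size=n_countries)
    return x, latent + noise

x, y = simulate()
print(x.shape, y.shape)
print(residual_corr()[1, 7])  # y2 and y8: structural zero
```

With a common correlation of 0.3 this residual matrix is positive definite, but that is a property of the chosen value and not of the pattern of zeros itself.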

Stan implementation

We didn’t get this far in my half of the class, so I will share here the Stan Playground example where I fit Bollen’s example (you can get the data and the Stan model through the Playground link):

Stan implementation of Bollen’s SEM example.

It gets the right answer compared to lavaan/blavaan, which is nice. In the Stan code, xi is IND60 and eta1, eta2 are DEM60, DEM65. The relations among the latent parameters are modeled directly as regressions. The correlations among the observations are modeled using soft zeroing, where I just put a tight prior around zero on the structural zero elements, because Stan doesn’t give you a good way of setting up structural zeroes in a covariance matrix (Sean Pinkney or Ben Goodrich might know how to do this?).

This makes me curious how the lavaan package in R manages this. There’s a Bayesian version of lavaan built on top of Stan, blavaan. The first example right at the top of the home pages for both lavaan and blavaan is Bollen’s democracy model. I guess it’s like the Scottish lip cancer data set for spatial modeling or Fisher’s iris data for regressions.

My questions

Consider a simple diagram among measurements like the following.

x <-> y
y <-> z

This says there can be non-zero correlation between x and y and also between y and z, but the correlation between x and z is zero. It’s a simplified case of the box we saw in the actual example. These arrows imply the correlation matrix looks as follows.

|        1  rho[x,y]         0 |
| rho[x,y]         1  rho[y,z] | = Omega
|        0  rho[y,z]         1 |

Given that the correlation matrix Omega must be positive definite, this limits the range of rho[x,y] and rho[y,z]. For example, we can’t have rho[x,y] = rho[y,z] = 0.9, or rho[x,z] would have to be greater than zero to maintain positive definiteness.
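A quick numerical check makes the constraint explicit. With rho[x,z] fixed at zero, the determinant of Omega is 1 - rho[x,y]^2 - rho[y,z]^2, so positive definiteness requires rho[x,y]^2 + rho[y,z]^2 < 1. More generally, for fixed rho[x,y] and rho[y,z], Omega is positive definite exactly when rho[x,z] lies strictly between rho[x,y] * rho[y,z] minus and plus sqrt((1 - rho[x,y]^2) * (1 - rho[y,z]^2)). The Python sketch below checks both statements with eigenvalues; the function names are made up for illustration.

```python
import numpy as np

def omega(rho_xy, rho_yz, rho_xz=0.0):
    """3x3 correlation matrix for (x, y, z)."""
    return np.array([[1.0, rho_xy, rho_xz],
                     [rho_xy, 1.0, rho_yz],
                     [rho_xz, rho_yz, 1.0]])

def is_positive_definite(mat):
    """True if the smallest eigenvalue of the symmetric matrix is positive."""
    return bool(np.linalg.eigvalsh(mat).min() > 0.0)

def rho_xz_range(rho_xy, rho_yz):
    """Open interval of rho[x,z] values that keep Omega positive definite."""
    center = rho_xy * rho_yz
    half_width = np.sqrt((1.0 - rho_xy**2) * (1.0 - rho_yz**2))
    return center - half_width, center + half_width

print(is_positive_definite(omega(0.9, 0.9)))   # structural zero with two 0.9s
print(is_positive_definite(omega(0.5, 0.5)))   # structural zero with two 0.5s
print(rho_xz_range(0.9, 0.9))
```

For rho[x,y] = rho[y,z] = 0.9 the structural zero is infeasible, and the admissible interval for rho[x,z] runs from about 0.62 up to 1.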

Q1: Why doesn’t SEM instead say that the correlation rho[x,z] is just the minimum value it can be given rho[x,y] and rho[y,z]? I’m suggesting that we instead treat the above diagram as implying no additional correlation between x and z other than that implied by the correlation between x and y and the correlation between y and z. That is, why try to shrink rho[x,z] all the way to zero? From the text, it feels like the motivation is to enforce zero correlation in the model. But all this is doing is simplifying regressions; it won’t actually enforce zero correlation among the measurements that are modeled with zero correlation. I wish I’d asked Ken this question at dinner, but I’ll ping him about this blog post and hopefully get a response.

Of course, in the pragmatic Bayesian workflow, we’d use posterior predictive checks to evaluate whether there’s unmodeled correlation between x and z.

Q2: I’m also curious what Andrew and others think about enforcing structural zeroes in correlation between measurements as opposed to just estimating a dense covariance matrix and inspecting where the correlations fall.

Workshop on Rethinking the Role of Bayesianism in the Age of Modern AI

Posted on June 22, 2026 4:13 PM by Andrew

Esmeralda Whitammer, Sara Wade, Vincent Fortuin, Konstantina Palla, and Theodore Papamarkou write:

We are organising a focused workshop on Rethinking the Role of Bayesianism in the Age of Modern AI from October 26 to 30, 2026, bringing together researchers exploring the frontiers of Bayesian machine learning and deep learning. The meeting will take place in Edinburgh, Scotland, UK, and will be hosted by the University of Edinburgh’s School of Informatics.

This workshop follows in the footsteps of the meetings held at Dagstuhl in 2024 and MBZUAI in 2025. This year, the meeting is growing and becoming an official event of the International Society for Bayesian Analysis (ISBA)’s new section on Bayesian AI. We are planning to maintain the collaborative and interactive spirit of the previous meetings, with a programme that includes talks, panel discussions, poster sessions, and ample time for interaction among participants representing a wide range of perspectives and expertise.

Looks interesting! They should invite Aki for sure.

LLM-generated Stan case study on Galileo’s inclined plane experiment

Posted on June 18, 2026 3:00 PM by Bob Carpenter

This post is from Bob.

I’ve been planning for at least a couple years to generate a case study around Galileo’s use of an inclined plane instrumented with water clocks to estimate the terrestrial gravitational constant. Here are some photographs of a replica in the Museo Galileo (click to blow them up). And here’s a video simulation of the experiment. We replace his clever pendulum apparatus explained in the video and the web page with simple Bayesian statistics so we can actually estimate the gravitational constant.

The case study

Here is a draft.

Bob Carpenter. 2026. Estimating g from Galileo’s Water Clock: A scientific Bayesian inverse problem with Stan and CmdStanPy. GitHub.

I list myself as the author here because I’m responsible and AIs can’t own copyright in the U.S., but 100% of the text and code was written by Claude Opus 4.8 (medium or high effort, but I can’t recall which). I used the desktop app, which doesn’t allow sharing, but you can try it yourself.

The prompt

Here’s the sloppy prompt I used, which I just typed in without much thought in a couple minutes to get a feel for what it could do on its own.

I would like to generate a case study written in Quarto and using CmdStanPy to demonstrate solving scientific Bayesian inverse problems. I want to use a simulation of Galileo’s water clock experiment, which can be used to estimate the gravitational constant. I would like you to start by generating the mathematical model description in LaTeX, the model code in Stan to solve the inverse problem, and a simulation driver in Python using CmdStanPy and plotnine for plotting. Please just `import plotnine as pn` and use `pn.geom…`, etc. All I need in the output now is a call to `.summary()` on the fit returned by `.sample()`. Wrap this all up in a quarto document for me from which I can generate HTML by calling `quarto render galileo.qmd`.

It was done before I got back to my desk with a cup of coffee (well under five minutes). So not quite the several hours Andrew said it took him to write his case study on the New York Knicks basketball team, which he posted earlier today. Of course, this was much simpler and I didn’t have to think through any details before generating it.

Is it right?

What Claude produced looks really good to me. If a student had done this, I’d have given them an A. I can’t object to the way it described Galileo’s experiment, wrote the math, wrote the Stan code, wrote the Python simulation, or plotted the raw data as Andrew is always urging us to do.^*

The source

You can find the source .qmd file on my GitHub:

https://github.com/bob-carpenter/case-studies/tree/master/galileo-gravity

It’s short, so I would have just included it, but the blog software blocked my post after considering it an attack on the site. To get it to render with resources embedded, I had to ask Claude a follow-up question and manually insert a single line of config into the .yaml header for the markdown document.

Putting this blog post together took longer than writing the prompt and checking the results.

^* Maybe Claude runs a little simulation of Andrew like I do. Andrew himself claims to run a simulation of Jennifer Hill—it’s the basis of his handy statistical lexicon entry for “WWJD,” which he told me stands for “What would Jennifer do?” Unfortunately, neither the lexicon entry nor its underlying link explains the acronym.

R wins statistics award.

Posted on June 17, 2026 2:00 AM by Andrew

Elena Belogolovsky writes:

Congratulations to the R Core Team on receiving the 2026 Rousseeuw Prize for Statistics.

R has made creative, open-ended statistical analysis and graphics accessible to generations of statisticians and applied researchers. It has also been central to statistical research, methodology, and applications during decades when statistics became more computational and more important across science, engineering, business, and public health.

One of the great strengths of R is that it is not just a software platform. It is also a community. The system of R packages allows anyone to implement a new method and share it with the world, helping make statistical research more open, useful, and alive. R has also been the medium for major developments in statistical graphics, transforming applied statistics and the way people work with data.

The volunteers who have developed, guided, and maintained R and the R community are richly deserving of this major award.

I agree with the committee that the R team is an excellent recipient of this award. I say this for several reasons:

– Most obviously, R is super-useful and it’s changed statistics, both by enabling more complicated and reliable analysis and by establishing a common language for statistical coding.

– R integrates statistical modeling with graphics, which traditionally (but, in my opinion, mistakenly) have been thought of as in opposition.

– R is open source. This might sound like no big deal, but its predecessor was Splus, which was a commercial package. Before that came S, which was open but was not set up to expand in a scalable way.

– With its system of packages, R became modular: different groups of users (including me!) could write their own packages and develop new and useful tools without needing to get tangled in core R issues. For example, we have cmdstanr, which lets you run Stan programs from R. This is super-useful for Bayesian workflow.

– R is a programming language, not a menu-based set of commands. This is no big deal now, given that the natural comparison to R is Python, but, back in the day, when R’s competitors were Sas, Spss, Stata, etc., it was a big deal that with R you write programs, you don’t just push buttons. A big deal for workflow in statistics and data science.

– Regarding the R community . . . ok, this gets complicated. Still and all, the R core team is very helpful to outsiders and has been a clear net benefit to the communities of developers, statisticians, and users.

I’m sure I’m missing a few things. My only disagreement with the award citation is that it doesn’t mention S, the statistical software environment developed by John Chambers and others at Bell Labs back in the 1980s. R is a rewrite of S. With lots of improvements, but I do think the S team deserves credit for setting up the template.

Call for invited session proposals for the upcoming BayesComp conference

Posted on June 16, 2026 5:42 PM by Andrew

Lu Zhang writes:

As a member of the BayesComp 2027 conference committee, I would like to share the announcement of the call for invited session proposals for the upcoming BayesComp conference, which will be held in College Station, Texas, on May 18–20, 2027.

The scientific committee is currently soliciting proposals for invited sessions. Each invited session will consist of three speakers, and proposals should focus on timely, important, and broadly engaging topics in Bayesian computation and related areas.

The submission deadline (as of now) for invited session proposals is August 15, 2026.

Proposal form: https://forms.gle/wpYvkkjKGZ5vHqhF6

Additional details are available in the official announcement:

The LOC for BayesComp 2027 is pleased to announce that the next edition of BayesComp will take place in College Station, TX during May 18–20, 2027. The scientific committee is now opening calls for invited sessions. Each invited session will consist of 3 speakers. Proposals should highlight timely, important, and broadly engaging topics in Bayesian computation and related areas. Each speaker may be listed as a speaker in only one invited or contributed session proposal.

Lu is the first author on the Pathfinder paper and continues to do interesting work on Bayesian statistics and computing. Based on what I’ve heard about past BayesComps, the conference should be really interesting.

Ph.D. student opening in Sweden on Earth Observation, Data Science, and AI for poverty estimation

Posted on June 15, 2026 5:37 PM by Andrew

Adel Daoud writes:

I’m writing to ask for your help circulating a PhD opening in my group at Chalmers, the AI and Global Development Lab (www.aidevlab.org). The position is in Earth Observation, Data Science, and AI for poverty estimation, in the Data Science and AI division (Department of Computer Science and Engineering). We are looking for candidates with a strong grounding in data science, computer science, deep learning, statistics, or similar; remote sensing experience and causal inference are welcome bonuses.

Ad and application portal: https://www.chalmers.se/en/about-chalmers/work-with-us/vacancies/?rmpage=job&rmjob=14818&rmlang=UK
Deadline: 20 June 2026.

Here’s the description of their center:

The AI & Global Development Lab fuses AI with Earth Observation to illuminate the causes and consequences of human development across time and space.

Our interdisciplinary team, comprising data scientists, computer scientists, and social scientists, develops methods to better understand the multi-scale dynamics of pressing global issues, including poverty, conflict, sustainability, and the effectiveness of policy interventions.

By analyzing satellite imagery from 1984 to the present, AI search agent swarms for large-scale knowledge discovery, and other planetary-scale sources, we are reconstructing historical and geographical development trajectories at a level of detail never before possible, working to offer new insights into the changing face of development worldwide.

We also invite you to visit PlanetaryCausalInference.org for more information about the causal arm of our project.

They call it “Planetary causal inference,” which seems to fit the themes of this blog.

Stein’s method, learning and inference -or- how to really monitor convergence and thin chains

Posted on June 8, 2026 3:00 PM by Bob Carpenter

This post is from Bob.

I’ve been thinking a lot about scores (gradients of the log density function) and how they can be used for convergence monitoring. We know that the expected value of the score is zero. Stein generalized this with Stein operators. In the monomial case, the Stein operators give you functions in increasing degrees, all of which have zero expectation in the posterior. Here theta is the variable being sampled and S is the score function, so that S(theta) is the gradient of the target log density evaluated at theta.

Order 0: S(theta)

Order 1: 1 + theta .* S(theta)

Order 2: 2 * theta + theta^2 .* S(theta)

This leads to a natural test for convergence of first, second, and third moments. Just compute Monte Carlo estimates of these quantities and see if they’re zero. We’d want to standardize by the standard deviation to make the result scale-free like R-hat. To develop some intuition, in a standard normal distribution p(theta) = normal(theta | 0, I), we have S(theta) = -theta, and thus the Monte Carlo estimate of S(theta) converges to zero at the same rate as our estimate of theta converges to its true value; the order 1 test is 1 - theta^2, which we know has expectation zero because theta^2 has a ChiSquared(1) distribution with an expectation of 1. The order 1 case corresponds to equipartition in physics and the form D + theta’ * S(theta) also naturally has zero expectation, as shown in the virial theorem in physics in the 1870s.
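Here is a minimal sketch of that check for the one-dimensional standard normal case, assuming NumPy and exact independent draws in place of MCMC output. Dividing by the naive standard error is only appropriate for independent draws; with real Markov chains the standard error would need an effective sample size estimate, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(200_000)   # stand-in for posterior draws
score = -theta                         # S(theta) for normal(0, 1)

# Monomial Stein statistics; each has expectation zero under the target.
stein_stats = {
    0: score,
    1: 1 + theta * score,
    2: 2 * theta + theta**2 * score,
}

def z_score(values):
    """Monte Carlo mean divided by its standard error (independent draws)."""
    return values.mean() / (values.std(ddof=1) / np.sqrt(values.size))

z = {order: z_score(values) for order, values in stein_stats.items()}
for order, value in z.items():
    print(f"order {order}: z = {value:+.2f}")

# Draws from the wrong distribution (shifted by 0.5) fail the order 0 check
# when evaluated with the target's score function.
shifted = theta + 0.5
z_shifted = z_score(-shifted)
print(f"shifted draws, order 0: z = {z_shifted:+.2f}")
```

The standardized values for the correct draws should look like draws from a standard normal, while the shifted draws produce a value far from zero.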

Diving into this a bit more led me back to Jackson Gorham and Lester Mackey’s work on Stein’s method. They haven’t been sitting still since introducing the basic idea, which kernelizes the idea above. Mackey et al. have produced an absolutely wonderful summary of this body of work in two forms. The first is a dense, 41-slide deck with all the key definitions and results. I’d suggest at least skimming this first.

Lester Mackey. April 2026. Stein’s Method, Learning, and Inference. GitHub.

Mackey along with Chris Oates and Qiang Liu, who have also worked heavily in this area, put together a definitive monograph. They’ve presented a great deal of difficult material in a way that I can digest (though it’s going to be rough going if you’re not well versed in sampling and how MCMC is traditionally measured and evaluated).

Qiang Liu, Lester Mackey, Chris Oates. March 2026. Probabilistic Inference and Learning with Stein’s Method. arXiv.

In particular, they go over Stein variational inference, which seems to me like it would be the ideal way to perform quasi Monte Carlo-like inference for statistical models if we could only get a robust version to scale. The idea’s to initialize a bunch of points, then use optimization to minimize a kernelized Stein discrepancy of the empirical distribution of those points to the true distribution.
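To make the particle idea concrete, here is a minimal sketch of the standard Stein variational gradient descent (SVGD) update of Liu and Wang in one dimension, assuming NumPy. Strictly speaking, this update moves particles along the kernelized Stein direction that most rapidly decreases KL divergence rather than minimizing a kernelized Stein discrepancy directly. The target normal(2, 1), the fixed bandwidth, the step size, and the function names are all illustrative choices, not anything from the monograph.

```python
import numpy as np

def svgd_step(particles, score, bandwidth=1.0, step_size=0.1):
    """One SVGD update for 1-D particles using an RBF kernel."""
    diff = particles[:, None] - particles[None, :]   # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2 * bandwidth**2))        # k[j, i] = k(x_j, x_i)
    grad_k = -diff / bandwidth**2 * k                # d/dx_j k(x_j, x_i)
    # phi[i] = mean over j of k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i):
    # the first term pulls particles toward high density, the second repels them.
    phi = (k * score(particles)[:, None] + grad_k).mean(axis=0)
    return particles + step_size * phi

def target_score(x):
    """Score of the illustrative target normal(2, 1)."""
    return -(x - 2.0)

rng = np.random.default_rng(1)
particles = rng.normal(-3.0, 0.5, size=50)   # deliberately poor initialization
for _ in range(1000):
    particles = svgd_step(particles, target_score)

print(particles.mean(), particles.std())
```

The particles should end up spread around 2 with a standard deviation in the neighborhood of 1. A practical version would at least adapt the bandwidth and step size.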

The Kappa Zoo: David Eubanks’s online monograph on rating models

Posted on May 28, 2026 3:00 PM by Bob Carpenter

David Eubanks writes:

My site is kappazoo.com, and it’s still a work in progress. I would rather have emailed after I had the new goodness-of-fit code done, but I saw that you’re doing a summer workshop (on Andrew’s blog) [editor: Modern Modeling Methods (M3)], so thought I’d mention it now.

It may be billed as a work in progress, but it’s a complete draft with no missing sections that provides a really nice overview of rating/crowdsourcing models. These are the models that dragged me into statistics, namely Bayesian rating models formulated as noisy measurement models. The first model of this kind that I or Eubanks could find was Phil Dawid and Allan Skene’s (1979) paper on rating.

Eubanks works through a great deal of workflow without calling it that. There are multiple model evaluation and comparison measures used and explained with connections to information-theoretic notions like entropy.

There’s a long discussion of Cohen’s kappa statistic, which is a commonly reported statistic measuring inter-rater agreement. As Eubanks notes, it doesn’t deliver on its promise of adequately measuring inter-rater agreement. The discussion is quite good here and complementary to the discussion from me and Becky Passonneau in our paper on rating in NLP, though our conclusions are the same.
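For readers who haven’t seen it, Cohen’s kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items on which two raters agree and p_e is the agreement expected by chance from each rater’s marginal label frequencies. Here is a minimal sketch using only the Python standard library; the function name and the toy ratings are made up for illustration and are not taken from Eubanks’s site.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Balanced toy example: 75% raw agreement.
kappa_balanced = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])

# Imbalanced toy example: 80% raw agreement, but both raters say 1 nine times in ten.
kappa_skewed = cohens_kappa([1] * 9 + [0], [1] * 8 + [0, 1])

print(kappa_balanced, kappa_skewed)
```

The second example is one simple way to see how kappa can mislead: the raters agree on 8 of 10 items, yet kappa comes out slightly negative because chance agreement under such skewed marginals is 0.82.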

I was surprised to see that Eubanks has a section comparing item-response theory (IRT) models with difficulty. I’ve been trying to convince people this is important for years. It took me around ten years to figure out how to move from Dawid and Skene’s IRT-0-like model to an IRT-1-like model, which we report in our latest paper on crowdsourcing with difficulty parameters (which also works through a lot of Bayesian workflow in considering different models). I can’t identify what took so long—it seems so obvious to me now.

Statistical analysis recapitulates the development of statistical methods

Posted on May 28, 2026 9:00 AM by Andrew

We ran this a few years ago but it remains interesting so I’m reposting:

There’s an old saying in biology that the development of the organism recapitulates the development of the species: thus in utero each of us starts as a single-celled creature and then develops into an embryo that successively looks like a simple organism, then like a fish, an amphibian, etc., until we reach our human form in preparation for birth.

Modern biologists don’t believe in this recapitulation. But taking this as an intriguing idea, I see an analogy with statistical practice.

Some version of this recapitulation occurs just about whenever we do applied statistics. We start with the simplest methods–univariate data summaries and some basic multivariate analyses–then we perform some comparisons which we check via standard errors and off-the-shelf hypothesis tests, then we move to modeling. We might well start with least squares and maximum likelihood and then move to regularization and multilevel modeling as needed, then throw in measurement error models, selection models, nonparametric this and that, and so forth.

The analogy isn’t perfect–in particular, we don’t always begin an analysis with simple averages and plots; sometimes we begin with a sophisticated nonparametric data-exploration tool such as lowess or deep nets. And, lots of methods for graphical exploratory data analysis have only been developed recently; indeed, even methods as basic as scatterplots are only a few centuries old.

Within the context of modeling, though, it does seem to me that we tend to start simple and then add more complicated features one at a time–and this seems like a sensible way to proceed. In so proceeding, we’re motivated in part by computational stability but also in part by the logic of increasing complexity: we take each step for a reason. Thus it is logical that statistical analysis recapitulates the development of statistical methods.

Jonah’s seminar tomorrow: “Bayesian Workflow and the Software That Shapes It”

Posted on May 18, 2026 12:30 PM by Leonardo Egidi

This is Leo. Jonah Gabry (Stan developer, Andrew’s collaborator, etc.) is spending the whole month of May as a visiting professor here with us at the University of Trieste in Italy. Tomorrow, May 19th, in the De Finetti room at the University of Trieste, at 9 am NYC time (GMT-4), Jonah will give the following talk:

“Bayesian Workflow and the Software That Shapes It”

based on the upcoming book: “Bayesian Workflow”.

For anyone local, you are welcome to come in person. Anyone else can join on Microsoft Teams (available here).

Alchemize: PyMC’s model to replace Stan/PyMC, etc. with an LLM

Posted on May 14, 2026 3:00 PM by Bob Carpenter

This post is from Bob.

I’ll let Thomas Wiecki, who is one of the core PyMC devs and one of the partners at PyMC Labs, speak for himself here:

Thomas Wiecki. 2026. Alchemize: Transpile PyMC to Rust for 3-7x speed-up. PyMC Discourse.

If you haven’t seen what people are doing with agentic AI, this is a good example. I’m really happy that Thomas and PyMC Labs are sharing their thoughts and initial tries at things like this as I think it has the potential to benefit everyone working on modeling.

If you want to see the basis of the agent’s instructions, check out the “skill” for PyMC that Chris Fonnesbeck wrote.

We’ve already batted this around a bit in email with Thomas, so I can summarize some talking points:

LLM-based chatbots are really good at translating. Compiling (or more technically correct, transpiling) a statistical model down to a language like Rust or C++ or JAX is a kind of translation.

You can start from PyMC’s execution trace, but you can also start with a model description. You could also start with something like Stan code.

The biggest bottleneck to deploying Bayesian models in my opinion is the inherent variance and unreliability of MCMC-based inference. Our workflow proposals are all about making sure this doesn’t go wrong. Wiecki’s point here is that we can have the bots go through the workflow. Iterating until the gradients and log densities match is a good example, but this could be extended to more parts of workflow.

The skills feel a lot like writing a textbook for a bot. I have no idea how hard or easy this is or how much it improves over the baseline. Jeremy Magland built a RAG-like helper for Stan that compressed the Stan Reference Manual down to 1K tokens for context (like a skill) and allowed it to search and import from the Stan User’s Guide, but never measured how much it improved over the baseline. It really feels like it should also have the Stan Functions Reference, BDA3, Regression and Other Stories, and the Bayesian Workflow book, as well.

Hopefully we’ll asymptote at writing a textbook-sized set of skills and not have to write one per target model (that is, something like the Stan User’s Guide, Reference Manual, Functions Reference).

I’m curious as to whether it will eventually be able to make writing hard models easier. I’m thinking of efforts like epinow2, which involves a very large chunk of Stan code.

As the foundation models and chatbot tuning change, there’s going to be an issue of regression testing and tuning for whatever the latest models are.
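The “iterate until the gradients and log densities match” check from the talking points above could look something like the following. This is only a sketch assuming NumPy: the two log density functions are toy stand-ins for an original model and its machine translation, the gradients are approximated by central finite differences rather than taken from either system’s autodiff, and none of the names come from PyMC, Stan, or Alchemize.

```python
import numpy as np

def reference_log_density(theta):
    """Toy stand-in for the original model: standard normal, up to a constant."""
    return -0.5 * np.sum(theta**2)

def translated_log_density(theta):
    """Toy stand-in for the translated model under test."""
    return -0.5 * np.dot(theta, theta)

def finite_diff_grad(f, theta, eps=1e-6):
    """Central finite-difference approximation to the gradient of f at theta."""
    grad = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def translations_match(f_ref, f_new, dim, n_points=20, seed=0):
    """Compare log densities and gradients at random points.

    Assumes both functions use the same normalizing constant; a real check
    might need to allow a constant offset between the two.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_points):
        theta = rng.standard_normal(dim)
        if not np.isclose(f_ref(theta), f_new(theta), rtol=1e-8, atol=1e-8):
            return False
        if not np.allclose(finite_diff_grad(f_ref, theta),
                           finite_diff_grad(f_new, theta),
                           rtol=1e-4, atol=1e-4):
            return False
    return True

def buggy_log_density(theta):
    """Translation with a deliberate error: an extra linear term."""
    return -0.5 * np.sum(theta**2) + 0.1 * theta[0]

good = translations_match(reference_log_density, translated_log_density, dim=3)
bad = translations_match(reference_log_density, buggy_log_density, dim=3)
print(good, bad)
```

An agent loop would rerun a check like this after each attempted translation and stop only when it passes.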

P.S. This effort explains how Thomas was able to create the huge posteriordb pull requests for PyMC (#320 and #319)!

P.P.S. The latest thing Claude (Opus 4.7) did that impressed me was generate the ess(MatrixXd, vector) function in summary.hpp. This function estimates effective sample size Stan style (discounting for R-hat > 1) on a ragged array of Markov chains. (We have to generalize all the posterior analysis tools to deal with the new asynchronous parallel sampler.) I had Stan’s ESS function and all the other functions I’d written for the ragged structures to give it as a guide. It’s very easy to code review that it matches Stan’s implementation for the new data structure. I only had to tweak the output a little bit for style.

Handbook of Markov chain Monte Carlo, second edition

Posted on May 6, 2026 9:31 AM by Andrew

Radu Craiu, Dootika Vats, Galin Jones, Steve Brooks, Xiao-Li Meng, and I edited the second edition of the Handbook of Markov chain Monte Carlo. Dootika set up a github page for the book, listing all the chapters and including links to most of them in Arxiv form. (Chapter 4, “For how many iterations should we run Markov chain Monte Carlo?”, is by Charles Margossian and me.) For some reason, a few of the chapters are not yet on Arxiv but I guess they’ll get there soon.

I recommend the whole book, but especially chapter 24, “Running Markov chain Monte Carlo on modern hardware and software,” by Pavel Sountsov, Colin Carroll, and Matthew Hoffman. But really, just dive in and read whatever chapters interest you.

My only regrets are that we didn’t include chapters on the following topics:
– Probabilistic programming (Stan, etc.)
– Sequential Monte Carlo (particle filtering)
– Divide-and-conquer algorithms (expectation propagation, etc.)

But, hey, no project is ever done.

I’m glad to have been part of this, and special thanks to Radu and Dootika, who joined the project for the second edition and added a lot.

Expanding the Stan User’s Guide

Posted on May 4, 2026 3:00 PM by Bob Carpenter

This post is from Bob.

The Stan User’s Guide has been evolving organically along with the project. I’m writing this post for two reasons—to let you know what we’ve added lately and to encourage you to add your own chapters.

History

Initially there was just one doc that included what are now the Reference Manual, Functions Reference, and CmdStan Manual. You can browse them all on the web or download pdfs from the Stan documentation web site.

Most of the topics grew up opportunistically from either things the team was interested in, new features we added to Stan, or just trying to cover the basics of the chapters of Bayesian Data Analysis and Data Analysis Using Regression and Multilevel/Hierarchical Models.

Recent additions

There have been some recent additions, including some from new faces:

Franziska Henrich: drift-diffusion Wiener models for reaction-time modeling
Abner Heredia Bustos: multiple imputation
Brynjólfur Gauti Guðrúnar Jónsson: copulas
Charles Margossian, Steve Bronder, Aki Vehtari: embedded Laplace approximation
Me: survival models

There are also a metric ton of minor clarifications, fixes, and model examples, such as Mitzi Morris’s additional sufficient statistics optimizations. To see them, you can simply look at all the documentation pull requests.

The future is you

You can see from Franziska’s pull request, Abner’s PR, and Brynjólfur’s PR that we provide a lot of guidance and feedback. I found it super useful to write a lot of the documentation when I was learning statistics because I got feedback from Andrew Gelman, Ben Goodrich, Aki Vehtari, and Michael Betancourt, among others.

Some topics that would be nice to add

There is a lot to add to our existing discussions and you can see a lot of that in the pull request list on our documentation repository. There are also many topics that we don’t cover or don’t cover well. Here are some off the top of my head, ordered (at least near the top) by my assessment of how impactful they would be to add.

causal inference using Rubin’s potential outcomes framework
Thompson sampling for reinforcement learning/bandits (sorry, Andrew!)
penalized complexity priors and the Besag-York-Mollié II spatial model (working from Mitzi Morris’s case study)
Extreme value models
RNA-seq and DNA-seq composition models at the read level and k-mer level
Econometric models (I don’t even know what these are, but there are books)
Stationarity-constraining parameterizations of VAR models a la Sarah Heaps
Neural networks
A/B testing (ideally sequential)
spatio-temporal models along the lines of Leon Held’s survey
non-trivial examples of ODE models like Lotka-Volterra (my case study), SEIR (Elizaveta Semenova et al.’s case study) or even better, pharmacokinetic/pharmacodynamic (multiple case studies around Torsten), soil carbon (my very early case study, but there’s been much better work since)
Hilbert-space approximations to Gaussian processes (GPs)
PDE approximations to GPs
Fourier space analyses of GPs
N-gram language models
Plackett-Luce model for contests with more than two players (this can follow my case study for StanCon)
…

Some of those may already be mentioned in our doc, so sorry if you’ve already written something. Also, feel free to suggest more topics in the comments.

The Bayesian Workflow book is coming!

Posted on April 16, 2026 9:58 AM by Andrew

We’re very excited about this book. It’s the result of several years of effort. You can pre-order from the publisher or from Amazon.