(this post is by Charles)
My colleagues Matt Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari, Andrew Gelman, and I released a preprint titled “Nested R-hat: assessing the convergence of Markov chain Monte Carlo when running many short chains”. This is a revision of an earlier preprint. Here’s the abstract:
The growing availability of hardware accelerators such as GPUs has generated interest in Markov chain Monte Carlo (MCMC) workflows which run a large number of chains in parallel. Each chain still needs to forget its initial state, but the subsequent sampling phase can be almost arbitrarily short. To determine if the resulting short chains are reliable, we need to assess how close the Markov chains are to convergence to their stationary distribution. The R-hat statistic is a battle-tested convergence diagnostic, but unfortunately it can require long chains to work well. We present a nested design to overcome this challenge, and introduce tuning parameters to control the reliability, bias, and variance of convergence diagnostics.
The paper is motivated by the possibility of running many Markov chains in parallel on modern hardware, such as GPUs. Increasing the number of chains reduces the variance of your Monte Carlo estimator (that's what the sampling phase is for) but not the bias (that's what the warmup phase is for); that's the short story. So you can trade the length of the sampling phase for the number of chains, but you still need to achieve approximate convergence.
There’s more to be said about the many-short-chains regime, but what I want to focus on here is what we’ve learned about the more classic R-hat. The first step is to rewrite the condition R-hat < 1.01 as a tolerance on the variance of the per-chain Monte Carlo estimator. Intuitively, we’re running a stochastic algorithm to estimate an expectation value, which is a non-random quantity. Hence, different chains should, despite their different initializations and seeds, still come to an “agreement”. This agreement is measured by the variance of the estimator produced by each chain.
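To make the connection concrete, here is a minimal sketch of the classic R-hat computation in Python with NumPy. This is a toy version for illustration: production implementations (e.g. in Stan or ArviZ) also split chains in half and rank-normalize, which I omit here.

```python
import numpy as np

def rhat(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws).

    Compares the between-chain variance of the per-chain estimators
    (the chain means) to the within-chain variance.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()    # W: average within-chain variance
    between = n * chain_means.var(ddof=1)         # B: n * variance of chain means
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(0)
# Four chains already sampling the target: R-hat should sit near 1.
print(rhat(rng.normal(size=(4, 1000))))
```

When the chains disagree (e.g. their means are offset), B blows up relative to W and R-hat rises well above 1; when they agree, R-hat falls toward 1, which is exactly the "tolerance on the variance of the per-chain estimator" reading above.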
Now here’s the paradox. The expected squared error of a per-chain Monte Carlo estimator decomposes into a squared bias and a variance. When diagnosing convergence, we’re really interested in making sure the bias has decayed sufficiently (a common phrase is “has become negligible”, but I find it useful to think of MCMC as a biased algorithm). But with R-hat, we’re really monitoring the variance, not the bias! So how can this be a useful diagnostic?
This paradox occurred to us when we rewrote R-hat to monitor the variance of Monte Carlo estimators constructed using groups of chains, or superchains, rather than a single chain. The resulting nested R-hat decays to 1 provided we have enough chains, even if the individual chains are short (think a single iteration). But here’s the issue: regardless of whether the chains are close to convergence or not, nested R-hat can be made arbitrarily close to 1 by increasing the size of each superchain and thereby decreasing the variance of their Monte Carlo estimators. Which goes back to my earlier point: you cannot monitor bias simply by looking at variance.
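As an illustration of the nested design, here is a toy sketch that groups chains into superchains and compares between- to within-superchain variability. The preprint's actual estimator differs in its details, so treat this as a cartoon of the idea, not the paper's definition.

```python
import numpy as np

def nested_rhat(chains, n_superchains):
    """Cartoon of a nested R-hat for chains of shape (n_chains, n_draws).

    Chains are grouped into superchains; the diagnostic compares the
    variance of the superchain estimators (between) to the spread of
    draws inside a superchain (within).
    """
    groups = chains.reshape(n_superchains, -1)   # pool each superchain's draws
    super_means = groups.mean(axis=1)            # one estimator per superchain
    between = super_means.var(ddof=1)            # variance across superchains
    within = groups.var(axis=1, ddof=1).mean()   # spread within a superchain
    return np.sqrt(1 + between / within)

rng = np.random.default_rng(1)
# 512 chains of a single draw each, grouped into 8 superchains of 64:
# averaging 64 chains shrinks the between-superchain variance, so the
# diagnostic lands near 1 even though each chain is one iteration long.
print(nested_rhat(rng.normal(size=(512, 1)), 8))
```

This also exposes the catch discussed above: the `between` term shrinks like 1 over the superchain size, so with big enough superchains the statistic approaches 1 whether or not the chains have converged.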
Or can you?
Here’s the twist: we now force all the chains within a superchain to start at the same point. I had this idea initially to deal with multimodal distributions. The chains within a group are no longer independent, though eventually they will (hopefully) forget about each other. In the meantime, we have artificially increased the variance. Doing a standard variance decomposition:
total variance = variance of the conditional expectation + expected conditional variance

or, symbolically, Var(estimator) = Var(E[estimator | init]) + E[Var(estimator | init)].
Here we’re conditioning on the initial point. If the expected value of each chain no longer depends on the initialization, then the first term — variance of the conditional expectation — goes to 0. This is a measurement of “how well the chains forget their starting point”, and we call it the violation of stationarity. It is indifferent to the number of chains. The second term, on the other hand, persists even if your chains are stationary, but it decays to 0 as you increase the number of chains per superchain. More generally, this persistent variance can be linked to the Effective Sample Size.
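To see the decomposition in action, here is a toy Monte Carlo check (not from the preprint): an AR(1) kernel targeting a standard normal, with all chains in a superchain sharing one overdispersed initial point. For AR(1) the conditional expectation is available in closed form, so we can compute both terms directly.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_super, n_chains, n_warm = 0.9, 2000, 16, 20

# AR(1) kernel x' = rho * x + sqrt(1 - rho^2) * z targets N(0, 1).
# Every chain within a superchain starts at the same overdispersed point.
x0 = rng.normal(0.0, 3.0, size=n_super)
x = np.repeat(x0[:, None], n_chains, axis=1)
for _ in range(n_warm):
    x = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=x.shape)

# One post-warmup draw per chain; the superchain estimator averages them.
est = x.mean(axis=1)
cond_mean = rho**n_warm * x0          # E[estimator | x0], exact for AR(1)

total = est.var()
violation = cond_mean.var()           # variance of the conditional expectation
persistent = (est - cond_mean).var()  # ~ expected conditional variance
print(total, violation + persistent)  # law of total variance: roughly equal
```

Lengthening the warmup shrinks `violation` (the chains forget `x0`), while adding chains per superchain shrinks `persistent`; only the former tells you anything about forgetting the starting point.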
We argue that nested R-hat is a (scaled) measure of the violation of stationarity, biased by the persistent variance. How does this link to the squared bias? Well, both the bias and the violation decay as we warm up our chains, so one can be used as a “proxy clock” for the other. I don’t have a fully general theory for this, but if you consider a Gaussian target and are willing to solve an SDE, you can show that the violation and the squared bias decay at the same rate. This also gives us insight into how over-dispersed the initializations should (or should not) be for nested R-hat to be reliable.
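A toy AR(1) calculation (a discrete stand-in for the Gaussian/SDE analysis, not taken from the preprint) shows the proxy-clock idea in closed form. With the kernel x' = rho * x + sqrt(1 - rho^2) * z targeting N(0, 1) and initialization x0 ~ N(mu0, sigma0^2), we have E[x_T | x0] = rho^T x0, so both the squared bias and the violation of stationarity decay at the same geometric rate rho^(2T):

```python
import numpy as np

rho, sigma0, mu0 = 0.95, 2.0, 3.0   # AR(1) autocorrelation; init ~ N(mu0, sigma0^2)
T = np.arange(0, 60, 5)             # warmup lengths

# E[x_T | x0] = rho^T x0, and the target mean is 0, hence:
sq_bias = rho**(2 * T) * mu0**2         # squared bias of E[x_T]
violation = rho**(2 * T) * sigma0**2    # Var over x0 of E[x_T | x0]

print(sq_bias / violation)              # constant ratio (mu0 / sigma0)^2
```

The ratio is constant in T, so in this toy model the violation is an exact clock for the squared bias; it also shows the role of the initial over-dispersion, since a tiny sigma0 makes the clock hard to read even though the bias is still there.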
Now, nested R-hat is a generalization of R-hat, meaning our analysis carries over! Moreover, we have a theory of what R-hat measures which does not assume stationarity. Part of the conceptual leap is to do an asymptotic analysis which considers an infinite number of finite (non-stationary) chains, rather than a single infinitely long (and hence stationary) chain.
Moving forward, I hope this idea of a proxy clock will help us identify cases where R-hat and its nested version are reliable, and how we might revise our MCMC workflows to get more reliable diagnostics. Two examples are discussed in the preprint: the choice of the initial variance and how to split a fixed total number of chains into superchains.