Suppose I want to estimate my chances of winning the lottery by buying a ticket every day. That is, I want to do a pure Monte Carlo estimate of my probability of winning. How long will it take before I have an estimate that’s within 10% of the true value? It’ll take… There’s a big […]

## Abuse of expectation notation

I’ve been reading a lot of statistical and computational literature and it seems like expectation notation is absued as shorthand for integrals by decorating the expectation symbol with a subscripted distribution like so: This is super confusing, because expectations are properly defined as functions of random variables. For example, the square bracket convention arises because […]

## Rao-Blackwellization and discrete parameters in Stan

I’m reading a really dense and beautifully written survey of Monte Carlo gradient estimation for machine learning by Shakir Mohamed, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. There are great explanations of everything including variance reduction techniques like coupling, control variates, and Rao-Blackwellization. The latter’s the topic of today’s post, as it relates directly to […]

## Royal Society spam & more

Just a rant about spam (and more spam) from pay-to-publish and closed-access journals. Nothing much to see here. The latest offender is from something called the “Royal Society.” I don’t even know to which king or queen this particular society owes allegiance, because they have a .org URL. Exercising their royal prerogative, they created an […]

## Beautiful paper on HMMs and derivatives

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the […]

## Macbook Pro (16″ 2019) quick review

I just upgraded yesterday to one of the new 2019 Macbook Pro 16″ models: Macbook Pro (16″, 2019), 3072 x 1920 pixel display, 2.4 GHz 8-core i9, 64GB 2667 MHz DDR4 memory, 2880 x 1800 pixel display, AMD Radeon Pro 5500M GPU with 4GB of GDDR6 memory, 1 TB solid-state drive US$4120 list including Apple […]

## Field goal kicking—like putting in 3D with oblong balls

Putting Andrew Gelman (the author of most posts on this blog, but not this one), recently published a Stan case study on golf putting [link fixed] that uses a bit of geometry to build a regression-type model based on angles and force. Field-goal kicking In American football, there’s also a play called a “field goal.” […]

## Econometrics postdoc and computational statistics postdoc openings here in the Stan group at Columbia

Andrew and I are looking to hire two postdocs to join the Stan group at Columbia starting January 2020. I want to emphasize that these are postdoc positions, not programmer positions. So while each position has a practical focus, our broader goal is to carry out high-impact, practical research that pushes the frontier of what’s […]

## Non-randomly missing data is hard, or why weights won’t solve your survey problems and you need to think generatively

Throw this onto the big pile of stats problems that are a lot more subtle than they seem at first glance. This all started when Lauren pointed me at the post Another way to see why mixed models in survey data are hard on Thomas Lumley’s blog. Part of the problem is all the jargon […]

## All the names for hierarchical and multilevel modeling

The title Data Analysis Using Regression and Multilevel/Hierarchical Models hints at the problem, which is that there are a lot of names for models with hierarchical structure. Ways of saying “hierarchical model” hierarchical model a multilevel model with a single nested hierarchy (note my nod to Quine’s “Two Dogmas” with circular references) multilevel model a […]

## Calibration and sharpness?

I really liked this paper, and am curious what other people think before I base a grant application around applying Stan to this problem in a machine-learning context. Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268. Gneiting […]

## Seeking postdoc (or contractor) for next generation Stan language research and development

The Stan group at Columbia is looking to hire a postdoc* to work on the next generation compiler for the Stan open-source probabilistic programming language. Ideally, a candidate will bring language development experience and also have research interests in a related field such as programming languages, applied statistics, numerical analysis, or statistical computation. The language […]

## Why does my academic lab keep growing?

Andrew, Breck, and I are struggling with the Stan group funding at Columbia just like most small groups in academia. The short story is that to apply for enough grants to give us a decent chance of making payroll in the following year, we have to apply for so many that our expected amount of […]

## Software release strategies

Scheduled release strategy Stan’s moved to a scheduled release strategy where we’ll simply release whatever we have every three months. The Stan 2.20 release just went out last week. So you can expect Stan 2.21 in three months. Our core releases include the math library, the language compiler, and CmdStan. That requires us to keep […]

## AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]

## Peter Ellis on Forecasting Antipodal Elections with Stan

I liked this intro to Peter Ellis from Rob J. Hyndman’s talk announcement: He [Peter Ellis] started forecasting elections in New Zealand as a way to learn how to use Stan, and the hobby has stuck with him since he moved back to Australia in late 2018. You may remember Peter from my previous post […]

## Stan examples in Harezlak, Ruppert and Wand (2018) *Semiparametric Regression with R*

I saw earlier drafts of this when it was in preparation and they were great. Jarek Harezlak, David Ruppert and Matt P. Wand. 2018. Semiparametric Regression with R. UseR! Series. Springer. I particularly like the careful evaluation of variational approaches. I also very much like that it’s packed with visualizations and largely based on worked […]

## StanCon 2019: 20–23 August, Cambridge, UK

It’s official. This year’s StanCon is in Cambridge. For details, see StanCon 2019 Home Page What can you expect? There will be two days of tutorials at all levels and two days of invited and submitted talks. The previous three StanCons (NYC 2017, Asilomar 2018, Helsinki 2018) were wonderful experiences for both their content and […]

## Ben Lambert. 2018. *A Student’s Guide to Bayesian Statistics.*

Ben Goodrich, in a Stan forums survey of Stan video lectures, points us to the following book, which introduces Bayes, HMC, and Stan: Ben Lambert. 2018. A Student’s Guide to Bayesian Statistics. SAGE Publications. If Ben Goodrich is recommending it, it’s bound to be good. Amazon reviewers seem to really like it, too. You may […]

## (Markov chain) Monte Carlo doesn’t “explore the posterior”

[Edit: (1) There’s nothing dependent on Markov chain—the argument applies to any Monte Carlo method in high dimensions. (2) No, (MC)MC is not not broken.] First some background, then the bad news, and finally the good news. Spoiler alert: The bad news is that exploring the posterior is intractable; the good news is that we […]