Later this week, I’m going to be GM-ing my first session of Blades in the Dark, a role-playing game designed by John Harper. We’ve already assembled a crew of scoundrels in Session 0 and set the first score. Unlike most of the other games I’ve run, I’ve never played Blades in the Dark; I’ve only […]

## Drunk-under-the-lamppost testing

Edit: Glancing over this again, it struck me that the title may be interpreted as being mean. Sorry about that. It wasn’t my intent. I was trying to be constructive and I really like that analogy. The original post is mostly reasonable other than on this one point that I thought was important to call […]

## Super-duper online matrix derivative calculator vs. the matrix normal (for Stan)

I’m implementing the matrix normal distribution for Stan, which provides a multivariate density for a matrix with covariance factored into row and column covariances. The motivation: A new colleague of mine at Flatiron’s Center for Comp Bio, Jamie Morton, is using the matrix normal to model the ocean biome. A few years ago, folks in […]

## Make Andrew happy with one simple ggplot trick

By default, ggplot expands the space above and below the x-axis (and to the left and right of the y-axis). Andrew has made it pretty clear that he thinks the x-axis should be drawn at y = 0. To remove the extra space around the axes when you have continuous (not discrete or log […]

## Prior predictive, posterior predictive, and cross-validation as graphical models

I just wrote up a bunch of chapters for the Stan user’s guide on prior predictive checks, posterior predictive checks, cross-validation, decision analysis, poststratification (with the obligatory multilevel regression up front), and even bootstrap (which has a surprisingly elegant formulation in Stan now that we have RNGs in transformed data). Andrew then urged me to […]

## Naming conventions for variables, functions, etc.

The golden rule of code layout is that code should be written to be readable. And that means readable by others, including you in the future. Three principles of naming follow: 1. Names should mean something. 2. Names should be as short as possible. 3. Use your judgement to balance (1) and (2). The third […]

## Is data science a discipline?

Jeannette Wing, director of the Columbia Data Science Institute, sent along this link to this featured story (their phrase) on their web site. Is data science a discipline? Data science is a field of study: one can get a degree in data science, get a job as a data scientist, and get funded to do […]

## What can we do with complex numbers in Stan?

I’m wrapping up support for complex number types in the Stan math library. Now I’m wondering what we can do with complex numbers in statistical models. Functions operating in the complex domain. The initial plan is to add some matrix functions that use complex numbers internally: fast Fourier transforms, asymmetric eigendecomposition, and Schur decomposition. The eigendecomposition […]

## A normalizing flow by any other name

Another week, another nice survey paper from Google. This time: Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S. and Lakshminarayanan, B., 2019. Normalizing Flows for Probabilistic Modeling and Inference. arXiv 1912.02762. What’s a normalizing flow? A normalizing flow is a change of variables. Just like you learned way back in calculus and linear algebra. Normalizing […]

## Monte Carlo and winning the lottery

Suppose I want to estimate my chances of winning the lottery by buying a ticket every day. That is, I want to do a pure Monte Carlo estimate of my probability of winning. How long will it take before I have an estimate that’s within 10% of the true value? It’ll take… There’s a big […]

## Abuse of expectation notation

I’ve been reading a lot of statistical and computational literature and it seems like expectation notation is abused as shorthand for integrals by decorating the expectation symbol with a subscripted distribution like so: This is super confusing, because expectations are properly defined as functions of random variables. For example, the square bracket convention arises because […]

## Rao-Blackwellization and discrete parameters in Stan

I’m reading a really dense and beautifully written survey of Monte Carlo gradient estimation for machine learning by Shakir Mohamed, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. There are great explanations of everything including variance reduction techniques like coupling, control variates, and Rao-Blackwellization. The latter’s the topic of today’s post, as it relates directly to […]

## Royal Society spam & more

Just a rant about spam (and more spam) from pay-to-publish and closed-access journals. Nothing much to see here. The latest offender is from something called the “Royal Society.” I don’t even know to which king or queen this particular society owes allegiance, because they have a .org URL. Exercising their royal prerogative, they created an […]

## Beautiful paper on HMMs and derivatives

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the […]

## Macbook Pro (16″ 2019) quick review

I just upgraded yesterday to one of the new 2019 Macbook Pro 16″ models: Macbook Pro (16″, 2019), 3072 x 1920 pixel display, 2.4 GHz 8-core i9, 64GB 2667 MHz DDR4 memory, AMD Radeon Pro 5500M GPU with 4GB of GDDR6 memory, 1 TB solid-state drive. US$4120 list including Apple […]

## Field goal kicking—like putting in 3D with oblong balls

Putting: Andrew Gelman (the author of most posts on this blog, but not this one) recently published a Stan case study on golf putting that uses a bit of geometry to build a regression-type model based on angles and force. Field-goal kicking: In American football, there’s also a play called a “field goal.” […]

## Econometrics postdoc and computational statistics postdoc openings here in the Stan group at Columbia

Andrew and I are looking to hire two postdocs to join the Stan group at Columbia starting January 2020. I want to emphasize that these are postdoc positions, not programmer positions. So while each position has a practical focus, our broader goal is to carry out high-impact, practical research that pushes the frontier of what’s […]

## Non-randomly missing data is hard, or why weights won’t solve your survey problems and you need to think generatively

Throw this onto the big pile of stats problems that are a lot more subtle than they seem at first glance. This all started when Lauren pointed me at the post Another way to see why mixed models in survey data are hard on Thomas Lumley’s blog. Part of the problem is all the jargon […]

## All the names for hierarchical and multilevel modeling

The title Data Analysis Using Regression and Multilevel/Hierarchical Models hints at the problem, which is that there are a lot of names for models with hierarchical structure. Ways of saying “hierarchical model”: hierarchical model, a multilevel model with a single nested hierarchy (note my nod to Quine’s “Two Dogmas” with circular references); multilevel model, a […]

## Calibration and sharpness?

I really liked this paper, and am curious what other people think before I base a grant application around applying Stan to this problem in a machine-learning context. Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268. Gneiting […]