Archive of posts filed under the Statistical computing category.

Rao-Blackwellization and discrete parameters in Stan

I’m reading a really dense and beautifully written survey of Monte Carlo gradient estimation for machine learning by Shakir Mohamed, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. There are great explanations of everything including variance reduction techniques like coupling, control variates, and Rao-Blackwellization. The latter’s the topic of today’s post, as it relates directly to […]

Correctness

A computer program can be completely correct, it can be correct except in some edge cases, it can be approximately correct, or it can be flat-out wrong. A statistical model can be kind of ok but a little wrong, or it can be a lot wrong. Except in some rare cases, it can’t be correct. […]

Exciting postdoc opening in spatial statistics at Michigan: Coccidioides is coming, and only you can stop it!

Jon Zelner is a collaborator who does great work on epidemiology using Bayesian methods, Stan, Mister P, etc. He’s hiring a postdoc, and it looks like a great opportunity: Epidemiological, ecological and environmental approaches to understand and predict Coccidioides emergence in California. One postdoctoral fellow is sought in the research group of Dr. Jon Zelner […]

How to “cut” using Stan, if you must

Frederic Bois writes: We had talked at some point about cutting inference in Stan (that is, for example, calibrating PK parameters in a PK/PD [pharmacokinetic/pharmacodynamic] model with PK data, then calibrating the PD parameters, with fixed, non updated, distributions for the PK parameters). Has that been implemented? (PK is pharmacokinetic and PD is pharmacodynamic.) I […]

Fitting big multilevel regressions in Stan?

Joe Hoover writes: I am a social psychology PhD student, and I have some questions about applying MrP to estimation problems involving very large datasets or many sub-national units. I use MrP to obtain sub-national estimates for low-level geographic units (e.g. counties) derived from large data (e.g. 300k-1 million+). In addition to being large, my […]

The opposite of “black box” is not “white box,” it’s . . .

In disagreement with X, I think the opposite of black box should be clear box, not white box. A black box is called a black box because you can’t see inside of it; really it would be better to call it an opaque box. But in any case the opposite is clear box.

Beautiful paper on HMMs and derivatives

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the […]
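For readers who have not seen it, the forward algorithm mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not Stan code and not the implementation discussed in the post; the function names and argument layout (`log_init`, `log_trans`, `log_emit`) are assumptions made up for this example. It works on the log scale and sums out the hidden states one time step at a time.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm: log p(observations), marginalizing hidden states.

    Hypothetical argument layout, chosen for this sketch:
      log_init[k]     : log probability of starting in state k
      log_trans[j][k] : log probability of moving from state j to state k
      log_emit[t][k]  : log density of observation t given state k
    """
    num_states = len(log_init)
    # alpha[k] = log p(y_1, ..., y_t, z_t = k), starting at t = 1
    alpha = [log_init[k] + log_emit[0][k] for k in range(num_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(num_states)])
            + log_emit[t][k]
            for k in range(num_states)
        ]
    # Sum over the final hidden state to get the marginal likelihood.
    return log_sum_exp(alpha)
```

The cost is proportional to the number of time steps times the square of the number of states. Differentiating through this recursion with reverse-mode autodiff stores every intermediate `alpha`, which is presumably the memory overhead that analytic derivatives would reduce.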

The default prior for logistic regression coefficients in Scikit-learn

Someone pointed me to this post by W. D., reporting that, in Python’s popular Scikit-learn package, the default prior for logistic regression coefficients is normal(0,1)—or, as W. D. puts it, L2 penalization with a lambda of 1. In the post, W. D. makes three arguments. I agree with two of them. 1. I agree with […]

Econometrics postdoc and computational statistics postdoc openings here in the Stan group at Columbia

Andrew and I are looking to hire two postdocs to join the Stan group at Columbia starting January 2020. I want to emphasize that these are postdoc positions, not programmer positions. So while each position has a practical focus, our broader goal is to carry out high-impact, practical research that pushes the frontier of what’s […]

Zombie semantics spread in the hope of keeping most on the same low road you are comfortable with now: Delaying the hardship of learning better methodology.

This post is by Keith O’Rourke and as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking […]

Many Ways to Lasso

Jared writes: I gave a talk at the Washington DC Conference about six different tools for fitting lasso models. You’ll be happy to see that rstanarm outperformed the rest of the methods. That’s about 60 more slides than I would’ve used. But it’s good to see all that code.

A heart full of hatred: 8 schools edition

No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it for me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]

Dan’s Paper Corner: Yes! It does work!

Only share my research With sick lab rats like me Trapped behind the beakers And the Erlenmeyer flasks Cut off from the world, I may not ever get free But I may One day Trying to find An antidote for strychnine — The Mountain Goats Hi everyone! Hope you’re enjoying Peak Libra Season! I’m bringing […]

Stan contract jobs!

Sean writes: We are starting to get money and time to manage paid contracting jobs to try to get a handle on some of our technical debt. Any or all of the skills could be valuable: C++ software engineering C++ build tools, compilers, and toolchains Creating installers or packages of any kind (especially cross-platform) Windows […]

Jeff Leek: “Data science education as an economic and public health intervention – how statisticians can lead change in the world”

Jeff Leek from Johns Hopkins University is speaking in our statistics department seminar next week: Data science education as an economic and public health intervention – how statisticians can lead change in the world Time: 4:10pm Monday, October 7 Location: 903 School of Social Work Abstract: The data science revolution has led to massive new […]

“Troubling Trends in Machine Learning Scholarship”

Gaurav Sood writes: You had expressed slight frustration with some ML/CS papers that read more like advertisements than anything else. The attached paper by Zachary Lipton and Jacob Steinhardt flags four reasonable concerns in modern ML papers: Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on […]

Convergence diagnostics for Markov chain simulation

Pierre Jacob writes regarding convergence diagnostics for Markov chain simulation: I’ve implemented an example of TV upper bounds for (vanilla) HMC on a model written in Stan, see here and here for a self-contained R script. Basically, this creates a stan fit object to obtain a target’s pdf and gradient, and then implements a pure […]

Bayesian Computation conference in January 2020

X writes to remind us of the Bayesian computation conference: – BayesComp 2020 occurs on 7-10 January 2020 in Gainesville, Florida, USA – Registration is open with regular rates till October 14, 2019 – Deadline for submission of poster proposals is December 15, 2019 – Deadline for travel support applications is September 20, 2019 – […]

Hey, look! The R graph gallery is back.

We’ve recommended the R graph gallery before, but then it got taken down. But now it’s back! I wouldn’t use it on its own as a teaching tool, in that it has a lot of graphs that I would not recommend (see here), but it’s a great resource, so thanks so much to Yan Holtz […]

Causal inference workshop at NeurIPS 2019 looking for submissions

Nathan Kallus writes: I wanted to share an announcement for a causal inference workshop we are organizing at NeurIPS 2019. I think the readers of your blog would be very interested, and we would be eager to have them interact/attend/submit. And here it is: The NeurIPS 2019 Workshop on “Do the right thing”: machine learning […]