Archive of posts filed under the Statistical computing category.

Fitting big multilevel regressions in Stan?

Joe Hoover writes: I am a social psychology PhD student, and I have some questions about applying MrP to estimation problems involving very large datasets or many sub-national units. I use MrP to obtain sub-national estimates for low-level geographic units (e.g. counties) derived from large data (e.g. 300k-1 million+). In addition to being large, my […]

The opposite of “black box” is not “white box,” it’s . . .

In disagreement with X, I think the opposite of black box should be clear box, not white box. A black box is called a black box because you can’t see inside of it; really it would be better to call it an opaque box. But in any case the opposite is clear box.

Beautiful paper on HMMs and derivatives

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the […]
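For readers who have not seen the forward algorithm written out, here is a minimal sketch in plain Python rather than Stan. It computes the log likelihood of an HMM by marginalizing over the hidden states on the log scale. The function names and the argument layout (`log_init`, `log_trans`, `log_emit`) are assumptions made for illustration and are not taken from any Stan program; this is the computation that autodiff would have to differentiate through, not the analytic-derivative version discussed in the post.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm: log p(observations), summing over hidden states.

    log_init[k]     : log probability of starting in state k
    log_trans[j][k] : log probability of moving from state j to state k
    log_emit[t][k]  : log density of observation t given hidden state k
    (All three names are hypothetical, chosen for this sketch.)
    """
    n_states = len(log_init)
    # alpha[k] holds log p(y_1..y_t, z_t = k) for the current time step t.
    alpha = [log_init[k] + log_emit[0][k] for k in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(n_states)])
            + log_emit[t][k]
            for k in range(n_states)
        ]
    # Marginalize over the final hidden state.
    return log_sum_exp(alpha)
```

The cost is proportional to the number of time steps times the square of the number of states, and every intermediate `alpha` has to be kept around if a reverse-mode autodiff system is to differentiate through the loop, which is where the memory overhead comes from.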

The default prior for logistic regression coefficients in Scikit-learn

Someone pointed me to this post by W. D., reporting that, in Python’s popular Scikit-learn package, the default prior for logistic regression coefficients is normal(0,1)—or, as W. D. puts it, L2 penalization with a lambda of 1. In the post, W. D. makes three arguments. I agree with two of them. 1. I agree with […]
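The correspondence between an L2 penalty and a normal prior can be checked numerically. The sketch below assumes the penalty is written as 0.5 * lam * sum(b^2), so that lam = 1 lines up with a normal(0,1) prior and, more generally, the prior scale is 1/sqrt(lam). The function names and this parametrization are illustrative assumptions, not code from the post or from scikit-learn.

```python
import math


def l2_penalty(coefs, lam=1.0):
    """L2 penalty, assuming the 0.5 * lam * sum(b^2) convention."""
    return 0.5 * lam * sum(b * b for b in coefs)


def neg_log_normal_prior(coefs, sigma=1.0):
    """Negative log density of independent normal(0, sigma) priors."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(0.5 * (b / sigma) ** 2 + const for b in coefs)


def penalty_prior_gap(coefs, lam=1.0):
    """Difference between the two; it should not depend on the coefficients."""
    sigma = 1.0 / math.sqrt(lam)
    return neg_log_normal_prior(coefs, sigma) - l2_penalty(coefs, lam)
```

If the gap is the same for any two coefficient vectors of equal length, the penalty and the negative log prior differ only by a constant, so they give the same penalized estimate (the posterior mode).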

Econometrics postdoc and computational statistics postdoc openings here in the Stan group at Columbia

Andrew and I are looking to hire two postdocs to join the Stan group at Columbia starting January 2020. I want to emphasize that these are postdoc positions, not programmer positions. So while each position has a practical focus, our broader goal is to carry out high-impact, practical research that pushes the frontier of what’s […]

Zombie semantics spread in the hope of keeping most on the same low road you are comfortable with now: Delaying the hardship of learning better methodology.

This post is by Keith O’Rourke and as with all posts and comments on this blog, is just a deliberation on dealing with uncertainties in scientific inquiry and should not be attributed to any entity other than the author. As with any critically-thinking inquirer, the views behind these deliberations are always subject to rethinking […]

Many Ways to Lasso

Jared writes: I gave a talk at the Washington DC Conference about six different tools for fitting lasso models. You’ll be happy to see that rstanarm outperformed the rest of the methods. That’s about 60 more slides than I would’ve used. But it’s good to see all that code.

A heart full of hatred: 8 schools edition

No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it for me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]

Dan’s Paper Corner: Yes! It does work!

Only share my research With sick lab rats like me Trapped behind the beakers And the Erlenmeyer flasks Cut off from the world, I may not ever get free But I may One day Trying to find An antidote for strychnine — The Mountain Goats Hi everyone! Hope you’re enjoying Peak Libra Season! I’m bringing […]

Stan contract jobs!

Sean writes: We are starting to get money and time to manage paid contracting jobs to try to get a handle on some of our technical debt. Any or all of the skills could be valuable: C++ software engineering; C++ build tools, compilers, and toolchains; creating installers or packages of any kind (especially cross-platform); Windows […]

Jeff Leek: “Data science education as an economic and public health intervention – how statisticians can lead change in the world”

Jeff Leek from Johns Hopkins University is speaking in our statistics department seminar next week: Data science education as an economic and public health intervention – how statisticians can lead change in the world Time: 4:10pm Monday, October 7 Location: 903 School of Social Work Abstract: The data science revolution has led to massive new […]

“Troubling Trends in Machine Learning Scholarship”

Gaurav Sood writes: You had expressed slight frustration with some ML/CS papers that read more like advertisements than anything else. The attached paper by Zachary Lipton and Jacob Steinhardt flags four reasonable concerns in modern ML papers: Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on […]

Convergence diagnostics for Markov chain simulation

Pierre Jacob writes regarding convergence diagnostics for Markov chain simulation: I’ve implemented an example of TV upper bounds for (vanilla) HMC on a model written in Stan, see here and here for a self-contained R script. Basically, this creates a stan fit object to obtain a target’s pdf and gradient, and then implements a pure […]

Bayesian Computation conference in January 2020

X writes to remind us of the Bayesian computation conference: – BayesComp 2020 occurs on 7-10 January 2020 in Gainesville, Florida, USA – Registration is open with regular rates till October 14, 2019 – Deadline for submission of poster proposals is December 15, 2019 – Deadline for travel support applications is September 20, 2019 – […]

Hey, look! The R graph gallery is back.

We’ve recommended the R graph gallery before, but then it got taken down. But now it’s back! I wouldn’t use it on its own as a teaching tool, in that it has a lot of graphs that I would not recommend (see here), but it’s a great resource, so thanks so much to Yan Holtz […]

Causal inference workshop at NeurIPS 2019 looking for submissions

Nathan Kallus writes: I wanted to share an announcement for a causal inference workshop we are organizing at NeurIPS 2019. I think the readers of your blog would be very interested, and we would be eager to have them interact/attend/submit. And here it is: The NeurIPS 2019 Workshop on “Do the right thing”: machine learning […]

Read this: it’s about importance sampling!

Importance sampling plays an odd role in statistical computing. It’s an old-fashioned idea and can behave just horribly if applied straight-up—but it keeps arising in different statistics problems. Aki came up with Pareto-smoothed importance sampling (PSIS) for leave-one-out cross-validation. We recently revised the PSIS article and Dan Simpson wrote a useful blog post about it […]
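To see what "behaving horribly" can look like, here is a minimal sketch of plain self-normalized importance sampling, without the Pareto smoothing step. It estimates an expectation under a normal(0,1) target using draws from a normal(0, proposal_sigma) proposal and reports the usual effective sample size. The target, the proposal, and all function names are assumptions chosen for illustration; this is not the PSIS algorithm from the article.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def importance_estimate(f, n_draws, proposal_sigma, seed=0):
    """Self-normalized importance sampling estimate of E[f(x)], x ~ normal(0,1).

    Draws come from normal(0, proposal_sigma). Returns the estimate and the
    effective sample size (sum w)^2 / sum(w^2).
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, proposal_sigma) for _ in range(n_draws)]
    log_w = [
        normal_logpdf(x, 0.0, 1.0) - normal_logpdf(x, 0.0, proposal_sigma)
        for x in draws
    ]
    # Subtract the maximum before exponentiating to avoid overflow;
    # the constant cancels in the self-normalized ratio.
    max_lw = max(log_w)
    w = [math.exp(lw - max_lw) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * f(x) for wi, x in zip(w, draws)) / total
    ess = total * total / sum(wi * wi for wi in w)
    return estimate, ess
```

With a proposal wider than the target (say `proposal_sigma=2.0`) the weights are bounded and the estimate is stable. With a proposal narrower than the target (say `proposal_sigma=0.5`) the weights grow without bound in the tails, a few draws dominate, and the effective sample size drops; that is the situation smoothing the largest weights is meant to help with.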

All I need is time, a moment that is mine, while I’m in between

You’re an ordinary boy and that’s the way I like it – Magic Dirt Look. I’ll say something now, so it’s off my chest. I hate order statistics. I loathe them. I detest them. I wish them nothing but ill and strife. They are just awful. And I’ve spent the last god only knows how long […]

How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language: The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in […]

AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]