Archive of posts filed under the Statistical computing category.

The default prior for logistic regression coefficients in Scikit-learn

Someone pointed me to this post by W. D., reporting that, in Python’s popular Scikit-learn package, the default prior for logistic regression coefficients is normal(0,1)—or, as W. D. puts it, L2 penalization with a lambda of 1. In the post, W. D. makes three arguments. I agree with two of them. 1. I agree with […]
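
A quick way to see what W. D. is describing (a sketch of my own, not from the post; the toy data below is made up):

```python
# Minimal sketch: scikit-learn's LogisticRegression applies an L2 penalty
# by default, with C = 1.0 (C is the *inverse* regularization strength,
# so C = 1.0 corresponds to lambda = 1, i.e., roughly a normal(0, 1) prior
# on the coefficients).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100) > 0).astype(int)

default = LogisticRegression()       # penalty='l2', C=1.0 by default
weak = LogisticRegression(C=1e6)     # effectively unpenalized, for comparison
print(default.fit(X, y).coef_)       # shrunk toward zero
print(weak.fit(X, y).coef_)          # larger in magnitude
```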

Econometrics postdoc and computational statistics postdoc openings here in the Stan group at Columbia

Andrew and I are looking to hire two postdocs to join the Stan group at Columbia starting January 2020. I want to emphasize that these are postdoc positions, not programmer positions. So while each position has a practical focus, our broader goal is to carry out high-impact, practical research that pushes the frontier of what’s […]

Zombie semantics spread in the hope of keeping most on the same low road you are comfortable with now: Delaying the hardship of learning better methodology.

Now, everything is connected, but this is not primarily about persistent research misconceptions such as statistical significance. Instead it is about (inherently) interpretable ML versus (misleading with some nonzero frequency) explanatory ML, which I blogged about just over a year ago. That was when I first became aware of work by Cynthia Rudin (Duke) […]

Many Ways to Lasso

Jared writes: I gave a talk at the Washington DC Conference about six different tools for fitting lasso models. You’ll be happy to see that rstanarm outperformed the rest of the methods. That’s about 60 more slides than I would’ve used. But it’s good to see all that code.
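
The slides themselves aren't reproduced here, but for orientation, here's what a basic lasso fit looks like in one of the many available tools (scikit-learn in Python shown here; rstanarm, the tool Jared favored, is an R package; the toy data is my own):

```python
# A minimal lasso fit in one of the many available tools (scikit-learn here;
# the talk compared six, including rstanarm in R). alpha is the L1 penalty.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
beta = np.array([3.0, -2.0] + [0.0] * 8)   # sparse true coefficients
y = X @ beta + rng.normal(size=200)

fit = Lasso(alpha=0.1).fit(X, y)
print(fit.coef_)   # most of the truly-zero coefficients are shrunk exactly to 0
```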

A heart full of hatred: 8 schools edition

No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it from me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]

Dan’s Paper Corner: Yes! It does work!

Only share my research With sick lab rats like me Trapped behind the beakers And the Erlenmeyer flasks Cut off from the world, I may not ever get free But I may One day Trying to find An antidote for strychnine — The Mountain Goats Hi everyone! Hope you’re enjoying Peak Libra Season! I’m bringing […]

Stan contract jobs!

Sean writes: We are starting to get money and time to manage paid contracting jobs to try to get a handle on some of our technical debt. Any or all of the skills could be valuable:
– C++ software engineering
– C++ build tools, compilers, and toolchains
– Creating installers or packages of any kind (especially cross-platform)
– Windows […]

Jeff Leek: “Data science education as an economic and public health intervention – how statisticians can lead change in the world”

Jeff Leek from Johns Hopkins University is speaking in our statistics department seminar next week:
Data science education as an economic and public health intervention – how statisticians can lead change in the world
Time: 4:10pm Monday, October 7
Location: 903 School of Social Work
Abstract: The data science revolution has led to massive new […]

“Troubling Trends in Machine Learning Scholarship”

Gaurav Sood writes: You had expressed slight frustration with some ML/CS papers that read more like advertisements than anything else. The attached paper by Zachary Lipton and Jacob Steinhardt flags four reasonable concerns in modern ML papers: Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on […]

Convergence diagnostics for Markov chain simulation

Pierre Jacob writes regarding convergence diagnostics for Markov chain simulation: I’ve implemented an example of total variation (TV) upper bounds for (vanilla) HMC on a model written in Stan; see here and here for a self-contained R script. Basically, this creates a Stan fit object to obtain a target’s pdf and gradient, and then implements a pure […]
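
Pierre's construction is more elaborate than what fits in an excerpt, but the core building block of such TV bounds is a coupling. Here's a toy sketch of my own (not Pierre's code) of a maximal coupling of two densities, which makes two draws equal with probability exactly 1 − TV:

```python
# Toy illustration (not Pierre's code): a maximal coupling of densities
# p and q. The pair (X, Y) has X ~ p, Y ~ q, and P(X = Y) = 1 - TV(p, q),
# the kind of building block used to get TV bounds for coupled MCMC chains.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, q = stats.norm(0, 1), stats.norm(1, 1)

def maximal_coupling():
    x = p.rvs(random_state=rng)
    if rng.uniform(0, p.pdf(x)) <= q.pdf(x):
        return x, x                        # the two draws coincide
    while True:                            # otherwise sample Y from the rest of q
        y = q.rvs(random_state=rng)
        if rng.uniform(0, q.pdf(y)) > p.pdf(y):
            return x, y

pairs = [maximal_coupling() for _ in range(10000)]
meet = np.mean([x == y for x, y in pairs])
print(meet)   # should be close to 1 - TV(p, q), about 0.617 here
```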

Bayesian Computation conference in January 2020

X writes to remind us of the Bayesian computation conference:
– BayesComp 2020 occurs on 7–10 January 2020 in Gainesville, Florida, USA
– Registration is open with regular rates till October 14, 2019
– Deadline for submission of poster proposals is December 15, 2019
– Deadline for travel support applications is September 20, 2019
– […]

Hey, look! The R graph gallery is back.

We’ve recommended the R graph gallery before, but then it got taken down. But now it’s back! I wouldn’t use it on its own as a teaching tool, in that it has a lot of graphs that I would not recommend (see here), but it’s a great resource, so thanks so much to Yan Holtz […]

Causal inference workshop at NeurIPS 2019 looking for submissions

Nathan Kallus writes: I wanted to share an announcement for a causal inference workshop we are organizing at NeurIPS 2019. I think the readers of your blog would be very interested, and we would be eager to have them interact/attend/submit. And here it is: The NeurIPS 2019 Workshop on “Do the right thing”: machine learning […]

Read this: it’s about importance sampling!

Importance sampling plays an odd role in statistical computing. It’s an old-fashioned idea and can behave just horribly if applied straight-up—but it keeps arising in different statistics problems. Aki came up with Pareto-smoothed importance sampling (PSIS) for leave-one-out cross-validation. We recently revised the PSIS article and Dan Simpson wrote a useful blog post about it […]
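
For readers meeting the idea for the first time, here is plain self-normalized importance sampling (a toy sketch of my own; PSIS additionally stabilizes the largest weights with a generalized Pareto fit, which is not shown here):

```python
# Toy sketch of self-normalized importance sampling: estimate E_p[h(X)]
# using draws from a proposal q. PSIS would additionally smooth the largest
# weights with a generalized Pareto fit; this is the plain, unstabilized version.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
target = stats.norm(0, 1)      # p: what we want expectations under
proposal = stats.norm(0, 2)    # q: what we can draw from (wider is safer)

x = proposal.rvs(size=10000, random_state=rng)
w = target.pdf(x) / proposal.pdf(x)       # importance weights
est = np.sum(w * x**2) / np.sum(w)        # E_p[X^2], true value 1
print(est)
```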

All I need is time, a moment that is mine, while I’m in between

You’re an ordinary boy and that’s the way I like it – Magic Dirt Look. I’ll say something now, so it’s off my chest. I hate order statistics. I loathe them. I detest them. I wish them nothing but ill and strife. They are just awful. And I’ve spent the last god only knows how long […]
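
For context, an aside of my own rather than anything from the post: the objects in question have a tidy distributional form, since the k-th order statistic of n iid uniforms is Beta(k, n + 1 − k), which is easy to check by simulation:

```python
# My own aside, not from the post: the k-th order statistic of n iid
# Uniform(0,1) draws is distributed Beta(k, n + 1 - k). Quick check:
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100000, n)), axis=1)[:, k - 1]
print(samples.mean(), stats.beta(k, n + 1 - k).mean())  # both near k/(n+1) = 0.273
```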

How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language: The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in […]
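
The reading list covers the real thing; as a toy illustration of the idea only (Stan's actual autodiff is reverse mode in C++, not the forward mode sketched here), dual numbers show how derivatives can be propagated alongside values:

```python
# Toy illustration only: forward-mode autodiff with dual numbers. Stan's
# actual autodiff is reverse mode in C++ (see the reading list above); this
# just shows the core idea of propagating derivatives alongside values.
import math

class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)  # chain rule

# d/dx [x * sin(x) + x] at x = 2.0
x = Dual(2.0, 1.0)   # seed the input's derivative with 1
y = x * sin(x) + x
print(y.der)         # sin(2) + 2*cos(2) + 1, about 1.077
```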

AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP, an EMNLP workshop, November 3–4, 2019, in Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]

How to simulate an instrumental variables problem?

Edward Hearn writes: In an effort to buttress my own understanding of multilevel methods, especially those involving instrumental variables, I have been working through the examples and exercises in Jennifer Hill’s and your book. I can find general answers at the Github repo for ARM examples, but for Chapter 10, Exercise 3 (simulating […]
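
Not the ARM repo's solution, but a sketch of my own showing the usual ingredients of such a simulation: an instrument that moves the treatment, an unobserved confounder, and a comparison of OLS against two-stage least squares:

```python
# My own sketch, not the ARM repo's solution. Simulate an IV problem:
# z is the instrument, x the endogenous treatment, u an unobserved
# confounder of x and y. OLS of y on x is biased; 2SLS recovers beta.
import numpy as np

rng = np.random.default_rng(5)
n, beta = 10000, 2.0

u = rng.normal(size=n)                   # unobserved confounder
z = rng.normal(size=n)                   # instrument: affects y only through x
x = 0.8 * z + u + rng.normal(size=n)     # treatment: driven by z and u
y = beta * x + u + rng.normal(size=n)    # outcome: confounded through u

ols = np.sum(x * y) / np.sum(x * x)             # biased upward by the confounding
xhat = z * (np.sum(z * x) / np.sum(z * z))      # first stage: project x on z
tsls = np.sum(xhat * y) / np.sum(xhat * xhat)   # second stage
print(ols, tsls)    # ols is off; tsls should be close to 2.0
```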

New! from Bales/Pourzanjani/Vehtari/Petzold: Selecting the Metric in Hamiltonian Monte Carlo

Ben Bales, Arya Pourzanjani, Aki Vehtari, and Linda Petzold write: We present a selection criterion for the Euclidean metric adapted during warmup in a Hamiltonian Monte Carlo sampler that makes it possible for a sampler to automatically pick the metric based on the model and the availability of warmup draws. Additionally, we present a new […]
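
Their selection criterion is more involved than an excerpt can carry, but the standard warmup adaptation it builds on is easy to sketch (my own illustration, not the authors' code): set the inverse diagonal metric to the sample variances of the warmup draws.

```python
# Orientation only, not the authors' code: the standard warmup heuristic
# their criterion builds on sets the inverse Euclidean metric to the sample
# variances of warmup draws, so momenta are scaled to the posterior's scales.
import numpy as np

def diag_metric_from_warmup(warmup_draws):
    """warmup_draws: array of shape (n_draws, n_params)."""
    inv_metric = np.var(warmup_draws, axis=0, ddof=1)  # diagonal inverse metric
    return 1.0 / inv_metric                            # diagonal mass matrix

# e.g., a posterior with very different scales per coordinate:
rng = np.random.default_rng(6)
draws = rng.normal(scale=[0.1, 1.0, 10.0], size=(500, 3))
print(diag_metric_from_warmup(draws))  # roughly [100, 1, 0.01]
```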

Neural nets vs. regression models

Eliot Johnson writes: I have a question concerning papers comparing two broad domains of modeling: neural nets and statistical models. Both terms are catch-alls, within each of which there are, quite obviously, multiple subdomains. For instance, NNs could include ML, DL, AI, and so on. While statistical models should include panel data, time series, hierarchical […]