Skip to content
Archive of posts filed under the Statistical computing category.

A heart full of hatred: 8 schools edition

No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it for me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]

Dan’s Paper Corner: Yes! It does work!

Only share my research With sick lab rats like me Trapped behind the beakers And the Erlenmeyer flasks Cut off from the world, I may not ever get free But I may One day Trying to find An antidote for strychnine — The Mountain Goats Hi everyone! Hope you’re enjoying Peak Libra Season! I’m bringing […]

Stan contract jobs!

Sean writes: We are starting to get money and time to manage paid contracting jobs to try to get a handle on some of our technical debt. Any or all of the skills could be valuable: C++ software engineering C++ build tools, compilers, and toolchains Creating installers or packages of any kind (especially cross-platform) Windows […]

Jeff Leek: “Data science education as an economic and public health intervention – how statisticians can lead change in the world”

Jeff Leek from Johns Hopkins University is speaking in our statistics department seminar next week: Data science education as an economic and public health intervention – how statisticians can lead change in the world Time: 4:10pm Monday, October 7 Location: 903 School of Social Work Abstract: The data science revolution has led to massive new […]

“Troubling Trends in Machine Learning Scholarship”

Garuav Sood writes: You had expressed slight frustration with some ML/CS papers that read more like advertisements than anything else. The attached paper by Zachary Lipton and Jacob Steinhardt flags four reasonable concerns in modern ML papers: Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on […]

Convergence diagnostics for Markov chain simulation

Pierre Jacob writes regarding convergence diagnostics for Markov chain simulation: I’ve implemented an example of TV upper bounds for (vanilla) HMC on a model written in Stan, see here and here for a self-contained R script. Basically, this creates a stan fit object to obtain a target’s pdf and gradient, and then implements a pure […]

Bayesian Computation conference in January 2020

X writes to remind us of the Bayesian computation conference: – BayesComp 2020 occurs on 7-10 January 2020 in Gainesville, Florida, USA – Registration is open with regular rates till October 14, 2019 – Deadline for submission of poster proposals is December 15, 2019 – Deadline for travel support applications is September 20, 2019 – […]

Hey, look! The R graph gallery is back.

We’ve recommended the R graph gallery before, but then it got taken down. But now it’s back! I wouldn’t use it on its own as a teaching tool, in that it has a lot of graphs that I would not recommend (see here), but it’s a great resource, so thanks so much to Yan Holtz […]

Causal inference workshop at NeurIPS 2019 looking for submissions

Nathan Kallus writes: I wanted to share an announcement for a causal inference workshop we are organizing at NeurIPS 2019. I think the readers of your blog would be very interested, and we would be eager to have them interact/attend/submit. And here it is: The NeurIPS 2019 Workshop on “Do the right thing”: machine learning […]

Read this: it’s about importance sampling!

Importance sampling plays an odd role in statistical computing. It’s an old-fashioned idea and can behave just horribly if applied straight-up—but it keeps arising in different statistics problems. Aki came up with Pareto-smoothed importance sampling (PSIS) for leave-one-out cross-validation. We recently revised the PSIS article and Dan Simpson wrote a useful blog post about it […]

All I need is time, a moment that is mine, while I’m in between

You’re an ordinary boy and that’s the way I like it – Magic Dirt Look. I’ll say something now, so it’s off my chest. I hate order statisics. I loathe them. I detest them. I wish them nothing but ill and strife. They are just awful. And I’ve spent the last god only knows how long […]

How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language: The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in […]

AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]

How to simulate an instrumental variables problem?

Edward Hearn writes: In an effort to buttress my own understanding of multi-level methods, especially pertaining to those involving instrumental variables, I have been working the examples and the exercises in Jennifer Hill’s and your book. I can find general answers at the Github repo for ARM examples, but for Chapter 10, Exercise 3 (simulating […]

New! from Bales/Pourzanjani/Vehtari/Petzold: Selecting the Metric in Hamiltonian Monte Carlo

Ben Bales, Arya Pourzanjani, Aki Vehtari, and Linda Petzold write: We present a selection criterion for the Euclidean metric adapted during warmup in a Hamiltonian Monte Carlo sampler that makes it possible for a sampler to automatically pick the metric based on the model and the availability of warmup draws. Additionally, we present a new […]

Neural nets vs. regression models

Eliot Johnson writes: I have a question concerning papers comparing two broad domains of modeling: neural nets and statistical models. Both terms are catch-alls, within each of which there are, quite obviously, multiple subdomains. For instance, NNs could include ML, DL, AI, and so on. While statistical models should include panel data, time series, hierarchical […]

Maintenance cost is quadratic in the number of features

Bob Carpenter shares this story illustrating the challenges of software maintenance. Here’s Bob: This started with the maintenance of upgrading to the new Boost version 1.69, which is this pull request: for this issue: The issue happens first, then the pull request, then the fun of debugging starts. Today’s story starts an issue […]

Stan examples in Harezlak, Ruppert and Wand (2018) Semiparametric Regression with R

I saw earlier drafts of this when it was in preparation and they were great. Jarek Harezlak, David Ruppert and Matt P. Wand. 2018. Semiparametric Regression with R. UseR! Series. Springer. I particularly like the careful evaluation of variational approaches. I also very much like that it’s packed with visualizations and largely based on worked […]

Several post-doc positions in probabilistic programming etc. in Finland

There are several open post-doc positions in Aalto and University of Helsinki in 1. probabilistic programming, 2. simulator-based inference, 3. data-efficient deep learning, 4. privacy preserving and secure methods, 5. interactive AI. All these research programs are connected and collaborating. I (Aki) am the coordinator for the project 1 and contributor in the others. Overall […]

“Sometimes all we have left are pictures and fear”: Dan Simpson talk in Columbia stat dept, 4pm Monday

4:10pm Monday, April 22 in Social Work Bldg room 903: Data is getting weirder. Statistical models and techniques are more complex than they have ever been. No one understand what code does. But at the same time, statistical tools are being used by a wider range of people than at any time in the past. […]