You’re an ordinary boy and that’s the way I like it – Magic Dirt Look. I’ll say something now, so it’s off my chest. I hate order statistics. I loathe them. I detest them. I wish them nothing but ill and strife. They are just awful. And I’ve spent the last god only knows how long […]

**Statistical computing** category.

## How does Stan work? A reading list.

Bob writes, to someone who is doing work on the Stan language: The basic execution structure of Stan is in the JSS paper (by Bob Carpenter, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell) and in the reference manual. The details of autodiff are in […]

## AnnoNLP conference on data coding for natural language processing

This workshop should be really interesting: Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong. Silviu Paun and Dirk Hovy are co-organizing it. They’re very organized and know this area as well as anyone. I’m on the program committee, but won’t be able to attend. I really like the problem […]

## How to simulate an instrumental variables problem?

Edward Hearn writes: In an effort to buttress my own understanding of multi-level methods, especially pertaining to those involving instrumental variables, I have been working the examples and the exercises in Jennifer Hill’s and your book. I can find general answers at the Github repo for ARM examples, but for Chapter 10, Exercise 3 (simulating […]
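As a rough illustration of what simulating an instrumental variables problem can look like, here is a minimal Python sketch. It is not the book's exercise or its solution: the data-generating process, the coefficient values, and the function names `simulate_iv` and `cov` are all assumptions made up for this example. The idea is to generate an instrument `z` that affects the treatment `x` but not the outcome `y` directly, add an unobserved confounder `u`, and then compare the naive regression slope with the simple ratio (Wald) instrumental variables estimate.

```python
import random


def simulate_iv(n=20000, effect=2.0, seed=1):
    """Simulate (z, x, y) where u confounds x and y (all values hypothetical)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument: moves x, has no direct path to y
        ui = rng.gauss(0, 1)  # unobserved confounder of x and y
        xi = 1.0 * zi + ui + rng.gauss(0, 1)
        yi = effect * xi + 3.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


z, x, y = simulate_iv()

# Naive slope of y on x: biased here because u drives both x and y.
ols = cov(x, y) / cov(x, x)

# Ratio (Wald) IV estimate: effect of z on y divided by effect of z on x.
iv = cov(z, y) / cov(z, x)

print(f"true effect = 2.0, naive slope = {ols:.2f}, IV estimate = {iv:.2f}")
```

Because the true effect is known in a simulation, you can check directly that the naive slope is pulled away from it by the confounder while the IV estimate lands close to it.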

## New! from Bales/Pourzanjani/Vehtari/Petzold: Selecting the Metric in Hamiltonian Monte Carlo

Ben Bales, Arya Pourzanjani, Aki Vehtari, and Linda Petzold write: We present a selection criterion for the Euclidean metric adapted during warmup in a Hamiltonian Monte Carlo sampler that makes it possible for a sampler to automatically pick the metric based on the model and the availability of warmup draws. Additionally, we present a new […]

## Neural nets vs. regression models

Eliot Johnson writes: I have a question concerning papers comparing two broad domains of modeling: neural nets and statistical models. Both terms are catch-alls, within each of which there are, quite obviously, multiple subdomains. For instance, NNs could include ML, DL, AI, and so on. While statistical models should include panel data, time series, hierarchical […]

## Maintenance cost is quadratic in the number of features

Bob Carpenter shares this story illustrating the challenges of software maintenance. Here’s Bob: This started with the maintenance of upgrading to the new Boost version 1.69, which is this pull request: https://github.com/stan-dev/math/pull/1082 for this issue: https://github.com/stan-dev/math/issues/1081 The issue happens first, then the pull request, then the fun of debugging starts. Today’s story starts an issue […]

## Stan examples in Harezlak, Ruppert and Wand (2018) *Semiparametric Regression with R*

I saw earlier drafts of this when it was in preparation and they were great. Jarek Harezlak, David Ruppert and Matt P. Wand. 2018. Semiparametric Regression with R. UseR! Series. Springer. I particularly like the careful evaluation of variational approaches. I also very much like that it’s packed with visualizations and largely based on worked […]

## Several post-doc positions in probabilistic programming etc. in Finland

There are several open post-doc positions at Aalto and the University of Helsinki in 1. probabilistic programming, 2. simulator-based inference, 3. data-efficient deep learning, 4. privacy preserving and secure methods, 5. interactive AI. All these research programs are connected and collaborating. I (Aki) am the coordinator for project 1 and a contributor to the others. Overall […]

## “Sometimes all we have left are pictures and fear”: Dan Simpson talk in Columbia stat dept, 4pm Monday

4:10pm Monday, April 22 in Social Work Bldg room 903: Data is getting weirder. Statistical models and techniques are more complex than they have ever been. No one understands what code does. But at the same time, statistical tools are being used by a wider range of people than at any time in the past. […]

## What is the most important real-world data processing tip you’d like to share with others?

This question was in today’s jitts for our communication class. Here are some responses: Invest the time to learn data manipulation tools well (e.g. tidyverse). Increased familiarity with these tools often leads to greater time savings and less frustration in future. Hmm it’s never one tip.. I never ever found it useful to begin writing […]

## (Markov chain) Monte Carlo doesn’t “explore the posterior”

[Edit: (1) There’s nothing dependent on Markov chain—the argument applies to any Monte Carlo method in high dimensions. (2) No, (MC)MC is not not broken.] First some background, then the bad news, and finally the good news. Spoiler alert: The bad news is that exploring the posterior is intractable; the good news is that we […]

## Yes, I really really really like fake-data simulation, and I can’t stop talking about it.

Rajesh Venkatachalapathy writes: Recently, I had a conversation with a colleague of mine about the virtues of synthetic data and their role in data analysis. I think I’ve heard a sermon/talk or two where you mention this and also in your blog entries. But having convinced my colleague of this point, I am struggling to […]

## My two talks in Montreal this Friday, 22 Mar

McGill University Biostatistics seminar, Education Building, 3700 McTavish Street, Room 129 [note new location; previously listed as Purvis Hall, 102 Pine Ave. West, Room 25], 1-2pm Fri 22 Mar: Resolving the Replication Crisis Using Multilevel Modeling. In recent years we have come to learn that many prominent studies in social science and medicine, conducted at leading research institutions, […]

## Maybe it’s time to let the old ways die; or We broke R-hat so now we have to fix it.

“Otto eye-balled the diva lying comatose amongst the reeds, and he suddenly felt the fire of inspiration flood his soul. He ran back to his workshop where he futzed and futzed and futzed.” –Bette Midler Andrew was annoyed. Well, annoyed is probably too strong a word. Maybe a better way to start is with The […]

## stanc3: rewriting the Stan compiler

I’d like to introduce the stanc3 project, a complete rewrite of the Stan 2 compiler in OCaml. Join us! With this rewrite and migration to OCaml, there’s a great opportunity to join us on the ground floor of a new era. Your enthusiasm for or expertise in programming language theory and compiler development can help […]

## HMC step size: How does it scale with dimension?

A bunch of us were arguing about how the Hamiltonian Monte Carlo step size should scale with dimension, and so Bob did the Bob thing and just ran an experiment on the computer to figure it out. Bob writes: This is for standard normal independent in all dimensions. Note the log scale on the x […]

## R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram […]

## Book reading at Ann Arbor Meetup on Monday night: *Probability and Statistics: a simulation-based introduction*

The Talk I’m going to be previewing the book I’m in the process of writing at the Ann Arbor R meetup on Monday. Here are the details, including the working title: Probability and Statistics: a simulation-based introduction Bob Carpenter Monday, February 18, 2019 Ann Arbor SPARK, 330 East Liberty St, Ann Arbor I’ve been to […]

## Should he go to grad school in statistics or computer science?

Someone named Nathan writes: I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several […]