Healthier kids: Using Stan to get more information out of pediatric respiratory data

Robert Mahar, John Carlin, Sarath Ranganathan, Anne-Louise Ponsonby, Peter Vuillermin, and Damjan Vukcevic write:

Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1,197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index (LCI), the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables LCI to be estimated using tests with different degrees of completeness, something not possible with the standard approach.

They continue:

Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision.

Yesssss! This reminds me of our work on serial dilution assays, where we squeezed information out of data that had traditionally been declared “below detection limit.”

Mahar, Carlin, et al. continue:

Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions.

Wow—all my favorite things! And check this out:

Keywords: lung clearance index, multiple-breath washout, variance components, Stan, incomplete data.

That’s right. Stan.

There’s only one thing that bugs me. From their Stan program:

alpha ~ normal(0, 10000);

Ummmmm . . . no.
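For what it's worth, something weakly informative would serve better here. Just as a sketch (the right scale depends on how alpha is parameterized in their model, which I haven't checked), if alpha is expected to be of order one, then

alpha ~ normal(0, 2.5);

keeps the prior from putting most of its mass on values in the thousands while still being only gently informative.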

But basically I love this paper. It makes me so happy to think that the research my colleagues and I have been doing for the past thirty years is making a difference.

Bob also points out this R package, “breathteststan: Stan-Based Fit to Gastric Emptying Curves,” from Dieter Menne et al.

There’s so much great stuff out there. And this is what Stan’s all about: enabling people to construct good models, spending less time figuring out how to fit the damn things and more time on model building, model checking, and design of data collection. Onward!

7 thoughts on “Healthier kids: Using Stan to get more information out of pediatric respiratory data”

  1. I wanted to say here that the arrival of Stan has completely changed the way we do research in my lab, and has changed the kind of questions we ask when modeling language processing. It has even changed the way I teach frequentist statistics (!).

    So thank you for creating this truly amazing environment and community. Having people like Bob, Mitzi, Ben, Michael (and now also Lauren, very soon) visiting my lab over the years has also been a very important and educational experience for us. We would never have had any contact with these amazing people but for Stan.

  2. It was interesting to see the paper estimate a prior distribution from the data. When I hear people talk about choosing a prior, it often sounds to me like they are just making things up. So it is nice to see a case where there is a rigorous basis for the prior.

    But, I’m pretty naive when it comes to Bayesian analysis, so dazzling me is not all that impressive an achievement.

    • That’s exactly how I feel when I see a p-value and the conclusion that goes with it: They’re making things up.

      We are trained not to think of p-value-based reasoning as making things up, but often that’s just what it is.

      I suggest reading the book Uncertain Judgements to take the edge off of your impression that people are making things up when they define priors.

    • > estimate a prior distribution from the data

      This is the standard in hierarchical modeling. It can be done with full Bayes, as we almost always do in Stan, or it can be done with so-called “empirical Bayes”, where you take a point estimate of the hierarchical parameters based on a marginalized model.

      Priors are no more arbitrary than likelihoods—it’s all just part of the joint model. What we’re usually interested in is posterior predictive inference, which we typically evaluate with posterior predictive checks and cross-validation.
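      To make that concrete, here’s a minimal full-Bayes sketch (my own toy example, not the model from the paper), in which the hyperparameters mu and tau define the prior for the group effects theta and are themselves estimated from the data:

      data {
        int<lower=1> J;            // number of groups
        vector[J] y;               // one noisy measurement per group
        vector<lower=0>[J] sigma;  // known measurement error
      }
      parameters {
        real mu;                   // hyperparameter: location of group effects
        real<lower=0> tau;         // hyperparameter: scale of group effects
        vector[J] theta;           // group-level effects
      }
      model {
        mu ~ normal(0, 5);         // weakly informative hyperpriors
        tau ~ normal(0, 5);        // half-normal, via the lower bound
        theta ~ normal(mu, tau);   // the "prior estimated from the data"
        y ~ normal(theta, sigma);  // likelihood
      }

      Empirical Bayes would instead fix mu and tau at point estimates obtained from the marginal likelihood with theta integrated out.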

      • When I saw Terry’s reference to “estimate a prior distribution from the data” I checked the paper to see what they were doing. I got the impression that they estimate the prior from a subset of the data that doesn’t enter into the calculation of the likelihood. I’m not 100% sure, but I sleep better thinking that’s what they do. Empirical Bayes, on the other hand, will send you straight to Bayesian hell.

  3. I really like this paper, too. I saw it presented by Robert Mahar in John Carlin and Damjan Vukcevic’s research group meeting.

    I had to convince myself with simulations that these exponential decay mixtures can be fit from data. They turn out to be surprisingly robust, given that the mixture isn’t evident from plotting the data. Coincidentally, Andrew was using a mixture of exponential decays as his “hello world” model for Stan at the time, so I conveniently had the basics of the model precompiled when I saw the talk.
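    For anyone curious, here’s a toy two-component version along those lines (my own sketch, not the model from the paper): the mean is a sum of two exponential decays, with the decay rates ordered to keep the components identifiable.

    data {
      int<lower=1> N;
      vector[N] t;                 // measurement times
      vector[N] y;                 // observed signal
    }
    parameters {
      vector<lower=0>[2] a;        // component amplitudes
      positive_ordered[2] lambda;  // decay rates, ordered for identifiability
      real<lower=0> sigma;         // measurement noise
    }
    model {
      a ~ normal(0, 1);            // weakly informative priors (assumes roughly unit-scale data)
      lambda ~ normal(0, 1);
      sigma ~ normal(0, 1);
      y ~ normal(a[1] * exp(-lambda[1] * t) + a[2] * exp(-lambda[2] * t), sigma);
    }

    Simulating data from known parameters and checking that the fit recovers them is a good way to convince yourself the mixture is actually identifiable.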
