Archive of posts filed under the Statistical computing category.

“Sometimes all we have left are pictures and fear”: Dan Simpson talk in Columbia stat dept, 4pm Monday

4:10pm Monday, April 22 in Social Work Bldg room 903: Data is getting weirder. Statistical models and techniques are more complex than they have ever been. No one understands what code does. But at the same time, statistical tools are being used by a wider range of people than at any time in the past. […]

What is the most important real-world data processing tip you’d like to share with others?

This question was in today’s jitts for our communication class. Here are some responses: Invest the time to learn data manipulation tools well (e.g. tidyverse). Increased familiarity with these tools often leads to greater time savings and less frustration in the future. Hmm, it’s never one tip . . . I never ever found it useful to begin writing […]
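
As a minimal illustration of the kind of tidyverse-style data manipulation the first response has in mind, here is a sketch of a dplyr pipeline; the data frame and column names are invented purely for the example.

    # A small dplyr pipeline: filter, derive a column, and summarize by group.
    # The data frame `d` and its columns are made up for this illustration.
    library(dplyr)

    d <- data.frame(
      group = rep(c("a", "b"), each = 5),
      x     = rnorm(10),
      y     = rnorm(10)
    )

    out <- d %>%
      filter(!is.na(y)) %>%
      mutate(ratio = y / (abs(x) + 1)) %>%
      group_by(group) %>%
      summarise(mean_ratio = mean(ratio), n = n())
    print(out)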

(Markov chain) Monte Carlo doesn’t “explore the posterior”

[Edit: (1) There’s nothing dependent on Markov chain—the argument applies to any Monte Carlo method in high dimensions. (2) No, (MC)MC is not broken.] First some background, then the bad news, and finally the good news. Spoiler alert: The bad news is that exploring the posterior is intractable; the good news is that we […]
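
The high-dimensional intuition behind the bad news can be checked with a few lines of simulation (my own sketch, not from the post): independent draws from a d-dimensional standard normal concentrate on a thin shell of radius roughly sqrt(d), far from the mode, which is why “exploring” the whole posterior is hopeless.

    # Distance from the mode for draws from a d-dimensional standard normal.
    # The draws pile up near radius sqrt(d); essentially none sit near the mode.
    set.seed(1)
    for (d in c(1L, 10L, 100L, 1000L)) {
      r <- sqrt(rowSums(matrix(rnorm(1000 * d), ncol = d)^2))
      cat(sprintf("d = %4d  mean radius = %6.1f  (sqrt(d) = %5.1f)\n",
                  d, mean(r), sqrt(d)))
    }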

Yes, I really really really like fake-data simulation, and I can’t stop talking about it.

Rajesh Venkatachalapathy writes: Recently, I had a conversation with a colleague of mine about the virtues of synthetic data and their role in data analysis. I think I’ve heard a sermon/talk or two where you mention this and also in your blog entries. But having convinced my colleague of this point, I am struggling to […]
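
In the same spirit, the simplest version of a fake-data check takes only a few lines: simulate data from known parameter values, fit the model, and confirm that the fit recovers those values. The linear-regression setup below is a toy example of mine, not from the post.

    # Fake-data check for a simple linear regression: simulate with known
    # coefficients, refit, and compare the estimates to the truth.
    set.seed(123)
    n <- 200
    a_true <- 1.5; b_true <- -0.7; sigma_true <- 2
    x <- runif(n, 0, 10)
    y <- rnorm(n, a_true + b_true * x, sigma_true)
    fit <- lm(y ~ x)
    print(coef(fit))   # should be close to (1.5, -0.7)
    print(sigma(fit))  # should be close to 2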

My two talks in Montreal this Friday, 22 Mar

McGill University Biostatistics seminar, Education Building, 3700 McTavish Street, Room 129 [note new location; previously listed as Purvis Hall, 102 Pine Ave. West, Room 25], 1-2pm Fri 22 Mar: Resolving the Replication Crisis Using Multilevel Modeling. In recent years we have come to learn that many prominent studies in social science and medicine, conducted at leading research institutions, […]

Maybe it’s time to let the old ways die; or We broke R-hat so now we have to fix it.

“Otto eye-balled the diva lying comatose amongst the reeds, and he suddenly felt the fire of inspiration flood his soul. He ran back to his workshop where he futzed and futzed and futzed.” –Bette Midler Andrew was annoyed. Well, annoyed is probably too strong a word. Maybe a better way to start is with The […]

stanc3: rewriting the Stan compiler

I’d like to introduce the stanc3 project, a complete rewrite of the Stan 2 compiler in OCaml. Join us! With this rewrite and migration to OCaml, there’s a great opportunity to join us on the ground floor of a new era. Your enthusiasm for or expertise in programming language theory and compiler development can help […]

HMC step size: How does it scale with dimension?

A bunch of us were arguing about how the Hamiltonian Monte Carlo step size should scale with dimension, and so Bob did the Bob thing and just ran an experiment on the computer to figure it out. Bob writes: This is for standard normal independent in all dimensions. Note the log scale on the x […]
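
For readers who want to try something like Bob’s experiment themselves, here is a rough sketch (my own, not Bob’s code): one HMC proposal on an iid standard normal target, with the average Metropolis acceptance tracked as the dimension grows. The standard result is that the step size must shrink like d^(-1/4) to hold the acceptance rate constant.

    # Average Metropolis acceptance of one HMC (leapfrog) proposal on an
    # iid standard normal target; the gradient of the potential is just q.
    hmc_accept <- function(d, eps, L = 10, n_rep = 500) {
      mean(replicate(n_rep, {
        q <- rnorm(d); p <- rnorm(d)        # start at an exact draw from the target
        H0 <- sum(q^2) / 2 + sum(p^2) / 2
        p <- p - eps / 2 * q                # leapfrog half step
        for (l in seq_len(L)) {
          q <- q + eps * p
          if (l < L) p <- p - eps * q
        }
        p <- p - eps / 2 * q
        H1 <- sum(q^2) / 2 + sum(p^2) / 2
        min(1, exp(H0 - H1))                # acceptance probability
      }))
    }

    # Acceptance stays roughly constant when eps is scaled as d^(-1/4).
    set.seed(2)
    for (d in c(16L, 256L, 1024L)) {
      cat(sprintf("d = %5d  accept = %.2f\n", d, hmc_accept(d, eps = d^(-0.25))))
    }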

R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram […]
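
To see the difference concretely, a comparison along these lines (my own sketch) puts the current defaults next to a deliberately undersmoothed histogram:

    # Compare R's default binning with a deliberately narrower bin width.
    set.seed(1)
    x <- rnorm(1000)
    par(mfrow = c(1, 2))
    hist(x, main = "hist() defaults")
    hist(x, breaks = 50, main = "breaks = 50 (undersmoothed)")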

Book reading at Ann Arbor Meetup on Monday night: Probability and Statistics: a simulation-based introduction

The Talk: I’m going to be previewing the book I’m in the process of writing at the Ann Arbor R meetup on Monday. Here are the details, including the working title: Probability and Statistics: a simulation-based introduction. Bob Carpenter. Monday, February 18, 2019. Ann Arbor SPARK, 330 East Liberty St, Ann Arbor. I’ve been to […]

Should he go to grad school in statistics or computer science?

Someone named Nathan writes: I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several […]

Autodiff! (for the C++ jockeys in the audience)

Here’s a cool thread from Bob Carpenter on the Stan forums, “A new continuation-based autodiff by refactoring,” on the inner workings of Stan. Enjoy. P.S. Just for laffs, see this post from 2010 which is the earliest mention of autodiff I could find on the blog.

Google on Responsible AI Practices

Great and beautifully written advice for any data science setting: Google. Responsible AI Practices. Enjoy.

NYC Meetup Thursday: Under the hood: Stan’s library, language, and algorithms

I (Bob, not Andrew!) will be doing a meetup talk this coming Thursday in New York City. Here’s the link with registration and location and time details (summary: pizza unboxing at 6:30 pm in SoHo): Bayesian Data Analysis Meetup: Under the hood: Stan’s library, language, and algorithms After summarizing what Stan does, this talk will […]

Reproducibility and Stan

Aki prepared these slides which cover a series of topics, starting with notebooks, open code, and reproducibility of code in R and Stan; then simulation-based calibration of algorithms; then model averaging and prediction. Lots to think about here: there are many aspects to reproducible analysis and computation in statistics.
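
Simulation-based calibration, one of the topics in the slides, is easy to sketch in a toy conjugate model where the posterior can be sampled exactly (this example is mine, not from Aki’s slides): if the posterior sampler is correct, the rank of the prior-drawn “true” parameter among the posterior draws is uniform across simulations.

    # Toy simulation-based calibration check for a conjugate normal model:
    # theta ~ N(0, 1), y_1..y_n | theta ~ N(theta, 1).
    set.seed(42)
    n <- 10; n_sims <- 1000; n_draws <- 99
    ranks <- replicate(n_sims, {
      theta <- rnorm(1)                 # draw the "true" value from the prior
      y <- rnorm(n, theta, 1)           # simulate data given theta
      post_mean <- sum(y) / (n + 1)     # exact conjugate posterior
      post_sd   <- sqrt(1 / (n + 1))
      draws <- rnorm(n_draws, post_mean, post_sd)
      sum(draws < theta)                # rank of the true value among the draws
    })
    hist(ranks, breaks = 20)            # should look roughly flat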

How should we (they) carry out repeated cross-validation? They would like a third expert opinion…

Someone writes: I’m a postdoc studying scientific reproducibility. I have a machine learning question that I desperately need your help with. . . . I’m trying to predict whether a study can be successfully replicated (DV), from the texts in the original published article. Our hypothesis is that language contains useful signals in distinguishing reproducible […]
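
For concreteness, repeated k-fold cross-validation just re-randomizes the fold assignment several times and averages the held-out error estimates across repeats; a bare-bones sketch (the logistic-regression data here are simulated purely for illustration, not the postdoc’s texts):

    # Repeated 5-fold cross-validation for a logistic regression,
    # averaging held-out accuracy over random re-splits of the data.
    set.seed(7)
    n <- 300
    x <- matrix(rnorm(n * 3), ncol = 3)
    y <- rbinom(n, 1, plogis(x %*% c(1, -1, 0.5)))
    d <- data.frame(y = y, x)

    k <- 5; n_repeats <- 10
    acc <- replicate(n_repeats, {
      fold <- sample(rep(1:k, length.out = n))   # fresh random split each repeat
      mean(sapply(1:k, function(f) {
        fit <- glm(y ~ ., data = d[fold != f, ], family = binomial)
        pred <- predict(fit, d[fold == f, ], type = "response") > 0.5
        mean(pred == (d$y[fold == f] == 1))
      }))
    })
    print(c(mean = mean(acc), sd = sd(acc)))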

“Do you have any recommendations for useful priors when datasets are small?”

A statistician who works in the pharmaceutical industry writes: I just read your paper (with Dan Simpson and Mike Betancourt) “The Prior Can Often Only Be Understood in the Context of the Likelihood” and I find it refreshing to read that “the practical utility of a prior distribution within a given analysis then depends critically […]
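
One practical way to act on “the prior can often only be understood in the context of the likelihood,” especially with small datasets, is a prior predictive check: simulate from the prior through the model and ask whether the implied data look scientifically plausible. A small sketch for a logistic-regression slope (my own toy example, not from the paper):

    # Prior predictive check: what do different priors on a logistic-regression
    # slope imply on the probability scale? Purely illustrative.
    set.seed(99)
    x <- rnorm(1000)
    for (prior_sd in c(1, 10)) {
      beta <- rnorm(1000, 0, prior_sd)     # draws from the prior on the slope
      p <- plogis(outer(x, beta))          # implied probabilities, one column per draw
      cat(sprintf("prior sd = %g: share of probabilities outside (0.01, 0.99) = %.2f\n",
                  prior_sd, mean(p < 0.01 | p > 0.99)))
    }

A much larger share of the implied probabilities gets pushed out toward 0 or 1 under the wider prior, which is often not what the analyst intends; the check makes that visible before any data are used.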

Hey, check this out: Columbia’s Data Science Institute is hiring research scientists and postdocs!

Here’s the official announcement: The Institute’s Postdoctoral and Research Scientists will help anchor Columbia’s presence as a leader in data-science research and applications and serve as resident experts in fostering collaborations with the world-class faculty across all schools at Columbia University. They will also help guide, plan and execute data-science research, applications and technological innovations […]

Postdocs and Research fellows for combining probabilistic programming, simulators and interactive AI

Here’s a great opportunity for those interested in probabilistic programming and workflows for Bayesian data analysis: We (including me, Aki) are looking for outstanding postdoctoral researchers and research fellows to work on an exciting new project at the crossroads of probabilistic programming, simulator-based inference, and user interfaces. You will have an opportunity to work with […]

“Simulations are not scalable but theory is scalable”

Eren Metin Elçi writes: I just watched this video on the value of theory in applied fields (like statistics), and it really resonated with my previous research experiences in statistical physics and on the interplay between randomised perfect sampling algorithms and Markov chain mixing, as well as my current perspective on the status quo of deep learning. […]