Statistical Methods and Data Skepticism

Data analysis today is dominated by three paradigms: null hypothesis significance testing, Bayesian inference, and exploratory data analysis. There is concern that all these methods lead to overconfidence on the part of researchers and the general public, and this concern has led to the new “data skepticism” movement.

But the history of statistics is already in some sense a history of data skepticism. Concepts of bias, variance, sampling and measurement error, least-squares regression, and statistical significance can all be viewed as formalizations of data skepticism. All these methods address the concern that patterns in observed data might not generalize to the population of interest.

We discuss the challenge of attaining data skepticism while avoiding data nihilism, and consider some proposed future directions.

Stan (mc-stan.org) is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. We are also developing Stan as a more general statistical modeling and computing platform that will be able to do optimization, variational inference, and expectation propagation, as well as full Bayes. We discuss how Stan works and what it can do, the problems that motivated us to write Stan, current challenges, and areas of planned development, including tools for improved generality and usability, more efficient sampling algorithms, and fuller integration of model building, model checking, and model understanding in Bayesian data analysis.
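For readers unfamiliar with Hamiltonian Monte Carlo, here is a rough sketch in Python of the basic algorithm that NUTS builds on. This is a toy illustration, not Stan's actual implementation: the target (a standard normal), step size, and fixed trajectory length are arbitrary choices, and NUTS's contribution is precisely to adapt the trajectory length automatically.

```python
import numpy as np

def hmc_sample(log_prob_grad, init, n_samples=2000, step=0.1, n_leapfrog=20, seed=0):
    """Basic HMC for a 1-D target. log_prob_grad(x) -> (log p(x), d log p / dx)."""
    rng = np.random.default_rng(seed)
    x = init
    logp, grad = log_prob_grad(x)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal()            # draw a fresh momentum
        x_new, p_new, grad_new = x, p, grad
        logp_new = logp
        # leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * step * grad_new
        for i in range(n_leapfrog):
            x_new += step * p_new
            logp_new, grad_new = log_prob_grad(x_new)
            if i != n_leapfrog - 1:
                p_new += step * grad_new
        p_new += 0.5 * step * grad_new
        # Metropolis accept/reject on the joint (position, momentum) energy
        h_old = -logp + 0.5 * p ** 2
        h_new = -logp_new + 0.5 * p_new ** 2
        if np.log(rng.uniform()) < h_old - h_new:
            x, logp, grad = x_new, logp_new, grad_new
        samples.append(x)
    return np.array(samples)

# Toy target: standard normal, log p(x) = -x^2/2 up to a constant
std_normal = lambda x: (-0.5 * x ** 2, -x)
draws = hmc_sample(std_normal, init=0.0)
```

With these settings the draws should have mean near 0 and standard deviation near 1, matching the target.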

Unfortunately something came up and I won’t be able to give either of those talks. Bummer. I was looking forward to both. An old version of the Stan talk is here, but I was planning to present some new material too.

The Open Statistical Programming Meetup will still be taking place with David Madigan as the speaker. We are sorry to lose Andrew but we’ll still provide a talk that day.

Excellent choice!

Hopefully the Bayesian homunculus (slide 28) won’t demand better working conditions or become friends with Mayo.

“But the history of statistics is already in some sense a history of data skepticism”

True, but it’s like making things “foolproof”. The problem is, they keep making improved fools.

To me, this is easiest to see in significance testing. The 5% rule has always been arbitrary, a bit like telling your teenager they have to be home by 10. But as the rule became more important in getting published (and getting published became more important to securing that good job), the temptation grew to [insert term of your choice here: cheat? shade the results? overanalyze?]. So the old hypothesis-testing rules designed to let us quantify our skepticism no longer work quite as intended, and we need to continually improve our tools for skepticism.
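A toy simulation (my own illustration, not from the comment) of why the 5% rule invites overanalysis: if every effect is truly zero but you slice the data into several arbitrary subgroups and report the best p-value, the chance of a "significant" finding climbs well above 5%. The subgroup count and sample sizes below are made up.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_experiments = 5000
n_subgroups = 8       # arbitrary ways to slice the same data
n = 50                # observations per subgroup

hits_first = 0   # "significant" when only the pre-registered analysis is run
hits_best = 0    # "significant" when we pick the best of the 8 subgroup analyses
for _ in range(n_experiments):
    # Every null hypothesis is true: each subgroup mean is pure noise.
    z = rng.standard_normal((n_subgroups, n)).mean(axis=1) * math.sqrt(n)
    p = np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])  # two-sided p-values
    hits_first += p[0] < 0.05
    hits_best += p.min() < 0.05

print(f"single test:   {hits_first / n_experiments:.3f}")   # close to the nominal 0.05
print(f"best of eight: {hits_best / n_experiments:.3f}")    # far above 0.05
```

The expected best-of-eight rate is 1 − 0.95⁸ ≈ 0.34: the 5% threshold quantifies skepticism only for the analysis you committed to in advance.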

The homunculus should be doing the inference, and the computer the model building and checking.

I say this having wasted hours in a previous life checking residuals for time series forecasting. I wasn’t sure who was in the driving seat or who was working for whom: me or the computer.

I would love to hear that talk on data skepticism vs data nihilism. How can I con you into coming to give it somewhere in my town? What is your speaking fee?