What better way to start the new year than with some hard-core statistical theory?

Ryan Martin and Chuanhai Liu send along a new paper on inferential models:

Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. But existing probabilistic inference methods struggle to provide a meaningful interpretation of the probabilities across experiments in sufficient generality. In this paper we further develop a promising new approach based on what are called inferential models (IMs). The fundamental idea behind IMs is that there is an unobservable auxiliary variable that itself describes the inherent uncertainty about the parameter of interest, and that posterior probabilistic inference can be accomplished by predicting this unobserved quantity. We describe a simple and intuitive three-step construction of a random set of candidate parameter values, each being consistent with the model, the observed data, and an auxiliary variable prediction. Then prior-free posterior summaries of the available statistical evidence for and against a hypothesis of interest are obtained by calculating the probability that this random set falls completely in and completely out of the hypothesis, respectively. We prove that these IM-based measures of evidence are calibrated in a frequentist sense, showing that IMs give easily-interpretable results both within and across experiments.
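
To make the three steps in the abstract a bit more concrete, here is a minimal sketch for about the simplest case available: a single observation x from a normal distribution with unknown mean theta and known variance 1. Everything specific in it is an assumption made for illustration and is not taken from the paper: the association x = theta + z with z standard normal, the symmetric random set {z : |z| <= |Z'|} used to predict the auxiliary variable, and the function names. Under those assumptions the random set of candidate parameter values is the interval [x - |Z'|, x + |Z'|], the evidence for a hypothesis is the probability that this interval falls entirely inside it, and the plausibility of a single value theta0 is the probability that the interval covers theta0.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def plausibility_point(x, theta0):
    """Probability that the random set [x - r, x + r] covers theta0.

    Here r = |Z'| with Z' standard normal (an assumed choice of
    predictive random set), so the answer is P(|Z'| >= |x - theta0|).
    """
    return 2.0 * (1.0 - normal_cdf(abs(x - theta0)))


def belief_interval(x, lo, hi, draws=100_000, seed=0):
    """Monte Carlo estimate of P(random set lies entirely inside [lo, hi])."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        # Predict the auxiliary variable with the set {z : |z| <= r}.
        r = abs(rng.gauss(0.0, 1.0))
        # Combine with the data: candidate thetas are [x - r, x + r].
        if lo <= x - r and x + r <= hi:
            inside += 1
    return inside / draws
```

In this toy case the set of theta0 values with plausibility above 0.05 works out to roughly x plus or minus 1.96, the usual 95% interval, which gives some feel for what calibration "in a frequentist sense" would mean here.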

I find this stuff difficult to understand, but Chuanhai is a deep thinker and I encourage the theoretically-minded among you to take a look. My general impression is that this work is an attempt to provide a foundation for inference based on inverting hypothesis tests—or, to put it another way, inference about the subspace of parameters in which the model is consistent with data. As I’ve discussed (but unfortunately never written up in any formal way), the naive view of Neyman-Pearson inference by inverting hypothesis tests does not in general work (if by “work” you mean get inferences that make sense in particular cases). But I’m sympathetic to the idea that there’s a way to do this right, involving some sort of calibration. I remember thinking about this when writing my Ph.D. thesis—I actually had a chapter on it, I think—but I was never satisfied with the results.

Welcome to an error statistical version of confidence intervals.