Rob Kass on statistical pragmatism, and my reactions

Rob Kass writes:

Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I [Kass] suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative “big picture” depiction.

In my comments, I pretty much agree with everything Rob says, with a few points of elaboration:

Kass describes probability theory as anchored upon physical randomization (coin flips, die rolls and the like) but being useful more generally as a mathematical model. I completely agree but would also add another anchoring point: calibration. Calibration of probability assessments is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration. In that way, Bayesian probability calibration is closely connected to frequentist probability statements, in that both are conditional on “reference sets” of comparable events . . .
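
To make the calibration idea concrete, here is a minimal sketch (mine, not part of the published discussion) of what a calibration check looks like: group probabilistic assessments for a reference set of comparable binary events into bins and compare each bin's average stated probability with the observed frequency of the event. The simulated forecasts and outcomes below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference set: stated probabilities for 1,000 comparable
# binary events, plus the outcomes that actually occurred.
true_p = rng.uniform(0.05, 0.95, size=1000)
forecast = np.clip(true_p + rng.normal(0, 0.05, size=1000), 0.01, 0.99)
outcome = rng.binomial(1, true_p)

# Calibration check: within each forecast bin, the observed frequency of
# the event should roughly match the average stated probability.
bins = np.linspace(0, 1, 11)
bin_index = np.digitize(forecast, bins) - 1
for b in range(10):
    in_bin = bin_index == b
    if in_bin.any():
        print(f"stated ~{forecast[in_bin].mean():.2f}   "
              f"observed {outcome[in_bin].mean():.2f}   "
              f"(n = {in_bin.sum()})")
```

The choice of which events count as "comparable" is where the scientific judgment enters; once that choice is made, comparing stated probabilities to observed frequencies is a routine computation.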

In a modern Bayesian approach, confidence intervals and hypothesis testing are both important but are not isomorphic; they represent two different steps of inference. Confidence statements, or posterior intervals, are summaries of inference about parameters conditional on an assumed model. Hypothesis testing–or, more generally, model checking–is the process of comparing observed data to replications under the model if it were true. . . .
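
As a minimal illustration (mine, not taken from the article or the discussion) of model checking by comparing observed data to replications under the assumed model, the sketch below fits a normal model with known scale, draws replicated datasets from the posterior predictive distribution, and compares a simple test statistic with its observed value. The normal model and the choice of test statistic are assumptions made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative "observed" data: heavier-tailed than the model we will assume.
y = rng.standard_t(df=3, size=50)
n, sigma = len(y), 1.0

# Assumed model: y_i ~ Normal(mu, sigma^2) with sigma known and a flat prior
# on mu, so the posterior for mu is Normal(ybar, sigma^2 / n).
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=2000)

# Posterior predictive check: replicate the dataset under the model and
# compare a test statistic (here, max |y|) with its observed value.
T_obs = np.max(np.abs(y))
T_rep = np.array([np.max(np.abs(rng.normal(mu, sigma, size=n))) for mu in mu_draws])
print(f"posterior predictive p-value for max|y|: {np.mean(T_rep >= T_obs):.3f}")
```

A small p-value here does not rank the model against alternatives; it simply flags that the observed data look unlike replications from the model, which is the sense in which model checking is a different step of inference from summarizing a posterior interval.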

Kass discusses the role of sampling as a model for understanding statistical inference. But sampling is more than a metaphor; it is crucial in many aspects of statistics. . . .

The only two statements in Kass’s article that I clearly disagree with are the following two claims: “the only solid foundation for Bayesianism is subjective,” and “the most fundamental belief of any scientist is that the theoretical and real worlds are aligned.” . . . Claims of the subjectivity of Bayesian inference have been much debated, and I am under no illusion that I can resolve them here. But I will repeat my point made at the outset of this discussion that Bayesian probability, like frequentist probability, is, except in the simplest of examples, a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other. . . . A person who is really worried about subjective model-building might profitably spend more effort thinking about the assumptions inherent in additive models, logistic regressions, proportional hazards models, and the like. Even the Wilcoxon test is based on assumptions . . .

Like Kass, I believe that philosophical debates can be a good thing, if they motivate us to think carefully about our unexamined assumptions. Perhaps even the existence of subfields that rarely communicate with each other has been a source of progress in allowing different strands of research to be developed in a pluralistic environment, in a way that might not have been so easily done if statistical communication had been dominated by any single intolerant group. . . .

4 thoughts on “Rob Kass on statistical pragmatism, and my reactions”

  1. It took me a long time — years and years — to realize that I am worse than the average person (of similar professional standing) at reading an abstract statement and figuring out what it means. I'm good at generalizing/abstracting from specific cases, but that's just about the _only_ way I can do it. I think I have some countervailing virtues, but this shortcoming often frustrates me, as it does now.

    Which is all preamble to saying that I am not sure what you mean when you say "Calibration of probability assessments is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration. In that way, Bayesian probability calibration is closely connected to frequentist probability statements, in that both are conditional on "reference sets" of comparable events . . ." To the extent that I _think_ I understand what you are saying, I disagree with it, but I may be completely misinterpreting it.

    Fairly often, there is no "reference set of comparable events", or if there is such a reference set, it is very small. Famously, long before its first flight, someone at NASA was tasked with assessing the probability that the Space Shuttle would be catastrophically lost on a mission. What is a "comparable event" in this case? You can look at other manned space missions, but those were on very different craft that didn't even use some of the same technologies.

    I suppose if you define "comparable" events broadly enough, you can always find "comparable" events. When trying to decide whether polls were overestimating support for Barack Obama for President due to people not wanting the polltakers to think they're racist, some analysts looked at elections for other offices (governor, senator, mayor) to see if they could quantify the effect. But it's not clear (to me) that you can look at a mayoral election in a northern state and learn much about whether or how racial bias will show up in how people in Arkansas vote for President of the United States.

    Space shuttle explosions, blacks (or women) running for President, the probability of a large terrorist attack in the U.S., the probability that a given country will develop nuclear weapons in the next N years, the probability of a fatal flu pandemic… there are lots and lots of cases in which you have models that need calibrating, and there's no set of comparable cases. I think one of the ways to evaluate the accuracy of these models is — indeed, has to be — subjective.

    Either that, or I simply don't understand what you're saying. That's at least equally likely, given my acknowledged difficulties in understanding abstractions.

  2. Phil:

    Calibration is an objective process (or, at least, as objective as other statistical processes), when you can calibrate. I agree with you that, in many settings, calibration is not possible. But calibration is possible in many, many statistical problems, and I don't see unique events–important as they are–as essential to the foundations of statistics.

  3. I guess I see too many gray areas to either agree or disagree. Lots of problems for which people do, or should, build statistical models are "unique events" or at least rare ones, even beyond the trivial sense that every event is somehow unique. And I think if you do have comparable events to compare to, then sure, you can perform "objective" calibration, but I think that often that's not the case. Often, there's subjectivity in determining what is a "comparable" case: if I'm trying to decide whether to move my pro sports team to a new city, and thus trying to predict attendance, I might build a model that includes population, an index of sports interest, distance to other cities with teams, etc. Now I want to calibrate it. Are my comparable cases just other examples of the same sport, or do I expand my model and use other sports too? Do I just use examples from the free-agent era, or older examples too? Do I include new franchises, or just moves of old franchises? And so on. There's plenty of room for subjectivity.

  4. Nice and "Peircedestrian" – it allows one to walk around with some view of subtle statistical "inferencing" by using critical common sense and Figure 1’s triadic split into 1 Possibility, 2 Actuality, and 3 Explanation.

    Now the problem of mistaking the model for what it is (trying to) represent is much more widespread than in statistics. As an example, I’ll recall my Finance professor in MBA school scolding me for my comment “your model says the price will increase” with “What model? That’s just what happens!”

    Also, teaching about models as fallible representations may pose almost as large a challenge as teaching statistics, and it’s definitely not math with right and wrong answers that are easy to test and grade.

    The discipline does need more prose like this – thanks for posting.

    As for calibration of probability assessments, it may help to use Figure 1, where there is always a bit of each: I would agree that “2 Actuality” is dominant (objective in the sense of being determined not by us but by brute reality), but there is also some of “1 Possibility” (subjective choices/assumptions of comparability) and “3 Explanation” (conclusions/caveats in light of 1 and 2).

    K?
