Bridges between deterministic and probabilistic models for binary data

For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes.

The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice.
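To give a sense of what the stochastic version looks like, here's a minimal sketch in Python of one simple instance of the idea, in which the deterministic model's prediction can be wrong with a separate error rate for each predicted value. The function names and the two-error-rate parameterization are just for illustration, not code from the paper:

```python
import numpy as np

def stochastic_log_lik(y, y_det, eps0, eps1):
    """Log-likelihood of observed binary data y (a 0/1 array) given
    deterministic predictions y_det, where the model errs with
    probability eps1 when it predicts 1 and eps0 when it predicts 0."""
    p1 = np.where(y_det == 1, 1.0 - eps1, eps0)  # P(y = 1) for each case
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1.0 - p1))

def fit_error_rates(y, y_det):
    """Holding the deterministic fit fixed, the maximum-likelihood error
    rates are just the observed error fractions in each prediction class."""
    eps1 = np.mean(y[y_det == 1] == 0)  # errors among cases predicted 1
    eps0 = np.mean(y[y_det == 0] == 1)  # errors among cases predicted 0
    return eps0, eps1
```

In the full approach you would also search over the deterministic model's parameters, scoring candidate fits by this likelihood rather than by a raw count of misclassifications; that is the sense in which the fitting becomes a stochastic modification of the deterministic optimization scheme.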

Here’s the article (by Iwin Leenen, Iven Van Mechelen, Paul De Boeck, Jeroen Poblome, and myself). It didn’t get a lot of attention when it came out, but I still think it’s an excellent and widely applicable idea. Lots of people are running around out there fitting deterministic prediction models, and our paper describes a method for taking such models and interpreting them probabilistically. By turning a predictive model into a generative probabilistic model, we can check model fit and point toward potential improvements (which might be implementable in the original deterministic framework).
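Continuing the sketch above: once you have the fitted error rates, the generative interpretation lets you simulate replicated datasets and compare them to the data in hand, which is what makes the model checking possible. Again, this is just an illustration under the same assumed parameterization:

```python
def simulate_replicate(y_det, eps0, eps1, rng):
    """Draw one replicated dataset from the fitted stochastic model:
    each deterministic prediction is independently flipped with its
    prediction class's estimated error rate."""
    flip_prob = np.where(y_det == 1, eps1, eps0)
    flips = rng.random(y_det.shape) < flip_prob
    return np.where(flips, 1 - y_det, y_det)

def predictive_check(y, y_det, eps0, eps1, T, n_rep=1000, seed=0):
    """Tail probability of the observed test statistic T(y) under
    replications from the fitted model; extreme values flag misfit."""
    rng = np.random.default_rng(seed)
    T_obs = T(y)
    T_rep = [T(simulate_replicate(y_det, eps0, eps1, rng)) for _ in range(n_rep)]
    return np.mean(np.array(T_rep) >= T_obs)
```

Here T is any summary of the data you care about (for example, the total number of 1s, or a count of some pattern of responses the deterministic model treats as impossible); a statistic far out in the tail of its replicated distribution points to a direction in which the model could be improved.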

2 thoughts on “Bridges between deterministic and probabilistic models for binary data”

  1. Maybe I'm missing the point, because I don't think anyone would disagree with this premise regarding the utility of casting a deterministic model as a stochastic one in the first place. I think when people go with deterministic models, they tend to do it because of one (or both) of the following: 1) the degree of stochasticity in the underlying phenomenon is very small relative to the measurement of interest, or 2) a deterministic framework renders a solution tractable. Of course, going that route, even when it seems justified, can miss a lot of the story – I assume most people acknowledge this. Maybe this message is aimed at a specific audience or I'm missing something here.

  2. Anon:

    The point of our paper is not to tell people that deterministic models are bad. Rather, we're giving a method where you can take an existing deterministic model and interpret it probabilistically. We think this should make these existing deterministic approaches more useful, by putting in a little bit of uncertainty without changing the underlying character of the model.
