I received the following email:

I am trying to develop a Bayesian model to represent the process through which individual consumers make online product rating decisions. In my model each individual faces a total of J product options, and for each product option (j) each individual (i) needs to make three sequential decisions:

– First he decides whether to consume a specific product option (j) or not (choice decision)

– If he decides to consume a product option j, then after consumption he decides whether to rate it or not (incidence decision)

– If he decides to rate product j, then finally he decides what rating (k) to assign to it (evaluation decision)

We model this decision sequence in terms of three equations. A binary response variable in the first equation represents the choice decision. Another binary response variable in the second equation represents the incidence decision that is observable only when first selection decision is 1. Finally, an ordered response variable in the third stage captures the extent of preference of individual i for product j. This ordered response (rating) is observed only when both first and second decisions are 1. Each of these response variables in turn are dictated by a corresponding latent variable that is assumed to be linearly related to a set of product characteristics.
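To make the structure concrete, here is a minimal sketch of how fake data could be generated from a three-stage process of this kind. It is not the correspondent's model: it assumes a single product characteristic, independent standard-normal errors at each stage, and made-up coefficients and cutpoints. The names `simulate_ratings`, `beta_choice`, `beta_incidence`, `beta_rating`, and `cutpoints` are all hypothetical.

```python
import random

def simulate_ratings(n_consumers, n_products, beta_choice, beta_incidence,
                     beta_rating, cutpoints, seed=0):
    """Simulate choice -> incidence -> rating for every consumer/product pair.

    Assumptions (not from the original email): one characteristic per product,
    independent N(0, 1) errors at each stage, probit-style thresholds at zero.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_products)]  # product characteristic
    rows = []
    for i in range(n_consumers):
        for j in range(n_products):
            # Stage 1: consume product j or not.
            consumed = x[j] * beta_choice + rng.gauss(0, 1) > 0
            rated, rating = None, None  # unobserved unless earlier stages are 1
            if consumed:
                # Stage 2: rate it or not (only defined after consumption).
                rated = x[j] * beta_incidence + rng.gauss(0, 1) > 0
                if rated:
                    # Stage 3: ordered rating from a latent value and cutpoints.
                    latent = x[j] * beta_rating + rng.gauss(0, 1)
                    rating = 1 + sum(latent > c for c in cutpoints)
            rows.append({"i": i, "j": j, "x": x[j], "consumed": consumed,
                         "rated": rated, "rating": rating})
    return rows

rows = simulate_ratings(20, 5, beta_choice=1.0, beta_incidence=0.5,
                        beta_rating=1.0, cutpoints=[-1.0, 0.0, 1.0])
print(sum(r["rating"] is not None for r in rows), "ratings out of", len(rows))
```

Even a toy simulation like this shows how few ratings survive the two selection stages, which is worth knowing before blaming the estimation code.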

I have been able to implement the estimation algorithm in R. However, when I tried to apply the algorithm to a simulated data set with known parameter values, it failed to recover the parameters. I was wondering if there is something wrong with the estimation method. I am attaching a document outlining the model and the proposed estimation framework. It would be an immense help if you could kindly have a look at the model and the proposed estimation strategy and suggest any improvement or modification needed.

I replied: I don’t have time to read this, but just to give some general advice: if your fake-data check does not recover your model, I recommend you simplify your model. Go simpler and simpler until you can get it to work, then from there you can try to identify what is the problem.
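As an illustration of what "go simpler" might look like, the sketch below runs a fake-data check on a stripped-down version of the problem: the choice stage alone, treated as an ordinary logistic regression with one predictor and fit by Newton's method. The simplification and the function names `simulate_logistic` and `fit_logistic` are my assumptions, not anything from the email.

```python
import math
import random

def simulate_logistic(n, intercept, slope, rng):
    """Draw (x, y) pairs from a logistic regression with known parameters."""
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-(intercept + slope * x)))
        data.append((x, 1 if rng.random() < p else 0))
    return data

def fit_logistic(data, n_iter=25):
    """Maximum likelihood for intercept a and slope b via Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in data:
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = p * (1 - p)
            g_a += y - p            # gradient of the log-likelihood
            g_b += (y - p) * x
            h_aa += w               # negative Hessian (information matrix)
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system by hand.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

true_intercept, true_slope = -0.5, 1.2
fake = simulate_logistic(5000, true_intercept, true_slope, random.Random(1))
est_intercept, est_slope = fit_logistic(fake)
print("true:", (true_intercept, true_slope))
print("estimated:", (round(est_intercept, 2), round(est_slope, 2)))
```

If the estimates land near the truth here, the next step is to add one piece of the full model back (say, the incidence stage) and repeat the check.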

This is generally good advice with estimation, too, I think. Know where you want to be and create a sequence of meaningful models that go from simple to complicated and provide you with diagnostic information along the way. Made-up example: You want to fit a mixed logistic regression with three levels of nesting and a random slope at level 2 (students nested in classrooms nested in schools, for instance) and some cross-level interactions. This is a fairly complicated model.

Step 1: Fit an ordinary logistic regression model including the cross-level interactions.

Step 2: Fit the random intercept model with levels 2 and 3.

Step 3: Fit the random slope model.

Step 4: Vary the model you made to some degree to test how dependent your estimates are on assumptions such as orthogonality between effects at different levels.

It is rarely possible to get a complicated model to run without these intermediate steps to give you information. There are simply too many points of failure to diagnose how the model you wanted to fit will go wrong (and chances are very good it will!) without them.

Reaching beyond this specific question, I try to teach this sort of logic in my classes through various means. For instance, in my categorical data class I usually assign some problems where estimation will fail or be otherwise snarly due to separation or a sampling zero. But students often have a hard time seeing that. Anyone have ideas for teaching this logic? (And truth be told, I’m not sure where I learned it.)
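For what it's worth, one way to make separation visible is to show that the log-likelihood keeps improving as the slope grows, so there is no finite maximum for the optimizer to find. The toy data and the helper names `completely_separated` and `log_lik` below are hypothetical, and the check covers only the one-predictor case.

```python
import math

def completely_separated(xs, ys):
    """True if a single threshold on x perfectly splits y=0 from y=1."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        # Only one outcome observed: treated here as the degenerate case.
        return True
    return max(x0) < min(x1) or max(x1) < min(x0)

def log_lik(xs, ys, intercept, slope):
    """Logistic log-likelihood, computed in a numerically stable way."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = intercept + slope * x
        # log p(y | x) = y * eta - log(1 + exp(eta))
        total += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
print("separated:", completely_separated(xs, ys))
for slope in (1, 5, 25, 125):
    # The log-likelihood creeps toward 0 and never peaks.
    print(slope, log_lik(xs, ys, 0.0, slope))
```

Seeing the likelihood climb with no peak tends to land better with students than a warning message from the fitting routine.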

That’s good advice. Another option is to run the model forward and use it to generate fake data. This way you can see what the fake data look like, which gives you clues about where the problem might be.

Another useful diagnostic is to take the incorrect result and (assuming the parameters are feasible) generate new fake data. Compare this to a sample of fake data generated by the true parameters. If they’re similar in some relevant plots/summary statistics, you might actually have an identifiability or estimability problem rather than a coding one (the folk theorem again!).
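A minimal sketch of that comparison, using a deliberately non-identified toy model rather than the model from the post: the outcome depends on two parameters only through their product, so a "wrong" estimate with the same product generates fake data that look just like the truth. The model, the parameter values, and the names `simulate` and `summaries` are all assumptions made for illustration.

```python
import random
import statistics

def simulate(a, b, n, rng):
    """Hypothetical model: y = a * b * x + noise, so only a * b is identified."""
    ys = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        ys.append(a * b * x + rng.gauss(0, 1))
    return ys

def summaries(ys):
    """A couple of summary statistics to compare across fake data sets."""
    return {"mean": statistics.mean(ys), "sd": statistics.stdev(ys)}

# "True" parameters versus an "incorrect" estimate with the same product.
true_summary = summaries(simulate(2.0, 3.0, 5000, random.Random(1)))
est_summary = summaries(simulate(6.0, 1.0, 5000, random.Random(2)))
print("true parameters:     ", true_summary)
print("estimated parameters:", est_summary)
```

When the two sets of summaries are this close, the data cannot tell the parameter values apart, and no amount of debugging the sampler will fix that.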

The post looks as if only one fake data set was produced. I’d try it on some more to see whether the problem is variance, bias, or perhaps identifiability.
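Something along these lines, sketched with a simple least-squares slope standing in for the real estimator: simulate many fake data sets, fit each one, and look at the average error (bias) and the spread (variance) of the estimates. The helper `replicate_check` and the stand-in functions `simulate_line` and `fit_slope` are hypothetical; in practice you would plug in your own simulator and fitting routine.

```python
import random
import statistics

def replicate_check(simulate, fit, true_value, n_reps=200, seed=0):
    """Fit many fake data sets; return (bias, sd) of the estimates."""
    rng = random.Random(seed)
    estimates = [fit(simulate(rng)) for _ in range(n_reps)]
    bias = statistics.mean(estimates) - true_value
    sd = statistics.stdev(estimates)
    return bias, sd

def simulate_line(rng, n=50, slope=2.0):
    """Stand-in simulator: y = slope * x + N(0, 1) noise."""
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        data.append((x, slope * x + rng.gauss(0, 1)))
    return data

def fit_slope(data):
    """Stand-in estimator: ordinary least-squares slope."""
    mx = statistics.mean(x for x, _ in data)
    my = statistics.mean(y for _, y in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    return sxy / sxx

bias, sd = replicate_check(simulate_line, fit_slope, true_value=2.0)
print("bias:", round(bias, 3), "sd of estimates:", round(sd, 3))
```

A bias that is small relative to `sd / sqrt(n_reps)` suggests a single bad recovery was just noise. A bias that stays put as you add replications points to the model or the code.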