Checking your model using fake data

Someone sent me the following email:

I tried to do a logistic regression . . . I programmed the model in different ways and got different answers . . . can’t get the results to match . . . What am I doing wrong? . . . Here’s my code . . .

I didn’t have the time to look at his code so I gave the following general response:

One way to check things is to try simulating data from the fitted model, then fit your model again to the simulated data and see what happens.

P.S. He followed my suggestion and responded a few days later:

Yeah, that did the trick! I was treating a factor variable as a covariate!

I love it when generic advice works out!
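
The check suggested above can be sketched in a few lines. This is a minimal illustration, not the correspondent's code: the sample size, the single predictor, the coefficient values, and the helper names fit_logistic and simulate are all assumptions made up for the example. The idea is to fit the model, simulate fake outcomes from the fitted coefficients, refit, and see whether the refit recovers the values used to generate the fake data.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def simulate(X, beta, rng):
    """Draw binary outcomes from a logistic model with coefficients beta."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return rng.binomial(1, p)

rng = np.random.default_rng(0)
n = 5000                                         # assumed sample size
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])             # intercept plus one predictor

# Stand-in for the "real" data (assumed coefficients, for illustration only).
true_beta = np.array([-0.5, 1.2])
y = simulate(X, true_beta, rng)

# Step 1: fit the model to the data.
beta_hat = fit_logistic(X, y)

# Step 2: simulate fake data from the fitted model.
y_fake = simulate(X, beta_hat, rng)

# Step 3: refit to the fake data and compare with the values that generated it.
beta_refit = fit_logistic(X, y_fake)

print("fitted:", beta_hat)
print("refit :", beta_refit)
```

If the refit lands far from the fitted coefficients, by much more than sampling noise would explain, something in the model code or the data setup is probably wrong. A categorical variable entered as a numeric covariate is one such mistake.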

3 thoughts on “Checking your model using fake data”

  1. I’ve been doing a lot of thinking about checking my own multilevel models with fake data, but I’m not quite sure how to deal with the “random effects part” of the simulation.

    I’ve experimented with generating random intercepts and slopes using rmnorm(), but when I come to fit models, the variances associated with the intercepts and slopes never seem to match what was in the variance-covariance matrix I used with rmnorm(). In fact, the sizes are often reversed (although the correlations are often very close).

    I had always thought that if I wanted to check a model, I should use a matrix consisting of the variances that came from the fitted model, but seeing that in practice they frequently differ, I’m not quite sure.

    Is this not as big a deal as it seems to me to be? Or am I doing something wrong?

  2. Per the advice of many faculty, I’ve recently begun simulating data to test out my model. This has typically involved rmnorm. Can you explain what you mean by simulating data from the fitted model? Does this mean simulating data equivalent to the model-implied covariance matrix (i.e., a perfectly fitting model)? Or is this something altogether different?

  3. Pingback: Testing the general clustering algorithm » Source-Filter
