Modeling missing data and fitting it jointly will not provide the same result as multiple imputation. Multiple imputation is an approximation that pipes the output of the imputation into the model, rather than modeling the missing data and everything else jointly.

If the outcomes are missing at random, what Andrew’s suggesting is factoring the posterior with known outcomes $latex y$ and missing outcomes $latex y’$, with parameters $latex \theta$ as

$latex p(y’, \theta \mid y) = p(\theta \mid y) \cdot p(y’ \mid \theta).$

Then in Stan, you can code the $latex p(y’ \mid \theta)$ component in the generated quantities block very efficiently.
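
Here's a minimal sketch of what that might look like, assuming a simple normal outcome model with no predictors. The likelihood, the priors, and the names `N_obs`, `N_mis`, and `y_mis` are my own illustrative choices, not anything Andrew specified. The model block only sees the observed outcomes, so it defines $latex p(\theta \mid y)$, and the generated quantities block draws the missing outcomes from $latex p(y’ \mid \theta)$ once per posterior draw.

```stan
data {
  int<lower=0> N_obs;        // number of observed outcomes (hypothetical name)
  int<lower=0> N_mis;        // number of missing outcomes (hypothetical name)
  vector[N_obs] y;           // observed outcomes only
}
parameters {
  real mu;                   // theta = (mu, sigma) in this toy model
  real<lower=0> sigma;
}
model {
  // Priors are placeholders chosen for illustration.
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);
  // Only the observed y enters the likelihood, giving p(theta | y).
  y ~ normal(mu, sigma);
}
generated quantities {
  // One draw of each missing outcome from p(y' | theta) per posterior draw.
  array[N_mis] real y_mis;
  for (n in 1:N_mis) {
    y_mis[n] = normal_rng(mu, sigma);
  }
}
```

The `array[N_mis] real` declaration assumes a reasonably recent Stan (2.26 or later); older versions would write `real y_mis[N_mis]`. Because the random draws happen in generated quantities, they never touch the log density or its gradient, which is where the efficiency comes from.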

And a friend of mine who works for the Federal Reserve says that there are similar stories about missingness in bank reports – who’da thunk? So I’m a bit leery when I hear talk about modeling missing data, especially when the data comes from an organization that may be evaluated using the data. Astronomical observations, not a problem – political science, uh-uh. AmIrite, Andrew?
