Mikhail:

See here.

Also a good point. In other words, not only is there a danger that your sample size would be insufficient, there’s also the practical problem of not really knowing what it is.

I’m still missing the point here…

Let’s say I run my MCMC for 10,000 samples, and my true parameter is ranked at 13% (13% of samples are lower than the true value, 87% are higher).

Now if I thin these samples down to 100, my true parameter should still be ranked at about 13%, shouldn’t it? So what is the gain?
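A minimal sketch of the question being asked (the chain, the true value, and all numbers here are invented for illustration, not taken from the paper): using a hypothetical AR(1) chain whose stationary distribution is N(0, 1), the fractional rank of a fixed “true” value barely moves under thinning — what thinning changes is the dependence between the retained draws, not the rank itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" value, sitting near the 13th percentile of N(0, 1).
true_theta = -1.13

# A strongly autocorrelated AR(1) chain targeting N(0, 1),
# standing in for MCMC output.
rho = 0.9
draws = np.empty(10_000)
draws[0] = rng.normal()
for t in range(1, len(draws)):
    draws[t] = rho * draws[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

# Fractional rank of the true value among all 10,000 draws.
rank_full = np.mean(draws < true_theta)

# Thin to every 100th draw: 100 near-independent samples,
# but the expected rank is unchanged.
thinned = draws[::100]
rank_thinned = np.mean(thinned < true_theta)

print(rank_full, rank_thinned)  # both should land near 0.13
```

So the intuition in the comment is right as far as the single rank goes; the gain from thinning shows up in the joint behavior of the retained draws, which is what the rank-histogram calibration check depends on.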

Jonathan:

The point of thinning here is that it simplifies the joint distribution of the posterior simulations, making them close to independent, which makes it easier to understand the properties of the statistics comparing the true values of the parameters to the posterior simulations.

Am I right that thinning is almost a red herring, and that the main problem is insufficient effective sample size? In other words, I think the key is to run for long enough, and that the main reason to thin while you’re doing that is efficiency – that if you have a large-enough effective sample size, the autocorrelation shouldn’t hurt you.

I’m not saying we shouldn’t thin, I’m saying I’m confused by the claim that thinning (and removing autocorrelation) is necessary.
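To make the effective-sample-size point concrete, here is a rough sketch of an ESS estimate (deliberately crude — the truncation rule here is a simplification, not the estimator Stan actually uses): with lag-1 autocorrelation 0.9, a chain of 10,000 draws carries far fewer effectively independent samples.

```python
import numpy as np

def crude_ess(x, max_lag=1000):
    """Rough effective sample size: n / (1 + 2 * sum of positive
    empirical autocorrelations), truncated at the first non-positive
    lag. Real implementations (e.g. Stan's) are more careful."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    max_lag = min(max_lag, n - 1)
    acov = np.array([x[: n - k] @ x[k:] for k in range(max_lag + 1)]) / n
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, max_lag + 1):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return n / tau

rng = np.random.default_rng(1)
phi = 0.9  # lag-1 autocorrelation of the chain
x = np.empty(10_000)
x[0] = rng.normal()
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.normal()

# AR(1) theory: ESS ~= n * (1 - phi) / (1 + phi) ~= 526 of 10,000.
print(crude_ess(x))
```

The point of the comment, in these terms: whether you thin or not, what you need is for this number — not the raw draw count — to be large enough.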

There is Python code to do this in the github repo associated with the paper: https://github.com/seantalts/simulation-based-calibration

I wonder if you have any video lectures available? It would be interesting to see how your style of presentation works in a live performance.

It would be intriguing if you could find histograms that resembled putting one’s hand on one’s hip, or pulling one’s knees in tight…

I take your point about ties and detecting structural problems, but running an algorithm on real data doesn’t tell you anything about whether the algorithm is computing what it’s supposed to compute. It is quite possible that the algorithm is faithfully reproducing a model that doesn’t fit the data.

“Running the algorithm on real data. (Again, this checks literally nothing.)”

Not to nitpick, but that’s just not true. It’s certainly true that this *alone* does not validate that your sampling algorithm is correct, but a lot of bugs are caught with *both* of these procedures. The number of peer-reviewed algorithms that out-and-out fail on most of the available real datasets is surprisingly high (of course, they tend not to gain much popularity). This can be as simple as failing when there are ties in the data — which never happens with continuous simulated data, but happens very frequently with real data!

The reason this is a nitpick worth having is that I don’t believe the proposed methods are “100% validation” that the samples approximate the posterior, and you claim that too. So it’s a bit disingenuous to say “these methods do nothing” when, in fact, they would be helpful for catching certain bugs that show up more often than they should. Both these methods catch some percentage of issues, and, empirically, the percentage caught by “running the algorithm on real data” is much higher than we sometimes care to admit!
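A hypothetical illustration of the ties failure mode described above (invented for this comment, not from the paper): a percentile rank computed with a strict inequality behaves fine on continuous simulated draws, where ties have probability zero, but diverges badly on tied real-world data, where the tie convention suddenly matters.

```python
import numpy as np

rng = np.random.default_rng(2)

continuous = rng.normal(size=1000)        # simulated draws: no ties
discrete = rng.integers(0, 5, size=1000)  # count-like data: many ties

# Strict vs. non-strict rank of a reference value.
lo = np.mean(continuous < 0.0)
hi = np.mean(continuous <= 0.0)
# Continuous data: the two conventions agree exactly,
# since no draw equals 0.0.

lo_t = np.mean(discrete < 2)
hi_t = np.mean(discrete <= 2)
# Tied data: every sample equal to 2 (about a fifth of them here)
# flips sides depending on the convention, shifting the rank by ~0.2.

print(hi - lo, hi_t - lo_t)
```

Simulated continuous data would never expose this; one run on a real dataset with repeated values does.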

Should you not be familiar with it, and care about this sort of thing, Eagle-Eye Cherry made a sweet cover of LL Cool J’s “Mama Said Knock You Out” (which I think is where you got the “don’t call it a comeback, I’ve been here for years” from?)
