
Do we still recommend average predictive comparisons? Click here to find the surprising answer!

Usually these posts are on 6-month delay but this one’s so quick I thought I’d just post it now . . .

Daniel Habermann writes:

Do you still like/recommend average predictive comparisons as described in your paper with Iain Pardoe?

I [Habermann] find them particularly useful for summarizing logistic regression models.

My reply: Yes, I do still recommend this!


  1. BozoBrain says:

    Are these essentially the Average Marginal Effects which are obtained from the “margins” command in Stata (and the newish margins command in R)? Note that these quantities can be obtained from post-processing MCMC (e.g., Stan) output.

    • Andrew says:


      I don’t know what Stata etc. do, but the average predictive comparisons in our paper do not in general correspond to effects. They can be causal effects under certain assumptions, but comparisons is a more general term.

      • BozoBrain says:


        Thank you. I didn’t mean to wade into the muddy waters of causal inference. However, the approach of averaging over empirical distributions of predicted values is really catching on in the Stata (and now to some extent R) world. For a Stan example, see the first answer to such a question here:

        Frankly, Stan and other MCMC software seem to be an ideal platform to compute these quantities without resorting to normal asymptotic theory, which can lead to biased inference.

        • > averaging over empirical distributions of predicted values

          I think this means averaging over *posterior* distributions of predicted values. Empirical distributions are distributions of observed quantities, but predictions are not observed, except in the trivial sense that when you ask a computer to calculate something you can then observe what it calculated.

          The ultimate point of that bit of pedantry is that in order to have a distribution to average over, you must have a posterior distribution, in other words, you must be doing Bayesian analysis.

          • Carlos Ungil says:

            > In order to have a distribution to average over, you must have a posterior distribution, in other words, you must be doing Bayesian analysis.

            If you have a model y=f(x1, x2, x3, …) that gets you predicted values for a given set of covariates, to get average predicted values you just need a distribution of covariates. It doesn’t have to be « Bayesian », it could come straight from the census.

            • Sure, yes, you can always average across a population too. It wasn’t clear to me that this was what was being suggested. I guess there are two modes of analysis here. One: you can average a predicted change across the population. You can do this regardless of your modeling method, and it gives you some kind of population information, a single number. Two: you can average a predicted change across a posterior and get a predicted change for each unit, which gives you a population of expected changes.

              • Andrew says:

                Yes, in our paper we average over the posterior and also we average over the data, which represents an average over a hypothetical superpopulation.
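A rough sketch of the double averaging described above, for a logistic regression. This is a simplified unweighted version, not the weighted estimator from the paper. The data, the names `u` and `v`, and the "posterior draws" are all made up so the example runs; in practice the draws would come from a fitted model such as Stan output.

```python
import numpy as np

rng = np.random.default_rng(1)

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: v is a covariate that we average over.
n = 200
v = rng.normal(size=n)

# Stand-in "posterior draws" (assumption: invented numbers, not a real fit).
# Columns: intercept, coefficient on u (input of interest), coefficient on v.
n_draws = 1000
draws = rng.normal(loc=[-0.5, 1.0, 0.7], scale=[0.1, 0.15, 0.1], size=(n_draws, 3))

def expected_y(u_value, v, draws):
    """P(y = 1) for every (draw, unit) pair; result has shape (n_draws, n)."""
    eta = draws[:, [0]] + draws[:, [1]] * u_value + draws[:, [2]] * v[None, :]
    return inv_logit(eta)

# Compare u = 1 against u = 0, holding each unit's v at its observed value.
u_lo, u_hi = 0.0, 1.0
diff = expected_y(u_hi, v, draws) - expected_y(u_lo, v, draws)

# Average over the data within each draw, then summarize across draws.
per_draw = diff.mean(axis=1) / (u_hi - u_lo)
point = per_draw.mean()
interval = np.percentile(per_draw, [2.5, 97.5])
```

Averaging over the data first leaves one comparison per draw, so the spread of `per_draw` carries the posterior uncertainty without any normal approximation.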

            • BozoBrain says:

              One useful application is with simple Gaussian regression with interaction terms:
              y = beta_0 + beta_1*x_1 + beta_2*x_2 + beta_3*x_1:x_2

              Here one might be interested in the marginal “effect” (not necessarily causal) of x_1. But what about the interaction with x_2? One could plot various dose-response type curves of x_1 versus y for various values of x_2. But the margins approach allows for a single x_1 versus y dose-response graph.

              The trick is this. Suppose there are individuals i=1,…,N. Then for a particular value of x_1, predict y for x_2_{i=1}, x_2_{i=2}, etc. Then average the predicted values with error. This allows for examination of the effect of x_1 on y, averaging over the empirical values of x_2 in the data. This can be done in a Bayesian or frequentist framework.
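A minimal sketch of that averaging step, assuming simulated data and a plain least-squares fit. All numbers and the helper names `design` and `average_prediction` are made up for illustration, and the uncertainty in the averaged predictions is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): y depends on x1, x2, and their interaction.
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(loc=1.0, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

def design(x1, x2):
    """Design matrix with columns: intercept, x1, x2, x1*x2."""
    x1, x2 = np.broadcast_arrays(np.asarray(x1, float), np.asarray(x2, float))
    return np.column_stack([np.ones(x1.shape), x1, x2, x1 * x2])

# Least-squares estimates of beta_0 .. beta_3.
beta_hat, *_ = np.linalg.lstsq(design(x1, x2), y, rcond=None)

def average_prediction(x1_value, beta):
    """Fix x1 at one value, predict for every observed x2, then average."""
    return (design(x1_value, x2) @ beta).mean()

# One averaged "dose-response" curve of y against x1.
x1_grid = np.linspace(-2.0, 2.0, 9)
avg_curve = np.array([average_prediction(value, beta_hat) for value in x1_grid])
```

In this linear case the curve is the same as plugging in the mean of x_2, so the averaging matters mostly for non-linear models such as logit, where averaging predictions and predicting at the average differ.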

              There are many variations of this, and the approach is quite useful for non-linear (e.g. logit) models. It appears to have become very very popular and many of us would be giddy if rstanarm or brms would support a full-featured “margins”-like command. Many researchers use Stata largely because of this feature.
