Multilevel modeling of binomial data of the form 0/n or n/n

Xue Li and Dick Campbell write,

We are doing research on racial and SES disparities in stage at breast cancer diagnosis which involves using tract-level poverty estimates to impute person-level SES variables. We are trying to do this using Bayesian methods.

In any case, using an R program which we wrote, we were able to replicate the results from the rat tumor example on pages 127-131 of Bayesian Data Analysis. However, results from a simple set of data appeared to depend on the proportions of cases with a 100% success rate and with a 0% success rate. For cases with 100% success, the accuracy of the estimates appeared to depend on the proportion of cases with 0% successes.

I have independent binomial trials (say 100), with 30% of them (30 trials) having 0 successes and 5% of them (5 trials) having all successes. My data layout is as follows.

Trial Successes Total
1 7 26
2 0 10
3 58 564
4 4 136
5 4 32
6 12 12
7 0 49
8 5 19
9 30 76
10 0 6
11 9 198
12 41 238
13 0 122
14 48 57
15 48 60
… (rows omitted) …
41 0 15
42 7 18
43 76 636
44 24 142
45 87 182
46 0 22
47 0 23
48 0 7
49 9 9
50 67 67
51 3 3
… (rows omitted) …

I am interested in the posterior probability of success. I replicated your example on pages 127-131 and got very similar results. Then I used the same method on data like the above, and it turns out that for the trials with all successes, such as trials 6 and 49, the posterior probability of success is underestimated. For trial 6 I got 0.88; for trial 49 I got 0.86.

Then I increased the percentage of trials having all successes to 30% while keeping the same percentage of trials having 0 successes, and I got a reasonable result: for the trial with 12 successes out of 12, I got 0.97, and for the trial with 9 out of 9, I got 0.96.

The rat tumor example has no trials with all successes. Can the rat tumor method be applied to data with extreme trials (0 successes and all successes) at both ends? If not, is there any modification that can be made?
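(For readers who want to see the mechanics, the beta-binomial calculation under discussion can be sketched in a few lines. The data below are a handful of rows from the layout above; the hyperparameters are fit here by marginal maximum likelihood, an approximation to the full posterior averaging used in the book.)

```python
# Sketch of the beta-binomial (rat-tumor-style) calculation, using a few
# rows from the data layout above. Hyperparameters are fit by marginal
# maximum likelihood, an approximation to the book's full posterior grid.
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

y = np.array([7, 0, 58, 4, 12, 0, 9, 48])        # successes
n = np.array([26, 10, 564, 136, 12, 49, 9, 57])  # totals

def neg_log_marginal(log_params):
    a, b = np.exp(log_params)  # optimize on the log scale to keep a, b > 0
    # Beta-binomial marginal likelihood, dropping the constant n-choose-y term
    return -np.sum(betaln(a + y, b + n - y) - betaln(a, b))

res = minimize(neg_log_marginal, x0=[0.0, 0.0], method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)

# Posterior mean for each trial: partial pooling toward a_hat / (a_hat + b_hat)
post_mean = (a_hat + y) / (a_hat + b_hat + n)
print(np.round(post_mean, 3))
```

The estimate for the 12/12 trial comes out below 1 and the 0/n trials come out above 0: that is the partial pooling at work, not a bug.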

My reply:

It is the nature of Bayesian inference that the estimate for any particular case will depend on the other data available. (Just as, in linear regression, the prediction for any given data point will depend on the regression line, which itself is estimated from other data.) Thus, it makes sense that, when you have higher values of the data (a greater proportion of successes), your estimates will become higher.
For trial 6, your data are 12/12 but your Bayes estimate is 0.88 or 0.97. These are not underestimates; they are the best estimates given the data (and model) available. If you have 12/12 successes, you won’t necessarily expect a success the next time. (Just as, for example, if I make 5 straight basketball shots, I won’t necessarily make the next shot; thus, the probability will be less than 1.) The “partial pooling” of the data toward the Bayes estimate depends on your estimated distribution for the probabilities.
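(The shrinkage of a 12/12 trial below 1 is just the conjugate beta-binomial update; here with a hypothetical Beta(2, 6) standing in for the estimated population distribution.)

```python
# Conjugate beta-binomial update: posterior mean = (a + y) / (a + b + n).
# Beta(2, 6) is a hypothetical population distribution, not a fit to the data.
a, b = 2.0, 6.0
y, n = 12, 12
post_mean = (a + y) / (a + b + n)
print(post_mean)  # 0.7: well below 1 despite 12 successes in 12 tries
```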

Xue Li responded,

Thanks for your response. Based on my understanding, the posterior probability of success for a trial with all successes really depends on the percentage of trials having all successes. That is, the posterior estimate of success for a trial (for example, 4 successes out of 4 observations) will differ substantially (0.68 vs. 0.91) depending on the percentage of trials with all successes (5% vs. 30%). This makes sense: the higher the percentage of trials with all successes, the more confident you can be that the posterior estimate of success for such trials is close to 1.

With the same set of data (independent binomial trials with 30% of them having all failures and 5% having all successes), I tried three other methods.

1. A random intercept logistic regression model with no covariates, assuming the random intercept follows a normal distribution. I ran this model in SAS, proc mixed. It turns out that the posterior probabilities of success are inflated (above 0.9) whenever the observed proportion of successes is over 0.68 or so.
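(A logit-normal random-intercept model of this kind can be sketched by direct numerical integration of one trial's posterior; the values of mu and sigma below are hypothetical stand-ins for fitted values, not estimates from the data above.)

```python
# Posterior mean of the success probability for a single trial under a
# logit-normal random-intercept model, computed by numerical integration.
# mu and sigma are hypothetical fitted values, not estimates from the data.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = -1.5, 2.0

def post_mean(y, n):
    def unnorm(theta):                         # theta = logit(p)
        p = 1.0 / (1.0 + np.exp(-theta))
        # binomial likelihood (constant dropped) times normal prior
        return p**y * (1 - p)**(n - y) * norm.pdf(theta, mu, sigma)
    num, _ = quad(lambda t: (1.0 / (1.0 + np.exp(-t))) * unnorm(t), -20, 20)
    den, _ = quad(unnorm, -20, 20)
    return num / den

print(round(post_mean(12, 12), 3))  # all-successes trial: below 1
print(round(post_mean(0, 49), 3))   # all-failures trial: above 0
```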

2. A random intercept logistic regression model with no covariates, assuming a discrete unknown random intercept. I ran this model in GLLAMM (Stata). It turns out that the estimated random effect has a mixture distribution: one mass point at the lower end (corresponding to trials with all failures), one mass point at the upper end (corresponding to trials with all successes), and a normal in the middle. The posterior probability of success for trials with all successes is close to 1, even for a trial such as 4/4. The problem with this method is that the estimated lower and upper mass points are not very stable across different search ranges.

3. A random intercept logistic regression model with no covariates, assuming a semiparametric random effect. The random intercept is modeled by a Dirichlet process centered on a normal distribution. The posterior estimates of success are close to those from Stata: the posterior probability of success for trials with all successes is close to 1, even for a trial such as 4/4.

My reply: not having looked at your data in any detail, I’ll just say that my first inclination is to favor your approach #1 above, doing some posterior predictive checking to see whether the model fits the data reasonably well. I’m suspicious of mixture approaches that pick out zeroes. But it really depends on the substance of your problem: whether you’re expecting a continuous range of effects or expecting some actual zeroes.
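(A minimal version of that posterior predictive check, under a logit-normal random-intercept model with hypothetical values of mu and sigma, using the number of 0/n trials as the test statistic. The totals and the observed count of all-failure trials below are illustrative numbers in the spirit of the data above, not the questioner's actual dataset.)

```python
# Posterior predictive check sketch: does a fitted logit-normal
# random-intercept model reproduce the observed number of 0/n trials?
# mu, sigma, the totals, and the observed count are all hypothetical.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = -1.5, 2.0

totals = np.array([26, 10, 564, 136, 32, 12, 49, 19, 76, 6,
                   198, 238, 122, 57, 60, 15, 18, 636, 142, 182])
observed_zero_trials = 6  # illustrative count of all-failure trials

n_reps = 2000
zero_counts = np.empty(n_reps, dtype=int)
for r in range(n_reps):
    theta = rng.normal(mu, sigma, size=totals.size)  # draw one intercept per trial
    p = 1.0 / (1.0 + np.exp(-theta))
    y_rep = rng.binomial(totals, p)                  # replicated dataset
    zero_counts[r] = np.sum(y_rep == 0)

# Posterior predictive p-value for the "number of 0/n trials" statistic;
# values near 0 or 1 suggest the normal random effect misses the zeroes.
p_value = np.mean(zero_counts >= observed_zero_trials)
print(p_value)
```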