I’ve been meaning to follow up on two comments you made in passing about priors:
1. You said you didn’t like Dirichlet priors for multinomials because they didn’t model covariance. What alternative do you suggest?
2. When I told you I was using the prior from the hierarchical binomial survival example from [page 128 of] your BDA book, you said you didn’t like that prior any more. Why and what would you suggest as an alternative?
The book model reparameterized Beta(a,b) in terms of the mean a/(a+b), which got a uniform prior, and the scale a+b, which got a Pareto(1.5) prior [p(a+b) proportional to (a+b)**-2.5].
It works fairly well in practice, though it does lead to a fair number of large scale (a+b) samples.
I used your prior for baseball batting average estimation; the post includes the raw data (2006 AL position players) in tsv form, BUGS code, and the R calling harness.
I also use your prior for a hierarchical model of diagnostic test accuracy in epidemiology (or other data coding tasks).
I have longer versions of that paper with more analysis, simulations, data, alternative item-response type models, and pointers to all the code and data.
The basic epidemiology model keeps getting rediscovered. I’m still the only one who’s drunk enough of your Kool-Aid to go the full Bayesian hierarchical model route.
0. When I started blogging, I made a conscious effort to be serious and focused on research. I didn’t want to be like the many bloggers out there who had academic credentials but really just mouthed off on current events. (I’m happy to provide my take on current events, but I try to add something new when doing so, not just giving my political opinions as if anyone should give a damn about what I happen to think is right and wrong about the world.) Anyway, I took a look at Bob’s blog and he does me one better: at Bob’s place, it’s all business, all the time. Talk about restraint!
1. For modeling parameters that sum to 1, I prefer the following sort of model. Suppose you are modeling a_1, a_2, a_3, a_4, where a_1+a_2+a_3+a_4=1. Then I’d assign a multivariate normal model to a new set of random variables b_1, b_2, b_3, b_4 and define a_j = exp(b_j) / (exp(b_1)+exp(b_2)+exp(b_3)+exp(b_4)), for j=1,2,3,4. This model has extra parameters that give you more flexibility compared to the Dirichlet. There’s also a slight nonidentifiability (you can add a constant to all four b_j’s without changing the model), but that’s no big deal; you’re only using the b’s as a way to model the a’s.
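To make the construction concrete, here is a minimal Python sketch of it (standard library only). The mean vector and the Cholesky factor `chol` are made-up illustrative values, and the helper names `softmax` and `draw_mvn` are mine, not from any particular package. It draws the b's from a multivariate normal, maps them to the a's, and checks that adding a constant to every b_j leaves the a's unchanged.

```python
import math
import random


def softmax(b):
    """Map real-valued b_1..b_K to probabilities a_1..a_K that sum to 1."""
    m = max(b)  # subtracting the max avoids overflow and does not change the result
    e = [math.exp(x - m) for x in b]
    total = sum(e)
    return [x / total for x in e]


def draw_mvn(mean, chol, rng):
    """Draw b ~ N(mean, L L^T), where chol is the lower-triangular factor L."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [
        m + sum(chol[i][j] * z[j] for j in range(i + 1))
        for i, m in enumerate(mean)
    ]


rng = random.Random(0)
mean = [0.0, 0.0, 0.0, 0.0]  # assumed prior mean for the b's
# Assumed Cholesky factor: b_1 and b_2 are positively correlated (corr 0.8),
# which is the kind of covariance structure a Dirichlet cannot express.
chol = [
    [1.0, 0.0, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

b = draw_mvn(mean, chol, rng)
a = softmax(b)

# The nonidentifiability: shifting every b_j by a constant gives the same a's.
shifted = softmax([x + 3.0 for x in b])

print("b =", b)
print("a =", a, "sum =", sum(a))
```

In a real model the mean and covariance of the b's would themselves be estimated (or given priors); one common way to remove the additive nonidentifiability is to fix one b_j at 0.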
2. Yeah, I’m not so thrilled with the prior distribution in Section 5.2 of BDA. I was trying too hard to come up with a natural noninformative distribution. The solution I came up with was reasonable, and it was moderately clever, but now I lean toward more brute-force approaches, and also I prefer weakly-informative priors. Here’s what I’d do if I were rewriting this chapter today:
The challenge is to set a prior for the hyperparameters (a, b) for the Beta (a, b) population distribution on the probabilities of tumor in a bunch of rat experiments. In the book, we first transformed to (a/(a+b), 1/sqrt(a+b)). This makes sense: a/(a+b) is the expected value and 1/sqrt(a+b) is close to the standard deviation of the population distribution. I think a Uniform (0,1) distribution on a/(a+b) is reasonable, but I’m no longer such a fan of that Uniform (0, infinity) distribution on 1/sqrt(a+b), which we used in the book. Instead, I’d probably prefer something like a half-Cauchy (0, 1). I don’t think it would make much difference in this example, but I’m moving away from the whole “noninformative” thing.
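As a rough illustration of what this prior implies, the following Python sketch (standard library only) draws the mean a/(a+b) from Uniform(0,1) and 1/sqrt(a+b) from a half-Cauchy with scale 1, then maps each draw back to (a, b). The function names and the choice of 1000 draws are my own assumptions for the example; this is prior simulation only, not a fitted model.

```python
import math
import random


def draw_hyperparameters(rng, scale=1.0):
    """Draw (mu, sigma, a, b) with mu = a/(a+b) ~ Uniform(0,1) and
    sigma = 1/sqrt(a+b) ~ half-Cauchy(0, scale)."""
    while True:
        mu = rng.random()
        # Inverse-CDF draw from a Cauchy, folded to the positive half-line.
        sigma = abs(scale * math.tan(math.pi * (rng.random() - 0.5)))
        if 0.0 < mu < 1.0 and sigma > 0.0:  # guard against boundary draws
            break
    total = 1.0 / sigma**2  # this is a + b
    return mu, sigma, mu * total, (1.0 - mu) * total


def beta_sd(a, b):
    """Exact standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))


rng = random.Random(1)
draws = [draw_hyperparameters(rng) for _ in range(1000)]

# Show a few draws with the implied population standard deviation.
for mu, sigma, a, b in draws[:5]:
    print(f"mu={mu:.3f} sigma={sigma:.3f} a={a:.3f} b={b:.3f} sd={beta_sd(a, b):.3f}")

# The half-Cauchy(0, 1) has median 1, so about half the draws have a+b > 1.
frac_small_sigma = sum(1 for _, sigma, _, _ in draws if sigma < 1.0) / len(draws)
print("fraction with 1/sqrt(a+b) < 1:", frac_small_sigma)
```

Compared with a flat prior on 1/sqrt(a+b), the half-Cauchy puts most of its mass at moderate values while its heavy tail still allows very small a+b.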