Perhaps that sentence would be better phrased “In that case, a prior of N(0,1) could be considered inappropriate, in that it puts most of its mass on parameter values that are unrealistically large in absolute value.”

That’s true. In fact, it seems highly underrepresented in the scientific journals.

Justin

Yes, the terminology is ambiguous. N(0,1) is too weak in that example, but in another sense it’s too strong a prior, in that it leads to strong claims about large values of theta.

Ignoring the ungrammatical first sentence, the second and third sentences don’t make sense to me. Isn’t the N(0,1) prior here too weak, not too strong? What am I missing?

It’s easy to see the difference in made-up examples, though I’m not sure whether they are so made-up that they don’t inform us about real, practical situations. But here we go in any case:

Consider a sort of “noiseless detection” scenario. Whenever the signal S is below a certain threshold, the detector reports 0; whenever S is above the threshold, the detector reports 1. At least that’s our model!

Now, if we have a set of observations like this…

S    R
0.1  0
1.2  0
3.4  0
5.7  1
8.9  1

…the likelihood will crop the posterior distribution somewhere between 3.4 and 5.7, with the exact shape depending on the prior. However, it is easy to see that if we introduce a single outlying value like this…

S    R
0.1  0
1.2  0
3.4  0
5.7  1
8.9  1
9.2  0   !!!OUTLIER ALERT!!!!

…the whole thing breaks down, since the likelihood is not able to incorporate the outlier. It does not matter what we do with the prior. The situation can be fixed only by modifying the likelihood, e.g. by saying that there’s a 0.02 chance of a 1 when the signal is below the threshold and a 0.98 chance when it is above.
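A small sketch of the point above (my own illustration, not from the thread; the 0–10 threshold grid is an arbitrary choice): under the deterministic detector model, the likelihood is exactly zero for every threshold once the outlier is in the data, while the 0.02/0.98 error model keeps it positive.

```python
# Data: (signal, response) pairs from the example, including the 9.2 -> 0 outlier.
data = [(0.1, 0), (1.2, 0), (3.4, 0), (5.7, 1), (8.9, 1), (9.2, 0)]

def lik_deterministic(theta):
    """P(data | theta) when the response is exactly 1 iff signal > theta."""
    L = 1.0
    for s, r in data:
        predicted = 1 if s > theta else 0
        L *= 1.0 if r == predicted else 0.0
    return L

def lik_robust(theta, eps=0.02):
    """P(data | theta) with P(response = 1) = eps below threshold, 1 - eps above."""
    L = 1.0
    for s, r in data:
        p1 = 1 - eps if s > theta else eps
        L *= p1 if r == 1 else 1 - p1
    return L

# No threshold can explain both (8.9, 1) and (9.2, 0) under the deterministic
# model, so the likelihood is zero everywhere; the robust version is not.
thetas = [x / 10 for x in range(0, 101)]
print(max(lik_deterministic(t) for t in thetas))  # 0.0
print(max(lik_robust(t) for t in thetas) > 0)     # True
```

Since the deterministic likelihood is identically zero, no prior on theta can rescue the posterior; only the modified likelihood can.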

A more realistic example would be logistic regression, as was the case in the OP, but I’m not good enough to work out the maths in this sort of simple form. But in my experience similar problems can arise: the expected probabilities go to 0 or 1 (due to imprecision in doubles, I reckon), but there’s an outlying observation that ruins the likelihood for the whole set of thetas. But that’s a different story.
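The double-precision effect mentioned above is easy to check (my illustration; the logit of 40 is just an arbitrary large value): once the logit is large enough, the fitted probability rounds to exactly 1.0, so a single 0-response at that point sends the log-likelihood to minus infinity.

```python
import math

# exp(-40) ~ 4e-18 is below the double-precision epsilon (~2.2e-16),
# so 1 + exp(-40) rounds to 1.0 and the fitted probability is exactly 1.
p = 1.0 / (1.0 + math.exp(-40.0))
print(p == 1.0)  # True

# An outlying 0-response at this point contributes log(1 - p) = log(0):
loglik_outlier = math.log1p(-p) if p < 1.0 else float("-inf")
print(loglik_outlier)  # -inf
```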

The problem is that people don’t analyze all their data at once. When you’re only analyzing a small amount of data, your data can be consistent with all sorts of parameter values that make no sense. And if you just let your data go and overfit, you can end up with ridiculous inferences and wrong scientific conclusions.

The data would tell you if the coefficients cannot be like 5 or 10 or 100, no? I’d worry about situations where we think it cannot be 5 for whatever reason, and it turns out it can be 5 but the prior we used disallowed that.

Justin

jd: a quick web search finds a legal free download of O’Hagan (1979). By the way, the Google Scholar button makes it very easy to find downloadable files.

jd: “Since the coefficients are on the log scale”: I guess there is a typo here, but it is a valid point that O’Hagan (1979) is not directly applicable here. Andrew’s response is what I would have said, too.

We’re doing more research on priors and you can expect that there will be changes in the recommendations later this year.

However, here we are talking about prior distributions for _parameters_? That’s a different issue, no?

In logistic regression the model for the observations is the binomial distribution. Problems arise if there are, e.g., negative responses when the probability for them is essentially zero (for some set of parameters). Accommodating these outliers is not done by giving _the parameters_ more uninformative priors; they are accommodated by, e.g., mixing the binomial distribution with a uniform distribution. This bounds the lowest and highest probabilities away from 0 and 1 in such a way that a single outlying response won’t collapse the log-probability for that set of parameters to log(0).
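The mixture idea above can be sketched like this (my notation; the 0.02 contamination weight is an assumed value, echoing the earlier comment): mixing the Bernoulli likelihood with a uniform distribution over {0, 1} keeps every response probability strictly inside (0, 1), so no single outlier drives the log-likelihood to minus infinity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def robust_bernoulli_loglik(y, logit, eps=0.02):
    """log P(y | logit) under (1 - eps) * Bernoulli(sigmoid(logit)) + eps * Uniform{0, 1}.

    The mixture bounds p into [eps/2, 1 - eps/2], so log(p) and log(1 - p)
    are always finite.
    """
    p = (1 - eps) * sigmoid(logit) + eps * 0.5
    return math.log(p) if y == 1 else math.log(1 - p)

# An outlying y = 0 at a huge logit: the plain Bernoulli model would give
# log(0) = -inf, the mixture gives a finite (if small) log-probability.
print(robust_bernoulli_loglik(0, 40.0))  # ~ log(0.01), about -4.6
```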

I’m not sure, because now we do recommend normal(0, 1) or normal(0, 2.5) as default weakly informative priors. Perhaps the issue here is that we want our models to be on unit scale, so we just would not expect to have coefficients like 5 or 10 or 100, etc. If your problem might not be on unit scale, then it would make sense to have another prior on the scale. That prior would be weak on the log scale. Taking a scaled normal(0, 1) prior and then averaging over the uncertainty in the scale will give you something like an unscaled t prior. I guess that’s the right way of thinking about it. I’ll add this to the wiki.
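A quick simulation of that last point (my own sketch; the choice nu = 3 and the sample size are assumptions, not from the comment): with an inverse-gamma(nu/2, nu/2) prior on the variance of a normal coefficient prior, averaging over the scale uncertainty gives exactly a Student-t with nu degrees of freedom, whose tails are far heavier than normal(0, 1).

```python
import math
import random

random.seed(0)
nu = 3
n = 50_000

def draw_scale_mixture():
    # variance ~ inverse-gamma(nu/2, nu/2): the reciprocal of a
    # gamma(nu/2, rate = nu/2) draw; then theta | variance ~ normal(0, variance).
    var = (nu / 2) / random.gammavariate(nu / 2, 1.0)
    return random.gauss(0.0, math.sqrt(var))

samples = [draw_scale_mixture() for _ in range(n)]
tail = sum(abs(x) > 3 for x in samples) / n

# For a plain normal(0, 1), P(|theta| > 3) ~ 0.0027; for t_3 it is ~ 0.058,
# so the scale-mixture prior is far more permissive of large coefficients.
print(tail)
```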

(The sound is missing, so I just flipped through it. Hope it’s not too out of date.)

The article is behind a paywall. Could you explain why something like normal(0,5) or even normal(0,2.5) wouldn’t be weakly informative for logistic regression coefficient parameters? Since the coefficients are on the log scale, these would seem like rather weak priors, no? A coefficient of 5 seems rather large (or -5 rather small).
