The idea that you don’t believe that theta is greater than 0 because “it is consistent with noise” really seems to be the fallacy that failing to reject the null implies the null is true.

Either you really doubt the prior (okay, that’s the point of this post…), but in a way that puts a spike at 0, since I assume you would say the same thing if we observed y = -1 (something I don’t think you like to do, given your other posts); or you really doubt the N(theta, 1) distribution of the data (a valid doubt, but not a doubt about the prior!).

I get that it’s evidence that maybe a flat prior is saying something… except I would guess that an informative prior in this problem would be something like N(0, 10), which would give a nearly identical answer!
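Here is a quick check of that in R. It reads N(0, 10) as mean 0 and standard deviation 10, which is my assumption. It uses the standard conjugate normal update for a single observation y = 1 with known sd 1 and compares P(theta > 0 | y) under the flat prior and under N(0, 10).

```r
# Conjugate normal update for one observation y ~ N(theta, sigma^2)
# with a N(prior_mean, prior_sd^2) prior on theta.
posterior_normal <- function(y, sigma, prior_mean, prior_sd) {
  prec <- 1 / prior_sd^2 + 1 / sigma^2                          # posterior precision
  post_mean <- (prior_mean / prior_sd^2 + y / sigma^2) / prec   # precision-weighted mean
  list(mean = post_mean, sd = sqrt(1 / prec))
}

y <- 1
sigma <- 1

# Flat prior: theta | y ~ N(y, sigma^2)
p_flat <- pnorm(0, mean = y, sd = sigma, lower.tail = FALSE)

# N(0, 10) prior, read here as mean 0 and standard deviation 10 (an assumption)
post <- posterior_normal(y, sigma, prior_mean = 0, prior_sd = 10)
p_weak <- pnorm(0, mean = post$mean, sd = post$sd, lower.tail = FALSE)

round(c(flat = p_flat, weak = p_weak), 3)   # 0.841 and 0.840
```

With an sd of 10 the prior precision (0.01) is tiny next to the data precision (1), so the two posteriors barely differ.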

You are talking about the statistical interpretation of a given measurement, and one could place that interpretation in a number of different (and unstated) contexts. In *some* of those contexts the interpretation would be correct and uncontroversial. Because you wouldn’t ever write about something that *was* correct and uncontroversial, I have to imagine some other context in which an interpretation based on an uninformative prior would be dangerous or controversial.

The sentence “Most of the published studies I’ve seen that have featured statistically significant p-values do not look like this” gives the game away. Despite appearances, this isn’t a post about measurement errors. It’s another post in the ongoing series about p-values, or at least about mistakes made by people who habitually use p-values, or who would do so if they could still get away with it. And the context is measuring alleged effects for which the default prior (one might almost say the null hypothesis) is that their value is in fact very close to zero?

OK, so example 3 was of this sort, but example 4 doesn’t look like mutant frequentism to me; on the face of it, it’s a type S problem.

W_beta = {x | P(x) > beta}

Choose beta so that W_beta contains almost all of the mass, say 99% of it.
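The W_beta construction is a highest-density region. Below is a rough grid-based sketch of how beta can be picked. The N(1, 1) density, which is the flat-prior posterior for y = 1, and the helper name hdr_threshold are my own assumptions for illustration. The sketch sorts grid points by density, accumulates mass until it reaches 99%, and reads off beta as the density at that point.

```r
# Highest-density region W_beta = {x : p(x) > beta}, with beta chosen so that
# W_beta holds a target share of the mass. Grid approximation only.
hdr_threshold <- function(dens_fn, grid, mass = 0.99) {
  dx <- grid[2] - grid[1]                    # assumes an evenly spaced grid
  d <- dens_fn(grid)
  ord <- order(d, decreasing = TRUE)         # densest grid points first
  cum_mass <- cumsum(d[ord]) * dx            # mass collected so far
  k <- which(cum_mass >= mass)[1]            # first point reaching the target
  beta <- d[ord][k]
  list(beta = beta, region = range(grid[d >= beta]))
}

grid <- seq(-10, 12, by = 0.001)
res <- hdr_threshold(function(x) dnorm(x, mean = 1, sd = 1), grid, mass = 0.99)
res$region   # roughly 1 - 2.576 and 1 + 2.576
```

For a unimodal, symmetric density this recovers the usual central interval. The construction only differs from an equal-tailed interval when the density is skewed or multimodal.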

A: “theta_true is in [-100,100]”

or

B: “theta_true is less than -5 or greater than 5”

Statement A is true while B is false. That’s what makes my prior better than yours.

Doesn’t look like such a good method (or answer) now, does it?

I agree that there are settings where an (approximately) uniform prior distribution makes sense. These settings are those in which the data are much stronger than the prior. Your example of a precise temperature measurement and very weak prior distribution (merely the statement that a measurement was performed in a particular country) is one such example. Most of the published studies I’ve seen that have featured statistically significant p-values do not look like this. To put it another way, if a result such as p<0.05 is considered newsworthy, this already implies (in some sense) a strong prior centered around zero, so that it is considered something of a surprise for the measurement to be far from zero. But, yes, in regard to your comment, there are definitely settings where inferences from the noninformative prior are reasonable. In my post, I was focusing (implicitly) on the more controversial settings.

Would you say that my example of measuring the temperature at a mystery location in Canada is one of these typical cases?

If not, what would be a typical case where the inference is wrong?

And isn’t it deeply ironic that the example can, apparently, only be understood if the reader already possesses a lot of contextual information which is not given in the post?

The problem comes back to this issue of mixing definitions of probability. The non-informative prior was chosen to reflect a state of knowledge, so we can’t suddenly switch to interpreting the posterior probability as a frequency. For the posterior probability to work as a frequency, the prior has to be calibrated to the base rates of the theta values.
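To make the calibration point concrete, here is a small simulation sketch. Both base-rate distributions for theta are assumptions chosen for illustration: a wide Uniform(-100, 100) and a narrow N(0, 0.5^2). In each case we draw y ~ N(theta, 1), keep the draws with y near 1, and count how often theta > 0. The long-run frequency only matches the flat-prior 0.84 when the base rates actually look like the prior.

```r
set.seed(1)
n <- 1e6

# Fraction of draws with theta > 0 among those whose y lands near y_obs
frac_positive_near <- function(theta, y_obs = 1, width = 0.1) {
  y <- rnorm(length(theta), mean = theta, sd = 1)   # y ~ N(theta, 1)
  near <- abs(y - y_obs) < width                    # condition on y close to 1
  mean(theta[near] > 0)
}

# Base rates close to the (nearly) flat prior: frequency is close to 0.84
f_wide <- frac_positive_near(runif(n, min = -100, max = 100))

# Base rates concentrated near zero, theta ~ N(0, 0.5^2): the exact posterior
# is N(0.2, 0.2), so the frequency is close to pnorm(0.2 / sqrt(0.2)), about 0.67
f_narrow <- frac_positive_near(rnorm(n, mean = 0, sd = 0.5))

c(wide = f_wide, narrow = f_narrow)
```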

Suppose you wanted to measure the temperature at some time and place and had no prior information except that the place was in Canada. Your observation is 1 degree above zero, with a standard uncertainty of 1 degree. How certain are you that it’s really above zero? I would be fairly certain; the roughly 5-to-1 odds seem reasonable to me.
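For reference, here is where the roughly 5-to-1 figure comes from. Under a flat prior, the posterior for a single reading y = 1 with standard uncertainty 1 is N(1, 1). That makes P(theta > 0 | y) = pnorm(1), and the odds follow directly.

```r
# Flat-prior posterior for one reading y = 1 with standard uncertainty 1:
# theta | y ~ N(1, 1), so P(theta > 0 | y) = pnorm(1).
p_above <- pnorm(1)
odds <- p_above / (1 - p_above)
round(c(prob = p_above, odds = odds), 2)   # prob 0.84, odds about 5.3
```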

Of course it depends on the context. Depending on the scaling of the problem, an effect of 100 could make sense. I try to scale things so that effects are of order of magnitude 1. For example, in logistic regression you’re not going to see an effect of 100; similarly, in econ you’re not going to see an elasticity of 100 if you’re working on the log-log scale.
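A rough illustration of the logistic-regression point: on the logit scale, a shift of 5 already moves a 50% probability above 99%. So a coefficient of 100 on a predictor scaled to order 1 would imply essentially deterministic outcomes. The particular shifts below are arbitrary choices for illustration.

```r
# Inverse logit of 0 + shift: how far a logit-scale effect moves a 50% baseline
shifts <- c(1, 5, 100)
probs <- plogis(shifts)
round(probs, 4)
```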

With regard to your last point, I wouldn’t frame this as “second-guessing someone’s prior.” A better way to put it would be that people use conventional models that include much less information than is actually known. Such conventional models include linear regressions etc. as well as uniform prior distributions. If data are strong, you can often do just fine with conventional models. But if data are sparse, it can often make sense to go back and add some real information to your model, in order to better answer your scientific questions.

To put it another way, an analysis based on a conventional model can (sometimes) tell you what’s in the data. But scientific reports typically don’t just report information in data, they also make general claims about the world, and for that it can be a terrible mistake to ignore strong information that is already known.

I’m referring to theta=0 as “pure noise” in the sense that, in this simple example, we can write the model as y = theta + epsilon, where epsilon is an independent error term. Here, theta is the signal and epsilon is the noise. If theta=0, that’s pure noise. I have no deeper meaning than that.

Yes, that’s my point. A conventional or purportedly noninformative model can be a useful starting point but we have to be ready to move on if it gives implausible inferences.

I’m not disagreeing with your example, of course. The issue (as I see it) is the assumption that a uniform (or indeed any other) prior can represent “ignorance”.

Suppose that, in your presence, I draw t uniformly from [-100, 100], independently of everything else. We agreed on this procedure, so it’s effectively fixed. We then observe y = t + 1. Would you then feel that an 84% posterior probability that theta > t is “too high”?

Because (and especially if you say no) it sounds as though you want to second-guess someone’s prior on the basis of what subsequent questions they ask about the posterior. In practice, fair enough. In theory and philosophy, what a rathole.

```r
thetasim <- runif(n = 100000, min = -100, max = 100)  # instead of an infinite distribution, I’ll use uniform [-100,100]
ysim <- rnorm(n = 100000, mean = thetasim, sd = 1)
```

Now look at all of the theta values for which ysim was near 1. What fraction of these have theta > 0?

```r
yes1 <- round(ysim) == 1  # draws whose ysim rounds to 1, i.e. roughly 0.5 < ysim < 1.5
sum(thetasim[yes1] > 0) / sum(yes1)
```

For a particular set of random draws (the first and only one I’ve done), I got 0.82. If the theta parameter could be anywhere from -100 to 100, and you draw from y ~ N(theta,1), it really is a very good bet that theta > 0.
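One small note on the 0.82, based on my own calculation rather than anything in the post. The window round(ysim) == 1 selects y between about 0.5 and 1.5, not exactly y = 1. Under an effectively flat prior, P(theta > 0 | y) = pnorm(y), so the expected fraction is the average of pnorm over that window, about 0.83. With only about 500 draws landing in the window, 0.82 is well within simulation noise.

```r
# Under a flat prior, P(theta > 0 | y) = pnorm(y). The window has width 1 and y is
# (nearly) uniform within it, so the expected fraction is the integral of pnorm
# over (0.5, 1.5). Edge effects at +/-100 are negligible.
expected_frac <- integrate(pnorm, lower = 0.5, upper = 1.5)$value
expected_frac   # about 0.83

# Rough Monte Carlo standard error: about 100000 / 200 = 500 draws fall in the window
n_window <- 100000 / 200
se <- sqrt(expected_frac * (1 - expected_frac) / n_window)
se   # about 0.017
```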

You obviously know this, so… I guess I don’t get the point of that example, which you say is your new favorite! Perhaps you’re saying something about the real-world circumstances in which people use infinite uninformative priors. If they actually see a number that is near zero (anything with an absolute value below 10, maybe below 100 or 1000), then they should reconsider their prior. If the parameter value really could be “anything at all,” why is it so small? There’s probably a reason that we could figure out if we tried. Or something like that?

A: “theta’s in [-1,3]”

while a more informative prior says:

B: “theta’s in [-.1,.1]”

then if theta=0, both statements are correct. The latter is simply more informative. Since B implies A, it’s not possible to say the former is wrong while the latter is right. Why is this so hard for people to understand? I really don’t get it.

To play devil’s advocate, you might say that this prior is too informative because it assumes the variance is known, which is never true in practice. Any prior on the variance would alleviate this problem. If one used a Jeffreys prior on the mean and variance, then the posterior would still be improper after one observation.
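A short derivation of that last claim, assuming the usual independence Jeffreys prior p(mu, sigma^2) ∝ 1/sigma^2, which is equivalent to p(mu, sigma) ∝ 1/sigma. With a single observation y ~ N(mu, sigma^2), integrating out mu leaves a marginal posterior for sigma proportional to 1/sigma, and that does not integrate.

```latex
p(\mu, \sigma \mid y) \propto \frac{1}{\sigma} \cdot \frac{1}{\sigma}
  \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)

\int_{-\infty}^{\infty} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) d\mu = \sigma\sqrt{2\pi}
\quad\Longrightarrow\quad
p(\sigma \mid y) \propto \frac{\sqrt{2\pi}}{\sigma},
\qquad
\int_0^{\infty} \frac{d\sigma}{\sigma} = \infty
```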

Or to be more even-handed, you could say that strong assumptions in one part of the model can bleed into so-called non-informative priors for other parameters, rendering them highly informative.

I think that is the issue here: how does one _interpret_ posterior probabilities?

Obviously in the context of the appraised credibility of _the_ prior(s) and data model(s) used.

But Andrew seems to be pointing to the frailty of noisy data, even when the model is true by construction in a thought experiment. Perhaps he means it in the Rubinesque sense of frequency calculations that are relevant under repeated use (Rubin, 1984)?

(Perhaps something to work on over the weekend.)

Thanks! I will read those.

P.S. To clarify, I don’t think your models specifically are problematic. My concern was about using informative priors in subject areas where the priors are not strongly linked to data, and hence where there is a lot of flexibility and disagreement in the particular choice of priors.

Bottom line: non-informativeness is in the eye of the beholder. If there is a formulation of your problem that you are comfortable reasoning about, choose priors that best correspond to your state of knowledge (or ignorance) in that formulation/parameterization. But don’t expect these non-informative priors of yours to map to non-informative “folklore” priors in a different parameterization.
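A standard illustration of that last point, with the choice of example being mine: a prior that is uniform on a probability p is far from flat on the log-odds scale. logit(p) then follows a standard logistic distribution, which piles up near zero.

```r
# Illustrative assumption: p ~ Uniform(0, 1); look at the implied prior on logit(p).
set.seed(2)
p <- runif(1e5)
logit_p <- qlogis(p)       # log(p / (1 - p))

# logit(p) is standard logistic, so it is concentrated near zero:
share_near_zero <- mean(abs(logit_p) < 1)   # about plogis(1) - plogis(-1) = 0.46
share_far <- mean(abs(logit_p) > 5)         # about 2 * plogis(-5) = 0.013
```

Going the other way, a flat prior on logit(p) would be improper and would push most of its mass toward p = 0 and p = 1.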

I recommend that you (and others who think my models are “a can of worms”) read my recent AJPS paper with Yair and my forthcoming JRSS paper with Kenny (for details on two particular cases) and BDA (for more general principles).

What I think is that these are great cases where rich data exist to construct a good data-based prior. The applications of Bayesian reasoning that make me uncomfortable are ones in which researchers pull a fairly subjective prior out of a hat, and different researchers show little consensus about which prior is the right one. I feel that is a can of worms.

I suspect Andrew’s examples of voting, school vouchers, ethnicity, etc. are in that category. I may be wrong. Perhaps there are obvious, uncontroversial priors there?

E.g., if it were human babies, an n=3 study of this nature would probably be silly.

I’m unclear on why you are labeling theta=0 as “pure noise”, but it suggests that you have some concrete examples in your head. Is it perhaps the case that they are of the type where you have strong reason to expect that theta is close to zero (e.g. theta represents some effect that you expect may well be negligible)? Would you make the same claim if theta were, say, a temperature reading?

But _your_ frequentist question has a case where there is a real (and thus acceptable-to-him) distribution over theta. By your account, he asks the Bayes-rule-driven question: what is the probability that theta > 1, conditioned on his data? He’s probably feeling lucky, because that’s not his everyday case (and in his everyday case, he is NOT going to make up a distribution). But given that he has this distribution, where do he and the Bayesian collide? His distribution might not match your [-100,100] prior, but then two Bayesians might disagree too. I just don’t see what nonsense one can generate vis-à-vis _this_ Frequentist Question.

Sorry for missing your point.

This was an e-commerce application; unfortunately I can’t go into the details.

In other words, your complaint is relevant in a certain context where everyone already knows implicitly that there’s a special value 0 and your priors really should be taking this into account.

Yes. To put it another way, the very fact that zero is being used as a comparison point (with statements such as, “the estimate is only 1 standard error from 0”) typically implies a prior distribution in which zero plays a prominent role.
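As a rough sketch of how much a prominent role for zero can matter, consider a spike-and-slab prior. It puts probability pi0 on theta = 0 exactly and a N(0, tau^2) slab elsewhere. The values pi0 = 0.5 and tau = 1 are hypothetical, chosen only for illustration. With y = 1 and sd 1, P(theta > 0 | y) drops from 0.84 under the flat prior to roughly 0.36.

```r
# P(theta > 0 | y) under a hypothetical spike-and-slab prior:
# P(theta = 0) = pi0, otherwise theta ~ N(0, tau^2); data y ~ N(theta, sigma^2).
spike_slab_p_positive <- function(y, sigma = 1, pi0 = 0.5, tau = 1) {
  m_spike <- dnorm(y, mean = 0, sd = sigma)                   # marginal of y if theta = 0
  m_slab <- dnorm(y, mean = 0, sd = sqrt(sigma^2 + tau^2))    # marginal of y under the slab
  w_slab <- (1 - pi0) * m_slab / (pi0 * m_spike + (1 - pi0) * m_slab)
  prec <- 1 / tau^2 + 1 / sigma^2                             # slab posterior precision
  post_mean <- (y / sigma^2) / prec
  # The spike contributes nothing to P(theta > 0)
  w_slab * pnorm(0, mean = post_mean, sd = sqrt(1 / prec), lower.tail = FALSE)
}

p_spike <- spike_slab_p_positive(y = 1)   # about 0.36
```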

So Andrew’s more or less saying that, in that context, we should often be using prior information to keep our inferences from being so sensitive to small, noisy data sets.

I just don’t believe that P(theta>0|y)=0.84. To put it another way, I don’t think that, if the study were repeated with a huge sample size, there’s an 84% chance the result would go in the same direction. 5:1 odds seem too strong to me for a pattern that could easily have occurred by chance.

So you think the data’s consistent with theta=0 and the posterior thinks the evidence is consistent with theta=0. However, the evidence is also consistent with other values of theta, which the posterior also takes into consideration. It’s a mystery why you think that’s a bad thing.

http://xianblog.wordpress.com/2013/11/21/hidden-dangers-of-noninformative-priors/

Basically, we use strong priors to combine different types of data about the same thing. And we never have the data we wish we had, just the data that we happen to have. So even a little bit of information in a prior can help a lot. The thing to note about these examples is that the prior is “strong” only in particular regions. It mainly serves to jointly truncate the radiocarbon estimates.
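A toy sketch of a prior that is strong only in one region is below. It is not the radiocarbon model itself, whose details aren't given here. Take two noisy measurements whose true values must satisfy a hypothetical ordering constraint theta1 <= theta2 (say, from stratigraphy), with a flat prior otherwise. The constraint just truncates the joint posterior, which this sketch samples by rejection. The measurement values are made up.

```r
set.seed(3)
y <- c(1.0, 0.5)       # hypothetical measurements, out of the required order
sigma <- 1
n <- 1e5

theta1 <- rnorm(n, mean = y[1], sd = sigma)   # unconstrained flat-prior posterior
theta2 <- rnorm(n, mean = y[2], sd = sigma)
keep <- theta1 <= theta2                      # apply the ordering constraint

post_means <- c(theta1 = mean(theta1[keep]), theta2 = mean(theta2[keep]))
post_means   # theta1 pulled below y[1], theta2 pushed above y[2]
```

Where the data already satisfy the constraint, the truncation barely matters. It only bites in the region of parameter space the constraint rules out.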
