Prior distribution for a predicted probability

I received the following email:

I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.”

Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate.

So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.)

However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probability in a logistic regression.

Do you have any ideas or suggestions on how to proceed?

I wrote back:

Hi, I assume I can blog your question and my reply?

To which he replied:

If possible, it might be nice to keep out the part about ** models. I might want to keep that part quiet until I have a paper ready to publish. Perhaps just blog about the idea of a prior probability leading into a logistic regression. (Maybe the epidemiological example?) Is that possible/OK?

OK, OK, . . . in that case, here’s my advice: you can put in a prior distribution for a predicted probability in two ways. The first way is to put the prior on the parameters in the model, and just solve for the hyperparameters that induce the predictive prior that you want. The solution process is iterative and stochastic; see this 1995 paper (see section 6.1 of that paper for an example of specifying a prior distribution). The second approach is to consider your prior as data, directly on the observation of interest. That is, the predictive probability that you’re working with is some function of the parameters of the model, and you just say that you have a prior mean and sd for that probability (or you can do it on the logit scale, whatever), and you throw that normal density into your posterior distribution. Easy enough to do as one line in Stan.
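To make the second approach concrete, here is a minimal sketch in plain Python rather than Stan. Everything in it is an assumption made for illustration: the toy data, the names `log_posterior` and `fit_mode`, the choice to put the normal term on the logit scale, and the use of simple gradient ascent to find the posterior mode instead of full sampling. The only idea taken from the text is that the log posterior is the usual logistic log likelihood plus one normal log-density term on the predicted probability at the point of interest.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def log_posterior(beta, X, y, x_new, prior_p, prior_logit_sd):
    """Bernoulli log likelihood plus a normal term on logit(p_new)."""
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = dot(beta, xi)
        loglik += yi * eta - math.log1p(math.exp(eta))
    # The "prior as data" term: logit(p_new) ~ normal(logit(prior_p), sd).
    z = (dot(beta, x_new) - logit(prior_p)) / prior_logit_sd
    return loglik - 0.5 * z * z

def fit_mode(X, y, x_new, prior_p, prior_logit_sd, step=0.005, iters=20000):
    """Posterior mode by fixed-step gradient ascent (adequate for toy data)."""
    beta = [0.0] * len(x_new)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        # Gradient of the log likelihood: sum of (y - p) * x.
        for xi, yi in zip(X, y):
            resid = yi - inv_logit(dot(beta, xi))
            for j, v in enumerate(xi):
                grad[j] += resid * v
        # Gradient of the normal term on logit(p_new).
        pull = (dot(beta, x_new) - logit(prior_p)) / prior_logit_sd ** 2
        for j, v in enumerate(x_new):
            grad[j] -= pull * v
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: an intercept column and one predictor.
X = [[1.0, -2.0], [1.0, -1.5], [1.0, -1.0], [1.0, 0.0],
     [1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
y = [0, 0, 0, 1, 1, 0, 1, 1]
x_new = [1.0, 0.0]  # the case whose probability has a prior guess of 0.2

beta_strong = fit_mode(X, y, x_new, prior_p=0.2, prior_logit_sd=0.1)
beta_weak = fit_mode(X, y, x_new, prior_p=0.2, prior_logit_sd=10.0)
p_strong = inv_logit(dot(beta_strong, x_new))
p_weak = inv_logit(dot(beta_weak, x_new))
print(p_strong, p_weak)
```

With a small sd the fitted probability at `x_new` stays near the prior guess; with a large sd the data dominate. In a real analysis you would sample from this posterior rather than report only the mode.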

12 thoughts on “Prior distribution for a predicted probability”

  1. How about using a beta distribution centered at the public’s opinion?

    However, I tend to find this a bit troublesome: ‘Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.’ While this is statistically totally ok, most people would expect to see what kind of beliefs the data create, not how the data update prior beliefs. In other words, I think that the non-statistical audience would like to see a more or less objective Bayesian analysis, or worse still, they expect the analysis to be objective and do not realize that the results can be driven by previous studies.

    Secondly, if you think of the further use of your results, it might be advisable to be as non-informative and objective as possible. Suppose somebody tries to include the results in a meta-analysis of that epidemiological topic. If the model has been constructed by using prior studies as a prior, the paper needs to be excluded because it is not an independent sample.

    I do acknowledge the mathematical beauty of statistical learning, updating and filtering. It’s only that I think that the world is not ready for such methods. Ideally you would have a Big Bayes machine ™ for each type of problem, and every time someone collected data on the problem, they would upload it to the machine, which would calculate the updated posterior in real time, and this would serve as the Official Truth on that problem.

    • MK:

      You write: “if you think of the further use of your results, it might be advisable to be as non-informative and objective as possible.”

      It depends. Data analyses in scientific papers are typically presented, simultaneously, as a summary of the particular data at hand, and as information to be used to directly address scientific theories. It might be better if researchers did these using separate summaries: a data-only summary to convey the information that they have added with their experiment, and a Bayesian analysis that will more directly address the scientific questions. Otherwise you have things like, “We found X which is statistically significant, therefore we believe X and this tells us something about theories A and B,” which can be hugely misleading in settings where prior information is strong and the data in the local study are weak.

      • Strongly in favor of raw data summaries, based on these two replies. It certainly makes sense to make plots and tables of data without any likelihood at all. Yet I still find subjective Bayesianism somewhat problematic, or maybe I’m underestimating the intended audience of scientific papers. My concern is that they do not realize the effect that the prior (or the sampling model, for that matter) has on the results. Yet I expect everybody to understand that a p-value of 0.05 in a sample of 20 does not tell us much at all.

    • MK:

      I think it is a mistake to think everyone in the scientific community _should_ have the same joint model – split out into prior and likelihood however you wish – I believe Don Berry makes this point nicely here.

      RA Fisher thought he knew how to get a (sufficient) data-only summary of what an individual study adds to other studies (past and future) – called the likelihood _function_ – but for one thing that assumes you know the data generating model for sure and would never change it as you learned about/from the other studies.

    • “Wisdom of the crowds” makes me think it’s something you can bet on (either through bookmakers or other exchanges), like sports, elections, credit defaults, etc.

  2. Or you can make up data. By which I mean, generate pseudo-data that gives the right prior distribution, then feed this to your logistic regression estimator with the actual observed data. This implicitly puts the right prior on the parameters.
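The pseudo-data idea in the last comment can be sketched as follows. This is only an illustration under stated assumptions: the toy data, the helper names `add_pseudo_data` and `fit_weighted_logistic`, and the specific choice of two fractionally weighted pseudo-observations at the point of interest are all made up here. The parameter `prior_n` is a hypothetical "prior sample size" controlling how strongly the pseudo-data pull the estimate; the added term has the shape of a beta density in the predicted probability, with its mode at `prior_p`.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def add_pseudo_data(X, y, x_new, prior_p, prior_n):
    """Append weighted pseudo-observations at x_new encoding the prior guess.

    prior_n * prior_p pseudo-successes and prior_n * (1 - prior_p)
    pseudo-failures; real observations keep weight 1.
    """
    X_aug = list(X) + [x_new, x_new]
    y_aug = list(y) + [1, 0]
    w_aug = [1.0] * len(y) + [prior_n * prior_p, prior_n * (1.0 - prior_p)]
    return X_aug, y_aug, w_aug

def fit_weighted_logistic(X, y, w, step=0.005, iters=20000):
    """Weighted logistic regression by fixed-step gradient ascent (toy scale)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            resid = wi * (yi - inv_logit(dot(beta, xi)))
            for j, v in enumerate(xi):
                grad[j] += resid * v
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: an intercept column and one predictor.
X = [[1.0, -2.0], [1.0, -1.5], [1.0, -1.0], [1.0, 0.0],
     [1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
y = [0, 0, 0, 1, 1, 0, 1, 1]
x_new = [1.0, 0.0]  # the case whose probability has a prior guess of 0.2

def predicted_p(prior_n):
    Xa, ya, wa = add_pseudo_data(X, y, x_new, prior_p=0.2, prior_n=prior_n)
    return inv_logit(dot(fit_weighted_logistic(Xa, ya, wa), x_new))

p_no_prior = predicted_p(0.0)      # pseudo-data have zero weight
p_strong_prior = predicted_p(400.0)
print(p_no_prior, p_strong_prior)
```

With `prior_n` at zero this reduces to the ordinary maximum likelihood fit; as `prior_n` grows, the fitted probability at `x_new` is pulled toward the prior guess.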
