In Bayesian inference, do people cheat by rigging the prior?

Ulrich Atz writes in with a question:

A newcomer to Bayesian inference may argue that priors seem sooo subjective and can lead to any answer. There are many counter-arguments (e.g., it’s easier to cheat in other ways), but are there any pithy examples where scientists have abused the prior to get to the result they wanted? And if not, can we rely on this absence of evidence as evidence of absence?

I don’t know. It certainly could be possible to rig an analysis using a prior distribution, just as you can rig an analysis using data coding or exclusion rules, or by playing around with what variables are included in a least-squares regression. I don’t recall ever actually seeing this sort of cheatin’ Bayes, but maybe that’s just because Bayesian methods are not so commonly used.

I’d like to believe that in practice it’s harder to cheat using Bayesian methods because Bayesian methods are more transparent. If you cheat (or inadvertently cheat using forking paths) with data exclusion, coding, or subsetting, or setting up coefficients in a least squares regression, or deciding which “marginally significant” results to report, that can slip under the radar. But the prior distribution—that’s something everyone will notice. I could well imagine that the greater scrutiny attached to Bayesian methods makes it harder to cheat, at least in the obvious way by using a loaded prior.
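
To see what a loaded prior can actually do, here is a minimal sketch (all numbers made up) of a conjugate normal model in which a tight prior centered on a desired effect overwhelms twenty data points that show no effect at all; the prior doing the rigging is exactly the thing that has to be written down:

```python
# Minimal sketch (made-up numbers): a conjugate normal model with known sigma,
# fit once with a weakly informative prior and once with a "loaded" prior.
import numpy as np

def posterior_mean_sd(y, sigma, prior_mean, prior_sd):
    """Conjugate update for the mean of a normal distribution with known sigma."""
    n = len(y)
    precision = 1 / prior_sd**2 + n / sigma**2
    post_mean = (prior_mean / prior_sd**2 + n * np.mean(y) / sigma**2) / precision
    return post_mean, np.sqrt(1 / precision)

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=20)        # true effect is zero

for label, m0, s0 in [("weakly informative N(0, 1)", 0.0, 1.0),
                      ("loaded N(0.5, 0.05)       ", 0.5, 0.05)]:
    mean, sd = posterior_mean_sd(y, sigma=1.0, prior_mean=m0, prior_sd=s0)
    print(f"{label} -> posterior mean {mean:5.2f}, sd {sd:4.2f}")

# The loaded prior drags the posterior mean to roughly 0.5 no matter what these
# 20 observations say -- but that normal(0.5, 0.05) has to appear in the writeup.
```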

30 thoughts on “In Bayesian inference, do people cheat by rigging the prior?”

  1. >> But the prior distribution—that’s something everyone will notice.

    Objection, your honor. It is almost impossible to publish the prior in a medical journal, so nobody will notice. Therefore, reviewers are cautious: “Please focus on the results, and use objective methods without personal priors”.

    Please don’t kill the messenger… Should I argue, “we have tried different priors, and the results are insensitive to these within a large margin”? Reviewer: then why not use … ahem … objective methods.

    Sorry, I am sympathetic to your arguments, but life out in medical-reviewer hell makes one cynical. Journals with professional statistical reviewers are rare, and you cannot always publish in BMJ or Nature.

    • It’s interesting how much this varies from field to field. Reviewers in my area, hearing research, have not even blinked at the notion of subjective priors. The reviewers *do* regularly ask for binary results (‘significant’ or ‘not significant’), but I’ve never seen them care about prior subjectivity. On the other hand, our research group is one of the only ones in this entire field to use Bayesian methods. My guess is that the potentially “insidious” nature of subjective priors has not yet entered reviewers’ mindsets.

  2. > scientists have abused the prior to get to the result they wanted

    That is the point of the prior though — to get a regularized estimate that you wanted! If you aren’t doing this, you’re missing out.

    > But the prior distribution—that’s something everyone will notice

    No way. It’s fairly tricky to figure out what’s happening with the priors in things like brms and rstanarm — at least compared to how easy those packages are to use. And what does a horseshoe prior even mean? (A sketch of what it actually encodes appears after this thread.) Like, I go copy-paste from the paper, but I’m not usually trying to get deep into the details.

    • I think Andrew’s point is that you have to say what prior distribution you assumed, so if you cheat everyone can see it. You’re right that if you just skim right past that part of the paper, or just accept the prior without thinking about it, you won’t notice, so yeah, not everyone WILL see it. But everyone CAN see it. It would be a bold and risky way to cheat.

      • Yeah, I guess I’m exaggerating. Like, you can generate the code with brms and look up the docs for rstanarm, and it’s not too hard.

        And if you just don’t report the prior and someone asks and you don’t dig it up — that’s hardly the fault of Bayesian inference.

        I absolutely agree with “I’d like to believe that in practice it’s harder to cheat using Bayesian methods because Bayesian methods are more transparent.”
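
    Since the horseshoe came up above: a minimal sketch of what that prior encodes, written from its standard hierarchical definition (normal with half-Cauchy local and global scales) rather than from brms’s internal parameterization:

    ```python
    # What a horseshoe prior says, in its textbook hierarchical form:
    #   beta ~ N(0, tau^2 * lambda^2), lambda ~ half-Cauchy(0, 1), tau ~ half-Cauchy(0, 1)
    # (brms/rstanarm use regularized variants, so treat this only as the basic shape.)
    import numpy as np

    rng = np.random.default_rng(0)
    n_draws = 100_000
    tau = np.abs(rng.standard_cauchy(n_draws))    # global shrinkage scale
    lam = np.abs(rng.standard_cauchy(n_draws))    # local, per-coefficient scale
    beta = rng.normal(0.0, tau * lam)             # prior draws for one coefficient

    print("P(|beta| < 0.1) under the prior:", np.mean(np.abs(beta) < 0.1))
    print("P(|beta| > 10)  under the prior:", np.mean(np.abs(beta) > 10))
    # Heavy mass near zero *and* very fat tails: "most coefficients are negligible,
    # but a few can be large" -- which is why it shows up in sparse regression.
    ```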

    In my “Objections to Frequentism” (http://www.statisticool.com/objectionstofrequentism.htm) I note several cases where the prior gives ‘Drake equation’-like results, with the posterior ranging from basically 0 to 1 given the choice of prior. For example:

    -proofs of the existence of God, the resurrection of Jesus, and other miracles that rely on subjective Bayesian prior probabilities, by believers (prior and posterior around 1) and by skeptics (prior and posterior around 0)

    -the Jahn et al. ESP studies, about which they write, “Whereas a classical analysis returns results that depend only on the experimental design, Bayesian results range from confirmation of the classical analysis to complete refutation, depending on the choice of prior.”

    -In my “Plenary Session 2.06: Frequentist Response” (http://www.statisticool.com/plenarysession.htm), Harrell talks about a study. I wrote:

    “For one study they discussed, different priors changed the posterior drastically, the posterior probability ranging from .05 to .70 to .998 (let’s just say from 0 to 1). This is like a “Drake equation”, where even if the mathematics is correct (it is), the user can get just about any output, and hence decision, based on their inputs.”

    Anything can be gamed, and I don’t agree that using Bayesian methods would necessarily lower the amount of gaming. Bayesians have similar things to game: data editing, as well as prior hacking, MCMC settings, sensitivity-analysis choices, Bayes factor cutoffs, and ignoring multiplicity adjustments and stopping rules. Ultimately the people engaging in QRPs are at fault, not the methods. On non-NHST research, Elise Gould said, “Non-NHST research is just as susceptible to QRPs as NHST.”

    I don’t believe for a second that a large percentage of statisticians, of any stripe, are engaging in these behaviors, though; most are doing honest work. I guess making the prior as empirical as possible, or not using one at all, and making everything as transparent as possible is the way to go.

    Justin
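
    A toy version of the arithmetic behind the “anywhere from 0 to 1” complaint above (my numbers, not Justin’s): hold the evidence fixed as a Bayes factor and vary only the prior probability of the hypothesis.

    ```python
    # Made-up illustration: same data (summarized as a fixed Bayes factor), with
    # the posterior probability of H1 driven entirely by the prior probability.
    bayes_factor = 19.0                  # hypothetical: data favor H1 over H0 by 19:1

    for prior_p in [0.001, 0.10, 0.50, 0.99]:
        prior_odds = prior_p / (1 - prior_p)
        post_odds = bayes_factor * prior_odds
        post_p = post_odds / (1 + post_odds)
        print(f"prior P(H1) = {prior_p:5.3f} -> posterior P(H1) = {post_p:5.3f}")

    # Posteriors run from about 0.02 to about 0.999 with identical likelihoods,
    # which is the skeptic-vs-believer pattern described above.
    ```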

  4. Well, there is loads of literature in which priors are not properly justified, but I tend to think that in the vast majority of cases researchers are not cheating but rather are lazy and choose something that is easy to implement, has been used by others before, or is advertised as some kind of “default”. Maybe they also don’t want to be accused of “subjectivity” for openly discussing their choices, although not discussing them surely doesn’t make the prior any more objective.

  5. In science, cheating/lying gets caught the same way as any other source of error. Independent replication.

    Just bringing this up shows that the field in question is not actually science, because it is not doing the part of science required to catch mistakes/cheating.

  6. A prior *encodes* your background information, it doesn’t define it. It can be difficult to encode that information, like it would be hard to write a song by directly typing all the MIDI messages into a file. If you listen to the MIDI file and find that it doesn’t have the right tempo, and the instruments that come in on the solo are violins instead of trumpets… you go back and re-encode the song to match what you meant. That’s *supposed* to happen.

    So, in fact, in reality, during Bayesian fitting you’re adjusting the prior. In an example I just did a bunch of work on this week, I was trying to infer a dimensionless “density” of proteoglycans in cartilage. The density was measured by staining and taking a picture, and counting the pixels that were stained at least to a certain degree. The staining and lighting and exposure and so forth are all chosen subjectively to get a “good” image visually, and then kept the same across all the samples; the only meaningful quantities are the *ratios* between staining amounts in different conditions.

    I would fit this model using a strong regularizing prior on the control condition to ensure that the average inferred density across all the control samples was 1, because *that’s the definition of the dimensionless ratio, it’s the ratio of the average of any condition to the average of the control condition* (a toy sketch of this pin-the-control-average idea appears after this thread). But in the process weird things could happen… I had some indexing bugs, for example, and things turned out to make no sense because of the bugs. So one thing I would do is unconstrain some of the priors to see what happened, to try to find out whether the results reflected a model error or a programming error… Then when I found the programming error, I would re-constrain the prior to see how the model worked.

    This kind of thing *should* happen all the time in your model development. What *shouldn’t* happen is something like “we got set up with priors that made sense to us and debugged everything, but then in the final result, the treatment was really close to the same results as the control, so we just tightened up the prior on the treatment effect to keep it from getting close to the control values and ensure that the treatment was at least 20% more effective than control…”

    It’s one thing to impose essential model meaning via the prior, or to calibrate the prior predictive distribution of outcomes, but it’s another to adjust the prior to change the final inference to be something you would like to see. I think usually this would require very strong priors if you’ve got even a modest amount of data (20 data points, say). Therefore, in most cases where the answer is believable, the prior bias required to force it to be “what someone wanted” would be extreme and noticeable… whether an individual would in fact notice it or not is a different question, but if you looked at the prior you’d say “why is there this prior that says normal(1.2, 0.01)? That seems like a lot of constraint on the outcomes you’d accept,” or something like that.

    • There’s a fine line, though. Let’s say as a statistician I work together with a subject matter expert. The subject matter expert doesn’t like the result, supposedly not because it goes against what they hope for, but rather because it “can’t be true” based on experience in the field together with some more or less credible but imprecise information. At the end of the day I’m not able to assess this, but I’d normally try, at least if I have some trust in my collaborator and their competence, to come up with something that is acceptable to them (which includes discussing the “inner workings” of the models and the possibility that the result the collaborator doesn’t like may make some surprising sense if considered with an open mind). By and large this may be a sensible approach; however, it may well bias results to be more in line with the expert’s expectations than they should be.

      This issue is by no means exclusive to Bayesian analysis, but the Bayesians have the prior as a very nice additional tool for inducing this kind of bias (and for improving the model, of course;-).

      • I agree with this. I would usually argue to use priors that have as their high probability density region something that includes absolutely everything the subject matter expert thinks could be true, and a little more, or even maybe quite a bit more in some cases. If the final results are insufficiently precise, the correct answer is to collect more data. Usually a researcher would only object to that answer when the data is extremely expensive, and/or there are big incentives to “get a result”. Fortunately for me, the people I’ve worked with usually don’t want to just “get a result” they want the right answer. Mileage obviously varies.
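
    A toy version of the pin-the-control-average idea from the comment above (my made-up numbers and a quick MAP fit for brevity, not the actual cartilage model): a tight prior on the control “density” fixes the arbitrary staining scale, so the treatment parameter becomes the dimensionless ratio.

    ```python
    # Toy sketch: intensities have an arbitrary staining scale A, so only ratios
    # are meaningful. A strong prior pinning the control density to 1 identifies
    # the treatment/control ratio. (MAP via scipy here just to keep it short.)
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    true_scale, true_ratio = 37.0, 0.6                   # made up
    y_ctrl = true_scale * 1.0 * rng.lognormal(0.0, 0.1, size=8)
    y_trt = true_scale * true_ratio * rng.lognormal(0.0, 0.1, size=8)

    def neg_log_post(theta, sigma=0.1):
        log_A, log_d_ctrl, log_d_trt = theta
        nll = np.sum((np.log(y_ctrl) - (log_A + log_d_ctrl)) ** 2) / (2 * sigma**2)
        nll += np.sum((np.log(y_trt) - (log_A + log_d_trt)) ** 2) / (2 * sigma**2)
        # Strong regularizing prior: control density pinned near 1, i.e.
        # log_d_ctrl ~ N(0, 0.01). Without it, only log_A + log_d is identified.
        return nll + log_d_ctrl**2 / (2 * 0.01**2)

    fit = minimize(neg_log_post, x0=np.zeros(3), method="L-BFGS-B")
    log_A, log_d_ctrl, log_d_trt = fit.x
    print("control density  ~", round(float(np.exp(log_d_ctrl)), 3))   # ~1 by construction
    print("treatment/control ~", round(float(np.exp(log_d_trt)), 3))   # close to 0.6
    ```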

  7. I routinely see people using constraints during ML estimation, at least in the case of (relatively) complex nonlinear models. How worried are we that people are cheating by rigging those constraints? Who’s worried about people rigging their analyses by a priori assuming zero measurement error (and pretending as though they can differentiate models in this errorless fantasy)? And all the other a priori assumptions already mentioned about measurement, model structure, inclusion criteria, outlier removal, and so on.

    I really don’t get why the only a priori assumptions that ever seem to worry people are the prior distributions in Bayesian analysis. All research is drenched in a priori assumptions. Propaganda against Bayesianism has cut deep. I guess it plays into fantasies about being objective.

  8. In my experience, the value of a prior is not to pull estimates *toward* some favored region, but rather to push estimates *away* from preposterous regions of the parameter space. Here, “preposterous” means that parameter combinations in those regions would yield predictions that are clearly out of whack (hence the value of simulating from the prior, as Garnett says).

    In practice, of course, it typically works out the same when you implement the model/prior, but I think this framing—in terms of avoiding obviously ridiculous parameter values—makes clear the role of the prior is not to cheat, but to make the game more fair. The fairness comes from the prior effectively restricting the space of models to just those that are plausible to begin with. It’s the difference between a cheetah beating a sloth in a race versus a cheetah beating a jaguar—only the second race, in which we have restricted our consideration to contenders each of which could plausibly win, is really fair and only in that case is the outcome of the race meaningful.

    This still leaves a role for subjectivity, of course: We might all agree that a model that predicts an average human lifespan of 1000 years should be given a low prior (though maybe not strictly zero if Methuselah complains) but what about one that predicts an average lifespan of 100 years?

    • Right, a prior precisely defines the regions that are “worth considering”.

      Consider something like polynomial regression. Every smooth function over a finite region can be arbitrarily well approximated by a polynomial. If you are modeling, say, the weight in kilograms of sacks of coffee beans being loaded onto a freight train through time, to see if maybe there is some cheating going on by under-filling the sacks, do you want functions that take values in the vicinity of 10^90 kg? How about -70000 kg? Do you want functions that oscillate in time at 70 kHz?

      “Let the data speak for itself” in this kind of case is like saying “be totally irrational and consider negative weights and functions that imply some sacks contain neutron-star material.”
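
      Simulating from the prior (as mentioned above) makes that concrete. A minimal sketch with made-up numbers, comparing what an effectively flat prior and a weakly informative prior on the coefficients of a degree-8 polynomial imply about sack weights:

      ```python
      # Prior predictive check for a degree-8 polynomial of sack weight (kg) on
      # time rescaled to [0, 1]. Made-up priors; the point is what each implies.
      import numpy as np

      rng = np.random.default_rng(7)
      t = np.linspace(0, 1, 50)
      X = np.vander(t, N=9, increasing=True)        # columns 1, t, t^2, ..., t^8

      def prior_predictive(draw_betas, n_draws=1000):
          """Curves f(t) = X @ beta for beta drawn from a given prior."""
          return draw_betas(n_draws) @ X.T          # shape (n_draws, len(t))

      # "Let the data speak": effectively flat N(0, 1000) on every coefficient.
      vague = prior_predictive(lambda n: rng.normal(0, 1000, size=(n, 9)))
      # Weakly informative: sacks weigh roughly 60 kg and trends are gentle.
      weak = prior_predictive(lambda n: np.column_stack(
          [rng.normal(60, 5, size=n), rng.normal(0, 2, size=(n, 8))]))

      for name, curves in [("flat-ish prior", vague), ("weakly informative", weak)]:
          print(f"{name:18s}: prior predictive weights from "
                f"{curves.min():8.0f} to {curves.max():8.0f} kg")
      # The flat-ish prior proposes sacks weighing thousands of kg, including
      # negative ones; the weakly informative prior keeps curves near real sacks.
      ```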

  9. “I’d like to believe that in practice it’s harder to cheat using Bayesian methods because Bayesian methods are more transparent.”

    Except for the simplest of models, this is generally only true if data and code are released. Otherwise we are back in nontransparent territory.

  10. Believe it or not, there’s even a recent paper centered on explicit advocacy of systematic prior rigging… to discredit a certain set of medical practices. Given the assorted scandals (Hammond, Goetzsche, PACE, Reuters+Mons. etc.) and the lack of a real prejudice-control culture, I have no doubt that even this is common practice already, but it’s appalling that it can be suggested so openly, and I wonder how it has not been retracted yet. Too bad that it fits common prejudices nicely, so it’s quite a tricky issue; I’m looking for a solution that doesn’t encourage citations, etc.

  11. I actually worry about this issue a lot in my own work. I’m using fairly complex models (parameterized Bayes nets), and I have a large number of priors based on loose expert opinion that the correlation between two variables has a particular sign. Because of all the conditioning that goes on, the amount of data available to update any one parameter’s posterior might be pretty small.

    I routinely do a check where I compare the prior to the posterior. If the posterior hasn’t moved much, I know that either the prior is too strong, there is not enough data, or the model has too many parameters. In any case it is something I would like to know about and think about some more before using the results.

    • Russell:

      Regarding comparing prior to posterior, see this suggestion:

      For each parameter (or other qoi), compare the posterior sd to the prior sd. If the posterior sd for any parameter (or qoi) is more than 0.1 times the prior sd, then print out a note: “The prior distribution for this parameter is informative.”
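
      A minimal sketch of that check, written against plain arrays of prior and posterior draws (faked with numpy here; in practice you’d simulate from the prior and pull the posterior draws out of the fitted model):

      ```python
      # The quoted rule: flag a parameter when posterior sd > 0.1 * prior sd.
      import numpy as np

      rng = np.random.default_rng(11)
      draws = {
          # parameter name: (prior draws, posterior draws) -- made-up numbers
          "beta_treatment": (rng.normal(0, 10, 4000), rng.normal(0.8, 0.3, 4000)),
          "beta_age":       (rng.normal(0, 1, 4000),  rng.normal(0.1, 0.4, 4000)),
      }

      for name, (prior, post) in draws.items():
          ratio = post.std() / prior.std()
          if ratio > 0.1:
              print(f"{name}: posterior sd / prior sd = {ratio:.2f} -- "
                    "the prior distribution for this parameter is informative.")
          else:
              print(f"{name}: posterior sd / prior sd = {ratio:.2f} -- prior looks weak here.")

      # Here beta_age gets flagged: its N(0, 1) prior is doing real work relative
      # to the data, which is exactly the "prior too strong / not enough data" case.
      ```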

  12. I used to work in the economic research department of a large Southern European firm.
    Everyone there doing anything data-related was a Bayesian. The motto ‘torture the prior until it “works”’ was frequently heard.
    Keep in mind lobbying is not legal in Europe.
    We went by the book, though. Every technical decision was reported. At the very end of the memo. In the tiniest possible font.
