When I said, “judge this post on its merits, not based on my qualifications,” was this anti-Bayesian? Also a story about lost urine.

Paul Alper writes, regarding my post criticizing an epidemiologist and a psychologist who were coming down from the ivory tower to lecture us on “concrete values like freedom and equality”:

In your P.P.S. you write,

Yes, I too am coming down from the ivory tower to lecture here. You’ll have to judge this post on its merits, not based on my qualifications. And if I go around using meaningless phrases such as “concrete values like freedom and equality,” please call me on it!

While this sounds reasonable, is it not sort of anti-Bayes? By that I mean your qualifications represent a prior and the merits the (new) evidence. I am not one to revere authority but deep down in my heart, I tend to pay more attention to a medical doctor at the Mayo Clinic than I do to Stella Immanuel. On the other hand, decades ago the Mayo Clinic misplaced (lost!) a half liter of my urine and double charged my insurance when getting a duplicate a few weeks later.

Alper continues:

Upon reflection—this was over 25 years ago—a half liter of urine does sound like an exaggeration, but not by much. The incident really did happen and Mayo tried to charge for it twice. I certainly have quoted it often enough so it must be true.

On the wider issue of qualifications and merit, surely people with authority (degrees from Harvard and Yale, employment at the Hoover Institution, Nobel Prizes) are given slack when outlandish statements are made. James Watson, however, is castigated exceptionally precisely because of his exceptional longevity.

I don’t have anything to say about the urine, but regarding the Bayesian point . . . be careful! I’m not saying to make inferences or make decisions solely based on local data, ignoring prior information coming from external data such as qualifications. What I’m saying is to judge this post on its merits. Then you can make inferences and decisions in some approximately Bayesian way, combining your judgment of this post with your priors based on your respect for my qualifications, my previous writings, etc.

This is related to the point that a Bayesian wants everybody else to be non-Bayesian. Judge my post on its merits, then combine with prior information. Don’t double-count the prior.
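
To see what double-counting does to the arithmetic, here is a minimal sketch in Python using the odds form of Bayes' rule. The numbers (a prior of 0.8 that the author is right, a likelihood ratio of 0.5 from reading the post) and the function name are made up for illustration; nothing here comes from real data.

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical inputs, chosen only for illustration.
prior = 0.8        # reader's trust in the author before reading
merits_lr = 0.5    # the post, judged on its merits alone, is 2:1 evidence against

# Correct: judge the merits separately, then combine with the prior once.
correct = posterior_prob(prior, merits_lr)

# Double-counting: the "merit judgment" was already colored by the author's
# reputation (it is really a posterior), and then the prior is applied again.
contaminated_odds = correct / (1 - correct)
double_counted = posterior_prob(prior, contaminated_odds)

print(round(correct, 3), round(double_counted, 3))
```

With these made-up numbers the single update gives 2/3, while the double-counted version gives 8/9: the reputation has been multiplied in twice.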

28 thoughts on “When I said, “judge this post on its merits, not based on my qualifications,” was this anti-Bayesian? Also a story about lost urine.”

  1. I was not trained as a Bayesian, so please excuse this naive question – but it is something that keeps cropping up. This post makes me think about the value of the prior information represented by qualifications (degrees, pedigree, publications, etc.). Certainly I view some sources (e.g. Mayo) as superior to others (e.g. Surgisphere). But as we know, these pedigrees often do not represent accurate signals – they can represent willingness to play some game (bureaucratic or publication counts, for example) rather than real research quality. As you point out, judging the research on its own quality is a healthy caution to not rely too heavily on qualifications as a prior.

    My question is: doesn’t this render the definition of the prior as quite subjective? Deciding when qualifications are of value and when they should be ignored seems to be a choice – and one made on whatever subjective criteria a person chooses. I believe that has been a common complaint against Bayesian methods – that they are inherently subjective. I’ve seen Andrew repeatedly articulate the belief that explicit judgements about priors are superior to the implicit judgements that are made when prior information is ignored – and I agree with that. But if the choice of prior, as with how much to weight pedigrees, is so subjective, then I can see why we are not all Bayesians. Can someone help clarify the relationship between merits vs qualifications and Bayesian vs Frequentist methods?

    • I believe that has been a common complaint against Bayesian methods – that they are inherently subjective.

      Three people bet on a horse. One knows nothing, so just bets based on the name. The second has been watching all season, monitoring stats, jockey, etc. The third is involved in fixing the race.

      Why in the world would these three people assign the same probability? Complaining about that is insane.

      • Nice metaphor. And I understand and agree. But it doesn’t address the issue at hand: when should we use qualifications as relevant prior information? Personally, I would count a researcher’s pedigree (what degree they have, where they got it, where they work) as relevant to how much attention I pay to their research. I would also count their prior history – we’ve seen many researchers with great pedigrees who have done really sloppy (or worse) work. As suggested in the post, I also would consider their area of expertise and the area they are making claims about: having a Nobel winning economist making proclamations about vaccinations is not the same as their views on the Federal Reserve. However, I suspect that my choices about how to apply filters to their qualifications will differ from yours (and everybody else’s).

        Andrew’s conclusion is: “Judge my post on its merits, then combine with prior information. Don’t double-count the prior.” I take this to mean that the merit of some research counts more than the qualification of the researcher. While I agree, this advice is sufficiently vague as to offer little guidance. How do I know when to stop trusting Mayo (the health network, not the philosopher)? If I should view the medical advice first, before seeing its source, how am I to judge that advice?

        • You would first check for logical consistency. Then whether the premises seem plausible to you.

          Eg. “Driving a car cures cancer. Out of 25 non-drivers 28 got cancer, while only 1 out of 20 drivers got cancer.”

          First of all, you can’t have a subset be larger than the total. So there is a logical inconsistency.

          Second, the premise may have been plausible when cars first came out. But now we know many drivers who got cancer and many non-drivers who didn’t. So there is no way that is the correct explanation for whatever was really observed.

          That one was easy, but the same applies for finding more subtle logical inconsistencies and implausible premises. Usually the most time-consuming part is piercing the jargon and identifying the implicit assumptions.

          For example, I am a random person on the internet who said those vaccine passports would spread the virus even more, if anything, because intramuscular vaccines don’t give you mucosal immunity.

          Meanwhile the CDC, news and social media “experts”, and politicians all said otherwise.

          How is someone with no background in the area supposed to judge besides authority/consensus heuristics? Well, you could check the surprising premise. Look up mucosal immunity and prior vaccines, see if there is any controversy about the topic. Then look for mention of the topic by the “authorities”. Are they addressing it at all? Can you find anything about them measuring mucosal immunity in any person or animal after the vaccine? Or do they only talk about antibodies in the blood?

          It should start to look more and more like there is a problem with the argument. Ie, the premise that antibodies in the blood stop people from spreading a virus that replicates in the lining of the nose and throat.

          It does take more effort than authority/consensus heuristics, after all that is why we use those heuristics. But it can really be quite minimal in a lot of cases.

    • > Certainly I view some sources (e.g. Mayo) as superior to others

      I just experienced an interesting Bayesian inference that I had to update in light of new data! When I first read this, I assumed you were talking about statistician and sometime commenter here Deborah Mayo. Certainly, I view her as a source that is superior to others! So up to the end of the quote above, my inference was consistent with the data. My inference was based on my prior experiences reading this blog, which helps me resolve ambiguous references in ways that are likely to be consistent with the intended meaning in context.

      It was only when I got to the “Surgisphere” mention that I realized you were referring to the Mayo Clinic, not Deborah Mayo (no relation, I assume?). Surgisphere would be very unlikely to come up if you were talking about Deborah Mayo, but would be very likely if you were talking about medical organizations. So I revised my inference about what you were referring to in light of the new data you provided.

        • Carlos – Haha well, usually I would be guilty as charged! Sometimes I like to read them backwards by first looking at the comments and then reading the post. But this time, I actually did read the post first. However, Dale’s first sentence sounded like it was going to be a stats question, so apparently my brain chunked the content of the post in favor of its default when seeing ‘stats stuff’ and “Mayo”.

  2. 500ml of urine sounds a lot, but I did some research (and by research, I mean a google search), and the average amount is 800-2,000 ml per day (https://medlineplus.gov/ency/article/003425.htm#:~:text=The%20normal%20range%20for%2024,vary%20slightly%20among%20different%20laboratories.). So, producing a half-liter sample seems fairly reasonable. You’d probably have to hold it in for several hours to produce it all at once, so I’d be annoyed if I had to do it twice.

    • Some people would be able to produce a half liter of urine in their first morning void. For those who could not, there is generally no need for the specimen to be produced in one void: it can be collected over multiple voids across several hours or the course of the day.

    • Some say they need a liter full,
      Some say half.
      From what I’ve learned at school
      I hold with those who favor full.

      But if I had to do it twice,
      I think I know enough about myself
      To say that half a sacrifice
      Could also work
      And would suffice.

    • As someone who has to self-catheterise three times a day I can say that 500ml sounds about right. (I have several months’ data if anyone is really that interested as my urologist asked me to record this when I first started).

      I’m glad that no-one confused the third meaning of mayo – mixing mayo and urine makes me think of those urban legends about McDonalds.

      PS This post may or may not (mayo or mayo not?) be a piss-take.

      PPS I’m sure Paul Alper will let us know whether self-catheterise is a proper English verb, and if so whether it should be self-catheterise or self-catheterize.

  3. Although I firmly believe that my urinary incident at the Mayo Clinic is absolutely true, in keeping with statistical practice, here is some evidence to the contrary:

    1. I am self reporting and that is notoriously unreliable.
    2. Despite my criticism of the Mayo Clinic, I returned for subsequent treatment, an indication of support for the organization. And, I am alive to tell the tale.
    3. Recollect previous blogs having to do with a map confusing the alps with the Pyrenees and slaves rowing a ship. The fact that I have not listed a URL for either indicates unreliable recall on my part.

    Because most of the contributors to this blog do not watch Alex Jones and his commentary on Covid vaccines, the [medical doctor] Stella Immanuel reference may not ring a bell, so here it is:

    https://en.wikipedia.org/wiki/Stella_Immanuel

    “she also said Illuminati are using witches to destroy the world through abortion, gay marriage, children’s toys, and media, including Harry Potter, Pokémon, Wizards of Waverly Place and Hannah Montana. In another 2015 sermon, she said scientists are developing vaccines to stop people from being religious.”

    As I hope you can see, my urinary incident comment does not compare to her statements.

    • Yes, she definitely got a few things wrong there. Hannah Montana, for instance. The rest of it is plausible enough, although not necessarily true, but it’s ridiculous to think the Illuminati were involved with Hannah Montana. That was George Soros.

  4. If the post says to judge it on its merits, that is also information. It implies that the post may not be up to usual standards of quality/accuracy. (Or, at least that the variance is higher than normal.) If we’re being Bayesians, we should consider that too.

    • If someone says to judge a post based on the quality of its author and not on the post’s merits, that would clearly imply that the post isn’t very good. You believe that a request to judge a post on its merits rather than the quality of its author also implies the post isn’t very good. Therefore, we are pushed towards the conclusion that *any* statement about evaluating a post and its author implies that the post isn’t good.

      This comment, by the way, has the exact level of quality as the mean of all of my past works and accomplishments.

      • This makes me ponder the saying “don’t judge a book by its cover.” Often a lot of work goes into the cover: there’s often a picture, and (if we include the back cover and why wouldn’t we) some description of the book and maybe some endorsements and blurbs and such. Are we to completely discount all of that? It tells us nothing whatsoever about the contents of the book? What about the title, which is (normally) on the cover of the book, doesn’t that convey information? And the author’s name?

        I think “don’t judge a book by its cover” is bad advice. But “don’t judge a book _solely_ by its cover”, OK, I’m with you. Or..well, once you’ve actually read the book then the cover is irrelevant, so, OK, “don’t judge a book by its cover after you have read it”, I could get behind that.

        I’m not sure how relevant this is to judging a post based on whether the author tells you not to judge it based on the author. It’s thematically related, is all.

        • Proverbs can be antithetical, like “out of sight out of mind” and “absence makes the heart grow fonder,” “look before you leap” or “he who hesitates is lost.” In the case of “don’t judge a book by its cover” (don’t judge by appearances) we have “what you see is what you get” (appearances do in fact provide a lot of information).

  5. I don’t have anything to say about the urine, but regarding the Bayesian point . . . be careful! I’m not saying to make inferences or make decisions solely based on local data, ignoring prior information coming from external data such as qualifications. What I’m saying is to judge this post on its merits. Then you can make inferences and decisions in some approximately Bayesian way, combining your judgment of this post with your priors based on your respect for my qualifications, my previous writings, etc.

    It might be helpful to instead connect back to the earlier discussions of ‘why should a Bayesian randomize?’.

    Since a Bayesian agent can presumably just model the allocation process and get correct & more efficient inferences that way, why waste any time with randomization, which adds additional problems like imbalance? The justification is that it allows other agents to take the result at face-value, instead of trying to model whatever biased subjective process the first Bayesian agent did which ultimately produced their final published result.

    Like when we do in R set.seed(1234): there’s nothing intrinsically good about the seed 1234 (surely God loves the seeds 1233 or 1235 just as much?), other than it being a ‘standard’ seed which I use to assure anyone reading the results that I didn’t accidentally run random seeds until I got a ‘good’ one and so the results truly are ‘random’. (Similarly in cryptography, often you need a random number to serve as a constant, which one is unimportant, but it might be possible to bruteforce the choice covertly to do bad things, so you need ‘nothing up your sleeve’ numbers which are either obvious/unique, well-known, or provably the outcome of random choices you couldn’t know or influence – fancier equivalents of the stock market’s closing price or the next Bitcoin block hash.) These are not strictly necessary and may even be bad things (maybe the seed 1234 actually happens to be, by chance, a very misleading seed!), but those disadvantages are greatly outweighed by the greater utility of people taking the work more seriously, rather than looking at it, shrugging, and (often correctly) ignoring it.
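
    As a rough illustration of the ‘nothing up your sleeve’ idea, here is a minimal Python sketch that derives a seed from a publicly stated phrase by hashing it. The phrase and the function name are hypothetical choices for the example, and Python stands in for R here; the only point is that a reader can recompute the seed and see that it was not hand-picked.

```python
import hashlib
import random

def seed_from_phrase(phrase: str) -> int:
    """Derive a 64-bit seed from a public phrase via SHA-256."""
    digest = hashlib.sha256(phrase.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Hypothetical public phrase; any documented, checkable string would do.
PHRASE = "judge this post on its merits"
seed = seed_from_phrase(PHRASE)

rng = random.Random(seed)
draws = [rng.random() for _ in range(3)]

# Anyone can rerun this and obtain the same seed and the same draws.
print(seed, draws)
```

    This does not make the seed ‘better’ than 1234 in any statistical sense; it only makes the choice auditable.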

    So when I read someone say “judge this post on its merits, not based on my qualifications”, I take them as saying that the post is self-contained, in the sense that a randomized sample or 1234-seed or nothing-up-my-sleeve number is self-contained: everything about its origins screened-off and unnecessary to model. The contents are such that they could have been written by the Prince of Lies himself, Lord Satan, but that would be irrelevant: whatever the contents are, they should make sense and be verifiable on their own. Nothing will stand on authority of the ‘because I said so’ or ‘I remember talking to X years ago and he said Y, no, X can’t confirm it, you’ll just have to take my word for it’.

    Which is true of the post in question. Gelman is not standing on his academic status when eg. he criticizes the writing as ‘word salad’: if it’s word salad, you can surely judge that for yourself just by following the link, it is irrelevant what priors Gelman brings to the reading or why he might be motivated to say that – it either is or is not word salad when you read it. Or if Ioannidis simply ignores the existence of political violence and organized conspiracies when taking a free speech absolutist position, that is clear from reading it, and Gelman has no particular special traits that you need to care about in evaluating his claim that Ioannidis ignores the most important parts of the context.

    • Indeed, these are all really good points. I often use a seed not 1234 but the unix seconds on the date I started writing the script:

      In Julia something like:

      using Dates, Random
      secs = Dates.value(Date(2023, 9, 9) - Date(1970, 1, 1)) * 86_400  # whole days since the Unix epoch, in seconds
      rng = Xoshiro(secs)  # integer seed

      There’s a clear reason to choose this, it’s reproducible, and doesn’t always use the same seed in case there’s something bad about some “standard” seed like 1, or 1234.

      Randomization makes the job of modeling easier, which is another reason to use it in Bayes. If you measure things with 5 or 6 different lab instruments, and you choose which instrument by random number generator, then you can automatically get some balance in your measurement errors and avoid biases among them. It makes the modeling process easier and more straightforward. It also makes it more convincing to others as you say.
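
      A small simulation can make the instrument point concrete. This is only a sketch under made-up assumptions: five instruments with fixed hypothetical biases, a true group difference of 1.0, and unit measurement noise. It compares assigning instruments by convenience (one instrument per group) with assigning them by random number generator.

```python
import random
import statistics

TRUE_EFFECT = 1.0
# Hypothetical fixed biases of five lab instruments.
INSTRUMENT_BIAS = [-2.0, -1.0, 0.0, 1.0, 2.0]

def measure(true_value, instrument, rng):
    """One noisy reading: truth + instrument bias + unit Gaussian noise."""
    return true_value + INSTRUMENT_BIAS[instrument] + rng.gauss(0.0, 1.0)

def estimated_effect(pick_instrument, rng, n=2000):
    """Difference in group means under a given instrument-assignment rule."""
    control = [measure(0.0, pick_instrument("control", rng), rng) for _ in range(n)]
    treated = [measure(TRUE_EFFECT, pick_instrument("treated", rng), rng) for _ in range(n)]
    return statistics.fmean(treated) - statistics.fmean(control)

def by_convenience(group, rng):
    # Each group always goes to "its" instrument, so bias is confounded with group.
    return 0 if group == "control" else 4

def by_rng(group, rng):
    # Instrument chosen at random for every measurement.
    return rng.randrange(len(INSTRUMENT_BIAS))

rng = random.Random(2023)
systematic = estimated_effect(by_convenience, rng)
randomized = estimated_effect(by_rng, rng)
print(systematic, randomized)
```

      Under these assumptions the convenience assignment is off by roughly the gap between the two instruments’ biases, while the randomized assignment turns the biases into extra noise that averages out, so no instrument term is needed in the model to recover the effect.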

    • Since a Bayesian agent can presumably just model the allocation process and get correct & more efficient inferences that way, why waste any time with randomization, which adds additional problems like imbalance?

      I consider this to be essentially wrong. While yes, you can model a more complete data generating process and de-bias causal effects with proper conditioning on confounders, the bayesian posterior induced by a particular dataset may just end up with a long anti correlated ridge between the causal effect and confounders.

      To make this more concrete, suppose you have two versions of a product page, where most of your customers arrive through affiliate “influencer” marketing links. You want to know which product page is better, averaged through the population. Would you serve one version of the product page through one influencer and one version of the product page through another influencer? Let’s say we do that, and account for the fact that some influencers’ viewers have a higher baseline purchase propensity than others. There are two parameters associated with each influencer link:

      1. x_i, influencer i’s audience propensity to purchase
      2. y_i, the quality of the product page associated with influencer i

      With diffuse priors, if you examine the contours of the joint posterior density f(x_i, y_i), you’ll end up finding a long anticorrelated ridge between each x_i and y_i; these data cannot resolve between the product page quality and the influencer’s audience susceptibility. With more data on the influencer upstream audiences, their purchase history on other product pages, and an assumption that your product page is like other product pages, you might be able to better resolve x_i and identify y_i. But good luck getting that data, and who knows about that assumption.
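
      Here is a minimal sketch of where that ridge comes from, assuming (my assumption, not stated above) a binomial purchase model with probability inv_logit(x_i + y_i). All counts and parameter values are made up. When each influencer sees only one page, the likelihood depends only on the sum x_i + y_i; when both pages are shown at random to the same audience, the difference in page quality starts to matter.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def binom_loglik(k, n, p):
    """Binomial log-likelihood, dropping the constant term."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def loglik_confounded(x, y, k, n):
    """One influencer, one page: only x + y enters the likelihood."""
    return binom_loglik(k, n, inv_logit(x + y))

def loglik_randomized(x, y_a, y_b, data):
    """Same audience, pages A and B shown at random; data maps page -> (k, n)."""
    k_a, n_a = data["A"]
    k_b, n_b = data["B"]
    return (binom_loglik(k_a, n_a, inv_logit(x + y_a))
            + binom_loglik(k_b, n_b, inv_logit(x + y_b)))

# Hypothetical data: 30 purchases out of 1000 views through one influencer.
on_ridge_1 = loglik_confounded(-3.5, 0.0, 30, 1000)
on_ridge_2 = loglik_confounded(-2.5, -1.0, 30, 1000)  # same x + y, same likelihood

# Hypothetical randomized data for the same audience.
data = {"A": (30, 1000), "B": (48, 1000)}
b_better = loglik_randomized(-3.5, 0.0, 0.5, data)
a_better = loglik_randomized(-3.5, 0.5, 0.0, data)

print(on_ridge_1 - on_ridge_2, b_better - a_better)
```

      In the confounded design every point with the same x + y is equally likely, which is the ridge. In the randomized design the overall level is still shared between x and the pages, but the data now distinguish "B is better" from "A is better", which is the question being asked.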

      Modeling does not solve all problems and the purpose of randomization is not “rhetorical”. It’s not in any sense more purely bayesian to act like you’re impotent with respect to data collection and try to solve all problems with provided datasets and thinking really hard. If you actually want answers, sometimes you need to choose what data to collect and, even better, what data to produce.

      • I agree. The point of randomization is to try getting all the known and unknown factors to (at least approximately) cancel out. This makes the phenomenon easier to model.

        Eg, modelling the behavior of a fair die is far easier than a weighted die because all the forces are equal for every orientation. Thus they cancel out.

        In principle, you can include these forces in your model. I’d say if you can really do that, then you would even have a deterministic model. The uncertainty would be all in measuring the input parameters.

        In practice, we don’t have that information. Often, we don’t even know all the information we would need to have.

        I’d like to see more about this idea that Bayesians don’t need to randomize though. I wonder if I’ve commented about it before on this blog and forgotten about it…

  6. Related to “a Bayesian wants everybody else to be non-Bayesian”, I wish to argue for two things: (1) diagnostics, then posterior; and (2) many non-Bayesians inside me, instead of one Bayesian. For (1), the posterior should be fed into or compared with the prior to test the consistency of the model as a whole and to filter out extreme assumptions. (2) concerns the computational prior and its connection with rejection sampling and Theorem 5 from the SBC test quantity paper. My point is that a great amount of subjectivity is injected in the translation from discrete to continuous space, as exemplified by Richard’s experiment here: https://threadreaderapp.com/thread/1701165075493470644.html . To elaborate:

    1. if five samples from two distributions normal(.5, .1) and beta(2,2) happen to be identical e.g. (.5, .4, .6, .5, .5) — does it matter what distribution it was sampled from? Meaning could we update our perception of prior toward pre-asymptotic computation, away from asymptotic pure math?

    2. how to reverse engineer the prior? This started from the observation that extreme values from the fat tail of the prior sometimes generate computationally challenging data sets (took more than 100 times longer to compute, as reported in https://discourse.mc-stan.org/t/using-narrower-priors-for-sbc/21709/5 and in the https://discourse.mc-stan.org/t/using-narrower-priors-for-sbc/21709/12 thought experiment by Andrew on prior predictive simulation of Poisson regression with a wide prior on the intercept giving nearly all-zeroes data).

    This evolved into Theorem 5 of the https://arxiv.org/pdf/2211.02383.pdf paper, which says:
    “An obvious application of Theorem 5 is … SBC checks can focus only on that data space of interest. This is a form of rejection sampling and can be practically useful if it is easy to formulate a criterion that constrains plausible real data sets but hard to construct a defensible prior distribution that would enforce this criterion implicitly. For example, prior information can be available on the plausible variance of an outcome across the whole population, which may be hard to express as a prior on coefficients associated with predictors”

  7. Argument from authority (argumentum ab auctoritate, aka appeal to authority) is a famous logical fallacy. But rationality should not be equated with logicality. There’s an area of epistemology called “informal logic” that addresses things like when logical fallacies may actually be rational, when viewed as inductive rather than deductive inferences. There’s literature in this area on using Bayesian inference to explore when and why logical fallacies may be justifiable. A few examples:

    Bayesian Informal Logic and Fallacy (Korb 2008)

    https://philpapers.org/rec/KORBIL (use DOI link for PDF from journal)

    A Bayesian Approach to Informal Argument Fallacies (Hahn & Oaksford 2006)

    https://www.jstor.org/stable/27653391

    The rationality of informal argumentation: A Bayesian approach to reasoning fallacies (Hahn & Oaksford 2007)

    https://psycnet.apa.org/record/2007-10421-007

    • Argument from authority and consensus heuristics are not only useful, they are two of the primary modes of making decisions. The other is habit.

      If you reflect on it I’m sure you’ll agree that you only use rational reasoning for a small minority of the decisions you make. And this is fine, it would be a waste of energy to try figuring out everything for yourself. The problem arises when we forget why we do/believe something, then come up with some faulty post-hoc rationale to justify it.
