Juan de Oyarbide writes:

In Chapter 16 of the book “Thinking, Fast and Slow,” titled “Causes Trump Statistics,” Daniel Kahneman distinguishes between statistical base rates and causal base rates in applying Bayes’ rule. Using a simple example, Kahneman claims that, because of the way humans reason, we often fail to find the correct Bayesian model, and that whether we do depends on how the problem is presented to us. He says that under some circumstances the omission of priors leads to an overestimation of posterior probabilities.

I wonder whether the problem in question really has the same mathematical representation under both presentations, or whether there might be some model misidentification. I think the way the information is presented could condition our understanding of the priors, and therefore the associated uncertainty (e.g., having information on population probabilities with uncertainty about risk is not the same as having those same probabilities attached to the risk of each population, with equal population weights).

Oyarbide provides further details:

I found the problem online and will share it below.

A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data: 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green?

Now consider a variation of the same story, in which only the presentation of the base rate has been altered. You are given the following data: The two companies operate the same number of cabs, but Green cabs are involved in 85% of accidents. The information about the witness is as in the previous version.

Kahneman writes:

The two versions of the problem are mathematically indistinguishable, but they are psychologically quite different. People who read the first version do not know how to use the base rate and often ignore it. In contrast, people who see the second version give considerable weight to the base rate, and their average judgment is not too far from the Bayesian solution. Why?

In the first version, the base rate of Blue cabs is a statistical fact about the cabs in the city. A mind that is hungry for causal stories finds nothing to chew on: How does the number of Green and Blue cabs in the city cause this cab driver to hit and run? In the second version, in contrast, the drivers of Green cabs cause more than 5 times as many accidents as the Blue cabs do. The conclusion is immediate: the Green drivers must be a collection of reckless madmen! You have now formed a stereotype of Green recklessness, which you apply to unknown individual drivers in the company.

The stereotype is easily fitted into a causal story, because recklessness is a causally relevant fact about individual cabdrivers. In this version, there are two causal stories that need to be combined or reconciled. The first is the hit and run, which naturally evokes the idea that a reckless Green driver was responsible. The second is the witness’s testimony, which strongly suggests the cab was Blue. The inferences from the two stories about the color of the car are contradictory and approximately cancel each other. The chances for the two colors are about equal (the Bayesian estimate is 41%, reflecting the fact that the base rate of Green cabs is a little more extreme than the reliability of the witness who reported a Blue cab). The cab example illustrates two types of base rates.

Statistical base rates are facts about a population to which a case belongs, but they are not relevant to the individual case. Causal base rates change your view of how the individual case came to be. The two types of base-rate information are treated differently: Statistical base rates are generally underweighted, and sometimes neglected altogether, when specific information about the case at hand is available. Causal base rates are treated as information about the individual case and are easily combined with other case-specific information.
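
The 41% figure in the passage above can be checked directly with Bayes’ rule. Here is a minimal sketch in Python, assuming (as the problem implicitly does) that the prior probability of the cab being Blue is 15% and that the witness’s 80% accuracy applies equally to both colors. The variable names are illustrative and do not come from the book.

```python
# Inputs taken from the problem statement; treating 15% as the prior
# is the implicit assumption in both versions of the story.
prior_blue = 0.15           # P(cab is Blue) before hearing the witness
prior_green = 0.85          # P(cab is Green)
p_say_blue_if_blue = 0.80   # witness says "Blue" when the cab is Blue
p_say_blue_if_green = 0.20  # witness says "Blue" when the cab is Green

# Bayes' rule: P(Blue | witness says "Blue")
numerator = prior_blue * p_say_blue_if_blue
denominator = numerator + prior_green * p_say_blue_if_green
posterior_blue = numerator / denominator

print(round(posterior_blue, 2))  # 0.41
```

The numerator is 0.12 and the denominator is 0.12 + 0.17 = 0.29, which gives roughly 0.41.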

Oyarbide writes:

My question is, are the problems mathematically indistinguishable? In the first case we don’t have information about risk, so some prior should be incorporated before including population facts. My second question is, is there such a thing as a statistical base rate and a causal base rate? Shouldn’t we always write a problem based on causality and incorporate population information in the priors?

My reply is that neither problem is fully mathematically specified; they both rely on implicit assumptions of independence or random sampling. So you can think of the problems as different to the extent that the different scenarios might bring to mind different models of departures from this unstated assumption.
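
One way to make the unstated assumption concrete is to write the prior as a function of both fleet share and per-cab accident rate. The following is only a sketch under stated assumptions: the function `posterior_blue` and its parameters are hypothetical names introduced for illustration, the witness is assumed to be equally accurate for both colors, and the "twice as often" scenario is an invented departure, not something in the original problem.

```python
def posterior_blue(share_blue, rate_blue, rate_green, accuracy=0.80):
    """P(cab was Blue | witness says "Blue").

    share_blue: fraction of the city's cabs that are Blue
    rate_blue, rate_green: relative per-cab accident rates (only the ratio matters)
    accuracy: probability the witness names the correct color (assumed symmetric)
    """
    # Prior that the cab in the accident is Blue:
    # fleet share weighted by per-cab accident rate.
    w_blue = share_blue * rate_blue
    w_green = (1 - share_blue) * rate_green
    prior_blue = w_blue / (w_blue + w_green)

    # Bayes' rule with the witness report as the data.
    num = prior_blue * accuracy
    return num / (num + (1 - prior_blue) * (1 - accuracy))


# Version 1: 15% of cabs are Blue, assuming equal per-cab accident rates.
v1 = posterior_blue(0.15, 1, 1)
# Version 2: equal fleets, Green cabs involved in 85% of accidents.
v2 = posterior_blue(0.50, 15, 85)
# Hypothetical departure: version 1, but Green cabs crash twice as often per cab.
v1_alt = posterior_blue(0.15, 1, 2)

print(round(v1, 3), round(v2, 3), round(v1_alt, 3))  # 0.414 0.414 0.261
```

Under the equal-rates assumption the two versions coincide at about 0.41. Once the first version is allowed a different per-cab accident rate, the answer moves, which is the sense in which the two stories can suggest different models.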