Here’s a little problem to test your probability intuitions:

Ariel Rubinstein and Michele Piccione send along this little problem to test your probability intuitions:

A very small proportion of the newborns in a certain country have a specific genetic trait.
Two screening tests, A and B, have been introduced for all newborns to identify this trait.
However, the tests are not precise.
A study has found that:
70% of the newborns who are found to be positive according to test A have the genetic trait (and conversely 30% do not).
20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).
The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.
Suppose that a newborn is found to be positive according to both tests.
What is your estimate of the likelihood (in %) that this newborn has the genetic trait?

Just to clarify for readers such as myself who are overly familiar with statistics jargon: when they say “likelihood” above, they’re talking about what we could call “conditional probability.”

Anyway, you can check your intuition on this one. Tomorrow I’ll post the solution and get into some interesting subtleties.

P.S. Solution and discussion here.

70 thoughts on “Here’s a little problem to test your probability intuitions”

  1. I never trust my intuition about probability, so I did the math. But my result takes about 6-8 lines of algebra so am looking forward to the intuitive solution. Anyhow, thanks for the nice distraction while Stan is running, I usually do meaningless stuff but this was instructive.

  2. My response: We can’t know from the info provided.

    The answer we seek is p(trait | A & B), which is p(A & B | trait) p(trait) / [p(A & B | trait) p(trait) + p(A & B | ~trait) p(~trait)]. We are told that the tests are conditionally independent such that p(A & B | trait) = p(A | trait) * p(B | trait) and p(A & B | ~trait) = p(A | ~trait) * p(B | ~trait).

    The first thing to note is that we don’t know the base rate p(trait), except that it is low. This is the classic disease diagnosis example. At first, I thought it couldn’t be that simple. Surely, I said to myself (self, don’t call me Shirley), knowing p(trait | A) and p(trait | B) could help?

    But no. Again, we need to know p(A | trait) = p(trait | A) p(A) / [p(trait | A) p(A) + p(trait | ~A) p(~A)]. Same for test B. But we don’t know p(A) or p(B), that is, the base rate at which the tests return positive results. So again we hit a wall.

    Anyway, that’s my answer, as disappointing as it is. Fun to think about though!

    • This is where I landed after a minute or two. Specifically, I asked myself how my answer would change if:

      – traitless babies NEVER get positive results on A or on B
      – traitless babies ALWAYS get positive results on both tests

      As you said, we don’t know whether either of those is/isn’t the case, and so we’re stuck until we can get that information.

      • OK, having read the follow up post, I see that I didn’t appropriately incorporate “the trait is very rare” into my thinking. If I had, I would have been able to rule out “traitless babies always (or frequently) get positive results on both tests,” and would have been able to keep thinking it through.

  3. Call G the base rate for this condition.

    If A picks up on p% of people with the condition and B picks up on q%, then A and B are true positive for p*q% of people with the condition
    Conversely, if A is positive for fp% of people without the condition and B is positive for fq%, then A and B are false positive for fp * fq% of people without the condition

    If G is the base rate of the condition, we have:
    G*p/(fp*(1-G)) = 0.7/0.3
    G*q/(fq*(1-G)) = 0.2/0.8

    And from the above, we have AB true positive = G*p*q and AB false positive = (1-G)*fp*fq, so we need to find the ratio, or (G/(1-G)) * p*q/(fp*fq)

    Simplify:

    G/(1-G) = 0.7fp/(0.3p)
    G/(1-G) = 0.2fq/(0.8q)

    Invert:
    4q/fq = 3p/(7fp) = (1-G)/G

    Square all sides, but in the middle, multiply by 4q/fq instead:
    16q^2/fq^2 = 12pq/(7*fp*fq) = ((1-G)/G)^2
    28q^2/(3*fq^2) = pq/(fp*fq) = ((1-G)/G)^2 * 7/12
    ((1-G)/G) * 7/12 = (G/(1-G)) * p*q/(fp*fq) = our answer

    For a quick test: take G = 1/100 and 1M babies, so 10,000 have the condition. Suppose 10,000 test positive for A (7,000 real, 3,000 false) and 5,000 test positive for B (1,000 real, 4,000 false). The probability of testing positive for A given you have the condition is 7k/10k = 70%, and this doesn’t change if they tested true positive for B first, so 10,000 * 70% * 10% = 700 babies test positive for both A and B. The probability of testing positive for A given you don’t have the condition is 3k/990k = 1/330, and this doesn’t change if you tested false positive for B first, so you have 4,000/330 ≈ 12.12 babies who test false positive for both A and B. The formula above ((1-G)/G * 7/12) with G = 1/100 gives 57.75 for the ratio, which matches 700 divided by 12.12, so this checks out.
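    The spot check above can be reproduced in a few lines of Python; all counts are taken from the example (G = 1/100, 1M babies, 10,000 A-positives, 5,000 B-positives):

```python
# Reproducing the 1M-baby spot check.
babies = 1_000_000
with_trait = babies // 100             # G = 1/100 -> 10,000 babies with the trait
without_trait = babies - with_trait    # 990,000 babies without it

# Test A: 10,000 positives at 70% PPV -> 7,000 true, 3,000 false
p_A_trait = 7_000 / with_trait         # 0.7
p_A_no_trait = 3_000 / without_trait   # 1/330

# Test B: 5,000 positives at 20% PPV -> 1,000 true, 4,000 false
p_B_trait = 1_000 / with_trait         # 0.1
p_B_no_trait = 4_000 / without_trait   # 4/990

# Conditional independence lets us multiply within each group.
true_double = with_trait * p_A_trait * p_B_trait            # 700 babies
false_double = without_trait * p_A_no_trait * p_B_no_trait  # ~12.12 babies

print(true_double / false_double)   # ~57.75, the true:false ratio
print((1 - 0.01) / 0.01 * 7 / 12)   # (1-G)/G * 7/12, also ~57.75
```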

  4. My answer:
    After getting the positive A result, we have a 70% chance of having the trait.
    In this distribution, 14% (70%*20%) will be true positive cases where both tests are positive.
    Conditional on the A result being a false positive, we have a 30%*P(B false positive) chance of B also coming up positive.
    We know that the total number of positives B has is <= 5x the number of true positives (since 20% of B positives are true). If the total fraction of true cases is some small epsilon, this means the false positive rate for B is at most 4*epsilon, so our bound on the second case is 30%*4*epsilon ~ epsilon << 14%. So we should be close to 100% positive.

    TLDR: For a rare trait, any macroscopic resolution on the test implies a very low false positive rate, so two independent tests being positive should get us close to 100%.

  5. The way I thought about it was that both 70% and 20% are middling probabilities. Whereas we are told the base rate is very low. So tests A and B must both have large odds ratios in order to lift the probability up from low to middling. So then if we apply a second large odds ratio to a middling probability we will get a large probability. The baby is very likely to have the trait.

    • Or alternatively, a more intuitive context: Imagine that there’s a sport that elects a Player of the Year every year. It turns out that 70% of the time the winner is the player who scored the most points. And 20% of times the winner is the player who defensively saved the most points. Then imagine some year the same player both scores the most points and saves the most points. Then that player is totally dominant in the sport, and is almost certain to win the vote.

    • In this it’s like the classic “diagnostic testing for a rare disease” teaching example, but in reverse.

      The usual lesson is: good sensitivity and specificity + low base rate => only middling posterior probability.

      The lesson here seems to be: middling posterior probability + low base rate => good sensitivity and specificity.

      An extra element is the boost you get from combining truly independent pieces of information (which one imagines would be rare in real life).

      At first I thought the player of the year example was great, however a priori one would probably expect defensive and offensive abilities to be negatively associated rather than independent.

      • > At first I thought the player of the year example was great, however a priori one would probably expect defensive and offensive abilities to be negatively associated rather than independent.

        Yeah, I think the sports analogy is a bit tricky, because players in a league / team come to us pre-stratified. Playing sports at some level is a collider for offensive and defensive ability, inducing a negative association within-team, but both might share general “athleticism” causal factors, inducing a positive association in the general population. But maybe a sports analogy could be:

        There are two types of people in the world: professional basketball players (who are very rare) and everyone else. I have two tests that I can use to help ID these two types of people. The first involves a measuring tape — I test to see if they are taller than 7ft. The second is a Snellen chart — I test to see if their eyesight is sharp enough to successfully read the 8th row. The first test is great at discriminating bball pros from everyone else: in the pool of individuals who are >7ft tall, 70% are professional basketball players. The second test is also pretty good: 20% of the people with “good” 20/20 vision play professional basketball. In the general population, visual acuity and height are independent features; being tall means you’re no more likely to have good vision than if you were short, and having good vision makes you no more likely to be tall than if you had bad vision.

        I apply the test to an individual and find out that they are a tall person with good vision. What’s the probability they are a professional basketball player?

        (I think the hard part is actually coming up with “independent” traits to test, since people self-select into particular groups, and e.g. lots of health traits are confounded by stuff like childhood nutrition, or time spent playing games outside, or whatever. And I bet height is not independent of e.g. muscle fiber composition, or reaction time, etc. But that’s how I would try to structure a sports analogy, if it were me)

  6. We can’t answer this exactly without knowing precisely how small “very small” is, but these are both really powerful tests, and they are independent, so the chance that the newborn has this trait ranges from >99% to virtually certain.

  7. I don’t see why people are complaining that “the base rate at which the tests return positive results” is unknown. We’re told, aren’t we, that the trait is rare and what fraction of each test returns positive results, so these values (70 and 20%) *are* the base rates, aren’t they? Not exactly, of course, but very close, and close enough for anything we’d care about.

    Anyway, I’ve sketched my answer… (I don’t think I should post it, to avoid biasing anyone, regardless of whether I’m right or not!)

  8. First test says 30% chance it is wrong. Second test has 80% chance of being wrong. So for the independent tests to both be wrong is .3 x .8 or .24. So 76% chance of having the trait.

    • That’s what I thought too! People are doing some big brain stuff in the other comments, saying everything from “nearly 0%” to “nearly 100%” to “impossible to say”, but for the life of me I don’t see how your answer is wrong.

      To simplify even further: imagine only test A was done. It says right there in the problem statement that 70% of people who test positive in A, have the condition. So if the question had only test A, the answer would be 70%, right? Now there’s test B, which is further evidence for the person having the condition (since a B-positive gives a 20% chance to have it, and the positive rate for no info is “low”). So the answer must at the very least be higher than 70%, yes?

      • It’s incorrect because you’re mixing up the conditioning. 70% is the probability you test positive given you have the disease, but that is not the same as the probability you have the disease given you test positive (which is what the question is really asking for).

        • That’s not what 70% means in the problem statement though. They say “70% of children who test positive, have the disease”. That’s P[disease|positive]=0.7.

        • Actually you’re right. I’m the one who mixed up the conditioning, which explains why my answer below is wrong. Darn!

      • Think about it this way. After selecting for test A-positives, you will have 70% with the gene/disease and 30% without. The baseline is no longer 0.1% or whatever. It is now a large 70%.

        Then for test B-positives, you want to know what fraction will come from that 70% of A-positives vs the 30% of A-negatives. Where “D” is “disease” this is p(B|D) and p(B|!D), *not* p(D|B) and p(!D|B).

        • 70% of A-positives vs the 30% of A-negatives.

          Let’s call those disease-positives and disease-negatives instead. All tested positive on test A.

    • We know that both tests come out positive. So how can one of them be a true positive and the other a false positive? It’s impossible. They’re either both right or both wrong.

  9. In the limit of small population prevalence x it looks like the odds answer is trivially 7/(12x). So the probability is (7/(12x))/(1+(7/(12x))). It’s fun that for small prior probability x the posterior probability gets close to 1.

    • Maybe best to explain. Call x the prior probability, which is also the prior odds because x<<1. Call ORA the odds ratio for a positive result of the first test and ORB the second. Since they're independent we just want x*ORA*ORB for the final odds.
      That's just (x*ORA)*(x*ORB)/x = (7/3)*(1/4)/x = 7/(12x).

      • And, to be complete, the general formula just has odds (7/12) (1-x)/x for the same reason.

        Quite a few people here have gotten this right, but I’m surprised at how many wander around rather than just plugging and chugging with a standard Bayes problem. Bayes updates are always done via odds*likelihood ratio, and this is no exception.

  10. Here’s my math-free intuitive answer (since it’s a test of probability intuitions): Test B is so bad that the results should be interpreted the opposite way. That is, a “positive” result from Test B should be interpreted as a negative result, since there is an 80% likelihood that a newborn with a “positive” Test B does not have the trait. So you have one low-precision result suggesting she has the trait, and another, independent low-precision result that says she doesn’t have it. So you basically have no information from the tests. My assessment of the probability is the same as the population base rate, which we are told is “rare”, so on the order of 0.1%.

  11. So if 100 people who have this condition are tested, 14% will get yes on both test A and test B, which is our result.
    If 100 people who do not have this condition are tested, 24% of those will get our result.

    Relative to the base rate I’d think we are 14/24 as likely to get our result, and if the base rate were 50/50, I would think we have the condition 14/38 times.

  12. My intuition: since the tests are independent, we can just apply Bayes’ theorem twice, using the posterior of the first test as the prior for the second test.

    Also easier here to work in the odds form of Bayes’ theorem, ie Pr(H1|E) / Pr(H2|E) = Pr(H1) / Pr(H2) * Pr(E|H1) / Pr(E|H2), ie the posterior odds are the prior odds times the likelihood ratio.

    so suppose H1 = has the disease, and H2 = does not have the disease. E refers to either A or B, depending on which test we’re looking at.

    So for test A:

    0.7 / 0.3 = Pr(H1) / Pr(H2) * Pr(A|H1) / Pr(A|H2)

    and for test B:

    0.2 / 0.8 = Pr(H1) / Pr(H2) * Pr(B|H1) / Pr(B|H2)

    Given independent tests, we also know by Bayes’ theorem:

    Pr(H1|A,B) / Pr(H2|A,B) = Pr(H1) / Pr(H2) * Pr(A|H1) / Pr(A|H2) * Pr(B|H1) / Pr(B|H2)

    We can sub in the first equation in for the first two multiplicands to get:

    Pr(H1|A,B) / Pr(H2|A,B) = 0.7 / 0.3 * Pr(B|H1) / Pr(B|H2)

    Then we can rearrange the second equation to be:

    Pr(B|H1) / Pr(B|H2) = 0.2 / 0.8 * Pr(H2) / Pr(H1)

    and substitute that in to get:

    Pr(H1|A,B) / Pr(H2|A,B) = 0.7 / 0.3 * 0.2 / 0.8 * Pr(H2) / Pr(H1) = 7 / 12 * Pr(H2) / Pr(H1)

    Then if we want to convert back to a probability, we take this fraction and find the ratio of the numerator to the sum of the numerator and the denominator. But it depends on Pr(H2) / Pr(H1), which can also be written as (1-Pr(H1)) / Pr(H1), and we don’t know what Pr(H1) is from the information provided. But if it’s small, then Pr(H2) / Pr(H1) is large, so the odds of having the disease given two positive test results are large, and the probability of having the disease is close to 1.

    • And I guess the more verbal & frequentist intuition is — we filter the general population on the basis of a positive test result, taking the prevalence of the disease from very rare (eg <<0.01) to very common (0.7) or somewhat common (0.2) in each sub population (those that test positive for A or B, respectively). Since the filters are independent, applying them both consecutively will result in the twice-filtered population having the disease in very very high frequency.
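    The odds-form result above (posterior odds = 7/12 times the inverse prior odds) is easy to evaluate numerically; the prevalence is not given in the problem, so the values below are illustrative:

```python
# Posterior probability from the odds form: posterior odds = (7/12) * (1-p)/p,
# where p = Pr(H1) is the prevalence (assumed values; the problem only says
# it is "very small").
def posterior_prob(p):
    odds = (7 / 12) * (1 - p) / p
    return odds / (1 + odds)

for p in (0.01, 0.001, 0.0001):
    print(p, posterior_prob(p))   # roughly 0.983, 0.998, 0.9998
```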

  13. I agree with the intuition of Anonymous, Oscar, J Cross, maybe others — the answer is *basically* 100%. Here’s my attempt at ex post mathematical formalization (which is maybe just a longwinded derivation of Michael Weissman’s formula, I don’t naturally think in terms of odds).

    Let P be the share of the trait in the population; let TA<=P be the population share of true positives on test A; and let TB<=P be the population share of true positives on test B. The problem doesn't give us these numbers, but it tells us that P<<1. The problem does tell us that the false positive shares in the population, FA and FB, are given by FA = TA * 3 / 7 and FB = TB * 8 / 2.

    The population share of double-positives who have the trait, call it Y, is Y = P * (TA / P) * (TB / P) = TA * TB / P.

    The population share of double-positives who don't have the trait, call it N, is N = (1-P) * (FA / (1-P)) * (FB / (1-P)) = FA * FB / (1-P) = (12 / 7) * TA * TB / (1-P) = (12/7) * Y * P / (1-P).

    We're interested in Y / (Y+N). For P small, we see that N is on the order of P * Y, and so Y / (Y+N) is approximately 100% as promised.

    The exact formula for Y / (Y+N) would be 1 / (1 + (12/7) P / (1-P)), which if we plug in P/(1-P) ~ P gives an approximate probability of 1 / (1 + 12 P / 7), which in turn is approximately equal to 1 – 12 P / 7.

    For P = 1%, we get posterior prob of about 98.3%; for P = .1%, posterior prob of 99.8%.
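    A quick plug-in check of the exact formula and the small-P approximation above (P values as in the comment):

```python
# Exact posterior vs. first-order approximation for small P.
def exact(P):
    return 1 / (1 + (12 / 7) * P / (1 - P))

def approx(P):
    return 1 - 12 * P / 7   # linearization for small P

for P in (0.01, 0.001):
    print(P, exact(P), approx(P))
# P = 1% gives about 98.3%; P = 0.1% gives about 99.8%, as stated.
```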

  14. > 20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).

    My intuition tells me that you shouldn’t be conducting test B.

  15. OK, I’m willing to look foolish. By the way, I’ve always been very bad at these problems. But my intuition says that we can think of A and B as a combined test. Since they are independent, the chance of testing positive on both tests, given that you have the genetic trait, is 70%*20% = 14% (it is fairly low because neither test is very good – one is modest and the other is terrible). If it is a rare trait, then the prevalence in the population is very small (5% or less). So, the sensitivity of the combined test is likely to be swamped by the false positive rate, making the conditional probability that a joint positive test is a true positive very small. So, I’m voting for a very low probability, not a very high one. What am I missing (and I am usually missing something)?

    • 70% is the probability of having the trait given test A is positive, not the probability of test A being positive given you have the trait (i.e., Pr(Trait|A) not Pr(A|Trait)). Same for test B.

    • Assume “rare” means quite a small frequency in the population, let’s say .001 or less

      Now, after taking test 1 and getting a positive, 70% of those people are actually positive. We’ve moved the needle from “there’s a 0.001 chance you have the disease” to “there’s a 0.7 chance you have the disease.” That’s a very diagnostic test, right?

      Now, the second test doesn’t do as good a job, but it still moves the needle from say 0.001 to 0.2

      So combining them together intuitively we winnow out the negative people very strongly, so it should be almost entirely people with the disease who get double positive tests.

      Getting the exact answer though requires doing a crap-ton of algebra. I started doing it and got bored. If I were going to work it out, I’d use Maxima to do the algebra for me.

      • “Now, after taking test 1 and getting a positive, 70% of those people are actually positive. We’ve moved the needle from ‘there’s a 0.001 chance you have the disease’ to ‘there’s a 0.7 chance you have the disease.’ That’s a very diagnostic test, right?”

        So, let’s say there are 100,000 people – 100 have the disease. 70 of those will test positive. But if we test all 100,000, how many of the 99,900 without the disease will test positive? It doesn’t take much of a false positive rate to make the conditional probability of having the disease given a positive test quite small. Are we assuming there are no false positives?

        • What am I not getting?

          > Are we assuming there are no false positives?

          70% of the newborns who are found to be positive according to test A have the genetic trait (and conversely 30% do not).

          Doesn’t that mean that 30% of the positives are false positives?

          > 20% of the newborns who are found to be positive according to test B have the genetic trait (and conversely 80% do not).

          Doesn’t that mean 80% of the positives with test B are false positives?

          Again, what am I not getting?

        • You are told probability of “disease” D given positive test A: p(D|A) = 0.7. Also probability of no disease given positive test A: p(!D|A) = 0.3. Then the same for test B.

          Finally, you are told that p(D) is small. Eg, p(D) = .001 and p(!D) = 0.999.

          You are *not* told p(A|D) or p(A|!D). However, using Bayes rule you have:

          p(A|D) = p(A) * p(D|A)/p(D) = p(A) * (0.7/0.001) = 700 * p(A)

          We also know:
          p(A) = p(D) * p(A|D) + p(!D) * p(A|!D)

          Plug that into the equation above and rearrange:
          p(A|D) = (700 * 0.999/0.3) * p(A|!D) = 2,331 * p(A|!D)

          Now expand p(D|A) into Bayes rule:

          P(D|A) = 0.7 = p(D) * p(A|D) / p(A)

          Where p(A) is expanded as already shown above. Then you see there is an equation of form:

          x/(x + y),

          Where x = (0.001 * 2,331 / 0.999) * y ≈ 2.33 * y and the ratio, of course, equals 0.7.

          Hopefully, that helps you see how the probability of the positive test given disease can be 700x greater than a positive test in general. Further, it is 2,331x greater than a positive test in the absence of disease… Yet, probability of disease given a positive test is only 0.7, because there are so many more people without disease.

          tldr; This comment does not answer the original question.
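          The relations above can be checked numerically; p(D) = 0.001 is as assumed, and the sensitivity p(A|D) = 0.7 below is a free, illustrative choice (any value works):

```python
# Verify: p(A|D) = 700 * p(A) and p(A|D) = 2331 * p(A|!D), given p(D|A) = 0.7
# and p(D) = 0.001. p(A|D) itself is underdetermined; 0.7 is an arbitrary pick.
p_D = 0.001
p_A_given_D = 0.7
p_A_given_notD = p_A_given_D / (700 * 0.999 / 0.3)   # = p(A|D) / 2331
p_A = p_D * p_A_given_D + (1 - p_D) * p_A_given_notD

print(p_A_given_D / p_A)              # 700
print(p_A_given_D / p_A_given_notD)   # 2331
print(p_D * p_A_given_D / p_A)        # 0.7, recovering the given p(D|A)
```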

        • > But if we test all 100,000, how many of the 99,900 without the disease will test positive?

          30

          > It doesn’t take much of a false positive rate to make the conditional probability of having the disease given a positive test quite small.

          Sure, but we’re being told that the conditional probabilities of having the genetic trait given a positive test are 70% and 20%.

          The problem statement makes us focus on the wrong thing (the second test is worse) and extract the wrong conclusion (the second test is bad, worse than useless, we shouldn’t be doing it).

          It may help to start with a simpler case: two tests which are independent but otherwise equally powerful and the genetic trait is present in half the positives.

          If the trait is a one in a thousand occurrence one test improves the ratio present:absent from ~1:1000 to 1:1 and the second brings it to ~1000:1.

          The 12/7 factor in the solution of the original problem is a negligible adjustment.

        • Ah. So I think I see it now. As someone just texted to me:

          if the population is overwhelmingly negative but only 30% of the positives are false positives that suggests that the actual false positive rate of the test is very low. Suppose you have a population of 1M and only 10 have the disease. If you end up with, say, 10 positives, then 7 will be true positives and 3 false positives. That is 3 false positives out of almost 1M actual negatives. Even on test B if you find 2/10 of the actual positives then you have 8 false positives out of almost 1M…

          … So if the baby was truly negative and tested positive on both tests that would be really low probability…

          So, then, to evaluate the statement that 30% of those who tested positive with test A were false positives you need to know how many were tested.

  16. Back in antiquity, 2010,
    https://statmodeling.stat.columbia.edu/2010/05/27/hype_about_cond/

    Andrew wrote about “Tuesday’s Child,” a now-famous example in the literature
    ———————————————————————————————————————–
    “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?

    The first thing you think is ‘What has Tuesday got to do with it?’ Well, it has everything to do with it.”
    ———————————————————————————————————————–

    Andrew then went on to say, “That’s one reason I’m not a big fan of this sort of trick probability question: some of the most important parts of the problem are hidden, and the answer is typically explained in a way that avoids making clear the assumptions that are needed to get there.”
    Is this current example of newborns similar to “Tuesday’s Child” in the sense that it is a “trick probability question,” or does it exemplify something deeper? If so, with regard to pedagogy, how are they similar and how are they different? Certainly, “Tuesday’s Child” is completely stated in very few words, whereas today’s “little problem to test your probability intuitions” takes up quite a few lines of text: 14, plus two more of clarification regarding likelihood vs. conditional probability.

  17. We are given the sensitivities of test A and B (70% and 20% respectively) but not the specificities. This makes it impossible to work out the positive likelihood ratio (sensitivity/(1-specificity)) associated with each test, which is what we need to know.
    So can’t tell.

  18. If you apply Bayes’ theorem a couple of times you get the simple relationship:

    P(disease|A,B) = P(+disease|A) P(+disease|B) / [P(+disease|A) P(+disease|B) + p(-disease|A) P(-disease|B)]

    From the data, this is (0.7 * 0.2) / (0.7 * 0.2 + 0.3 * 0.8), which I work out to be 0.14/0.38 or about 0.37

  19. Intuitively:

    The prevalence of disease is stated to be low.
    The test results are described in terms of positive predictive value (the probability of disease given a positive test result).

    Positive predictive value of a diagnostic is sensitive to prevalence, in that
    PPV/(1-PPV) = Post-test odds and:

    Post-test odds=TPF/FPF x (pre-test odds), where pre-test odds is pre-test prevalence converted to odds, TPF is true positive fraction (i.e. sensitivity) and FPF is false-positive fraction (1-specificity).

    The PPVs in the problem are reported on the same base probability of disease (base probability is very low), where a positive in test 1 results in 70% post-test probability of disease or a positive in test 2 results in 20% post-test probability of disease.

    Either way, a positive result of either test will substantially alter (increase) the pre-test probability of disease (relative to the base probability) prior to the 2nd test being applied. For test 1 as the first test, a positive result will result in 70% disease prevalence that feeds into test 2. For test 2 as the first test, a positive result will result in 20% disease prevalence that feeds into test 1. The resulting PPV of either test (as test 2) will be substantially greater than either test as Test 1. The actual factor can be worked out, but the problem asked for an “intuitive” answer.

  20. The probability is 1-epsilon, with epsilon being of the order of magnitude of the rarity of the genetic trait.

    Say that the prevalence is x. For the first test, if the probability of a positive when the genetic trait is present is p1 (unknown), then the probability of a positive when the trait is absent is x/(1-x)*p1*3/7. For the second test the probabilities are p2 and x/(1-x)*p2*4.

    With the assumption of independence there will be x*p1*p2 true double-positives and (1-x)*x/(1-x)*p1*3/7*x/(1-x)*p2*4 = x^2/(1-x)*12/7*p1*p2 false double-positives. For each true double-positive there are of the order of x false double-positives – a number much smaller than one.

    • I agree with Nik Vetr that an intuitive way to think of it is in terms of selective filters.

      The group 1 is very rare compared to the group 2 in the general population. Each filter applied results in higher relative presence of the first group. Say that the first test selects 70% of group 1 and 0.03% of group 2 and the second filter selects 100% of group 1 and 0.4% of group 2. Combining the filters will always be an improvement as their effects compound. The first one improves the ratio by a factor of ~2000 and the second by a factor of ~200.

  21. The real travesty here is how I’m not confident in my manipulation of odds rather than probabilities, because probabilities were all that were taught in my course (which, from what I’ve heard, is among the most intensive undergraduate probability courses in the country – Wierman’s Intro Prob at Johns Hopkins).

    As worked out by Nik Vetr and Carlos Ungil, I came up with the 7/12 ratio in the limit, but man, it was not worth several lines of symbol pushing just to come to the simple odds form of Bayes’s theorem with a compound conditional.

    I remember another prob prof mentioning that the book we were using for the course was updated from using odds to probabilities, and I thought “well of course, what is this the 1800s?”. How wrong I was–so much of what’s wrong with our interaction with probability and statistics comes down to the fact that probabilities are better viewed in a multiplicative setting, i.e. odds.

  22. I like the last line of Alex F’s comment, giving the answers that result from particular values of p. (You should have asked for this, Andrew, to avoid people writing out solutions!)
    Anyway, my scribbling + plugging in numbers agrees with Alex F’s numbers, and I will add 99.15% for p = 0.5%. No information is needed beyond what’s in the question.
    It’s a nice puzzle!

  23. Ok, 2nd try now I’m awake.
    We are given the PPV of 2 tests. The PPV depends on the characteristics of a test and disease prevalence.
    Since disease prevalence is very low it implies that these tests have very good characteristics (ie high positive likelihood ratios). Since the tests are independent we can combine these likelihood ratios by multiplication giving an even higher number.
    Hence the probability that the baby has the disease is very high, say >99%.
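    This intuition simulates well. In the sketch below the prevalence and the two sensitivities are assumed numbers (the problem doesn’t give them); the false positive rates are then forced by the 70% and 20% PPVs, and the estimated posterior lands near 1 regardless of which sensitivities are picked:

```python
# Monte Carlo check: P(trait | both tests positive) under assumed prevalence
# and sensitivities, with false positive rates forced by the stated PPVs.
import random

random.seed(1)
x = 0.001                    # assumed prevalence ("very small")
sens_A, sens_B = 0.9, 0.5    # assumed sensitivities (free choices)
fp_A = (3 / 7) * x / (1 - x) * sens_A   # forced by PPV(A) = 0.7
fp_B = 4 * x / (1 - x) * sens_B         # forced by PPV(B) = 0.2

both = trait_and_both = 0
for _ in range(2_000_000):
    trait = random.random() < x
    a = random.random() < (sens_A if trait else fp_A)
    b = random.random() < (sens_B if trait else fp_B)
    if a and b:
        both += 1
        trait_and_both += trait

print(trait_and_both / both)   # close to 1 (theory: 1/(1 + (12/7)*x/(1-x)) ~ 0.998)
```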

  24. Granted I have the advantage of reading a few other commenters’ posts before commenting but I have a different answer than everyone else so…

    Everyone who said it’s very close to 1 is definitely correct, and I like Oscar’s analogy, though the question seems a little confusing, contradictory, or ill-posed. My conclusion is that the true population rate (ppop) barely matters on its own, though it is already given as small and constrains the test results.

    The positive rate of test A (pA) and likewise test B (pB) are almost all that matter (neither is specified, which is part of what makes the question confusing). It would be logically contradictory for those values to be too high, however, since, say, 0.7*pA must be less than or equal to ppop.

    So from the original algebra I wrote down I can basically round off to the result being 1-(0.3*pA*0.8*pB) or 1-0.24(pA*pB). Note that this does not necessarily mean we can simplify pA*pB to (ppop)^2 given a low order of magnitude; the original problem leaves pA and pB undefined and while not larger they could be orders of magnitude smaller than ppop.

    So for instance, still truncated in the end, if

    ppop = 0.01
    pA = 0.01
    pB = 0.01
    then we’ll call our result posterior = 0.99997551

    If ppop above was say, 0.009 that only changes posterior to 0.99997556 hence rounding is acceptable

    Though if
    ppop = 0.01
    pA = 0.0001
    pB = 0.0001
    then posterior = 0.999999997551

  25. If test A were the only one, there would be a 70% likelihood of a true positive given a test positive. This is true irrespective of the background prevalence rate. (We are not asking how likely it is that a baby would test positive in the first place.) So what is the added effect of the second test? (They are independent, so it’s simple.) Clearly being positive on the second test should increase your likelihood relative to being negative, so we expect a boost, but only moderate since B gives you only a 20% chance of actually being positive conditional on testing positive. So it seems to me we have a classic case of two independent binary probabilities, and their joint probability is given by 1 minus the product of their complements, 1 – (.3 x .8) = .76. I think the added information about the population is a red herring. It is of course extremely unlikely that a randomly chosen baby would test positive on both A and B, but that’s what we’re given.

    • Thank you.

      As I see it, the wording suggests the genetic trait is a marker for a disease, but that’s not stated explicitly. My intuition is that test B really messes you up, as a positive result on test B is more likely to be a false positive than a true positive (for the genetic trait – and the correlation between the genetic trait and some assumed disease is never stated. Maybe it’s less than 100%).

      So I might be wrong but at least I’m not nuts.

      • I think that’s not relevant. The genetic trait is just that. It may or may not be associated with a disease (from the scenario it seems most likely that the trait is disease/disorder-associated, otherwise why would tests be developed?). But unless I’m mistaken it’s not a relevant aspect of the scenario.

        One of the issues around this sort of puzzle/scenario (and an issue with Internet discussions en masse!) is addressing what we feel are ambiguities. This is an issue with, for example, setting exam questions, where the possibilities for ambiguous statements are manifold! So many internet discussions are predicated on ambiguities, and it’s a marker of whether a participant is “debating” in good faith if they are willing to try to clarify what they mean (as I expect you know!).

  26. Compared to all the other responses, I’m definitely under thinking this.

    But, my “intuition” says one test gives a 70% chance of having the trait, while the other, at 20% positive, by default means an 80% chance they don’t have the trait.

    So, if we look at 70% and 20% positive, they are essentially “pulling” one another. The 70% pulling the 20% up and the 20% pulling the 70% down.

    Since they don’t correlate to one another (being positive on one doesn’t mean anything for the other test) then you can only compare them on a straight odds basis (I’m sure I’m not using the correct terminology).

    So, my intuitive answer would be that this individual has a 45% chance of having the trait based on the two tests.

    • The other way of looking at it:

      Test A says 70% positive.

      Test B says 80% negative.

      Sure, they say 20% positive on test B, but we have to acknowledge what that really means.

      You have two tests giving almost the exact opposite result, with test B giving a larger chance of being negative than test A gives of being positive.

  27. My gut feeling is that the likelihood of a false positive is the product of the false positives of each test – so 0.3 x 0.8. So likelihood that newborn has the trait is 76%. However there’s bound to be some “tricksiness” in the answer and that answer is probably too obvious.

    Also I don’t really know what these statements mean:

    The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.

    Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.

    Again intuitively I feel like if test B is just a substandard version of test A then my calculation is correct. However if the two statements that I don’t understand are taken to mean that the tests are completely orthogonal then Carlos’s answer with the nice sporting example is appealing.

  28. Another way to look at it, similar to other proposals but assuming for simplicity that the tests never miss an individual with the genetic trait. One can later consider how relaxing the assumption would affect the argument.

    If we test thousands of babies and find ten positives in test A, they are – ignoring sampling variability – the seven babies who have the trait plus three random babies among the thousands who don’t. Test B finds thirty-five positives: the seven babies who have the trait plus twenty-eight random babies among the thousands who don’t. The seven babies with the trait will have two positives. The babies without the trait are _extremely_ unlikely to be among both the 3 randomly having a positive in one test and the 28 randomly having a positive in the other test. We can expect that only the seven affected babies will have a double positive.
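    Here is a minimal sketch of that counting argument in Python, with an illustrative 7 trait babies per 100,000 trait-free babies (the actual prevalence isn’t given in the problem) and the never-miss assumption from above:

```python
# Counting argument: the two false-positive groups almost never overlap,
# so double positives are almost all true positives.
n_healthy = 100_000  # babies without the trait (illustrative)
n_trait = 7          # babies with the trait; assume tests never miss them
fp_a = 3             # false positives on test A, so PPV_A = 7/10 = 70%
fp_b = 28            # false positives on test B, so PPV_B = 7/35 = 20%

# Independent false positives: expected trait-free double positives
expected_double_fp = (fp_a / n_healthy) * (fp_b / n_healthy) * n_healthy
ppv_both = n_trait / (n_trait + expected_double_fp)
print(round(expected_double_fp, 5), round(ppv_both, 4))  # 0.00084 0.9999
```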

  29. My first intuition told me that the probability is slightly below 50%. Then I realized my intuition is wrong, as it always is in problems like these. Further thought led me to believe the probability is near 100%, with the following reasoning:

    Assume that in our population, 100 people have the trait.
    Further assume that, for sake of easy computation, of these people 70 test positive with test A, and 20 test positive with B.
    Due to the independence of the tests, this results in 14 (70/100 * 20/100 * 100) people with the trait testing positive for both A and B.

    Now assume that in our population, 10,000 people do not have the trait. (So our population size is 10,100.)
    We know that 30 of them test positive with test A and 80 with test B.
    Due to the independence of false positives of the tests, the probability of a person without the trait testing positive for both A and B is 0.000024 (30/10,000 * 80/10,000) – an expected 0.24 such people.

    So the probability of having the trait is approximately 100% under my assumptions. Now, the assumptions about the number of people testing positive with either test do not matter for the outcome, because they affect double true positives and double false positives equally. The assumptions about the number of people do matter, but the problem said the proportion of people with the trait is low, and under those conditions the probability is close to 100%.
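    Redone in code (with the double-false-positive figure expressed as an expected number of people, about 0.24, rather than as a probability), this count gives roughly 98%:

```python
# Worked example: 100 people with the trait, 10,000 without.
n_trait, n_healthy = 100, 10_000
true_double = (70 / 100) * (20 / 100) * n_trait                 # 14 people
false_double = (30 / n_healthy) * (80 / n_healthy) * n_healthy  # 0.24 people
posterior = true_double / (true_double + false_double)
print(round(posterior, 3))  # 0.983
```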

  30. I just applied Bayes Theorem (similar to gec’s answer). The result ends up:

    p(T given A and B) = (0.7 x 0.2)p(T) / [(0.7 x 0.2)p(T) + (0.3 x 0.8)p(not T)].

    If we assume p(T) = 0.01, the result works out to be roughly 0.6% which seems absurdly small to me, but I don’t see the reason why it’s wrong.
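    The arithmetic as written does evaluate to about 0.6% (quick sketch below); whether it answers the question hinges on using 0.7 and 0.2 as if they were p(A+|T) and p(B+|T), whereas the problem states them as p(T|A+) and p(T|B+):

```python
# Evaluating the formula above with p(T) = 0.01.
# Note: the problem gives 0.7 and 0.2 as p(T|A+) and p(T|B+),
# while this formula treats them as p(A+|T) and p(B+|T).
p_t = 0.01
num = 0.7 * 0.2 * p_t
posterior = num / (num + 0.3 * 0.8 * (1 - p_t))
print(round(100 * posterior, 2))  # 0.59
```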

  31. We need to know the false negative rates for tests A and B to calculate likelihood ratios of these tests for a positive result, which is essential, I believe, for making an estimate about the genetic trait when both tests are positive. For example, if the false negative rates are 70% and 20% respectively for tests A and B, then the likelihood ratios for both these tests are 1, making both these tests, in combination or singly, worthless for estimating the genetic trait with a positive result.

  32. After trying to do the prob math and reading various answers, I resorted to simulation. Used a small prevalence value and realized that I really wanted to know what P(A+|t) and P(A+|~t) were (likewise for B) and it wasn’t clear to me that P(A+|~t) was 1-.7. Regardless, any middling values of getting a positive test result when not having the trait was going to lead to a whole bunch of false positives since the prevalence was so low. I tried various values of sensitivity for A and B (high, high), (high, low), (middle, middle) but no matter what, the vast majority (>99%) of those who were A+ and B+ were trait negative. My brain hurts. Looking forward to seeing what the right answer is.
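    A minimal version of that simulation – the sensitivities and false-positive rates below are made up, since the problem doesn’t supply them, which is exactly the missing-information issue:

```python
import random

random.seed(0)
prev = 0.001               # hypothetical prevalence
sens_a, fpr_a = 0.9, 0.3   # hypothetical P(A+|t), P(A+|~t)
sens_b, fpr_b = 0.9, 0.3   # hypothetical P(B+|t), P(B+|~t)
# Note: for PPV_A to be 70% at this prevalence, fpr_a would have
# to be ~0.0004, far below these "middling" values.

both = both_with_trait = 0
for _ in range(1_000_000):
    t = random.random() < prev
    a = random.random() < (sens_a if t else fpr_a)
    b = random.random() < (sens_b if t else fpr_b)
    if a and b:
        both += 1
        both_with_trait += t
# Fraction of double positives that actually have the trait:
print(both_with_trait / both)  # well under 5% with these middling rates
```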

  33. Here’s my go. First, I think the key to the problem is this sentence: “The study has also found that when a newborn has the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.”

    So basically the two tests are testing separate things, let’s say for a 1 (versus a 0) on two separate parts of a chain of RNA. Let’s call these parts of RNA A and B. The first test tests for presence/absence on A. The second test tests for presence/absence on B. If either A or B is 1 (versus 0), then you have the genetic feature.

    Thus, there are four possibilities: A and B are present (so you have the feature), A is present but not B (so you have the feature), B is present but not A (so you have the feature), or neither A nor B is present (so you don’t have the genetic feature).

    Maybe getting this wrong but without thinking too hard, the probability that neither A nor B is present is (0.3 * 0.8) * (0.3 * 0.8) = 0.0576. So if both results are positive, then there’s about a 94 percent probability that you have the genetic feature.

  34. Perhaps one useful intuition comes from this case: There are no false negatives and B differs from A simply in adding more false positives. Then obviously B adds no information and P(T | A, B) = P(T | A).

    • Ah but I think I missed this second sentence! “Likewise, when a newborn does not have the genetic trait, a positive result in one test does not affect the likelihood of a positive result in the other.”

      Maybe interesting to think about this one-sided version I imagined though… Where P(A|B,T) = P(A|T) but not necessarily P(A|B, ~T) = P(A|~T).

  35. The given numbers are expressed as PPVs. Assuming a rare disease with prevalence at 0.002% (1:50,000) and sensitivity =~ specificity (both very high), the LR of the first test with a PPV of 70% is around 99999:1. The 20% PPV of the second independent test (same assumptions and prevalence) yields an LR of around 9999. Run this second test on the negatives from the first test, and you might think you would pick up 20% of the remaining positives. However, the prevalence among the remaining negative (test-one-depleted) population is now only 30% of 0.002% = 0.0006%, which drops the PPV on the negatives from the first test down to about 6% in this depleted population.

    For example:
    In 1,000,000 people, there are about 20 gold-standard positives (0.002%). Test one picks up 70% of them (14). Test two, run only on the test-one negatives, picks up at most 20% of the remaining 6 (1.2 patients), but with the depleted positives from test one, it’s more like 6% of them (0.36 patients). That’s a total of 14.36 TPs out of 20 known positives (combined PPV = 0.718) – not much of an improvement. The overall LR (if that’s what we care about) is still around 99999, but LR does not depend on prior probability, and is thus not as useful as the PPV.

  36. It’s extremely likely, >99%. The probability of not having the trait (given two positive tests) is similar to the prevalence.

    If P is the prevalence (and the test has high sensitivity), the probability of an individual having the trait is P and the probability of two false positives is roughly P^2.

    The number of people who get a true positive on test A is similar to the number of people who get a false positive, and the number of people who get a true positive on test B is similar to the number of people who get a false positive. (Similar to here means within a factor of 4.) But with the two true positive groups those are the same group of people (basically, assuming high sensitivity), whereas with the two false positive groups those are different (independent) groups of people. So P vs P^2.

    The relative sizes of the groups are actually 7:3 and 1:4, rather than 1:1, which makes this inexact. I think that makes it P vs 12/7 P^2. So within a factor of 2. That means that the probability of not having the trait (given two positive tests) is more like 1.7 * prevalence.

    Does it matter if the sensitivity is low? Let’s say that each test has the same sensitivity, S. Then each of the four groups has about S*P of the population. Then being in both true positive groups has a probability around S^2 * P, and being in both false positive groups has a probability around S^2 * P^2. The S^2 cancels, so the ratio is still P:P^2. (Put that 12/7 factor back in to get precise again.)
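    In odds form this is a two-line check; with a small prevalence P, the probability of lacking the trait given two positives lands almost exactly on (12/7)*P:

```python
# P(~trait | A+, B+) for small prevalence, via likelihood ratios:
# posterior odds = prior_odds * LR_A * LR_B = (7/3) * (1/4) / prior_odds
prev = 0.001
prior_odds = prev / (1 - prev)
post_odds = (7 / 3) * (1 / 4) / prior_odds
p_no_trait = 1 / (1 + post_odds)
print(p_no_trait, (12 / 7) * prev)  # both about 0.0017
```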
