An alternative Monty Hall problem. As with the usual Monty Hall problem, just set it up as a probability tree and it all works out

Johannes Fischer writes:

What is the optimal strategy in the problem posed on this tumblr post?

You are playing the Monty Hall problem. However, you secretly know one of the goats is the former pet of an eccentric billionaire who lost it and is willing to pay an enormous amount for its return, way more than the car is worth. You really want that goat. The host is unaware of this. After you pick your door, as is traditional, the host opens one door, which he knows doesn’t have the car. He reveals a goat, which you can tell is the ordinary goat and not the secretly valuable one. The host offers to let you switch doors. Should you?

I [Fischer] can’t wrap my head around whether the information gained in the second stage (seeing a goat revealed, but importantly, learning that the goat is not the one you want) changes the probability meaningfully from the original problem. I’m stuck between 1/2 chance that switching makes sense and 2/3 chance.

My reply:

As always, you can solve these problems by drawing a tree. Call the three outcomes g, c, G (for goat, car, and amazing goat). Your preferences are in the order G > c > g.
I can’t type the tree so I’ll show it in outline form.
Step 1 is which door you picked, Step 2 is which door Monty shows to you.

1a (probability 1/3): you picked g
Then in step 2, Monty can open c or G. You’ve already said that he won’t open c. So he will open G.
1b (probability 1/3): you picked c
Then in step 2, Monty can open g or G. You’ve already said that he doesn’t distinguish between the goats, so he will open g (with probability 1/2) or he will open G (with probability 1/2)
1c (probability 1/3): you picked G
Then in step 2, Monty can open g or c. You’ve already said that he won’t open c. So he will open g.

In summary, here are the possible outcomes:
(i) probability 1/3: You picked g, Monty opens G.
(ii) probability 1/6: You picked c, Monty opens g.
(iii) probability 1/6: You picked c, Monty opens G.
(iv) probability 1/3: You picked G, Monty opens g.

Now condition on the fact that Monty opens g. So you know it’s (ii) or (iv). So renormalize. Conditional on Monty opening g:
(ii) probability 1/3: You picked c, Monty opens g.
(iv) probability 2/3: You picked G, Monty opens g.

So, first off, you’re in great shape. You either have the car or the awesome goat. The second thing is . . . don’t switch, dude!

You said, “I’m stuck between 1/2 chance that switching makes sense and 2/3 chance,” but both those answers are wrong. Switching would clearly hurt you here.
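The tree above can also be checked by brute-force enumeration. Here is a minimal sketch in Python, assuming (as in the outline) that your first pick is uniform over the three prizes and that Monty chooses uniformly among the doors he is allowed to open; the function names `monty_tree` and `posterior_given_opened` are made up for this illustration.

```python
from fractions import Fraction

def monty_tree():
    """Return {(your_pick, monty_opens): probability} for prizes g, c, G."""
    prizes = ["g", "c", "G"]
    joint = {}
    for pick in prizes:
        # Monty may open any door you did not pick, except the car.
        allowed = [p for p in prizes if p != pick and p != "c"]
        for opened in allowed:
            # Assumption: he is indifferent among the doors he may open.
            joint[(pick, opened)] = Fraction(1, 3) / len(allowed)
    return joint

def posterior_given_opened(joint, opened):
    """Keep only the branches where Monty opened `opened`, then renormalize."""
    kept = {pick: p for (pick, o), p in joint.items() if o == opened}
    total = sum(kept.values())
    return {pick: p / total for pick, p in kept.items()}

joint = monty_tree()
posterior = posterior_given_opened(joint, "g")
for pick, p in posterior.items():
    print(f"you hold {pick}: {p}")
# you hold c: 1/3
# you hold G: 2/3
```

The four entries of `joint` are outcomes (i) through (iv), and conditioning on "g" reproduces the 1/3 versus 2/3 split, so staying keeps the valuable goat two times out of three.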

P.S. The above description still makes it look kinda complicated–it’s super-direct when you draw the tree. I recently bought a tablet to help with my work, and I thought I’d try to draw the tree on the tablet, but it just came out as a messy scrawl.

It came out better when I sketched it on paper and then took a picture:

But for my workflow I’d prefer to do it all using the computer.

20 thoughts on “An alternative Monty Hall problem. As with the usual Monty Hall problem, just set it up as a probability tree and it all works out”

  1. It has always surprised me that Paul Erdös wasn’t able to understand the Monty Hall Problem even after being shown the probability tree solution!

    • I always assumed that this was because Vazsonyi didn’t explain that the gameshow host always knows what’s behind the doors and would never display the car?

      If the gameshow host just picks one of the two remaining doors at random, and could have displayed the car but by chance didn’t, then switching really is a 50/50 proposition.
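The difference between the informed host and the random host can be checked by enumeration. This is a rough sketch, assuming the player always picks door 0 and that the random host opens either other door with probability 1/2; `switch_win_probability` is a name invented for this example.

```python
from fractions import Fraction

def switch_win_probability(host_knows):
    """P(switching wins the car | the host revealed a goat)."""
    layouts = [("car", "goat", "goat"),
               ("goat", "car", "goat"),
               ("goat", "goat", "car")]
    win = Fraction(0)
    goat_shown = Fraction(0)
    for layout in layouts:
        others = [1, 2]  # the player holds door 0
        if host_knows:
            options = [d for d in others if layout[d] != "car"]
        else:
            options = others
        for opened in options:
            p = Fraction(1, len(layouts)) / len(options)
            if layout[opened] == "car":
                continue  # car revealed: ruled out by what we observed
            goat_shown += p
            remaining = 3 - opened  # the other unpicked door (1 <-> 2)
            if layout[remaining] == "car":
                win += p
    return win / goat_shown

print(switch_win_probability(host_knows=True))   # 2/3
print(switch_win_probability(host_knows=False))  # 1/2
```

The only change between the two calls is the set of doors the host is allowed to open, which is exactly the assumption the tree forces you to write down.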

  2. My first thought: “oh no, a third family Thanksgiving will now be ruined by the Monty Hall problem!”

    But then my second thought: Monty doesn’t know about the special goat, so the added wrinkle doesn’t change his behavior in any way. As in the standard problem, his ability to choose which door to open means that his unopened door has .66 of a car in it. So if you want the car you should switch but if you want the goat you shouldn’t. Pass the potatoes!

  3. > I [Fischer] can’t wrap my head around whether the information gained in the second stage (seeing a goat revealed, but importantly, learning that the goat is not the one you want) changes the probability meaningfully from the original problem.

    Nothing changes from the original problem other than (maybe) our preference. If we want the car we should switch – if we want the remaining goat we shouldn’t.

    —-

    For whoever wants to ruin their Thanksgiving, I find this problem much more interesting:
    “Mr. Smith has two children. At least one of them is a boy. What is the probability that
    the other child is a boy? Mr. Jones has two children. The older is a girl. What is the
    probability that the other child is a girl?”

    It seems that it was discussed here at least once: https://statmodeling.stat.columbia.edu/2010/05/27/hype_about_cond/ (I’ve not read the comments yet)

    • Carlos:

      I was curious so I just drew the trees, and both problems are then super-easy! No ruined Thanksgivings–as long as you’re willing to draw on your napkin:

      (For simplicity I’m assuming Pr(girl) = 0.5 rather than 0.488 or whatever.)

      • “Mr. Smith has two children. At least one of them is a boy. What’s the probability of mixed sex?”

        Super easy: 2/3

        “Mr. Smith has two children. At least one of them is a girl. What’s the probability of mixed sex?”

        Super easy: 2/3

        “Mr. Smith has two children. What’s the probability of mixed gender?”

        2/3?

        If the conditional probability is 2/3 in any case, wouldn’t the unconditional probability be the same?

        • The answer is of course that there is some overlap between the two cases, so one can work out the 1/2 answer for the unconditional case mathematically.

          What’s interesting about the problem is the mapping between model and reality. Why are we asking some particular question in the first place? How do we know what is taken as given?

          The problem is often posed as “Mr. Smith says that he has two children and at least one of them is a boy”. Is that problem equivalent to the one above?

          Giving the same p(mixed sex)=2/3 answer for both “says at least one boy” and “says at least one girl” creates a real problem in that case as there is no overlap.

          Here one needs a tree representing the probability of Mr. Smith saying one thing, or the other, or maybe something else, to be able to answer the question(s) in a consistent way.

          The answer will depend on the assumptions and arguably the answer 1/2 is more “natural” than 2/3 for mixed sex (or 1/3 for same sex).
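To make the overlap point concrete, here is a small sketch that computes both versions. The first part conditions on facts about the family; the second part uses one possible statement model, which is purely an assumption for illustration: Mr. Smith picks one of his two children at random and reports that child's sex. The helper names are hypothetical.

```python
from fractions import Fraction

# Four equally likely families, written as (older, younger), Pr(boy) = 1/2.
families = {"BB": Fraction(1, 4), "BG": Fraction(1, 4),
            "GB": Fraction(1, 4), "GG": Fraction(1, 4)}

def is_mixed(f):
    return f in ("BG", "GB")

def conditional(event, given):
    """P(event | given), where both are predicates on a family."""
    num = sum(p for f, p in families.items() if given(f) and event(f))
    den = sum(p for f, p in families.items() if given(f))
    return num / den

# Conditioning on overlapping facts: mixed families satisfy both conditions.
p_mixed_given_boy = conditional(is_mixed, lambda f: "B" in f)    # 2/3
p_mixed_given_girl = conditional(is_mixed, lambda f: "G" in f)   # 2/3
p_mixed = conditional(is_mixed, lambda f: True)                  # 1/2

def p_says(sex, f):
    """Assumed statement model: he reports the sex of a randomly chosen child."""
    return Fraction(f.count(sex), 2)

def mixed_given_statement(sex):
    num = sum(p * p_says(sex, f) for f, p in families.items() if is_mixed(f))
    den = sum(p * p_says(sex, f) for f, p in families.items())
    return num / den

print(p_mixed_given_boy, p_mixed_given_girl, p_mixed)          # 2/3 2/3 1/2
print(mixed_given_statement("B"), mixed_given_statement("G"))  # 1/2 1/2
```

Under this statement model the two statements partition the possibilities, so the conditional answers (1/2 and 1/2) average back to the unconditional 1/2. A different model of what Mr. Smith says would give different numbers.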

      • The combinatorics works. But what if the question is something like:

        “The Smith family has 2000 births in their family history. At least 1005 are boy-births. What is the probability the latest birth was a girl?”

        In that case your method becomes at least less (maybe in-? [I didn’t try]) tractable. After that the meaning of probability gets muddled. Is it that combinatorics one we can’t solve, or the result of our attempts to estimate/approximate it?
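For what it is worth, the 2000-birth version is still tractable by exact counting if we keep the same simplifying assumptions as above (independent births, Pr(boy) = 1/2), so that every birth sequence is equally likely. The sketch below treats the condition as a filter on the family history; `p_last_is_girl` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def p_last_is_girl(n_births=2000, min_boys=1005):
    """P(latest birth is a girl | at least `min_boys` boys among `n_births`)."""
    earlier = n_births - 1
    # Sequences ending in a girl: all the required boys are among the earlier births.
    girl_last = sum(comb(earlier, k) for k in range(min_boys, earlier + 1))
    # Sequences ending in a boy: the earlier births need only min_boys - 1 boys.
    boy_last = sum(comb(earlier, k) for k in range(min_boys - 1, earlier + 1))
    return Fraction(girl_last, girl_last + boy_last)

p = p_last_is_girl()
print(float(p))  # a little below 1/2
```

With two births and at least one boy, the same function returns 1/3, matching the small tree. Whether a filter is the right model for how we learned "at least 1005 boys" is a separate question from whether the counting can be done.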

      • Andrew,

        I think this problem is more interesting than you’re giving it credit for. But not for the math, or because the tree is wrong etc.

        I think what’s interesting about it is that I think the reason that different people competent in mathematics and probability come to different answers is because they are implicitly formalizing the problem in different ways. The data generation process is important! And I think you’re ignoring that here; the problem is not evaluating the drawn tree, the interest is in determining which tree corresponds to the situation being described!

        Take your Monty Hall tree from the original post: you put a 0 beside the possibilities where Monty shows a car. Why? Because the data generation process is that he knows which door has a car and he doesn’t open that one. That’s the crucial detail! Otherwise it changes the odds and the problem.

        In the boy/girl problems the process is actually unspecified. How did you come to know one is a boy or one is a girl? What is the actual situation by which you came to know that information? Without that information the problem is actually underspecified: there is no unique tree that you can draw. In my mind it’s a more ambiguous case of the “what is the probability of a random chord on a circle being larger than its radius?” It’s simply underspecified because “a random chord” does not actually specify a unique probability distribution. So the answer that you’ll come to depends on what you assume the data generation process is.

        I think this is the real reason mathematically competent people come to different answers on the boy/girl problem. Rather than formulating a well-specified problem, they intuit a definitive version from the English sentences without being explicit about it, and think the other people’s intuited version must be wrong. That’s my impression anyway: the disagreement isn’t about evaluating the trees, it’s about interpreting the sentences in terms of actual data generation models. (Or, more generally, translating the ambiguous English sentence into a precise scenario, if you prefer.) And that this is a common mistake is interesting in my opinion, though it doesn’t need to ruin Thanksgiving.

        • Some:

          I actually agree with you! I think that the best way to solve these problems is to draw the tree, and the most important reason for drawing the tree is that it forces you to make the assumptions explicit.

          If people are going to disagree about the problem, I think it’s much more productive for them to disagree about the tree–that is, to disagree about the mathematical expression of the scenario that was described in natural language–than to disagree about the answer. Drawing the tree expresses the key part of the reasoning, and if two people want to draw two different trees for the problem, then they can discuss and figure out where they are disagreeing about the formulation of the problem.

          This is a special case of the value of generative modeling. Rather than try to jump to the final inference, we set up a full generative model for the process and then see what flows from that.

          Generative modeling is important in both Bayesian and frequentist statistics!

          For Bayesian statistics, it’s obvious: without a generative model for the parameters and a generative model for the data given the parameters, you have no “prior” and no “likelihood,” thus no posterior distribution.

          But generative modeling is absolutely necessary for frequentist statistics too. You need the generative model for your data, along with the specification of your data processing and analysis as a function of potential data, in order to compute p-values, confidence intervals, and other frequency statements. That’s the point of our forking paths paper.

          As the saying goes, Bayesians are frequentists. Both rely on generative modeling, and generative probability trees are a clear way of expressing this.

        • Andrew, (sorry, it seems too many replies deep to respond directly on your message)

          I’m glad you agree, and I’m not surprised because you have done a good job in pointing out to me how useful modelling the data generating process is!

          But I’d still like to “criticize” you a bit here (okay, it’s not really a criticism, you’re free to focus on what aspects you like and find the most interesting; but I think that there are some points in here that deserve more focus when these problems come up as I think they go underappreciated and cause confusion. So I’m going to focus on them here myself). Take for example the previous discussion linked here (https://statmodeling.stat.columbia.edu/2010/05/27/hype_about_cond/), where you discussed briefly this comment (originally from Todd Stark):

          “””
          Doesn’t this illustrate limits to the value of probability? It seems like more than a curiosity to me. If specifying a logically irrelevant detail changes the probability calculation, doesn’t that tell us that probability thinking is a relatively useless tool in situations like this? It is implicit that everyone is born on a particular day, if specifying something we already knew changes the calculation, isn’t the calculation unreliable for decision making, for this class of situations?
          “””

          My answer here is an emphatic no! What I think it illustrates is actually the importance of being clear about the data generation process, and the subtle differences between our intuitive and implicit models that we need to be careful to clarify when going to mathematical models!

          I’ll use the 2 children example given by Carlos above only because the math is easier, but the point is the same. Let’s imagine two particular real life scenarios:

          Mr. Jones’ two children both go to the same school as your son. At some school activity they pair up boys and girls from across different classes and ages by drawing names from a hat, and your son is paired with Sally, who is Mr. Jones’ daughter. Your son later tells you about this at the end of the day because you’ve mentioned to him that the Joneses have just moved down the street but you haven’t met them yet. What is the probability that Mr. Jones’ other child is a boy? I think the best answer is ~1/2.

          Now, imagine as part of the activity the two children ask each other a few questions such as: Where does your age fall relative to any siblings you have? and Sally tells your son that she is the older one. Imagine he has to write this down as part of the school activity and show you the sheet of the activity they did that day.

          I think the best answer is STILL ~1/2, not ~1/3, because the ‘older’ had nothing to do with the data generation process. It really *is* an irrelevant detail which you happen to know but which does not influence the probabilities of any step of what occurred in the slightest. Your son was equally likely to get paired with Sally regardless of her age, and you were equally likely to find out Sally was the younger/older regardless as well.

          Contrast this with:

          You work for the school board, you are looking over their database, and you select all families with exactly two kids and at least one boy. At first there is a bug in your code, and you accidentally select only families where the eldest child is a boy. Then you see that for ~1/2 of the families the other child is a boy. Then you fix your code so it properly selects families with at least one boy, and you see that for ~1/3 of the families the other child is a boy.
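The two kinds of scenario can be written as two small generative models. This is only a sketch under the usual simplification Pr(boy) = 1/2, and the function names are invented for the example. In the first model a child is drawn at random (the school pairing) and happens to be a girl; in the second, whole families are selected by a database filter.

```python
from fractions import Fraction
from itertools import product

# Each family is (older, younger); the four combinations are equally likely.
FAMILIES = list(product("BG", repeat=2))

def other_is_boy_after_meeting_a_girl(require_older=False):
    """A child is drawn at random from a random family and turns out to be a girl."""
    num = Fraction(0)
    den = Fraction(0)
    for fam in FAMILIES:
        for idx in (0, 1):  # idx 0 = older child, idx 1 = younger child
            p = Fraction(1, 4) * Fraction(1, 2)
            if fam[idx] != "G":
                continue
            if require_older and idx != 0:
                continue  # we additionally learned that she is the older one
            den += p
            if fam[1 - idx] == "B":
                num += p
    return num / den

def both_boys_after_filter(keep):
    """Families are selected by the predicate `keep`; return the share with two boys."""
    kept = [fam for fam in FAMILIES if keep(fam)]
    return Fraction(sum(1 for fam in kept if fam == ("B", "B")), len(kept))

print(other_is_boy_after_meeting_a_girl())                    # 1/2
print(other_is_boy_after_meeting_a_girl(require_older=True))  # 1/2
print(both_boys_after_filter(lambda fam: fam[0] == "B"))      # 1/2 (the buggy query)
print(both_boys_after_filter(lambda fam: "B" in fam))         # 1/3 (the fixed query)
```

In the pairing model, learning the birth order leaves the answer at 1/2, while in the database model the choice of filter moves it between 1/2 and 1/3.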

          Okay: what’s the point? My point is when Todd says “If specifying a logically irrelevant detail changes the probability calculation…” he’s actually missing the point! In the first scenario the age really is logically irrelevant to the probability because it doesn’t affect the data generation process in any way whatsoever. In the second it’s not “logically irrelevant”: it is directly affecting the data you have selected; you’re selecting different populations, so it’s not surprising it changes something! It doesn’t matter that so-and-so is the older child if it does not affect the data generation process, but if it does, well then, clearly … it does! But the problem does not actually specify the data generation process, so it leaves each person trying to answer it to assume, either implicitly or explicitly, whether it does or not. Often people probably hold both ideas in their head at once without explicitly realizing it.

          I think the strange feeling of “what is going on here” comes from the underspecification of the problem, which we allow our intuition to fill in the rest of, and we simply provide an answer (like 1/3, 1/2) as if it were “right” without realizing actually it is underspecified, and both answers can be correct for certain data generation processes (both of which may be very reasonable interpretations of the original problem). This is why I liken it to the chord on a circle problem.

          I think this point really deserves more attention and so this is my “criticism”: what I think is actually the most interesting and critical part of these discussions isn’t being put into clear focus! But okay, maybe that’s just because others disagree that it’s the most interesting and critical part.

          (Sorry for the “rant”)

  4. Although I suppose your blogging software probably doesn’t support it, here’s [mermaid](https://mermaid.js.org/) code for this diagram:

    ```{mermaid}
    flowchart LR
    A:::hidden -->|⅓| B[g]
    A -->|⅓| C[c]
    A -->|⅓| D[G]
    B -->|0| E[c] ~~~ K[0]
    B -->|1| F[G] ~~~ L[⅓]
    C -->|½| G[g] ~~~ M[⅙] ~~~ Q[✓]
    C -->|½| H[G] ~~~ N[⅙]
    D -->|1| I[g] ~~~ O[⅓] ~~~ R[✓]
    D -->|0| J[c] ~~~ P[0]
    ```

  5. I put this one in the family group chat and it was interesting to work it through with them. No ruining Thanksgiving! The least intuitive part of this whole process is seeing how understanding sibling order changes your probabilities. So you have the set: {BB, BG, GB, GG}. Knowing you have at least one boy reduces the sample space to: {BG, GB, BB}, each of which must be equiprobable. The hard part there is seeing the subtle yet profound implications of knowing whether the first or second child is a boy as reducing the sample space further. This clashes with the intuition that we know they HAVE to be either first or second, so why does knowing which change anything?

    I think it helps to solve for the cases where N>2. Even N=3 makes it clearer where the various assumptions are coming in. So you could ask, conditional on one child being a girl, what’s the probability of two brothers versus two sisters (again, assuming sex is independent and 0.5)?
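Here is one way the N=3 case could be enumerated. The wording leaves room for more than one reading, so this sketch shows two and labels both as assumptions: "at least one of the three is a girl" (a filter on the family) versus "one particular child, here the eldest, is a girl". In each case "two brothers" means the other two children are boys and "two sisters" means they are both girls.

```python
from fractions import Fraction
from itertools import product

KIDS = list(product("BG", repeat=3))  # 8 equally likely birth orders

def filter_model():
    """Condition on the family having at least one girl."""
    kept = [k for k in KIDS if "G" in k]
    two_brothers = sum(1 for k in kept if k.count("G") == 1)
    two_sisters = sum(1 for k in kept if k.count("G") == 3)
    return Fraction(two_brothers, len(kept)), Fraction(two_sisters, len(kept))

def named_child_model(position=0):
    """Condition on the child at `position` (0 = eldest) being a girl."""
    kept = [k for k in KIDS if k[position] == "G"]
    two_brothers = sum(1 for k in kept if k.count("B") == 2)
    two_sisters = sum(1 for k in kept if k.count("G") == 3)
    return Fraction(two_brothers, len(kept)), Fraction(two_sisters, len(kept))

print(filter_model())       # (Fraction(3, 7), Fraction(1, 7))
print(named_child_model())  # (Fraction(1, 4), Fraction(1, 4))
```

Under the filter reading, two brothers is three times as likely as two sisters; once a particular child is identified, the two are equally likely.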

    As a side-note: I cued the N=3 case up to ChatGPT and it is absolutely hopeless at solving it. Produces a ton of what Nassim Taleb would call “verbalistic BS”, but kept getting it wrong, even as I fed it more and more critical pieces of the reasoning.

  6. > I think that the best way to solve these problems is to draw the tree, and the most important reason for drawing the tree is that it forces you to make the assumptions explicit.

    A tree-based solution can also provide an unwarranted appearance of thoroughness – the assumptions may not be laid out as clearly as they would in a symbolic approach. Many people would take the problem

    “Mr. Smith says that he has two children and at least one of them is a boy. What is the probability that the other child is a boy?”

    and draw the same tree that you did. They will get to the 1/3 solution and find it completely unobjectionable. They can’t see the assumptions through the tree.

    Writing down a detailed solution one has to confront that P(two boys | at least one is a boy) is quite different from P(two boys | he says at least one is a boy) and to get the 1/3 solution at some point one has to assign identical values to P(he says at least one is a boy | there are two boys) and P(he says at least one is a boy | there are one boy and one girl).

    If we want – as we should – the symmetrical problem “Mr. Smith says that he has two children and at least one of them is a girl. What is the probability that the other child is a girl?” to have the same solution 1/3 we will need

    P(says at least one boy | two boys) = P(says at least one boy | one boy and one girl) = P(says at least one girl | one boy and one girl) = P(says at least one girl | two girls)

    That’s not impossible (we can make all those probabilities 1/2) but it seems a quite unnatural assumption. Dropping the symmetry requirement is not satisfactory either.
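The dependence on those two conditional probabilities can be written as a one-line Bayes calculation. In this sketch the argument names are hypothetical stand-ins for P(he says at least one is a boy | two boys) and P(he says at least one is a boy | one boy and one girl), and it is assumed that he never makes the statement when he has two girls.

```python
from fractions import Fraction

def p_two_boys_given_statement(says_if_two_boys, says_if_mixed):
    """P(two boys | Mr. Smith says "at least one is a boy").

    Priors: P(two boys) = 1/4, P(one of each) = 1/2, P(two girls) = 1/4.
    """
    num = Fraction(1, 4) * says_if_two_boys
    den = num + Fraction(1, 2) * says_if_mixed
    return num / den

half = Fraction(1, 2)
print(p_two_boys_given_statement(1, 1))        # 1/3: equal values
print(p_two_boys_given_statement(half, half))  # 1/3: equal values again
print(p_two_boys_given_statement(1, half))     # 1/2: mixed families say "boy" half the time
```

The result is 1/3 whenever the two inputs are equal, and it moves to 1/2 if a father of two boys always makes the statement while a father of one of each makes it half the time.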

  7. Although I think Carlos has the best insight (the problem doesn’t change, only our preference to avoid the car among the unrevealed doors), I found it easiest to think about it in terms of 6 hypotheses, representing the locations of the prize (p) and the Good Goat (gg) and their locations behind doors Di like this:

    D1=p, D2=gg
    D1=p, D3=gg
    D2=p, D1=gg
    D2=p, D3=gg
    D3=p, D1=gg
    D3=p, D2=gg

    Each has a prior probability of 1/6.
    There are two updates, one based on Monty’s choice of door, and one based on our recognition of the bad goat. We’ll label the door we chose as D1 and say that Monty opened D2 to show the bad goat. The first likelihood, L1, is the same as in the original problem; any hypothesis with the prize behind D2 is 0, any hypothesis with the prize behind D1 is 1/2, and any hypothesis with the prize behind D3 is 1. The second likelihood, L2, is 1 if it implies the bad goat is in D2, and 0 otherwise. UP is the unnormalized posterior and Post is the posterior:

    D1=p, D2=gg: P=1/6, L1=1/2, L2=0, UP=0, Post=0
    D1=p, D3=gg: P=1/6, L1=1/2, L2=1, UP=1/12, Post=1/3
    D2=p, D1=gg: P=1/6, L1=0, L2=0, UP=0, Post=0
    D2=p, D3=gg: P=1/6, L1=0, L2=0, UP=0, Post=0
    D3=p, D1=gg: P=1/6, L1=1, L2=1, UP=1/6, Post=2/3
    D3=p, D2=gg: P=1/6, L1=1, L2=0, UP=0, Post=0

    So we can see that the probability of the good goat behind D1 (our pick) is 2/3 and the probability of it being behind D3 is 1/3.
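The six-hypothesis table can be reproduced mechanically. This is a sketch of the same two-step update, assuming we picked D1 and Monty opened D2; `posterior_over_layouts` is an illustrative name, and each hypothesis is a pair (prize door, good goat door).

```python
from fractions import Fraction
from itertools import permutations

def posterior_over_layouts(picked="D1", opened="D2"):
    doors = ["D1", "D2", "D3"]
    unnormalized = {}
    for prize_door, good_goat_door in permutations(doors, 2):
        prior = Fraction(1, 6)
        # L1: probability that Monty opens `opened`, given where the prize is.
        if prize_door == opened:
            l1 = Fraction(0)       # he never reveals the prize
        elif prize_door == picked:
            l1 = Fraction(1, 2)    # he picks between the two goat doors
        else:
            l1 = Fraction(1)       # only one door is available to him
        # L2: 1 if this hypothesis puts the bad goat behind `opened`, else 0.
        bad_goat_door = next(d for d in doors
                             if d not in (prize_door, good_goat_door))
        l2 = Fraction(1) if bad_goat_door == opened else Fraction(0)
        unnormalized[(prize_door, good_goat_door)] = prior * l1 * l2
    total = sum(unnormalized.values())
    return {h: up / total for h, up in unnormalized.items()}

post = posterior_over_layouts()
for hypothesis, p in post.items():
    print(hypothesis, p)
```

Only two hypotheses survive: prize behind D1 with the good goat behind D3 (posterior 1/3), and prize behind D3 with the good goat behind D1 (posterior 2/3).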

  8. e 1 2 3 r
    1 g g c c
    2 g c g c
    3 c g g g
    4 c g g g

    e is event/game number, r is remaining closed door
    When car is behind door and that is the player’s 1st guess, the host can open door 2 or door 3, but not both.
    That would end the game. Thus a 4th game is required to open door 3
    A detailed analysis is here:
    https://drive.google.com/file/d/1BceEuNO0LIz6QjgGZeUG76hM-98mDDiR/view?usp=sharing
    That means all games have the same format and same frequency.
