## Lottery probability update

It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that “the incident of six numbers repeating themselves within a month is an event of once in 10,000 years.”

I shouldn’t mock when it comes to mathematics–after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim which we thought we’d proved, thus we thought was a theorem.)

So let me retract the mockery and move, first to the mathematics and then to the statistics.

First, how many possibilities are there in pick 6 out of 45? It’s (45*44*43*42*41*40)/6! = 8,145,060. Let’s call this number N.
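As a quick check on that count, here is a minimal Python sketch. The variable names are mine, not part of the original calculation; it computes N both with the standard-library binomial coefficient and with the product formula written above.

```python
import math

# Number of ways to choose 6 numbers out of 45, ignoring order.
N = math.comb(45, 6)

# The same count via the product formula used in the text.
N_formula = (45 * 44 * 43 * 42 * 41 * 40) // math.factorial(6)

print(N, N_formula)
```

Both expressions should agree, since the product formula is just the binomial coefficient written out.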

Second, what’s the probability that the same numbers repeat in a single calendar month? I’ve been told that the Israeli lottery has 2 draws per week. That’s 104/12=8.67 draws per month. Or maybe they skip some holidays, so let’s say 100/12=8.33 draws per month. In either case, it’s 8 or 9 per month. Or maybe they’re using the Jewish calendar, in which a month has approximately 28 days. Let’s just assume that for simplicity, so there are 8 draws per month.

If the probability of winning is 1/N, and there are 8 draws in a month, the probability of no repeats is ((N-1)/N)*((N-2)/N)*…*((N-7)/N). So the probability of at least one repeat is 1 minus this. Plugging in the numbers gives 3.44*10^(-6), or 1 in 291,000.

Another way to do this “birthday problem” computation is to realize that, with 8 draws, there are 8*7/2=28 possible pairs, thus the probability of a repeat is approximately 28/N, which again comes to 3.44*10^(-6).
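The two versions of the birthday-problem calculation can be compared directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes 8 independent draws per month, each uniform over the N possible combinations.

```python
import math

def prob_repeat_exact(n_outcomes, n_draws):
    """1 - P(all draws distinct), using the product ((N-1)/N)*...*((N-(n-1))/N)."""
    # Sum logs and use log1p/expm1 so the tiny probability is not lost to rounding.
    log_no_repeat = sum(math.log1p(-k / n_outcomes) for k in range(1, n_draws))
    return -math.expm1(log_no_repeat)

def prob_repeat_approx(n_outcomes, n_draws):
    """Pair-counting approximation: (number of pairs of draws) / N."""
    return math.comb(n_draws, 2) / n_outcomes

N = math.comb(45, 6)
p_exact = prob_repeat_exact(N, 8)
p_approx = prob_repeat_approx(N, 8)

print(f"exact:  {p_exact:.3e}  (1 in {1 / p_exact:,.0f})")
print(f"approx: {p_approx:.3e}  (1 in {1 / p_approx:,.0f})")
```

With N this large the two answers agree to several significant digits, which is why the pair-counting shortcut is good enough here.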

What’s the probability of this happening at least once in a year? A year has 12 or 13 Jewish-calendar months (I vaguely recall they have the extra month 7 years out of every 19?), so that would come to (12 + 7/19)*3.44*10^(-6) = 4.25*10^(-5), or 1 in 23,500 years.

1 in 23,500 years: that’s close to the 1 in 10,000 that we’re looking for. But I forgot a factor of 2. We could have a match (two identical sets of lottery numbers) less than a month apart, but not in the same calendar month. (For example, once at the end of January and once at the beginning of February, or the equivalent months in the Jewish calendar.) That would count as twice within a month, too. Counting these events gives you something close to another factor of 2. So that would get us down to 1 in 12,000 years, or maybe something like 1 in 15,000 if the factor of 2 is an overcorrection. One could do the calculation more precisely. But I’ll go with the 1 in 10,000 since that was what was reported by the Israeli mathematician.
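Here is a rough sketch of the yearly arithmetic, plus one way the "more precise" calculation might go. The second half rests on assumptions of mine, not on anything reported about the lottery: exactly 104 evenly spaced draws per year (3.5 days apart), and "within a month" meaning strictly less than 28 days apart, so each draw can match any of the next 7 draws.

```python
import math

N = math.comb(45, 6)

# Same-calendar-month version, as in the text: 8 draws per month,
# 12 + 7/19 months per year on average.
draws_per_month = 8
months_per_year = 12 + 7 / 19
p_month = math.comb(draws_per_month, 2) / N
p_year_same_month = months_per_year * p_month
years_same_month = 1 / p_year_same_month

# Crude correction for matches that straddle a month boundary.
years_with_factor_2 = years_same_month / 2

# Sliding-window version (assumed setup, see lead-in): count every pair
# of draws less than 28 days apart, wherever it falls in the calendar.
draws_per_year = 104
window_draws = 7  # later draws within 28 days of a given draw
pairs_per_year = draws_per_year * window_draws
years_sliding = N / pairs_per_year

print(f"same calendar month:  1 in {years_same_month:,.0f} years")
print(f"with factor of 2:     1 in {years_with_factor_2:,.0f} years")
print(f"sliding 28-day window: 1 in {years_sliding:,.0f} years")
```

Under these assumptions the sliding-window count lands close to the factor-of-2 figure, which suggests the factor of 2 is not much of an overcorrection. A different draw schedule would shift the numbers somewhat.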

But wait a second . . . How many lotteries are there out there? A quick Wikipedia search yields the following:
– 62 international lotteries. I think I’m undercounting here because it looks like several countries have multiple “Pick m out of n” lotteries but I’m counting each country only once.
– 46 states or jurisdictions of the United States have lotteries. Some of these appear to be joint between states, however.
I think a safe approximate guess is 100 major lotteries worldwide.

These lotteries have different rules: some are more frequent than twice a week, some less frequent, some are easier to win than “pick 6 out of 45,” some are harder to win. But a quick calculation is that if the Israeli lottery will have a repeat within a single month once in 10,000 years, then with 100 lotteries out there, you’ll see “the incident of six numbers repeating themselves within a month” roughly once in 100 years. That’s the number I’d give if I were asked. To me, the 1 in 10,000 makes the event seem more rare than it is, given that there are so many lotteries out there. Maybe it makes sense to report 1 in 10,000 in an Israeli newspaper, but in a U.S. paper, I think the 1 in 100 number makes more sense, and also fits better with our intuition that rare things happen but that extremely rare things are extremely rare, unless there are a lot of chances for them to occur.
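The pooling step is simple enough to sketch. This assumes, purely for illustration, that all 100 lotteries are independent and share the Israeli lottery's 1-in-10,000-years rate, which the paragraph above already flags as a simplification.

```python
# Assumed inputs: one lottery sees a within-month repeat once per 10,000 years,
# and there are 100 independent lotteries with the same rate.
base_return_years = 10_000
n_lotteries = 100

# Expected repeats per year across all lotteries, and the implied waiting time.
rate_per_year = n_lotteries / base_return_years
pooled_return_years = 1 / rate_per_year

# Probability that at least one lottery has such a repeat in a given year.
p_one = 1 / base_return_years
p_any_in_year = 1 - (1 - p_one) ** n_lotteries

print(f"pooled: once in about {pooled_return_years:.0f} years")
print(f"P(at least one repeat somewhere this year) = {p_any_in_year:.4f}")
```

The yearly probability comes out just under 1%, consistent with the "once in 100 years" shorthand.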

I quarrel not with the mathematics behind the “1 in 10,000 years” claim but with the implicit choice of reference set that includes only the Israel lottery and nothing else. (Hence my reference to the Bible Code, which had a similar problem with the reference set.)

P.S. There’s more here from Christian Robert, who reports that the 6 repeated numbers were actually drawn from 1 to 37, so that N is only 2,324,784. That gives us another factor of 3.5, which gets us down to approximately 1 in 30 years. Christian also points out that there’s no particular reason why we should care about repeats within a month. Consecutive repeats or repeats within a year or repeats ever, maybe, but there’s no particular reason to care about the month except that this is what happened to occur. One might as well have said something like “The probability of a repeat within the same week is only 1 in a zillion, and . . . once again, it didn’t happen!”
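The factor of 3.5 can be checked the same way. In this sketch the variable names are mine, and the starting point of 100 years is the rough pooled figure from the main post rather than a precisely derived number.

```python
import math

N_45 = math.comb(45, 6)  # pick 6 out of 45
N_37 = math.comb(37, 6)  # pick 6 out of 37

# A smaller N makes a repeat more likely by this factor,
# since the pair-counting probability scales as 1/N.
factor = N_45 / N_37

# Rescale the rough "once in 100 years" pooled figure.
years_pooled_45 = 100
years_pooled_37 = years_pooled_45 / factor

print(f"N for 6/37: {N_37:,}")
print(f"factor: {factor:.2f}")
print(f"pooled: once in about {years_pooled_37:.0f} years")
```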

These problems are interesting to think about:

1. There is often confusion on the basic facts (were the repeats 6 numbers out of 45 or 6 out of 37?);

2. The definition of a reference set is central to statistical hypothesis testing;

3. News media often report these “one in a zillion” quotes uncritically, which I fear can degrade public understanding of probability. (See Kaiser Fung’s recent post on drug testing for more on the general topic of confusion about probabilities.)

1. Rahul says:

I think you are misreading Meilijson's statement: "the incident of six numbers repeating themselves within a month is an event of once in 10,000 years."

From the context it seems that he was clearly talking about the specific Israeli lottery which had the incident occur.

Or at the very least he gets the benefit of doubt.

Let's say, one day I drop my cellphone and it accidentally falls in my coffee cup. I say "Darn! What's the chance that this happens, rotten luck!"

Do I mean the chance all over the world or for the set of events where I drop my cellphone?

2. Andrew Gelman says:

Rahul:

I agree; Meilijson's statement is fine (assuming it's pick 6 out of 45 rather than 6 out of 37, and I have no idea on that one). On the other hand, why is the Israeli-specific number relevant to the New York Times? If a newspaper asked me for the probability, I would report the less dramatic number because I think that is more relevant to the general question of unlikeliness. (See Christian's post for more on this.) But I agree that this is a question of statistics, not probability, and I was too hasty to mock in my original blog on the topic rather than to more carefully explore the issues.

3. Rahul says:

Andrew:

Yes, I can see your POV too.

Incidentally this reminds me of the Indian village that made the headlines for 250 sets of twins among only 2000 families.

Genetic explanation or a fluke of probability? After all, there must be millions of villages in the world! Would love a post on this!

4. Vincent says:

I understand some basic probability computations and concepts, but I often get hung up on what exactly "probability" is. That is, what it is exactly that a given real-world probability claim claims:

So for example, the probability of rolling "2" on a fair die is 1/6, likewise with 1/2 and "heads" on a fair coin. Got it.

But what exactly does it mean to say that the probability of two lotteries in the same month having the same six winning numbers is 1/10,000 years? (or, for that matter, 1/30?) Does that mean that if you repeat all the lotteries infinitely, only once every x years will you get this particular set of outcomes? (Plus, there are all the nuances mentioned in your post and in the comment above.) What should one infer from this (i.e., how does one use this information)? Anyway, these are just recurring questions for me that this post brought up.

Are there any textbooks for probability/mathematical stats you'd recommend for an undergrad with a little bit of advanced math, introductory stats, and a (probably counterproductive) philosophical bent?

Many thanks!

5. Bill Jefferys says:

Andrew, you are correct that there are 7 extra intercalary months in a 19 year period on the Jewish calendar. This is a consequence of the Metonic Cycle.

6. Sanjay says:

Twenty years ago you could have made a career in psychology out of studying this and labeling it a cognitive error. Call it the proximity bias: Given a somewhat unusual event in a large reference set, there will always be somebody close to the event who, after the fact, will view it as a very rare event in a smaller reference set.

7. K? O'Rourke says:

Andrew "definition of a reference set is central" also to Draper et al's argument that exchangeability is primary to probability?

K?

8. anon says:

What's this fabled theorem you keep referencing? Is it available somewhere? It'd be instructive to see :)

9. Andrew Gelman says:

Anon:

I should hardly have to say this, but . . . google andrew gelman false theorem, and you'll find it easily enough.

10. anonimka says:

"In a surprise worthy of Derren Brown, the same six winning numbers have been drawn twice in a row in Bulgaria's national lottery.

The country's government ordered an investigation after the numbers 4, 15, 23, 24, 35 and 42 were selected, in a different order, live on television on September 6 and 10.

However police found no evidence of wrongdoing.

A Bulgarian lottery spokesman said: 'This has happened for the first time in the 52-year history of the lottery. We are absolutely stunned to see such a freak coincidence.'

Bulgaria's Sports Minister Svilen Neykov has now launched a probe into the draws, which were done by machine, on September 6 and 10.

Investigators are due to report back by the end of the week.

Mathematician Mihail Konstantinov said that the probability of this happening is about 4.2 million to one."— etc.

The back story is more interesting.
–6/49 in 3 weekly draws. The 1st and 2nd draws pay out prizes for 3, 4, 5 and 6 correct. The 3rd draw pays out only for 6/49 (the so-called "jackpot draw").
–The numbers came up in the 3rd "jackpot draw" BOTH times. I think this is relevant.
–The incident happened during the week a newly elected government took over and replaced a number of appointed administrators, including the Director of the lottery.
–The investigation found no wrongdoing (of course).
–While no one won the 1st time, about 20 people won the 2nd time, underlining that a lottery player should avoid "hot" numbers (as well as low digits).

What is the approach statistically here?

Probability of picking 6/49 is about 1/13 million. Let's call it "set X". I think the correct question here is: what is the probability of having the sequence "Set X", not Set X, not Set X, "Set X"? That would be about 1/(13 million)^2.

What do you guys think?