Suppose I want to estimate my chances of winning the lottery by buying a ticket every day. That is, I want to do a pure Monte Carlo estimate of my probability of winning. How long will it take before I have an estimate that’s within 10% of the true value?

**It’ll take…**

There’s a big NY state lottery for which there is a 1 in 300M chance of winning the jackpot. Back of the envelope, to get an estimate within 10% of the true value of 1/300M will take many millions of years.

**Didn’t you say Monte Carlo only took a hundred draws?**

What’s going on? The draws are independent with finite mean and variance, so we have a central limit theorem. The advice around these parts has been that we can get by with tens or hundreds of Monte Carlo draws. With a hundred draws, the standard error on our estimate is one tenth of standard deviation of the variable whose expectation is being estimated.

With the lottery, if you run a few hundred draws, your estimate is almost certainly going to be exactly zero. Did we break the CLT? Nope. Zero has the right absolute error properties. It’s within 1/300M of the true answer after all! But it has terrible relative error probabilities; it’s relative error after a lifetime of playing the lottery is basically infinity.

The moral of the story is that error bounds on Monte Carlo estimates of expectations are absolute, not relative.

**The math**

The draws are Bernoulli with a p chance of success, so the standard error of the Monte Carlo estimator

is going to be its variance

for draws and a

probability of winning.

**Extra credit: Sequential decision theory**

How long would it take to convince yourself that playing the lottery has an expected negative return if tickets cost $1, there’s a 1/300M chance of winning, and the payout is $100M?

Although no slot machines are involved, this is the simplest framing of a so-called “bandit” problem. More sophisticated problems would involve several lotteries with generalized payout schedules that might be stateful or non-stationary.

Bob:

I know this is not the point of your post, but . . . if you want to estimate the probability of rare events, it makes sense to use a mixture of analytical and simulation approaches, as for example discussed in this article from 1998. A little bit of analytical work can make a huge difference in getting fast and stable estimates.

The point of the post, in case it was lost in the simple thought experiment, is that error guarantees in Monte Carlo methods are absolute, not relative.

For discrete outcomes, I talked about the effect on accuracy of marginalizing them out—this is huge for tail probabilities like in the change-point example in the Stan user’s guide chapter on discrete parameters.

And of course, you want to use real knowledge in real situations and not just rely on weak binary outcomes. Like our knowledge of astrophysics in esitmating whehter the sun will rise tomorrow. Or previous launch angle and speed statistics for evaluating whether a batter will get a hit in baseball.

Bob, I think there are some errors here…

first off, this equation is wrong:

var(x) = sqrt(p*(1-p)/M)

it’s sd(x) = sqrt(p*(1-p)/M) not var(x)

second of all, the sd(x) is actually related to sqrt(p*(1-p)) and the estimate is of p, so there is a *relative* error (it’s nonlinear)… the problem is that as x goes to 0, d/dx sqrt(x) -> infinity so even small relative errors get amplified by a big amount.

I think CLT is not broken here, but it is not helpful here because the Barry–Esseen bound basically give us no good guarantee. How would a Poisson approximation fare instead?

The moral of the story is that you should use importance sampling.

After doing some back-of-the-envelope calculations myself, I gotta respectfully disagree: in most situations, some scratching on an envelope suggests no more than nine ticket purchases are enough, provided we’re talking means. If you want an entire confidence interval under that line, some trial-and-error with a calculator suggests no more than 28 tickets. Doing it the proper way via integrals bumps up that number to 36.

I *can* think of a way to wind up in the millions-of-years zone, though it involves ditching Bayesian statistics for frequentism. Even then, it looks like you accidentally stumbled on a contradiction in frequentist statistics that allows for an answer much smaller than millions of years.

That was the TL;DR. The long, math-y version is here: https://freethoughtblogs.com/reprobate/2020/02/19/dear-bob-carpenter/

I think you misunderstood his goal. the goal wasn’t to find that the probability of winning is less than 0.1 the goal was to find the probability of winning p to within an error of less than 0.1*p

so if p = 1.0/300e6 ~ 3.3e-9 and you want the answer to within 3.3e-10 you will need to take many many samples.

Hmm. Just to make sure I’m understanding you correctly, are you suggesting that Carpenter wanted a confidence or credence interval such that, to within two sigma, the interval is [p – 0.1p, p + 0.1p], where p is the mean of the interval? I ask because neither Bayesian nor frequentist statistics promises the confidence/credible interval contains the true value at all, so a plain reading of your interpretation makes no sense.

Yes more or less, whether you consider 2 sigma or 1 sigma or whatever isn’t important, and neither is whether the interval is centered on the actual true value. the important thing is that he’s looking for an interval whose scale is about 10% *of the true value.* when the true value is small then the interval is also very small.

this is what he means by “relative” as compared with your calculation which is “absolute” size less than 0.1 which can be achieved with the couple dozen or hundred samples he mentions in the third paragraph

my reading is definitely correct in so far as the relative vs absolute error is involved. see the second paragraph where he says “Back of the envelope, to get an estimate within 10% of the true value of 1/300M will take many millions of years. “

10% of the true value of 3.3e-9 means an error width of 3.3e-10

I still think you’re somewhat confused. How would we know when to stop taking samples, in your scenario? If we knew our target error width was 3.3e-10, then we knew the odds of winning are 3.3e-9, and if we knew the odds of winning we wouldn’t need any samples at all. We’d already know the answer!

How about this interpretation instead: we stop taking samples when the confidence interval is of the form [0.9*x, 1.1*x], for some real value of x. Now we don’t need to know the true value at all.

Sure, or if the sd is 0.1 times the mean, whatever you use as your criterion, if it results in sampling until you are sure of the value to within 10% OF THE VALUE and the value is a tiny number, then you will take a huge number of samples. This is because for small p, the stddev is like sqrt(p*(1-p)/N) =0.1p, we assume p is small, we can neglect p^2 and get sqrt(p/N)~0.1p, square both sides, p/N~.001p^2, N ~ 100/p, if p on the order of 1/300M then N on order of 30 Billion

whoops, p/N~.001p^2 should read p/N ~0.01p^2

Cool, then we’re in agreement. I’ve worked out the Bayesian side of this, and the key inequality is

(sigma)/(A + b + n)*sqrt( A*(b + n)/(A + b + n + 2) ) > A and n >> b, and tidy up the math under those assumptions. Now you get

(sigma/T)^2 <= A = a + w,

where "w" is the number of lottery wins. This agrees with your frequentist version: if we take the (1,1) Bayes/Laplace prior, we need to win the lottery 99 times if sigma=1, or 399 if sigma=2. We don't care about the value of n, provided it swamps the prior. We can also derive this result from the first approach if we assume s is very large.

Unfortunately, none of it invalidates my blog post. There's a third way to clean up the inequality, by asserting A = 0. This converts it to 0 <= 0, which is true, and thus my subjective (0,1) prior doesn't need a single sample. Frequentist statistics demands a "reasonable" n, but for the typical sample sizes we'd use that still leads to a maximal likelihood of 0 and a confidence interval of 0 the vast majority of the time. That fits your parameters, as well as Neyman's demands for a frequentist confidence interval, so we can reach a conclusion in well under "millions of years."

Frequentism still winds up with two contradictory answers. Unlike Bayesian statistics, it can't pin the blame on the choice of priors, so this is a legit paradox.