
My math is rusty

When I’m giving talks explaining how multilevel modeling can resolve some aspects of the replication crisis, I mention this well-known saying in mathematics: “When a problem is hard, solve it by embedding it in a harder problem.” As applied to statistics, the idea is that it could be hard to analyze a single small study, as inferences can be sensitive to the prior, but if you consider this as one of a large population or long time series of studies, you can model the whole process, partially pool, etc.

In math, examples of embedding into a harder problem include using the theory of ideals to solve problems about prime numbers (ideals are a general class that includes primes as a special case, so any result about ideals automatically holds for primes but is more general), using complex numbers to solve problems about real numbers, and using generating functions to sum infinite series.

That last example goes like this. You want to compute
S = sum_{n=1}^{infinity} a_n, but you can’t figure out how to do it. So you write the generating function,
G(x) = sum_{n=1}^{infinity} a_n x^n,
you then do some analysis to figure out G(x) as a function of x, then your series is just S = G(1). And it really works. Cool.
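A toy case (my example, not one from the post) where every step of that recipe goes through in closed form:

```latex
% Goal: S = \sum_{n=1}^{\infty} n/2^n.
% Step 1: embed in the generating function G(x) = \sum_{n=1}^{\infty} n x^n.
% Step 2: find G(x) in closed form, by differentiating the geometric series:
\frac{d}{dx}\sum_{n=0}^{\infty} x^n = \sum_{n=1}^{\infty} n x^{n-1}
  = \frac{1}{(1-x)^2}, \qquad\text{so}\qquad G(x) = \frac{x}{(1-x)^2}.
% Step 3: evaluate at the point of interest, here x = 1/2:
S = G(1/2) = \frac{1/2}{(1/2)^2} = 2.
```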

Anyway, I thought that next time I mention this general idea, it would be fun to demonstrate with an example, so one day when I was sitting in a seminar with my notebook, I decided to try to work one out.

I thought I’d start with something simple, like this:
S = 1/1^2 + 1/2^2 + 1/3^2 + 1/4^2 + . . .
That is, S = sum_{n=1}^{infinity} n^{-2}
Then the generating function is,
G(x) = sum_{n=1}^{infinity} n^{-2} x^n.
To solve for G(x), we take some derivatives until we can get to something we can sum directly.
First one derivative:
dG/dx = sum_{n=1}^{infinity} n^{-1} x^{n-1}.
OK, taking the derivative again will be a mess, but we can do this:
x dG/dx = sum_{n=1}^{infinity} n^{-1} x^n.
And now we can differentiate again!
d/dx (x dG/dx) = sum_{n=1}^{infinity} x^{n-1}.
Hey, that one we know! It’s 1 + x + x^2 + . . . = 1/(1-x).

So now we have a differential equation:
x G''(x) + G'(x) = 1/(1-x).
Or maybe better to write as,
x(1-x) G''(x) + (1-x) G'(x) - 1 = 0.
Either way, it looks like we’re close to done. Just solve this second-order differential equation. Actually, even easier than that. Let h(x) = G'(x), then we just need to solve,
x(1-x) h'(x) + (1-x) h(x) - 1 = 0.
Hey, that’s just h(x) = -log(1-x) / x. I can’t remember how I figured that one out—it’s just there in my notes—but there must be some easy derivation. In any case, it works:
h'(x) = log(1-x)/x^2 + 1/(x(1-x)), so
x(1-x) h'(x) = log(1-x)*(1-x)/x + 1
(1-x) h(x) = -log(1-x)*(1-x)/x
So, yeah, x(1-x) h'(x) + (1-x) h(x) - 1 = 0. We’ve solved the differential equation!
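The easy derivation that the notes skipped is that the left-hand side collapses to an exact derivative. One way to see it (my reconstruction):

```latex
% Divide x(1-x) h'(x) + (1-x) h(x) - 1 = 0 through by (1-x):
x h'(x) + h(x) = \frac{1}{1-x}
% The left-hand side is exactly (x h(x))', so integrate both sides:
x\,h(x) = -\log(1-x) + C
% Since h(x) = G'(x) = \sum_{n=1}^{\infty} n^{-1} x^{n-1} stays finite
% as x -> 0, the left-hand side tends to 0 there, so C = 0 and
h(x) = -\frac{\log(1-x)}{x}.
```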

And now we have the solution:
G(x) = integral dx (-log(1-x) / x).
This is an indefinite integral but that’s not a problem: we can see that, trivially, G(0) = 0, so we just have to do the integral starting from 0.
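As a sanity check of the derivation so far (my code, not the post’s), the integral form can be evaluated numerically in R at x = 1 and compared against a brute-force partial sum of the original series:

```r
# Numerical check: G(1) = integral from 0 to 1 of -log(1-t)/t dt,
# which should match the series sum of 1/n^2.
G1 <- integrate(function(t) -log(1 - t) / t, lower = 0, upper = 1)$value
S_partial <- sum((1:1e6)^(-2))
G1        # roughly 1.6449
S_partial # roughly 1.6449
```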

At this point, I was feeling pretty good about myself, like I’m some kind of baby Euler, racking up these sums using generating functions.

All I need to do is this little integral . . .

OK, I don’t remember integrals so well. It must be easy to do it using integration by parts . . . oh well, I’ll look it up when I come into the office, it’ll probably be an arcsecant or something like that. But then . . . it turns out there’s no closed-form solution in terms of elementary functions!

Here it is in Wolfram alpha (OK, I take back all the things I said about them): the integral comes out as the dilogarithm, Li_2(x).

OK, what’s Li_2(x)? Here it is: Li_2(x) = sum_{n=1}^{infinity} x^n / n^2.

Hey—that’s no help at all, it’s just the infinite series again.

So my generating-function trick didn’t work. Next step is to sum the infinite series by integrating it in the complex plane and counting the poles. But I really don’t remember that! It’s something I learned . . . ummm, 35 years ago. And probably forgot about 34 years ago.

So, yeah, my math is rusty.

But I still like the general principle: When a problem is hard, solve it by embedding it in a harder problem.

P.S. We can use this example to teach a different principle of statistics: the combination of numerical and analytic methods.

How do you compute S = sum_{n=1}^{infinity} n^{-2}?

Simplest approach is to add a bunch of terms; for example, in R:
S_approx_1 <- sum((1:1000000)^(-2)). This brute-force method works fine in this example but it would have trouble if the function to evaluate is expensive.

Another approach is to approximate the sum by an integral; thus:
S_approx_2 <- integral_{from x=0.5 to infinity} dx x^{-2} = 2. (The indefinite integral is just -1/x, so the definite integral is 0 - (-1/0.5) = 2.) You have to start the integral at 0.5 because the sum starts at 1, so the little bars to sum are [0.5,1.5], [1.5,2.5], etc. That second approximation isn't so great at the low end of x, though, where the curve 1/x^2 is far from locally linear. So we can do an intermediate approximation:

S_approx_3 <- sum((1:N)^(-2)) + integral_{from x=(N+0.5) to infinity} dx x^{-2} = sum((1:N)^(-2)) + 1/(N+0.5).

That last approximation is fun because it combines numerical and analytic methods. And it works! Just try N=3:
S_approx = 1 + 1/4 + 1/9 + 1/3.5 = 1.647.
The exact value, to three decimal places, is 1.645. Not bad.
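Putting the three approximations side by side in R (a sketch; the variable names are mine, following the post’s notation):

```r
# Three ways to approximate S = sum of 1/n^2, as described above.
S_approx_1 <- sum((1:1e6)^(-2))                            # brute force
S_approx_2 <- 2                                            # pure integral, from x = 0.5
S_approx_3 <- function(N) sum((1:N)^(-2)) + 1 / (N + 0.5)  # hybrid sum + integral tail

S_approx_3(3)  # roughly 1.647, with only three terms summed exactly
pi^2 / 6       # exact value, roughly 1.645
```

The hybrid version converges fast: the midpoint-style tail correction leaves an error of order 1/N^3, versus 1/N for plain truncation.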

There are better approximation methods out there; the point is that even a simple approach of this sort can do pretty well. And I’ve seen a lot of simulation studies that are done using brute force where the answers just don’t make sense, and where just a bit of analytical work at the end could’ve made everything work out.

P.P.S. Tomorrow’s post: Deterministic thinking (“dichotomania”): a problem in how we think, not just in how we act.

P.P.P.S. [From Bob Carpenter] MathJax is turned on for posts, but not comments, so that $latex e^x$ renders as e^x.


  1. > could be hard to analyze a single small study, as inferences can be sensitive to the prior, but if you consider this as one of a large population or long time series of studies, you can model the whole process, partially pool, etc.

    Sounds like how Fisher solved the hard question of a good summary – good for what? – with sufficiency.


    Fisher, in a 1935 paper read at the Royal Statistical Society, discussed overcoming the preliminary difficulty of multiple criteria for judging estimates (“better for what?”). He argued: “Whatever other purpose our estimate may be wanted for, we may require at least that it shall be fit to use, in conjunction with the results drawn from other samples of a like kind, as a basis for making an improved estimate. On this basis, in fact, our enquiry becomes self contained, and capable of developing its own appropriate criteria, without reference to extraneous or ulterior considerations. … where the real problem of finite samples is considered, the requirement that our estimates from these samples may be wanted as materials for a subsequent process of estimation [combined somehow with results drawn from samples of a like kind?] is found to supply the unequivocal criteria required.”

  2. Dave says:

    The trick with embedding into a harder problem is finding the right embedding.
    In this case, sum_{n=1}^{infinity} 1/n^2 is the value of
    the Fourier series sum_{n=1}^{infinity} cos(nx)/n^2 at x=0.
    Now make an educated (or lucky) guess: this Fourier series converges
    to a locally polynomial function. Then work out which polynomial.

  3. Dave says:

    My previous comment was too hasty, making essentially
    the same mistake as Gelman’s generating function attempt.
    (I had misremembered the solution I read way back when.)
    You have to think of 1/n^2 as the *square* of
    a Fourier series coefficient 1/n.
    Then you figure out that sin(nx)/n
    converges to a piecewise polynomial function.
    Then you integrate the *square* of that function.
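    Carrying out the suggestion above with Parseval’s identity (my sketch of the approach the comment describes):

```latex
% The sawtooth f(x) = (\pi - x)/2 on (0, 2\pi) has Fourier sine series
f(x) = \sum_{n=1}^{\infty} \frac{\sin(nx)}{n}
% Parseval's identity equates the mean square of f to the sum of
% squared coefficients:
\frac{1}{\pi}\int_0^{2\pi} f(x)^2\,dx = \sum_{n=1}^{\infty} \frac{1}{n^2}
% The left side is an elementary integral:
\frac{1}{\pi}\int_0^{2\pi} \frac{(\pi - x)^2}{4}\,dx
  = \frac{1}{4\pi}\cdot\frac{2\pi^3}{3} = \frac{\pi^2}{6}
```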

  4. Zad Chow says:

    Andrew, do you use hosted WordPress or WordPress that’s self-hosted? Asking because you should probably consider implementing some sort of LaTeX processor; KaTeX or MathJax only requires adding one line of code to your head file.

    Might make a difference because it’ll be hard for some blog readers to read what you’re writing. I write and read LaTeX, so it’s not really an issue for me, but I remember being overwhelmed when seeing scripts like that back when I had no idea what LaTeX even was.

  5. Ethan Bolker says:

    For the benefit of your readers: this is the Basel problem. I’m not surprised that Andrew couldn’t quite do it on the back of an envelope: “since the problem had withstood the attacks of the leading mathematicians of the day, Euler’s solution brought him immediate fame when he was twenty-eight.”

    If mathjax works here this is what a properly formatted solution looks like:

    \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}

    (if I made no mistakes in the $\LaTeX$ since one can’t edit posts).

  6. Vince says:

    Interestingly, there’s a manuscript (that was published in the American Math Monthly in 2012) that takes the same generating function approach, reaches (unsurprisingly) the same roadblock, and then shows how to get past the roadblock using a couple of identities due to Euler! It’s online at

  7. Yuling says:

    Solving the problem in a more complicated space, however, might NOT always return the original answer. For example, the analytically continued Riemann zeta function equals -1/12 at -1 in the complex plane — but it does not mean 1+2+3+…=-1/12.

    In terms of statistics, such embedding seems to work fine almost always: redundant parameters in Gibbs sampling, simulated tempering, continuous model expansion, etc. But in principle, I suspect we could have a similar discontinuity paradox too (e.g., what if there is a phase transition at exactly temperature 0 in simulated tempering?).

    • Andrew says:


      That reminds me . . . redundant parameters can work fine in Gibbs and Metropolis (as in my paper with David van Dyk) but fail in HMC. Which is funny, because HMC is typically much faster than Gibbs or Metropolis. It’s just that HMC is a global method and, in some way, takes the joint posterior density “literally,” whereas Gibbs and Metropolis are, in different ways, local methods and don’t care if the joint density is improper. It’s one of those “What you don’t know won’t hurt you” situations.

  8. Bob says:

    Some bloggers are comfortable with 1+2+3+…=-1/12.

    The above reference even states:
    The value −1/12 is really the right one and the rightness may be experimentally verified (using the Casimir effect).

    The question is really “How do we assign meaning to such an expression?” See the quote at the end of

