Using simulation to do statistical theory

We were looking at some correlations (within each state, the correlations between income and different measures of political ideology) and we wanted to get some sense of sampling variability. I vaguely remembered that the sample correlation has a variance of approximately 1/n. Or was that 0.5/n? I couldn’t remember. So I did a quick simulation:

> corrs <- rep (NA, 1000)
> for (i in 1:1000) corrs[i] <- cor (rnorm(100),rnorm(100))
> mean(corrs)
[1] -0.0021
> sd(corrs)
[1] 0.1

Yeah, 1/n, that’s right: with n = 100, a standard deviation of 0.1 means a variance of 0.01. That worked well. It was quicker and more reliable than looking it up in a book.
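
The same check extends to other sample sizes. Here is a minimal sketch, assuming independent standard normals as above; the helper name `sim_sd_cor`, the number of simulations, and the seed are my own choices for illustration:

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# sim_sd_cor is a hypothetical helper: the standard deviation of the sample
# correlation across n_sims pairs of independent normal samples of size n.
sim_sd_cor <- function(n, n_sims = 2000) {
  sd(replicate(n_sims, cor(rnorm(n), rnorm(n))))
}

sizes <- c(25, 100, 400)
simulated <- sapply(sizes, sim_sd_cor)

# Compare with the 1/sqrt(n) rule of thumb (a variance of about 1/n).
print(data.frame(n = sizes, simulated = simulated, rule = 1 / sqrt(sizes)))
```

If the rule holds, the `simulated` and `rule` columns should roughly agree at every n, up to simulation noise.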

4 thoughts on “Using simulation to do statistical theory”

  1. In your example, you produced samples of normal random variables which were uncorrelated. However, if they were normal with correlation rho, then the asymptotic variance would be (1/n)* [1-rho^2]^2, which depends on the true rho, but does in fact equal 1/n if rho=0.

    However, it can be shown that Fisher's z-transformation, (1/2)*ln[(1+r)/(1-r)] has asymptotic variance equal to 1/n regardless of the true correlation.

  2. … or even

    > sd( replicate( 1000, cor(rnorm(100), rnorm(100) ) ) )
    [1] 0.09903454

    nice to hear someone else jogs their memory this way

  3. In a similar vein, I tend to use simulation instead of more formal power tests. This has the benefit that since I am analyzing the simulated data in EXACTLY the same way as I will analyze the final data, I know that I haven't selected the wrong option when using power test software. Not only that, but I can get real probabilities of false-positive and false-negative for various size effects and inference methods.

    For example, what is the probability that the mysterious decision function f will say that two coins are different based on the outcome of n flips if they have probabilities of landing heads of p1 and p2 respectively?

    <pre>
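    n_sims = 10000
    k1 = rbinom(n_sims, n, p1)  # heads in n flips of coin 1, one draw per simulation
    k2 = rbinom(n_sims, n, p2)  # heads in n flips of coin 2
    z = logical(n_sims)         # assumes f returns TRUE when it calls the coins different
    for (i in 1:n_sims) {
      z[i] = f(k1[i], n, k2[i], n)  # apply f to one simulated pair at a time
    }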
    mean(z)
    </pre>

  4. Current,

    Thanks for the correction. I'm off the hook in this particular case: the actual rho in our example was around 0.2, so 1-rho^2 = .96, which is close enough to 1 that I don't mind missing the factor. But in general your point is a good one–and also an illustration of the limitations of simulation!
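
The formula from the first comment, and the rho = 0.2 case, can be checked the same way. This is a rough sketch, assuming bivariate normal data; `sim_cor`, the number of simulations, and the seed are illustrative and not part of the original analysis:

```r
set.seed(2)  # arbitrary seed, for reproducibility only

n <- 100
rho <- 0.2  # roughly the value mentioned in the reply above

# sim_cor is a hypothetical helper: one sample correlation from n draws of a
# bivariate normal with unit variances and correlation rho.
sim_cor <- function(n, rho) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  cor(x, y)
}

r <- replicate(5000, sim_cor(n, rho))

sd_r <- sd(r)                      # simulated sd of the sample correlation
sd_asym <- (1 - rho^2) / sqrt(n)   # square root of (1/n) * (1 - rho^2)^2
sd_z <- sd(atanh(r))               # Fisher's z is atanh(r) = (1/2)*ln[(1+r)/(1-r)]

print(c(simulated = sd_r, asymptotic = sd_asym, fisher_z = sd_z))
```

Rerunning with a larger `rho` should show the simulated sd of r shrinking with (1 - rho^2) while the sd of Fisher's z stays near 1/sqrt(n).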
