Using simulation to do statistical theory

We were looking at some correlations–within each state, the correlations between income and different measures of political ideology–and we wanted to get some sense of sampling variability. I vaguely remembered that the sample correlation has a variance of approximately 1/n, or was that 0.5/n? I couldn't remember. So I did a quick simulation:

> corrs <- rep (NA, 1000)
> for (i in 1:1000) corrs[i] <- cor (rnorm(100),rnorm(100))
> mean(corrs)
[1] -0.0021
> sd(corrs)
[1] 0.1

Yeah, 1/n, that’s right. That worked well. It was quicker and more reliable than looking it up in a book.

4 Comments

  1. current grad student says:

    In your example, you produced samples of normal random variables which were uncorrelated. However, if they were normal with correlation rho, then the asymptotic variance would be (1/n)*(1-rho^2)^2, which depends on the true rho, but does in fact equal 1/n if rho=0.

    However, it can be shown that Fisher's z-transformation, (1/2)*ln[(1+r)/(1-r)] has asymptotic variance equal to 1/n regardless of the true correlation.
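
    A quick simulation sketch checking both formulas (assuming, just for illustration, a true rho of 0.5 and n = 100):

    rho <- 0.5
    n <- 100
    corrs <- rep(NA, 1000)
    for (i in 1:1000) {
      x <- rnorm(n)
      y <- rho*x + sqrt(1 - rho^2)*rnorm(n)    # pair with true correlation rho
      corrs[i] <- cor(x, y)
    }
    var(corrs)                                 # roughly (1 - rho^2)^2 / n = 0.0056
    var(0.5*log((1 + corrs)/(1 - corrs)))      # Fisher z: roughly 1/n = 0.01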

  2. anon says:

    … or even

    > sd( replicate( 1000, cor(rnorm(100), rnorm(100) ) ) )
    [1] 0.09903454

    nice to hear someone else jogs their memory this way

  3. Ted Dunning says:

    In a similar vein, I tend to use simulation instead of more formal power tests. This has the benefit that since I am analyzing the simulated data in EXACTLY the same way as I will analyze the final data, I know that I haven't selected the wrong option when using power-test software. Not only that, but I can get real false-positive and false-negative probabilities for various effect sizes and inference methods.

    For example, what is the probability that the mysterious decision function f will say that two coins are different based on the outcome of n flips if they have probabilities of landing heads of p1 and p2 respectively?

    # n, p1, p2 and the decision function f are assumed already defined
    k1 = rbinom(10000, n, p1)    # simulated head counts for coin 1
    k2 = rbinom(10000, n, p2)    # simulated head counts for coin 2
    z = rep(NA, 10000)
    for (i in 1:10000) {
      z[i] = f(k1[i], n, k2[i], n)
    }
    mean(z)    # proportion of runs in which f calls the coins different
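
    As one concrete choice of f (just for illustration; any decision rule would do), a two-sample proportion test at the 5% level:

    f <- function(k1, n1, k2, n2) prop.test(c(k1, k2), c(n1, n2))$p.value < 0.05

    Running the loop with p1 = p2 then estimates the false-positive rate, and running it with p1 != p2 estimates the power for that effect size.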

  4. Andrew says:

    Current,

    Thanks for the correction. I'm off the hook in this particular case: the actual rho in our example was around 0.2, so (1-rho^2)^2 = 0.92, which is close enough to 1 that I don't mind missing the factor. But in general your point is a good one–and also an illustration of the limitations of simulation!