We were looking at some correlations–within each state, the correlations between income and different measures of political ideology–and we wanted to get some sense of sampling variability. I vaguely remembered that the sample correlation has a variance of approximately 1/n–or was that 0.5/n, I couldn’t remember. So I did a quick simulation:

> corrs <- rep(NA, 1000)
> for (i in 1:1000) corrs[i] <- cor(rnorm(100), rnorm(100))
> mean(corrs)
[1] -0.0021
> sd(corrs)
[1] 0.1

Yeah, 1/n, that's right: with n = 100, a standard deviation of 0.1 corresponds to a variance of 0.01 = 1/n. That worked well. It was quicker and more reliable than looking it up in a book.

In your example, you produced samples of uncorrelated normal random variables. If instead they were normal with correlation rho, the asymptotic variance of the sample correlation would be (1/n) * (1 - rho^2)^2, which depends on the true rho but does indeed equal 1/n when rho = 0.
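That formula is easy to check the same way as before. Here is a quick sketch in R; rho = 0.5 and n = 100 are arbitrary choices for illustration, and the correlated pair is built directly rather than with a multivariate-normal generator:

```r
rho <- 0.5
n <- 100
corrs <- replicate(10000, {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # y has correlation rho with x
  cor(x, y)
})
var(corrs)          # simulated variance of the sample correlation
(1 - rho^2)^2 / n   # asymptotic approximation: 0.005625
```

The two numbers should agree to a couple of decimal places.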

However, it can be shown that Fisher's z-transformation, (1/2) * ln[(1 + r)/(1 - r)], has asymptotic variance equal to 1/n regardless of the true correlation.
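This, too, can be verified by simulation. A sketch, again with the arbitrary choices rho = 0.5 and n = 100:

```r
rho <- 0.5
n <- 100
z <- replicate(10000, {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # correlated normal pair
  r <- cor(x, y)
  0.5 * log((1 + r) / (1 - r))               # Fisher's z, i.e. atanh(r)
})
var(z)  # close to 1/n = 0.01 even though rho != 0
```

(The finite-sample variance of z is usually quoted as 1/(n - 3), which is what the simulation tracks most closely.)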

… or even

> sd(replicate(1000, cor(rnorm(100), rnorm(100))))

[1] 0.09903454

Nice to hear someone else jogs their memory this way.

In a similar vein, I tend to use simulation instead of formal power calculations. This has the benefit that, since I am analyzing the simulated data in EXACTLY the same way as I will analyze the final data, I know I haven't selected the wrong option in power-analysis software. Not only that, but I can get actual probabilities of false positives and false negatives for various effect sizes and inference methods.

For example, what is the probability that the mysterious decision function f will say that two coins are different based on the outcome of n flips if they have probabilities of landing heads of p1 and p2 respectively?

<pre>
k1 = rbinom(10000, n, p1)  # heads counts for coin 1 across 10000 simulated experiments
k2 = rbinom(10000, n, p2)  # heads counts for coin 2
z = rep(NA, 10000)         # initialize z before the loop
for (i in 1:10000) {
  z[i] = f(k1[i], n, k2[i], n)  # one simulated experiment at a time
}
mean(z)
</pre>
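The comment leaves the decision function f unspecified. As one hypothetical choice, f could declare the coins different whenever a two-sample proportion test (R's prop.test) rejects at the 5% level; n, p1, and p2 below are likewise made-up settings for illustration:

```r
# Hypothetical decision function: reject equality at the 5% level
f <- function(k1, n1, k2, n2) {
  prop.test(c(k1, k2), c(n1, n2))$p.value < 0.05
}

n <- 200
p1 <- 0.5
p2 <- 0.6
k1 <- rbinom(10000, n, p1)
k2 <- rbinom(10000, n, p2)
z <- mapply(f, k1, n, k2, n)
mean(z)  # fraction of simulations where f says "different", i.e. estimated power
```

With p1 = p2 the same code estimates the false-positive rate instead, which is exactly the symmetry the comment is pointing at.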

Current,

Thanks for the correction. I'm off the hook in this particular case: the actual rho in our example was around 0.2, so the variance factor (1 - rho^2)^2 = 0.92, which is close enough to 1 that I don't mind missing it. But in general your point is a good one, and also an illustration of the limitations of simulation!