I can see your point, and I think it is a good one that we ought to take into account more often. But there are times when causality is an appropriate term, even though we may not always be able to describe the causal “factor” precisely. What I am thinking of is that a combination of “subfactors” might validly be considered a “cause”. For example, the girder alone does not “cause” the building to stand (as you point out), but the combination of that girder and other materials, plus the way they are put together, does seem (at least in some cases) to qualify as “the cause” of the building’s standing.

Similarly, in the “mutant gene” situation, the mutant gene by itself may not “cause” disease, but the combination of the mutant gene with other gene variants or with environmental factors may.

]]>Several questions:

Is the following useful – are causes correlated with outcomes? (sometimes, usually, under what conditions Yes, under what conditions No?)

Causation seems relevant in a perturbed system, but what does causation mean in a stable or dynamic system? Loss of a girder can cause a building to fall, but it’s meaningless to say its presence ‘causes’ the building to stand. Or a mutant gene can cause disease, but presence of a normal gene does not ‘cause’ health.

Is dependency a more meaningful concept? A building depends on the girder; health depends on a normal (not mutant) gene.

Just thinking out loud…..

]]>Noah:

My point—and I think it’s an important one—is that, even if you’ve observed a correlation between x and y in the sample, and even if this correlation is statistically significant, it doesn’t necessarily follow that this correlation, or anything close to it, holds in the general population.

]]>Agreed. It seems like the crux of the joke here is to deliberately misinterpret the intended use of the word ‘imply’. The intention of the adage is to state a logical relationship, leaving the question of correlation open. It’s only when you start interpreting ‘imply’ in the unintended sense of “suggests” that the adage becomes useless as a prescription.

]]>“Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be reflected by measuring the ability of predicting the future values of a time series using past values of another time series. Since the question of “true causality” is deeply philosophical, econometricians assert that the Granger test finds only “predictive causality”.”

http://en.wikipedia.org/wiki/Granger_causality

Which means that “causation” in econometrics etc. is really just correlation pushed into the future (this is how the test works). That can change at the drop of a hat (hello, Taleb!).

So, yeah, be careful. Smith is obviously a True Believer in this stuff. But let’s not have slightly more inquisitive minds accepting the premise that we can properly PROVE causation in economics.
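For a sense of what “correlation pushed into the future” looks like in practice, here is a minimal sketch in base R of a lag-1 Granger-style comparison on simulated data. The series, the coefficients, and every variable name are made up for illustration, and comparing two nested regressions with anova() is only one simple way to do this; it is not the full Granger procedure with lag selection.

```r
# Simulated data: y is driven by its own past and by the previous value of x.
# All numbers here are illustrative assumptions.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.5 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)
}

# Line up current values with their one-step lags.
d <- data.frame(
  y_now  = y[2:n],
  y_lag1 = y[1:(n - 1)],
  x_lag1 = x[1:(n - 1)]
)

# Restricted model: predict y from its own past only.
restricted <- lm(y_now ~ y_lag1, data = d)
# Unrestricted model: also allow the past of x to help.
unrestricted <- lm(y_now ~ y_lag1 + x_lag1, data = d)

# F-test: does lagged x improve the prediction of y?
p_x_to_y <- anova(restricted, unrestricted)[["Pr(>F)"]][2]
print(p_x_to_y)
```

A small p-value here only says that past x helps predict y in this sample. Nothing in the regression knows that x was built into y by construction, which is the point being made above.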

]]>As much as I am a Bayesian, I don’t trust most people to put credible prior probabilities on those possibilities with only a correlation to go on.

Especially in genetics-culture-evolutionary-psychology-racistnomics pseudoscience circles, people have a tendency to underestimate “C could cause both A and B”. Just because one’s limited imagination cannot conceive of a confounding scenario (especially when you have a hypothesis in mind) doesn’t mean it’s not there.

]]>How about: B opposes or tends to reduce A; e.g. A = car accident, B = braking

]]>See the discussion on distance correlation above.

]]>“If A caused B, A will correlate with B.” is false. See for example y=x^2.

]]>The problem is affirming the consequent. “If A caused B, A will correlate with B.” This does not imply “A correlates with B, therefore A caused B.” There are lots of things that correlate with B, and A just happens to be one of them. Correlation is generally a necessary condition for causation (we’re not going down that rabbit hole) but it absolutely isn’t sufficient.

]]>http://www.scribd.com/doc/140999751/Non-Linear-Correlation-Coefficients

Causation requires knowledge of the conditional probability as well as the correlation. We propose a method of determining causation and compare it to Granger & Convergent Cross Mapping here:

http://www.scribd.com/doc/155404418/Causation

Sorry, didn’t know embedded links weren’t allowed. Here:

http://xkcd.com/552/

http://xkcd.com/925/

XKCD.com #552 and

There’s more, and certainly lots of joy for math and science geeks, but these two seem most appropriate to this thread.

]]>Glymour says in a recent interview: “And I mind the repeated attribution by philosophers of our work to Pearl. Pearl did a great deal, he is a first-class, original and imaginative mind, but until our work he had thrice (like St. Peter) denied that graphical models could have causal significance, and his development of prediction algorithms derives from algorithms in Causation, Prediction and Search.” http://www.kent.ac.uk/secl/philosophy/jw/TheReasoner/vol7/TheReasoner-7%2812%29.pdf

]]>Surprisingly hard to make sense of this Brownian stuff. My biggest gripe is the E[XX’YY’] thing. I need to draw X and X’ from the same distribution and Y and Y’ from the same distribution. If you have a single sample, this means, for example, splitting your sample randomly in half, and then you have a 4-dimensional thing instead of a 2-dimensional one… I’m having a hard time getting a feel for what it does.

]]>This is news to me. I thought Glymour attributed causal graphs to Pearl in his critical review of Murray’s book The Bell Curve.

]]>I think it’s a bit more of a two way street than that. :-)

In a 1991 paper, Spirtes [1] acknowledges that their algorithm is similar to Pearl’s (and Verma’s) and that “[Spirtes] used several of [Pearl’s] proof techniques in [Spirtes’s] proofs.” (p16)

Although… Spirtes does point out about Pearl and Verma 1990 that “the two main claims that [Pearl and Verma] make about patterns [completed hybrid graph] in this paper are both false.” (p15) and goes on to give counterexamples…

[1] Spirtes, Peter. “Building causal graphs from statistical data in the presence of latent variables.” (1991).

]]>I wonder if Pearl would agree.

]]>Reading further on distance covariance etc., its definition in Wiki is a little unmotivated, but it has something to do with how far away from each other two randomly chosen samples are. There’s an equivalent formulation in terms of Brownian covariance that seems interesting.

Intuitively, that definition in terms of Brownian motion transforms your variables of interest through a randomly chosen function, which produces a kind of standardized version of each variable. It then takes the Pearson correlation of those standardized versions. The Pearson correlation instead takes the correlation of the identity applied to those two variables of interest.

The process of “scrambling” your variables of interest through two different Brownian motions (and then re-centering them around 0) produces two variables that, if they were independent, would look like two independent normals, but if they tend to move together, they’d look like something else. I think I’m going to simulate this in R to see how it works.

]]>It was the linear dependence parts that were fishy and reddish. ;-)

]]>Corey, thanks for this link to Distance Correlation. That seems really useful for realistic problems, but I don’t see any way it could possibly distinguish between

a <- rnorm(100)

b <- rnorm(100)

vs

a <- rnorm(100)

c <- f(a)

when f(a) fits exactly through the points (a,b) ;-)

The ultimate such function is a pseudo-random number generator, which is just a special function that maps the sequence 0,1,2,3,… to something that passes vast numbers of tests for random numbers in the interval [0,1] ;-)

library(energy)  # dcor() is provided by the energy package

dcor((1:1000)/1000, runif(1000))

[1] 0.0441

Still, outside of actually hand-constructing pathology like this, dcor is likely to be the way to go to detect dependence.
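The hand-constructed pathology above can be sketched in base R. This is only an illustration: approxfun() stands in for the unspecified f, and c_det is a made-up name (chosen to avoid masking c()).

```r
# Two samples that are independent by construction.
set.seed(1)
a <- rnorm(100)
b <- rnorm(100)

# A deterministic function that passes exactly through every (a, b) point.
# Linear interpolation is an assumption here; any interpolant would do.
f <- approxfun(a, b)
c_det <- f(a)

# c_det is a function of a, yet it is numerically the same vector as b,
# so any statistic computed from the data alone gives identical answers.
print(isTRUE(all.equal(c_det, b)))
print(cor(a, b) - cor(a, c_det))
```

Because the two data sets are identical, no dependence measure, dcor included, can tell "independent" from "deterministic but wildly wiggly" on this sample.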

]]>Huh. That is a very nice measure. I don’t have a good intuition for it, but it worked very nicely on the simulated circle/quadratic data sets above. Thank you for pointing it out!

Gave a correlation of about .2 for the circle; .4 for the half circle, and .45 for the quadratic.

—

Personally, I feel that the correlations are all a bit lower than they ought to be; but it’s hard to say what they actually should be. For something like the quadratic (with no noise) it feels like we want to say that there is a perfect dependence relationship there, and get back a correlation number close to 1. Same for the half circle.

For the circle, it seems like we’d want the value to be a bit less than that. The dependence relationship is not deterministic; for any X value, there are two equally well fitting Y values, but only two such Y values.

But it’s not clear that there would be a reliable way to find a well-fitting, arbitrary functional relationship between two data sets without overfitting the data and always returning a perfect fit. So with an added simplicity bias (and preference for linear, or at least monotonic, functions) maybe the distance correlation does something that is effectively close to that.

Does anyone have a better intuition for how the distance correlation works?
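Not a full answer on the intuition, but writing the sample statistic out by hand may help. The sketch below follows the double-centred distance matrix definition given in the Wikipedia article on distance correlation. dcor_sketch is a made-up name, and this is a plain O(n^2) illustration, not the energy package’s implementation.

```r
# Sample distance correlation via double-centred pairwise distance matrices.
dcor_sketch <- function(x, y) {
  centre <- function(v) {
    d <- as.matrix(dist(v))  # all pairwise distances |v_i - v_j|
    # subtract row and column means, add back the grand mean
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- centre(x)
  B <- centre(y)
  dcov2  <- mean(A * B)  # squared sample distance covariance
  dvar_x <- mean(A * A)  # squared sample distance variance of x
  dvar_y <- mean(B * B)  # squared sample distance variance of y
  sqrt(max(dcov2, 0) / sqrt(dvar_x * dvar_y))
}

# A deterministic grid instead of a random sample, to keep this repeatable.
x <- seq(-1, 1, length.out = 201)

d_lin  <- dcor_sketch(x, 2 * x + 1)  # exact linear relationship
d_quad <- dcor_sketch(x, x^2)        # noiseless quadratic
r_quad <- cor(x, x^2)                # Pearson on the same quadratic

print(c(dcor_linear = d_lin, dcor_quadratic = d_quad, pearson_quadratic = r_quad))
```

The statistic asks whether pairs of points that are unusually far apart in x also tend to be unusually far apart in y. For the quadratic that is only partly true (points at -1 and +1 are far apart in x but identical in y), which is one way to see why the value lands well below 1 even without noise.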

]]>Pearl’s framework, with all due respect, *is* a repackaging of Spirtes, Glymour and Scheines (… don’t believe me? Check for yourself; I heard this from experts in the field who know the literature really well, have deep respect for Pearl’s work, and have made important contributions themselves).

Recently Tom Claassen in the Netherlands developed some very impressive extensions to their framework

See his PhD thesis

]]>Pearl’s pet toolkit seems to need a bit more than just using a Bayesian network, I thought?

]]>Damn it, the links are all screwed up. Should be:

]]>Most of the discussion following the parent comment is a red herring; see what happens if you use distance correlation (http://en.wikipedia.org/wiki/Distance_correlation) instead of Pearson correlation. (The Granger test can be adapted to use it. Granger-causality basically just looks at the sign of the lags of cross-**correlation** maxima; which is to say, it’s correlation, not causation.)

]]>http://hardsci.wordpress.com/2014/08/04/the-selection-distortion-effect-how-selection-changes-correlations-in-surprising-ways/

]]>data is the phase angle of the cosine, not the x or y values. runif(1000)*2*pi gives uniform angles over one rotation around the circle, from which both the x and y values are generated.

]]>But isn’t that what you did? runif() defaults to [0,1].

]]>http://en.wikipedia.org/wiki/Anscombe's_quartet

In essence, even limiting oneself to the Pearson Product Moment correlation, correlation doesn’t imply correlation, let alone cause.

]]>Granger causality is something I haven’t really looked into, but the wiki article seems to say it’s appropriate when one time series A provides predictive information about another time series B, separated by a lag. So I don’t think it’s applicable to determining that A, which is not a time series, is predictive of B through B being a complicated function of A.

Although you can construct things such as the circle which have very low correlation and are not complicated functions, the rapidly varying function example is interesting because it’s so relevant to “signal plus noise” which is the root of many statistical questions.

When we say y = f(x) + epsilon, where epsilon is iid normal, we may as well be saying y = g(x), where g(x) = f(x) + q(x), f(x) is slowly varying, and q(x) is rapidly varying. Since we only ever measure a finite sample of points, at a finite set of locations, to a finite precision, we can always construct a purely deterministic function which explains the data exactly, even when we get several measurements at the same x value with different y values (since we observe only finite precision).

]]>Well, it depends on what you mean by “Physical Sciences”. Do you include biology and genetics? The bioinformatics people use Bayesian networks all the time to tie together medical data and genetic features.

]]>I think cos(x)sin(x) will integrate exactly to zero over any two adjacent quadrants, so the sample correlation will be close to zero. Any single quadrant (e.g., both X and Y >0) will give a high correlation, and any pair of opposed quadrants (both same sign, or both different sign) will give some correlation.
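A quick way to check the “two adjacent quadrants” part, taking the upper half circle as the example and writing the angle as θ (my notation, with X = cos θ and Y = sin θ for θ uniform on [0, π]):

```latex
\int_0^{\pi} \cos\theta \,\sin\theta \, d\theta
  = \left[ \tfrac{1}{2}\sin^2\theta \right]_0^{\pi} = 0,
\qquad
E[X] = \frac{1}{\pi}\int_0^{\pi} \cos\theta \, d\theta
  = \frac{1}{\pi}\left[ \sin\theta \right]_0^{\pi} = 0 .
```

So Cov(X, Y) = E[XY] - E[X]E[Y] = 0 on the upper half circle, and the other adjacent pairs follow by symmetry. The population correlation is exactly zero there, and a sample correlation will only be close to zero.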

]]>It turns out not; I wasn’t sure so I tested it in R. Here’s the code I used.

data = runif(1000)*2*pi

X = cos(data)

Y = sin(data)

cor(X, Y)

cor(X, abs(Y))

I got that cor(X, Y) = -0.01343885, and cor(X, abs(Y)) = -0.01451961.

Intuitively, this makes sense to me. When X is close to its mean, Y is far away from its mean. Alternatively, when X is far from its mean, Y is also far from its mean. Y is only close to its mean when X is somewhat far away from its mean.

The same logic makes sense for Y = X^2, although there it might depend highly on what range you sample from. I’d expect no correlation if you took a fairly large uniform sample over [-1, 1], but a strong correlation over [0, 1].
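The range dependence for Y = X^2 is easy to check. The sketch below uses evenly spaced grids as a stand-in for “a fairly large uniform sample” so that the result is repeatable; the variable names are illustrative.

```r
# Evenly spaced grids approximate uniform samples without random noise.
x_sym <- seq(-1, 1, length.out = 2001)  # symmetric around 0
x_pos <- seq(0, 1, length.out = 2001)   # positive half only

r_sym <- cor(x_sym, x_sym^2)  # symmetry cancels the covariance
r_pos <- cor(x_pos, x_pos^2)  # monotone on [0, 1], so strongly correlated

print(c(symmetric = r_sym, positive_only = r_pos))
```

On the symmetric range the correlation is zero up to floating-point error, while on [0, 1] it comes out at roughly 0.97, in line with the expectation above.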

]]>The correlation would still be high if you separately took the parts above & below the A axis? Or not?

]]>Hopefully, none.

The idea is to have a checklist of all possible reasons for a correlation to exist so that you can methodically consider each one.

]]>The relationship doesn’t even have to be that complicated.

If A and B were deterministically related by A^2 + B^2 = 1, then cor(A, B) for a large enough sample size would be close to 0, even though A and B are very clearly related.

I think Andrew Gelman has brought up this example before here.

]]>That list is so broad, I wonder what possibilities (if any) are not on it even?

]]>I’ve never liked a blanket saying of “Correlation is not causation” because that tends to serve as permission for intellectual laziness. I’ve always preferred to say something like:

If A correlates with B, then

A could cause B

or B could cause A

or C could cause both A and B

or it’s just a random fluke

or it’s due to incompetence

or it’s due to fraud or bias

or maybe a few other reasons.

That always struck me as a list I could work with, whereas the usual statements suggest I should just turn on the TV to see what’s on.
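The “C could cause both A and B” item is the one that is easiest to simulate. Here is a minimal sketch in base R, where the data-generating process, the coefficients, and the names (c_common, r_raw, r_given_c) are all assumptions made up for illustration.

```r
# A and B never influence each other; both are driven by a common cause C.
set.seed(123)
n <- 10000
c_common <- rnorm(n)       # hypothetical common cause C
a <- c_common + rnorm(n)   # A depends on C plus its own noise
b <- c_common + rnorm(n)   # B depends on C plus its own noise

# Raw correlation between A and B.
r_raw <- cor(a, b)

# Remove the part of A and of B explained by C, then correlate what is left.
r_given_c <- cor(resid(lm(a ~ c_common)), resid(lm(b ~ c_common)))

print(c(raw = r_raw, given_c = r_given_c))
```

The raw correlation should come out near 0.5, and the correlation after adjusting for C should be near zero. Of course, this only works because C was measured; working through the checklist is what prompts you to go looking for it.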

I guess there are two kinds of people: those who think there is too much inference in the world and those who think there is too little. I’ve probably been on both sides of the divide at different times, but I lean toward the latter prejudice now.

]]>Rahul,

I don’t want to misinterpret Andrew, but I think the key word here would be ‘necessarily’ (that is, correlation does not even necessarily imply correlation). Case example: number of people electrocuted by power lines vs. marriage rate in Alabama. Someone else pointed to this as well, I think.

This is kind of a humorous aside, but it is nonetheless a spurious result at best. As I mention elsewhere, two-factor correlations may oftentimes be uninformative. Alone, a single association between some variable $latex a_1$ and response $latex b$ may be statistically strong. Add in all possible contributors $latex a_1, a_2, …, a_n$ to $latex b$, along with an error term $latex E$, and the association you once thought was truly meaningful may lose much of its informative strength (aka ‘explanatory’ or ‘predictive’ power).

Anon makes a good point when one considers a smaller or at least simpler system (i.e., one which can be experimented on). The likelihood that a statistically detected relationship is meaningful is probably higher there (“probably”, assuming one knows or can know all of the variables in the system that affect $latex b$ above).

To me, in larger systems (econ, politics, epidemiology, etc.), simple two-factor (‘unmodeled’) associations should all be taken with ample doses of salt. I don’t feel this is a useful exercise anyway. If both variables are modeled, however, and I understand the model, then I am more likely (though not certain) to think that a conclusion about the relationship is reasonably sound.

]]>What would Granger causality testing yield on that, I wonder? It ought to be able to identify the embedded causality in an oscillating function, I think.

]]>Out of curiosity, has Pearl’s causality framework ever been applied to the Physical Sciences to yield an important result? Or any of the other techniques you allude to?

Because correlation is the brute force weapon I’ve always seen employed in practical papers.

]]>