The above title is my response to a discussion that began with this email sent to me by Steve Roth:

Noah Smith had a great tweet recently, a real keeper for me [Roth].

Causation is correlated with correlation.

I would reword it:

Correlation correlates with causation. (Just not very much.)

And I wonder if the following corollaries are safe:

Non-correlation correlates (more strongly) with non-causation.

And/or:

Negative correlation correlates (much more strongly) with non-causation.

This in response to the old nostrum/saw that correlation does not imply causation.

Which has always seemed wrong to me. Of course it does! (Weakly.)

The problem is that “imply” is a very slippery word, so it’s a pretty useless nostrum.

Would be delighted to see a post poking at this.

I replied:

I will post something on this (at some point; we’re on a 1-2 month delay so most things don’t appear right away) but my quick response is: Selection bias. If people start sending you random pairs of variables that happen to be highly correlated, sure, there might well be a connection between them, for example kids’ scores on math tests and language tests are correlated, and this tells us something. But if someone is looking for a particular pattern, and then selects two variables that are correlated, that’s another story. The great thing about causal identification is that it’s valid even if you’re looking to find a pattern. (Not completely, there’s p-hacking and also you can run 100 experiments and only report the best one, etc., but that’s still less of an issue than the fact that pure correlation does not *logically* tell you anything about causation.) To put it another way: returning to Noah’s tweet: Correlation is surely correlated with causation in an aggregate sense, but if you take the subset of correlations that a particular motivated researcher is looking for, then maybe not.

You could also see the above paragraph as a bit of common-sense reasoning. The expression “correlation does not imply causation” is popular, and I think it’s popular for a reason, that it does capture a truth about the world.

I cc-ed Smith on this exchange and also Dan Kahan, who wrote:

For what it’s worth, my two variants would be:

1. Nothing other than correlation implies causation.

2. Correlation implies causation, except when it doesn’t.

Credit to D. Hume for #1 (at least for noticing that there’s no other visible indicator of causation).

#2 is just what Andrew said: causation = correlation plus valid causal inference.

Again, the elephant in the room here is selection. People see enough random correlations that they can pick them out and interpret them how they like.

So if I had to put something on a bumper sticker (or a tweet), it would be:

Correlation does not even imply correlation

That is, correlation in *the data you happen to have* (even if it happens to be “statistically significant”) does not necessarily imply correlation *in the population of interest*.
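
The selection point can be illustrated with a small simulation. This is only a sketch under assumed settings (the sample size, the number of variables, and all variable names are made up for illustration): every variable is independent noise, so every population correlation is zero, yet a search over all pairs still turns up an impressive-looking sample correlation that does not hold up in fresh data.

```r
set.seed(1)
n_obs  <- 30   # observations per variable (assumed for illustration)
n_vars <- 50   # number of unrelated variables (assumed for illustration)

# Every column is independent noise, so every population correlation is 0.
sample_data <- matrix(rnorm(n_obs * n_vars), nrow = n_obs)
r <- cor(sample_data)
r_pairs <- r[upper.tri(r)]            # one entry per distinct pair of variables

# A motivated search reports only the most impressive pair.
best      <- which.max(abs(r_pairs))
max_abs_r <- abs(r_pairs[best])

# Look at the same pair of columns in a fresh sample from the same population.
idx   <- which(upper.tri(r), arr.ind = TRUE)[best, ]
fresh <- matrix(rnorm(n_obs * n_vars), nrow = n_obs)
replication_r <- cor(fresh[, idx[1]], fresh[, idx[2]])

print(c(selected = max_abs_r, replication = replication_r))
```

The selected correlation is large only because it was selected; the replication is just one more draw from a population in which the true correlation is zero.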

P.S. I’ve shifted the emphasis in my slogan to make the point clearer.

The flipside of your bumper sticker is that logical fallacy: serial killers have ears, I have ears, so I am a serial killer. You can identify specific characteristics of a subset and mistakenly attribute those to a larger set or different subset and you can identify characteristics of a larger set and mistakenly attribute those as specific to a subset.

I’ve never liked a blanket saying of “Correlation is not causation” because that tends to serve as permission for intellectual laziness. I’ve always preferred to say something like:

If A correlates with B, then

A could cause B

or B could cause A

or C could cause both A and B

or it’s just a random fluke

or it’s due to incompetence

or it’s due to fraud or bias

or maybe a few other reasons.

That always struck me as a list I could work with, whereas the usual statements suggest I should just turn on the TV to see what’s on.

I guess there are two kinds of people: those who think there is too much inference in the world and those who think there is too little. I’ve probably been on both sides of the divide at different times, but I lean toward the latter prejudice now.

That list is so broad, I wonder what possibilities (if any) are not on it even?

Hopefully, none.

The idea is to have a checklist of all possible reasons for a correlation to exist so that you can methodically consider each one.

How about: B opposes or tends to reduce A; e.g. A = car accident, B = braking

As much as I am a bayesian, I don’t trust most people to put credible prior probabilities on those possibilities with only a correlation to go on.

Especially in genetics-culture-evolutionary-psychology-racistnomics pseudoscience circles, people have a tendency to underestimate “C could cause both A and B”. Just because one’s limited imagination cannot conceive of a confounding scenario (especially when you have a hypothesis in mind) doesn’t mean it’s not there.
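
A minimal sketch of the “C could cause both A and B” case, assuming a simple linear setup invented for illustration (the names `c_common`, `a`, and `b` are hypothetical): A and B never influence each other, yet they are clearly correlated, and the correlation disappears once the linear contribution of C is removed from both.

```r
set.seed(3)
n <- 2000

c_common <- rnorm(n)             # the common cause C
a <- c_common + rnorm(n)         # A depends on C only
b <- c_common + rnorm(n)         # B depends on C only

# Raw correlation between A and B, induced entirely by C.
r_ab <- cor(a, b)

# Correlation of what is left of A and B after regressing each on C.
r_ab_given_c <- cor(resid(lm(a ~ c_common)), resid(lm(b ~ c_common)))

print(c(raw = r_ab, adjusted_for_c = r_ab_given_c))
```

Of course this only works because C was measured; the comment above is about the cases where nobody thought to measure it.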

If we must have tweets and bumper sticker slogans about this, then I’d prefer that the tweets and slogans be phrased in terms of what should or should not be *inferred* than in terms of what is or is not *implied*. I’d also suggest that “Negative correlation correlates (much more strongly) with non-causation” might be an unsafe corollary because a negative correlation is only a positive correlation with the coding of one of the variables reversed: in terms of causal inference, there does not seem to be a difference between [getting more Y when there is more X] and [getting more Y when there is less X].
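
The recoding argument is easy to check numerically. In this sketch (simulated data, with an arbitrary slope chosen for illustration), reversing the coding of one variable flips the sign of the correlation and changes nothing else.

```r
set.seed(2)
x <- rnorm(200)
y <- -0.7 * x + rnorm(200)   # negatively related to x by construction

r_original <- cor(x, y)      # negative
r_recoded  <- cor(x, -y)     # same data, y coded in the opposite direction

print(c(original = r_original, recoded = r_recoded))
```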

The word “imply” is, on its own, anything

butslippery (“weak implication,” whatever that is, on the other hand…).The adage simply means that if you’ve merely observed a correlation between X and Y, it doesn’t follow necessarily that X caused Y or that Y caused X.

Agreed. It seems like the crux of the joke here is to deliberately misinterpret the intended use of the word ‘imply’. The intention of the adage is to state a logical relationship, leaving the question of correlation open. It’s only when you start interpreting imply using the unintended meaning of “suggests” that it starts to be useless as a prescription.

Noah:

My point—and I think it’s an important one—is that, even if you’ve observed a correlation between x and y in the sample, and even if this correlation is statistically significant, it doesn’t necessarily follow that this correlation, or anything close to it, holds in the general population.

correlation doesn’t imply replication.

This causation stuff is misleading at best. Science deals with causation, statistics is about what can reasonably be inferred despite limited knowledge of causes (their nature or their parameters).

Here’s an example worth considering. The ideal gas law is PV=nRT. So Pressure is highly correlated to 1/Volume. The correlation is strongly predictive in that changing Volume changes the Pressure in just the way the correlation predicts (for constant T).
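
As a quick numerical illustration of the gas-law example (the amount of gas, the temperature, and the volume grid are assumed values, not taken from the comment): at constant T, pressure is an exact linear function of 1/V, so that correlation is 1, while the correlation with V itself is negative but not perfect because the relationship is not linear in V.

```r
n_mol  <- 1        # moles of gas (assumed)
r_gas  <- 8.314    # gas constant, J/(mol K)
temp_k <- 300      # temperature in kelvin, held constant (assumed)

volume   <- seq(0.01, 0.05, length.out = 50)   # m^3 (assumed grid)
pressure <- n_mol * r_gas * temp_k / volume    # Pa, from PV = nRT

r_inverse_volume <- cor(pressure, 1 / volume)
r_volume         <- cor(pressure, volume)

print(c(with_inverse_volume = r_inverse_volume, with_volume = r_volume))
```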

Yet pressure doesn’t “cause” volume. The “causes” at the quantum/atomic level are something extraordinarily different and unrelated to gas laws in the gross.

The easiest way to see how disconnected the “correlations” are from the “causes” is to note that the gas law was derived historically (and is easily derivable) using causal atomic models/dynamics vastly different than the currently accepted correct quantum mechanics.

In other words, you can get the “causes” radically wrong and still get strong correlations which are highly predictive.

In many such Physics settings is the direction of causation even very relevant? Does Volume cause pressure any more than pressure causes volume? All I care about is that there is a certain relationship between the two that is always obeyed. Whichever variable(s) are “slack” will change to compensate.

e.g. Does flow cause a pressure drop or does a pressure differential cause flow?

From a Newtonian molecular dynamics perspective (a simple approximation of QM), a particular set of inter-atomic potentials, together with virtually any configuration of positions and momenta of the particles that has a given total energy, implies that a short time later both pressure and volume will be known to high precision.

So, the answer is, neither “pressure causes volume” nor “volume causes pressure” is true.

How about from the perspective of experimental changes: we can change the potential energy field at the boundary of a bunch of stuff, whereas we can’t directly change the volume of the stuff (ie. by actually moving individual molecules on the inside of a material closer together or something). So pressure is more of an experimentally controllable parameter than is volume… except that you change the pressure on the boundary typically by moving some material into closer proximity, thereby reducing the enclosed volume… so even that perspective doesn’t help a lot.

So basically, a lot of the obsession with causation, at least in this context, is pretty futile?

In the sense that there’s a decoupling of the causes from the experimentally reproducible properties.

Let me put it this way: a correlation which appears to a social scientist/statistician to be “causal” is more likely just the result of a system whose nature is largely independent of the real causes.

That is to say, if two statistics appear to be causally related to a statistician it’s probably not because they’re causally related at the micro level, it’s likely because they’d appear to be that way almost no matter what’s going on at the micro level.

Typical causality stuff like Pearl talks about would say something like “if you set the volume to V, then you will observe pressure P” (do vol=V) or something like that. This is true for the ideal gas because there’s a fundamental cause (the interactions of the particles) which ensures that P and V remain related, so that the only way you can “do vol=V” is to interact with the system in such a way that P changes appropriately.

When it comes to stuff like Andrew studies or talks about, he is often careful to say that there could be more than one intervention that enforces the change. So if we’re interested in how, say, healthiness of the population affects GDP, then we can increase healthiness in perhaps a variety of ways, such as providing certain kinds of health care free, or subsidizing high quality diets, or vaccinating populations, or whatever. The system may or may not be sensitive to the *type* of intervention. If not, then you’ll get a “universal gas law” but if the system is sensitive to the type of intervention, then you will need more dimensions to fully describe the causality. This is like saying for material “X” we need to know not just the volume it takes up, but something about the shape of that volume, or the material on the surface of the enclosing volume, or the time-history of the shape, or whatever.

“This is true for the ideal gas because there’s a fundamental cause (the interactions of the particles) which ensures that P and V remain related,”

That’s odd, because you can derive the ideal gas laws assuming no particle interaction.

Which illustrates my point nicely.

All this talk of “causality” by statisticians isn’t general enough to handle the ideal gas laws, which as reproducible science goes, easily stands comparison to anything social scientists have dreamed up.

“Ideal gas law assuming no particle interaction”

Minimally, you need to have an interaction between the particles of the gas and the particles that make the boundary container. Without an interaction with the container there is no “V” we can really speak of easily, and there is no P at the boundary either.

It’s not just classical statistical mechanics. The gas laws were derived by a great many “atomic” theories that seem hilarious today, or at least would if they hadn’t all been forgotten. There was quite an industry dedicated to inventing special atomic/kinetic theories back in the day which seems to have left no impact. It’s reminiscent of macroeconomics today.

Lest anyone think I’m only talking physics here: pressure and volume are gross “statistics” of the underlying state no different than (1) GDP, (2) percentage of female births, (3) GRE scores by race, (4) blood pressure or (5) just about any variable considered by political scientists, psychologists, or medical doctors.

I’m no Noah Smith, but on June 5th I tweeted (you can look it up!) “Causation causes correlation, and correlation correlates with causation. Does that clear it up?”

Jonathan:

I think that if you could look at all possible correlations, they might well be correlated with causation (if this is all defined carefully). But as noted above my problem is with selection bias. There’s no reason to think that particular subsets of *reported* correlations are correlated with causation.

Correlations do not imply implications

I like how this sounds. Between this and Andrew’s wording, I’d say less formally “2 factor correlation does not imply an informative association”.

I agree with the idea that selection bias of correlates is a typical problem (I rail a lot against this in my own work – nevertheless, it persists like the plague). I think though, your idea of likelihood of association given random selection of correlates would only be in any way informative if the factors chosen were part of the same system (e.g., mortality among some population in a certain geographic area).

Still, how will you know which correlates (let’s stick with 2 factors for now) are truly informative and which are not? Variables that look informative on their own are quite often found to be uninformative once they are included as one variable among others in a higher-dimensional model of an outcome. Their explanatory power diminishes when considering the set of all possible explanatory variables (and inter-variable interactions) of a system that defines an outcome/dependent variable.

I think the acid test of the magnitude of information a single correlation brings is primarily dependent upon 1) the scope of the system under study (the number $N$ of all possible independent variables $x$ that can together explain $Y$ with some degree of uncertainty $E$), and 2) the relative difference between the modelled and unmodelled relationships between correlates.

This isn’t anything new, but I think is often missed.

Isn’t Andrew’s “Correlation does not even imply correlation” somewhat nihilistic? In the sense that you could as well say “*Nothing* implies correlation (in the population of interest).” So also, correlation may not imply causation, but if so, what does?!

Rahul,

I don’t want to misinterpret Andrew, but I think the key word here would be ‘necessarily’ (that is, correlation does not even necessarily imply correlation). Case example: number of people electrocuted by power lines vs. marriage rate in Alabama. Someone else pointed to this as well, I think.

This is kind of a humorous aside but nonetheless a spurious result at best. As I mention elsewhere, 2 factor correlations may oftentimes be uninformative. Alone, a singular association between some variable $a_1$ and response $b$ may be statistically strong. Add in all possible contributors $a_1, a_2, \ldots, a_n$ to $b$, along with error $E$, and that association you once thought was truly meaningful may tend to lose its informative strength (aka ‘explanatory’ or ‘predictive’ power).

Anon makes a good point when one considers a smaller or at least simpler system (i.e., one which can be experimented on). The likelihood of the relationship being meaningful by statistical result is probably higher (“probably”, assuming one knows or can know all of the variables in a system that affect $b$ above).

To me, in larger systems (econ, politics, epidemiology, etc.), simple 2 factor (‘unmodeled’) associations should all be taken with ample doses of salt. I don’t feel this is a useful exercise anyway. If both variables are modeled however, and I understand the model, then I am more likely to (though not always) think that a conclusion with reference to the relationship is reasonably sound.

Someone should let Richard-the-king-of-correlation-Florida know about this discussion. I think every article or blog post I read from that guy presents correlations, caveats that correlation does not imply causation, and then goes ahead and implies causation anyways.

That’s my point. People selectively use the “Correlation is not causation” slogan whenever they do not like the causation being implied.

Do you have a counterexample of someone claiming causation and supporting it by evidence stronger than mere correlations? The alternative is to say that we can never prove causation anyways so might as well never use that term?

The strongest evidence for causation is more or less controlled experiment. If you can do X or Y and doing X always accompanies outcome Z but Y does not, then you get more or less that X causes Z. Further evidence for the goodness of a model can come from pre-predicting that P will cause Q and then doing experiments which confirm the prediction.

The biggest issue we have in stats and social sciences is a general lack of ability to carry out experiments.

I think the pithiest and most correct version is “if causation then correlation… but if correlation then sometimes correlation”

In the case of Florida, he doesn’t write correlation is not causation because he disagrees with the causation. From what I gather, he seems to write that to announce that he knows it, even though he then implies causation from said correlation anyways.

Perhaps we should stop using the term causation in the social sciences because it’s extremely hard (if at all possible) to establish with data analysis alone. I would say though, that randomized controlled trials (although they’re often not practical in the social sciences) are pretty good at establishing causation. Time series analysis is also pretty convincing in the absence of controlled environments.

I tend to agree with the post though, correlation doesn’t tell you jack.

I found this a little painful to read. Since you know about Judea Pearl’s work, Andrew, you presumably know that there are fairly sophisticated algorithmic systems (e.g. Spirtes, Glymour and Scheines) that aim to systematically infer causation from correlation? I’m not persuaded by them myself, but I think we’re well beyond economists coming up with catchy variations on ‘correlation is not causation’.

For example, in philosophy of science the principle that correlation implies causation is often attributed to Hans Reichenbach. In this working paper directed at philosophers, I discuss some implications of ‘Reichenbach’s Principle’ for econometrics, specifically the basic rationale for IV methods:

http://www.opensaldru.uct.ac.za/handle/11090/176

More valuable is the, relatively neglected, work by Chalak and White on what they refer to as ‘settable systems’; there are a bunch of papers online.

In short: I think enough research has now been done to go way beyond these fairly superficial assessments of the link between correlation and causality.

Sean:

Yup, with enough assumptions you can draw all sorts of conclusions.

Out of curiosity, has Pearl’s causality framework even been applied to the Physical Sciences to yield an important result? Or any of the other techniques you allude to?

Because correlation is the brute force weapon I’ve always seen employed in practical papers.

Well, it depends on what you mean by “Physical Sciences”. Do you include biology and genetics? The bioinformatics people use Bayesian networks all the time to tie together medical data and genetic features.

Pearl’s pet toolkit seems to need a bit more than just using a Bayesian network, I thought?

Pearl’s framework, with all due respect, *is* a repackaging of Spirtes, Glymour and Scheines (… don’t believe me, check yourself – I heard this from experts in the field who know the literature really well, have deep respect for Pearl’s work, and have made important contributions themselves).

Recently Tom Claassen in the Netherlands developed some very impressive extensions to their framework

http://www.cs.ru.nl/~tomc/

See his PhD thesis

http://www.cs.ru.nl/~tomc/docs/thesis.pdf

I wonder if Pearl would agree.

I think it’s a bit more of a two way street than that. :-)

In a 1991 paper Spirtes [1] acknowledges that their algorithm is similar to Pearl (and Verma’s) and that “[Spirtes] used several of [Pearl’s] proof techniques in [Spirtes’s] proofs.” (p16)

Although… Spirtes does point out about Pearl and Verma 1990, that “the two main claims that [Pearl and Verma] make about patterns [completed hybrid graph] in this paper are both false.” (p15) and goes on to give counterexamples…

[1] Spirtes, Peter. “Building causal graphs from statistical data in the presence of latent variables.” (1991).

This is news to me. I thought Glymour attributed causal graphs to Pearl in his critical review of Murray’s Bell Curve book.

Glymour says in a recent interview: “And I mind the repeated attribution by philosophers of our work to Pearl. Pearl did a great deal, he is a first-class, original and imaginative mind, but until our work he had thrice (like St. Peter) denied that graphical models could have causal significance, and his development of prediction algorithms derives from algorithms in Causation, Prediction and Search.” http://www.kent.ac.uk/secl/philosophy/jw/TheReasoner/vol7/TheReasoner-7%2812%29.pdf

Just a curious finding about correlation and spurious effects http://www.tylervigen.com/

I have always wondered whether “causation is not always correlation” is true. The reason I ask is related to hash functions in cryptography, where you can go one way but not the other way. Any ideas?

Suppose A is some variable thing and B is a deterministic function of A which is fairly complicated and oscillates around rapidly. It seems plausible that cor(A,B) could be basically 0 in any given dataset even though B is determined by A deterministically through this complicated function.

One way to see how this would work is to construct it “backwards”, such as:

a <- rnorm(100)

b <- rnorm(100)

This will have a very low correlation.

Now fit a spline or polynomial through the (a,b) pairs and call this function F. Now plausibly b could arise as F(a) deterministically, but when sampled at the given a points, it would produce a dataset with very low correlation. If you sample at additional points, who knows…
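
The “backwards” construction can be sketched with an interpolating spline from base R. The name `f_interp` is hypothetical and plays the role of F above; `splinefun` is just one convenient way to get a deterministic function that passes through every (a,b) pair, not necessarily the one the comment had in mind.

```r
set.seed(4)
a <- rnorm(100)
b <- rnorm(100)                 # generated independently of a

# A deterministic function passing exactly through every (a, b) pair.
f_interp <- splinefun(a, b)

# At the sampled points, b "is" f_interp(a) ...
max_gap <- max(abs(f_interp(a) - b))

# ... yet the sample correlation between a and f_interp(a) is whatever
# the correlation between two independent normal samples happens to be.
r_deterministic <- cor(a, f_interp(a))

print(c(max_gap = max_gap, correlation = r_deterministic))
```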

What would Granger causality testing yield on that, I wonder? It ought to be able to identify the embedded causality in an oscillating function, I think.

Granger Causality is something I haven’t really looked into but the wiki article seems to believe it’s appropriate when one time series A provides predictive information about another time series B, separated by a lag. So I don’t think it’s applicable to determining that A which is not a time series is predictive about B through B being a complicated function of A.

Although you can construct things such as the circle which have very low correlation and are not complicated functions, the rapidly varying function example is interesting because it’s so relevant to “signal plus noise” which is the root of many statistical questions.

When we say Y = f(x) + epsilon, where epsilon is iid normal, we may as well be saying y = g(x) where g(x) = f(x) + q(x) and f(x) is slowly varying whereas q(x) is rapidly varying. Since we only ever measure a finite sample of points, and at a finite set of locations, to a finite precision, we can always construct a purely deterministic function which explains the data deterministically, even when we get several measurements at the same x value with different y values (since we observe only finite precision).

The relationship doesn’t even have to be that complicated.

If A and B were deterministically related by A^2 + B^2 = 1, then cor(A, B) for a large enough sample size would be close to 0, even though A and B are very clearly related.

I think Andrew Gelman has brought up this example before here.

The correlation would still be high if you separately took the parts above & below the A axis? Or not?

It turns out not; I wasn’t sure so I tested it in R. Here’s the code I used.

data = runif(1000)*2*pi

X = cos(data)

Y = sin(data)

cor(X, Y)

cor(X, abs(Y))

I got that cor(X, Y) = -0.01343885, and cor(X, abs(Y)) = -0.01451961.

Intuitively, this makes sense to me. When X is close to its mean, Y is far away from its mean. Alternatively, when X is far from its mean, Y is also far from its mean. Y is only close to its mean when X is somewhat far away from its mean.

The same logic makes sense for Y = X^2, although there it might depend highly on what range you sample from. I’d expect no correlation if you took a fairly large uniform sample over [-1, 1], but a strong correlation over [0, 1].
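
A quick check of the Y = X^2 claim, as a sketch with an assumed sample size: the same deterministic relationship gives almost no Pearson correlation when X is sampled symmetrically around zero and a very strong one when X is restricted to [0, 1].

```r
set.seed(5)
x_sym <- runif(10000, -1, 1)   # symmetric range around 0
x_pos <- runif(10000, 0, 1)    # positive half only

r_sym <- cor(x_sym, x_sym^2)   # near 0: the two halves cancel
r_pos <- cor(x_pos, x_pos^2)   # high: x^2 is increasing on [0, 1]

print(c(symmetric = r_sym, positive_only = r_pos))
```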

I think cos(x)sin(x) will integrate exactly to zero over any two adjacent quadrants, so the sample correlation will be close to zero. Any single quadrant (e.g., both X and Y >0) will give a high correlation, and any pair of opposed quadrants (both same sign, or both different sign) will give some correlation.
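
The quadrant argument can be checked by simulation. This sketch (sample sizes assumed, helper name `circle_cor` hypothetical) draws angles uniformly from a single quadrant, from two adjacent quadrants, and from two opposed quadrants, and computes the correlation of the resulting points on the circle.

```r
set.seed(6)

# Correlation of (cos(theta), sin(theta)) for angles uniform on [lower, upper].
circle_cor <- function(lower, upper, n = 20000) {
  theta <- runif(n, lower, upper)
  cor(cos(theta), sin(theta))
}

r_single   <- circle_cor(0, pi / 2)   # one quadrant (both X and Y > 0)
r_adjacent <- circle_cor(0, pi)       # two adjacent quadrants

# Two opposed quadrants: X and Y both positive, or both negative.
theta_opp <- c(runif(10000, 0, pi / 2), runif(10000, pi, 3 * pi / 2))
r_opposed <- cor(cos(theta_opp), sin(theta_opp))

print(c(single = r_single, adjacent = r_adjacent, opposed = r_opposed))
```

Within a single quadrant the correlation is strong but negative, since along the arc X falls as Y rises.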

But isn’t that what you did? runif() defaults to [0,1].

data is the phase angle of the cosine not the x or y values. runif(1000)*2*pi gives uniform angles on one rotation around the circle from which both x and y values are generated.

Most of the discussion following the parent comment is a red herring; see what happens if you use distance correlation (http://en.wikipedia.org/wiki/Distance_correlation) instead of Pearson correlation. (The Granger test can be adapted to use it. Granger-causality basically just looks at the sign of the lags of cross-**correlation** maxima; which is to say, it’s correlation, not causation.)

Osti tabarnak, links all screwed up. Should be:

distance correlation

Pearson correlation

Huh. That is a very nice measure. I don’t have a good intuition for it, but it worked very nicely on the simulated circle/quadratic data sets above. Thank you for pointing it out!

Gave a correlation of about .2 for the circle; .4 for the half circle, and .45 for the quadratic.

—

Personally, I feel that the correlations are all a bit lower than they ought to be; but it’s hard to say what they actually should be. For something like the quadratic (with no noise) it feels like we want to say that there is a perfect dependence relationship there, and get back a correlation number close to 1. Same for the half circle.

For the circle, it seems like we’d want the value to be a bit less than that. The dependence relationship is not deterministic; for any X value, there are two equally well fitting Y values, but only two such Y values.

But it’s not clear that there would be a reliable way to find a good fitting, arbitrary functional relationship between two data sets without overfitting the data, and always returning perfect fit. So with an added simplicity bias (and preference for linear, or at least monotonic, functions) maybe the distance correlation does something that is effectively close to that.

Does anyone have a better intuition for how the distance correlation works?
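
One way to build intuition is to compute the sample distance correlation by hand. The sketch below follows the standard definition (pairwise distance matrices, double centering, then a correlation-like ratio) in base R; the function name `dist_cor` is hypothetical, and this is not the implementation behind the numbers quoted above, so small differences are to be expected.

```r
# Sample distance correlation for two numeric vectors of equal length.
dist_cor <- function(x, y) {
  stopifnot(length(x) == length(y))

  # Pairwise distance matrix with row means and column means removed
  # and the grand mean added back (double centering).
  centered_dist <- function(v) {
    d <- as.matrix(dist(v))
    sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
  }

  a_mat <- centered_dist(x)
  b_mat <- centered_dist(y)

  dcov2  <- max(mean(a_mat * b_mat), 0)   # squared distance covariance
  dvar_x <- mean(a_mat * a_mat)           # squared distance variance of x
  dvar_y <- mean(b_mat * b_mat)           # squared distance variance of y

  sqrt(dcov2 / sqrt(dvar_x * dvar_y))
}

set.seed(7)
x <- runif(500, -1, 1)
dc_linear      <- dist_cor(x, 2 * x + 1)          # exact linear dependence
dc_quadratic   <- dist_cor(x, x^2)                # dependence Pearson misses
dc_independent <- dist_cor(x, runif(500, -1, 1))  # no dependence

print(c(linear = dc_linear, quadratic = dc_quadratic,
        independent = dc_independent))
```

Roughly, the measure asks whether pairs of points that are unusually close (or far apart) in X also tend to be unusually close (or far apart) in Y, which is why it picks up the quadratic but does not reward it as fully as a linear relationship.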

Corey, thanks for this link to Distance Correlation. That seems really useful for realistic problems, but I don’t see any way it could possibly distinguish between

a <- rnorm(100)

b <- rnorm(100)

vs

a <- rnorm(100)

c <- f(a)

when f(a) fits exactly through the points (a,b) ;-)

The ultimate such function is a pseudo-random number generator, which is just a special function that maps the sequence 0,1,2,3,… to something that passes vast numbers of tests for random numbers in the interval 0,1 ;-)

dcor((1:1000)/1000,runif(1000))

[1] 0.0441

Still, outside of actually hand-constructing pathology like this, dcor is likely to be the way to go to detect dependence.

It was the linear dependence parts that were fishy and reddish. ;-)

Reading further on Distance Covariance etc, its definition in Wiki is a little un-motivated, but it has something to do with how far away from each other two randomly chosen samples are. There’s an equivalent formulation in terms of brownian covariance that seems interesting.

Intuitively that definition in terms of brownian motion is transforming your variables of interest through a randomly chosen function which produces a kind of standardized version of the variable of interest. It then takes Pearson correlation of those standardized versions. The Pearson correlation instead takes the correlation of the identity applied to those two variables of interest.

The process of “scrambling” your variables of interest through two different brownian motions (and then re-centering them around 0) produces two variables that, if they were independent, would look like two independent normals, but if they tend to move together, they’d look like something else. I think I’m going to simulate this in R to see how it works.

Surprisingly hard to make sense of this brownian stuff. My biggest gripe is the E[XX’YY’] thing. I need to draw X and X’ from the same distribution and Y and Y’ from the same distribution. If you have a single sample, this means splitting your sample randomly in half for example, and then you have a 4 dimensional thing instead of a 2 dimensional one…. having a hard time getting a feel for what it does.

Has anyone mentioned the famous “Anscombe Quartet”?

http://en.wikipedia.org/wiki/Anscombe's_quartet

In essence, even limiting oneself to the Pearson Product Moment correlation, correlation doesn’t imply correlation, let alone cause.
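
For anyone who wants to check this, R ships the quartet as the built-in `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`). A short sketch: the four Pearson correlations are nearly identical even though plots of the four data sets look nothing alike.

```r
# Pearson correlation for each of the four Anscombe data sets.
r_quartet <- with(anscombe, c(
  set1 = cor(x1, y1),
  set2 = cor(x2, y2),
  set3 = cor(x3, y3),
  set4 = cor(x4, y4)
))

print(round(r_quartet, 3))
```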

This post is nicely correlated with this one:

http://hardsci.wordpress.com/2014/08/04/the-selection-distortion-effect-how-selection-changes-correlations-in-surprising-ways/

Can’t let a discussion on correlation go by without mentioning the classic XKCD take:

XKCD.com #552 and

XKCD.com #925

There’s more, and certainly lots of joy for math and science geeks, but these two seem most appropriate to this thread.

Sorry, didn’t know embedded links weren’t allowed. Here:

http://xkcd.com/552/

http://xkcd.com/925/

We need to look at a topological, not geometric correlation. This enables nonlinearities to be properly considered. The method is defined here:

http://www.scribd.com/doc/140999751/Non-Linear-Correlation-Coefficients

Causation requires knowledge of the conditional probability as well as the correlation. We propose a method of determining causation and compare it to Granger & Convergent Cross Mapping here:

http://www.scribd.com/doc/155404418/Causation

The crux of the problem is that for any given dataset, there are infinite unrelated/random datasets with which it correlates highly, but only comparably very few with which it correlates due to a causal relationship. Given a particular dataset X and a set D of all datasets with which it correlates (within some threshold), the probability that a dataset chosen at random from D has a causal relationship with X is 0.

The problem is affirming the consequent. “If A caused B, A will correlate with B.” This does not imply “A correlates with B, therefore A caused B.” There are lots of things that correlate with B, and A just happens to be one of them. Correlation is generally a necessary condition for causation (we’re not going down that rabbit hole) but it absolutely isn’t sufficient.

“If A caused B, A will correlate with B.” is false. See for example y=x^2.

See the discussion on distance correlation above.

I was going to write something… But I’ll just quote the Wiki article…

“Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be reflected by measuring the ability of predicting the future values of a time series using past values of another time series. Since the question of “true causality” is deeply philosophical, econometricians assert that the Granger test finds only “predictive causality”.”

http://en.wikipedia.org/wiki/Granger_causality

Which means that “causation” in econometrics etc. is really just correlation pushed into the future (this is how the test works). That can change at the drop of a hat (hello, Taleb!).

So, yeah, be careful. Smith is obviously a True Believer in this stuff. But let’s not have slightly more inquisitive minds accepting the premise that we can properly PROVE causation in economics.

A little late to the discussion….

Several questions,

Is the following useful – are causes correlated with outcomes? (sometimes, usually, under what conditions Yes, under what conditions No?)

Causation seems relevant in a perturbed system, but what does causation mean in a stable or dynamic system? Loss of a girder can cause a building to fall, but it’s meaningless to say its presence ‘causes’ the building to stand. Or a mutant gene can cause disease, but presence of a normal gene does not ‘cause’ health.

Is dependency a more meaningful concept? A building depends on the girder; health depends on a normal (not mutant) gene.

Just thinking out loud…..

I can see your point — and think it is a good one that we ought to take into account more often. But there are times when causality is an appropriate term — although we may not always be able to describe the causal “factor” precisely; what I am thinking of is that a combination of “subfactors” might validly be considered a “cause” — for example, the girder alone does not “cause” the building to stand (as you point out), but the combination of that girder and other materials, plus the way they are put together does seem (at least in some cases) to qualify as “the cause” of the building’s standing.

But in the “mutant gene” situation, the mutant gene itself may not “cause” disease, but the combination of the mutant gene with other gene variants or with environmental factors may “cause” disease.

Someone has very probably said this already… but I’d swap correlation for ‘association’ where association is a much more general substitution measure or metric (e.g. mutual information).