Can you write a program to determine the causal order?

Mike Zyphur writes:

Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes.

I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say.

Nonetheless, this might be of interest, so I’m passing it along to you.

30 thoughts on “Can you write a program to determine the causal order?”

  1. I’m missing something – why doesn’t that make sense? Temperature doesn’t cause altitude – so it must be that way around?

  2. I saw a talk about the temperature vs altitude example. I could DEFINITELY be mis-remembering, but in that talk, they made certain assumptions:

    1) There is an independent variable, on which the other might depend.
    2) The dependent variable was a smooth function of the independent one, plus random noise.
    3) The noise was unimodal.

    Under those assumptions, A probably didn’t cause B in the example, since there would be a bimodality at A=130.

    It was a talk by Bernhard Schölkopf at NIPS. I’m not really doing it justice here.

  3. The example seems totally straightforward to me, although I may be missing something.

    It’s very straightforward to manipulate either variable, and we can see if the other one responds:

    1. Going up in a balloon or a plane decreases the ambient temperature (“altitude causes temperature”).

    2. Turning on a heater or changing the albedo of an area doesn’t decrease altitude (“temperature doesn’t cause altitude”)

    Given those two facts, I’d be perfectly comfortable saying “altitude causes temperature.”

    What am I missing?

      • That’s fair, although many of the same processes would operate whether we lifted a thermometer in a balloon or whether we lifted the whole city.

        In fact, that experiment has basically been done already: lots of places have changed dramatically in altitude over time as mountain ranges and rift valleys have formed and worn down. After controlling for global climate changes over geological time, we consistently find that locations get colder when their altitude increases. During the mid-Cretaceous, when the Rockies didn’t exist yet, the spot on the map we call Denver, Colorado probably wasn’t any colder than the spot we call Lincoln, Nebraska. Smaller-scale “natural experiments” have been observed on human timescales as individual buildings and volcanoes increase in height.

        We also have mechanistic models that provide physical explanations for how weather and climate respond to altitude. In contrast, we lack mechanistic explanations for how temperature can affect altitude except in a few irrelevant cases (e.g. glaciers shrinking in height as the climate warms). These models are also consistent with the fact that daily, seasonal, and geological-scale variation in temperature doesn’t tend to raise or lower places appreciably.

        In short, I think that “we could cool down Berlin by putting a mountain underneath it” is a sensible thing to say and that “we could put a mountain under Berlin by cooling it off” isn’t.

        Whether we could make all these inferences from a single two-column data table with altitude and temperature and no knowledge of physics or geology is a different question.

        • Your last sentence is what’s bugging me. I can see why altitude “causes” temperature but how does one figure that out from two un-labelled columns of data?

          That to me sounds like magic.

        • Agreed. Except that some apparently smart people think it’s possible. So *maybe* they know something we don’t.

        • And they’ve put $5,000 where their mouth is. So they know something I don’t, or they have money they don’t have anything to spend on.

      • They’re not going up in a balloon, but the fact that one would record decreasing temperatures as one went up in a balloon means that high altitude “causes” low temperature, doesn’t it? Are you saying that altitude does not have a causal effect on temperature, or just that this data set does not demonstrate it? If it’s the former, then are you unconvinced by other experiments where people did go up in balloons? If it’s the latter, are you saying that there is not a concept of a causal relationship between two random variables outside of their observation in any given data set?

        The point of the exercise is to try to reduce causal inference to a classification problem. The idea seems to be that scatterplots of causal relationships might tend to look certain ways (or not look certain ways) empirically. Personally, I’m agnostic/skeptical, but it’s certainly an empirical question.

        • In another comment below you say that you think Pearl’s framework is reasonable. You write “Basically, if it is possible to intervene in such a way as to change X directly, then one can ask whether X causes Y.” Can you explain why going up in a balloon is not intervening in such a way as to change altitude directly?

        • My comment was flippant (of course there is a causal effect!), but it does point at a genuine difficulty: any argument based on considering altitude as an independent variable (set arbitrarily) and temperature as a dependent variable (measured after setting the independent variable) can be inverted. What is to stop us from considering the balloon as a mechanism for setting temperature, and then measuring altitude?

    • Altitude doesn’t cause temperature any more than temperature causes altitude. A similar example would be to plot “time on the road” versus “distance travelled” for a bunch of motorists. Which causes the other?

      In the cities example the two variables are symmetrical: if you change altitude by going to a higher altitude city, you will probably measure a lower temperature; if you change temperature by going to a colder city, you will probably measure a higher altitude. Going up in a balloon or turning on a heater are not options in this setup. Nor is going up in a heated balloon (which presumably would have led you to conclude that altitude doesn’t cause temperature?).

      The interesting thing in their plot is that it seems to have points corresponding to two types of cities: one type displays a straightforward negative correlation between altitude and temperature; the other consists of cities close to sea level at various temperatures (no correlation with altitude). On visual inspection this seems to be the only feature of the plot that could have led them to their asymmetrical conclusion – I have no idea why they think this should support a causal conclusion in one direction or the other.

      • In your reasoning are there any pairs of variables at all where “X causes Y” makes sense? Which? Any examples?

        • Certainly – I think Pearl’s framework (as explained in his book “Causality”) makes complete sense. Basically, if it is possible to intervene in such a way as to change X directly, then one can ask whether X causes Y. In the cities example, we cannot intervene to change either the altitude or the temperature of a city.

        • The cities are just points in 3 dimensional space with associated temperatures. I agree that the cities data does not prove a causal relationship between altitude and temperature. It’s beyond me why you think that the balloon experiments wouldn’t satisfy Pearl’s framework, however. If you accept that altitude and temperature are causally related but still object, are you saying that whether there is a causal relationship between altitude and temperature varies depending on what particular dataset you happen to be looking at?

        • See my comment above – I’m not really arguing that there’s no causal relationship between altitude and temperature in general, but in the cities example I do think a stronger claim than “not proven by the data” is warranted.

          The exact definition of the variables matters: which actions count as interventions (changing the value of a variable), and which count as switching to a measurement of a different variable? In my view the cities example does not allow _any_ interventions, only switching between different measurement pairs (i.e. different variables – instead of a single variable “altitude”, we have altitude-of-Berlin and altitude-of-Munich). We can cherry-pick the city to make it appear as if we are adjusting X (or Y) but that’s not the same as direct intervention. So in this context I’m not sure the claim that altitude causes temperature is well-defined.

  4. There is statistics and there is physics (in this case, specifically atmospheric physics):
    See lapse rate, which says (subject to lots of caveats):
    “In the lower regions of the atmosphere (up to altitudes of approximately 40,000 feet [12 km]), temperature decreases with altitude at a fairly uniform rate. Because the atmosphere is warmed by conduction from Earth’s surface, this lapse or reduction in temperature is normal with increasing distance from the conductive source.

    Although the actual atmospheric lapse rate varies, under normal atmospheric conditions the average atmospheric lapse rate results in a temperature decrease of 3.5°F/1,000 ft (6.4°C/km) of altitude.”

    Or see this or this.

    Cities near coasts (or even near really large lakes) can have quite different temperature characteristics than those further away, even with not much difference in altitude.

    Anyone who does alpine skiing knows whether or not it is generally colder a few thousand feet above the base, and whether or not mountains have snow at the top or the bottom.

    Finally, if an urban area is heated a few degrees (as per Urban Heat Island), it does NOT gain hundreds of feet of altitude.
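
    As a rough worked example of the quoted figure, here is a minimal Python sketch of the arithmetic. It assumes the average lapse rate of 6.4°C/km applies uniformly, which the quote itself says holds only subject to caveats; the function names and the 500 m altitude difference are made up for illustration.

```python
# Average lapse rate quoted above: about 6.4 °C per km (3.5 °F per 1,000 ft).
LAPSE_RATE_C_PER_KM = 6.4

def expected_cooling_c(altitude_gain_m: float) -> float:
    """Expected temperature drop (°C) for a given altitude gain in metres."""
    return LAPSE_RATE_C_PER_KM * altitude_gain_m / 1000.0

def fahrenheit_per_1000ft(rate_c_per_km: float) -> float:
    """Convert a lapse rate from °C per km to °F per 1,000 ft."""
    km_per_1000ft = 0.3048
    return rate_c_per_km * km_per_1000ft * 9.0 / 5.0

# A hypothetical city 500 m higher than its neighbour:
print(round(expected_cooling_c(500), 1))       # 3.2
# Cross-check that the two quoted units agree:
print(round(fahrenheit_per_1000ft(6.4), 1))    # 3.5
```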

  5. http://www.kaggle.com/c/cause-effect-pairs shows a scatter plot with horizontal and vertical axes A and B and asks whether A causes B or B causes A. The data are distributed roughly in the shape of a V on its side.

    You say “… the goal is to figure out what the test-writer wanted you to say”. In other words, figure out the assumptions the test writer is making.

    Some assumptions that come to mind are
    – If x causes y then y will be a single-valued function of x.
    – The errors in the x and y measurements have unimodal distributions.

    With these assumptions, the answer the test-writer would be looking for would be that B causes A.
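
    To make those two assumptions concrete, here is a minimal Python sketch of the single-valued heuristic. It is only an illustration, not the competition's method: the data are simulated in a sideways-V shape, and `conditional_spread`, `guess_direction`, the bin count and the noise level are all hypothetical choices.

```python
import random
import statistics

def conditional_spread(xs, ys, n_bins=20):
    """Average within-bin std of ys (binned by xs), relative to the overall std of ys.

    Values near 0 mean ys looks like a single-valued function of xs plus small noise.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for x, y in zip(xs, ys):
        k = min(int((x - lo) / width), n_bins - 1)
        bins.setdefault(k, []).append(y)
    # Ignore nearly empty bins, whose spread estimate would be unreliable.
    spreads = [statistics.pstdev(v) for v in bins.values() if len(v) >= 5]
    return statistics.mean(spreads) / statistics.pstdev(ys)

def guess_direction(a, b):
    """Guess the causal order using only the single-valued heuristic."""
    b_given_a = conditional_spread(a, b)   # how function-like is B of A?
    a_given_b = conditional_spread(b, a)   # how function-like is A of B?
    return "A causes B" if b_given_a < a_given_b else "B causes A"

# Simulated sideways V: A is a single-valued function of B, but not the reverse.
rng = random.Random(0)
b = [rng.uniform(-1, 1) for _ in range(2000)]
a = [abs(v) + rng.gauss(0, 0.05) for v in b]
print(guess_direction(a, b))   # B causes A
```

    On this simulated data the sketch agrees with the answer above, but that says nothing about how well the heuristic holds up on real pairs.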

    • I question your first bullet point: In real world data there probably will be several cities that have the same altitude yet different yearly average temperatures (since temperature does not depend on altitude alone).

      There also will be cities that have the same temperature yet different altitudes.

      Won’t that mess up the single valued function heuristic for causality?

      • It will always be possible to come up with special situations that confound any heuristic for inferring causality from observational data. But it may be the case that *empirically* causal scatterplots tend to look (or not look) certain ways. We could potentially learn some ways to classify scatterplots as causal or not (or classify the direction of causality) that work well in practice and could be used as one tool among many to search for causal relationships, possibly as part of a large scale screening mechanism.

      • I guess it would be more accurate to talk about a pair of observed variables (A and B) and a group of unobserved variables C, D, etc. Assume that there is a causal relationship between A and B, and the dependent variable (we don’t know if it’s A or B) also depends on C, D, etc, but their effects are weaker than the effect of the other observed variable.

        Unobserved variables introduce noise whose magnitude is related to the share of variance explained by those variables.

        If we see that B is a non-single-valued function of A and, for some A, values of B take two values far enough apart compared to the overall magnitude of noise, it is likely that B causes A.

        If the relationship is single-valued, one other approach would be to look at the level of noise at different points of A-B curve. If A=f(B) + gaussian noise, then B=f^-1(A) + non-gaussian noise whose magnitude depends on the derivative of f(x).
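
        A minimal Python sketch of that last point, under stated assumptions: the mechanism f(b) = b + b^3, the noise level and the helper names are invented for illustration, and f is assumed known so that its inverse can be computed by bisection. It simulates A = f(B) + gaussian noise and compares how the residual spread changes along the curve in each direction.

```python
import random
import statistics

def f(b):
    """Hypothetical smooth, strictly increasing mechanism (an assumption)."""
    return b + b ** 3

def f_inv(a, lo=-3.0, hi=3.0):
    """Invert f by bisection on [lo, hi], where f is strictly increasing."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = random.Random(1)
B = [rng.uniform(-1, 1) for _ in range(10000)]
A = [f(b) + rng.gauss(0, 0.1) for b in B]     # A = f(B) + gaussian noise

forward = [(b, a - f(b)) for a, b in zip(A, B)]         # residual of A given B
backward = [(a, b - f_inv(a)) for a, b in zip(A, B)]    # residual of B given A

def spread(pairs, lo, hi):
    """Std of the residuals whose conditioning value x satisfies lo <= |x| < hi."""
    return statistics.pstdev([r for x, r in pairs if lo <= abs(x) < hi])

# Compare residual spread where f is flattest (near 0) with where it is steeper.
forward_ratio = spread(forward, 0.0, 0.3) / spread(forward, 0.7, 1.0)
backward_ratio = spread(backward, 0.0, 0.3) / spread(backward, 1.0, 1.6)
print(round(forward_ratio, 2), round(backward_ratio, 2))
```

        In the direction A = f(B) + noise the residual spread is about the same everywhere, so the first ratio is close to 1. In the direction B = f^-1(A) + noise the spread is larger where f is flat and smaller where f is steep, so the second ratio is well above 1.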

        • Can you expand on your last paragraph?

          I was thinking that
          – variable a depends on variable b. a = f(b)
          – A and B are measurements of a and b
          – there are errors in the measurements of a and b, not in the relation between a and b.

        • I was thinking about this the wrong way. If I think of it in the following way then I get your answer

          – b causes a means that if you change b then a will change.

          Therefore the way to think of the A, B data being created is

          – b is set to value B
          – a = f(b)
          – a is measured as A

          Therefore

          A = f(B – eB) + eA

          where
          – eB is the error in setting b to B
          – eA is the error in measuring A
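
          The steps above can be written as a small Python sketch. The mechanism `f`, the error standard deviations and the name `generate_pair` are placeholders chosen for illustration, and gaussian errors are an extra assumption that the steps themselves do not make.

```python
import random

def generate_pair(B, f, sd_eB, sd_eA, rng):
    """Generate one measured value A for a requested setting B.

    b is set to B with error eB, a = f(b), and a is measured as A with
    error eA, so that A = f(B - eB) + eA.
    """
    eB = rng.gauss(0, sd_eB)   # error in setting b to B
    eA = rng.gauss(0, sd_eA)   # error in measuring a
    b = B - eB                 # value of b actually realised
    a = f(b)                   # the causal mechanism
    return a + eA              # the recorded measurement A

def f(b):
    """Hypothetical mechanism, chosen only for illustration."""
    return 2.0 * b + 1.0

rng = random.Random(0)
settings = [0.0, 0.5, 1.0, 1.5, 2.0]   # requested values B
measured = [generate_pair(B, f, 0.05, 0.1, rng) for B in settings]
print(list(zip(settings, [round(A, 2) for A in measured])))
```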

        • I’m thinking that measurement errors are usually much less important than unaccounted-for causal effects (and, in many cases, they _are_ unaccounted-for causal effects.)

          In the altitude vs. temperature example, we can measure the altitude with the accuracy of fractions of a centimeter and the annual temperature with the accuracy of 0.1 C or better. If we don’t get a perfect relationship between the two, it’s not because we need to improve our measurements, but because there are confounding factors. For example, at any given altitude, mean January temperature in Germany increases south to north (as you get closer to the sea), mean July temperature increases north to south (away from the sea and toward lower latitudes), and mean annual temperature does something complicated but, generally, decreases west to east (away from the Atlantic).

  6. The phrase “X causes Y” is a major causal factor in creating confusion.

    Here, the science says that for average yearly temperature by city (over a useful period, say 15 years), there are many causal factors, and climate scientists often talk about “attribution problems.”

    1) Latitude matters, higher = cooler.

    2) Altitude matters, higher = cooler.

    3) Nearby geography matters, especially large bodies of water, but also the upwind geography and wind patterns: Los Angeles is very different when wind comes from East versus the usual West.

    4) Sulfate aerosol plumes matter: if the city is in one, say by being downwind from big coal plants or (if there were some) persistent volcanoes, that’s a cooling factor, as in China.

    5) Cities with buildings painted white or with high-reflecting paint, with more trees, roof greenery and fewer air conditioners will tend to be cooler than otherwise-same ones that don’t. I.e., Urban Heat Island effects vary.

    6) While CO2 is generally well-mixed, some areas (say, with a lot of cars) can have much higher local CO2, which raises temperatures, as per Jacobson.
    ETC

    Still, in this case, given that it is Germany, the latitude doesn’t vary much, emission rules keep the worst of the sulfates down, etc, so at least some of the confounding factors are lower than if one threw in equatorial cities.

    But still, seeing such a graph, the first thing one might want to do is ask if there are two populations of cities, separate them into clusters, and analyze them separately. A very similar effect is seen in computer performance analysis, in which vector computers can have wildly-different performance characteristics compared to typical scalar machines.
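
    A minimal Python sketch of that workflow, using simulated cities rather than the real data: the two groups, their altitude ranges, the noise levels and the helper names are all invented for illustration, and the split uses a simple 1-D two-means clustering on altitude alone.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def two_means_cut(values, iters=50):
    """1-D two-means clustering; returns the cut point between the two groups."""
    c_lo, c_hi = min(values), max(values)
    for _ in range(iters):
        cut = (c_lo + c_hi) / 2
        c_lo = statistics.fmean([v for v in values if v <= cut])
        c_hi = statistics.fmean([v for v in values if v > cut])
    return (c_lo + c_hi) / 2

rng = random.Random(42)
cities = []   # (altitude in m, mean temperature in °C), simulated
for _ in range(150):   # lowland group: temperature unrelated to altitude
    alt = rng.uniform(0, 60)
    cities.append((alt, rng.gauss(9.0, 0.6)))
for _ in range(150):   # upland group: temperature falls with altitude
    alt = rng.uniform(400, 900)
    cities.append((alt, 10.0 - 6.4 * alt / 1000 + rng.gauss(0, 0.4)))

cut = two_means_cut([alt for alt, _ in cities])
low = [(a, t) for a, t in cities if a <= cut]
high = [(a, t) for a, t in cities if a > cut]

# Analyze the pooled data and each cluster separately.
print("pooled  r =", round(pearson(*zip(*cities)), 2))
print("lowland r =", round(pearson(*zip(*low)), 2))
print("upland  r =", round(pearson(*zip(*high)), 2))
```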

  7. Causal analysis without context is pure nonsense. Remember Nancy Cartwright’s law, “No causes in, no causes out.” One can always torture a simultaneous equations model to reproduce any data set in any causal order. For example, it is possible to write an SEM where altitude is a function of temperature, if we choose the error process appropriately. And don’t even get me started on the issue of omitted variables!
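
    A minimal Python sketch of that point, restricted to the simplest linear case and using simulated data (the variable names, coefficients and noise level are invented for illustration). Ordinary least squares can be run in either direction, and each fitted equation, together with its own error term, reproduces the data equally well.

```python
import random
import statistics

def ols(xs, ys):
    """Least-squares fit ys = intercept + slope * xs + residual."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals

def r_squared(ys, residuals):
    """Share of the variance of ys accounted for by the fitted line."""
    my = statistics.fmean(ys)
    return 1 - sum(r * r for r in residuals) / sum((y - my) ** 2 for y in ys)

rng = random.Random(7)
altitude = [rng.uniform(0.0, 1.0) for _ in range(500)]   # km, simulated
temperature = [10.0 - 6.4 * a + rng.gauss(0, 1.0) for a in altitude]

# Equation 1: temperature as a function of altitude plus an error term.
_, _, e_temp = ols(altitude, temperature)
# Equation 2: altitude as a function of temperature plus an error term.
c, d, e_alt = ols(temperature, altitude)

# Equation 2 reproduces the altitude data once its error term is included.
rebuilt = [c + d * t + u for t, u in zip(temperature, e_alt)]
max_gap = max(abs(x - y) for x, y in zip(rebuilt, altitude))
r2_forward = r_squared(temperature, e_temp)
r2_backward = r_squared(altitude, e_alt)
print(max_gap, round(r2_forward, 6), round(r2_backward, 6))
```

    Both directions give the same R², so goodness of fit alone cannot pick the causal order in this linear setting.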
