Raul:

I agree about the importance of pluralism, and I wish that Pearl and Mackenzie had been able to write about their approach without trying to put down other approaches. It seems silly for things to be framed as “causal vs. statistical,” given that we use statistical models to study causal inference! That said, I wouldn’t label what Pearl and Mackenzie wrote as “childish”; I just think they are somewhat misinformed about some topics outside their field of interest.

The “Book of Why” collects many already-published works and presents them to a more general public in a friendly style. It mentions, in different parts, how causal inference, e.g. using graphs, helps researchers analyze a problem without so much “mental gymnastics”. Furthermore, it mentions cases where researchers have been successful without causal inference and cases where they have not. By no means should it be interpreted as “fighting against the statistics establishment”.

It is important to keep in mind that one of Pearl’s purposes has been to develop an approach that will eventually be useful to AI, in an easier manner, without so much “mental gymnastics”, or at least to lay the foundations for it.

My point is simple: Reading in other people’s subject areas should be an eye opener, at least professionally. I’ve noticed how leading authors of a subject tend not to cite criticism from “the other side”.

Being a causalista (or whatever it is called), I’ve noticed how even leading authors in causal inference avoid some important papers in econometrics and the other way around. The same happens in statistics vs causal inference.

Instead of entrenching our thinking, we should be happy to be exposed to ideas outside our comfort zone. Are we getting into another childish “Causal vs Statistical” dispute, equivalent to the “Bayes vs frequentist” one?

If you want to read a fascinating story of smoking and statistics then read this book: “The Emperor of All Maladies: A Biography of Cancer” by Siddhartha Mukherjee.

Steven:

I followed the link. It’s interesting to read this, as it seems that this person and I are interested in entirely different sorts of problems. The causal inference problems that I work on do not work the way he describes—as discussed above, for most of the variables in the problems I study, the idea of a “do operator” doesn’t really make sense. But, then again, perhaps the problems that he works on would not be amenable to the approaches described by Angrist and Pischke, or my work with Jennifer. Different methods for different sorts of problems, perhaps.

Judea,

Any suggestions on how to analyze this simple toy problem?

Let’s take the simplest causal problem possible, say a Markov chain X —>Z—>Y with X standing for Education, Z for Skill and Y for Salary.

Is it a simple problem? Like, how would you define Skill, Education and even Salary to be measurable and comparable?

Terry:

Pearl and Mackenzie wrote a book and sent it to me, and I reviewed it. There were some things I liked about the book, some things I disagreed with, and some clear mistakes. There’s no cat and mouse here. Pearl, Mackenzie, and anyone else can solve whatever problems seem important to them; meanwhile I’ll solve the problems that seem important to me, which include some problems of causal inference and some problems that don’t involve causal inference.

Feel free to ignore my comment above. It is an argument from ignorance. Perhaps Judea has explained at great length how he would handle this problem in his books (which I have only skimmed).

Let’s take the simplest causal problem possible, say a Markov chain X —>Z—>Y with X standing for Education, Z for Skill and Y for Salary.

I’m confused by why neither of you wants to explain how this toy problem could be handled. By anybody. Using any technique. Under any assumptions.

Shouldn’t it be possible for at least one of you to say something like, “I would use Joe Schmo’s technique of blah-blah to analyze this, specifically I would implement Schmo’s technique by assuming … and running …”?

Do none of the techniques mentioned by Andrew successfully handle this problem? Is the problem inadequately specified so that a lot of assumptions have to be made? Is that Judea’s point, that a lot of issues are swept under the rug because current techniques are too simple?

And why doesn’t Judea tell us what existing analytical methods are appropriate for the toy problem? It looks like about as simple a causality problem as can be proposed. Are there no existing techniques that can adequately handle it? If so, how is Judea’s causality analysis helpful if it judges all known techniques to be inadequate? Is it a useful method of identifying and raising issues assumed away by existing techniques? Does it point us to additional information or techniques that would make the problem tractable?

I just don’t understand this cat-and-mouse back and forth.

Nathan:

My view is similar to that of many other statisticians and econometricians. I view Pearl’s formalism as a way to help people understand certain causal structures that can also be expressed using traditional statistical models. I recognize that many people find this structure to be helpful. I’ve also seen various claims, for example that causal structure can be discovered from data analysis alone. I don’t think those claims make sense, for various reasons including what I say on pages 960-962 of this article.

I’m happy to agree to disagree because I think that’s the only possibility. To say that “maybe Pearl is right that his tools are essential” . . . That just makes no sense to me. We solve causal inference problems all the time without Pearl’s tools. Beyond that, no tools are essential. I’ve done a lot of research on Bayesian data analysis, but I don’t think Bayesian data analysis is essential; I just think it is useful. The utility of any tool depends also on who is using it. I welcome Pearl’s book in part because I know that many people do find his tools compelling.

Regarding toy problems: Pearl and I have had such discussions in the past but they have gone in circles. See for example the discussion I refer to in this comment elsewhere in the thread. I find the whole thing exhausting. Again, though, I recognize that many people find Pearl’s ideas appealing, and I did think there was a lot of interesting stuff in his book. It’s common for someone to have a mix of good and bad ideas—this is not at all unique to Pearl and his collaborators!—and so “agreeing to disagree” often makes sense, I think. It’s not as if there are any realistic alternatives here!

What Matt says below.

do-calculus might be necessary to solve certain toy problems, and it offers a formal language for looking at causality, but we can think of other examples where scientific consensus on causation (whether by ‘inference to best explanation’ or whatever) was achieved without the do-calculus, e.g. smoking and lung cancer

Dear Andrew,

Is it really a stretch of the imagination to say that mu causes y in your simple example?

Fancifully, mu causes location and sigma causes dispersion.

Can we think of mu as a “proximate cause” in a chain of causation?

An attempt at a scientific explanation would render mu as a conditional mean depending on putative causes.

The model parameters instantiate the nature of the connection.

If this line of reasoning is valid, the same DAG that is used to organize causal queries (following Pearl) can be augmented to organize the Bayesian analysis of the statistical model (following Gelman).

This approach is giving me very satisfying results in applications to semiconductor design, manufacturing, test and reliability.

I am uncomfortable including a variable in a model just because it is correlated to an effect of interest.

In my applications, I get the best results with putative causes organized as a DAG.

Best regards,

Jeff

Judea:

Let me clarify. I have worked on problems that involve causal inference, and there we have identification strategies (some combination of assumptions, data collection, and modeling). I’ve worked on other problems that do not involve causal inference, and for those we do not require causal identification. I don’t think we can do identification using statistical techniques without causal structure or assumptions, and I would not want to leave the impression that I think that. Indeed, I’ve consistently been critical of statistical methods that have been advertised as being able to discover causal structure using data alone.

Joshua,

Your characterization of the two efforts is accurate. Up to the point where you say:

“Note that you don’t need a causal model to answer associational / “statistical” queries”

This is true, but you need a causal model to decide what you need to estimate. You can’t start the estimation process before receiving instructions from the identification process. So, how do statisticians survive? They estimate convenient quantities, and make believe they are engaged in “causal inference” because if anyone catches them cheating, they can always post-justify what they did by finding assumptions that will make it Kosher.

You mention three limitations to SCM: (1) nonparametric, (2) acyclicity, (3) large sample.

(1) True, although graphical models are revolutionizing linear SEM as well.

(2) True, but you cannot manage cycles with PO or with statistical techniques.

(3) True, but you can’t do any finite sample inference if you do not have an estimand to estimate.

As to “There are other methods that you might not consider to be fully rigorous,” I do not insist on rigor; I insist, however, on stating those other methods, however handwaving they are, and relating them to what rigor dictates, so that we know what approximations were made.

Finally: “it’s unclear how to use your methods for the problems we’re interested in.” Really? All it takes is to examine the estimand that comes out of our inference engine and try to estimate it with your powerful statistical methods, rather than pretend that you don’t care about the estimand, because whatever you estimate “seems to be working”.

The main issue is how Andrew and his team can estimate things without an estimand, namely without doing identification or borrowing an estimand from graphical models. Andrew answers it: “I find it baffling that Pearl and his colleagues keep taking statistical problems and, to my mind, complicating them by wrapping them in a causal structure”. In other words, he thinks he can do identification without causal structure, using statistical techniques, or wish away the need for identification, continue estimating what is easily estimated and then write: “identification is important, we need more books about it”. I don’t get the logic.

Brian,

I could not resist a comment on your post. Rubin’s manipulability restriction is unnecessary.

This paper explains why:

R-483 Pearl, “Does Obesity Shorten Life? Or is it the Soda? On Non-manipulable Causes,”

https://ftp.cs.ucla.edu/pub/stat_ser/r483-reprint.pdf

https://ucla.in/2EpxcNU

Journal of Causal Inference, 6(2), online, September 2018.

Moreover, SUTVA is needed only for orthodox PO folks who do not speak structure.

Otherwise, it is automatically satisfied in the structural interpretation of counterfactuals,

Accessible here: https://ucla.in/2G2rWBv

A drastic revision of Stone Age PO is in order.

They both fought for truth but when their methods clashed, “Kah-BLAMMO!” … Rhetorical CONFLICT! … /munch, munch, munch (add salt; sorry you pseudo-experimentalists), /munch, munch, munch. / PAUSE. / run to kitchen, … microwave. Pop, pop, pop. Crossing fingers hoping for more. /vaults over sofa back and settles back in.

By ‘mechanistic models’ and mathematical biology I mean eg

https://www.springer.com/gp/book/9780387952239

Is this sort of thing orthogonal to your goals? Which again, I take as constructing estimators from empirical data as guided by qualitative ‘causal’ info?

Thanks :-)

I find it interesting that a seemingly ‘causal’ notion relies on such a statistical notion as having a random sample from a population.

Do all of your usual ‘internal’ validity methods require assumptions on sampling mechanisms?

As I mentioned elsewhere, I’m more used to things like conservation of mass and energy, which usually don’t suffer from transportability problems (I suppose actually you can derive these laws from ‘transportability’ assumptions à la Noether!), so I find this all quite foreign, but very interesting.

To me when I think ‘causal’ I think eg conservation equations. The closest ‘philosophical’ account I know of is Dowe’s ‘physical causation’

https://www.cambridge.org/core/books/physical-causation/D056895488F735AC513E455D3683497F

Is it fair to say that you are more interested in constructing estimators from empirical data than say building ‘mechanistic’ models like those found in something like mathematical biology?

ojm,

Yes, the standard frontdoor adjustment for unknown confounders assumes we have a representative (eg random) sample from our target population.

If you suspect disparity between target and study population, express your suspicions in a Selection Diagram and turn the transportability engine on. The answer will come back in seconds. See https://ucla.in/2N7S0K9 It depends of course on HOW the two populations differ; some differences can be ignored and others may be detrimental.

[BTW, Andrew could not forgive me for stating that “the problem of external validity has not progressed an iota since Campbell and Stanley”. I hope you see how damn right I was.]
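For readers following along, the standard front-door adjustment mentioned at the top of this comment can be written out as below. This is a sketch under the usual textbook assumptions, which the comment itself does not spell out: a mediator Z intercepts every directed path from X to Y, there is no unblocked back-door path from X to Z, and all back-door paths from Z to Y are blocked by X. The symbols x, x', y, z are generic placeholders.

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

Every term on the right-hand side is an observational quantity, which is why the question of whether the sample represents the target population matters: the formula is only as good as the match between those observed distributions and the population of interest.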

CK,

To identify the interaction, we need to identify the quantity P(y|do(x,z)) for at least four values of x and z, and check whether the difference P(y|do(x,z)) - P(y|do(x’,z)) depends on z.

The first identification is an exercise in do-calculus, for which we have a complete algorithm once you write down the graph. If the backdoor condition holds, it becomes a difference between two regression expressions.

For a gentle introduction, see https://ucla.in/2KYYviP
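As a rough illustration of the recipe above, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not something stated in the thread: binary X, Z and Y, a single observed binary confounder W that satisfies the backdoor condition (W -> X, W -> Z, W -> Y), and made-up probabilities. The helper name `p_do` is hypothetical. It estimates P(y|do(x,z)) by adjusting for W and then checks whether the effect of X differs across the two values of Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Assumed data-generating process (illustrative numbers only).
w = rng.integers(0, 2, size=n)            # observed confounder W
x = rng.binomial(1, 0.3 + 0.4 * w)        # W -> X
z = rng.binomial(1, 0.6 - 0.3 * w)        # W -> Z
# Y depends on X, Z, their product (the interaction) and W.
p_y = 0.1 + 0.2 * x + 0.1 * z + 0.3 * x * z + 0.2 * w
y = rng.binomial(1, p_y)


def p_do(x_val, z_val):
    """Backdoor-adjusted estimate of P(Y=1 | do(X=x_val, Z=z_val)).

    Computes sum_w P(Y=1 | x, z, w) * P(w) from the observed sample.
    """
    total = 0.0
    for w_val in (0, 1):
        cell = (x == x_val) & (z == z_val) & (w == w_val)
        total += y[cell].mean() * (w == w_val).mean()
    return total


# Effect of X on Y at each level of Z, under intervention.
effect_z0 = p_do(1, 0) - p_do(0, 0)
effect_z1 = p_do(1, 1) - p_do(0, 1)
interaction = effect_z1 - effect_z0

print(f"effect of X when Z=0: {effect_z0:.3f}")
print(f"effect of X when Z=1: {effect_z1:.3f}")
print(f"difference (interaction): {interaction:.3f}")
```

In this simulated setup the two effects differ by construction, so the difference should come out clearly away from zero; with real data the same comparison would carry sampling uncertainty that this sketch does not quantify.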

This is my favorite comment. I would just add that none of us could do non-stupid statistics all the time. Which means every one of us could benefit from Pearl’s insights.

Good share.

Causal Salad is in strong contention for my phrase of 2019!

Judea:

I disagree with your implicit claim that, before your methods were developed, scientists were not able to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave. I doubt much will be gained by discussing this particular point further so I’m just clarifying that this is a point of disagreement.

Also, I don’t think in my review I portrayed you as thoughtless. My message was that your book with Mackenzie is valuable and interesting even though it has some mistakes. In my review I wrote about the positive part as well as the mistakes. Your book is full of thought!

So one group did a causal inference study. And they showed that yes, marijuana did have a slight causal relation to schizophrenia.

Then another group did a similar study but looking at arrows going in *both* directions. And they found that the schizophrenia leading to marijuana use causal arrow was so strong (quantitatively) as to make them look at the arrow going the other way as a possible error.

So causal inference eventually showed the way to what most folks knew already, but that years of stats and ‘an association found’ hadn’t been able to pinpoint. :)

study 1 (biased arrow, but indeed a strong look at the causal link from marijuana to schizophrenia): https://www.nature.com/articles/mp2016252

study 2 (both arrows and mendelian randomization, showing what may be the use of marijuana to *relieve* issues with schizophrenia): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5341491/

I do agree that the book’s savaging of statistics is at cross-purposes with trying to get people to use the techniques. Thanks for the blog review.

Somewhat related:

Does the standard frontdoor adjustment for unknown confounders assume we have a representative (eg random) sample from our target population?

Eg what assumptions are made about the connection between the distribution of the unmeasured vars in the target population and that in the data available?

Judea,

How do we assess whether X and Z interact to cause Y, and whether this interaction effect is identified from the observed data?

Andrew,

Agree to division of labor: causal inference on one side and statistical analysis on the other.

Assuming that you give me some credibility on the first, let me try and show you that even the publisher advertisement that you mock with disdain is actually true and carefully expressed. It reads: “Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave”.

First, note that it includes “Pearl and others”, which theoretically might include the people you have in mind. But it does not; it refers to those who developed mathematical formulation and mathematical tools to answer such questions. So let us examine the first question: “whether a drug cured an illness”. This is a counterfactual “cause of effect” type question. Do you know when it was first formulated mathematically? [Don Rubin declared it non-scientific]. Now let’s go to the second: “when discrimination is to blame for disparate outcomes,” This is a mediation problem. Care to guess when this problem was first formulated (see Book of Why chapter 9) and what the solution is?

Bottom line, Pearl is not as thoughtless as your review portrays him to be and, if you advise your readers to control their initial reaction, “Hey, statisticians have been doing it for centuries”, they would value learning how things were first formulated, first solved and why statisticians were not always the first.

Judea:

I’ve attacked a lot of toy problems.

For an example of a toy problem in causality, see pages 962-963 of this article.

But most of the toy problems I’ve looked at do not involve causality; see for example this paper, item 4 in this post, and this paper. This article on experimental design is simple enough that I think it could count as a toy problem: it’s a simple example without data which allows us to compare different methods. And here’s a theoretical paper I wrote a while ago that has three toy examples. Not involving causal inference, though.

I’ve written lots of papers with causal inference, but they’re almost all applied work. This may be because I consider myself much more of a practitioner of causal inference than a researcher on causal inference. To the extent I’ve done research on causal inference, it’s mostly been to resolve some confusions in my mind (as in this paper).

This gets back to the division-of-labor thing. I’m happy for you and Imbens and Hill and Robins and VanderWeele and others to do research on fundamental methods for causal inference, while I do research on statistical analysis. The methods that I’ve learned have allowed my colleagues and me to make progress on a lot of applied problems in causal inference, and have given me some clarity in understanding problems with some naive formulations of causal reasoning (as in the first reference above in this comment).

As I wrote in my above post, I think your book with Mackenzie has lots of great things in it; I just can’t go with a statement such as, “Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave”—because scientists have been answering such questions before Pearl came along, and scientists continue to answer such questions using methods other than Pearl’s. For what it’s worth, I don’t think the methods that my colleagues and I have developed are *necessary* for solving these or any problems. Our methods are helpful in some problems, some of the time, at least until something better comes along—I think that’s pretty much all that any of us can hope for! That, and we can hope that our writings inspire new researchers to come up with new methods that are useful in the future.

Andrew,

Convergence is in sight, modulo two corrections:

1. You say:

“You [Pearl] have toy problems that interest you, I [Andrew] have toy problems that interest me. … I doubt you’re particularly interested in the problems I focus on.”

Wrong! I am very interested in your toy problems, especially those with causal flavor. Why? Because I love to challenge the SCM framework with new tasks and new angles that other researchers found to be important, and see if SCM can be enriched with expanded scope. So, by all means, if you have a new twist, shoot. I have not been able to do it in the past, because your shots were not toy-like, e.g., 3-4 variables, clear task, with correct answer known.

2. You say:

“you continue to characterize me as being frightened or lacking courage”

This was not my intention. My last remark on frightening toys was general: everyone is frightened by the honesty and transparency of toys — the adequacy of one’s favorite method is undergoing a test of fire. Who wouldn’t be frightened? But, since you prefer, I will stop using this metaphor.

3. Starting afresh, and for the sake of good spirit: How about attacking a toy problem? Just for fun, just for sport,

Judea:

I think we agree on much of the substance. And I agree with you regarding “not all econometrics” (and, for that matter, not all of statistics, not all of sociology, etc.). As I wrote in my review of your book with Mackenzie, and in my review of Angrist and Pischke’s book, causal identification is an important topic and worth its own books.

In practice, our disagreement is, I think, that we focus on different sorts of problems and different sorts of methods. And that’s fine! Division of labor. You have toy problems that interest you, I have toy problems that interest me. You have applied problems that interest you, I have applied problems that interest me. I would not expect you to come up with methods of solving the causal inference problems that I work on, but that’s OK: your work is inspirational to many people and I can well believe it has been useful in certain applications as well as in developing conceptual understanding. I consider toy problems of my own for that same reason. I’m not particularly interested in your toy problems, but that’s fine; I doubt you’re particularly interested in the problems I focus on. It’s a big world out there.

In the meantime, you continue to characterize me as being frightened or lacking courage. I wish you’d stop doing that.

Some twitter convos with the causal folk have led me to realise that they are actually far closer to data analysts than to generative modellers.

They basically have observed empirical data and some qualitative causal info and want to construct estimators based on the empirical data that are valid for some aspect of the causal pathway regardless of the unknown details.

I think they require samples to be representative of the unknown confounders etc, ie collected under the relevant regime, they just don’t have access to the values.

Meanwhile the hierarchical modellers I know are building explicit fully specified generative models that don’t require any data a priori – they can always be simulated. When data becomes available they crank the handle and get an updated generative model.

Because the model is generative and based on mechanisms any query can be directly simulated to represent the current state of knowledge about some pathway etc. On the other hand the causal folk directly use observed data to estimate eg P(Y|X).

So, weirdly, I think they are closer to traditional stats than many ‘generative’ modellers!

Essentially, +1 to everything ojm wrote.

I think another valid way of expressing the aims of Bayesian hierarchical models (BHM) is that they enable us to wield conditional probability to build *generative models* of our data, that can readily embody substantive scientific assumptions/models/theories, thus naturally including “causal” models. In many applications at the cutting edge of science, we are not really interested in quantities like “average causal effects” – rather we want to fit, expand and/or compare generative models that provide greater scientific insight (i.e. the parameters have meaningful scientific interpretation), and/or in some cases forecast accuracy (i.e. we care a great deal about predictive ability).

> not looking for a formal discussion

To me you are.

An example where someone credibly transported a parameter in an application does not count.

From your comments to Andrew below, you want to see “organizing these assumptions in any “structure””, “apparatus … [to have] representation for such assumptions” and “just making “causal assumptions” and leaving them hanging in the air is not enough. We need to do something with the assumptions, listen to them, and process them so as to properly guide us in the data analysis stage.”

I have read your paper with the three figures many times and did not discern any way I would have done anything different in that paper above.

But I do agree that good formal representations are important and that is absent.

p.s. I am guessing you are aware of CS Peirce’s Existential Graphs which do the same for logic – put it into a manipulable representation that preserves truth relationships.

Andrew,

I would love to believe that where we disagree is just on terminology. Indeed, I see sparks of convergence in your last post, where you enlighten me to understand that by “the apparatus of statistics, …” you include the assumptions that PO folks (Angrist and Pischke, Imbens and Rubin etc.) are making, namely, assumptions of conditional ignorability. This is a great relief, because I could not see how the apparatus of regression, interaction, post-stratification or machine learning alone could elevate you from rung-1 to rung-2 of the Ladder of Causation. Accordingly, I will assume that whenever Gelman and Hill talk about causal inference they tacitly or explicitly make the ignorability assumptions that are needed to take them from associations to causal conclusions. Nice.

Now we can proceed to your summary and see if we still have differences beyond terminology.

I almost agree with your first two sentences: “So, to summarize: To do causal inference, we need (a) causal assumptions (assumptions of causal structure), and (b) models or data analysis. The statistics curriculum spends much more time on (b) than (a)”.

But we need to agree that just making “causal assumptions” and leaving them hanging in the air is not enough. We need to do something with the assumptions, listen to them, and process them so as to properly guide us in the data analysis stage.

I believe that by (a) and (b) you meant to distinguish identification from estimation. Identification indeed takes the assumptions and translates them into a recipe with which we can operate on the data so as to produce a valid estimate of the research question of interest.

If my interpretation of your (a) and (b) distinction is correct, permit me to split (a) into (a1) and (a2) where (a2) stands for identification.

With this refined taxonomy, I have strong reservations about your third sentence: “Econometrics focuses on (a) as well as (b).” Not all of econometrics. The economists you mentioned, while commencing causal analysis with “assumptions” (a1), vehemently resist organizing these assumptions in any “structure”, be it a DAG or structural equations (some even pride themselves on being “model-free”). Instead, they restrict their assumptions to conditional ignorability statements so as to justify familiar estimation routines. [In https://ucla.in/2mhxKdO, I labeled them “experimentalists” or “structure-free economists”, to be distinguished from “structuralists” like Heckman, Sims, or Matzkin.]

It is hard to agree therefore that these “experimentalists” focus on (a2) — identification. They actually assume (a2) away rather than use it to guide data analysis.

Continuing with your summary, I read: “You focus on (a).” Agree. I interpret (a) to mean (a) = (a1) + (a2) and I let (b) be handled by smart statisticians, once they listen to the guidance of (a2).

Continuing, I read: “When Angrist, Pischke, Imbens, Rubin, Hill, me, and various others do causal inference, we do both (a) and (b).”

Not really. And it is not a matter of choosing “an approach”. By resisting structure, these researchers a priori deprive themselves of answering causal questions that are identifiable by do-calculus and not by a single conditional ignorability assumption. Each of those questions may require a different estimand, which means that you cannot start doing the “data analysis” phase before completing the identification phase.

[Currently, even questions that are identifiable by a conditional ignorability assumption cannot be answered by structure-free PO folks, because deciding on the conditioning set of covariates is intractable without the aid of DAGs, but this is a matter of efficiency, not of essence.]

But your last sentence is hopeful: “A framework for causal inference — whatever that framework may be — is complementary to, not in competition with, data-analysis tools such as hierarchical modeling, post-stratification, machine learning, etc.”

Totally agree, with one caveat: the framework has to be a genuine “framework,” i.e., capable of leveraging identification to guide data analysis.

Let us look now at why a toy problem would be frightening; not only to you, but to anyone who believes that the PO folks are offering a viable framework for causal inference.

Let’s take the simplest causal problem possible, say a Markov chain X —>Z—>Y with X standing for Education, Z for Skill and Y for Salary. Let Salary be determined by Skill only, regardless of Education. Our research problem is to find the causal effect of Education on Salary given observational data of (perfectly measured) X,Y,Z.

To appreciate the transformative power of a toy example, please try to write down how Angrist, Pischke, Imbens, Rubin, Hill, would go about doing (a) and (b) according to your understanding of their framework. You are busy, I know, so let me ask any of your readers to try and write down step by step how the graph-less school would go about it.

Any reader who tries this exercise ONCE will never be the same. It is hard to believe unless you actually go through this frightening exercise; please try.

Repeating my sage-like advice: Solving one toy problem in causal inference tells us more about statistics and science than ten debates, no matter who the debaters are.

Try it.
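For readers who want something concrete to poke at, here is a minimal simulation sketch of the Education -> Skill -> Salary chain described above. It is not a reconstruction of how any school mentioned in this thread would proceed. It assumes a linear-Gaussian version of the chain with made-up coefficients `a` and `b`, no confounding, and the hypothetical helper names `simulate_chain` and `ols`. Under those assumptions the total causal effect of X on Y is a*b, and the sketch compares two regressions: Y on X alone, and Y on X while also conditioning on the mediator Z.

```python
import numpy as np


def simulate_chain(n=100_000, a=0.8, b=1.5, seed=0):
    """Draw from an assumed linear chain X -> Z -> Y with independent noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)               # Education
    z = a * x + rng.normal(size=n)       # Skill depends on Education only
    y = b * z + rng.normal(size=n)       # Salary depends on Skill only
    return x, z, y


def ols(y, *cols):
    """Least-squares slopes of y on the given columns (intercept dropped)."""
    design = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


x, z, y = simulate_chain()

# Regressing Y on X alone recovers the total effect a*b in this setup.
total_effect = ols(y, x)[0]

# Adding the mediator Z as a covariate drives the X coefficient toward zero,
# because Salary depends on Education only through Skill.
x_coef_given_z = ols(y, x, z)[0]

print(f"slope of Y on X:          {total_effect:.3f}")
print(f"slope of Y on X given Z:  {x_coef_given_z:.3f}")
```

The point of the comparison is that the two regressions answer different questions, and which one corresponds to "the causal effect of Education on Salary" depends on the assumed structure, not on the data alone.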

Maxwell:

I don’t think that in our research we are doing “causal discovery” in the sense of learning true conditional independences in nature; see the discussion on pages 960-962 of this article. I agree that it could be beneficial to improve and formalize the process that we use to construct statistical models for causal inference; I’m just not convinced that it makes sense or is useful to do so using a causal discovery framework that is centered around the estimation of patterns of conditional independence.

There is some underlying mental process by which you are interpreting the relevant science to select a causal model.

In selecting a model, isn’t your brain engaging in a poorly understood form of causal discovery?

Wouldn’t it be useful to understand that mental process, for the purpose of communicating its results, and replicating it?

]]>+1

]]>Finally, here’s a recent example from my own work using hierarchical modelling of this sort to combine physical and statistical models:

https://arxiv.org/abs/1810.04350

I found the frameworks discussed by Gelman, Berliner, Cressie etc much easier to relate to such a setting, where we have a geophysical model based on PDEs, than discrete DAGs etc, but perhaps it would be possible to use your ideas to do similar things?

Do you have any pointers for doing this sort of thing (geophysical inverse problems) using DAGs etc?

]]>This ‘hierarchical physical-statistical’ modelling point of view is also nicely discussed in a statistics book that has plenty of ‘physical’ or ‘causal’ modelling examples:

‘Statistics for spatio-temporal data’:

https://www.wiley.com/en-us/Statistics+for+Spatio+Temporal+Data-p-9780471692744

They even discuss the connections between the epistemically distinguished conditional distributions and DAGs in section 2.4. Some screenshots here:

https://twitter.com/omaclaren/status/1084250405884723206?s=21

]]>> Hierarchical models are based on set-subset

relationships, not causal relationships.

Judea:

While I agree that causal or physical assumptions are a necessary supplement to pure empirical analysis, I wanna mention that this seems like a weird interpretation of hierarchical models.

Hierarchical models are fundamentally about conditional independencies and Markovian assumptions, not set/subset relationships as far as I’m familiar with them. Do you have any concrete examples of a hierarchical model where this is the case?

While the implied conditional independencies might not be enough for you to directly model causal assumptions – since you seem to require probabilistic assumptions to be about observables and not latent or unobservable constructs – it is a convenient way to incorporate or combine physical and statistical assumptions.

Two references on this:

‘Physical‐statistical modeling in geophysics’

https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2002JD002865

‘Bayesian hierarchical time series models’

http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:hierarquical_model_time_series.pdf

An important point is that while the probability calculus is symmetric etc etc, our epistemic status with respect to the various conditional distributions is different – thus we build a process model for future variables conditional on past variables that directly influence these (Markovian physical assumptions etc).

You of course would probably prefer to express this knowledge as a DAG, but I’m also unsure whether this formal representation and the theorems are sufficient to cover real world phenomena where models are not recursive and so on. Physics gets by with a mixture of mathematics and intuition, but has not been axiomatised to anyone’s satisfaction (I think this was even one of Hilbert’s problems – to axiomatise physics – perhaps you could try to claim the prize?)

(Similarly classical mechanics is time reversible but our epistemic access is different – we know the past, not the future and we are usually only interested in coarse-grained features etc – this is well-known to be enough to deal with reversibility/irreversibility ‘paradoxes’ and the second law)

]]>Judea:

We are in agreement. I agree that data analysis alone cannot solve any causal problems. Substantive assumptions are necessary too. To take a familiar sort of example, there are people out there who just think that if you fit a regression of the form, y = a + bx + cz + error, that the coefficients b and c can be considered as causal effects. At the level of data analysis, there are lots of ways of fitting this regression model. In some settings with good data, least squares is just fine. In more noisy problems, you can do better with regularization. If there is bias in the measurements of x, z, and y, that can be incorporated into the model also. But none of this legitimately gives us a causal interpretation until we make some assumptions. There are various ways of expressing such assumptions, and these are talked about in various ways in your books, in the books by Angrist and Pischke, in the book by Imbens and Rubin, in my book with Hill, and in many places. Your view is that your way of expressing causal assumptions is better than the expositions of Angrist and Pischke, Imbens and Rubin, etc., that are more standard in statistics and econometrics. You may be right! Indeed, I think that for some readers your formulation of this material is the best thing out there.
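The point about y = a + bx + cz + error can be sketched with a small simulation. Everything specific here is an assumption made for illustration: the hidden common cause `u`, the coefficient values, and the helper names `ols2` and `simulate`. In this hypothetical setup x has no effect on y at all, yet least squares on observational data returns a clearly nonzero b; the same fit on data where x is assigned at random returns b near zero.

```python
import random

def ols2(x, z, y):
    """Least-squares fit of y = a + b*x + c*z; returns (a, b, c)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    szz = sum((zi - mz) ** 2 for zi in z)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    det = sxx * szz - sxz ** 2          # 2x2 normal equations, Cramer's rule
    b = (sxy * szz - szy * sxz) / det
    c = (szy * sxx - sxy * sxz) / det
    a = my - b * mx - c * mz
    return a, b, c

def simulate(n, randomize_x, seed):
    """Draw (x, z, y); u is a hidden common cause of x and y."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        # Observational regime: x inherits variation from u.
        # Experimental regime: x is assigned at random, independent of u.
        xi = rng.gauss(0, 1) if randomize_x else u + rng.gauss(0, 1)
        zi = rng.gauss(0, 1)
        yi = 1.0 + 2.0 * u + 0.5 * zi + rng.gauss(0, 1)  # x never enters
        x.append(xi)
        z.append(zi)
        y.append(yi)
    return x, z, y

_, b_obs, c_obs = ols2(*simulate(20000, randomize_x=False, seed=1))
_, b_exp, c_exp = ols2(*simulate(20000, randomize_x=True, seed=1))

print(round(b_obs, 2), round(b_exp, 2))
```

The fitting procedure is identical in both runs. What licenses reading b as a causal effect in the second run is the assumption about how x was generated, not anything in the regression itself.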

Anyway, just to say it again: We agree on the fundamental point. This is what I call in the above post the division of labor, quoting Frank Sinatra etc. To do causal inference requires (a) assumptions about causal structure, and (b) models of data and measurement. Neither is enough. And, as I wrote above:

I agree with Pearl and Mackenzie that typical presentations of statistics, econometrics, etc., can focus way too strongly on the quantitative without thinking at all seriously about the qualitative aspects of the problem. It’s usually all about how to get the answer given the assumptions, and not enough about where the assumptions come from. And even when statisticians write about assumptions, they tend to focus on the most technical and least important ones, for example in regression focusing on the relatively unimportant distribution of the error term rather than the much more important concerns of validity and additivity.

If all you do is set up probability models, without thinking seriously about their connections to reality, then you’ll be missing a lot, and indeed you can make major errors in causal reasoning . . .

Where we disagree is just on terminology, I think. I wrote, “the apparatus of statistics, hierarchical regression modeling, interactions, poststratification, machine learning, etc etc., solves real problems in causal inference.” When I speak of this apparatus, I’m *not* just talking about probability models; I’m also talking about assumptions that map those probability models to causality. I’m talking about assumptions such as those discussed by Angrist and Pischke, Imbens and Rubin, etc.—and, quite possibly, mathematically equivalent in these examples to assumptions expressed by you.

So, to summarize: To do causal inference, we need (a) causal assumptions (assumptions of causal structure), and (b) models or data analysis. The statistics curriculum spends much more time on (b) than (a). Econometrics focuses on (a) as well as (b). You focus on (a). When Angrist, Pischke, Imbens, Rubin, Hill, me, and various others do causal inference, we do both (a) and (b). You argue that if we were to follow your approach on (a), we’d be doing better work for those problems that involve causal inference. You may be right, and in any case I’m glad you and Mackenzie wrote this book which so many people have found helpful, just as I’m glad that the aforementioned researchers wrote their books on causal inference which so many have found helpful. A framework for causal inference—whatever that framework may be—is complementary to, not in competition with, data-analysis tools such as hierarchical modeling, poststratification, machine learning, etc.

P.S. I’ll ignore the bit in your comment where you say you know what is “frightening” to me.

]]>I appreciate your kind invitation to comment on your blog.

Let me start with a Tweet that I posted on

https://twitter.com/yudapearl (updated 1.10.19)

1.8.19 @11:59pm – Gelman’s review of #Bookofwhy should be of

interest because it represents an attitude that paralyzes

wide circles of statistical researchers. My initial reaction

is now posted on https://bit.ly/2H3BH3b Related posts:

https://ucla.in/2sgzkPZ and https://ucla.in/2v72QK5

These postings speak for themselves but I would like

to respond here to your recommendation:

“Similarly, I’d recommend that Pearl recognize that the

apparatus of statistics, hierarchical regression modeling,

interactions, post-stratification, machine learning, etc etc

solves real problems in causal inference.”

It sounds like a mild and friendly recommendation, and your

readers would probably get upset at anyone who would be so

stubborn as to refuse it.

But I must. Because, from everything I know about causation,

the apparatus you mentioned does NOT, and CANNOT solve any

problem known as “causal” by the causal-inference community

(which includes your favorites Rubin, Angrist, Imbens,

Rosenbaum, etc etc.).

Why?

Because the solution to any causal problem

must rest on causal assumptions and the apparatus

you mentioned has no representation for such assumptions.

1. Hierarchical models are based on set-subset

relationships, not causal relationships.

2. “interactions” is not an apparatus unless you represent

them in some model, and act upon them.

3. “post-stratification” is valid only after you decide what

you stratify on, and this requires a causal structure (which you

claim above to be an unnecessary “wrapping” and “complication”)

4. “Machine learning” is just fancy curve fitting of data

see https://ucla.in/2umzd65
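Point 3 can be illustrated with a small calculation. The counts below are illustrative numbers in the style of the familiar Simpson's-paradox tables, and the names `counts`, `crude`, and `poststratified` are hypothetical. The sketch computes the treated-minus-untreated difference in outcome rates twice from the same table: once ignoring the stratum variable w, and once post-stratifying on it.

```python
# counts[(w, x)] = (number with y = 1, total) for stratum w and treatment x.
# Illustrative counts only.
counts = {
    ("w0", 1): (81, 87),
    ("w0", 0): (234, 270),
    ("w1", 1): (192, 263),
    ("w1", 0): (55, 80),
}
STRATA = ("w0", "w1")

def crude(x):
    """P(y = 1 | x), pooling over w."""
    successes = sum(counts[(w, x)][0] for w in STRATA)
    total = sum(counts[(w, x)][1] for w in STRATA)
    return successes / total

def poststratified(x):
    """sum_w P(y = 1 | x, w) * P(w): stratum rates reweighted to the population."""
    grand_total = sum(n for _, n in counts.values())
    estimate = 0.0
    for w in STRATA:
        p_w = sum(counts[(w, t)][1] for t in (0, 1)) / grand_total
        successes, n = counts[(w, x)]
        estimate += (successes / n) * p_w
    return estimate

crude_diff = crude(1) - crude(0)                    # negative for these counts
strat_diff = poststratified(1) - poststratified(0)  # positive for these counts

print(round(crude_diff, 3), round(strat_diff, 3))
```

The two differences have opposite signs. If w is a pre-treatment common cause of x and y, the post-stratified number is the one to report; if w is itself a consequence of x, the pooled number is. The table alone cannot say which case holds.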

Thus, what you call “statistical apparatus” is helpless in

solving causal problems. We came to this juncture several

times in the past and, invariably, you pointed me to books,

articles, and elaborated works which, in your opinion, do

solve “real life causal problems”. So, how are we going

to resolve our disagreement on whether those “real life”

problems are “causal” and, if they are, whether your

solution of them is valid. I suggested applying your methods to

toy problems whose causal character is beyond dispute.

You did not like this solution, and I do not blame you,

because solving ONE toy problem will turn your perception of

causal analysis upside down. It is frightening.

So I would not press you. But I will add another Tweet

before I depart:

1.9.19 @2:55pm – An ounce of advice to readers who comment

on this “debate”: Solving one toy problem in causal

inference tells us more about statistics and science than

ten debates, no matter who the debaters are. #Bookofwhy

Addendum. Solving ONE toy problem will tell you

more than a dozen books and articles and

multi-cited reports. You can find many such toy problems

(solved in R) here:

* https://ucla.in/2KYYviP

* sample of solution manual: https://ucla.in/2G11xUE

For your readers’ convenience, I have provided free access

to chapter 4 here: https://ucla.in/2G2rWBv

It is about counterfactuals and, if I were not inhibited

by modesty, I would confess that it is the best text

on counterfactuals and their applications that you can

find anywhere.

I hope you take advantage of my honesty.

Enjoy

Judea

]]>None of that applies to my position.

Instrumental variables and 2SLS are not the same thing. I am not arguing against IV. I am arguing against using 2SLS. There are better estimators.

]]>Judea:

All disagreements aside, I just want to thank you again for commenting here. We have a great comments section, this is a rare place for open and sustained intellectual discussion, and I appreciate your willingness to engage.

Of late, I have enjoyed the discussions on Andrewʼs, Frank Harrellʼs, Sander Greenlandʼs Twitter, and Deborah Mayoʼs blogs, preferable by far to the political discourse. I may not understand the technical discussions. But I often delve into something I know zippo about. I pick stuff up along the way.

I was just surprised that there were so many interesting thinkers. I wish I had come across u all 15 years ago.

]]>Somewhat off topic, but see also: ‘inverse problems as

statisticsʼ

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.6364&rep=rep1&type=pdf

bottom of pg 6. Distinguishing parameters from random variables also makes clear the difference between the physical theory (represented by parameters) and the observable consequences (represented by random variables).

In frequentist stats the consequences Y of a theory theta are represented by p(Y;theta) which is not the same as p(Y|theta), as discussed to no end around here.

Is p(Y;theta) the frequentist statisticianʼs analogue of p(Y|do(X))?

]]>Joshua:

If by “if youʼre okay with the idea of ceteris paribus,” you mean, “if Iʼm ok with routinely interpreting regression coefficients as causal effects,” my answer is No, Iʼm not! Jennifer and I talk about this a lot in our book, and itʼs also discussed in other applied statistics books with a causal inference focus, such as in the book by Angrist and Pischke.

]]>It seems to me that if youʼre okay with the idea of ceteris paribus, you should be okay with the idea of do() for any variable, even if itʼs modeling an intervention that you could never actually perform in practice. There doesnʼt exist any way to just change someoneʼs temperature, for example. You can give them ibuprofen, you can put ice on their forehead, you can have them eat some soup, but I think the idea of saying, “What would happen if we somehow had a way of directly changing their temperature?” still makes sense as a concept. A more extreme visual: thereʼs no way to suddenly reduce the mass of the sun to zero, but this doesnʼt change our expectation that if, somehow, it was, then the earth would stop orbiting the sun. This is entirely untestable, but I think almost all physicists would admit it as a valid question, and theyʼd all have the same answer.

Thereʼs a subtlety in cyclic models: if we think of X as causing Y and Y as causing X, then (I would argue) this is ‘shorthandʼ for X_t causing Y_{t+1} and vice versa. The notion of the ‘valueʼ of X and Y becomes under-defined: there may be a stable equilibrium prior to intervention, but post-intervention, there could be several possible equilibria, or perhaps no equilibria (thereʼs a proof that this is uncomputable in the general case, under some mild assumptions about the types of random variables permitted). But I donʼt think this is a case of do() not making sense, I think itʼs a case of the model itself being underspecified. do() may be irrelevant, but not invalid as a construct.

(I do hope that Iʼm not becoming annoying; I just think this is an interesting topic and Iʼm just curious where, exactly, we end up disagreeing.)

]]>For anyone like me, who was trained on differential equations

and physics before linear regression, I think Pearlʼs stuff is initially pretty confusing (and still not without some issues imo).

Other than the Book of Why – which I do think is probably the best intro to causal DAGs Iʼve read minus the other stuff – I found this other paper by the same group (Same group as a previous paper I linked, not Pearl et al) really helpful:

https://arxiv.org/pdf/1304.7920.pdf

They show how you can think of DAGs as describing the equilibrium states of ODEs, under certain somewhat restrictive conditions.

The original ODE contains causal ordering info that is lost in just considering the equilibria, so they introduce the idea of ‘labelledʼ equilibrium equations, which is very similar to the idea of nullclines in ODE theory – you know which variable had its derivative set to zero to get that particular equation.
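A minimal sketch of the equilibrium idea, using a two-variable linear system picked here purely for illustration (it is not taken from the paper): dx/dt = a - x and dy/dt = b*x - y. Setting each derivative to zero gives the labelled equilibrium equations x = a (from x's equation) and y = b*x (from y's equation). The function name `equilibrium` and the `do_x` / `do_y` arguments are hypothetical; an intervention is modelled by clamping a variable and dropping its own equation.

```python
def equilibrium(a=2.0, b=3.0, do_x=None, do_y=None, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = a - x, dy/dt = b*x - y to (near) equilibrium.

    do_x / do_y clamp a variable to a fixed value, replacing its own equation
    and leaving the other equation untouched.
    """
    x = 0.0 if do_x is None else do_x
    y = 0.0 if do_y is None else do_y
    for _ in range(steps):
        dx = 0.0 if do_x is not None else a - x
        dy = 0.0 if do_y is not None else b * x - y
        x, y = x + dt * dx, y + dt * dy
    return x, y

free = equilibrium()                 # no intervention: x -> a, y -> b*a
clamp_x = equilibrium(do_x=5.0)      # intervening on x moves y to b*5
clamp_y = equilibrium(do_y=10.0)     # intervening on y leaves x at a

print(free, clamp_x, clamp_y)
```

Clamping x changes where y settles, while clamping y leaves x where it was. That asymmetry is the causal ordering x -> y, and it is readable from the equilibrium equations only because each one is labelled with the variable whose derivative was set to zero.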

In the other paper I linked they show how to extend these ideas to non-recursive causal systems, which is much more appropriate for real world systems with feedbacks like an enzyme reaction.

Alternatively you could try to unfold the graph in time but then youʼd have to bite the bullet on analysing general dynamical systems, including oscillations or even chaotic behaviour, and thatʼs one I havenʼt seen the DAG folk come close to doing (yet?)