Judea Pearl writes:

Can you post the announcement below on your blog? And, by all means, if you find heresy in my interview with Ron Wasserstein, feel free to criticize it with your readers.

I responded that I’m not religious, so he’ll have to look for someone else if he’s looking for findings of heresy. I did, however, want to share his announcement:

The American Statistical Association has announced a new prize, “Causality in Statistics Education”, aimed at encouraging the teaching of basic causal inference in introductory statistics courses.

The motivations for the prize are discussed in an interview I [Pearl] gave to Ron Wasserstein. I hope readers of this list will participate, either by innovating new tools for teaching causation or by nominating candidates who deserve the prize.

And speaking about education, Bryant and I [Pearl] have revised our survey of econometrics textbooks, and would love to hear your suggestions on how to restore causal inference to econometrics education. [I’m confused on that last point; I thought that causality was central to econometrics; see, for example, Angrist and Pischke’s book. — AG]

Is there any evidence that scientists who specifically study “causality” are better scientists than those without such formal education?

You can ask the same question about “Philosophy of Science”. Is there any evidence that scientists who specifically study the Philosophy of Science are better scientists than those without such formal education? Answer: NO.

Have any studies been conducted?

Actually, I’d take even circumstantial evidence. For example, Laplace was doing Bayesian statistics to determine which small measured aberrations in astronomy couldn’t reasonably be explained by measurement errors, and then he applied Classical Mechanics to investigate any “significant” anomalies. How exactly would Laplace have benefited from any formal training in causal analysis? It seems like his “causal inference” was pretty much perfect without any training at all, which is exactly the same impression I get from every physicist I’ve ever met who wasn’t exposed to Frequentist Statistics.

People who have been exposed to Frequentist ideas, on the other hand, are full of the following intuitions:

-Instead of thinking of the data as real and a probability distribution as a made up construct, they think of the probability distribution as real and the data as some kind of phantom outcome out of an amorphous universe of possible outcomes.

-They imagine an interval estimate for a parameter is something that will contain the right answer a fixed percentage of the time in experiments that will never be performed.

-They imagine infinite repetitions of experiments that couldn’t possibly be repeated even in principle.

-They imagine multiple repetitions of our universe in order to be able to think about certain probability distributions.

-They imagine they’re examining “data generation mechanisms” even though there is no Frequentist analysis imaginable that would have led them to Euler’s equations of rigid body motion by conducting statistical analysis of coin flips.

All this focus on irrelevancies seems to pretty much destroy everyone’s intuition about real physical systems. Of course, some statisticians are so brilliant they can overcome these shortcomings and still do real work. For everyone else, though, their intuition seems to be permanently damaged by this nonsense. The need for “causal inference” seems to be a solution to an artificial problem created by Frequentist statistics, which to this day is the first look at statistics that almost every student gets.

Entsophy, this is a better list of reasons to be uncomfortable with the frequentist paradigm for statistics than what I have managed to come up with. Thank you.

And ignore the troll (below). As a 2+ year reader of this blog, let me opine that comments from folks such as your good self, K? O’Rourke, and Bill Jeffreys (not an exhaustive list) add value to this already excellent blog.

I like Pearl and he is right to push for change.

Change is generational so I see the focus on education.

The current establishment is never going to change.

PS Angrist and Pischke’s book is not essentially about causality. You don’t need probability or regression or counterfactuals to teach causality: Mill’s methods and DAGs will do for identification and estimation. Probability only comes in to summarize uncertainty.
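The claim that a DAG settles identification can be illustrated with a small simulation of my own (illustrative numbers, not from the comment): in the graph Z → X, Z → Y, X → Y, the variable Z blocks the only backdoor path from X to Y, so adjusting for Z recovers the structural coefficient, while the unadjusted slope does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(size=n)                      # observed common cause: Z -> X and Z -> Y
x = z + rng.normal(size=n)                  # X <- Z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # structural effect of X on Y is 2.0

# The unadjusted slope mixes in the backdoor path X <- Z -> Y
naive = np.cov(x, y)[0, 1] / np.var(x)

# Adjusting for Z (which blocks that backdoor path) recovers the effect
design = np.column_stack([np.ones(n), x, z])
coef = np.linalg.lstsq(design, y, rcond=None)[0]  # coef[1] estimates the causal 2.0
```

Here `naive` lands near 3.5 while `coef[1]` lands near the structural 2.0, which is the point of the identification exercise.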

Very well put.

I’ve managed a significant number of high-speed/low-drag quantitative efforts. The typical analyst involved had either a good BS degree or a weak advanced degree in a quantitative field. Most of this crowd had above-average, but below-genius, intelligence. Academically, I’d say they’re about equivalent to an average Ph.D. in Sociology or Psychology.

Within that crowd, there were two groups: those who had some statistical training and those who didn’t. What I noticed is that those who had no statistical training never had a problem figuring out causal relationships. Nor was their lack of exposure to basic statistical methods, like hypothesis testing, p-values, or linear regression, ever a problem. They always seemed to find some clever way of looking at the data without statistics that brought out the essential evidence and was almost always far more convincing.

The only problems I ever had were with those who had a basic statistics education. They constantly made unwarranted causal leaps in their analysis. So my recommendation for improving causal inference is to stop teaching the introductory hypothesis-test/p-value blah blah blah. There are, after all, alternatives. If those alternatives are too difficult to teach in a cookie-cutter fashion, thereby leaving students with either no statistical training or very good statistical training, then so much the better.

You sound like a crank with an ax to grind.

Please find another blog, or your own blog, to voice your anecdotes and tribal posturing. You are increasing the noise within the comments section.

OMG, you’re right, they are anecdotes; I didn’t even calculate a p-value, draw a directed acyclic graph, or anything. I repent and apologize to the other tribes. I can see now that I’ve been thinking about causality all wrong, just like those other ignorant rubes and cranks (Galileo, Newton, Euler, Gauss, Cauchy, Gibbs, Maxwell, Einstein, Schrödinger).

One day, “causal inference” won’t just be the plaything of a select few Super Scientists (disciples of Pearl or Rubin), who alone have to make all the breakthroughs, but will be part of everyone’s education. When that happens we’ll see an explosion of scientific understanding that will make the Enlightenment look like finger painting.

You make perfect sense to me.

The classical introductory statistics curriculum is fundamentally demented. It’s taught as a series of cookbook algorithms that one applies, who knows why. There is little discussion of any justification for the techniques as properly performing inference based on data, mainly because there is little justification. The techniques just don’t hold water at their foundations.

Back in grad school (EE, machine learning), I took the introductory graduate statistics sequence. It didn’t make a lick of sense, and the techniques tended to obscure straightforward solutions. Then I found Pearl and Jaynes, everything instantly made sense, and the approach would instantly clarify otherwise complex problems.

An economist (econometrician) friend of mine often corresponds with Prof. Pearl, and what I understand is that Pearl believes the econometrics approach to causality is deeply, fundamentally wrong. (And econometricians tend to think Pearl’s approach is fundamentally wrong.)

It sounds to me like Pearl was being purposefully snarky.

Yes, the problem with the econometrics approach is that it lumps together identification, estimation, and probability, so papers look like a Xmas tree.

It all starts with chapter 1 in econometrics textbooks and all those assumptions about the disturbance, linearity, etc…

Yet most discussions in causality oriented papers revolve around identification and for that you can mostly leave out functional forms, estimation, and probability.

Why carry around reams of parametric notation when it ain’t needed? One wonders how Galileo, Newton, or Franklin ever discovered anything without (X′X)^(-1)X′Y.
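For readers keeping score, the expression being mocked is the ordinary least squares estimator, b = (X′X)^(-1)X′y. A minimal sketch of my own (invented numbers), solving the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([1.0, 2.0])                            # true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

# Solve the normal equations X'X b = X'y, i.e. b = (X'X)^(-1) X'y,
# without forming the explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With 500 observations, `beta_hat` recovers the true coefficients to within sampling error.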

Jack, I think you misunderstood what your friend told you. If you read any of my papers or books you will come to realize immediately that I believe the econometrics approach to causality is deeply and fundamentally right (I repeat: RIGHT, not WRONG), although there have been two attempts to distort this approach by an influx of researchers from adjacent fields; see my reply to Andrew on this page, or read http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

Next, I think you are wrong in stating that “econometricians tend to think Pearl’s approach is fundamentally wrong.” First, I do not offer anyone “an approach”; I offer mathematical tools to do what researchers claim they want to do, only with less effort and greater clarity, which researchers may choose to use or ignore. The invention of the microscope was not a “new approach” but a new tool. Second, I do not know a single econometrician who tried my microscope and thought it is “fundamentally wrong”; the dismissals I hear come invariably from those who refuse to look at the microscope for religious reasons.

Finally, since you went through the trouble of interpreting hearsay and labeling me “purposefully snarky,” I think you owe readers of this blog ONE concrete example where I criticize an economist for reasons that you judge to be unjustified. You be the judge.

Reply to Andrew:

Causality is indeed central to econometrics. Our survey of econometric textbooks (http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf) is critical of econometric education today, not of econometric methodology proper. Econometric models, from the time of Haavelmo (1943), have been and remain causal (see http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf), despite two attempted hijackings: first by regressionists, and second by “quasi-experimentalists” like Angrist and Pischke (AP). The six textbooks we reviewed reflect a painful recovery from the regressionist assault, which has more or less disappeared from serious econometric research but still confuses the authors of econometric textbooks.

As to the debate between the “structuralists” and the “experimentalists,” I address it in Section 4 of this article: http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

Your review of Angrist and Pischke’s book “Mostly Harmless Econometrics” leaves out what in my opinion is the major drawback of their methodology: sole reliance on instrumental variables, and failure to express and justify the assumptions that underlie the choice of instruments. Since the choice of instruments rests on the same type of assumptions (i.e., exclusion and exogeneity) that Angrist and Pischke are determined to avoid (for being “unreliable”), readers are left with no discussion of what assumptions do go into the choice of instruments, how they are encoded in a model, what scientific knowledge can be used to defend them, and whether the assumptions have any testable implications.
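The exclusion and exogeneity assumptions at issue can be made concrete with a toy simulation (my own sketch, not from the post): when an instrument Z shifts X but reaches Y only through X, the Wald ratio Cov(Z,Y)/Cov(Z,X) recovers the structural coefficient even though the plain regression slope is confounded.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
u = rng.normal(size=n)                      # unobserved confounder of X and Y
z = rng.binomial(1, 0.5, n).astype(float)   # instrument: exogenous, excluded from Y
x = z + u + rng.normal(size=n)              # Z is relevant: it moves X
y = 2.0 * x + u + rng.normal(size=n)        # structural effect of X on Y is 2.0

ols = np.cov(x, y)[0, 1] / np.var(x)            # biased upward by u
wald = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # IV (Wald) estimand
```

The OLS slope lands near 2.44 here, while the Wald ratio lands near the structural 2.0; everything hinges on the two untestable-looking assumptions built into how `z` was generated.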

You point out that Angrist and Pischke completely avoid the task of model-building; I agree. But I attribute this avoidance not to a lack of good intentions but to a lack of the mathematical tools necessary for model-building. Angrist and Pischke have deprived themselves of such tools by making an exclusive commitment to the potential-outcome language, while shunning the language of nonparametric structural models. This is something one can appreciate only after attempting to solve a problem, from start to end, in both languages, side by side. No philosophy, ideology, or hours of blog discussion can replace the insight one can gain from such an exercise.

This is a horribly incomplete characterization of Angrist and Pischke’s textbook. The discussion of instrumental variables is quite nuanced and represents but one topic in a much broader discussion of identifying and estimating causal effects. Sure, there are gaps and some material is already outmoded, but it provides an outstanding foundation in my opinion. In their identification results, I can’t imagine there could be contradictions with what would obtain using your NPSEM approach; in fact, if you look at their characterization of dose-response functions, I am inclined to say they have already subsumed most of what your text provides and done one better by marrying it with a workable and robust approach to estimation.

Cyrus,

The purpose of my post was not to provide a complete “characterization of Angrist and Pischke’s textbook.” Its stated purpose was to point out “what in my opinion is the major drawback of their methodology.” Among other drawbacks, I listed: (1) failure to encode the IV assumptions in the model, (2) failure to reason about them, and (3) failure to discuss whether these assumptions have testable implications.

Of course there can be no contradiction between the method of Angrist and Pischke and the one based on nonparametric structural equations (NPSEM); the former is what remains of the latter after a few mathematical tools are forbidden. By analogy, arithmetics that forbids multiplication would never contradict ordinary arithmetics that embraces both multiplication and addition.

If you think that Angrist and Pischke’s book provides an outstanding foundation for identification, I would challenge you to assess how many of their students can solve the toy problems presented in Section 3.2 of this article: http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf especially those pertaining to instrumental variables (Section 3.2.4). Note that these problems are not contrived to prove my point; these are the most elementary and recurring problems in the analysis of IVs, e.g., is there an instrumental variable in our model? What would the IV estimand be? You cannot get more elementary than that.

I would be curious to know your assessment.

I feel pretty secure in assuming that an AP student would apply the tools of conditional probability and counterfactual reasoning as needed to answer those questions. There’s nothing exotic about what one learns from AP that would prevent one from doing so (and there is nothing that restricts it relative to NPSEM in a manner that resembles the silly reference of an “arithmetics that forbids multiplication”). Nonetheless, I can contribute to taking up your challenge by assigning the question to an actual class of mine (who are trained using AP) if you agree to find a way to assign to a comparable class the same plus something along the lines of the LATE result, say, with premises articulated in potential outcomes (the latter are already assigned to mine). Heck, there’s no reason for us to accept this single idiosyncratic test: we could do this on a larger scale with reasonable rigor were there buy-in from relevant faculty. All that is needed then is an agreed-upon set of canonical causal problems.
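For reference, the LATE result mentioned here can be stated in potential-outcome notation (a standard formulation, paraphrased; the notation is mine, not quoted from either discussant):

```latex
% Premises: Z randomized (independence); exclusion (Z affects Y only
% through D); monotonicity D(1) >= D(0); relevance
% E[D | Z=1] - E[D | Z=0] != 0.
\[
\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}
     {E[D \mid Z = 1] - E[D \mid Z = 0]}
  \;=\;
  E\bigl[\, Y(1) - Y(0) \,\bigm|\, D(1) > D(0) \,\bigr],
\]
```

that is, the Wald ratio identifies the average treatment effect among compliers, the subpopulation with D(1) > D(0).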

Cyrus,

We have a deal! I like your proposal to create a large-scale database of canonical causal problems that the causal inference community agrees represents what students need to know in this area. (BTW, have a look at the criteria for submitting nominations for the causality education prize, and check whether it meets your expectations.)

I am glad you are already assigning my toy problems to your class, and I accept your condition in the bargain (“to assign to a comparable class the same plus something along the lines of the LATE result, say, with premises articulated in potential outcomes”). This would probably be easier for me, because my students are equally conversant in both languages and, as a matter of fact, the LATE theorem has been assigned as homework in my causal inference class for the past 15 years. (See http://bayes.cs.ucla.edu/BOOK-2K/viewgraphs.html, Week 7, Homework 3.)

Two remarks before we embark on this exciting experiment.

You say: “There is nothing exotic about what one learns from AP that would prevent one from doing so [i.e., applying probability and counterfactuals to solve the problems].” I agree; the obstacles surface not in what AP teach but in what they do not teach, namely, two indispensable tools of causal inference: (1) how to read counterfactuals and ignorability conditions in a given NPSEM model, and (2) how to identify the testable implications of a given NPSEM. And, as I wrote recently, the neglect is not accidental but cultural.

“.. the PO framework has also spawned an ideological movement that resists this symbiosis and discourages its faithfuls from using SCM or its graphical representation. This ideological movement (which I call “arrow-phobic”) can be recognized by a total avoidance of causal diagrams or structural equations in research papers, and an exclusive use of “ignorability”-type notation for expressing the assumptions that (must) underlie causal inference studies. For example, causal diagrams are meticulously excluded from the writings of Rubin, Holland, Rosenbaum, Angrist, Imbens, and their students who, by and large, are totally unaware of the inferential and representational powers of diagrams.”

(See http://www.mii.ucla.edu/causality/?p=554 for the full text of my position on the PO and SCM frameworks.)

Lastly, if we are going to collaborate, I must ask you to refrain from using disrespectful adjectives such as “silly” (as in your “.. in a manner that resembles the silly reference of an arithmetics that forbids multiplication”). I do not use analogies lightly, and the analogy to arithmetics was chosen carefully, to represent the cultural prohibition that the PO camp imposes on its faithfuls. Quoting again from my blog piece, I wrote:

———————–

“The arrow-phobic exclusion can be compared to a prohibition against the use of “multiplication” in arithmetics. Formally, it is harmless, because one can always replace multiplication with addition (e.g., adding a number to itself n times). Yet practically, those who shun multiplication will not get very far in science. The rejection of graphs and structural models leaves investigators with no process-model guidance and, not surprisingly, it has resulted in a number of blunders which the PO community is not very proud of.”

————————

Do we have a deal?

Judea

Judea (if I may),

I am replying above you as we seem to have exhausted the nested “reply-to” levels available.

Here’s how I am coming to see the experiment: establish a set of canonical causal problems, and let students’ attempts to solve them shed light on the relative merits of potential outcomes versus graphical or NPSEM analytical tools for different types of problems. It will be good to have this set of problems for pedagogical purposes. Others can benefit from it too.

I expect we will find that there are comparative advantages and disadvantages in each. Whether one can fully integrate the other is a question though.

In my own work, I switch freely between analytical approaches, appreciating the comparative advantages.

It seems you do too: I note, for example, that your assignment related to LATE (to which you link below) has students first recast the IV problem in terms of potential outcomes and then discover the LATE result. This is about as clear a case as one might hope for of a shift of analytical frameworks allowing one to uncover new and profound insights previously hidden from view. I hope you acknowledge this, and the broader class of principal stratum results, as a major accomplishment for those working with the potential outcomes analytical framework. And this is an even less fundamental accomplishment than what those working with potential outcomes have done to provide a coherent foundation for robust estimation and inference (after all, identification is just the very start of the process).

Having had the chance to have this more elaborate exchange (and I am grateful for your participation and humor, even despite my use of phrases like “silly”!), my more refined take on your critique of AP is that they do too little to help students understand where identification might come from beyond randomized experiments or striking natural experiments. I am not sure whether this is a disservice or an oversight, but quite possibly a very mindful neglect.

Cyrus,

I am glad you propose to start with a list of canonical problems, and let students choose whatever combination of techniques they deem useful to get them solved. I will let you take the first shot, because my definition of a “problem” may not be the same as yours: for me, a problem must start with a story that everyone understands. My book is full of those, but I know that “stories”, in some very respectable circles, are mocked as “toy-like” and are immediately replaced with numerical tables of statistical data. So I am anxious to see an example of a “problem definition”.

As to your comments on the drawbacks and achievements of the PO framework, I suspect you did not read the end of my blog post, where I mention three embarrassing blunders that PO researchers fell into, having to operate in the darkness of the “missing data” black box. I will copy that portion below. Note that I count the “principal strata framework” (not the concept) as one of those blunders, and I explain why. Here it is:

—————————start of quote ————

The rejection of graphs and structural models leaves investigators with no process-model guidance and, not surprisingly, it has resulted in a number of blunders which the PO community is not very proud of.

One such blunder is Rosenbaum (2002) and Rubin’s (2007) declaration that “there is no reason to avoid adjustment for a variable describing subjects before treatment”

http://www.cs.ucla.edu/~kaoru/r348.pdf

Another is Hirano and Imbens’ (2001) method of covariate selection, which prefers bias-amplifying variables in the propensity score.

http://ftp.cs.ucla.edu/pub/stat_ser/r356.pdf

The third is the use of ‘principal stratification’ to assess direct and indirect effects in mediation problems, which leads to paradoxical and unintended results.

http://ftp.cs.ucla.edu/pub/stat_ser/r382.pdf

In summary, the PO framework offers a useful analytical tool (i.e., an algebra of counterfactuals) when used in the context of a symbiotic SCM analysis. It may be harmful, however, when used as an exclusive and restrictive subculture that discourages the use of process-based tools and insights.

Additional background and technical details on the PO vs. SCM tradeoffs can be found in Section 4 of a tutorial paper (Statistics Surveys)

http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf

and in a book chapter on the Eight Myths of SEM:

http://ftp.cs.ucla.edu/pub/stat_ser/r393.pdf

Readers might also find it instructive to compare how the two paradigms frame and solve a specific problem from start to end. This comparison is given in Causality (Pearl 2009) pages 81-88, 232-234.

————————-end of quote ——————-

Please note the last remark, which leads you to an example of a “causal problem” solved in the two frameworks, starting with a “story” and ending with an estimate. I think it is the only such example in the literature, but you may surprise me.

I like your “mindful neglect” excuse for PO’s blunders. I would not be so forgiving. My 20 years of experience with many of its researchers leads me to a different characterization: “mindful resistance,” by which I mean resistance to investing the 4 minutes it takes to learn the multiplication table. (And I choose my analogies carefully.)

Looking forward to your first causal example.

Reply to all discussants,

I hear many voices who agree that statistics education needs a shot of relevancy, and that causality is one area where statistics education has stifled intuition and creativity. I therefore encourage you to submit nominations for the causality in statistics prize, as described in http://www.amstat.org/education/causalityprize/ and http://magazine.amstat.org/blog/2012/11/01/pearl/

Please note that the criteria for the prize do not require fancy formal methods; they are problem-solving oriented. The aim is to build on the natural intuition that students bring with them, and leverage it with elementary mathematical tools so that they can solve simple problems with comfort and confidence (not like their professors). The only skills they need to acquire are: (1) articulating the question, (2) specifying the assumptions needed to answer it, and (3) determining whether the assumptions have testable implications.

The reasons we cannot totally dispose of mathematical tools are: (1) scientists have local intuitions about different parts of a problem, and only mathematics can put them all together coherently; (2) eventually, these intuitions will need to be combined with data to come up with assessments of strengths and magnitudes (e.g., of effects). We do not know how to combine data with intuition in any other way except through mathematics. Recall, Pythagoras’ theorem served to amplify, not stifle, the intuitions of ancient geometers.

This post is related, and it is something I, as someone whose work sometimes involves statistics and causality, would be interested to hear Andrew and others respond to. Is this legitimate, or is it making a fuss about nothing?

http://wmbriggs.com/blog/?p=6804

Chrisare,

Thanks for bringing this post to my attention. No, the post is not just making a fuss about nothing; it reflects the prevailing thinking among many mainstream analysts (perhaps not represented on this blog). William Briggs, the blog master, says that “The equation Y = beta x + epsilon is WRONG,” “and in a sad way, too.” Whereas Paul Holland wrote in 1995: “The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of Y given X.” Briggs goes further and states that the equation is plainly WRONG, and that the only correct way of writing what the equation means is to specify the full-blown bivariate distribution of X and Y.

It would probably come as a shock to Briggs, Holland, and other analysts to know that, since Haavelmo (1943), economists have taken the structural equation Y = beta x + epsilon to mean something totally different, and that it has nothing to do with the distribution of X and Y. And I literally mean NOTHING; structural equations are distinct mathematical objects that convey totally different information about the population and, in general, they do not even constrain the regression equation describing the same population.

Well, you said you would be interested to hear Andrew and others respond; I join you in interest. Andrew (and others), can you contribute a thought or two? I am curious to know whether Haavelmo’s distinction is common knowledge, or comes as a surprise to readers of this blog.
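The claim that a structural equation need not constrain the regression equation is easy to demonstrate numerically (a sketch of my own, with invented numbers): when epsilon is correlated with x through an unobserved confounder, the regression slope Cov(x,y)/Var(x) describes only the conditional distribution and differs from the structural beta.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.normal(size=n)                       # unobserved common cause
x = 1.5 * u + rng.normal(size=n)
beta = 2.0                                   # structural coefficient in y = beta*x + eps
y = beta * x + 3.0 * u + rng.normal(size=n)  # eps = 3u + noise, correlated with x

# The regression slope describes the conditional distribution of Y given X;
# here it lands near 3.38, nowhere near the structural beta of 2.0
slope = np.cov(x, y)[0, 1] / np.var(x)
```

Both quantities are legitimate answers to different questions: the slope answers “what do I expect Y to be when I observe X = x?”, while beta answers “what happens to Y if I set X to x?”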

Judea:

I don’t usually get much out of those old-style theoretical papers but I know that some people (including you and Rubin, each in your own way) do, and I respect the search for intellectual antecedents to current work. As I recall, a key difference between the regression notation used in statistics and econometrics is that statisticians tend to model the data while econometricians model the underlying phenomenon. Thus, for example, in a simple regression model the economist will talk about the assumption that the error is independent of the predictors, whereas statisticians think of that as part of the model specification and not a substantive testable assumption. In my opinion, many of these notational tangles become more understandable with multilevel models, because with multilevel modeling you’re not simply giving a distribution to data, you’re modeling underlying parameters. This brings the statistical approach closer to the economics approach in which latent variables are often in mind.

P.S. As a statistical educator, I appreciate your generosity in endowing this prize.

(Trying to reply, but the system says: duplicate)

Andrew,

You hit the nail right on the head: “statisticians tend to model the data while econometricians model the underlying phenomenon.” But this cleavage is far from being a topic of “old-style theoretical papers” or “intellectual antecedents to current work”; it is a major impediment to current work. Given this cleavage, we can understand the bewilderment of economists (like Heckman and Leamer) who read statistical papers and say: “This is nonsense, all they do is model the data.” It is also easy to understand the bewilderment of statistics-trained analysts (like Holland and Rubin and Imbens) who read econometrics papers and say: “This is nonsense, all they do is regression, not causation.”

Bewilderment aside, we can also understand the agony of econometrics students reading textbooks that can’t decide which side they are on, data or underlying phenomenon. And, speaking symmetrically, we can also understand the agony of statistics students growing up on textbooks that never even mention the existence of a phenomenon underlying the data.

But instead of bemoaning the current state of education, I would like to educate myself by way of your remark about multilevel modeling, in which “you’re not simply giving a distribution to data, you’re modeling underlying parameters.” Here is my question. Assume you find an economist who writes down a bunch of structural equations, among them Y = beta x + epsilon, and goes about his/her usual routine of identifying and estimating beta, etc. (Recall, by writing down Y = beta x + epsilon, he/she assumes a fixed causal effect, beta, for every individual in the population.) How would you advise him/her to change his/her routine if he/she wants to incorporate some “multilevel modeling” techniques, without changing his/her substantive assumptions about the economy? What would he/she do differently?

Judea: Thanks for your comments and especially this one –

“that they can solve simple problems with comfort and confidence (not like their professors)”
