Comments on Imbens and Rubin causal inference book

Posted on September 7, 2015 10:32 AM by Andrew

Guido Imbens and Don Rubin recently came out with a book on causal inference. The book’s great (of course I would say that, as I’ve collaborated with both authors) and it’s so popular that I keep having to get new copies because people keep borrowing my copy and not returning it. Imbens and Rubin come from social science and econometrics. Meanwhile, Miguel Hernan and Jamie Robins are finishing up their own book on causal inference, which has more of a biostatistics focus. If you read both these books you’ll be in great shape.

Anyway, rather than reviewing the Imbens and Rubin book, I thought I’d just post the comments I sent on the book to the publisher, when they were asking for reviews of the manuscript, back in 2006.

Comments on table of contents and the 5 sample chapters of Causal Inference in Statistics, by Rubin and Imbens

General Comments

First off, Rubin and Imbens are the leaders in the field of causal inference. Rubin also has an excellent track record, both as a researcher and as a book author. So my overall recommendation is that you publish the book exactly as Rubin and Imbens want it to be. My suggestions are just suggestions, nothing more, and I recommend completely trusting the authors’ judgments about how the book should be. These two authors are such trailblazers in this area that I can only defer to their expertise.

My general comment about the book is that it reminds me of the Little & Rubin book on missing data, in two ways:

1. The book is conceptual, more of a “how does it work” than a “how to”. (I think that a lot of the “how to” is in our forthcoming book (Gelman & Hill) so I hope that the authors can take a look and make some appropriate cross-cites to help out the reader who wants to apply some of these methods.)

2. The book spends a lot of space on methods that I don’t think the authors would generally use or recommend in practice. I’m thinking here of the classical hypothesis-testing framework, especially in Chapters 4, 5, 6. If it were up to me, I would start right off with the model-based approach and just put the Fisher randomization test in an appendix for interested readers. I think most of the audience will be familiar with regression models, and, to me, that would be a logical place to start, thus moving directly into inference (likelihood or, implicitly, Bayesian inference) for causal estimands. The classical hypothesis-testing framework is fine, but to me it’s more of historical interest. Could be good in an appendix for the readers who want to see this connection. It seems like a big detour to have this right at the beginning of the book.

My other major comment is that I’d like to see more details on the worked examples, so that they have more of a real feeling and less of the flavor of numerical illustrations.

Specific Comments

Chapter 1

p.2, “potential outcomes”: Perhaps give a couple sentences explaining why you do _not_ use the term “counterfactual”. (See chapter 19, page 30, for an example where you really do use the concept of counterfactual.)

p.7, l. -2: “loose” should be “lose”

p.13: At this point, I made a note that the discussion seems to be going very slowly. I think that much of chapter 1 could be picked up in pace. Perhaps that’s just because I’m familiar with the material–but I wonder if there’s too much discussion here. I think it might go smoother if you’ve worked out some examples first.

p.14, “attributes” and “covariates”: Do you ever define these terms? “Covariate” in particular is a key concept for you, so I’d like to see some definition and discussion of the concept. How is it like a “right hand side” variable in a regression?

p.15: this is repeated from the preface.

p.16: of all other versions of causality, I don’t know that you should privilege the relatively obscure “Granger-Sims causality”.

p.18-19: combine tables 1.4-1.5 into one table.

Chapter 5

p.1: It’s funny that you personalize Fisher and Neyman (giving them first names, not just references) but nobody else in the story.

p.2: I’d recommend removing “Notice that”, “Note that”, “It is important that”, etc., from the book in all cases.

p.4: Is it important that it’s in Fresno and Youngstown? I’d remove this detail (and combine the two cities) so as not to distract the reader. Also, you should point out that these are first-graders and give the year of the experiment.

p.6, bottom: Is it really plausible that the treatment would have a constant effect c for all units? On the next page, you question whether the treatment effect should be additive in levels rather than logarithms–but why assume it’s constant at all?

More generally, I don’t like the focus on the so-called exact test–I’d rather see this done using regression modeling.
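To make the comparison concrete, here is a minimal sketch, not taken from the manuscript, that runs both analyses on the same simulated completely randomized experiment. The data, the sample size, and the function names (`diff_in_means`, `randomization_p_value`, `regression_estimate`) are all assumptions made up for illustration. The randomization test is approximated by Monte Carlo rather than by full enumeration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical completely randomized experiment: 20 units, 10 treated.
n = 20
z = np.zeros(n, dtype=int)
z[rng.choice(n, size=n // 2, replace=False)] = 1
y = 1.0 + 0.5 * z + rng.normal(size=n)  # simulated outcomes, true effect 0.5


def diff_in_means(y, z):
    """Difference between treated and control sample means."""
    return y[z == 1].mean() - y[z == 0].mean()


def randomization_p_value(y, z, n_draws=10_000, rng=rng):
    """Monte Carlo approximation to the randomization-test p-value under
    the sharp null hypothesis of no effect for any unit."""
    observed = abs(diff_in_means(y, z))
    draws = np.array(
        [abs(diff_in_means(y, rng.permutation(z))) for _ in range(n_draws)]
    )
    return (draws >= observed).mean()


def regression_estimate(y, z):
    """Least-squares fit of y on an intercept and the treatment indicator.
    Returns the coefficient on z and its classical standard error."""
    X = np.column_stack([np.ones_like(y), z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])


p_value = randomization_p_value(y, z)
estimate, std_error = regression_estimate(y, z)
```

With only an intercept and a treatment indicator, the regression coefficient is the same number as the difference in means. The test returns a p-value for one sharp null, while the regression returns an estimate with a standard error and extends directly to covariates and interactions by adding columns to `X`.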

p.7: The mathematical “Definition” seems out of place and does not fit the rest of the book. Also, why focus on test statistics? In your applied work, you will be estimating parameters, not trying to cleverly design test statistics. This all just seems completely opposite to the Rubin approach to statistics.

p.8: similar question: are you really recommending that researchers spend their time “looking for statistics that have power against interesting alternatives”? Bayes would be spinning in his grave!

p.9: You talk about log of nonnegative variables. You should say positive variables, since you can’t take the log of zero. Nonnegative variables might have to be divided into 2 parts: 0/1 and the positive part. Also, wealth can be negative: lots of people have mortgage and credit card debt.
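A minimal sketch of the two-part split described in this comment, assuming a nonnegative variable such as earnings. The function name `split_nonnegative` and the example values are hypothetical, and the function deliberately rejects negative inputs, since a variable like wealth would need different handling.

```python
import numpy as np


def split_nonnegative(x):
    """Split a nonnegative variable into (a) a 0/1 indicator of being
    positive and (b) the log of the positive part, NaN where x == 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("expected a nonnegative variable")
    is_positive = (x > 0).astype(int)
    log_positive = np.full(x.shape, np.nan)
    log_positive[x > 0] = np.log(x[x > 0])
    return is_positive, log_positive


# Hypothetical earnings data with exact zeros.
earnings = [0.0, 100.0, 0.0, 2500.0]
is_positive, log_positive = split_nonnegative(earnings)
```

The indicator can then be modeled with, say, a logistic regression and the log of the positive part with a linear regression fit only to the units where the indicator is 1.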

p.10: I don’t know if you should be talking about rank statistics. See p.252 of Gelman, Carlin, Stern, and Rubin.

p.11: Does the Wilcoxon really have a closed-form distribution? I think it’s an invariant distribution which has been calculated and tabulated.

p.12: typos: “Fets’s”, “an FET”. Actually, I think you’d be better off simply using the term “permutation test”.

p.13, Section 5.4.4: Again, are you really recommending that researchers spend their time choosing among test statistics? This is not how you do your applied work.

p.18, Section 5.7: This is another one of these ideas that works in very simple cases but, in general, will not work. You should say this in the book; otherwise people might actually try to use these methods in applied problems.

p.20, data analysis: all this combinatorics is so complicated. During the time it took you to write the section, you could have done the appropriate regression analysis about 10 times! The regression framework would allow you to think about interesting extensions such as interactions, rather than the approach presented in your chapter, which draws the reader into a technical morass of combinatorics.

Similarly, Section 5.9 is hugely complicated, considering that you could solve the same problems nearly trivially using regression.

p.24, section 5.10: Do you really think this approach is “excellent”???

p.26: data should be displayed as scatterplot, not table. (See p.174, 176, 52 of the forthcoming Gelman and Hill book for examples of how to display these data.) Actually, in the discussion you only use the first 6 units, so maybe best to just show these. Also, I seem to recall that the actual experiment had a paired design. This is not reflected in the table.

Tables 5.2,5.3,5.5: combine into a single table.

Table 5.6 should be a graph with treatment effect on the x-axis.

Table 5.7 should be a graph with number of simulations on the x-axis. Also, I don’t see the advantage of separating Fresno and Youngstown. It’s a distraction.

Table 5.8 should be a graph (see, e.g., p.176 of Gelman & Hill).

Chapter 7

p.1: “Bayes Rule” should be “Bayes’ rule” (or “Bayes’s rule”)

p.4: This will be clearer if you use consistent notation p() for probabilities (rather than p(), f(), and L() for different probabilities). See Gelman, Carlin, Stern, Rubin book.

p.12: Is this integral necessary? In practical calculations, we’re never actually doing an integral. Maybe it’s a bit of a distraction.

p.15, footnote: “Chapters” should be “chapters”.

p.15-…, Section 7.3.2: This is getting really complicated. Is all this matrix algebra and integration really necessary? I’m bothered by the unevenness of the mathematical level. At some points, you have these matrix calculations, at other points (see (7.24) on page 21), you’re spelling out the steps of simple multiplications and additions. What is the math you expect the readers to be using? (You should go into this in the preface.) You also have to decide whether to use an X or a dot for multiplication (compare to p.23).

p.22, simulation: Some computer code would be nice. See, e.g., Chapters 7 and 8 of the Gelman & Hill book.

p.24: “Notice that” can be removed.

p.29, “de Finetti’s theorem”: Have you discussed this? And is this over-sophisticated for the general level of the book?

p.33, “the Bernstein-Von Mises theorem”: Huh? Have I heard of this one?

p.35: “variance equal to 10,000” should be “standard deviation equal to 100”

p.36: I’d skip the Tanner reference, I don’t think it’s that useful. Also, “Markov-Chain-Monte-Carlo” should be “Markov chain Monte Carlo”. Also, at the beginning of this reference section, you might cite Chapter 7 of Gelman, Carlin, Stern, and Rubin, which develops the Rubin (1978) approach with many examples. This is also relevant for the refs in Section 7.10.

Section 7.11: Can you just remove this? It seems so hard to follow and so messy.

Tables 7.1-7.4: With care, these can be combined into a single table which would make the operations much easier to follow.

Table 7.5: What’s “Freq” doing in the middle of the table? That looks weird. Also, you can spell out the word: there’s space there. Similarly, say “linear”, not “lin”, and explain what is meant by “Cov”. A longer caption would help. And I’d combine the two cities.

Chapter 11

p.4, line 1: Why “less than alpha or greater than 1-alpha”? Might it make sense to use different thresholds at the two ends?

p.4: “highschool” should be “high school”

p.7: If you’re going to recommend histograms, you should recommend scatterplots.

p.8, first displayed equation: This is the logit. You should define it here, partly so you can use it again, partly to connect to what students already know.

p.8, typo: “the an”

p.9, bottom: this is logit^{-1}.

p.14: Here’s where you can use the “logit” notation and simplify the presentation.
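For reference, a minimal sketch of the two functions these comments refer to, using the standard definitions logit(p) = log(p / (1 - p)) and logit^{-1}(x) = 1 / (1 + exp(-x)). The function names are illustrative and are not the manuscript's notation.

```python
import numpy as np


def logit(p):
    """Log odds: maps a probability in (0, 1) to the real line."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))


def inv_logit(x):
    """Inverse logit: maps any real number back to a probability."""
    x = np.asarray(x, dtype=float)
    return 1 / (1 + np.exp(-x))


probabilities = np.array([0.1, 0.5, 0.9])
round_trip = inv_logit(logit(probabilities))  # recovers the probabilities
```

Once the pair is defined, a model for a propensity score can be written compactly as Pr(treatment) = logit^{-1}(linear predictor).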

p.14, “the potential outcomes are more likely to be linear in the log odds ratio…”: Where does “more likely” come from here?

p.15: use “1” rather than “one”; it’ll be easier to follow.

p.15, “Inspecting the distribution of the differences is generally useful”: Could you supply an example? That would help.

p.20: When discussing this matching, perhaps also look at Ben Hansen’s work.

p.22: You define p_x, then you use p. Are these the same? I’m confused.

p.23, sentence near the top with the word “easy”: This is confusing to me. It would be clearer if stated directly.

p.25: “It is interesting to note”

p.29 etc: These should be graphs. See, for example, p.202 of Gelman and Hill for an example of how to do this compactly. I think it is these comparisons you want, not all the numerical values.

Three pages of histograms following page 35: These should be made smaller, fit on 1 page, and oriented right side up. Also some explanation is needed.

Chapter 14

p.1: you write that “in this chapter we discuss a second approach…namely matching”. But you already discussed matching in Sections 11.4 and 11.5. I’m not saying you’re repetitive, I’m just saying that you’re introducing something here that you’ve already introduced.

p.5, near bottom: This hyper-mathematical notation looks ugly to me. Can’t you say it in words?

p.8, bottom: “It is important to realize”

p.12: A picture would be helpful here.

p.15, “it will be easier to find good matches if we are matching on only a few covariates.” Not really! If you match only on a few, you’re just ignoring all other potential covariates: not necessarily “good” at all. It should only be better, not worse, to include more covariates.

Related to this, you might mention the work of Hill and McCulloch on using BART (Bayesian additive regression trees) for matching on many variables.

p.20: perhaps mention that this is equivalent to allowing treatment interactions.

More generally, I didn’t see much discussion of interactions in this manuscript. But it’s an important topic (especially given that you’re talking about ATE, LATE, CACE, etc, all of which differ from each other only in the presence of interactions).
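A small numerical sketch of this point for two of the estimands, ATE and ATT. It assumes a linear outcome model with one covariate `x` and a treatment-by-covariate interaction, and it assumes treatment assignment is ignorable given `x`. The data are noise-free and made up, and `effects_from_interaction_fit` is a hypothetical helper, not anything from the manuscript.

```python
import numpy as np


def effects_from_interaction_fit(y, z, x):
    """Fit y ~ 1 + z + x + z*x by least squares, then average the implied
    unit-level effect (b_z + b_zx * x) over two different populations."""
    X = np.column_stack([np.ones_like(x), z, x, z * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    unit_effect = beta[1] + beta[3] * x
    ate = unit_effect.mean()          # average over all units
    att = unit_effect[z == 1].mean()  # average over treated units only
    return ate, att


# Hypothetical data in which treated units tend to have larger x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([0, 0, 1, 0, 1, 1])

y_no_interaction = 1 + 2 * z + 0.5 * x
y_interaction = 1 + 2 * z + 0.5 * x + 1.0 * z * x

same = effects_from_interaction_fit(y_no_interaction, z, x)
different = effects_from_interaction_fit(y_interaction, z, x)
```

Without the interaction term, every unit has the same effect and the two averages coincide. With it, the effect varies with `x`, so averaging over everyone and averaging over the treated give different numbers.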

p.24: The goal of matching (at least as I understood from Rubin) is to match groups, not individuals. I don’t see this point emphasized here.

p.27: “Notice that”

Throughout this example: Things would be easier to follow if you round off the half-numbers. The material is tough enough without having to tangle with all these decimal places. Just round them off and say in a footnote that you did it for simplicity.

p.28: Have ATE, ATT, and ATC been defined yet?

p.32, top equations: a bit technical. Perhaps clearer to explain in words.

Later on in page 32, you have triple subscripting: Y_kappa_t_i. I suggest putting some of this upstairs as superscripts. Also, I suggest being consistent with treatment/control notation. Are these t/c or 1/0? If you’re using t/c, I’d prefer T/C, since t often is used for “time”. I’m also not thrilled with “tau” as treatment effect. I’d recommend “theta”.

p.38, footnote: Rubin is using complete-case analysis? I’m shocked! I’d think this would be a good “teaching moment” to show people how to do matching even with some missing data.

p.43, “OLS”: perhaps say “full-data” instead. The point is that they use all the data, not that they use least squares.

p.46: Should be a scatterplot, not a table. Also, round off the .5’s.

Tables 14.2-14.7: I’m not thrilled with these, but maybe they’re good for following the details. I’d think about what you could remove from the tables so that they’re still useful but not so overwhelming.

Tables 14.8-14.9: make a graph (see, for example, p.202 of Gelman and Hill)

Table 14.10: What’s the ordering here? Perhaps simpler to have all the ATEs, then all the ATTs, then ATC? Also, it looks like nothing is statistically significant (or even close)! So does this matter at all? Worth discussing, I think.

Table 14.11: a graph would be better (see, for example, p.505 of Gelman and Hill for an example of how to display and compare estimates and se’s). Also, what are the units of “time till first raise”? The tiny coefs suggest you should rescale, perhaps use years, not months, as your scale.

Chapter 19

p.2: A picture with causal arrows could help (or else say why you dislike such pictures).

p.5, typo: “units’s”

p.7, assumption 1: Say “is statistically independent of”. Don’t just use that “perpendicular lines” symbol. This isn’t a math textbook.

p.8,9: What are M and N? I must have missed their definitions.

p.9, assump 2: Say “is statistically independent of”. Don’t just use that “perpendicular lines” symbol. Also, make clear in the formatting where the assumption ends and the discussion begins.

p.12: “no-one” should be “no one”

p.16: I don’t see the intuition in all this matrix algebra.

p.17, “An alternative is a model-based approach”: I recommend doing the model-based approach first, since that’s what you actually prefer.

p.17, notation: use p() for all densities, rather than the confusing pi(), f(), and L().

p.19-20: This is getting ugly again. Maybe it’s necessary, but as a reader I certainly don’t enjoy it! The formulas on p.20, in particular, look like they could be simplified in some intuitive way. Can you also discuss what happens when sum_i Z_i = 0?

p.21, example: This seems like a SUTVA violation, since the assignment to one woman in the village is correlated with the assignment to others.

p.21, “we do not have indicators…”: Can’t you get this info?

p.22, elsewhere: to increase readability, remove the commas in the numbers, for example, “2,385” is “2385”. Also, take the square root of the variance: se’s are more directly interpretable.

p.23,24: It would be better to express as death rate than survival rate; then you don’t have to work with ugly numbers like 0.9936, you can work with cleaner things like 0.64 percent.

p.24: “Notice that”

p.26: see comment on p.23,24 above

p.32, typos: no space before/after “ITT”

p.32, section 19.9: Perhaps best to put the naive analyses earlier?

p.34: see comment on p.23,24 above

p.40: “onesided” should be “one sided”

p.42, line 4: why “however”?

p.42, since Rubin is the author, it might be more polite to avoid use of the term “Rubin causal model”. You could use the term “potential outcome notation”.

p.42: also include refs from sociology and psychology on structural equation modeling. Even if you don’t like it, say why you don’t. You could cite Sobel’s work in this area. As is, it seems odd that you’re singling out a somewhat obscure idea of Zelen.

I don’t actually know how many of these recommendations they followed. As I wrote, I defer to the authors’ expertise in this area.

In any case, I thought it might amuse you to see this mix of serious and minor comments.

28 thoughts on “Comments on Imbens and Rubin causal inference book”

Fernando on September 7, 2015 at 11:16 am said:

@Andrew

I am glad they did not follow all your suggestions. It is nice to see Fisher, Neyman, regression, and Bayes all together.

Also, you say at various points how this or that could be done with regression quickly and simply. This of course assumes that readers have taken STATS 101 and STATS 102.

Put differently, regression is simple once you have invested 2 semesters in building up to it. By contrast I think Fisher exact testing can be taught in 2 weeks.

It also makes people focus more on design, than on specification searches.

- Andrew on September 7, 2015 at 11:26 am said:
  
  Fernando:
  
  As I wrote in my review, I completely trust the authors’ judgments about how the book should be. So I agree with you that they made the right decision, even if it’s not how I would’ve done it.
  
Jack on September 7, 2015 at 11:46 am said:

Why would you post that? I don’t know, this just doesn’t feel right. This seems like a private thing: they sent you a version that was not public and you are revealing details about this private version. Sometimes you post stuff on the blog that seems like this, private things that should remain private. I don’t think you should post these just to have a higher count of posts.

- Andrew on September 7, 2015 at 12:31 pm said:
  
  Jack:
  
  Don’t worry, I’m not posting “to have a higher count of posts.” What a ridiculous thought! Do you think somebody is paying me to do this? I’m posting because I thought it could be interesting to some readers to see a bit of the inside of the publication system, to demystify the book-writing process a bit. Also because I have some thoughts on causal inference I wanted to share; recall the name of the blog!
  
  Finally, I doubt the corresponding chapters in the published version of the book are much different than the chapters I was commenting on, so I don’t think I’m revealing any secrets by revealing that they were reporting variances rather than standard deviations on page 22, etc.
  
  - QMS on September 8, 2015 at 10:58 am said:
    
    As a younger academic, I’ll say that I am definitely one of those interested readers!
    
konrad on September 8, 2015 at 3:48 pm said:

For the opposite perspective, readers may also be interested in the review by Judea Pearl. See item 6 here: https://www.mii.ucla.edu/causality/?p=1578&utm_source=rss&utm_medium=rss&utm_campaign=mid-summer-greeting-from-the-ucla-causality-blog

- Judea Pearl on September 9, 2015 at 4:25 am said:
  
  Thank you Konrad for pointing out my perspective on Imbens and Rubin’s book on causal inference.
  I am surprised that Andrew, as an editorial adviser, did not alert the authors to the fact that a book
  purporting to “describe the leading analysis methods” in causal inference,
  cannot omit questions such as: control of confounding, model specification,
  model testing, causal mediation, causes-of-effects and more. These are central in any exercise of
  causal analysis, regardless of the framework or approach.
  
  To be more specific, this book provides:
  (1) No guidance on how to select covariates for matching or adjustment.
  (2) No discussion of how to judge the plausibility of “identifying assumptions” in any given scenario, and
  (3) No tools for deciding whether such assumptions have testable implications.
  
  I find a much broader and friendlier coverage of causal inference in books such as
  Morgan and Winship (2014) “Counterfactuals and Causal Inference”
  and VanderWeele (2015) “Explanation in Causal Inference”.
  
  judea
  
  - Andrew on September 9, 2015 at 9:28 am said:
    
    Judea:
    
    I did write that the book is conceptual, more of a “how does it work” than a “how to.” But no book can have everything, which is why I recommend that students also take a look at some other books that are out there.
    
    - Rahul on September 10, 2015 at 4:46 am said:
      
      I’ve no dog in this fight, but you did sound remarkably charitable in this case versus how exactingly critical you can be of the smallest flaws in graphs & articles. Kinda atypical.
    - Andrew on September 10, 2015 at 8:15 am said:
      
      Rahul:
      
      I was writing a set of comments for the publisher of a forthcoming book. First, the book had excellent material. Second, it was going to get published anyway. This isn’t like being a referee for a journal article where you have some ability, for better or worse, to insist on changes. When writing a report for a book publisher in this setting, all you can really do is make suggestions. Which I did.
Judea Pearl on September 10, 2015 at 3:42 am said:

Andrew,
If we all agree that the omissions listed are basic to causal inference, then some of us should
confess that the book is critically defective. True, no book can have everything. But a book on
arithmetic that omits ‘addition’, should not be praised as the “Bible” (quoting one reviewer)
with the hope that students will pick up ‘addition’ from other books that are out there.

Wait, I have just discovered a curious economist writing: “seems fishy”….
https://www.econjobrumors.com/topic/imbens-and-rubin-amazon-reviews-seems-fishy

Glad I am not the only one spoiling the choir of “The Emperor’s New Clothes”

- CK on September 10, 2015 at 11:48 am said:
  
  It seems to me that the authors were trying really hard not to cite Judea’s work and that explains the omissions.
  
  - Andrew on September 10, 2015 at 1:02 pm said:
    
    CK:
    
    I know the authors and I’m pretty sure they were not “trying really hard” not to cite anything! It’s the opposite: Imbens and Rubin had a lot they wanted to say, and they struggled to fit it all in a single book. The Morgan and Winship book is different as it is intended to be an overview of different approaches. Imbens and Rubin are presenting their approach, which they and others have found useful in many applications. As noted, I recommend that students also read other books with other perspectives.
    
    Judea:
    
    The Imbens and Rubin book is what it is. I think an “Emperor’s new clothes” analogy is ridiculous. You might not like their methods, and that’s fine. But there’s certainly something there. These methods have been used to solve lots of problems in many areas of application. You have your books, they have theirs, and there are also books like Morgan and Winship’s that present multiple perspectives. That’s great, it’s how it should be.
    
  - Judea Pearl on September 10, 2015 at 2:36 pm said:
    
    CK
    The reasons why the authors chose to deprive readers of some basics is not what I am talking about
    (I discussed those reasons with Imbens on my blog).
    I am asking why astute reviewers do not have the guts (or understanding) to tell potential readers:
    “This is not causal inference; this is a crippled version of causal inference!”
    (Arithmetic without addition is not arithmetic)
    
    Why am I concerned?
    Because I can see innocent newcomers to the field asking themselves: Is this what causal inference is about?
    No model? No testable implications? No plausibility checks?
    This is damaging to all of us who labor to make causal inference a science.
    
    Judea
    
    - Andrew on September 10, 2015 at 2:43 pm said:
      
      Judea:
      
      The approach of causal inference described by Imbens and Rubin does involve models, testable implications, and plausibility checks.
      
      But, in any case, I’m glad that you’re working on the science of causal inference, and I’m glad that Guido Imbens is working on the science of causal inference, and I’m glad that Don Rubin is working on the science of causal inference, and I’m glad that Angrist and Pischke are working on the science of causal inference, and I’m glad that Hernan and Robins are working on the science of causal inference, and I think there are people out there who are glad that Jennifer and I are working on the science of causal inference.
      
      We’re all laboring to make causal inference a science, we just have different perspectives, and that’s fine, that’s how science proceeds. We build the house and the foundations at the same time. Foundations are important, but I think it’s also important not to take them too seriously, as often a method can be useful for reasons that are not fully understood until later.
Judea Pearl on September 10, 2015 at 3:36 pm said:

Andrew,
Please do not characterize this discussion as a clash between “perspective” or between “approaches”. It is not.
To prove my point, I would like to kindly ask you to go over the book and pull out ONE page to which you can
point and say: “This is a model. This is what the model claims about the world. These are the model’s testable implications,
and this is how we can go about checking the plausibility of this model.”

I believe you will be very disappointed if you take the trouble to search for one such page.
I did.
And, if after this labor you would still think “there’s certainly something there [in Imbens-Rubin’s book]” I will
join you in saying: “there’s certainly something there.”

But please do not characterize this discussion as a difference between perspective or between methods.
It is a matter of defining and agreeing on the minimal requirements of arithmetic, sorry, of the science of causal inference.

Judea

- Andrew on September 10, 2015 at 3:47 pm said:
  
  Judea:
  
  For example, p.579 which I happened to find using Amazon’s “look inside” feature. Imbens and Rubin don’t quite use the same notation that I would (I consider their notation a bit old-fashioned, and I prefer how it is done in my book with Jennifer) but it’s a model. As for the testable implications, perhaps they don’t focus on that in their book, but the models of theirs that I like are all implicitly (or explicitly) Bayesian, and we discuss the testing and evaluation of Bayesian models in chapters 6 and 7 of BDA.
  
  For better or for worse, Imbens and Rubin chose to stay as close as possible to a classical statistical framework, and I agree with you (perhaps) that in this classical framework, the checking of models is not given a high priority. But when you consider the larger Bayesian picture, yes, it’s all there. If you’re interested in how to go about checking the plausibility of these models, there are several chapters and many examples in my books on just this topic.
  
  Finally, from a more fundamental viewpoint, the potential outcome notation is all about making claims about the world, and much of Rubin’s research from the 1970s onwards has been on the question of, what aspects of inference can be validated from data and what aspects are purely model-based. In many ways, I’m not a big fan of the econometric terms such as “local average treatment effect,” and I agree with you that concepts such as “missing at random” and “ignorability” can be slippery to pin down—but, in Rubin’s defense, one of his motivations for introducing these ideas is that they are implicit in various classical statistical procedures, which only have direct interpretations under ignorability, etc. Throughout his career, Rubin has worked at making his methods “backward compatible” with classical statistics, and I think that’s a valuable endeavor, even if at times I become impatient with some of the details.
  
  Again, I respect that you don’t find Imbens and Rubin’s framework helpful, and I think it’s good that various researchers are proceeding on different lines to attack these important problems.
  
  - Andrew on September 10, 2015 at 4:49 pm said:
    
    P.S. Many of you might find these exchanges exhausting—I know I do!—but I appreciate that Judea is engaging us on the blog, so I like to reply as best as I can.
    
    - Judea Pearl on September 11, 2015 at 6:43 am said:
      
      Andrew et al.
      The formulas you found on p. 279 of Imbens and Rubin are merely a statistical model of a conditional distribution,
      they do not define a causal model. But, I agree with you that our discussion is getting exhausting; we seem to have
      different views on what a causal model is, and whether the kind of ignorability assumptions that Imbens-Rubin use
      satisfy the defining criteria of a model.
      
      Still, before we part ways, I thought I should compensate readers of your blog for their patience in enduring this
      exhausting discussion. The best compensation I can think of would be to shift the topic from Imbens-Rubin’s book,
      and share with you what I have found when I asked myself the question: “What is a causal model”.
      (As the Talmud says: “Never miss an opportunity to learn from another person, however low and uneducated”.)
      
      So, what is a causal model?
      1. A “model” is a mathematical object that carries claims about the world.
      2. A “causal model” is one that carries causal claims about the world. For example, “Smoking increases
      the chance of cancer”, or, “My headache is gone because I took an aspirin”.
      3. Aside from carrying claims, a model must also be scrutinizable by scientific judgment of plausibility.
      Why? To rule out ‘smart alecks’ like “The world is such as to make my inference routine successful.”
      This smart one is a perfect carrier of claims, but not a very useful one, because it is circular; it is not based
      on shared scientific knowledge about the world.
      
      4. To make a model “scientifically scrutinizable” its claims must be deducible from a combination of
      (i) data and (ii) scientifically defensible statements about the world.
      5. A canonical example of a “causal model” is the structural equations model (SEM) used in econometrics.
      The model itself makes qualitative statements about the world (e.g., the interest rate in Kamchatka has
      no effect on the traffic in Los Angeles) and, combining it with data (namely, estimating the structural parameters),
      enables the economist to deduce quantitative claims about policy questions of interest.
      6. SEM is only one type of model. There are many alternatives, varying in mathematical form. For example, one can
      take a bunch of properties of an SEM and pose the bunch as a new model. If the logical ramifications of the properties
      selected are identical to those of the original model, the new model would be identical to the original, differing
      only in representation.
      
      7. Whether the new model would be “scientifically scrutinizable” depends on how the model builder stores
      scientific knowledge. If he/she stores such knowledge in terms of the set of properties chosen for representation,
      then all would be fine and dandy. If not, the model may be close to useless. Imagine replacing a picture of a
      chess board position with a collection of its properties, and asking a player: “What’s your next move?”
      In computer science, we have many examples where one representation permits easy solutions while
      another, logically equivalent to the first, is provably intractable.
      
      8. We have now reached the point where I am tempted to compare the representation used in SEM to an equivalent
      one, used in the potential outcome framework. But, since I promised not to discuss Imbens-Rubin’s book, I will
      defer this discussion and trust that interested readers would be curious enough to take a simple problem and
      represent it in two ways: (1) in SEM and (2) in statements of conditional ignorability, and judge for themselves
      whether the latter passes the scientifically scrutinizable criterion.
      
      9. Hint: several such examples are worked out side by side in my book (pages 232-234), my articles,
      and on my blog, and I wish I could say “In Imbens and Rubin’s articles too”. Sorry to disappoint the curious
      reader: not a blip from their side. Such comparisons seem to be off limits in certain quarters, and for a very
      good reason: the results are embarrassing in their clarity. Try it.
      
      Thanks for the chance to share my thoughts.
    - hjk on September 11, 2015 at 11:15 am said:
      
      Though a little exhausting, I’ve found these exchanges quite useful and have even returned to some of the old ones. It’s helped me clarify some of my own implicit thoughts on causality and even change my mind on some things.
      
      BTW, Judea, what do you think of Dawid’s ‘Beware of the DAG’?
    - Judea Pearl on September 12, 2015 at 1:37 am said:
      
      hjk,
      Dawid’s “Beware of the DAG” makes two major complaints.
      
      1. He does not trust counterfactuals, because they are grounded in determinism.
      My answer is summarized in https://ftp.cs.ucla.edu/pub/stat_ser/r269-reprint.pdf
      and also in several papers where I celebrate the “victory of the logic of counterfactuals”
      e.g., Treating “causes of effects” https://ftp.cs.ucla.edu/pub/stat_ser/r431-reprint.pdf
      or, effect of treatment on the treated, and detecting latent heterogeneity https://ftp.cs.ucla.edu/pub/stat_ser/r406.pdf
      
      2. Dawid would like to load the DAG with decision nodes, as on p. 71 of Causality, to distinguish manipulable from non-manipulable
      variables. I find this to be too messy but, why not, do it if it helps you get things right.
      I prefer to assume that, by default, every variable is manipulable and, if Z is not, refrain from asking for P(Y|do(Z=z)).
      
      The net result of papers like “Beware of the DAG” is that professional objectors point to it and say:
      “You see, even the gurus do not agree on graphical models” and impressionable students do listen.
      
      It is similar to what Rubin and his disciples are doing to their students, though via different tactics:
      “We haven’t found addition to be useful in arithmetic” they say, and Andrew agrees with them under
      the banner of: “Arithmetic is a matter of perspective, this is how science progresses”.
      (The quote is mine, based on Andrew’s posts above).
      
      Glad you asked.
    - hjk on September 13, 2015 at 6:00 am said:
      
      Thanks for your response Judea, I will try to read your response and links carefully.
      
      In the meantime, one comment and a question.
      
      Firstly, I should say that I found Dawid’s paper (and some of his others) much more sympathetic to your approach than one might think from the title. The reason for this seems to be explained in the intro. Thus I actually became even more interested in better understanding your work after reading this.
      
      Secondly, I am intrigued by the ideal gas example he gives, perhaps because I once spent far too much time working through axiomatic treatments of thermodynamics. Do you have an example written up where you analyse the ideal gas within your framework?
      
      As a bonus, is it possible to describe the two slit experiment within your framework? Some may consider these esoteric but the ideal gas and two slit experiment are to me canonical examples illustrating how to think about physical science.
    - hjk on September 14, 2015 at 8:24 am said:
      
      Every time I return to the various presentations of causality concepts I find Dawid’s the most understandable and closest to my own intuitions and experience. His approach seems like a (relatively) clear extension to existing ideas that is most clearly compatible with successful theories from the physical sciences.
    - Rahul on September 12, 2015 at 10:51 am said:
      
      @Judea
      
      As a reader, I don’t find these exchanges exhausting at all. In fact, it is great to have your posts around for a different perspective. (If I did have to pick a set of posts that’s exhausting, it is the plagiarism-themed ones.)
      
      It would be kinda sad if you “parted ways” with commenting on Andrew’s blog. At least that’s my personal perspective.
      
      PS. I don’t really understand your DAG methods (in spite of trying) but it is indeed nice to have a divergent perspective. Otherwise at times it becomes a bit of an echo chamber.
  - Keith O'Rourke on September 10, 2015 at 8:44 pm said:
    
    Even if we do agree there is one reality that does not mean we (ever) know the one _right_ way to get to it.
    
    Your [Andrew’s] references in here https://www.stat.columbia.edu/~gelman/research/unpublished/objectivity13.pdf and the references in there make very good arguments for that.
    
    (And of course they go back to CS Peirce)
    
    Maybe more helpful, we need to distinguish between math which does have one answer though many routes to that and induction which never has an answer more than that it seems to repeatedly work in practice as far as we can discern?
    
    - hjk on September 11, 2015 at 11:08 am said:
      
      > math which does have one answer
      
      I’d say that many important mathematical (inverse) problems are just as ‘ill-posed’ as philosophical/scientific induction! The goal, of course, is to convert them into well-posed problems by adding info or weakening constraints, just as in ordinary induction. Of course, the answer may be ‘at least one solution exists’, which is not always as satisfactory (but can also be useful).
    - Keith O'Rourke on September 14, 2015 at 8:53 am said:
      
      hjk:
      
      OK, once an answer is found it can be verified (easily replicated) by those with adequate mathematical background.
    - hjk on September 14, 2015 at 9:38 am said:
      
      Fair enough, I spose (without wanting to go too far astray anyway).
      
      Just for fun – https://www.jstor.org/stable/2253263
      
      More on topic, I would really like to hear more from Judea re: systems like the ideal gas which have been claimed to be problematic for DAG representations.

Statistical Modeling, Causal Inference, and Social Science
