Sander Greenland recently published a paper with a very clear and thoughtful exposition on why causality, logic and context need full consideration in any statistical analysis, even strictly descriptive or predictive analysis.

For instance, in the concluding section – “Statistical science (as opposed to mathematical statistics) involves far more than data – it requires realistic causal models for the generation of that data and the deduction of their empirical consequences. Evaluating the realism of those models in turn requires immersion in the subject matter (context) under study.”

Now, when I was reading the paper I started to think about how these three ingredients are, or should be, included in most or all fake data simulation. Whether one is simulating fake data for a randomized experiment or a non-randomized comparative study, the simulations need to adequately represent the likely underlying realities of the actual study. One only has to substitute simulation into this excerpt from the paper: “[Simulation] must deal with causation if it is to represent adequately the underlying reality of how we came to observe what was seen – that is, the causal network leading to the data”. For instance, it is obvious that sex is determined before treatment assignment or selection (and should be in the simulations), but some features may not be so obvious.

Once someone offered me a proof that the simulated censored survival times they generated, where the censoring time was set before the survival time (or some weird variation on that), would meet the definition of non-informative censoring. Perhaps there was a flaw in the proof, but the assessed properties of repeated trials that we wanted to understand were noticeably different from those obtained when survival times were generated first and censoring times were then generated and applied. Simulating in that order likely better reflects the underlying reality as we understand it, and others (including future selves) are more likely to be able to raise criticisms about it.
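A minimal sketch of the second ordering described above (survival times first, then censoring times, then censoring applied). The exponential distributions, the rates and all function names are assumptions chosen for illustration, not details of the simulation in the story.

```python
import random

def simulate_censored_trial(n, event_rate, censor_rate, rng):
    """Generate survival times first, then censoring times, then apply censoring."""
    records = []
    for _ in range(n):
        survival_time = rng.expovariate(event_rate)   # latent time to event
        censor_time = rng.expovariate(censor_rate)    # drawn independently of survival_time
        observed_time = min(survival_time, censor_time)
        event_observed = survival_time <= censor_time
        records.append((observed_time, event_observed))
    return records

def exponential_rate_estimate(records):
    """Maximum-likelihood rate for right-censored exponential data: events / total time at risk."""
    events = sum(1 for _, observed in records if observed)
    total_time = sum(t for t, _ in records)
    return events / total_time

rng = random.Random(2020)
trial = simulate_censored_trial(n=5000, event_rate=0.5, censor_rate=0.25, rng=rng)
rate_hat = exponential_rate_estimate(trial)
censored_fraction = sum(1 for _, observed in trial if not observed) / len(trial)
```

Because the censoring time here is drawn without reference to the survival time, the estimate should land near the rate used to generate the data. Repeating the whole thing under a different generating order is one way to see whether the assessed properties change.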

So I then worried about how clear I had been in my seminars and talks on using fake data simulation to better understand statistical inference, both frequentist and Bayes. At first, I thought I had, but on further thought I am not so sure. One possibly misleading footnote on the bootstrap and cross-validation I gave likely needs revision, as that did not reflect causation at all.

That is, if one could somehow simulate the correct distribution of the data of the represented reality (model assumptions), but that method did not explicitly involve the same pathways as what led to the data being observed, it hides what is being represented as going on. With great transparency comes doubt, doubt acquired with much less difficulty and greater value.

Now as for being clear about this in presentations on fake data simulation, I had been using diagrammatical reasoning to enable non-statisticians and early career statisticians to better grasp statistical reasoning, that is, reasoning that is mostly about what to make of analysis results and how they should change one’s thinking and future actions. I had been arguing that what to make of analysis results should primarily be based on discerning what would repeatedly happen given a (model) representation of how the results came about. Today, such assessments can be carried out with simulation (fake data simulation), but only if it is understood as simply discerning what happens in a “realistic” but idealized abstraction (mathematical) or fake (possible) world, which then needs to be _transported_ to the actual studies in hand.

The footnote was – “As an aside, the Frequentist approaches based on cross-validation and bootstrapping are a form of mathematically degenerate simulation as they use finite populations to mechanically extrapolate to and from representations to realities.”

In an email to a colleague afterwards I explained – “Approaches based on cross-validation and bootstrapping somewhat thoughtlessly define the fake world (model) as either the hold out samples or the data set in hand. For the bootstrap, an automatic and often thoughtless choice of fake world – just the data in hand (but then if say the x were chosen rather than sampled – don’t resample x values.) So there is a model and like you point out usually assumption of iid sampling. So in my mind, they are a form of mathematically degenerate probability as they use finite populations to mechanically extrapolate to and from those representations to realities. But both can be embedded in [more flexible] probability models to be assessed under those (perhaps more realistic) assumptions.”

But the bootstrap and cross-validation completely disregard causality. So doubly degenerate?
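The aside in the email about not resampling x when the x values were chosen rather than sampled can be made concrete. The sketch below contrasts a pairs bootstrap (fake world = the data in hand, with x treated as sampled) with a residual bootstrap that keeps the design fixed. The straight-line model, the design levels and the function names are assumptions for illustration only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pairs_bootstrap_slopes(xs, ys, n_boot, rng):
    """Resample (x, y) pairs, as if the x values had been sampled."""
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx])[1])
    return slopes

def residual_bootstrap_slopes(xs, ys, n_boot, rng):
    """Keep the chosen x values fixed and resample only the fitted residuals."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    slopes = []
    for _ in range(n_boot):
        new_ys = [f + rng.choice(residuals) for f in fitted]
        slopes.append(fit_line(xs, new_ys)[1])
    return slopes

rng = random.Random(1)
design_x = [float(level) for level in (1, 2, 3, 4, 5) for _ in range(4)]  # chosen, not sampled
observed_y = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in design_x]
pairs = pairs_bootstrap_slopes(design_x, observed_y, 500, rng)
resid = residual_bootstrap_slopes(design_x, observed_y, 500, rng)
```

Both versions still take the data in hand as the fake world. Neither says anything about how the x values or the outcomes came about, which is the point of the "doubly degenerate" question.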

The discussion in the concluding section also reminds me of Nelder’s JRSS paper where he discusses how statistical science differs from mathematical statistics or as he preferred to call it, “statistical mathematics”:

“One of our biggest problems is the word `statistics’ itself. We need a new term, and that term should be, I believe, `statistical science’. It is the name of a journal, and it also the title of the new professorship in the University of Cambridge. It shows that statistics belongs with science (and hence technology) and not with mathematics. If the new name is accepted several changes follow.

First ‘applied statistics’ becomes a tautology, for statistics is nothing without its applications. The phrase should be abandoned. It has arisen to distinguish it from ‘mathematical statistics’. However, this is also a misnomer, because it should be ‘statistical mathematics’, as A. C. Aitken entitled his book many years ago.

To make this change does not in any way diminish the importance of mathematics. Mathematics remains the source of our tools, but statistical science is not just a branch of mathematics; it is not a purely deductive system, because it is concerned with quantitative inferences from data obtained from the real world.

Bertrand Russell said `mathematics is a subject in which we do not know what we are talking about, nor do we care whether what we say is true’. As statisticians, we should know what we are talking about and should care that what we say is true, in the sense of agreeing with phenomena in the real world. If we statisticians are to become statistical scientists we must become thoroughly familiar with the processes of science.”

https://www.jstor.org/stable/2681191

Zad – good point. Let me propose a wider scope on the evolution of applied statistics.

1. we certainly have the distinctions made by Nelder that relate to applied statistics and mathematical statistics. Nelder is actually echoing David Cox on this https://pubmed.ncbi.nlm.nih.gov/28756534/

2. the discussion you raise could start by asking a simple question: what is statistical science about? my response is that it is about the generation of information quality. together with Galit Shmueli, we proposed a framework to respond to this challenge, see https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12007 and ://www.galitshmueli.com/system/files/Kenett%20Shumeli%20JRSSA%202014.pdf

3. i believe this direction for statistical science should join the path of data science. statisticians should play a leading role on that joined path. for a sketch of it see my recent book on The Real Work of Data Science and my Box award lecture https://www.youtube.com/watch?v=gHoeeuuwcPs&list=PLMCuIG3AKGww8SgP0JQGOXqxu2bFThhIS&index=1&t=161s

Finally,

4. A decade ago I sketched something I called a Theory of Applied Statistics to address the issues you mention. This note was heavily discussed and led to 12 revisions. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2171179

In papers by Robins and colleagues where they simulate to study causal methods, they enforce causal restrictions by sampling sequentially over time branching out from initial (baseline) causes through intermediates on to end effects – that is, simulation based on the g-computation algorithm. The class of ‘causally coherent’ distributions that can be generated this way can be much narrower than the class of all possible joint distributions for the observed variables. In line with your censoring story, that means that if one ignored the causal-sequence restriction, one could inadvertently generate data from a distribution that contradicted basic background information like temporal sequencing and absence of certain effects. Conversely it also displays how the independencies assumed by common methods (such as partial likelihood for fitting the Cox model) can be far too strict given what isn’t known. To get more realistic data-generation simulations, one has to add observation-selection variables (e.g., selection and censoring indicators) to the sequence.
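A minimal sketch of sampling sequentially from baseline causes through intermediates to end effects, with an observation-selection indicator added at the end of the sequence. It is not the g-computation algorithm itself, nor anything taken from the Robins papers. All variable names, coefficients and probabilities are made up for illustration.

```python
import random

def simulate_subject(rng):
    """Draw one subject's variables in causal (temporal) order."""
    # Baseline cause, fixed before anything else happens.
    sex = rng.random() < 0.5
    # Treatment selection may depend on baseline, never the reverse.
    treated = rng.random() < (0.6 if sex else 0.4)
    # Intermediate depends on baseline and treatment.
    biomarker = rng.gauss(1.0 * treated + 0.5 * sex, 1.0)
    # End effect depends on everything upstream.
    outcome = rng.gauss(2.0 * biomarker + 1.0 * sex, 1.0)
    # Observation-selection indicator comes last; here it depends on the intermediate.
    observed = rng.random() < (0.9 if biomarker > 0 else 0.6)
    return {"sex": sex, "treated": treated, "biomarker": biomarker,
            "outcome": outcome, "observed": observed}

def simulate_study(n, seed):
    """Simulate n subjects; only those with observed=True would reach the analyst."""
    rng = random.Random(seed)
    return [simulate_subject(rng) for _ in range(n)]

study = simulate_study(2000, seed=7)
analysed = [s for s in study if s["observed"]]
```

Writing the generator this way makes each assumed arrow visible as a line of code, so a reader can object to a specific dependence (or a missing one) rather than to a joint distribution as a whole.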

On an interesting related note, one can view Bell’s inequality as a restriction forced by our familiar type of potential-outcomes (counterfactual) causal model; experiments showing its violation thus refute that type of model. See Robins Vanderweele & Gill https://onlinelibrary.wiley.com/doi/full/10.1111/sjos.12089

“Fake data simulation”, or in other words, likelihood, to me is just another name for “prediction”. Causality = potential outcome/ prediction under covariate shift so I would say fake data simulation does play a role there. Indeed we always need to take care of covariate shift in bootstrap and cross-validation anyway.

What does the phrase “covariate shift” mean? Is it just a change in covariates or something weirder?

Yuling:

> “Fake data simulation”, or in other words, likelihood,

The more commonly accepted definition of likelihood is something like the probability of re-observing the same observation as a function of the parameters. So it’s a restricted prediction of just that – the same observation given a point in the parameter space.
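To illustrate the distinction being drawn, here is a small sketch with a binomial model (an assumption made for the example, as are the function names and the numbers). The likelihood tracks the probability of one fixed observation across parameter values, while a fake-data simulation at a single parameter value produces the whole spread of possible observations.

```python
import math
import random

def binomial_likelihood(successes, trials, p):
    """Probability of re-observing exactly `successes` out of `trials` at parameter value p."""
    return math.comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)

def simulate_counts(trials, p, n_sims, rng):
    """Fake-data simulation: many possible counts generated at one value of p."""
    return [sum(rng.random() < p for _ in range(trials)) for _ in range(n_sims)]

observed, trials = 7, 10
grid = [i / 10 for i in range(1, 10)]
# Likelihood: the observation is held fixed and the parameter varies.
likelihood_curve = {p: binomial_likelihood(observed, trials, p) for p in grid}

# Simulation: the parameter is held fixed and the observation varies.
rng = random.Random(0)
fake_counts = simulate_counts(trials, 0.7, 20000, rng)
share_matching = sum(c == observed for c in fake_counts) / len(fake_counts)
```

The share of simulated counts that match the observation approximates one point on the likelihood curve. The remaining simulated counts are the part of the prediction that the likelihood does not use.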

> covariate shift in bootstrap and cross-validation

Can you elaborate on this in the bootstrap and cross-validation?

Dunno. I’m with Russell. “The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm.”

Andrew-not-Gelman: Thanks for bringing up the Russell quote…

One of my themes is that we should not bow to “great men” of the past because however brilliant they were at times, they also made great errors. I am a huge fan of Russell as a general philosopher. But he had no research experience in “soft sciences” or their statistics, and here he made a huge mistake – not unlike Kelvin at the same time denying the Earth could be billions of years old, or later, Jeffreys denying continents could drift or Fisher’s intransigent pseudoskepticism about cigarettes causing lung cancer.

To err is human, and unfortunately it is as human to repeat errors and invent fallacious rationales for the errors just because venerated authorities made them. We see this today with ongoing defenses of significance tests as truth tests and “confidence intervals” as uncertainty intervals from those who should know better – but never will because they have based their careers on repeating the errors. Sadly, this behavior refutes the notion that science progresses funeral by funeral: Bad ideas and methods seem more like undead zombies that keep rising from the grave to eat the living flesh of science, as you demonstrated by resurrecting Russell’s quote – which needs a stake through its heart.

Speaking of stakes, Russell had no idea of what is at stake today in research or of how modern causal concepts supply tools to address those. He may have been referring to the fact that there is no singular “law of causality” (perhaps apart from those arising from the 2nd law of thermodynamics or from relativistic constraints). There are however causal models which (as Pearl explains) incorporate our qualitative information about time order of events and the mechanisms generating data in order to better predict consequences of actions. It is tragic when statisticians try to avoid learning and teaching these tools, because causal forecasting (prediction of potential data outcomes of different design choices) is as crucial to study design and conduct as it is to analysis and decisions based on results. To reveal assumptions and invite their criticism, such forecasting needs to be done as transparently as possible. Due to their acausal nature, pure probability models do not begin to provide the transparency needed for observational research (which for example is crucial for postmarketing surveillance of drug and device safety) whereas causal graphs can quickly show bias avenues and fallacies that are routinely overlooked in reports of conventional regression analyses (e.g., see https://academic.oup.com/aje/article/177/4/292/147738).

+1. Thanks for sharing your thoughts!

Re: ‘ undead zombies that keep rising from the grave to eat the living flesh of science, as you demonstrated by resurrecting Russell’s quote – which needs a stake through its heart.’

———-Too funny

Sander – Causality is a powerful generalisation argument. The reason statisticians should be interested in causality is that this enhances the generalizability of the claims derived from their analysis. A related idea is the use of pseudo variables and simple model simulated data to better understand the generalisability properties of your analysis. Given that a pseudo variable is self-generated random noise, any effect above it indicates an effect beyond noise.

A related issue is the representation of findings. You first present findings in one way or another and then you should discuss their generalisability. A causality argument certainly does that.
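One possible reading of the pseudo-variable idea, as a sketch: generate many noise variables that are unrelated to the outcome by construction, measure their apparent "effects", and compare the effect of a real predictor against that reference. The use of absolute correlation as the effect measure, the maximum as the threshold, and all names and numbers are assumptions for illustration.

```python
import random
import statistics

def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a simple effect measure."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5

rng = random.Random(42)
n = 200
real_x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * x + rng.gauss(0, 1) for x in real_x]

# Pseudo variables: self-generated noise with no connection to y by construction.
pseudo_effects = []
for _ in range(200):
    pseudo_x = [rng.gauss(0, 1) for _ in range(n)]
    pseudo_effects.append(abs_correlation(pseudo_x, y))

noise_ceiling = max(pseudo_effects)
real_effect = abs_correlation(real_x, y)
beyond_noise = real_effect > noise_ceiling
```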

For more on some of this see https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3035070

Thanks Ron. True that causal arguments are at the heart of generalizations. See my response above to Andrew-not-Gelman about how causality is also at the heart of study design, conduct, and analysis including narrow interpretation specific to the study or data source under analysis – even for descriptive surveys such as for voter preferences.

Thus I’m with Pearl in arguing that causality is more fundamental than probability for sound statistical science. The view of probability as a sufficient foundation has been a massive conceptual error in applied statistics, apparently stemming from the missteps of K. Pearson, Russell and other authorities in the early 20th century who were notably writing before modern models for causation had been elaborated and deeper questions about relativistic limits emerged (I have read that, ironically, it was Pearson’s The Grammar of Science that was one of young Einstein’s inspirations to explore those questions). I advise those quoting them to take heed of the resurgence of causal notions not only in soft sciences but also in physics. Here’s some quick PBS coverages of that:

https://www.youtube.com/watch?v=msVuCEs8Ydo

https://www.youtube.com/watch?v=1YFrISfN7jo

Sander:

I think that Don Rubin would agree with you on this. He never framed probability as being fundamental. In his take on things, you should first define what you’re interested in (using latent variables where necessary), and then probability modeling is just a convenient tool for performing statistical inference. Yes, we use probability all the time, but probability is not fundamental. Similarly, we use math all the time and we use computers all the time, but math and computers are not fundamental to scientific inference; they are just very useful tools.

Cool if Don agrees, thanks for pointing that out – I presume you then agree too that we should start off theory for stat with causality and models for it, then derive probabilistic consequences of those models.

That would just leave the sticking point about graphs as useful tools, especially for tracing out bias sources such as that from conditioning on colliders. It seems Don was unable to ‘get’ that back in the exchanges with Schrier, Sjolander and Pearl in Stat Med 2008-2009 when they each pointed out the dangers of tossing all measured treatment predictors in a propensity score; there he thought the objections involved cancelations when it is just the opposite – like confounding, collider bias happens unless there are perfect cancelations (unfaithfulness). Can we teach an old Don new tricks?

Sander:

My guess is that Don’s position on all of this is best captured in his book with Imbens, which we discussed here a few years ago.

Sander – The Grammar of Science has been quite controversial. In a paper titled with embedded British humour, David Cox gives a well-rounded and sound review of where statistics stands, including a full section dedicated to causality: https://pubmed.ncbi.nlm.nih.gov/28756534/ I had interesting discussions with him on causality, at Oxford, 2 years ago.

Regarding Judea Pearl. His repeated view that statistics has ignored causality is not factual. I am not sure what drives this “anger” but it certainly does not entice constructive discussions. On the pragmatic side, Structural Causal Models have not been used to address COVID challenges. You would think that someone would have tried to establish causality in disease contagion data but, unless I missed something, I have not seen it. SCM would have been great to support generalisability (transportability) of findings from area A to area B – a much needed information supporting decision makers.

It seems that causality is addressed by a wide range of options. The challenge is to operationalise it but, this is more complex than the toy examples you find in journals.

Sorry Ron, but my point is not about Pearl. Whatever historical and political mistakes he’s made are irrelevant to my point that for sound analysis we have to delineate what caused the data – e.g., what caused our observations to show various features. SCMs are just one of many, many classes of models in our toolkit; focusing on them misses the general point that applied statistics rests on causality and thus needs to include basic causal ideas and tools in basic training and beyond – whether SCMs are worth including depends on the field and application.

I also think you err grievously in describing the literature:

1) There are plenty of causal contagion models, e.g., look up the work of Halloran, Longini and colleagues.

2) There are now many articles on transporting results using causal models, e.g., search on transportability and authors like Bareinboim, Stuart, Cole, Hernan.

3) There are now decades of articles in epidemiologic journals applying modern causal models to real, complex data. Many have Robins as a coauthor so you should be able to find some by going to his online publication listing at Harvard.

> Whatever historical and political mistakes he’s made are irrelevant

Definitely worth guarding against those sorts of things getting in the way of understanding what is of real value.

Sander – the book by Hernán and Robins is indeed one of the best treatments on causality. Also the software developed by Elias and his team works very well https://causalfusion.net/app. My point was that in the COVID19 related long list of publications, I did not see any reference to SCM and generalizability (also called transportability). Did you see any such applications?

No but I am unclear how that relates to the general topic here. Covid epidemiology is not even a year old and is like nothing I’ve ever seen, disastrous from the start thanks to innumerable problems including lack of reliable or consistent survey data and (here in the U.S. at least) appalling politics. Meanwhile media coverage and publication has bordered on chaos. As an example, at the editor’s invitation we wrote the following lament in March and it was not even published until July: https://ajph.aphapublications.org/doi/10.2105/AJPH.2020.305708

Basic issues remain.

Sander – thank you for your response. Three comments/thoughts:

1. The paper you wrote with a very impressive list of co-authors does not mention generalisation of findings. What are the implications of a study in Taiwan for Italy? Can we handle this question methodologically, or is it only left to expert opinion interpretation?

2. The premise of the paper you referred to, and many others, is that statistical analysis leads to claims presented in a certain way. There is no mention of methods to present findings. They can be directional or magnitude related, leading to an S type or M – type error evaluation. Gelman and Carlin introduced this. John Carlin is one of your co-authors and, for some reason, your paper does not bring this up. I wrote about all this in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3035070

3. You would think that the COVID case, with messy data but also with significant stakes, would lend itself to some high level analysis, including attempts at causality assessments. Are kids attending schools driving infection rates? Is the application of ventilators detrimental to health outcomes, etc.? Some of these claims are made. Again, no attempts to establish causality with SCM or other methods seem to have been made. The SEIR models are not actually addressing causality as they only rely on health outcomes and no driving factors. Moreover, they often use aggregate data, which is quite nonsensical when you consider localised patterns. Look for example at the heterogeneity in Italian provinces. Here in Israel, the patterns in various groups like the ultra-Orthodox are very different from those in other groups. My observation is that applying SEIR at the country level, in Italy or Israel, is not very useful to policy makers and possibly much misleading.
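The S-type and M-type errors mentioned in point 2 above can be examined by fake data simulation. The sketch below assumes a normally distributed estimate with a known standard error and a two-sided 5% cutoff. Those choices, the function name and the example numbers are assumptions for illustration, not a reproduction of Gelman and Carlin's own calculations.

```python
import random
import statistics

def type_s_and_m(true_effect, std_error, n_sims, rng, z_crit=1.96):
    """Simulate replications; among 'significant' estimates, return
    (power, wrong-sign rate, exaggeration ratio)."""
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)
    power = len(significant) / n_sims
    # Type S: a significant estimate whose sign is opposite to the true effect.
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    # Type M: how much significant estimates overstate the true magnitude on average.
    type_m = statistics.fmean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m

rng = random.Random(11)
power, type_s, type_m = type_s_and_m(true_effect=0.1, std_error=0.5, n_sims=100000, rng=rng)
```

With a true effect that is small relative to the standard error, as in these made-up numbers, few replications reach significance, and those that do often have the wrong sign and a greatly exaggerated magnitude.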

To help assess data driven analysis I have been suggesting checklists tailored to specific application domains. For checklists in industrial statistics see https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3591808 One might want to develop such a checklist in epidemiology.

So, as you write “Basic issues remain”. Would be very interested in getting the inputs of Andrew to all this.

Ron Kenett said,

“My observation is that applying SEIR at the country level, in Italy or Israel, is not very useful to policy makers and possibly much misleading.”

Makes sense to me.

Ron: You seem to want to go far off the present post’s topic (getting causal foundations integrated into basic probability and stat education) into sophisticated issues of modeling for covid-19 research. Given the distance of your questions from basics of education and data collection (at least, I haven’t seen you draw a clear connection), I suggest you might ask Andrew if you can open up a new post/page for your topic.

Hi Ron,

For a paper using SCM to address Covid challenges, see Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf “Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S.” https://www.medrxiv.org/content/10.1101/2020.05.27.20115139v6 .

I’m not associated in any way with this research.

Thank you Sergio. Very useful reference. I had seen the LA times report in https://www.latimes.com/science/story/2020-11-20/face-masks-didnt-stop-coronavirus-spread-in-danish-clinical-trial

Causality involves time. When I hear you describing what you all seek, I hear something that could be aligned with (Jay Forrester’s) field of “system dynamics”: the solving of problems by means of the creation of a generative simulation model using systems of often nonlinear ODEs operating over time. PBPK models and epidemiological models such as an SIR model fit in the category of a system dynamics model. John Sterman’s /Business Dynamics/ is one current and somewhat encyclopedic text in the field.

In doing system dynamics modeling, there is emphasis on model testing that includes both validating whether the model represents the causal structure of the “problem” being addressed as well as how the statistics of the model hold up. Admittedly, many treat such models deterministically, for often the underlying problem that’s being addressed has a strong enough deterministic component that one can learn how to change the model and then the real-world system to eliminate the problem without worrying about its stochastic aspects.

In other cases, the statistics are vital: one may want to determine model parameters based on data, or one may want to optimize the performance of the model to achieve a certain goal which one hopes one can achieve in the real world.

If you’re interested in thinking about making causality explicit in generative models, it might be worth reading at least section 2.5 and chapter 3 of /Business Dynamics/ on the modeling process. It might also be worth reading chapter 21 on model testing to see what system dynamics calls for.

Whether one does system dynamics in the Vensim, STELLA, Powersim, or another classic system dynamics simulator or by simulating ODEs in Stan or MCSim, I think there’s opportunity for system dynamicists and statisticians to learn from each other.
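As a small example of the kind of generative ODE model described above, here is a deterministic SIR model integrated with plain Euler steps. The parameter values, the step size and the function name are assumptions for illustration; a system dynamics tool or Stan would use a proper ODE solver.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # Flows over one time step are computed from the current state only.
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory

trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=200)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
final_susceptible = trajectory[-1][1]
```

The causal structure is explicit in the flows: infections move people from S to I, recoveries from I to R. Noise and parameter uncertainty could be layered on top of this deterministic core when the statistics matter.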

Keith – the reflection of bootstrapping and crossvalidation approaches to the structure of the problem at hand is indeed unexplored territory. Gelman addresses it in https://arxiv.org/abs/1507.04544 when he talks about hierarchical data. My paper with colleagues addresses this in the context of designed experiments with replicates https://onlinelibrary.wiley.com/doi/abs/10.1002/qre.802

The classical fallacy is in the application of crossvalidation in fitting a neural network to data from a designed experiment. The models you fit are all over the place. A better approach is the Bayesian bootstrap suggested by Don Rubin and recently investigated as a fractionally weighted bootstrap by Gotwalt and Meeker https://www.tandfonline.com/doi/abs/10.1080/00031305.2020.1731599?journalCode=utas20
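A minimal sketch of the Bayesian bootstrap for a mean, assuming Dirichlet(1, ..., 1) weights obtained by normalising exponential draws. The replicate values and the function name are made up, and the fractionally weighted variant in the linked paper may differ in its details.

```python
import random

def bayesian_bootstrap_means(values, n_draws, rng):
    """Reweight every observation with Dirichlet(1, ..., 1) weights instead of resampling."""
    draws = []
    for _ in range(n_draws):
        raw = [rng.expovariate(1.0) for _ in values]   # normalised Exp(1) draws are Dirichlet(1, ..., 1)
        total = sum(raw)
        draws.append(sum(w / total * v for w, v in zip(raw, values)))
    return draws

rng = random.Random(3)
replicates = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6]
posterior_means = sorted(bayesian_bootstrap_means(replicates, 4000, rng))
interval = (posterior_means[100], posterior_means[3899])   # roughly the central 95% of the draws
```

Unlike the ordinary bootstrap, no observation ever receives weight zero, so in a designed experiment every design point stays in every replicate.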

Thanks (always liked the Bayesian bootstrap. By the way when Rob Tibshirani presented the bootstrap in the lab course when he was a post doc – I asked if it was not just the method of moments and something to be wary of. Of course it is, but as Peter Hall explained to me years later, not one of the “valid” ways to express it that people like. (Only?) With great expertise, it can work responsibly.)

( By the way, I once had to review a neural network that predicted outcomes based on < 100 noisy observations :-( )

Hey Keith,

You might be interested in this paper (https://arxiv.org/pdf/1910.09648.pdf) by Max Little and Reham Bedawy (2020). From their introduction:

“We augment the classical bootstrap resampling method [Efron and Tibshirani, 1994] with information from the causal diagram generating the observational data. This leads to a simple weighted bootstrap which can be used to generate new data faithful to an interventional distribution of interest. Any standard, complex nonlinear machine learning predictor can then be applied to the new data to construct interventional predictors, rather than associational predictors. This method is applicable to most interventional distributions which can be derived from observational causal models using the rules of do-calculus, according to the general identification algorithm of Shpitser and Pearl [2008].

We develop several bootstrap algorithms for common causal inference scenarios including general back-door and front-door deconfounding, tailored to supervised classification or regression machine learning methods. We demonstrate the effectiveness of this technique for synthetic data and real-world, practical causal inference problems.”
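As a rough illustration of the general idea in the quoted introduction, and not a reproduction of the paper's algorithms, the sketch below resamples observational data with inverse-probability weights for a single binary back-door confounder. The data-generating model, the weighting scheme and all names are assumptions for illustration.

```python
import random
from collections import Counter

def simulate_confounded(n, rng):
    """Observational data where z affects both treatment x and outcome y."""
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < (0.8 if z else 0.2)
        y = 1.0 * x + 2.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, x, y))
    return rows

def backdoor_weighted_resample(rows, x_value, size, rng):
    """Resample outcomes of units with x == x_value, weighting each by 1 / P(x | z) estimated from the data."""
    z_counts = Counter(z for z, _, _ in rows)
    xz_counts = Counter((z, x) for z, x, _ in rows)
    matching = [(z, y) for z, x, y in rows if x == x_value]
    weights = [z_counts[z] / xz_counts[(z, x_value)] for z, _ in matching]
    return rng.choices([y for _, y in matching], weights=weights, k=size)

rng = random.Random(5)
data = simulate_confounded(20000, rng)
naive_treated_mean = sum(y for _, x, y in data if x) / sum(1 for _, x, _ in data if x)
interventional_sample = backdoor_weighted_resample(data, True, 20000, rng)
interventional_mean = sum(interventional_sample) / len(interventional_sample)
```

In this made-up model the mean outcome under intervention x = 1 is 2.0, while the mean among those who happened to be treated is 2.6 because treated units more often have z = 1. The weighted resample is meant to recover the former, and it can only do so because the causal diagram tells us that z is the variable to adjust for.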

Thank you Tim for the pointer. When AI/ML people mention causality they focus on what in the data and algorithm “caused” the algorithmic output. The husky or wolf, in a snowy background, is an example. http://innovation.uci.edu/2017/08/husky-or-wolf-using-a-black-box-learning-model-to-avoid-adoption-errors/

Causality, to statisticians, is something different. The blog by Keith, referring to the paper by Sander, makes this point abundantly clear. The term “causality” seems to open up a new Pandora’s box in that the term is used with different meanings by different disciplines.

Five years ago we published in Nature Communications a note aiming at clarifying the terms: reproducibility, repeatability and replicability https://pubmed.ncbi.nlm.nih.gov/26226358/

Seems that something similar is required for “causality”.

I know that one-way causation can be simulated by simulating the causal terms first. But what if two items each have some influence on each other, in some feedback loop? Is this the sort of thing where you do multiple rounds of simulation and hope that it converges?

This is a time-series dynamic model. An outcome can’t influence the past.
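A minimal sketch of that reply: two quantities that influence each other can be simulated forward in time, with each update reading only the previous state, so no iteration to convergence is needed. The linear update rule, the coefficients and the function name are assumptions for illustration.

```python
import random

def simulate_feedback(steps, rng, a_on_b=0.5, b_on_a=-0.3, persistence=0.6, noise_sd=0.1):
    """Two variables that influence each other, but only through each other's previous value."""
    a, b = 1.0, 0.0
    path = [(a, b)]
    for _ in range(steps):
        # Both updates read the state at time t-1, so nothing influences its own past.
        next_a = persistence * a + b_on_a * b + rng.gauss(0.0, noise_sd)
        next_b = persistence * b + a_on_b * a + rng.gauss(0.0, noise_sd)
        a, b = next_a, next_b
        path.append((a, b))
    return path

path = simulate_feedback(steps=200, rng=random.Random(9))
```

The feedback loop becomes an ordinary causal sequence once it is unrolled over time: a at time t-1 affects b at time t, which affects a at time t+1, and so on.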