Jeff Walker writes:

Your blog has skirted around the value of observational studies and chided folks for using causal language when they only have associations, but I sense that you ultimately find value in these associations. I would love for you to expand on this thought in a blog post. Specifically:

Does a measured association “suggest” a causal relationship? Are measured associations a good and efficient way to narrow the field of things that should be studied? Of all the things we should pursue, should we start with the stuff that has some largish measured association? Certainly many associations are not directly causal but are due to joint association with some other variable. Similarly, there must be many variables that are directly causally related (A -> B) but whose effect, measured as an association, is masked by confounders. So if we took the “measured associations are worthwhile” approach, we’d never or rarely find the masked effects. But I’d also like to know whether one is more likely to find a large causal effect given some association, so that the association makes a good “working hypothesis”. I hope I’ve asked these questions clearly enough. Effectively I’m asking, are observational studies worth the time and effort or would we be better to limit ourselves to experimental systems?
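The point above, that a real A -> B effect can be masked by a confounder, is easy to see in a small simulation. This is a hypothetical sketch, not anyone's actual analysis: the binary confounder C, the probabilities, and the effect sizes (+1.0 for A, -1.25 for C) are invented so that the crude association comes out near zero, while comparing A=1 with A=0 within levels of C recovers the true effect. It assumes C is the only confounder, which is exactly the kind of assumption that would have to be stated and defended with real data.

```python
import random
from statistics import fmean


def simulate(n=100_000, seed=1):
    """Hypothetical data: binary confounder C, binary exposure A, outcome B.
    The true causal effect of A on B is +1.0; C makes A more likely and lowers B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.9 if c else 0.1) else 0
        b = 1.0 * a - 1.25 * c + rng.gauss(0, 1)
        rows.append((c, a, b))
    return rows


def mean_diff(rows):
    """Crude association: mean B among A=1 minus mean B among A=0."""
    exposed = [b for _, a, b in rows if a == 1]
    unexposed = [b for _, a, b in rows if a == 0]
    return fmean(exposed) - fmean(unexposed)


def stratified_diff(rows):
    """Adjusted estimate: within-stratum differences averaged over P(C = c)."""
    estimate = 0.0
    for c in (0, 1):
        stratum = [r for r in rows if r[0] == c]
        estimate += len(stratum) / len(rows) * mean_diff(stratum)
    return estimate


rows = simulate()
print(f"crude association: {mean_diff(rows):+.2f}")       # near 0: effect masked
print(f"adjusted for C:    {stratified_diff(rows):+.2f}")  # near +1: true effect
```

With these made-up numbers, C raises the chance of A from 0.1 to 0.9 and lowers B by 1.25, which is just enough to cancel the +1.0 effect of A in the crude comparison (1 - 1.25 × 0.8 = 0). Screening on the size of measured associations would discard this variable.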

My response:

I like Don Rubin’s take on this, which is that if you want to go from association to causation, state very clearly what the assumptions are for this step to work. The clear statement of these assumptions can be helpful in moving forward (here’s an example from my own work, with Gary King).

Another way to say this is that all inference is about generalizing from sample to population, to predicting the outcomes of hypothetical interventions on new cases. You can’t escape the leap of generalization. Even a perfectly clean randomized experiment is typically of interest only to the extent that it generalizes to new people not included in the original study.

What if you’re trying to find causal explanations for events for which you have data on the entire population? That is, say you have all the data from a social tagging system, and you want to find causal effects regarding social influence. There might be dimensions of data you are missing, but of the kinds of data you do have, you have all there is. There are still, presumably, useful statistical analyses, and ultimately inferences, that you can draw about your specific system without generalizing, no? You can still make generalizations, but if you’re looking only for specific information, wouldn’t there be more efficient ways of getting what you’re looking for without interpolation or extrapolation? Or would you say that A) hidden variables are still being guessed, and thus generalization is necessary, or B) once you have the full population, you’re beyond the realm of what should be called statistics?

Unless I’m misunderstanding the example, you’d likely be interested in a model that would also accommodate future data points, or anybody else joining the network.

Or you’re truly only interested in the relations in the present data set, in which case I guess you don’t need confidence intervals for your coefficients, just the coefficients themselves, as descriptive rather than inferential quantities.

I’m a historian, so I’m actually more interested in things that happened than in how things happen in general. My example was a stand-in for data from hundreds of years ago, for which there will be no more data points.

I am trying to make inferences about cause, but not about missing variables (unless we call the more complex interactions hidden variables in some sort of model), so I don’t think it’s accurate to say any analysis I run is purely descriptive.

But with historical data, don’t you assume that you’re seeing a snapshot (sample) and you’re trying to generalize to the population at that time? If not, what would be an example?

There are plenty of great complete historical sources: the complete transactions of the Royal Society, member lists, information on all the cahiers de doléances from the French Revolution, etc. Also, not all history is ancient! I’m often not trying to generalize about a population, but to retrace a specific series of events.

Jaynes’ chapter 1 considers the proposition A:

“Beethoven and Berlioz never met.”

We can use “probability as extended logic” to estimate P(A). If “all inference is about generalizing from sample to population”, then where is the population here?

The population is the set of possible histories? That is, each history is a sequence of events that is logically and temporally consistent. Not knowing which one actually occurred, the population being generalized over is the set of histories that might have occurred.

Neither inductive nor deductive inference need take the form of an inference to a population, nor would inductive inference be “to a probability”. I discuss this a bit on my current blog:

http://errorstatistics.com/2013/08/22/a-critical-look-at-critical-thinking-deduction-and-induction/

I am a little skeptical. I think if you replaced “all” with “most” I could get on board. There are definitely cases where we’re mainly interested in finding out true facts about a fixed, single set of objects. For example, suppose we examine 100% of the widgets that were delivered to Mr. A and installed for use. We made some basic measurement of each widget’s properties before it was installed (100% of them). We’d now like to estimate how long Mr. A’s machine will operate before “something goes wrong”.

Perhaps you wouldn’t call it inference until you’ve observed some failures, and then you’re extrapolating to the failures of the remaining widgets. Prior to any failures, you have “nothing but priors”, as there’s no data (actually, there’s a single censored time to failure). But to me it is still “inference” in the sense that we’re finding out what the information we have and our model imply about the world.

I think there are probably even better examples others could come up with where “generalization to a population” isn’t necessarily the best description of the goal.
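One way to see that a single censored time to failure is still data: suppose, purely for illustration, that Mr. A's machine fails at a constant but unknown rate (an exponential time-to-failure model) and that the prior on that rate is a Gamma distribution. Neither assumption comes from the comment above, the numbers are hypothetical, and the sketch ignores the per-widget measurements. Under these assumptions, running without a failure adds exposure time to the posterior, which raises the predicted probability of surviving a further stretch of time.

```python
def update_rate_posterior(alpha, beta, failures, exposure):
    """Gamma(alpha, beta) prior on an exponential failure rate, with beta a rate
    parameter in units of time. Each observed failure adds 1 to alpha; all observed
    running time, censored or not, adds to beta."""
    return alpha + failures, beta + exposure


def prob_survive(alpha, beta, t):
    """Predictive probability of no failure in the next t time units: the average
    of exp(-rate * t) over the Gamma(alpha, beta) distribution of the rate."""
    return (beta / (beta + t)) ** alpha


# Hypothetical numbers: prior Gamma(2, 1000 hours); the machine has run 500 hours
# with no failure, i.e. one right-censored time to failure.
prior = (2.0, 1000.0)
posterior = update_rate_posterior(*prior, failures=0, exposure=500.0)
print(posterior)                                    # (2.0, 1500.0)
print(round(prob_survive(*prior, t=100.0), 4))      # 0.8264
print(round(prob_survive(*posterior, t=100.0), 4))  # 0.8789
```

The Gamma prior is chosen only because it is conjugate to the exponential likelihood, which keeps the update to two additions.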

A classic example is ordering seven of Plato’s books: we know which of them came first and which came last, but not the ordering of the middle five. There isn’t a population of books to generalize to; there are only those seven.

But to go about the problem, we can use (notional) populations: the lengths of syllables in sentence endings in Plato’s writing during the period in which he wrote each of the five books. The data in each book allow us to do inference on parameters describing Plato’s possible writing during each period. The problem of ordering the books is then one of making comparative statements about these populations’ parameters.

I agree with you that “generalization to a population” isn’t necessarily the best description of the goal, and would add that it’s sometimes not the most natural way to think about how to tackle the problem; but both can be done.

Hardly any inference problems I work on involve sampling from populations.

I looked at the first three pre-prints on your website, and in each case the observations do not appear to be comprehensive, so I would interpret them as sampling from a population. But perhaps you mean something different by ‘population’?

“Does a measured association “suggest” a causal relationship?”

Generally not, because causality in nature has a tendency to be both sparse and connected.

+++++++++++++++++++++++++++

Jeff,

The discussion that followed your questions may not have been explicit enough about the fact that we can do more today than what is implied by Rubin’s advice: “if you want to go from association to causation, state very clearly what the assumptions are for this step to work”.

Many of the questions that you raised now have concrete formal underpinnings, leading from transparent assumptions to causal conclusions with provable guarantees. In particular, your bottom-line question, “are observational studies worth the time and effort or would we be better to limit ourselves to experimental systems?”, now enjoys a mathematical solution in the sense that, for every causal query, we know what assumptions MUST be defended in order to have this query estimable from observational studies.

This simply means that the line between the estimable and the non-estimable has been mathematized to the point where your question “are observational studies worth the time…” can be answered, on a case-by-case basis, depending on what causal assumptions you are willing to assert in any given problem.

(Some causal assumptions are always necessary; as Cartwright put it, “no causes in, no causes out”.)

Even the difficult step that Andrew calls the “leap of generalization” has recently succumbed to mathematical analysis, which allows us to distinguish the generalizable from the non-generalizable depending on what we are willing to assume about the two populations. (This analysis goes under the rubric of “transportability”.)

So there is no reason to despair; science does make some progress some of the time.
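For readers unfamiliar with transportability, the simplest textbook case fits in one line. Assume an experiment in a source population identifies P(y | do(x), z) within levels of some covariates Z, and assume the source and target populations differ only in the distribution of Z (so that, given Z, the effect of X on Y is the same in both). Writing the target population's distribution with a star, the effect in the target is then obtained by reweighting the source's stratum-specific effects:

```latex
P^{*}\bigl(y \mid do(x)\bigr) \;=\; \sum_{z} P\bigl(y \mid do(x), z\bigr)\, P^{*}(z)
```

Whether that assumption about Z is defensible in a given application is exactly the kind of question the formal analysis forces into the open.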

Have these advances stayed the focus of only causality researchers, or are they being used in actual applied projects at all?

Rahul,

Applications in epidemiology and psychology are abundant.

Here is one I read recently.

http://psycnet.apa.org/journals/met/18/2/137/

But Jeff Walker did not ask “how many people use this or that method?”. He asked “are observational studies worth the time?”, and, unless I missed it, I did not see this question addressed in the discussion that followed.

So, if you are curious about the same question, you should be delighted to know that the answer is available.

Thanks!

Thanks! No relation to Jeff’s original question; I was moving off on a tangent. Just clarifying.

Apologies if I’m being obtuse here, but that paper (and most of the references in it) seem to be methods papers. Even if domain-specific, they’re still methods papers (and a good one, too). What I was looking for was applied work (on HIV, flu, obesity, etc., whatever) that uses these techniques to reach the strong (guaranteed?) causal conclusions that you refer to.

In short, practitioners using these techniques strictly as a tool rather than teaching people how to use the tool. Any classic papers of this sort you’d point us to?

My personal, naive perception (which may be totally wrong!) is that the toolkit developers are racing ahead, whereas the target audience is still stuck using age-old observational statistics to weakly hint at causality.

Do you sense this lukewarm adoption too, or is it only my misperception?

Rahul,

I can attest to having attended a journal club meeting in which the presenter discussed an applied epidemiology paper that used Pearl’s causal graph framework. (Unfortunately I can’t recall enough detail now to find the reference.) It was clear to me that most of the discussants had never seen that framework before, so I pointed out the key reference for the causal inference framework — one of Pearl’s papers — which, oddly, had been buried in the Discussion section instead of being highlighted in the Introduction.

It’s worth noting that the paper’s statistical inference approach was based entirely on p-values of coefficients in regressions. The causal graph approach provided a framework in which to express the causal assumptions underlying the analysis and gave the mapping from selected regression model to causal interpretation.

Here’s one: http://www.sciencedirect.com/science/article/pii/S0013935113001096

I would be interested in opinions on whether it did more good than harm for this particular question.

Rahul:

Perhaps Judea can clarify, but from my understanding, the term “guarantee” is used to imply that there are sets of rules which will guide you to the correct solution (or tell you that no solution exists), if YOUR PRIOR CAUSAL ASSUMPTIONS ARE TRUE (the accuracy of his tools is conditional on this assumption). Of course, in practice it can be quite a challenge to get all your assumptions right (there could be a lot of uncertainty), and I doubt there will ever be a perfect solution to that.

No, I do understand that. My point is that even under those constraining assumptions, I don’t find many mainstream applied papers that use Judea’s sort of formal rules to shape their models.

Again, I could just be reading the wrong papers.

I agree there are not that many, but they are there. For example, see this one:

http://cid.oxfordjournals.org/content/52/7/941.full

It takes years for a new approach to be widely adopted.

My impression has also been that these methods are not used in applied research. I’ve also seen researchers question whether these methods can be of much use in teasing out causality. See, for example, the comment by Borsboom et al. here.

Are there two questions here: one of correctness (plenty addressed already), and another of interpretation, as a path of least resistance to manufacturing statements bound for verification?

AV: thanks for noticing the two questions. Let me expand: Take the plethora of gene-association studies or epidemiology studies of say diet or lifestyle on disease. These studies cost massive dollars and human time. Are the resulting correlations worth the effort? Do these studies narrow the list of associations that should be studied experimentally? Or, will some big causal effects be missed because they were masked by the association studies? Should we start at the biggest correlations and work down? Or should we include additional information to choose which associations to first study experimentally? But what kind of additional information? Much (most?) of this is also correlational so does this help?

Rahul: I came across a similar phenomenon with the lasso. The lasso and related methods papers are highly, highly cited, but in my short search for papers using the lasso (or related methods), almost all the papers were methods papers. Maybe it was late at night and I missed the application papers, but that was my impression.

I don’t know where the mismatch lies. Are the methods guys coming up with techniques that are too cumbersome, or are the practitioners just too comfortable to try something new?

Or are some of these things a solution in search of a problem? I am not sure but I’d love to know.

If what you say is a selection problem, then this could be a joke about allocation:

https://twitter.com/Kasparov63/status/393461934088916992

English gives a nice lead by semantic association: selection, sorting, choice. Quite a pity that choice axioms and statistics were served in different buildings; I used to forget one by the time I got to the other.

There’s an easy test of whether this is a (correct) post-hoc rationalisation or a pre-planned positive intervention: Did Professor Bem deposit a note somewhere to this effect ahead of time?

Preferably a letter deposited with a lawyer, but anything, really, written ahead of submission saying “Dear JPSP. This is nonsense and I know it, but you will have to accept the paper, and that will expose the banality of everything else you publish”

I doubt if such a document or even email to a friend exits.

Try once more and you might get the post right. ;)

“Even a perfectly clean randomized experiment is typically of interest only to the extent that it generalizes to new people not included in the original study.” This statement is true for the non-statistical inference of interest to clinicians, but it’s not quite true of the statistical inference warranted by the act of randomization. The statistical inference in a trial applies to the randomization distribution (all allowable randomizations) among those actually randomized, not to new people (of course I’m ignoring loss to follow-up, which muddies the waters). The inference is really about whether the observed between-group differences were due to the experimental treatment (i.e., causal) among those actually randomized. Internal validity is everything, external validity is a stretch.

Of course, inference is *still* to a population, but it’s the finite population of counterfactuals among the experimental units.
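The point that the randomization distribution refers only to the units actually randomized can be made concrete with a Fisher-style randomization test. The eight outcomes below are invented for illustration, and the sketch assumes complete randomization of a fixed number of treated units. Under the sharp null hypothesis of no effect for any unit, every outcome is fixed, so the reference distribution comes from re-running the assignment over those same eight units; no larger population enters the calculation.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def randomization_p_value(outcomes, treated):
    """Two-sided randomization p-value for the sharp null of no effect on any unit.

    outcomes: observed outcome for each randomized unit
    treated:  indices of the units actually assigned to treatment
    Under complete randomization, every way of choosing len(treated) units is an
    equally likely assignment, so we enumerate them all."""
    units = range(len(outcomes))

    def diff_in_means(group):
        chosen = set(group)
        return (mean(outcomes[i] for i in units if i in chosen)
                - mean(outcomes[i] for i in units if i not in chosen))

    observed = abs(diff_in_means(treated))
    assignments = list(combinations(units, len(treated)))
    # Small tolerance so that exact ties are not lost to floating-point rounding.
    as_extreme = sum(1 for g in assignments
                     if abs(diff_in_means(g)) >= observed - 1e-9)
    return Fraction(as_extreme, len(assignments))


# Hypothetical trial: units 0-3 treated, units 4-7 control.
outcomes = [12, 15, 14, 13, 10, 9, 11, 8]
print(randomization_p_value(outcomes, treated=(0, 1, 2, 3)))  # 1/35
```

Only 2 of the 70 possible assignments produce a difference in means as large as the observed 4.0, so the p-value is 2/70 = 1/35. Full enumeration is only practical for small trials; larger ones would sample assignments instead.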

What’s the difference between “non-statistical inference” and “statistical-inference”?

Non-statistical inference is based on gut feeling, “expert opinion”. Statistical inference has a basis in probability.

Mark:

What you write is what I was taught (by Rubin) in grad school. But I don’t agree. As I wrote in my post above, in practice even a perfectly clean randomized experiment is typically of interest only to the extent that it generalizes to new people not included in the original study.

Andrew, that’s fine, I’m happy to disagree. But I haven’t yet seen a reasonable argument as to why one should expect inferences from a randomized trial to extend to any larger population. I understand that’s what is desired… I just don’t believe there’s any way to get there.

If there isn’t, what’s the point of any randomized trial, say, on drug design? Absent external validity, isn’t it entirely useless? Surely you aren’t saying we ought to abandon this approach?

Of course not! The point is establishing causality, albeit in a VERY limited population. Oscar Kempthorne has several excellent (seemingly forgotten) papers in this regard… a good one is his 1977 paper in J. of Statistical Planning and Inference, “Why Randomize?” Also, here’s one of my favorite papers of all time: http://www.ncbi.nlm.nih.gov/pubmed/12413233

Internal validity is truly all that I care about when it comes to assessing whether a treatment has worked for some people. Will it work for you or will it work for me? Who knows, try it and find out… but accept the associated risks.

It may not work even within the subgroups in which randomization was done. Group causation is not necessarily equal to individual causation.
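A toy potential-outcomes table shows how a positive average effect can coexist with harm to some individuals. The four units and their outcomes are hypothetical. In any real study only one of the two potential outcomes is observed for each unit, so the unit-level effects computed here are never directly visible.

```python
# Hypothetical potential outcomes (1 = recovered, 0 = not) for four units.
y_if_untreated = [0, 0, 1, 1]
y_if_treated = [1, 1, 1, 0]

# Unit-level causal effects: treated outcome minus untreated outcome.
unit_effects = [t - u for u, t in zip(y_if_untreated, y_if_treated)]
average_effect = sum(unit_effects) / len(unit_effects)

print(unit_effects)    # [1, 1, 0, -1]
print(average_effect)  # 0.25: positive on average, yet the last unit is harmed
```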

My humble contribution to this debate: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2304970

Nice paper Fernando and a good demonstration of how causal effects can be transported across populations with differing characteristics.

@GK

Many thanks!

Pingback: On the difficulty of generalizing from sample to population (wonkish) | LARS P. SYLL