# Estimating the effect of A on B, and also the effect of B on A

Lei Liu writes:

I am working with clinicians in infectious disease and international health to study the (possible causal) relation between malnutrition and virus infection episodes (e.g., diarrhea) in babies in developing countries.

Basically the clinicians are interested in two questions: Does malnutrition cause more diarrhea episodes? Does diarrhea lead to malnutrition? The malnutrition status is indicated by height and weight (adjusted, HAZ and WAZ measures) observed every 3 months from birth to 1 year. They also recorded the time of each diarrhea episode during the 1 year follow-up period. They have very solid datasets for analysis.

As you can see, this is almost like a chicken and egg problem. I am a layman to causal inference. The method I use is just to do some simple regression. For example, to study the causal relation from malnutrition to diarrhea episodes, I use a binary variable (diarrhea yes/no during months 0-3) as the response, and use the HAZ at month 0 as the covariate. We can do the same for other periods, e.g., use diarrhea yes/no during months 3-6 as the outcome, and HAZ at month 3 as the covariate.

For the relation from diarrhea episodes to malnutrition, I use a linear model to regress HAZ (at month 3) on diarrhea yes/no during months 0-3, and so on.

However, I feel this is not adequate. Do you have any suggestions to do the practical analysis? I also think it might be a good topic for statistical methodology development.

My quick thought is to recall the general advice that each causal inference requires its own analysis. So, yes, I think it’s a good idea to fit one model to estimate the effects of malnutrition, and another model to estimate the effects of diarrhea. I think the next step, both conceptually and practically, is to look for two natural experiments, one on the effects of malnutrition and the other on the effects of diarrhea. Or maybe view two different aspects of your data as natural experiments, with suitable controlling for pre-treatment variables.
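As a rough illustration of the two period-by-period regressions described in the letter (and not a solution to the identification problem), here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the variable names (`haz0`, `diarrhea_0_3`, `haz3`), the sample size, and the coefficients used to generate the fake data. None of it comes from the clinicians' dataset. The logistic fit is a hand-rolled Newton-Raphson so the example needs only the standard library.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # score for the intercept
            g1 += (yi - p) * xi    # score for the slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_ols(x, y):
    """Least-squares fit of y = intercept + slope*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Simulated data (all numbers below are made up for illustration).
rng = random.Random(1)
n = 2000
haz0 = [rng.gauss(-1.0, 1.0) for _ in range(n)]            # HAZ at month 0
diarrhea_0_3 = [                                            # any episode in months 0-3
    1 if rng.random() < 1.0 / (1.0 + math.exp(0.5 + 0.4 * h)) else 0
    for h in haz0
]
haz3 = [h - 0.3 * d + rng.gauss(0.0, 0.5)                   # HAZ at month 3
        for h, d in zip(haz0, diarrhea_0_3)]

# Model 1: malnutrition -> diarrhea (logistic regression on baseline HAZ).
a_logit, b_logit = fit_logistic(haz0, diarrhea_0_3)

# Model 2: diarrhea -> malnutrition (linear regression of HAZ at month 3).
intercept, slope = fit_ols(diarrhea_0_3, haz3)

print("log-odds of diarrhea per unit of HAZ at month 0:", round(b_logit, 2))
print("difference in HAZ at month 3, diarrhea vs. none:", round(slope, 2))
```

In this simulation the babies with lower baseline HAZ are also more likely to have diarrhea, so the unadjusted slope in the second regression mixes the simulated effect of diarrhea with the baseline difference. That is one concrete reason to control for pre-treatment variables.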

## 3 thoughts on “Estimating the effect of A on B, and also the effect of B on A”

1. I think Systems Dynamics offers the best approach to modeling such chicken-and-egg problems. In any fairly complex system, the lines between cause and effect start to blur, and a Systems Thinking approach helps in modeling such emergent, dynamic systems accurately.

2. Natural experiment, indeed. Or to put it another way: find an instrument.

(Of course, much easier contemplated than done…)

3. I want to stress your advice: "look for two natural experiments, one on the effects of malnutrition and the other on the effects of diarrhea" as opposed to the very different approach the questioner proposes of new "statistical methodology development." As you suggest, the researchers need to find shocks to the disease environment (e.g., introduction of treated piped water to a community), access to nutrition (e.g., changes in prices of staple foods facing urban populations), etc. Nothing within their dataset will let them separate chickens and eggs.

Next time the researchers plan all the hard work and expense of collecting all those data, they should hand out soap and dilute chlorine to a random 25% of the sample (along with prizes for using them); deworming pills, peanut butter and micronutrients to a 2nd 25%; and both to a third 25%. The control group perhaps should get a scarf. The interventions cost far less than the data collection, so this experiment might actually be worth running. Then the researchers can start to disentangle the complex causality of diarrheal disease and malnutrition.
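The four-arm design described in this comment amounts to a 2x2 factorial randomization. A minimal sketch of the assignment step follows, assuming a hypothetical sample of 400 participant IDs. The arm labels are shorthand introduced here ("hygiene" for the soap and chlorine package, "nutrition" for the deworming pills, peanut butter and micronutrients), not names from the comment.

```python
import random
from collections import Counter

# Hypothetical arm labels for the 2x2 design sketched in the comment.
ARMS = ["control", "hygiene", "nutrition", "hygiene+nutrition"]


def assign_arms(ids, seed=0):
    """Shuffle the IDs, then deal them out to the four arms in turn.

    With a sample size divisible by four, each arm gets exactly 25%.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return {pid: ARMS[i % len(ARMS)] for i, pid in enumerate(shuffled)}


assignment = assign_arms(range(400))
counts = Counter(assignment.values())
print(counts)
```

Dealing from a shuffled list, rather than drawing an arm independently for each participant, keeps the arms the same size, which matches the "random 25%" wording.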
