Lei Liu writes:
I am working with clinicians in infectious disease and international health to study the (possible causal) relation between malnutrition and virus infection episodes (e.g., diarrhea) in babies in developing countries.
Basically the clinicians are interested in two questions: does malnutrition cause more diarrhea episodes? does diarrhea lead to malnutrition? The malnutrition status is indicated by height and weight (adjusted, HAZ and WAZ measures) observed every 3 months from birth to 1 year. They also recorded the time of each diarrhea episode during the 1 year follow-up period. They have very solid datasets for analysis.
As you can see, this is almost like a chicken and egg problem. I am a layman in causal inference. The method I use is just to do some simple regression. For example, to study the causal relation from malnutrition to diarrhea episodes, I use a binary variable (diarrhea yes/no during months 0-3) as the response, and the HAZ at month 0 as a covariate. We can do the same for other periods, e.g., use diarrhea yes/no during months 3-6 as the outcome, and HAZ at month 3 as the covariate.
For the relation from diarrhea episodes to malnutrition, I use a linear model to regress HAZ (at month 3) on diarrhea yes/no during months 0-3, and so on.
However, I feel this is not adequate. Do you have any suggestions for the practical analysis? I also think it might be a good topic for statistical methodology development.
My quick thought is to recall the general advice that each causal inference requires its own analysis. So, yes, I think it’s a good idea to fit one model to estimate the effects of malnutrition, and another model to estimate the effects of diarrhea. I think the next step, both conceptually and practically, is to look for two natural experiments, one on the effects of malnutrition and the other on the effects of diarrhea. Or maybe view two different aspects of your data as natural experiments, with suitable controlling for pre-treatment variables.
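As a minimal sketch of the "one model per causal question" idea, the two regressions described in the letter could be set up as below. Everything here is an assumption made for illustration: the data are simulated, the variable names (`haz0`, `diarrhea_0_3`, `haz3`) and the coefficients used to generate them are hypothetical, and the only pre-treatment control in the second model is HAZ at month 0. The fitted coefficients have a causal reading only if there is no unmeasured confounding, which a real analysis would have to argue for.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # Bernoulli variances
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_linear(X, y):
    """Ordinary least squares; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated, purely illustrative data (NOT from the study in the letter).
rng = np.random.default_rng(0)
n = 2000
haz0 = rng.normal(-1.0, 1.0, n)                       # HAZ at month 0
p_true = 1.0 / (1.0 + np.exp(-(-0.5 - 0.4 * haz0)))   # assumed: lower HAZ -> more diarrhea
diarrhea_0_3 = rng.binomial(1, p_true)                # any episode in months 0-3
haz3 = 0.8 * haz0 - 0.3 * diarrhea_0_3 + rng.normal(0.0, 0.5, n)  # HAZ at month 3

ones = np.ones(n)

# Question 1: malnutrition -> diarrhea.
# "Treatment" is HAZ at month 0; outcome is diarrhea during months 0-3.
beta_mal = fit_logistic(np.column_stack([ones, haz0]), diarrhea_0_3)

# Question 2: diarrhea -> malnutrition.
# "Treatment" is diarrhea during months 0-3; outcome is HAZ at month 3;
# HAZ at month 0 enters as a pre-treatment control.
beta_dia = fit_linear(np.column_stack([ones, diarrhea_0_3, haz0]), haz3)

print("log-odds of diarrhea per unit of HAZ at month 0:", beta_mal[1])
print("change in HAZ at month 3 associated with diarrhea:", beta_dia[1])
```

The point of keeping the two fits separate is that each has its own treatment, its own outcome, and its own set of variables measured before the treatment. The same pattern would repeat for the later windows (months 3-6, 6-9, and so on), with earlier measurements serving as the pre-treatment controls.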