Our story begins with this article by Sanjay Kaul and George Diamond:
The randomized controlled clinical trial is the gold standard scientific method for the evaluation of diagnostic and treatment interventions. Such trials are cited frequently as the authoritative foundation for evidence-based management policies. Nevertheless, they have a number of limitations that challenge the interpretation of the results. The strength of evidence is often judged by conventional tests that rely heavily on statistical significance. Less attention has been paid to the clinical significance or the practical importance of the treatment effects. One should be cautious that extremely large studies might be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null. Trials often employ composite end points that, although they enable assessment of nonfatal events and improve trial efficiency and statistical precision, entail a number of shortcomings that can potentially undermine the scientific validity of the conclusions drawn from these trials. Finally, clinical trials often employ extensive subgroup analysis. However, lack of attention to proper methods can lead to chance findings that might misinform research and result in suboptimal practice. Accordingly, this review highlights these limitations using numerous examples of published clinical trials and describes ways to overcome these limitations, thereby improving the interpretability of research findings.
This reasonable article reminds me of a number of things that come up repeatedly on this blog and in my work, including the distinction between statistical and practical significance, the importance of interactions, and how much I hate acronyms.
They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up all the time, over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you–that is, good advice that’s easy to apply and which you might not already know–it would be to use transformations. Log, square-root, etc.–yes, all that, but more! I’m talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those “total scores” that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don’t care if the threshold is “clinically relevant” or whatever–just don’t do it. If you gotta discretize, for Christ’s sake break the variable into 3 categories.
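Here is a minimal sketch of the two kinds of transformation described above, using NumPy. The function names (`discretize`, `total_score`), the age cutpoints, and the toy data are all made up for illustration; they are not from the article or from our book.

```python
import numpy as np

def discretize(x, cutpoints):
    """Turn a continuous variable into indicator columns, one per bin.

    With k cutpoints there are k + 1 bins, so the result has k + 1 columns.
    """
    x = np.asarray(x, dtype=float)
    bins = np.digitize(x, cutpoints)            # bin index 0..k for each value
    return np.eye(len(cutpoints) + 1, dtype=int)[bins]

def total_score(items):
    """Combine several discrete items (one row per person) into one score."""
    return np.asarray(items).sum(axis=1)

# Hypothetical ages, split into four age groups rather than a single
# above/below threshold.
ages = [19, 34, 52, 71]
age_bins = discretize(ages, cutpoints=[30, 45, 65])

# Hypothetical yes/no survey items, summed into a total score.
items = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
scores = total_score(items)

print(age_bins)
print(scores)
```

The indicator columns can go into a regression in place of the raw age variable (dropping one column as the baseline if the model has an intercept), which lets the fitted pattern be nonlinear in age.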
This all seems quite obvious but people don’t know about it. What gives? I have a theory, which goes like this. People are trained to run regressions “out of the box,” not touching their data at all. Why? For two reasons:
1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can’t cheat.
2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you’re learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model).
Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.
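The invariance in point 2 is easy to check numerically. The sketch below fits a classical least-squares regression twice, once on a raw predictor and once on a standardized version of it, and compares the results. The data are simulated and the helper `ols` is a hypothetical name; this is an illustration under those assumptions, not a general proof.

```python
import numpy as np

def ols(x, y):
    """Least-squares fit of y on an intercept and x.

    Returns (coefficients, standard errors, fitted values).
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, fitted

# Simulated data (assumption: one predictor, linear model, normal errors).
rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)
y = 2.0 + 0.3 * x + rng.normal(0, 5, size=200)

# A linear transformation of the predictor: center and rescale.
z = (x - x.mean()) / x.std()

beta_x, se_x, fit_x = ols(x, y)
beta_z, se_z, fit_z = ols(z, y)

t_x = beta_x[1] / se_x[1]   # t-statistic for the slope, raw predictor
t_z = beta_z[1] / se_z[1]   # t-statistic for the slope, standardized predictor

print(np.allclose(fit_x, fit_z))   # same fitted values
print(np.isclose(t_x, t_z))        # same slope t-statistic
```

The coefficients themselves differ between the two fits (the slope is now per standard deviation of x, and the intercept is the prediction at the mean of x), but the fitted values and the slope's t-statistic agree up to floating-point error. That is why, from the classical point of view, the rescaling looks like it does nothing.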
P.S. On page 426, Kaul and Diamond recommend that, in subgroup analysis, researchers “perform adjustments for multiple comparisons.” I’m OK with that, as long as they include multilevel modeling as such an adjustment. (See here for our discussion of that point.)
P.P.S. Also don’t forget economist James Heckman’s argument, from a completely different direction, as to why randomized experiments should not be considered the gold standard. I don’t know if I agree with Heckman’s sentiments (my full thoughts are here), but they’re definitely worth thinking about.