I wanna be ablated

[cat picture]

Mark Dooris writes:

I am a senior staff cardiologist from Australia. I attach a paper that was presented at our journal club some time ago. It concerned me at the time. I send it as I suspect you collect similar papers. You may indeed already be aware of this paper. I raised my concerns about the “too good to be true” results and the plethora of “p-values” all in support of the desired hypothesis. I was decried as a naysayer and some individuals wanted to set up their own clinics on the basis of the study (which may have been ok if it was structured as a replication prospective randomized clinical trial).

I would value your views on the statistical methods and the results…it is somewhat pleasing: fat bad…lose fat good and may even be true in some specific sense but please look at the number of comparisons, which exceed the number of patients and how they are almost perfectly consistent with an amazing dose response esp structural changes.

I am not at all asserting there is fraud; I am just pointing out how anomalous this is. Perhaps it is most likely that many of these tests were inevitably unable to be blinded…losing 20 kg would be an obvious finding in imaging. Many of the claimed detected differences in echocardiography seem to exceed the precision of the test (a test which has greater uncertainty in measurements in the obese patients). Certainly the blood parameters may be real but there has been no accounting for multiple comparisons.

PS: I do not know, work with or have any relationship with the authors. I am an interventional cardiologist (please don’t hold that against me) and not an electrophysiologist.

The paper that he sent is called “Long-Term Effect of Goal-Directed Weight Management in an Atrial Fibrillation Cohort: A Long-Term Follow-Up Study (LEGACY).” It’s by Rajeev K. Pathak, Melissa E. Middeldorp, Megan Meredith, Abhinav B. Mehta, Rajiv Mahajan, Christopher X. Wong, Darragh Twomey, Adrian D. Elliott, Jonathan M. Kalman, Walter P. Abhayaratna, Dennis H. Lau, and Prashanthan Sanders, and it appeared in 2015 in the Journal of the American College of Cardiology.

The topic of atrial fibrillation concerns me personally! But my body mass index is less than 27 so I don’t seem to be in the target population for this study.

Anyway, I did take a look. The study in question was observational: they divided the patients into three groups, not based on treatments that had been applied, but based on weight loss (>=10%, 3-9%, <3%; all patients had been counseled to try to lose weight). As Dooris writes, the results seem almost too good to be true: For all five of their outcomes (atrial fibrillation frequency, duration, episode severity, symptom subscale, and global well-being), there is a clean monotonic stepping down from group 1 to group 2 to group 3. I guess maybe the symptom subscale and the global well-being measure are combinations of the first three outcomes? So maybe it’s just three measures, not five, that are showing such clean trends. All the measures show huge improvements from baseline to follow-up in all groups, which I guess just demonstrates that the patients were improving in any case. Anyway, I don’t really know what to make of all this but I thought I’d share it with you.

P.S. Dooris adds:

I must admit to feeling embarrassed for my, perhaps, premature and excessive skepticism. I read the comments with interest.

I am sorry to read that you have some personal connection to atrial fibrillation but hope that you have made (a no doubt informed) choice with respect to management. It is an “exciting” time with respect to management options. I am not giving unsolicited advice (and as I have expressed I am just a “plumber” not an “electrician”).
I remain skeptical about the effect size and the complete uniformity of the findings consistent with the hypothesis that weight loss is associated with reduced symptoms of AF, reduced burden of AF, detectable structural changes on echocardiography and uniformly positive effects on lipid profile.
I want to be clear:
  • I find the hypothesis plausible
  • I find the implications consistent with my pre-conceptions and my current advice (this does not mean they are true or based on compelling evidence)
  • The plausibility (for me) arises from
    • there are relatively small studies and meta-analyses that suggest weight loss is associated with “beneficial” effects on blood pressure and lipids. However, the effects are variable. There seem to be differences between genders and differences between methods of weight loss. The effect size is generally smaller than in the LEGACY trial
    • there is evidence of cardiac structural changes: increased chamber size, wall thickness and abnormal diastolic function, and some studies suggest that the changes are reversible, perhaps with the most change in patients with diastolic dysfunction. I note perhaps the largest change detected with weight loss is reduction in epicardial fat. Some cardiac MRI studies (which have better resolution) have supported this
    • there is electrophysiological data suggesting differences in electrophysiological properties in patients with atrial fibrillation related to obesity
  • What concerned me about the paper was the apparent homogeneity of this particular population that seemed to allow the detection of such a strong and consistent relationship.  This seemed “too good to be true”.  I think it does not show the variability I would have expected:
    • gender
    • degree of diastolic dysfunction
    • smoking
    • what other changes during the period were measured?: medication, alcohol etc
    • treatment interaction: I find it difficult to work out who got ablated, how many attempts. Are the differences more related to successful ablations or other factors?
    • “blinding”: although the operator may have been blinded to patient category, patients with smaller BMI are easier to image and may have less “noisy measurements”. Are the real differences, therefore, smaller than suggested?
  • I accept that the authors used repeated measures ANOVA to account for paired/correlated nature of the testing.  However, I do not see the details of the model used.
  • I would have liked to see the differences rather than the means and SD as well as some graphical presentation of the data to see the variability as well as modeling of the relationship between weight loss and effect.
I guess I have not seen a paper where everything works out like you want.  I admit that I should have probably suppressed my disbelief (and waited for replication). What’s the down side? “We got the answer we all want”. “It fits with the general results of other work.” I still feel uneasy not at least asking some questions.
I think as a profession, we medical practitioners have been guilty of “p-hacking” and over-reacting to small studies with large effect sizes. We have spent too much time in “the garden of forking paths” and believe where we have got to after picking through the noise for every apparent signal that suits our preconceptions. We have wonderful large scale randomized clinical trials that seem to answer narrow but important questions and that is great. However, we still publish a lot of lower quality stuff and promulgate “p-hacking” and related methods to our trainees. I found the Smaldino and McElreath paper timely and instructive (I appreciate you have already seen it).
So, I sent you the email because I felt uneasy (perhaps guilty about my “p-hacking” sins of commission of the past and acceptance of such work of others).


25 thoughts on “I wanna be ablated”

  1. weight loss (>=10%, 3-9%, <3%)

    Why do medical researchers/journals insist on turning everything into categories like this? I have still never gotten an answer. Is it really as simple as I suspect: Because that’s a major p-hacking tool?

    • Anoneuoid: Perhaps because they are more familiar with anova than regression? Or perhaps they think the results presented in tables like these will be more understandable to the readers (who might include non-academic physicians as well as academics)?

      • How could a table of anova results be more understandable than a scatterplot of, eg, “% weightloss” vs “atrial fibrillation frequency”? If true, their mindset must be completely foreign to me. That would actually explain a lot about how difficult it is to communicate with them though.

      • I’m in a different field (human-computer interaction), but familiarity with ANOVA is an explanation that rings true to me. Analysis step 1 is bash your face against the data until you can run an ANOVA on it. When all you have is a hammer…

    • MDs love these arbitrary categories — I think partially because they see them in so many papers and professional guidelines, and also because it facilitates them making quick decisions for patient treatment as opposed to thinking more deeply about changes over a continuum. I’d like to think that it is not driven by p-hacking, though I know that happens too.

      I’ve had good success pushing back on this, pointing-out the arbitrariness of the categories, and helping them to interpret results on a continuum and communicate it to others. The important thing is to not just take the data they bring and analyze uncritically.

    • Anon, Carol, Clark:

      From the standpoint of statistical efficiency, it’s a lot better to use 3 categories than 2. David Park and I discuss this in this paper, “Splitting a predictor at the upper quarter or third and the lower quarter or third.”

      • My greater concern is arbitrary categorization boundaries. An argument can be made to split at (sometimes approximate) quartiles or tertiles for clarity, likewise sometimes there are informed reasons for particular boundaries. Alternatively, you can model on a continuum and use the model to estimate where on the continuum important differences occur.

    • In clinical settings, especially where rapid decisions are needed, simplified categories are easier to apply. That can save lives, since doctors are frequently overloaded with things they need to have memorized and other complex clinical guidelines already.

      Of course, this should be fixed – and Atul Gawande has written about using checklists to help solve this, in part. But until the problem is fixed, simple binary/categorical guidelines are not only ok, but I’d argue better than the more accurate analyses that statisticians would prefer.

    • Maybe because they think the relationships are likely to be non-linear and they have outliers in the weight-loss values that make splines non-starters.
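To make the efficiency point in this thread concrete, here is a minimal simulation sketch. It is not taken from the LEGACY paper or from the Gelman and Park paper; it assumes a linear relationship with normally distributed predictor and noise, and the function names and parameter values (`slope_from_split`, `simulate`, `n=150`, `true_slope=0.5`) are made up for illustration. It compares the sampling variance of three slope estimates: a median split, an upper-third versus lower-third comparison that discards the middle, and ordinary least squares on the continuous predictor.

```python
import random
import statistics


def slope_from_split(xs, ys, groups):
    """Slope estimate from comparing the top and bottom 1/groups of the data, ordered by x."""
    pairs = sorted(zip(xs, ys))          # order observations by the predictor
    k = len(pairs) // groups             # size of each extreme group
    low, high = pairs[:k], pairs[-k:]
    dy = statistics.fmean([y for _, y in high]) - statistics.fmean([y for _, y in low])
    dx = statistics.fmean([x for x, _ in high]) - statistics.fmean([x for x, _ in low])
    return dy / dx


def slope_ols(xs, ys):
    """Ordinary least-squares slope using the continuous predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=150, sims=2000, true_slope=0.5, seed=1):
    """Return the sampling variance of each slope estimator over repeated fake datasets."""
    rng = random.Random(seed)
    estimates = {"median split": [], "upper vs lower third": [], "regression": []}
    for _ in range(sims):
        xs = [rng.gauss(0, 1) for _ in range(n)]               # assumed normal predictor
        ys = [true_slope * x + rng.gauss(0, 1) for x in xs]    # assumed linear model
        estimates["median split"].append(slope_from_split(xs, ys, 2))
        estimates["upper vs lower third"].append(slope_from_split(xs, ys, 3))
        estimates["regression"].append(slope_ols(xs, ys))
    return {name: statistics.variance(vals) for name, vals in estimates.items()}


variances = simulate()
for name, var in variances.items():
    print(f"{name}: sampling variance of slope estimate = {var:.5f}")
```

Under these assumptions the thirds comparison should come out less noisy than the median split, with regression on the continuous predictor less noisy still. Outliers or strong nonlinearity, which one commenter raises, could change the picture.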

  2. I’d like it to be right, as would the cardiologists. There tends to be less critical scrutiny of results we’d like to believe are correct.

  3. Those increases in “global well-being” seem enormous. Pre and post means on a scale of 1 to 10 for the three groups:
    2.7 -> 8.1; 2.4 -> 6.1; 2.5 -> 5.7.

    • Just looking at the >10% group.

      They went from mean AF frequency 7 to AF frequency 3 (this is a scale of 1-10) and a duration of 7 to 4.

      A change of that magnitude will make a big difference in one’s quality of life!

      Mean global well being went from 2.7 to 8.1—a delta of 5.4 on a scale with maximum delta of 9.

      My global well being jumped about that much when I had successful cardiac ablation for afib.


  4. Aside from any multiple testing concerns, the paper uses some pretty causal language to describe the correlations estimated, and this issue (omitted variables bias / confounding) really doesn’t get any play in the limitations section. I am not the subject matter expert to know whether the factors that lead to weight loss might be independently associated with the outcomes they are interested in, but it seems plausible.

    • The causal language in the article seems pretty aggressive. It claims or implicitly assumes one-way causation from weight-loss to 22 measures of health including Atrial Fibrillation (AF). It looks like their Table 2 could be restructured to make any of the 22 health measures the independent variable and to “show” that any of the 22 health measures “causes” improvements in the other health measures.

      My intuition (unsupported by any medical training) is that AF improvement is the primary driver of weight loss, i.e., that causation runs in the opposite direction to that assumed by the authors. It seems reasonable that when heart problems go away, subjects tend to lose weight because they are more active, they eat better, and their metabolism is healthier than when they were sick.

      Particularly noteworthy is that ALL health measures, including AF symptoms, improved for all the weight loss groups, including the group that lost little or no weight: Global Well-Being went from 2.5 to 5.7 for the <3% Weight-Loss Group. This suggests there is substantial mean reversion in AF symptoms that is independent of weight loss.

      A less aggressive description of the results would be pretty boring, though. It might be something like this:
      “We studied 355 subjects with Atrial Fibrillation symptoms over a period of 4 years. On average, subjects’ health improved markedly on 22 measures of health over the 4 year period. Health measure included …. Subjects’ health improvement was positively correlated with weight loss, although health improved substantially for all weight-loss groups, even for the group that lost little or no weight.”

      The statistical results are also astonishingly strong. Every one of the 22 p-values in the rightmost column of Table 2 is <0.001. It is particularly startling that each of the 4 measures of cholesterol changes significantly, and in the “better” direction. Further, every single one of the 22 health measures (except one) shows monotonic improvement across the 3 weight loss groups.
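The mean-reversion point in this comment can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a model of the LEGACY data: each hypothetical patient has a stable long-run symptom level, every measurement adds independent noise, and patients enter the cohort only when their first score is at or above a threshold. The function name `simulate_mean_reversion` and all numbers are illustrative.

```python
import random
import statistics


def simulate_mean_reversion(n=20000, threshold=6.0, seed=2):
    """Mean baseline and follow-up scores for patients selected on a high baseline score."""
    rng = random.Random(seed)
    baseline, followup = [], []
    for _ in range(n):
        usual = rng.gauss(5, 1)              # hypothetical long-run symptom level
        first = usual + rng.gauss(0, 1.5)    # noisy score at enrollment
        second = usual + rng.gauss(0, 1.5)   # noisy score later, with NO treatment effect
        if first >= threshold:               # enrolled only when symptoms look bad
            baseline.append(first)
            followup.append(second)
    return statistics.fmean(baseline), statistics.fmean(followup)


mean_baseline, mean_followup = simulate_mean_reversion()
print(f"mean symptom score at enrollment: {mean_baseline:.2f}")
print(f"mean symptom score at follow-up:  {mean_followup:.2f}")
```

Even though nothing was done to these simulated patients, the follow-up mean is lower than the enrollment mean, simply because they were selected at a bad moment. How much of the real improvement in the <3% group reflects this mechanism is not something the sketch can say.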

  5. So the major confounder is that the type of people that can lose weight easily are also the type of people who are generally healthier and who will have better health outcomes than those people who can not lose weight easily. No surprise there at all. Also, no causal relationship between this specific intervention and the outcome of better health.
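The confounding story in this comment can be sketched as a simulation. The assumptions are entirely made up for illustration: an unobserved "general health" factor drives both percent weight loss and symptom improvement, and weight loss has no effect of its own on the outcome. The coefficients and the function name `simulate_confounding` are hypothetical; only the three group cutoffs come from the study description.

```python
import random
import statistics


def simulate_confounding(n=3000, seed=3):
    """Mean improvement by weight-loss group when weight loss has no causal effect."""
    rng = random.Random(seed)
    groups = {">=10%": [], "3-9%": [], "<3%": []}
    for _ in range(n):
        health = rng.gauss(0, 1)                            # hypothetical unobserved factor
        weight_loss = 6 + 4 * health + rng.gauss(0, 3)      # percent lost; driven by health
        improvement = 3 + 1.5 * health + rng.gauss(0, 1)    # driven by health, NOT by weight loss
        if weight_loss >= 10:
            key = ">=10%"
        elif weight_loss >= 3:
            key = "3-9%"    # continuous values from 3 up to (not including) 10
        else:
            key = "<3%"
        groups[key].append(improvement)
    return {key: statistics.fmean(vals) for key, vals in groups.items()}


group_means = simulate_confounding()
for key, mean in group_means.items():
    print(f"weight loss {key}: mean improvement = {mean:.2f}")
```

The simulated groups show a clean monotonic pattern, with every group improving on average, even though the outcome equation never uses weight loss. This does not show that the paper's association is spurious, only that the pattern alone cannot distinguish the two explanations.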

  6. I liked the study or, rather, it fit with my prior that Afib tends to be a progressive condition, like other heart ills, and that it will tend to worsen without intervention (and sometimes despite that) and among the most sensible interventions are generally diet and exercise. I particularly noted the section on cardiac improvements; meaningful weight loss reduced ventricle volume and reduced the thickening of the interior heart wall, which are both very good things. The correlative model is fairly compelling and I think its case is made stronger by the relative good effects tending to disappear and even to reverse as weight remained relatively stable (especially when fluctuation is involved) because this suggests there is an underlying condition which manifests as Afib AND that maintaining or fluctuating around typical weight may allow the condition to progress. Beyond that, I’d have a million questions: does reason for BMI matter (the old Arnold is obese quandary)? Is exercise without weight loss better? Worse? How much does fluctuation matter – and does it only matter when it …? And so on. And if you are fit, does weight loss make the condition better or worse or does it not matter? And again exercise is …?

    As for me, I want a condition in which the prescription is to relax and enjoy myself and then see whether that motivates me to improve my condition. Instead, I get this bleep.

  7. I have no idea what my BMI is, so I just entered my height and weight into one of those online calculators. I was quite surprised to see that BMI is based on height and weight only. Why is sex not considered? Or age??

    • BMI stands for Body Mass Index. It is the ratio of body mass to the square of height (weight in kilograms divided by height in meters squared). It is intended as a rough measure of degree of obesity or underweight. Medical decisions also need to consider other relevant factors, depending on the purpose. These factors might include sex, age, and body type (e.g., fat distribution, muscle mass).

    • BMI is a “first cut” variable based on height and weight measures that can be easily obtained in a doctor’s office or asked on a survey. They are even on my driver’s license. The limitations are well known (e.g. Michael Jordan was not really obese in his playing days, just very well muscled). But ease and reliability of measurement (where weight is not self-reported) make the measure useful.
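For readers who, like the commenter above, have only seen BMI through an online calculator, the standard definition is weight in kilograms divided by the square of height in meters. A minimal sketch follows; the function name `bmi` and the example inputs are arbitrary, not anyone's real measurements.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Example with arbitrary inputs: 70 kg at 1.75 m
example_bmi = bmi(70, 1.75)
print(f"BMI = {example_bmi:.1f}")
```

Sex, age, and body composition do not enter the formula at all, which is why it works as a quick first cut and why it misclassifies very muscular people.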
