Can talk therapy halve the rate of cancer recurrence? How to think about the statistical significance of this finding? Is it just another example of the garden of forking paths?

James Coyne (who we last encountered in the sad story of Ellen Langer) writes:

I’m writing to you now about another matter about which I hope you will offer an opinion. Here is a critique of a study, as well as the original study that claimed to find an effect of group psychotherapy on time to recurrence and survival of early breast cancer patients. In the critique I note that confidence intervals for the odds ratio of raw events for both death and recurrence have P values between .3 and .5. The authors’ claims are based on dubious adjusted analyses. I’ve tried for a number of years to get the data for reanalysis, but the latest effort ended in the compliance officer for Ohio State University pleading that the data were the investigator’s intellectual property. The response apparently written by the investigator invoked you as a rationale for her analytic decisions. I wonder if you could comment on that.

Here is the author’s invoking of you:

In analyzing the data and writing the manuscript, Andersen et al. (2008) were fully aware of opinions and data regarding the use of covariates. See, for example, a recent discussion (2011) among investigators about this issue and the response of Andrew Gelman, an expert on applied Bayesian data analysis and hierarchical models. Gelman (2011) provided positive recommendations for covariate inclusion, which are corroborated by studies examining covariate selection and entry that appeared prior to, and now following, Gelman’s statement in 2011.

Here’s what Coyne sent me:

“Psychologic Intervention Improves Survival for Breast Cancer Patients: A Randomized Clinical Trial,” a 2008 article by Barbara Andersen, Hae-Chung Yang, William Farrar, Deanna Golden-Kreutz, Charles Emery, Lisa Thornton, Donn Young, and William Carson, which reported that a talk-therapy intervention reduced the risk of breast cancer recurrence and death from breast cancer, with a hazard ratio of approximately 0.5 (that is, the instantaneous risk of recurrence, or of death, at any point was claimed to have been reduced by half).

“Finding What Is Not There: Unwarranted Claims of an Effect of Psychosocial Intervention on Recurrence and Survival,” a 2009 article by Michael Stefanek, Steven Palmer, Brett Thombs, and James Coyne, arguing that the claims in the aforementioned article were implausible on substantive grounds and could be explained by a combination of chance variation and opportunistic statistical analysis.

A report from Ohio State University ruling that Barbara Andersen, the lead researcher on the controversial study, was not required to share her raw data with Stefanek et al., as they had requested so they could perform an independent analysis.

I took a look and replied to Coyne as follows:

1. I noticed this bit in the Ohio State report:

“The data, if disclosed, would reveal pending research ideas and techniques. Consequently, the release of such information would put those using such data for research purposes in a substantial competitive disadvantage as competitors and researchers would have access to the unpublished intellectual property of the University and its faculty and students.”

I see what they’re saying but it still seems a bit creepy to me. Think of it from the point of view of the funders of the study, or the taxpayers, or the tuition-paying students. I can’t imagine that they all care so much about the competitive position of the university (or, as they put it, the “University”).

Also, given that the article was published in 2008, how could the data “reveal pending research ideas and techniques” in 2014? I mean, sure, my research goes slowly too, but . . . 6 years???

I read the report you sent me, which has quotes from your comments along with the author’s responses. It looks like the committee did not make a judgment on this? They just seemed to report what you wrote, and what the authors wrote, without comments.

Regarding the more general points about preregistration, I have mixed feelings. On one hand, I agree that, because of the garden of forking paths, it’s hard to know what to make of the p-values that come out of a study that had flexible rules on data collection, multiple endpoints, and the like. On the other hand, I’ve never done a preregistered study myself. So I do feel that if a non-preregistered study is analyzed _appropriately_, it should be possible to get useful inferences. For example, if there are multiple endpoints, it’s appropriate to analyze all the endpoints, not to just pick one. When a study has a data-dependent stopping rule, the information used in the stopping rule should be included in the analysis. And so on.
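To see why picking one endpoint out of several distorts the p-value, here is a minimal sketch. It assumes k endpoints that are statistically independent and all truly null (so each p-value is uniform on 0 to 1), and it reports only the smallest p-value. Real endpoints such as recurrence and death are correlated, so the actual inflation would be smaller than this, but still above the nominal 5%. The function names are illustrative, not from any study.

```python
import random


def analytic_false_positive_rate(k, alpha=0.05):
    """Chance that at least one of k independent null endpoints has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulated_false_positive_rate(k, alpha=0.05, n_sims=100_000, seed=1):
    """Monte Carlo check: under the null, each endpoint's p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # "Pick the best endpoint": keep only the smallest of the k p-values.
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / n_sims


for k in (1, 2, 4, 8):
    print(k, round(analytic_false_positive_rate(k), 3),
          round(simulated_false_positive_rate(k), 3))
```

With four endpoints the nominal 5% error rate becomes roughly 18.5%, which is one concrete way a flexible analysis can manufacture "significance."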

On a more specific point, you argue that the study in question used a power analysis that was too optimistic. You perhaps won’t be surprised to hear that I am inclined to believe you on that, given that all the incentives go in the direction of making optimistic assumptions about treatment effects. Looking at the details: “The trial was powered to detect a doubling of time to an endpoint . . . cancer recurrences.” Then in the report when they defend the power analysis, they talk about survival rates but I don’t see anything about time to an endpoint. They then retreat to a retrospective justification, that “we conducted the power analysis based on the best available data sources of the early 1990’s, and multiple funding agencies (DoD, NIH, ACS) evaluated and approved the validity of our study proposal and, most importantly, the power analysis for the trial.” So their defense here is ultimately procedural rather than substantive: Maybe their assumptions were too optimistic, but everyone was optimistic back then. This doesn’t much address the statistical concerns but it is relevant to implications of ethical malfeasance.
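For a sense of how optimistic "a doubling of time to an endpoint" is, here is a rough sketch using the standard Schoenfeld approximation for the number of events a log-rank comparison needs. The inputs are assumptions, not the trial's actual power calculation (which the post does not reproduce): constant hazards (so doubling the median time means a hazard ratio of 0.5), 1:1 allocation, two-sided alpha of 0.05, and 80% power.

```python
import math
from statistics import NormalDist


def median_time(hazard):
    """Median of an exponential time-to-event distribution with constant hazard."""
    return math.log(2) / hazard


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events for a two-sided log-rank test.

    allocation is the fraction randomized to treatment (0.5 means 1:1).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    log_hr = math.log(hazard_ratio)
    return (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * log_hr ** 2)


# Under constant hazards, halving the hazard doubles the median time to recurrence.
print(median_time(0.05) / median_time(0.10))

for hr in (0.5, 0.67, 0.75):
    print(f"HR {hr}: about {math.ceil(schoenfeld_events(hr))} events needed")
```

Under these assumptions a hazard ratio of 0.5 needs about 66 events, while a more modest 0.75 needs several hundred. For comparison, the paper reports 62 recurrences, so the design only had a reasonable chance of detecting very large effects.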

Regarding the reference to my work: Yes, I have recommended that, even in a randomized trial, it can make sense to control for relevant background variables. This is actually a continuing area of research in that I think that we should be using informative priors to stabilize these adjustments, to get something more reasonable than would be obtained by simple least squares. I do agree with you that it is appropriate to do an unadjusted analysis as well. Unfortunately researchers do not always realize this.
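As a toy illustration of how an informative prior stabilizes a noisy adjustment, here is a one-coefficient normal-normal update. This is a minimal sketch under stated assumptions, not the method from any particular paper: it treats the least-squares estimate as a normal likelihood with known standard error and combines it with a normal prior centered at zero. In a real regression the prior would apply to all adjustment coefficients jointly. The numbers are hypothetical.

```python
import math


def shrink_coefficient(estimate, std_error, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and sd for a coefficient given a normal likelihood
    (least-squares estimate with standard error) and a normal prior."""
    data_precision = 1 / std_error ** 2
    prior_precision = 1 / prior_sd ** 2
    post_precision = data_precision + prior_precision
    # Precision-weighted average of the data estimate and the prior mean.
    post_mean = (estimate * data_precision + prior_mean * prior_precision) / post_precision
    return post_mean, math.sqrt(1 / post_precision)


# Hypothetical: a noisy adjustment coefficient of 0.8 (SE 0.5), with a prior
# saying such coefficients are usually within about +/-0.5 of zero.
print(shrink_coefficient(0.8, 0.5, prior_sd=0.25))
```

The noisy estimate is pulled most of the way toward zero because the data are weak relative to the prior. That is the kind of stabilization that simple least squares does not provide.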

Regarding some of the details of the regression analysis: the discussion brings up various rules and guidelines, but really it depends on context. I agree with the report that it can be ok for the number of adjustment variables to exceed 1/10 of the number of data points. There’s also some discussion of backward elimination of predictors. I agree with you that this is in general a bad idea (and certainly the goal in such a setting should not be “to reach a parsimonious model” as claimed in this report). However, practical adjustment can involve adding and removing variables, and this can sometimes take the form of backward elimination. So it’s hard to say what’s right, just from this discussion. I went into the paper and they wrote, “By using a backward elimination procedure, any covariates with P < .25 with an endpoint remained in the final model for that endpoint.” This indeed is poor practice; regrettably, it may well be standard practice.

2. Now I was curious, so I read all of the 2008 paper. I was surprised to hear that psychological intervention improves survival for breast cancer patients. It says that the intervention will “alter health behaviors, and maintain adherence to cancer treatment and care.” Sure, ok, but, still, it’s pretty hard to imagine that this will double the average time to recurrence. Doubling is a lot! Later in the paper they mention “smoking cessation” as one of the goals of the treatment. I assume that smoking cessation would reduce recurrence rates. But I don’t see any data on smoking in the paper, so I don’t know what to do with this. I’m also puzzled because, in their response to your comments, the author or authors say that time-to-recurrence was the unambiguous primary endpoint, but in the abstract they don’t say anything about time-to-recurrence, instead giving proportions of recurrence and survival rates conditional on the time period of the study. Also, the title says Survival, not Time to Recurrence.

The estimated effect sizes (an approx 50% reduction in recurrence and 50% reduction in death) are implausibly large, but of course this is what you get from the statistical significance filter. Given the size of the study, the reported effects would have to be just about this large, else they wouldn’t be statistically significant.
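The significance-filter point can be made concrete with a back-of-the-envelope calculation. This sketch assumes the Schoenfeld approximation SE(log HR) ≈ 2/√D for D events with 1:1 allocation, and uses the 62 recurrences quoted from the paper. The post does not give the number of deaths, so only recurrence is shown. It computes the largest hazard ratio that could still reach p < 0.05.

```python
import math
from statistics import NormalDist


def smallest_significant_effect(n_events, alpha=0.05):
    """Largest hazard ratio (< 1) that is just significant at level alpha,
    using SE(log HR) ~= 2 / sqrt(n_events), which assumes 1:1 allocation."""
    se_log_hr = 2 / math.sqrt(n_events)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(-z_crit * se_log_hr)


hr = smallest_significant_effect(62)
print(f"A significant estimate needs HR <= {hr:.2f}, i.e. at least a {1 - hr:.0%} reduction")
```

With about 62 events, any statistically significant estimate must show roughly a 40% reduction or more. So a published, significant result from a study this size will look dramatic almost by construction.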

OK, now to the results: “With 11 years median follow-up, disease recurrence had occurred for 62 of 212 (29%) women, 29 in the Intervention arm and 33 in the Assessment–only arm.” Ummm, that’s 29/114 = 25% in the intervention group and 33/113 = 29% in the control group, a difference of 4 percentage points. So I don’t see how they can get those dramatic results shown in figure 3. To put it another way, in their dataset, the probability of recurrence-free survival was 75/114 = 66% in the treatment group and 65/113 = 58% in the control group. (Or, if you exclude the people who dropped out of the study, 75/109 = 69% in treatment group and 65/103 = 63% in control group). A 6 or 8 percentage point difference ain’t nothing, but Figure 3 shows much bigger effects. OK, I see, Figure 3 is just showing survival for the first 5 years. But, if differences are so dramatic after 5 years and then reduce in the following years, that’s interesting too. Overall I’m baffled by the way in which this article goes back and forth between different time durations.
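The raw comparisons in the paragraph above are simple enough to check directly. This sketch just recomputes the percentages and percentage-point differences from the counts as quoted in the post. Note that, as a commenter points out, 114 + 113 = 227 rather than the 212 in the paper's sentence, so the denominators themselves are uncertain.

```python
def pct(events, n):
    """Percentage of n, e.g. pct(29, 114) for recurrences in the intervention arm."""
    return 100 * events / n


# (intervention, control) counts as quoted in the post
comparisons = {
    "recurrence": ((29, 114), (33, 113)),
    "recurrence-free survival": ((75, 114), (65, 113)),
    "recurrence-free survival, completers only": ((75, 109), (65, 103)),
}

for label, (treat, ctrl) in comparisons.items():
    t, c = pct(*treat), pct(*ctrl)
    print(f"{label}: {t:.1f}% vs {c:.1f}%, difference {t - c:+.1f} points")
```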

3. Now time to read your paper with Stefanek et al. Hmmm, at one point you write, “There were no differences in unadjusted rates of recurrence or survival between the intervention and control groups.” But there were such differences, no? The 4-percentage-point difference reported above? I agree that this difference is not statistically significant and can be explained by chance, but I wouldn’t call it “no difference.”
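To see how compatible the unadjusted recurrence counts are with chance, here is a rough check using the Wald interval on the log odds ratio, a standard large-sample approximation. It treats recurrence as a yes/no outcome, ignores timing, and uses the denominators as quoted in the post. It is a sanity check, not a reproduction of the analysis in Stefanek et al.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(events_1, n_1, events_2, n_2, alpha=0.05):
    """Odds ratio (group 1 vs group 2), Wald confidence interval, and two-sided p-value."""
    a, b = events_1, n_1 - events_1  # group 1: events, non-events
    c, d = events_2, n_2 - events_2  # group 2: events, non-events
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    interval = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    p_value = 2 * (1 - NormalDist().cdf(abs(log_or / se)))
    return math.exp(log_or), interval, p_value


# Recurrences: intervention 29 of 114, control 33 of 113 (counts as quoted above)
or_, (lo, hi), p = odds_ratio_wald(29, 114, 33, 113)
print(f"OR {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.2f}")
```

The interval comfortably includes 1. That is consistent with a real but small observed difference that chance alone could easily produce, which is different from "no difference."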

Overall, I am sympathetic with your critique, partly on general grounds and partly because, yes, there are lots of reasonable adjustments that could be done to these data. The authors of the article in question spend lots of time saying that the treatment and control groups are similar on their pre-treatment variables—but then it turns out that the adjustment for pre-treatment variables is necessary for their findings. This does seem like a “garden of forking paths” situation to me. And the response of the author or authors is, sadly, consistent with what I’ve seen in other settings: a high level of defensiveness coupled with a seeming lack of interest in doing anything better.

I am glad that it was possible for you to publish this critique. Sometimes it seems that this sort of criticism faces a high hurdle to reach publication.

I sent the above to Coyne, who added this:

For me it’s a matter of not only scientific integrity, but what we can reasonably tell cancer patients about what will extend their lives. They are vulnerable and predisposed to grab at anything they can, but also to feel responsible when their cancer progresses in the face of information that it should be controllable by positive thinking or by taking advantage of some psychological intervention. I happen to believe in support groups as an opportunity for cancer patients to find support and the rewards of offering support to others in the same predicament. If patients want those experiences, they should go to readily available support groups. However they should not go with the illusion that it is prolonging their life or that not going is shortening it.

I have done a rather extensive and thorough systematic review and analysis of the literature, and I can find no evidence that, in clinical trials in which survival was an a priori outcome, an advantage was found for psychological interventions.

23 thoughts on “Can talk therapy halve the rate of cancer recurrence? How to think about the statistical significance of this finding? Is it just another example of the garden of forking paths?”

  1. Wonder if there is a meta-comparison with the Bartels study you criticized (and then semi-apologized for doing so–I know you have to be kind to political scientists (unlike psychologists), but sometimes you need to stick to your guns)–intermediate effects that may be of some interest, but bottom line almost no effects, and an emphasis on the intermediate and an obscuring of the lack of substantive importance.

    • Numeric:

      1. As I’ve written, the Bartels experience was extremely frustrating to me (and I think to Larry as well), but, just to be fair: it wasn’t Bartels’s study; it was Bartels giving an over-the-top interpretation to a study conducted by others, that was consistent with Bartels’s earlier research.

      2. One advantage with the Bartels discussion was that the Erisen thesis was full of intermediate results—indeed, the focus was on intermediate results—so there was lots to look at. The cancer-therapy paper seems to have minimal detail about intermediate outcomes (as noted above, I found nothing about smoking, for example) so it all seems to be speculation.

  2. > A 6 or 8 percentage point difference ain’t nothing, but Figure 3 shows much bigger effects. OK, I see, Figure 3 is just showing survival for the first 5 years.

    What do you mean by “just . . . the first 5 years”? Figure 3 shows the “predicted” effect, which is much bigger than the “unadjusted” effect. That’s the point of the commentary:

    > Simply put, when straightforward analyses produce null effects, the inappropriate use of multivariate statistical analyses can ‘‘find’’ effects that do not really exist.

    A couple of other blog posts by Coyne about that study:

    • Thanks for these links. The first one is a great example of how many things can go wrong in a single study. Also, there are lots of good comments by readers as well as a good discussion by Coyne.

  3. It depends on the jurisdiction, but in many, the principal investigator controls access to their data and there is little that anyone can do.

    I have heard that sometimes, the FDA is only supplied study data after signing a non-disclosure agreement (I think because the study was done outside the US) – OK you get to see what a mess our study _recorded_ data is – but don’t ever think of letting anyone else know.

    Just because people’s lives are literally at stake – principal investigators still have to be protected.

    • Why do the Journals publish without the dataset being posted? The Journals seem to be in the best position to fix this.

      I’m not sure why they do not act.

      • Well, if you have carried out a complex clinical trial, and you have not only intervention and outcome data but many potential mediating variables, a given paper may be only part of the story. Some trials are published as a series of several papers. But posting the data online would make it very difficult, if not impossible, to go beyond paper #1.

        Now, you could argue that all of the results should be published simultaneously. And I would support that argument. But cost-of-paper conscious journal editors do not.

        That said, when we are talking about the release of data many years after the final publications have appeared, as in the original post, there really is no legitimate reason for the investigators to withhold the data. They may be within their legal rights to do so, but it is ethically dubious.

        • What about non-clinical-trial data? Journals in general seem to be loath to take a strong stand about “No data, No publication”.

          I’ve nothing against allowing exceptions where warranted but the default policy ought to require the submission of raw data.

        • Clyde:

          Making it publicly available seems misguided. In business, shareholders don’t want full access to all company records, nor do they want them made public; rather, a chartered accounting firm is asked to audit and make a report. In clinical research, perhaps best would be multiple potential third-party qualified researchers who can get full access and are facilitated in making reports (potentially with comments from original study authors) – but without the non-disclosure agreement.

          The problem currently is that no one can get past “just trust me” from authors unless they volunteer, which usually is not in their career interests. With the “just trust me” system it’s NOT science but speculation (hopefully not too misleading more than x% of the time).

        • You can do that but then why do such people publish in peer reviewed journals at all? I mean either you hide data or you don’t. But I think it doesn’t make sense to call it a peer reviewed paper and hide data simultaneously.

          The non-transparency of data is antithetical to the idea of peer review. If you want to retain some competitive advantage keep it a trade secret or start a start-up to exploit the commercial value of your data. Why go through the charade of a peer review & publication?

  4. Andrew:

    You write that you assume smoking cessation reduces recurrence.

    I was wondering if this is empirically true or even expected phenomenologically. I thought recurrence was a function of how well you could mop up the original cancer cells, via chemo, radiation, surgery, etc.
    Does recurrence depend significantly on the continued exposure to the agent (smoking, benzene etc. ) that may have precipitated the original cancer?

    Just curious.

    • Rahul:

      I have no expertise in this area. I just had the sense that smoking interferes with the body’s ability to fight cancer, along with a general impression that smoking interacts with other risk factors.

  5. The idea that the authors don’t have to reveal their data because so doing would place them at a competitive disadvantage is fine for private companies who don’t want others to use what they have found. For academics publishing in scholarly journals, it is antithetical to the scientific mission. I wrote about this (Journal of Money, Credit and Banking, 2008):

    The objective of an embargo is to permit the author to have sole use of the dataset that he collected. Even without an embargo, some authors will be hesitant to provide data, arguing that it infringes upon the author’s competitive advantage. McCullough and Vinod (2003) and Gill and Meier (2000) have noted that such articles cannot be relied upon at least until the embargo period ends, since the accuracy of such articles cannot be independently verified by other researchers. Not providing data or code, even if due to an embargo, is merely a method by which some journals permit the author to shift the cost of keeping the dataset to himself (delayed publication) onto the journal-reading public (in the form of articles whose accuracy cannot be assessed) with the added expense of retarding scientific progress. Additionally, if the data do have value, then the author will gain citations as others use his data.

    Footnote: Some journals permit authors to “embargo” originally collected data, i.e., the author does not have to make his data available to other researchers for some period of time, e.g., two years. By the time the article appears in print, the author should already have a second article submitted and be working on his third. If an author really wants to have the dataset all to himself, he can simply write all the articles he wants, and then submit them to journals simultaneously rather than seriatim; the cost to the author is that he has to wait to submit his articles.

  6. This underscores the importance of replication in advancing science. I was active in this general area from 1974 to 2013. A zillion good results came along. Duplicatable good results were less common. I should review all the articles in the Journal of Clinical Oncology from 1990 to 2000 to establish a baseline for the Bayesian prior to believing a study.

    • “I should review all the articles in the Journal of Clinical Oncology from 1990 to 2000 to establish a baseline for the Bayesian prior to believing a study.”

      Sounds like a good idea.

      • Just for fun, I looked at the January 1990 issue of the JCO. I got a good laugh because I remember the first article well. The study was done by Dr. Ron Chlebowski who always does good work! It is a study of hydrazine sulfate that discredits that compound. To my amusement that article was cited frequently by the alternative medicine community who must not have read it but cited it because their favored compound was mentioned in the title. It would fall into the category of well done research that gets cited to support the antithesis of the evidence presented. I talked to Dr. Chlebowski about this once, and he just rolled his eyes.

        • “It would fall into the category of well done research that gets cited to support the antithesis of the evidence presented”

          A sad commentary on the state of use of evidence.

        • In this case it is to be expected though: The alternative medicine community has hardly had any history of following the scientific method, in general.

          The fact that they abuse evidence doesn’t surprise me.

  7. Andrew: 29+33=62, but 113+114=227 (not 212). I bring this up because it seems like in your point 2, 29% is supposed to be a weighted average of 25% and, uh, 29%. Are all of the numbers up there correct?

    • Alex:

      I don’t remember the details now, but it could have something to do with the people who dropped out of the study. It was not clear the best way to count them in these comparisons.

  8. “However [patients] should not go with the illusion that it is prolonging their life or that not going is shortening it.”

    To me this sounds like a clear agenda. It is very different from mentioning a lack of evidence, which was presumably the implicit justification. It also contradicts all the recent research on the physiological power of the various placebo and nocebo effects and of stress. Beware that critics are themselves not immune to the garden of forking paths.

  9. Very informative piece, and the authors’ refusal to provide the data for re-analysis is disappointing. The enterprise of science involves doing what we can to provide a firm foundation upon which to build. There should be no “circling of the wagons” to prevent such open analysis and discussion; it is anathema to science and progress, and is of course damaging to patients. It may well be that the authors’ position is reasonable, and further analysis could at least determine how likely that is, but apparently this open process will continue to be blocked.
