He’s skeptical about Neuroskeptic’s skepticism

Jim Delaney writes:

Through a link in the weekend reads on Retraction Watch, I read Neuroskeptic’s post-publication peer review of a study on an antidepressant application of the drug armodafinil.

Neuroskeptic’s main criticism is that he/she feels that a “conclusion” in the abstract is misleading, “… Adjunctive armodafinil 150 mg/day reduced depressive symptoms associated with bipolar disorder to a greater extent than adjunctive placebo although the difference failed to reach statistical significance…”

Neuroskeptic says, “The mean reduction in symptoms was indeed slightly higher in the armodafinil group than in the placebo group, but in the absence of statistical significance, the difference of means is meaningless (no pun intended).”

There is some discussion in the comments suggesting that others feel a more complete decision analysis should be performed here, rather than relying on a decision based exclusively on the “loss function” induced by alpha=.05, beta=?. I agree.

I am less concerned with the conclusion written in the abstract. However, Neuroskeptic’s second criticism, that the same language should be applied to the claim about the drug being “well tolerated,” is quite valid.

My reply:

I have not actually followed the link (let alone read the research article being discussed), but I think I can give some general comments. First, of course you’re right that there’s nothing special about p=.05. The point is often put this way: what if p=.06 or p=.04? But it’s not even that. A result could be completely non-significant (for example, an estimate of 1 with a standard error of 1), but that doesn’t mean it’s zero! Indeed, in lots of examples there’s more available information about effect sizes from prior knowledge than from the study at hand. I’m guessing that Neuroskeptic is worried about cheating. But from a scientific point of view, sure, you report what you found.
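
To put numbers on that hypothetical (the estimate of 1 with standard error 1 is from the text; the rest is just standard normal-approximation arithmetic), here is a minimal sketch in Python:

```python
# Non-significance is not evidence of a zero effect: an estimate of 1
# with standard error 1 gives p ~ 0.32, yet the 95% interval covers
# both zero and effects nearly three times the estimate.
from scipy import stats

est, se = 1.0, 1.0
z = est / se
p = 2 * stats.norm.sf(abs(z))              # two-sided p-value
lo, hi = est - 1.96 * se, est + 1.96 * se  # 95% confidence interval

print(f"z = {z:.2f}, p = {p:.2f}")         # z = 1.00, p = 0.32
print(f"95% CI: ({lo:.2f}, {hi:.2f})")     # (-0.96, 2.96)
```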

The larger point, though, is that unregularized estimates don’t tell us much. Statistical significance is a crude way to control the proliferation of experimental noise, but really the sort of claims encapsulated in confidence intervals and p-values can only be interpreted in a larger context. Call that a hierarchical model or prior information, whatever. There’s nothing rigorous about looking for shiny objects that happen to be statistically significant.
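
As a toy illustration of what regularization does here (the prior scale of 0.2 is a hypothetical stand-in for that larger context, not anything from the paper), consider partial pooling of a single noisy estimate:

```python
# A normal prior on the effect pulls a noisy, "just significant"
# unregularized estimate toward zero; the shrunken estimate is no
# longer a shiny object.
import numpy as np

def posterior(y, se, mu0=0.0, tau=0.2):
    """Posterior mean and sd: normal likelihood, normal prior."""
    precision = 1 / se**2 + 1 / tau**2
    mean = (y / se**2 + mu0 / tau**2) / precision
    return mean, np.sqrt(1 / precision)

mean, sd = posterior(y=0.5, se=0.25)          # z = 2, "significant" unregularized
print(f"posterior: {mean:.2f} +/- {sd:.2f}")  # 0.20 +/- 0.16
```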

11 thoughts on “He’s skeptical about Neuroskeptic’s skepticism”

  1. Can someone help me understand this statement?

    “… but really the sort of claims encapsulated in confidence intervals and p-values can only be interpreted in a larger context.”

    • Here are two contexts:

      We got some undergrads who needed extra credit and asked them questions about their use of different types of toothpaste, and then measured the amount of RNA from each of 8000 different bacteria in a swab sample of saliva, and found a statistically significant difference in 35 different bacteria between those who use X and those who use Y.

      We recruited patients from 15 different dental offices and randomly assigned them to use either toothpaste X or toothpaste Y. After a month we measured the same 8000 different bacteria. Combining the quantities from all 75 of the known most damaging bacteria into a single score based on a weighted estimate of the damage each one is likely to cause, we found a statistically significant difference in means between group X and group Y on this score.

      Obviously, I’m making this all up, but the point is:

      case 1) The sample of people is biased compared to the population, there was multiple testing across 8000 different measurements (35 of which came up significant), and no scientific information went into the choice of measurements. (The simulation sketch below shows what that kind of multiple testing does.)

      case 2) The sample has a better chance of being representative of a general population, the groups were assigned via RNG, and scientific information went into choosing what to measure and how to measure it. A SINGLE test was done.
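
      Here is a quick simulation of the case-1 problem (the group size of 30 per group is made up for illustration; the 8000 measurements are from the example above): pure noise alone produces hundreds of “significant” bacteria.

      ```python
      # 8000 t-tests on pure noise: with no true differences at all,
      # roughly 5% of the tests (about 400) come out "significant" at .05.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n_per_group, n_bacteria = 30, 8000
      x = rng.normal(size=(n_bacteria, n_per_group))  # toothpaste X group
      y = rng.normal(size=(n_bacteria, n_per_group))  # toothpaste Y, same distribution

      _, p = stats.ttest_ind(x, y, axis=1)
      print((p < 0.05).sum())  # roughly 0.05 * 8000 = 400 false positives
      ```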

  2. Not the least informative paragraph of Neuroskeptic’s piece:

    “Reaching the end of the paper, it’s no surprise to learn that it was sponsored by and funded by Teva Pharmaceuticals, who sell armodafinil, and that it was written with the ‘assistance of’ a medical writing firm, and that ‘Teva provided a full review of the article.’”

    And yes, p=0.24.

  3. Neuroskeptic’s post actually mostly agrees with yours. Neuroskeptic brings up the (very salient) fact that there were *lots* of non-significant differences that the authors chose not to point out (at least, not in the abstract).

    The best example is the fact that one patient in the experimental group died, while none in the control group did. But the authors chose not to mention the “higher risk of death” because it clearly didn’t reach the threshold of significance. Likewise, in the long list of side effects that the researchers observed (differences between the control and experimental group), they only pointed out a couple with significant effects (nausea and headaches, I believe).
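
    For a sense of how uninformative a single death is here (the arm sizes of 200 below are hypothetical, not taken from the paper), a Fisher exact test on 1 event vs. 0:

    ```python
    # One death in one arm and none in the other is nowhere near
    # significant: for 1/200 vs 0/200 the two-sided p-value is 1.0.
    from scipy import stats

    table = [[1, 199],   # drug arm: 1 death, 199 survivors (hypothetical n)
             [0, 200]]   # placebo arm: 0 deaths (hypothetical n)
    odds_ratio, p = stats.fisher_exact(table)
    print(f"p = {p:.2f}")  # p = 1.00
    ```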

    Basically, it’s not as if they were simply holistically presenting all of their experimental results. They were choosing to elevate certain results (that made the drug look more effective) while ignoring others.

  4. Hi Jon

    “Basically, it’s not as if they were simply holistically presenting all of their experimental results. They were choosing to elevate certain results (that made the drug look more effective) while ignoring others.”

    I don’t think the elevation of the depression analysis was necessarily a function of selective reporting bias; rather – as far as I can tell – depression was the primary outcome, and hence formed the basis for the power calculation etc. So they would need to report it. Also note the authors reported in the abstract at least one non-significant adverse effect (headaches; 16% in the 150 mg group vs 13% with placebo). Clearly, they report this in the abstract because it was the most frequent such AE in the drug group – so again I don’t think selective reporting bias is the issue.

    I think we should be much more concerned about the authors’ failure to report the effect size. In the case of the effect of the drug on depression, it was trivial (d = -0.13, 0.05 to -0.32). Given this ES, and given the almost exact rates of response (see paper), I think this trial is pretty close to demonstrating equivalence of the drug and placebo… I presume the trial was powered to detect effect sizes of around 0.30, so even the sponsors probably thought 0.13 was not worth detecting (see the rough power check below). So I agree with Neuroskeptic that the abstract is highly misleading, albeit for different reasons.
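
    A rough check of that powering assumption (the d = 0.30 target, alpha = .05, and 80% power are assumptions, not figures from the paper):

    ```python
    # Sample size per group needed to detect a standardized effect d in a
    # two-sample t-test: detecting d = 0.13 takes roughly five times the
    # subjects needed for d = 0.30.
    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower()
    for d in (0.30, 0.13):
        n = power.solve_power(effect_size=d, alpha=0.05, power=0.8)
        print(f"d = {d:.2f}: ~{n:.0f} per group")
    # d = 0.30: ~175 per group
    # d = 0.13: ~930 per group
    ```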

    P

  5. Actually, the conclusion of the abstract of the paper is so completely misleading that I’m going to use this article to teach my students about effect sizes, bias, and spin. There’s no evidence at all from this study that the drug had any effect on depression in bipolar disorder, yet the impression given by the abstract is entirely different… I’ve not seen something quite as brazen as this before!

  6. “Adjunctive armodafinil 150 mg/day reduced depressive symptoms associated with bipolar disorder to a greater extent than adjunctive placebo”

    Everyone always misses the real problem: the difference cannot be automatically attributed to the treatment (adjunctive armodafinil). The logic being used is that because the null hypothesis is false (even though it apparently wasn’t deemed so here, but imagine if it was), the research hypothesis is true. There is no valid logical step from one to the other; it is pure fallacy.

    Were there baseline differences? Dropouts? This could result in a difference between groups. What was actually measured, was it a survey? Is the only explanation for that certain pattern of answers “depressive symptoms”? How were these associated with bipolar disorder?

    The question we want answered is: “How did the two groups differ, and what is the best explanation for these differences?” Another thing: you are probably going to want more than a measurement at a single timepoint to answer this.

  7. There’s another criticism of this study that I did not see in the referenced review. One symptom of depression is fatigue. (footnote omitted) Armodafinil (Nuvigil) is a wakefulness agent. (see below) Many depression inventory questions touch on fatigue.

    The abstract stated that depression was measured using the IDS-C30 scale. If I found the right document on the net, it includes questions like:

    How has your energy been this past week?
    IF LOW ENERGY: Have you felt tired? (How much of the time? How bad has it been?)

    and

    During the past week, have you had feelings of being weighted down, like you had lead weights on your arms and legs? How many days? How much of the time? Do these symptoms interfere with your day-to-day activities?

    One would expect a wakefulness agent such as armodafinil or black coffee to show some anti-depression effects, given the nature of these measures of depression.

    http://www.nuvigil.com/PDF/Full_Prescribing_Information.pdf
    It is approved for the treatment of narcolepsy, obstructive sleep apnea, and shift work disorder.
