Lady in the Mirror

In the context of a report from a drug study, Stephen Senn writes:

The bare facts they established are the following:

The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo. 

and the false conclusion they immediately asserted is the following

This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.

To understand the fallacy, look at the accompanying graph. This shows the simplest possible model describing events over time that is consistent with the ‘facts’. The model in question is the exponential distribution and what is shown is the cumulative probability of response for individuals suffering from tension headache depending on whether they are treated with placebo or paracetamol. The dashed vertical line is at the arbitrary International Headache Society critical time point of 2 hours. This intersects the placebo curve at 0.49 and the paracetamol curve at 0.59, exactly the figures quoted in the Cochrane review.

[Figure: headache graph]

The model that the diagram represents is simplistic and almost certainly false. It is what would apply if it were the case that all patients given placebo had the same probability over time of headache resolution and ditto for paracetamol and an exponential model applied. However, the point is that for all we know it is true. It would take careful measurement over time for repeated headaches of the same individuals to establish the element of personal response (Senn 2016).

The curve given for placebo is what we would expect to find for the simple exponential model if it were the case that mean time to response were 2.97 hours when a patient was given placebo. The curve for paracetamol has a mean of 2.24 hours. It is important to understand that this is perfectly compatible with this being the long term average response time (that is to say averaged over many many headaches) for every patient and this means that any patient at any time feeling the symptoms of headache could expect to shorten that headache by 2.97-2.24=0.73 hrs or just under 45 minutes.

Is this a benefit or not? I would say, ‘yes’. And that means that a perfectly logical way to describe the results is to say, ‘for all we know, any patient taking paracetamol for headache will benefit. The size of that benefit is an increase of the probability of resolution at 2 hours of 10 percent or a reduction of mean headache time of 3/4 of an hour’.

The latter, of course, depends on the exponential model being appropriate and it may be that some alternative can be found by careful analysis of the data. The point is, however, that the claim that only 10% will benefit by taking paracetamol is completely unjustified.
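As a rough check on the figures in Senn's example, the R sketch below backs out the two exponential means from the reported proportions at 2 hours. It assumes only the model described above, a cumulative response probability of 1 - exp(-t / mean); the function and variable names are illustrative and do not come from Senn's post.

```r
# Back out the mean of an exponential time-to-response model from the
# proportion responding by a cutoff time: p = 1 - exp(-cutoff / mean).
cutoff <- 2                # hours (the IHS time point)
p_placebo <- 0.49
p_paracetamol <- 0.59

mean_from_prop <- function(p, t) -t / log(1 - p)

mean_placebo <- mean_from_prop(p_placebo, cutoff)
mean_paracetamol <- mean_from_prop(p_paracetamol, cutoff)
shortening_hours <- mean_placebo - mean_paracetamol

round(c(placebo = mean_placebo,
        paracetamol = mean_paracetamol,
        shortening = shortening_hours), 2)

# Sanity check: the implied curves pass through the reported proportions.
pexp(cutoff, rate = 1 / mean_placebo)
pexp(cutoff, rate = 1 / mean_paracetamol)
```

The rounded means agree with the 2.97 and 2.24 hours quoted above, and their difference with the 0.73 hours.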

Senn summarizes:

Unfortunately, the combination of arbitrary dichotomies (Senn 2003) and naïve analysis continues to fuel misunderstandings regarding personalised medicine.

Interesting. Similar issues arise in the interpretation of fitted regression models in social science.

34 thoughts on “Lady in the Mirror”

  1. The definition of “benefit” keeps changing though.

    1) It was originally [reporting] “being pain free two hours after taking a medicine”.
    2) The Cochrane study apparently changed this to “[reporting] being pain free or having only mild pain at two hours”.
    3) For Senn it is something like “[whether] any patient at any time feeling the symptoms of headache could expect to shorten that headache”.

    I couldn’t figure out where the 2 hour choice is coming from. I’d guess someone just used it one time because people didn’t want to sit around more than 2 hrs:

    1.3.2 Percentage of patients pain-free at 2 h

    Recommendations: The percentage of study participants whose headache pain score is zero at 2 h (pain freedom at 2 h), before any rescue medication should usually be the primary measure of efficacy and is recommended for both migraine without aura and migraine with aura RCTs.

    Comments: Freedom from pain before use of rescue medication is simple, clinically relevant, reflects patients’ expectations (70,71), and is independent of the potential effect of other interfering therapies (e.g. rescue medication). It may be argued that some medications have a slow time to maximum (tmax) or time to effective (teff) plasma concentration and therefore an expectation of pain resolution within 2 h seems unrealistic. This is counterbalanced by the ethical argument that participants in clinical research should not be subjected to undue harm (the principle of non-maleficence) and the availability of effective drugs should not be delayed, that is, past 2 h.

    Pain freedom at 2 h is one primary efficacy outcome measure, but it is not the only one. Pain freedom at a time point earlier than 2 h should be considered for parenteral (e.g. intravenous, intramuscular, subcutaneous) test drugs.

    The primary efficacy outcome of pain relief at 2 h (headache response, that is, improvement of headache pain from moderate to severe at baseline to mild or none at 2 h) has been used extensively in several acute migraine RCTs (2–4,7,8), partly based on clinical experience suggesting that a patient perceives ‘cure’ while some residual headache may persist (8), and also because headache relief is statistically more powerful than the IHS recommended criterion (72). The validity of the clinical argument has been challenged severely as studies have shown that patients (a) do not consider it success to have a reduction in headache pain from moderate to mild (73), and (b) expect freedom from pain when treated (70,71).

    Also, headache response assumes that the magnitude of change from severe pain to no pain is clinically equivalent to that of a change from moderate pain to mild pain, which is false (74). Finally, the ordinal pain scale of severe (pain score = 3), moderate (score = 2), mild (score = 1) and none (score = 0) assumes that pain severity is an interval variable, that is, that there is equivalence between score intervals. This assumption is not clinically validated.

    International Headache Society Clinical Trials Subcommittee. Guidelines for
    controlled trials of drugs in migraine: third edition. A guide for investigators.
    Cephalalgia 2012;32:6-38. http://journals.sagepub.com/doi/full/10.1177/0333102411417901

    Also, keep in mind “pain” isn’t being measured directly: https://xkcd.com/883/

    • The fallacy still stands even if the original definition of the outcome remains. The authors claim that “only 10%” benefit from this treatment. This is not justified; perhaps everyone benefited but only net 10% of people are pain free within 2hrs when assigned to paracetamol. And there are infinitely many other explanations for the observed 10% difference – perhaps, say, 50% benefit and 40% are harmed, in amounts that make the net come out to a 10% benefit for being pain free at 2hrs.

      There just isn’t information available in the review to make a claim that “only 10%” benefit, yet the review’s authors make it anyway. The authors either don’t understand what the data tell them, or have communicated it so badly that they mislead most lay readers. Neither is very good.
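      To make the underdetermination concrete, here is a minimal R sketch. The three scenarios and their numbers are invented for illustration and are not from the review. Each assumes a different share of patients helped or harmed on the 2-hour outcome, yet all of them reproduce the same 49% versus 59% margins.

      ```r
      # Hypothetical shares of four patient types, defined by whether each patient
      # would be pain free at 2 h under placebo and under paracetamol.
      scenarios <- data.frame(
        scenario = c("10% helped, none harmed",
                     "30% helped, 20% harmed",
                     "50% helped, 40% harmed"),
        helped   = c(0.10, 0.30, 0.50),  # pain free only on paracetamol
        harmed   = c(0.00, 0.20, 0.40),  # pain free only on placebo
        always   = c(0.49, 0.29, 0.09),  # pain free on either
        never    = c(0.41, 0.21, 0.01)   # pain free on neither
      )

      # Observable margins: all that a parallel-group trial can estimate.
      scenarios$placebo     <- scenarios$always + scenarios$harmed
      scenarios$paracetamol <- scenarios$always + scenarios$helped

      scenarios[, c("scenario", "placebo", "paracetamol")]
      ```

      Every row gives the same two observable proportions, so the trial data alone cannot tell these scenarios apart.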

      • However, one possible explanation is the 10% benefit, and that is their favorite one for some reason. Therefore they will go with it.

        This underdetermination is a general problem with studies designed to “check if there is a difference between two groups”. I have been arguing for a while now that the data collected in such studies are too weak to conclude anything of substance. Think of the billions of dollars spent on headache meds/research and still there is apparently no timecourse available like in Senn’s chart for even an untreated group. It is disgusting.

        • +10^6

          How is it that we haven’t ever paid at least 100 patients who regularly suffer from headaches to record their pain levels on some score every 30 minutes every time they have a headache for 3 months and then published the dataset in a publicly downloadable form? Inexcusable that we don’t even know what a headache “looks like” much less all the other more complicated issues we study. Couldn’t we just pay people to come to a clinic weekly and get their cholesterol measured for 3 months, and then let them try any old thing they like to adjust their diets and continue to get cholesterol checked at least once a month through the end of the year?

          pfff

        • Anon: You say “There is apparently no timecourse available like in Senn’s chart” but did you actually look for one? Two minutes with Google led me to the control group in this study, which looks like what you’re after. So it seems premature to say the lack of information is “disgusting”, and it’s unfortunate that you made your unjustified claim in discussion of the Cochrane authors’ unjustified “10%” claim.

          You also seem to lapse into hyperbole when you say the review’s data are “too weak to conclude anything of substance”. If, based on large well-conducted trials, 59% benefit on paracetamol versus 49% on placebo at 2hrs, it’s reasonable to conclude that the drug is helping some people, under some conditions, for some period of time, under the conditions in the reviewed studies. That conclusion might not have the specificity we’d like – or that was claimed – but it’s also not nothing.

        • did you actually look for one?

          No, the point of a review is to include that info if it exists. It looks like you are talking about table 2 (complete relief, which I assume is the same as “no pain”), which has data similar to that which was in the Cochrane review*. Except for hours 1, 2, 3 we see 0%, 2.2%, and 8.7%, respectively.

          They need to take a group of untreated people and monitor them until 100% of the headaches have gone away, so we can see the whole curve. Then we can try to figure out what type of process would result in a curve like that.

          *http://statmodeling.stat.columbia.edu/2017/03/14/lady-in-the-mirror/#comment-442649

        • So, no you did not and will not look for timecourse information but you will decree what “they need” to do. Way to go.

          The purpose of a Cochrane review is to use primary research to address a clearly formulated question; here, does paracetamol help alleviate tension-type headaches. The authors succeed in this; based on what they present one might reasonably conclude that paracetamol does help on average.

          The important issue Senn raises is whether any individual-specific benefit can be inferred, where the authors way overstate what can be inferred from their report. Senn’s point about this being messed up – on which he is right – can be made easily even without any need for timecourse information, or the simulations he uses. Complaining about the lack of timecourse data on which to base a Senn-like curve misses the point.

        • clearly formulated question… does paracetamol help alleviate tension-type headaches

          From my first post here you can see that question is not clearly formulated, which is what allowed them to change the definition of “benefit/alleviate” from that recommended by the IHS. They don’t even know what they want to be measuring yet.

        • Anon: did you even read the review? The authors discuss both outcomes – text is below – and make the same over-interpretation mistake both times. The over-interpretation is the issue here, which you seem to keep missing.

          The outcome of being pain free at two hours was reported by 24 in 100 people taking paracetamol 1000 mg, and in 19 out of 100 people taking placebo, meaning that only 5 in 100 people benefited because of paracetamol 1000 mg (high quality evidence). The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo (high quality evidence), meaning that only 10 in 100 people benefited because of paracetamol 1000 mg.

        • only 5 in 100 people benefited because of paracetamol 1000 mg…meaning that only 10 in 100 people benefited because of paracetamol 1000 mg

          So which is it? And keep in mind people like Senn reject both of these definitions of benefit…

        • > So which is it?

          It’s neither, as I already pointed out, it’s the same over-interpretation mistake both times. The over-interpretation is the issue.

          You seem to want to continue to kvetch about the two different outcomes – but they are plainly reported as two different outcomes in the italicized text, from the report, which further documents why this was done. Engaging you on this seems futile, so I will stop here.

        • Engaging you on this seems futile, so I will stop here.

          George, this is all an aside because you claimed they clearly formulated a question about whether paracetamol helps and successfully answered it. Yet you agree that they have more than one definition of “benefit”, and that both ways they use “benefit” are incorrect. I don’t see how this is consistent.

          You also claimed I missed some data by not doing a literature search, then shared some you found that was pretty much the same format as that in the Cochrane review I had posted earlier (except the numerical values were drastically different… do people answer “complete relief” much differently than “no pain” or what?).

          It is clear to me this is an extremely rudimentary field of research. BTW, I have never gotten a headache in my life, and know others who report the same. Besides figuring out what they want to measure and collecting more detailed data, they should probably be investigating that rather than running an endless series of these RCTs.

      • It seems to me that the use of “only” in the plain language summary is really an issue. I’m not sure what effect size they consider not “only” but for the chronic headache sufferers who were the subjects of this research helping 20% more people (as opposed to a 10 percentage point change) and reducing the time seems like it might be clinically meaningful. I feel the text would be better if they removed that word.

    • >I couldn’t figure out where the 2 hour choice is coming from.

      I would suspect it was a choice made on the basis of peak serum concentration after taking oral acetaminophen, which is, in some places, described as 30 minutes – 2 hours.

  2. I suspect that placebo works faster than the drug since delays in GI tract processing, distribution via the circulation, and cellular effects after binding are unlikely, in my view, to be a problem. I think that the time to response for the placebo might be very fast and beat the drug in the first hour.

    • I think you can still get a similar curve if you have some people who will respond to placebo early with those who are equally unlikely to respond to placebo at any time.

      The thin solid line is the “drug”, while the thick dotted line is the mean of the three thin dotted lines (placebo subgroups):

      # Time grid: 0 to 5 hours in 1-minute steps
      x = seq(0, 5, 1/60)
      # a: the "drug" curve; b1-b3: placebo subgroups (b3 stays flat at 0.1)
      y = data.frame(a = pexp(x, rate = 0.4),
                     b1 = pexp(x, rate = 1.0),
                     b2 = pexp(x, rate = 1.1),
                     b3 = rep(0.1, length(x)))

      plot(0, 0, type = "n", xlim = range(x), ylim = range(y), xlab = "", ylab = "")
      invisible(sapply(1:ncol(y), function(i) lines(x, y[,i], lty = i)))
      lines(x, rowMeans(y[,-1]), lwd = 4, lty = 2)

      https://i.imgur.com/uvJbUqg.png

    • Wait, the placebo effect affects both treatments, drug and placebo, right? Therefore there’s no reason for the placebo to work faster than the drug unless the drug actually makes headaches worse at the beginning.

      • Yes, I agree. The placebo doesn’t need metabolism, but there is no reason that the drug shouldn’t act like the placebo. I now think the curves will be identical at first with the separation coming later.

      • sgc,

        I don’t see your reasoning here. Consider the following argument:

        Both placebo and drug have a placebo effect.
        The drug also has a drug-only effect.
        However, as Oncodoc has pointed out, the drug-only effect necessarily has a time delay between administration and effect, whereas a placebo effect does not necessarily have a time delay between administration and effect.
        Thus it is likely to see some placebo effect before any drug-only effect kicks in.

        (By “some” placebo effect I mean of some degree in some patients.)

        • I think he means that at the beginning (until the drug effect kicks in) you wouldn’t expect any difference between drug and placebo because they are both only showing the combination of placebo effect and the hazard of natural resolution.

        • I think maybe we’ve gotten into a lot of confusion in who is responding to what was said by whom. And there is also lots of “noise” in the situation being studied. So I’ll bow out of the discussion at this point rather than try to untangle all the possible confusions.

    • The placebo “works” because pain resolves on its own or because of a social psychological effect of the belief that you are being treated. Are you saying that the treatment is preventing the pain from resolving on its own?

  3. Senn: “the simplest possible model describing events over time that is consistent with the ‘facts’.”

    But headaches have been around for some time. Presumably, there’s some information about the uptake time of the medication and the general length of time headaches last that can be used. This may be the basis for using two hours as a measurement time; this is not my area of expertise.

    I’m a big fan of parsimony, but in this case assuming the simplest possible model doesn’t seem right.

    • zbicyclist: “I’m a big fan of parsimony, but in this case assuming the simplest possible model doesn’t seem right.”

      Senn is not proposing this model as a good enough model to do an adequate analysis but rather as one that is not known to be wrong (can’t be dismissed out of hand) but might just be understood by Cochrane statisticians who seem to be ignoring his criticisms of their work.

      On the blog of Senn’s original report (first link above) I related some of my own similar experiences with statisticians at Cochrane.

      It seems to be a reluctance to look behind statistical procedures/methods into the models that either do or don’t support them.

      There is also this video lecture, which discusses this at around the 20-minute mark (https://vimeo.com/187305681); in the first 20 minutes he tries to get across other concepts that to most of us seem simple but are very, very hard to really get across to others.

      (Disclosure: I used to be a member of Cochrane’s Statistical Methods Group).

  4. Suppose time to relief is exponential with mean depending on treatment vs placebo. Difference in means is 0.75 with 95% credible interval (-0.3,1.9) and posterior probability 0.92 of being greater than zero. Just saying.
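    One way to arrive at numbers of this kind, sketched in R. The comment does not say how the interval was computed, so everything here is an assumption: 100 patients per group (taking “59 in 100” and “49 in 100” literally), independent uniform Beta(1, 1) priors on the 2-hour response probabilities, and the exponential model to convert each probability into a mean time to relief.

    ```r
    set.seed(1)
    n_draws <- 1e5
    cutoff  <- 2   # hours

    # Posterior draws of the probability of relief by 2 h (Beta-binomial update).
    p_placebo     <- rbeta(n_draws, 1 + 49, 1 + 51)
    p_paracetamol <- rbeta(n_draws, 1 + 59, 1 + 41)

    # Under an exponential model, p = 1 - exp(-cutoff / mean).
    mean_placebo     <- -cutoff / log(1 - p_placebo)
    mean_paracetamol <- -cutoff / log(1 - p_paracetamol)
    diff_means <- mean_placebo - mean_paracetamol

    median(diff_means)
    quantile(diff_means, c(0.025, 0.975))
    mean(diff_means > 0)   # posterior probability that paracetamol is faster
    ```

    Under these assumptions the posterior probability should land close to the 0.92 quoted in the comment; a different prior or sample size would shift the interval.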

  5. I don’t have a problem with saying that two hours is a reasonable target, but the problem is that in this case this is essentially censored data, requiring a model that handles such data correctly. Sure, the exponential will be one of those possibilities, but it is hardly the only one. As always, it would be much better to include the actual data.

    From @Anoneuoid’s comment I would say that 2 hours was possibly the censoring point and, for me, that means the graph should indicate that above 2 hours it’s an extrapolation.

    The 10% figure is wrong on so many levels. Since I think the censoring model makes sense I’m with Senn on comparing the estimated mean time to no pain, but you could also do an odds ratio at 2 hours.
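    For the odds-ratio alternative, a short R sketch using the two proportions from the review. Treating them as counts out of 100 per group is an assumption made only to get a rough Wald interval; the review pools several trials and its actual sample sizes are not given here.

    ```r
    # 2x2 table at the 2-hour cutoff, assuming 100 patients per group.
    relief    <- c(paracetamol = 59, placebo = 49)
    no_relief <- 100 - relief

    odds       <- relief / no_relief
    odds_ratio <- unname(odds["paracetamol"] / odds["placebo"])

    # Wald 95% interval on the log odds ratio.
    se_log_or <- sqrt(sum(1 / relief) + sum(1 / no_relief))
    ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

    round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 2)
    ```

    The point estimate is about 1.5; with only 100 patients per group the interval includes 1.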

  6. I tried to see if I could spot the fallacy before reading the rest. I didn’t get the explained one.

    It is of course possible that 49 people have headaches resolve no matter what in 2 hours, and thus APAP is going to help ‘closer’ to 20%. For that you’d need a non-Placebo control. Although maybe they have one, I didn’t read the paper.

  7. Presumably the control group and the treatment group were administered the drug/placebo in the same form (I would think in pill format but have not read through Keith’s links). Would the observed difference change if the drug/placebo were administered in an alternative manner I wonder?

    This is not at all my field but I have to imagine the drug is administered in at least several ways on a regular basis; should we not restrict conclusions from a study about the effectiveness of a drug to the effectiveness of that drug in the manner in which it was administered? Unless it’s well known in the medical literature that administration method has little impact on effectiveness, I think this may be prudent.

    I’ve become accustomed to crushing up headache medication, and I find that increasing the surface area makes it work quite fast. As an example.

  8. Ha, headaches! Let’s try this sort of thing…

    # I’ll have a whopping 1000 patients PER GROUP. Wohoo!

    nPatients = 1000

    # I’ll have everyone have a different starting level of pain, measured in
    # General Pain Units (but no difference between groups)

    g1StartLevel = rnorm(nPatients, 100, 10)
    g2StartLevel = rnorm(nPatients, 100, 10)

    # Also, everyone has a different criterion for reporting to be pain free
    # (and again, I assume/hope that the group assignment doesn’t affect that)

    g1Criterions = rnorm(nPatients, 0, 10)
    g2Criterions = rnorm(nPatients, 0, 10)

    # Now to choose the rates of pain reduction for the participants.
    # The pain level of both groups will have some level of natural pain reduction
    # BUT the treatment group will have that plus some additional boost from the medicine.

    naturalPainReduction = 7.5
    effectOfMedicine = 2.5

    # The rates will be drawn from gamma distributions to restrict them to positive values

    g1mode = naturalPainReduction + effectOfMedicine
    g2mode = naturalPainReduction
    sd = 2

    # (These formulae were copied (stolen) from Kruschke’s blog)

    ra1 = (g1mode + sqrt(g1mode^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
    sh1 = 1 + g1mode * ra1

    ra2 = (g2mode + sqrt(g2mode^2 + 4*sd^2 ) ) / ( 2 * sd^2 )
    sh2 = 1 + g2mode * ra2

    g1Rates = rgamma(nPatients, sh1, ra1)
    g2Rates = rgamma(nPatients, sh2, ra2)

    # Assuming that everything’s nice and linear, people will report being pain free at
    # these time points:

    g1Times = (g1StartLevel - g1Criterions) / g1Rates
    g2Times = (g2StartLevel - g2Criterions) / g2Rates

    # Who wants to see how the proportion of “I’m grand!” responses evolves as
    # a function of time?!

    timeVector = seq(0, 40, 0.1)

    g1propGrand = c()
    g2propGrand = c()

    for (t in seq_along(timeVector)) {
      # Proportion of patients who have reported being pain free by this time
      g1propGrand[t] = mean(g1Times < timeVector[t])
      g2propGrand[t] = mean(g2Times < timeVector[t])
    }

    plot(timeVector, g1propGrand, ylab = "Proportion pain free", xlab = "Cosmic Time Unit")
    points(timeVector, g2propGrand, col = "red")
    legend("bottomright", legend = c("Treatment", "Placebo"), pch = c(1, 1), col = c("black", "red"))

    -------

    Okay, this was a horrible waste of time. Just one more cup'o'tea and then back to work.
