Orphan drugs and forking paths: I’d prefer a multilevel model but to be honest I’ve never fit such a model for this sort of problem

Amos Elberg writes:

I’m writing to let you know about a drug trial you may find interesting from a statistical perspective.

As you may know, the relatively recent “orphan drug” laws (basically) allow companies that can prove an off-patent drug treats an otherwise untreatable illness to obtain intellectual property protection for otherwise generic or dead drugs. This has led to a new business of trying large numbers of combinations of otherwise-unused drugs against a large number of untreatable illnesses, with a large number of success criteria.

Charcot-Marie-Tooth (CMT) is a moderately rare genetic degenerative peripheral nerve disease with no known treatment. CMT causes the Schwann cells, which surround the peripheral nerves, to weaken and eventually die, leading to demyelination of the nerves, a loss of nerve conduction velocity, and an eventual loss of nerve efficacy.

PXT3003 is a drug currently in Phase 2 clinical testing to treat CMT. PXT3003 consists of a mixture of low doses of baclofen (an off-patent muscle relaxant), naltrexone (an off-patent medication used to treat alcoholism and opiate dependency), and sorbitol (a sugar substitute).

Pre-phase 2 results from PXT3003 are shown here.

I call your attention to Figure 2 [above], and note that in Phase 2, efficacy will be measured exclusively by the ONLS score.

My reply: 33 comparisons, 4 are statistically significant: much more than the 1.65 that would be expected by chance alone, so what’s the problem??
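The 1.65 is just 33 × 0.05. As a rough check of how surprising 4 significant results would be, here is a minimal Python sketch. It assumes a 0.05 threshold and 33 independent tests, which is surely not true here (the outcomes and dose groups are correlated), so the tail probability is a back-of-the-envelope figure only. The helper name `prob_at_least` is made up for illustration.

```python
from math import comb

n_comparisons = 33   # from the post
n_significant = 4    # from the post
alpha = 0.05         # assumed significance threshold

# Expected number of "significant" results if every null were true.
expected_by_chance = n_comparisons * alpha

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming independent tests."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

tail = prob_at_least(n_significant, n_comparisons, alpha)
print(f"expected by chance: {expected_by_chance:.2f}")
print(f"P(at least {n_significant} of {n_comparisons} significant): {tail:.3f}")
```

Under these assumptions, 4 or more significant results would turn up by chance about 8% of the time, so the count by itself says little.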

In all seriousness, I’d recommend they fit a multilevel model. That said, I’ve never fit such a model for this sort of experiment. I’d like to do it (at least) once, for a live example, as I think this would help me better understand the statistical issues and then I’d be able to make more helpful recommendations.
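To make “multilevel model” a bit more concrete, here is a minimal sketch of partial pooling. It is not an analysis anyone ran on these data. It assumes each outcome comes with an estimated effect and a standard error on a common standardized scale, it uses made-up simulated numbers, and it substitutes a simple empirical-Bayes (moment-based) normal-normal approximation for a full Bayesian fit. The function name `partial_pool` and all inputs are hypothetical.

```python
import random

def partial_pool(estimates, std_errors):
    """Shrink noisy per-outcome estimates toward a common mean.

    Model (an assumption for illustration):
      estimate_j ~ Normal(theta_j, se_j^2),  theta_j ~ Normal(mu, tau^2).
    tau^2 is estimated by the DerSimonian-Laird moment method.
    """
    k = len(estimates)
    w = [1.0 / se**2 for se in std_errors]
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Re-estimate the common mean with weights that include tau^2.
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)

    # Shrinkage factor b: 1 means fully pooled, 0 means no pooling.
    pooled = []
    for y, se in zip(estimates, std_errors):
        b = se**2 / (se**2 + tau2)
        pooled.append(b * mu + (1.0 - b) * y)
    return mu, tau2, pooled

# Simulated, made-up data: 33 outcomes with small true effects and large noise.
random.seed(1)
true_effects = [random.gauss(0.05, 0.05) for _ in range(33)]
std_errors = [random.uniform(0.1, 0.4) for _ in range(33)]
estimates = [random.gauss(t, se) for t, se in zip(true_effects, std_errors)]

mu, tau2, pooled = partial_pool(estimates, std_errors)
print(f"common mean {mu:.3f}, between-outcome variance {tau2:.4f}")
print(f"raw range:    {min(estimates):.2f} to {max(estimates):.2f}")
print(f"pooled range: {min(pooled):.2f} to {max(pooled):.2f}")
```

The noisiest raw estimates, which are also the ones most likely to cross a significance threshold by luck, get pulled hardest toward the common mean. A real analysis would model patients, doses, and outcomes jointly rather than working from summary estimates.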

13 thoughts on “Orphan drugs and forking paths: I’d prefer a multilevel model but to be honest I’ve never fit such a model for this sort of problem”

  1. It’s striking that the tests/traits that have the smallest CIs show no apparent changes at any dose. So either there are really no effective changes, or the medications only improve certain tests/traits (if indeed any are actually improved) and not others. Someone who understands the condition could think about whether the second would be plausible.

    I suppose that “ONLS” means “Overall Neuropathy Limitations Scale”. If you consider the improvement in ONLS vs dose (which seems to be the obvious choice for multilevel modeling), there is a small increase with dose, but the CIs are too large to tell for sure. If you just took the difference between high dose and no dose, and propagated the CIs, it looks (absent having the actual numbers to compute with) as if the difference would not have a significant p-value. This could be one of those cases where the mean, being the best unbiased estimate, could indicate something even if it’s not “statistically significant”. But there’s probably too much noise to really tell.

    To me, these results call out for another test at an even higher dose. But I suppose that won’t be in the cards.

    • “To me, these results call out for another test at an even higher dose.”

      It’s interesting because my thought process is totally different from what is found in your 2nd paragraph, but I would come to the same conclusion: They need to collect enough dose-response data to cover the entire curve. In my case it is so that we can come up with models of what is going on that are consistent with this curve. (There are some other issues with the data, etc., but I’m ignoring those for now.)

      Can you give more detail on how you link the ideas in your second paragraph to your conclusion? Specifically, I see these as your premises (there are probably unstated ones that need to be filled in):
      p1) there is a small increase with dose, but the CIs are too large to tell for sure
      p2) the difference between high dose and no dose…would not have a significant p-value
      p3) the mean could indicate something even if it’s not “statistically significant”

      And here is your conclusion:
      C) these results call out for another test at an even higher dose

      Basically, I don’t see how the conclusion follows from what you wrote. For comparison, here is my reasoning in the same format:

      p1) The dose response curve contains useful info about what is going on when people are given this drug
      p2) Having data on the entire curve (including upper plateau) will constrain the number of explanations for what is happening when people take this drug
      p3) It would be useful to have explanations/models that predict the dose response curve
      p4) It isn’t clear whether the highest dose tested was at the upper plateau or not

      Therefore:
      C) these results call out for another test at an even higher dose

      Is it possible for anyone else to follow my reasoning here?

        • Thanks, it isn’t really about this in particular so I don’t think reading the paper is necessary to follow the logic. I really only glanced at it.

        • Anoneuoid:

          I think your reasoning is very odd in that it treats all measures of dose effect as exact. A very peculiar view from someone who believes all researchers are just chasing noise.

          While I understand your point (errors are of second order interest), I think Tom’s reasoning is totally fine in that he is saying that random error is so high it’s hard to pick out signal from noise. By upping the dosage, we presumably strengthen the signal (unless we’ve hit a plateau).

        • “I think your reasoning is very odd in that it treats all measures of dose effect as exact.”

          How so? If it reads like that, there is a communication issue, because nothing could be farther from the truth regarding the thought process I attempted to describe.

          “A very peculiar view from someone who believes all researchers are just chasing noise.”

          Not sure why there is a strawman introduced here, but let’s ignore it for now…

          “While I understand your point (errors are of second order interest)”

          I’m not sure what you mean. Perhaps by “errors” you mean false positive/negative? In that case, they are totally irrelevant to the reason I would want more data. If the ONLS score means anything, I assume there is a non-zero effect of taking this pill.

          “I think Tom’s reasoning is totally fine in that he is saying that random error is so high it’s hard to pick out signal from noise. By upping the dosage, we presumably strengthen the signal (unless we’ve hit a plateau).”

          Thanks, so something like:

          p1) there is a small increase with dose, but the CIs are too large to tell for sure
          p2) the difference between high dose and no dose…would not have a significant p-value
          p3) the mean could indicate something even if it’s not “statistically significant”
          p4) upping the dosage may increase the change in ONLS score
          p5) it would be useful to know whether this treatment has any effect at all on ONLS score

          Conclusion:
          C) these results call out for another test at an even higher dose

          Does that fit? I could follow this line of reasoning, but it was non-obvious to me (no sarcasm). Of course, I reject “premise 5”, since I would assume some non-zero effect exists by default.
