Knee meniscus repair: Is it useful? Potential biases in intent-to-treat studies.

Paul Kedrosky writes:

Not sure if you’ve written on this, but the orthopedic field is tying itself in knots trying to decide if meniscus repair is useful. The field thought it was, and then decided it wasn’t after some studies a decade ago, and is now mounting a rear-guard action via criticisms of the statistics of intent-to-treat studies.

He points to this article in the journal Arthroscopy, “Can We Trust Knee Meniscus Studies? One-Way Crossover Confounds Intent-to-Treat Statistical Methods,” by James Lubowitz, Ralph D’Agostino Jr., Matthew Provencher, Michael Rossi, and Jefferson Brand.

Hey, I know Ralph from grad school! So I’m inclined to trust this article. But I don’t know anything about the topic. Here’s how the article begins:

Randomized controlled studies have a high level of evidence. However, some patients are not treated in the manner to which they were randomized and actually switch to the alternative treatment (crossover). In such cases, “intent-to-treat” statistical methods require that such a switch be ignored, resulting in bias. Thus, the study conclusions could be wrong. This bias is a common problem in the knee meniscus literature. . . . patients who fail nonsurgical management can cross over and have surgery, but once a patient has surgery, they cannot go back in time and undergo nonoperative management. . . . the typical patient selecting to cross over is a patient who has more severe symptoms, resulting in failure of nonoperative treatment. Patients selecting to cross over are clearly different from the typical patient who does not cross over, because the typical patient who does not cross over is a patient who has less severe symptoms, resulting in good results of nonoperative treatment. Comparing patients with more severe symptoms to patients with less severe symptoms is biased.

Interesting.

That article is from 2016. I wonder what’s been learned since then? What’s the consensus now? Googling *knee meniscus surgery* yields this from the Cleveland Clinic:

Meniscus surgery is a common operation to remove or repair a torn meniscus, a piece of cartilage in the knee. The surgery requires a few small incisions and takes about an hour. Recovery and rehabilitation take a few weeks. The procedure can reduce pain, improve mobility and stability, and get you back to life’s activities. . . .

What follows is lots of discussion about the procedure, who should get it, and its risks and benefits. Nothing about any studies claiming that it doesn’t work. So in this case maybe the “rear-guard action” was correct, or at least successful so far.

In any case, this is a great example for thinking about potential biases in intent-to-treat studies.
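To make the one-way-crossover problem concrete, here is a minimal simulation sketch. Nothing in it comes from the article or from any trial: the effect size, the crossover rule, and the outcome model are all invented. The point is only to show how the intent-to-treat comparison and a naive as-treated comparison can both drift away from the true effect when the most symptomatic patients in the nonoperative arm cross over to surgery.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: latent symptom severity and an assumed true benefit of surgery.
severity = rng.normal(0.0, 1.0, n)
true_effect = -1.0  # surgery lowers the pain score by 1 point (assumed, not from data)

# Randomize half to surgery, half to nonoperative care.
assigned_surgery = rng.random(n) < 0.5

# One-way crossover: the most symptomatic nonoperative patients get surgery anyway.
crossover = ~assigned_surgery & (severity > 1.0)
received_surgery = assigned_surgery | crossover

# Outcome: pain score driven by severity, the treatment actually received, and noise.
pain = 5.0 + 2.0 * severity + true_effect * received_surgery + rng.normal(0.0, 1.0, n)

# Intent-to-treat: compare by assignment; crossover dilutes the contrast toward zero.
itt = pain[assigned_surgery].mean() - pain[~assigned_surgery].mean()

# As-treated: compare by treatment received; confounded, because the crossovers
# are exactly the patients with the most severe symptoms.
as_treated = pain[received_surgery].mean() - pain[~received_surgery].mean()

print(f"true effect {true_effect:+.2f}  ITT {itt:+.2f}  as-treated {as_treated:+.2f}")
```

With these invented numbers, the ITT contrast comes out around -0.84 (attenuated toward zero, since only the compliers contribute a treatment contrast), while the as-treated contrast lands near zero because the surgery group has absorbed the sickest patients. The sizes of those gaps are artifacts of the parameters chosen here, but the qualitative pattern is the kind of thing the Arthroscopy editorial is worried about.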

34 thoughts on “Knee meniscus repair: Is it useful? Potential biases in intent-to-treat studies.”

  1. Very interesting topic. I wonder how much impact these debates have on the # of surgeries performed. Anecdotally, my guess is not much, for two reasons: (1) physicians performing a procedure tend to believe in its efficacy even when data suggest otherwise; it is human nature to want to believe you are helping, and it takes very strong evidence to change that thinking; and (2) you cannot avoid the financial self-interest. “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

    Lots of examples in healthcare, but for one see these references on balloon sinuplasty and endoscopic sinus surgery. Then google the topics; all of the top links are ENT practices touting the procedure.

    https://rightcarealliance.org/article/endoscopic-sinus-surgery-tale-overuse/
    https://www.enttoday.org/article/balloon-dilations-continue-despite-skepticism/3/?singlepage=1

  2. As (only) a personal anecdote, a meniscus repair did give me some improvement but it only lasted for a few weeks, and the recovery from the procedure was longer and more unpleasant than I had expected. Eventually I had a total knee replacement.

    So I am usually somewhat negative when the subject comes up.

  3. My experience differs from Mr. Passin’s. I had a torn meniscus repaired. I immediately went from someone whose knee would occasionally lock up, causing a fall or near-fall, to a person whose knee would not lock up. I even returned to running. However, cartilage does not heal well. The damaged meniscus continued to degrade the joint. About 20 years later, I required a total knee replacement.

    Bob76

  4. I have had meniscectomies on both knees with great success. In my case(s) I could walk about a quarter mile before my knee started to ache; now I can walk all day, but I might get a bit of soreness the next day. My wife had a meniscectomy but it did not fully resolve her knee pain like it did mine.

    I wonder if knee meniscectomy recovery is a type of perfect storm for statistical analysis (kind of like the hot hand). Patients facing a meniscectomy will never have a perfect knee again no matter what they choose. At the same time, having a torn meniscus typically does not require crutches. The upshot is that a successful meniscectomy takes the patient from a low to moderate level of pain involved with the specific task of walking to, hopefully, a lower level of pain. With robust effects mostly lacking, and a squishy measurement of “how much does it hurt now?”, is it any wonder that folks might question the efficacy of the intervention?

  5. > Comparing patients with more severe symptoms to patients with less severe symptoms is biased.
    > potential biases in intent-to-treat studies.

    I like the word bias here because it tells me I should stop and think, even if I wasn’t reading anything so closely before. Like a stop sign.

    I do not like the word bias here because it makes me think: biased against what? Like there is some idealized method.

    There is also this line:

    > and actually switch to the alternative treatment (crossover). In such cases, “intent-to-treat” statistical methods require that such a switch be ignored, resulting in bias

    Statistical methods require us to ignore the crossover and then we’re biased? That doesn’t sound good. Are we just hosed? Maybe — I don’t really know much about this, just commenting on the bias word.

      • Yeah, agreed. But here, instead of this:

        > In such cases, “intent-to-treat” statistical methods require that such a switch be ignored, resulting in bias. Thus, the study conclusions could be wrong.

        What if the line was: The methods used here assume that there are no switches when in fact there are many switches.

        The bias word just annoys me a little. The message is clear enough in the abstract for sure.

    • I read it as generic statistical bias that has this particular “cross over” application, meaning that there’s a systematic tendency for certain types of patients (those with more severe symptoms) to cross over; it’s not random. The authors are arguing that this bias is significant enough to affect the comparisons.

  6. For what it’s worth, my wife had meniscus surgery, and it had no beneficial effect. When my wife’s orthopedist recommended the surgery to her, he didn’t give her any indication that it was controversial and might be useless.

  7. OK, I have NOT had knee surgery or injuries that might have led to surgery…

    It seems to me that any intention-to-treat study should be accompanied by the per-protocol analysis of the same data. Where the two disagree substantially, there should be serious discussion of biases and problems with the study design.

    In the absence of per-protocol analysis, intention to treat seems to me to be a good way to discard information. Laziness makes the ‘standard approach’ attractive.

        • I would like to learn more about this. It isn’t clear what a good choice for an instrumental variable would be in many of these cases. The colonoscopy study I linked had a large proportion of cross-overs. Presumably, those recommended for a colonoscopy that chose not to receive one were different in some (unmeasurable?) way. What would be a good instrumental variable in such a case? I don’t understand what it means to use “assignment” as an instrumental variable, as Dean suggests. One further question: wouldn’t an alternative be to just provide 2 analyses – ITT and “as treated,” both of which are flawed but possibly might provide a reasonable range of estimates?
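On the suggestion just above of reporting more than one analysis: here is a rough sketch of how the three usual summaries could be computed side by side from the same trial data. The function name and the particular per-protocol convention are inventions for illustration, not taken from any of the papers discussed here.

```python
import numpy as np

def trial_summaries(assigned, received, outcome):
    """Three common summaries of a two-arm trial with crossover.

    assigned, received: boolean arrays (True = surgery); outcome: numeric array.
    Definitions of "per protocol" vary across papers; here it means keeping only
    patients whose received treatment matches their assigned treatment.
    """
    assigned = np.asarray(assigned, dtype=bool)
    received = np.asarray(received, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    adherent = assigned == received

    return {
        # Compare by random assignment, ignoring crossover (preserves randomization).
        "intent_to_treat": outcome[assigned].mean() - outcome[~assigned].mean(),
        # Compare by assignment among adherent patients only; with one-way
        # crossover this drops exactly the most symptomatic control patients.
        "per_protocol": outcome[assigned & adherent].mean()
                        - outcome[~assigned & adherent].mean(),
        # Compare by treatment actually received; self-selection enters here.
        "as_treated": outcome[received].mean() - outcome[~received].mean(),
    }
```

None of the three is automatically the right answer when crossover is informative; the point of the comments above is that seeing them disagree is itself useful information.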

  8. Cross-over causes bias only with regard to an idealized effect of a treatment decision that cannot ever be changed. But in real life that’s not the case: if something doesn’t work, you try something else. So the (pragmatic) RCT can be seen as a test of a strategy: surgery right away, or medical treatment first and surgery only if that doesn’t work. Then the switch is part of the strategy and there is no bias in the ITT analysis.

    Also the described bias seems conservative, as treatment failures are corrected in the less effective medical treatment arm. This reduces the contrast.

    • For this reason, the ICH E9 Addendum on Estimands was introduced in 2020.
      The case you are referring to characterizes a “treatment policy” estimand. That is, when the target of your ITT analysis is not only the net effect of your primary intervention but also the secondary effects that either intervention will entail, there is no such thing as bias.
      By contrast, an ITT analysis will be biased when you are interested in a “hypothetical” estimand.

      For any practitioner, the former will probably be more interesting.

    • Agreed on both points. Unless there is something special going on here, the bias of ITT can be thought of as just scaling down the LATE by the fraction of compliers (a toy version of this scaling is sketched just after this comment). But maybe the point is that those who benefit the most aren’t compliers anyway (they are always-takers).

      My quick look at the paper leaves me with the impression that it is pretty vague and at least encourages misunderstanding. For example:
      “The inequity of 1-way crossover is not the only bias confounding trials that compare surgical versus nonoperative treatment using ITT. Selection bias also confounds 1-way crossover. As readers well know, the typical patient selecting to cross over is a patient who has more severe symptoms, resulting in failure of nonoperative treatment. Patients selecting to cross over are clearly different from the typical patient who does not cross over, because the typical patient who does not cross over is a patient who has less severe symptoms, resulting in good results of nonoperative treatment. Comparing patients with more severe symptoms to patients with less severe symptoms is biased.”
      But ITT doesn’t involve comparing different patients… it specifically avoids this, unlike as-treated or per-protocol analyses.
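To put the compliance-scaling point in code: this is the standard Wald / instrumental-variable estimator, with random assignment as the instrument. It is a generic textbook sketch, not anything computed from the meniscus or colonoscopy trials discussed here.

```python
import numpy as np

def wald_iv_estimate(assigned, received, outcome):
    """Wald / IV estimate of the complier (LATE) effect, using assignment as instrument.

    Numerator: the intent-to-treat contrast in the outcome.
    Denominator: the contrast in treatment uptake between arms; with one-way
    crossover (everyone assigned to surgery gets it) this is just the share of
    the control arm that stays nonoperative.
    Valid under the usual IV assumptions (randomized assignment, no defiers, and
    assignment affecting the outcome only through the treatment received).
    """
    assigned = np.asarray(assigned, dtype=bool)
    received = np.asarray(received, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    itt = outcome[assigned].mean() - outcome[~assigned].mean()
    uptake_gap = received[assigned].mean() - received[~assigned].mean()
    return itt / uptake_gap
```

As toy arithmetic with made-up numbers: if 30% of the control arm crosses over to surgery, the uptake gap is 0.7, and an ITT contrast of -0.7 divided by 0.7 recovers a complier effect of -1.0. That division is the “scaling down the LATE by the fraction of compliers” arithmetic run in reverse, and it is also what using assignment as an instrumental variable amounts to in the simplest case.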

  9. An interesting example of intention-to-treat issues concerns a recent article in the New England Journal of Medicine (https://www.nejm.org/doi/full/10.1056/NEJMoa2208375?query=featured_home). In brief, randomized groups were either “invited” to have a colonoscopy or not given that invitation. The percentage of people actually following the advice was around 50% (possibly related to the practice, in some places, of not using sedatives with colonoscopies). The ITT analysis found little or no benefit from colonoscopies – however, based on the actual treatment received (an analysis the authors did not perform), it looks like colonoscopies were effective screening procedures.

    In a case like this, where adherence is especially low, I think the issue hinges on whether adherence represents some kind of self-selection related to medical conditions or only related to psychological attitudes. If those choosing to follow the advice have other symptoms related to actual disease, then analyzing the actual treatment received will be biased towards effectiveness. However, if the reason for non-adherence is unrelated to actual medical condition (e.g., it solely has to do with psychological willingness to have an invasive procedure), then I think the ITT study underestimates effectiveness.

    I also think there may be a divergence between which analysis is appropriate from a public health viewpoint and from an individual patient’s viewpoint. One of the primary reasons for using ITT analysis is that it more closely mirrors real-world experience. If you recommend a treatment and many people will not adhere to the plan, that is an aspect of reality that matters to public health decision-makers. However, if I am a patient considering whether to agree to a particular treatment, I may not care about others’ adherence (unless, as above, it is due to self-selection based on some unmeasured relevant health variables). What I care about is how effective the treatment is likely to be, if I undertake it.

    • This trial shows the IV estimate in the appendix and is an excellent example of the application of an IV in an RCT context. I am confused why you state the authors did not do this analysis.

      • You are correct – somehow I missed that. I also missed that the editorial in that same issue noted those results. And the results were markedly different – the insignificant (sorry about the NHST reference) odds ratio in the ITT analysis was significant in the per-protocol analysis. I still wonder why the article and the editorial de-emphasized that in favor of their lead that the study’s results were disappointing for those who favor such screening. I’d say they buried the lead here.

        Thanks for correcting me and pointing me to what I missed.

  10. It seems to me that this post raises a basic question: Do Bayesian methods offer any insights into the analysis of data intended to be analyzed under an intent-to-treat protocol? This question reminds me of the stopping rules issue. See https://statmodeling.stat.columbia.edu/2018/01/03/stopping-rules-bayesian-analysis-2/.

    Long rant follows. tl;dr version: In some cases, intent-to-treat analysis will mislead.

    ===============================

    I have spent a lot of time thinking about knees and knee repair. Not particularly systematically—but many hours.

    Long ago, I injured my knee in a sporting incident. The surgeon I saw later that afternoon was a very well-regarded orthopedic surgeon. He diagnosed the injury as a torn lateral meniscus. He told me (1) the injury appeared mild enough that the trauma of surgery would probably outweigh the benefits and (2) I would develop arthritis in the knee in my seventies.

    I had occasional discomfort in that knee thereafter. But I did not seek medical treatment.

    This was an error. I made a decision at the time of the injury, based on the surgeon’s advice, not to seek treatment. However, as the years went by, arthroscopic knee surgery was perfected and the surgery for my knee became less traumatic. Unfortunately, I did not revisit my earlier decision. I failed to recognize that the facts had changed and that reanalysis might lead to a different choice. My training in Bayesian decision analysis did not address this kind of failure.

    Three-plus decades after the first injury, I twisted the same knee when kayaking. There was an immediate degradation of the knee. As I recall, there was some discomfort much of the time. But, on occasion, the knee would lock up and I would stumble. Such events were painful.

    I immediately went to an orthopedist who ended up performing an arthroscopic repair. As I recall, he stated that he removed a little flap of cartilage that was flopping around. There was some discomfort after the operation but the knee never locked up again.

    Now that change may have been a very powerful placebo effect. But it sure sounds pretty mechanical to me. I suppose that my locked-up knees could have been some hysterical syndrome and the finding of the cartilage flap a coincidence.

    I read the cited article (editorial actually) as well as the NYT article that it mentions.

    Reflecting on this issue, it seems to me that meniscus surgery is sometimes in the same category as parachutes. If you see someone use a parachute once successfully, you will be convinced that they can work. Even if that person had been randomized to the “jump with no parachute” treatment, it would be a mistake to classify their survival as evidence toward the hypothesis that “jump with no parachute” is just as good as “jump with a parachute.” Intent-to-treat analysis seems like a mistake in analyzing parachute use.

    I am not expressing a general opinion on the value of meniscus surgery. Rather, I believe that it can unambiguously improve some cases. Whether the diagnostic skills, surgical skills and incentives of the entire population of orthopedic surgeons are such that meniscus surgery is more likely than not to lead to a better life for the patient is another matter.

    Bob76

  11. The comments are incredibly interesting – so much reliance on personal experience and anecdote. It’s as if all my M.D. colleagues are right here in the comments section ;)

    The thing a lot of folks seem to be neglecting here is that the primary claim is not that meniscus repair doesn’t work; it’s that it doesn’t work better than conservative treatment.

      • Exactly.

        Old joke from my youth. The MDs in the LA area got mad about something so they went on “strike”, refusing to do elective surgeries. The mortality rate went down for the duration.

        https://pubmed.ncbi.nlm.nih.gov/18849101/

        “The paradoxical finding that physician strikes are associated with reduced mortality may be explained by several factors. Most importantly, elective surgeries are curtailed during strikes. …”

        REMEMBER JOAN RIVERS!!!

        (She was from a universe I don’t have much interest in, or even respect for, but the few times I happened to see her, she had a seriously kewl sense of humor about her world and herself. I wish someone had pointed her to that article.)

  12. That article is from 2016 and seems a bit outdated: there have now been several randomized controlled studies using placebo treatments rather than relying on ITT comparisons, see e.g. Sihvonen et al., “Arthroscopic partial meniscectomy for a degenerative meniscus tear: a 5 year follow-up of the placebo-surgery controlled FIDELITY (Finnish Degenerative Meniscus Lesion Study) trial” (https://dx.doi.org/10.1136/bjsports-2020-102813 , 2020).

    (In this case, placebo surgery is less weird than it sounds, since the patients were undergoing a *diagnostic* surgery anyway: “All participants had a diagnostic knee arthroscopy and were then (during the same operation) assigned to either APM or placebo surgery. For the randomization, we used sequentially numbered, opaque, sealed envelopes prepared by a statistician with no involvement in the clinical care of participants in the trial.”)

    And regarding your question about the current consensus, a Cochrane review from 2022 (https://doi.org/10.1002/14651858.CD014328 ) concludes that “Arthroscopic surgery for degenerative knee disease (osteoarthritis including degenerative meniscal tears) […] provides little or no clinically important benefit in pain or function, probably does not provide clinically important benefits in knee‐specific quality of life, and may not improve treatment success compared with a placebo procedure. It may lead to little or no difference, or a slight increase, in serious and total adverse events compared to control, but the evidence is of low certainty. Whether or not arthroscopic surgery results in slightly more subsequent knee surgery (replacement or osteotomy) compared to control remains unresolved.”

    • Tilman wrote:

      “a Cochrane review from 2022 (https://doi.org/10.1002/14651858.CD014328 ) concludes that “Arthroscopic surgery for degenerative knee disease (osteoarthritis including degenerative meniscal tears) […] provides little or no clinically important benefit in pain or function, probably does not provide clinically important benefits in knee‐specific quality of life, and may not improve treatment success compared with a placebo procedure. It may lead to little or no difference, or a slight increase, in serious and total adverse events compared to control, but the evidence is of low certainty.”

      This is not surprising but is a bit of a red herring. In general, you cannot fix arthritis with surgery, but that had nothing to do with my situation. I do not have arthritis and the surgeries did indeed fix my knees.

  13. My mother had meniscus surgery and she said she would not have done it if she had known how painful it would be. I think doctors understate the pain of orthopedic procedures because they make a ton of money on them. Patients are told Medicare will pay, so the procedures are done on someone else’s dime. I know an orthopedic doctor who did a “study” on how much his patients improved after hip and knee replacement, but had no control group. There is reason to believe physical therapy works as well, but surgeons don’t make money on that. Plus there are massive placebo effects. Studies show sham arthroscopy works as well as the real one. https://www.nejm.org/doi/full/10.1056/nejmoa1305189

  14. The first link below is to an article in the New England Journal of Medicine (27 Oct 2022) about a study of the effectiveness of colonoscopy.

    Applying an ITT analysis, the study found little benefit from recommending colonoscopy. A per-protocol analysis found the risk of death from colon cancer was cut in half for those who underwent colonoscopy. The mass media picked up on these conclusions and repeated them.

    The second link points to an editorial in the same issue that discusses a variety of reasons for the substantial differences between the two analyses.

    The third link is to a later issue of NEJM containing letters to the editor about the study.

    Bottom line. Proper analysis of such studies is hard and takes lots of effort.

    https://www.nejm.org/doi/full/10.1056/NEJMoa2208375

    https://www.nejm.org/doi/full/10.1056/NEJMe2211595

    https://www.nejm.org/doi/full/10.1056/NEJMc2215192

    • I’m glad you posted these links – I cited that study above (confusing my memory of the study, which, in fact, did both an ITT and a per-protocol analysis) but I hadn’t seen the letters. The first letter was right on target with what bothered me about the NEJM editorial – I put it above as “they buried the lead.” The study does appear to show that colonoscopy screening is highly effective when adhered to – this makes adherence a critical element in public health decision making. And I think the per-protocol analysis is much more relevant to an individual patient than ITT – particularly when adherence is so low and crossovers are so high. As a patient I want to know what screening can accomplish for me, not what it accomplishes when 50% of those randomized to be invited for a colonoscopy choose instead to forgo the screening. Anyway, what does it mean to be “invited” to have a colonoscopy – is that the strongest medical advice to promote adherence? (I haven’t read the protocol, so perhaps it is explained there).

      • > The study does appear to show that colonoscopy screening is highly effective when adhered to
        > […]
        > As a patient I want to know what screening can accomplish for me, not what it accomplishes when 50% of those randomized to be invited for a colonoscopy choose instead to forgo the screening.

        Per-protocol doesn’t tell you that either, though. The patients are self-selected to get screened vs. not. They would have to take the patients who showed up for a screening and then randomize half of them to not get one. Then some might just go get one elsewhere…

        Basically, you aren’t going to get a valid answer to your question based off these types of A vs B studies.

        • I think that is a bit too skeptical of the per-protocol analysis. There is self-selection going on, but it depends on whether the self-selection is related to people’s health in some unmeasured way or whether it reflects some psychological attitude unrelated to health. For example, if the people forgoing the colonoscopy just don’t like invasive procedures (and in this case, in some countries they were quite invasive – no sedation), then I think the per-protocol outcomes are what I am interested in. On the other hand, if the people forgoing the screening are healthier in ways that relate to decreased risks of cancer, then that analysis would be biased. So, there may be useful information for my question from the per-protocol analysis, even if it is not a “valid answer.”

        • Of course if you want to make enough assumptions you can validly conclude whatever you want (given the assumptions are correct).

          But in that case what is the advantage over an observational study? It seems like the worst of both worlds.
