Debate over kidney transplant stats?

Dan Walter writes:

A few years ago, in a post about Bayesian statistics, you referred to a book that I wrote about a study on catheter ablation for atrial fibrillation: The Chorus of Ablationists.

I am writing a story on the transplant industry and am wondering about a widely cited article concerning the long term health effects of donating a kidney.
I am told that the original data set is suspect – which is another story – but I’m wondering about the methods used in this study:

Diminishing Significance of HLA Matching in Kidney Transplantation

The stakes are pretty high, as this article is being used to expand the donor pool.
I was hoping that you or one of your blogging comrades could take a quick look at the statistical analysis section.
I have no time at all to look at this, but it seems like it could be important.  If anyone does have the chance to take a look, please give us your thoughts in the comments.  Thanks.

23 thoughts on “Debate over kidney transplant stats?”

  1. Andrew, I would remove those shortened links and replace them with the real ones. Especially since you seem to be quoting this verbatim from somewhere else.

  2. From a quick read-through, the primary source of signal seems to be the difference in the LR test statistic, i.e. statistical significance, over time. But there are also smaller numbers in the study in later years, which explains part of why the LR test statistics might decrease – and they take account of this. Differences in the design matrix between years are also presumably relevant, though, and that’s not discussed.

    It seems to cry out for an interaction analysis but there are age/period/cohort problems?
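    As a sanity check on how much of the drop in LR statistics could come from cohort size alone, here is a toy sketch (synthetic counts, nothing from the paper): with the failure proportions held fixed, the likelihood-ratio statistic for comparing two binomial proportions scales exactly with the counts.

```python
import math

def lr_statistic(x1, n1, x2, n2):
    """Likelihood-ratio (G) statistic comparing two binomial proportions."""
    def ll(x, n, p):
        # Binomial log-likelihood (dropping the constant binomial coefficient)
        if p <= 0.0 or p >= 1.0:
            return 0.0 if x in (0, n) else float("-inf")
        return x * math.log(p) + (n - x) * math.log(1 - p)

    p_pooled = (x1 + x2) / (n1 + n2)
    ll_null = ll(x1, n1, p_pooled) + ll(x2, n2, p_pooled)
    ll_alt = ll(x1, n1, x1 / n1) + ll(x2, n2, x2 / n2)
    return 2 * (ll_alt - ll_null)

# Same proportions (30% vs 40% failure), every count four times smaller:
g_large = lr_statistic(300, 1000, 400, 1000)
g_small = lr_statistic(75, 250, 100, 250)
# The ratio g_large / g_small is 4 (up to floating point)
```

    So even with an unchanged effect, a later cohort observed for less time (fewer patients, fewer events) mechanically shrinks the LR statistic – which is why the magnitudes in Table 2, not the significance labels, are the thing to watch.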

    • The signal is not just a decline of statistical significance for HLA mismatches; it’s a decline in the magnitude (that is, the practical significance, not just the “statistical significance”). The paper seems to use ‘significance’ in its plain-English sense rather than the statistical sense. The statistics community should long ago have stopped using the term “significant” to mean something so different from its normal usage.

      Anyway, Table 2 seems to tell the story; let’s see if it formats OK in this comment… At the least we are going to lose the indication of “statistically significant” relative risk (that is, “statistically significantly” greater than 1). Take a look at 1995 compared to 1998: in the 1998 cohort the RR is lower in every single row. It does look like there is still a trend towards more mismatches leading to higher relative risk, but the risk associated with 4 or 5 or 6 mismatches does seem lower than it used to be, and thus possibly more tolerable than it used to be.

      There are several confounders, including two biggies that are discussed in the paper: donor age, and “cold ischemia time”. Just from looking at the summary tables, which is all I’ve done, there’s no way to tell if they’ve done a good job at controlling for everything, or, in fact, controlling for anything.

      Table 2. HLA mismatch and relative risk of graft failure
      (referent group: 0 HLA mismatches)

      HLA mismatches   1995 RR   1996 RR   1997 RR   1998 RR
      1                1.033     0.998     1.020     0.991
      2                1.101     1.000     1.100     1.050
      3                1.088     1.065     1.223     1.026
      4                1.118     1.173     1.146     1.063
      5                1.118     1.154     1.191     1.107
      6                1.265     1.227     1.286     1.152
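      Since eyeballing a pasted table is error-prone, here is a quick mechanical check of those claims (values copied from Table 2 above):

```python
# Relative risk of graft failure by HLA mismatch count, from Table 2:
# {mismatches: (1995, 1996, 1997, 1998)}
rr = {
    1: (1.033, 0.998, 1.020, 0.991),
    2: (1.101, 1.000, 1.100, 1.050),
    3: (1.088, 1.065, 1.223, 1.026),
    4: (1.118, 1.173, 1.146, 1.063),
    5: (1.118, 1.154, 1.191, 1.107),
    6: (1.265, 1.227, 1.286, 1.152),
}

# 1998 RR is below 1995 RR in every row:
print(all(row[3] < row[0] for row in rr.values()))  # True

# ...but within each year, 6 mismatches is still riskier than 1:
print(all(rr[6][j] > rr[1][j] for j in range(4)))  # True
```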

      • I’m very interested in this topic (I’m waiting for a kidney transplant, so I’m *really* interested). They say that they got their data from a US agency that collects this information. How can one get *their* data (as opposed to going through the same process oneself), to replicate their analysis or to examine it more closely? It seems that for such important topics (as opposed to red and sexual availability) such data should accompany the paper (probably with the full analysis).

        • Re: “How can one get *their* data…”

          That is actually far more important than the journal article I posted here. I hope to bring others to this discussion who are more knowledgeable than I, but in any event will post further on this later in the day…

      • “the risk associated with 4 or 5 or 6 mismatches does seem lower than it used to be and thus possibly more tolerable than it used to be.”

        That is a significant observation, because some argue that better immuno-suppressant therapy makes zero matches acceptable. But then again, in light of your other comments, it seems like you’d really have to be stretching for that (much desired) answer…

  3. “The stakes are pretty high, this article is being used to expand the donor pool”

    From the paper:
    “We hypothesized that the relative significance of HLA matching in determining graft survival had diminished over the past several years, while the significance of nonimmunologic risk factors had not…
    To test these hypotheses, we used data from the United Network of Organ Sharing (UNOS) and United States Renal Data System (USRDS). The USRDS collates and analyzes data on approximately 95% of all persons with ESRD (whether on dialysis or following transplantation) in the US…
    Our study was restricted to deceased donor kidney transplants performed between December 1994 and December 1998 (n = 33,443). We divided this sample into four cohorts according to transplant year (1995 and before, 1996, 1997, and 1998).”

    It strikes me as odd that this is so important yet no one in the group of people concerned has bothered accessing these databases (or their successors) to get more recent data. It also strikes me as odd that a paper published in 2004 is “restricted” to data from 1998. It again strikes me as odd that the background contains no hint as to why they formed this hypothesis.

    Someone should get the more recent data (spanning as long a time as possible) and plot the individual dates of transplant vs. survival times (as a scatterplot). The dots should be colored according to HLA mismatches, or each of those categories can be in a separate panel. I wouldn’t take anyone arguing from this paper seriously unless there is a good reason no one has done that.
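    For anyone attempting that, here is a rough stdlib-only sketch of the data shape the plot needs (every field name and number here is made up – the real records would come from UNOS/USRDS, and the actual scatterplot would use something like matplotlib with one panel per mismatch count):

```python
import random
from collections import defaultdict

random.seed(1)

# Synthetic stand-in for transplant registry records (hypothetical fields)
records = []
for tx_year in range(1995, 1999):
    for _ in range(200):
        mm = random.randint(0, 6)  # HLA mismatch count
        # Illustrative assumption: hazard rises mildly with mismatch count
        surv_years = random.expovariate(0.08 + 0.01 * mm)
        records.append({"tx_year": tx_year, "mismatches": mm,
                        "surv_years": surv_years})

# One panel per mismatch count: transplant year vs. survival time
panels = defaultdict(list)
for r in records:
    panels[r["mismatches"]].append((r["tx_year"], r["surv_years"]))

for mm in sorted(panels):
    times = sorted(t for _, t in panels[mm])
    print(f"{mm} mismatches: n={len(times)}, "
          f"median survival {times[len(times) // 2]:.1f} y")
```

    A serious version would also have to handle censoring – patients whose grafts are still functioning at the data cutoff can’t simply be plotted as “survival time.”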

      • Dan,

        What do you mean exactly? What complaints have others had about this data and in what context? I get the sense that for whatever reason you already believe, or want to discover, this study is flawed. If this is true, is that based on the comments of other researchers and/or do you have some political/etc motivation to be against the findings of the study?

        Sorry for so many questions…but I have no knowledge of the “transplant scene”.

        • “those who have suggested that data which is widely used to encourage organ donation is flawed”

          Are you able to share the reasons they gave? Also, I agree with this statement: “flawed methodology is far from uncommon in medical journals”. I would say it is pervasive, at least on the preclinical side. I haven’t seen any evidence to the contrary on the clinical side.

          Here is a good quote:

          “We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort.”
          Fisher, R. A. (1958). “The Nature of Probability”. Centennial Review 2: 261–274.

          The primary flaw in my mind is that very little effort is put into ruling out explanations. People rule out that two groups are exactly the same (the usual strawman null hypothesis) and then run wild with whatever explanation they favor. A related aspect is that the predominant strategy is to look only at averages (often averages of other averages), because the data is so “noisy”. This hides all of the information that may be inconsistent with the favored interpretation.

          Because so much of the data published is of this nature, this is what the theories are based on. This then impedes people from understanding the actual process acting at the individual level. In many cases I suspect we do not even understand the system well enough to know what controls are necessary.

          In the end you have physicians being advised based on an imaginary world populated by “the average man/cell” who does/responds to nearly everything according to normal distributions, analyzed by perfect experiments that have ruled out all other explanations.

        • I don’t think anyone has any idea what 99% of these drugs are doing at all. It seems impossible for them to know when all they do is rule out that two groups are different multiple times. The criteria for evidence is so weak that you can prove anything you want, given sufficient funding. After my experience in biomed research, I won’t go to the doctor for anything other than physical trauma and supportive care. It seems safer to risk it (whatever may occur) alone. Unfortunately, my views have come to be essentially those expressed by Kary Mullis in the 1990s:

          “The horror of it is every goddamn thing you look at, if you look at it through the glasses that you’ve developed through looking at this thing, seems pretty scary to me. Look at the oncogene people and I go, oh yeah, I know what they are doing. Same stuff. Oncogenes don’t have anything to do with cancer. Radiation probably doesn’t have anything to do with stopping cancer. The drugs that we use on people – all those goddamn horrible poisons – they’re no less toxic than AZT. And we are doing it to everybody. Everybody’s aunt is being radiated once a goddamn month and given drugs that are going to kill her. We’re dealing with a bunch of witch doctors. The whole medical profession – except for the people that patch you up when you get a broken leg or you have a plumbing problem – is really fucked. It’s just a bunch of people that have become socially important and very rich by thinking about the fact that they might be able to cure the diseases that actually cause people in our society to die. And they can’t do shit about it. It’s scary, that’s what it is.”

    • I’m going to try doing that; I may first try European data. Incidentally, I recently attended a patient-education seminar in the Berlin hospital (Charite) where I’m registered on the waiting list for a transplant, and they gave a pretty detailed and technical presentation of the process of graft selection at the central agency, Eurotransplant I think it’s called. HLA matching is front and center even today, 10 years later in 2014. They did not discuss the factors responsible for graft survival, except to note dryly that the single biggest factor is patient non-compliance (something like 30%? apparently after being transplanted for a while, patients start to think they’re “cured” and stop taking immunosuppressives, or just get careless in other ways). The closing sentence in their slides is that (I translate) “HLA typing is a basic requirement for determining histocompatibility between donor and recipient.” So this 2004 paper does not seem to have had any impact on the state of the art in Europe.

      • Shravan,

        In this paper they say the cause of death was “Drug ingestion, trauma, or violence” in 50% of the cases (Table 1). This seems odd to me; perhaps “drug ingestion” means not taking your immuno-suppressants?

        • Not taking your immunosuppressants won’t usually kill you (not right away anyway), but it will cause the graft to be rejected, and that just puts you back on dialysis. Given the extremely high number of 50%, it could mean that taking the immunosuppressives eventually killed them due to its side-effects. Some of the immunosuppressives themselves are nephrotoxic, and raise the risk of cancer, infection, etc. It’s possible to catch an infection and not survive that, especially if you are old and unfit.

          But 50% is not a number I have ever seen, and I’ve been reading about this as a patient (not as a medical specialist) for over 30 years. Maybe the category subsumes non-compliant patients, patients who committed suicide by overdosing, and patients who died from the side effects of the drugs? Older dialysis patients at least sometimes kill themselves by overdosing on their blood pressure meds. And immunosuppressives tend to cause wild mood swings and depression, maybe leading to suicide by drug overdose in transplantees. But it’s a weird conflation to call it all “drug ingestion”.

  4. Here are some seminal papers, specifically, this section of “Data Sources and Structure” –

    “Primary Data: The OPTN Data Collection System”

    The “Conclusion” is more self-congratulatory than enlightening:
    “a tremendous effort has been made in making these data high-quality and well-organized for research at the SRTR, OPTN, and among other researchers. Further, we have shown that these efforts have paid off. For many research questions, the data submitted to the OPTN are complete and of high quality; for other questions, secondary sources are easily integrated to improve data quality or expand data scope. These resources taken together provide a rich and accurate source of information about the transplant process.”

    There are those who take great exception to this, citing among other things that

    Data sources and structure:

    I am always amazed by the number of journal articles with absolutely meaningless “Conclusions.” An example from
    “Analytical Approaches for Transplant Research”:

    Analyses of outcomes—including organ procurement rates, transplantation rates, graft failure rates, and mortality rates—require a thoughtful choice of analytical methods, particularly regarding censoring. Such analyses are of value to the entire community, including patients, clinicians, and policy analysts.

    What are they trying to say here?

    Here is a link to the latest data:

    • Dan,

      This is an attempt to address (at least in part — I’m saying “address” rather than “answer” deliberately) your question, “What are they trying to say here?”

      1. “Analyses of outcomes—including organ procurement rates, transplantation rates, graft failure rates, and mortality rates—require a thoughtful choice of analytical methods”

      Careful choice of analysis method is always necessary in any statistical study. What is missing (in my opinion) from the “conclusion” is something like, “Therefore, it is important that analysts give reasons for their choice of methods, and that there is thoughtful critique of these methods and comparison of results from methods that may have equally strong support for their use.”

      2. Possibly the word “censoring” is bothering you. (I don’t know what your statistics background is.)

      In statistics, “censoring” does not have the everyday meaning of “choosing to delete some information”. In the technical statistical use, it refers to data with some entries that are not known precisely, but instead are only known to be “at least this amount,” or “at most this amount,” or something similar. This often occurs in medical data — for example, if follow-up is only for two years, then for some patients all you can say is that the method worked for at least two years, or that the patient lived at least two years beyond treatment, etc. This, as you might imagine, makes analysis of the data much more difficult than if everything were known exactly. In particular, it makes the choice of method more difficult, so choosing the method carefully and being transparent about reasons for the choice is especially important when there are censored data.
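      To make that concrete, here is a minimal Kaplan-Meier (product-limit) sketch on six made-up patients: the censored entries (“graft still functioning at last follow-up”) stay in the risk set until they drop out, instead of being thrown away or counted as failures.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up duration for each patient
    events : 1 if graft failure was observed, 0 if censored at that time
    Returns a list of (time, estimated survival) at each failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]   # everyone leaving at time t
        d = sum(tied)                             # observed failures at t
        if d > 0:
            s *= 1 - d / n_at_risk                # survival drops at failures only
            curve.append((t, s))
        n_at_risk -= len(tied)                    # censored patients exit the risk set here
        i += len(tied)
    return curve

# Six patients; two are censored (events == 0)
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

      Mishandling the censored rows (dropping them, or treating them as failures) visibly changes the curve, which is the sense in which the choice of method has to be thoughtful.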

  5. I think the issue is the length of follow-up.
    I can’t find any explicit reference to the censoring date, but looking at the Kaplan-Meier graphs, it looks like the end of 2000.
    In the Kaplan-Meier graphs you can see that this results in progressively shorter periods of follow-up in each of the calendar year cohorts: 5-year follow-up for those treated in 1995 down to only 2 years for those treated in 1998.

    The number of events (deaths) determines the statistical power, so with shorter periods of follow-up there will be fewer events.

    Now let’s look at the HLA mismatch results in Table 2.
    It looks like a high HLA mismatch score is a bad thing: those with a high mismatch score will tend to fail earlier than those with a lower one. So if you were only interested in statistical significance, you would be less likely to observe an effect in the lower HLA mismatch scores if follow-up was short.

    Given this, I don’t find the pattern of “statistically significant” results in Table 2 all that surprising.
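    The follow-up effect on event counts is easy to quantify under a simple exponential model (the 8% annual hazard below is an illustrative assumption, not a number from the paper):

```python
import math

def expected_events(n, annual_hazard, followup_years):
    """Expected observed graft failures when everyone is administratively
    censored at followup_years (constant-hazard / exponential model)."""
    return n * (1 - math.exp(-annual_hazard * followup_years))

h = 0.08  # assumed annual graft-failure hazard (illustrative only)
print(round(expected_events(1000, h, 5.0)))  # 5-year follow-up: 330 events
print(round(expected_events(1000, h, 2.0)))  # 2-year follow-up: 148 events
```

    Cutting the event count by more than half like this shrinks the test statistics for a same-sized effect, so, other things equal, the later cohorts will show weaker “statistical significance” even if nothing biological has changed.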
