Thank you, James Watson. Thank you, Peter Ellis. (Lancet: You should do the right thing and credit them for your retraction. Actually, do one better and invite them to write a joint editorial in your journal.)

So, Lancet issued a retraction of that controversial hydro-oxy-choloro-supercalifragilisticexpialadocious paper.

From three of the four authors of the now-retracted paper:

After publication of our Lancet Article, several concerns were raised with respect to the veracity of the data and analyses conducted by Surgisphere Corporation and its founder and our co-author, Sapan Desai, in our publication. We launched an independent third-party peer review of Surgisphere with the consent of Sapan Desai to evaluate the origination of the database elements, to confirm the completeness of the database, and to replicate the analyses presented in the paper.

Our independent peer reviewers informed us that Surgisphere would not transfer the full dataset, client contracts, and the full ISO audit report to their servers for analysis as such transfer would violate client agreements and confidentiality requirements. As such, our reviewers were not able to conduct an independent and private peer review and therefore notified us of their withdrawal from the peer-review process.

We always aspire to perform our research in accordance with the highest ethical and professional guidelines. We can never forget the responsibility we have as researchers to scrupulously ensure that we rely on data sources that adhere to our high standards. Based on this development, we can no longer vouch for the veracity of the primary data sources. Due to this unfortunate development, the authors request that the paper be retracted.
We all entered this collaboration to contribute in good faith and at a time of great need during the COVID-19 pandemic. We deeply apologise to you, the editors, and the journal readership for any embarrassment or inconvenience that this may have caused.

And from Lancet:

Today, three of the authors of the paper, “Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis”, have retracted their study. They were unable to complete an independent audit of the data underpinning their analysis. As a result, they have concluded that they “can no longer vouch for the veracity of the primary data sources.” The Lancet takes issues of scientific integrity extremely seriously, and there are many outstanding questions about Surgisphere and the data that were allegedly included in this study. Following guidelines from the Committee on Publication Ethics (COPE) and International Committee of Medical Journal Editors (ICMJE), institutional reviews of Surgisphere’s research collaborations are urgently needed.

It’s good they retracted this paper, and also good that it only took them a couple of weeks. That’s much faster than the 12 years it took them to retract Wakefield’s vaccine paper.

Better than nothing, but . . . they write, “several concerns were raised with respect to the veracity of the data and analyses.”

I have two problems here.

1. Even had the data been clean, many questions were raised about the analysis, concerning not just its veracity but also its statistical appropriateness. Even if the other authors of this paper trusted Surgisphere’s data, why should they have trusted its statistical analysis?

2. What’s with the passive voice? “Several concerns were raised . . .” They should give credit to James Watson and Peter Ellis, along with the (mostly anonymous) commenters on PubPeer. Watson and Ellis did a lot of work, and they stuck their necks out. They should be thanked by the authors of the paper and the editors of the journal.

It’s frustrating to me how this sort of retraction is considered to be an embarrassment to be swept under the rug.

What Lancet should really do is invite Watson and Ellis to write an editorial for their journal, detailing the story of how they became suspicious and what happened from there.

I also agree with Zad, who writes:

Of course, it would now be great if med journals like it would retract studies for bad analyses and designs and not just fraud, but that’s still wishful thinking.

No cat picture, though!

44 thoughts on “Thank you, James Watson. Thank you, Peter Ellis. (Lancet: You should do the right thing and credit them for your retraction. Actually, do one better and invite them to write a joint editorial in your journal.)”

  1. Good morning everyone! I stumbled across this blog as I was going down the rabbit hole of this whole Surgisphere debacle/Lancet-gate issue.

    On the Lancet and NEJM articles, I’d say, at a guess, that Dr Desai made it all up but was smart enough to make it look and feel real.

    Does anyone really believe that the reason Dr Desai would not make any of the data, the ISO certification, or the contracts with hospitals available to independent investigators was “confidentiality”?

    Reading the Lancet paper, I was struck by how similar the first paragraphs of the Methods were to the Surgisphere website. Just a bunch of meaningless techno-marketing-speak word soup.

    We had a case, on a much smaller scale & with much less global significance, here in NZ of people claiming brilliant AI at their disposal, but zero evidence for it. The parallels with Surgisphere and its ‘QuartzClinical’ are striking. You can find the NZ story here, about ‘Zach’ the non-existent AI that its makers refuse to admit is not real: https://thespinoff.co.nz/society/13-01-2020/rip-zach-probe-finds-serious-wrongdoing-over-miracle-medical-ai/

    Thanks & kia kaha (stay strong)!

    • @ Tracy Harrison

      I disagree with your premise that Dr. Desai “made it all up”. I think that he had help. Why? Yesterday, the NEJM published another paper titled “A Randomized Trial of Hydroxychloroquine as Post-exposure Prophylaxis for Covid-19”. The paper claims to have enrolled 821 participants who were exposed to COVID positive cases. It reaches the conclusion that Hydroxychloroquine was no better than placebo at preventing “clinical illness”. The primary outcome was stated to be “the incidence of either laboratory confirmed COVID-19 or illness compatible with COVID-19 within 14 days.”

      For a study with a primary outcome of laboratory-confirmed COVID-19, which also purports to confirm symptoms with epidemiological linkage, and specifically uses the term incidence, it is remarkable that there were only 20 “confirmed” laboratory findings of COVID-19 anywhere in the study. Also, the paper gives it away when it states under Methods that “Participants had known exposure (by participant report)…”

      So let’s go back in time. The study states that the first participants were enrolled in the study on March 17. On March 18, a Minnesota outlet called the StarTribune published an article titled “Minnesota COVID-19 testing is restricted amid supply shortage,” which talks about… not being able to test for COVID-19.

      Also on March 18, the Surgisphere website posted a COVID19 Triage Tool with claims that it could accurately predict COVID-19 cases in the absence of actual testing. Most unusual though was the statement on the webpage that “This blurb was written on Wednesday, March 18, 2020.” Hmmm

      Getting back to the NEJM paper published yesterday, the authors did manage to cite the Lancet debacle, in footnote 7 (which was hidden between 5 and 8). Of course the paper also cites footnote 13, which references “The REDCap Consortium: Building an international community of software platform partners.” That paper was published in June 2019, yet if you try to access it on the NIH website, you will find an “Embargo” note with a publish date of July 2020. If you go to the Elsevier site (which owns the Lancet), you can access an abstract, and you will find a really pretty graphic that looks like it does EXACTLY what Surgisphere has been trying to explain and which we still don’t understand.

      Of course the NEJM debacle (I am referring to the paper printed yesterday as RAT-2) leaves so much more to the imagination. Like what are the other “occupational exposures” referenced by the paper, knowing that “a whole bunch” of states gave their positive COVID test results to the police? It is easy to find a needle in a piece of hay. But put a bunch of hay down and it is harder to see where all of the “state data” is going… like to a REDCap application? Of course, the whole race question posed by Dale Lehman also begs the question “did the Minnesota Police have access to George Floyd’s test data?” and would the statement “I cannot breathe” on a police report count as a “shortness of breath” conversion from a free-text field? You have to go to the paper’s Appendix on page 9, under the header Case Adjudication Process, to understand that comment.

      So in short, no, I don’t think that ALL of the Surgisphere data was made up. And I do think that Surgisphere had help. And I am still perplexed why there would be an embargo on the NIH release of the REDCap research paper. But those get harder to answer with every retraction. And to Carlos below… the problem with BIG DATA is that people are compressing BIG DATA into little bits for “sharing research”, but no one is checking to see how valid the little bits are.

      • “The paper claims to have enrolled 821 participants who were exposed to COVID positive cases.”

        You seem sceptical of this paper? I am not sure why. It would seem legitimate to me, and completely unrelated to the observational studies that have been retracted.

        It has ClinicalTrials.gov number, NCT04308668. “This trial was approved by the institutional review board at the University of Minnesota and conducted under a Food and Drug Administration Investigational New Drug application. In Canada, the trial was approved by Health Canada; ethics approvals were obtained from the Research Institute of the McGill University Health Centre, the University of Manitoba, and the University of Alberta.”

        There is also a data sharing statement. The authors offer to make available a complete de-identified patient data set within one month of publication.

        Perhaps I misunderstood the point you were making?

      • More concerns about ‘A Randomized Trial of Hydroxychloroquine as Postexposure Prophylaxis for Covid-19’

        So, as they point out themselves, the ‘vast majority of the participants, including health care workers, were unable to access testing.’ So then what? They state –
        ‘The primary outcome was prespecified as symptomatic illness confirmed by a positive molecular assay or, if testing was unavailable, Covid-19–related symptoms.’

        What then counts as a ‘covid-19 related symptom’ outcome?

        Well, it’s rather broad ..

        From ‘probable’ –
        ‘probable cases (the presence of cough, shortness of breath, or difficulty breathing, or the presence of two or more symptoms of fever, chills, rigors, myalgia, headache, sore throat, and new olfactory and taste disorders),’

        we’re pretty familiar with those, to ‘possible’ –

        ‘possible cases (the presence of one or more compatible symptoms, which could include diarrhea).15 ‘

        We assume of course that ref 15 (https://cdn.ymaws.com/www.cste.org/resource/resmgr/2020ps/Interim-20-ID-01_COVID-19.pdf) lists these other possible symptoms, including diarrhea. Well .. no mention of diarrhea, but only the much more severe-looking
        ‘..
        OR
        Severe respiratory illness with at least one of the following:
        • Clinical or radiographic evidence of pneumonia, or
        • Acute respiratory distress syndrome (ARDS)’
        (which would seem to pretty much come under difficulty breathing etc. anyway).

        So none of the usual criteria but only diarrhea or whatever else – that isn’t even stated – is just assumed to ‘indicate’ covid-19 illness.

        But it gets worse. Here are some of the possible side effects of hydroxychloroquine sulfate (the paper states there were already quite a lot of those), listed by the FDA (https://www.accessdata.fda.gov/drugsatfda_docs/label/2017/009768s037s045s047lbl.pdf) –

        ‘Gastrointestinal disorders: .., diarrhea, ..’

        And just that little matter of dose. The FDA regarding hydroxychloroquine sulfate use for covid-19 (https://www.fda.gov/media/136537/download):

        ‘The suggested dose under this EUA for hydroxychloroquine sulfate to treat adults and
        adolescents who weigh 50 kg or more and are hospitalized with COVID-19 for whom a
        clinical trial is not available, or participation is not feasible, is 800 milligrams of
        hydroxychloroquine sulfate on the first day of treatment and then 400 milligrams daily for
        four to seven days of total treatment based on clinical evaluation.’

        So 800mg followed by 400mg daily; it doesn’t say 800mg and then the next dose (at 400mg) six hours later. But the latter is what happens with all those given HCQ in this study.

        And the FDA dose is at a level specifically for treatment of at least moderately severe, hospitalised cases, not for prevention. As this is a trial, the dose used in it is not recommended against per se, and they use the malaria treatment doses. So then, if malaria itself is anything to go by, prophylaxis should be nearer to half that, at 400mg, and taken once .. per week.

        • HCQ side effects again –
          ‘Gastrointestinal disorders: Nausea, vomiting, diarrhea, and abdominal pain.’

        • > So none of the usual criteria but only diarrhea or whatever else – that isn’t even stated – is just assumed to ‘indicate’ covid-19 illness.

          Possible cases also include those presenting *one* of fever, chills, rigors, myalgia, headache, sore throat, and new olfactory and taste disorders.

          The inclusion of diarrhea, a common side effect, as a symptom was done with some care (ignored if it was the only symptom for patients on the active arm) and doesn’t change the result much: “When we excluded 13 persons with possible Covid-19 cases who had only one symptom compatible with Covid-19 and no laboratory confirmation, the incidence of new Covid-19 still did not differ significantly between the two groups: 10.4% in the hydroxychloroquine group (43 of 414 participants) and 12.5% in the placebo group (51 of 407) (P=0.38).” [There seems to be an inconsistency in the reporting: they say “13 persons” in the paper, “10 cases” in the appendix.] (A minimal check of these proportions is sketched at the end of this comment.)

          According to the appendix:

          Possible Cases[2]: 17
          Compatible symptom(s) and epidemiologic link.
          Sore throat (n=7) of whom 3 had nasal symptoms, Anosmia alone (n=3), myalgia alone (n=2), fever with nasal congestion (n=1), fatigue (n=2) with 1 rhinorrhea, diarrhea off study medicines (n=1), diarrhea with rhinorrhea (n=1)

          Adjudicated as Not Compatible as Covid-19 Cases[3]: 19
          Isolated symptoms of headache (n=3), diarrhea (n=4), nasal congestion (n=2), diffuse pruritic maculopapular rash lasting 10 days (n=1), nasal congestion with rhinorrhea (n=1).

          [2] By the U.S. case definition, the “Possible Cases’ are actually considered a “Probable case” due to the epidemiologic linkage. This possible classification is used to distinguish from the more robust symptom-complex presentation as all cases had PCR+ epidemiologic linkage. Of four adjudicated as possible cases using the clinical case definition alone but who were PCR positive (i.e. definitive Covid-19), they had isolated symptoms of: fatigue (n=1), myalgia (n=1), and anosmia and lack of taste (n=2).

          When excluding 10 Possible cases with only one Covid-19 compatible symptom and without lab confirmation, incidence of new Covid-19 did not differ between those receiving hydroxychloroquine at 10.4% (43 of 414) versus placebo at 12.5% (51 of 407; P=0.34).

          [3] While all of these symptoms could possibly occur with Covid-19, some such as diarrhea and headache also overlap with side effect profile of hydroxychloroquine. Particularly when these symptoms occurred during the 5 days of study medicine administration and stopped after day 5, the blinded adjudication process thought these isolated symptoms were less compatible with Covid-19 and more compatible with medication side effects. When diarrhea occurred or persisted after 5 days, these were considered possible. All adjudications were blinded to study arm.
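
          For what it’s worth, here is a minimal sketch in Python (mine, not anything from the paper or its appendix) of that two-proportion comparison as a pooled z-test; the paper’s own P-values (0.38 in the text, 0.34 in the appendix) will differ a bit depending on exactly which test and continuity correction they used.

          from math import sqrt, erfc

          # Counts as quoted above: 43/414 hydroxychloroquine vs 51/407 placebo.
          x1, n1 = 43, 414
          x2, n2 = 51, 407

          p1, p2 = x1 / n1, x2 / n2             # ~10.4% vs ~12.5%
          p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion under the null
          se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
          z = (p1 - p2) / se
          p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability

          print(f"{p1:.1%} vs {p2:.1%}, z = {z:.2f}, P = {p_two_sided:.2f}")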

        • > (ignored if it was the only symptom for patients on the active arm)

          I think this was wrong and it was in fact ignored if it was the only symptom and while on medication (either active or placebo).

        • @ carlos

          *subjective* fever, not objective fever. Very unreliable. The current “prevalence” of COVID19 is actually still quite low, and was even lower during this study period (even NYC had less than 10% “prevalence” during this time). Even then, testing for prior infection (with a false positive rate of 10%) at such a low “prevalence” number makes it somewhat futile (see the base-rate sketch at the end of this comment). Which means (if common things are still common) that these sore throats could still be strep, post-nasal drip, allergic irritant, common cold, etc. Let’s also not forget that there were 2 randomizations, one for Canada and one for the US. Canadian COVID rates have been far lower, and while one would think that Ontario would be a better match for the US population, Quebec (French Canadian) was included instead. Please do not underestimate the impact of regional dialect, which means that the “blinded” investigators (and even the interviewers) could and probably did make assumptions about whether a “symptom” was COVID related or not based upon their knowledge of the “pandemic” in certain areas.

          For all of the time and money to have been spent, front end testing, back end testing and confirmation of exposure through testing should have been (and currently is if one considers health care providers who get stuck by a “contaminated needle” while at work) designed into the study.

          And no, I don’t buy the “we did what we could do with what we had.” If you accept that, be prepared for surgeons equipped with butter knives.
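
          Here is a minimal sketch of that base-rate point in Python; the prevalence values and the 90% sensitivity are illustrative assumptions of mine, and only the 10% false-positive rate is the figure mentioned above.

          def ppv(prevalence, sensitivity, false_positive_rate):
              # Positive predictive value from Bayes' rule.
              true_pos = prevalence * sensitivity
              false_pos = (1 - prevalence) * false_positive_rate
              return true_pos / (true_pos + false_pos)

          # Hypothetical prevalences; sensitivity 0.90 assumed, false-positive rate 0.10.
          for prev in (0.02, 0.05, 0.10):
              print(f"prevalence {prev:.0%}: PPV = {ppv(prev, 0.90, 0.10):.0%}")

          At those prevalences, most positives for prior infection would be false positives, which is the futility I mean.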

        • RE: And no, I don’t buy the “we did what we could do with what we had.” If you accept that, be prepared for surgeons equipped with butter knives.

          This is one of the aspects of the current situation that most dismays me. The world is FULL of important research questions that simply are not answerable with any available source of data. Most everything about COVID-19 is included in that.

          Yet there are hundreds (thousands?) of papers being published which ignore the non-existence of usable data and just plow forward pretending that stuff like self-reported symptoms, results from dodgy screening tests and highly self-selected samples are going to magically translate into real measures of the disease.

          Is anybody, anywhere doing the sort of Public Health research into COVID-19 that was described in Epi textbooks back when I was in grad school circa late 20th century? The kind where you use reliable tests to screen random samples from a population and everything is tied to strict, objective case and exposure definitions?

          I dunno, maybe that sort of stuff is still years away and all these nonsensical quickie “studies” are filling the void in the meanwhile.

        • @ Brent Hutto

          You heretic, you. Are you suggesting that Translational Research and the Gabillions of dollars thrown into the wind are wasted money? Or that all of the STEM degrees designed to engage our youth and prevent radicalization have little merit?

          Don’t be so catty.

          PS. I checked the mirror. I have more nose hair and ear hair than head hair. I guess that makes me old school too.

        • Here is a better response to your point about how not including the Possibles doesn’t affect the overall results anyway:

          The authors’ reference for covid symptom criteria was https://cdn.ymaws.com/www.cste.org/resource/resmgr/2020ps/Interim-20-ID-01_COVID-19.pdf which
          lists the following
          ‘A1. Clinical Criteria
          At least two of the following symptoms: fever (measured or subjective), chills, rigors, myalgia,
          headache, sore throat, new olfactory and taste disorder(s)
          ..’
          Compare that with the Fig S3 details, which include the unlisted Fatigue, Diarrhea, Nasal Congestion, Rhinorrhea, and Other Symptoms – whatever they are supposed to be.

          But now, having checked the Probables category – those with cough alone (accepted there, questionably I think) or one required symptom along with only the questionable symptoms – the totals come out the same whether for treatment or placebo.

          But the authors had the opportunity to request repeat covid tests along with antibody tests for this first double-blind RCT. Recent covid test research has shown that the covid tests can indicate positive for ‘dead viruses’. This implies the covid tests alone could have picked up previously recovered non-symptomatics – i.e. they may have even originally transmitted it to the person they believed exposed them and developed another problem since.

          The main issue anyway should be death or hospitalisation, and there are no deaths and only one hospitalisation for either group. One hospitalisation for each isn’t a helpful statistic, especially along with the uncertainties of covid disease.

          For the overall uncertainties related to symptom-only covid determination, and to see that – aside from fatigue – the other symptoms are covid-related at below 6%, see ‘People with non severe Covid’ in the table here: https://www.cebm.net/covid-19/covid-19-signs-and-symptoms-tracker/

        • Eleven cases with one possible symptom weren’t included as developing Covid19, but ..
          another 17 were. These cases were also listed in Table S2 of the Supplementary Appendix – add them up – it makes 17:

          ‘Possible Cases 17[Note2] Compatible symptom(s) and epidemiologic link
          Sore throat (n=7) of whom 3 had nasal symptoms, Anosmia
          alone (n=3), myalgia alone (n=2), fever with nasal congestion
          (n=1), fatigue (n=2) with 1 rhinorrhea, diarrhea off study
          medicines (n=1), diarrhea with rhinorrhea (n=1)’

          Four of these (from [Note2]) were lab-tested positive anyway – leaving 13 with only one possible symptom. Add together the results within Table S3 and we see these 13 single-symptom cases must have been regarded as covid disease cases (the arithmetic is sketched at the end of this comment):

          ‘Confirmed or Probable Covid-19 49 (11.8%) 58 (14.3%)
          Lab-confirmed Diagnosis 11 (2.7%) 9 (2.2%)
          Probable Covid-19 compatible illness 32 (7.7%) 42 (10.3%)
          Possible Covid-19 compatible illness* 6 (1.5%) 7 (1.0%) ‘
          ..

          So possible becomes probable – how improbable.

          Fatigue? Muscle pain? Tiredness? Clearly these in isolation are not sufficient criteria in ref 15 (ref 2 of the Supp Appendix), and the ‘epidemiological link’ issue doesn’t affect the official criteria for judging illness there either (as far as I can tell).

          Fig S4 also details these cases. 5 ‘Confirmed’ (lab-positive) with one adjudicated symptom (N=1) are listed in Fig S4. That leaves 8 supposed covid19 cases, based on no covid19 test – nor antibody test – that are so adjudicated after allowing many weeks for a reply:

          ‘We sent follow-up e-mail surveys on days 1, 5, 10, and 14. A survey at 4 to 6 weeks asked about any follow-up testing, illness, or hospitalizations.’

          So up to six weeks after exposure – or is that more – to allow these symptoms to show up??

          And with respect to those treated – the 13 possibles – I’m finding anomalies in the details of Fig S3 (as far as I can see). Two cases marked as having 0 adjudicated covid19 symptoms (N) are nevertheless allocated to the 13 ‘Possible’ – as opposed to the 11 ‘Non Cases’ with symptoms (p.21). Given that we’re supposed to rely on ‘adjudicated possible covid19’ as ‘is covid19’, I might have hoped that at least these results wouldn’t now get skewed towards finding illness for those on treatment. In both these cases the non-adjudicated yet adjudicated covid19 symptoms include .. diarrhea – which doesn’t even make it as a symptom at the WHO, nor is it officially mentioned in the UK: https://www.nhs.uk/conditions/coronavirus-covid-19/symptoms/

          So, all count as covid19 –
          Adjudicated probable covid19
          Adjudicated possible covid19
          .. Some non adjudicated possible covid19
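
          A minimal sketch (mine, not the paper’s) of that arithmetic check, summing the Table S3 rows as quoted above to show that the headline ‘Confirmed or Probable’ counts already absorb the possible cases:

          # Counts as quoted from Table S3 above (hydroxychloroquine vs placebo).
          hcq     = {"lab_confirmed": 11, "probable": 32, "possible": 6}
          placebo = {"lab_confirmed": 9,  "probable": 42, "possible": 7}

          print(sum(hcq.values()), "vs headline 49")      # 49: the possibles are counted in
          print(sum(placebo.values()), "vs headline 58")  # 58: the possibles are counted in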

        • I see I needn’t have gone over their inclusion of an overall 13 possibles with a symptom; bit of a misreading of your comment there.

        • I agree that self-reporting is not ideal and there is always room for getting better data and making a better analysis. I just pointed out that “possible” cases include the usual criteria (in fact 11 out of 13 cases classified as “possible” had one of those symptoms) and that they report similar results with or without the inclusion of possible cases (11.8%/14.3%, P=0.35 vs 10.4%/12.5%, P=0.34). Apart from that, I have no particular interest in defending their methods or results.

  2. > Lancet: You should do the right thing and credit them for your retraction.

    Is it accurate to say that the Lancet has retracted the paper? The authors (most of them, at least) have done it.

    There are different paths to retraction, and it would have been a bit different if all of them had done it jointly, saying something like “unfortunately our confidentiality agreements make it impossible to blah, blah, blah”. The way it happened looks worse.

    This sets a relatively high bar for future EHR big-data data-mining efforts (but it would have been higher if the retraction had not been “voluntary”).

  3. I am puzzled by the retraction. Three of the authors retracted the study because the company wouldn’t give the data to independent peer reviewers. Didn’t these original authors have access to the data? How can authors not have access to the data? I understand that confidentiality requirements might generally keep some portions of the data from being disclosed, like patients’ names, etc. But I genuinely don’t understand why that was needed for an independent review. And I genuinely don’t understand how the authors could have signed on as authors of the paper if they did not have access to the data to begin with.

    • The cynical supposition would be that Desai wrote up an analysis of his supposed data and invited the other three to author a paper almost guaranteed to generate an enormous amount of attention and prestige, overnight. They thought Desai’s findings were important and impactful and wanted to help get the word out as fast as possible, without bothering to do any data checking or analysis themselves.

      But that’s just one of the many explanations that fit the observed facts. Maybe they were in on the scam all along.

        • Well, consider this excerpt from the paper Surgisphere published in their journal (2013), 2 of whose authors were authors on the NEJM and Lancet papers:
          “The most serious cause of fraud in medical publishing is manufactured data that authors use to
          support high impact conclusions. The Journal of Surgical Radiology requests primary, de-identified
          data, from authors and completes its own statistical analysis on the information. Co-authors are asked
          to review this data and sign off on its authenticity, with the belief that fraud is less likely to be
          perpetuated among multiple stakeholders than from a single author.”

          Of course, we now know this was not actually published in that journal, but was intended to be there. Instead it appeared in some mysterious place called “Publications.”

        • Can you point to where that statement actually appears? I see it in the twitter thread but I can’t find the statement in the manuscript itself.

    • To be fair, I have been involved with projects that involved electronic medical records enriched with social security information (like unemployment benefits). Particularly in Europe, access restrictions around that sort of data are pretty strict*. And we had a set-up where we jointly wrote an analysis plan and reviewed the analyses, but only a small number of collaborators from one institution were really hands-on with the data. On the other hand, we did disclose this clearly. There was also a long track record of a huge number of people having been involved with the creation of the database, publications about creating it in the first place, and an academic steering committee from multiple universities overseeing the effort. Additionally, I certainly got a lot of questions answered (e.g. I knew the geographic region, the types of hospitals, I knew time frames, I got a lot of explanations on how data were captured/combined, etc.).

      * I guess if we talk “just” electronic medical records, the NHS in the UK is now trying to come up with ways of doing such analyses efficiently and responsibly – see e.g. https://www.nhs.uk/your-nhs-data-matters/ – and if I understood correctly, it may again only be some institutions that would have access (and others would have to collaborate with them). In contrast, in the US the HIPAA restrictions are at least a little more permissive, so you get anonymized research datasets like MIMIC being truly public. Perhaps more similar to the Surgisphere situation: If I buy access (which certainly a research institution could do and my company has done) to one of the big US databases, you typically get a lot of information (but for example not free text doctor’s notes) and you tend to not get exact identity of hospitals (you might know e.g. the state in the US). Additionally, you would not just be able to send that data to someone else, but you could most certainly (possibly at a cost) get someone else access.

      In any case, it seems clearly inappropriate, if authors claimed to have (had?) access to the data and never did, if that is what happened. Especially when relying on a somewhat unknown actor with not much of a previous track record, you’d have thought they would have been more cautious about signing on to these publications.

  4. [Tin foil hat on]
    What’s the likelihood that they wouldn’t provide the data because they weren’t authorized to have it in the first place (and so releasing it would show that they got it through nefarious means)? (My Bayesian prior is >0.)

    • @ divalent

      Could this be a silly cat in a tin hat who has hijacked a keyboard? For someone like a Surgisphere to have gotten a hold of that much data through nefarious means… well, that might mean that “folks” like the CDC and CMS were in on it too. And it might require the backing of a very big institution with a lot of clout (read: money to spread around). And probably skin in the game too.

      Of course that might also explain the “messed up” test kits sent out by the CDC and the multiple “supporting documentation waivers” released by CMS at that point in time. The researchers did claim a whole lot of PCR (polymerase chain reaction) confirmations that would be quite improbable at that early point in time. PCR testing, like statistics, is easy. Anyone can do it. Now if you want to discuss accuracy of results or even integrity of process, well, my non-disclosures prevent me from doing so.

      Remember the “no ethics review” part of the story? Lightning strikes come in threes.

      Instead of feeding a conspiracy theory, I am going to patiently wait for Dr. Gelman to cat-claw the June 3 NEJM study. I hope one of his TAs is standing by with the catnip, because he might need a CAT scan when he blows a vessel.

      That’s the best I can do without a cat spinning in a dryer.

  5. The wonderful thing about the retractions is that post-publication peer review worked. People from outside the publication establishment were able to get those inside the system to do the right thing. The terrible thing about the retractions is that post-publication peer review worked. People inside the publication establishment will now point to this episode as affirming their system, and obviating the call for reform. The paper was retracted, so the system worked, see? Nevermind that the paper shouldn’t have gotten published in the first place, and reform could prevent that in the future. No system is perfect, they will say, and if only one (ha-ha) bad paper makes it through, and only in the midst of the rush to generate the science crucial to fighting off a pandemic, that’s a pretty good record, right? And yeah, Wakefield-shmakefield, a bad paper every 12 years is also a pretty good record.

    In summary: Journals have historically minimized the problems with their review systems, and ignored outsiders’ criticism, until absolutely forced to acknowledge them by outside pressure. The next time outside efforts are brought to bear against their systems, I predict they will point to this episode as an example of why the problems with their system are minimal and the latest complaints should be ignored. Science!

  6. Zad: “retract studies for bad analyses and designs and not just fraud, but that’s still wishful thinking.”
    Hey sometimes wishes come true.

    But efforts of many help!

    I still want random audits of all published studies.

    No mercy, no excuses. But with understanding of mistakes, misunderstandings, and extreme prejudice for good intentions.

    • @ keith o’rourke

      Or better yet, have a data and statistical validation performed prior to turning the work over to a peer reviewer for publication. If it becomes the gold standard, then IRBs and ERBs would know, like in The Departed, “No Tickie, No Laundry”. Plus, why should folks have to waste time cleaning up the trash when the publisher can simply take it to the curb?

  7. I’d add Melissa Daley (of the Guardian) to the thanks list — her team contacted enough Australian hospitals to prove that the Australian data could not possibly be real. While the Guardian story only came out hours before the retraction, the researchers were asked for comment in advance, so it could well have played a role in when they folded.

    • Thomas:

      More generally, I expect that news media attention made all the difference. I could’ve screamed all day and all night on this blog and I’m guessing that Harvard and Lancet would not have listened.

      Just for example, do you think the Association for Psychological Science will care what I think about their published research on racial explanations for social disparities? I think not! But if the Guardian etc. were to run some stories about it, then maybe they’d decide it was worth a look. When there’s no spotlight, people will hide until the controversy disperses on its own.

      And the sad thing is that I don’t think this retraction will help Lancet’s reputation. Sure, they retracted, but the takeaway is, “Don’t trust Lancet. They have a habit of publishing controversial papers that are fatally flawed.” No one’s gonna say, “Yeah, Lancet is great. After a Guardian investigation, they finally decided to give up on that one article.”

      • “And the sad thing is that I don’t think this retraction will help Lancet’s reputation. Sure, they retracted, but the takeaway is, ‘Don’t trust Lancet. They have a habit of publishing controversial papers that are fatally flawed.’”

        And this is exactly what is happening. Also, it’s not just Lancet. People say don’t trust science in general. If Lancet and NEJM could get it wrong, then almost every journal out there is tarnished.

        Data fabrication and falsification are very hard to catch. So I am not sure what Lancet could have done better. I think open peer review is one way to go; BMJ especially shows it can be done. Also, maybe pay the reviewers??

      • Dr. Gelman, I am a simple caveman. And I don’t go to an orthopedist every day. But I do when I have a broken bone. The Lancet has a broken reputation. And you are a reputation surgeon. The patient is in the OR, supine on the table. You can let them bleed to death, you can let them maybe heal themselves (but probably with a horrible disfigurement), or you can go in with a team and show them how to do it right. That this publication happened is tragic. But how and where it goes from here, yes, you can make a difference. And maybe that team requires a Melissa Davey. I bet she would love to cover a story about the redemption of research in medicine. And I bet she would if you asked her.

        But no, you don’t get to hide behind a statistical probability without even trying. (Insert “Braveheart motivational speech about being on your deathbed” here.) Too much is at stake with this one.

  8. Very interesting that even (a subset of) the authors had never at any stage had access to the data (not even in principle?) behind a paper that had their name on it…

    I recently became an associate editor of the Linguistic Society of America’s flagship journal Language, as their statistics consultant. My job is to evaluate any paper that has a statistical component (and it seems a high proportion of linguistics papers do have one nowadays). I’m trying to enforce data+code release during the review process itself, and my reviewers are looking at the code+data as part of the review (and I am too). Including data and code as part of the review definitely improves the paper, in my experience.

    So I am puzzled as to why anonymized data could not be made available with this paper. I understand the confidentiality issues; but these issues can be dealt with in many cases. For example, we work with individuals with aphasia, and with sufficient anonymization steps there is no chance that confidentiality will be violated.

    On a side note, I have heard psycholinguists say that EEG data should not be released because one day it may be possible to identify an individual just from their EEG recordings. I don’t know if this is a cop-out or a real concern.

    • I think that in all but extreme cases, “It might be possible to identify someone one day” is not a reason to withhold data. There would have to be a credible reason why someone might want to know that Ms A spent 20 minutes connected to an EEG recording while listening to either Brahms or the Beatles, depending on which condition she was randomised to, even if the idea of reversing EEG data to determine identity weren’t absurd on its face.

      If I’m in hospital, I know that doctors will share my records with other medical professionals whom they deem it appropriate to consult, on the basis that those others will not then paste copies all over Instagram. I think it’s reasonable for peer reviewers to be entrusted with a copy of the dataset for an article that they are examining, on the same basis. Even quite sensitive patient data, which might not be appropriate to upload to OSF, are not the nuclear launch codes.

      • Nick said,
        “Even quite sensitive patient data, which might not be appropriate to upload to OSF, are not the nuclear launch codes.”

        I assume that by “OSF” you mean the Open Science Framework, and not the Order of Saint Francis, which owns and operates OSF HealthCare.

  9. To me, the really striking thing about the retraction is that the three authors retracting the paper don’t have access to the data. They brought in some independent group to obtain the data from their co-author and perform an independent analysis. Similarly, they don’t seem to consider themselves qualified to analyse the data so they outsource this to someone else. They seem to have made just two contributions to this project: affixing their names to the publication and retracting it.

    That said, they do deserve some credit for the retraction. They could have just toughed it out and it is likely that the Lancet would have fully backed them.
