When conclusions are unverifiable (multilevel data example)

A. B. Siddique, Y. Jamshidi-Naeini, L. Golzarri-Arroyo, and D. B. Allison write:

Ignoring Clustering and Nesting in Cluster Randomized Trials Renders Conclusions Unverifiable

Siraneh et al conducted a clustered randomized controlled trial (cRCT) to test the effectiveness of additional counseling and social support provided by women identified as “positive deviants” to promote exclusive breastfeeding (EBF) within a community. However, their statistical methods did not account for clustering and nesting effects and thus are not valid.

In the study, randomization occurred at the cluster level (ie, kebeles), and mothers were nested within clusters. . . . Because this is a hierarchical modeling environment and individuals within a cluster are typically positively correlated, an individual-level analysis that does not address clustering effects will generate underestimated standard errors and unduly narrow confidence intervals. That is, the results will overstate statistical significance.

That’s right! They continue:

One alternative is calculating the mean observation by cluster and analyzing the data at the cluster level. . . . A valid alternative would be to use multi-level hierarchical modeling, which recognizes the hierarchy in the data and accounts for both lower and higher levels as distinct levels simultaneously.

Right again.
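To make this concrete, here is a minimal R sketch on simulated data (made-up cluster counts, cluster sizes, and effect sizes; these are not the Siraneh et al. data, which were never shared). It compares the naive individual-level analysis that ignores clustering with the two valid approaches the letter describes: analyzing cluster-level means and fitting a multilevel model. The variable names (ebf, treatment, kebele) are just placeholders for illustration.

```r
# Toy simulation of a cluster-randomized trial: mothers nested within kebeles,
# with a positive intra-cluster correlation induced by a kebele-level random effect.
# All numbers here are made up for illustration.
set.seed(123)
n_kebeles <- 20                                  # hypothetical number of clusters
n_mothers <- 30                                  # hypothetical mothers per cluster
kebele    <- rep(1:n_kebeles, each = n_mothers)
treatment <- rep(rep(0:1, each = n_kebeles / 2), each = n_mothers)  # randomized by cluster
u         <- rnorm(n_kebeles, 0, 0.8)            # cluster-level effect -> positive ICC
ebf       <- rbinom(length(kebele), 1, plogis(-0.5 + 0.4 * treatment + u[kebele]))
d         <- data.frame(ebf, treatment, kebele)

# (1) Individual-level logistic regression that ignores clustering:
# the standard error on the treatment coefficient comes out too small.
naive <- glm(ebf ~ treatment, family = binomial, data = d)
summary(naive)$coefficients["treatment", ]

# The same point estimate with cluster-robust (sandwich) standard errors is noticeably wider:
library(sandwich)
library(lmtest)
coeftest(naive, vcov = vcovCL(naive, cluster = d$kebele))

# (2) Cluster-level analysis: average the outcome within each kebele,
# then compare the control-arm cluster means with the intervention-arm cluster means.
means <- aggregate(cbind(ebf, treatment) ~ kebele, data = d, FUN = mean)
t.test(ebf ~ treatment, data = means)

# (3) Multilevel (random-intercept) logistic regression, which models
# mothers nested within kebeles directly.
library(lme4)
fit_ml <- glmer(ebf ~ treatment + (1 | kebele), data = d, family = binomial)
summary(fit_ml)
```

On simulated data like this you will typically see that the naive confidence interval is markedly narrower than the intervals from the other analyses. Of the two valid options, the multilevel model is generally the more efficient, since it uses the individual-level data while still respecting the design.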

So what happened in this particular case? Siddique et al. tell the sad story:

We requested the deidentified raw data and statistical code from the authors to reproduce their analyses. Even though we pledged to limit our analysis to testing the hypotheses tested in the article, and the Editor-in-Chief deemed our request “appropriate and reasonable”, the authors were unwilling to share their deidentified raw data and statistical code.

Unwilling to share their deidentified raw data and statistical code, that’s not good! What was the reason?

They said they needed time to analyze the “remaining data” for publication and that the dataset contained identifiers.

Whaaaa? They specifically asked for “deidentified data,” dude. In any case, the authors could’ve taken about 5 minutes and reanalyzed the data themselves. But they didn’t. And one of the authors on that paper is at Harvard! So it’s not like they don’t have the resources.

Siddique et al. conclude:

Given the analytical methods used, the evidence presented by Siraneh et al. neither supports nor refutes whether a positive deviance intervention affects EBF. The analytical methods were incorrect. All authors have an ethical and professional scientific responsibility to correct non-trivial reported errors in published papers.

Indeed. Also if the authors in question have any Wall Street Journal columns, now’s the time to pull the plug.

My reason for posting this article

Why did I post this run-of-the-mill story of statistical incompetence followed by scientific misbehavior? There must be millions of such cases every year. The reason is that I was intrigued by the word “verifiable” in the title of Siddique et al.’s article. It reminds me of the general connection between replicability and generalizability of results. For a result to be “verifiable,” ultimately it has to replicate, and if there’s no evidence to distinguish the statistical data from noise, then there’s no reason we should expect it to replicate. Also, when the data are hidden, that’s one more way things can’t be verified. We’ve seen too many cases of incompetence, fraud, and just plain bumbling to trust claims that are made without evidence. Even if they’re published in august journals such as Psychological Science, the Proceedings of the National Academy of Sciences, or Risk Management and Healthcare Policy.

P.S. The paper by Siddique et al. concludes with this awesome disclosure statement:

In the last thirty-six months, DBA has received personal payments or promises for same from: Alkermes, Inc.; American Society for Nutrition; Amin Talati Wasserman for KSF Acquisition Corp (Glanbia); Big Sky Health, Inc.; Biofortis Innovation Services (Merieux NutriSciences); Clark Hill PLC; Kaleido Biosciences; Law Offices of Ronald Marron; Medpace/Gelesis; Novo Nordisk Fonden; Reckitt Benckiser Group, PLC; Soleno Therapeutics; Sports Research Corp; and WW (formerly Weight Watchers). Donations to a foundation have been made on his behalf by the Northarvest Bean Growers Association. Dr. Allison is an unpaid consultant to the USDA Agricultural Research Service. In the last thirty-six months, Dr. Jamshidi-Naeini has received honoraria from The Alliance for Potato Research and Education. The institution of DBA, ABS, LGA, and YJ-N, Indiana University, and the Indiana University Foundation have received funds or donations to support their research or educational activities from: Alliance for Potato Research and Education; Almond Board; American Egg Board; Arnold Ventures; Eli Lilly and Company; Haas Avocado Board; Gordon and Betty Moore Foundation; Mars, Inc.; National Cattlemen’s Beef Association; USDA; and numerous other for-profit and non-profit organizations to support the work of the School of Public Health and the university more broadly. The authors report no other conflicts of interest in this communication.

Big Avocado strikes again!

4 thoughts on “When conclusions are unverifiable (multilevel data example)”

  1. These practices are not limited to social science research. Sadly, “unverifiable” is exactly how I felt reading many of the Covid-19-related publications throughout the pandemic. Observational data are pushed through very simplistic models that are described only verbally. Data are withheld due to “privacy” when they can be deidentified and/or aggregated. It’s particularly jarring to read something along the lines of “we also did analysis Y and the conclusion would be the same so we omit this from the paper”.

  2. I won’t name names, though perhaps I should. There was data that would have been useful, from a paper in a journal where one is obliged to make the data public. We asked repeatedly for the data and got no response for a long time. We persisted, and were eventually told:

    “The problem is that you might spot something that we hadn’t noticed…”

  3. One problem with verifiability is that even if the code and data are shared, the analysis as reported can be difficult to reproduce:

    Anna Laurinavichyute, Himanshu Yadav, and Shravan Vasishth. Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy. Journal of Memory and Language, 125, 2022.

    This isn’t necessarily incompetence (though sloppy coding practices play a big part; I am guilty there too). It’s just hard to reproduce stuff.

    • This is a really good paper. It was eye-opening to see that even in cases where code and data were provided, the ability to reproduce results was incredibly poor (never mind the garden of forking paths issues to tack on after that).

      I plan on incorporating it into a talk / workshop for some of the researchers at my college (primarily a teaching institution) in the new year, along with a heavy dose of R / RMarkdown.
