“Fill in the Blank Leads to More Citations”: Junk science and unavailable data at the European Heart Journal

Retraction Watch links to this post by Phil Davis, who writes:

Even a casual reader of the scientific literature understands that there is an abundance of papers that link some minor detail with article publishing — say, the presence of a punctuation mark, humor, poetic, or popular song reference in one’s title — with increased citations. Do X and you’ll improve your citation impact, the researchers advise.

Over the past few years, the latest trend in X leads to more citations research has focused on minor interventions in social media, like a tweet.

There is considerable rationale for why researchers would show so much enthusiasm for this type of research. Compared to a clinical trial on human subjects, these studies are far easier to conduct.

Snap!

Davis continues:

As a social media-citation researcher, all you need is a computer, a free account, and a little time. This is probably why there are so many who publish papers claiming that some social media intervention (a tweet, a Facebook like) leads to more paper citations. And given that citation performance is a goal that every author, editor, and publisher wishes to improve, it’s not surprising that these papers get a lot of attention, especially on social media.

Well put. Next comes the example:

The latest of these claims that X leads to more citations, “Twitter promotion is associated with higher citation rates of cardiovascular articles: the ESC Journals Randomized Study” was published on 07 April 2022 in the European Heart Journal (EHJ), a highly-respected cardiology journal. The paper . . . claims that promoting a newly published paper in a tweet, increases its citation performance by 12% over two years. The EHJ also published this research in 2020 as a preliminary study, boasting a citation improvement by a phenomenal 43%.

OK, publicity helps, so this doesn’t seem so surprising to me. On the other hand, Davis writes,

Media effects on article performance are known to be small, if any, which is why a stunning report of 43% followed by a less stunning 12% should raise some eyebrows. The authors were silent about the abrupt change in results, as they were with other details missing from their paper. They were silent when I asked questions about their methods and analysis, and won’t share their dataset . . . According to the European Heart Journal’s author guidelines, all research papers are required to include a Data Availability Statement, meaning, a disclosure on how readers can get authors’ data when questions arise.

Hey, that’s not good! Davis continues:

Based on my own calculations, I suspect that the Twitter-citation study was acutely underpowered to detect the reported (43% and 12%) citation differences. Based on the journals, types of papers, and primary endpoint described in the paper, I calculated that the researchers would require at least a sample size of 6693 (3347 in each arm) to detect a 1 citation difference after 2 years (SD=14.6, power=80%, alpha=0.05, two-sided test), about ten-times the sample size reported in their paper (N=694). With 347 papers in each study arm, the researchers had a statistical power of just 15%, meaning they had just a 15% chance of discovering a non-null effect if one truly existed. For medical research, power is often set at 80% or 90% when calculating sample sizes. In addition, low statistical power often exaggerates effect sizes. In other words, small sample sizes (even if the sampling was done properly) tend to generate unreliable results that over-report true effects. This could be the reason why one of their studies reports a 43% citation benefit, while the other just 12%.
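Davis’s sample-size and power numbers are easy to check. Here is a minimal sketch in Python, assuming he used the usual normal-approximation formula for a two-sided comparison of two means (he does not say which formula he used); the function names are mine. The last function simulates what low power does to the estimates that happen to reach statistical significance.

```python
from math import ceil, sqrt
from random import Random
from statistics import NormalDist, mean

Z = NormalDist()  # standard normal distribution


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def power_two_arm(delta, sd, n, alpha=0.05):
    """Power of the same test when each arm has n papers."""
    se = sd * sqrt(2 / n)  # standard error of the difference in means
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability of landing in either rejection region.
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


def exaggeration_ratio(delta, sd, n, alpha=0.05, sims=100_000, seed=1):
    """Average |estimate| / true effect among statistically significant results."""
    rng = Random(seed)
    se = sd * sqrt(2 / n)
    cutoff = Z.inv_cdf(1 - alpha / 2) * se
    estimates = (rng.gauss(delta, se) for _ in range(sims))
    significant = [abs(est) for est in estimates if abs(est) > cutoff]
    return mean(significant) / delta


# Inputs taken from Davis's description: 1-citation difference, SD = 14.6.
print(n_per_arm(delta=1, sd=14.6))
print(round(power_two_arm(delta=1, sd=14.6, n=347), 2))
print(round(exaggeration_ratio(delta=1, sd=14.6, n=347), 1))
```

Under these assumptions the first two lines print 3347 papers per arm and a power of about 0.15, in line with Davis’s figures. The third number comes out above 2: with an SD of 14.6 and 347 papers per arm, an estimate has to be more than twice the assumed 1-citation effect to reach significance at all.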

And where are the data?

I [Davis] contacted the corresponding author . . . by email with questions about sample size calculation and secondary analyses (sent 5 May 2022) but received no response. One week later, I contacted him again to request a copy of his dataset (sent 10 May 2022), but also received no response. (I do have prior correspondences from [the author] from earlier this year.)

After another week of silence, I contacted the European Heart Journal’s Editor-in-Chief (EiC) on 16 May 2022, asking for the editorial office to become involved. I asked for a copy of the authors’ dataset and for the journal to publish an Editorial Expression of Concern for this paper. The response from the editorial office was an invitation to submit a letter to their Discussion Forum, outlining my concerns. If accepted, EHJ would publish my letter along with the author’s response in a future issue of the journal. The process could take a long time and there was no guarantee that I would end up with a copy of the dataset.

Yup. I’ve seen this sort of thing before. It’s too hard to publish criticisms and obtain data for replication.

Davis continues:

The European Heart Journal has clear rules for the reporting of scientific results and even clearer rules for the reporting of randomized controlled trials. The validity of the research results described in the paper is irrelevant. Required parts of the paper are missing. Even if my analysis had shown that the conclusions were justifiable, the authors clearly violated two EHJ policies.

I agree. This is like Columbia University’s attitude to being caught with their hand in the U.S. News rankings cookie jar. If you’re a public-serving institution and you break the rules, the appropriate response is to make things right, not to hide and hope the problem goes away on its own.

Davis followed up the next week:

More than a month after my initial request, the editorial office of the European Heart Journal provided me [Davis] a link to the authors’ data (as of this writing, there is no public link from the published paper). . . .

The statistics for analyzing and reporting the results of Randomized Controlled Trials (RCTs) are often very simple. Because treatment groups are made similar in all respects with the exception of the intervention, no fancy statistical analysis is normally required. This is why most RCTs are analyzed using simple comparisons of sample means or medians.

Deviating from this norm, the authors of the Twitter-citation paper used Poisson regression, a more complicated model that is very useful in some fields (e.g. economics) when analyzing data with lots of independent variables. However, Poisson regression is limited in its application because it comes with a big assumption — the mean value must equal the variance. When this assumption is violated, the researcher should use a more flexible model, like the Negative Binomial.

That’s right! We discuss this in chapter 15 of Regression and Other Stories. Good catch, Davis!
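A quick way to see whether that assumption is even roughly plausible is to compare the sample variance to the sample mean. A minimal sketch, with made-up citation counts standing in for real data (these are not the EHJ numbers):

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


# Made-up two-year citation counts, for illustration only (not the EHJ data).
citations = [0, 1, 1, 2, 3, 3, 5, 8, 13, 40, 95]

print(round(dispersion_index(citations), 1))
```

A ratio near 1 is what a Poisson model expects; citation counts, with a few heavily cited papers and many rarely cited ones, typically give a ratio far above 1. For reference, the SD of 14.6 in Davis’s power calculation corresponds to a variance of about 213, which a Poisson model could match only if the average paper drew about 213 citations.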

He continues:

Using Poisson regression on their data, I got the same results as reported (12% citation difference, 95% confidence interval 8% to 15%, p<0.0001), which appears to be a robust and statistically significant finding. However, the model fits the data very poorly. When I analyzed their dataset using a Negative Binomial model, the data were no longer significant (13%, 95% C.I. -5% to 33%, p=0.17). Yes, the estimate was close, but the confidence interval straddled zero. Using a common technique when dealing with highly-skewed data (normalizing the data with a log transformation) and employing a simple linear model also provided non-significant results (8%, 95% C.I. -7% to 25%, p=0.33). Similarly, a simple comparison of means (t-test) was non-significant (p=0.17), as was the non-parametric (signed-rank) equivalent (p=0.33). In sum, the only test that provided a statistically significant finding was the one where the model was inappropriate for the data.

Ouch!
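To see how this can happen, here is a minimal sketch on simulated data (not the EHJ dataset, and not Davis’s code). For a two-arm comparison, the Poisson regression estimate of the ratio is just the ratio of the two sample means; what the Poisson assumption changes is the standard error. The sketch compares the Poisson-based interval with a robust (sandwich) interval that uses the observed variance in each arm. The geometric count generator, the means of 10 and 11.2, and the seed are all assumptions chosen for illustration.

```python
from math import exp, log, sqrt
from random import Random
from statistics import NormalDist, mean, pvariance


def geometric_counts(n, avg, rng):
    """Skewed, overdispersed counts with mean avg and variance avg * (1 + avg)."""
    p = 1 / (1 + avg)
    return [int(log(1 - rng.random()) / log(1 - p)) for _ in range(n)]


def ratio_intervals(control, treated, level=0.95):
    """Ratio of means with Poisson-based and robust (sandwich) intervals."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_ratio = log(mean(treated) / mean(control))
    # Poisson model: variance of each log mean is 1 / (total count in that arm).
    se_poisson = sqrt(1 / sum(control) + 1 / sum(treated))
    # Robust: plug in the observed variance of each arm instead of variance = mean.
    se_robust = sqrt(sum(pvariance(arm) / (len(arm) * mean(arm) ** 2)
                         for arm in (control, treated)))

    def interval(se):
        return exp(log_ratio - z * se), exp(log_ratio + z * se)

    return exp(log_ratio), interval(se_poisson), interval(se_robust)


# Hypothetical trial: 347 papers per arm, true means 10 and 11.2 citations.
rng = Random(2022)
control = geometric_counts(347, 10.0, rng)
treated = geometric_counts(347, 11.2, rng)

ratio, ci_poisson, ci_robust = ratio_intervals(control, treated)
print("ratio of means: %.2f" % ratio)
print("Poisson 95%% interval: %.2f to %.2f" % ci_poisson)
print("robust 95%% interval:  %.2f to %.2f" % ci_robust)
```

The point estimate is identical in both cases; only the interval changes, and with overdispersed counts the robust interval is much wider.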

And more:

The authors didn’t register their protocols or even provide justification for a Poisson regression model in their preliminary paper. A description of how their sample size was determined was missing, as was a data availability statement — both are clear violations of the journal’s policy. The editorial office was kind enough to provide me with a personal link to the dataset, but it is still not public. . . . No one is willing to admit fault, and the undeclared connection of several authors with current or past EHJ editorial board roles raises questions about special treatment.

Davis concludes:

A tarnished reputation is deep and long-lasting. I hope the editors of EHJ understand what they are sacrificing with this paper.

I’d like to believe that, but I don’t know. Lancet has published lots of politically-motivated junk, and that doesn’t stop the news media and, I guess, medical researchers, from taking it seriously. Harvard Law School is full of plagiarists and it’s still “Harvard Law School.” Malcolm Gladwell keeps selling books. Dr. Oz is on track to enter the U.S. Congress. I guess the point is that reputations are complicated. This story will indeed tarnish the reputation of the European Heart Journal, but just by a little bit, which I guess is appropriate, as it’s just one little article, right? Also the journal’s behavior, while reprehensible, is absolutely standard: it’s the same way that just about any scholarly or scientific journal will act in this situation. Kinda like maybe one reason that other universities didn’t make more of a fuss about Columbia’s U.S. News thing is that maybe lots of them have skeletons in their data closets.

I’m glad that Davis did this deep dive. I’m just less confident that this incident will do much to tarnish the reputation of the European Heart Journal, or, for that matter, its publisher, Oxford University Press. According to its webpage, the journal has an impact factor of 29.983. That’s what counts, right? The only question is whether this paper with the unavailable data, underpowered sample, and bad analysis will pull its weight by getting at least 30 citations. And, hey, after all this blogging, maybe it will!

6 thoughts on ““Fill in the Blank Leads to More Citations”: Junk science and unavailable data at the European Heart Journal”

  1. Forget about how you can try out different analyses that generate arbitrary numbers and compare to an arbitrary threshold, in order to fail at answering a question no one asked. Why are papers on increasing citation count even a thing?

    I think the world would be better off if none of what I just read ever happened. At best it is a total waste of time. I guess maybe there’s some benefit in a “jobs program” sense, but there has to be a more productive way for this energy/time to be spent.

      • I guess that some analysis of citation patterns could be useful.

        Definitely. When I really wanted to do a thorough lit review for myself, I wrote a script that scraped the titles/abstracts/authors/etc of every paper on the topic from pubmed. Then I could look at publications per author/university, how it developed over time, who cited who when certain keywords were present and so on. It was very useful to orient myself on the topic.

        If it had been easier to scrape the entire papers I would have likely tried to go further and semi-automatically digitize the data in each figure as well. Then the results from similar experiments could be easily compared.

        That is nothing like what is described here though.

        • who cited who

          Actually, I did not achieve this one. It was who was a co-author with who. Maybe today it would be easier to get that info though. It would definitely be useful.

    • For me, papers on increasing citation counts are valuable as an indicator of the unreliability of citation counts as a quality metric. I read them as cautionary tales rather than advice, and I had hoped that I was in the majority in that regard. Perhaps not.

  2. Log-link (Poisson) regressions with inference based on the Poisson distribution are really bad. I bump into people misusing them in many settings, where they often generate totally misleading inference. Just use quasi-Poisson (i.e., use “robust” standard errors). I presume using that here would similarly make the results disappear.
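As a rough illustration of the commenter’s suggestion, here is a minimal sketch of the quasi-Poisson adjustment for a two-arm comparison, on simulated counts rather than the EHJ data. In the usual GLM sense, quasi-Poisson keeps the Poisson point estimate and multiplies the standard error by the square root of an estimated dispersion (Pearson chi-square over residual degrees of freedom); sandwich “robust” standard errors are a closely related alternative. The data generator, means, and seed are assumptions made for illustration.

```python
from math import log, sqrt
from random import Random
from statistics import mean


def geometric_counts(n, avg, rng):
    """Skewed, overdispersed counts with mean avg (hypothetical stand-in data)."""
    p = 1 / (1 + avg)
    return [int(log(1 - rng.random()) / log(1 - p)) for _ in range(n)]


def quasi_poisson_se(control, treated):
    """Return (Poisson SE, quasi-Poisson SE, dispersion) for the log ratio of means."""
    se_poisson = sqrt(1 / sum(control) + 1 / sum(treated))
    # Pearson chi-square: squared residuals scaled by the fitted mean of each arm.
    pearson = 0.0
    for arm in (control, treated):
        fitted = mean(arm)
        pearson += sum((y - fitted) ** 2 for y in arm) / fitted
    # Two fitted parameters (one mean per arm), so residual df = n - 2.
    dispersion = pearson / (len(control) + len(treated) - 2)
    return se_poisson, sqrt(dispersion) * se_poisson, dispersion


rng = Random(7)
control = geometric_counts(347, 10.0, rng)
treated = geometric_counts(347, 11.2, rng)

se_p, se_q, phi = quasi_poisson_se(control, treated)
log_ratio = log(mean(treated) / mean(control))
print("dispersion: %.1f" % phi)
print("z (Poisson): %.2f" % (log_ratio / se_p))
print("z (quasi-Poisson): %.2f" % (log_ratio / se_q))
```

When the dispersion is well above 1, as it is for counts this skewed, the quasi-Poisson z-statistic shrinks by the square root of that factor, which is the mechanism by which an apparently strong Poisson result can disappear.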
