Dead Wire

Kevin Lewis pointed me to this quote from a forthcoming article:

Daniele Fanelli and colleagues examined more than 3,000 meta-analyses covering 22 scientific disciplines for multiple commonly discussed bias patterns. Studies reporting large effect sizes were associated with large standard errors and large numbers of citations to the study, and were more likely to be published in peer-reviewed journals than studies reporting small effect sizes. The strength of these associations varied widely across research fields, and, on average, the social sciences showed more evidence of bias than the biological and physical sciences. Large effect sizes were not associated with the first or last author’s publication rate, citation rate, average journal impact score, or gender, but were associated with having few study authors, early-career authors, and authors who had at least one paper retracted. According to the authors, the results suggest that small, highly cited studies published in peer-reviewed journals run an enhanced risk of reporting overestimated effect sizes.

My response:

This all seems reasonable to me. What’s interesting is not so much that they found these patterns but that they were considered notable.

It’s like regression to the mean. The math makes it inevitable, but it’s counterintuitive, so nobody believes it until they see direct evidence in some particular subject area. As a statistician I find this all a bit frustrating, but if this is what it takes to convince people, then, sure, let’s go for it. It certainly doesn’t hurt to pile on the empirical evidence.
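
To make the "math makes it inevitable" point concrete, here is a minimal simulation sketch. It is not taken from the Fanelli et al. paper. It assumes a single true effect of 0.2, standard errors drawn uniformly between 0.05 and 0.5, and a crude filter in which only estimates with z > 1.96 count as "published." Those numbers, and the function names, are made up for illustration.

```python
import random
import statistics


def simulate(true_effect=0.2, n_studies=20000, seed=0):
    """Simulate studies of one fixed true effect (all settings are illustrative assumptions)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_studies):
        se = rng.uniform(0.05, 0.5)        # small se = large study, large se = small study
        est = rng.gauss(true_effect, se)   # unbiased but noisy estimate
        published = est / se > 1.96        # crude significance filter (assumed)
        rows.append((se, est, published))
    return rows


def mean_published_estimate(rows, se_low, se_high):
    """Average published estimate among studies with se in [se_low, se_high)."""
    vals = [est for se, est, pub in rows if pub and se_low <= se < se_high]
    return statistics.mean(vals)


rows = simulate()
precise = mean_published_estimate(rows, 0.05, 0.10)  # large studies
noisy = mean_published_estimate(rows, 0.40, 0.50)    # small studies

print(f"true effect:                     0.20")
print(f"published, large studies (mean): {precise:.2f}")
print(f"published, small studies (mean): {noisy:.2f}")
```

Under these assumptions, every published estimate with a standard error of at least 0.4 has to exceed 1.96 × 0.4 ≈ 0.78, nearly four times the assumed true effect. The filter alone produces the association between large standard errors and large reported effects; no misconduct is needed.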

12 thoughts on “Dead Wire”

    • +1
      The abstract sounds just like your average, run of the mill social psych paper:
      (1) ‘Associated’ used without a proper qualifier – probably something like linear correlation, which sometimes seems to be the only kind of association a social psych researcher can think of;
      (2) ‘Were not associated’: sounds like accepting the nil-null;
      (3) The amount of (proxy) covariates just screams of multiple comparisons.

    • It’s easy to criticize. But not that easy.

      But one big difference is that this is not noise mining. It’s not clear that the paper needs to bother with statistical inference, as opposed to description. Do you imagine they would ignore the next 3,000 analyses because those didn’t fit the theory?

      Also, at a minimum, their raw data is published, so anyone will be free to spin it their way.

        • Thanks for sharing the link for the full study! I supposed that, as an early release, the full paper wasn’t available yet.

          My previous comment was based on the press release, which employs the usual social psych vocabulary. The abstract has a much more descriptive tone, and the paper is much more careful in declaring association or lack thereof. So, kudos to the authors for the gigantic work in data collection and careful analyses!

          Even though they do make conclusions based on statistical significance, they report all estimated parameters and execute robustness and independence checks.

          On the other hand, given that they did fit multilevel models and the independence analysis suggests that the biases are independent, why not report the results from a model including all comparisons (Tables S4 and S5)? There are many effects very close to zero; some regularization from a hierarchical model would help! And they do: the estimates are smaller and more uncertain.

          The authors argue that the multilevel model requires transformations that make results less interpretable, which I don’t quite understand. Also, they argue that, in a multilevel model, the heterogeneity is accounted for with a multiplicative factor – which sounds wrong, given that there is not only one way to deal with heterogeneous variance in the residuals.

          In defense of a multilevel model including all comparisons (not necessarily the one they did), take a look, e.g., at the first histogram in Figure S2, the distribution of regression coefficients for the small study indicator variable. Given that the outcome is in log odds ratio and the histogram is centered at the mean (I assume, given that all distributions are centered at zero), how plausible is it for some studies to have a coefficient 30 points above the mean? Those overestimates seem to be driving their “most significant finding” of small study bias, although those points probably have a smaller weight.

          Unfortunately, the raw data is not available on PNAS. The datasets they provide contain the results from the meta-regression, which is a pity. I’m sure Stan could do a great job estimating the full model!

  1. “Large effect sizes were not associated with […] but were associated with having few study authors, early-career authors, and authors who had at least one paper retracted.”

    Does publishing a study with large effect sizes increase your chance of getting tenure? If so, no statistics education will help….

  2. >>>Studies reporting large effect sizes …..were more likely to be published in peer-reviewed journals than studies reporting small effect sizes.<<<

    How do you draw conclusions about the higher publication likelihood of large-effect studies unless you have a way to peek into the drawer of unpublished studies?

  3. I was going to comment that John Ioannidis’s earlier work in meta-analysis was a good instance of “if this is what it takes to convince people” but then I googled “Daniele Fanelli 3,000 meta-analyses” and got dozens of news stories and press releases…
