How much should we trust assessments in systematic reviews? Let’s look at variation among reviews.

Ozzy Tunalilar writes:

I increasingly notice these “risk of bias” assessment tools (e.g., Cochrane’s) popping up in “systematic reviews” and “meta-analyses” with the underlying promise that they will somehow guard against unwarranted conclusions depending on, perhaps, the degree of bias. However, I also noticed multiple published systematic reviews referencing, using, and evaluating the same paper (Robinson et al. 2013; it could probably have been any other paper). Having noticed that, I compiled the risk-of-bias assessments that these multiple reviews gave to that one paper. My “results” are above – so much variation across studies that perhaps we need to model the assessment of risk of bias in reviews of systematic reviews. What do you think?

My reply: I don’t know! I guess some amount of variation is to be expected, but this reminds me of a general issue in meta-analysis: different studies will have different populations, different predictors, different measurement protocols, different outcomes, and so on. This seems like even more of a problem now that thoughtless meta-analysis has become such a commonly used statistical tool, to the extent that there seem to be default settings and software that can be used by both sides of a dispute.
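If you wanted to put a number on the disagreement before modeling it, one quick check is to treat each published review as a rater and each risk-of-bias domain as an item, then compute an inter-rater agreement statistic such as Fleiss’ kappa. Here is a minimal sketch; the domain names and counts below are invented for illustration, not Ozzy’s compiled table:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for N items, each rated by n raters into k categories.

    counts: (N, k) array; counts[i, j] = number of raters who put
    item i into category j. Every row must sum to the same n.
    """
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts.sum(axis=1)[0]                      # raters per item
    # Per-item agreement: proportion of rater pairs that agree
    P_i = (np.sum(counts**2, axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                             # observed agreement
    p_j = counts.sum(axis=0) / (N * n)             # category proportions
    P_e = np.sum(p_j**2)                           # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical table: 5 risk-of-bias domains (rows) rated by 6 reviews
# into categories [low, unclear, high] (columns). Illustrative only.
ratings = [
    [4, 1, 1],   # random sequence generation
    [2, 3, 1],   # allocation concealment
    [1, 2, 3],   # blinding
    [3, 2, 1],   # incomplete outcome data
    [2, 2, 2],   # selective reporting
]
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```

A kappa near zero (or below, as in this made-up example) says the reviews agree no better than chance; actually modeling the assessments, as Ozzy suggests, would be the natural step beyond a single summary number.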

4 thoughts on “How much should we trust assessments in systematic reviews? Let’s look at variation among reviews.”

  1. Well, is this a function of the subject matter? “Psychosocial effects of companion robots” seems just the sort of area where I would expect ill-defined problems, confounders, low effect sizes, external validity problems, etc.

  2. Labels like ‘low’, ‘medium’, etc. mean different things to different people, so it’s unsurprising that this is the case. But does it matter? The same amount of evidence for a claim may be evaluated as low by Andrew and high by me (perhaps informed by our prior beliefs), but I can’t see how that could be avoided, or why it would be much of a problem.

    More important than differences between reviews (different people assessing the same evidence and labeling it), it seems to me, is whether such assessments are consistent within the same review. If the labels are to have any meaning and value, studies labelled within a review as ‘low’ should indeed have less evidence behind them than studies labelled as ‘high’ (it’s pointless if some ‘low’ studies actually have more evidence than studies labelled ‘high’). It’s an ordinal scale.

    I’m currently working on a systematic review where we had all studies coded as ‘low’ etc. in terms of evidence by two members of the research team. Not because we care about the exact labels, but because we want to know whether there is indeed any reliability in such assessments, in the sense that the evidence in the studies is consistent with our ordering of low < medium < high. (A sketch of one such two-coder agreement check follows these comments.)

  3. At least the reviews are making an attempt to assess the included studies for risk of bias. It would be interesting to know whether the later reviews cited here mention any of the earlier ones, whether they discuss any of this RoB-assessment variation, whether it affects the reviews’ conclusions, etc.
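On the within-review reliability check described in comment 2: with two coders assigning ordinal labels, a weighted Cohen’s kappa is one standard summary, since it penalizes near-misses (low vs. medium) less than full disagreements (low vs. high). A minimal sketch, with invented ratings standing in for the review team’s actual codes:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal evidence ratings from two coders for ten studies.
coder_1 = ["low", "low", "medium", "high", "medium",
           "low", "high", "medium", "medium", "high"]
coder_2 = ["low", "medium", "medium", "high", "low",
           "low", "high", "high", "medium", "high"]

# Linear weights respect the ordering low < medium < high, so a
# low/medium disagreement costs less than a low/high one.
labels = ["low", "medium", "high"]  # order matters for the weighting
kappa = cohen_kappa_score(coder_1, coder_2, labels=labels, weights="linear")
print(f"Weighted Cohen's kappa = {kappa:.2f}")
```

Note that kappa only measures agreement between the coders; whether the ‘low’ studies really carry less evidence than the ‘high’ ones, as the commenter points out, is a separate validity question.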
