“A small but growing collection of studies suggest X” . . . huh?

Lee Beck writes:

I’m curious if you have any thoughts on the statistical meaning of sentences like “a small but growing collection of studies suggest [X].” That exact wording comes from this piece in the New Yorker, but I think it’s the sort of expression you often see in science journalism (“small but mounting”, “small but growing”, etc.). A post on your own blog quotes a New York Times piece using the phrase, “a growing body of science suggesting [X]” but the post does not address the expression itself.

For Bayesians the weight of evidence available now should be all that matters, right? How the weight of evidence has changed with respect to time would seem to offer no additional information. If anything, trends in research should themselves be based on the evidence already revealed, so it seems like double-counting to include growth-in-evidence as evidence itself.

Maybe there is a more complicated justification. For example, if researchers have both unpublished evidence and (weak) published evidence, and their research agenda is determined by both, then the very fact that the number of such studies is “growing” more quickly than would seem to be justified by the (weak) published evidence could itself be an indicator that the unpublished evidence bolsters the (weak) published evidence. That seems way too convoluted to be what the journalist or reader could have had in mind, though!

So I’m curious whether you think “growing evidence” is a statistical howler? There are over 700,000 google hits for the phrase “growing evidence,” so if it really means nothing, that will be news to a lot of writers and editors.
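The first half of the letter can be made concrete. Suppose each study's data are conditionally independent given the underlying quantity. Then Bayesian updating one study at a time, in any order, ends at exactly the same posterior as a single update on all the pooled data. The path by which the evidence arrived adds nothing beyond the evidence itself. The sketch below uses a Beta-binomial model. The study counts, the uniform prior, and the function name `beta_binomial_update` are made up for illustration.

```python
def beta_binomial_update(prior, successes, trials):
    """Conjugate update: Beta(a, b) prior plus s successes in n trials -> Beta(a + s, b + n - s)."""
    a, b = prior
    return (a + successes, b + trials - successes)

# Hypothetical studies, each summarized as (successes, trials); not real data.
studies = [(3, 10), (7, 12), (9, 15), (14, 20)]
prior = (1, 1)  # uniform Beta(1, 1) prior, an arbitrary illustrative choice

# Update one study at a time, in the order the studies appeared...
sequential = prior
for s, n in studies:
    sequential = beta_binomial_update(sequential, s, n)

# ...in the reverse order...
reverse = prior
for s, n in reversed(studies):
    reverse = beta_binomial_update(reverse, s, n)

# ...and all at once on the pooled data.
batch = beta_binomial_update(prior, sum(s for s, _ in studies), sum(n for _, n in studies))

a, b = batch
print("sequential:", sequential, "reverse:", reverse, "batch:", batch)
print("posterior mean:", a / (a + b))
```

The letter's more complicated justification corresponds to dropping the conditional-independence assumption. If the rate at which studies appear depends on unpublished data, the arrival process itself carries information, and it would have to be modelled separately.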

Interesting question. How would we model this process? Sometimes it does seem to happen that a new hypothesis arises and the evidence becomes stronger and stronger in its favor (for example, global warming); other times there’s a new hypothesis and the evidence just doesn’t seem to be there (for example, cold fusion). Still other times the evidence seems to simmer along at a sort of low boil, with a continuing supply of evidence but nothing completely convincing (for example, stereotype threat). Ultimately, though, we like to think of the evidence as increasing toward one conclusion or another.
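Here is a toy version of the first two cases. If an effect is real, the z-statistic of the pooled estimate tends to climb roughly like the square root of the number of studies. If the effect is zero, the z-statistic just wanders around zero. This is a sketch under arbitrary assumptions: a true effect of 0.3, a per-study standard error of 1, equal-sized studies, and no publication bias. The "simmering" case is not modelled. The function name `cumulative_z` is hypothetical.

```python
import math
import random

def cumulative_z(true_effect, n_studies, se_per_study=1.0, seed=0):
    """z-statistic of the pooled (fixed-effect) estimate after each new study.

    Each hypothetical study reports an estimate ~ Normal(true_effect, se_per_study).
    With equal standard errors the pooled estimate is just the running mean,
    whose standard error after k studies is se_per_study / sqrt(k).
    """
    rng = random.Random(seed)
    total = 0.0
    zs = []
    for k in range(1, n_studies + 1):
        total += rng.gauss(true_effect, se_per_study)
        pooled = total / k
        zs.append(pooled / (se_per_study / math.sqrt(k)))
    return zs

# Same seed for both runs, so the two trajectories differ only by the true effect.
real = cumulative_z(true_effect=0.3, n_studies=200, seed=1)
null = cumulative_z(true_effect=0.0, n_studies=200, seed=1)
for k in (10, 50, 200):
    print(f"after {k:3d} studies: z = {real[k - 1]:5.2f} (real effect), {null[k - 1]:5.2f} (no effect)")
```

Both runs share the same random draws, so the real-effect trajectory sits exactly 0.3·sqrt(k) above the null one. That gap is the "growth" that a real effect produces.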

So, maybe the phrase “growing evidence” is ok. But this only works if we accept that sometimes the evidence isn’t growing.

To see this, shift away from the press and go into the lab. It is natural to take inconclusive evidence and think of it as the first step on the road to success. Suppose, for example, you have some data and you get an estimate of 2.0 with a standard error of 1.4. This is not statistically significant—but it’s close! And it’s easy to think that, if you just double your sample size, you’ll get success: double your sample size, the standard error goes down by a factor of sqrt(2), and you get a standard error of 1.0: the estimate will be 2 standard errors away from 0. But that’s incorrect because there’s no reason to assume that the estimate will stay fixed at 2.0. Indeed, under the prior in which small effects are more likely than large effects, it’s more likely the estimate will go lower rather than higher, once more data come in.
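The arithmetic in the paragraph above can be checked in a few lines. This is a sketch under assumptions not stated in the post. The true effect theta gets a normal prior centred at 0, with standard deviation 1 chosen arbitrarily to stand in for "small effects are more likely than large effects." The estimate is normal around theta. The extra data are the same size as the first batch. Significance is checked in the positive direction only. The function names are hypothetical.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def doubled_sample_outlook(y1=2.0, se1=1.4, prior_sd=None):
    """Predictive outlook for the pooled estimate after doubling the sample size.

    Assumes y1 ~ Normal(theta, se1^2) and, unless prior_sd is None (flat prior),
    a hypothetical prior theta ~ Normal(0, prior_sd^2).
    """
    if prior_sd is None:
        post_mean, post_var = y1, se1 ** 2
    else:
        # Conjugate normal update: precisions add, mean is precision-weighted.
        post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / se1 ** 2)
        post_mean = post_var * y1 / se1 ** 2
    # A second batch of equal size gives y2 ~ Normal(theta, se1^2), so its
    # predictive distribution is Normal(post_mean, post_var + se1^2).
    pooled_mean = 0.5 * (y1 + post_mean)             # expected pooled estimate
    pooled_sd = 0.5 * math.sqrt(post_var + se1 ** 2)  # its predictive spread
    pooled_se = se1 / math.sqrt(2)                    # the sqrt(2) reduction in the text
    # Chance the pooled estimate lands more than 1.96 standard errors above 0.
    p_signif = normal_sf((1.96 * pooled_se - pooled_mean) / pooled_sd)
    return pooled_mean, pooled_se, p_signif

for label, sd in [("flat prior", None), ("Normal(0, 1) prior", 1.0)]:
    mean, se, p = doubled_sample_outlook(prior_sd=sd)
    print(f"{label}: expected pooled estimate {mean:.2f}, new SE {se:.2f}, "
          f"P(estimate > 1.96 SE) = {p:.2f}")
```

Under the flat prior, doubling the sample leaves the expected estimate at 2.0, and the chance of clearing the threshold is only about even. Under the shrinkage prior, the expected pooled estimate drops to about 1.3 and the chance falls below one in four.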

So, in that sense, I agree with Lee Beck that the frame of “small and growing evidence” can be misleading, in that it encourages a mode of thinking in which we first extrapolate from what we see, then we implicitly condition on these potential data that haven’t occurred yet, in order to make our conclusions stronger than they should be.

And then you end up with renowned biologist James D. Watson saying in 1998, “Judah is going to cure cancer in two years.” There was a small but mounting pile of evidence.

It’s 2015. Judah did a lot of things in his time, but cancer is still here.

24 thoughts on “‘A small but growing collection of studies suggest X’ . . . huh?”

  1. I take “small but growing evidence” to mean nothing more than (a) the first results were interesting; (b) other scientists began investigating to see whether the effect is real under different conditions; and (c) their results were positive as well. So the “but growing” isn’t a statement about being more convinced because the evidence itself is drifting in the right direction, but simply that more studies are being performed and published and, finally, that the incremental results published point in the same direction.

  2. Let me disagree here. I’m going to use an example from my field.
    Before, there was no evidence for dark matter, and there was probably a strong prior that there is no other “magic/dark” unknown/invisible matter. But then evidence started to appear. In the beginning it was not strong enough to say, “Okay, it must be dark matter” (the effects observed in each individual experiment could probably still have been explained by some contrived modification of the existing theories), but it started to accumulate. I would argue that this qualifies as small but growing evidence, and I think it is a meaningful concept.
    Also, sometimes you don’t have evidence for something just because there have been no experiments probing it (the particle accelerator doesn’t reach high enough energy, and so forth). But when the experiments start, they start providing information, and they often start with small evidence, which then grows.

    • Interesting. To paraphrase, ‘small but growing’ is warranted when (a) the moderate evidence seen so far has been sufficient to shift your prior into territory where you now expect any future evidence to be confirmatory and (b) there is more work being done? But how can ‘small’ evidence shift a prior enough that future evidence is strongly expected to be confirmatory unless the prior was already in that direction?

  3. In my profession, I have heard these kinds of descriptors as marketing talk from pharma reps. A common synonym was “emerging.” Every grandma thinks her grandkids are beautiful. Likewise, every PI thinks his study is important, and if the data do not meet standards, the finding is just unripe and “growing,” but never wrong. This is how it is pronounced from podia at meetings and then picked up by sales forces.

    • Mostly I agree with what others have said: it’s largely meaningless, but it kind of explains to a general reader why he should take seriously something that many scientists don’t quite believe yet. If most experts don’t believe a 300-year-old idea, that’s one thing. If they are undecided on a 3-month-old idea, it’s another.

      But I laughed when you used the synonym “emerging.” Yeah, it’s great for when some company has figured out how to make money off something but can’t actually demonstrate it’s useful. You don’t want to get left behind, do you? (I’m actually in pharma, so I get it not from sales reps but from scientific vendors selling research tools of one type or another, and from the occasional guy pitching a new project internally.)

  4. One could have evidence on the evidence-gathering process: for instance, the number of scientific studies (published and unpublished) on the topic each year, and the effect-size estimates for each. Such an evidence trajectory should be more accurate than mere aggregation, as it would account for context effects like changes in methods and hypotheses…

  5. I agree with Jonathan above that a growing collection of studies may simply indicate that more people are interested in studying whatever it is. Evidence for the phenomenon itself may not be increasing: perhaps the number of negative studies is increasing as well, but negative results aren’t published.

    It could be interesting to ask why there’s more interest in a subject. Is a hypothesis gaining credibility? Have financial incentives to investigate the subject changed? Has new technology made it possible to answer questions that couldn’t be answered before?

  6. This is really being over-interpreted. As someone who has written a phrase not unlike that in some lit reviews in my own work, I think it means, quite innocently: “This is a new area of active, ongoing research.” I.e., don’t interpret the small number of published studies on this subject as meaning scholars in this field aren’t interested in it; rather, they’re now starting to study it, and there’s a “growing” amount of work on the subject confirming whatever point is about to be made following this statement. It’s not a complicated statement, and it doesn’t point to any broader philosophical claim.

  7. It doesn’t mean anything. It is very difficult to think of any actively researched medical claim that could not be supported with the phrase “growing evidence”. Even claims that contradict “large but growing” will be “small but growing”. In fact, in some cases “large but growing” evidence may even be the less reliable type:

    “Based on theoretical reasoning it has been suggested that the reliability of findings published in the scientific literature decreases with the popularity of a research field. Here we provide empirical support for this prediction. We evaluate published statements on protein interactions with data from high-throughput experiments. We find evidence for two distinctive effects. First, with increasing popularity of the interaction partners, individual statements in the literature become more erroneous. Second, the overall evidence on an interaction becomes increasingly distorted by multiple independent testing. We therefore argue that for increasing the reliability of research it is essential to assess the negative effects of popularity and develop approaches to diminish these effects.”

  8. It’s related to the absence-of-evidence issue, where some frameworks leave us unable to conclude anything from absence-of-evidence, regardless of whether the absence is in a low-data context (so we should indeed be unable to conclude anything) or a high-data context (we have seen enough data that a strong signal would have emerged if it were there). In the latter case, the right analysis (e.g. a Bayesian one, or one that takes power into account in a hypothesis testing framework) would be able to convert the absence-of-evidence-where-the-evidence-should-have-been-there into evidence-of-absence.

    “Small but growing” is an informal way of saying (or implying) that we are in the former low-data scenario and _not_ in the evidence-of-absence scenario. In other words, our variance is still high (and this raises the probability that emerging data will eventually support the hypothesis).

  9. Phrases of the form “small but growing” or “increasing numbers of scientists” in news coverage are largely rhetorical in that, typically, journalists make no attempt either to count the “growing number” or to weigh the kind of studies under discussion against the overall weight of evidence. Were most or all of them observational? Were most or all in vitro? The case of bisphenol A is a great example: much hay was made of “a growing number of studies” indicating adverse effects, with no mention of the failure of much larger multi-generational toxicity studies to replicate the findings, or of the “growing number” of pharmacokinetic studies that failed to show the relevance of the in vitro studies to human health and risk assessment. There was no mention that the NTP set down statistical guidelines over a decade ago on what might count as statistically valid results, nor of whether the “growing number of studies” claiming harm failed to meet these criteria or whether the “growing number” of replication studies followed them. The phrase should be a warning for any editor to check for precision, relative power, and context.

  10. The stock market was growing, so I should have invested (except for those times when it wasn’t). But what I really need to know is whether the market will be higher or lower tomorrow. Can anyone tell me?

  11. “Conservation of expected evidence” is a cool idea I read on lesswrong.com.
    Essentially, knowing we are about to encounter evidence should not shift our prior beliefs. Otherwise, why weren’t we already at those beliefs in the first place?
    They argue this was the basic fallacy that led to witch trials: all forms of evidence were interpreted as pointing in the same direction, so why bother with the charade of a trial when you could just burn your victims and be done with it?

    p(H) = p(H|E)*p(E) + p(H|~E)*p(~E)

    For fun, a simple experiment: a random number generator picks a number between -1 and 1 with uniform probability. This represents the mean of some measurement in an imaginary, normally distributed population with (to keep things simple) a known standard deviation of 0.5.
    We’ll have a set of 201 hypotheses (-1.00, -0.99, …, 0.99, 1.00) with a uniform prior distribution.
    The population then generates a series of points, which are used to update the hypotheses. Likelihood = exp(-(hypothesis - point)^2 / (2 * 0.5^2)).

    I repeated this 100,000 times.
    Mean absolute value of the estimate (the posterior mean) after each number of updates, with the mean absolute value of the shift in the posterior mean produced by that update in parentheses:
    0 update(s): 1.21430643318e-17
    1 update(s): 0.38722805936 (0.38722805936)
    2 update(s): 0.439333052715 (0.178352552243)
    3 update(s): 0.459720083346 (0.117696635661)
    4 update(s): 0.469363422664 (0.0879801791475)
    5 update(s): 0.475562577607 (0.0707849649464)
    Mean values of course stayed approximately 0.

    The positive mean absolute shifts show that probability mass was moved around to better fit the evidence whenever we encountered it: updates were happening, and needed to happen, to fit the data.
    However, while we know probability mass generally gets pushed around, we don’t know in advance in which direction or by how much. The actual mean of the shifts was always roughly zero:
    Following update 1: -0.00329770063062
    Following update 2: -0.000251004574419
    Following update 3: 0.000624363736141
    Following update 4: -0.000522706830521
    Following update 5: 0.000608870136222
    I think we can get excited to know more evidence is coming, but we can’t actually change our beliefs or expectations until it arrives. For that, we have to see the new evidence. If we just assume it will reinforce our prior beliefs, we risk burning people as witches, or thinking, as Watson did in 1998, that Judah will cure cancer.
    It might FEEL like we’ll encounter evidence that strengthens our beliefs, but if we did our math right using all the evidence we actually have, our current probabilities are the best thing to believe and act on.

    Python (2.7.8) code: http://pastebin.com/maNEMKg0

      • Hi,
        I confess, I am not sure precisely what you mean, but reading your other comment, I imagine you may be suggesting a causal diagram such as: http://i.imgur.com/PEmM8Kn.png
        In this case, I do agree that you have a point:
        Knowing that there is growing evidence increases the probability we would assign to there having been some sort of breakthrough, and thus the probability of eventually improved results within the field.
        Knowing that effect sizes have increased over time (which I did not include in the diagram), as you mentioned in your comment, would provide some evidence in favor of a breakthrough driving the increasing research. Many other factors, like public interest, are unlikely to be associated with results improving over time unless they are associated with a breakthrough (finding breakthroughs is, of course, the goal of such research).

        I think it is worth noting that in this model, once one has knowledge of the status of breakthroughs, it screens off the correlation between “small but growing evidence” and “improved results”: they become conditionally independent. In this case, such information should only be useful for those unfamiliar with the field (e.g., the target audience of the press release), meaning that maybe it does have some validity.

        Researchers within the field would have this knowledge, and thus should avoid adjusting the probabilities they assign to the ideas beyond what the evidence suggests; “small but growing” would be screened off from correlation by their knowledge of the field.
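The screening-off argument in the reply under comment 11 can be checked by brute-force enumeration. The model is the three-node diagram the reply describes: a breakthrough B causes both growing evidence G and improved results R. The probabilities below are invented for illustration. The effect-size arrow that the reply left out of the diagram is also left out here. The function names are hypothetical.

```python
from itertools import product

# Hypothetical numbers for the diagram B -> G and B -> R (illustration only).
P_B = 0.2                               # P(breakthrough)
P_G_GIVEN_B = {True: 0.8, False: 0.3}   # P(growing evidence | breakthrough?)
P_R_GIVEN_B = {True: 0.7, False: 0.1}   # P(improved results | breakthrough?)

def joint(b, g, r):
    """Joint probability; G and R depend on each other only through B."""
    pb = P_B if b else 1 - P_B
    pg = P_G_GIVEN_B[b] if g else 1 - P_G_GIVEN_B[b]
    pr = P_R_GIVEN_B[b] if r else 1 - P_R_GIVEN_B[b]
    return pb * pg * pr

def prob_r(condition=lambda b, g: True):
    """P(R = True | condition on B and G), by enumerating the joint distribution."""
    num = den = 0.0
    for b, g, r in product([True, False], repeat=3):
        if condition(b, g):
            p = joint(b, g, r)
            den += p
            if r:
                num += p
    return num / den

print("P(R)            =", round(prob_r(), 3))
print("P(R | G)        =", round(prob_r(lambda b, g: g), 3))
print("P(R | B)        =", round(prob_r(lambda b, g: b), 3))
print("P(R | G, B)     =", round(prob_r(lambda b, g: g and b), 3))
print("P(R | G, not B) =", round(prob_r(lambda b, g: g and not b), 3))
```

With these numbers, learning G raises the probability of R from 0.22 to 0.34 for someone who does not know whether a breakthrough happened. For someone who already knows B, G adds nothing.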

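The pastebin script linked in comment 11 is not reproduced on this page. Below is a Python 3 re-creation written only from the comment's description: 201 grid hypotheses, a uniform prior, a known standard deviation of 0.5, and five updates per trial. It is a sketch, not the commenter's code. It uses far fewer trials (2,000 rather than 100,000) so it runs quickly, and the function names are hypothetical.

```python
import math
import random

def posterior_mean(grid, weights):
    """Mean of the discrete posterior over the hypothesis grid (weights sum to 1)."""
    return sum(h * w for h, w in zip(grid, weights))

def run(n_trials=2000, n_updates=5, sigma=0.5, seed=0):
    """Sketch of the grid experiment from comment 11 (not the original script).

    Returns two lists (one entry per update): the average signed shift and the
    average absolute shift in the posterior mean, taken over all trials.
    """
    rng = random.Random(seed)
    grid = [i / 100.0 for i in range(-100, 101)]  # 201 hypotheses: -1.00, ..., 1.00
    signed = [0.0] * n_updates
    absolute = [0.0] * n_updates
    for _ in range(n_trials):
        true_mean = rng.uniform(-1.0, 1.0)         # truth drawn from the prior range
        weights = [1.0 / len(grid)] * len(grid)    # uniform prior over the grid
        current = posterior_mean(grid, weights)
        for u in range(n_updates):
            x = rng.gauss(true_mean, sigma)        # one new data point
            # Multiply by the normal likelihood, then renormalize.
            weights = [w * math.exp(-(h - x) ** 2 / (2 * sigma ** 2))
                       for h, w in zip(grid, weights)]
            total = sum(weights)
            weights = [w / total for w in weights]
            new = posterior_mean(grid, weights)
            signed[u] += (new - current) / n_trials
            absolute[u] += abs(new - current) / n_trials
            current = new
    return signed, absolute

if __name__ == "__main__":
    signed, absolute = run()
    for u, (s, a) in enumerate(zip(signed, absolute), start=1):
        print(f"update {u}: mean shift {s:+.4f}, mean |shift| {a:.4f}")
```

The pattern to look for is the one the comment describes. The mean absolute shift stays clearly above zero and shrinks with each update, while the mean signed shift hovers near zero.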