“The narrow beach between the continent of clear effects and the sea of confusion”: that sweet spot between signals that are so clear you don’t need statistics, and signals that are so weak that statistical analysis can’t help

Steve Heston writes:

Marginal Revolution has a nice link showing that data provide almost no power to tell us anything about macroeconomic models. Tabarrok calls this “embarrassing.” I’m not so sure.

In my own field of finance, efficient market theory tells us that stock returns will have low predictability. More generally, optimization by rational economic agents makes it hard to predict their behavior. A consumer or a firm with a quadratic objective function will have a linear first-order condition. So their behavior will be a linear function of the expectation of something, and (by iterated expectations) will be hard to predict.
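A minimal version of that argument, assuming a quadratic loss around an unknown target θ: the agent at time t chooses an action x_t to maximize expected payoff,

$$\max_{x_t}\ \mathbb{E}_t\!\left[-(x_t-\theta)^2\right]\ \Rightarrow\ x_t=\mathbb{E}_t[\theta],$$

and by the law of iterated expectations

$$\mathbb{E}_t[x_{t+1}]=\mathbb{E}_t\!\left[\mathbb{E}_{t+1}[\theta]\right]=\mathbb{E}_t[\theta]=x_t,$$

so the chosen action is a martingale and its changes cannot be forecast from information available at time t.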

This seems to occur in science generally. We can’t see inside black holes. We can’t know quantum positions. String theory has no interesting predictions.

Are there good examples in statistics? We struggle to estimate rare events, unit roots, and the effect of exponential growth. Early Covid forecasts were just silly. I think the data provide no information about the incentive effect of the death penalty. Similarly, data have limited power to gauge causal magnitudes of global warming.

I think these results are “embarrassing” in the sense that the data cannot tell us anything, so there is little point in doing statistics.

Heston continues:

Here is an example of poor understanding. Some Harvard biomedical informatics guy ran a regression on unit-root Covid growth without differencing the log-data. Naturally, he got an R-squared of 99.8%. And the t-statistics were highly significant! He did not realize that his exponential forecast was just assuming that the in-sample growth would continue forever. That growth rate was estimated over only 2 weeks, so it was almost meaningless. The type of erroneous thinking told us for years that we had only two weeks to flatten the curve.

I think that statistics is fun, and there are classic areas where data are plentiful, e.g., predicting stock returns. But when the power is low, I am skeptical about pursuing research in the area.
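A minimal sketch of the mistake Heston describes, using simulated data (the two-week window and the near-perfect fit come from his account; the numbers in the simulation are otherwise illustrative):

```python
# Regressing undifferenced log(cumulative cases) on time over a short
# exponential phase gives a near-perfect R^2, but the fit says nothing
# about whether that growth rate will persist out of sample.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(14)                      # two weeks of data
growth_rate = 0.25                        # assumed: doubling roughly every 2.8 days
log_cases = np.log(100) + growth_rate * days + rng.normal(0, 0.05, size=days.size)

slope, intercept = np.polyfit(days, log_cases, 1)     # OLS of log-level on time
resid = log_cases - (intercept + slope * days)
r_squared = 1 - resid.var() / log_cases.var()
print(f"in-sample R^2 = {r_squared:.4f}")             # ~0.99+, almost by construction

# Naive extrapolation assumes the same growth rate forever:
print(f"implied cumulative cases at day 180: {np.exp(intercept + slope * 180):.2e}")
```

Differencing the log-data, i.e., modeling the daily growth rates themselves, removes the trending level; the fit statistics then reflect how predictable the growth rate actually is, which is the question that matters for forecasting.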

My reply: statistics lives in the sweet spot between data that are so clear that you don’t need statistics (what Dave Krantz used to call the “intra-ocular traumatic test” or “IOTT”: effects that are so large that they hit you between the eyes) and data that are so noisy that statistical analysis doesn’t help at all.

From this perspective, considering this zone of phase transition, the narrow beach between the continent of clear effects and the sea of confusion, statistics may seem like a specialized and unimportant subject. And maybe it is kind of unimportant. I’m a statistician and have spent my working life teaching, researching, and writing about the topic, but I’d never claim that statistics is more important than physics, or chemistry, or literature, or music, or any number of other pursuits.

On the other hand, this beach or borderland, while small compared to the continents and seas, is not nothing—indeed, lots of important things happen here! What’s known is already known, and we spend lots of time and effort on that knowledge frontier (to mix spatial metaphors a bit). Statistics is important because it’s one of our tools for pushing that frontier back, reclaiming the land, as it were. As long as we recognize its limitations, statistics can be a valuable tool and a helpful way of looking at the world.

23 thoughts on ““The narrow beach between the continent of clear effects and the sea of confusion”: that sweet spot between signals that are so clear you don’t need statistics, and signals that are so weak that statistical analysis can’t help”

  1. He did not realize that his exponential forecast was just assuming that the in-sample growth would continue forever. That growth rate was estimated over only 2 weeks, so it was almost meaningless. The type of erroneous thinking told us for years that we had only two weeks to flatten the curve.

    The problem is more fundamental than that. Testing was growing exponentially in March 2020, so even with no cases at all you would see a curve like that.

    I don’t understand why people had so much trouble with the concept that getting tested is a key step in generating a “confirmed case”. We need to think about the process that generated the numbers in front of us, not throw the methods into a footnote or appendix.

    • It’s true that testing was ramping up, but that had no strong effect on the takeaway message, which was that infections were doubling every few days… it honestly didn’t matter whether that was every 3 days or every 6 or whatever; it’s still something you have to change quickly or a lot of people will die.

      Also the idea that quarantine didn’t help is stupid. Anyone who complains about “two weeks to flatten the curve” is an idiot. The point of “two weeks” was that’s how long it would take from the time you implemented NPI to the EARLIEST time you would have sufficient information to know whether those NPI were working or whether you had to do even more. You were always going to need to carry out those NPI until you either had widespread vaccination to reduce the risk of death and hospitalization, or you blew through so many infections that there was hardly any point to vaccination anymore. So basically NPI for a minimum of 18 months or something.

      Here’s US and UK log(cumulative cases) for the first 1 year of the pandemic. https://ourworldindata.org/explorers/coronavirus-data-explorer?yScale=log&zoomToSelection=true&time=2020-03-01..2021-03-02&facet=none&country=USA~GBR&pickerSort=asc&pickerMetric=location&Metric=Confirmed+cases&Interval=Cumulative&Relative+to+Population=true&Color+by+test+positivity=false

      CA implemented its stay-at-home order on March 19th, and many places followed soon after. Two weeks from that was April 2. By April 2 you could see the growth rate (the slope of log(cumulative cases)) just slightly reduced from the earlier crazy high rate of 10-fold every 7-8 days (i.e., doubling every 2-3 days). The changes in growth rate coincided with the 2-4 weeks following implementation of NPI across the developed world. Testing continued to grow exponentially at a basically constant rate throughout the entire first year:

      https://ourworldindata.org/explorers/coronavirus-data-explorer?zoomToSelection=true&time=2020-03-01..2021-03-02&facet=none&country=USA~GBR&pickerSort=asc&pickerMetric=location&Metric=Tests&Interval=Cumulative&Relative+to+Population=true&Color+by+test+positivity=false

      What was a good estimate for the initial growth rate of the virus? Probably not doubling every 2 days, but something like doubling every 4-6 days is probably accurate. At that rate, you’d infect half of the US in 163 days, and at a ~1% death rate you’d have 1.1M dead, but at that infection rate you’d have way more excess deaths than 1% implies (because hospitals couldn’t operate properly even as it was). So instead, after 163 days (~5 months), we had about 177k dead by mid August. That indicates pretty substantial efficacy of NPI even when they weren’t being followed particularly well by lots of people (the Sturgis motorcycle rally itself was a clear, massive problem).
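      A rough check of that 163-day figure (the single seed infection is an assumption, and unchecked exponential growth is assumed throughout):

```python
# Days for unchecked exponential growth to reach half the US population,
# for a few doubling times, starting from an assumed single seed infection.
import math

half_us = 165e6                      # roughly half of ~330M
seed = 1                             # assumed initial number of infections
for doubling_days in (2, 4, 6):
    days = doubling_days * math.log2(half_us / seed)
    print(f"doubling every {doubling_days} days -> ~{days:.0f} days to reach half the US")
# Doubling every 6 days gives ~164 days, roughly the 163-day figure above;
# faster doubling gets there much sooner.
```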

      That being said, yes, there were ridiculous estimates at the beginning of the pandemic about how there would be a pulse of infections and then it would die out in 3 or 4 weeks and the whole thing would be over. That WA state health organization (IHME or some such thing?) put out a bunch of those; they should have been laughed off the stage and taken out back and put in the stocks to have spoiled fruit thrown at them…

      But anyone who says that epidemiological models failed should also be in the laughing stocks… Because, as Richard Hamming said, the purpose of computing is insight, not numbers. The insight gained from running simple ODE models shows you that if you want to keep from blowing through the population and killing a lot of people, you need to do NPI until you’ve vaccinated 70% of the population or more, and there’s nothing else to do but bite the bullet and do it… It literally didn’t matter if your ODE model overpredicted by a factor of 3 or 4; it had the right dynamics, the reality matched the general behavior, and the models were a huge success as long as you knew how to interpret them.
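      A minimal sketch of that kind of ODE reasoning, using a basic SIR model with an assumed reduction in transmission after NPI; all parameter values here are illustrative, not estimates:

```python
# Basic SIR model: compare unchecked spread to spread with NPI that cut
# transmission after day 30. The point is the qualitative dynamics, not the numbers.
import numpy as np

def run_sir(beta_fn, gamma=0.1, days=540, N=330e6, I0=1000, dt=0.1):
    S, I, R = N - I0, I0, 0.0
    for step in range(int(days / dt)):
        beta = beta_fn(step * dt)
        new_inf = beta * S * I / N * dt       # new infections this time step
        new_rec = gamma * I * dt              # new recoveries this time step
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return R                                  # cumulative ever-infected (recovered)

no_npi   = run_sir(lambda t: 0.30)                          # R0 = 3, no intervention
with_npi = run_sir(lambda t: 0.30 if t < 30 else 0.09)      # NPI push effective R just below 1

print(f"ever infected with no NPI:  {no_npi/1e6:.0f}M")     # most of the population
print(f"ever infected with NPI:     {with_npi/1e6:.0f}M")   # a small fraction
# With NPI in place, infections decline only slowly and the susceptible pool stays
# large, so lifting NPI before widespread vaccination lets the epidemic take off again.
```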

      What WASN’T a success was communication of those results to the general public. People thought “two weeks” meant “two weeks till we can all go back to normal,” and that was never the case, and only idiots like IHME said such things. People who actually study pandemics, like the King’s College group, knew exactly what was going to happen. Inside the CDC, people who were non-political and not forced to stand next to that blithering idiot running the country knew what was going to happen… Anyone with any skill at all in modeling knew generally… you’d need NPI for a year or more… It was going to take at least a year to get a vaccine, and you’d need to run NPI that whole time to some extent.

      I’m so tired of the narrative that modeling and data were a disaster in the pandemic. Could they have been better? Sure. But they did the job they needed to do, which was to say: NPI for at least a year and get on Vaccines asap.

      • I skipped through your response, but I find this last line humorous given the topic of the post: modeling and data “did the job they needed to do, which was to say: NPI for at least a year and get on Vaccines asap.”

        Did we really need statistics to tell us this?

        • There are MANY pandemics that have occurred without anyone much noticing. “Common colds” spread throughout the world and no one is hospitalized. At the moment a severe Avian Flu pandemic is going on, but it doesn’t seem to spread to humans much, so that’s fortunate. So yes, you need data to tell you whether a virus is a serious danger or not. Did you need much stats after mid April? Maybe not… but you did need to know whether the NPI you had in place were still working well enough. So basically yes. When we finally hit Dec 2021, Omicron blew through whatever NPI we were doing within a couple of weeks. If that had started up in June 2020 we’d want to know it right away and institute more extreme NPI. So yes, we needed stats.

      • We had at least one commenter at this very blog who seemed otherwise knowledgeable but who believed the leading models were “pro-lockdown” because “liberals” for some reason “wanted lockdowns” [which never happened].

    • I’m just saying, here’s a news article from that April (with chart):

      In early March, things seemed to be turning around. According to data from COVID Tracking Project, daily testing grew exponentially from a few hundred tests on March 5 to 107,000 tests last Friday, March 27.

      https://arstechnica.com/tech-policy/2020/04/americas-covid-19-testing-has-stalled-and-thats-a-big-problem/

      My point is you can’t tell anything from a chart of exponentially growing cases when testing is increasing exponentially. If the tests used back then were about 10% false positive under real-world conditions, then you could get essentially his chart even if there were zero actual infections.
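      A minimal sketch of that scenario (the 10% false-positive rate is the hypothetical above; the testing numbers are loosely based on the figures quoted from the article):

```python
# If daily tests grow exponentially and a fixed fraction of them are false
# positives, "confirmed cases" grow exponentially even with zero true infections.
import numpy as np

days = np.arange(23)                                               # roughly March 5 - March 27
tests_per_day = 300 * np.exp(np.log(107_000 / 300) / 22 * days)    # ~300 -> ~107,000 tests/day
false_positive_rate = 0.10                                         # hypothetical, per the comment above

reported_cases = false_positive_rate * tests_per_day               # zero true infections assumed
slope = np.polyfit(days, np.log(reported_cases), 1)[0]
print(f"apparent growth rate: {slope:.3f}/day, "
      f"doubling every {np.log(2)/slope:.1f} days, with no real infections at all")
```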

      If you remember, at the time the CDC had recently recalled test kits because a false positive rate of up to 97% was reported for some batches. Later it was concluded there were two different sources of false positives involved:

      The evaluation described in this report determined the source of the fluorescence in the early N1 components to be a contaminating template molecule, which was not part of the diagnostic panel assay design, but was synthesized at the CDC around the same time as the manufacturing of the first EUA lot. This template was present in Lab 2 where the “bulk” panel components underwent the original quality analyses and it is likely that the contamination of the N1 bulk material occurred at that time. No other likely source of fluorescence was identified in the sequence data from the N1 RT-PCR products. We also conclude that the design of the N3 components led to primer-probe amplification, explaining the persistence of false reactivity with these oligonucleotides across multiple EUA production lots and across the three sources of primers and probes used in this evaluation.

      https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260487

      And of course that doesn’t include false positives due to contamination of the samples or analysis lab (rather than during manufacture), small amounts of virus (or just RNA) trapped in the mucus, or debris from recent (but now inactive) infection.

      So we already couldn’t tell what was going on in that chart. It needs to be looked at in the context of testing rates and accuracy. The issues with extrapolation and calculating the correlation are minor in comparison.

      I will check the rest of your comment later.

      • Although CDC tests were contaminated and didn’t work super well, the tests being used in the UK, Italy, Germany, etc. were different, and showed a similar pattern of exponential infection growth.

        Test positivity declined from 20% on April 6 to about 6% by June 6, even as the testing rate increased.

        Also, except for the first week or two of March, the tests didn’t grow exponentially; they grew quadratically, with tests per day looking like a straight line. A quadratic trend in testing, compared to exponential growth in cases, hardly contaminates the trend at all. https://ourworldindata.org/explorers/coronavirus-data-explorer?zoomToSelection=true&time=2020-03-01..2020-07-03&facet=none&country=USA~GBR~CAN~DEU~ITA~IND&pickerSort=asc&pickerMetric=location&Metric=Tests&Interval=7-day+rolling+average&Relative+to+Population=true&Color+by+test+positivity=false

        The end result is that the signal from the end of March 2020 onwards was primarily from actual infection growth.

        Yes, thinking about the effect of growing tests was worthwhile, but after thinking about it a bit, and looking at the fairly linear trend in tests per day, the result is that test growth wasn’t as big a deal as you are making it out to be.

          The tweet referenced in the OP was from March 15th. Do you agree that, given the knowledge available at that time, it was impossible to tell whether that growth was due to infections or testing?

          That is the very limited claim I made. There are obviously other implications you are trying to discuss that stem from that, but start there. Can we agree on that very limited claim?

        • On March 15, 2020, looking only at the crappy CDC tests, you couldn’t know how much testing was affecting everything. Looking at Italy, Germany, UK, and several other European countries and knowing what we do about the dynamics of pandemics in the early stages you could tell that the viral infections were doubling every n days for n somewhere in the 3-10 day range.

        • Looking at Italy, Germany, UK, and several other European countries and knowing what we do about the dynamics of pandemics in the early stages you could tell that the viral infections were doubling every n days for n somewhere in the 3-10 day range

          Knowing what we do about the dynamics of testing would lead to a different conclusion. Obviously we need to account for both in our models.

          There is some difference between our approaches I am trying to figure out. And there will be a next time.

        • Back in 2020 I briefly did a model where the testing was ramping up and it affected the case counts; I didn’t find that it made a strong difference in the overall conclusion. If you assume a very high false positive rate like 10% it would, of course, but for non-CDC tests the false positive rate was way, way lower than that. Below 1%.

          By July, Italy and the UK, for example, were both below a 1% overall test positivity rate. Of course we didn’t have that information in March 2020, but negative control tests for European PCR assays were working correctly and not showing high false positivity. I felt at the time we had enough information to know that the European tests were accurate.

  2. How exactly do you know, given a dataset and a question, that the dataset can’t answer the question? Is it solely that the power of your procedure is too low, or is there another way to quantify this? Does it differ in frequentist vs Bayesian approaches?

    • In a Bayesian approach, you have a model, you have some priors over the unknowns in the model, and you have some data. The data can answer the question if the posterior over the model parameters is sufficiently concentrated to make the answer to the question sufficiently obvious.

      I know that’s pretty vague, but I’d like to point out that there are a lot of different types of questions… so there are a lot of different types of answers.

      For example, if you want an answer to “does this medication shorten the time to heal?” it’s sufficient to say that the parameter of interest is probably negative. You could have enormous uncertainty about what it is, but still be very confident that it’s negative (for example, you might think it’s anywhere from -13 to -1000 days in a 99.9% posterior probability interval).

      It can become quite obvious what you should do even when there’s very high uncertainty. For example, should you approve the COVID vaccine? Yes, there are possibilities for stuff like pericarditis or whatever, but the risks of that condition and OTHER complications from COVID itself are pretty high. COVID has a ~1% death rate overall, so if your vaccine has something like a 100/100k risk of pericarditis but is 90% effective against a disease with a 4500/100k risk of pericarditis and a 30000/100k risk of adverse health events overall, then even if you don’t know very well whether the vaccine has a 10/100k or 1000/100k risk of adverse events… you can still say “hell, it’s still better than the alternative”… so sometimes you CAN answer your question even with crap data.
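      A minimal sketch of that kind of calculation; the per-100k figures are the ones above, while the log-uniform prior on the vaccine’s adverse-event rate and the assumed 50% chance of eventual infection are illustrative assumptions:

```python
# Decision under uncertainty: even with order-of-magnitude uncertainty about the
# vaccine's adverse-event rate, vaccination lowers expected risk across the prior.
import numpy as np

rng = np.random.default_rng(1)

covid_adverse_per_100k = 30_000          # adverse health events if infected (from the comment)
vaccine_efficacy = 0.90                  # from the comment
p_infect = 0.5                           # assumed chance of eventual infection if unvaccinated

# Uncertain vaccine adverse-event rate: log-uniform between 10 and 1000 per 100k (assumed)
vaccine_adverse = np.exp(rng.uniform(np.log(10), np.log(1_000), size=100_000))

risk_unvaccinated = p_infect * covid_adverse_per_100k
risk_vaccinated = vaccine_adverse + (1 - vaccine_efficacy) * p_infect * covid_adverse_per_100k

print("P(vaccination lowers expected risk) =",
      (risk_vaccinated < risk_unvaccinated).mean())   # 1.0 across the whole prior
```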

  3. This is a case where I think you should adopt the economic way of thinking.

    The beach is where the marginal benefit over your “competition” lies. In that sense, it is the _most_ important region of the conceptual space, regardless of its relative size.

  4. “so clear that you don’t need statistics (what Dave Krantz used to call the “intra-ocular traumatic test” or “IOTT”: effects that are so large that they hit you between the eyes) ”

    When something becomes “clear” can be a huge issue. Take this example from today (OPS+ is a measure of a baseball hitter’s production, adjusted for factors like ballpark).

    https://twitter.com/DOBrienATL/status/1663588836931411968?s=20

    For one person, it’s “clear” from the stats that Freeman is better than Soto. This gap between what seems “clear” to one person but not to others is what causes a “debate”.

  5. “From this perspective, considering this zone of phase transition, the narrow beach between the continent of clear effects and the sea of confusion, statistics may seem like a specialized and unimportant subject.”
    Often one problem is that without carefully thinking about the sources of randomness involved, we can mistake something for a clear effect that might be, on consideration, out in the sea of confusion. For example, if we don’t think about dependence among multiple observations (e.g., in “spatial regression discontinuity” and other geographic quasi-experiments). Formalizing things can help lay out the assumptions needed for even a compelling-looking plot to tell us anything conclusive.
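    A small simulation of that point: with spatially correlated noise and no true effect anywhere, a naive discontinuity regression that treats observations as independent “finds” significant jumps far more often than its nominal 5% rate (the setup is purely illustrative):

```python
# Spatially autocorrelated noise, no true treatment effect, naive RD at x = 0.
# Naive i.i.d. standard errors understate the uncertainty, so the test over-rejects.
import numpy as np

rng = np.random.default_rng(2)
n, rho, n_sims = 200, 0.9, 2000
x = np.linspace(-1, 1, n)                 # "location" along a border
treated = (x >= 0).astype(float)
X = np.column_stack([np.ones(n), treated, x, treated * x])
XtX_inv = np.linalg.inv(X.T @ X)

false_positives = 0
for _ in range(n_sims):
    # AR(1) noise along the spatial dimension; no real discontinuity anywhere
    e = np.zeros(n)
    e[0] = rng.normal()
    for i in range(1, n):
        e[i] = rho * e[i - 1] + np.sqrt(1 - rho**2) * rng.normal()

    beta, *_ = np.linalg.lstsq(X, e, rcond=None)            # naive linear RD fit
    resid = e - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se_jump = np.sqrt(sigma2 * XtX_inv[1, 1])               # i.i.d. standard error
    if abs(beta[1] / se_jump) > 1.96:
        false_positives += 1

print(f"nominal 5% test rejects {false_positives / n_sims:.0%} of the time")
```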
