Hey! Let’s check the calibration of some coronavirus forecasts.

Posted on April 14, 2020 9:03 AM by Andrew

Sally Cripps writes:

I am part of an international group of statisticians and infectious disease/epidemiology experts. We have recently investigated the statistical predictive performance of the UW-IHME model for predicting daily deaths in the US. We have found that the predictions for daily number of deaths provided by the IHME model have been highly inaccurate.

The UW-IHME model has been found to perform poorly even when attempting to predict the number of next day deaths. In particular, the true number of next day deaths has been outside the IHME prediction intervals as much as 70% of the time. If the model has this much difficulty in predicting the next day, we are concerned how the model will perform over the longer horizon, and in international locations where the accuracy of the data and applicability of the model are in question.

The attached manuscript is a result of this collaborative effort and here is the link.

We hope you find this report of interest and that you will share it with your colleagues and readers. As you can see it is largely about uncertainty quantification.

Indeed we are reconvening the uncertainty conference when COVID19 has hopefully passed and apologies for not being in contact sooner, but what with bushfires, floods (in Australia) and COVID19 it has been a trying last 6 months.

At least Australia seems to be doing something right about the management of COVID19. Unfortunately possible explanations for this, which may help other nations, must remain mere speculation because our government is refusing to release data. I do hope the situation in the US stabilizes soon.

It’s nice to be calibrated. But it’s calibrated to be nice.

Seriously, though . . . this sort of calibration check can only help. It is through criticism that we learn.
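
For readers who want to try this kind of check themselves, here is a minimal sketch in Python of the coverage calculation the letter describes: count how often the observed value lands inside the stated prediction interval. The `IntervalForecast` class, the function name, and all of the numbers are made up for illustration; none of it comes from the IHME files or from the manuscript.

```python
from dataclasses import dataclass


@dataclass
class IntervalForecast:
    location: str    # hypothetical label, e.g. a state
    lower: float     # lower end of the stated prediction interval
    upper: float     # upper end of the stated prediction interval
    observed: float  # what actually happened


def empirical_coverage(forecasts):
    """Fraction of observations that fall inside their prediction interval."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    hits = sum(f.lower <= f.observed <= f.upper for f in forecasts)
    return hits / len(forecasts)


# Invented numbers, for illustration only. These are NOT IHME forecasts.
example = [
    IntervalForecast("A", 10, 20, 15),  # inside the interval
    IntervalForecast("B", 5, 9, 12),    # above the interval
    IntervalForecast("C", 0, 3, 4),     # above the interval
    IntervalForecast("D", 40, 60, 75),  # above the interval
]

coverage = empirical_coverage(example)
print(f"empirical coverage: {coverage:.0%} (nominal: 95%)")
```

If the intervals were calibrated, the printed coverage would be close to the nominal level over many forecasts; a large gap in either direction is what a check like this is meant to flag.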

P.S. It seems that this is the same model that we discussed a couple weeks ago. As we said then, it’s a curve-fitting model based on second derivatives, so it implicitly relies on a sort of stationarity. As such, I can see that it would provide a sort of baseline prediction but I’d be surprised to hear that people are taking it more seriously than that.

66 thoughts on “Hey! Let’s check the calibration of some coronavirus forecasts.”

Zhou Fang on April 14, 2020 at 9:10 am said:

> “In addition, the performance accuracy of the model does not improve as the forecast horizon decreases. Infact, Table 1 indicates that the reverse is generally true.”

So… maybe some kind of moving average type error term? I don’t think this necessarily says the model is bad or miscalibrated – there’s an observation process contributing additional variability.

- Terry on April 14, 2020 at 9:35 am said:
  
  When you look at the model’s graphs, you see violent swings in reported deaths from day to day. See, for instance: https://covid19.healthdata.org/united-states-of-america/california
  
  Given the “weekend” effect we just saw in the Swedish data, there is probably some underlying problem with the data day-to-day.
  
  The big problem with the model, though, appears to be its predictions about how deaths will decline rapidly after the peak. Italian data does not support this, and the assumption that deaths will fall soon to nearly zero is quite hard to believe.
  
  I think the popularity of the UW-IHME model is due to its producing an easy-to-understand plot of how things will play out over time, especially when the peak will occur, which is extremely useful in getting some intuition about what is going on. It is also easily accessible on the internet.
  
  - Dale Lehman on April 14, 2020 at 9:46 am said:
    
    Wait – I thought that weekend effect had been largely debunked? In any case, I think it would be interesting to see how the model performs at different geographical aggregations rather than time aggregations. They focus on the inability of the model to accurately portray uncertainty even in very short term forecasts – and they do a nice job of that. But I wonder how much of the problem has to do with the disaggregation at the state level – in other words, are the total country forecasts better? It is certainly possible that both the aggregate and disaggregated forecasts underestimate uncertainty, but it is also possible that their accuracy diverges. It would be interesting to see – part of how we can improve the model.
    
    - jim on April 14, 2020 at 10:01 am said:
      
      “They focus on the inability of the model to accurately portray uncertainty even in very short term forecasts – and they do a nice job of that.”
      
      That’s funny because I think the model does a pretty good job. When I look at the map I see mostly light colors, which means the forecast is nearly accurate.
      
      I don’t like continuous color bands because they always devote too much spectrum to dark colors that aren’t distinguishable from one another. If you look at the color bar only about 25% of the spectrum shows discernable variation. A more useful approach would be to ditch the confidence interval and color by difference from prediction in, say, 20% increments. So the white increment would be ±10% from actual; the light color would be 10-30% from actual, etc. Then you’d see right away where there’s a useful forecast and where it’s too far off the mark to be useful.
    - Dale Lehman on April 14, 2020 at 10:33 am said:
      
      If the 95% prediction intervals are at the state level, and if states are independent (which they are not, but I don’t think that is essential to the first glance at the maps), then I would expect only 2 or so states to be colored one way or the other. What you see as mostly light colors looks to me to be very different than what I would expect if these 95% prediction intervals were remotely accurate.
      
      However, I do think the subsequent comments below (about absolute accuracy, within state variations, etc.) are more pertinent than the initial look at the map.
    - Bob on April 15, 2020 at 10:38 am said:
      
      You are seeing a lot of white because you are looking at a geographical map, and not a population weighted map. A lot of that white is places that are barely populated and therefore have few cases. Coming up with a good prediction when the previous day had 1 death is easy to model.
      
      If instead you look at a population weighted map, and on top of that consider that anything in the 95% confidence interval is already white, you see that the model is missing badly: 95% of the map better be white. It’s a model that ‘only’ messes up in the places where there is an outbreak. That’s as useful as a model that tells you what the football scores will be, but only when there’s no game.
    - jim on April 15, 2020 at 8:16 pm said:
      
      Bob:
      
      Wrong.
      
      NY has the highest per capita fatalities and it’s light colored every day. NJ is just the opposite: second highest per capita fatalities and dark colored every day. LA 3rd per cap, light; MI 4th per cap, dark.
    - Zhou Fang on April 14, 2020 at 10:56 am said:
      
      I don’t know if there’s necessarily a “weekend” effect, but there’s clearly a lot of overdispersion.
    - Terry on April 14, 2020 at 11:37 am said:
      
      “Wait – I thought that weekend effect had been largely debunked?”
      
      Could be. But my understanding is that the weekend effect goes away when better data is used, which suggests the odd day-to-day patterns in the UW-IHME data are data problems, not real-world patterns.
      
      What I see when I look at the UW-IHME plots is that the death data is strongly negatively autocorrelated. This sounds like a data problem to me. I don’t see how real-world factors could produce this.
      
      But I’m just guessing.
    - Phil on April 14, 2020 at 1:36 pm said:
      
      Daniel, nobody (or at least not me) thought there was a ‘weekend effect’ in the actual deaths. Well, I take that back, there probably is a weekend effect in the actual deaths, but I doubt it’s very large. But there is definitely a huge ‘weekend effect’ in the reporting of Swedish deaths in some sources of numbers. https://experience.arcgis.com/experience/09f821667ce64bf7be6f9f87457ed9aa for example.
      
      I guess there’s a choice to be made, do you want the most recent numbers even if they’re not very accurate, or the most accurate numbers even if they’re not very recent? Of course someone should jointly analyze both sets and come up with something that is more accurate than either one, at each time, but evidently that hasn’t happened. At any rate, Worldometers and the New York Times have both been using a data source from Sweden that has a large weekend effect.
      
      The broader point, though, is that there can be patterns in both the data and the actual deaths, and it may be a big mistake to assume the data are accurate, especially at the level of individual days.
    - Joe on April 15, 2020 at 11:16 pm said:
      
      Annoyingly, that might be a data reporting/quality issue. In the state where I live the central data folks realized somewhat belatedly that the testing and death reports weren’t from that day but from past days in some counties (like, they waited until the weekend to report a bunch of cases from the previous week). They fixed the issue, and it seems like they have the right case numbers by county for the right days. However, after the shift, lo and behold a pronounced “weekend effect” went away.
      
      I mean, maybe there is a weekend effect where data collection and miscoding aren’t an issue, but just from this kind of basic error you’d see a lot of phenomena that weren’t strictly “there”.
    - Curious on April 14, 2020 at 12:50 pm said:
      
      I think you are right. Jersey City is likely to have more similarity to NYC in spread and rates than is Buffalo and that is the case for any MSA where the population spreads from a sizable city across state lines.
  - Zhou Fang on April 14, 2020 at 10:48 am said:
    
    I don’t think that’s necessarily the model’s fault. The problem is that the model is trying to make a forecast but the results are heavily influenced by things outside of the modeller’s control or predictive ability – for example, like how well social distancing is done, how testing is expanded, how much provision is given to medical supplies, and whether say, somebody decides to open the country up way too early or not.
    
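
Dale Lehman’s back-of-the-envelope point in the thread above (with 95% intervals and roughly independent states, only a couple of states should fall outside on any given day) can be checked with a short binomial calculation. This is a sketch under his stated simplification that states are independent, which he notes is not really true; the function names and the threshold of 10 states are arbitrary choices for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k misses out of n independent intervals."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more misses out of n independent intervals."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


n_states = 50      # assumption: one interval per state per day
miss_rate = 0.05   # a calibrated 95% interval misses 5% of the time

expected_misses = n_states * miss_rate
p_ten_or_more = prob_at_least(10, n_states, miss_rate)

print(f"expected states outside their interval: {expected_misses}")
print(f"P(10 or more states outside): {p_ten_or_more:.6f}")
```

Under these assumptions the expected count is 2.5 states, which matches the “2 or so” in the comment, and seeing ten or more misses on one day would be very unlikely if the intervals were calibrated.
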
Megan Higgs on April 14, 2020 at 9:41 am said:

I happen to live in Montana, and it’s interesting to note that we had 4 deaths as of the evening of March 30 (first figure where MT is a light shade of red). So the “upper limit of the 95% PI” was probably 2 or 3. 4 is not far off and certainly isn’t alarming to me, nor would it change any practical considerations. We should focus less on whether it’s “in or out” of the 95% PI and more on the actual numbers. From the legend and Figure 2 in the paper, it looks like we’re talking about a maximum of 26 more deaths than predicted by the upper limit of the 95% PI (maybe very close to the upper limit of a 99% PI?). How consequential is the size of that prediction error — in terms of practical use and decision making based on the model?

- jim on April 14, 2020 at 9:52 am said:
  
  Exactly, or too exactly as the case may be! Using the 95% PI makes the utility of the model seem worse than it is.
  
  - paul on April 14, 2020 at 10:10 am said:
    
    If you download the data from the IHME, some of it has no spread. There are days with a lower bound of 129 and an upper bound of 131 deaths or, my favorite, 29 to 29 with a 28.9 mean. I actually think comparing 7 day averages or something would be better. I also don’t think the IHME 95% for deaths is that important. Why not only look at deaths over a certain number and just use percentages?
    
    I think this is a case where the IHME gave a number (95% PI) that is important for the modeling, but is actually not that important for what they care about, which is hospital beds. Also, a daily tally seems less important than the peak and how soon it occurs.
    
    As a planner, I would care about how soon do we have to be up and running, how many beds should we get, how many deaths can we deal with, and how long will this last. Daily numbers would not really matter until the peak. Some states had 1 death 4 days before the IHME model said there would be any. I doubt this impacted their response effectiveness.
    
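
paul’s suggestion above of comparing 7 day averages instead of single days can be sketched in a few lines of Python. The function below is a plain trailing moving average; the daily counts are invented to mimic a strong day-of-week reporting pattern and are not real data.

```python
def rolling_mean(values, window=7):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


# Invented daily counts with a repeating weekly pattern (two weeks).
daily = [10, 12, 30, 28, 25, 27, 11] * 2
smoothed = rolling_mean(daily, window=7)

print("daily range:   ", min(daily), "to", max(daily))
print("smoothed range:", round(min(smoothed), 2), "to", round(max(smoothed), 2))
```

With a window equal to the length of the reporting cycle, the weekly swings cancel out, so forecasts and observations can be compared on the smoothed series rather than on noisy single days.
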
jim on April 14, 2020 at 9:45 am said:

Interesting.

Why do people want to predict fatalities? More useful would be hospital admission rates. Will anyone ever go back through the data and connect COVID19 cases to date of admission?

Yeah now that I think about it you could probably improve the whole prediction deal by getting away from fatalities entirely and using admission rates. You don’t have to care why people are being admitted because in some respects it doesn’t matter, and if you track that all the time you have a background anyway. OK maybe keep fatalities too then correlate with admission and testing.

Given the quality of the data I’m surprised they are choosing to report the 95% PI. If a model could project admission rates ±25% a week out that would be spectacular.

- Witold on April 14, 2020 at 11:19 am said:
  
  Jim, the hospitalisation -> deaths link is done for other infectious diseases (or rather, I know the example of flu a bit, and it is/can be routinely done there). So people will do it eventually. For COVID-19, some people have already been using it for modelling.
  
  Hospitalisations have different inherent biases so to get anywhere close to a sensible estimate you need to track both + add some lab testing data. Overall, if you had to pick one indicator, mortality is still probably the best signal (as long as you include deaths outside of hospitals in your statistics).
  
  - jim on April 14, 2020 at 12:43 pm said:
    
    Witold,
    
    I don’t know much about admissions biases, but no doubt there are plenty. And I don’t have experience in this particular area.
    
    But the reason I thought admissions would be better than mortality is that the main reason for modelling seems to be predicting hospital demand, but it seems like mortality could be days or weeks behind demand, especially in the initial phases of an outbreak. I guess once it’s out of control and people are dying almost before admission due to lack of capacity, then mortality could predict demand but at that point the prediction is moot because there’s no way to respond to the demand.
    
    Am I in the ballpark here or no? :)
    
    - Witold on April 14, 2020 at 1:05 pm said:
      
      Of course in principle you’re right, i.e. best to have a model for the quantity of interest. But the real latent variable that we want is the number of cases. Deaths happen a week or two after admission, but it takes months to reach the peak, so I guess the time premium on having a prediction 14 days earlier is not large, especially if it’s going to be even more noisy.
      
      Incidentally, predicting admission rates is hard not just because the propensity to be admitted varies across different populations, but also because it varies with time. I’ve heard from surveillance networks in hospitals in developing countries that numbers of inpatients are falling, possibly because people don’t want to go to a hospital *because* of the epidemic. Not sure if that’s true, but it is a good example of how an epidemic might not show up in resource utilisation.
    - jim on April 14, 2020 at 2:12 pm said:
      
      cool, thanks! interesting.
      
      I guess it really depends on what you want the model for.
      
      The prob I see with infection rates is that infections always have public health consequences but usually don’t result in admissions. And that’s aside from the much larger problem of even getting decent infection data. As you said people may avoid hospital visits because of the pandemic (some clinics are advising people to avoid unnecessary visits), but if you’re interested in allocating hospital resources you want that to be part of your model elsewise you’ll overestimate demand.
    - Nick Adams on April 14, 2020 at 5:22 pm said:
      
      In the ER I work in (in Australia, where COVID-19 is currently under control), attendances are 50% of normal. I’ll say that again: 50%. The “worried well” and old people are just not turning up.
- Phil on April 14, 2020 at 1:47 pm said:
  
  I think this is a good example of different metrics being appropriate for different purposes. When looking at comparisons between countries or states or whatever, and to judge success at ‘flattening the curve’ (as a way of trying to predict what the future looks like), I haven’t even been looking at hospital admissions or ‘cases’ (meaning people who have tested positive), since both are so context-sensitive. Even ‘deaths’ is problematic because a lot of people dying of coronavirus are not counted as such, and a few people dying of other things are counted as coronavirus deaths, but deaths is way better than cases or hospital admissions.
  
  But yeah, if you are trying to predict how many masks or gowns or doctors or ventilators you need, you probably care a lot about hospital admissions.
  
  If you were to try to model hospital admissions, though, you’d run into real problems when hospitals near peak capacity. You’d need a much more sophisticated model, I think, because p(admissions | symptoms) would be so variable.
  
- Joe on April 15, 2020 at 11:25 pm said:
  
  Just where we are, the data for hospitalizations isn’t broken down by county, where deaths and positive tests are. Sure you can do that at the federal level, but for where I am I can’t really do anything particularly predictive with the hospitalization rate, as much as I’d like.
  
  I think there’s a reason for that too: admitting policies vary pretty widely from hospital to hospital (and state to state), for a lot of reasons. You have different hospitalization criteria as one limiting factor, and different pressure on medical supplies as another, and those are going to have potentially sizable local interaction effects that make hospitalization rate hard to compare across areas. When you’re dead in Washington, you’re dead in New York (though there are coding errors there for cause of death). When you test positive in Texas, you test positive in Arizona (though this isn’t strictly true either). When you’re hospitalized in Miami, you might not have been hospitalized in New Jersey.
  
Yuling on April 14, 2020 at 10:05 am said:

I am wondering how reliable such a prediction evaluation is when it is based on one realization that is averaged over states. For example, if I made a silly prediction that is [0, infinity] on 48/50 states, and a point mass at 0 and infinity in the remaining two states, I will always have 96% empirical coverage for a 96% CI among all states, thereby passing the check in figure 1.

- Ben on April 14, 2020 at 4:08 pm said:
  
  Good point. I think I would like this plot given as like 50 states on the x and the difference between prediction and actual on the y (presumably with intervals if the predictions were intervals) and then a companion plot of the actual outcomes sorted with the same xs.
  
- Bob Carpenter on April 14, 2020 at 4:12 pm said:
  
  Having a calibrated estimate with low sharpness (high entropy) is like having an unbiased estimate with low precision (inverse variance), namely not very useful in and of itself. I liked this Gneiting et al. paper on calibration and sharpness.
  
  - Yuling on April 14, 2020 at 5:21 pm said:
    
    Hi Bob, yes I understand sharpness. But I am more saying that in a longitudinal dataset, the prediction, as well as the evaluation of prediction/sharpness/precision, is also longitudinal. I think it is related to a point you mentioned earlier on hierarchical model evaluation in ML (in contrast to treating data as iid and simply averaging over states, as done here).
    
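
Yuling’s “silly prediction” and Bob Carpenter’s point about sharpness in the thread above can be put side by side in a short Python sketch. It builds the silly forecast (48 infinitely wide intervals plus two point masses) and a made-up sharper forecast with the same coverage, then reports coverage together with the median interval width as a crude stand-in for sharpness. The observed value of 25 per state and the “sharp” intervals are invented for illustration.

```python
import math
from statistics import median


def coverage(intervals, observed):
    """Fraction of observations inside their (lower, upper) interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, observed))
    return hits / len(observed)


def width(interval):
    lo, hi = interval
    # A point mass has zero width, even when it sits at infinity
    # (inf - inf would otherwise give nan).
    return 0.0 if lo == hi else hi - lo


def median_width(intervals):
    """Crude sharpness summary: smaller means sharper."""
    return median(width(iv) for iv in intervals)


observed = [25.0] * 50  # hypothetical outcome in each of 50 states

# Yuling's example: useless intervals in 48 states, point masses in 2.
silly = [(0.0, math.inf)] * 48 + [(0.0, 0.0), (math.inf, math.inf)]

# An invented forecast with the same coverage but finite intervals.
sharp = [(20.0, 30.0)] * 48 + [(0.0, 5.0)] * 2

for name, forecast in [("silly", silly), ("sharp", sharp)]:
    print(name, coverage(forecast, observed), median_width(forecast))
```

Both forecasts hit 96% coverage across states, so a coverage-only check cannot tell them apart; the width column is what separates them.
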
Carlos Ungil on April 14, 2020 at 10:43 am said:

I think the model moves too much for those intervals to be credible (in the non-technical sense). See the forecasts for April (as timeseries) and at the end of April and the end of May (vertical lines) with data as of the 5th, 9th and 12th of April for the ten European countries with the worst outlook: https://imgur.com/a/JPaMQey

The predicted intervals for two out of ten countries at the end-of-May horizon (and one for the end-of-April forecast) don’t even overlap from one week to the next. Refining the model is ok but this seems too much of a change and if such “refinements” are to be expected going forward it’s better not to take any of those intervals at face value.

As Terry has pointed out they predict that deaths decline very quickly after the peak. They had to raise the forecast in Italy, France and Belgium after just a few days as deaths are already higher than their original end-of-May forecast (Spain is also quite close).

Phil on April 14, 2020 at 11:05 am said:

The up-and-down swings pointed out above are from the data, not the model! If you look at the color maps in the paper/post, you can see that the model predicts too few deaths in some states on March 31, and too many in the same states on April 1 (and vice versa). The swings are coming from the data, not the model prediction, which is pretty smooth. The correct conclusion is the opposite of what the paper says. The model is splitting the difference between noisy data pretty well. In late March/early April, the daily death counts in a lot of states are in the single digits or low double digits, so counting noise in the numbers is very significant. The model uncertainty clearly fails to account for this, but this hardly counts as a serious criticism of the predictions, which, as Andrew says, are intended to be a baseline expectation. Looking at the model’s predictions for mid-April, where the measured numbers are much more meaningful, makes it look pretty good in my opinion. Conventional wisdom at the beginning of the month was that predicting a leveling off in the NY death rate by April 10th was highly optimistic, but it seems to be happening after all. Time will tell…

- Phil on April 14, 2020 at 11:20 am said:
  
  The comment above was not by Phil Price, it’s some other Phil.
  
  - Phil K on April 14, 2020 at 1:04 pm said:
    
    Haha, sorry I forgot that calling myself only Phil would be confusing. I’ll be Phil K from now on.
    
Phil P on April 14, 2020 at 11:22 am said:

That’s some other Phil, not Phil Price. I pretty much agree with the comment, though!

Terry on April 14, 2020 at 11:48 am said:

Recently, there was some discussion here about how modeling should be done using excess-death data.

Streetwise Professor talks about this and reports some results. No idea how good the analysis is, but I have no faith at all in the other analyses, so this sounds like a much more fruitful line of inquiry.

https://streetwiseprofessor.com/

> You are seeing a lot of covid-19 numbers thrown around. Virtually all of those numbers are bullshit.
>
> …
>
> The only rigorous way to estimate these but for deaths is excess deaths (i.e., deaths in excess of expected deaths, conditioning on time of year, demographics, etc.). And preferably excess deaths from respiratory illness (or at least excess deaths from non-accidental causes). This is a good template for the analysis. This also presents some good cross-country data, which shows that in Italy and Spain there is evidence of excess deaths. Elsewhere? Not so much. Of particular interest is Sweden, which has implemented mainly voluntary social distancing measures …

- Carlos Ungil on April 14, 2020 at 5:48 pm said:
  
  NYC has just started to provide better mortality data which includes not just laboratory-confirmed COVID-19 deaths but also probable COVID-19 deaths and non-COVID-19 deaths.
  
  From March 11 (first confirmed COVID-19 death) to April 13 they report 6589 confirmed and 3778 probable COVID-19 deaths and 8184 deaths not known to be confirmed or probable COVID-19 deaths. The latter figure is higher than the expected number of deaths over the period, given that there are around 55000 deaths per year in NYC.
  
  The total number of deaths reported in this period of one month and three days has been 18551, one third of the total number of deaths expected in one full year.
  
  - Carlos Ungil on April 15, 2020 at 3:18 pm said:
    
    Today’s numbers give the following increments on the different COVID-19 categories: confirmed +251, probable +281, other +730.
    
    There may be some recategorization and they may still be clearing a backlog of cases. In any case, as of today New York City has reported 19813 deaths over the last five weeks. 275% more than expected.
    
- Zhou Fang on April 17, 2020 at 8:18 am said:
  
  Decent quality statistics on excess deaths will only emerge well after this entire thing is over.
  
  - Brent Hutto on April 17, 2020 at 8:23 am said:
    
    Exactly. If simple counting of deaths cannot be done reliably during the throes of a pandemic, why on earth would we expect reasoning backwards from other statistics to impute deaths in a principled and defensible manner to be possible? That is a much harder problem than simple counting of mortality events meeting a clinical definition of “infected” at the time of death.
    
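
The arithmetic behind Carlos Ungil’s New York City figures in the thread above can be reproduced in a few lines of Python. The death counts and the figure of around 55000 deaths per year are his; the assumption that expected deaths are spread evenly over the year (no seasonality) and the day counts of 34 and 35 are simplifications added here, so the result is a rough check and not an excess-mortality estimate.

```python
# Figures quoted in the comments (March 11 to April 13).
confirmed, probable, other = 6589, 3778, 8184
annual_deaths = 55000  # "around 55000 deaths per year in NYC"

total_apr13 = confirmed + probable + other
share_of_year = total_apr13 / annual_deaths

# Assumption: March 11 through April 13 inclusive is 34 days, and
# expected deaths are spread evenly across the year.
expected_34_days = annual_deaths * 34 / 365

# Increments reported the following day: confirmed, probable, other.
total_apr15 = total_apr13 + 251 + 281 + 730

# Assumption: "the last five weeks" is taken as 35 days.
expected_35_days = annual_deaths * 35 / 365
excess_fraction = total_apr15 / expected_35_days - 1

print("total through April 13:", total_apr13)
print(f"share of a full year's deaths: {share_of_year:.1%}")
print(f"expected over 34 days: {expected_34_days:.0f}")
print("total in the next report:", total_apr15)
print(f"excess over expected: {excess_fraction:.0%}")
```

The totals come out to 18551 and 19813, matching the comments, and the excess over the flat baseline lands in the neighborhood of the 275% quoted.
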
Bill Spight on April 14, 2020 at 12:08 pm said:

I have a rant about the accuracy — or not! — of corona virus forecasts reported to the public in the US. I fear that they have strengthened science denialism in the US. Maybe the problem, I have thought, is that the forecasts were not intended for public consumption, as they are based upon models and assumptions that the general public does not understand. But maybe the problem is that they are intended for public consumption.

I first became alarmed by the response of mainstream and leftish (for the US) reporters to President Trump’s pooh-poohing of the prediction of a death rate of 3.4%. He said that it would be much lower. These reporters lambasted him for not believing science and the experts. I seldom yell at the TV, but I came close. He might well be right, I thought. This forecast was based upon data on the death rate among patients known to have caught the virus, when most people who have caught the virus have mild symptoms or none at all. It was a good bet that with better and more widespread testing that figure would come down. I doubt if Trump had thought about that. His doubt was probably based upon science and expert denialism and wishful thinking.

When I was a kid I heard the possibly apocryphal story of the country doctor who, in the time before ultrasound and other pregnancy testing, was renowned for his ability to predict the sex of the baby. When asked about the secret of his predictive ability, he confided that he always predicted the sex that the parents wanted it to be. When he was right, they were thrilled, and when he was wrong, they forgot about it. Trump often tells people what he thinks they want to hear. When he is right and the experts are wrong, his anti-science base crow about it. They are doubly thrilled. Trump 1, experts 0.

During the Black Plague of the mid-14th century, I was told by one TV documentary, the Pope secluded himself in a villa surrounded by blazing fires. The fires were because the plague was suspected to be airborne. It worked for the Pope, and it would work for us today, but we don’t need the fires. That’s one extreme. If everybody isolated themselves for long enough, we would not pass the virus on to each other and it would degrade enough outside of its hosts to cease to be infectious. But we cannot all do that.

At the other extreme, a sizable proportion of the population comes down with the virus, and some of them die of it. Our hospitals, particularly in advanced economies where hospitals are run by bean counters, and thus have little slack and are not prepared for epidemics, because that would be costly, would be overwhelmed and unable to properly care for the surge of patients or even to protect doctors, nurses, and other staff. Under such conditions the death rate would be high.

Any reasonable coronavirus forecasts fall between those extremes. That’s a very low bar, OC. But any forecast would rely upon unknown and, in a non-totalitarian society, uncontrollable factors, in particular human behavior. In addition, early in the course of the epidemic in any sufficiently large society, infection grows exponentially. When and at what numbers the infection rate will level off has a huge uncertainty. And without widespread testing there is insufficient data to expect a reduction of that uncertainty.

Under such conditions it may be well to warn the public about worst case scenarios. Instead of a self-fulfilling prophecy we would want a self-defeating prophecy. Painting a rosy picture, as Trump tries to do, could have the opposite effect. Overt denialism such as that practiced by the governors of Georgia, Florida, and Mississippi, which is not China, is another problem.

What of the projection in the US of a total of 100,000 to 240,000 deaths, disseminated at one of the daily White House press conferences? At first, I thought that it was a scare tactic aimed at changing human behavior, but it did not seem to be a worst case scenario. In fact, if 100,000,000 Americans get infected sooner or later, and 1% (not 3.4%!) of them die of it, then the death toll will come to 1,000,000, an order of magnitude greater. One of the doctors forcefully made the point that the 100,000 figure was the best case scenario, given what was currently happening in the US and what we were doing to slow infection rates. At the end, Dr. Fauci said that we might possibly do better, but did not point out the contradiction or say so forcefully. This was at a time when the evidence was plain that Washington, Oregon, and California had curbed the exponential growth of infection, and there was evidence that New York was seeing some slowing of growth, as well.

Lo and behold, a few days later we have a new projection of only 60,000 deaths (!). What had happened? Was that when the governor of Georgia found out that people with no symptoms could infect others and decided to direct people to stay home? Furthermore, the peak is supposed to be days away. Happy days are here again! At one of the press conferences Dr. Fauci explained that the earlier scare projection was based upon good models, but now we have data. Really? Stop the presses. Did the projection take into account the evidence of the slowdown in New York? We don’t know, because it was not delivered with any information about what data it used or what assumptions it made. IOW, it was not exactly ready for prime time. OC, readers of this blog know that as new data come in, projections may change. But the general public have no experience with such things. Americans know that in court lawyers can find experts to offer opposite opinions. While the new projection may offer hope and relief, it also changes the scoreboard. Science deniers 2, experts 0.

What of the forecast of a peak in a few days or a couple of weeks? OC, it is quite possible that the rate of new infections in New York will level off by the end of April or the middle of May, and will go down later. But peak is not the best word, I think, to tell the public. It implies a short period of leveling, which may not be the case. Will Georgia and other non-Chinese states experience a peak in a week or two? As far as we know, their cases may still be increasing. And even though both the experts and science deniers are predicting a peak soon, if the rate of new cases does not start to decline soon, the scoreboard will be Science Deniers 3, Experts 0. Because they forget your bad predictions. OC, they remember the bad predictions of the others.

This pandemic may be politically bad for Trump, but I am afraid that it will also be good for the science deniers in America.

Reply ↓
- Andrew on April 14, 2020 12:17 PM at 12:17 pm said:
  
  Bill:
  
  Interesting point about the word “peak” implying not just a maximum, but a fast increase followed by a fast decline. We speak of the peak of a mountain but not the peak of a hill.
  
  Reply ↓
  - jim on April 14, 2020 12:59 PM at 12:59 pm said:
    
    I’ve been very happy on many occasions to reach the top of a mere hill and start down the other side! :)
    
    Reply ↓
- Carlos Ungil on April 14, 2020 1:00 PM at 1:00 pm said:
  
  > Lo and behold, a few days later we have a new projection of only 60,000 deaths (!).
  
  That was one week ago. 60,415 became 61,545 on Friday and 68,841 yesterday.
  
  Reply ↓
  - jim on April 14, 2020 2:18 PM at 2:18 pm said:
    
    Converging to actual value by end of pandemic! :)
    
    Seriously though I think people appreciate that there can be variation in projections. Of course some people will try to exploit that variation as a “failure” of models, but if it converges as the pandemic continues, then the general population I think will remember that.
    
    Reply ↓
- yyw on April 14, 2020 1:30 PM at 1:30 pm said:
  
  If a rise in science denialism can force soft sciences (soft in the sense that a pretender would not be slapped in the face fast and hard by validation like they usually would in hard science) to get better, it will be well worth it. Not going to happen though.
  
  Reply ↓
Anoneuoid on April 14, 2020 1:42 PM at 1:42 pm said:

> it is quite possible that the rate of new infections in New York will level off by the end of April or the middle of May, and will go down later.

New tests, cases, and deaths have all been dropping in NYC for the last 4 days. The “experts” have all been wrong on smoking too. Reports from China are that it will come out that they were also wrong about vitamin C.

I really don’t see what they have gotten right.

Reply ↓
- yyw on April 14, 2020 2:47 PM at 2:47 pm said:
  
  Experts have made many mistakes in this crisis. Some are unforgivable, such as CDC’s incompetence in test kit preparation and FDA/CDC’s lack of urgency all throughout February. Some are unavoidable given the lack of prior knowledge on this virus and good data. What’s avoidable is a lack of humility and nuance in expert opinions. Discussion of uncertainty, cost, risk, benefits, etc. is sorely lacking in media.
  
  Reply ↓
  - Anoneuoid on April 14, 2020 2:59 PM at 2:59 pm said:
    
    Currently there is a bunch of antibody testing data being hidden from the public.
    
    Q: When do you think you’re going to have your first surveillance data that can answer the big questions about the percentage of the population that is asymptomatic or presymptomatic?
    A: I can’t disclose the data, but we’ve got results for Seattle for March, and we’ll have results next week for New York City for the last week of March.
    
    Q: Why don’t you reveal your data next week?
    A: We’re cautious because blood donors are not a representative sample. They are asymptomatic, afebrile people [without a fever]. We have a “healthy donor effect.” The donor-based incidence data could lag behind population incidence by a month or 2 because of this bias.
    
    Q: Infected people often do not have detectable SARS-CoV-2 in their blood, and so far, only viral nucleic acid—not infectious virus—has been found and there have been no reports of transfusion-related transmissions. Is SARS-CoV-2 contamination of the blood supply a concern?
    
    A: At this point, it’s theoretical, and the Food and Drug Administration is recommending against screening either blood or tissue donors with laboratory testing for SARS-CoV-2 RNA. The FDA is very concerned that there’s not an overreaction to the blood safety risk. And I agree with them.
    
    https://www.sciencemag.org/news/2020/04/unprecedented-nationwide-blood-studies-seek-track-us-coronavirus-spread
    
    Reply ↓
    - Kaiser on April 15, 2020 3:13 PM at 3:13 pm said:
      
      Evidence is starting to emerge that antibody testing may be false hope at this stage. Here is the announcement of a Telluride experiment in a very hopeful light in the Atlantic.
      
      https://www.theatlantic.com/science/archive/2020/03/coronavirus-tests-everyone-tiny-colorado-county/608590/
      
      Here is the latest report that they effectively killed the experiment.
      
      https://www.businessinsider.com/coronavirus-antibody-tests-colorado-stalled-new-york-lab-2020-4
      
      The official “reason” was labs were too backed up to continue. If you read till the end, you might just realize that the experiment was early-stopped probably for inefficacy. They took over 4,000 blood samples. So far, 1,900 had returned results. They found… 11 positives. Yes, you read that right: eleven.
    - Zhou Fang on April 17, 2020 8:28 AM at 8:28 am said:
      
      Doesn’t that more likely just mean the population prevalence of the virus amongst the sample is pretty low? This was much closer to a random sample than say, hospital symptomatic patients, so a prevalence of about 0.5% in that sample looks like what you’d expect.
    - Zhou Fang on April 17, 2020 8:30 AM at 8:30 am said:
      
      (There’s a reasonable argument for discontinuation on that basis though, since about 5000 is enough for prevalence data, and you’d probably need a lot more on the basis of that prevalence to evaluate covariate associations.)
- yyw on April 14, 2020 2:50 PM at 2:50 pm said:
  
  Hopefully we’ll see if wearing masks is the right move by the end of this. Just yesterday an infectious disease expert from JHU was still arguing against the public wearing masks.
  
  Reply ↓
  - Nick Adams on April 14, 2020 5:37 PM at 5:37 pm said:
    
    The evidence for mask efficacy is well summarised at cebm.net (an excellent resource from the Oxford Centre for Evidence Based Medicine). It’s pretty sketchy.
    
    Reply ↓
    - jim on April 14, 2020 5:55 PM at 5:55 pm said:
      
      I don’t find any relevant information regarding public mask use at that site. Could you point to a specific page?
      
      In most of the Asian countries that are succeeding in controlling the virus, masks are almost universally used by the general public. Of course, this doesn’t mean it’s certain that masks are slowing the spread, but it’s hard to see why they shouldn’t be used.
      
      Whatever the case, for my money the burden of proof is on people advising against use. They should provide strong evidence *against* efficacy if that’s the advice they’re going to give out.
    - Nick Adams on April 14, 2020 6:12 PM at 6:12 pm said:
      
      Very little mask use in Australia but virus is under control due to social distancing.
      The mask stuff is in the review of PPE (personal protective equipment).
      We don’t have enough PPE here for hospital staff let alone the general public.
    - jim on April 15, 2020 11:09 AM at 11:09 am said:
      
      From one of the studies linked in the PPE review:
      
      “When both house-mates and an infected household member wore facemasks the odds of further household members becoming ill may be modestly reduced by around 19%”
      
      That’s effective enough. Hardly a definitive study, but with the balance of info showing no harm, WYGTL?
      
      Not too sure what’s up with the general availability of masks there. Here in WA no one is complaining about a current shortage. I had no trouble getting a few KN95s off eBay last week. Although who knows what that means.
- Brent Hutto on April 17, 2020 8:50 AM at 8:50 am said:
  
  The public health authorities/experts have dealt with the marked dropoff in NYC that Anoneuoid commented on three days ago. That’s why they promptly inserted a bolus of 2,500 or so “probable” deaths into the count at that point. Those folks at this point have a vested interest in keeping the death and case count at panic levels as long as possible.
  
  Reply ↓
  - Carlos Ungil on April 17, 2020 8:58 AM at 8:58 am said:
    
    Do you think that the total number of deaths they report (20825 from March 11 to April 15) is made up and the true number is much lower?
    
    Reply ↓
    - Brent Hutto on April 17, 2020 9:14 AM at 9:14 am said:
      
      It depends on which deaths you are counting.
      
      I think they were counting them by one definition for a while then they chose a different set of deaths to count because the higher numbers suited them.
    - Carlos Ungil on April 17, 2020 10:12 AM at 10:12 am said:
      
      That’s the total number of deaths, currently classified as 7563 confirmed COVID-19, 3914 probable COVID-19 and 9348 not known to be confirmed or probable COVID-19.
      
      Do you think reporting 7563 COVID-19 and 13262 non-COVID-19 would be better?
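
A side note on the Telluride antibody numbers quoted earlier in this thread (11 positives among roughly 1,900 returned results): the "about 0.5%" prevalence figure can be checked with a few lines of Python. This is only a sketch. It uses a Wilson score interval, which is my choice rather than anything the commenters specified, and it assumes a simple random sample and a perfect test, neither of which is established here.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = positives / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts as quoted in the thread; sampling error only, no test-error adjustment.
positives, n = 11, 1900
p_hat = positives / n
low, high = wilson_interval(positives, n)
print(f"point estimate: {100 * p_hat:.2f}%")
print(f"approx. 95% interval: {100 * low:.2f}% to {100 * high:.2f}%")
```

With so few positives, imperfect test specificity could matter as much as sampling error, so the interval should be read as a lower bound on the real uncertainty.
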
Bill Spight on April 14, 2020 8:04 PM at 8:04 pm said:

If the main purpose for wearing a mask in public is to avoid passing on the virus to others, through the small droplets that we emit, even when talking, then simple cloth masks of the sort that people wear in Japan when they have a cold are well worth wearing. :)

Reply ↓
- Martha (Smith) on April 15, 2020 1:05 AM at 1:05 am said:
  
  My understanding is that the purpose of wearing such masks is to potentially protect both the wearer and others nearby. Of particular concern is people who have the virus but are asymptomatic — they can inadvertently spread the virus to others. But also people who do not have the virus can reduce their risk of getting the virus.
  
  Also relevant: Austin is now under an order from the County Health Authority that “… face coverings must be worn when in a public building, using public transportation or ride shares, pumping gas and while outside when six-feet of social distancing cannot be maintained. Masks are not required when eating or when wearing one would pose a health, safety or security risk.” There were in fact a lot of people wearing masks at the grocery store yesterday.
  
  Also of possible interest: the CDC has a website (https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/diy-cloth-face-coverings.html) about making and using simple cloth masks.
  
  Reply ↓
Jordan on April 15, 2020 11:10 AM at 11:10 am said:

IHME calls those bands “uncertainty intervals” and it’s not clear to me they’re meant as literal 95% confidence intervals. In any case it doesn’t seem right to treat them that way for next-day projections. What I would say is that the IHME folks are positing that deaths/day will be

F(t) + noise(t)

where F is drawn from a certain class of curves (originally exp(quadratic), which is problematic for reasons addressed by others in this thread) and noise can be pretty violent. I think their uncertainty intervals are reflecting some kind of confidence interval *concerning F*. But given the wild day-to-day swings in reported deaths, I don’t think you can or should expect those numbers to land in the interval 95% of the time.

I think it would be more informative to ask: how often are deaths from day t to day t+7 within the band specified on day t-1?
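
To make the distinction concrete, here is a small simulation sketch in Python. Everything in it is hypothetical: the curve, the band width, the noise level, and the assumption that reporting noise is independent from day to day. The band is built around the true F(t), so it is "right" about F by construction. The sketch then asks how often single-day counts land inside the band, and how often 7-day totals land inside the summed band, which is one possible reading of the weekly check proposed above.

```python
import math
import random

random.seed(0)

DAYS = 60          # length of each simulated epidemic
REPS = 200         # number of simulated epidemics
REL_BAND = 0.15    # hypothetical half-width of the band around F(t)
NOISE_SD = 0.30    # hypothetical log-scale sd of day-to-day reporting noise


def curve(t):
    """Hypothetical smooth curve F(t) of the exp(quadratic) type."""
    return math.exp(3.0 + 0.25 * t - 0.004 * t * t)


f = [curve(t) for t in range(DAYS)]
lower = [(1 - REL_BAND) * x for x in f]
upper = [(1 + REL_BAND) * x for x in f]

daily_hits = daily_total = weekly_hits = weekly_total = 0
for _ in range(REPS):
    # Observed deaths = F(t) times mean-one lognormal noise (assumed independent).
    obs = [x * math.exp(random.gauss(-NOISE_SD**2 / 2, NOISE_SD)) for x in f]
    for t in range(DAYS):
        daily_hits += lower[t] <= obs[t] <= upper[t]
        daily_total += 1
    for t in range(DAYS - 6):
        week = slice(t, t + 7)
        weekly_hits += sum(lower[week]) <= sum(obs[week]) <= sum(upper[week])
        weekly_total += 1

daily_cov = daily_hits / daily_total
weekly_cov = weekly_hits / weekly_total
print(f"daily coverage:  {daily_cov:.2f}")
print(f"weekly coverage: {weekly_cov:.2f}")
```

Under these assumptions the weekly totals fall inside the band much more often than the daily counts do, because summing over a week averages out much of the noise. Real reporting noise has day-of-week patterns and is not independent, so the size of the gap here is illustrative only.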

Reply ↓
Kaiser on April 15, 2020 3:27 PM at 3:27 pm said:

I’m surprised we’re so harsh with this group in terms of forecast accuracy. It’s just a hard if not impossible problem. I’d like to applaud them for putting out enough data so that this type of analysis can be performed. Years ago, before Google Flu Trends was mercifully killed, you couldn’t get the data from the website. The forecasts were plotted against a qualitative axis (something like “high”, “low”, I don’t recall the exact words).

Reply ↓
Carlos Ungil on April 17, 2020 3:00 AM at 3:00 am said:

Caution Warranted: Using the Institute for Health Metrics and Evaluation Model for Predicting the Course of the COVID-19 Pandemic

https://annals.org/aim/fullarticle/2764774/caution-warranted-using-institute-health-metrics-evaluation-model-predicting-course

Reply ↓
Carlos Ungil on April 17, 2020 2:57 PM at 2:57 pm said:

Influential Covid-19 model uses flawed methods and shouldn’t guide U.S. policies, critics say

https://www.statnews.com/2020/04/17/influential-covid-19-model-uses-flawed-methods-shouldnt-guide-policies-critics-say/

(the article discusses different criticisms of the IHME model, including the one discussed in this blog entry)

Reply ↓
Nate Wilairat on April 17, 2020 3:57 PM at 3:57 pm said:

The major issue in this model even for a few time steps ahead is not the assumption of stationarity in the second derivative, but the assumption for the derivative of log deaths. The ERF assumes linearity in this derivative until near 0, i.e., linear decay after social distancing. There’s strong evidence that in many countries the decay after social distancing is exponential decay, i.e., linear decay of the log derivative of log deaths. The amount of noise in this trend for Italy and some other countries is surprisingly low.

A quick simulation of an SEIR model also suggests that an intervention that reduces R to below 1 results in a trend in the log derivative of recovered (assuming fatalities are proportional to recovered) that is quickly close to linear and is indeed asymptotically linear in time.

Linear rather than exponential decay results in a bad fit to the data that is overly optimistic, which explains the sharp decline in deaths in all of these curves. As the exponential decline continues, these predictions will look increasingly off. You can see that already in some Italian provinces where every day it predicts a massive drop-off. I’m really surprised that this thing is still up in its current state.
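
For readers who want to try the SEIR check described above, a minimal Python sketch follows. It is not the simulation the commenter ran. All parameter values are hypothetical, the integration is plain Euler stepping, and "log derivative of recovered" is read here as the log of the daily flow into the recovered compartment, which is an assumption about the wording. The sketch compares the late-time daily slope of that log flow with the dominant eigenvalue of the linearized E-I system, which is the slope one would expect if the decay is exponential.

```python
import math

# All values below are hypothetical and chosen only for illustration.
N = 1e7                       # population size
SIGMA = 0.25                  # 1 / incubation period (per day)
GAMMA = 0.20                  # 1 / infectious period (per day)
R_BEFORE, R_AFTER = 2.5, 0.7  # reproduction number before / after intervention
T_INTERVENE = 30              # day on which the intervention starts
DAYS, DT = 100, 0.1           # simulated days and Euler step (days)

s, e, i = N - 100.0, 100.0, 0.0
log_flow = []  # log of the flow into "recovered" at the end of each day
for day in range(DAYS):
    beta = GAMMA * (R_BEFORE if day < T_INTERVENE else R_AFTER)
    for _ in range(round(1 / DT)):
        new_exposed = beta * s * i / N * DT
        new_infectious = SIGMA * e * DT
        new_removed = GAMMA * i * DT
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
    log_flow.append(math.log(GAMMA * i))

# Day-over-day change in the log flow, well after the intervention.
late_slopes = [log_flow[d + 1] - log_flow[d] for d in range(60, DAYS - 1)]

# Dominant eigenvalue of the linearized (E, I) system with S/N close to 1.
beta_after = GAMMA * R_AFTER
asymptotic_slope = (-(SIGMA + GAMMA)
                    + math.sqrt((SIGMA - GAMMA) ** 2 + 4 * SIGMA * beta_after)) / 2

print(f"late slopes range: {min(late_slopes):.4f} to {max(late_slopes):.4f}")
print(f"asymptotic slope:  {asymptotic_slope:.4f}")
```

In this toy setup the log of the removal flow settles onto a straight line in time, so the flow itself decays exponentially rather than linearly. Whether real death counts behave the same way depends on the assumption, stated in the comment above, that fatalities are proportional to recoveries.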

Reply ↓

Statistical Modeling, Causal Inference, and Social Science

66 thoughts on “Hey! Let’s check the calibration of some coronavirus forecasts.”
