
The second derivative of the time trend on the log scale (also see P.S.)

Peter Dorman writes:

Have you seen this set of projections? It appears to have gotten around a bit, with citations to match, and IHME Director Christopher Murray is a superstar. (WHO Global Burden of Disease) Anyway, I live in Oregon, and when you compare our forecast to New York State it gets weird: a resource use peak of April 24 for us and already April 8 for NY. This makes zero sense, IMO.

I looked briefly at the methodological appendix. This is a top-down, curve-fitting exercise, not a bottom-up epi model. They fit three parameters on a sigmoid curve, with the apparent result that NY, with its explosion of cases, simply appears to be further up its curve. Or, which amounts to the same thing, the estimate for the asymptotic limit is waaaay underinformed. These aren’t the sort of models I have worked with in the past, so I’m interested in how experienced hands would view it.

I have a few thoughts on this model. First, yeah, it’s curve-fitting, no more and no less. Second, if they’re gonna fit a model like this, I’d recommend they just fit it in Stan: the methodological appendix has all sorts of fragile nonlinear-least-squares stuff that we don’t really need any more. Third, I guess there’s nothing wrong with doing this sort of analysis, as long as it’s clear what the assumptions are. What the method is really doing is using the second derivative of the time trend on the log scale to estimate where we are on the curve. Once that second derivative goes negative, so the exponential growth is slowing, the model takes this as evidence that the rate of growth on the log scale will rapidly continue to go toward zero and then go negative. Fourth, yeah, what Dorman says: you can’t take the model for the asymptotic limit seriously. For example, in that methodological appendix, they say that they use the probit (“ERF”) rather than the logit curve because the probit fits the data better. That’s fine, but there’s no reason to think that the functional form at the beginning of the spread of a disease will match the functional form (or, for that matter, the parameters of the curve) at later stages. It really is the tail wagging the dog.

In summary: what’s relevant here is not the curve-fitting model but rather the data that show a negative second derivative on the log scale—that is, a decreasing rate of increase of deaths. That’s the graph that you want to focus on.
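To make "the second derivative on the log scale" concrete, here is a minimal Python sketch. It is not the IHME procedure, just the finite-difference calculation described above. The function name is hypothetical and the death counts are made up for illustration.

```python
import math

def log_second_differences(cumulative):
    """Return first and second differences of log(cumulative counts).

    The first differences are daily growth rates on the log scale;
    negative second differences mean that growth rate is falling.
    """
    logs = [math.log(c) for c in cumulative]
    first = [b - a for a, b in zip(logs, logs[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return first, second

# Hypothetical cumulative death counts, invented for this example only.
deaths = [100, 130, 166, 206, 247, 287]
growth, curvature = log_second_differences(deaths)
print([round(g, 3) for g in growth])     # daily log growth rates
print([round(c, 3) for c in curvature])  # all negative: growth is slowing
```

With pure exponential growth the second differences would be zero. The curve-fitting model, in effect, extrapolates from how negative they have become.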

Relatedly, Mark Tuttle points to this news article by Joe Mozingo that reports:

Michael Levitt, a Nobel laureate and Stanford biophysicist, began analyzing the number of COVID-19 cases worldwide in January and correctly calculated that China would get through the worst of its coronavirus outbreak long before many health experts had predicted. Now he foresees a similar outcome in the United States and the rest of the world. While many epidemiologists are warning of months, or even years, of massive social disruption and millions of deaths, Levitt says the data simply don’t support such a dire scenario — especially in areas where reasonable social distancing measures are in place. . . .

Here’s what Levitt noticed in China: On Jan. 31, the country had 46 new deaths due to the novel coronavirus, compared with 42 new deaths the day before. Although the number of daily deaths had increased, the rate of that increase had begun to ease off. In his view, the fact that new cases were being identified at a slower rate was more telling than the number of new cases itself. It was an early sign that the trajectory of the outbreak had shifted. . . .

Three weeks later, Levitt told the China Daily News that the virus’ rate of growth had peaked. He predicted that the total number of confirmed COVID-19 cases in China would end up around 80,000, with about 3,250 deaths. This forecast turned out to be remarkably accurate: As of March 16, China had counted a total of 80,298 cases and 3,245 deaths . . .

[Not really; see P.S. below. — AG]

Now Levitt, who received the 2013 Nobel Prize in chemistry for developing complex models of chemical systems, is seeing similar turning points in other nations, even those that did not instill the draconian isolation measures that China did.

He analyzed data from 78 countries that reported more than 50 new cases of COVID-19 every day and sees “signs of recovery” in many of them. He’s not focusing on the total number of cases in a country, but on the number of new cases identified every day — and, especially, on the change in that number from one day to the next. . . .

The news article emphasizes that trends depend on behavior, so they’re not suggesting that people stop with the preventive measures; rather, the argument is that if we continue on the current path, we’ll be ok.

Tuttle writes:

An important but subtle claim here is that the noise in the different sources of data cancels out. To be exact, here’s the relevant paragraph from the article:

Levitt acknowledges that his figures are messy and that the official case counts in many areas are too low because testing is spotty. But even with incomplete data, “a consistent decline means there’s some factor at work that is not just noise in the numbers,” he said. In other words, as long as the reasons for the inaccurate case counts remain the same, it’s still useful to compare them from one day to the next.

OK, a few thoughts from me now:

1. I think Mozingo’s news article and Levitt’s analysis are much more clear than that official-looking report with the fancy trend curves. [Not really; see P.S. below. — AG] That said, sometimes official-looking reports and made-up curves get the attention, so I guess we need both approaches.

2. The news article overstates the success of Levitt’s method. It says that Levitt predicted 80,000 cases and 3,250 deaths, and what actually happened was 80,298 cases and 3,245 deaths. That’s too close. What I’m saying is, even if Levitt’s model is wonderful [Not really; see P.S. below. — AG], he got lucky. Sports Illustrated predicted the Astros would go 93-69 this year. Forgetting questions about the shortened season etc., if the Astros actually went 97-65 or 89-73, we’d say that the SI prediction was pretty much on the mark. If the Astros actually went 93-69, we wouldn’t say that the SI team had some amazing model; we’d say they had a good model and they also got a bit lucky.

3. What to do next? More measurement, at the very least, and also organization for what’s coming next.

P.S. Commenter Zhou Fang points us to this document which appears to collect Michael Levitt’s reports from 2 Feb through 2 Mar. Levitt’s forecasts change over time:

2 Feb [305 deaths reported so far]: “This suggests by linear extrapolation that the number of new deaths will decrease very rapidly over the next week.”

5 Feb [492 deaths reported so far]: “Linear extrapolation, which is not necessarily applicable, suggests the number of new deaths will stop growing and start to decrease over the next week.”

7 Feb [634 deaths reported so far]: “This suggests that the rate of increase in the number of deaths will continue to slow down over the next week. An extrapolation based on the sigmoid function . . . suggests that the number of deaths will not exceed 1000 and that it will exceed 95% of this limiting value on 14-Feb-2020.”

9 Feb [813 deaths reported so far]: “An extrapolation based on the sigmoid function . . . suggests that the number of deaths may not exceed 2000 . . .”

12 Feb [1111 deaths reported so far]: “An extrapolation based on the sigmoid function . . . suggests that the number of deaths should not exceed 2000 . . .”

13 Feb [1368 deaths reported so far]: “This together with the data on Number of Cases in (D) suggests that the rate of increase in the number of deaths and cases will continue to slow down over the next week . . .”

17 Feb [1666 deaths reported so far]: “This suggests that the Total Number of Hubei Deaths could reach 3,300 . . . Note that this analysis is based only on Laboratory Confirmed Cases and does not include the 17,000 Clinically Diagnosed Cases.”

21 Feb [2129 deaths reported so far]: “Better analysis in Fig. 4 gives asymptotic values of 64,000 and 3,000 for Number of Cases and Deaths, respectively.”

23 Feb [2359 deaths reported so far]: “final estimate” of 3,030 deaths

2 Mar [2977 deaths reported so far]: “asymptotic values” of 3,150 deaths in Hubei and 190 for the rest of the country.

Some of the above numbers are Hubei, others are for all of China, but in any case it seems that Zhou Fang is right that the above-linked news article is misleading, and Levitt’s predictions were nothing special, just “overfitting and publication bias.”
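One rough way to see the drift is to tabulate each projected ceiling against the deaths already reported on that date. The Python sketch below uses only the numbers quoted above. As noted, some entries refer to Hubei and others to all of China, so treat it as an illustration rather than a careful comparison. The function name is hypothetical.

```python
# (date, deaths reported so far, projected ceiling on deaths), copied from the
# list above. Entries mix Hubei-only and all-China figures, so this is rough.
reports = [
    ("7 Feb", 634, 1000),
    ("9 Feb", 813, 2000),
    ("12 Feb", 1111, 2000),
    ("17 Feb", 1666, 3300),
    ("21 Feb", 2129, 3000),
    ("23 Feb", 2359, 3030),
    ("2 Mar", 2977, 3150 + 190),  # Hubei plus the rest of the country
]

def ceiling_ratios(rows):
    """Ratio of each projected ceiling to the deaths already reported."""
    return [(date, ceiling / so_far) for date, so_far, ceiling in rows]

for date, ratio in ceiling_ratios(reports):
    print(f"{date:>6}: projected ceiling is {ratio:.2f}x deaths so far")
```

The 7 Feb ceiling of 1,000 had already been passed by the 1,111 deaths reported on 12 Feb, and the projected ceiling more than tripled between 7 Feb and 2 Mar.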

115 Comments

  1. Elio Campitelli says:

    If the actual observed quantity is the new number of cases, then wouldn’t we be talking about the first derivative?

  2. Ben Goodrich says:

    Just pointing out that rstanarm has a function, stan_nlmer, that can estimate models like this

    http://mc-stan.org/rstanarm/reference/stan_nlmer.html

    No one ever asks about it on Discourse, so I assume it is not used very much. Using MCMC seems to be much more reliable for models like this than using maximum likelihood, even if the journal you are targeting will not publish Bayesian analyses. If anyone has questions about using it, post on Discourse.

    • Andrew says:

      Ben:

      Yeah, I think people have the sense that maximum likelihood or least squares is this rigorous, well-defined thing, and that Bayesian inference is flaky. The idea is that when using Bayesian inference you’re making extra assumptions and you’re trading robustness for efficiency.

      Actually, though, Bayesian inference can be more robust than classical point estimation. They call it “regularization” for a reason!

      • Terry says:

        This is all possible now because of the increase in computing power? This wasn’t possible thirty years ago?

        Of course, Maximum Likelihood on this scale wasn’t possible thirty years ago either.

  3. Rahul says:

    Being overawed by Levitt’s success is a bit like picking the best mutual fund manager based on the last 5 years of historical performance.

    There’s bound to be an element of regression to the mean here?

    • jim says:

      “There’s bound to be an element of regression to the mean here?”

      That’s what Andrew means when he says Levitt “got lucky”.

      Probably there will be a “regression to the mean” but OTOH what’s the spread of possibilities given his model? Dunno, right? So maybe “regression to the mean” will be within 5%-10%, which would be a damned fine prediction under the circumstances.

      Not to say the dude is a god or anything but he did make a good prediction so we should give him credit for that and keep an eye on the next round.

    • jim says:

      RE: Zhou Fang’s update: so not quite a regression to the mean I guess!

  4. Zhou Fang says:

    there is probably a decent basis for this sort of curve fitting procedure when you are talking about an epidemic in a single location (Wuhan) under a fixed policy of social distancing and isolation. When extended to locations more broadly with widely different environments and regulations such as the US and also even more so the world as a whole, I suspect that this sort of procedure will be much less effective.

    The second derivative might temporarily be negative because of restrictions in one location but when it spreads to a location with poor distancing then the curve could bend upwards again.

    • Absolutely. I think that’s what we’re seeing in Italy right now. see the past few days of the italy curve here:

      https://ourworldindata.org/grapher/daily-covid-deaths-3-day-average

      I suspect we hit a local peak for some region march 22,23 or so, and it started to decline, and then some other region took over and it continued to climb… and when I say region this could even be individual cities.

      • Anoneuoid says:

        That’s an interesting feature, I wish we had number of tests over that period but the latest I saw for Italy was March 9th.

        Also, what did Iran do to stop the acceleration in deaths?

        • Rahul says:

          Reporting changes?

        • My Iranian friends mostly think they just stopped reporting them.

          • Anoneuoid says:

            Actually, if you add China and zoom out the date axis, Iran looks a lot like a lagged China.

            • Apparently there’s evidence that the number of cemetery urns being delivered to Hubei is far larger than the reported death numbers. So maybe they’re both similar for the same reasons?

              :-(

              • Anoneuoid says:

                That isn’t a good proxy because the bodies are being cremated rather than buried more frequently.

              • Right, they’re all being cremated, and there’s maybe 5000 to 10000 urns being delivered… so it’s likely a lot more people died than the ~3000 official numbers.

              • Anoneuoid says:

                “Right, they’re all being cremated, and there’s maybe 5000 to 10000 urns being delivered… so it’s likely a lot more people died than the ~3000 official numbers.”

                It is more likely it makes business sense to buy in bulk or something (is there a current subsidy to take advantage of?). I really don’t see any reason to read into that number.

              • I think you’re missing the fact that it seems likely to be *urns full of ashes* not empty urns for later use.

                But I agree with Zhou below that it’s hard to know.

              • Anoneuoid says:

                “I think you’re missing the fact that it seems likely to be *urns full of ashes* not empty urns for later use.

                “But I agree with Zhou below that it’s hard to know.”

                This type of proxy made sense when all we knew was from youtube videos out of China back in January. There is no reason to rely on low quality information like that any more.

              • So you believe the government of china is completely honest about its situation?

              • Zhou Fang says:

                Daniel:

                FWIW, I think they are more likely than not honest. Not to say the numbers are correct, but rather errors are born out of e.g. failure to test enough people, than say, some guy just making up numbers. Their behaviour in terms of lifting the Wuhan lockdown, and the lack of recent exported cases from China in locations without a travel restriction, etc all tend to corroborate the figures *qualitatively*, if not in terms of exact quantity.

              • Anoneuoid says:

                “So you believe the government of china is completely honest about its situation?”

                Another strawman.

                But actually I trust the Chinese government over other governments at this point only because they are actually publishing smoking data and allowing guidelines that include vitamin C. I wouldn’t trust any government on this until they do that.

              • Anoneuoid says:

                Three weeks ago, the paramedics said, most coronavirus calls were for respiratory distress or fever. Now the same types of patients, after having been sent home from the hospital, are experiencing organ failure and cardiac arrest.

                “We’re getting them at the point where they’re starting to decompensate,” said the Brooklyn paramedic, who is employed by the Fire Department. “The way that it wreaks havoc in the body is almost flying in the face of everything that we know.”

                This is vitamin C deficiency. I haven’t seen a single person tested for it yet.

              • Anoneuoid, it was a very straightforward question, not a straw man, I really don’t know what to make of China. I personally think they probably vastly underreported deaths, in the same way that Italy probably has, because after the lockdowns people died in their homes. Since no tests were necessarily done, perhaps they don’t meet the official reporting criteria etc. My personal guess is that rather than the ~ 3k the true toll was probably 2 to 5 times higher.

                Also, i’ve always thought the vitamin C idea was very interesting. Apparently it’s being tried in new york:

                https://nypost.com/2020/03/24/new-york-hospitals-treating-coronavirus-patients-with-vitamin-c/

              • Anoneuoid says:

                “Anoneuoid, it was a very straightforward question, not a straw man, I really don’t know what to make of China. I personally think they probably vastly underreported deaths, in the same way that Italy probably has, because after the lockdowns people died in their homes. Since no tests were necessarily done, perhaps they don’t meet the official reporting criteria etc. My personal guess is that rather than the ~ 3k the true toll was probably 2 to 5 times higher.

                “Also, i’ve always thought the vitamin C idea was very interesting. Apparently it’s being tried in new york:

                https://nypost.com/2020/03/24/new-york-hospitals-treating-coronavirus-patients-with-vitamin-c/”

                All governments lie all the time; it’s ridiculous to think the Chinese government is honest about that or anything else.

                It is good that in NY patients are getting IV vitamin C but are they making sure it is correcting the deficiency? In China they are giving 10x more: https://www.youtube.com/watch?v=NBbbncTR-3k

                I haven’t seen any data from either place on whether this is sufficient to maintain the patients in the normal range.

              • Zhou Fang says:

                I’d be very cautious using that as a datapoint – we don’t know enough about the baseline level of deaths (for a city of 11 million during a season that usually sees peak deaths that can be a lot), and the assumption of uniformity amongst the various crematoriums given the non-uniformity of the disease and deaths in general makes just multiplying by the number of sites and the time period a rather heroic exercise in extrapolation.

              • Dan F. says:

                I would be very cautious about taking this seriously for a number of reasons.

                From a Bayesian point of view the fact that this “fact” appears mainly in the US media is a strong prior against it. Also it is difficult to see what motivation the Chinese government would have to suppress death figures.

                Any crematory periodically stocks up on urns (in a city the size of Wuhan there must be 200-300 deaths a day in normal conditions – if its crematories serve a wider area, the number could be much higher). A crematory subjected to unusual pressure, as any in Wuhan must have been recently, possibly needs to stock up unexpectedly (normally the number of deaths in a big city is subject to very little variance). Either standard operating procedure or reaction to the current situation could explain a large shipment of urns in a quite mundane manner. The question then becomes whether there is any evidence for regular and repeated stocking of unusual numbers of urns. While this may in fact be happening, the few photos published on Chinese websites do not seem to constitute evidence of this.

              • Sure, but just as in Italy, there’s every reason to believe that the official deaths due to the disease are biased low. The shipment of an excessive quantity of urns relative to expected normal mortality, and the official numbers, is evidence that is consistent with the idea that real fatalities were dramatically higher than official ones. It’s also consistent with other hypotheses. It’s evidence against for example the idea that say total fatalities were depressed by reduced traffic and pollution.

                None of it is strong evidence however.

            • jrkrideau says:

              “That isn’t a good proxy because the bodies are being cremated rather than buried more frequently.”

              It strikes me that people die all the time. If one is in a state of complete lock-down, bodies will pile up.

              We may be seeing Covid-19 + regular mortality urns.

              • Anoneuoid says:

                A source with knowledge within provincial civil affairs bureau claimed that Wuhan had 28,000 cremations within just one month. The insider added that many of the fatalities were people that had died at home and had neither been diagnosed with nor treated for the virus.

                https://www.inquisitr.com/5970405/wuhan-death-toll-42000/

                This makes sense, they were welding people into apartment buildings and calling people who tested positive criminals. We do not want this hysteria here.

              • Sure, but what is normal mortality for the region? 10M people, I’d guess on the order of 100k/yr so 2 mo would be ~16k not ~40k

                It seems reasonable that a disease like this would double the mortality rate in a region for 2 mo. So I think what’s really happening is no effort at all was spent on determining cause of death at home and only people who died in hospitals who were there at the time of lockdown were counted. The true toll was 5-10x the official one.

              • Zhou Fang says:

                Daniel: Wuhan is also a regional capital for Hubei (population 59 million) so it’s possible city crematorium facilities also service people in the surrounding area. Further ~40k is based on taking some facilities on some days and multiplying, there’s likely a reporting bias there. Too many unknowns.

        • Rahul says:

          Anoneuoid:

          Since you seem good about evaluating new data: have you seen the interview by the Israeli doctor in Italy who says that one of the new things they discovered is that laying patients on their stomachs while on the ventilator improved respiratory outcomes 100%?

          Any comments on that?

          • Carlos Ungil says:

            https://www.atsjournals.org/doi/abs/10.1164/rccm.202003-0527LE

            “A surprising finding that alternating body position is followed with increased lung recruitability is interesting but needs to be confirmed. The improvement in oxygenation at prone positioning was not statistically significant but seemed to be clinically relevant.”

            • Rahul says:

              Thanks!

              At this point we are grasping for straws.

              But I’ll take clinical anecdotal evidence over an RCT for now.

              That’s one of the issues coming up: I’d rather regional variations on treatment be tried, even if some prove ineffective or damaging, than stay with a uniform standard of care.

              If we experiment we risk doing worse but also chance discovering something awesome.

              • Anoneuoid says:

                I didn’t hear that but I’m pretty sure I read like a month ago that aspirating mucus and laying in the prone position halved the mortality rate somewhere in China.

  5. Martha (Smith) says:

    “there’s no reason to think that the functional form at the beginning of the spread of a disease will match the functional form (or, for that matter, the parameters of the curve) at later stages. It really is the tail wagging the dog.”

    +1

  6. Bill Spight says:

    The following video from MinutePhysics (Henry Reich) may be of interest. One point he makes is not to plot new confirmed cases versus time, but versus total confirmed cases. On a log-log graph, with exponential growth you get a straight line. The deviations of China and South Korea are obvious, while almost every other place is on the same line. He also smoothes the data with a moving average.

    https://www.youtube.com/watch?v=54XLXg4fYsc&frags=pl%2Cwn

    • Carlos Ungil says:

      If you just want to recognize exponential growth it may be easier to just plot the daily percentage increase (which is simply new cases as a proportion of previous cases). With exponential growth you get a horizontal line (constant rate of growth) above zero.

      I like log-log plots as much as the next physicist, but this may not be the best place to use them (in particular when the data doesn’t really mean much to start with, given the continuous changes in testing capacity and policies and the lack of comparability between countries).
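The daily percentage increase described in this comment can be sketched in a few lines of Python. Both series below are hypothetical, invented for illustration: one grows at exactly 30% per day and the other slows down. The function name is likewise an assumption.

```python
def daily_pct_increase(cumulative):
    """New cases each day as a percentage of the previous day's total."""
    return [100.0 * (today - prev) / prev
            for prev, today in zip(cumulative, cumulative[1:])]

# Hypothetical series: exact 30%-per-day growth, and a slowing outbreak.
exponential = [100 * 1.3 ** d for d in range(6)]
slowing = [100, 130, 163, 196, 225, 248]

print([round(p, 1) for p in daily_pct_increase(exponential)])
print([round(p, 1) for p in daily_pct_increase(slowing)])
```

The first series gives a flat line at 30%, while the second falls from day to day, which is the signal a negative second derivative on the log scale would also pick up.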

    • Tom Passin says:

      That should be a semilog plot of total cases, not a log-log plot. A constant growth rate shows up as a straight line on a semilog plot.

      I have been following the data as published by Johns Hopkins –

      https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

      I have looked at a lot of the curves for various countries. I don’t think that there is much point in trying to do complicated statistical analyses because the data is too varied and there’s not enough. But I have observed some regularities that are interesting.

      1. There are basically two types of countries, those that effectively jumped on it right away, and those that did not. Each group shows similar characteristics, and each group is distinctively different from the other. China is in between the two. Those countries that did not act effectively all ended up with pretty nearly the same growth rate in “confirmed” case count – about a decade in 8 days, which is a bit more than 30% per day.

      2. Many countries, including the US, seem to have had a few cases at the beginning that were well contained. These patients were apparently pretty well isolated. But at some time later, one or several infected people either got out of isolation infectious (perhaps some of the medical staff) or were new arrivals (like the woman in South Korea who infected a large church congregation). This produced a step or steps in the number of cases. After this, in group 2 countries, either the case count increased relatively slowly to reach the 30%/day figure (e.g., the US), or the count immediately shot up and then reduced to that value (e.g., Italy).

      Most or all group 1 countries also had such step events, but they generally discovered them quickly and got back under control.

      Strangely enough, India is somewhat of an exception. Their case count looks like a typical group 2 country except that the steady-state growth rate has been distinctly lower than most of the others.

      I take away a few conclusions from all this:

      1. Even if you have good isolation of patients, it is almost certain that there will be some leakage into the population. So you need to be able to detect that and jump on it.

      2. Some countries’ data may be affected by inaccurate reporting (whether politically motivated or not) or by the intensity of testing, but with an exponential growth most of these things only serve to delay the reported curves by a few days. For example, the recent big jump in the level of testing in New York does show as a small step in the data, but it doesn’t really make much difference after a day or two.

      3. The derivatives are noisy but usable; second derivatives would probably be so noisy as to be useless.
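The "decade in 8 days" figure in this comment can be converted to a daily growth rate and a doubling time with a short calculation. This is a minimal Python sketch assuming constant exponential growth; the function names are hypothetical.

```python
import math

def daily_rate_from_tenfold_time(days_per_decade):
    """Daily growth rate implied by a tenfold rise every `days_per_decade` days."""
    return 10 ** (1.0 / days_per_decade) - 1.0

def doubling_time(daily_rate):
    """Days needed to double at a constant daily growth rate."""
    return math.log(2) / math.log(1.0 + daily_rate)

rate = daily_rate_from_tenfold_time(8)
print(f"{100 * rate:.1f}% per day")
print(f"{doubling_time(rate):.2f} days to double")
```

A tenfold rise in 8 days works out to roughly 33% per day, consistent with "a bit more than 30%", and a doubling time of about 2.4 days.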

      • Brent Hutto says:

        I agree with your take, highly speculative as it must be given the paucity and low quality of the data, and to me one takeaway thought emerges.

        There is a certain fundamental spread of the virus in any functioning society. You can delay it, spread it out, maybe “flatten” it a bit but at some point that rate you’re eyeballing at around 30% per day will emerge.

        It could only be indefinitely suppressed with TOTAL isolation of every infected individual at the very beginning. As soon as you make allowances for real-world things like quarantined elder care facilities requiring health care staff (along with someone to feed them, clean, etc.) then there can’t be long-term TOTAL isolation.

        Any thinking that goes beyond where we’re going to find enough ICU beds next month needs to be in terms of managing this novel disease once it reaches steady-state endemic status. We could take your 30% per day ballpark and get a pretty good idea of when that steady state will happen at a national level, in which case our efforts should be entirely focused (IMO) on managing that transition to spread out the impacts as best we can in time and place.

        What seems to be happening now are locally apocalyptic concentrations in a few geographical areas, embedded in a more manageable steady spread of the virus throughout the entire population.

        • 30% per day *is* apocalyptic (doubling every 2.6 days). We hit half the US population about 55 days after the 100th case at that rate.

          In my opinion what we’ll see is that the very fast acting localities will have an extended period where they seem to have control of it, which may be ending in some places now (like South Korea). Eventually the caseload exceeds the capacity to test and trace. At that point, stronger stay at home orders will be necessary for ~ 2-3 wks to bring the caseload below the test and trace capacity… and then they’ll re-start test and trace.

          It seems like the US test and trace capacity was overwhelmed some time in mid Feb. Then we had about a month where everyone was talking about “we should have more testing and tracing” while the infection was spreading widely in the US. Around the end of Feb I was concerned enough that it took some convincing to let my kids play soccer games… By Mar 4 I had pulled my kids from school. If we had shut the country Mar 4 or so, we’d be like South Korea today. Instead we shut CA Mar 19, 15 days later, and people were going to the beaches and parks… so it probably didn’t really work until about Mar 24, which is about 206 times worse than Mar 4 at the above growth rate.

          I think soon South Korea will do some kind of stronger restriction on movement, work through a caseload bolus, and start reopening in 3 weeks or so with their strong test-and-trace methods.
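The arithmetic in this comment can be checked with a few lines of Python. This is a sketch under the comment's own assumption of constant 30% daily growth. The US population figure of 330 million is a rough value added here for illustration, not something the comment states.

```python
import math

DAILY_GROWTH = 0.30      # 30% per day, from the comment above
US_POPULATION = 330e6    # assumption: rough 2020 US population

# Doubling time at a constant 30% daily growth rate.
doubling_days = math.log(2) / math.log(1 + DAILY_GROWTH)

# Days for 100 cases to reach half the population at that rate.
days_to_half = math.log((US_POPULATION / 2) / 100) / math.log(1 + DAILY_GROWTH)

# Growth over the 20 days from Mar 4 to Mar 24, computed two ways.
factor_compounded = (1 + DAILY_GROWTH) ** 20   # compounding 30% per day
factor_rounded = 2 ** (20 / 2.6)               # using a 2.6-day doubling time

print(f"doubling time: {doubling_days:.2f} days")
print(f"days from 100 cases to half the population: {days_to_half:.1f}")
print(f"20-day factor: {factor_compounded:.0f} (compounded), "
      f"{factor_rounded:.0f} (2.6-day doubling)")
```

The doubling time and the 55-day figure both check out. Compounding exactly 30% per day for 20 days gives a factor of about 190, while the rounded 2.6-day doubling time gives about 207, which appears to be where the 206 comes from.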

          • Brent Hutto says:

            But you’re just shifting the time frame around with those measures. Eventually, the rate will commence more or less that 30% per day rate. Presumably that’s the rate that will obtain until it saturates the population at somewhere in the 50%-80% range having been infected.

            Not every place in this country or the world is going to experience a NY Metro or Lombardy level of crisis for a few weeks or months. But every place in the world is going to eventually experience endemic COVID-19 (and its mutations).

            Whether a location’s experience is like South Korea’s versus Washington State’s versus Northern Italy’s versus Iceland’s seems to be only weakly (and short-term) associated with the severity and timing of isolation measures. There are a great many other factors at work and beyond demographics (age and comorbidity) and maybe population density there is far too little knowledge and data extant to do more than speculate about what those other factors are.

            • Brent Hutto says:

              Postscript to above:

              By “experience endemic COVID-19” I mean experience long-term, continuing incidence of illnesses and deaths either similar to influenza or potentially several times worse than influenza. Year after year into the indefinite future, unless by some miracle a coronavirus vaccine is invented that’s far more effective than flu vaccines.

            • Phil says:

              Brent, what do you mean “you’re only shifting the time frame around with those measures”?

              I think you probably mean that any exponential growth, no matter how slow, will eventually reach everyone, which is true. But (1) if the measures are sufficiently effective you can get out of the exponential growth regime altogether and get to exponential decay instead; (2) fewer people will die if we have slow exponential growth rather than fast, because we can provide a level of medical care to fewer patients that we can’t provide to more patients; (3) presumably we are developing improved treatment, so people who enter care in a few months will be more likely to survive than those entering care now, so that’s another reason to delay people; (4) eventually we expect to have a vaccine. I have a few more but I’ll stop, because I may have gotten the wrong end of the stick as far as what you mean.

              • Brent Hutto says:

                The amount of time-shifting achievable is small compared with the time frames of your points #3 and #4. And if I understand your point #1 correctly I do not think that is achievable with real-world measures.

                The goals right now should be focused on your point #2. But my ideal would be to find a way to anticipate the worst geographical and temporal concentrations and somehow reallocate resources or spread the treatment burden for the places and times where it would make the most difference.

                I have two specific complaints with how things are being handled right now (leaving aside the horrific bungling of USA public health officials in the Dec-Jan-Feb time frame). Both objections are related.

                First, the messaging to the public should be far more specific than that “spreading the curve” infographic. Everything being imposed should be clear that the goal is to make the transition from no COVID to endemic COVID as manageable and humane as possible. Not defeating it, not stopping it, not containing it. People are going to be enormously outraged when they find out that all this stay-at-home stuff was never going to mean NOT having 50K-250K Americans a year dying from coronavirus.

                Second, measures being imposed should not only be PRESENTED as parts of short-term management of transition into a long-term (maybe permanent) rate of non-trivial disease and death. They should be planned and evaluated in light of their long-term viability. If you could stop a virus by two weeks of intensive measures, then almost anything short of having people starve to death might be worth considering. But anything you’re doing now you are sure as heck going to be doing six months from now.

                The things that you can keep up on a state or national level for six months or a year without destroying your society is a much shorter list than the things you can do for two or three weeks on a local or regional scale. I do not perceive that anyone involved in this COVID-19 response is thinking in the sorts of time scales that actually apply. If they are thinking about it, they are keeping it under wraps.

      • Anoneuoid says:

        with an exponential growth most of these things only serve to delay the reported curves by a few days. For example, the recent big jump in the level of testing in New York does show as a small step in the data, but it doesn’t really make much difference after a day or two.

        Look at the number of tests performed vs cases in NY: https://i.ibb.co/7vTm8cC/newyork.png

        Despite all the differences in reporting, etc the number of tests in a state is highly correlated with number of cases: https://i.ibb.co/pXYVx71/covidstates.png

        I don’t think it is possible to interpret the cases/death data without looking at the testing.

        • Absolutely, understanding the testing is critical to understanding the reported numbers. Unfortunately there is no real model for testing: why does it happen, where, according to what rules, etc.?

          On the other hand, if we just did a few thousand wells on a few hundred thousand people scattered around major cities, we’d know the extent of our real infection rates.

          My wife showed me a thread on a biological testing methodology using “bar coding” PCR followed by sequencing that could test something like 20k individuals per machine and give individualized results in a day. These machines are available all around the country in Universities. We could literally be doing a million tests a day using this kind of technology no problem.

          • Brent Hutto says:

            I do wonder if the people who ought to understand this point actually do. Doing “testing” for clinical reasons on a given patient (or potential patient) is not the same as doing “testing” to understand a disease and how it is being communicated. And what we see every time we try to look at the supposed data for COVID-19 is that you can’t just add up a whole bunch of clinical “tests” then peer at the tea leaves (or crank them through a curve-fitting algorithm) to convert those results into epidemiological insight.

            • I’m pretty sure this distinction is completely lost on the typical MD. I think some public health people at typical local government levels might know about the need, but they mostly have no power to do anything. I think at the higher levels of the public health/epi community it’s known, but the anti-intellectual situation in the US means these people have been totally marginalized in terms of power to control resources or policy.

          • Anoneuoid says:

            But at least you can see that the number of tests performed is accelerating while the proportion of positive cases stays approximately the same.

            Any estimate of the doubling time is just measuring the rate of testing in that case. I don’t see how it “won’t make much difference in a day or two” when the rate of testing is rising exponentially like expected for an epidemic curve.

            • it seems to me a model for the tests is that it was essentially zero tests until Mar 16 when everyone finally got the supplies, and it’s been essentially full capacity at a constant slope (equal number of tests each day) since.

              Tests are given to people who are suspected of having the disease, the accuracy of the criterion for testing remains about constant at 10% of those meeting the criterion have the disease for the moment.

              What this tells me is that at the current testing level, the *growth rate* of the confirmed cases is an accurate assessment of the growth rate of the viral spread. The absolute number could be anything though… so infections have been doing basically

              n = exp(log(2)*(t-t0)/3)

              and we don’t know t0

              • Brent Hutto says:

                I think there’s a gap in your logic, Daniel.

                Yes, the 10% rate staying constant means their screening continues to identify people with a 1-in-10 chance of being infected. So the growth rate of positive tests reflects the growth rate of people meeting the pre-test screening questionnaire.

                Asymptomatic individuals show up nowhere in that flowchart, though. You’re making a hidden assumption that the people seeking testing (who are screened with questions, then tested if they pass the screener) make up a non-varying proportion of those who are capable of spreading the virus. That may be true, but we don’t know and (apparently) will never know.

                That number of asymptomatic, non-test-seeking individuals who are shedding virus is the absolutely most crucial parameter in the entire epidemiology of this thing. And at least in USA nobody is making any attempt whatsoever to estimate that parameter.

              • Another way you can put it is

                n = C*exp(log(2)*(t-t0)/3)

                where t0 is just assigned the value say Feb 15, and we don’t know C

              • Brent, as long as the asymptomatic *fraction* isn’t changing rapidly with time, the logarithmic growth rate of the positive symptomatic test-seekers is basically the same as the logarithmic growth rate of the overall population.

                I don’t see any reason why we’d expect significant changes in the asymptomatic fraction, and by significant I mean doubling or tripling over the last month… sure it might go from say 80% symptomatic on the cruise ships in early Feb to 60% asymptomatic once it’s in the broader population say first week of march, but by the time it’s in the broad population over the last 3 weeks, I strongly suspect the fraction of people who never seek care is pretty constant.

              • Sorry, typo: 80% symptomatic (20% asymptomatic) on cruise ships, to say 60% symptomatic (40% asymptomatic) by early march in the broader population.
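The argument in the last few replies can be written out as a short derivation. The symbols are introduced here only for illustration and are not from the thread: n(t) is the true number of infections, f(t) is the assumed fraction of them that show up as positive tests, and p(t) is the confirmed count.

```latex
\begin{aligned}
p(t) &= f(t)\,n(t), \qquad n(t) = C \exp\!\left(\frac{\log 2\,(t - t_0)}{3}\right) \\
\frac{d}{dt}\log p(t) &= \frac{d}{dt}\log f(t) + \frac{d}{dt}\log n(t)
  = \frac{d}{dt}\log f(t) + \frac{\log 2}{3}
\end{aligned}
```

If f is roughly constant, the first term vanishes and the log-scale slope of confirmed cases is log(2)/3, about 0.23 per day, whatever C and t0 are. Whether f really is constant over the period is exactly what the two sides of this exchange disagree about.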

              • Carlos Ungil says:

                I can’t follow your “model”. You say “equal number of tests each day”, but wouldn’t the number of “people who are suspected of having the disease” increase over time?

                If you make 100’000 tests per day and 10% are positive every day you find 10’000 positives per day. You don’t learn anything about the propagation of the infection. (Which may be what’s happening, but I don’t think that was your point?)

              • Carlos, true, the details do matter here. The important thing is that the graphs of cumulative cases like New York reflect very close fits to exponential growth… So the “approximation” of say 31% of tests are positive based on the linked graphs is really just an average slope of the cumulative infection curve through time during that time window. As time goes further, you’ll expect more and more of the tests are positive… which is why say New Mexico is at 2% of tests, and NC is at 5% of tests, and NY is at 31% of tests… each is at a farther point along the exponential growth, but each is averaged over the same time window.

                As a doctor, eventually you just stop testing because basically everyone seeking care is positive. I’d expect that to occur in NY pretty soon. If testing continues it will be because the public health agencies want to continue the time series to monitor things, not because the information is needed for patient treatment.

                On the other hand, as many including myself have been saying, what’s needed in places like NM and NC is *community* testing, because we’re still “earlier” in the infection there, and could check to see whether mitigation measures are working by comparing community growth rates, and also divert resources to appropriate locations.

              • Brent Hutto says:

                My guess (which it must be with no data) is that far less than 80% and probably even less than 60% of those infected with the virus in the general population get ill enough to seek treatment. I’d imagine a large majority of 65-and-up infected people get sick but I doubt anywhere near half of under-30’s get sick.

                The other unknown parameter which will start to become significant is how many people have had an infection but are over it. Presumably they will be at least somewhat immune to further infection and presumably they cannot pass the virus on once they are over it and immune (again, we don’t actually know that; it’s just surmise).

              • Anoneuoid says:

                the accuracy of the criterion for testing remains about constant at 10% of those meeting the criterion have the disease for the moment.

                Well this is closer to 15% in IL/IN, 30% in NY, and 90% in DE. I’m just picking ones that are pretty stable over time, some of the testing curves are clearly artefactual like in MD where about 30% were positive Mar 16, which rose to 90% until two days ago when they must have started reporting more negative tests or something (total tests jumped about 10k) so now it is 9%. We’ll see if the same happens for Delaware eventually.

                But your model doesn’t really work, because if the virus is infecting orders of magnitude more people, shouldn’t a much larger percent of the tests be coming back positive?

                Instead it has been a relatively stable ratio since March 4th. The first day data was available from covidtracking.com it was 118/866 = 12% while yesterday it was 139,061/831,351 = 16.4%. According to your explanation that is a 1000x increase in prevalence with at most a 2x increase in percent of positive tests: https://i.ibb.co/WprQBGk/covidplots.png

                However, that is what we would expect if say ~10% of the population has a fever/cough at any one time and ~10% of those are infected with this virus (or some combination of that and false positives). So then the prevalence in the population is stable at ~1%. Of course, it is not exactly stable, but it is not increasing by 1000x in the last 2 weeks.

                And I mean everywhere I look at data it seems like mortality rate is down.

              • Brent Hutto says:

                Daniel said: “On the other hand, as many including myself have been saying, what’s needed in places like NM and NC is *community* testing, because we’re still “earlier” in the infection there, and could check to see whether mitigation measures are working by comparing community growth rates, and also divert resources to appropriate locations.”

                When the history of this debacle is written, I am virtually certain a consensus conclusion will be that great damage was done by the lack of political will to do any part of what you describe in that paragraph. Shameful.

              • from a calculus perspective:

                if

                y = exp(t/c)

                then

                dy/dt = 1/c * exp(t/c) = 1/c * y

                if you test a constant dT/dt people per day then you can follow the curve dy/dt = C(t) dT/dt until C(t) reaches 1, at which point you saturate your testing ability and the most new cases you can get per day is the constant dT/dt

                During this time period graphed, we’ve been below the dT/dt threshold (the slope of the blue lines in the linked graph is bigger than the slope of the red curves). Soon we’ll saturate the testing ability in places like NY and will hit that “we can’t test fast enough” point.

                if you calculate 1/17 * integrate(C(t),t,mar 16, mar 28) you will get the 0.31 found in the new york graph, or close to it.
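Written out under the assumptions of this comment, with r standing for the constant testing rate dT/dt (a symbol added here for illustration), the positive fraction C(t) and the time at which testing saturates are:

```latex
\begin{aligned}
y(t) &= e^{t/c}, \qquad \frac{dy}{dt} = \frac{1}{c}\,e^{t/c} = C(t)\,r
  \;\Longrightarrow\; C(t) = \frac{e^{t/c}}{c\,r} \\
C(t^{*}) &= 1 \;\Longrightarrow\; t^{*} = c\,\log(c\,r) \\
\frac{1}{t_2 - t_1}\int_{t_1}^{t_2} C(t)\,dt &= \frac{y(t_2) - y(t_1)}{r\,(t_2 - t_1)}
\end{aligned}
```

The last line says that the time-averaged positive fraction over a window is just new cases divided by new tests in that window, which is how a single number like the 0.31 for New York would arise in this picture.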

              • Anoneuoid says:

                The above post got submitted prematurely, so please excuse any typos.

                I wanted to add:

                Hospital visits in the US down: https://www.cdc.gov/flu/weekly/weeklyarchives2019-2020/data/senAllregt12.html
                Mortality across Europe is down: https://www.euromomo.eu/
                Mortality in the US is down (download the data to see it): https://gis.cdc.gov/grasp/fluview/mortality.html

                Now all that data could have issues, but why is there not a single official datapoint indicating there is a real problem due to this virus? Instead we are relying on a bunch of proxies reported by the news like number of 911 calls, number of urns delivered, the extreme measures governments are taking, etc.

                Now, as a policy I always believe the opposite of what the news tells me (which is a heuristic that has served me well so far) so I am very biased on that. But you have to admit that is low quality data. I don’t see how it could be higher quality than what the CDC and EuroMOMO are putting out.

              • Brent, yes it’s shameful.

                Also, as to your question about the younger population: while the fraction that get seriously ill may be smaller, my understanding is the fraction of people on ventilators who are under ~40 is high (maybe around 50%). It’s not like it doesn’t affect this age group, but yes, it probably is a lower fraction than for the older people. The truth is we have almost zero information about this quantity. That’s shameful.

                As to the integration problem: you can estimate this integral using a midpoint rule… Compare the slope of the red curve at say mar23 to the slope of the blue curve, I’d guess that the red curve is about 0.3 times the slope of the blue curve… hence the cases ~ 0.3*Tests

              • Anoneuoid says:

                if you test a constant dT/dt people per day

                This is what number of tests/cases per day looks like (ie, not cumulative):
                https://i.ibb.co/xmp71P9/testscasesperday.png

                So it is not constant, at least not until a few days ago. But keep in mind California has 64k pending tests…

              • Without the calculus, and in words: the fact that the cases follow a line over a period of ~10 days is a *totally* expected property of the exponential function. It’s very well approximated by a line over “short” timeframes, where short is defined as a small multiple of the e-folding time, which in this case is about 4-5 days. So if you look at the time frame of Mar 16 to Mar 26 you’re talking about ~2 e-folding times, and a line should fit that just fine over that time period using the slope of the exponential curve at the midpoint.
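One way to check the straight-line claim is to fit a line to a pure exponential over a 10-day window. This is only a sketch: the e-folding time of 4.5 days is an assumed value inside the 4-5 day range mentioned above, and it uses an ordinary least-squares line rather than the tangent at the midpoint that the comment describes.

```r
# Sketch: how close is an exponential to a straight line over ~2 e-folding times?
efold <- 4.5                  # assumed e-folding time in days (comment says about 4-5)
t <- seq(0, 10, by = 0.5)     # a 10-day window, e.g. Mar 16 to Mar 26
y <- exp(t / efold)           # exponential growth in arbitrary units

fit <- lm(y ~ t)              # ordinary least-squares straight line
r2 <- summary(fit)$r.squared  # share of variance the line explains

cat("R^2 of the straight-line fit:", round(r2, 3), "\n")
# Residuals show where the line departs from the curve (the two ends and the middle)
print(round(residuals(fit)[c(1, 11, 21)], 3))
```

The residuals at the two ends and the midpoint show the curvature that a single R^2 number hides, so it is worth looking at both before deciding how "fine" the linear fit is.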

      • Tom Passin says:

        To follow up, here are three graphs to illustrate my claims. They also go to show the initial similarities between the countries –

        http://tompassin.net/pub/covid/

        The Johns Hopkins data were current as of this morning, so they are presumably up to yesterday sometime.

        The file covid_us_others.png shows the US in black and various other “group 2” countries. You can see that all the countries had an initial period of containment, and then a breakout, after which they all got to pretty near the same slope. That’s the 30%/day slope. The right-most curve is for India, and has a somewhat milder slope.

        Italy had the misfortune to take a larger hit at around 30 days, but it’s also had more success at containment since then than the others.

        The file covid_us_asian.png shows the US in black and various Asian countries in cyan. The Asian countries, in what I called Group 1, are Hong Kong, Taiwan, Japan, and South Korea. You can see that they all had containment of a few initial cases, then a smallish breakout which they were able to jump on and control within a week, and then a much more modest rise.

        Korea had a huge breakout at about day 30. This as I understand it is known to have been a single infected woman who escaped the monitoring and infected the congregation of a large church. They seem to have detected this breakout and jumped on it within days.

        But these countries haven’t flattened out their curves altogether (unlike China, not shown on this plot), and two of them had an increasing rate of growth in the last week.

        The third graph shows the US, China, and S. Korea, with China being time-shifted 45 days and Korea shifted 15 days, to match up with the US as best I can by eye. You can see that their initial experience was nearly identical to the US, but they achieved much better and quicker control. Note that I shifted the Korean data to match the US at their large mid-point breakout.

        I think it’s pretty clear from these data that the initial experience of a country after a breakout is just about the same for most countries, and for most it’s that 30%/day slope. So if you wait a week before you get started containing it, you will be an order of magnitude behind. Two weeks, two orders of magnitude. Well, it’s about a factor of ten in 8 days, but who’s quibbling? The US had say *45 days* lead time from the Chinese experience. Ouch.

        It’s also clear that a modest breakout – say ten or 20 infected people – is enough to start off a large surge in cases. And that will be really hard to prevent. You have to have really good monitoring and surveillance of the cases, and action within a few days, or you will end up back where you left off.

        Speculative? Yes, maybe. I think the data are pretty clear, and I don’t see an easy way to quantify it more exactly given the detailed individual country histories and the time spans involved.
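The "factor of ten in about 8 days" figure can be checked directly. The calculation below assumes that "30%/day" means cases multiply by 1.3 each day; the last line shows the alternative reading of a continuous growth rate of 0.3 per day.

```latex
\begin{aligned}
1.3^{7} &\approx 6.27, \qquad 1.3^{8} \approx 8.16, \qquad
  \frac{\ln 10}{\ln 1.3} \approx 8.8 \text{ days per factor of ten} \\
\frac{\ln 2}{\ln 1.3} &\approx 2.6 \text{ days per doubling} \\
e^{0.3 \times 8} &\approx 11.0, \qquad \frac{\ln 10}{0.3} \approx 7.7 \text{ days (continuous-rate reading)}
\end{aligned}
```

Either way, a one-week delay costs a bit less than an order of magnitude and a two-week delay a bit less than two, consistent with the rough numbers above.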

  7. Swamp Economist says:

    Another +1 about unknown and unstable functional forms and assumptions about asymptotic limits driving the conclusions.

    Also, it is not enough that the second derivative is negative if it is close to zero: if the base is large enough (which we don’t know without testing), approximately linear growth is still a problem for many cities. So we also need to look at the third derivative.

    +1 about data. Really need data on test negatives.

  8. jim says:

    “That’s too close. What I’m saying is, even if Levitt’s model is wonderful, he got lucky. “

    What we know is that his prediction was accurate. Whether or not he was “lucky” at all, or how much luck he had, isn’t knowable. Right? I mean, if he missed by a wide margin would you say it’s unlucky? :)

    It’s surprising that he got so close, and it would be very surprising if he got that close again. But when a forecaster considers the relevant factors carefully and accurately, s/he weighs the odds in his/her favor. So when s/he occasionally hits the nail on the head, it’s not really luck. It’s a function of skill in building an accurate model that narrows the range of possibilities.

    • Phil says:

      When a forecaster hits the nail on the head in such a statistically noisy environment it’s a combination of luck and skill. “How much luck isn’t knowable”, well, sure, there are matters of definition and we don’t have the data etc etc, but we still know he got lucky. Joe DiMaggio was an extremely skilled baseball player but it still took a lot of luck for him to hit safely in 56 consecutive games. Luck and skill are not mutually exclusive: it’s possible to be both lucky and skillful. Indeed, my personal twist on the old saying that “it’s better to be lucky than good” is: it’s better to be lucky and good.

      • Zhou Fang says:

        More cynically, a lot of the time it’s overfitting and publication bias. I’ll note in this case that Levitt’s prediction was *not* made on the 31st of January, but on the 7th of Feb, which is fortunate because if he used the datapoints from the much sharper decline around the 31st he would have greatly underestimated the cases and deaths.

        • Zhou Fang says:

          See my comment higher up the page. I think at this point either the reporter misunderstood what Levitt meant, or Levitt can rightfully be accused of some pretty bad cheating.

      • jim says:

        IMO if you ignore the last two digits it’s a lot less lucky but still an excellent and valuable forecast. If that could be repeated it would be great.

  9. Dave says:

    Two points:

    1) Fitting such simple models to pandemics reminds me of this tweet from Carl Bergstrom: https://mobile.twitter.com/CT_Bergstrom/status/1241551432811085825
    Basically, curve fitting to HIV cases with data through 1987 led to a prediction of only 200K total cases. You can see the paper here: http://documents.aidswiki.net/PHDDC/BREG.PDF
    My understanding is that these types of simple curve fitting models for epidemics tend to work best after the peak of new cases has been reached.

    2) Most of the reduction in new cases has come from pretty extreme lockdowns. So what happens when you relax those? Do we really think China won’t see another outbreak if they try to resume normal life? We’ve already seen that places that had success and then tried to go back to work have seen new accelerations in cases, like Hong Kong. If China has another big outbreak (of the same virus), does that then invalidate Levitt’s predictions?

  10. Martha (Smith) says:

    Time for the quote from John Von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

    • Richard Juster says:

      Unfortunately, Von Neumann did not say this. Actually, it was attributed to J. Bertrand by Lucien Le Cam on page 165 of his 1990 article, “Maximum Likelihood: An Introduction”, International Statistical Review (1990), 58, 2, pp. 153-171.

      • Richard Juster says:

        I should have included the actual quote: “Give me four parameters and I shall describe an elephant; with five it will wave its trunk.”

      • Carlos Ungil says:

        For what it’s worth, Freeman Dyson told in an interview published in Nature in 2004 that he had the following conversation with Enrico Fermi in 1953:

        I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. He replied, “How many arbitrary parameters did you use for your calculations?” I thought for a moment about our cut-off procedures and said, “Four.” He said, “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” With that, the conversation was over.

        https://www.readcube.com/articles/10.1038/427297a

      • Andrew says:

        Richard:

        Lucien Le Cam was a bit of a bullshitter. His office was down the hall from mine at the University of California. Once I was talking with him about hierarchical models and he said he’d done all of that back in the 1940s when he’d been an applied statistician. I guess he got bored solving all the applied problems in the world so he decided to move to something more interesting. Too bad he never wrote any of that up, so we mortals had to waste our time rediscovering it from scratch.

  11. Carlos Ungil says:

    I don’t think this study has been mentioned yet around here:

    Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries

    In this report, we fit a novel Bayesian mechanistic model of the infection cycle to observed deaths in 11 European countries, inferring plausible upper and lower bounds (Bayesian credible intervals) of the total populations infected (attack rates), case detection probabilities, and the reproduction number over time (Rt). We fit the model jointly to COVID-19 data from all these countries to assess whether there is evidence that interventions have so far been successful at reducing Rt below 1, with the strong assumption that particular interventions are achieving a similar impact in different countries and that the efficacy of those interventions remains constant over time. The model is informed more strongly by countries with larger numbers of deaths and which implemented interventions earlier, therefore estimates of recent Rt in countries with more recent interventions are contingent on similar intervention impacts. Data in the coming weeks will enable estimation of country-specific Rt with greater precision.

    https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-Europe-estimates-and-NPI-impact-30-03-2020.pdf

  12. Anoneuoid says:

    from a calculus perspective:

    if

    y = exp(t/c)

    then

    dy/dt = 1/c * exp(t/c) = 1/c * y

    if you test a constant dT/dt people per day then you can follow the curve dy/dt = C(t) dT/dt until C(t) reaches 1, at which point you saturate your testing ability and the most new cases you can get per day is the constant dT/dt

    During this time period graphed, we’ve been below the dT/dt threshold (the slope of the blue lines in the linked graph is bigger than the slope of the red curves). Soon we’ll saturate the testing ability in places like NY and will hit that “we can’t test fast enough” point.

    if you calculate 1/17 * integrate(C(t),t,mar 16, mar 28) you will get the 0.31 found in the new york graph, or close to it.

    […]

    As to the integration problem: you can estimate this integral using a midpoint rule… Compare the slope of the red curve at say mar23 to the slope of the blue curve, I’d guess that the red curve is about 0.3 times the slope of the blue curve… hence the cases ~ 0.3*Tests

    […]

    Without the calculus, and in words: the fact that the cases follow a line over a period of ~10 days is a *totally* expected property of the exponential function. It’s very well approximated by a line over “short” timeframes, where short is defined as a small multiple of the e-folding time, which in this case is about 4-5 days. So if you look at the time frame of Mar 16 to Mar 26 you’re talking about ~2 e-folding times, and a line should fit that just fine over that time period using the slope of the exponential curve at the midpoint.

    But what we see is that as the testing flattens out (saturates), the number of new cases is flattening too. It is easier to see with the log y-axis, these curves are practically parallel: https://i.ibb.co/PjDG1KS/testscasesperdaylog.png

    Now % positive has increased from ~10% March 11-18 to closer to 20% the last few days. But other stuff was going on like Trump ordering tests of hospitalized patients to be prioritized on March 23, etc that just as easily explains a small rise in the proportion of positives.

    I guess you gotta plot the output of your equations and make it look like that one for me to see how it’s supposed to work. I don’t see why the cases curve should be flattening if the infection is spreading exponentially right now. I do understand you can approximate an exponential curve over a few doubling times with a line, but why is it still so closely paralleling the number of new tests?

    • Right, that’s expected behavior of saturating your testing ability. After an initial ramp up, the number of tests you can run in a day is ~ a constant so that the maximum rate that your confirmed cases can increase is this constant… like if 100% of tests are positive right? So as long as you have a curve that’s growing more slowly than this constant, your testing can detect the shape of this curve…

      Now, the number of actual cases increases exponentially… so after a while, let’s say you’ve reached the point where the exponential function needs to increase by 20000 today… but you can only run 10,000 tests… so at this point, your *confirmed* cases can NOT rise faster than 10k per day, and it stops rising exponentially and rises more linearly.

      Once you’re rising linearly, if you plot it on a log scale, you’ll see a logarithmic curve… which is what that curve looks like asymptotically from March 16th onward… so whatever’s going on in that region, it’s saturated the testing capacity, and now running tests doesn’t tell us much of anything about the growth rate of the underlying function anymore.
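The "linear growth looks logarithmic on a log scale" step can be made explicit. The symbols are added here for illustration: t_s is the time at which testing saturates, N_s is the confirmed count at that time, and k is the cap on new confirmed cases per day.

```latex
\begin{aligned}
N(t) &= N_s + k\,(t - t_s), \qquad t \ge t_s \\
\frac{d}{dt}\log N(t) &= \frac{k}{N_s + k\,(t - t_s)} \;\longrightarrow\; 0
  \quad \text{as } t \to \infty
\end{aligned}
```

For a pure exponential exp(t/c) the same log-scale slope stays constant at 1/c, so a slope that keeps shrinking on the log plot is what a capped, linearly growing count would produce under these assumptions. It does not by itself say whether the cap comes from testing capacity or from the epidemic actually slowing.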

      • Anoneuoid says:

        Yes, but then the cases/day curve should start approaching the tests/day curve when it flattens. Why would the growth in cases/day start slowing down when only 10-20% were positive and actual cases are growing exponentially?

        I don’t think you will get that behavior if you plot your model.

    • Here’s what you can do to simulate the situation

      The basic idea is you have 20k tests a day you can run. About 4x as many people seek tests as actually have the disease… all the people with the disease (and sufficient symptoms) seek a test. You detect everyone with sufficient symptoms so long as you have fewer than 20k test seekers… After you saturate the system, you only detect a fraction of the infected test seekers: at most all of them, and otherwise a binomial random number of them based on the ratio of real cases to test seekers…

      then plot *red* is the real cases, green is the test seekers, blue is the confirmed cases

      library(ggplot2)

      t = seq(1,40)
      realcases = 100*exp(t/4)
      realincrement = diff(c(0,realcases))

      ## each day about 4x as many people seek tests as are newly infected
      testseekers = rnorm(NROW(realincrement),4,.25)*realincrement

      maxtests = 20000

      ## now assume that you test *up to* 20k people. if more people are
      ## seeking tests, you test a random subset of the seekers
      ## getting a binomial count of positives for the given frequency

      ntests = rep(0,NROW(t));
      ntests[1] = 100;
      confinc = rep(0,NROW(t));
      confinc[1] = 100;
      ## the last slot is left at zero; the plots below stop before it
      for(i in 2:(NROW(t)-1)){
        if(testseekers[i] < maxtests){
          ## below capacity: every seeker is tested, every real case is confirmed
          confinc[i] = realincrement[i]
          ntests[i] = testseekers[i]
        } else {
          ## at capacity: test 20k seekers, each positive with prob real/seekers
          confinc[i] = min(realincrement[i],rbinom(1,maxtests,min(1,realincrement[i]/testseekers[i])))
          ntests[i] = maxtests
        }
      }

      cumconf = cumsum(confinc)
      cumtests = cumsum(ntests)

      d = data.frame(t=t,conf=cumconf,nt=cumtests,real=realcases)

      ggplot(d)+geom_line(aes(t,conf),color="blue") + geom_line(aes(t,nt),color="green")+ geom_line(aes(t,real),color="red") +coord_cartesian(xlim=c(0,35),ylim=c(0,400000));

      ggplot(d)+geom_line(aes(t,log(conf)),color="blue") + geom_line(aes(t,log(nt)),color="green")+ geom_line(aes(t,log(real)),color="red") +coord_cartesian(xlim=c(0,30),ylim=c(0,log(400000)));

      I think you’ll find it looks just like your curve.

      • Of course code like that tends not to survive the comment form… because of the presence of less than signs… I’ll post it to my blog and link here:

        http://models.street-artists.org/2020/03/30/confusion-about-coronavirus-testing-and-the-role-of-testing-capacity/

      • Anoneuoid says:

        testseekers = rnorm(NROW(realincrement),4,.25)*realincrement

        This assumes that the number of people with symptoms is constantly about 4x the number of new actual cases. Why should it be constant? Instead the proportion of positive “testseekers” (people with fever/cough) should start out very low (most people seeking a test will be flu/etc) and grow to be some substantial proportion as the virus spreads. This amounts to assuming all the other causes of the symptoms are increasing at the same rate as nCoV-19.

        I’ll try to mess with this code to do that.

        • Every time someone gets symptoms their 4 closest associates go out to try to get a test.

          • Or 3 in this case… so that 4 total people seek a test every time someone gets a fever and a cough.

          • Anoneuoid says:

            Ok, but you still need to include the other reasons for having the symptoms. Here there is a constant baseline of 10k “testseekers” (due to flu, etc.) and then 4x the number of real cases also seeking a test based on contact alone (not sure how accurate that assumption is, but ok). There was also some indexing issue with the loop that I didn’t feel like figuring out, so I just dropped the filler zero from the last spot:
            https://pastebin.com/xSH6Zij5

            This is what looking at cases/tests per day looks like: https://i.ibb.co/w0Y6cpc/perdaysim.png

            Compare to actual: https://i.ibb.co/cr5Yr21/perdayreal.png

            NB: Looking at cumulative just added an extra step so I switched to comparing the perday values.

            I don’t think your model captures the “% positive” aspect of the data. In the simulation, the percent positive moves from 10% to 20% while confirmed cases go from 1550 to 3990 (~2.6x). In the real data, the increase in % positive from 10% to 20% happened over a 140k/2k = 70x increase in positive tests. That is way too slow for your explanation to work.
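
            The size of that mismatch can be checked without running the full simulation. The following is only a sketch in expectation (no binomial noise), and it is not the pastebin code: it assumes the 10k baseline, the 4x contact multiplier, and the 20k cap described in this thread, and the names `baseline`, `contact_mult`, `pct_pos`, `confirmed`, and `r_at` are made up for illustration.

            ```r
            baseline <- 10000      # constant daily test seekers from flu etc. (assumed)
            contact_mult <- 4      # test seekers per new real case (assumed)
            maxtests <- 20000      # daily testing capacity

            # Expected fraction of tests that come back positive when there are
            # r new real cases per day
            pct_pos <- function(r) r / (baseline + contact_mult * r)

            # Expected confirmed cases per day: tests actually run times fraction positive
            confirmed <- function(r) {
              seekers <- baseline + contact_mult * r
              min(seekers, maxtests) * pct_pos(r)
            }

            # Real daily cases at which the positive fraction equals p
            # (solves p = r / (baseline + contact_mult * r) for r)
            r_at <- function(p) p * baseline / (1 - contact_mult * p)

            fold_sim <- confirmed(r_at(0.20)) / confirmed(r_at(0.10))  # 2.4
            fold_real <- 140e3 / 2e3                                   # 70
            ```

            Under these assumptions the model moves from 10% to 20% positive over roughly a 2.4x rise in confirmed cases per day, in line with the simulated numbers above and far short of the 70x seen in the real data.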

            • Obviously there are a number of factors going on, my model was just an illustration of the way in which saturation makes a testing system fail to be able to track exponential growth at some point and the failure mechanism is such that the epidemic appears to come under control and grow only linearly…

              I personally simply don’t have any information with which to make a model of real-world testing: how those decisions are made, or what factors determine capacity, policy, etc. It is far too complicated and heterogeneous to just guess at.

              • Anoneuoid says:

                Yes, it is just a proof of concept. We are seeing if it can capture gross aspects of the data. Still I don’t think explaining the 80-90% negative test results as due to asymptomatic contact cases vs a baseline of sick people is a good one. As far as I can tell most tests are of people with some kind of symptoms. And the constant proportion of positives in your model rests on that assumption.

              • >Still I don’t think explaining the 80-90% negative test results as due to asymptomatic contact cases vs a baseline of sick people is a good one

                It’s 70% in NY right?

                And in a week it’ll be 40% in New York… and in two weeks New York will start to get itself under control through the stay-at-home orders…

                basically we’ve got complicated dynamics involving lots of changing things: baseline sick people, percentage of sick people who have COVID, testing capacity rollout, changing criteria for testing… etc etc

                All of which are reasons why it’s STUPID to keep doing testing this way. At LEAST half of testing should be devoted to prevalence estimation among varying geographic regions, repeated in panels, through time. This testing need not be via FDA approved diagnostic tests… Research grade tests performed at universities would be far preferable and cheaper and there’s plenty of shuttered capacity out there already.

              • Anoneuoid says:

                >And in a week it’ll be 40% in New York… and in two weeks New York will start to get itself under control through the stay-at-home orders…

                New York looks like this:
                https://i.ibb.co/FbsngJ2/nypos.png

                So it is increasing there… we will see. I don’t see where you are getting your estimate for the rate of increase though.

                Also, btw the smoking thing was pretty much just confirmed in the US: https://www.cdc.gov/mmwr/volumes/69/wr/mm6913e1.htm

                They basically sampled an entire nursing home and saw 1/23 people who tested positive were current smokers vs 7/53 who tested negative. So smokers were about 3x less common than expected just like in the Chinese data.
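
                The arithmetic behind that “about 3x”, using only the counts quoted above (variable names are made up for illustration):

                ```r
                smokers_pos <- 1;  n_pos <- 23   # current smokers among those who tested positive
                smokers_neg <- 7;  n_neg <- 53   # current smokers among those who tested negative

                rate_pos <- smokers_pos / n_pos  # about 0.043
                rate_neg <- smokers_neg / n_neg  # about 0.132
                ratio <- rate_neg / rate_pos     # about 3.04
                ```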

          • Anoneuoid says:

            And here it is with log scale (only the topleft panel is changed): https://i.ibb.co/yqZYWB3/perdaysimlog.png

            Actual data: https://i.ibb.co/PjDG1KS/testscasesperdaylog.png

            In the actual data the curves look way closer to parallel. Maybe you can play with the parameters (maxtests, baseline testseekers, etc.) to get those three plots all looking similar?

          • Anoneuoid says:

            Also, I really doubt that 75% of tests in the US are asymptomatic contact cases, given how difficult people report it is to get tested and that all the guidelines say not to get tested unless you are sick.

            • Rahul says:

              Another trend that seems interesting is the whole BCG vaccination immunity aspect.

              It would be interesting to see if there’s any way to get at that data by dissecting the available information.

              Of course, lots of assumptions will be needed. Good spot to use some priors.

              Note I am not talking about the new BCG vaccine trial. That’s a different story. But we have a cohort of people who received the BCG vaccine in childhood. The question is whether that’s causing any difference in morbidity or mortality.

              • Anoneuoid says:

                I saw some news coverage of that but no paper. From what I read, I didn’t see any accounting for the number of tests in each country, so the numbers they cited don’t mean anything to me.

                Every website and media piece presenting numbers of cases/deaths without tests is spreading misinformation as far as I’m concerned. And even number of tests is weak info without knowing the testing criteria and other details.

  13. Doesn’t it seem that the simplest model that captures reality requires a fully hierarchical approach and then some, where the top of the hierarchy is clusters of people with a distribution of sizes and “contagiousness”, and then an SIR or an SEIR model is applied to each cluster individually, and something very close to an SIR or SEIR is done to simulate the probability of the virus jumping from one cluster to another?
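
      As a rough sketch of what that structure could look like (not a fitted or validated model): a deterministic one-day SIR step inside each cluster, plus a random chance each day that a cluster never yet infected imports one case. The size and contagiousness distributions, every parameter value, and the simple import rule (a hazard proportional to total current infections, standing in for a proper SIR-like jump process) are all assumptions made up for illustration.

      ```r
      set.seed(1)  # reproducible draws

      n_clusters <- 50
      days <- 120

      # Cluster sizes and per-cluster contact rates ("contagiousness"); both
      # distributions are arbitrary assumptions chosen only for illustration.
      size <- round(rlnorm(n_clusters, meanlog = 5, sdlog = 1)) + 10
      contact_rate <- rgamma(n_clusters, shape = 4, rate = 10)
      recovery_rate <- 0.1
      jump_rate <- 0.002  # scales the chance of a case reaching a new cluster

      n_S <- size
      n_I <- rep(0, n_clusters)
      n_R <- rep(0, n_clusters)
      n_I[1] <- 1               # seed one case in the first cluster
      n_S[1] <- n_S[1] - 1

      total_infected <- numeric(days)
      clusters_hit <- numeric(days)

      for (d in seq_len(days)) {
        # Within-cluster SIR step (deterministic, one-day time step)
        new_inf <- pmin(n_S, contact_rate * n_S * n_I / size)
        new_rec <- recovery_rate * n_I
        n_S <- n_S - new_inf
        n_I <- n_I + new_inf - new_rec
        n_R <- n_R + new_rec

        # Between-cluster jump: every cluster that has never been infected
        # imports one case with a probability that grows with total infections
        p_import <- 1 - exp(-jump_rate * sum(n_I))
        untouched <- which(n_I == 0 & n_R == 0)
        seeded <- untouched[runif(length(untouched)) < p_import]
        n_S[seeded] <- n_S[seeded] - 1
        n_I[seeded] <- 1

        total_infected[d] <- sum(n_I)
        clusters_hit[d] <- sum(n_I > 0 | n_R > 0)
      }
      ```

      Plotting `total_infected` and `clusters_hit` against day shows how the aggregate curve is a sum of staggered per-cluster epidemics; a fuller version would put priors on the cluster-level parameters and fit them hierarchically.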
