University of Washington biostatistician unhappy with ever-changing University of Washington coronavirus projections

The University of Washington in Seattle is a big place.

It includes the Institute for Health Metrics and Evaluation (IHME), which has produced a widely-circulated and widely-criticized coronavirus model. As we’ve discussed, the IHME model is essentially a curve-fitting exercise that makes projections using the second derivative of the time trend on the log scale. It is what it is; unfortunately the methods are not so transparent and their forecasts keep changing, not just because new data come in but also because they keep rejiggering their model. Model improvement is always a possibility; on important problems we’re always working near Cantor’s corner. But if your model is a hot steaming mess and you keep having to add new terms to keep it under control, then that might be a sign that curve fitting isn’t doing the job.
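To make the point concrete, here is a minimal sketch of that kind of curve-fitting exercise (my own toy version, not IHME's code): fit a generic three-parameter sigmoid to log cumulative deaths and extrapolate. The data, starting values, and the logistic form (the actual IHME fit used an ERF-based curve and much more machinery) are all just illustrative assumptions:

```python
# A minimal sketch of the kind of curve-fitting exercise described above
# (not IHME's actual code). The extrapolation is driven entirely by the
# assumed shape of the curve; all data and parameter values are made up.
import numpy as np
from scipy.optimize import curve_fit

def log_cum_deaths(t, peak_day, rate, log_total):
    # log of (eventual total * logistic curve), computed on the log scale;
    # logaddexp keeps it numerically stable
    return log_total - np.logaddexp(0.0, -rate * (t - peak_day))

rng = np.random.default_rng(0)
t_obs = np.arange(0, 40)                                   # days observed (toy)
y_obs = log_cum_deaths(t_obs, 35.0, 0.15, np.log(60000)) + rng.normal(0, 0.05, size=t_obs.size)

params, _ = curve_fit(log_cum_deaths, t_obs, y_obs, p0=[30.0, 0.1, 10.0])
t_future = np.arange(0, 150)
projection = np.exp(log_cum_deaths(t_future, *params))     # extrapolated cumulative deaths
print(f"projected eventual total: {projection[-1]:,.0f}")
```

The projection far beyond the data is pure extrapolation of the fitted shape, which is exactly why changes to that assumed shape move the forecasts so much.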

Anyway, back to the University of Washington. In addition to the doctors, economists, and others who staff the aforementioned institute, this university also has a biostatistics department. In that department is Ruth Etzioni, who does cancer modeling and longitudinal modeling more generally. And she’s not happy with the IHME report.

Etzioni writes:

As a long-time modeler, the latest update from Chris Murray and the IHME model makes me [Etzioni] cringe.

On May 4, the IHME called a press conference to release the results of their model update which showed a staggering departure from their prior predictions of about 60,000 deaths.

The new prediction through August is more than double that, at over 130,000 deaths. Murray, the institute’s director and chief model spokesman, told reporters that the update had taken mobility data into account, which captured an uptick in movement in re-opening states.

According to Politico, the primary reason given by Murray for the increase was many states’ “premature relaxation of social distancing.” . . .

It makes a nice story, to tell the world that the reason your model’s predictions have changed is because the population’s behavior has changed. Indeed, that was the explanation given by IHME for their model revising its early death toll of about 90,000 dramatically downward. Then, Dr Murray explained that the change in predictions showed that social distancing had been a wild success – better than we could ever have imagined. That, in turn, led to howls that we had over-reacted by shutting down and staying home. I said then that that interpretation was wrong and misleading and placed far too much credibility in the early predictions. The early model results, based on a massive assumption about the shape of the mortality curve, and driven by thin local data, were not to be compared with later predictions based on major updates to both its inputs and its structure.

The same thing is happening now. A quick skim of the IHME’s model updates site leads one to an eye-glazing list of changes, including some that have nothing to do with mobility and everything to do with improving the fit to the past. Here is one that truly raised my eyebrows: “Since our initial release, we have increased the number of multi-Gaussian distribution weights that inform our death model’s predictions for epidemic peaks and downward trends. As of today’s release, we are including 29 elements… this expansion now allows for longer epidemic peaks and tails, such that daily COVID-19 deaths are not predicted to fall as steeply as in previous releases.” This change alters the shape of the assumed mortality curve so it does not go down as fast; it alone could explain a substantial portion of the inflation in the revised mortality predictions.

The proof is in the Washington State pudding. The IHME is no longer predicting that we will have less than one case per million on May 28th and can therefore safely reopen, as it did in its previous incarnation. But little has changed here, either on the policy front or in mobility according to Google Mobility reports through April.

I am not aiming my comments at the IHME modeling team, which I imagine is sincerely doing its best to deliver results that match the data and produce ever-more-complex predictions of the future. They are working overtime to fit the rapidly evolving and imperfect data, marching to an impossible drumbeat of deadlines. To their credit, the update does note that both model changes and increased mobility projections could account for the change in predictions. But that never made it into the headlines. And that is a problem of transparency.

Transparency begins with the sincere effort by those who communicate models to make sure that they are properly interpreted by the policymakers and public that are using them. The IHME pays lip service to transparency by documenting their model’s updates on their website. But their pages-long description is chock full of technical fine print and is hard to understand, even for a seasoned modeler like myself. A key part of transparency is acknowledging your model’s limitations and uncertainties. This is never front and center in the IHME’s updates. It needs to be.

Etzioni concludes:

This epidemic took root and grew massively in every state from minuscule beginnings. We should all be sobered by our real-time experience of exponential growth. If there is an ambient prevalence of more than a handful of cases in any state, then anything that increases the potential for transmission will lead to a re-growth. We do not need a model to be able to predict that. But, as we plan for how to reopen in each state of our union, we need to know what extent of regrowth we can manage. And models can help us with that.

When and how much we can reopen will depend on the surveillance and containment infrastructure that we put in place to control upticks and outbreaks. I am convinced that models can help to think clearly about complex policy questions about the balance between changes that increase transmission and measures to contain it. Models, along with other data and evidence, can guide us towards making sensible policy decisions. I have seen this happen time and time again in my work advising national cancer screening policy panels. But as modelers, we have a responsibility. We must make sure that the key caveats of our work find their way into the headlines and are not relegated to the fine print.

I agree. There is a fundamental difficulty here in that one thing that modelers have to offer is their expertise in modeling. If you make your model too transparent, then anyone can run it! And then what happened to your expertise?

I’m not saying that anyone is making their methods obscure on purpose. Not at all. Rather, the challenge is that it’s hard to write things up. Unless you have a good workflow, obscurity comes by default, and it takes extra effort to be transparent. That’s one advantage of using Stan.

Anyway, transparency takes effort, so you’re only gonna do it if you have an incentive to do so. One incentive is that if your work is transparent, it will be easier for other people to help you. Another advantage is that your work might ultimately be more influential, as other people can take your code and alter it for their purposes. These are big advantages in the new world of open science, but not everyone has moved to this world.

118 thoughts on “University of Washington biostatistician unhappy with ever-changing University of Washington coronavirus projections”

  1. “When and how much we can reopen will depend on the surveillance and containment infrastructure that we put in place to control upticks and outbreaks. I am convinced that models can help to think clearly about complex policy questions about the balance between changes that increase transmission and measures to contain it. Models, along with other data and evidence, can guide us towards making sensible policy decisions. I have seen this happen time and time again in my work advising national cancer screening policy panels. But as modelers, we have a responsibility. We must make sure that the key caveats of our work find their way into the headlines and are not relegated to the fine print.”

    Yes!

  2. Oh goody!

    ‘Here is one that truly raised my eyebrows: “Since our initial release, we have increased the number of multi-Gaussian distribution weights that inform our death model’s predictions for epidemic peaks and downward trends. As of today’s release, we are including 29 elements… this expansion now allows for longer epidemic peaks and tails, such that daily COVID-19 deaths are not predicted to fall as steeply as in previous releases.”

    This change alters the shape of the assumed mortality curve so it does not go down as fast; it alone could explain a substantial portion of the inflation in the revised mortality predictions.’

    I glommed onto that just after they published their update document, and posted about it in a couple of places. And not long ago on the other thread here.

    Now I wish the author you’re quoting had also included this from their update document:

    “Overall, these modeling improvements have resulted in considerably higher projections of cumulative COVID-19 deaths through August, primarily due to longer peaks and slower declines for locations that have passed their peaks.”

    This is in direct contrast to what they’re implying in public, that the upward adjustment is largely due to relaxed social distancing measures.
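    To see why that one change matters so much, here is a toy illustration (made-up weights and peak days, not IHME's actual parameterization): a single Gaussian daily-death curve falls as fast as it rose, while a weighted sum of Gaussians with staggered peaks keeps a much fatter tail.

```python
# Toy illustration (not IHME's actual parameterization) of how a weighted sum
# of Gaussian components produces a longer epidemic tail than a single Gaussian.
import numpy as np

def gaussian(t, mu, sigma):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

t = np.arange(0, 200)
single = 1000 * gaussian(t, 60, 10)                        # one component, ~1000 total deaths
means = [60, 75, 90, 110]                                  # hypothetical staggered peak days
weights = [0.5, 0.25, 0.15, 0.10]                          # hypothetical mixture weights
mixture = 1000 * sum(w * gaussian(t, m, 10) for w, m in zip(weights, means))

# Deaths still to come after day 80 under each assumed shape:
print(f"single Gaussian: {single[t > 80].sum():.0f}, mixture: {mixture[t > 80].sum():.0f}")
```

    Same total deaths in both curves; the mixture just refuses to come down quickly, which is exactly the behavior their update describes.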

  3. You’re too polite. The IHME thing was pure garbage from the start. As a modeller myself, it makes me cringe to think about it. What happened to the idea of expertise, that such obvious nonsense could ever get a toe-hold in the field?

      • The distrust of experts among Americans goes back to at least the 19th century.

        “On May 4, the IHME called a press conference to release the results of their model update which showed a staggering departure from their prior predictions of about 60,000 deaths.”

        I am not one to judge any model, particularly given the uncertainties of this time. But as for press conferences, the phrase, “Not ready for prime time” comes to mind. As for caveats finding their way into the headlines, Rotsa Ruck! As a group, editors are innumerate. Don’t give them precise figures if you can’t justify that precision. Be vague where vagueness is required. Just say something like, “Things could get 2 or 3 times worse than we thought. We don’t really know.” Let them make a headline out of that.

      • The problem with experts is that they’re often wrong.

        The great thing about experts is they’re everything you want them to be when you want them to be it. In 2009-2010, the blindsided economic experts took a pummeling from the media and social do-gooders alike. It was all their fault, they completely missed the boat!! But soon everyone saw how useful they might be for their current cause and went right back to quoting and promoting them just as unquestioningly as ever.

        • Real experts are not often wrong. The problem is that any old guy with a degree from Harvard and some funding gets called an expert.

        • What’s the old saying, discretion is the better part of valour?

          The better part of expertise is having the discretion to not make claims until you have done some real checking. Including checking them with people who will readily point out the flaws in your methods and/or reasoning.

          In statistical terms, the ability and willingness to recognize when your data and methods do NOT support a “positive claim” (in Andrew’s recent formulation) is the most important part of being an expert. Anyone with an above-average IQ and some functional numeracy can learn to fit models in Stan or R. Maybe even become extremely proficient at the procedures. But it seems there’s a paucity of those who can do those things and at the end put their hands up and say, “Welp, looks like we just can’t answer that question at this juncture”.

    • I am agreeing with all comments critical of the Murray Model. I am actually sick and tired of CNN bringing the IHME Murray Model thing into their evening programming as an Oracle into the future of COVID-19 disease incidence and projected fatalities. I wish they would STOP. And STOP now – please – for the love of Jesus. It’s like weather forecasting. The Murray Model thing is discussed on CNN as a forecast tool, and whadya know, once we get to that forecast projected time point or milestone, the model is WRONG!! We go figure – who knew? Well the answer is WE ALL KNEW. If I got modeling wrong, consistently wrong, every week, and I kept revising my figures to match the data du jour, I WOULD BE FIRED!! Please, let’s stop bending our minds around the Murray Model thing, and focus on more reliable modeling and the real data. I’m fed up with it. And I’m sick of seeing the Murray brigade show up on CNN with their chirpy and cheerful conclusions, which we all know are wrong the moment they are aired LIVE ON CNN.

  4. Andrew:

    “Another advantage is that your work might ultimately be more influential, as other people can take your code and alter it for their purposes. These are big advantages in the new world of open science, but not everyone has moved to this world.”

    When they put out the first version of their model, they did put the source up on GitHub.

    But they don’t appear to have committed new code for a while. Certainly not for a couple of iterations, and certainly not for this version, which is very close to a rewrite: they’ve bolted an SEIR model into it, informed by actual deaths data appended with the curve-fitting death model’s projections, and then used to build projections going forward. Though it’s not clear to me exactly where the outputs that form the product are coming from.

  5. Only tangentially related, but I’m not sure where else to post and wanted to check in with you guys since I feel like this has shades of the birthday analysis from the cover of BDA, but what is up with Tuesdays and COVID-19?

    Take a look at daily deaths from the past 3-4 weeks. You’ll see a series of waves with 2-3 lower counts, then a jump and 4-5 days of higher counts, then a decline on Saturday or Sunday. The jump is always on Tuesday.

    USA – April 5 (Sun): 1405, April 6 (Mon): 1505, April 7 (Tues): 2288; April 12 (Sun) 1726, April 13 (Mon) 1727, April 14 (Tues) 2566; April 19 (Sun) 1570, April 20 (Mon) 1952, April 21 (Tues) 2698; April 26 (Sun) 1156, April 27 (Mon) 1383, April 28 (Tues) 2470; May 3 (Sun) 1153, May 4 (Mon) 1324, Today (so far) 2133.

    The UK has the same thing going on.

    Is this just lagged counts from the people who died over the weekend? Are Tuesdays just particularly vicious?

    • > Is this just lagged counts from the people who died over the weekend? Are Tuesdays just particularly vicious?

      The cause seems to be the former — it’s an artificial jump from lagged counting of deaths from late in the past week. If memory serves right, the Governors of NJ and NY have both stated this in passing during press conferences.

    • Yes, there’s clearly a weekly cycle in some of the death and case reporting, likely due to work schedules and weekends. If you dig into the statewide data compiled by the Atlantic, you can see that it affects some states more than others, so it’s a bit smeared out in the countrywide numbers.

      The clearest illustration I’ve seen of it is in Sweden’s numbers.
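      A minimal way to check any daily series for that kind of weekday artifact is just to average the counts by day of week; the series below is a placeholder, not real data.

```python
# Quick check for a day-of-week reporting artifact: average daily deaths by
# weekday. The numbers below are placeholders; substitute any real daily series.
import pandas as pd

daily = pd.Series(
    [1150, 1350, 2300, 2200, 2100, 2000, 1800] * 4,        # four fake weeks, Sun..Sat
    index=pd.date_range("2020-04-05", periods=28, freq="D"),
)
weekday_means = daily.groupby(daily.index.day_name()).mean()
print(weekday_means.sort_values())
# A consistent Sunday/Monday dip followed by a Tuesday spike points to weekend
# reporting lag rather than anything about the disease itself.
```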

    • > Depending on control measures and other factors, cases may come in waves of different heights (with high waves signaling major impact) and in different intervals. We present 3 possibilities.

      Scenario 1: The first wave of COVID-19 in spring 2020 is followed by a series of repetitive smaller waves that occur through the summer and then consistently over a 1- to 2-year period, gradually diminishing sometime in 2021. The occurrence of these waves may vary geographically and may depend on what mitigation measures are in place and how they are eased. Depending on the height of the wave peaks, this scenario could require periodic reinstitution and subsequent relaxation of mitigation measures over the next 1 to 2 years.

      Scenario 2: The first wave of COVID-19 in spring 2020 is followed by a larger wave in the fall or winter of 2020 and one or more smaller subsequent waves in 2021. This pattern will require the reinstitution of mitigation measures in the fall in an attempt to drive down spread of infection and prevent healthcare systems from being overwhelmed. This pattern is similar to what was seen with the 1918-19 pandemic (CDC 2018). During that pandemic, a small wave began in March 1918 and subsided during the summer months. A much larger peak then occurred in the fall of 1918. A third peak occurred during the winter and spring of 1919; that wave subsided in the summer of 1919, signaling the end of the pandemic. The 1957-58 pandemic followed a similar pattern, with a smaller spring wave followed by a much larger fall wave (Saunders-Hastings 2016). Successive smaller waves continued to occur for several years (Miller 2009). The 2009-10 pandemic also followed a pattern of a spring wave followed by a larger fall wave (Saunders-Hastings 2016).

      Scenario 3: The first wave of COVID-19 in spring 2020 is followed by a “slow burn” of ongoing transmission and case occurrence, but without a clear wave pattern. Again, this pattern may vary somewhat geographically and may be influenced by the degree of mitigation measures in place in various areas. While this third pattern was not seen with past influenza pandemics, it remains a possibility for COVID-19. This third scenario likely would not require the reinstitution of mitigation measures, although cases and deaths will continue to occur.

  6. What I would like to see from any modeling group is a transparent but accessible report of the performance of their models: a grid of plots showing predictions from multiple versions of their models (not just the latest version, and by model I mean modeling approach), trained using data up to multiple time points (2 months ago, 1 month ago, today, etc. instead of just today), under different assumptions (including policy scenarios), etc. Basically out-of-sample validation.

    Are there modelers doing some version of that?
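    Something along those lines is straightforward to sketch. Below, fit_model and the deaths series are toy stand-ins (not anyone's real forecasting code): refit at several past cutoffs and score each vintage's forecast against what actually happened.

```python
# Sketch of rolling-origin ("train up to multiple time points") validation.
# fit_model and deaths are placeholders: plug in any model version and any
# observed cumulative-death series.
import numpy as np

def fit_model(train):
    """Toy stand-in for one model version: naive exponential extrapolation of
    the last week's average growth rate in cumulative deaths."""
    rate = np.mean(np.diff(np.log(train[-7:] + 1.0)))
    return lambda horizon: (train[-1] + 1.0) * np.exp(rate * np.arange(1, horizon + 1)) - 1.0

deaths = np.cumsum(np.random.default_rng(1).poisson(50, size=120)).astype(float)  # fake observed series

for cutoff in (60, 75, 90, 105):                 # "trained through" day 60, 75, ...
    forecast = fit_model(deaths[:cutoff])(14)    # 14-day-ahead forecast from that vintage
    actual = deaths[cutoff:cutoff + 14]
    err = np.mean(np.abs(forecast - actual) / actual)
    print(f"trained through day {cutoff}: mean abs error over next 14 days = {err:.1%}")
```

    The same loop, repeated across model versions and assumption sets, gives exactly the kind of grid described above.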

  7. The field of economic modeling (predicting the future) attracts lots of money even though as far as I can see any success is only like the chimps throwing darts at the dartboard. Some random successes. Yet “ignoring expert economists” generally gets treated with derision.

    Epidemiology seems a lot simpler by comparison. But it has the problem of exponential growth. (Most people on this blog are way more maths literate than me, but for those who aren’t, see the simple thought experiment at the end). Changing a few unknown parameters by even small amounts can produce 1000x more or less cases/deaths within a month. And that’s changing them within very reasonable bounds.

    And again, “ignoring expert epidemiology” seems to likewise be treated as being anti-science. Or in other quarters the large changes in model outputs from one month to the next are treated as examples of how little these epidemiologists know.

    How about:

    “We have very little idea what will happen but it will be a lot worse if we do this rather than that.”

    I suggest that this group would get fired and replaced with a new group with lots of maths and charts. Somehow certainty is a bankable trait, even when based on nothing.

    —-
    Example – every person on average infects 2 other people every 4 days. How many people will be infected within a month? Now we slightly adjust these parameters to infecting 3 other people every 3 days. How many more people will be infected within a month?
    For non-maths people I’m guessing they might be thinking 2x, 3x maybe. But it’s more like 500-1000x.
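    One simple way of counting it out, treating the outbreak as discrete generations (the exact multiple depends on how you count, but it is a factor of several hundred under any reasonable accounting):

```python
# Back-of-the-envelope for the thought experiment above, counting cases in the
# latest generation after 30 days under each set of assumptions.
slow = 2 ** (30 / 4)        # 2 new infections per case, 4-day generations: ~181
fast = 3 ** (30 / 3)        # 3 new infections per case, 3-day generations: 59,049
print(slow, fast, fast / slow)   # several hundred times more with this counting;
                                 # other ways of counting give a similar order of magnitude
```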

    • Just to follow up on how difficult any kind of solid numbers are…

      We currently have very little clarity on the strength of 3 different methods of transmission.

      – Is it mostly the virus in small water droplets, which travel usually less than 2 meters?
      – Aerosols, which can float around for hours and be recycled by the air conditioning system?
      – Virus landing on surfaces and being picked up by the next victim?

      Not knowing the strengths of these mechanisms means that the effect of changing specific behaviors is mostly unknown.

      Example recent papers:

      Turbulent Gas Clouds and Respiratory Pathogen Emissions – Potential Implications for Reducing Transmission of COVID-19
      https://jamanetwork.com/journals/jama/fullarticle/2763852

      Prolonged viability of SARS-CoV-2 in fomites
      https://files.osf.io/v1/resources/7etga/providers/osfstorage/5e9c0463d697350662be334e?action=download&direct&version=1

      • Stevec says: “We currently have very little clarity on the strength of 3 different methods of transmission.”

        FINALLY! This is the main question!!!

        But we have a lot of clarity on this:

        Cruise ships. Nursing homes. Health care workers. Immigrants and people in large families. Places with small rooms and close quarters generate high numbers of infected people.

        Throughout the SIP orders, grocery, hardware, office supply and drug stores have been open with no apparent ill effects. Places with high ceilings and high airflow don’t generate large numbers of infected people.

        This tells us that the primary mode of transmission isn’t drops left on surfaces: it’s aerosol transmission in close quarters with poor circulation.

        It also tells us how we can move forward in relative safety. First, people who work outside should be able to continue working. Second, all indoor spaces should do everything possible to increase air circulation and replacement. Third, most stores should be able to open at limited capacity with social distancing maintained. Fourth, offices should be able to reopen with limited capacity – meaning people should continue to work from home when possible.

        OTOH, restaurants, hair salons, barber shops, etc – typically small spaces with many people, and where employees and customers work closely – should only open with very limited capacity, limited employees, increased air flow, and mask requirements.

        Last but not least, social gatherings should be restricted tightly for at least a while longer and concerts, indoor sports events, grad ceremonies etc should be cancelled or have attendance severely restricted.

        If we continue thinking through our environment in this way we should be able to restore our economy without an explosion of infections.

        • It’s not really clear yet if public transportation is a major infection environment. People are in close quarters in public transportation, but they don’t remain that way for many hours, as would happen in a retirement home or cruise ship; they change over rapidly. Also with people frequently moving and doors opening and closing, air flow is probably much stronger in public transit than in other crowded environments. But initially at least public transit should have restricted capacity.

        • I remember from the beginning, when tracing of infection chains was still possible, that in China (and Germany as well) some cases could be traced to bus transportation. The persons were not in close proximity, so it was concluded that the infection must have happened via aerosol transmission.

          There was a study published (iirc) that found viable virus loads in aerosols “hanging” in the air even after ~2 hours.

          Otoh, infections sometimes did not happen even when people must inevitably have inhaled aerosols, as in families or cab transportation.

        • Jakob Said:

          >some cases could be traced to bus transportation.
          Sure. I believe that. I’m sure you *can* catch it anywhere at any time. The question is what’s the probability? I think it’s clear that transit has at least an elevated probability.

          >virus loads in aerosols “hanging” in the air even after ~2 hours.
          This of course requires that the air is still. If, for example, someone’s breathing out virus particles in Home Depot with strong air flow, the high concentrations they breathe out could disperse fairly quickly, especially when the ceiling is 50 ft high.

          >Otoh infections did not happen even if people inevitably should have inhaled aerosols
          I believe that too, again probability depends on many things: viral concentration, air flow, duration of exposure, and individual variation.

        • > Example recent papers

          > FINALLY!

          My reaction was similar. This sort of mechanistic information is really useful for the day to day — definitely more so as an individual than these big projections.

          Isn’t everyone kinda wishful-thinking this will all blow over in the Summer (at least temporarily)? I know I am! Do we know anything about the effects of temperature/humidity on the lifespan of the virus in air? Isn’t that what makes the flu less of a summer problem?

          DLakeland posted a video of someone breathing with/without a mask that made a really compelling case for mask wearing. I don’t think I really understood what a mask does in terms of airflow until I watched it. I guess it should be obvious? I dunno. But any “Wear a Mask” PSAs should be accompanied by such videos/photos.

          Also this post definitely affected my behavior: https://statmodeling.stat.columbia.edu/2020/03/17/do-these-data-suggest-that-ups-amazon-etc-should-be-quarantining-packages/ . The second paper linked “Prolonged viability of SARS-CoV-2 in fomites”, seems to say the virus lives a lot longer on plastic than the 3/17 post! Oh no!

          > Cruise ships. Nursing homes. Health care workers. Immigrants and people in large families

          > grocery, hardware, office supply and drug stores have been open with apparent no ill effects

          Well the other difference is that in the first case, the people involved are staying in the same place. In the second case lots of people are transient.

          > If we continue thinking through our environment in this way we should be able to restore our economy without an explosion of infections.

          Yeah I agree. This mechanistic stuff seems really valuable. Maybe this information is out there and we’re just drinking from the wrong firehose? It’s really hard to know how to act if we’re trying to think about covid19 like a big 1/N game of Russian roulette.

        • “I don’t think I really understood what a mask does in terms of airflow until I watched it”

          Yeah the mask retards the velocity of exhalation and even if the opening sizes are larger than the virus, it could trap some viral particles.

          Experts didn’t pick up on the environmental difference between use by health care workers and use by civilians. For health care workers the opening size makes a huge difference because the primary concern is inhalation from a high concentration environment.

          For civilians the opening size is less important because the mask is to prevent *exhalation* not inhalation, and during exhalation the mask retards air flow as well as trapping some particles; so the spread is lower as well as the number of viral particles emitted.

          But using non-medical masks in indoor gatherings would fail as the experts have claimed, because the longer people stay together the greater the concentration of particles in the air and the more the inhalation becomes relevant.

  8. Andrew wrote: “If you make your model too transparent, then anyone can run it! And then what happened to your expertise?”

    Andrew also wrote: “I’m not saying that anyone is making their methods obscure on purpose. Not at all.”

    I think what you are saying is that researchers are responding to this incentive (and others) by choosing to passively allow misleading results out into the world, but they’re not choosing to actively hide model details. I think we should call this the Model Trolley Problem:

    If you were to realize that your own model is flawed and potentially harmful to public interest, would you be willing to actively hide details of your model to protect the work? Would you be willing to passively allow your results to go out with less detail to the same effect?

    Like the actual Trolley Problem, people die either way, but the person making the choice is likely to see themselves as less responsible for the consequences when their involvement is mostly passive.

    We’ve discussed many research groups that have answered the passive version of the dilemma with an emphatic “Yes!” Would they have the scruples to answer the active version negatively? Who knows? Researchers rarely have to make that choice because there are so many loopholes built into our standard practices that almost any unethical end can be achieved with total deniability. To coin a phrase, all that is necessary for bad science to triumph is for good researchers to do nothing unusual.

    • Very interesting. If I understand it, they’re refuting the popular heterodox notion that ventilators are killing people and just suggesting that a better assembly line for ventilating patients saves lives. Ah… But where’s the RCT? Without it, aren’t results supposed to be meaningless?

      • “Ah… But where’s the RCT? Without it, aren’t results supposed to be meaningless?”

        I wouldn’t say that “results are supposed to be meaningless” without an RCT. I’d just say that they are preliminary — and might be evidence for trying an RCT.

      • What I had heard (from non-medical media) is that the *way* NYC was using ventilators early on was killing people – not that ventilators shouldn’t be used at all, but that some patients who didn’t really need ventilators were being put on them and too-high, lung-damaging flows/volumes were used.

        I don’t know if that’s true, given the tendency of media to confuse scientific/medical details, but I don’t think it’s incompatible with this report – that document says “Patients in our cohort were managed with established ARDS therapies including low tidal volume ventilation […]”.

        • Also, when you are putting more people than you should on ventilators you can’t micromanage them as well as when you only have a few.

        • True. And Boston might also have better outcomes because the health workers are less overworked, and therefore making better decisions, than in NYC at its peak?

          Also, the Boston report says that “prone ventilation” was used. Proning was another thing that was reported in the (non-medical) media as something that was missed early on.

  9. “But, as we plan for how to reopen in each state of our union, we need to know what extent of regrowth we can manage.”

    I’m no statistician but just a visitor following a link from another site. For the vast population of non-healthcare sector workers and business owners, I would imagine it’s actually how much regrowth we, as a complex society that has made it through many bad viral outbreaks, can “accept” rather than “manage.”

    Another layman’s question: Do we really know for certain what increased mobility due to opening up will in fact do, or might a modeler just assume that, knowing it would deliver the increase in cases he, possibly subconsciously, desires?

    • Paul:

      We can’t know for certain or even to a good approximation what might happen next. That’s one reason why it’s important for the models to be transparent, so that (a) we understand the assumptions leading to their conclusions, and (b) when, inevitably, the model’s predictions fail, we can understand what happened.

      • Just another very lay person here, also following a link from another site – hmm, a trend? I was interested to read your comment on model transparency.

        Here in Illinois, our governor claims to be using several models in his decision making process but declines to identify details about those models and the researchers doing the work. There’s some evidence that they are modellers at the University of Chicago and the University of Illinois at Chicago. Academics there have declined to discuss their models with the public even in general terms, let alone detailed ones.

        I believe this is bad public policy for a few reasons: First, it prevents other qualified modellers from offering criticisms or improvements.

        Second, it prevents the public from understanding even in a general sense the basis for the governor’s decisions. Ironically, that works to the governor’s disadvantage because it undermines public support for those decisions by making them seem capricious or ill-thought-out.

        Third, it drives a wedge between academics and the general public. The general public hears, “You’re too ignorant to understand our work so we won’t even bother to try to explain it.” Then some of that public responds, “We’re not gonna let a bunch of so-called experts push us around!” and refuses to follow even well-supported science. Or in finest Scott Walker fashion, declares, “If you’re gonna talk to us like that, we’ll show you! We’ll cut your budget and salaries!”

        • Tj:

          I agree. As I wrote in a post last month, the war against coronavirus is different than a war against a human enemy, as there is no need for secrecy or deception.

        • There is no need for secrecy or deception unless one is intent on using the virus as an excuse for a totally different war.

        • It’s not that vague to me.

      Every day in our editorial pages I see another cause grasping on to the COVID coattails. No one wants to miss out on the massive cash hoard from the fed and the states!! Git yer hand out now before it’s too late!!!

      • Transparency is one thing, broadcast to the general public is another. The general public do not, as a rule, have the background or experience to understand all of the assumptions of the models, and either do not expect expert predictions to fail and are disappointed when they do, or have come to expect expert predictions to fail, and when they do, blame the experts.

        • On transparency – and broadcasting to the public:

          –snip–

          Hours after Doug Ducey, the Republican governor of Arizona, accelerated plans to reopen businesses, saying the state was “headed in the right direction,” his administration halted the work of a team of experts projecting it was on a different – and much grimmer – course.

          […]

          “We’ve been asked by Department leadership to ‘pause’ all current work on projections and modeling,” Steven Bailey, the bureau chief for public health statistics at the Arizona Department of Health Services, wrote to the modeling team, composed of professionals from Arizona State University and the University of Arizona, according to email correspondence reviewed by The Washington Post.

          The move to sideline academic experts in the middle of the pandemic reflects growing friction between plans to resume economic activity and the analysis of epidemiologists that underscores the dangers of rolling back restrictions. Officials in Arizona said they would rely on “real-time” information, as well as modeling conducted by federal agencies, which is not released publicly.

          […]

          The Arizona health department was pulling back “the special data sets which have been shared under this public health emergency effort,” according to the Monday email from Bailey, which was first reported by an ABC affiliate in Phoenix.

          […]

          Going forward, Arizona will use modeling developed by the Federal Emergency Management Agency and the Centers for Disease Control and Prevention that “ensures our hospitals have capacity for any situation,” Ptak said.

          But Humble said the state is eluding accountability by relying on nonpublic modeling.

          https://www.thehour.com/news/article/Arizona-halts-work-of-experts-predicting-a-later-15252336.php

    • “Do we really know for certain what increased mobility due to opening up will in fact do”

      We don’t have the foggiest idea. All we know is that if you keep people apart they can’t transmit the virus. How they actually do transmit the virus when they’re not kept apart is irrelevant in models AFAIK. (enlightenment may await me but I don’t think they’re quite down to modeling the transmission rate in sports bars vs. brew pubs, at least not for *this* disease)

      I suspect in most models it’s either explicitly or implicitly assumed that there is a constant transmission rate for all activities in the area in question that can vary over time. That’s what makes this such an impressive virus: its geographic knowledge is amazing! It’s hip to the political meaning of the 49th parallel and the St Lawrence River (US vs Canada); the crest of the Rockies (Idaho and Montana); the now-abandoned course of the Mississippi River (Arkansas, Tennessee, Mississippi and Louisiana); and the Rio Grande (US Mexico), not to mention any number of other real geographic features and imaginary geographic lines.

      • You might want to rephrase the idea of a “constant transmission rate that can vary over time”.
        And I’m sure nobody thinks that the base transmission rate for a state is caused by its geography.

        • Might have been sarcasm? :)

          There is evidence for the superspreaders – like in Germany’s Gangelt case or in Singapore’s recent outbreak – so mobility in combination with close proximity are still conditions where transmission is very probable. Usage of face masks should/could help, but I’d assume it is of limited impact if distances are under 6 ft and fast, constant air exchange isn’t ensured.

        • Mendel Says:

          “You might want to rephrase the idea of a “constant transmission rate that can vary over time”.”

          Fair enough :) Transmission rate that’s a function of time only, and not a function of the environmental factors that actually control it.

          “And I’m sure nobody thinks that the base transmission rate for a state is caused by its geography.”

          Then why are they modelling by states, countries and cities? I was being facetious but the reason they’re modelling on political boundaries is that they’re seeking outcomes controlled by government interventions, which introduces a very strong and completely irrelevant bias. It’s safe to say transmission rates are higher in Seattle WA than in Dusty WA, but the models blend these two places together as though we could step from one to the other.

        • OK, well, government interventions aren’t irrelevant, but it’s definitely problematic to assume they’re the only factor.

        • “OK, well, government interventions aren’t irrelevant, but it’s definitely problematic to assume they’re the only factor.”

          Well, I guess it’s a good thing then that the IHME model doesn’t assume it’s the only factor, right?

        • Don’t assume intent when laziness can explain it?
          They’re probably modeling on political boundaries because the source data is aligned on political boundaries by default. The actual differences are caused by demographic variations etc.
          (Why is the transmission rate higher in Seattle than in Dusty?)

        • Martha (Smith)

          Mendel said, “And I’m sure nobody thinks that the base transmission rate for a state is caused by its geography.”

          Not “caused by” — but “influenced by” or “affected by” or “partially dependent on” sure fit the situation.

  10. I’m not seeing a huge contradiction in adjusting the model for longer tails and saying “it’s because social distancing isn’t as strong as we assumed”. The original model here is based on the published data from Wuhan, where cases and deaths dropped comparatively sharply due to a rigorous lockdown and big-brother-like contact tracing and concomitant isolation, with R estimated at 0.5 or below in this phase. No Western country has been able to emulate that; (four-day) R=0.7 seems to be the best that Germany can achieve.
    So, less efficacy of social distancing leads to a model that accounts for longer tails, and relaxing the measures sooner exacerbates that. So the message isn’t exactly wrong, just oversimplified. We’re seeing lots of messaging on social media that is oversimplified and false; so something that is oversimplified and mostly correct looks acceptable these days.
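    To put rough numbers on the tail point (illustrative R values only): if cases shrink by a factor of R each generation after the peak, the time to fall to some low level scales like log(target)/log(R), so small differences in post-peak R stretch the tail a lot.

```python
# Rough check of the "longer tails with weaker suppression" point: with cases
# shrinking by a factor R per generation, how long until 1% of the peak?
import math

for R in (0.5, 0.7, 0.9):   # hypothetical post-peak reproduction numbers
    generations = math.log(0.01) / math.log(R)
    print(f"R = {R}: ~{generations:.0f} generations to reach 1% of peak")
# R = 0.5 -> ~7 generations, R = 0.7 -> ~13, R = 0.9 -> ~44: small differences
# in R translate into much longer tails.
```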

    That said, there ought to be another level of reporting that’s accessible to an educated, non-specialist audience. That’s the kind of transparency we also need.

    • “I’m not seeing a huge contradiction in adjusting the model for longer tails and saying “it’s because social distancing isn’t as strong as we assumed”.”

      The contradiction is between their public statements and what they’ve said in their somewhat technical update document. The latter clearly states that changing the function they fit to the death data was the primary cause.

      California is a great example. Their projection of total deaths for California more than doubled (the new projection is 119% higher than the May 3rd model’s projection), and there is ZERO change over the life of the projection of the social distancing metric. ZERO. It is locked in at a 50% reduction in mobility from the pre-social distancing data.

      So obviously less social distancing is not the source of doubling. It is only due to the change in the model itself.

      You can go and see that here:

      https://covid19.healthdata.org/united-states-of-america/california

      • There’s been some sort of weird feedback loop in the IHME stuff almost from the start. They made a laughably bad initial curve-fitting choice and since then it’s like they just try arbitrary changes and ask their intended consumers (state and federal agencies?) “Does this look better?”. If not, they try something else more or less at random. Or so it appears.

    • I agree with about all you have written, Mendel. I think that masks were mandated in China as well.

      There are good commentaries coming out of Europe. Trust or lack of trust in expertise is an issue, no doubt. What would conflict-of-interest-free reporting look like, though? It is marginalized for a variety of reasons.

  11. This week The Economist had a column discussing the evolution of climate modeling and I was struck by the parallels with COVID modeling. Both are complex systems (I’m thinking their complexities are similar, within an order of magnitude on some measure). Climate modeling appears to have vastly larger amounts of data, and data that is generally of higher quality (since much of it consists of physical measurements, which while subject to measurement error, seem to me to be less subject to error than measuring things like cause of death). The column describes how decades of climate modeling have resulted in several collaborative international efforts to build models – and the models have considerable uncertainty in their forecasts. But the models are useful (if the uncertainty is appropriately recognized).

    It seems to me that we are trying to do with COVID what has taken decades with climate modeling, but to accomplish this in weeks, not decades. The pressing timetable is understandable, as the decisions required must be far more accelerated. However, the modeling is not close to the type of consensus achieved with climate models – and, even the latter is far from lacking in controversy. I would suggest that the models are not up to the task required for dealing with this pandemic and the necessary decisions that need to be made.

    Of course, that does not mean we should abandon models. But it seems to me that any model that begins with an assumed IFR or an assumed R0 is not worth examining any further. At the minimum, a model should begin with ranges of plausible estimates of key parameters, along with documentation of the sources. Anything less becomes an exercise in navel-gazing modeling. Exponential curves diverge remarkably over fairly short time frames. I’m not sure there is much contributed by each new study and each new iteration that simply changes some assumptions and shows a new set of highly granular forecasts.
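    As a concrete version of the “ranges of plausible estimates” point, here is a toy sweep over R0 and IFR using a deliberately crude daily-step SIR model; the parameter ranges and settings are illustrative, not endorsed estimates.

```python
# Sketch of sweeping a simple SIR model over ranges of R0 and IFR to see how
# widely projected deaths fan out. All parameters here are illustrative.
def sir_deaths(R0, ifr, days=180, N=10_000_000, infectious_period=5.0, seed_cases=100):
    beta, gamma = R0 / infectious_period, 1.0 / infectious_period
    S, I, R = N - seed_cases, float(seed_cases), 0.0
    for _ in range(days):                        # crude daily-step SIR updates
        new_inf = beta * S * I / N
        new_rec = gamma * I
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return ifr * R                               # deaths among resolved infections

for R0 in (1.5, 2.0, 2.5, 3.0):                  # illustrative range of reproduction numbers
    for ifr in (0.003, 0.006, 0.01):             # illustrative range of infection fatality rates
        print(f"R0={R0}, IFR={ifr:.1%}: ~{sir_deaths(R0, ifr):,.0f} deaths by day 180")
```

    Even this toy version makes the point: within entirely plausible parameter ranges the projections differ by multiples, before any structural disagreement between models enters the picture.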

    I think we need to distinguish between the ripeness of the moment for exploring pandemic modeling and the potential use of these models to inform decisions. The former is to be celebrated – what a great time for mathematical modeling to be explored and appreciated. It is a great time to be teaching this, and learning this. But that does not mean that these efforts can be productive for the immediate decisions that are required. I think what is missing is more research on how to make decisions under such extreme uncertain conditions, rather than additional models that reflect how little we actually know. I’m thinking that more attention needs to be paid to ideas such as fragility, precautionary principles, irreversible decisions, and other concepts that emphasize the precariousness of the moment. The pursuit of the ‘one model to rule them all’ seems to lack the humility this requires. But the potential publicity and recognition, combined with the political receptiveness of the audience, seems to be an irresistible attraction.

    • Dale –

      I think there’s another similarity, which isn’t really a function of the models themselves but of how people use the models to confirm biases and fight tribal wars.

      In the climate world, some people look only at the high end projections of models, basically ignore the low end projections and confidence intervals, misconstrue the basic nature of the models’ output (either due to a lack of understanding or due to motivated reasoning) as predictions (and not projections that are contingent on various parameters), and then conclude that the models are useless. Often they go on to reverse engineer malignant motivations on the part of modelers.

      Many of them go even further to denigrate the very exercise of modeling, effectively ignoring their own tacit acceptance of modeling output in their everyday lives in non-politicized contexts, and indeed even the reality that basically the only way anyone understands anything is through a reliance on the construction of mental models.

      The problem is captured well in the adage that all models are wrong and some are useful. To that we can add that their usefulness is highly subjective and depends on which values one brings in as their evaluative criteria.

      There is necessarily a balance, IMO, in determining to what extent “error” translates into “wrong” when we evaluate modeling. As the adage implies, even “wrong” models can be useful. At some point, however, it seems to me that the magnitude of error can render a model effectively useless.

      But you can’t even get to that question if people mistakenly look at models as making predictions rather than conditional projections, and truncate ranges and fail to incorporate confidence intervals as they evaluate models’ usefulness.

    • > but to accomplish this in weeks, not decades

      Yeah. This goes back to the cathedral/bazaar thing — even the Manhattan Project took years.

      > I think what is missing is more research on how to make decisions under such extreme uncertain conditions

      My question is what decisions can be made at local levels for stuff like this? I think at this point, Federal thinking/decision making is a wash.

      At a state level in NY, it’s pretty clear that covid-19 is turning into some sort of weird political power-play:

      1. State decides budget in early April, allowing for $10 billion in cuts to public spending without raising taxes: https://www.governor.ny.gov/news/governor-cuomo-announces-highlights-fy-2021-budget

      2. In late April state threatens to make those cuts unless feds provide the money: https://www.cityandstateny.com/articles/politics/new-york-state/cuomo-warns-82-billion-cuts-localities.html

      I assume the state is serious, and I am not taking any federal bailout for granted. So state level NY is a wash.

      And so what is the decision making unit here? How can scientists even help? I’m not volunteering, but I am curious.

      Presumably even if they haven’t done it yet, states/cities/localities should be developing testing plans. It’s clearly not coming from the top down. What are the roles of universities/local governments/local media in this? Is there anything salvageable there, or is it just gonna be a free for all?

      > The pursuit of the ‘one model to rule them all’ seems to lack the humility this requires. But the potential publicity and recognition, combined with the political receptiveness of the audience, seems to be an irresistible attraction.

      Yeah.

    • This is a really excellent comment, and its points deserve to be widely heard and considered. If one knows the underlying processes and the issue is parameterizing on the basis of relative effect strength or timing, I can see modeling being the primary basis for decision-making. In the absence of this it is really a thought experiment: suppose we posit X1, X2, X3…. and calibrate to past experience, what would we expect to happen? There is a lot to be gained by thought experiments, especially if we conduct lots of them and begin to see high-level patterns emerging. But the result of any particular such experiment can’t bear much weight.

      I’ve paid a fair amount of attention to the integrated assessment models (IAMs) used in climate research, and that’s what I’ve found. I wouldn’t put much stock in any particular trajectory (policy mix X leads to climate stabilization Y) over the next 40 years or whatever. There is much to be learned from looking at the extent to which forecasts vary according to structural assumptions (and not just parameter uncertainty). And then that knowledge can be an input into the kind of decision-making Dale describes.

      • In other words, right back to fundamental concepts in Bayesian Decision Theory. It’s not any given sample from the posterior that matters… if we posit say 10,000 independent samples from the posterior, *each one* counts equally.
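        A minimal sketch of what that looks like operationally (the posterior draws, costs, and loss function below are all placeholders, not a real analysis): score each candidate action against every draw and compare expected losses.

```python
# Minimal sketch of posterior-averaged decision analysis: every posterior draw
# counts equally in the expected loss of each candidate action.
import numpy as np

rng = np.random.default_rng(42)
# pretend these are 10,000 posterior draws of deaths under each policy (placeholders)
draws = {
    "reopen now": rng.lognormal(mean=10.5, sigma=0.6, size=10_000),
    "stay closed 1 month": rng.lognormal(mean=9.8, sigma=0.4, size=10_000),
}
econ_cost = {"reopen now": 0.0, "stay closed 1 month": 50e9}   # hypothetical dollar cost
value_per_life = 7e6                                            # hypothetical, itself uncertain

def loss(deaths, action):
    return deaths * value_per_life + econ_cost[action]

expected_loss = {a: np.mean(loss(d, a)) for a, d in draws.items()}
print(min(expected_loss, key=expected_loss.get), expected_loss)
```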

    • As a climate modeller who has looked into the covid thing recently, there is one huge difference that I am still struggling to get over. Climate modellers compare their model outputs to reality, and attempt to calibrate the parameters of the model so as to give reasonable agreement.

      Epidemiologists don’t. At least, the ones in the UK don’t. Which….leaves me absolutely stunned.

      • I suspect the climate modeling world has its share of people doing what you’re seeing among the Epi modelers. The purpose (often stated, sometimes implied) of these COVID-19 models is to scare the world into doing what the modelers feel is right (i.e. lockdown early and stay locked down indefinitely).

        Surely in the climate change arena you’ve come across plenty of people building models designed for maximum shock and awe. That’s the sort of thing it’s difficult to recalibrate to reality, because it’s not reality they’re trying to convey. They are trying to show how if people don’t do right, the situation will be far, far worse than current reality.

        • Actually….no, you are wrong in just about every aspect. There are a few around the fringes doing wacky out-there stuff designed for impact, but almost all the mainstream climate modelling is just people trying to do their best (a bad model gets found out and laughed at pretty quickly, no scientist would want to be associated with such a thing). The epi people are much the same, they just have a complete blind spot about calibration for some reason, probably cultural.

        • I doubt if the modelers are interested in scaring the public. OC, the public has no experience with exponential growth, and, when they find out, are likely to be scared.

          I addressed this question some days ago. The 100,000 – 240,000 projection, while scary, was not anywhere close to what they could have come up with if they wanted to scare the public. In fact, the President now seems to be using that projection as a baseline.

    • Dale and Jonathan both provide very important comments.

      Dale said,
      “I think what is missing is more research on how to make decisions under such extreme uncertain conditions, rather than additional models that reflect how little we actually know. I’m thinking that more attention needs to be paid to ideas such as fragility, precautionary principles, irreversible decisions, and other concepts that emphasize the precariousness of the moment. The pursuit of the ‘one model to rule them all’ seems to lack the humility this requires. But the potential publicity and recognition, combined with the political receptiveness of the audience, seems to be an irresistible attraction.”

      Is it more *research* on how to make decisions under such extreme uncertain conditions? Or rather more *discussion* about making decisions under uncertainty? I think it’s the latter. There seem to be a lot of books these days with titles like “Decision Making Under Uncertainty”. (Perhaps discussion and critiques of some of these books would be a worthwhile topic for discussion on this blog. I can easily imagine that at least some of these books end up giving “rules” for decision making under uncertainty — one-size-fits-all pronouncements that don’t fit all circumstances.) And we probably need (in teaching and writing) more discussion/emphasis on ethics — things like being careful to state your premises, and emphasizing that your conclusions aren’t valid if those premises are not satisfied.

      Joshua’s point that “The problem is captured well in the adage that all models are wrong and some are useful. To that we can add that their usefulness is highly subjective and depends on which values one brings in as their evaluative criteria” is one that is important, but is more often than not neglected in teaching statistics and scientific reasoning. Lots we need to do here to improve things!

        • Well, I’ve written two such books on decision making under uncertainty myself! But I don’t provide rules – rather I emphasize the need to be explicit about the sources and extent of important uncertainties. My books emphasize Monte Carlo simulation methods, but also discuss their limitations. The “research” I am suggesting could also be considered “discussion” and involves how to make decisions such as we face currently. While Monte Carlo simulation might be possible and productive, I suspect it would look like yet another model “to rule them all.” I think the nature of the risks we are facing does not currently lend itself to that kind of quantitative analysis (I hesitate to say this, since certainly some analyses are better than others, and it may be possible to produce a valuable model for our current decisions). Rather, I’d like to see some novel (pun alert!) thinking about how to make the kind of decisions we are currently making. What I keep seeing is the same methodologies applied in a dizzying array of models, showing almost any decision that you may favor as the “right” decision to make.

        • I think essentially NO effort has been made to do a principled Bayesian decision theory analysis with a real-world cost function (which itself may have uncertain components, such as “prices” for life years, etc.).

        • Dale –

          > I think the nature of the risks we are facing do not currently lend themselves to that kind of quantitative analysis (I hesitate to say this, since certainly some analyses are better than others, and it may be possible to produce a valuable model for our current decisions).

          I think they are a tool – to hopefully help to bound the uncertainties. Unfortunately, they are often seen as a way to eliminate uncertainties (which they aren’t intended to and can’t do) or as a way to leverage uncertainty (given that they have to be wrong) as a weapon to hamstring decision-making.

          Again, however, I don’t think that is a problem with the models but a problem with how people will use any information or analysis in contexts like climate change, and now even COVID, to confirm biases and advance tribal agendas. The fact that models become a focal point and leverage point in the climate wars isn’t a function of the models themselves. Climate models and COVID models are wrong like any models. But even if they were perfect they would still be used as weapons in the arsenal of climate warriors.

        • Joshua said,
          “I think they are a tool – to hopefully help to bound the uncertainties. Unfortunately, they are often seen as a way to eliminate uncertainties (which they aren’t intended to and can’t do) or as a way to leverage uncertainty (given that they have to be wrong) as a weapon to hamstring decision-making.

          Again, however, I don’t think that is a problem with the models but a problem with how people will use any information or analysis in contexts like climate change, and now even COVID, to confirm biases and advance tribal agendas. The fact that models become a focal point and leverage point in the climate wars isn’t a function of the models themselves. Climate models and COVID models are wrong like any models. But even if they were perfect they would still be used as weapons in the arsenal of climate warriors.”

          So the question is: How can we get practitioners out of this mindset of misuse/abuse of models? I don’t know the answer — except possibly to recruit enough people to persistently say, “Why are you doing it this way? Why are you choosing this model? Why? Why? Why?”

        • Dale said, “What I keep seeing is the same methodologies applied in a dizzying array of models, showing almost any decision that you may favor as the “right” decision to make.”

          I agree that this indeed would be disconcerting!

  12. One nitpick. I looked at the faculty at the IHME and it doesn’t seem to be staffed with economists. MDs, mathematicians, epidemiologists. I’m not sure where you got the impression that it is an economics division.

    • Tom:

      The director of the center, Chris Murray, is described as “a physician and health economist.” I was focusing on the economics but I agree that the physician part is relevant too.

      Going to the page on their senior management team and looking at the professors, I see:

      Joseph Dieleman, Associate Professor, PhD in economics
      Emmanuela Gakidou, Professor, PhD in health policy
      Simon Hay, Professor, doesn’t say what his final degree is, but it could be epidemiology
      Stephen Lim, Professor, PhD in epidemiology and health economics
      Rafael Lozano, Professor, MD and Masters in social medicine
      Ali Mokdad, Professor, PhD in quantitative epidemiology
      Christopher Murray, Professor, MD, DPhil in international health economics
      Mohsen Naghavi, Professor, MD, PhD in epidemiology
      Stein Vollset, Professor, MD, PhD in biostatistics
      Theo Voss, Professor, MD, PhD in epidemiology and health economics

      So, yeah, I guess you’re right. Adding it up, I see:
      5 MD’s
      4 or 5 doctorates in epidemiology
      4 doctorates in economics or health economics
      1 doctorate in health policy
      1 doctorate in biostatistics

      So there are at least as many epidemiologists as economists on the faculty.

      • That’s the management team. There are 59 faculty and affiliated faculty members at the center and they work on a variety of problems, not just this COVID forecast exercise. I agree that the director published the press release and article in MedRxiv along with “IHME COVID-19 health service utilization forecasting team.” It would be nice to know who actually developed the model as part of that team (Etzioni describes the director as “the chief model spokesman” not as the developer). If it was all economists, then perhaps they should let their non-economist colleagues at the interdisciplinary IHME be involved.

        My nitpick was labeling IHME as staffed by economists, either solely or as a majority (“In addition to the economists who staff the aforementioned institute”). It is still possible that this project was undertaken by economists since I can’t find the developers anywhere.

        Looking at the first row of faculty, you have: MD/MS epidemiology, PhD math, PhD epidemiology, PhD biochemistry (but works for the Gates Foundation), PhD environmental health (maybe economics?).
        http://www.healthdata.org/about/team/faculty

  13. From my experience in the private sector, achieving transparency in a model is quite challenging due to tight deadlines and the limited communication skills of many who work in model development.

    On the IHME model: I don’t really find it especially non-transparent. While I was not willing to spend the time to delve deeply into details, I could quickly see that it was essentially trying to extrapolate future deaths by assuming the mortality curves all had a shape determined by a common function that used the timing of social distancing policies as a factor. In some states, it appeared their timing was off. In all cases, the model neglected issues such as states phasing in policies as “advisory” at first, voluntary changes in people’s behavior, and variations in population density and structure.

    That was enough to conclude, even without a deeper dive, that the best one could say is that the model’s forecasts were not entirely implausible.
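    As a rough illustration of the kind of curve-fit extrapolation described above: fit a sigmoid-shaped cumulative-death curve to early data and project it forward. This is a generic stand-in, not the IHME implementation, and the “observed” data below are fabricated.

```python
# Generic sketch of extrapolating cumulative deaths with an ERF-shaped curve.
# Not the IHME model; the data and parameter values are invented.
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def cum_deaths(t, K, t0, s):
    """ERF-shaped cumulative deaths: K = final toll, t0 = peak day, s = spread."""
    return 0.5 * K * (1 + erf((t - t0) / s))

t_obs = np.arange(40)                                   # days observed so far
truth = cum_deaths(t_obs, K=60_000, t0=45, s=12)        # invented "truth"
y_obs = truth * np.exp(np.random.normal(0, 0.05, t_obs.size))  # noisy data

params, _ = curve_fit(cum_deaths, t_obs, y_obs, p0=[50_000, 40, 10])
K_hat, t0_hat, s_hat = params
print(f"projected final toll: {K_hat:,.0f} (value used to simulate: 60,000)")

# The fragility: the projected tail depends heavily on the assumed shape
# (here the single spread parameter s), which the early data barely constrain.
```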

  14. All models are abstractions of reality. Is the Murray IHME model an exercise in curve fitting? Perhaps/probably. Is there a degree of specificity being applied to the problem that is problematic? Evidently. Is there an implicit feedback loop that is (very) problematic (and somewhat unmodel-able given the gyrations coming from DC)? I believe yes. Initial high infection/fatality estimates drove policy toward social-distancing stratagems, which drove the numbers down, which drove the updated model’s estimates down, which allowed policy makers to conclude that the situation was improving, which drove the discussion more toward opening up the country, etc., etc. Curve fitting is more appropriate for inference, but clearly problematic in many instances wrt forecasting. Would a more math-heavy (differential equation) type of model be better? I dunno. Would a more parsimonious model be better? I don’t know that either, but in general that’s the advice from an econometric perspective. And while parsimony has been commented on in this blog (https://statmodeling.stat.columbia.edu/2004/12/10/against_parsimo/), I’d suggest that a simpler model, with fewer moving pieces and wider bands, would be less subject to scorn than what’s being used today. Bottom line: the math is the math on exponential growth, it’s all bad, and the real debate comes down to the trade-offs between public health and the economy.

    And I’ll throw in again the distinction between prediction and forecasting. The government is throwing point estimates out there under the prediction rubric and the media jumps on those predictions. A change in statistical tone is warranted. #standarderrors

  15. Is it generally understood that Covid-19 is mutating? It mutates within every person, and as it comes out of people it is mutating on a larger scale, suggesting recombinant mutation must be occurring. The notable effect so far has been toward greater transmissibility. That means reproduction rate estimates in models can be way off: as in, evidence shows that a strain can begin in an area and be displaced within weeks by a more transmissible one.

    See this paper for an example: https://www.biorxiv.org/content/10.1101/2020.04.29.069054v1.full

  16. In my opinion, the worst thing that the IHME did was set the bar so low that it induced so many people to do covid modeling. Think of all the hours (not to mention CPU cycles) wasted – mine included as I also got dragged into working on a paper.

    BTW the behavior of SIR-type models is tame enough to get away with simpler integration schemes such as forward Euler. Stan and other packages like TensorFlow Probability do more harm than good with DOPRI: the integrator gets stuck trying to adapt the step sizes when really bad parameters cause the ODEs to become stiff when they really shouldn’t be stiff.
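    For reference, a forward-Euler SIR integrator is only a few lines. This is a minimal sketch with invented parameter values, not an estimate for COVID-19.

```python
# Fixed-step forward Euler for the basic SIR model (fractions of population).
# Parameter values below are invented for illustration only.
import numpy as np

def sir_euler(beta, gamma, s0, i0, r0, dt=0.1, days=180):
    """dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    n = int(days / dt)
    S, I, R = np.empty(n + 1), np.empty(n + 1), np.empty(n + 1)
    S[0], I[0], R[0] = s0, i0, r0
    for k in range(n):
        new_inf = beta * S[k] * I[k]   # new infections per unit time
        new_rec = gamma * I[k]         # new recoveries per unit time
        S[k + 1] = S[k] - dt * new_inf
        I[k + 1] = I[k] + dt * (new_inf - new_rec)
        R[k + 1] = R[k] + dt * new_rec
    return S, I, R

# Example run: basic reproduction number beta/gamma = 2.5 (made up)
S, I, R = sir_euler(beta=0.5, gamma=0.2, s0=0.999, i0=0.001, r0=0.0)
print(f"peak infected fraction: {I.max():.3f}")
```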

    • Josh:

      If the “really bad parameters” are possible, then the stiffness of the differential equations is just something you need to deal with. If the “really bad parameters” are not realistic, then you should rule these out in the prior, which should resolve the problem. Approaches that don’t use priors won’t make the problem any better; they’ll just seem to work because you choose good starting values, but then you’re using the starting values as a sort of surrogate prior distribution, which is a bit hacky and can cause computational problems.

    • > the worst thing that the IHME did was set the bar so low that it induced so many people to do covid modeling

      This doesn’t seem like a bad thing to me.

      I thought the problem with the IHME model was some decisions were made based on predictions from it, and making predictions based on a polynomial curve fit is an established bad idea in epidemiology?

      > Stan and other packages like tensorflow probability do more harm than good with DOPRI – the integrator gets stuck trying to adapt the stepsizes when really bad parameters cause the ODEs to become stiff when they really shouldn’t be stiff.

      If you find problems with Stan’s ODE solvers, or in general have suggestions for probabilistic modeling with ODEs, throw that up on https://discourse.mc-stan.org/ and ping me @bbbales2. We don’t want to be doing the wrong thing. If it’s something we can’t work around, we’d appreciate a case study too (https://mc-stan.org/users/documentation/case-studies.html)!

      • I don’t necessarily think Stan does anything wrong… it’s just that real-world problems will not necessarily exclude regions of parameter space where the problem becomes numerically unstable without rather informative priors. Sometimes our probability knowledge could come in the form “if you need a tiny step size to solve this, this parameter isn’t very probable”… I’m not sure how you could express that in Stan.

        • > “if you need a tiny step size to solve this, this parameter isn’t very probable”… I’m not sure how you could express that in Stan.

          I’ve encountered this type of problem in other differential equation models and ended up using something like this kludge (e.g., assign -infinite log likelihood if the step size gets too small). Of course, this can’t be done strictly in Stan, but if we could make parameters of the ODE solver, well, parameters instead of data (or being invisible, I don’t think stepsize per se can be set; I’m a few versions behind, so maybe this is different now) we could put priors on them and do this in a principled way. It would essentially be a prior on the smoothness of the function, like you’d use in a GP model.

          Of course, I have no idea how difficult such a thing would be to incorporate since it would seem to require going “under the hood” and I’m not qualified to judge that. Still, I would certainly find it useful if possible!
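          Outside of Stan, the kludge described here can be sketched directly. The model, tolerances, and threshold below are placeholders chosen purely for illustration.

```python
# Rough sketch of the kludge described above: treat a parameter draw as
# impossible (log likelihood of -inf) if the adaptive ODE solve fails or
# needs an implausible amount of work, a proxy for "tiny step size".
# The SIR right-hand side, tolerances, and threshold are all placeholders.
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    S, I, R = y
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

def log_likelihood(params, t_obs, i_obs, max_nfev=5_000):
    beta, gamma = params
    sol = solve_ivp(sir_rhs, (0, t_obs[-1]), [0.999, 0.001, 0.0],
                    args=(beta, gamma), t_eval=t_obs, rtol=1e-6, atol=1e-8)
    if (not sol.success) or sol.nfev > max_nfev:
        return -np.inf          # the kludge: reject failed or expensive solves
    pred_i = sol.y[1]
    return -0.5 * np.sum((np.log(i_obs) - np.log(pred_i)) ** 2)
```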

        • > Of course, this can’t be done strictly in Stan, but if we could make parameters of the ODE solver, well, parameters instead of data (or being invisible, I don’t think stepsize per se can be set; I’m a few versions behind, so maybe this is different now) we could put priors on them and do this in a principled way. It would essentially be a prior on the smoothness of the function, like you’d use in a GP model.

          That’s a really amusing idea. I will try it later if I get a chance.

        • Yeah, it really comes about from interactions of the parameters. When using fully factorized priors, the priors on each variable might individually be reasonable while some combinations get us into trouble. This is particularly true if we want the priors to be broad so they are only weakly informative. I suppose figuring out a good interaction prior just takes a bit more work, looking at the spectrum of the ODEs; this is non-trivial, I think, for more complicated problems.

      • > I thought the problem with the IHME model was some decisions were made based on predictions from it, and making predictions based on a polynomial curve fit is an established bad idea in epidemiology?

        You’re right.

        > We don’t want to be doing the wrong thing

        I think it’s working as expected. I would perhaps like to have a bit more explicit control, for instance be able to specify the minimum step size. If a particular parameter combination leads to a wildly bad prediction, I don’t really care how wildly bad it is to numerical tolerance – I’d rather just cheaply approximate that particular solution.

        It’s easy enough to implement a new solver in the functions block of the Stan code, so this wasn’t a big deal in the end. With the built-in solver we had some issues in the burn-in phase. I suppose, as Andrew points out, putting in better priors should be the way to go.

        • It’s actually more of a problem in TensorFlow Probability due to how computations are broadcast. If I’m batch-integrating over many samples of parameters there, then a few of the samples being in bad regions causes the entire computation to adapt its step size, so everything is carried along for the ride. However, in TFP the integrators are pure Python, so it was easier there (for me) than in Stan to implement another integrator.

        • > for instance be able to specify the minimum step size

          Aaah, okay. If you use the signature with the tolerance options, the last argument ‘max_num_steps’ lets you set the maximum number of steps to take in a solve. I think that is the maximum across the whole integration, so it’s kinda the inverse of a minimum timestep.

          > It’s easy enough to implement a new solver in the function block of the stan code so this wasn’t a big deal in the end.

          I’ve heard about people doing this a lot, which I assume means it’s happening a lot more than I’m hearing about. We’re working on some stuff so people don’t end up having to write and debug their own solvers.

          What kind of solvers did you use in the end?
