New dataset: coronavirus tracking using data from smart thermometers

Posted on March 22, 2020 10:34 PM by Andrew

Dan Keys writes:

I recently came across the new coronavirus tracker website which is based on data from Kinsa smart thermometers. Whenever someone takes their temperature with one of these thermometers, the data is sent to Kinsa. Thermometer users also input their location, age, and gender. The company has been using these data for a few years to track the flu within the US, and they are starting to do something similar with coronavirus (as described in more detail on their technical approach page).

I’m reaching out to you about this because I’ve seen your writing about coronavirus modeling on the blog, and it seems like you might be interested in doing something with this dataset or in touch with other people who are. These data could make it possible to track the spread of coronavirus in the US without the testing bottleneck. The best option that I know of for trying to get access to the dataset is to reach out to Kinsa on their contact page.

(I [Keys] also have some initial thoughts on the data and some modeling that could be done, but I don’t have the technical skill to do the modeling well. I’d be happy to share thoughts with anyone who is interested in trying to use this dataset.)

40 thoughts on “New dataset: coronavirus tracking using data from smart thermometers”

jim on March 23, 2020 12:10 AM at 12:10 am said:

Cool! And freaky. Will my smart phone also know when I had sex and who I had sex with? Will my girlfriend know that too? :)

Slightly different topic:

When do we lift mobility restrictions? I’m curious what people think about that, it’s not immediately obvious to me what the answer might be.

Here in WA we added only one fatality today, and the fatality rate has been climbing ~linearly (~5-10/day) not exponentially. I feel like we’ll be battened down for at least another week but if the fatality rate slows, what’s the cue for ending or lightening restrictions?

- Ricard on March 23, 2020 5:57 AM at 5:57 am said:
  
  Hi Jim,
  Currently on ‘holiday’ at home here in the UK before resuming working from home next week. From my reading of the situation restrictions will (or should) not be lifted until a vaccine becomes available, so it’s more like a year away rather than next week. Hope I’m wrong but sadly doubt it.
  
  - jim on March 23, 2020 10:02 AM at 10:02 am said:
    
    Yes, looking at what’s going on elsewhere a stupid question on my part. But I’ve been watching WA closely and I don’t understand why it’s not exploding here, given that we were the first on the map in the US by a long shot. I’ve been withholding optimism because of the testing situation but that appears to be substantially improved. NY and CA have blown right past us. Many ppl here started self-regulating nearly a month ago, so is that having an effect? Or is there some hangup in the data?
    
    - Phil on March 23, 2020 1:23 PM at 1:23 pm said:
      
      I don’t see how it could be a data issue: ‘cases’ depends heavily on testing and diagnosis, but ‘deaths’ doesn’t — every death gets reported, and a large fraction of coronavirus deaths are reported as such — and Washington has a much slower growth rate than other states with lots of cases: https://www.nytimes.com/interactive/2020/03/21/upshot/coronavirus-deaths-by-country.html (scroll down for individual states)
    - jim on March 23, 2020 7:29 PM at 7:29 pm said:
      
      “I don’t see how it could be a data issue”
      
      That’s what I thought, although it seems possible that fatalities aren’t getting tested. And a higher jump of 15 fatalities today so a paper back-log from the weekend?
      
      Whatever the case it’s still far slower than most of Europe.
- Anoneuoid on March 23, 2020 5:57 AM at 5:57 am said:
  
  If you let medical issues continue to dominate, never for an extended period of time. As soon as restrictions are lightened the same thing will happen again within a year. If not due to this virus, it will be another one.
  
  Meanwhile the same experts who advise this course of action are still ignoring the lack of smokers in the data and the IV vitamin C that got added to the Chinese treatment guidelines. In fact they are advising the opposite of what the data indicates (calling vitamin C a myth, etc).
  
- Zhou Fang on March 25, 2020 10:23 AM at 10:23 am said:
  
  It is extremely difficult to distinguish between a linear and an exponential on that kind of timescale.
  
  The proper cue for lightening restrictions is, AIUI, if we are testing and successfully contact tracing everyone with the disease.
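The linear-versus-exponential point above can be made concrete with a small sketch. It compares an exponential curve with the straight line through its endpoints over windows of different lengths. The 7-day doubling time is an illustrative assumption, not a figure from this thread, and `max_relative_gap` is a hypothetical helper name.

```python
def max_relative_gap(doubling_days: float, window_days: int) -> float:
    """Largest relative gap between an exponential curve and the straight
    line through its endpoints, checked at each whole day in the window."""
    start = 1.0
    end = 2.0 ** (window_days / doubling_days)
    worst = 0.0
    for day in range(window_days + 1):
        exponential = 2.0 ** (day / doubling_days)
        linear = start + (end - start) * day / window_days
        worst = max(worst, abs(linear - exponential) / exponential)
    return worst


# Assumed doubling time of 7 days; try other values to see the effect.
for window in (7, 14, 28):
    gap = max_relative_gap(doubling_days=7.0, window_days=window)
    print(f"{window:2d}-day window: line and exponential differ by at most {gap:.0%}")
```

Under these assumptions the one-week gap stays under ten percent, which noisy daily counts could plausibly hide, while the gap over four weeks is large enough to see.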
  
Bill Spight on March 23, 2020 3:43 AM at 3:43 am said:

Hmmm. Looking at their map, IIUC, the incidence of high temperatures is greater in Florida than would be expected from the flu, and the likely suspect is Covid-19. So it looks like the next hot spot for it will be Florida in a week or two. That is in line with the observations of journalists over the past couple of weeks that many people in Florida have not been avoiding close contact, neither young people on Spring Break nor elderly residents. At least localities have started to close their beaches. That may have come too late, however. We shall see.

- Julien Riou on March 23, 2020 5:09 AM at 5:09 am said:
  
  Yes I made the same observation. If true this is a recipe for a true disaster in the next few weeks considering the old population in this area. Mortality appears to be delayed by 2-3 weeks compared to infection (see Linton et al, Journal of Clinical Medicine 2020).
  
- Dan F. on March 23, 2020 1:58 PM at 1:58 pm said:
  
  I can’t find on the website any information about the demographics of the people using this thermometer. Suppose mainly old, retired people with existing health issues buy these thermometers. Since such people are concentrated in places like Florida and Arizona, such a circumstance would introduce substantial bias in the data. Perhaps some insurer or medical provider locally popular in Florida promotes these devices? The data seems of questionable utility without knowing something about the demographics of the users (at least their distribution and age profile).
  
  - Martha (Smith) on March 23, 2020 10:27 PM at 10:27 pm said:
    
    “The data seems of questionable utility without knowing something about the demographics of the users (at least their distribution and age profile).”
    
    +1
    
- Bill Spight on March 31, 2020 7:57 AM at 7:57 am said:
  
  March 31 update.
  
  It looks like New Orleans has become the next hotspot, not Florida. And the reporting of new confirmed cases has become widespread across the US. This morning I scanned the California data, and much of the increase is in smaller counties. Perhaps that is because of more widespread testing now. There also may be a weekend effect, with some cases not reported until Monday.
  
Bob on March 23, 2020 5:31 AM at 5:31 am said:

Neil Ferguson from Imperial, who is the driving force behind the current UK strategy, has posted this on Twitter

https://twitter.com/neil_ferguson/status/1241835454707699713

Undocumented C code, designed for flu pandemics.

- Julien Riou on March 23, 2020 6:19 AM at 6:19 am said:
  
  Neil Ferguson is one of the most respected scientists in the field. I cannot understand how he thought that “I wrote the code (thousands of lines of undocumented C) 13+ years ago to model flu pandemics” would be a good enough excuse not to release his code. There is going to be a hard backlash (and deservedly so).
  
  - Bob on March 23, 2020 8:18 AM at 8:18 am said:
    
    There are going to be many backlashes from this. The Imperial College paper has been used to beat the UK government over the head ever since it came out. The media have run headlong with it, and now it turns out it’s bodged-together academic code: untested, undocumented and designed for another purpose.
    
    I’d bet any money that it’s substantially wrong, God knows what kind of hardcoding and errors are in that code. That’s before you consider how he estimates the parameters without a previous pandemic to work with.
    
    I can’t believe he’s treated as the oracle of coronavirus when that’s what he’s working with.
    
    - More Anonymous on March 23, 2020 11:57 AM at 11:57 am said:
      
      Bob and Julien,
      
      I think it’s best to understand Ferguson’s work here as a “paper on what could happen” for a subject that has too big a parameter space for a “paper on what will happen.”
      
      The target audience for this paper seems to be public decision makers who connect with the “give me some realistic scenarios” approach a lot better than a “let’s create a model with a small enough parameter space to explore then explore it all and quantify the uncertainty” approach.
      
      Looking at the other papers from the Imperial College London team, they have been some of the best I have seen on Covid-19, and more generally I have respected the work by authorship groups that include Ferguson for years.
      
      The current paper doesn’t look as good to me, but I suspect that Ferguson and colleagues know their audience and have targeted it at them, not at people like us.
      
      Also, I would like to see more on the combination of case tracking and a low trigger for initiating local social distancing, since that might be a useful approach after (if) the initial waves are brought under control and before vaccines are available.
    - Bob on March 23, 2020 3:15 PM at 3:15 pm said:
      
      The others might look better, but if they’re based on a shoddy implementation of a poor model, what good are they?
      
      I’ve no problem with the fact that they are working in a very uncertain situation and need to come up with plausible cases for decision makers.
      
      My problem is that his paper has been seized upon as showing ‘the science has changed’. When people pull at the threads of this model (which has started in earnest), I think the consequences could be severe.
    - More Anonymous on March 23, 2020 5:15 PM at 5:15 pm said:
      
      The other papers are based on different approaches — you can find them as reports 1-9 on the Imperial College COVID-19 Response Team website.
      
      I agree this report is not “the science has changed.” To me it seems more like, “yes, we checked a bunch of scenarios and it seems as bad as initial impressions would suggest for an R0 of 2-2.5 and 1% mortality.”
    - Dan F. on March 23, 2020 1:53 PM at 1:53 pm said:
      
      It goes without saying, or it should in any case, that a paper written during a genuine crisis in a few days is likely to be a bit sloppy.
    - Phil on March 23, 2020 2:46 PM at 2:46 pm said:
      
      Whether that’s a good bet depends on how you define ‘substantially’ in ‘substantially wrong’.
      
      Here’s my back-of-the-envelope model, which I’ll claim without evidence is at least as good as the 100K lines of C code.
      
      You’ve probably seen https://www.nytimes.com/interactive/2020/03/21/upshot/coronavirus-deaths-by-country.html
      
      UK, Spain, Italy, France, China, all agree that without major preventative measures you get a doubling of deaths about every 2 to 3 days. There’s a big difference between 2 and 3 days but let’s take 3 days. That’s a factor of 1000 in a month, and a factor of a million in 2 months.
      
      But if something can’t go on forever, it won’t, and at some point you run out of vulnerable people to kill. There are only about 4 million people 65 or over in the UK (about 40% of whom have a ‘limiting longstanding illness’). Even if all 4 million become infected (which they won’t), that ‘only’ gives you 50K-100K deaths…although if they all get sick at around the same time, the death rate per infection would go up a lot because the health care system is so swamped: the shortage of doctors, nurses, ventilators, masks, etc. that we’ve all been hearing about. Could the death rate double? Perhaps it could; we can look at the worst-hit parts of Italy and find out, maybe.
      
      So let’s say 50K-200K deaths among the elderly from business-as-usual, with the lower number representing a non-overwhelmed medical system and many elderly escaping infection, and the upper representing very widespread infection among the elderly, and a medical response system that can’t handle it. That’s only a factor of 4 between high and low, can I really be that certain? No. So I’m saying 30K – 300K. There, that’s better.
      
      Some middle-aged people will die too, and immunocompromised younger people. So add another 20K – 100K. What have I got? 50K-400K people in the UK dying, if drastic steps aren’t taken to control the virus. Look at those log plots and tell me you don’t think that’s possible.
      
      The headline number of the Imperial College paper was that 500,000 people in the UK “could” die of coronavirus if major steps aren’t taken. That’s outside my back-of-the-envelope upper limit, but could you get there? Yes: if the death rate in the elderly is closer to 10% than the 1-2.5% range that I assumed, you can get there. Or if there are waaay more vulnerable people in the non-elderly population than I assumed. Basically a number like 500K in the UK seems really unlikely…but as an upper bound for what would happen without a major lockdown I don’t think it’s ridiculous. If anyone says the number couldn’t possibly exceed 200K I’d say they really need to justify that with some great arguments and evidence.
      
      In short (too late!): I’m not sure it’s a safe bet that the numbers in that report are ‘substantially wrong.’ You’ve gotta allow at least a factor of 2 in wiggle room in a disease for which the number of deaths doubles every 3 days, and I doubt their upper estimate is wrong by much more than a factor of 2, although you can get into the usual philosophical discussions about what it means for a number to be ‘wrong’ if it’s proposed as an upper limit and reality takes you well under it.
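Phil's back-of-the-envelope range above can be reproduced in a few lines. This is only a sketch of the arithmetic: every input is a figure he states in the comment, while the `death_range` helper and its `attack_rate` parameter are hypothetical names introduced for illustration.

```python
# Back-of-the-envelope range from Phil's comment. All inputs are his stated
# figures; the function name and structure are illustrative assumptions.

def death_range(population, fatality_low, fatality_high, attack_rate=1.0):
    """Return (low, high) deaths for a population and a fatality-rate range."""
    infected = population * attack_rate
    return infected * fatality_low, infected * fatality_high


elderly_uk = 4_000_000  # Phil's figure for UK residents aged 65 or over
low, high = death_range(elderly_uk, 0.01, 0.025)  # his assumed 1-2.5% death rate
overwhelmed_high = 2 * high  # his "could the death rate double?" scenario

print(f"elderly, everyone infected: {low:,.0f} to {high:,.0f}")
print(f"elderly, overwhelmed hospitals: up to {overwhelmed_high:,.0f}")

# Phil then widens the elderly range by judgment and adds younger deaths.
elderly_widened = (30_000, 300_000)
younger = (20_000, 100_000)
total = (elderly_widened[0] + younger[0], elderly_widened[1] + younger[1])
print(f"total: {total[0]:,} to {total[1]:,}")
```

The computed low end is 40K where Phil writes 50K, so he appears to be rounding. Changing `elderly_uk` or the fatality rates shows how strongly the totals depend on those two inputs.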
    - Daniel Lakeland on March 23, 2020 3:33 PM at 3:33 pm said:
      
      Phil, I feel like physicists and engineers have a bit of a special skill that’s not so common in other fields, the back of the envelope calculation. This is a super common thing to do in physics and engineering, but I think way less common in medicine, biology, even econ or whatever. The fact that people aren’t so familiar with the techniques makes it harder for them to handle situations where there’s a lot of uncertainty and numbers can be off by factors of 2 or even 5… there’s a bit of paralysis when you don’t know whether you should plug in 15% or 35% for example. Anyway, thanks for this calculation, I agree with it, and I think your numbers are a little low as they ignore the potential for a very high spike in mortality when the ICU is unavailable. I think easily mortality could go to 90% for people with severe cases at any age if there are basically 15 or 20 people for every ICU bed. So then mortality becomes essentially equal to the percentage of people with severe cases, which could be 5 to 15% of all cases. So while under some circumstances we expect maybe 0.5% mortality for say 40 year olds… under those extreme circumstances we could see say 10%, a factor of 20 higher.
    - Bob on March 23, 2020 3:41 PM at 3:41 pm said:
      
      Let’s say then his number is 500k without drastic steps and yours is 400k. What if the drastic steps are taken, maybe for 3-6 months, but the number keeps climbing towards your 400k. What does that say about his original number?
      
      What if the steps are taken and the final number is closer to 20k? How does that compare to his projection with the measures? Did his model provoke an overreaction?
      
      This is essentially a wartime economic response, and Ferguson is advising based on a 13 year old, poorly written code base designed for flu pandemics.
      
      Another thing, it’s a deterministic ODE model, so scenarios are generated by tweaking parameters. If these equations are coupled nonlinearly, it’s not possible to explore the possible outcomes in a systematic way. It’s based on a hunch.
    - Phil on March 24, 2020 2:17 PM at 2:17 pm said:
      
      Daniel, you’re right, having just today read some news coming out of Italy, I clearly underestimated the death rate per infected old person in the case of an overwhelmed medical system. My upper limit for the UK should be more like a million. We’ll never get a realization of the scenario because they are finally taking it seriously (thank god).
      
      Bob, I’m not sure whether your questions were rhetorical or not. I’m not really arguing that a 100,000-line codebase that someone admits was not well written isn’t likely to have some errors in it. And I agree with you that, from the description of what it does, it’s…peculiar at best, I guess…that it required so much code. One hopes that at least they ran some test cases with known answers and got them right, and that they ran a bunch of other cases and got answers that looked right. I really am not making bets on the value or validity of the code, just saying that the top-line number that I saw in the news is reasonable.
      
      I think there’s a huge amount to be learned from a decent model and I don’t advocate replacing more sophisticated models with back-of-the-envelope calculations. That’s especially true for systems like a pandemic, in which a small change in a parameter value can change the number of fatalities by hundreds of thousands. Figure out which parameters those are and what you can do to change them, and you have done a great service. A better model can help with that. Even just understanding the ways in which the model differs from reality can be valuable.
      
      More Anonymous, you’re right that days-to-doubling plays a big role, indeed a huge role. That’s been the whole idea of ‘flatten the curve.’ For my back-of-the-envelope calculation I used a 3-day doubling time, which is what we’ve seen for the past few doublings, but maybe it’s really 2 days in dense urban areas and 6 days elsewhere and the urban areas will saturate, or something. I didn’t have a specific timeline in mind when I did the calculation above, was just thinking of ‘the next few months’ in sort of a vague way, so my numbers wouldn’t change much if I used 4 or even 5 days instead of 3, although the death rate will surely be much higher if we stay at 3 because at 3 the hospitals will cease to be able to help most patients within a few weeks. Anyway thanks for the reality check.
    - Phil on March 24, 2020 2:19 PM at 2:19 pm said:
      
      I’m not sure where I got the “100,000 lines” thing. Maybe it’s just “thousands”, which makes a lot more sense and seems reasonable, if it’s not many thousands.
    - Anoneuoid on March 24, 2020 2:20 PM at 2:20 pm said:
      
      They clearly tuned it to the seasonal pattern of flu epidemics. I wouldn’t take it seriously at all since then all the seasonal influence ended up in the transmission dynamics.
    - Bob on March 25, 2020 7:54 AM at 7:54 am said:
      
      The 100k comes from Ferguson himself, he posted that on twitter.
    - More Anonymous on March 23, 2020 5:08 PM at 5:08 pm said:
      
      Phil, Daniel, & Bob: consider how sensitive back-of-the-envelope 2-month mortality estimates are to the days-to-doubling number. An error in estimated days-to-doubling is unlikely to be cancelled out by an error in the other direction anywhere else in the calculation, which makes these kinds of back-of-the-envelope estimates less useful than they would be for most Fermi-estimate-ish problems.
      
      For example, imagine there are x deaths at day 0. If days-to-doubling is 7, then there are 380 * x deaths at day 60 (2 months). If days-to-doubling is 2, then there are 1 billion * x deaths.
      
      My sense is that, at a high level, a lot of the value that comes from the “differential equation with population structure” type approach to epidemics is that it in effect deconstructs the days-to-doubling into a quantity that arises from a combination of other parameters, such as R0s, disease prevalence in different age groups and settings, etc. Days-to-doubling doesn’t get deconstructed in a simple linear or multiplicative way, like you’d usually use for back-of-the-envelope estimates, but it still gets broken down. By breaking days-to-doubling down to other parameters, errors in opposite directions can cancel more nicely again, providing better final estimates. That’s only my understanding though, not something I hear from others, and I could well be wrong on it.
      
      Phil, I agree that a good approach is also your method “just assume everyone gets it, see how bad it could be”, but your 1% mortality in over-65s seems low from Diamond Princess numbers and other reports.
    - Daniel Lakeland on March 23, 2020 10:24 PM at 10:24 pm said:
      
      One of the most important parameters is actually the dimensionless ratio of the duration of illness to the doubling time. d/t… if d/t is large then the spread happens all at once and everyone is sick at the same time.
      
      If d/t is small then the spread happens slowly and recovery occurs before there are many more cases.
      
      In this case it looks like d ~ 20 days, and t ~ 3 to 4. So the ratio is large, and it’s a very risky situation for everyone to get sick all at once and overwhelm everything.
      
      Qualitatively this is all the answer you need and uncertainty in the t is not that important, because it’s clearly in every major country with monitoring between around 2 and 5 days.
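The doubling-time arithmetic in the comments above (Phil's factor of 1000 in a month, More Anonymous's 380x versus 1 billion x at day 60, and Daniel's d/t ratio) can be checked with a short sketch. The helper name `growth_factor` is hypothetical; the 20-day illness duration is Daniel's rough figure.

```python
def growth_factor(days: float, doubling_days: float) -> float:
    """Multiplicative growth after `days` if the count doubles every `doubling_days`."""
    return 2.0 ** (days / doubling_days)


ILLNESS_DAYS = 20  # Daniel's rough duration of illness, d

for doubling_days in (2, 3, 4, 5, 7):
    factor = growth_factor(60, doubling_days)
    ratio = ILLNESS_DAYS / doubling_days  # Daniel's dimensionless d/t
    print(f"doubling every {doubling_days} days: "
          f"x{factor:,.0f} after 60 days, d/t = {ratio:.1f}")

# Phil's one-month figure with a 3-day doubling time.
print(f"30 days at 3-day doubling: x{growth_factor(30, 3):,.0f}")
```

The 60-day factor swings from a few hundred to about a billion across this range of doubling times, which is More Anonymous's sensitivity point, while d/t stays well above 1 throughout, which is Daniel's.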
- Andrew on March 23, 2020 8:13 AM at 8:13 am said:
  
  Yeah, that’s all a bit weird. Also this:
  
  I am happy to say that @Microsoft and @GitHub are working with @Imperial_JIDEA and @MRC_Outbreak to document, refactor and extend the code to allow others to use without the multiple days training it would currently require . . . They are also working with us to develop a web-based front end to allow public health policy makers from around the world to make use of the model in planning. We hope to make v1 releases of both the source and front end in the next 7-10 days.
  
  I’m guessing that they’d be better off starting with the model and rewriting the simulation from scratch, rather than refactoring the code. This sounds like a probability/statistical modeling problem, not a software problem, and sticking with the old code sounds like a mistake.
  
  - Bob on March 23, 2020 8:32 AM at 8:32 am said:
    
    Their problem is if the new code gives a different answer, then what?
    
    The ‘thousands of lines of code’ bit should ring alarm bells. Apparently his model is five coupled ODEs, and that doesn’t take thousands of lines of code. That’s before you consider whether that’s the best way to model it anyway.
    
    He’s opened a massive can of worms, this may not end well.
    
    - Anoneuoid on March 23, 2020 9:12 AM at 9:12 am said:
      
      I guess this is it, which shows a peak in deaths in late May to early June:
      
      https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf
      
      It ignores seasonality, so I don’t see why it ever got taken seriously in the case of the flu to begin with.
    - Bob on March 23, 2020 9:25 AM at 9:25 am said:
      
      That’s just the output though, with a bit of discussion around parameters. You need to see the actual equations, the fact that nobody seems to know what they are is not good news in my opinion.
      
      That report though demonstrates the problem, academics hack together the code that gives them the plots and tables that they want. It’s not transparent, it’s not robust, but it’s driving advice behind monumental decisions.
    - Anoneuoid on March 23, 2020 9:45 AM at 9:45 am said:
      
      Right, but a model that predicts a flu epidemic can peak in the middle of the summer is wrong anyway. Whatever the ultimate reason(s) for the seasonality it obviously is extremely important to the epidemiology of influenza. That model effectively assumes no seasonality.
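Bob's point that a handful of coupled ODEs needs little code can be illustrated with a textbook SIR model. This is a generic sketch, not the Imperial College model, whose equations do not appear in this thread. The infectious period, initial infected fraction, step size, and the `seasonal_amplitude` knob are all hypothetical choices; only the R0 of about 2.4 echoes the 2-2.5 range mentioned earlier in the comments.

```python
import math


def simulate_sir(r0=2.4, infectious_days=5.0, days=365, dt=0.1,
                 initial_infected=1e-6, seasonal_amplitude=0.0):
    """Forward-Euler integration of a textbook SIR model on population fractions.

    Returns the infected fraction at day 0, 1, ..., days. seasonal_amplitude
    scales a yearly cosine applied to the transmission rate (0 = no seasonality).
    """
    gamma = 1.0 / infectious_days          # recovery rate per day
    beta0 = r0 * gamma                     # baseline transmission rate per day
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    steps_per_day = round(1 / dt)
    infected_by_day = [i]
    for step in range(days * steps_per_day):
        t = step * dt
        beta = beta0 * (1.0 + seasonal_amplitude * math.cos(2 * math.pi * t / 365.0))
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        if (step + 1) % steps_per_day == 0:
            infected_by_day.append(i)
    return infected_by_day


curve = simulate_sir()
peak_day = max(range(len(curve)), key=curve.__getitem__)
print(f"peak infected fraction {max(curve):.2f} on day {peak_day}")
```

Setting `seasonal_amplitude` above zero is one crude way to probe Anoneuoid's seasonality concern: the transmission rate then varies over the year instead of staying constant, and the timing of the peak can be compared with the unforced run.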
Benjamin Vigoda on March 23, 2020 6:53 AM at 6:53 am said:

Facebook Data for Good tooling for Disease Prevention Maps is available to any nonprofit or university counterpart working on COVID19 forecasting, vulnerability analysis, or to determine how to best allocate resources.

https://dataforgood.fb.com/tools/disease-prevention-maps/

Anyone who might be able to benefit from these tools – we’ll be happy to get them on-boarded.

- Slava Nikitin on March 23, 2020 11:32 AM at 11:32 am said:
  
  Benjamin, do you have a list of nonprofits or universities that use the FB data? I am interested in lending a hand in analyzing it.
  
Anonymous on March 23, 2020 2:52 PM at 2:52 pm said:

On the Kinsa map, it is weird that Seattle has low disease levels. When I checked the map on Saturday, the New York area also had low disease levels! (They are moderate now but were low then.)

I don’t think this map can be a good way of finding covid when it misses high case numbers in New York and Seattle.

I think it makes sense to be cautious about the quality of this data. In the past there was lots of enthusiasm for the similar Google Flu Trends website but then it fell apart.

If they share the data though, that would be helpful.

- jim on March 24, 2020 9:45 PM at 9:45 pm said:
  
  I suspect it’s actually right about Seattle. We’re not being swept by pandemic.
  
  King County (which includes Seattle) has ~94 fatalities and 1300 cases, but at least 35 fatalities are from the nursing home where the outbreak first occurred. The metro total (Snohomish, King and Pierce counties) is 111 fatalities and ~2020 confirmed cases out of 3.7M people. It’s hard to imagine that amount would be distinguishable from the background of flu cases.
  
  It probably makes sense to be cautious about this data but it will be interesting to see how the Florida situation shapes up. Hopefully it won’t turn out like it looks like it could.
  
  - Brent Hutto on March 25, 2020 7:34 AM at 7:34 am said:
    
    The state of Washington has reported 86 flu deaths for the 2019-2020 season (up through March 14).
    
    The number of COVID-19 deaths in the state has been 123 since the beginning of the outbreak.
    
    https://www.doh.wa.gov/Portals/1/Documents/5100/420-100-FluUpdate.pdf
    
    Seems distinguishable from the background of influenza mortality but similar order of magnitude.
    
    - jim on March 25, 2020 7:58 AM at 7:58 am said:
      
      Fatalities are a similar order of magnitude, but the map supposedly measures “illness” or fever.
      
      WA reports 2500 cases of COVID19. CDC suggests about 10% of the US population has been infected w/ influenza this season; 10% of WA is ~700K. I guess we don’t know the true infection rate of COVID19 but even if it’s 10x the reported rate, 700k >>> 10x 2.5K.
      
      Whatever the case it will be interesting to see how things shake out.
      
      “CDC estimates that so far this season there have been at least 38 million flu illnesses, 390,000 hospitalizations and 23,000 deaths from flu.”
      
      https://www.cdc.gov/flu/weekly/index.htm
Brent Hutto on March 25, 2020 8:20 AM at 8:20 am said:

A really interesting map would be one that color codes COVID-19 mortality as a multiple of influenza mortality during the same period, maybe a series of weekly maps starting eight weeks ago. And another map for COVID-19 hospitalizations as a multiple of influenza hospitalizations.

To me that’s a good way of assessing the scale of the problem. Something twice as bad as the flu in a given area during a given week is cause for concern. Something ten times as bad as the flu in a given area during a given week is much more worrisome. I wonder how many areas in how many weeks have exceeded 10x the impact of the flu.
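The metric proposed above could be computed per area and per week roughly as follows. This is a minimal sketch: the function names and the "comparable to flu" label are hypothetical, and the 2x and 10x thresholds are the ones named in the comment. The example reuses the Washington figures quoted earlier in the thread (123 COVID-19 deaths, 86 flu deaths), which cover different time windows, so it only illustrates the arithmetic.

```python
def mortality_multiple(covid_deaths: float, flu_deaths: float) -> float:
    """COVID-19 deaths expressed as a multiple of flu deaths for one area and period."""
    if flu_deaths <= 0:
        raise ValueError("flu_deaths must be positive to form a ratio")
    return covid_deaths / flu_deaths


def concern_level(multiple: float) -> str:
    """Bucket a multiple using the 2x and 10x thresholds from the comment."""
    if multiple >= 10:
        return "much more worrisome"
    if multiple >= 2:
        return "cause for concern"
    return "comparable to flu"  # label for the below-2x case is an assumption


# Washington state figures quoted in the thread; the periods differ,
# so treat this as an illustration of the calculation only.
wa_multiple = mortality_multiple(covid_deaths=123, flu_deaths=86)
print(f"WA multiple: {wa_multiple:.2f} ({concern_level(wa_multiple)})")
```

A real map would need deaths for both diseases over matching weeks and areas, which the thread does not provide.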


Statistical Modeling, Causal Inference, and Social Science
