Mortality data 2015-2020 around the world

Posted on January 29, 2021 9:42 PM by Andrew

Ariel Karlinsky writes:

A colleague (Dmitry Kobak) and I just published (for now only on medRxiv; journal publication coming soon, we hope) a large (the largest to date!) dataset of weekly, monthly, or quarterly deaths in 79 countries from 2015 to 2021. It documents large increases in mortality in most of them, in tandem with COVID deaths in developed countries, and with much higher figures in developing countries.

I [Karlinsky] tweeted about it with some figures here.

And the dataset is available for public use here.

I’d think this data would be already available through the World Health Organization or something? Leontine Alkema has done a lot of analysis of this sort of data. But, again, I don’t know the details.

34 thoughts on “Mortality data 2015-2020 around the world”

Rahul on January 29, 2021 10:16 PM at 10:16 pm said:

Is India missing? India is a very curious case. Would have loved to see whether India saw a mortality spike or not.

- Ariel Karlinsky on January 30, 2021 5:42 AM at 5:42 am said:
  
  We were not able to locate data for India, and their NSO’s response to our request was simple: “These are not available”. All in the paper :)
  
Rahul on January 29, 2021 10:36 PM at 10:36 pm said:

Dumb question: so the x axis is 12 months?

- Ariel Karlinsky on January 30, 2021 5:43 AM at 5:43 am said:
  
  Depends on the country. For some the data is weekly, for others monthly and for some quarterly. More info on the repo page & paper.
  
Daniel H. on January 30, 2021 8:46 AM at 8:46 am said:

I have two covid-mortality-related questions that’ve been bugging me for a while. Maybe someone here knows some details or wants to investigate (I realize I could do so myself, but hey, maybe someone here is right now looking for covid-related research topics).

1) mortality in Australia
I’ve done some rough estimates of actual infections via two routes: a) via death numbers and estimated infection fatality rates (infections = deaths / IFR) and b) via confirmed cases and a compensation factor that depends on the ratio of positive tests (the higher the ratio, the more infections we’re missing in official cases). Both a and b agree for all countries I checked EXCEPT AUSTRALIA where, between July and October only, deaths were 5 times higher (!) than they should’ve been based on infection numbers. Maybe they ran a really weird testing scheme (producing a <1% positive rate but missing many infections), or infections were basically only in the high-risk group?

2) differences between official covid deaths and excess mortality
Based on a quick check of the mortality data provided by our world in data, official covid deaths and excess mortality …
… mostly agree for e.g. Austria, Belgium, Chile, Croatia, France, Germany, Hungary, Israel, Slovenia, Sweden, Switzerland
… show additional excess deaths for: Bulgaria, Canada, Czechia, maybe Denmark, Estonia, Italy, Lithuania, Netherlands, Poland, Spain, United States
This is just from a quick glance at the data while trying to adjust for other fluctuations (in many countries, Jan-Feb 2020 was better than previous years for normal flu, so excess deaths are negative for these months), but visually it’s quite obvious that, say, the US and Spain have ~30% higher excess deaths than official covid deaths during high-covid times while e.g. Israel and Switzerland don’t. My own prejudice is that maybe Israel and Switzerland are somewhat better organized (in both health care and data collection), but this is probably a stupid monocausal explanation that breaks down once you check more countries and reasons.
The excess vs. confirmed deaths question might be especially interesting for the ongoing discussion of whether covid infections increase death risk in the months afterwards (which should show up in excess mortality rates, right?). It might also be interesting from a statistics standpoint, as all I’ve been doing is eyeballing plots and there might be better ways to investigate.
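
One way to make the comparison in question 2 less dependent on eyeballing is to compute, per country, the ratio of cumulative excess deaths to cumulative reported COVID deaths, together with the correlation between the two series. The following is only a minimal sketch in Python with NumPy: the function name `compare_excess_to_reported` and every number in it are hypothetical and are not taken from any of the datasets discussed in this thread.

```python
import numpy as np

def compare_excess_to_reported(excess, reported):
    """Summarize how excess deaths relate to officially reported COVID deaths.

    Both arguments are per-period counts (e.g. weekly) covering the same periods.
    Excess deaths may be negative in some periods (e.g. a mild flu season).
    """
    excess = np.asarray(excess, dtype=float)
    reported = np.asarray(reported, dtype=float)
    if excess.shape != reported.shape:
        raise ValueError("both series must cover the same periods")
    total_reported = reported.sum()
    if total_reported <= 0:
        raise ValueError("need a positive number of reported deaths")
    return {
        # Ratio of totals: 1.3 would mean 30% more excess than reported deaths.
        "ratio": float(excess.sum() / total_reported),
        # Average per-period gap between the two series.
        "mean_difference": float(np.mean(excess - reported)),
        # Pearson correlation between the two series.
        "correlation": float(np.corrcoef(excess, reported)[0, 1]),
    }

# Hypothetical weekly numbers for one country (not real data).
excess_deaths = [-50, 0, 400, 900, 650, 300]
reported_covid = [0, 10, 300, 700, 500, 190]
summary = compare_excess_to_reported(excess_deaths, reported_covid)
print(summary)
```

A ratio near 1 with a high correlation would correspond to the "mostly agree" group, while a ratio well above 1 would correspond to the "additional excess deaths" group. The ratio is sensitive to how the baseline behind the excess figure was computed, so it should be read together with that choice.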

- JFA on January 30, 2021 3:14 PM at 3:14 pm said:
  
  I can only speak to the US context, but whenever I compare excess deaths (using the upper bound cutoff provided by CDC) to covid deaths, they are pretty similar. Early weekly excess deaths were much larger, while since October measured covid deaths have been larger than excess deaths. The mean difference between the weekly excess deaths and the covid deaths provided by the CDC is around 400, with a correlation of 0.92. The totals match up relatively well.
  
  - Daniel H. on January 31, 2021 2:28 PM at 2:28 pm said:
    
    JFA: I see where you’re coming from. For a second reference for my reasoning, see this plot here created by Dmitry Kobak:
    https://github.com/dkobak/excess-mortality/blob/main/img/countries.png
    The US has an excess / Covid-reported ratio of 1.3, along with many other countries in a comparable range (Italy 1.7, UK 1.5, Spain 1.4), but there are definitely more extreme cases (Russia 6.5! The GitHub repo includes a lot of analysis on this which I’ve only briefly skimmed); so maybe we’re just missing many cases with official counting and these ratios are very noisy to begin with…
    
    - JFA on January 31, 2021 3:19 PM at 3:19 pm said:
      
      Weird… using the CDC data on C-19 deaths, I get 1.04 for the excess mortality / covid deaths ratio. Using data from the New York Times to get covid death counts, I still only get 1.08. Nowhere near 30%.
    - Dmitry Kobak on February 1, 2021 8:54 AM at 8:54 am said:
      
      Hi. 1.3 in the figure somebody shared above was from the analysis done by The Economist some time ago.
      
      Here is our up-to-date figure from the study Andrew blogged about: https://raw.githubusercontent.com/dkobak/excess-mortality/main/img/all-countries.png
      
      For the US, I get 370k COVID-19 deaths until Jan 10, and excess is 440k. That’s a ratio of 1.2. Note that the excess is based on the CDC all-cause mortality time series corrected (“weighted”) for underreporting in recent weeks (which is quite strong in the US). Details on the source are in https://github.com/akarlinsky/world_mortality.
    - Andrew on February 1, 2021 9:11 AM at 9:11 am said:
      
      Dmitry:
      
      Is there a mistake in this figure? The caption says something about “Blue” but I don’t see any blue on the graphs.
    - Dmitry Kobak on February 1, 2021 10:10 AM at 10:10 am said:
      
      “Blue” refers to blue numbers in each panel. So e.g. for the US the ratio 1.2 is in blue font.
      
      I’ll edit the caption to clarify.
    - JFA on February 1, 2021 1:31 PM at 1:31 pm said:
      
      The linked picture is for data through Jan. 1 https://github.com/dkobak/excess-mortality/blob/main/img/countries.png, so not that long ago. It also seems like you are using the high estimate for excess deaths when you can also use the low estimate.
      
      I also couldn’t replicate your 440k through Jan. 10. The closest it gets using the high estimate is 435k ending with 12/19/2020.
      
      The CDC’s data also gives 379k for Covid deaths through Jan. 10 (373k through Jan. 3). The low estimate for excess deaths through Jan. 10 is 386k. That’s a ratio of ~1.02.
      
      The high estimate is calculated by subtracting the mean of the past years from the observed number. The low estimate is calculated by subtracting the upper bound of the 95% prediction interval of a prediction model from the observed number. I don’t know what estimates other countries provide, but if the countries provide multiple estimates for excess death, you probably should, too.
    - Dmitry Kobak on February 1, 2021 4:56 PM at 4:56 pm said:
      
      My excess mortality estimate (440k) is the 2020 mortality (importantly, we take “weighted” 2020 data from CDC that accounts for underreporting in recent weeks) minus the baseline mortality which I obtain by taking 2015–19 mortality, fitting a model with linear year effect and categorical week effect, and extrapolating to 2020.
      
      Subtracting the mean of the past years is not really valid because the number of deaths in US was rising over recent years (so this yields an overestimation of excess mortality). Subtracting the upper bound of the prediction interval does not really make sense to me either (that’s an underestimation). I am essentially subtracting the prediction of the prediction model.
    - Andrew on February 1, 2021 4:58 PM at 4:58 pm said:
      
      Dmitry:
      
      OK, you can forget about landing that job at the Hoover Institution. They’ve informed us that the pandemic was over last summer.
    - confused on February 1, 2021 6:48 PM at 6:48 pm said:
      
      >>Subtracting the mean of the past years is not really valid because the number of deaths in US was rising over recent years (so this yields an overestimation of excess mortality)
      
      I have been wondering about this since people started discussing excess deaths; all else being equal, deaths should be rising each year since the US population is both growing and aging on average. Glad to see this was taken into account.
    - JFA on February 2, 2021 8:03 AM at 8:03 am said:
      
      “Subtracting the mean of the past years is not really valid because the number of deaths in US was rising over recent years (so this yields an overestimation of excess mortality)”. That’s certainly true, but I bet using the average with a small upward drift also overestimates excess mortality. I guess the conception of excess death is a little tricky. Let’s say you have a prediction of deaths (with some error). Should you consider any observation that is above the point estimate as excess, even if it falls within your error bars? Standard practice says no.
      
      I get that you are taking the weighted 2020 data. That’s what I was looking at in the CDC data. What happens when you subtract the upper bound of your 95% prediction interval from the observed number? Given the shifting age profile of the US (boomers are getting really old), have you considered age adjusting? It seems to have some impact if you are using the point estimate as your baseline: https://www.acpjournals.org/doi/10.7326/M20-7385. That paper’s measure of excess deaths after adjusting for age is close to the lower estimate from the CDC (the one using the upper bound of the 95% prediction interval).
    - Dmitry Kobak on February 2, 2021 3:25 PM at 3:25 pm said:
      
      >> Let’s say you have a prediction of deaths (with some error). Should you consider any observation that is above the point estimate as excess, even if it falls within your error bars? Standard practice says no.
      
      Well, I compute the excess by subtracting the prediction, and compute the uncertainty by taking the predictive variance of the prediction. I double-checked, and for the USA I am currently getting 435k ± 12k. So if you want a 95% interval, it would be [411k, 459k]. But the point estimate is 435k.
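
The baseline described in this sub-thread (fit 2015–19 deaths with a linear year effect and a categorical week effect, extrapolate to 2020, and subtract the prediction from the observed deaths) can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code from the paper or its repository: it uses ordinary least squares via NumPy, ignores week 53, and runs on synthetic noise-free data. All function names are hypothetical. The last lines only reproduce the normal-approximation arithmetic for the 435k ± 12k figure quoted above; the predictive variance itself is not computed here.

```python
import numpy as np

N_WEEKS = 52  # simplifying assumption: week 53 is ignored

def design_matrix(years, weeks, ref_year):
    """One indicator column per week (categorical effect) plus a linear year column."""
    years = np.asarray(years, dtype=float)
    weeks = np.asarray(weeks, dtype=int)
    X = np.zeros((len(years), N_WEEKS + 1))
    X[np.arange(len(years)), weeks - 1] = 1.0  # week indicators act as per-week intercepts
    X[:, -1] = years - ref_year                # linear year trend
    return X

def predict_baseline(train_years, train_weeks, train_deaths, target_year):
    """Fit by least squares on the training years and extrapolate to target_year."""
    ref_year = float(np.min(train_years))
    X = design_matrix(train_years, train_weeks, ref_year)
    beta, *_ = np.linalg.lstsq(X, np.asarray(train_deaths, dtype=float), rcond=None)
    X_new = design_matrix(np.full(N_WEEKS, target_year),
                          np.arange(1, N_WEEKS + 1), ref_year)
    return X_new @ beta

def normal_interval(point, sd, z=1.96):
    """Normal-approximation interval around a point estimate."""
    return point - z * sd, point + z * sd

# Synthetic data: a seasonal pattern plus 20 extra deaths per week per year.
years = np.repeat(np.arange(2015, 2020), N_WEEKS)
weeks = np.tile(np.arange(1, N_WEEKS + 1), 5)
seasonal = 1000 + 100 * np.cos(2 * np.pi * np.arange(N_WEEKS) / N_WEEKS)
deaths = seasonal[weeks - 1] + 20 * (years - 2015)

baseline_2020 = predict_baseline(years, weeks, deaths, 2020)
# Pretend 2020 continued the trend and had 50 additional deaths every week.
observed_2020 = seasonal + 20 * 5 + 50

excess_trend = float(np.sum(observed_2020 - baseline_2020))
# Alternative baseline: the plain per-week mean of 2015-19, which ignores the trend.
mean_baseline = deaths.reshape(5, N_WEEKS).mean(axis=0)
excess_mean = float(np.sum(observed_2020 - mean_baseline))

print(round(excess_trend), round(excess_mean))  # 2600 vs 5720

# Arithmetic for the interval quoted above (in thousands): 435 +/- 1.96 * 12.
low, high = normal_interval(435, 12)
print(round(low), round(high))  # 411 459
```

On this synthetic series the trend-aware baseline recovers the 2,600 deaths that were added, while the mean-of-past-years baseline reports 5,720, which illustrates why subtracting a plain historical mean overstates the excess when deaths are rising from year to year.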
- Nick Adams on January 30, 2021 7:26 PM at 7:26 pm said:
  
  With regard to Australia:
  The 2nd wave mid-year was almost entirely confined to aged-care facilities in one state (Victoria) hence the IFR is more like 5%.
  
  - Daniel H. on January 31, 2021 2:19 PM at 2:19 pm said:
    
    Nick: thank you so much for this info! I probably could’ve found this myself if I knew where to look, but it’s really good to know and solved the puzzle for me!
    Cheers, Daniel
    
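
The arithmetic behind this exchange is simple enough to sketch. Assuming route (a) from the original question (infections = deaths / IFR), using an infection fatality rate that is too low for the affected population inflates the implied number of infections proportionally. The 1% figure and the death count below are purely hypothetical placeholders; only the "more like 5%" value comes from the reply above.

```python
def implied_infections(deaths, ifr):
    """Route (a): estimate infections as deaths divided by the infection fatality rate."""
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be a fraction in (0, 1]")
    return deaths / ifr

deaths = 800           # hypothetical death count, not real data
general_ifr = 0.01     # hypothetical population-wide IFR
aged_care_ifr = 0.05   # "more like 5%", as suggested in the reply above

naive = implied_infections(deaths, general_ifr)       # assumes a typical population
adjusted = implied_infections(deaths, aged_care_ifr)  # assumes a high-risk population
overestimate_factor = naive / adjusted

print(naive, adjusted, overestimate_factor)
```

Under these assumed numbers the naive estimate is five times the adjusted one, which is the size of the discrepancy described in the question about Australia.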
- Kurt Schulzke on January 31, 2021 7:01 AM at 7:01 am said:
  
  I’ve done a deep dive on the CDC excess deaths model for the U.S. and have posted results at https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html.
  
  IMO, zero excess is plausible for the U.S. as a whole and probable in most states. As my write up and visualizations illustrate, the plausible range of “excess” depends on an array of subjective choices, e.g., temporal and geographic binning, which the CDC’s Tech Notes mostly do not attempt to justify. Why does the CDC model train on only five years of data? (The 1918 flu killed an estimated 6.6 per thousand.) Why does the CDC model not report based on deaths per thousand? And is the Poisson-regression-based Farrington algo binned weekly (as deployed in the R surveillance package) really appropriate for the purpose of measuring excess deaths for long-term policy choices? Most notably, the CDC counts only “positive” excesses, assuming away negatives or deficits. While they attempt to justify this “revenue only” accounting treatment, it’s pretty unconvincing in my view.
  
  The Tri-state area is a huge outlier. NYC and NJ don’t belong in the same data set with the rest of the U.S. All but maybe four of the top 50 weekly high death counts in 2020 occurred in states north of 38 deg latitude. Why is this? In many states (e.g., UT, OR, WA), I don’t see an excess as even plausible.
  
  Side note: Anyone know where to access MOMO data on all-cause deaths in the EU? I’m finding lots of pretty pictures but no data.
  
  - JFA on January 31, 2021 4:02 PM at 4:02 pm said:
    
    This is from your post: “The brown line visualizes the mean CDC expected deaths produced by the model.” I don’t believe that is correct. I think the line represents the upper bound of the 95% prediction interval.
    
    “While they attempt to justify this “revenue only” accounting treatment, it’s pretty unconvincing in my view.” I don’t believe this has been an issue this year, but subtracting the upper bound threshold of predicted deaths from observed deaths is inappropriate for finding any deficit, as you would just be overcounting. You would want to subtract the observed number from a lower bound estimate of predicted deaths (which is not provided in the CDC dataset).
    
  - Anoneuoid on January 31, 2021 4:40 PM at 4:40 pm said:
    
    Very interesting. You can get monthly mortality data by age going back to 1959 from here:
    https://data.nber.org/mortality/
    http://www2.nber.org/data/vital-statistics-mortality-data-multiple-cause-of-death.html
    
    Looks like this (don’t know/remember why the chart is only going to 1995 there):
    https://www.docdroid.net/Mt4AUcA/momort-pdf
    
    Some other links with historical population, births, deaths, etc data:
    https://pastebin.com/bvbz51RR
    
    I used this for a model of measles years ago so some of those links may be dead, but you should be able to find at least annual mortality binned into ten-year age groups going back to 1900.
    
    Here is the csv I had for population going back to 1900:
    https://pastebin.com/8sK2AVeU
    
    And here is deaths/pop by binned ages going back to 1900:
    https://pastebin.com/CVdD5d4v
    
    Maybe that will help.
    
    - Anoneuoid on January 31, 2021 4:55 PM at 4:55 pm said:
      
      Forgot to share the monthly mortality from NBER: https://pastebin.com/1K9sHPQ2
      
      Once again, only goes to 1995 for some reason even though more recent data is available at that site.
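
Returning to the exchange earlier in this sub-thread about counting only "positive" excesses: the difference between the accounting choices can be shown with a few lines of arithmetic. The sketch below compares three summaries of the same weekly series: the net excess over the expected count, the positive-only excess over the expected count, and the positive-only excess over an upper threshold. The function name and all numbers are hypothetical, and this is not a reimplementation of the CDC model, only of the bookkeeping the commenters describe.

```python
def excess_summaries(observed, expected, upper_bound):
    """Return (net, positive_only, above_threshold) excess totals.

    net:             sum of observed - expected, deficits included
    positive_only:   same, but weeks below expected count as zero
    above_threshold: sum of observed - upper_bound, floored at zero
    """
    if not (len(observed) == len(expected) == len(upper_bound)):
        raise ValueError("all series must have the same length")
    net = sum(o - e for o, e in zip(observed, expected))
    positive_only = sum(max(0, o - e) for o, e in zip(observed, expected))
    above_threshold = sum(max(0, o - u) for o, u in zip(observed, upper_bound))
    return net, positive_only, above_threshold

# Hypothetical weekly counts (not real data).
observed = [100, 90, 130, 150, 95]
expected = [100, 100, 100, 100, 100]
upper_bound = [110, 110, 110, 110, 110]

net, positive_only, above_threshold = excess_summaries(observed, expected, upper_bound)
print(net, positive_only, above_threshold)  # 65 80 60
```

With these made-up numbers, ignoring deficit weeks raises the total from 65 to 80, while measuring against the upper threshold lowers it to 60, so the three conventions bracket each other rather than agree.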
Chebyshev on January 30, 2021 9:19 AM at 9:19 am said:

This is a pretty good analysis of the UK mortality data.
https://architectsforsocialhousing.co.uk/2021/01/27/lies-damned-lies-and-statistics-manufacturing-the-crisis/

- jerome on January 30, 2021 10:54 AM at 10:54 am said:
  
  Chebyshev — that’s a very lengthy and loosely focused reference; what relevant conclusion did you discern from it?
  
  And the problem with the subject Karlinsky/Kobak dataset paper here is that we don’t know how accurate their raw data was or if they presented it correctly.
  
- Dale Lehman on January 30, 2021 11:06 AM at 11:06 am said:
  
  It appears to be a very thorough analysis – but in the spirit of Andrew’s latest post, I am far from convinced. The details are important and that analysis does delve into the details – what types of deaths are reported and how have they varied, how are they measured, a long time series of data is used, and many other notable features. But the author’s bias is readily apparent, and I think he fails to follow some of the advice Andrew provides in the subsequent post.
  
  I don’t have the time or energy to refute the analysis, but I did use the mortality data provided in this post. I simply fit a time series model to the UK data from 2015 through the end of 2019 and used this to forecast the percentage excess deaths for 2020 (the first 51 weeks of which are in the linked data set above). A 90% confidence interval for the percent excess deaths (beyond what is forecast from the time series model) includes zero for the first 3 months of 2020. Then it markedly increases, reaching over 50% by April and not returning to zero until the end of May. The confidence interval becomes positive once again for the Sept – end of year time frame.
  
  I don’t propose this to be a definitive analysis and I am not including many of the features that the author of that UK analysis did. But somehow that analysis seems at odds with a simpler look at the data – I feel like he buried the lead. A better, and more convincing in my mind, analysis would begin with the overview of what excess deaths look like over time, highlighting the COVID time period, and then breaking down what is contributing to those figures. Some measure of the variability should also be used to highlight how 2020 differs, or does not, from earlier time periods. Instead, there was too much emphasis on the “manufactured crisis.” I would propose this as a case in point for the subsequent post on “how we’re duped by data and how we can do better.”
  
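
The kind of check described in this comment (forecast 2020 from 2015–2019, then ask whether a 90% interval for the percentage excess includes zero) can be sketched as follows. The comment does not say which time series model was used, so this sketch substitutes a deliberately crude stand-in: the per-period mean of the historical years as the forecast, with a normal-approximation prediction interval from the per-period standard deviation. The function name and the data are hypothetical.

```python
import numpy as np

Z_90 = 1.645  # two-sided 90% quantile of the standard normal

def percent_excess(history, observed, z=Z_90):
    """Percent excess per period, with an approximate prediction interval.

    history:  array of shape (n_years, n_periods) with past deaths
    observed: array of shape (n_periods,) with the deaths to evaluate
    """
    history = np.asarray(history, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_years = history.shape[0]
    forecast = history.mean(axis=0)
    # Predictive sd of a new observation around the estimated mean.
    pred_sd = history.std(axis=0, ddof=1) * np.sqrt(1 + 1 / n_years)
    point = 100 * (observed - forecast) / forecast
    half_width = 100 * z * pred_sd / forecast
    return point, point - half_width, point + half_width

# Hypothetical deaths for five past years and three periods (not real data).
history = [
    [100, 100, 100],
    [102,  98, 101],
    [ 98, 102,  99],
    [101,  99, 100],
    [ 99, 101, 100],
]
observed = [101, 150, 100]

point, lower, upper = percent_excess(history, observed)
includes_zero = (lower <= 0) & (upper >= 0)
print(point, includes_zero)
```

In this toy example the first and third periods are indistinguishable from the historical pattern, while the second shows a 50% excess whose interval clearly excludes zero. A baseline with a trend term would be a better choice when deaths are rising over the historical years.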
- Ariel Karlinsky on January 30, 2021 12:42 PM at 12:42 pm said:
  
  I skimmed this analysis, and I disagree. We see an incredible tracking in the UK (and many other places) between COVID deaths and excess deaths. So much so that even during lockdowns and other measures, when COVID deaths were low, excess deaths were low as well. See for example this figure that tracks them using our dataset: https://raw.githubusercontent.com/akarlinsky/covid-excess-mortality-tracking/main/United%20Kingdom_excess_mortality_vs_covid.png?token=ALRQ66KBEVMRWDZTXXMKJWDAD3KLY
  
- dhogaza on January 30, 2021 2:04 PM at 2:04 pm said:
  
  “On 20 April, the World Health Organisation (WHO) issued the ‘International guidelines for certification and classification (coding) of COVID-19 as cause of death’. These instructed medical practitioners that, if COVID-19 is the ‘suspected’ or ‘probable’ or ‘assumed’ cause of death, it must always be recorded, in Part 1 of the death certificate, as the ‘underlying cause’ of death. In contrast, co-morbidities such as cancer, heart disease, dementia, diabetes or chronic respiratory infections other than COVID-19 should only be recorded in Part 2 of the death certificate as a ‘contributing’ cause.”
  
  Think about this a bit. He’s complaining that if someone has a comorbidity that it is wrong to list covid as the cause of death, with the comorbidity being a contributing cause.
  
  Back in April, a friend in the UK, while in the hospital being diagnosed for cancer and being given initial treatment, caught covid and died. The “Architects for Social Housing” author seems to believe that he should not have been listed as having died of covid.
  
  This is simply wrong.
  
  As to the WHO document, it says:
  
  “2. DEFINITION FOR DEATHS DUE TO COVID-19
  A death due to COVID-19 is defined for surveillance purposes as a death resulting from a clinically compatible illness, in a probable or confirmed COVID-19 case, unless there is a clear alternative cause of death that cannot be related to COVID disease (e.g. trauma). There should be no period of complete recovery from COVID-19 between illness and death.
  
  A death due to COVID-19 may not be attributed to another disease (e.g. cancer) and should be
  counted independently of preexisting conditions that are suspected of triggering a severe course of COVID-19”
  
  Nothing wrong with this at all. Especially early on, with limited testing available, diagnosis was often made due to clinical observations of the symptoms and course of the disease. Diabetes (for instance) doesn’t cause symptoms clinically compatible with covid, so if someone with diabetes dies of a disease which is clinically compatible with covid it would be wrong to list diabetes as the primary cause of death, rather than a contributing factor.
  
  TL;DR the author’s a covid denialist. And Architects for Social Housing doesn’t sound like a medical journal to me, either.
  
- Nick Adams on January 30, 2021 8:00 PM at 8:00 pm said:
  
  Australian data contradicts the conclusion of this article.
  1. Widely available PCR testing meant that almost all deaths attributed to Covid were PCR positive (e.g. 674/682 Covid-attributed deaths Jan-August had a positive PCR). It is reasonable to assume that the confirmed Covid infections were a contributory factor, if not the main factor, causing the deaths.
  2. Prolonged and severe lockdowns were undertaken throughout the year in Victoria and NSW, the 2 states that contributed most of the Covid cases and deaths.
  3. Despite this, non-Covid related deaths were less than or equal to expected in 2020 compared to previous years. This includes deaths due to cancer, heart disease and suicide. This is reflected in the total mortality rate for 2020 which was 6% less than expected despite the excess Covid deaths.
  
  From the above it appears that lockdowns and reduced access to medical care did not cause excess deaths. What lockdowns did do was to stop the epidemic – only a handful of community acquired cases have occurred Australia wide in the last few months, all due to leakage from quarantined overseas visitors. These cases were prevented from spreading due to robust testing and tracing, and quarantine of all contacts.
  It’s a bit late for most of the rest of the world but if you want to stop an infectious disease epidemic it’s pretty clear what needs to be done.
  Those who claim that lockdowns are ineffective and harmful are deluded.
  
  - confused on January 31, 2021 5:22 AM at 5:22 am said:
    
    I will generally agree, in principle anyway — but it doesn’t seem that any nation that isn’t an island or effective-island like South Korea has had success comparable to Australia, NZ, Iceland or Taiwan. Even Canada and Germany are much closer to the US or UK than Australia or NZ.
    
    (Except possibly China, and I don’t trust their numbers — elsewhere e.g. Singapore we have seen outbreaks in ‘marginalized’ communities even where measures were otherwise strong, do we really believe that say the Uyghurs are being tested for COVID? And even if their numbers are true, that model couldn’t be followed in Western democracies.)
    
    So I agree that strong measures obviously give much better results *when they work*.
    
    But I think there may be a fairly plausible argument that by mid-March when places like the US realized they had a problem it was already too late for measures to accomplish very much. I think the US has gotten some marginal benefit by pushing infections later, to a time when treatment was better understood and therefore IFR a bit lower… but I am not sure how large of a benefit this really is.
    
  - Marc E on February 7, 2021 5:58 PM at 5:58 pm said:
    
    The rather insultingly worded conclusion does not follow. In fact, until the last paragraph, I thought Nick Adams was making an argument about overly broadly defined coronavirus mortality and the failure of lockdowns.
    
    1. Most PCR tests in the US, the EU, and Australia are of the 40 cycles variety, with the result that they are overly sensitive. Many raised an alarm earlier in 2020, but unfortunately the rapid politicization of the pandemic drowned out such voices (https://www.nytimes.com/2020/08/29/health/coronavirus-testing.html). This is not the first time PCR tests have created faux epidemic panics: https://www.nytimes.com/2007/01/22/health/22whoop.html If you die of a blood clot in an elder-care facility in Victoria and your PCR test is positive, you are likely to be counted as a Covid-19 victim, regardless of the lack of corresponding symptoms.
    
    2. Yes, Victoria and NSW contributed the bulk of the official Covid-19 deaths, just like the Tri-State area contributed disproportionately to official US Covid-19 deaths. States with much laxer policies generally fared better, including Florida, despite its much older population. Same for Sweden, which has no notable excess mortality for 2020, just as predicted by its much-maligned health authorities.
    
    3. So, Australia is just like Sweden, or Belarus, except that some of its states resorted to unprecedented, draconian methods of curtailing civil liberties. And except that Australia is an island. Sure, blame it all on visiting foreigners, just like China does.
    
    Add to this the fact that the fear promoted by political opportunists and much of the media caused drops as high as 40% in emergency room visits for heart attacks and strokes, and postponed cancer tests and treatments, among other disruptions.
    
    So, no. It does not follow that “lockdowns stopped the epidemic” or that fear and lockdowns did not cause preventable deaths. Speaking of “harmful deluded.”
    
Dmitry Kobak on February 1, 2021 10:12 AM at 10:12 am said:

Hi Andrew, to directly answer this: “I’d think this data would be already available through the World Health Organization or something?” — the monthly death numbers are in principle available in the United Nations database (UNdata), but with huge reporting lags. There are almost no data for 2020 there at this point.

Vic on August 7, 2021 9:47 AM at 9:47 am said:

well, the usual questions arise immediately — what’s the margin of error in “this data” and what are common sources of error in such a huge measurement/compilation?

- Ariel Karlinsky on August 7, 2021 10:15 AM at 10:15 am said:
  
  This is detailed in our paper, here: https://doi.org/10.7554/eLife.69336. See Data limitations and caveats section.
  
  If something’s still unclear, ask here and I’ll respond.
  

Statistical Modeling, Causal Inference, and Social Science
