## Number of deaths or number of deaths per capita

Pablo Haya writes:

Currently, there is a lot of data analysis in the news media showing multiple aspects of the COVID-19 crisis. Much of it compares the spread and evolution of the virus between different countries, or between different regions within each country. These analyses usually either compare the absolute frequency of metrics such as confirmed cases or deaths, or normalize these metrics by population. Which of these alternatives is more appropriate for making a fair comparison?

According to John Burn-Murdoch, from the Financial Times, adjusting for population size is a bad idea because COVID-19 is a transmissible disease, and when you normalize, higher per-capita numbers just mean smaller country, not anything different about how that country’s dealing with COVID-19.

What do you think?

My reply: When comparing countries, I think it makes sense to look at per-capita rates, not absolute numbers, but as Burn-Murdoch points out in the linked thread, you could also compare regions within a country. The disease does not respect political boundaries (except to the extent that people are not traveling between countries, but then they might not be traveling between regions within countries either). If you’re comparing rates of growth on the log scale, then it doesn’t matter whether or not you divide by population, because the population is essentially constant over time. Burn-Murdoch is right that if you make a graph, starting the time axis at the first death within a country, then per-capita comparisons can be misleading. But if you start the time axis at a constant per-capita rate, the denominators will again cancel out. I guess the general point is that there’s no absolute right or wrong way to make these plots: different graphs show you different aspects of the data.
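To make the log-scale point concrete, here is a minimal sketch in Python. The death counts and the population are made-up numbers, and `log_growth` is a helper name invented for this illustration. It checks that day-over-day changes in log(deaths) match those in log(deaths per capita), since log(d/N) = log(d) - log(N) and the log(N) term drops out of the difference.

```python
import math

def log_growth(series):
    """Day-over-day differences of log(series)."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

# Hypothetical cumulative death counts and population (illustrative only).
deaths = [10, 15, 24, 36, 50]
population = 5_000_000

per_capita = [d / population for d in deaths]

raw_growth = log_growth(deaths)
pc_growth = log_growth(per_capita)

# The two series agree up to floating-point rounding, because the
# constant log(population) cancels when differences are taken.
max_gap = max(abs(r - p) for r, p in zip(raw_growth, pc_growth))
print(max_gap < 1e-9)
```

The cancellation relies on the population being treated as constant over the period plotted.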

1. Ed Hagen says:

Carl Bergstrom has an interesting thread on this:

2. Tom Passin says:

Partly what to look at depends on what kind of thing you want to examine. If you want to know how large a part of its population a country has (or has not) protected, and compare with other countries, you would want to look on a per-capita basis. But that probably won’t be very helpful until near the end of the pandemic, when the cumulative numbers have settled down. In the early stages, similar to what Bergstrom says in the Twitter thread linked to by Hagen above, the curves will be driven by local spread of infections. One person may infect a dozen others, say, and they end up infecting hundreds in their local area a few days later. You can see this effect in the data from many countries, including the US. But at first, it’s likely to be just their local area. So a per capita country calculation would be very misleading.

For comparing how effectively different countries have been able to contain the growth of the disease, I’ve been finding that the fractional growth rates vs time seem to be very helpful, especially presented semi-logarithmically. In other words, you may see the growth rate to be 30%/day at one time and 10%/day two weeks later. Depending on the noise, of course, that would show a real and large improvement.
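As a rough sketch of what "fractional growth rate" could mean here, the Python below takes each day's increase divided by the previous day's cumulative total. The case counts are invented, and `fractional_growth` is a name chosen for this example, not something from the comment.

```python
import math

def fractional_growth(cumulative):
    """Daily fractional growth: (today - yesterday) / yesterday."""
    return [(b - a) / a for a, b in zip(cumulative, cumulative[1:])]

# Hypothetical cumulative case counts on five consecutive days.
cases = [100, 130, 169, 186, 204]

rates = fractional_growth(cases)

# For a semi-logarithmic presentation, plot log10(rate) against the day index.
log_rates = [math.log10(r) for r in rates]

for day, (r, lr) in enumerate(zip(rates, log_rates), start=1):
    print(f"day {day}: growth {r:.1%}, log10 {lr:.2f}")
```

In this toy series the rate falls from 30%/day to roughly 10%/day, the kind of drop the comment describes. With real data you would probably smooth before taking ratios, since daily reporting noise is amplified by the division.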

Also, in some data sets (Hopkins for the US, for example), the fractional death rate seems to lag the fractional incidence rate by 7 – 9 days. But some fluctuations are pretty well synchronous between the incidence and death rates. That would suggest data reporting anomalies, such as catching up on the records after a busy weekend, etc, or a change in the reporting standards.

• Arthur says:

@Tom Passin: “depends on what kind of thing you want to examine”

yeah, what question do you want answered?

“# of Deaths” gives an obvious estimate of the magnitude of the problem, if geographical; that’s very useful in deciding what action is necessary (if any) and how much action (cost vs benefit).

U.S. politicians decided that a 20% cut in GDP and a huge upheaval in the daily lives of 90% of the American population was a very worthwhile cost to perhaps save some thousands of lives from Covid-19.
Current Covid-19 death estimates are about 60,000 this year (not unusual for any seasonal Flu).

No such drastic action was even considered in the 2017-18 seasonal Flu season (80,000 U.S. deaths), nor the 1957-58 Asian Flu pandemic (116,000 U.S. deaths).

• Tom Passin says:

I suppose this post was a troll, but I’m going to say something about it anyway. @Arthur:

“U.S. politicians decided that a 20% cut in GDP and a huge upheaval in the daily lives of 90% of the American population was a very worthwhile cost to perhaps save some thousands of lives from Covid-19.

Current Covid-19 death estimates are about 60,000 this year (not unusual for any seasonal Flu)”

It’s not about current estimates. It’s about estimates of what the numbers would be if no countermeasures were taken. Comparing covid-19 with the 1918 influenza, covid apparently is more infectious and has a rather higher fatality rate. So there was good reason to worry.

Looking at data for covid from various countries, initial growth spikes of more than 100%/day look common. If that rate, or anything close to it, had continued, it wouldn’t have been a matter of “some thousands of lives”.

• NF says:

Beat me to it. The countermeasures surely reduced and slowed transmission.

Also, FYI, some measures were taken during the Asian Flu pandemic. My father, who contracted that flu, very specifically remembers having school closed for a week or two to try to slow the spread among children. (Of course, this was easier on the economy back then since far fewer mothers worked.)

• Curious says:

Arthur:

Where are you getting that number for deaths attributed to influenza for the 2017-18 season? Am I pulling the wrong data from the CDC? Because the number I see is 15,620.

| Season | Influenza deaths |
|---|---|
| 2015-16 | 3448 |
| 2016-17 | 6954 |
| 2017-18 | 15620 |
| 2018-19 | 7171 |
| 2019-20 | 6905 |

• Joe says:

I feel like per capita is a useful measure when population numbers effectively mirror density measures (pop/sq km or mile) but falls apart when they don’t (and this was a personal quibble of mine with the generality of some of the agent-based modelling). The only places where I really think per capita rates start to fall apart when comparing to other countries are countries like the US, Brazil, and Russia (to some extent). It’s not just that people are more spread out in the US compared to other countries, but there’s pretty huge variation in population densities across regions even when populations are relatively homogeneous. I’d much rather compare cases (or deaths or hospitalizations) per population density than per capita population across countries (and regions, if you could get that data).

3. In my analysis, I’m not plotting a time series but the daily count (on the y-axis) against the cumulative total (on the x-axis). Then I fit a local regression (loess) smooth line.

Compared with the time series, the chart with the loess smooth line provides a leading indicator of the trend: trending up or trending down.

When the crisis decelerates, the time series graph will eventually form an “s” curve. However, the “n” curve of the loess smooth line will tell whether the crisis is slowing down or not.
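A minimal Python sketch of the daily-count-versus-cumulative-total idea follows. It makes two loud assumptions: the cumulative series is a synthetic logistic curve rather than real data, and a centered moving average stands in for the loess fit, which is not implemented here. All function names and parameters are hypothetical.

```python
import math

def logistic_cumulative(days, total=1000.0, rate=0.3, midpoint=20):
    """Synthetic 's'-shaped cumulative series (assumed, not real data)."""
    return [total / (1 + math.exp(-rate * (t - midpoint))) for t in range(days)]

def moving_average(values, window=5):
    """Centered moving average; a simple stand-in for a loess smoother."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

cumulative = logistic_cumulative(41)
daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
smoothed = moving_average(daily)

# Points for the chart: x = cumulative total, y = smoothed daily count.
points = list(zip(cumulative[1:], smoothed))

# The smoothed daily count rises, peaks, then falls: the 'n' shape.
peak_index = max(range(len(smoothed)), key=smoothed.__getitem__)
peak_x, peak_y = points[peak_index]
print(f"peak daily count {peak_y:.1f} at cumulative total {peak_x:.0f}")
```

For a symmetric logistic curve the daily count peaks when the cumulative total is around half its final value, so the turn in the “n” curve shows up while the “s” curve still looks steep.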

In an imaginary situation where the daily count hits zero, the time series graph will form an “s” curve and the loess smooth line will form an “n” curve, as seen here https://rizami.com/covid-19-best-case-scenario-assumption/

Likewise, certain countries have already shown the tell-tale signs of an “n” curve, even though the “s” curve is still a straight line, as seen here https://rizami.com/covid-19/

I would appreciate feedback, thanks.

4. Ron Kenett says:

Come on – these aggregate measures need to be calibrated to make any sense.
They are apparently affected by:
– age distribution
– population density
– pre Corona death causes
– etc

• Andrew says:

Ron:

Oh, yes, definitely. I raised this issue in my post from last month, “Estimating the mortality rate from corona?” We can understand country totals as averages over subpopulations.

• Tom Passin says:

In a situation like the covid pandemic, you will get lost in the weeds if you try to disaggregate all these factors. That would be appropriate later on, for learning things about specific kinds of populations, but at this time it’s neither going to be of much help nor even feasible, because of the amount and quality of the data available.

5. Dale Lehman says:

I’m starting to think that comparing countries is a mistake. Yes, you need to look at a rate (e.g. per million of population), but what do you use as the denominator: China, Wuhan, Hubei, ….? So, comparing rates across countries totally depends on what you think the relevant population is. What I think is more promising is to compare the shapes of the diffusion curves across countries and not focus on the absolute levels (which will vary by orders of magnitude even using rates). Some countries/locations seem to show the rate of deaths (measured in various ways, but preferably using some type of moving average to smooth out the numerous data quality issues) peaking and sharply declining, and these places appear to be the places that put the most severe mitigation in place (Korea, China, Italy). Others have not peaked yet and have used less severe mitigation (which includes testing, quarantining, tracking – the US and UK, for example). Eventually we might have enough data to estimate the cross-geography differences in the mortality rate, but I don’t think we are there yet. But perhaps the more important question concerns the evidence of how much the curve has been shifted down – this may be directly related to the types of mitigation that are used, and that is a policy-relevant question.

• Phil says:

I’ve just been thinking about this stuff with an eye towards forecasting what will happen here in the US by looking at what has been happening in countries that are ahead of us in their response. For instance, the US log mortality curve — log(cumulative deaths) vs time — has been flattening for a week. How can we extrapolate? One thought is to look at Italy: when their curve was at the same slope ours is now, they had about 10k deaths. That was about two weeks ago. Now they have about 22k and still flattening; it looks like 25k or so is where they’re likely to be in another two weeks. So that’s a factor of 2.5, once they had achieved an 8-day doubling time. The U.S. is at 26k, so that would put us at 65k total. That’s a calculation that can be used to derive per capita numbers, but doesn’t use per capita numbers in the extrapolation, obviously. It does assume that the social distancing measures work about the same in Italy and the U.S.

I’m not sure what to think of the coronavirus mortality undercount in this context. Both Italy _and_ the US have an undercount, and there’s no reason to believe it’s the same percentage in both countries. But I think that if the official coronavirus deaths flatten out, the unofficial ones are probably doing the same, so maybe the extrapolation still works; you’d just need to say “That 65k U.S. total needs to be adjusted upwards by x% due to undercount.”
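The back-of-the-envelope extrapolation above can be written out as a short Python sketch. The death counts are the rounded figures quoted in the comment; the undercount fraction is purely a placeholder for the unknown “x%” and is not an estimate of anything.

```python
# Rounded figures quoted in the comment above.
italy_at_matching_slope = 10_000   # Italy's deaths when its slope matched the US slope now
italy_projected_plateau = 25_000   # where Italy looks likely to be in another two weeks
us_current = 26_000                # current US deaths

# Scale the US total by the same factor Italy is on track for.
factor = italy_projected_plateau / italy_at_matching_slope
us_projected = us_current * factor

# An 8-day doubling time corresponds to this daily growth rate.
doubling_days = 8
daily_growth = 2 ** (1 / doubling_days) - 1

# Hypothetical undercount adjustment: x is unknown, 0.20 is only a placeholder.
assumed_undercount = 0.20
us_adjusted = us_projected * (1 + assumed_undercount)

print(f"factor {factor}, projected {us_projected:.0f}")
print(f"daily growth at 8-day doubling: {daily_growth:.1%}")
print(f"adjusted for assumed undercount: {us_adjusted:.0f}")
```

As the comment says, population never enters the calculation; it leans entirely on the assumption that distancing measures work about the same in both countries.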

• Brent Hutto says:

The much-derided IHME model is basically a slightly elaborated form of “…forecasting what will happen here in the US by looking at what has been happening in countries that are ahead of us…”.

And FWIW they come up with something very close to “65K total”. Go figure!

6. Jon Johnson says:

Wouldn’t population density be important?

7. zbicyclist says:

“when you normalize, higher per-capita numbers just mean smaller country”

As with almost ANY geographic rate, the smaller entities will TEND to be either at the high end or the low end.

This is, in part, a sample size effect. But the bigger effect is that smaller entities are more likely to be homogeneous. For example, New York City has a variety of neighborhoods that vary from rich to poor, but a lot of the suburbs are either rich, or poor.
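The sample-size part of this can be illustrated with a small simulation in Python. It is a sketch under strong assumptions: every region shares the same true rate, the population sizes and the rate are invented, and `simulate_rates` is a hypothetical helper. It says nothing about the homogeneity effect, which would need regions with genuinely different underlying rates.

```python
import random

def simulate_rates(population, n_regions, true_rate, rng):
    """Simulate the observed per-capita rate in each of n_regions regions."""
    rates = []
    for _ in range(n_regions):
        # Each person independently experiences the event with probability true_rate.
        events = sum(1 for _ in range(population) if rng.random() < true_rate)
        rates.append(events / population)
    return rates

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_RATE = 0.05         # identical underlying rate everywhere (assumed)

small = simulate_rates(200, 30, TRUE_RATE, rng)      # 30 small regions
large = simulate_rates(20_000, 30, TRUE_RATE, rng)   # 30 large regions

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)

# Small regions scatter much more widely around the same true rate,
# so they tend to occupy both the top and the bottom of any ranking.
print(spread_small > spread_large)
```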

8. Morris39 says:

Not a member of your union but lurk here b/c the discussion is often interesting (and always polite). A question if I may. Why not compare life-years lost (not deaths) given some care to allow for testing rates? Or compare year over year statistics on that basis? Some axiom in statistics or ethics?

• Martha (Smith) says:

Life-years lost does make sense to me. But it may be that different measures are more useful for different purposes, which suggests reporting more than one measure.

• elin says:

I’m sure that over time people will be doing all of these things to understand what has happened and make projections. There are many different purposes for analysis of data about diseases and their impact and they will be used in many articles and reports to come.

9. Bob says:

Burn-Murdoch is a hack, plain and simple. He’s there to push the FT’s editorial line on the crisis, and plays constant games with his charts to achieve this. The people who want to believe it will happily retweet him, and now they’re right because the FT says so.

Look at his regression ‘analysis’ about why population density doesn’t matter but ‘lockdown timing’ does. This is all to avoid acknowledging that London may have a higher death rate than say Dublin, because it is a fundamentally different type of city. Instead, they want to push the narrative that the UK government ‘locked down too late’.

Previously he’s claimed that the UK was on a ‘steeper trajectory’ than Italy, when his own chart contradicted this. A couple of days ago he claimed the London death toll was accelerating when his own chart showed this was not true.

One day he switched from a moving average to a spline function, giving a tortuous explanation why this was a good idea. He switched back the next day.

10. RudyB says:

In your book ‘Statistical Methods in Spatial Epidemiology’ you mention the nuisance parameter (usually population) and that it cannot be uncertain (p. 313). This means that we humans move, and cancel out all certainty of location as scalar. The tensor nature of movement requires quantities like transport flows (air travel, cars etc) that capture the true quantities, not density, a rather weak, scalar quantity. Gould and Wallace (1994) (geographer and physicist duo) https://www.tandfonline.com/doi/abs/10.1080/04353684.1994.11879669 had an excellent exposition of the AIDS pandemic and how it should’ve been measured based on transport topology and spatial tensors instead of temporal and aspatial vectors … Rich countries suffer more due to more transport than poor countries, hence the higher evidence of COVID19 in developed countries.

• Martha (Smith) says:

Intersting.

• Martha (Smith) says:

Oops, my eye caught the typo right after I pressed Submit. It should be “interesting”, not “intersting” (which sounds like either getting stung by a bee right between the eyes, or a sting operation by Interpol).