Follow-up on yesterday’s posts: some maps are less misleading than others.

Yesterday I complained about the New York Times coronavirus maps showing sparsely-populated areas as having a case rate very close to zero, no matter what the actual rate is. Today the Times has a story about the fact that the rate in rural areas is higher than in more densely populated areas, and they have maps that show the rate in sparsely populated areas! 

I’m not sure what is going on with these choices. It does make sense to me to show only rural areas if you are doing a story on the case rate in rural areas, and it would make sense to me to show only urban areas if you were doing a story on the case rate in urban areas, but neither of these makes sense to me as a country-wide default. (It’s also a bit strange to me that they changed the scale, showing average cases per million on the new plot, with numbers up to about 800, while showing average cases per 100,000 on the other plot, with numbers up to about 64, which is 640 per million. These are not wildly different and could work fine on the same scale.)
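The unit conversion behind that aside is only a factor of ten. Here is a minimal sketch. The function names are illustrative, and the two legend maxima are the approximate values read off the two maps.

```python
def per_100k_to_per_million(rate_per_100k: float) -> float:
    """Convert a rate per 100,000 people into the same rate per 1,000,000."""
    # 1,000,000 / 100,000 = 10, so only a factor of 10 separates the two scales.
    return rate_per_100k * 10.0


def per_million_to_per_100k(rate_per_million: float) -> float:
    """Convert a rate per 1,000,000 people back into a rate per 100,000."""
    return rate_per_million / 10.0


old_top = per_100k_to_per_million(64)  # top of the earlier map's legend, per million
new_top = 800.0                        # approximate top of the new map, per million
print(old_top, new_top / old_top)      # 640.0 1.25
```

In per-million units, the new map's top value is only about 1.25 times the old one's. A single color scale could plausibly cover both maps.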

I could imagine leaving some areas blank if there are literally no permanent residents there (National Wilderness and National Forest, for instance), but if they are going to do that, they should not use the same color for ‘zero population density’ that they use for ‘zero coronavirus case rate’. These mean different things. That’s what I really dislike about the other plot: the same color is used for low-population areas, independent of the rate. Everywhere else on the map the color means “rate”, and then there are these huge sections where the color means “population density.” On this one, at least they use different colors for the places where they aren’t showing us the data (white) and where the rate is low (gray). So, of the two, this one is better. But I think they should just combine the two plots.
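One way to keep the two meanings apart is to give “not shown” its own legend entry rather than folding it into the lowest bin. This minimal sketch assumes hypothetical bin edges in cases per 100,000, which are not the Times’ actual legend, and a hypothetical `shown` flag marking whether an area is plotted at all.

```python
from typing import Optional

# Hypothetical bin edges, in cases per 100,000; not the Times' actual legend.
BIN_EDGES = [0, 8, 16, 32, 56]
NOT_SHOWN = "not shown"


def legend_category(rate_per_100k: Optional[float], shown: bool) -> str:
    """Assign a legend label, keeping 'not shown' separate from 'low rate'."""
    if not shown or rate_per_100k is None:
        # Areas left off the map get their own label, never the lowest bin's.
        return NOT_SHOWN
    if rate_per_100k < 0:
        raise ValueError("a case rate cannot be negative")
    for lo, hi in zip(BIN_EDGES, BIN_EDGES[1:]):
        if lo <= rate_per_100k < hi:
            return f"{lo}-{hi}"
    # Anything at or above the last edge falls in the open-ended top bin.
    return f"{BIN_EDGES[-1]}+"


print(legend_category(2.0, shown=True))    # 0-8
print(legend_category(90.0, shown=False))  # not shown
print(legend_category(90.0, shown=True))   # 56+
```

Under this scheme, a sparsely populated area with a high rate is never labeled as a low-rate area. It is either plotted in its true bin or explicitly marked as not shown.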

22 thoughts on “Follow-up on yesterday’s posts: some maps are less misleading than others.”

  1. I think this map is fine too.

    Separating the rural counties is worthwhile because a small number of cases in a very low population county skews the case rate high when the number of cases is small.

    I do think it would be worthwhile to show a pair of maps for both rural and urban, with the case rate on one map and the total count on the other.

    “Everywhere else on the map the color means “rate”, and then there are these huge sections where they color means “population density.” ”

    In those sections, the color means “below threshold” or “small n”.

    • “Separating the rural counties is worthwhile because a small number of cases in a very low population county skews the case rate high when the number of cases is small.”

      Yes, that’s right, that’s the issue that was behind the “All maps of parameter estimates are misleading” paper. But there are a bunch of ways one could choose to handle that besides plotting all of those high values as “zero or low.” One possibility is to aggregate low-population areas into larger units that together have a large enough sample size. That appears to be what they have done to make this map: here they are showing whole counties, not portions of counties. I think that’s a fine solution.

      It is impossible to make a map like this that is free of artifacts. There are going to have to be compromises. I just think the compromise on the default map that I complained about is terrible. Plotting places with very high rates using the same color that means a very low rate: that’s just really bad.

      I know it doesn’t bother you, jim, and you are entitled to not be bothered by it.

    • Especially with map visualisations I feel no matter what choices one makes there’s always going to be people that complain!

      What one likes is just so damn subjective.

        • The only objective measure I can think of is to give two groups a GRE-style comprehension test, each group seeing one version of the graph, and compare how well they answer questions that require drawing inferences from it.

          I doubt pure argument can resolve these preference questions.

        • Rahul,
          Yes! If a large fraction of your intended audience misinterprets your map in important ways, it’s a bad map. This is true no matter how much explanatory text you add and what disclaimers you put on it.

          Of course, maps do get used for multiple reasons and it’s possible for a map to be good for some purposes and not for others.

          With the default New York Times map, they’ve chosen to use the same color to display low case rates as to display case rates (no matter how high) in areas with low population density. The result is a map in which huge areas of the country are technically of unknown case rate, but the legend labels them as “low or no.” One thing that’s obvious is that nobody, in the intended audience or any other audience, can answer questions about the case rate in those low-population areas: it’s simply not shown on the map. At best that’s an unnecessary restriction: as the map included in this post illustrates, it is possible to display the rates in those areas. But also, I strongly suspect that if you ask most NY Times readers to answer some questions about, say, whether the COVID case rate is higher in rural or urban areas of the state, based on the default map (the one in the previous post, not this one) most of them will give the wrong answer that the rate is lower in the rural areas; after all, those are the ones where the legend says the rate is “zero or low.”

        • > If a large fraction of your intended audience misinterprets your map in important ways, it’s a bad map.

          Is there any indication that a large fraction of the audience misinterprets this map in a more important way than the alternative?

          > One thing that’s obvious is that nobody, in the intended audience or any other audience, can answer questions about the case rate in those low-population areas: it’s simply not shown on the map. At best that’s an unnecessary restriction: as the map included in this post illustrates, it is possible to display the rates in those areas.

          Something else that is quite obvious is that you’re wrong. That map does not display the rates in those areas; it displays the rates in the counties where those areas happen to be. Nobody can answer questions about the case rate in those low-population areas: that information is simply not shown on the map. (Unless there are counties without any population nucleus shown on the map, which doesn’t seem to be the case.)

        • Carlos,
          “Is there any evidence that a large fraction of the audience misinterprets this map [the one that shows ‘zero or low’ in much of the country] in a more important way than the alternative?”

          I think there is. One piece of evidence is the fact that the NYT felt the need to make a new map in order to talk about the rate in rural areas. If the original map worked for this, they wouldn’t have had to make the new map.

          Another piece of evidence is my own experience. For weeks I saw those big areas listed as “zero or low” and assumed the rate in those areas really was low; after all, we knew cities had been hit hardest and first, and it seemed possible to me that these areas really did have low rates of infection or at least diagnosis. I am admittedly not representative of the intended audience of the map, but in the way that matters here I am a _more_ sophisticated consumer of statistical graphics. If I was misled, I think it’s safe to say many, many other people were as well. (It was only a couple of weeks ago that I realized that something must be ‘wrong’ with the maps in those areas.)

          You say “Something else that is quite obvious is that you’re wrong. That map does not display the rates in those areas; it displays the rates in the counties where those areas happen to be. Nobody can answer questions about the case rate in those low-population areas: that information is simply not shown on the map.” Exactly! That is what I was trying to point out when I said “nobody, in the intended audience or any other audience, can answer questions about the case rate in those low-population areas: it’s simply not shown on the map.” That is my point. Nowhere on the map or its legend is there an indication that those rates are not shown on the map; the legend simply lists them as “zero or low cases.” Which is not true. In fact there are a lot of cases, which is why they created this additional map, the one shown in this post, to show that the rate in many of those areas is quite high…so high, in fact, that you argue in a separate comment that it’s a good thing they changed the scale from that on the original map because otherwise it wouldn’t be clear how high some of these areas are!

          I have to say, I’m kind of flabbergasted to get pushback, even from just two or three people, against the idea that plotting large areas of the country as “zero or low,” when in fact they are very high, is misleading!

        • You’re missing my point entirely!

          The map shown in this post, where the whole county is colored in whatever color corresponds to the county, doesn’t tell you anything about the rates in the low-density areas that were not painted in the map shown in the previous post (there, only the populated areas of the county were colored in whatever color corresponded to the county). The map shown in this post doesn’t contain more information [*] than the other map (if anything it contains less information).

          [*] apart from the change in scale, which has nothing to do with the choice of painting or not painting the low-density areas of the county

        • Carlos,
          So you think the original map includes all of the data from the county, both rural and urban, but displays it all in the little dots representing densely populated areas? I don’t think that’s the case; for one thing, you pointed out yourself that this map (the one that excludes metro areas) uses, and presumably requires, a different color scale to handle the larger values. I think the data from sparsely populated areas were literally not included on the previous map, but are included when making this one.

          Even if everyone in a county lives in one town, I think it’s a mistake to plot the data at the town level in sparsely populated counties. In some counties the original map has dots only a few pixels on a side. At the very least there should be some minimum size for the dots…but also, in no circumstance should they use the same color to represent both “low population” and “low case rate”, as they do on the standard map. On the map in this post they at least use white and gray to distinguish these.

          In order to make their standard map, they need data at the town level; in order to make the present map they need data at the county level. So why not color the county with the county-average color, and each town with the town’s color, that is, combine the two maps? Like all maps, that would have shortcomings, but at least you wouldn’t have a huge area of the country labeled “Few or no cases” when in fact the case rate in those areas is as high as anywhere. I guess I’m just repeating myself with that last point, but, as I said, I am surprised to find disagreement about that proposition.

        • > So you think the original map includes all of the data from the county

          Phil, I really don’t see much space for debate.

          Take Clallam County, WA in the north-west corner of the map. It shows four disconnected areas, all with the same color. Do you think it’s a coincidence?

          Is it a coincidence that in every county with multiple colored areas (and there are many of them) a single color is used?

          > In order to make their standard map, they need data at the town level;

          Clearly they don’t, as they somehow managed to create the map using COVID data at the county level, which is what they have in general.

        • I hadn’t noticed that every city in a county is colored the same! For crying out loud, then why not color the whole county the same? This makes the original map even worse, or at least less justifiable!

  2. > (It’s also a bit strange to me that they changed the scale, showing average cases per million on the new plot, with numbers up to about 800, while showing average cases per 100,000 on the other plot, with numbers up to about 64, which is 640 per million. These are not wildly different and could work fine on the same scale.)

    With one scale the darkest bin is 56+. Over 80% of the counties in North Dakota or South Dakota, for example, would fall in that bin.

    With the other scale we may get a more detailed picture as those counties in the Dakotas may fall in three separate buckets: 50-60, 60-70 and 70+.

  3. I like this map, but is there a zip code version? Arizona is definitely not mostly urban. Unlike Texas and states east of Texas, the counties out west tend to be quite large. This map makes Arizona look mostly urban because, for example, Pima County contains the Tucson metro area and is therefore excluded, yet Pima County is mostly desert. I suspect the arrow for “Metro areas are not shown” points to a part of California where there are no metro areas.

  4. It seems that metro areas are counties that have a town with more than 50,000 people in them, which means that some of the counties shown as metro will have lower population densities than some of the counties shown as rural. I drive an hour and a half to go grocery shopping, but by this map I’m in a metro area. This seems like a problem, given the subject of the map. Like most kinds of categories, mapping categories reflect arbitrary decisions that may make sense for addressing some questions, but not others.

    • Yeah, official Census Bureau definitions of metro area / nonmetro / rural often don’t line up with what I would expect those terms to mean!

      They tend to consider rather small cities as “metros” (Cheyenne, Wyoming for example) and to draw metro area boundaries very widely, including some very rural areas (the Dallas-Fort Worth metro area includes a lot of counties, but the actual urban area is mostly just Dallas + Tarrant + Rockwall + southern halves of Collin and Denton counties).

  5. Looking at the numbers from the Dakotas, and the fact that they have few measures in place, I wonder whether they are going to see a truly ‘uncontained’ epidemic that actually does go to herd immunity.

    I really wonder what % of infections they are detecting. South Dakota has confirmed cases in over 4% of its population; if they detect only 1 in 5 infections, that’s 20%…
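The small-numbers effect raised in the first comment thread, and the fix of aggregating sparsely populated areas, can be made concrete with a little arithmetic. This is a minimal sketch. It assumes case counts are Poisson-distributed and uses a hypothetical true rate of 30 per 100,000, the same in every county, with made-up county populations.

```python
import math


def rate_per_100k(cases: int, population: int) -> float:
    """Observed case rate per 100,000 residents."""
    return cases * 100_000 / population


def rate_sd_per_100k(true_rate_per_100k: float, population: int) -> float:
    """Standard deviation of the observed rate, assuming a Poisson case count.

    For a Poisson count the variance equals the mean, so the SD of the count
    is sqrt(expected cases); rescaling by population turns it into a rate.
    """
    expected_cases = true_rate_per_100k * population / 100_000
    return math.sqrt(expected_cases) * 100_000 / population


TRUE_RATE = 30.0  # hypothetical true rate, identical in every county

# A single case in a county of 2,000 people already reads as 50 per 100,000.
print(rate_per_100k(1, 2_000))  # 50.0

for pop in (2_000, 20_000, 500_000):
    # Smaller populations give much noisier observed rates.
    print(pop, round(rate_sd_per_100k(TRUE_RATE, pop), 1))
# 2000 38.7
# 20000 12.2
# 500000 2.4
```

Pooling ten 2,000-person counties into one 20,000-person unit cuts the standard deviation by a factor of sqrt(10), about 3.2. That is why aggregating low-population areas tames the extreme values. It does so at the cost of the geographic detail discussed above.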
