“1.7%” ha ha ha

Jordan Ellenberg writes:

Lots of people sharing this today.

Isn’t this exactly the kind of situation where they should have done some kind of shrinkage towards the national mean, as in that thing you wrote about kidney cancer rates by county? i.e. you see, just as you might expect, the extreme values of “proportion of people who said they were gay” are disproportionately taken by small states.

My reply:

If I don’t have the individual-level survey data that would allow me to do full-scale Mister P, yes, I’d fit a multilevel model to the state-level averages. I wouldn’t quite just partially pool toward the national mean; I think it would make sense to include some state-level predictors.
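
As a rough illustration of what partial pooling does to state-level averages, here is a minimal sketch. It is not Mister P and it is not the multilevel model with state-level predictors described above: it only shrinks each state toward the pooled mean, using a normal approximation and a crude method-of-moments estimate of the between-state variance. The function name, the proportions, and the sample sizes are all made up for the example.

```python
def partial_pool(props, ns):
    """Shrink state-level proportions toward the pooled mean.

    Each shrunken estimate is a precision-weighted average of the state's
    raw proportion and the pooled mean (normal approximation).
    """
    pooled = sum(p * n for p, n in zip(props, ns)) / sum(ns)
    # Sampling variance of each raw proportion, evaluated at the pooled mean.
    samp_var = [pooled * (1 - pooled) / n for n in ns]
    # Crude method-of-moments estimate of the between-state variance:
    # observed spread minus average sampling variance, floored at zero.
    k = len(props)
    mean_p = sum(props) / k
    obs_var = sum((p - mean_p) ** 2 for p in props) / (k - 1)
    tau2 = max(obs_var - sum(samp_var) / k, 0.0)
    shrunk = []
    for p, v in zip(props, samp_var):
        w = tau2 / (tau2 + v) if tau2 + v > 0 else 0.0  # weight on the raw value
        shrunk.append(w * p + (1 - w) * pooled)
    return pooled, tau2, shrunk

# Hypothetical data: two small states with extreme values, three larger ones.
props = [0.051, 0.017, 0.034, 0.036, 0.040]
ns = [500, 600, 20000, 15000, 3000]
pooled, tau2, shrunk = partial_pool(props, ns)
for p, n, s in zip(props, ns, shrunk):
    print(f"n={n:>6}  raw={100 * p:.1f}%  shrunk={100 * s:.1f}%")
```

With inputs like these, the small states move most of the way they are going to move toward the pooled mean, while the large states barely change. Adding state-level predictors would replace the single pooled mean with a regression prediction for each state.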

In any case, I think it’s tacky to report poll numbers to fractional percentage points. That kind of precision simply isn’t there.
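
A back-of-the-envelope check on how much precision is available, sketched under the assumption of simple random sampling (real polls have weighting and design effects, so the true uncertainty is larger). The national figures of 206,000 respondents and 3.5% are the ones quoted in the comments; the small-state sample size of 600 is a made-up number for illustration.

```python
import math

def se_pct(p, n):
    """Standard error, in percentage points, of a sample proportion p from n respondents."""
    return 100 * math.sqrt(p * (1 - p) / n)

national_se = se_pct(0.035, 206_000)  # figures quoted in the comments
state_se = se_pct(0.035, 600)         # 600 is a hypothetical small-state sample size

print(f"national:    +/- {national_se:.2f} percentage points")
print(f"small state: +/- {state_se:.2f} percentage points")
```

Under these assumptions the national number can carry a decimal place, but a state estimate based on a few hundred respondents cannot.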

P.S. More discussion of variances of large and small states in the comments.
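
One comparison that comes up in the comments is that a state made of 47 independent small-state-sized pieces should have about 1/7 the standard deviation of one piece, and much less of a reduction if the pieces are correlated. The sketch below checks the independent case by simulation and uses the standard equicorrelation formula for the dependent case. The function names and the correlation of 0.3 are assumptions chosen for the example, not estimates from any data.

```python
import math
import random
import statistics

def sd_of_average(k, trials=4000, seed=0):
    """Simulated SD of the average of k independent unit-variance draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0, 1) for _ in range(k))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

def effective_units(k, rho):
    """Number of independent units equivalent to k units with pairwise correlation rho."""
    return k / (1 + (k - 1) * rho)

sim_sd = sd_of_average(47)
theory_sd = 1 / math.sqrt(47)
print(f"simulated SD: {sim_sd:.3f}, theoretical 1/sqrt(47): {theory_sd:.3f}")

# Hypothetical pairwise correlation of 0.3 between the 47 pieces.
print(f"effective number of independent pieces: {effective_units(47, 0.3):.1f}")
```

Even a moderate correlation between the pieces leaves only a handful of effectively independent units, which is the sense in which a large state can behave like a few small states rather than dozens.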

12 thoughts on ““1.7%” ha ha ha”

  1. Reminds me of an old joke: Why do economists add numbers after the decimal point in their forecasts? To show they have a sense of humor.

  2. Clicking through on the Mister P link, I have to say that the gray scales on Kastellec, Lax, and Phillips’ Figure 2 leave something to be desired. Setting the scale based on relative popularity is misleading. The map graphics imply that Bork and Souter were comparably loved in UT and comparably loathed in NY. Seemed fishy on visual inspection and, sure enough, the figure that follows indicates that wasn’t the case – in absolute terms at least. Similarly, the graphic suggests that O’Connor was loved in MS but loathed in AL but the histogram in the following figure shows she was uniformly popular. Not a huge deal, just a graphsmanship thing.

    My recollection is that Gallup did a lousy job forecasting last November’s election. That makes me a bit skeptical of any numbers they report.

  3. “In any case, I think it’s tacky to report poll numbers to fractional percentage points.”

    Total national sample size is 206,000. The overall national figure is 3.5%, so I’d rather not have that reported as rounded up to 4% or down to 3% (depending upon what the second decimal is). A fair amount of information would be lost from rounding.

    A lot of the states at the top of the list look like gay retirement / downscaling destinations: lower cost, quieter places for homosexuals to head for when the bright lights of San Francisco (Oregon, Nevada), Los Angeles (Hawaii, Nevada), and New York (Vermont, Maine) start to lose their luster.

    • Steve:

      Actually, I’d be happy with 3% or 4% but, sure, 3.5% is not so horrible given that it’s a single number and it happens to be right in between two integers. I don’t see any excuse for using the decimal place to distinguish between the 50 states, but the extra significant figure for the national level is certainly defensible.

      Regarding your second point, it would be possible for Gallup to break down the data by age (and also, of course, to follow the recommendation to use MRP to get better numbers, which will become more of an issue once the data start getting subdivided).

  4. “i.e. you see, just as you might expect, the extreme values of “proportion of people who said they were gay” are disproportionately taken by small states.”

    Yes, but you’d also expect small states to be disproportionately affected by gay migrations in or out. If X number of gays decide to retire to a particular state, for example, it’s going to affect, say, Hawaii more than Florida.

    For example, I noticed 30 years ago that a lot of entertainment stars who were rumored to be gay, such as Jim Nabors and Richard Chamberlain, moved to Hawaii. Sure enough, all these years later, Hawaii comes in first out of the 50 states. Coincidence? Maybe. Maybe not.

    • Yes, this issue actually comes up with multilevel modeling in general: to what extent do we expect the theta_j’s (the underlying state-level parameters, in this case) to vary more for small states? This is a separate issue from the sampling variation. In general, I’d expect the theta_j’s to vary more for small states, but not by as much as the sampling variation will vary. John Boscardin and I considered this a bit in our 1996 article, also in my 1998 paper with Gary King. For MRP we typically do not actually include this systematic variation in the variance, as I don’t think it ultimately makes that much of a difference in the results, but it could be included if it were of interest.

      One way of saying this is: California has (approximately) 47 times the population of Rhode Island. If California were 47 independent “Rhode Islands,” you’d expect its standard deviation in just about anything to be about 1/7 that of the smaller state (1/√47 ≈ 0.15). Of course, California and other large states are not so close to the national mean. This can be explained by two factors: first, the 47 (hypothetical) pieces of California are not independent. In some sense, California is only 2 or 3 or 4 “states,” not 47. Second, California and Rhode Island are both affected by the same national trends. Katz, Tuerlinckx, and I present some mathematical models for this in our (underappreciated) 2002 paper.

      • “In some sense, California is only 2 or 3 or 4 “states,” not 47.”

        When it comes to sexual orientation among adult residents, geographic variation can be extreme. I was recently driving around the Greater Palm Springs area in Riverside County, which consists of about a dozen municipalities a couple of hours east of Los Angeles. The municipality of Palm Springs, the original core of the area, is now largely a retirement community for Southern California gays. Other municipalities are dominated by retired straight couples interested in golf or tennis, or by young families looking for a cheaper cost of housing than in more coastal areas.

        • Just to be clear, I’m not saying that California has only 2 or 3 or 4 types of places. I’m comparing California’s variation to that of a small state such as Rhode Island, which of course will also have its own internal geographic diversity.

  5. My general feeling on precision is that the precision should be slightly more than justified by the uncertainty.

    3.5 ± 0.2 makes sense, but 3.5 ± 1.1 also makes sense to me. Rounding up to say 4 ± 1 means that the roundoff error is an additional 50% of the quoted uncertainty. The problem comes when they don’t bother to present the uncertainty. Then 3.5 sounds like you mean 3.4 to 3.6.

  6. “Total national sample size is 206,000.”

    What was the response rate?

    Why does a reputable, professional survey organization hide it?
