A few months ago we reported on an article from the Columbia Journalism Review that made a mistake by comparing numbers from two different sources.
The CJR article said, “Before the 2016 election, most Americans trusted the traditional media and the trend was positive, according to the Edelman Trust Barometer. . . . Today, the US media has the lowest credibility—26 percent—among forty-six nations, according to a 2022 study by the Reuters Institute for the Study of Journalism.” That sentence makes it look like there was a drop of at least 25 percentage points (from “most Americans” to “26 percent”) in trust in the media over a six-year period. Actually, though, as noticed by sociologist David Weakliem, the “most Americans” number from 2016 came from one survey and the “26%” from 2022 came from a different survey asking an entirely different question. When comparing comparable surveys, the drop in trust was about 5 percentage points.
This comes up a lot: when you compare data from different sources and you’re not careful, you can get really wrong answers. Indeed, this can even arise if you compare data from what seems to be the same source—consider these widely differing World Bank estimates of Russia’s GDP per capita.
It happened to the CDC
Another example came up recently, this time from the Centers for Disease Control and Prevention. The story is well told in this news article by Glenn Kessler. It started out with a news release from the CDC stating, “More than 1 in 10 [teenage girls] (14%) had ever been forced to have sex — up 27% since 2019 and the first increase since the CDC began monitoring this measure.” But, Kessler continues:
A CDC spokesman acknowledged that the rate of growth highlighted in the news release — 27 percent — was the result of rounding . . . The CDC’s public presentation reported that in 2019, 11 percent of teenage girls said that sometime in their life, they had been forced into sex. By 2021, the number had grown to 14 percent. . . . the more precise figures were 11.4 percent in 2019 and 13.5 percent in 2021. That represents an 18.4 percent increase — lower than the initial figure, 27 percent.
Rounding can be tricky. It seems reasonable to round 11.4% to 11% and 13.5% to 14%—indeed, that’s how I would report the numbers myself, as in a survey you’d never realistically have the precision to estimate a percentage to an accuracy of less than a percentage point. Even if the sample is huge (which it isn’t in this case), the underlying variability of the personal-recall measurement is such that reporting fractional percentage points would be inappropriate precision.
But, yeah, if you’re gonna compare the two numbers, you should compute the ratio based on the unrounded numbers, then round at the end.
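Here is the arithmetic as a minimal sketch. The four percentages are the ones quoted above; the helper name `relative_increase` is just for illustration.

```python
def relative_increase(before, after):
    """Percent change going from `before` to `after`."""
    return 100.0 * (after - before) / before

# Figures quoted above: the rounded values from the CDC presentation
# and the more precise values reported in Kessler's article.
rounded_2019, rounded_2021 = 11, 14
unrounded_2019, unrounded_2021 = 11.4, 13.5

# Rounding first and then taking the ratio inflates the change.
from_rounded = relative_increase(rounded_2019, rounded_2021)

# Taking the ratio first and rounding only at the end.
from_unrounded = relative_increase(unrounded_2019, unrounded_2021)

print(f"from rounded inputs:   {from_rounded:.1f}%")    # 27.3%
print(f"from unrounded inputs: {from_unrounded:.1f}%")  # 18.4%
```

The rounding moved each input by less than half a percentage point, but because the difference between the two numbers is only about two points, that was enough to shift the ratio from 18% to 27%.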
This then logically brings us to the next step, which is that this “18.4% increase” can’t be taken so seriously either. It’s not that an 18.4% increase is correct and that a 27% increase is wrong: both are consistent with the data, along with lots of other possibilities.
The survey data as reported do show an increase (although there are questions about that too; see below), but the estimates from these surveys are just that—estimates. The proportion in 2019 could be a bit different than 11.4% and the proportion in 2021 could be a bit different than 13.5%. Even just considering sampling error alone, these data might be consistent with an increase of 5% from one year to the next, or 40%. (I didn’t do any formal calculations to get those numbers; this is just a rough sense of the range you might get, and I’m assuming the difference from one year to the other is “statistically significant,” so that the confidence interval for the change between the two surveys would exclude zero.)
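For a rough sense of how such a range could be computed, here is a sketch, not a formal analysis. It uses a normal approximation on the log scale for the ratio of two independent proportions and treats each survey as a simple random sample. The effective sample size `N_EFF` is a made-up placeholder: the real surveys use a clustered design, and the number of girls who were actually asked this question is not given here.

```python
import math

def ratio_interval(p1, n1, p2, n2, z=1.96):
    """Approximate interval for p2 / p1.

    Normal approximation on the log scale (delta method), treating the
    two samples as independent simple random samples. Sampling error
    only: nonresponse and other nonsampling errors are ignored.
    """
    log_ratio = math.log(p2 / p1)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    return math.exp(log_ratio - z * se), math.exp(log_ratio + z * se)

# ASSUMPTION: hypothetical effective sample size per survey year.
# This is not a number from the CDC; replace it with a design-based value.
N_EFF = 3000

low, high = ratio_interval(0.114, N_EFF, 0.135, N_EFF)
print(f"relative increase: {100 * (low - 1):.0f}% to {100 * (high - 1):.0f}%")
```

Changing `N_EFF` shows how strongly the width of the interval depends on that assumption, which is one more reason not to take any single ratio too seriously.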
There’s also nonsampling error, which gets back to the point that these are two different surveys: sure, they were conducted by the same organization, but there will still be differences in nonresponse. Kessler discusses this too, linking to a blog by David Stein, who looked into this issue. Given that the surveys are only two years apart, it does seem likely that any large increases in the rate could be explained by sampling and data-collection issues rather than representing large underlying changes. But I have not looked into all this in detail.
Show the time series, please!
The above sort of difficulty happens all the time when looking at changes in surveys. In general I recommend plotting the time series of estimates rather than just picking two years and making big claims from that. From the CDC page, “YRBSS Overview”:
What is the Youth Risk Behavior Surveillance System (YRBSS)?
The YRBSS was developed in 1990 to monitor health behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. These behaviors, often established during childhood and early adolescence, include
– Behaviors that contribute to unintentional injuries and violence.
– Sexual behaviors related to unintended pregnancy and sexually transmitted infections, including HIV infection.
– Alcohol and other drug use.
– Tobacco use.
– Unhealthy dietary behaviors.
– Inadequate physical activity.
In addition, the YRBSS monitors the prevalence of obesity and asthma and other health-related behaviors plus sexual identity and sex of sexual contacts.
From 1991 through 2019, the YRBSS has collected data from more than 4.9 million high school students in more than 2,100 separate surveys.
So, setting aside everything else discussed above, I’d recommend showing time series plots from 1991 to the present and discussing recent changes in that context, rather than presenting a ratio of two numbers, whether that be 18% or 27% or whatever.
Plotting the time series doesn’t remove any concerns about data quality; it’s just an appropriate general way to look at the data that gets us less tangled in statistical significance and noisy comparisons.
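As a sketch of what that display could look like, the code below attaches a rough interval to each year's estimate and draws a plain-text plot. Only the 2019 and 2021 estimates quoted above are filled in; the earlier years would have to come from the published YRBSS tables. The effective sample sizes are hypothetical placeholders, and the function names are made up for illustration.

```python
import math

def series_with_intervals(estimates, n_eff, z=1.96):
    """Return (year, low, estimate, high) rows, sorted by year.

    estimates: dict of year -> estimated proportion
    n_eff:     dict of year -> effective sample size (an assumption here)
    Intervals reflect simple-random-sampling error only.
    """
    rows = []
    for year in sorted(estimates):
        p = estimates[year]
        se = math.sqrt(p * (1 - p) / n_eff[year])
        rows.append((year, p - z * se, p, p + z * se))
    return rows

def text_plot(rows, axis_min=0.0, axis_max=0.25, width=50):
    """One line per year: '-' spans the interval, 'o' marks the estimate."""
    def col(x):
        frac = (x - axis_min) / (axis_max - axis_min)
        return min(width - 1, max(0, round(frac * (width - 1))))

    lines = []
    for year, low, est, high in rows:
        cells = [" "] * width
        for i in range(col(low), col(high) + 1):
            cells[i] = "-"
        cells[col(est)] = "o"
        lines.append(f"{year} |{''.join(cells)}|")
    return "\n".join(lines)

# The two estimates quoted above; add earlier years from the published tables.
estimates = {2019: 0.114, 2021: 0.135}
# ASSUMPTION: hypothetical effective sample sizes, not CDC figures.
n_eff = {2019: 3000, 2021: 3000}

rows = series_with_intervals(estimates, n_eff)
print(text_plot(rows))
```

The same rows can be handed to any plotting library. The point is that each year appears with its uncertainty, so a two-year jump is read against the whole series rather than on its own.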
I was disturbed by the YRBSS report. It does not mention the terms “margin of error” or “sampling error” but only offers the following statement in an Appendix: “Differences between prevalence estimates were considered statistically significant if the t-test p-value was <0.05.” The data will be released later this spring, but this is the report issued to the general public. What fraction of the public will know what that means? What fraction will appreciate the limitations (some would say it is completely worthless) of that statement? The trends report does show the raw % over time which is good – but without any indication of random (not to mention non-random) sampling variation, I suspect most readers of the report will just focus on the point estimates and whether they go up or down.
I wouldn’t worry about “the public.” The fraction of the authors writing such sentences that understand it is very near zero. Once someone understands what it means, they wouldn’t bother with it unless forced to by confused reviewers or whoever.
But (ignoring systematic error), I think most would agree that 11.4% is already bad, we would hope it decreases over time. So if 13.5% is observed two years later, what is the chance there was a meaningful decrease?
Well, it does depend on sample size, among many other things, doesn’t it? I agree even the lower number is horrible, but your logic essentially reduces us to looking at point estimates in highly imperfect surveys. Is that really where you want to go? I was hoping we would move more towards embracing and conveying uncertainty: this seems like a step in the opposite direction.
I think this means sample size is ~ 17k, at least for 2021:
https://web.archive.org/web/20230221050407/https://www.cdc.gov/healthyyouth/data/yrbs/pdf/YRBS_Data-Summary-Trends_Report2023_508.pdf
I’m not saying to look at point estimates, not sure where you got that. I’m looking to place meaningful bounds on the value.
I mean, even w/ systematic error, we can be pretty sure the number didn’t drop from 11.4% to 1%. But what about 5%, or 8%?
Really the statistical (sampling) error is probably swamped by systematic (as is typically the case). I’m not willing to spend the time to look deeper into the methods though.
“Meaningful bounds” is precisely the issue. I don’t disagree with your conclusion regarding this single comparison, but the report is full of statements about what increased or decreased or did not change – all based on the p<.05 criterion. Many of those comparisons are less clear than the one you are discussing. So, the report equates "meaningful bounds" with a 95% confidence interval for the difference, and even goes beyond that to declare no change if 0 is within that interval. I'm not comfortable with that approach, even if I agree with your statement about that single comparison.
The rape measure is a lifetime measure and has been largely ignored by CDC and the news media for two decades while it fluctuated between the shocking 10 and 12 percent prevalence for girls.
The problem is not the sudden alarm over this disturbing evidence; the problem is that a dubious two-year spike in results is being conflated with long-term mental health declines among girls, and that this spike is being blamed on boys even though the pandemic likely decreased contacts of girls with boys their age (i.e., we need to look at adult men if there really was a spike).
Also ignored is the fact that YRBS results indicate a large majority of youth rape victims had already been raped before high school.
Because usage of the English language by published authors and others often comes up in this blog “of discerning ears,” I was struck by this:
“The proportion in 2019 could be a bit different than 11.4% and the proportion in 2021 could be a bit different than 13.5%.”
From https://www.thesaurus.com/e/grammar/different-from-vs-different-than/
——————————————————————
Both different from and different than are accepted in standard American English, and both have been in use for the last 300 years. But is one of these phrases more accepted than the other?
Which is correct: “different from” or “different than”?
In formal writing, different from is generally preferred over different than. This preference has to do, in part, with the historical use of the word than. This term entered English as a conjunction often used with comparative adjectives, such as better, taller, shorter, warmer, lesser, and more, to introduce the second element in a comparison. Different is not a comparative adjective. Thus, when different than first started appearing in English, it sounded grating or less natural to discerning ears.
From has been used with the verb differ since at least the 1500s, which paved the way for different from to be readily accepted into the lexicon. William Shakespeare used different from in The Comedy of Errors: “This week he hath been heavy, sour, sad, / And much different from the man he was…” Other pairings have popped up over the years, including different against, but different from and different than remain the two most useful among English speakers.
Different than is common in American English, but might sound strange to British ears, and in the UK, different to is a common alternative that is seldom used in the US.
————————————————————————————–
I have always used “different from” and find that “different than” sounds annoying and wrong. For example I can’t imagine the Supremes singing
Love child, always second best
Love child, (different than) different than the rest
But then again I’m pale male and stale so I’m expected (if not entitled) to be curmudgeonly.
Just to clarify what may not be entirely clear in Andrew’s post: Kessler was NOT under the impression that 18.4% is an accurate measure of relative rise in the population; he knew it is merely the relative difference between two numbers (that happen to be YRBS results).
Kessler of course objected to BOTH the inane miscalculation AND the inexcusable misrepresentation by CDC of sample results as entirely accurate measures of population.
“It’s not that an 18.4% increase is correct and that a 27% increase is wrong.”
One is right and the other is wrong as a relative difference between two NUMBERS (survey results). How one interprets these numbers is a separate issue and this is stated very clearly in Kessler’s article (he even cites a CDC doc that warns against the use of relative differences).
David:
Yes, I liked Kessler’s article. It’s super-clear.
To anyone interested: please note that the Washington Post article was able to address only some of the problems with the CDC misrepresentation of 2021 YRBS results.
CDC DASH manipulated the information it fed to the press so that it fit the agenda of its leaders, and did it quite boldly. It made it look as if a wave of adolescent violence was the cause of much of the mental health declines in girls, which is nonsense for a number of reasons and yet is likely to influence policy makers and politicians for years to come after the news media amplified the message nationwide.
CDC has been refusing for weeks to answer a simple question: was the 2019 removal of sexual violence questions from over 15% of questionnaires due to a CDC decision or due to state or local demands? (I presumed the latter but I’m no longer so sure.)
It never responded to my polite request that researchers be allowed by CDC to differentiate between when a student did not reply to a question and when the student was not asked the question (currently impossible to tell apart in the YRBS data provided by CDC).
And so on.
For a better understanding of the issues, please see:
CDC Misinformation on Girls and Violence: Why it Matters
CDC and YRBS: Time for Transparency
at
https://theshoresofacademia.blogspot.com/
Wouldn’t we also need to consider different survey answer behavior when looking at survey data? I mean, as pointed out in the blog mentioned above (https://theshoresofacademia.blogspot.com/), we have different cohorts answering this question. How a person answers a survey depends very much on social attitudes and how they feel at the moment they fill in the survey. So if we identify a long-term trend that says there is an increase in sexual violence against young girls, wouldn’t we also somehow have to account for changing attitudes in how to respond to such a question? Specifically, let us assume #MeToo influenced these children, maybe not directly, but by the way people talk to them about what is OK and what is not OK for others to do to another person, leading to a decrease in the social desirability bias (i.e., the bias that people are prone to answer whatever is socially desirable), as the default ‘don’t ever talk about it’ has become less socially desirable [please note that I speak in a descriptive way about social desirability and not in a normative way]. If we assume that to be true, we would have an increase in the reported cases even if the relative number of crimes remains constant. The way I see it, social attitudes are a confounding factor, and I cannot fully buy into the narrative of major increases of sexual crimes against girls. I do not wish to imply that there is no increase; I do wish to imply though that I am uncertain that any strong conclusion is really warranted from this kind of survey data. Maybe they somehow corrected for what I perceive to be a major source of bias; if so, please correct me if I’m wrong.
You are correct in that stigma and prestige factors, related to any YRBS question, may play a role — one of many reasons for caution.
The underlying issue is far more basic though: all other concerns aside, there is a substantial possibility that the increase could have been, as Andrew notes, largely due to a random sampling error.
CDC officials, however, decided to ignore ANY caution at all and instead paint the picture of a massive wave of general violence engulfing girls, conflating it with the very real wave of mental health problems that started its rise a decade ago.
It was the rhetoric of CDC officials, not the imagination of the press, that generated headlines like this: Teen girls ‘engulfed’ in violence and trauma [Washington Post], CDC Says Teen Girls Are Caught in an Extreme Wave of Sadness and Violence [NBC NY], CDC sees alarming rise in violence, sadness in teen girls [CBS News] and Teen girls and LGBTQ+ youth plagued by violence and trauma [NPR].
All these headlines are close-to-verbatim citations of CDC officials. And the “27% rise in rape results” miscalculation was of great help in selling this rhetoric.
Had CDC officials said something like “We are worried the results indicate there may have been a spike in sexual violence against girls during the pandemic,” it would have been entirely fine. And then yes, concerns such as yours would then need to be discussed.
One last comment:
Obviously the elephant in the room is the political backlash against YRBS due to LGBTQ+ issues: is that the reason behind the recent censorship of YRBS questionnaires?
The problem is that with CDC officials refusing to provide any explanation of the YRBS questionnaire censorship, we are in the dark.
CDC has been refusing for weeks to answer my simple question: was the 2019 removal of sexual violence questions from over 15% of questionnaires due to a CDC decision or due to state or local demands?
I presumed the latter but I’m no longer so sure, as I’m having less and less faith in the competency of CDC DASH leadership. On the other hand, if conservative backlash is a factor, it may get worse and worse (see Florida) and the validity of YRBS will suffer more and more. And I’ve seen no sign that CDC DASH leadership understands this.
If someone ever wrestles an honest explanation from CDC regarding YRBS questionnaire censorship, please let me know in a comment at https://theshoresofacademia.blogspot.com/