What was the worst statistical communication experience you’ve ever had?

In one of the jitts (just-in-time teaching exercises) for our statistical communication class, we asked, “What was the worst statistical communication experience you’ve ever had?” Here are the responses, which I’m sharing with the students’ permission:

Not sure if this counts, but I used to work with a public health researcher who published a journal article impugning a major pharmaceutical company. The data on which she based her argument was incorrect! When this mistake came out, readers were upset, and the article was widely read and emailed because it was being criticized. It was ultimately one of the ten most-emailed articles that appeared in the journal that year, and she bragged about this distinction, not recognizing that it was actually a bad thing.

Me trying to present the findings of a study I did on a company’s website usage and its customers’ behavior. I had no idea how to present the correlations I found, how to clearly lay out my causal hypotheses, or how to translate them into actionable insights.

I used an event in the news to explain Bayes’ theorem and conditional probability to my friends. One of them stopped listening when I started writing mathematical symbols on paper.

Trying to explain to the New York City Council Speaker why a regression line is a “good fit” even though none of the data points actually fall on that line.
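
The point the Speaker was being asked to accept fits in a few lines of code. A least-squares line minimizes the sum of squared vertical distances to the points, so it need not pass through any of them. The minimal Python sketch below uses made-up data and illustrative names. It fits a line and shows that every residual is nonzero while R² is still close to 1.

```python
# Hypothetical data: roughly linear, scattered around y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.3, 4.8, 7.4, 8.7, 11.2, 12.9]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(xs, ys)

# Residual = observed minus fitted; none of them is exactly zero here.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean).
ss_res = sum(r ** 2 for r in residuals)
mean_y = sum(ys) / len(ys)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"line: y = {intercept:.2f} + {slope:.2f}x")
print("residuals:", [round(r, 3) for r in residuals])
print(f"R^2 = {r_squared:.3f}")
```

The line is "good" because it makes the total squared miss as small as possible, not because it hits any individual point.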

With the little experience that I do have, I would have to say that interpreting a Speck phone case advertisement was my worst statistical communication experience. It was a simple Venn diagram with three circles, labeled “people who workout”, “people who don’t”, and “people who would if it weren’t all hard and stuff”. In the middle, where all the circles intersected, was the Speck logo. My first thought was that these are all disjoint. Reading more into it, though, “people who would if it weren’t all hard and stuff” is basically another way of saying people who don’t work out. But “people who workout” is still disjoint from “people who don’t” because they have nothing in common, so there wouldn’t be anyone in the middle where all the circles intersected. To me, the Speck logo in the center then implies that the people in that section have a Speck case, but there is no one there, so no one has a Speck case. That would just be poor advertising. Another person suggested that what everyone has in common is that they all have a Speck case: whether or not you work out, you still have a Speck case. So everyone would be in the middle, and everyone would have a Speck case. To this, another person said bluntly, “You’re wrong.” Nowhere does the ad say that these “people” have a Speck case, so I think the ad has flaws. This isn’t a very serious matter, but this conversation has come up several times with the same group of people, and we are split on what the ad is supposed to mean. If you want to see the advertisement, I can show it to you and get your input on it!

During my internship at a marketing company this summer, I extracted a bunch of data from SQL Server and copied part of the output to present to my supervisor, but I didn’t explain what each column meant or tell him which table I had used. So he was very confused by my result and just told me what he wanted the result to be. However, from my side, the result he wanted was exactly what I had presented to him, so I thought he was giving me a hard time. Then I went to his office and explained what I had done with the data. He understood what I meant, which made me realize the problem was that I hadn’t made my result understandable. It was my worst statistical communication experience.

Trying to explain what a density is in an interview for a tutoring job when I was fresh out of high school. I totally knew what it was, but that didn’t seem to have any impact whatsoever on my ability to explain it.

I sat in many meetings at the UN where data were presented by committee chairs and no one in the room had the math background to explain them clearly. The worst time was when we were looking at military expenditures over time: every country documented theirs differently, and about five languages were being translated into English. I wasn’t able to speak up, since I was just taking notes on the meeting, but I’m greatly looking forward to this class to help me learn how to communicate everything I understand about statistics!

Talking about my research during my internship last summer. I gave an hour-long talk that required substantial background most of the audience did not have. Instead of simplifying my content, I tried to teach some of the background during the talk, but I did not do that particularly effectively.

My worst statistical communication experience happened when I fit a GARCH model to analyze the volatility of S&P prices over the last fifteen years. I had to process the data before I could input it into the model, so I spent a lot of time standardizing the data and bridging different returns to make sure the comparison was accurate. It was a huge project to complete.

I have not had very many, but the most recent was a conversation with a friend who works in data science. She was working with data to bring attention to the lack of New York government funding for poorer school districts. I criticized how she was analyzing the data, and she explained to me that whether or not the data represent the truth, her job is to take pieces of data to bring attention to subjects in need. I unfortunately saw how data can be used as a weapon.

Arguing with my suddenly vegan father about whether the “China Study” proves that vegan diets are the healthiest possible option.

What was your worst statistical communication experience?

24 thoughts on “What was the worst statistical communication experience you’ve ever had?”

  1. I prepared testimony that included a description of the concept of statistical significance, illustrated by a bar chart showing the frequencies of heads with a fair coin flipped ten times. The judge in the case interrupted the live testimony, pointed at the chart, and asked, “What does the bar at zero mean? Is that when the coin lands on its edge?” The answer (and the only one acceptable under the circumstances): “That’s a very good question, Your Honor.”

  2. I was testifying as an expert witness in a case about extrapolations made from Medicaid reimbursement data based on a sample of claims. The State (or rather its consultants) had used an 80% confidence interval, badly constructed. I used a more robust approach, which yielded a confidence interval (I forget which %) that ranged from a negative to a positive number. The judge interjected at that point, in disbelief, that my confidence interval included both the possibility that there were no fraudulent claims and the possibility that the small sample contained all of the fraudulent claims that would be found if every claim had been examined.

    Trying to explain the difference between a statistical confidence interval and what might be believed to be true was not a pretty sight (perhaps if I had had some Bayesian training I could have done this better, but I suspect not).

    • If it were known that there was at least one fraudulent case (and I’m assuming at least one was caught), then a Bayesian prior on the number of fraudulent cases would have support only on positive numbers. The posterior distribution would therefore ALSO be supported ONLY on positive numbers. This is a perfect example of a confidence interval that includes logically IMPOSSIBLE values where the Bayesian interval wouldn’t, which is one of the many reasons for throwing in the towel on CIs and going fully Bayesian.

      • But Daniel, you can go to likelihood-shaped confidence intervals that have appropriate frequency coverage and avoid that problem.

        Fair, you may wish to go to Bayesianly obtained intervals that have appropriate frequency coverage or even posterior probability intervals – but then you have more “splaining to do”.

        The communication problem Dale ran into is mostly a misunderstanding about what weird creatures confidence intervals can be, even though there are some sensible sub-species of confidence intervals.

        • +1
          A lot of (tho’ not all) justifications for Bayes over Freq. are based on the ‘halfway house principle’. We don’t always need further regularisation to re-enter the real world.

  3. I was presenting population forecasts to a legislative finance committee so that they could ask questions about the numbers, since these helped set my agency’s budget for the next fiscal year. It was my first time doing this presentation, as I had only been on the job a few months. I talked about the new way we were creating forecasts and how much more reliable they were than in the past. I paused to take a drink of water and heard snoring. The committee chairman, who was sitting five feet from me, had fallen asleep. I looked around the room, saw a few other heads nodding off and others reading unrelated material, and realized that I had lost the room. I have never again talked about modeling procedures to a legislative committee.

  4. Oh man, so many to choose from. There’s this gem in EPA’s technical manual for their ProUCL software:
    “The presence of outliers in a data set destroys the normality of the data set (Wilks, 1963; Barnett and Lewis, 1994; Singh and Nocerino, 1995). Often the occurrence of outliers in a data set suggests that the data set comes from a mixture of several populations (e.g., several onsite areas, background areas with geological, anthropogenic, and other natural variability,…). It is highly likely that a data set consisting of outliers do not follow a normal distribution; whereas the use of a lognormal model tends to accommodate outliers and a data set with outliers and observations from mixture populations can be modeled by a lognormal distribution. This does not imply that that potentially impacted locations or unusual locations represented by those identified outliers do not exist.”

    Destroyers of normality!

    There was a long thread on the AmStat forum (Statistics and the Environment section; “ProUCL’s use of Gamma ROS”) on this issue, with some interesting exchanges, mostly stemming from this miscommunication, or from the misunderstanding that data must follow some particular distribution.

    Another recent favorite of mine was trying to convince a state agency that a certain sampling design was sufficient to detect the *presence* of a rare and endangered species (that is, the probability of detecting at least one individual, assuming that individuals are present and distributed according to a Poisson point process with a specified density and a known detection efficiency). They balked at the idea that a relatively small area was sufficient to characterize a 7-mile stretch of a large river. There was a lot of back and forth trying to explain how the goal of detecting at least one individual differs from the goal of estimating how many individuals are present.

  5. I remember the terrible Speck phone case ad! I stared at it for a while on the train, trying to figure out wtf. It wasn’t even clear to me what Speck was or what they were selling.

  6. I was modelling wage premiums using identical models on two independent data sets from the same time period (I believe Andrew refers to this as ‘The Secret Weapon’). One data set returned a positive premium for the variable in question; the other returned a negative premium. In both cases the estimates were significantly different from zero using a regular 95% CI. I presented this to my boss, reasoning that the results indicated we couldn’t be sure whether the variable was associated with a positive or a negative premium. She was instead worried that this might give the impression that we didn’t really know what the premium was, as we had historically used only one of the data sets and so believed the premium to be ‘truly’ positive. She asked if there was any way to increase the size of the errors associated with both models so that the confidence intervals would overlap. By this reasoning, we then couldn’t be sure that the negative premium was significantly different from zero, and so we could reject it. I failed to explain properly that even if that were possible, it would only increase the uncertainty about what the premium might be.

  7. My examples are rather underwhelming:

    An Engineering Prof. I worked with regularly dropped points from regressions until the lines looked the way he wanted and the R^2 (uggh) was respectably high. In Excel, too. He credited his own skill & intelligence for dropping the “bad” data points that were “polluting” the regressions.

    My boss in a Chemistry R&D Unit used to select the best “run” of an experimental pilot trial to report to management for scale-up funds. His reasoning was that the technician must have made suboptimal decisions, errors, or careless mistakes in the other runs, so the best run showed what could be achieved given enough diligence, and they could design for it.

    I knew a consulting firm that published an annual ranking of the best countries to do business in. They invested a lot of $$ in collecting good data, but the high point of the exercise was the pre-release week, when the bosses sat in a conference room & tweaked the weights of the various attributes until the rankings looked “good”.

    Sadly, I think none of these guys think they are doing anything wrong.

  8. Naive me: “How do we know our results can’t be explained by a difference in the amount of injury the rats got rather than by the drug? If our theory is that the drug works, shouldn’t we try to falsify that theory? Also, don’t we have to check as many other explanations as possible?”

    Response: “Falsifying! That would be unethical. Just go get me those stars.”

  9. I interned with a large government bureau and a data scientist encouraged me to run a chi-square test on a table with low cell counts. Why? Because “You EXPECT the cell counts to be higher right? Therefore the expected values should be fine!”

    • Ah, the old problem of using a common word to stand for a technical definition. Leads to lots of misunderstanding and miscommunication.

  10. I work in a government agency. Another staffer, who I didn’t know at the time, called me to say he was doing a study, and, without saying anything about the study, asked “What sample size do I need to be 95% confident?” Thinking he could only be joking, I started laughing. He was not pleased at all, and let me know it.

    • “What sample size do I need to be 99% confident?” or words to that effect were very, very common questions when I first started working with engineers on surveys of damage. I think I’ve now explained fairly well that, basically, more random samples usually mean that the range of possibilities we’ll consider after the fact is narrower.

      I still have to explain though that if I randomize a list and then they go through it top to bottom picking the locations they want to look at, or have convenient access to or whatever, that the sample is not a representative random sample. Waving a magic RNG at the list doesn’t ensure you can’t goof it up after subsetting.

  11. How about CBS’ worst statistical communication problem ever (thank you for the serendipitous timing, internet):


    “In September of 1983, a flashy new game show called Press Your Luck hit the daytime broadcast on CBS….The brainchild of two veteran television producers, it was billed as the most ‘technologically advanced’ program of its kind….Of the Big Board’s 54 outcomes (18 squares with 3 rotating options each), 9 were a “Whammy.” That meant that, on any given spin, a player had 1 in 6 odds of losing everything. What’s more, the team that had programmed the board was confident that both the speed and ‘random’ nature of its sequences would prevent contestants from winning more than $25,000….After six months of scrupulous examination, Larson realized that the “random” sequences on Press Your Luck’s Big Board weren’t random at all, but rather five looping patterns that would always jump between the same squares…Finally, 40 successful spins and $102,851 later, Larson passed his final 3 spins to Ed Long, fearing that he was beginning to lose focus.”

    Now, an interesting accounting problem is whether CBS lost more to this little bit of OCD-inspired genius or to Let’s Make a Deal, which was basically accidentally designed to change the 1/3 chance of hitting the big prize to 1/2. In that case, Marilyn solved it publicly, but I don’t know whether it affected contestant behavior, since apparently very few people actually believed her (I guess that would be pretty easy to check, if you bothered to go back and collect the data).
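
The judge's question in comment 1 has a concrete answer. The bar at zero is the probability of getting no heads at all in ten flips of a fair coin. The short sketch below assumes that the chart showed exact binomial probabilities, though the original chart may instead have shown simulated frequencies. The names are illustrative. It computes every bar.

```python
from math import comb

N_FLIPS = 10  # ten flips of a fair coin, as in the testimony

# Binomial probability of exactly k heads for a fair coin: C(n, k) / 2^n.
probs = {k: comb(N_FLIPS, k) / 2 ** N_FLIPS for k in range(N_FLIPS + 1)}

for k, p in probs.items():
    print(f"{k:2d} heads: {p:.4f}")
# The bar at zero is 1/1024 (about 0.001): rare, but not a coin on its edge.
```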
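
The replies to comment 2 can be illustrated with a deliberately simplified stand-in for Dale's setting: estimating the proportion of fraudulent claims from a small simple random sample. This is an assumption, not the extrapolation actually used in the case. The sketch below uses hypothetical counts and illustrative function names. It compares the normal-approximation (Wald) interval, which can dip below zero, with the Wilson score interval, which is swapped in here as one example of a confidence interval that always stays inside the logically possible range [0, 1].

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, level=0.95):
    """Normal-approximation interval p_hat +/- z*SE; can fall outside [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(x, n, level=0.95):
    """Wilson score interval (inverts the score test); always lies within [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = x / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical sample: 1 fraudulent claim found among 30 sampled claims.
x, n = 1, 30
print("Wald:  ", wald_interval(x, n))    # lower bound is negative
print("Wilson:", wilson_interval(x, n))  # lower bound stays above zero
```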
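
The presence-detection goal in comment 4 has a closed form under the stated assumptions. Suppose individuals follow a homogeneous Poisson point process with density λ per unit area, and each individual in the surveyed area is detected independently with efficiency p. Then the detections in an area A are Poisson with mean λpA, and the chance of at least one detection is 1 − exp(−λpA). The sketch below uses made-up values for λ, p, and the target probability, and illustrative names. It is not the design from the comment.

```python
from math import exp, log

def detection_probability(density, efficiency, area):
    """P(at least one detection) when detections ~ Poisson(density * efficiency * area)."""
    return 1 - exp(-density * efficiency * area)

def area_needed(density, efficiency, target):
    """Smallest surveyed area giving P(at least one detection) >= target."""
    return -log(1 - target) / (density * efficiency)

# Hypothetical inputs: 0.02 individuals per square meter, 30% detection efficiency.
DENSITY = 0.02
EFFICIENCY = 0.3
TARGET = 0.95

a = area_needed(DENSITY, EFFICIENCY, TARGET)
print(f"area needed: {a:.1f} square meters")
print(f"check: {detection_probability(DENSITY, EFFICIENCY, a):.3f}")
```

Under these assumptions the required area depends only on density, efficiency, and the target probability, not on the length of river being characterized. That is the difference between detecting presence and estimating abundance.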
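
The "expected" in comment 9 is a technical quantity. It is the expected cell count under independence, E = (row total × column total) / grand total, computed from the observed margins. It is not the count the analyst hopes to see. A common rule of thumb is to distrust the chi-square approximation when expected counts fall below about 5. A minimal sketch with a hypothetical 2×2 table and illustrative function names:

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def small_cells(expected, threshold=5.0):
    """Return (row, col) positions whose expected count is below the threshold."""
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical 2x2 table with low counts.
observed = [[3, 1],
            [2, 4]]
exp_counts = expected_counts(observed)
print("expected:", exp_counts)
print("cells below 5:", small_cells(exp_counts))
```

For a 2×2 table this sparse, Fisher's exact test is the usual alternative to the chi-square approximation.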
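
The question in comment 10 has no answer until a margin of error is specified. For estimating a proportion from a simple random sample, a standard approximation is n = z²·p(1 − p)/E², with p = 0.5 as the conservative worst case. The sketch below assumes simple random sampling and ignores any finite-population correction. The function names are illustrative. It also shows the follow-up reply's point: the margin shrinks roughly as 1/√n.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(margin, level=0.95, p=0.5):
    """Approximate SRS size so a proportion's CI half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, level=0.95, p=0.5):
    """Approximate CI half-width for a proportion estimated from an SRS of size n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sqrt(p * (1 - p) / n)

# "95% confident" of what? The answer depends entirely on the margin.
for m in (0.10, 0.05, 0.03):
    print(f"margin +/-{m:.2f}: n = {sample_size(m)}")

# Quadrupling n halves the margin.
for n in (100, 400, 1600):
    print(f"n = {n}: margin ~ {margin_of_error(n):.3f}")
```

None of this holds if the "random" list is then subset by convenience, as the second reply warns.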
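
The Press Your Luck quote in comment 11 gives enough numbers to see why the designers felt safe. The sketch below assumes that each spin independently lands on any of the 54 outcomes with equal probability, which is exactly the assumption Larson showed to be false. The names are illustrative. It computes the single-spin Whammy odds and the chance of 40 Whammy-free spins.

```python
from fractions import Fraction

OUTCOMES = 54  # 18 squares x 3 rotating options, per the quoted article
WHAMMIES = 9

p_whammy = Fraction(WHAMMIES, OUTCOMES)  # reduces to 1/6
p_safe = 1 - p_whammy                    # 5/6

def prob_no_whammy(spins):
    """Chance of avoiding a Whammy on every one of `spins` independent, uniform spins."""
    return float(p_safe ** spins)

print("P(Whammy on one spin):", p_whammy)
print(f"P(no Whammy in 40 spins): {prob_no_whammy(40):.6f}")
```

Because Larson's spins were not random draws, this figure measures how implausible his run was under the designers' assumption, not the risk he actually faced.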
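
The Let's Make a Deal puzzle in comment 11 can be checked by exact enumeration. The usual assumptions are that the host knows where the prize is, always opens a non-prize door the contestant did not pick, and always offers a switch. Under those assumptions, staying wins 1/3 of the time and switching wins 2/3, which is the answer Marilyn vos Savant published. The sketch below uses illustrative names and enumerates every case with exact fractions.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def win_probabilities():
    """Exact P(win) for 'stay' and 'switch' under the standard Monty Hall rules."""
    stay = switch = Fraction(0)
    for prize in DOORS:
        for pick in DOORS:
            base = Fraction(1, 9)  # prize location and first pick are uniform and independent
            # The host opens a door that is neither the pick nor the prize,
            # choosing uniformly when two such doors exist.
            host_options = [d for d in DOORS if d != pick and d != prize]
            for opened in host_options:
                weight = base / len(host_options)
                switched_to = next(d for d in DOORS if d not in (pick, opened))
                if pick == prize:
                    stay += weight
                if switched_to == prize:
                    switch += weight
    return stay, switch

stay, switch = win_probabilities()
print("stay wins:", stay)      # 1/3
print("switch wins:", switch)  # 2/3
```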
