N=43, “a statistically significant 226% improvement,” . . . what could possibly go wrong??

Posted on April 16, 2024 9:50 AM by Andrew

They looked at at least 12 cognitive outcomes, one of which had p = 0.02, but other differences “were just shy of statistical significance.” Also:

The degree of change in the brain measure was not significantly correlated with the degree of change in the behavioral measure (p > 0.05) but this may be due to the reduced power in this analysis which necessarily only included the smaller subset of individuals who completed neuropsychological assessments during in-person visits.

This is one of the researcher degrees of freedom we see all the time: an analysis with p > 0.05 can be labeled as “marginally statistically significant” or even published straight-up as a main result (“P < 0.10”), it can get some sort of honorable mention (“this may be due to the reduced power”), or it can be declared to be a null effect.
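To get a rough sense of what 12 outcomes buys you, here is a minimal sketch. It assumes the 12 tests are independent and that every true effect is zero; both are my simplifying assumptions, not anything reported in the paper (the real outcomes are surely correlated, which changes the numbers but not the basic point). The function name is made up for illustration.

```python
def prob_at_least_one(n_tests, threshold):
    """Chance that at least one of n_tests independent null p-values
    falls below the threshold (assumes independence and no true effects)."""
    return 1.0 - (1.0 - threshold) ** n_tests

# Twelve looks at pure noise: how often does something clear each bar?
print(f"{prob_at_least_one(12, 0.05):.2f}")  # 0.46
print(f"{prob_at_least_one(12, 0.02):.2f}")  # 0.22
```

Under these assumptions, seeing one p = 0.02 among a dozen comparisons would not be a rare event even if nothing at all were going on.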

The “this may be due to the reduced power” thing is confused, for two reasons. First, of course it’s due to the reduced power! Set n to 1,000,000,000 and all your comparisons will be statistically significant! Second, the whole point of having these measures of sampling and measurement error is to reveal the uncertainty in an estimate’s magnitude and sign. It’s flat-out wrong to take a point estimate and just suppose that it would persist under a larger sample size.
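Here is a small simulation of the second point, offered as a sketch rather than a model of this particular study. The true effect (0.1 standard deviations), the group size (20), and the number of simulated studies are all hypothetical values I picked for illustration, and the significance check uses a normal approximation.

```python
import math
import random
import statistics

def simulate(true_effect=0.1, n_per_group=20, n_sims=5000, seed=1):
    """Simulate small two-group studies; every default here is hypothetical."""
    rng = random.Random(seed)
    significant = []  # point estimates from studies with |z| > 1.96
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        estimate = statistics.mean(treated) - statistics.mean(control)
        se = math.sqrt(statistics.variance(treated) / n_per_group
                       + statistics.variance(control) / n_per_group)
        if abs(estimate / se) > 1.96:  # normal approximation to the test
            significant.append(estimate)
    # Share of simulated studies that reached significance.
    power = len(significant) / n_sims
    # How much the significant estimates overstate the true effect, on average.
    exaggeration = statistics.mean([abs(e) for e in significant]) / true_effect
    # Share of significant estimates that point in the wrong direction.
    wrong_sign = sum(e < 0 for e in significant) / len(significant)
    return power, exaggeration, wrong_sign

power, exaggeration, wrong_sign = simulate()
print(f"share of studies reaching significance: {power:.2f}")
print(f"average exaggeration among significant estimates: {exaggeration:.1f}x")
print(f"share of significant estimates with the wrong sign: {wrong_sign:.2f}")
```

In a setup like this, the estimates that happen to clear the threshold are necessarily far from the small true effect, and some of them have the wrong sign, which is why a noisy point estimate is a poor guide to what a larger sample would show.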

People are trained in bad statistical methods, so they use bad statistical methods; it happens every day. In this one, I’m just bothered that this “226% improvement” thing didn’t set off any alarms. To the extent that these experimental results might be useful, the authors should be publishing the raw data rather than trying to fish out statistically significant comparisons. They also include a couple of impressive-looking graphs which wouldn’t look so impressive if they were to graph all the averages in the data rather than just those that randomly exceeded a significance threshold.

Did they publish the raw data? No! Here’s the Data availability statement:

The datasets presented in this article are not readily available because due to reasonable privacy and security concerns, the underlying data are not easily redistributable to researchers other than those engaged in the current project’s Institutional Review Board-approved research. The corresponding author may be contacted for an IRB-approved collaboration. Requests to access the datasets should be directed to …

It seems like it would be pretty trivial to remove names and any other identifying information and then release the raw data. This is a study on “whether older adults retain or improve their cognitive ability over a six-month period after daily olfactory enrichment at night.” What’s someone gonna do, track down participants based on their “daily exposure to essential oil scents”?

One problem here is that Institutional Review Boards are set up with a default no-approval stance. I think it should be the opposite: no IRB approval unless you commit ahead of time to posting your raw data. (Not that my collaborators and I usually post our raw data either. Posting raw data can be difficult. That’s one reason I think it should be required, because otherwise it’s not likely to be done.)

10 thoughts on “N=43, “a statistically significant 226% improvement,” . . . what could possibly go wrong??”

Dale Lehman on April 16, 2024 at 10:34 am said:

But what if my MRI scans fell into the wrong hands and they figured out who I was? At least I might gain some understanding about how my brain works.

It looks like n=23 for the cognitive assessments, due to COVID limiting the ability to conduct these. Also, while they randomized assessment, it clearly couldn’t be a blinded study since I assume people could tell if they were smelling something or not – they don’t mention how that might influence the results.

- Birdpipe on April 16, 2024 at 11:13 am said:
  
  Raw data can be very helpful for those trying to break down an experiment or possibly try and replicate it. With the IRB, I feel it makes sense to have everyone abide by the same set of rules no matter how trivial it may be for the results. This helps protect anyone who wants to participate.
  
  - Dale Lehman on April 16, 2024 at 12:01 pm said:
    
    I’m not sure I understand your second point. Are you saying that protecting the privacy of participants should be the same, regardless of how serious the consequences might be if their identity were discovered? If so, I don’t agree. Nobody is saying that the identity of these people should be revealed – anonymization is necessary, but almost always imperfect. How perfect anonymization needs to be should be related to the seriousness of the consequences, in my opinion. As your first point indicates, there is value in having access to the raw data. IRBs have a balancing act: “no matter how trivial it may be for the results” seems to deny such balancing and demand the most restrictive access limitations for all cases.
    
- anon on April 16, 2024 at 5:42 pm said:
  
  This actually is a concern with MRI studies for privacy reasons. The anatomical scan contains facial features and not just the brain. If you want to publish the data, it’s recommended to “deface” the anatomical scan, and there are tools to make it fast and easy to do.
  
Raphael K on April 16, 2024 at 3:00 pm said:

May I also refer the readers to the Pubpeer record of the study: https://pubpeer.com/publications/25269E619F766BEDBF5ACD4F4ADF82#1
It states that a new product is being developed based on this study (remember that Procter & Gamble funded this research).
I am sorry to see that the authors have put a lot of effort into this project and out comes… a 226% improvement. I think both people with *no* training in statistics and people with *plenty* of training in statistics would be sceptical about this result. I wonder if the presence of statistics limits the ability of inexperienced statisticians to critically evaluate their results. Perhaps their mental capacities are so tied up getting the computer to produce results that they have no imagination left that their result might be wrong.

Josh on April 16, 2024 at 3:17 pm said:

226% improvement? But did they check for an interaction with participants’ zodiac signs?

John N-G on April 16, 2024 at 4:15 pm said:

The “226% improvement” comes from the control group scoring 0.73 points worse on a particular test administration post-treatment than pre-treatment, while the treatment group scored 0.92 points better. The math is: (0.92 – -0.73)/-0.73 = -226%. Then they changed the sign since it’s an improvement.

Note that if the control group had done only marginally worse (0.01 points) post-treatment such that the treatment was not as beneficial, they would have seen (0.92 – -0.01)/-0.01 = -9300%, or a 9300% improvement! Rotten luck that the control group declined so much.
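The arithmetic above is easy to check and to push further. This is a minimal sketch of the calculation as described in this comment; the function name and the extra control-group values in the loop are hypothetical, chosen only to show how the ratio behaves as the control group's change shrinks toward zero.

```python
def percent_improvement(treatment_change, control_change):
    """Relative difference (t - c) / c as a percentage, with the sign
    flipped as described in the comment above."""
    return -100.0 * (treatment_change - control_change) / control_change

# The reported numbers: control fell 0.73 points, treatment rose 0.92.
print(round(percent_improvement(0.92, -0.73)))  # 226

# Same treatment-group change, progressively smaller control-group decline.
for control_change in (-0.73, -0.10, -0.01, -0.001):
    pct = percent_improvement(0.92, control_change)
    print(f"control change {control_change:+.3f} -> {pct:,.0f}% 'improvement'")
```

The treatment group's change never moves in this loop; only the denominator does. A control change of exactly zero raises `ZeroDivisionError`, which is about the most honest output this statistic can give.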

Also, I love it that the first author of a paper on the health benefit of smells is named Woo.

- Anoneuoid on April 16, 2024 at 6:53 pm said:
  
  What exactly does one point mean on the test? From looking it up, it might be either remembering 1/15 extra words or sometimes researchers sum the repeated tests so 1/(5*15) = 1/75 extra words. But maybe it’s something else; there are many variations on this test and the authors don’t explain it.
  
  This became “People in the enriched group showed a 226% increase in cognitive performance compared to the control group, as measured by a word list test commonly used to evaluate memory.”
  https://scitechdaily.com/a-whiff-of-genius-simple-fragrance-method-boosts-cognitive-capacity-by-226/
  
- Max Shepsi on April 17, 2024 at 5:52 am said:
  
  On that logic, imagine if the control group had scored exactly the same post-treatment as pre-treatment… I guess those in the treatment group would have become some sort of omniscient beings!
  
John Richters on April 16, 2024 at 5:48 pm said:

Andrew:

Methinks the unavailability of study data has nothing to do with the baseless security and privacy issues they cite and everything to do with the unwillingness of Procter & Gamble (their corporate sponsor) to (ahem) gamble with the profit potential of the new product it’s developing based on the claimed study results.


Statistical Modeling, Causal Inference, and Social Science
