“Why we sleep” data manipulation: A smoking gun?

In his post “Matthew Walker’s ‘Why We Sleep’ Is Riddled with Scientific and Factual Errors” (see our discussions here, here, and here), Alexey Guzey added the following stunner:

We’ve left “super-important researcher too busy to respond to picky comments” territory and left “well-intentioned but sloppy researcher can’t keep track of citations” territory and entered “research misconduct” territory.

From the Public Health Service (PHS) Policies on Research Misconduct – 42 CFR Part 93 – June 2005, here’s what the National Institutes of Health says:

Sec. 93.103 Research misconduct.

Research misconduct means fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results.

(a) Fabrication is making up data or results and recording or reporting them.

(b) Falsification is manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

(c) Plagiarism is the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit.

(d) Research misconduct does not include honest error or differences of opinion.

This came up a couple years ago in discussions of Game-of-Thrones-hating business-school professor Brian Wansink.

The above-shown bit from Why We Sleep seems to fall right into category (b) of research misconduct: “changing or omitting data or results such that the research is not accurately represented in the research record.”

OK, those are the National Institutes of Health rules. Now check this out from the web page of the University of California, where Matthew Walker is a tenured professor:

“Why we sleep” with Matthew Walker . . .

Matthew Walker earned his PhD in Neurophysiology from the Medical Research Council in London, UK, and subsequently became a Professor of Psychiatry at Harvard Medical School. He is currently Professor of Neuroscience and Psychology at the University of California, Berkeley, and Director of the Sleep and Neuroimaging Laboratory. He has received numerous funding awards from the National Science Foundation and the National Institutes of Health [emphasis added], and is a Kavli Fellow of the National Academy of Sciences.

If I were living in this country, I’d be pretty angry to hear that U.S. National Institutes of Health dollars are being used to fund work that satisfies the National Institutes of Health definition of research misconduct. . . . Hey, I do live in this country! Color me angry that my tax dollars are being spent in this way. But I don’t live in California, so maybe I have no standing to complain about his support from the state university.

Still, at a salary of $214,029.00, you’d think the guy could afford to hire a research assistant who was competent enough to be able to redraw graphs from the literature without omitting key data. Or maybe an overly enthusiastic research assistant cut off the left bar on the above graph on purpose, in order to make the claims in Why We Sleep seem more compelling? I guess we’ll never know.

This seems like a good time to revisit that Dan Davies line:

Good ideas do not need lots of lies told about them in order to gain public acceptance.

40 thoughts on ““Why we sleep” data manipulation: A smoking gun?”

  1. I’m curious as to why 5 hours of sleep has a lower risk of injury. If it looks like I’m only going to get 6 hours of sleep tonight, should I put on a TV show and stay up another hour?

      • So what’s the more egregious ethical violation here?

        We have the omission of a column of data. This is certainly inappropriate. However, if the data were robust and we performed a regression on the data with and without the five-hour column, the impact on the results would be modest. In other words, the trend would hold. Only the slope would be affected. The general conclusion would be true, and probably not much weaker.
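To make the “trend holds, only the slope changes” point concrete, here is a minimal sketch fitting a least-squares line with and without the five-hour point. The injury-rate numbers below are purely illustrative, not the study’s actual data:

```python
import numpy as np

# Hypothetical injury rates (%) by nightly sleep duration (hours).
# These values are made up for illustration; they are NOT from the study.
hours = np.array([5, 6, 7, 8, 9])
injury = np.array([60, 73, 62, 34, 18])

# Least-squares slope with all five points included...
slope_all, intercept_all = np.polyfit(hours, injury, 1)

# ...and with the 5-hour point omitted, as in the book's graphic.
slope_trim, intercept_trim = np.polyfit(hours[1:], injury[1:], 1)

# Both slopes are negative (injury risk falls with more sleep);
# dropping the 5-hour point changes the steepness, not the direction.
print(slope_all, slope_trim)
```

Under these made-up numbers, both fits show the same downward trend; the omission makes the decline look steeper, which is exactly the commenter’s point about the conclusion surviving while the slope shifts.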

        On the other hand, we have a chart of 112 self-reported responses from a school (American? Grade school? High school?) being extended to represent all of humanity. We have self-reported results. We have a very very small sample. We have a horrendously skewed sample population. A robust study could turn the entire thing upside down, not only negating but potentially reversing the conclusion.

        From my POV, attacking the missing column is almost a joke. By far the most serious ethical violation is the misrepresentation of the data.

        • Not a joke. This crosses any ethical lines. You may be right about what the most important practical violation/implication about this research is. If you look at the original study, even their conclusion was suspect – they combined 8+ hours of sleep and compared it with <8 hours to get their p=.04 conclusion. Surely this represents a bunch of practices exposed on this blog as poor practice. But leaving out a column of data – regardless of whether it affects the conclusion or not – is an ethical breach that is not excusable. It also illustrates a degree of hubris that should not be accepted. I think if we would consider this "almost a joke" then we are well down the slippery slope with no hope of climbing back up.

        • I think the “joke” aspect here is that in truth the bigger ethical violation, which is widespread, is portraying something like a tiny little study of self-reported results from a small number of grade school children as evidence of universality.

          Focusing on leaving out one bar in a graph compared to writing a whole book based on flimsy evidence is like saying “also Charles Manson didn’t pay his parking tickets”. It may be true, and it may be true that one should pay parking tickets rather than dodging them, but it’s a sort of fiddling while Rome burns…

        • Put another way, EVEN if the entire book turns out to be totally correct, as it stands, it doesn’t seem like we should call this science. Science isn’t just about being right, in fact it’s not really about being right at all… it’s about having *good reasons to believe something* even if it later turns out with more evidence you were wrong.

          I think the good reasons are what’s missing here. These are kind of flimsy, cherry-picked reasons, it seems.

        • Curious: I guess I was assuming that Daniel’s disclaimer (“please note I’m not claiming that research misconduct is like murder”) would carry over to my post. I’m not claiming that research misconduct is like racketeering either (and I guess Al Capone murdered a few people too).

          My analogy was (only) meant to suggest that while directly addressing misconduct remains a rare event (because institutions will do whatever it takes to avoid the embarrassment of having [had] a fraudster on the staff, cf. the recent YouTube video by “Coffee Break” about the Florida State criminology case), if offenders are “brought to book” at all, it will have to be in some other way.

        • I can’t help but see that line of reasoning (that this isn’t as much of an ethical breach since it made little difference and involved a poorly interpreted study to begin with) in the same light as the defense that withholding Ukrainian aid wasn’t so bad since it was eventually given to them. I’m not advocating for or against impeachment or whether the withholding of aid was or was not justified. What I see as the same issue is the particular defense of an act on the basis that it did not matter much. When reasons cease to matter based on the results, then I think we have lost any reason to care about statistical or scientific (or evidentiary) processes at all.

          If we compare the two ethical violations – leaving out the bar in the graph vs. over-interpreting the applicability of one small observational study – I find the former a more serious one, even if the second may be more consequential. As a number of people on this blog have expressed (from time to time), decisions need to be made even when the evidence is less than what is desired. It doesn’t excuse over-hyping conclusions, but they are more understandable to me. But when an awkward part of the data is simply omitted, that becomes intentional deception. I think that is a more serious ethical breach, even if its consequences may be smaller.

        • Dale:

          I agree with you. Hiding the bar in that graph is clear research misconduct. It might be that Walker gets out of this by blaming a research assistant or by attacking his accuser or, more likely, by just not responding to the criticism at all. Recall that the American Political Science Association never retracted Frank Fischer’s award. Walker’s a bigshot, and bigshots stick together. It would be awesome, though, to see Bill Gates, Joe Rogan, NPR, Ted, etc., say publicly that they’re bothered by the data misrepresentation in Why We Sleep. We’ll see what happens.

        • Note, I don’t condone the leaving out of the bar… it’s just that I think we shouldn’t give people a pass when they write books claiming a high level of certainty on relatively limited information. That’s even more intentional deception. It’s just less obviously provably intentional without the hints, like the left-out bar… Basically the leaving out of the bar shows the intent for the more serious offense… and it’s the more serious offense that needs to be focused on, the bar is just evidence of the more serious offense.

          It’s a little over the top, but here’s an analogous situation: a man shoots his wife in the middle of the night as she comes home from a night out with her friends, and later claims he was afraid of a burglar breaking in… We later find out that the gun was intentionally purchased on the black market a few weeks earlier a few days after the wife asked for a divorce and her mother died leaving her millions of dollars… Do we book this man for misdemeanor improper transfer of a firearm, or murder?

          By itself, the “accidental” shooting is a plausible if somewhat flimsy explanation… but put together with the clear intent to acquire the firearm through illicit means and the motive for personal gain… it shows intent to murder. Focusing on the firearms transfer offense would be insane.

          please note I’m not claiming that research misconduct is like murder, I just needed an example where there’s a big difference in importance between two possible “crimes”.

        • Dale and Andrew: I agree with me and Daniel! :)

          The only reason egregious misrepresentation of data seems less of an ethical violation is that it has become so common that it seems excusable. It’s not.

        • Daniel: The Feds finally got Al Capone for tax evasion, but I think that most people would agree that the important point was that they got him. Unfortunately in science, the equivalent of racketeering appears to be a “grey area”, so maybe we will have to settle for concrete evidence of lesser crimes. It’s a shame that we currently can’t “prosecute” racketeering in science, so we have to wait for people to put a procedural foot wrong. Maybe we need the nice new concept that I saw mentioned on Twitter just now, “NATO for scientists”.

        • “As a number of people on this blog have expressed (from time to time), decisions need to be made even when the evidence is less than what is desired. ”

          That’s correct. But the evidence offered still must relate to the question.

          I think to say this is OK because it’s a “gray area” is a huge mistake. It isn’t a gray area. The data aren’t the least bit representative. To allow this as a justifiable research practice is just an invite to others to do the same thing.

        • Nick’s approach appears to be:

          1. Assume an Al Capone level of guilt.
          2. Over-prosecute for lesser crimes.

          Which results in the benefit that those found guilty in the court of public opinion are now held to account for previous as well as current crimes. In America this approach has resulted in prisons filled with people of color for minor offenses. I would hope a more sensible approach could be found for science.

        • Curious, thanks for that. I agree with you. I actually think that the fact that Al Capone was convicted of tax fraud is a *gross error* in justice. In fact, our tax system is horribly broken, and ought to be revamped enough that a 7-line Excel spreadsheet could calculate anyone’s taxes (basically add up all different sources of income, multiply by some fixed percentage… be done… but this only works in combination with a universal basic income).

          Dale Lehman has a good point though: leaving out a bar in a graph is prima facie evidence of wrongdoing… So we can easily decide something wrong has been done.

          Nevertheless, I think it’s more important to go after misrepresentation of knowledge by experts. We can’t easily decide, but we need to spend the time to collect information and make a decision. We can’t just let people keep sucking down government grants and spitting out nonsense polluting global knowledge.

        • Andrew & Daniel:

          While I take both your points on what should happen under circumstances of intentional wrongdoing, I am not sure the evidence presented rises to that level in the absence of additional information. For the sake of discussion, let’s say that instead of having been dropped, the first bar was binned with the second. Would that be morally equivalent? Let’s further assume the now-binned bar was labelled to reflect this binning. Would that be? Would a mislabelling?

          As we all make errors and mistakes, being human and all, it is important to distinguish between those and acts done with intent. If intent can be accurately inferred, then by all means I agree with your suggested remedies. If it cannot, then we are left with our assumptions and whether we land on the side of a presumption of innocence or a presumption of guilt.

        • Curious:

          I like the NIH definition:

          Falsification is manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

          The second graph above seems like a clear case of falsification. It’s hard for me to see how this could not be done with intent, but, hey, all things are possible.

        • Andrew:

          By this definition one could conclude that any researcher who does not include all data, all methods, and all results in a report would be guilty of such a breach. It seems a sizable percentage of published papers might fall into this category. Is this definition overly expansive or is corruption that expansive?

        • jim
          While I agree with your sentiments, I will take issue with the clarity you appear to feel with regard to the ethics of interpreting this particular study. I am not an expert on sleep physiology – just as I am not an expert on psychology, nutrition, or many of the other topics regularly discussed on this blog. As billo’s comment below suggests, it may be that – despite this study’s small sample size and apparently limited applicability – this study is in line with other research that has been done. This is why I see the ethical damage from overstating the study’s applicability as somewhat less serious than the clear misrepresentation embodied in the figure.

          I don’t want to suggest that claims overstating the universality of this one small and selective study’s findings are not serious. But I don’t think it is good to automatically dismiss them. Too often, we look at the statistical errors and make declarations about the content of the studies without knowing much about that content. I’ve seen this happen with a number of the psychology studies that have been discussed on this blog. I find many of those studies poorly conducted and badly interpreted. I tend to think that many researchers in that area are behaving unethically. But I am reticent to state that as clearly as I might for an area that I have more expertise in. So, I’m being more equivocal in my criticism. You appear to be ready to convict a researcher overstating claims from one particular research study as the most egregious sort of ethical violation. I find this less clear than an overt misrepresentation of the data – which is inexcusable, regardless of whether it has material implications or not.

        • This result has some support in replication. See, for instance, Gao B, Dwivedi S, Milewski MD, “Chronic Lack of Sleep Is Associated with Increased Sports Injury in Adolescents: A Systematic Review and Meta-analysis,” Orthopaedic Journal of Sports Medicine 2019 7(3 suppl 1), https://doi.org/10.1177/2325967119S00132. Note that the last author of the meta-analysis is the author of the paper being discussed. It’s open access, so you can download the PDF from the Sage site.

        • billo, I don’t think anyone is contending that the hypothesis might not be true. It’s certainly reasonable. The issue is the distortion of the supporting data.

        • The two are related — seeing the lower injury rate for people with 5 hrs of sleep should instantly make you very suspicious about the quality of the data.

  2. Andrew: Excellent analysis. I’m glad you highlighted this particular pearl from Guzey’s critique. It stands out among the others as an unambiguous diagnostic bell-ringer for category B. Depending on how far someone is willing to stretch the principle of charity, there may be wiggle room for finessing, explaining away, or even forgiving some of Walker’s other indiscretions. On this one though, I can’t squint hard enough to see gray.

  3. All this sounds a lot like how some climate scientists handled criticism of tree-ring temperature proxies.

    Particularly distinctive is the way the anonymous website makes it difficult for a reader to see Guzey’s work, so most readers will not be able to check if Walker is accurately characterizing Guzey’s criticism.

    • Terry:

      Unfortunately, I don’t think it’s at all distinctive that the website makes it difficult for a reader to see Guzey’s work. This is standard operating practice for people who duck criticism. When for some reason they decide they can’t just follow the David Brooks strategy and avoid addressing the criticism entirely, they’ll often just vaguely allude to the criticism, or maybe provide an irrelevant link, without getting into specifics. Recall Susan Fiske with her detail-free discussions of methodological terrorists, or Cass “Stasi” Sunstein.

  4. >Still, at a salary of $214,029.00, you’d think the guy could afford to hire a research assistant who was competent enough to be able to redraw graphs from the literature without omitting key data. Or maybe an overly enthusiastic research assistant cut off the left bar on the above graph on purpose, in order to make the claims in Why We Sleep seem more compelling? I guess we’ll never know.

    I’m not sure how much weight the talk about research assistants has because it seems that Walker might have worked on the book alone. Here’s the acknowledgements section of his book:

    >Acknowledgments

    >The staggering devotion of my fellow sleep scientists in the field, and that of the students in my own laboratory, made this book possible. Without their heroic research efforts, it would have been a very thin, uninformative text. Yet scientists and young researchers are only half of the facilitating equation when it comes to discovery. The invaluable and willing participation of research subjects and patients allows fundamental scientific breakthroughs to be uncovered. I offer my deepest gratitude to all of these individuals. Thank you.

    >Three other entities were instrumental in bringing this book to life. First, my inimitable publisher, Scribner, who believed in this book and its lofty mission to change society. Second, my deftly skilled, inspiring, and deeply committed editors, Shannon Welch and Kathryn Belden. Third, my spectacular agent, sage writing mentor, and ever-present literary guiding light, Tina Bennett. My only hope is that this book represents a worthy match for all you have given to me, and it.

    Publisher is named. Editors are named. Agent is named. No research assistants are named and he doesn’t write anywhere that any research assistants helped with writing the book.

    As a control, I looked at another book that came to mind as one that probably used a lot of research assistants: “The Rise and Fall of American Growth” by Robert J. Gordon. Its acknowledgments:

    >While the process of writing the chapters that appear here did not begin until the summer of 2011, as long ago as 2008 I began hiring one or more research assistants each year to find, compile, and highlight the book and article sources. My deep gratitude extends to all of those who worked as RAs on the book, including Ryan Ayres, Andrea Dobkin, Burke Evans, Tyler Felous, Robert Krenn, Marius Malkevicius, William Russell, Andrew Sabene, Neil Sarkar, Spencer Schmider, Conner Steines, John Wang, Scott Williams, Edwin Wu, and Lucas Zalduendo.

  5. Another core point is that the book both starts and ends with claims that we sleep less than we used to (the sleep duration epidemic). But there are almost no studies that show that happening anywhere. The only ones that do use simple self-reported sleep duration (“How much do you usually sleep?”), which has failed at every attempt at validation, and those datasets get republished over and over. People can’t really estimate reliably or validly how much they sleep. So we’re left with time-use surveys, which look somewhat like sleep diaries.

    According to the latest analysis of the time-use data, even in America or the UK sleep is increasing (https://doi.org/10.1093/sleep/zsy012 & https://doi.org/10.1111/jsr.12753), and most countries around the world are not getting worse anyway (https://doi.org/10.1007/s40675-015-0024-x).

  6. Walker does respond to the criticism of the left out bar in the graph. He indicates that the sample for the bar was really too small. I think this point invalidates the claim of misconduct, although I would have preferred to see the bar with the disclaimer.

    From ( https://sleepdiplomat.wordpress.com/ )
    As shown in the original study, as the athletes’ sleep amount progressively decreased from 9 to 6 hours, injury risk increased. This was depicted in a graphic in the book. The study also reported a group of individuals getting 5 hours of sleep a night. This group’s injury risk was also high, though lower than the adjacent 6-hour group. However, noted by the authors in communication, the 5 hour group was of a size small enough to challenge robust estimate of injury risk, and was not represented in the book’s graphic. Related, the authors averaged the data for statistical power, comparing those getting less than 8-hours with those getting more than this amount.
