What is the relevance of “bad science” to our understanding of “good science”?

We spend some time talking about junk science, or possible junk science, most recently that book about sleep, but we have lots of other examples such as himmicanes, air rage, ages ending in 9, pizzagate, Weggy, the disgraced primatologist, regression discontinuity disasters, beauty and sex ratio, the critical positivity ratio, slaves and serfs, gremlins, and lots more examples that I don’t happen to recall at this moment.

Why do I keep writing about this? Why do we care?

Here’s a quick reminder of several reasons that we care:

1. Some junk science is consequential. For example, Weggy was advising a congressional committee when he was making stuff up about research, and it seems that the Gremlins dude is, as the saying goes, active in the environmental economics movement.

2. The crowd-out, or Gresham, effect. Junk science appears in journals; careful science doesn’t. Junk science appears in PNAS and gets promoted by science celebrities and science journalists. The prominent path to success offered by junk science motivates young scientists to pursue work in that direction. Etc. There must be lots of people doing junk science who think they’re doing the good stuff, who follow all ethical principles and avoid so-called questionable research practices, but are still doing nothing in their empirical work but finding patterns in noise. Remember, honesty and transparency are not enough.

3. There’s no sharp dividing line between junk science and careful science, or between junk scientists and careful scientists. Some researchers, such as Kanazawa and Wansink, are purists and only seem to do junk science (which in their case is open-ended theory plus noisy experiments with inconclusive results); other people have mixed careers; and others of us try our best to do careful science but can still fall prey to errors of statistics and data collection. Recall the 50 shades of gray.

In short, we care about junk science because of its own malign consequences, because people are doing junk science without even realizing it—people who think that if they increase N and don’t “p-hack,” they’re doing things right—and because even those of us who are aware of the perils of junk science can still mess up.

A Ted talkin’ sleep researcher misrepresenting the literature or just plain making things up; a controversial sociologist drawing sexist conclusions from surveys of N=3000 where N=300,000 would be needed; a disgraced primatologist who wouldn’t share his data; a celebrity researcher in eating behavior who published purportedly empirical papers corresponding to no possible empirical data; an Excel error that may have influenced national economic policy; an iffy study that claimed to find that North Korea was more democratic than North Carolina; a claim, unsupported by data, that subliminal smiley faces could massively shift attitudes on immigration; various noise-shuffling statistical methods that just won’t go away—all of these, and more, represent different extremes of junk science.

None of us do all these things, and many of us try to do none of these things—but I think that most of us do some of these things much of the time. We’re sloppy with the literature, making claims that support our stories without checking carefully; we draw premature conclusions from so-called statistically significant patterns in noisy data; we keep sloppy workflows and can’t reconstruct our analyses; we process data without being clear on what’s being measured; we draw conclusions from elaborate models that we don’t fully understand.
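To make the “patterns in noise” point concrete, here is a minimal Python simulation sketch (not from the original post) of what happens when a small true effect is studied with noisy data and only statistically significant results are reported. The effect size and standard errors are hypothetical numbers chosen for illustration, not estimates from any of the studies mentioned above, and the function name `significance_filter_sim` is made up for this example. It reports power, the type S (wrong-sign) rate, and the exaggeration (type M) ratio among the significant estimates. Multiplying the sample size by 100 shrinks the standard error by a factor of sqrt(100) = 10, which is the second case below.

```python
import random
import statistics


def significance_filter_sim(true_effect, se, n_sims=100_000, z_crit=1.96, seed=1):
    """Simulate unbiased estimates ~ Normal(true_effect, se) and keep only the
    ones that would be reported as 'statistically significant' (|est/se| > z_crit).

    Returns (power, type_s, exaggeration):
      power        share of simulated studies reaching significance
      type_s       share of significant estimates with the wrong sign
      exaggeration mean |estimate| among significant ones, divided by |true_effect|
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        est = rng.gauss(true_effect, se)  # one honest, unbiased study
        if abs(est / se) > z_crit:        # the significance filter
            significant.append(est)
    power = len(significant) / n_sims
    if not significant:
        return power, float("nan"), float("nan")
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    exaggeration = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, exaggeration


# Hypothetical numbers: a true effect of 0.3 (say, percentage points) measured
# with a standard error of 3.0, then the same effect with 100x the sample size.
for label, se in [("small N", 3.0), ("100x N", 0.3)]:
    power, type_s, exaggeration = significance_filter_sim(true_effect=0.3, se=se)
    print(f"{label}: power={power:.3f}, type S={type_s:.2f}, exaggeration={exaggeration:.1f}x")
```

Under these assumptions, the significant estimates from the small study overstate the true effect many times over and have the wrong sign a substantial fraction of the time, even though every simulated study is unbiased and nothing is p-hacked. The larger study does much better, although its significant estimates still overstate the effect.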

The lesson to take away from extreme cases of scientific and scholarly mispractice is not, “Hey, these dudes are horrible. Me and my friends aren’t like that!”, but rather, “Hey, these are extreme versions of things that me and my friends might do. So let’s look more carefully at our own practices!”

P.S. Above cat picture courtesy of Zad Chow.

42 thoughts on “What is the relevance of “bad science” to our understanding of “good science”?”

  1. Totally agree with this. We’ve talked about the perverse incentives, and they need to be addressed if we are to improve things: we need to stop rewarding people for publications (regardless of quality), use of fancy methodologies, TED talks, headlines, etc. We also need a greater sense of personal responsibility, regardless of these external incentives, though I’m not sure how we promote that (can it be done through education?). But one additional aspect of the problem occurs to me. For those of us who teach, teaching promotes many of these bad habits. Students rarely reward instructors for modesty in their claims, humility, and hesitancy to draw conclusions. I think these pressures are much worse the less selective the school, the larger the class sizes, and the greater the teaching load. That means that the majority of academics spend most of their time in front of classes where they are rewarded for oversimplifying and overstating. That’s been my experience. To the extent it is an accurate portrayal, it is difficult to see how this doesn’t spill over into research practice (although its effects on teaching are even more damaging).

  2. I recall a year or so ago Andrew requested some examples of “good” statistical analyses from the blog readers. I also recall very few, if any, responses to that request (I may have misremembered).

    For whatever reason, maybe blog readers prefer rubbernecking at zany statistical practice to reading about solid, if usually less dramatic, “conventional science”. Maybe that’s the same or a similar reason why the news media loves those stories.

  3. +1 to “None of us do all these things, and many of us try to do none of these things—but I think that most of us do some of these things much of the time. We’re sloppy with the literature, making claims that support our stories without checking carefully; we draw premature conclusions from so-called statistically significant patterns in noisy data; we keep sloppy workflows and can’t reconstruct our analyses; we process data without being clear on what’s being measured; we draw conclusions from elaborate models that we don’t fully understand.” We could all stand to be a lot more humble and honest with ourselves about how little we understand. It’s too bad it’s so difficult for people to figure out how.

    • From my favorite philosopher

      [“Adult conversations” in science subscribe to] “the idea that science is a communal enterprise, and one that should involve people with different backgrounds, inclinations, and talents, so that the greatest variety of angles is explored. … [CS] Peirce defines the private self not in terms of anything exquisite or divine, but in terms of error and ignorance. What makes our private selves unique is that we differ from others in that we are wrong about different things and that we are ignorant about different things. Hence, for Peirce, scientific inquiry—which seeks to alleviate error and ignorance—is in essence a process of self-effacement.” [Peirce: A Guide for the Perplexed 2013, Bold not in the original].

      But finding out how you are wrong is painful, like early experience in an exercise program.

      • I love the scientific/perspectival argument for diversity – we want it because seeing things from different perspectives is how we arrive at deeper/broader understanding of the world. That is inherently good for science. I’m actually disappointed that this benefit has been lost to some degree in the diversity arguments recently, which focus so much on righting historical wrongs. Obviously that part is important, but to me there is an immediate positive scientific value in diversity, because it means bringing in new perspectives that are likely to reveal new things about the world.

        • I think the problem is alienation of those of us who value one aspect of diversity (diversity of perspective) but not the other (correcting a historical wrong).

        • About 7 or 8 years ago, I used to go to trivia night just about every week with a few friends. We were all science nerds who also had broad interests, and we would typically do pretty well but not great. The place we played was a pizza joint near the UC Berkeley campus, so there would be us plus a bunch of students. Some of the teams would have, say, a grad student in history and one in physics and one in literature, plus a few other people, and such teams were better than us. We would typically finish in 4th, 5th, or 6th place out of about 15 teams; a few lucky times (or when other teams were shorthanded) we would finish in the top 3 and win a minor prize.

          Trivia night got more and more popular (and more and more competitive) over the course of several months, and we started having to get there earlier and earlier to get a table. One day I showed up at something like 5:15 to try to get and hold a table until the game started at 7. There was a guy sitting at a four-top, watching baseball. I asked if I could have his table when he left. He asked me to join him, which I did, and we got to talking about trivia etc. It turned out he was a retired street sweeper driver who had never graduated from high school. We had a pleasant conversation; he stayed as the other members of my team arrived; and he ended up joining our team. On any given set of questions he didn’t know as many answers as any one of us did…but when he did know an answer there was a decent chance it was an answer none of the rest of us knew. We didn’t need one more person who knew which elements are liquid at room temperature, or about how far it is from the earth to the sun, or even who Nixon’s chief of staff was or what year Shakespeare died; we already had several people who knew those things, none of which this guy (Adrian) knew. But he did know what year Bridge Over Troubled Water was the #1 song, and what Operation Ranch Hand was, and many other things besides. From that day forward we became the team to beat.

        • Jrc:

          I agree. Phil’s story is perfect; with a little effort he could design a psychology study using this theme, work in some evolutionary theory, publish it in PNAS, go on NPR, and get a Ted talk out of it. Inspirational and backed by science!

          Also, yeah, maybe have the story take place in a Thai restaurant. The Thai places in Berkeley are amazing. OK, probably not as good as the ones in L.A. because everything’s better in L.A., but a zillion times better than what you’ll find in NYC.

    • Ulrich:

      This reminds me of a principle that, for an applied statistician, what makes you look good is not doing a clever data analysis, but rather studying a large effect. The researchers who study ESP could be the world’s most brilliant statisticians, but there’s nothing there for them to discover, so at best they’ll look like people who’ve wasted their time.

      • I’ve been wondering about this ESP phenomenon you mention from time to time.

        How could one demonstrate it exists?
        What if there are only a handful of people in the world that actually possess it?
        Why assume it is some uniform faculty evenly distributed?

        I agree that a random sample, no matter how big, will reveal nothing, but that doesn’t mean ESP doesn’t exist. If aliens landed on earth and sampled several thousands or millions of droplets of sea water, they could conclude there is no large life on earth.

        You and I know there are such things as whales, sharks, nudists, etc.
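Navigato’s droplet analogy can be put in numbers. A minimal Python sketch, assuming simple random sampling and a hypothetical prevalence of one in a million (no one in the thread proposes a figure): the chance that a sample of n people contains at least one person with a trait of prevalence p is 1 - (1 - p)^n, and the sample size needed for a target chance follows by solving that for n. The function names are made up for illustration.

```python
import math


def prob_at_least_one(prevalence, n):
    """Chance that a simple random sample of n people includes at least one
    member of a subgroup with the given population prevalence."""
    return 1 - (1 - prevalence) ** n


def sample_size_needed(prevalence, target=0.95):
    """Smallest n for which prob_at_least_one(prevalence, n) >= target.
    Solves (1 - prevalence)**n <= 1 - target for n."""
    return math.ceil(math.log(1 - target) / math.log1p(-prevalence))


# Hypothetical prevalence: one person in a million has the trait.
p = 1e-6
print(prob_at_least_one(p, 1_000))   # about 0.001
print(sample_size_needed(p))         # roughly 3 million
```

Under these assumptions a random sample of 1,000 people has only about a 0.1% chance of including even one person with the trait, and about three million would be needed for a 95% chance, which is the sense in which a random sample can miss a rare subgroup entirely.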

        • Navigato:

          Sure, all things are possible. I’m skeptical of ESP in part because people have been carefully looking for it for about 100 years and haven’t found any good evidence yet. So it sounds to me like wishful thinking, kind of like believing in unicorns or ghosts or whatever. But if they ever find something—or, for that matter, if they ever run into a live unicorn—then, yeah, that would be cool.

        • I’ve been wondering about this ESP phenomenon…

          Don’t bother. Note that the kind of ESP mentioned here from time to time is (Daryl Bem’s) retrocausal ESP, which is an even worse case of “anything goes” thinking, of idle pseudoscientific speculation, than telekinesis.

  4. I’m mostly disappointed nowadays. It’s been 15 years since people began to point out that MDA‐MB‐435, a breast cancer cell line used in breast cancer research, is almost certainly a mislabeled melanoma strain from a man; and yet researchers continued to model breast cancer progression based on misidentified cells. https://www.tandfonline.com/doi/abs/10.4161/cbt.6.9.4624 By 2018 the mix-up was undeniable https://www.tandfonline.com/doi/abs/10.4161/cbt.6.9.4624 and yet here we are in 2021 and people are still pumping out breast cancer models derived from cells that are obviously not breast cancer cells https://pubmed.ncbi.nlm.nih.gov/33475350/

    It’s as if the grant system is akin to the Sorcerer’s Apprentice: once it has summoned a bucket of paper on a subject, more and more mindless scientist-brooms keep dumping more and more absurd papers into the river of discourse; nobody knows how to stop it, nor does anyone seem to care.

  5. That was one of the standard arguments for the skeptic movement: by learning what is wrong with ideas that are fairly clearly recent nonsense, like alien encounters or bigfoot, and which it’s not too unpopular to speak against, we inoculate ourselves against the harder-to-see errors we are more likely to make, and learn to speak out against them when that is riskier. The big showy errors of reasoning that purveyors of nonsense make are easier to analyze than everyday errors of reasoning, just as a spherical-cow model is easier than the kind of math and physics an engineer or surveyor needs.

  6. Here’s one more reason why it’s important to point out junk science: Not everything has to be science. There are many instances of junk science that, if reframed as personal observation, philosophical reflection, or speculation, might have been legitimate, interesting, and laudable within that domain. There has been great pressure to frame everything, even wisdom, in terms of what “research has shown.” You’re not allowed to say something because you find it true. Or you’re allowed, but no one will listen. However, if you cite *studies* in support of your argument, you have immediate credibility, even if the studies are flawed.

    Some years ago I pitched a piece to one of those popular online commentary publications–something along the lines of Slate or Salon. The topic was solitude, and the piece was supposed to accompany my book. The editor said that my piece needed to cite studies. It wasn’t that kind of piece, yet the editor didn’t see room for any other kind. She said that’s what the readers wanted. Studies.

    But more often than not, when I take a piece of commentary that cites studies, and go look into the studies themselves, there’s an incongruity between what the commentator is saying and what the studies suggest–not to mention a lack of conclusiveness in the studies themselves. “Studies” become shorthand for logical sloppiness, which is the opposite of what should happen.

    In other words, if people swoon or bow at the very mention of “studies,” then they are surrendering their own judgment to a buzzword. Anything, be it an argument from personal experience or a study at a top research university, must still be evaluated on its merits. “Studies” should not constitute a free pass, nor does everything have to cite studies.

    • +100

      What Ezra Pound said of bad writing is probably just as true of bad studies. “The chief cause of false writing is economic. Many writers need or want money. These writers could be cured by an application of banknotes.”

      I’m not saying we should just pay sloppy scientists not to do or publish their studies. (I think Pound was being ironic.) But I think most of us agree that the incentives to put bad science out there are real, and largely economic. If you could get and hold an academic post through thoughtful observation, reflection, speculation, and critique, presenting these as such, without conducting a “study”, then much of the problem would be solved.

    • > “Studies” become shorthand for logical sloppiness, which is the opposite of what should happen.

      Do you have a study to support that assertion?

      OK, that was a joke, but still… If people improperly reference studies, or readers fallaciously treat a reference as a shortcut, that’s where the problem lies, not in an expectation that assertions be well-referenced.

      I only say that because, while I don’t dismiss the importance of the issues you raise, I reflexively want to ask what are the alternatives being offered.

      Considering the ubiquity of broad dismissal of “expertise” and broad acceptance of unreferenced opinion as if it were the equivalent of well-referenced opinion, I think it’s important to ask that question.

      • I think it is a matter of the domain of science, vs. the domain of “personal observation, philosophical reflection” etc.

        Obviously one can’t refute science with the latter, but that doesn’t mean it doesn’t have its own validity.

        • Confused: I think you’re overextending things: “personal observation, philosophical reflection, etc.” do have a legitimate place in science. In particular, personal observation is an important part of refuting a “scientific” claim that is overly general: a claim of the sort “A implies B” may be considered scientific, but if there is an example where A is true but B is false, then the claim is overly general, and there is more scientific work to be done to tease out when A implies B and when it does not.

      • Joshua: Suppose Emerson’s editor asked him to cite studies in support of the following assertion: “There is a time in every man’s education when he arrives at the conviction that envy is ignorance; that imitation is suicide; that he must take himself for better, for worse, as his portion; that though the wide universe is full of good, no kernel of nourishing corn can come to him but through his toil bestowed on that plot of ground which is given to him to till.”

        The whole thing would be ruined: “A recent longitudinal study shows that personal attitudes toward education undergo a pro-autonomy, pro-originality shift at some point between the ages of 16 and 28.”

        confused put it well–it’s a matter of domain–though Martha (Smith) makes a good point that the domains of science and “personal observation, philosophical reflection,” etc. can correct and influence each other.

        As Thomas Basbøll says, there are economic incentives not only to put bad science out there, but to adorn articles and other writings with “science,” regardless of its applicability or quality. There is a market not only for flashy science, but for writing and talks with a scientific veneer.

        Citing a study does not improve an argument unless (a) the argument is *about* something in the study’s domain, be it sociology, psychology, or whatever the field may be; (b) the study is relevant to the argument; (c) both the argument and the referenced study are of high quality; and (d) the author understands and acknowledges both its applications and its caveats. *Not* citing a study does not imply carelessness, amateurism, or disregard for evidence; reflection and speculation also require discipline and accuracy. While Emerson’s statement above may not hold true for everyone, it evokes something through its language. The question is not, “What percentage of the population experiences this shift?” but rather, “What is at the heart of the shift that Emerson describes here?” It leads to introspection rather than to surveys.

        There was a time when I learned how to revise my own writing, without being told by others what it needed (though others’ comments became even more important than before in certain ways). Similarly, there was a time when I learned how to improve my cello playing on my own, to some extent, without being told by a teacher what I was doing right or wrong. But that’s only at the surface of the phenomenon Emerson describes. There’s something else that happens to a person, maybe far along in adulthood: a realization that you won’t be alive forever, that mimicking others is a waste of time, that for better or worse, what you have to offer is yours alone, and the best thing you can do is go ahead and offer it as well as you can. But this paraphrase or interpretation is nowhere close to Emerson’s resonant wording with its rhythms and syntactic subtleties. His is no simple dashing off of opinion. It carries flashes of the very genius he describes. It enacts its own points, which it could not do if weighted down with unnecessary citations. There are some who find Emerson overblown. But you can sink into one of his sentences and stay with it for a long time, which you could not do if it dutifully and unnecessarily referenced studies.

        • Thanks for this, Diana. What you’re pointing to is happening quite explicitly in my field of writing instruction, where pedagogies are increasingly required to be “research-based”. On this view, teachers of literature cannot encourage their students to find their own voice in writing by quoting Emerson’s “envy is ignorance” unless they’ve got a study to back up the effect of this sort of “intervention”. (I won’t go into the quality of what often passes for a “study” in this field.) In fact, a professor of culture and literature, with decades of teaching experience, is supposed to keep his opinion of his students’ writing to himself because he lacks the relevant “expertise”, i.e., training in “writing studies”.

        • Thomas –

          > On this view, teachers of literature cannot encourage their students to find their own voice in writing by quoting Emerson’s “envy is ignorance” unless they’ve got a study to back up the effect of this sort of “intervention”.

          I won’t dispute the existence of such a phenomenon, nor its (negative) importance. But I think that phenomenon should be viewed along with the other side of the equation, where, in a similarly extreme framing, students are taught that merely expressing their voice is sufficient, and, even more troublesome in my experience, students fail to understand that a “thesis” needs to be an arguable point.

          > In fact, a professor of culture and literature, with decades of teaching experience, is supposed to keep his opinion of his students’ writing to himself because he lacks the relevant “expertise”, i.e., training in “writing studies”.

          Again, this feels polemical to me. In my experience, teachers often do not know how to convey their impressions of their students’ writing in a meaningful way.

          My favorite example is when a student told me that a teacher told her to make her writing more concise, and when she then asked how to do that, the teacher replied, “Say the same thing but with fewer words.”

          Obviously, there are different aspects of giving feedback to students about their writing, and there are different domains of writing that involve different purposes and different sorts of audiences. But I have a reflexive response, a concern about an “old man yelling at clouds” about “kids today” in a lament about how good things were back when I had to walk ten miles to and from school in the snow, uphill both ways.

          Sure, the questions you raise are important, but I don’t think that they should be raised with a cynical voice about the importance of “expertise” in giving feedback to students on their writing.

        • I think we mainly agree. Some people are good at teaching writing and some people aren’t. I don’t think the people who do “studies” are significantly better at it. Of course, they’d quickly produce a study to prove that they are!

        • Diana –

          > Suppose Emerson’s editor asked him to cite studies in support of the following assertion: “There is a time in every man’s education when he arrives at the conviction that envy is ignorance; that imitation is suicide; that he must take himself for better, for worse, as his portion; that though the wide universe is full of good, no kernel of nourishing corn can come to him but through his toil bestowed on that plot of ground which is given to him to till.”

          > The whole thing would be ruined: “A recent longitudinal study shows that personal attitudes toward education undergo a pro-autonomy, pro-originality shift at some point between the ages of 16 and 28.”

          lol.

          I guess in a sense that is so. But I’m not sure we need to create a dichotomy between the two approaches. When I read broad claims about sweeping societal generalizations, even in a non-expository context, my mind often immediately goes to wondering about the implicit biases of the generaliz-er. For example, in this case I might wonder about the generalization about “men” and whether the exclusion of women is of any relevance. So where should the line be drawn as to when emotive and highly unqualified descriptions of societal phenomena should be appreciated merely for their emotive or perhaps allegorical effect?

          > confused put it well–it’s a matter of domain–though Martha (Smith) makes a good point that the domains of science and “personal observation, philosophical reflection,” etc. can correct and influence each other.

          I don’t disagree with either of those comments.

          > As Thomas Basbøll says, there are economic incentives not only to put bad science out there, but to adorn articles and other writings with “science,” regardless of its applicability or quality. There is a market not only for flashy science, but for writing and talks with a scientific veneer.

          Yah. I get a little queasy when I see such broad characterizations. Not that I think there isn’t an “incentive” problem within the marketing of science, but I see so much cynical application of that reasoning (I’ve spent a lot of time in the climate-o-sphere, where the theory that anthropogenic climate change is a hoax perpetrated by scientists seeking to line their pockets is common) that I worry about where such rhetoric becomes counterproductive. Or about someone with a useful hammer (incentives are important) looking for nails.

          > Citing a study does not improve an argument unless (a) the argument is *about* something in the study’s domain, be it sociology, psychology, or whatever the field may be; (b) the study is relevant to the argument; (c) both the argument and the referenced study are of high quality; and (d) the author understands and acknowledges both its applications and its caveats.

          Again, in broad strokes I don’t disagree, but for me that list is too categorical. The links between arguments and domains can be ambiguous; relevance to arguments can be relative; even low-quality studies can have some value, if only to show that an issue is more complicated than the researchers framed it; and understanding of applications and caveats is a moving target that develops through engagement.

          > *Not* citing a study does not imply carelessness, amateurism, or disregard for evidence; reflection and speculation also require discipline and accuracy.

          Yah. I don’t disagree with that.

          > While Emerson’s statement above may not hold true for everyone, it evokes something through its language. The question is not, “What percentage of the population experiences this shift?” but rather, “What is at the heart of the shift that Emerson describes here?” It leads to introspection rather than to surveys.

          I don’t see this as an either/or scenario. I think both sorts of questions are of value.

          > There was a time when I learned how to revise my own writing, without being told by others what it needed (though others’ comments became even more important than before in certain ways). Similarly, there was a time when I learned how to improve my cello playing on my own, to some extent, without being told by a teacher what I was doing right or wrong. But that’s only at the surface of the phenomenon Emerson describes. There’s something else that happens to a person, maybe far along in adulthood: a realization that you won’t be alive forever, that mimicking others is a waste of time, that for better or worse, what you have to offer is yours alone, and the best thing you can do is go ahead and offer it as well as you can. But this paraphrase or interpretation is nowhere close to Emerson’s resonant wording with its rhythms and syntactic subtleties. His is no simple dashing off of opinion. It carries flashes of the very genius he describes. It enacts its own points, which it could not do if weighted down with unnecessary citations. There are some who find Emerson overblown. But you can sink into one of his sentences and stay with it for a long time, which you could not do if it dutifully and unnecessarily referenced studies.

          I think of conversations about writing I’ve sometimes had with non-Americans, who have described an American academic expository, more hierarchical form of prose, targeted specifically at explicitness, as “ugly” or boring or insulting to the reader (as if readers weren’t capable of integrating nuance and needed to be treated like children).

          So sure, it always goes back to who is your audience, and what is your purpose.

          But again I approach this within a frame that is situated in a particular time in a particular society – where, IMO, as a society we are struggling with how to rethink our understanding of the importance of “expertise.”

  7. I’ve been exploring the consequences of Andrew’s criticisms for psycholinguistics over the last few years, and the findings are very sobering. One thing I took away from this exercise was that I can write a paper in which the General Discussion says: the data don’t tell us anything clearly, but here are some possibilities. This has had mixed success. In some cases I managed to publish the paper in a top journal in my field. In other cases (this is the more frequent outcome), the editor or reviewer rejects the paper on the grounds that no decisive outcome or conclusion was reported. Editors and reviewers force authors down this road, or try to. I know it’s common to complain about reviewers; nobody likes to be criticized. But I have a strong impression that both editors and reviewers lack (statistical) education, and therein lies the problem.

    Some good news: so far, my strategy has not negatively impacted my students’ careers. Eight professors now working around the world came out of my lab, and several are tenured or close to tenure. It’s possible to step away from the hype and still do the best work one can given the circumstances.

    • Actually, in hindsight the expectation of a decisive analysis of clinical research examples in my thesis might have been the major reason for being asked to do revisions.

      I had used the clinical research examples just to demonstrate various approaches rather than to discern the most decisive. In the examination, I could not grasp why the examiners were expecting this in a statistical thesis, as if clinical realities (costs and benefits) played no part.

      Only when I read the written examiners’ report many years later did this jump out at me (my advisor and I thought they wanted to see some hard math in the thesis, and I focused on providing that but did not feign the discernment of decisive analyses in the examples).

      So the problem may lie not just with those who lack statistical education but, maybe more, with those who lack an understanding of the scientific method and the economy of research…

    • Shravan said,
      “In other cases (this is the more frequent outcome), the editor or reviewer rejects the paper on the grounds that no decisive outcome or conclusion was reported. Editors and reviewers force authors down this road, or try to. I know it’s common to complain about reviewers; nobody likes to be criticized. But I have a strong impression that both editors and reviewers lack (statistical) education, and therein lies the problem.”

      I’m not so sure that the problem is that they lack statistical education; I think the problem is intolerance of uncertainty. Ideally, tolerance of uncertainty should be part of statistical education, but it often is not.

    • I think Nosek genuinely believes that the IAT really does measure implicit attitudes, including racist attitudes, so he is acting in “good faith”, you might say. I am not sure people are aware that such conclusions are based on the speed with which people respond to words/pictures on computer screens, with no validity against existing (pre-IAT) measures (rather conveniently, IAT test results are not predicted to correlate with reliable questionnaire measures of attitudes, so you will get shouted down if you say it lacks convergent validity based on the absence of a correlation with other measures).
