The all-important distinction between truth and evidence

Yesterday we discussed a sad but all-too-familiar story of a little research project that got published and hyped beyond recognition.

The published paper was called, “The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior,” but actually that title was false: there was no long-term study. As reported in the article, the study lasted three days, and on each day the measurements of outcomes may well have been conducted immediately after the experimental treatment. The exact details were not made clear in the article, but there’s no way this was a long-term study.

Yesterday I discussed how this mistake could’ve made it through peer review. Today I want to talk more generally about incentives.

But first, let’s step back a moment . . .

Before going any further, let’s just consider how ridiculous this all is. A paper was published by legitimate researchers (one of them is the Margaret Hall and Robert Randal Rinehart Chair of Mass Communication at Ohio State University) in a legitimate journal (Journal of Experimental Social Psychology, impact factor 2.2); it received over 100 citations and national press coverage—and the title is flat-out wrong. (Three days is not “long term.”)

And, the most amazing thing of all . . . nobody noticed! An experimental science paper that mischaracterizes its experiment in the title—that’s roughly equivalent to a math paper declaring 2+2=5.

You know that saying, “The scandal isn’t what’s illegal, the scandal is what’s legal”?

Something similar here. The scandal is not that somewhere, in some journal, some authors screwed up and mis-titled their paper and the reviewers didn’t notice. Mistakes happen. I’ve messed up in published work in lots of different ways. No, the scandal is that this huge error was sitting there, in plain view, for five years! And nobody noticed. Or, I should say, if anybody noticed, I never heard about it. I guess that’s part of the problem right there, that it’s not so easy to correct the published record.

Incentives, incentives

My second-favorite bit of the above-linked article:

Another limitation is that our experiment lasted only three days. We wish we could have conducted a longer experimental study, but that was not possible for practical and ethical reasons.

Fine. If you can only do a 3-day study, just change “long-term” to “3-day” or, maybe, “5-minute” in the title of your paper. How hard is it, really, to just say what you really did? I guess, as commenters keep saying, it’s the incentives. Label your paper as a 3-day study and you might not get the citations and influence that you’ll get by calling it “long term.”

So, consider three alternatives in designing and writing up this study:

1. Do a truly long-term study, following a group of people for a few years. Hmmm, that’s lots of work, don’t wanna do that!

2. Do a 3-day study, each day redoing the intervention and testing immediately after. This is inexpensive and likely to get solid results—but it’s not very interesting, will be hard to get published in a good journal and hard to get publicity later on.

3. Do a 3-day study, each day redoing the intervention and testing immediately after. But then put “long-term” in the title and hope for the best! This gives you all the convenience of the easy option, but with the potential for the major citations and media exposure that would be appropriate for an actual long-term study.

The incentives favor option 3.

You don’t have to be a “bad guy” . . .

Apparently these are the rules of the game, at least in some areas of science: You do an experiment which somewhere gives you statistical significance, you get the paper accepted at a journal, and then you misrepresent what you’ve learned, in the title of the paper and in the publicity material. (See the end of this post for an example where, in a single sentence of the publicity materials, one of the authors made 3 different claims, none of which are supported by the data in the research article.)

This sort of behavior is, in my opinion, destructive to science. But it happens all the time, to the extent that I doubt the authors even realized what they were doing. After all, they may well be personally convinced that their research hypotheses are true; thus, in their view, they may be saying true statements.

The idea that it may be true but it’s not supported by the data—the distinction between truth and evidence—that seems to be difficult for a lot of people.

I strongly doubt these researchers are trying to misrepresent their evidence. Indeed, once you become aware of the misrepresentation, and once you become aware of the distinction between truth and evidence, it becomes difficult to grossly misrepresent the evidence, if for no other reason than that it seems so obvious and embarrassing.

So, it’s not that these researchers really think that 3 days, or 5 minutes, is “long-term.” They just feel they’ve made a general discovery, they have no reason not to believe they’ve identified a long-term effect, and they’re reporting the truth, as they see it.

The trouble is that lots of outsiders—journalists and the general public, policymakers, and other scientists—might naively take these unsupported claims at face value, and think that Hasan et al. really did conduct “a long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior.” Which they didn’t.

There are so many incentives for researchers to misrepresent their data, and at the same time we have to deal with ostriches who say things like, “The replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.” So, yeah, there’s a reason that we keep screaming about all this.

I don’t care so much about this particular paper, which I’d never even heard of until someone sent me an anonymous email about it. But I do care about the larger issue, which is what’s happened to the scientific literature, when it’s considered OK, and unremarkable, to misrepresent your study in the title of your paper.

Like a harbor clotted with sunken vessels.

46 thoughts on “The all-important distinction between truth and evidence”

  1. As I remarked, the suffusion of marketing ploys into scientific papers goes uncaught by most people. Expertise has been deemed to be superior in all dimensions, at least by some significant percentage of audiences, I speculate. As you suggest, the authors may not be aware; I am not privy to any information that they are or are not.

  2. I can understand calling 3 days “long-term”. It’s long enough to come down from the adrenaline rush. But redoing the intervention each day seems to invalidate the entire point of doing a “long term” study.

    • All measurements should be in dimensionless form. If they had said “our unit of time is the half-life of adrenaline in the human body, estimated by others at x hours, and hence our measurements extending over 10x hours are long term,” I’d have been defending them like crazy. But no, that’s clearly not what they’re doing.

      • By the way, a quick google of “half life of adrenaline” comes up with an estimate of 2-3 minutes. So by that measure, if they had evaluated people at 20-30 minutes after playing, and again at 1 day and 2 days, etc., and made a principled argument that this was long term on that basis, I’d defend that pretty strongly. But I’d also argue that it’s only long term with respect to the adrenal secretion process. Obviously human learning extends to decades, so any learning or habituation process would require a different study entirely, possibly a 30-year within-subject design.
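
        A quick numerical version of that argument (my own sketch, assuming simple exponential decay and the 2-3 minute half-life mentioned above):

            half_life_min = 2.5   # assumed half-life of adrenaline, midpoint of the 2-3 minute estimate
            t_min = 25.0          # minutes after play, i.e. about 10 half-lives

            # fraction of the initial adrenaline level remaining after t_min minutes
            remaining = 0.5 ** (t_min / half_life_min)
            print(remaining)      # about 0.001, i.e. roughly 0.1% of the initial level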

    • D:

      Sure. But isn’t it kind of amazing that a serious scientific journal would publish a paper where the title so obviously contradicts the contents of the article? And nobody noticed, nobody cared?

      To me, this goes beyond a general statement that published papers are often of low quality, unreliable, do not replicate, have serious errors, etc. Brian Wansink’s articles, for example, are full of errors, but you have to actually read the papers and look at the numbers to find the problem. In this case, though, it’s hard to avoid noticing the glaring error in the title. But it seems that nobody did!

  3. > may well be personally convinced that their research hypotheses are true, thus, in their view, they may be saying true statements.
    Certainly was my experience with some clinical researchers.

    Once it was so clear to me that I asked my director for permission to withdraw from ongoing studies, which had negative financial consequences for my director. They gave me permission to stop interacting with them.

    Just the other day I noticed they had played a role in setting treatment guidelines – which is troubling (at least to me).

    They certainly understood how to be successful and obtain influential positions.

    Part of it is like how smoking became normalized as something one could do anywhere, anytime around anyone (all that payola that went into lighting up in movie scenes). Getting publications, awards and prestige is now something academics can do almost anyhow (short of making up data).

    • A quote from ASA Connect, April 5, 2018:

      “True story:
      My head of tech support at ****** once had a doctor screaming at him on the phone, ‘I don’t care how you do it, MAKE MY DATA SIGNIFICANT!’”

  4. I’m no fan of this line of research, nor am I a fan of Bushman. Also, I’m far from an expert in this area. But I find it hard to get too critical of the title and the “long-term” claims. From the bits I have read about such studies, it appears that “short-term” effects are considered to be on the order of minutes – in other words, right after playing a violent video game, what are the impacts on people’s aggressive tendencies? In that context, 3 days is indeed a “long-term” effect. To a lay reader, the title – and the headline media reports – may well convey something different than what researchers in this area may consider short and long-term. So, the question is really, what is disturbing about this research?

    I find the title the least of my concerns. The errors in the research, the media reporting, the poor incentives to produce headline stories, the problems with peer review, etc. are all far more concerning to me than the title of this article. To put it bluntly: I am not convinced that the title is equivalent to 2+2=5. In the context of other research in this area, the title may be accurate, just not what my ill-informed intuition thinks of as “long-term.”

    • Andrew has the answer: just say “3-day.” Let readers decide whether that is “long-term” or not. This would completely eliminate the need to litigate the title. The authors or the journal made a different choice. The title without doubt led directly to the poor media reporting, since a lay reader encountering the words “long-term” and “video games” will think of a kid playing games for weeks or months. None of this is difficult to see.

      • I think the 3 days thing is even more misleading than the title or what Andrew is suggesting. Essentially, the researchers had people play games for 20 minutes, 40 minutes, and 60 minutes (they just broke up the time across three days; i.e., each person played 20 minutes each of the 3 days). So really, this is equivalent to doing this study between subjects at one time period where subjects play games for either 20, 40, or 60 min. I don’t think anyone would ever call such a design “long-term.”

        • Right. I think the issue is that “long-term” will imply to laypeople that this is a multi-year or at least multi-week follow-up study of accumulated effects. Which is what creates the incentives Andrew was talking about.

      • The problem is that “3 days” is already a fuzzy approximation to the exact amount of time allocated to game play. Just how much precision should be allocated to the language in the title of the article? The issue is that titles of articles are meant to be fuzzy roadmaps of what is in the paper. If you add too much numerical precision you gain accuracy, but the main concept gets lost and one cannot see the proverbial forest for the trees. And titles are all about forests, not trees.

        Here’s an article in the Guardian which makes the point that too much precision in a London tube map makes for a useless diagram. The same argument applies to titles of journal articles: https://www.theguardian.com/science/political-science/2013/feb/12/too-much-precision

        Andrew felt more precision was necessary in the title here, while the authors felt otherwise. Either way, there is always a call for balance and judgement in these situations, and there will be differences of opinion. As such, I hardly see much to get particularly excited about regarding Gelman’s criticisms of this.

        • Bozo:

          I’m not demanding numerical precision, I’m demanding that the title not be misleading.

          If the authors had wanted an accurate but imprecise title, they could’ve called it “A short-term study…”

          It would be as if, I dunno, suppose you write a biography of Marv Throneberry. You don’t have to call it “Marv Throneberry, the story of a lifetime .237 hitter.” But it would be misleading to call it “Marv Throneberry, the story of a world-class slugger.” Unless, of course, you were being ironic.

        • This is a straw man argument, as nobody is defending the concept of intentionally trying to mislead. In this case, I’m not convinced that the authors were dishonest, as they may have felt that 3 days is indeed long-term in the context of video game play. Some here seem to agree, some don’t. However, I see no evidence of dishonesty – though I haven’t delved into this research.

        • Bozobrain:

          There is no straw man argument here. The authors titled their paper, “A long-term study…”, even though the study lasted only three days. This is misleading.

          As I wrote in my post above, I strongly doubt these researchers are trying to misrepresent their evidence. I’m guessing that they just think their hypothesis about long-term effects is true, hence the title could seem reasonable to them. I don’t think the authors, or the editors, or the reviewers, were sensitive to the distinction between truth and evidence. Hence the above post.

    • Dale:

      My view, and I think AG’s point, is that this title is not an academic crime. It’s not like it’s outright lying; if you were to read the paper, it would become readily apparent that “long-term” doesn’t mean what you think it means in this paper. But rather, this title is more like a canary in the coal mine. That one can skim the paper and realize the title is misleading, and yet there are papers citing it as though the title were the true finding, is a bit of a scary thought.

      In short, it implies that research in this field could easily be just propagation of priors. If that’s the case, why not save money by not collecting data?

      • > why not save money by not collecting data?
        I think the field is already saving enough money by not actually reading papers they cite!

        I have often wondered what percentage of those citing my papers actually read them – might make for interesting survey research.

        My prior would be about 1/3, with a Beta(6,12).
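
        A minimal sketch of what that prior implies (my own illustration, in Python with scipy; the interval is just the standard Beta(6,12) summary):

            from scipy import stats

            # Beta(6, 12) prior on the fraction of citers who actually read the paper
            prior = stats.beta(6, 12)

            print(prior.mean())          # 6 / (6 + 12) = 0.333...
            print(prior.interval(0.95))  # central 95% prior interval, roughly (0.14, 0.56)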

        • > I have often wondered what percentage of those citing my papers actually read them

          This is very difficult to know, but I have seen researchers say that they read only the abstract, or possibly only the title, of a paper they then referenced.

          It may be that some researchers read what they consider the “key” papers but do some quick reference padding by sticking in papers whose title or abstract sounds like it is somehow related. They just forget to specify that it was only the abstract they read.

          I have seen papers cited (in non-refereed papers) in ways that clearly show the authors did not read the paper. When you are writing a paper on the use of bicycle helmets for cyclists, citing a paper on the use of elbow and knee pads for skiers is a bit of a stretch.

          Then there is a case a friend of mine stumbled over while doing a meta-analysis. There was one original paper that researchers in that field always referenced (I think it was a ritual). My friend noticed that the very first paper to reference it misspelt the author’s name. Every reference from then on misspelt the name.

        • There seem to be a lot of “ritual” practices (beliefs?) in citing references. Another one is giving a reference that is too vague to give the needed information. For example, I recall a biology paper that said “we used linear regression” and gave a reference to a textbook — not a page number, just the book. And I so often encounter something like, “We used SAS PROC MIXED”, with no details about which factors were fixed or random, let alone why they were classified that way.

          It seems that many people use this type of “ritual behavior” as a substitute for thinking, providing sound reasoning, details, etc. They just don’t seem to get that it’s all about the thinking, reasoning, details, etc.

      • A:

        Yes, exactly. As the saying goes, the scandal is that this flagrantly misleading title is not an academic crime. The scandal is that it’s considered OK to misrepresent in this way, indeed considered so OK that I doubt the journal editor or any of the authors of the paper even saw it as a problem.

  5. I was wondering how this study was referenced by other scholars and came across this in the American Academy of Pediatrics Policy Statement Virtual Violence. They suggest this article provides evidence that “…experimental linkages between virtual violence and real-world aggression have been found. For example, a recent experimental study conducted in the real world motivated parents to change their children’s media diet by substituting prosocial programs in place of violent ones. This study found decreases in aggression and improvement in overall behavior.”

    Huh?
    1) This study did not look at “real-world aggression” – just how much noise a person exposed another person to in a laboratory
    2) No parents were involved, and no one altered their children’s “media diet.”
    3) There were no children! (this was an adult sample with the average age around 22)

    Reference: http://pediatrics.aappublications.org/content/early/2016/07/14/peds.2016-1298#xref-ref-8-1

    (interesting side note – of the 13 referenced articles in this policy statement 7 of them were authored by Bushman)

    • Phoneguy:

      Wow, that’s odd. I clicked through to find the author; it’s “Dimitri Christakis, MD, MPH – Former Council on Communications and Media Executive Committee Member.” I guess maybe someone could contact him and ask how he could’ve gotten this so wrong. Maybe he was thinking of a different study and just messed up the citation?

        • Zad:

          I don’t know. I sent the following email to the author of the report:

          Dear Dr. Christakis:

          In the context of a discussion of research on violent video games, someone pointed us to this article that you wrote: http://pediatrics.aappublications.org/content/early/2016/07/14/peds.2016-1298

          In that article, you wrote:

          “But experimental linkages between virtual violence and real-world aggression have been found. For example, a recent experimental study conducted in the real world motivated parents to change their children’s media diet by substituting prosocial programs in place of violent ones. This study found decreases in aggression and improvement in overall behavior.”

          and for this you cited the following reference:

          Hasan Y, Begue L, Scharkow M, Bushman BJ
          The more you play, the more aggressive you become: a long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior. J Exp Soc Psychol. 2013;49(2):224–227

          But the study being referenced was conducted in a laboratory, did not measure aggression (only how much noise a person exposed another person to in the laboratory), did not involve children’s media (it was an adult sample with an average age around 22), and did not measure overall behavior.

          This is concerning, in that it seems like a problem if the American Academy of Pediatrics is making a recommendation based on a misinterpretation of a study. Or was it a different study that you were referring to? Perhaps the references got misaligned?

          Thanks in advance for your help.

          Yours,
          Andrew Gelman

          I’ll post something when he replies.

  6. We wish we could have conducted a longer experimental study, but that was not possible for practical and ethical reasons.

    “Practical reasons” = “We have the resources to do a single longer-term study, or to churn out lots of short-term bagatelles. Guess which option produces the more impressive CV?”

  7. I think what is ironic is that your correspondent sent you this article thinking that the error bars were off, but in reality, it was the title! Which he/she also didn’t seem to notice. Further supporting your hypothesis that people *do* take this stuff at face value.

  8. C’mon, a description like “long term” is obviously context dependent. I mean, is looking at the effect of the introduction of horses on a society for 100 years long term? Well, yes, if your concern is the effect of mobility on trade and warfare, but certainly not if your concern is the effect on the prevalence of genes for digesting milk as an adult.

    You are assuming the only context this study can be considered in is the policy context of what playing video games does to levels of violence in society. In that sense this certainly isn’t long term. On the other hand, if almost all studies have looked only at acute effects immediately after playing, this is long term in the context of showing some kind of persisting, non-immediate psychological effect. Indeed, it seems plausible to me that the reason no one complained about the title is that the academic audience for which it was intended was used to seeing mostly much shorter studies and so found the title perfectly reasonable.

    Given this issue of context dependence it’s not even theoretically possible to ensure that titles always convey the correct summary to journalists who don’t read the article. The problem is people just reading the title without even considering looking at what the article said.

    • Peter:

      1. You write, “You are assuming the only context this study can be considered in is the policy context of what playing video games does to levels of violence in society.”

      No, I am not assuming that at all, nor did I say I was assuming that.

      2. You write, “if almost all studies have only looked at acute effects immediately after playing this is long term in the context of showing some kind of persisting non-immediate psychological effect.”

      But this study also only looked at acute effects immediately after playing. They did the study for three days, each day looking at acute effects immediately after playing.

      3. You write, “The problem is people just reading the title without even considering looking at what the article said.”

      I disagree. Perhaps (just to make up numbers) 1000 people will read an article in some detail, 10,000 will read the abstract, 100,000 will see only the title, and a million will hear about the article only indirectly in a news report or in some other summary.

      It’s absolutely unavoidable that many people will only see the title. So authors of titles have some responsibility to not be misleading.

      • I’m more concerned that they have used up “long-term” for making a measurement within 5 minutes of something 3 days in a row. What will be the phrase for running an experiment that follows people for a decade, or even a lifetime?

      • I think your comment basically proves the main point of Andrew’s post. You say, “if almost all studies have only looked at acute effects immediately after playing this is long term in the context of showing some kind of persisting non-immediate psychological effect.” Why are you assuming the authors in this study didn’t measure “acute effects immediately after playing”? Because that is exactly what they did. Perhaps you are incorrectly assuming they measured the outcome at some later point because they had the term “long-term” in the title?

  9. After reading all these comments and reflecting on it further, I think the focus on “long term” is still incorrect. There is no objective definition of “long term” and context does matter. If most of the research in the area considers “short term” as within a few minutes of play, then 3 days might reasonably be considered “long term.” I think attempts to deny this point are off target.

    On the other hand, the point about titling articles accurately and the responsibility of authors is, I think, correct. I had hoped to show that the mis-practice embodied in this title occurs in all fields – including economics, my own. But when I looked at most economics titles, they are more descriptively accurate, such as “Evidence on….” rather than statements that encourage media misrepresentation (although I’m sure if I look hard enough I can find economics examples of that). And the authors are well aware of how the media may pick up on a provocative title. So, yes, they bear responsibility for how they title their work.

    Why is it then so acceptable (perhaps even required in order to establish a reputation) to overstate a study’s findings in psychology? [I would speculate that this practice is actually increasing in other fields as well, though I’ve not tried to find any evidence of that] That does speak to the incentives, the way people are trained in the field, the established power structures in academics/research, the general over-saturation of our brains with information and noise, and the general degradation of media reporting – a setting making it increasingly likely that overstatement is required if someone wants to be heard.

    So, in the end, I still don’t think the title itself is like saying 2+2=5. I do think it is a symptom of many troubling problems and I do think the authors have responsibility to accurately state what they have found and not encourage poor research practice. If the people at the top of their profession do not model this, then they are contributing to the problems.

    • >then 3 days might reasonably be considered “long term.”

      Except they repeat the treatment each day, so each day is a measurement taken (presumably) a few minutes after treatment.

      This is 3 repetitions of the usual short study, not a study of the effects of play on behavior at t = 1, 2, 3 days.

  10. And this is the root cause of the replication crisis. Researchers don’t really want to know the TRUTH.

    All they care about is publishing and becoming famous. If they really cared about the truth, they would have written the conclusion very differently, because the evidence provided can in no way tell us the truth.

    • Anoop:

      I disagree. I think all these researchers, even the worst of them, care about the truth. They just think they already know the truth, and they think of discussions of evidence, data quality, statistics, etc., as a sort of “red tape” or distraction from the larger issues. That’s my point above, that they care about truth but they don’t understand about evidence.

      • Hi Andrew,

        From your post: “they care about truth but they don’t understand about evidence.”

        Seriously, the chair of the department, and authors with PhDs and multiple papers on the same topic, don’t understand the difference between a “3-day acute study” and “long-term effects”? Then we have a bigger problem on our hands for sure.

        All you said about how “they think they know the truth and they think of discussions of evidence, data quality, statistics, as a sort of ‘red tape’ or distraction from the larger issues” could be lumped into how badly they want to know the truth. If they REALLY wanted to know the truth, they would read all this and finally conclude that it all matters, and do things differently. Just my opinion, may not be “true” :)

        • Anoop:

          They’re just people. Even the Margaret Hall and Robert Randal Rinehart Chair of Mass Communication at Ohio State University is just a person. People have blind spots.

          I think the problem here is not quite that they don’t understand the difference between a “3-day acute study” and “long-term effects.” I mean, if you asked them directly if a 3-day study can tell you about long-term effects, they’d understand that the answer is No, it can’t. But nobody asked them directly!

          To me, this looks like a special case of a much more general phenomenon, which is that people write things that sound good but make no sense. Calling it “a long-term experimental study” sounds good, so that’s that.

          You might think I’m being unfair here, but I don’t think so. We’ve seen so many examples of this.

          Here’s one we discussed awhile ago: “That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.” That’s the last sentence of a famous paper that has no evidence whatsoever about people “instantly becoming more powerful”; indeed, it has no data about power at all. Anyway, this is just a familiar example. We see it all the time. People write things that sound good. “Instantly becoming more powerful” sounds good. “A long-term experimental study” sounds good.

          I agree with you that “we have a bigger problem on our hands for sure.” But I think this may be more of a problem of language than of science. Politicians, pundits, and just plain ordinary people do the same thing, saying things that sound good even though they’re not true. I speculate that this is a linguistic aspect of our human tendency to generalize.

  11. I have a favorite story that touches on both “cites my paper without reading it” and “misleading title” themes. Early in my career some colleagues published a paper whose title was “Validity of the [such and such survey instrument]…” but the conclusion of the study was that instrument was entirely non-valid. If you read even the abstract, let alone the whole paper, it was clear that the instrument simply did not measure what it claimed to measure.

    Yet that paper has been cited dozens of times over the years as support for using the instrument. The citing papers usually say something along the lines of, “This instrument has been validated…” with a footnote to the paper which demonstrated the lack of validity. Either people read the paper and chose to lie when citing it, or, much more likely, they simply did a quick Google Scholar search on the name of the instrument, saw the phrase “Validity of…”, and Bob was, as they say, their uncle.
