More bad news in the scientific literature: A 3-day study is called “long term,” and nobody even seems to notice the problem. Whassup with that??

Someone pointed me to this article, “The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior,” by Youssef Hasan, Laurent Bègue, Michael Scharkow, and Brad Bushman. My correspondent was suspicious of the error bars in Figure 1. I actually think the error bars in Figure 1 are fine—I’ll get to that later, as it’s not the main issue to be discussed here.

Long-term = 3 days??

The biggest problem I see with this paper is in the title: “A long-term experimental study.” What was “long term,” you might wonder? 5 years? 10 years? 20 years? Were violent video games even a “thing” 20 years ago?

Nope. By “long-term” here, the authors mean . . . 3 days.

In addition, the treatment is re-applied each day. So we’re talking about immediate, short-term effects.

I’ve heard of short-term thinking, but this is ridiculous! Especially given that the lag between the experimental manipulation and the outcome measure is, what, 5 minutes? The time lag isn’t stated in the published paper, so we just have to guess.

3 days, 5 minutes, whatever. Either way it’s not in any way “long term.” Unless you’re an amoeba.

Oddly enough, a correction notice has already been issued for this paper, but the correction says nothing about the problem with the title; it’s all about experimental protocols and correlation measures.

According to Google Scholar, the paper’s been cited 100 times! It has a good title (also, following Zwaan’s Rule #12, it features a celebrity quote), and it’s published in a peer-reviewed journal. I guess that’s enough.

What happened in peer review?

Hey, wait! The paper was peer reviewed! How did the reviewers not catch the problem?

Two reasons:

1. You can keep submitting a paper to journal after journal until it gets accepted. Maybe this article was submitted initially to the Journal of Experimental Social Psychology and got published right away; maybe it was sent a few other places first, in which case reviewers at earlier journals might’ve caught these problems.

2. The problem with peer review is the peers, who often seem to have the same blind spots as the authors.

I’d love to know who the peer reviewers were who thought that 3 days is a long-term study.

Here is my favorite sentence of the paper. It comes near the end:

The present experiment is not without limitations.

Ya think?

More tomorrow on the systemic problems that let this happen.

The error bars in Figure 1

Finally, let me return to the fun little technical point that got us all started—assessing the error bars in Figure 1.

Here’s the graph, with point estimates +/- 1 standard error:

Here’s the question: Are these error bars too narrow? Should we be suspicious?

And here’s the answer:

The responses seem to be on a 0-7 scale; if they were uniformly distributed you’d see a standard deviation of approximately 7*sqrt(1/12) = 2.0. The paper says N = 70; that’s 35 in each group, so you’d expect a standard error of 2.0/sqrt(35) = 0.34, which is, hmmm, a bit bigger than we see in the figure. It’s hard to tell exactly. But, for example, if you look at Day 1 on the top graph, those two entire error bars fit comfortably between 3 and 4. Together they span approximately 0.6, and since each bar covers +/- 1 s.e., each individual s.e. is about 0.15.

So the error bars are about half as wide as you’d expect to see if responses were uniformly distributed across the 0-7 scale. But they’re probably not uniformly distributed! The outcome being studied is some sort of average of coded responses, so it’s completely plausible that the standard error is on the order of half what you’d get from a uniform distribution.
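For concreteness, here’s the back-of-the-envelope calculation as a few lines of code. Everything in it is an eyeballed assumption from the discussion above (the 0-7 scale width, the 35-per-group split, the 0.15 s.e. read off the figure), not the paper’s actual data:

```python
import math

# Back-of-the-envelope check of the error bars in Figure 1.
# Assumptions (eyeballed from the post, not the paper's raw data):
# responses on a 0-7 scale, N = 70 split into two groups of 35.
scale_width = 7
sd_uniform = scale_width * math.sqrt(1 / 12)   # sd of a uniform distribution
n_per_group = 35
se_uniform = sd_uniform / math.sqrt(n_per_group)

print(f"sd if uniform: {sd_uniform:.2f}")      # about 2.0
print(f"implied s.e.:  {se_uniform:.2f}")      # about 0.34

# The bars in the figure look like roughly 0.15 each, i.e. about half
# the uniform benchmark -- plausible for an average of coded responses,
# which is less variable than a single uniform draw.
se_eyeballed = 0.15
print(f"ratio eyeballed/uniform: {se_eyeballed / se_uniform:.2f}")  # about 0.44
```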

Thus, the error bars look ok. The surprising aspect of the graph is that the differences between groups are so large. But I guess that’s what happens when you do this particular intervention and measure these outcomes immediately after (or so it seems; the paper doesn’t say exactly when the measurements were taken).

33 thoughts on “More bad news in the scientific literature: A 3-day study is called “long term,” and nobody even seems to notice the problem. Whassup with that??”

  1. As a parent I worry about my children playing intensive amounts of video games for three days in a row. That does feel “long” to me in terms of exposure. As a historian of reading, most “long term” studies in social sciences make me cringe, given they might mean years or decades at most. I think it is important to contextualize the idea of “length” given the subject matter. This is not a defence of the piece, just that in “family time” doing something 3 days straight is long.

      • As someone who used to be pretty heavily into the ‘gaming’ culture, 20 minutes a day is *nothing*. Nothing. It’s barely enough to even learn the controls.

        For better or for worse, during high school, I played for… 5-6 hours, a day? Nearly every day?

        That was definitely too much, and there’s no way any adult could feasibly do that if they have a job or a spouse. But even so, if I ever do play games nowadays, 20 minutes is barely enough to even digest the controls, let alone learn the mechanics of a game. An hour is a minimum playtime window, in reality. And many games require a lot of ‘maintenance’ and remembering what you were doing when you left it before. That alone can eat up 30 minutes. Games are time consuming if you’re really invested, and it really should only be the people who are willing to invest time that should be studied anyway, imo.

        Three days in a row, for 20 minutes? That’s a laughably small amount of time.

  2. With respect to citation counts, the fundamental premise that more citations is good is not as clearly valid as many seem to think. The same applies to peer review. If the “peers” are all mediocre, the reviewing will be mediocre, and abundant citations may indicate worse than nothing – they may indicate recognition by fellow incompetents. The US elected Trump after all! But science/engineering/etc are not and should not be democratic, at least not in judging what is quality. The criteria of quality researchers should carry more weight.

    These sorts of issues are serious problems with all the metrics/schemes currently used for judging academics and communicating their work and these problems are insufficiently addressed.

    Something is badly wrong if crackpots, or worse yet, fools, are highly paid chaired professors at supposedly top tier universities. (“Fools” sounds strong, but what else to call the authors of some of the examples discussed on this blog? Some could be called “charlatans”, but that involves value judgments.)

    What saves the day still, but may not do so forever, is that nothing useful comes out of junk, whereas solid engineering, science, and medicine eventually lead to something useful. In systems where there is a functioning mechanism for utilizing such advances this leads to feedback that favors quality whether highly cited or not. This explains why there are still math departments that do more than teach service courses.

    • “Something is badly wrong if crackpots, or worse yet, fools, are highly paid chaired professors at supposedly top tier universities”

      I am starting to think that once the possible crackpots/fools/charlatans get into academia, there is a high chance that they will select further crackpots/fools/charlatans due to several characteristics of academia (e.g. letters of recommendation, flawed evaluation metrics, bad work/researchers/students may actually get you further ahead, etc.)

      I am also starting to think that we may already be past the tipping point, in such a way that many members of the general public think and act more scientifically than highly paid chaired professors. Or in the way that real science, and scientific reasoning, happens outside universities and other institutions like journals more and more (e.g. on twitter and blogs and pre-prints). Perhaps this can be compared to other historical events/processes like the Enlightenment.

  3. Another issue with this study seems to be the Y-axis of the CRTT assessment of aggression. In the article they indicate they standardized the assessments, but this does not seem to be what was done in Figure 1b. I am having a bit of trouble telling exactly how they combined duration and intensity to yield an overall aggression score. This is particularly problematic because the CRTT has no standardized scoring method, and one of the authors of this paper has scored it 17 different ways. http://www.flexiblemeasures.com/crtt/index

    • I agree this is a well-known problem with this measure that further calls into question the reliability of results from this study. Is the study data openly available?

  4. Also – Bushman has been mentioned a lot recently – duplicate publications, voodoo dolls, now this study. It seems like he might not be the most careful researcher and/or has really bad luck picking collaborators because he has had a good number of retractions, expressions of concerns, and corrections to studies he authored. Perhaps this is because he is so productive that he has had so many issues? Here are some that I found and this doesn’t even include the two duplicated publications in Current Opinion in Psychology (I’m assuming at least one of them will also be retracted).

    Retractions

    Çetin, Y., Wai, J., Altay, C., & Bushman B. J. (2016). Effects of violent media on verbal task performance in gifted and general cohort children. Gifted Child Quarterly, 60(4), 279-287.

    Whitaker, J. L., & Bushman, B. J. (2014). “Boom, Headshot!” Effect of Video Game Play and Controller Type on Firing Aim and Accuracy. Communication Research, 41(7), 879-891.

    Expression of Concerns

    Benjamin, A. J., Jr., Kepes, S., & Bushman, B. J. (2017). Effects of weapons on aggressive thoughts, angry feelings, hostile appraisals, and aggressive behavior: A meta-analytic review of the weapons effect literature. Personality and Social Psychology Review.

    Corrigendum and Erratum

    Bushman, B. J., & Whitaker, J. L. (2010). Like a magnet: Catharsis beliefs attract angry people to violent video games. Psychological Science, 21(6), 790-792.

    Hasan, Y., Bègue, L., & Bushman, B. J. (2012). Viewing the world through “blood-red tinted glasses”: The hostile expectation bias mediates the link between violent video game exposure and aggression. Journal of Experimental Social Psychology, 48(4), 953-956.

    Hasan, Y., Bègue, L., Scharkow, M., & Bushman, B. J. (2013). The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior. Journal of Experimental Social Psychology, 49(2), 224-227.

    Van den Bulck, J., Çetin, Y., Terzi, Ö., & Bushman, B. J. (2016). Violence, sex, and dreams: Violent and sexual media content infiltrate our dreams at night. Dreaming, 26(4), 271.

    Carnagey, N. L., Anderson, C. A., & Bushman, B. J. (2007). The effect of video game violence on physiological desensitization to real-life violence. Journal of experimental social psychology, 43(3), 489-496.

    Whitaker, J. L., & Bushman, B. J. (2012). “Remain calm. Be kind.” Effects of relaxing video games on aggressive and prosocial behavior. Social Psychological and Personality Science, 3(1), 88-92.

    • In all fairness, it should be pointed out that a few of these issues seem to be minor corrections, but a good number are major issues that altered the findings of the original studies.

  5. You were probably being glib, but violent video games have been around for at least 25 years. From wikipedia: “Doom (typeset as DOOM in official documents) is a 1993 science fiction horror-themed first-person shooter (FPS) video game by id Software. It is considered one of the most significant and influential titles in video game history, for having helped to pioneer the now-ubiquitous first-person shooter”

    • Ah, good ol’ Doom. Been one of my favourites since I first got my hands on the shareware version back in the day. In these little over 20 years it has made me into a violent and unpredictable psychopath. Anyone looking for a good Doom 1 WAD, check out No End In Sight–and you can become a violent psychopath too!

      • I was ruined before Doom – I tell you I can’t drive by a buffalo herd without stopping the wagon and shooting every last one of them…even though I can only fit like half of one of them in the back for food.

        • even though I can only fit like half of one of them in the back for food.
          No, no, no! You want the hides to make buffalo robes. They are indispensable for winter travel in a cutter or “one-horse open sleigh.”

          My family had a buffalo robe til the early 1960’s. Bit ratty looking but still ….

          Sheeh, Easterners/city folk!

    • Technically, the field defines “violent video game” as any game in which any character acts aggressively against any other character in any way, so even simple 80s games like “Space Invaders” count as “violent video games.” Seriously. Look at some of the early 80s research.

    • I lived and breathed Doom through high school. It was one of the most violent video games in existence at the time; nowadays it looks like a goofy cartoon.

      Coincidentally enough, just last night I played through some old Doom WAD levels I made when I was in high school. They’re still fun. Doodling Doom level ideas on notebook paper was what I did when I should have been paying attention in physics class. In retrospect, I made the right decision.

  6. A study announced in January from the UK found no such effect. It’s become difficult to write a sentence about this material: they aren’t announcing a relationship but a potential effect, one measured over some time period from instantiation to measurement in a specific setting with a specific study population that is suspected of representing a larger population with such a degree of accuracy that the error rates found in the study population don’t magnify to the point where the potential effect disappears or becomes chaotic or in some other way unreliably noisy. That’s a lot of words and I didn’t even get through the idea! Didn’t even start to discuss the methodology’s claims to represent a process that generates this potential effect as a result of this process and not for other material reasons. And I’m still not done describing the issues with this kind of thing.

    • Amoebas are practically immortal so they no doubt would find the claim to be even more ridiculous yet coming from one of us unsurprising nevertheless – which is perhaps why they try to eat our brains whenever they get the chance.

      Oldie but goodie: “It is possible, then, that in looking at an Amoeba we may be gazing upon something which has been continuously living from the time when life first appeared upon the globe, and the question, ‘Is Amoeba immortal?’ is by no means chimerical.” Journal of the Royal Agricultural Society of England (1890).

  7. To get the long term effect, just extrapolate:
    Day 1: 3.8
    Day 2: 4.3
    Day 3: 5.9
    Just extrapolate the exponential growth for, say, 15 years of exposure. Easy.
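Just for fun, here’s what that extrapolation looks like if you take it literally. The three daily scores come from the comment above; the log-linear fit is my own joke, not anything in the paper:

```python
import math

# Tongue-in-cheek: fit exponential growth to the three daily scores
# quoted above and "extrapolate" to 15 years of daily exposure.
days = [1, 2, 3]
scores = [3.8, 4.3, 5.9]
logs = [math.log(s) for s in scores]

# Simple least-squares line through (day, log score).
n = len(days)
mx = sum(days) / n
my = sum(logs) / n
slope = sum((x - mx) * (y - my) for x, y in zip(days, logs)) / sum(
    (x - mx) ** 2 for x in days
)
intercept = my - slope * mx

day = 15 * 365  # 15 years of daily play
log10_score = (intercept + slope * day) / math.log(10)
print(f"predicted aggression score: about 10^{log10_score:.0f}")  # a 500+ digit number
```

Which is one way to see why “just extrapolate” isn’t a research design.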

  8. Dr. Gelman, I believe the variance of a uniform pmf is [(b-a+1)^2-1]/12, so the standard deviation you would expect would be even higher here. (sqrt[63/12])
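The commenter’s formula is easy to verify numerically. The check below assumes an 8-point 0-7 scale, which is what makes (b-a+1)^2-1 come out to 63:

```python
import math

# Variance of a discrete uniform on the integers a..b, versus the
# continuous uniform on [a, b]. With a = 0, b = 7 (an 8-point scale),
# the discrete version gives the commenter's sqrt(63/12).
a, b = 0, 7
var_discrete = ((b - a + 1) ** 2 - 1) / 12   # 63/12
var_continuous = (b - a) ** 2 / 12           # 49/12, the post's 7*sqrt(1/12)

print(f"discrete sd:   {math.sqrt(var_discrete):.2f}")    # about 2.29
print(f"continuous sd: {math.sqrt(var_continuous):.2f}")  # about 2.02
```

So the commenter is right that the discrete-uniform benchmark is a bit larger, which only strengthens the point that the plotted error bars are well below the uniform-distribution bound.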
