This post didn’t come out the way I planned.
Here’s what happened. I cruised over to the British Psychological Society Research Digest (formerly on our blogroll) and came across a press release entitled “Background positive music increases people’s willingness to do others harm.”
Uh oh, I thought. This sounds like one of those flaky studies, the sort of thing associated in recent years with Psychological Science and PPNAS.
But . . . the British Psychological Society, that’s a serious organization. And the paper isn’t even published in one of their own journals, so presumably they can give it a fair and unconflicted judgment.
At the same time, it would be hard to take the claims of the published paper at face value—we just know there are too many things that can go wrong in this sort of study.
So this seemed like a perfect example to use to take what might be considered a moderate stance, to say that this paper looked interesting, it’s not subject to obvious flaws, but for the reasons so eloquently explained by Nosek et al. in their “50 shades of gray” study, it really calls out for a preregistered replication.
So I went to the blog and opened a new post dated 25 Dec (yup, it’s Christmas here in blog-time) entitled, “Here’s a case for a preregistered replication.”
And I started to write the post, beginning by constructing a long quote from the British Psychological Society’s press release:
A new study published in the Psychology of Music takes this further by testing whether positive music increases people’s willingness to do bad things to others.
Naomi Ziv at The College of Management Academic Studies recruited 120 undergrad participants (24 men) to take part in what they thought was an investigation into the effects of background music on cognition. . . .
The key test came after the students had completed the underling task. With the music still playing in the background, the male researcher made the following request of the participants:
“There is another student who came especially to the college today to participate in the study, and she has to do it because she needs the credit to complete her course requirements. The thing is, I don’t feel like seeing her. Would you mind calling her for me and telling her that I’ve left and she can’t participate?”
A higher proportion of the students in the background music condition (65.6 per cent) than the no-music control condition (40 per cent) agreed to perform this task . . .
A second study was similar but this time the research assistant was female, she recruited 63 volunteers (31 men) in the student cafeteria . . . After the underling task, the female researcher made the following request:
“Could I ask you to do me a favor? There is a student from my class who missed the whole of the last semester because she was very sick. I promised her I would give her all the course material and summaries. She came here especially today to get them, but actually I don’t feel like giving them to her after all. Could you call her for me and tell her I didn’t come here?”
This time, 81.8 per cent of the students in the background music condition agreed to perform this request, compared with just 33 per cent of those in the control condition. The findings are all the more striking given that the researchers’ requests in both experiments were based on such thin justifications (e.g. “I don’t feel like giving them to her after all”).
Shoot, this is looking pretty bad. I clicked through to the published paper and it seems to have many of the characteristics of a classic “Psychological Science”-style study: small samples, a focus on interactions, multiple comparisons reported in the research paper and many other potential comparisons that could’ve been performed had the data pointed in those directions, comparisons between statistical significance and non-significance, and an overall too-strong level of assurance.
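To make the “small samples” point concrete, here’s a quick back-of-the-envelope calculation in Python. The per-condition counts are my guess, since the excerpt above doesn’t give them; 120 students split across several conditions suggests something like 30 per group.

```python
import math

def ci95(p_hat, n):
    """Approximate 95% interval for a proportion (normal approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se, p_hat + 1.96 * se

# Per-condition sizes are assumed, not reported in the excerpt above.
for label, p in [("music", 0.656), ("control", 0.40)]:
    lo, hi = ci95(p, 30)
    print(f"{label}: {p:.0%} (roughly {lo:.0%} to {hi:.0%})")
```

With groups that size, each estimated proportion carries an uncertainty of nearly plus-or-minus 20 percentage points, so a gap like 65.6 per cent vs. 40 per cent is much less impressive than it looks.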
I could explain all the above points but at this point I’m getting a bit tired of explaining, so I’ll just point you to yesterday’s post.
And, to top it all off, when you look at the claims carefully, they don’t make a lot of sense. Or, as it says in the press release, “The findings are all the more striking.” “More striking” = surprising = implausible. Or, to put it another way, this sort of striking claim puts more of a burden on the data collection and analysis to be doing what the researchers claim is being done.
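To spell out that burden, here’s a back-of-the-envelope Bayesian calculation, with illustrative numbers that are mine and not the paper’s: suppose only 1 in 10 striking hypotheses of this kind is true, and the study has 50% power.

```python
# Illustrative numbers, not from the paper: prior plausibility of the
# claim, the study's power, and the usual 5% false-positive rate.
prior_true, power, alpha = 0.10, 0.50, 0.05

p_significant = prior_true * power + (1 - prior_true) * alpha
p_true_given_significant = prior_true * power / p_significant
print(f"P(claim is real | p < 0.05) = {p_true_given_significant:.0%}")  # ~53%
```

Even granting a statistically significant result, an implausible claim ends up as roughly a coin flip; and small noisy studies typically have much less than 50% power, which pushes the number lower still.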
Also this: “no previous study has compared the effect of different musical pieces on a direct request implying harming a specific person.” OK, then.
When you think about it, even the headline claim seems backwards. Setting aside any skepticism you might feel about background music having any consistent effect at all, doesn’t it seem weird that “positive music increases people’s willingness to do others harm”? I’d think that positive music would, if anything, make people nicer!
And the reported effects are huge. Background music changing the frequency of a particular behavior from 33% to 80%? Even Michael LaCour didn’t claim to find effects that large.
As is unfortunately common in this sort of paper, the results from these tiny samples are presented as general truths; for example,
The results of Study 1 thus show that exposure to familiar, liked music leads to more compliance to a request implying harming a third person. . . .
Taken together, the results of the two studies clearly show that familiar and liked music leads to more compliance, even when the request presented implies harming a third person.
Where are we going here?
OK, so I wrote most of the above material, except for the framing, as part of an intended future post on a solid study that I still wasn’t quite ready to believe, given that we’ve been burned so many times before by seemingly solid experimental findings.
But, as I wrote it, I realized that I don’t think this is a solid study at all. Sure, it was published in Psychology of Music, which I would assume is a serious journal, but it just as well could’ve appeared in a “tabloid” such as Psychological Science or PPNAS.
So where are we here? One more criticism of a pointless study in an obscure academic journal. What’s the point? If the combined efforts of Uri Simonsohn, E. J. Wagenmakers, Kate Button, Brian Nosek, and many others (including me!) can’t convince the editors of Psychological Science, the #1 journal in their field, to clean up its act regarding hype of noise, it’s gotta be pretty hopeless for me to expect or even care about changes in the publication policies of Psychology of Music.
So what’s the point? To me, this is all an interesting window into what we’ve called the hype cycle, which encompasses not only researchers and their employers but also the British Psychological Society, which describes itself as “the representative body for psychology and psychologists in the UK,” and an entirely credulous article by Tom Jacobs in the magazine Pacific Standard.
I have particular sympathy for Jacobs here, as his news article is part of a series:
Findings is a daily column by Pacific Standard staff writer Tom Jacobs, who scours the psychological-research journals to discover new insights into human behavior, ranging from the origins of our political beliefs to the cultivation of creativity.
A daily column! 365 new psychology insights a year, huh? That’s a lot of pressure.
The problem with the hype cycle is not just with the hype
And this leads me to the real problem I see with the hype cycle. Actual hype doesn’t bother me so much. If an individual or organization hypes some dodgy claims, fine: They shouldn’t do it, but, given the incentives out there, it’s what we might expect. You or I might not think Steven Levitt is a “rogue economist,” but if he wants to call himself that, well, we have to take such claims in stride.
But what’s going on with the British Psychological Society seems more troubling. I don’t think the author of that post was trying to promote or hype anything; rather, I expect it was a sincere, if overly trusting, presentation of what seemed on the surface to be solid science (p less than 0.05, published in a reputable journal, some plausible explanations in the accompanying prose). And similarly at Pacific Standard.
The hype cycle doesn’t even need conscious hype. All it needs is what John Tukey might call the aching desire for regular scientific breakthroughs.
You don’t have to be Karl Popper to know that scientific progress is inherently unpredictable, and you don’t need to be Benoit B. Mandelbrot to know that scientific breakthroughs, at whatever scale, do not occur on a regular schedule. But if you want to believe in routine breakthroughs, and you’re willing to not look too closely, you can find everything you need this week—every week—in psychological science.
And that is how the hype cycle continues, even without anyone trying to hype.
OK, here we are, at that point in the blog post. Yes, some or all the claims in this paper could in fact represent true claims about the general population. And even if many or most of the claims are false, this work could still be valuable in motivating people to think harder about the psychology of music. I mean, sure, why not?
As always, the point is that the statistical evidence is not what is claimed, either in the published paper or the press release.
If someone wants, they can try a preregistered replication. But given that the author herself says that these results confound expectations, I don’t know that it’s worth the effort. It’s not where I’d spend my research dollars. In any case, as usual I am not intending to single out this particular publication or its author. There’s nothing especially wrong with it, compared to lots of other papers of its type. Indeed, what makes it worth writing about is its very ordinariness, that this paper represents business as usual in the world of quantitative research.
As always, we get stories which I can’t take seriously because they assume the truth of population statements which haven’t actually been demonstrated. For example:
Why should positive background music render us more willing to perform harmful acts? Ziv isn’t sure – she measured her participants’ mood in a questionnaire but found no differences between the music and control groups. She speculates that perhaps familiar, positive music fosters feelings of closeness among people through a shared emotional experience. “In the setting of the present studies,” she said, “measuring connectedness or liking to the experimenter would have been out of place, but it is possible that a social bond was created.”
Both the researcher and the publicist forgot the alternative explanation that maybe they are just observing variation in some small group that does not reflect any general patterns in the population. That is, maybe no explanation is necessary, just as we don’t actually need to crack open our physics books to explain why Daryl Bem managed to find some statistically significant interactions in his data.
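Here’s a little simulation, my own sketch rather than anything from the paper, of how that alternative explanation works: give a researcher a null effect, small groups, and a handful of outcomes or subgroups to look at, and “significant” findings arrive on schedule.

```python
import math, random

random.seed(1)

def significant_null_study(n_per_group=30, n_looks=5, true_rate=0.5):
    """One simulated study with NO real effect: both groups share the same
    true rate, but the analyst can examine several outcomes (or subgroups,
    or interactions). Returns True if any comparison reaches |z| > 1.96,
    i.e. 'p < 0.05'."""
    for _ in range(n_looks):
        a = sum(random.random() < true_rate for _ in range(n_per_group))
        b = sum(random.random() < true_rate for _ in range(n_per_group))
        pooled = (a + b) / (2 * n_per_group)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
        if se > 0 and abs(a - b) / n_per_group / se > 1.96:
            return True
    return False

hits = sum(significant_null_study() for _ in range(2000))
print(f"Null studies with a 'finding': {hits / 2000:.0%}")  # roughly 20%
```

With five looks at the data, roughly one null study in five yields a publishable-looking comparison; with more forking paths, the rate climbs.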
The aching desire for regular scientific breakthroughs
Let me say it again, with reference to the paper by Ziv that got this all started. On one hand, sure, maybe it’s really true that “background positive music increases people’s willingness to do others harm,” even though the author herself writes that “a large number of studies examining the effects of music in various settings have suggested” the opposite.
But here’s the larger question. Why should we think that a little experiment on 200 college students provides convincing evidence overturning much of what we might expect about the effects of music? Sure, it’s possible—but just barely. What I’m objecting to here is the idea, encouraged, I fear, by lots and lots of statistics textbooks (including my own), that you can routinely learn eternal truths about human nature via these little tabletop experiments.
Yes, there are examples of small, clean paradigm-destroying studies, but they’re hardly routine, and I think it’s a disaster of both scientific practice and scientific communication that everyday noisy experiments are framed this way.
Discovery doesn’t generally come so easily.
This might seem to be a downbeat conclusion, but in many ways it’s an optimistic statement about the natural and social world. Imagine if the world as presented in “Psychological Science” papers were the real world. If so, we’d routinely be re-evaluating everything we thought we knew about human interactions. Decades of research on public opinion, smashed by a five-question survey on 100 or so Mechanical Turk participants. Centuries of physics overturned by a statistically significant p-value discovered by Daryl Bem. Hundreds of years of data on sex ratios of children, all needing to be reinterpreted because of a pattern some sociologist found in some old survey data. Etc.
What a horrible, capricious world that would be.
Luckily for us, as social scientists and as humans trying to understand the world, there is some regularity in how we act and how we interact, a regularity enforced by the laws of physics, the laws of biology, and by the underlying logic of human interactions as expressed in economics, political science, and so forth. There are not actually 365 world-shaking psychology findings each year, and the strategy of run-an-experiment-on-some-nearby-people-and-then-find-some-statistically-significant-comparisons-in-your-data is not a reliable way to discover eternal truths.
And I think it’s time for the Association for Psychological Science and the British Psychological Society to wake up, and to realize that their problem is not just with one bad study here and one bad study there, or even with misapplication of certain statistical methods, but with their larger paradigm, their implicit model for scientific discovery, which is terribly flawed.
And that’s why I wrote this post. I couldn’t care less about whether pleasant background music changes people’s propensities to be mean. But I do care about how we do science.