
If you do an experiment with 700,000 participants, you’ll (a) have no problem with statistical significance, (b) get to call it “massive-scale,” (c) get a chance to publish it in a tabloid top journal. Cool!

David Hogg points me to this post by Thomas Lumley regarding a social experiment that was performed by randomly manipulating the content in the news feeds of Facebook users. The shiny bit about the experiment is that it involved 700,000 participants (or, as the research article, by Adam Kramer, Jamie Guillory, and Jeffrey Hancock, quaintly puts it, “689,003”), but, as Tal Yarkoni points out, that kind of sample size is just there to allow people to estimate tiny effects and also maybe to get the paper published in a top journal and get lots of publicity (there it is, “massive-scale,” right in the title).

Before getting to Lumley’s post, which has to do with the ethics of the study, I want to echo the point made by Yarkoni:

In the experimental conditions, where negative or positive emotional posts are censored, users produce correspondingly more positive or negative emotional words in their own status updates. . . . [But] these effects, while highly statistically significant, are tiny. The largest effect size reported had a Cohen’s d of 0.02, meaning that eliminating a substantial proportion of emotional content from a user’s feed had the monumental effect of shifting that user’s own emotional word use by two hundredths of a standard deviation. In other words, the manipulation had a negligible real-world impact on users’ behavior. . . .

The attitude in much of science, of course, is that if you can conclusively demonstrate an effect, then its size doesn’t really matter. But I don’t agree with this. For one thing, if we happen to see an effect of +0.02 in one particular place at one particular time, it could well be −0.02 somewhere else. Don’t get me wrong: I’m not saying that this finding is empty, just that we have to be careful about out-of-sample generalization.
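To see how a sample of that size makes even a d of 0.02 look decisive, here is a back-of-the-envelope calculation. (The equal two-arm split and the normal approximation are my simplifications for illustration, not details taken from the paper.)

```python
import math

def two_sample_z(d, n_per_group):
    """Approximate z statistic and two-sided p-value for a standardized
    mean difference d, with n_per_group observations in each arm."""
    z = d * math.sqrt(n_per_group / 2)
    p = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# ~689,003 users, assumed split evenly into two arms for illustration
z, p = two_sample_z(d=0.02, n_per_group=344_500)
print(f"z = {z:.1f}, two-sided p = {p:.1e}")
```

With roughly 345,000 users per arm, a shift of two hundredths of a standard deviation yields a z statistic near 8, so statistical significance is essentially guaranteed; that says nothing at all about practical importance.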

Now on to the ethics question. Lumley writes:

The problem is consent. There is a clear ethical principle that experiments on humans require consent, except in a few specific situations, and that the consent has to be specific and informed. . . . The need for consent is especially clear in cases where the research is expected to cause harm. In this example, the Facebook researchers expected in advance that their intervention would have real effects on people’s emotions; that it would do actual harm, even if the harm was (hopefully) minor and transient.

I pretty much disagree with this, for reasons that I’ll explain in a moment. Lumley continues:

The psychologist who edited the study for PNAS said

“I was concerned,” Fiske told The Atlantic, “until I queried the authors and they said their local institutional review board had approved it—and apparently on the grounds that Facebook apparently manipulates people’s News Feeds all the time.”

Fiske added that she didn’t want the “the originality of the research” to be lost, but called the experiment “an open ethical question.”

To me [Lumley], the only open ethical question is whether people believed their agreement to the Facebook Terms of Service allowed this sort of thing. This could be settled empirically, by a suitably-designed survey. I’m betting the answer is “No.” Or, quite likely, “Hell, no!”.

Amusingly enough, this is the same Susan Fiske who was earlier quoted in support of the himmicanes study, but that doesn’t seem to be particularly relevant here.

I don’t feel strongly about the ethical issues here. On one hand, I’d be a bit annoyed if I found that my internet provider was messing with me just to get a flashy paper in a journal (for example, what if someone told me that some researcher was sending spam to the blog, wasting my time (yes, I delete these manually every day), in the hope of getting a paper published in a tabloid journal using a phrase such as “massively online experiment”?). Indeed, a couple of years ago I was annoyed that some researchers sent me a time-wasting email ostensibly coming from a student who wanted to meet with me. My schedule is a mess and it doesn’t help me to get fake appointment requests. On the other hand, as Fiske notes, corporations manipulate what they send us all the time, and any manipulation can possibly affect our mood. It seems a bit ridiculous to say that a researcher needs special permission to make some small alteration to an internet feed, when advertisers and TV networks can broadcast all sorts of emotionally affecting images whenever they want. The other thing that’s bugging me is the whole IRB thing, the ridiculous idea that if you’re doing research you need to get permission for noninvasive things like asking someone a survey question.

So, do I consider this Facebook experiment unethical? No, but I could see how it could be considered thus, in which case you’d also have to consider all sorts of non-research experiments (the famous A/B testing that’s so popular now in industry) to be unethical as well. In all these cases, you have researchers, of one sort or another, experimenting on people to see their reactions. And I don’t see the goal of getting published in PNAS to be so much worse than the goal of making money by selling more ads. But, in any case, I don’t really see the point of involving institutional review boards for this sort of thing. I’m with Tal Yarkoni on this one; as he puts it:

It’s not clear what the notion that Facebook users’ experience is being “manipulated” really even means, because the Facebook news feed is, and has always been, a completely contrived environment. . . . Facebook—and virtually every other large company with a major web presence—is constantly conducting large controlled experiments on user behavior.

Again, I can respect it if you take a Stallman-like position here (or, at least, what I imagine rms would say) and argue that all of these manipulations are unethical, that the code should be open, and that we should all be able to know, at least in principle, how our messages are being filtered. So I agree that there is an ethical issue here and I respect those who have a different take on it than I do, but I don’t see the advantage of involving institutional review boards here. All sorts of things are unethical but still legal, and I don’t see why doing something and publishing it in a scientific journal should be considered more unethical, or held to a more stringent standard, than doing the same thing and publishing it in an internal business report.

P.S. This minor, minor story got what seems to me like a hugely disproportionate amount of attention. I’m guessing it’s because lots of people feel vaguely threatened by the Big Brother nature of Google, Facebook, etc., and this story gives people an excuse to grab onto those concerns. So when posting on it I’m glad of our 1-to-2-month lag, which means that you’re seeing this post with fresh eyes, after you’ve almost forgotten what the furore was all about.


  1. Jens says:

    I think one could even take the ethical argument one step further.

    Given that Facebook (and others) manipulate what we see all the time, and everybody who cares should already know this, I don’t see that this experiment does any (additional) harm. On the contrary, this may be the first time that “normal people” have a chance to see what effect this manipulation can have on us. I think having such results published in the open rather than privately available only to some corporations (as they already were) could be hugely beneficial.

  2. Thom says:

    There are two aspects to the ethical issue, though. One is the study itself, where one can make an argument (although I don’t fully agree with it) that Facebook, Google, etc. do similar stuff all the time, so it is OK. My take is that it isn’t surprising and isn’t a really serious ethics violation, but that doesn’t make it ethical. The second is whether the academic authors involved in the study knowingly used the involvement of Facebook to get round institutional ethics procedures. That is a pretty serious breach in my view.

  3. question says:

    It’s comforting that would-be society manipulators are relying on dynamite charts and strawman NHST to inform their plans.

  4. Rahul says:

    Just because non-research settings or corporations routinely do certain things is not a good reason for universities or journals to automatically consider those practices not unethical.

    There’s no reason for a university IRB’s ethical standard to be on par with a corporation’s.

    People expect TV networks, advertisers, big bad corporations etc. to manipulate us. I don’t think they view university researchers in the same light. And I don’t see any good reasons to drag down the level of public trust in academia to the same low level as corporations.

  5. Lee Sechrest says:

    Someone else pointed out (I can’t remember where) in the voluminous internet discussion of this study that there is NO evidence that anything other than verbal behavior was affected in any way. All that was demonstrated is that the intervention presumably influenced respondents’ word choices for a brief period of time.


  6. jflycn says:

    People expect their emotions to be manipulated by the movie, but not by the movie theater.

    It is understandable if a movie theater A/B tests sound effects to improve service. But it is not OK to cut a line from the movie to see people’s reaction.

    Andrew’s arguments are not valid.

    • Popeye says:

      Facebook doesn’t simply show you everything your friends have posted in chronological order. Only some posts are shown to you, and Facebook decides on the order in which they are displayed. They also put ads in your feed.

      This is just what they do. They want to maximize user engagement with the site, so that they can make as much money as possible (from advertisers). They are not simply a passive conduit.

  7. anonymous says:

    Not being a Facebook user, I’m unclear on this study. Facebook sends news feeds, and then reads people’s postings to their friends later on? On the question of ethics, what if they changed the news, for example, to announce 4 new ebola cases in NYC? I think the most disturbing part of all this is the feeling that we are all in one constant commercial or marketing study. The crowd should thwart these studies by filling their posts with “I am not influenced by your sending me biased news feeds, so please stop,” or otherwise skewing involuntary studies.

  8. […] Gelman’s belated comments on that experiment manipulating the emotional content of people’s Facebook feeds are better […]

  9. L says:

    “[Users] do not expect the algorithm to systematically choose to harm them (which is what they get if they happen to be on the downer side of the experiment).

    “Suppose I stand out on my street corner and hand out candy to anyone walking by. Today it’s lemon, tomorrow it’s cherry, another day grape, but they’re all colored blue, so you don’t know what you get until you taste it. You may like one better than the other, but they’re pretty much all more or less OK. My friend in the house across the street takes pictures of everyone, and analyzes whether lemon or cherry is preferred. We do this for months at a time. Then one day I decide to run an experiment. Some folks I give cherry, and some folks I give cherry laced with quinine (bitter but fairly harmless for most people). Then we watch what happens. I think most people would agree that that would be a pretty nasty trick.”

    • Adam Schwartz says:

      Your post raised a couple of thoughts for me. The first, building on your example (and perhaps it’s a reductio ad absurdum): what if the company in question were A/B testing whether adding a small amount of cocaine to their product would cause you to buy more? Certainly not enough cocaine to be deadly, but enough to “motivate” you (by way of addiction) to purchase. Clearly a problematic experiment.

      The second thing that popped to mind was motivation. It’s true that companies A/B test stuff all the time to figure out what will drive customers to purchase more. I’d consider this behavior amoral. Selling product is what they do and isn’t really good or bad. Once you become aware of your product doing harm (e.g., tobacco), continuing to push your product becomes immoral. However, the motivation of this study wasn’t the same as trying to sell product. True, they got an effect size so negligible that the odds of harm being done are truly small. However, they didn’t know that entering into the study, and their manipulation could have had substantial effects on an individual’s well-being, moving it, in my mind, from an amoral situation to an immoral one.

  10. […] 13 – If you do an experiment with 700,000 participants, you’ll (a) have no problem with statistical… by Andrew David Hogg points me to this post by Thomas Lumley regarding a social experiment that was […]

  11. polymath says:

    IRBs are a good way to supervise academic researchers. If an IRB doesn’t supervise the research methods, no one will. Other than IRBs, most supervision of PIs is actively discouraged (we call it academic freedom). Also, participants in academic studies often receive no intrinsic benefit from their interaction with the researchers.

    IRBs are a much less good way to manage people in industry who are learning about their customers. When a supermarket moves milk around the store and observes the effect on customer traffic patterns and order sizes, when a retailer compares revenue during a 30% sale with revenue during a 10% sale, when an advertiser compares two different creatives and uses the one that performs better, or when a consumer internet service observes user behavior in response to changes in its user interface or content selection criteria, an IRB is completely inappropriate.

    Learning about interactions outside an IRB context isn’t an ethical violation, it’s life. The participation of academic researchers in an otherwise unremarkable industry study, or the publication of results from such a study, should if anything be encouraged, because broadening awareness of the study allows it to be critiqued.

    Participation in a study that causes real harm should be an ethical violation. Despite all the controversy about this study, it just wasn’t harmful. Putting milk in the back of the store isn’t “harmful” even if people need to walk more. Many people complained about how harmful this study might have been, but those complaints were largely wrong. The effect size is tiny.

    On a related but different note, I’m troubled by the fetishization of “consent” in this context (cf. “To me [Lumley], the only open ethical question is whether people believed their agreement to the Facebook Terms of Service allowed this sort of thing”). True consent takes time and often the issues are hard to understand. Giving that time and energy can easily have consequences that are more negative than being in the bad arm of one of these internet company experiments.

    Imagine being asked, when you walked into a supermarket, to “consent” to the supermarket moving items around the store from day to day and observing your response. If you actually spent the 60 seconds an average shopper would need to understand what the supermarket was asking to do, and compared that to the harm of being in the bad arm of some experiment where you have to walk a little farther, you’d be upset that the questioner had just wasted 60 seconds of your life. It’s like the annoying and useless banner disclaimer, required by EU law, that “this site uses cookies, and by using the site you consent,” only worse, because true consent requires the user to understand (e.g., what cookies are).

    People are pretty resilient. We need to be because the world is an uncertain place. Let’s be realistic about comparing the harms of these treatments (mostly epsilon) to the harm of obtaining consent (often significant for true consent).

  12. Nate says:

    You can find a wee p-value with any sample size so long as you “correct” the data enough times.

    See “Everything wrong with P-values under one roof”
    and “The alternative to P-values”

    “Repeat: the p-value is silent as the tomb on the probability the alternate hypothesis is true. Yet nobody remembers this, and all violate the injunction in practice.”
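    This multiple-looks problem is easy to check with a toy simulation: take repeated “looks” at pure noise and see how often at least one comes out nominally significant (the 20 looks and the one-sample z-test are assumptions for illustration, not a model of any particular study):

```python
import math
import random
import statistics

random.seed(1)

def z_pvalue(sample):
    """Two-sided p-value for a one-sample z-test of mean 0 (normal approximation)."""
    n = len(sample)
    z = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# Take 20 "looks" at pure noise per trial; count trials where at least
# one look comes out nominally "significant" at the 0.05 level.
trials = 1000
hits = sum(
    any(z_pvalue([random.gauss(0, 1) for _ in range(50)]) < 0.05
        for _ in range(20))
    for _ in range(trials)
)
print(f"P(at least one p < 0.05 over 20 noise tests) ~ {hits / trials:.2f}")
```

    With 20 independent looks at noise, the chance of at least one p below 0.05 is roughly 1 − 0.95²⁰ ≈ 0.64, even though there is no effect anywhere.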

  13. […] esse plano de fundo, no blog do Andrew Gelman foi escrito um post interessante sobre a questão e se essas reclamações são justificáveis ou […]

  14. […] on Numbers Rule Your World, Kaiser Fung has a nice analysis on Andrew Gelman’s analysis of the Facebook controversy (where Facebook apparently “played with people’s emotions” by manipulating their […]

  15. […] it’s not big enough to be worth worrying about. For instance, here’s Andrew Gelman criticizing a psychology experiment for needing a sample size of 700,000 people to detect a change in the mean equal to 0.02 standard […]
