50 shades of gray: A research story

This is a killer story (from Brian Nosek, Jeffrey Spies, and Matt Motyl).

Part 1:

Two of the present authors, Motyl and Nosek, share interests in political ideology. We were inspired by the fast-growing literature on embodiment that demonstrates surprising links between body and mind (Markman & Brendl, 2005; Proffitt, 2006) to investigate embodiment of political extremism. Participants from the political left, right, and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray. Participants had to click along a gradient representing grays from near black to near white to select a shade that matched the shade of the word. We calculated accuracy: How close to the actual shade did participants get? The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01). Our conclusion: political extremists perceive the world in black-and-white, figuratively and literally. Our design and follow-up analyses ruled out obvious alternative explanations such as time spent on task and a tendency to select extreme responses. Enthused about the result, we identified Psychological Science as our fall-back journal after we toured the Science, Nature, and PNAS rejection mills. The ultimate publication, Motyl and Nosek (2012), served as one of Motyl’s signature publications as he finished graduate school and entered the job market.

Part 2:

The story is all true, except for the last sentence; we did not publish the finding. Before writing and submitting, we paused. Two recent papers highlighted the possibility that research practices spuriously inflate the presence of positive results in the published literature (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011). Surely ours was not a case to worry about. We had hypothesized it, and the effect was reliable. But we had been discussing reproducibility, and we had declared to our lab mates the importance of replication for increasing certainty of research results. We also had an unusual laboratory situation. For studies that could be run through a web browser, data collection was very easy (Nosek et al., 2007). We could not justify skipping replication on the grounds of feasibility or resource constraints. Finally, the procedure had been created by someone else for another purpose, and we had not laid out our analysis strategy in advance. We could have made analysis decisions that increased the likelihood of obtaining results aligned with our hypothesis. These reasons made it difficult to avoid doing a replication. We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.
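As an aside, the logic behind a power figure like that can be sketched with a quick normal-approximation calculation. The excerpt does not give the original effect size, so the effect of 0.10 (with unit per-observation SD) below is an assumed, illustrative number, not the study's estimate:

```python
from statistics import NormalDist

def power(delta, se, alpha=0.05):
    """Two-sided power for a true effect `delta` estimated with
    standard error `se`, under a normal approximation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z = delta / se
    # probability the estimate lands in either rejection region
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)

# Illustrative (assumed) numbers: an effect of 0.10 with per-observation
# SD of 1.0 and N = 1,300 gives se = 1/sqrt(1300), hence power ~ 0.95.
se_rep = 1.0 / 1300 ** 0.5
print(round(power(0.10, se_rep), 2))  # -> 0.95
```

The point of the sketch is just that power depends entirely on the assumed true effect; plug in a larger effect size and the .995 figure from the story is easily reached.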

Part 3:

The effect vanished (p = .59).

Their paper is all about how to provide incentives for this sort of good behavior, in contrast to the ample incentives that researchers have to publish their tentative findings attached to grandiose claims.

P.S. I wrote this a couple of months ago (as regular readers know, most of the posts on this blog are on a delay of two months or so), but now that it appears I realize it is very relevant to our discussion of statistical significance from the other day.

22 thoughts on “50 shades of gray: A research story”

  1. Disclosure: I didn’t read the whole paper yet, but I grepped it for “bet” and didn’t find any hits. It looks like a great one, though; I like the Tukey quote.

    What if each paper was published with a bet that authors place on the positive association? Then haters could go and replicate to falsify the bet. There’s a whole mega-host of problems associated with this (http://ashokarao.com/2013/07/04/on-bets/), but this seems like a pretty good way to discern papers where authors believe what they publish from, well, those where the “ample incentives” dominate.

    Simple practice: each co-author, on the title page, inserts his odds parenthetically.

    I’m not saying there’s no value in results where the author doesn’t have confidence (though it’d be nice to know exactly what it is). This practice would encourage a focus on what might actually be important for such papers rather than on the results themselves.

    • Ashok:

      It would be easy enough to compute these posterior probabilities. The key is to use a real prior. With a flat prior, even a completely non-stat-signif result that’s 1 se from zero will still give an 84% posterior probability that the effect is positive, while a routine 2 se result yields an implausibly optimistic 97.5% posterior probability. An appropriately zero-centered prior will pull these probabilities toward 50%.

        • Here’s AG from another post: “Here [when I say “real prior”] I’m not talking about a subjective prior that is meant to express a personal belief but rather a distribution that represents a summary of prior scientific knowledge. Such an expression can only be approximate (as, indeed, assumptions such as logistic regressions, additive treatment effects, and all the rest, are only approximations too), and I agree with Senn that it would be rash to let philosophical foundations be a justification for using Bayesian methods. Rather, my work on the philosophy of statistics is intended to demonstrate how Bayesian inference can fit into a falsificationist philosophy that I am comfortable with on general grounds.”
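The posterior probabilities in the comment above can be checked under a normal model. Here is a minimal sketch; the function name is illustrative, and the scale of the zero-centered normal prior is a free choice, not something specified in the post:

```python
from statistics import NormalDist

def prob_positive(estimate, se, prior_sd=None):
    """Posterior probability that a normally estimated effect is
    positive. prior_sd=None means a flat prior; otherwise the prior
    is normal, centered at zero, with standard deviation prior_sd."""
    nd = NormalDist()
    if prior_sd is None:
        post_mean, post_sd = estimate, se
    else:
        w = prior_sd**2 / (prior_sd**2 + se**2)  # shrinkage weight
        post_mean = w * estimate
        post_sd = (w * se**2) ** 0.5             # posterior sd
    return nd.cdf(post_mean / post_sd)

# Flat prior reproduces the numbers in the comment:
print(round(prob_positive(1.0, 1.0), 2))   # 1 se from zero -> 0.84
print(round(prob_positive(1.96, 1.0), 3))  # ~2 se from zero -> 0.975
# A zero-centered prior (here with sd equal to the se) pulls toward 50%:
print(round(prob_positive(1.0, 1.0, prior_sd=1.0), 2))  # -> 0.76
```

The last line shows the pull toward 50% that the comment describes: the informative prior shrinks the posterior mean toward zero faster than it shrinks the posterior spread.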

  2. Pingback: 50 shades of gray: A research story | Symposium Magazine

    • Neat!

      I was involved in something similar, but did not think about _spinning it_ into a methodology paper :-(
      Some regression analysis had found a significant increase in hospital mortality in the wee hours of the morning (which folks had been worried might be the case), but the senior clinician was not convinced and suggested we wait six months, when we would get another year of data. We did, and the effect went away. The junior clinician was furious, pointing out that he could have obtained two publications out of this (the false positive and then the truer negative), and he never worked with us again.

      An interesting point about the Sterling 1959 paper they quote: the topic was suggested by R. A. Fisher.

      And I like their points “Transparency can improve our practices even if no one actually looks, simply because we know that someone could look” and “Openness is not needed because we are untrustworthy; it is needed because we are human”. I tried to get those ideas across here http://statmodeling.stat.columbia.edu/2012/02/12/meta-analysis-game-theory-and-incentives-to-do-replicable-research/

  3. Pingback: Spurious significance, junk science | Frederick Guy

  4. Pingback: 50 shades of gray: A research story | IPOPSOS

  5. Pingback: Edgar Alfonseca » Education attainment, the poverty rate, and Medicaid coverage (NYC) Pt. 3

  6. Pingback: Friday links: reproducibility, toddlers=urban naturalists, resigning your faculty position, and more | Dynamic Ecology

  7. Pingback: Uncategorised links | The Yorkshire Ranter

  8. Word-fail in “The disinterest in replication is striking given its centrality to science” is especially sad given the importance of the sentiment expressed. A “disinterested” party has no conflict of interest—what the writers meant is “lack of interest”.

    Everything in the above paragraph is true except the first word: the word-fail has been fixed in the final version of the paper (link provided in comments by Brian Nosek): “The lack of interest in replication is striking given its centrality to science”. Good work!

  9. Pingback: Links 8/5/13 | Mike the Mad Biologist

  10. Pingback: Friday Links | Meta Rabbit

  11. Pingback: Around the Blogs, Vol. 101: Long Wait, Long List | Symposium Magazine
