A study fails to replicate, but it continues to get referenced as if it had no problems. Communication channels are blocked.

Posted on October 24, 2018 9:16 AM by Andrew

In 2005, Michael Kosfeld, Markus Heinrichs, Paul Zak, Urs Fischbacher, and Ernst Fehr published a paper, “Oxytocin increases trust in humans.” According to Google, that paper has been cited 3389 times.

In 2015, Gideon Nave, Colin Camerer, and Michael McCullough published a paper, “Does Oxytocin Increase Trust in Humans? A Critical Review of Research,” where they reported:

Behavioral neuroscientists have shown that the neuropeptide oxytocin (OT) plays a key role in social attachment and affiliation in nonhuman mammals. Inspired by this initial research, many social scientists proceeded to examine the associations of OT with trust in humans over the past decade. . . . Unfortunately, the simplest promising finding associating intranasal OT with higher trust [that 2005 paper] has not replicated well. Moreover, the plasma OT evidence is flawed by how OT is measured in peripheral bodily fluids. Finally, in recent large-sample studies, researchers failed to find consistent associations of specific OT-related genetic polymorphisms and trust. We conclude that the cumulative evidence does not provide robust convergent evidence that human trust is reliably associated with OT (or caused by it). . . .

Nave et al. has been cited 101 times.

OK, fine. The paper’s only been out 3 years. Let’s look at recent citations, since 2017:

“Oxytocin increases trust in humans”: 377 citations
“Does Oxytocin Increase Trust in Humans? A Critical Review of Research”: 49 citations

OK, I’m not the world’s smoothest googler, so maybe I miscounted a bit. But the pattern is clear: New paper revises consensus, but, even now, old paper gets cited much more frequently.
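
To put rough numbers on that pattern, here is a minimal Python sketch that uses only the counts quoted above. The variable and function names are mine, and the counts are only as accurate as my googling.

```python
# Citation counts as quoted in the post (approximate Google Scholar figures).
counts = {
    "Kosfeld et al. (2005)": {"all_time": 3389, "since_2017": 377},
    "Nave et al. (2015)": {"all_time": 101, "since_2017": 49},
}


def citation_ratio(period):
    """Citations of the original paper per citation of the critical review."""
    original = counts["Kosfeld et al. (2005)"][period]
    review = counts["Nave et al. (2015)"][period]
    return original / review


for period in ("all_time", "since_2017"):
    print(f"{period}: {citation_ratio(period):.1f} to 1")
```

Even restricting to citations since 2017, the original paper is cited roughly eight times for every citation of the critical review.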

Just to be clear, I’m not saying the old paper should be getting zero citations. It may well have made an important contribution for its time, and even if its results don’t stand up to replication, it could be worth citing for historical reasons. But, in that case, you’d want to also cite the 2015 article pointing out that the result did not replicate.

The pattern of citations suggests that, instead, the original finding is still hanging on, with lots of people not realizing the problem.

For example, in my Google scholar search of references since 2017, the first published article that came up was this paper, “Survival of the Friendliest: Homo sapiens Evolved via Selection for Prosociality,” in the Annual Review of Psychology. I searched for the reference and found this sentence:

This may explain increases in trust during cooperative games in subjects that have been given intranasal oxytocin (Kosfeld et al. 2005).

Complete acceptance of the claim, no reference to problems with the study.

My point here is not to pick on the author of this Annual Review paper—even when writing a review article, it can be hard to track down all the literature on every point you’re making—nor is it to criticize Kosfeld et al., who did what they did back in 2005. Not every study replicates; that’s just how things go. If it were easy, it wouldn’t be research. No, I just think it’s sad. There’s so much publication going on that these dead research results fill up the literature and seem to lead to endless confusion. Like a harbor clotted with sunken vessels.

Things can get much uglier when researchers whose studies don’t replicate refuse to admit it. But even if everyone is playing it straight, it can be hard to untangle the research literature. Mistakes have a life of their own.

33 thoughts on “A study fails to replicate, but it continues to get referenced as if it had no problems. Communication channels are blocked.”

Ruben Arslan on October 24, 2018 at 10:27 am said:

This unfortunately doesn’t seem to be an isolated case at all in psychology.
See my brief citation analysis of the studies that did and did not replicate in the Reproducibility Project Psychology: https://rubenarslan.github.io/posts/2018-09-23-are-studies-that-replicate-cited-more/
Not only do citations not predict replicability (even I wasn’t so optimistic), there isn’t even a drop in citations for those studies that didn’t replicate compared to those that did (as we might have expected from the stronger signal of retraction which does lead to a drop). Might not be the best set of studies to look at though (the RPP didn’t particularly highlight individual studies), but you’d think if someone builds on a study replicated in the RPP, they check the details.

- Anonymous on October 25, 2018 at 3:25 pm said:
  
  “Not only do citations not predict replicability (even I wasn’t so optimistic), there isn’t even a drop in citations for those studies that didn’t replicate compared to those that did (as we might have expected from the stronger signal of retraction which does lead to a drop)”
  
  Should you not be familiar with ’em yet, the following two blog posts from 2012 & 2015 by Jussim might be relevant/interesting here:
  
  https://www.psychologytoday.com/us/blog/rabble-rouser/201207/social-psychological-unicorns-do-failed-replications-dispel-scientific
  
  https://www.psychologytoday.com/us/blog/rabble-rouser/201505/slow-nonexistent-scientific-self-correction-in-psychology
  
  I reason psychology is/has been in such a bad state (e.g. publication bias/file-drawing, questionable research practices) that the “input” (e.g. prior reasoning, theorizing, and studies) doesn’t matter much for the “output” (e.g. coming up with, performing, and writing up new studies). I reason this might be reflected by your quote above, and by (the gist of) the two blog posts I linked to.
  
  I tried to find a solution to (among other things) the problem you (and the blog post by Jussim) possibly make clear by trying to make it so that the “input” actually matters for the “output”. I reason this all especially (and perhaps only) begins to matter when some “higher” standards (e.g. higher powered research, pre-registration, using and interpreting “non-significant” findings, etc.) will be used. I thus also reason that it might be especially useful for those researchers that want to adhere to these “higher” standards to work together in relatively small groups of researchers (my guess would be around 5 could be “optimal”).
  
  I reason that a research-and-publication format that focuses not only on “original” studies and “replication” but, perhaps more importantly, on what happens AFTER both of these types of research have been performed, could improve many current problems in psychological science. I also reason that the starting point of a research process (e.g. an “original” or “replication” study) might be less important than what happens after it. You can find the idea/format here: http://statmodeling.stat.columbia.edu/2017/12/17/stranger-than-fiction/#comment-628652
  
Martin Modrák on October 24, 2018 at 12:35 pm said:

Ruben Arslan did a nice exploratory analysis of whether studies that replicate get more citations than studies that don’t. His conclusion is that non-replication is not visible in the citation record: https://rubenarslan.github.io/posts/2018-09-23-are-studies-that-replicate-cited-more/

currentevents on October 24, 2018 at 1:36 pm said:

Replication demands are a form of scientific bullying. Amy Cuddy says that she will not be silenced any more and that her career is thriving

https://twitter.com/amyjccuddy/status/1055146199093862401

She says “I’m speaking up now about things that happened over the last few years because I’ve finally found the courage and sense of purpose to do so. Yes, I want to stand up for myself. I also want to demonstrate that others can do the same — and for each other.”

https://twitter.com/amyjccuddy/status/1054863971902197760
She says “I have a lot of feelings abt how my field, outside @Harvard, failed to stand up to clear bullying of a junior female scientist who’d already made significant contributions to her field and who clearly supported people from under-represented groups in several ways.”

- Andrew on October 24, 2018 at 2:35 pm said:
  
  Currentevents:
  
  Leaving aside questions about whether Cuddy has already made significant contributions to her field—it’s not my field and I’ll leave it to others to judge which contributions are significant and which are not—let me emphasize that I have not ever made a “replication demand.” Indeed, in many situations where replications are possible, I don’t recommend replication as I think the designs are too noisy for replications to tell us anything useful (except for telling us the negative information that the study was so noisy).
  
  The original replication of power pose by Ranehill et al. that got this particular story going was done by people who thought that power pose had a large effect. If you think something has a large effect, it makes sense to replicate the study to understand it further. If you’re skeptical about a study, you could recommend replication, but it can also make sense to recommend an investment in careful measurement instead. In any case, I don’t think it’s appropriate to demand replication. If you really want a study replicated, you can replicate it yourself.
  
  Finally, I fully respect Cuddy’s inclination to speak up for herself. I think that in speaking up she should be careful to accurately represent the actions of others. For example, she wrote that Joseph Simmons and Uri Simonsohn “are flat-out wrong. Their analyses are riddled with mistakes . . .” I never saw Cuddy present any evidence for these claims, and I don’t think it’s true that their analyses were flat-out wrong or riddled with mistakes. To the extent that I’ve inaccurately represented others, I’d like to correct that too.
  
  - John on November 18, 2019 at 11:02 am said:
    
    Have you seen this?
    
    Power Posing Is Back: Amy Cuddy Successfully Refutes Criticism
    Kim Elsesser (now trying to solve women’s issues at work—including the wage gap and sexual harassment. I’ve taught courses on gender for eight years at UCLA, have published in the New York Times and Los Angeles Times and have discussed gender issues on Fox News, NPR and BBC. My book, Sex and the Office also dives in to some of the obstacles holding women back. My unique perspective comes from my varied background. I hold a Ph.D. in Psychology from UCLA and two graduate degrees (management and operations research) from the Massachusetts Institute of Technology (MIT), and an undergrad degree in mathematics and computer science from Vassar College)
    https://www.forbes.com/sites/kimelsesser/2018/04/03/power-posing-is-back-amy-cuddy-successfully-refutes-criticism/#24801b983b8e
    
    P-Curving a More Comprehensive Body of Research on Postural Feedback Reveals Clear Evidential Value for Power-Posing Effects: Reply to Simmons and Simonsohn (2017),
    Psychological Science
    Amy J. C. Cuddy, S. Jack Schultz, Nathan E. Fosse
    First Published March 2, 2018
    https://doi.org/10.1177/0956797617746749
    
    I can’t see any reaction from any of the critics when looking at its Google Scholar citations.
    
    - Andrew on November 18, 2019 at 11:22 am said:
      
      John:
      
      Yes, I’ve seen this. It’s terrible. Regarding the particular claims of the Psychological Science paper, see here. I’ve been told that Psychological Science refused to publish a correction because they didn’t want to deal with the controversy.
    - John on November 18, 2019 at 1:20 pm said:
      
      Thank you so much for the quick reply. I don’t understand, though. Cuddy, Schultz, and Fosse claim to have refuted the criticism of the power pose research in a March 2018 reply using a p-curve analysis that included 55 studies instead of just 34, as was the case in the paper that criticized the study by Carney, Cuddy, and Yap (a bit odd that a reply comes from a different team than the one that did the original paper, but OK). You wrote here in October 2018 that “she wrote that Joseph Simmons and Uri Simonsohn “are flat-out wrong. Their analyses are riddled with mistakes . . .” I never saw Cuddy present any evidence for these claims,” but her reply, which you did not mention to me, seems to remain uncontested, not even in a working paper or a blog note. Now as a reply you point to a new newspaper article that criticizes papers based on magnitude-based inference (MBI). What does that have to do with the power pose research? For their reply on the power poses the researchers did not make use of that method.
    - Andrew on November 18, 2019 at 1:26 pm said:
      
      John:
      
      My bad. I presented an unrelated link that happened to be on my browser. The relevant link is an article by Carol Nickerson linked here. See also this article by Marcus Crede (press release here).
    - John on November 18, 2019 at 2:11 pm said:
      
      Ok, so there is a 2019 study that says they should have compared “power” poses to neutral poses to identify if the effect really comes from the “power” poses. Seems reasonable to me. However, that was not the point of the original criticism, which Cuddy et al claimed to have refuted months before you wrote that she never presented any evidence for her claims even though now you write you knew of the reply. Why did you then claim she never presented any evidence? Had you not seen her reply for more than half a year and only learned about it later?
      The other criticism, which seems to remain unpublished, points out that Cuddy never actually presented the “17 replications” that she claimed confirmed her power posing results, but just 15, and that some of these were not real replications, some found no significant effect, some found effects in the wrong direction, and some confirmed the results only for sub-groups. That is a strong argument, but as long as it is not published there is no reason for Cuddy et al. to react. They can just keep claiming that their Psychological Science reply refuted all criticism and collect articles like those in Forbes, the Boston Globe, the “CBS This Morning” podcast, and elsewhere that help to sell power posing. If even researchers like me have trouble seeing through this, you know how easy it will be to fool journalists.
    - Andrew on November 18, 2019 at 2:28 pm said:
      
      John:
      
      1. I don’t see any evidence that Cuddy presented that Simmons and Simonsohn “are flat-out wrong. Their analyses are riddled with mistakes . . .” The Cuddy/Schultz/Fosse paper that you cite argues that the results of Simmons and Simonsohn change if you include more studies in the meta-analysis. That is not an argument that they were wrong, nor that their analysis was riddled with mistakes. Cuddy didn’t say that the analysis of Simmons and Simonsohn was fine but that it could change if new studies were added to the analysis; she claimed that they “are flat-out wrong. Their analyses are riddled with mistakes . . .”
      
      2. The article by Crede is published. Not published in Psychological Science, in part because the editors of Psychological Science said they didn’t want to publish more on the topic. Other criticisms of power pose have been published; see for example the papers listed here.
      
      3. Nickerson’s unpublished criticism is unpublished because it takes a lot of work to publish these things, and nobody really has the motivation to do this. (Nickerson was ill when she prepared the document and she is no longer alive.) Tracking these things down is just exhausting. I agree that a reporter doing the google will find headlines ranging from “This Simple ‘Power Pose’ Can Change Your Life And Career” to “Power Poses Don’t Actually Work” to “The ‘Power Poses’ That Will Instantly Boost Your Confidence.” It’s really too bad, and it hasn’t helped that Psychological Science published a flawed paper and then refused to publish criticism of it. In that sense, this is part of a larger issue in science and “post-truth” communication more generally. I see no easy solution here.
    - John on November 18, 2019 at 3:18 pm said:
      
      Ok, I totally agree she didn’t convincingly refute the criticism. It would, however, have helped to at least mention her claim to have refuted it in the reply, to understand the discussion. As for the Meta-Psychology publication, I cannot even find that journal in Scopus, so I am afraid it will not be widely read.
      I am very sorry to learn about the loss of Carol Nickerson. Her contribution should not be lost. What would you think would be an appropriate way to take up her findings for a comment?
      As for the larger issue of “post-truth” communication more generally, to which you see no easy solution, I do see an easy solution: Let’s make better use of pages like http://psychfiledrawer.org ! It would be easy to list all replications and replies there, but the only information to be found on this case is the original study and its first replication, and the “discussion” consists only of spam. https://curatescience.org/ doesn’t seem to have anything for the author Cuddy. While the ReplicationWiki http://replication.uni-goettingen.de/ is open to all social sciences, it leaves out psychology because there are projects that specialize in that field, but these are apparently very incomplete even for cases as prominent as this one. We should promote such projects more because they offer a valuable service to the research community, and the effort needed to share one’s insights there is much lower than that needed to publish a comment in a peer-reviewed journal, and the results are much more systematically accessible than posts hidden in different blogs.
    - Andrew on November 18, 2019 at 3:43 pm said:
      
      John:
      
      1. Regarding the Cuddy/Schultz/Fosse paper: my point here is not whether it’s a convincing refutation of anything; my point here is that it does not even claim to argue that Simmons and Simonsohn “are flat-out wrong. Their analyses are riddled with mistakes . . .” What I said was that I’ve never seen Cuddy present any evidence for these claims about Simmons and Simonsohn. A claim that the analysis changes when you add more data is not even a claim of a “mistake,” let alone evidence for it.
      
      2. If you want to read Crede’s article, if you google the title, “A Negative Effect of a Contractive Pose is not Evidence for the Positive Effect of an Expansive Pose: Comment on Cuddy, Schultz, and Fosse (2018),” you can find preprint versions online. Regarding Nickerson’s article: From a scientific standpoint, I don’t see the value in spending more time on power pose. But I recognize that different researchers have different perspectives. I’ve recommended that if people do want to study power pose, they should conduct within-person studies. Is it worth anyone’s time to try to shepherd Nickerson’s paper through the publication process? I don’t know. The world is full of unproven theories whose supporters refuse to let go of them. Nickerson wrote her article out of frustration at seeing the publication of misleading data analysis, but I don’t know that this frustration is enough of a motivation for people to continue following this one up.
      
      3. Yes, more could be done on this. It’s frustrating, though, and at some point exhausting. It takes individuals who are willing to put in the effort to fight the Harvard / Association for Psychological Science machine.
    - John on November 19, 2019 at 6:21 am said:
      
      How does it take individuals who are willing to put in the effort to fight the Harvard / Association for Psychological Science machine just to add information to one of the sites that aggregate replications and that give a much more systematic overview than any blog or journal ever could? Minimal effort is involved, and I don’t think Harvard (which Cuddy has left, as far as I understand) or the Association for Psychological Science (which clearly does a bad job if it refuses to publish a refutation of the reply) will continue fighting in any way.
    - Andrew on November 19, 2019 at 7:54 am said:
      
      John:
      
      If you want to put effort into this, go for it!
- Anonymous on October 24, 2018 at 2:51 pm said:
  
  Another guy who was really bullied was that Martin Shkreli guy. I mean, they actually put him in JAIL when his financial reports of his hedge fund didn’t replicate. Jeez people, he’s already made substantial contributions to the pharma industry by buying up a relatively unknown drug, ensuring that the political system prevented anyone from competing with him even though the patent was gone, and then raising prices to hundreds of times what they used to be, which is good because…. reasons duh.
  
  - Andrew on October 24, 2018 at 4:10 pm said:
    
    Anon, Nick:
    
    Just to be clear, yes, when it comes to power pose and replication issues more generally, I agree with Dana Carney, Anna Dreber, etc., and I do get annoyed when people present, as empirical statements, claims that are not supported by the data (for example, that claim that the effect of power posing on feelings of power has replicated 17 times), but I don’t think that the analogy between Cuddy and Shkreli is so great. Shkreli committed crimes and, at least in the short term, made a drug less available to people, which all seems much much much worse than anything that Cuddy did. I guess that Brian Wansink’s behavior falls somewhere in between: he didn’t mess around with anyone’s drugs, but he did his part to misallocate millions of dollars of public and private funds, which I’d expect to have negative consequences.
    
    - Daniel Lakeland on October 24, 2018 at 7:29 pm said:
      
      I don’t think the point is that Cuddy and Shkreli are equivalent; the point is that pretending Cuddy is being bullied because people want her scientific claims to be replicated is about as outrageous as claiming that Shkreli is being bullied because he said he had millions in his hedge fund when in fact the right number was $35 or whatever.
      
      Demanding that people not make false or misleading claims is not bullying.
- Nick Danger on October 24, 2018 at 3:16 pm said:
  
  “Replication demands are a form of scientific bullying.”
  
  Hmm, I suppose that soon demands that mathematical proofs be logical will be seen as “bullying” as well.
  
  - Martha (Smith) on October 27, 2018 at 1:08 am said:
    
    However, I don’t think that any woman mathematician would see such demands as “bullying” — after all, we expect our students to give logical mathematical proofs, so we’d be accusing ourselves and colleagues of being bullies. (There might be some students who make such bullying claims — they either shape up or flunk.)
    
Will Sorenson on October 24, 2018 at 3:18 pm said:

It seems like the issue we have is a result of the tools we have being limited and a lack of willingness to adopt new tools. What most researchers do when they look for a new paper is go to Google Scholar.

What do they see? They see *all* papers that cite it regardless of the nature of that citation. For papers that have thousands of citations, it is not easy to find the small handful of papers that critically examine the claim of the paper.

Researchers (and normal people!) would be far better served if, when searching google scholar, they see links for two different types of citations instead of one:

– Citations: (References that do not critically examine whether the claim of the paper is true)
– Examinations: (References that do critically examine the validity). Someone should be able to think up a better word than examinations.

So if someone is deciding whether to use a paper as evidence in their own papers, they can quickly look through the citations that are *examinations* and see whether the results have been replicated.
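
As a rough illustration of the split proposed above, here is a minimal Python sketch that sorts citing sentences into the two groups with a keyword heuristic. The cue phrases, function names, and labels are all hypothetical; a real tool would need a much better model than a handful of regular expressions, and this is not how Google Scholar works.

```python
import re

# Hypothetical cue phrases that suggest a citing sentence critically examines a claim.
CRITICAL_CUES = [
    r"fail(?:ed|s)? to replicate",
    r"did not replicate",
    r"not replicated",
    r"could not reproduce",
    r"critical review",
    r"no (?:consistent|robust|reliable) evidence",
]
CUE_PATTERN = re.compile("|".join(CRITICAL_CUES), re.IGNORECASE)


def classify_citation(context):
    """Label a citing sentence as an 'examination' or a plain 'citation'."""
    return "examination" if CUE_PATTERN.search(context) else "citation"


def split_citations(contexts):
    """Group citing sentences into the two lists proposed above."""
    groups = {"citation": [], "examination": []}
    for text in contexts:
        groups[classify_citation(text)].append(text)
    return groups


examples = [
    "This may explain increases in trust during cooperative games in subjects "
    "that have been given intranasal oxytocin (Kosfeld et al. 2005).",
    "The simplest promising finding associating intranasal OT with higher trust "
    "has not replicated well.",
]
for label, texts in split_citations(examples).items():
    print(label, len(texts))
```

A heuristic like this will miss many critical citations and mislabel others, so it only shows the shape of the two-list idea.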

- Mikhail on October 25, 2018 at 1:44 am said:
  
  There is no such thing as bad citations…
  
- Dzhaughn on October 25, 2018 at 1:28 pm said:
  
  I wonder if that is a masters degree level project in machine learning. That would be useful even if it only works 95% of the time.
  
- OliP on November 19, 2019 at 3:41 am said:
  
  +1. This would be a huge step forward for research assessment. Google Scholar also linking to PubPeer comments would be a small step in that direction.
  
Robert Calin-Jageman on October 24, 2018 at 4:49 pm said:

The story of intranasal oxytocin and trust is more than sad; it’s a scientific tragedy.

The original Kosfeld paper drew very strong conclusions from noisy data (p = 0.029 *one-sided* -> title of “Oxytocin increases trust in humans”) and failed to make a key statistical comparison to establish a selective effect on trust.

Kosfeld et al. (2005) spurred a wave of research on intranasal oxytocin and human trust, but most of it was conducted with insufficient sample sizes. Despite this, nearly all published results were statistically significant, a highly implausible state of affairs. Reviewing the literature in 2016, Walum, Waldman & Young concluded: “…intranasal OT studies are generally underpowered and that there is a high probability that most of the published intranasal OT findings do not represent true effects.”

As the published literature became more rife with seemingly positive results, it became nearly impossible to challenge the idea of oxytocin improving trust. One lab which had reported an initial set of ‘successful’ experiments had manuscripts reporting subsequent failures repeatedly rejected on the basis that the effect was now well established. This lab managed to publish just 39% of all the work it had conducted, all of it suggesting positive effects. Pooling over *all* their data, however, suggested little-to-no effect (see Lane et al., 2016).

From this manufactured certainty, numerous clinical trials have been launched trying to improve social function in children with autism through intranasal oxytocin. These trials have not yet yielded strong evidence of a benefit. Unfortunately, almost 40% of the 261 children so far treated with oxytocin have suffered an adverse event (DeMayo et al. 2017; compared with only 12% of the 170 children assigned a placebo). Thankfully, most (93) of these adverse events were mild; but 6 were moderate, and 3 severe.
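
For readers who want to check the arithmetic, here is a minimal Python sketch using only the figures quoted above. It assumes each adverse event corresponds to one child, which the counts imply but which is not stated outright. Only a percentage is quoted for the placebo group, so the placebo count is not reconstructed, and the risk ratio is a crude figure with no uncertainty attached.

```python
# Figures quoted in the paragraph above (attributed there to DeMayo et al. 2017).
mild, moderate, severe = 93, 6, 3
treated_n = 261
placebo_rate = 0.12  # only the percentage is quoted, not the underlying count

treated_events = mild + moderate + severe
treated_rate = treated_events / treated_n
risk_ratio = treated_rate / placebo_rate  # crude ratio, no confidence interval

print(f"Oxytocin: {treated_events}/{treated_n} = {treated_rate:.1%}")
print(f"Placebo: {placebo_rate:.0%}")
print(f"Crude risk ratio: {risk_ratio:.1f}")
```

The three severity counts sum to 102 of 261, which matches the "almost 40%" figure.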

Here’s the kicker for this story: oxytocin delivered through the nose is probably inert in terms of brain function; it may not be able to pass through the blood brain barrier (reviewed by Leng & Ludwig, 2016). Some still dispute this, but it seems likely that the very large literature claiming behavioral effects of intranasal oxytocin on human behavior is completely and totally spurious. It’s been a colossal waste of money and time. It gave false hope to those with autism and needlessly harmed clinical trial participants. And the nightmare drags on as oxytocin->trust is *still* being cited and marketed as well-established science.

It should have been possible to know better and do better… but somehow the illusions of certainty held up.

If anyone is interested, I have a post-mortem of this field in press at The American Statistician; a pre-print is here: https://psyarxiv.com/3mztg/

And here are the key references:

DeMayo, M. M., Song, Y. J. C., Hickie, I. B., and Guastella, A. J. (2017), “A Review of the Safety, Efficacy and Mechanisms of Delivery of Nasal Oxytocin in Children: Therapeutic Potential for Autism and Prader-Willi Syndrome, and Recommendations for Future Research,” Pediatric Drugs, Springer International Publishing, 19, 391–410. https://doi.org/10.1007/s40272-017-0248-y.

Lane, A., Luminet, O., Nave, G., and Mikolajczak, M. (2016), “Is there a Publication Bias in Behavioural Intranasal Oxytocin Research on Humans? Opening the File Drawer of One Laboratory,” Journal of Neuroendocrinology, 28. https://doi.org/10.1111/jne.12384.

Leng, G., and Ludwig, M. (2016), “Intranasal Oxytocin: Myths and Delusions,” Biological Psychiatry, 79, 243–250. https://doi.org/10.1016/j.biopsych.2015.05.003.

Walum, H., Waldman, I. D., and Young, L. J. (2016), “Statistical and Methodological Considerations for the Interpretation of Intranasal Oxytocin Studies,” Biological Psychiatry, Elsevier, 79, 251–257. https://doi.org/10.1016/j.biopsych.2015.06.016.

- Mikhail on October 29, 2018 at 1:59 am said:
  
  oh, this is depressing
  
Dan F. on October 25, 2018 at 3:34 am said:

The fundamental, basic premise of the use of citation counts to measure research productivity is that citation count correlates with article quality.

However, there is much evidence that the contrary is true. The article mentioned in this post is an extreme example, but perhaps it is the case that many (most?) highly cited articles are easily accessible to those who know little and of little profundity. This does not contradict the claim that very useful articles and very deep articles are sometimes highly cited. The question is one of inference, of the simplest kind, and a good illustration that naive, pure thought reasoning can lead to bad priors.

Most administrators would infer from 3000+ citations that an article was truly important and its authors worthy of promotion, pay raises, and other cheaper sorts of adoration. However, perhaps the correct inference, absent other information, is that they are charlatans, self-promoters, and cheaters.

This issue needs far more attention than it is usually given.

Mikhail on October 25, 2018 at 5:32 am said:

Isn’t replication the simplest form of statistical inference? After your study produces results, you want to check how reliable these results are. And the simplest way to do it is to run the whole study again.

But during the last 100 years we have developed a lot of advanced statistical tools so you don’t need to waste your time performing every study multiple times. All you need is to ask your “appropriate statistical inference” how reliable your results are, and “statistical inference” would reply something like “wow, this is totally sure”, “maybe a little bit uncertain” or “you have not learned anything”. And then you go and collect more data or do additional experiments, depending on where your uncertainty is.

But it is not how it works today. Your “statistical inference” tells you that p<0.000001, and this is interpreted as "your results are as reliable as 1+1=2". And if you really trust your statistics, demands for replication look like bullying indeed.

Jonathan (another one) on October 25, 2018 at 10:35 am said:

“Data suggest that problematic research was approvingly cited more frequently after the problem was publicized.”
http://www.gsood.com/research/papers/error.pdf

Andrew [not Gelman] on October 25, 2018 at 4:00 pm said:

You should check out “sans forgetica”, the font designed to boost memory: https://www.washingtonpost.com/business/2018/10/05/introducing-sans-forgetica-font-designed-boost-your-memory/?noredirect=on&utm_term=.cbe85e8e4972

You should also check out these recent review papers:
1) Meyer, A., Frederick, S., Burnham, T. C., Guevara Pinto, J. D., Boyer, T. W., Ball, L. J., … & Schuldt, J. P. (2015). Disfluent fonts don’t help people solve math problems. Journal of Experimental Psychology: General, 144(2), e16.

2) Xie, H., Zhou, Z., & Liu, Q. (2018). Null Effects of Perceptual Disfluency on Learning Outcomes in a Text-Based Educational Context: a Meta-analysis.

Oliver C. Schultheiss on October 29, 2018 at 3:58 pm said:

Robert (further up) already referred readers to the excellent paper by Lane et al:

Lane, A., Luminet, O., Nave, G., and Mikolajczak, M. (2016), “Is there a Publication Bias in Behavioural Intranasal Oxytocin Research on Humans? Opening the File Drawer of One Laboratory,” Journal of Neuroendocrinology, 28. https://doi.org/10.1111/jne.12384.

For a further analysis of the state of the field in social neuroendocrinology, including an update on power posing, a self-critical review of my own research in this area and, perhaps most importantly, pointers to how things can get better in the future, this chapter may be of interest:

Schultheiss, O. C., & Mehta, P. H. (in press). Reproducibility in social neuroendocrinology: Past, present, and future. In O. C. Schultheiss & P. H. Mehta (Eds.), Routledge international handbook of social neuroendocrinology. Abingdon, UK: Routledge. Preprint URL: http://www.psych2.phil.uni-erlangen.de/%7Eoschult/humanlab/publications/Schultheiss_Mehta_in_press.pdf

The book also features many chapters that paint a more nuanced picture of oxytocin and social behavior (hint: it can also be associated with increased aggression!). The bottom line is that oxytocin is unlikely to be viewed as a “cuddle hormone” in the future. Just like cortisol is not simply the “stress hormone” or testosterone the “dominance hormone”. Psychophysiological measures are unlikely to have such a simple 1-to-1 relationship with psychological constructs and should have never been portrayed in this manner.

Michael Nelson on October 29, 2018 at 5:10 pm said:

The solution here is remarkably simple and relatively easy to implement. There just needs to be a wiki, edited and contributed to by individuals with relevant and verified qualifications in the given field, with “provenance” pages for peer-reviewed papers. The initial objective would be to have a page for every paper that has a minimum of n citations, with n starting large and decreasing as more people contribute. Each entry would only need to provide three types of information for its paper: citations to prior papers that developed the concept being extended or limited in the current paper, citations to direct criticism or praise of the current paper, and citations to later papers that also extended or limited the concept in the current paper. Contributors could then embellish pages with narrative summaries, timelines, notes on citations, drop-down boxes for seeing the “family tree” in detail, links to other wiki articles, etc. Keep it open source, have professors encourage their grad students to read and contribute, and encourage researchers and journal reviewers to use it as a resource. Ideally, the main incentive for using it would be the fear of embarrassment of citing a paper that one’s colleagues can easily see is outdated.

This sort of model (minus the wiki format) is already used in the antiquities field, where knowing the provenance or biography of a piece can be the difference between making 1 dollar or 1 million. Under our current system, the incentive is reversed: journals are the only ones making bank on old information, and they make more money when related research is scattered across a hundred pay-walled journals. I suspect researchers have tolerated it because ambiguity as to the validity of a widely-cited paper can benefit a famous author by allowing his or her work to eclipse later critiques or contradictions. Incentives are now moving in the opposite direction for researchers–if everyone loses faith in science, whence the famous scientist?–and the technology is here. Hopefully this type of solution either is or will soon be implemented.

squirrel on April 11, 2019 at 1:49 am said:

When writing a research paper, *particularly* when 80% of your field’s results are not reproducible (psychology) (partly because of the difficulties of controlling and getting good sample sizes), the stats on EVERYTHING you cite should be checked and caveats to prior research should be stated plainly. You can’t depend on someone else to try to reproduce things, nor count on finding every attempt at reproduction: your literature search has to be as extensive as you can make it, but looking at *every* paper on a subject is impossible. Thousands of papers are published every day. You should cite a finding as support for a hypothesis only if, by your judgment, the statistics are reasonably good. That goes double if it is being used to support experimentation on human subjects.

Note that by no means is a citation a direct support for a particular paper — many citations are for prior work in the subject, and it is common to mention this background with respect to what it does not do, so that you can fill that hole. You can even get dinged (not published) for not citing well-known papers on a subject, to show that you know the background.

It sounds like, with this oxytocin phenomenon, there were multiple studies potentially supporting it. But, again, I am surprised that (it sounds like) there isn’t a more rigorous standard of background support required for psychology/neurobiology clinical trials. Especially when dealing with populations absolutely desperate for solutions, as many folks with problems in psychology and neurology (and cancer and asthma, etc.) are.

A Master’s project for finding review/critical vs. experimental papers in a subject…. not that I would dissuade anyone from playing with it, but — the whole field of biomedical NLP struggles to combine and compare results from different papers, and figure out what the hell kind of paper they are. Automated classification of papers is surprisingly and exceedingly difficult, due to the flexibility of English, and of paper structure. You might be able to sort based on the word “review” in the title, but would certainly miss a huge number of reviews, and mis-classify non-reviews, of course. I’d think that to have this idea in Google work, you’d have to have the author or journal give each paper a tag upon publication. But prove me wrong!
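To make the title-keyword point concrete, here is a minimal sketch of that naive sorting rule. The function name is made up for illustration, the first three titles are papers mentioned in this thread, and the last title is hypothetical (imagine an empirical study, not a review).

```python
import re


def looks_like_review(title: str) -> bool:
    """Naive rule: call a paper a review if 'review' appears as a word in its title."""
    return re.search(r"\breview\b", title, flags=re.IGNORECASE) is not None


titles = [
    # A review, correctly flagged.
    "Does Oxytocin Increase Trust in Humans? A Critical Review of Research",
    # An experimental paper, correctly not flagged.
    "Oxytocin increases trust in humans",
    # A meta-analysis: review-type work, but the keyword is absent, so it is missed.
    "Null Effects of Perceptual Disfluency on Learning Outcomes in a "
    "Text-Based Educational Context: a Meta-analysis",
    # Hypothetical title of an empirical study; the keyword gets it mis-flagged.
    "An experiment on how peer review affects publication decisions",
]

for title in titles:
    print(looks_like_review(title), "-", title)
```

Even on four titles the rule misses one review-type paper and mis-flags one non-review, which is the failure mode described above.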

The provenance idea…. it is good, but it sounds insanely time-consuming to collect the legacy of each paper. No one would do it. People would argue over who would be a “verifiable expert,” and how they would be verified, and more time yet would be needed to verify them. It sounds like a different version of peer review, which obviously isn’t weeding out the statistically bad stuff anyway. Maybe you have an idea about how to make it more palatable/doable, though…..


Statistical Modeling, Causal Inference, and Social Science
