What has happened down here is the winds have changed

Someone sent me this article by psychology professor Susan Fiske, scheduled to appear in the APS Observer, a magazine of the Association for Psychological Science. The article made me a little bit sad, and I was inclined to just keep my response short and sweet, but then it seemed worth the trouble to give some context.

I’ll first share the article with you, then give my take on what I see as the larger issues. The title and headings of this post allude to the fact that the replication crisis has redrawn the topography of science, especially in social psychology, and I can see that to people such as Fiske who’d adapted to the earlier lay of the land, these changes can feel catastrophic.

I will not be giving any sort of point-by-point refutation of Fiske’s piece, because it’s pretty much all about internal goings-on within the field of psychology (careers, tenure, smear tactics, people trying to protect their labs, public-speaking sponsors, career-stage vulnerability), and I don’t know anything about this, as I’m an outsider to psychology and I’ve seen very little of this sort of thing in statistics or political science. (Sure, dirty deeds get done in all academic departments but in the fields with which I’m familiar, methods critiques are pretty much out in the open and the leading figures in these fields don’t seem to have much problem with the idea that if you publish something, then others can feel free to criticize it.)

I don’t know enough about the academic politics of psychology to comment on most of what Fiske writes about, so what I’ll mostly be talking about is how her attitudes, distasteful as I find them both in substance and in expression, can be understood in light of the recent history of psychology and its replication crisis.

Here’s Fiske:

[images of Fiske’s article]

In short, Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth. I’ve written elsewhere on my problems with this attitude—in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work.

Clouds roll in from the north and it started to rain

To understand Fiske’s attitude, it helps to realize how fast things have changed. As of five years ago—2011—the replication crisis was barely a cloud on the horizon.

Here’s what I see as the timeline of important events:

1960s-1970s: Paul Meehl argues that the standard paradigm of experimental psychology doesn’t work, that “a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of ‘an integrated research program,’ without ever once refuting or corroborating so much as a single strand of the network.”

Psychologists all knew who Paul Meehl was, but they pretty much ignored his warnings. For example, Robert Rosenthal wrote an influential paper on the “file drawer problem,” but if anything this distracts from the larger problems of the find-statistical-significance-any-way-you-can-and-declare-victory paradigm.

1960s: Jacob Cohen studies statistical power, spreading the idea that design and data collection are central to good research in psychology, and culminating in his book, Statistical Power Analysis for the Behavioral Sciences. The research community incorporates Cohen’s methods and terminology into its practice but sidesteps the most important issue by drastically overestimating real-world effect sizes.
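
To see why the overestimated effect sizes matter so much, here is a toy power calculation (a sketch of mine, using a normal approximation, not anything from Cohen’s book): the same design that looks adequately powered under an optimistic “large” effect is nearly hopeless under a realistic small one.

```python
# Toy power calculation (normal approximation to the two-sample t-test).
# The effect sizes and sample size below are illustrative, not taken from any real study.
from scipy.stats import norm

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test for standardized effect size d."""
    ncp = d * (n_per_group / 2) ** 0.5        # noncentrality under the alternative
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

n = 20
print(approx_power(0.8, n))   # roughly 0.7: looks fine if you assume a "large" effect
print(approx_power(0.2, n))   # roughly 0.1: the same design under a realistic small effect
```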

1971: Tversky and Kahneman write “Belief in the law of small numbers,” one of their first studies of persistent biases in human cognition. This early work focuses on researchers’ misunderstanding of uncertainty and variation (particularly but not limited to p-values and statistical significance), but they and their colleagues soon move into more general lines of inquiry and don’t fully recognize the implications of their work for research practice.

1980s-1990s: Null hypothesis significance testing becomes increasingly controversial within the world of psychology. Unfortunately this was framed more as a methods question than a research question, and I think the idea was that research protocols are just fine, all that’s needed was a tweaking of the analysis. I didn’t see general airing of Meehl-like conjectures that much published research was useless.

2006: I first hear about the work of Satoshi Kanazawa, a sociologist who published a series of papers with provocative claims (“Engineers have more sons, nurses have more daughters,” etc.), each of which turns out to be based on some statistical error. I was of course already aware that statistical errors exist, but I hadn’t fully come to terms with the idea that this particular research program, and others like it, were dead on arrival because of too low a signal-to-noise ratio. It still seemed a problem with statistical analysis, to be resolved one error at a time.

2008: Edward Vul, Christine Harris, Piotr Winkielman, and Harold Pashler write a controversial article, “Voodoo correlations in social neuroscience,” arguing not just that some published papers have technical problems but also that these statistical problems are distorting the research field, and that many prominent published claims in the area are not to be trusted. This is moving into Meehl territory.

2008 also saw the start of the blog Neuroskeptic, which started with the usual soft targets (prayer studies, vaccine deniers), then started to criticize science hype (“I’d like to make it clear that I’m not out to criticize the paper itself or the authors . . . I think the data from this study are valuable and interesting – to a specialist. What concerns me is the way in which this study and others like it are reported, and indeed the fact that they are reported as news at all”), but soon moved to larger criticisms of the field. I don’t know that the Neuroskeptic blog per se was such a big deal but it’s symptomatic of a larger shift of science-opinion blogging away from traditional political topics toward internal criticism.

2011: Joseph Simmons, Leif Nelson, and Uri Simonsohn publish a paper, “False-positive psychology,” in Psychological Science introducing the useful term “researcher degrees of freedom.” Later they come up with the term p-hacking, and Eric Loken and I speak of the garden of forking paths to describe the processes by which researcher degrees of freedom are employed to attain statistical significance. The paper by Simmons et al. is also notable in its punning title, not just questioning the claims of the subfield of positive psychology but also mocking it. (Correction: Uri emailed to inform me that their paper actually had nothing to do with the subfield of positive psychology and that they intended no such pun.)
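
The forking-paths point is easy to demonstrate by simulation. Here is a minimal sketch (my own toy example, not code from Simmons et al.): the data are pure noise, but the analyst gets to choose among a handful of defensible-looking analyses and report whichever one reaches p < 0.05.

```python
# Toy simulation of researcher degrees of freedom: there is no true effect anywhere,
# but several "reasonable" analyses are available for each dataset.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def one_study(n=30):
    """Return True if any of the forks below reaches p < 0.05 on pure-noise data."""
    g0 = rng.normal(size=n)                      # control group
    g1 = rng.normal(size=n)                      # treatment group (no real effect)
    f0 = rng.integers(0, 2, n).astype(bool)      # an incidental covariate, e.g. gender
    f1 = rng.integers(0, 2, n).astype(bool)
    forks = [
        ttest_ind(g0, g1).pvalue,                            # the analysis as planned
        ttest_ind(g0[f0], g1[f1]).pvalue,                    # "the effect holds only in women"
        ttest_ind(g0[~f0], g1[~f1]).pvalue,                  # "... only in men"
        ttest_ind(g0[abs(g0) < 2], g1[abs(g1) < 2]).pvalue,  # after dropping "outliers"
    ]
    return min(forks) < 0.05

print(np.mean([one_study() for _ in range(2000)]))   # well above the nominal 0.05
```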

That same year, Simonsohn also publishes a paper shooting down the dentist-named-Dennis paper, not a major moment in the history of psychology but important to me because that was a paper whose conclusions I’d uncritically accepted when it had come out. I too had been unaware of the fundamental weakness of so much empirical research.

2011: Daryl Bem publishes his article, “Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect,” in a top journal in psychology. Not too many people thought Bem had discovered ESP but there was a general impression that his work was basically solid, and thus this was presented as a concern for psychology research. For example, the New York Times reported:

The editor of the journal, Charles Judd, a psychologist at the University of Colorado, said the paper went through the journal’s regular review process. “Four reviewers made comments on the manuscript,” he said, “and these are very trusted people.”

In retrospect, Bem’s paper had huge, obvious multiple comparisons problems—the editor and his four reviewers just didn’t know what to look for—but back in 2011 we weren’t so good at noticing this sort of thing.
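
The arithmetic behind this kind of multiple comparisons problem is simple. Here is a back-of-the-envelope calculation (mine, not an analysis of Bem’s actual data) of the chance of getting at least one “p < 0.05” result from k independent comparisons when nothing at all is going on:

```python
# Probability of at least one "significant" result among k independent null tests.
for k in (1, 5, 10, 20):
    print(k, round(1 - 0.95 ** k, 2))   # 1: 0.05, 5: 0.23, 10: 0.4, 20: 0.64
```

With multiple outcomes, experiments, and interactions to choose from, something at p < .05 is close to guaranteed, and that is before any forking in the data processing or analysis.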

At this point, certain earlier work was seen to fit into this larger pattern, that certain methodological flaws in standard statistical practice were not merely isolated mistakes or even patterns of mistakes, but that they could be doing serious damage to the scientific process. Some relevant documents here are John Ioannidis’s 2005 paper, “Why most published research findings are false,” and Nicholas Christakis’s and James Fowler’s paper from 2007 claiming that obesity is contagious. Ioannidis’s paper is now a classic, but when it came out I don’t think most of us thought through its larger implications; the paper by Christakis and Fowler is no longer being taken seriously but back in the day it was a big deal. My point is, these events from 2005 and 2007 fit into our storyline but were not fully recognized as such at the time. It was Bem, perhaps, who kicked us all into the realization that bad work could be the rule, not the exception.

So, as of early 2011, there’s a sense that something’s wrong, but it’s not so clear to people how wrong things are, and observers (myself included) remain unaware of the ubiquity, indeed the obviousness, of fatal multiple comparisons problems in so much published research. Or, I should say, the deadly combination of weak theory being supported almost entirely by statistically significant results which themselves are the product of uncontrolled researcher degrees of freedom.

2011: Various episodes of scientific misconduct hit the news. Diederik Stapel is kicked out of the psychology department at Tilburg University and Marc Hauser leaves the psychology department at Harvard. These and other episodes bring attention to the Retraction Watch blog. I see a connection between scientific fraud, sloppiness, and plain old incompetence: in all cases I see researchers who are true believers in their hypotheses, which in turn are vague enough to support any evidence thrown at them. Recall Clarke’s Law.

2012: Gregory Francis publishes “Too good to be true,” leading off a series of papers arguing that repeated statistically significant results (that is, standard practice in published psychology papers) can be a sign of selection bias. PubPeer starts up.

2013: Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek, Jonathan Flint, Emma Robinson, and Marcus Munafo publish the article, “Power failure: Why small sample size undermines the reliability of neuroscience,” which closes the loop from Cohen’s power analysis to Meehl’s more general despair, with the connection being selection and overestimates of effect sizes.
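
That connection is easy to see in a toy simulation (my sketch, not anything from Button et al.): when power is low, the estimates that happen to clear p < 0.05 are, on average, far larger than the true effect.

```python
# Toy simulation of selection plus overestimation in a low-power design.
# True standardized effect is 0.2, with 20 subjects per group; all numbers are illustrative.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_effect, n, n_sims = 0.2, 20, 5000
significant_estimates = []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    if ttest_ind(treated, control).pvalue < 0.05:
        significant_estimates.append(treated.mean() - control.mean())

print(len(significant_estimates) / n_sims)   # power: roughly 0.1
print(np.mean(significant_estimates))        # roughly 0.7, far above the true 0.2
```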

Around this time, people start sending me bad papers that make extreme claims based on weak data. The first might have been the one on ovulation and voting, but then we get ovulation and clothing, fat arms and political attitudes, and all the rest. The term “Psychological-Science-style research” enters the lexicon.

Also, the replication movement gains steam and a series of high-profile failed replications come out. First there’s the entirely unsurprising lack of replication of Bem’s ESP work—Bem himself wrote a paper claiming successful replication, but his meta-analysis included various studies that were not replications at all—and then came the unsuccessful replications of embodied cognition, ego depletion, and various other respected findings from social psychology.

2015: Many different concerns with research quality and the scientific publication process converge in the “power pose” research of Dana Carney, Amy Cuddy, and Andy Yap, which received adoring media coverage but which suffered from the now-familiar problems of massive uncontrolled researcher degrees of freedom (see this discussion by Uri Simonsohn), and which failed to reappear in a replication attempt by Eva Ranehill, Anna Dreber, Magnus Johannesson, Susanne Leiberg, Sunhae Sul, and Roberto Weber.

Meanwhile, the prestigious Proceedings of the National Academy of Sciences (PPNAS) gets into the game, publishing really bad, fatally flawed papers on media-friendly topics such as himmicanes, air rage, and “People search for meaning when they approach a new decade in chronological age.” These particular articles were all edited by “Susan T. Fiske, Princeton University.” Just when the news was finally getting out about researcher degrees of freedom, statistical significance, and the perils of low-power studies, PPNAS jumps in. Talk about bad timing.

2016: Brian Nosek and others organize a large collaborative replication project. Lots of prominent studies don’t replicate. The replication project gets lots of attention among scientists and in the news, moving psychology, and maybe scientific research, down a notch when it comes to public trust. There are some rearguard attempts to pooh-pooh the failed replication but they are not convincing.

Late 2016: We have now reached the “emperor has no clothes” phase. When seemingly solid findings in social psychology turn out not to replicate, we’re no longer surprised.

Rained real hard and it rained for a real long time

OK, that was a pretty detailed timeline. But here’s the point. Almost nothing was happening for a long time, and even after the first revelations and theoretical articles you could still ignore the crisis if you were focused on your research and other responsibilities. Remember, as late as 2011, even Daniel Kahneman was saying of priming studies that “disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.”

Then, all of a sudden, the world turned upside down.

If you’d been deeply invested in the old system, it must be pretty upsetting to think about change. Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to. What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on. She’s got tenure and she’s got the keys to PPNAS, so she could do it. Short term, though, I guess it’s a lot more comfortable for her to rant about replication terrorists and all that.

Six feet of water in the streets of Evangeline

Who is Susan Fiske and why does she think there are methodological terrorists running around? I can’t be sure about the latter point because she declines to say who these terrorists are or point to any specific acts of terror. Her article provides exactly zero evidence but instead gives some uncheckable half-anecdotes.

I first heard of Susan Fiske because her name was attached as editor to the aforementioned PPNAS articles on himmicanes, etc. So, at least in some cases, she’s a poor judge of social science research.

Or, to put it another way, she’s living in 2016 but she’s stuck in 2006-era thinking. Back 10 years ago, maybe I would’ve fallen for the himmicanes and air rage papers too. I’d like to think not, but who knows? Following Simonsohn and others, I’ve become much more skeptical about published research than I used to be. It’s taken a lot of us a lot of time to move to the position where Meehl was standing, fifty years ago.

Fiske’s own published work has some issues too. I make no statement about her research in general, as I haven’t read most of her papers. What I do know is what Nick Brown sent me:

For an assortment of reasons, I [Brown] found myself reading this article one day: This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005). . . .

This paper was just riddled through with errors. First off, its main claims were supported by t statistics of 5.03 and 11.14 . . . ummmmm, upon recalculation the values were actually 1.8 and 3.3. So one of the claims wasn’t even “statistically significant” (thus, under the rules, was unpublishable).

But that wasn’t the worst of it. It turns out that some of the numbers reported in that paper just couldn’t have been correct. It’s possible that the authors were doing some calculations wrong, for example by incorrectly rounding intermediate quantities. Rounding error doesn’t sound like such a big deal, but it can supply a useful set of “degrees of freedom” to allow researchers to get the results they want, out of data that aren’t readily cooperating.

There’s more at the link. The short story is that Cuddy, Norton, and Fiske made a bunch of data errors—which is too bad, but such things happen—and then when the errors were pointed out to them, they refused to reconsider anything. Their substantive theory is so open-ended that it can explain just about any result, any interaction in any direction.

And that’s why the authors’ claim that fixing the errors “does not change the conclusion of the paper” is both ridiculous and all too true. It’s ridiculous because one of the key claims is entirely based on a statistically significant p-value that is no longer there. But the claim is true because the real “conclusion of the paper” doesn’t depend on any of its details—all that matters is that there’s something, somewhere, that has p less than .05, because that’s enough to make publishable, promotable claims about “the pervasiveness and persistence of the elderly stereotype” or whatever else they want to publish that day.

When the authors protest that none of the errors really matter, it makes you realize that, in these projects, the data hardly matter at all.
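
To see how the rounding issue Brown describes can matter, here is a toy recalculation with made-up summary statistics (nothing from the actual Cuddy, Norton, and Fiske paper): rounding the group means and standard deviations to one decimal place before computing t is enough to push a non-significant comparison over the threshold.

```python
# Hypothetical summary statistics, chosen only to illustrate the rounding point.
from math import sqrt
from scipy.stats import t as t_dist

def two_sample_t(m1, s1, n1, m2, s2, n2):
    """t statistic and two-sided p-value computed from group summary statistics."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t_stat = (m1 - m2) / se
    df = n1 + n2 - 2                      # crude degrees of freedom, fine for illustration
    return t_stat, 2 * t_dist.sf(abs(t_stat), df)

print(two_sample_t(3.46, 0.72, 20, 3.04, 0.68, 20))  # t ~ 1.90, p ~ 0.065
print(two_sample_t(3.5,  0.7,  20, 3.0,  0.7,  20))  # t ~ 2.26, p ~ 0.030 after rounding
```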

Why do I go into all this detail? Is it simply mudslinging? Fiske attacks science reformers, so science reformers slam Fiske? No, that’s not the point. The issue is not Fiske’s data processing errors or her poor judgment as journal editor; rather, what’s relevant here is that she’s working within a dead paradigm. A paradigm that should’ve been dead back in the 1960s when Meehl was writing on all this, but which in the wake of Simonsohn, Button et al., Nosek et al., is certainly dead today. It’s the paradigm of the open-ended theory, of publication in top journals and promotion in the popular and business press, based on “p less than .05” results obtained using abundant researcher degrees of freedom. It’s the paradigm of the theory that in the words of sociologist Jeremy Freese, is “more vampirical than empirical—unable to be killed by mere data.” It’s the paradigm followed by Roy Baumeister and John Bargh, two prominent social psychologists who were on the wrong end of some replication failures and just can’t handle it.

I’m not saying that none of Fiske’s work would replicate or that most of it won’t replicate or even that a third of it won’t replicate. I have no idea; I’ve done no survey. I’m saying that the approach to research demonstrated by Fiske in her response to criticism of that work of hers is a style that, ten years ago, was standard in psychology but is not so much anymore. So again, her discomfort with the modern world is understandable.

Fiske’s collaborators and former students also seem to show similar research styles, favoring flexible hypotheses, proof-by-statistical-significance, and an unserious attitude toward criticism.

And let me emphasize here that, yes, statisticians can play a useful role in this discussion. If Fiske etc. really hate statistics and research methods, that’s fine; they could try to design transparent experiments that work every time. But, no, they’re the ones justifying their claims using p-values extracted from noisy data, they’re the ones rejecting submissions from PPNAS because they’re not exciting enough, they’re the ones who seem to believe just about anything (e.g., the claim that women were changing their vote preferences by 20 percentage points based on the time of the month) if it has a “p less than .05” attached to it. If that’s the game you want to play, then methods criticism is relevant, for sure.

The river rose all day, the river rose all night

Errors feed upon themselves. Researchers who make one error can follow up with more. Once you don’t really care about your numbers, anything can happen. Here’s a particularly horrible example from some researchers whose work was questioned:

Although 8 coding errors were discovered in Study 3 data and this particular study has been retracted from that article, as I show in this article, the arguments being put forth by the critics are untenable. . . . Regarding the apparent errors in Study 3, I find that removing the target word stems SUPP and CE do not influence findings in any way.

Hahaha, pretty funny. Results are so robust to 8 coding errors! Also amusing that they retracted Study 3 but they still can’t let it go. See also here.

I’m reminded of the notorious “gremlins” paper by Richard Tol which ended up having almost as many error corrections as data points—no kidding!—but none of these corrections was enough for him to change his conclusion. It’s almost as if he’d decided on that ahead of time. And, hey, it’s fine to do purely theoretical work, but then no need to distract us with data.

Some people got lost in the flood

Look. I’m not saying these are bad people. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.

In her article that was my excuse to write this long post, Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.

Some people got away alright

The other thing that’s sad here is how Fiske seems to have felt the need to compromise her own principles here. She deplores “unfiltered trash talk,” “unmoderated attacks” and “adversarial viciousness” and insists on the importance of “editorial oversight and peer review.” According to Fiske, criticisms should be “most often in private with a chance to improve (peer review), or at least in moderated exchanges (curated comments and rebuttals).” And she writes of “scientific standards, ethical norms, and mutual respect.”

But Fiske expresses these views in an unvetted attack in an unmoderated forum with no peer review or opportunity for comments or rebuttals, meanwhile referring to her unnamed adversaries as “methodological terrorists.” Sounds like unfiltered trash talk to me. But, then again, I haven’t seen Fiske on the basketball court so I really have no idea what she sounds like when she’s really trash talkin’.

I bring this up not in the spirit of gotcha, but rather to emphasize what a difficult position Fiske is in. She’s seeing her professional world collapsing—not at a personal level, I assume she’ll keep her title as the Eugene Higgins Professor of Psychology and Professor of Public Affairs at Princeton University for as long as she wants—but her work and the work of her friends and colleagues is being questioned in a way that no one could’ve imagined ten years ago. It’s scary, and it’s gotta be a lot easier for her to blame some unnamed “terrorists” than to confront the gaps in her own understanding of research methods.

To put it another way, Fiske and her friends and students followed a certain path which has given them fame, fortune, and acclaim. Question the path, and you question the legitimacy of all that came from it. And that can’t be pleasant.

The river have busted through clear down to Plaquemines

Fiske is annoyed with social media, and I can understand that. She’s sitting at the top of traditional media. She can publish an article in the APS Observer and get all this discussion without having to go through peer review; she has the power to approve articles for the prestigious Proceedings of the National Academy of Sciences; work by herself and her colleagues is featured in national newspapers, TV, radio, and even TED talks, or so I’ve heard. Top-down media are Susan Fiske’s friend. Social media, though, she has no control over. That must be frustrating, and as a successful practitioner of traditional media myself (yes, I too have published in scholarly journals), I too can get annoyed when newcomers circumvent the traditional channels of publication. People such as Fiske and myself spend our professional lives building up a small fortune of coin in the form of publications and citations, and it’s painful to see that devalued, or to think that there’s another sort of scrip in circulation that can buy things that our old-school money cannot.

But let’s forget about careers for a moment and instead talk science.

When it comes to pointing out errors in published work, social media have been necessary. There just has been no reasonable alternative. Yes, it’s sometimes possible to publish peer-reviewed letters in journals criticizing published work, but it can be a huge amount of effort. Journals and authors often apply massive resistance to bury criticisms.

There’s also this discussion which is kinda relevant:

What do I like about blogs compared to journal articles? First, blog space is unlimited, journal space is limited, especially in high-profile high-publicity journals such as Science, Nature, and PPNAS. Second, in a blog it’s ok to express uncertainty, in journals there’s the norm of certainty. On my blog, I was able to openly discuss various ideas of age adjustment, whereas in their journal article, Case and Deaton had nothing to say but that their numbers “are not age-adjusted within the 10-y 45-54 age group.” That’s all! I don’t blame Case and Deaton for being so terse; they were following the requirements of the journal, which is to provide minimal explanation and minimal exploration. . . . over and over again, we’re seeing journal article, or journal-article-followed-by-press-interviews, as discouraging data exploration and discouraging the expression of uncertainty. . . . The norms of peer reviewed journals such as PPNAS encourage presenting work with a facade of certainty.

Again, the goal here is to do good science. It’s hard to do good science when mistakes don’t get flagged and when you’re supposed to act as if you’ve always been right all along, that any data pattern you see is consistent with theory, etc. It’s a problem for the authors of the original work, who can waste years of effort chasing leads that have already been discredited, it’s a problem for researchers who follow up on erroneous work, and it’s a problem for other researchers who want to do careful work but find it difficult to compete in a busy publishing environment with the authors of flashy, sloppy exercises in noise mining that have made “Psychological Science” (the journal, not the scientific field) into a punch line.

It’s fine to make mistakes. I’ve published work myself that I’ve had to retract, so I’m hardly in a position to slam others for sloppy data analysis and lapses in logic. And when someone points out my mistakes, I thank them. I don’t label corrections as “ad hominem smear tactics”; rather, I take advantage of this sort of unsolicited free criticism to make my work better. (See here for an example of how I adjusted my research in response to a critique which was not fully informed and kinda rude but still offered value.) I recommend Susan Fiske do the same.

Six feet of water in the streets of Evangeline

To me, the saddest part of Fiske’s note is near the end, when she writes, “Psychological science has achieved much through collaboration but also through responding to constructive adversaries . . .” Fiske emphasizes “constructive,” which is fine. We may have different definitions of what is constructive, but I hope we can all agree that it is constructive to point out mistakes in published work and to perform replication studies.

The thing that saddens me is Fiske’s characterization of critics as “adversaries.” I’m not an adversary of psychological science! I’m not even an adversary of low-quality psychological science: we often learn from our mistakes and, indeed, in many cases it seems that we can’t really learn without first making errors of different sorts. What I am an adversary of, is people not admitting error and studiously looking away from mistakes that have been pointed out to them.

If Kanazawa did his Kanazawa thing, and the power pose people did their power-pose thing, and so forth and so on, I’d say, Fine, I can see how these things were worth a shot. But when statistical design analysis shows that this research is impossible, or when replication failures show that published conclusions were mistaken, then damn right I expect you to move forward, not keep doing the same thing over and over, and insisting you were right all along. Cos that ain’t science. Or, I should say, it’s a really really inefficient way to do science, for individual researchers to devote their careers to dead ends, just cos they refuse to admit error.

We learn from our mistakes, but only if we recognize that they are mistakes. Debugging is a collaborative process. If you approve some code and I find a bug in it, I’m not an adversary, I’m a collaborator. If you try to paint me as an “adversary” in order to avoid having to correct the bug, that’s your problem.

They’re tryin’ to wash us away, they’re tryin’ to wash us away

Let me conclude with a key disagreement I have with Fiske. She prefers moderated forums where criticism is done in private. I prefer open discussion. Personally I am not a fan of Twitter, where the space limitation seems to encourage snappy, often adversarial exchanges. I like blogs, and blog comments, because we have enough space to fully explain ourselves and to give full references to what we are discussing.

Hence I am posting this on our blog, where anyone has an opportunity to respond. That’s right, anyone. Susan Fiske can respond, and so can anyone else. Including lots of people who have an interest in psychological science but don’t have the opportunity to write non-peer-reviewed articles for the APS Observer, who aren’t tenured professors at major universities, etc. This is open discussion, it’s the opposite of terrorism. And I think it’s pretty ridiculous that I even have to say such a thing which is so obvious.

P.S. More here: Why is the scientific replication crisis centered on psychology?

294 Comments

  1. D Kane says:

    Great summary. But sad that our friends from the Lancet Iraq death estimates did not make the timeline. That is certainly what got me involved with data sharing/replication issues.

    • I think that a long-term consideration of this line of work in conflict epidemiology will reflect quite well on this group, and perhaps a future SMCISS blog post will take this on. (I have worked with them on two more recent household surveys.)

      • D Kane says:

        The people behind the Lancet surveys, especially Les Roberts and Gilbert Burnham, still refuse to release the (anonymized) data or the computer code used to compile their results. (And kudos to Mike Spagat for continuing to work on this.) You really think that “long-term consideration” will put them in a good light? If you aren’t transparent, your results can’t be trusted and, given where social science is going, I doubt that history will judge you gently.

        • I have good news about this, which I think is exactly the kind of “changing winds” that Andrew’s blog addresses. In the first survey I worked on with this team, a 2013 update to the Iraq mortality estimates, I argued that we should release a “replication archive” containing all data and analysis code necessary for reproducing the results in the paper. Burnham and company agreed, although I think it was just to humor me: http://ghdx.healthdata.org/record/mortality-iraq-associated-2003-2011-invasion-and-occupation

          During the next survey I helped them with, I was pretty busy with other responsibilities and much less involved in the analysis. I did not have the time to advocate for a data release, let alone do all the work preparing an archive. So I was very happily surprised when the paper came out to see that they created a public archive of all the study data without any urging from me: http://datadryad.org/resource/doi:10.5061/dryad.0nk1p

          • D Kane says:

            Kudos to you and your co-authors! This is just excellent. Any chance that you can get Burnham et al to release at least the code (if not the data) from the 2004/2006 papers? One reason why those results might be outside the confidence intervals of your more recent work might be coding errors.

  2. Keith O'Rourke says:

    > statisticians can play a useful role in this discussion.
    Before 2010 (maybe 2006) I believe most statisticians just dismissed there being much of a problem.

    One comment I remember getting at a meeting of established research statisticians was “you are painting an overly bleak picture of clinical research”.

    But it was really only in 2009, when I became aware of what extra information regulators get about studies, that I finally realized the meta-analysis of published clinical papers on randomized trials was largely hopeless for the present.

    Hey that’s what 90% of my academic work was on :-(

    • Rahul says:

      Why is it hopeless? Can you elaborate? Sounds overly pessimistic to me.

      • Keith O'Rourke says:

        You mean like I am painting an overly bleak picture of clinical research ;-)

        What you will often learn from just the published papers can be quite different from what you would learn from all the documents the regulator gets to see and even audit for authenticity.

        A public or prominent instance of this was: “The Cochrane team, led by Dr Tom Jefferson, … decided to produce a systematic review of data held in the CSRs, and to ignore published study reports – the first (and still only) time this had been undertaken within Cochrane.” http://www.cochrane.org/features/neuraminidase-inhibitors-preventing-and-treating-influenza-healthy-adults-and-children

        As I wrote to a former colleague at Cochrane, “Being at a regulatory agency this makes perfect sense [to me] and Senn and you I am sure would agree, but one of my colleagues is wondering whether this will cause systematic reviews of just published data to be avoided (if at all possible) in the future”. They responded that they have to make do with what they can get.

        But until this is fixed (e.g. data held in CSRs is made available), I do think it is close to hopeless.

  3. Anonymous says:

    “Hence I am posting this on our blog, where anyone has an opportunity to respond. That’s right, anyone. Susan Fiske can respond, and so can anyone else. Including lots of people who have an interest in psychological science but don’t have the opportunity to write non-peer-reviewed articles for the APS Observer, who aren’t tenured professors at major universities, etc. This is open discussion, it’s the opposite of terrorism. And I think it’s pretty ridiculous that I even have to say such a thing which is so obvious.”

    As someone whose (in my opinion) perfectly reasonable, topical, and informational anonymous comments were recently deleted from 2 recent APS Observer pieces on pre-registration and replications, I just want to thank you for this post, and for allowing anyone to comment, even anonymously.

    Here is Neuroskeptic on why anonymity in science (for instance when commenting on APS observer pieces) might be a good thing:

    http://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(13)00066-1?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1364661313000661%3Fshowall%3Dtrue&cc=y=

    I hope someone will link this post in the comment section of Fiske’s APS piece when it is published.

    • Mark Pawelek says:

      1) Anonymity is good when you absolutely need it to keep your job, save your life, … yet still tell the truth. In most social media anonymity is terrible, because it encourages irresponsible comment. When people aren’t accountable, they too often lie, use slurs, misrepresent, …

      • Anonymous says:

        absolutely tangential to the main discussion: if you limit anonymity to only those that “need” it, the anonymity set (the group of people that one claiming anonymity is indistinguishable from) is limited to people that “need to hide” for whatever reason, and thus everyone in the set is “guilty”, so the use of anonymity techniques itself is a sign of likely “wrongdoing” in the eyes of whoever might have the power to unmask.
        you are right about the lack of social control making anonymous communication different from what we are used to, and often frustrating. but unless it is normal / normatively accepted to claim anonymity “just because” now and then, the value it can offer in a crisis is threatened.

        • Mark Pawelek says:

          Anonymity encourages bad behaviour I’ve witnessed on internet forums. A compromise bans anonymity for people unable to use it responsibly. I’m only talking about anonymity within a community. I’ve no intention of helping any state ban anonymity. Yours is a purely abstract argument for anonymity. Mine is more practical: be prepared to TOS people who misuse it. Running an internet blog/forum/community where you routinely allow idiots to post whatever they want anonymously is a sure way to get it taken down by the PTB. You can be as idealistic as you want, but without a forum or blog you’ve no one to talk to.

      • Roger Sweeny says:

        Dear Professor Goldin-Meadow,

        I am a junior professor at _______. I wish to post an anonymous comment criticizing your speech because I fear it would jeopardize my chances for tenure (several of the full professors here have worked with you or been your students). Would you please certify that I need the anonymity so the comment can be posted?

        Thank you.

  4. Shecky R says:

    Thanks for this… and the problems in social psychology (in particular) research long preceded Susan Fiske and will long follow her. The paradigm for ‘constructive’ criticism of quality long needed changing and is now evolving.

  5. Baruch says:

    I was favorable to her paper — i am sensitive to “standing on giants’ faces” as Dienes quotes Broadbent, but your post convinced me. She is wrong.

  6. Yes, Fiske weakens her argument by refusing to say what she means. Who are these “methodological terrorists”? What “smear tactics” are being employed? What does she classify as an “ad hominem” remark? If you mention the author of a particularly bad study or silly theory, are you speaking “ad hominem”?

    Also, wouldn’t any tenure committee worth its salt look into the merit of any criticisms? If a stream of nasty tweets, on its own, makes them deny someone tenure, what kind of tenure committee is this? I would hope for a little more intellectual ruggedness.

    I was struck by this part of your comment: “Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.”

    There’s an awful lot of money in this, too. According to the Washington Speakers Bureau website, Amy Cuddy’s speaker fees are in tier 6–that is, $40,001 and up. I’m sure she has to pay some of it to the bureau, etc., but even so, she could pull in six figures with just a few talks. The least a speaker of this fee range (or any fee range) can do is admit to error.

  7. Ruben says:

    Some of us spoke to her at the DGPS just now. We emphasised that we want this to be constructive and about the science, that we feel lumped in with non-constructive people, that we know heaps of people whose careers flounder because of the reproducibility crises.
    Here’s some write-up: https://twitter.com/maltoesermalte/status/778548951779794944
    She didn’t really want to talk about this, but she kind of doubled down on the metaphor (both have virtuous goals and don’t mind collateral damage apparently), but said that lots of people told her they are scared their research will be targeted.
    It didn’t feel like this conversation accomplished much.

    • Andrew says:

      Ruben:

      The conversation may have accomplished something. My guess is that Fiske thinks that she has the support of a vast silent majority and that the people she disagrees with are a small group of malcontents and outside agitators. I have no idea what the majority of psychology researchers think about all this—I’ve seen no surveys on the topic—but maybe if Fiske sees pushback in person, she’ll have more of a sense that these are real people who disagree with her, that it’s not just a bunch of bomb-throwers or whatever it is that she was thinking. I’m sure it would take a lot for her to change her research practices or even her general attitudes about research, but a few encounters like this might at least change her perception of attitudes within the field of psychology, and she might then feel a little less confident about her ability to speak on behalf of the profession in this way.

      • Rahul says:

        In my opinion she’s probably right in thinking that she has the support of a vast silent majority.

        • Huffy says:

          No, across the field of academic psychology generally, I am sure she does NOT have much support at all, much less a vast silent majority. I say that from talking to many many people in different areas. However, in the most rotten areas like Social Cognition, she probably does have the support of a lot of nervous people with plum academic jobs, because in those fields a common path to getting a desirable job has been to p-hack the hell out of your data (or worse) so you appear to have made fascinating discoveries. No doubt about it that those people do not want all these rocks turned over through replications of their own work.

  8. Jeff Helzner says:

    Yet Fiske doesn’t seem to have any issue with fluffy TED talks. Apparently TED provides the quality control she mentions.

    “For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.”

    mic drop

  9. Simon Gates says:

    – Amy Cuddy’s speaker fees are in tier 6–that is, $40,001 and up.

    Yikes. Well, that would create a bit of an incentive…

  10. Frank Charles says:

    This was a great read. Thank you.

    – Student of Research methods in Psychology.

  11. numeric says:

    I’ve seen very little of this sort of thing in statistics or political science (Sure, dirty deeds get done in all academic departments but in the fields with which I’m familiar, methods critiques are pretty much out in the open and the leading figures in these fields don’t seem to have much problem with the idea that if you publish something, then others can feel free to criticize it.)

    Fisher and the fiducial theory? Also, from an NSF grant proposal (not mine),

    “Finally, a word of advise to the PI. To disparage King’s work, as you frequently do in the proposal, is stupid. You will only hurt yourself. King may very well review this proposal and he is quite likely to be asked to write a letter evaluating your work when you are considered for promotion to tenure. King’s work is hardly the last word on ecological inference. In fact, it is probably the beginning of a whole new stream of research by himself and others that generalize the field. Many smart people have worked on the problem of ecological inference and made little or no progress. King’s book is an important breakthrough. To fail to recognize this is only to provide evidence of your own immaturity.”

    [if I can figure out a way to e-mail this to you anonymously, I’ll send it to you]. The point is that it happens all the time in academic political science, but usually the transactions take place over the phone–this grant proposal was unusual in that someone put how suppression occurs in writing. But if you want a published source over what it takes to succeed in academic political science, see http://www.socsci.uci.edu/~bgrofman/Wuffle-Advice%20to%20Assistant%20Professor.pdf (synopsis–shut up and suck up). I would say that the ability of senior people to eliminate any threats to their theories is the main reason quantitative political science has not progressed in any essential manner since the American Voter. In particular, the rational choice paradigm applied to American national elections has created a nearly completely incorrect view of these elections, which have been centered around race rather than policy choices (this latest election makes it clear that most voters are not choosing based on policy positions).

    • RE: “email anonymously” aren’t there temporary file-sharing sites you can use? Things like http://expirebox.com/

    • Stephen says:

      This reminds me of the time that King intimidated the editor of the American Political Science Review into retracting a paper that was critical of his ecological inference work. In fact, it wouldn’t surprise me if your quote is part of the same case.

    • Andrew says:

      Numeric:

      I’m certainly not saying that the fields of statistics and political science are immune to backdoor politicking. Maybe a better way to say it is that in statistics and political science, I don’t see any real connection between backdoor politicking and attitudes on research controversies.

      In her article, Fiske was not just saying that there are some sleazoids who send anonymous unsolicited letters to tenure review committees; she was directly connecting this behavior to the replication-and-criticism movement. I have no idea if there is any such connection in psychology; my point in mentioning stat and poli sci was to say that I know of no such systematic behavior in those fields, behavior that goes beyond individual assholes and cliques to rise to the level of a deliberate attempt to move the direction of the field via subterfuge.

      • numeric says:

        >level of a deliberate attempt to move the direction of the field via subterfuge.

        I would say Fiske isn’t using subterfuge–she’s just incompetent (but a full professor at Princeton!). When incompetence is pointed out, she reacts like an academic–she attempts to silence the source or use ad hominem attacks. But here’s the nice thing–she has to do it publicly, rather than pick up the phone (which is the standard method in academic political science). That’s because she can’t pick up the phone to silence you.

        >Look. I’m not saying these are bad people. Sure, maybe they cut corners here or there, or make some mistakes, but those are all technicalities—at least, that’s how I’m guessing they’re thinking. For Cuddy, Norton, and Fiske to step back and think that maybe almost everything they’ve been doing for years is all a mistake . . . that’s a big jump to take. Indeed, they’ll probably never take it. All the incentives fall in the other direction.

        Solzhenitsyn says that when you have spent your life establishing a lie, what is required is not equivocation but rather a dramatic self-sacrifice (in relation to Ehrenburg–https://en.wikipedia.org/wiki/Ilya_Ehrenburg). I see no chance of that happening in any social science field–tenure means Fiske will be manning (can I use that phrase?) the barricades until they cart her off in her 80’s. Thinking machines will be along in 20-30 years and then universities can dismantle the social sciences and replace them with those.

        • Andrew says:

          Numeric:

          Just to be clear, I wasn’t saying that Fiske herself was using subterfuge. I was saying that in her article, Fiske seems to be implying that the “methodological terrorists” were using subterfuge by sending unsolicited letters to committees, etc. Or maybe subterfuge isn’t even the point, as she was also expressing anger that the terrorists were acting openly by publishing criticisms on social media.

          • numeric says:

            These letters to committees happen (essentially) in academic political science, also. I was in a department where there were two types of tenure review–an “honest” review and a not so honest review. The way this was handled was on the not so honest reviews, the tenure committee members called the selected letter writers and asked what type of letter they were going to write. If the reply was negative, that individual letter writer was removed and another one was selected. On a “honest” review, the letters were just sent out without pre-check. Of course, I heard about this from those faculty who had been submitted to the honest review, as they were none too happy about being singled out for this honor.

            I guess my overall point is that it is so easy to manipulate these procedures and the people running them may not be “bad” people, but they don’t have a lot of integrity, either intellectually or morally, and it shows in the research. Who will guard these selfsame guardians? Your blog is a step in the right direction, though I think you’re too easy on political scientists because you’re a wannabe–stick to statistics.

  12. Eric Loken says:

    Wow. This will be an interesting discussion. I will say that I applaud her call for civility and fairness. That much is certainly appreciated.

    But with regard to the premises and underlying understanding of the issues, I’m also reminded of that line from the Big Short (I believe it was there) that went something like this – “Whatever that guy is buying, I want to bet against it.”

  13. Andrew, I think we need to talk about widening the picture beyond psychology, and beyond social sciences. Sure, psychology has its problems, but are these problems limited to psychology? Ioannidis’s paper and his recent open letter to a dead colleague suggest that medicine is similar. Indeed, your blog post about acupuncture indicates that medicine as a field doesn’t even know how to get anywhere on this kind of topic. As a society we spend billions on developing and marketing cancer drugs, many of which plausibly violate the basic concept of the Hippocratic oath (i.e., do no harm; many blockbuster cancer drugs plausibly offer a reliably reduced QALY count by making people live marginally longer but in very sick conditions). Ioannidis discussed how his first application for a grant was denied de facto (just ignored); it was to study in a clinical trial the role of antibiotics in chronic sinusitis. Chronic sinusitis affects something like 10% of the US in a very negative way. It turns out that current evidence suggests that it is in fact CAUSED in large part by antibiotics (specifically the depletion of beneficial bacteria). How many examples are there of medicine continuing to do things that are harmful because of poor practices and bad incentives and lack of knowledge of real scientific method, and reliance on cargo-cult statistical analyses with forking paths, and the secret datasets that Keith often talks about here?

    Outside medicine we’ve had recent examples in economics: the inference on policy implications for austerity relied on by-hand data manipulation errors in Excel (https://www.washingtonpost.com/news/wonk/wp/2013/04/16/is-the-best-evidence-for-austerity-based-on-an-excel-spreadsheet-error/). Tol’s papers on global warming that you discuss in your post. People read this stuff and make legal and economic policy based on it because it meets their preconceived notions, and it appears with a big fat Academic Blue Ribbon.

    As academia and especially government funding in academia becomes more and more a “Market For Lemons” those of us who just can’t abide the thought of participating will opt out… and a feedback loop leaves us with either head-down clueless researchers with brief careers who don’t get tenure, or self-promoting puff-piece participants? How far down this path have we gone?

    In Engineering, my impression as a grad student was that for the most part the things being published were records of what people did to study the solution to problems that didn’t really exist using solution methods that were fancy sounding and difficult to explain. My dissertation and the main paper associated to it showed that the standard experimental methods to study soil liquefaction in the laboratory (the undrained cyclic triaxial compression test) were in fact ENTIRELY studying the properties of the rubber membranes used to hold the sample in the apparatus. This has been an active heavily funded area of research for around 50 years. The wikipedia definition of liquefaction as of today includes “A typical reference strain for the approximate occurrence of zero effective stress is 5% double amplitude shear strain”

    In the ground, 5% is ridiculous. It corresponds to a 10 meter thick sand fill in the SF bay settling 50 cm *prior to the onset of liquefaction*. In fact liquefaction occurs with strains of say 3 or 4 cm in such deposits. You need such large strains in a tabletop triaxial test because what determines the liquefaction in the tabletop is the elastic properties of the rubber membrane your sample is encased in, and essentially NOTHING else. Whole careers have been spent publishing papers on tabletop studies of liquefaction and trying to reconcile these studies with the mechanics of real-ground conditions.

    Is this a psychology and social science problem, or does it just permeate the fabric of academia?

    • Danny says:

      RE: The austerity study in economics. In defense of the field, that study was pretty loudly questioned from almost the day it was published. Published in 2010 to lots of skepticism, and by 2011 there were groups asking for their data and trying to replicate. And then failing, so R+R turned over their actual data in excel form and the error was discovered, and this discrepancy was published in 2013.

      All in all, that’s how science should work – skepticism of claims that seem dubious, replications almost immediately, and cooperative sharing of data. It’s also worth noting that the result moved the target variable of GDP growth from -0.1 to 2.2, but only 0.3 of that movement was due to the excel typo. The other movement was due to ‘research degrees of freedom’ – excluding certain countries for unclear reasons and a particular choice of weighting the data.

      Overall I think economics is reasonably good about this (although could still use much improvement) – most PhDs will have to take a large number of dense econometrics and statistics courses. Maybe that helps.

        • I am not on the inside of economics so it’s hard for me to know. Your point seems to be that econ is more skeptical and self-regulated than other fields in social science. That may be true, but nevertheless you admit “researcher degrees of freedom” was the big player in the R+R example. How much do economic conclusions rely on researcher degrees of freedom in the broad field? How methodologically rigid are economics people? My impression is that unbiased least-squares estimators and NHST and static linear regression type analyses are still pretty dominant (see Andrew’s example of the discontinuous polynomial regression on Chinese coal pollution near a river, for example; I’m limited in the examples I can bring to bear since, as I say, I’m not in economics)

        “most PHDs will have to take a large number of dense econometrics and statistics courses. Maybe that helps”

        but statistics as practiced in standard textbook ways needs to take a big bunch of the blame for how things work today. So, perhaps not. Certainly I think an emphasis on unbiased least-squares estimation, randomization-based inference, tests of normality of residuals, and so forth is the dominant paradigm, treating countries or policies or people as independent samples from random number generators, so perhaps the stats makes things worse in that it embeds everything in an apparently rigorous mathematical formalism. I’d be interested to hear what a broad range of economists think.

        • Danny says:

          I’m not entirely inside either, to be upfront. But I think we may be talking about two different problems here – RDOF and the identification problem.

          With regards to RDOF, I think it exists in economics but likely to a lesser degree than other sciences. Certainly R+R is an example of it, but it was also called out very quickly by other economists instead of festering for decades like ego depletion. As an aside, standard textbook statistics isn’t really the problem. It’s not that there’s anything wrong with doing significance tests and using BLUE estimators and linear regression. It’s when unaware researchers who are brilliant in their fields but relatively novice statisticians stumble into these tools, usually with a surface understanding but without a deep understanding of how statistics work. That’s how you get researcher degrees of freedom. If you’re rigorous in your approach, those tools can be utilized properly.

          The identification issues in macro are deeper, and that’s what I tend to worry about more. We make a lot of assumptions in economics, and normally this is fine. When you model a ball rolling across the floor in physics 101, you assume a perfect sphere and zero friction even though neither of those is true – but it helps you learn about momentum and motion and so forth. Same with econ. When we make assumptions, we can build some fairly good models of how the economy works – sometimes on the macro and sometimes on the micro level. But as the models get more and more complex (and especially as they start to integrate microfoundations into macro models), a few things happen. What happens in your model can become more and more enmeshed with the starting axioms you choose. And you end up with a lot of variables that are really the algebraic leftovers of other variables, which can then be considered as concepts and put into other models. And those variables are really sensitive to calibration in the chaotic sense – small changes in one assumption can lead to moderate changes in a variable which can lead to huge changes in a model’s results.

        • Clyde Schechter says:

          Well, I am not an economist either. But Paul Romer, a former academic who is now chief economist at the World Bank, thinks that macroeconomics is a science in failure mode, and thinks that this parallels the evolution of science in general. You can read his arguments at:

          https://www.law.yale.edu/system/files/area/workshop/leo/leo16_romer.pdf

          The gist of it is that economists have cooked up fancy models involving variables that have no measurable counterpart in the real world, and then use these models to draw conclusions that reflect nothing more than the arbitrary assumptions made to identify the model. Not being familiar with the models he criticizes, I can’t assess his claims, but they sound quite plausible. He has been sounding this alarm for quite a while now, and has published numerous papers which you can easily find by Googling the term “mathiness” (which he coined).

            • Andrew: two rather different Romer posts/articles referenced here. The “mathiness” one you link to was aimed at a specific problem that he argues is common across most of economics. The latest one (link in Clyde’s post) is a broadside aimed at a specific subfield, macroeconomics (and business cycle macro in particular).

              Great post (yours) by the way! No, the economics discipline is not immune. I think I may put this post on the reading list for a quants course I’m teaching.

          • As an economist, I’ll say that IMHO what Danny and Clyde are saying is correct, and that this points to problems with the scientific status of core economics that are quite different from (one might say, the opposite of) those in social psychology or bio-medicine. The problems are different because (1) economic research tends to be more constrained by theory – often, models are identified by appealing to a theory which is conventional, even when there is no empirical evidence for it and the mechanism is implausible – and this reduces forking-paths opportunities, or at least imposes a high tariff in terms of the evidence and rhetoric required to follow a fork, and (2) much of it is done with publicly available datasets.

            Even the Tol case is best understood in this way: sure he did some bad statistics and refused to back down, and that’s what people on this site see; but the bigger picture is that he had for a time quite a good career in environmental economics by stubbornly insisting that certain conventional assumptions from economic models – essentially, short-term financial models – applied to climate change questions. With climate change there is a potentially catastrophic downside for every human being over thousands of years, and his approach gives such possible outcomes a trivial weighting, if anything at all. He’s pretty comprehensively lost that argument, but the fact that he was able to hold on as long as he did tells you where the problem is in economics: sticking with the conventional model as long as you possibly can gets respect. The infamous (prestigious but non-peer reviewed) Journal of Economic Perspectives paper was possible because of the status he had built in the way I have just described.

            There is a lot of economics that’s not in this core – stuff on the margins of psychology, sociology, political science – and the problems encountered there can be somewhat different.

            • Peter Dorman says:

              This discussion is already history, being a day old, but I’ll put in this comment for the record. What’s interesting about an academic profession is not so much the random spray of error or blindness (which is where I would place R+R), but the patterns that structure these things in a systematic way. I see three of these (at least) in the methodology of economics.

              One is researcher degrees of freedom, which is exacerbated, not diminished, by the demand to incorporate assumptions of rationally calculating agents (with the default of no interaction effects that interfere with aggregation). For subfields of economics in which these assumptions are mandatory, like all of macroeconomics and many applied micro fields, empirical results that contradict them are difficult to publish. The idea is to produce research that “successfully” demonstrates that the presumed relationships hold. The biggest areas of RDF are in sample selection and choice of control variables. The quantity of cherry-picking that goes on in them is staggering. That was the basis of my empirical critique of Viscusi on hedonic wage analysis, for instance.

              A second is the fundamental confusion about NHST and what it does or doesn’t say about the hypothesis of interest. A high percentage of empirical studies in economics employs the strategy of rejecting the null and simply concluding that this result is “consistent with” the hypothesis generated by some theoretical model. This is how implausible theories survive over the decades and will continue to survive until the end of time unless more aggressive testing methodologies are adopted.

              The third is the high proportion of empirical work that is about calibration rather than testing, or to put it another way, work that treats the ability to calibrate a model as equivalent to testing it. This was what Romer was really going after, showing in detail how vacuous the procedure is when you look at how small a role evidence plays relative to assumption. But this is not just a problem in macro, it pervades micro as well. Look at almost any applied field and you will find a bunch of calibrated models that perform poorly out of sample because the calibration was all about fitting and not about truly understanding the process by which the data were generated.

              Not all of economics displays these patterns, of course. In every field there are researchers who really understand what it means to assess the evidence, and behavioral economics (even if it draws on that dubious enterprise, psychology), with its anomaly-hunting, is a breath of fresh air. (Don’t take what I say in parentheses too seriously.)

              Incentives are part of the story in economics, and also selection bias (who becomes an economist), and acculturation too. Even so, I think the profession is ripe for a big discussion of these issues. McCloskey showed the demand was there when she created a splash with her argument about the role of rhetoric, but it only gently touched on the methodological problems. She has gone a bit further with her crusade against the obsession with p-values, but still has not tied it to larger conceptual matters, as this blog has done. I think if someone with a bit of name recognition and a secure spot at a research university takes up this cause in economics it will get a lot of play. If anyone reading this (if anyone is reading this) is such a person, please consider taking this on.

    • Dan Simon says:

      I agree completely that the big problem in hard science is less methodological corruption than research direction corruption: researchers inventing irrelevant fake problems and building huge bodies of (often methodologically unobjectionable) research around them based on completely bogus claims of real-world relevancy or deep theoretical significance (or even both). I’ve certainly seen tons of it in my own field.

      But both forms of corruption stem from the same underlying problem: reliance on peer review, carefully insulated from all external influence, as the sole criterion for judging research. Because it’s a sacred principle that only peers may judge peers’ work, research comes to resemble a giant game of “Survivor”, in which everyone’s real goal–whatever the ostensible goals of their field may be–is to advance one’s career by winning the approval of peers by any means necessary. The result is that the production of valid and valuable scientific research takes a distant back seat to rampant log-rolling, politicking and painstaking conformity to various arbitrary collectively-formulated community norms.

      The only solution is to introduce some kind of external accountability into the process. Unless there exist measurable criteria by which outsiders can judge the value of researchers’ work–and reward or dismiss it accordingly–the research itself will inevitably devolve into self-referential uselessness.

      • Keith O'Rourke says:

        > The only solution is to introduce some kind of external accountability into the process.
        Completely agree.

        I previously suggested random outside audits of academic work on this blog and some were outraged at my lack of trust of others.

        I think the problems are present, to varying degrees, in all disciplines, except perhaps accidentally and then just temporarily.

        But without those random (representative) outside audits, no one will know.

        (In mathematics and related subjects, what to audit and how will be very tricky – proofs and derivations have already mostly been reproduced by outsiders.)

        Can anyone think of a university that might want to volunteer to go first?

      • DM Berger says:

        Dan Simon +1

        Within psychology, this manifests most clearly in the overwhelming prevalence of arbitrary metrics (i.e. measures or experimental outcomes that do not correlate with any “real-world” behaviours or objective outcomes of interest), and in an excessive focus on “basic science” investigating the “theoretical mechanisms underlying behaviour” before phenomena have even been thoroughly described, all while abandoning prediction and giving little thought to application and practical relevance. And of course there’s the excessive use of college students.

        Even in the clinical domain, where research has been going on for decades, you still have major outcome measures and diagnostic categories (e.g. “depression” as measured by HAM-D and BDI) that are essentially arbitrary. Like we can be reasonably certain antidepressants and psychotherapy reduce HAM-D and BDI scores by a few points, but we have basically zero understanding of what that reduction means in objective terms.

        Closed loops within closed loops.

    • Bob says:

      Daniel wrote: “In Engineering, my impression as a grad student was that for the most part the things being published were records of what people did to study the solution to problems that didn’t really exist using solution methods that were fancy sounding and difficult to explain.”

      My field is communications engineering. My perception is that there are often papers that describe “the solution to problems that don’t really exist.” But, for the most part, the papers are useful. Papers are almost always correct. I recall discussing this issue with a friend who had been president of one of the larger professional societies in the field—his view was that there were many poor papers but very few wrong ones. I think that something similar must be true in math and physics.

      Here’s an example of the first two lines of an abstract from a leading journal in the field:
      In this paper, we consider the problem of estimating the state of a dynamical system from distributed noisy measurements. Each agent constructs a local estimate based on its own measurements and on the estimates from its neighbors. Estimation is performed via a two stage strategy, the first being a Kalman-like measurement update which does not require communication, and the second being an estimate fusion using a consensus matrix.
      See http://ieeexplore.ieee.org/document/4497788/

      That sounds more like an example in the Stan documentation than a paper from APS.
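
      For readers outside the field, here is a minimal toy sketch in Python (not the algorithm from that paper; the network, gain, and noise levels are all invented) of the two-stage idea the abstract describes: each agent updates a local estimate from its own noisy measurement, then averages estimates with its neighbors through a consensus matrix.

      import numpy as np

      rng = np.random.default_rng(0)

      n_agents = 4
      true_state = 1.0                    # constant scalar state to be estimated
      meas_noise_sd = 0.5

      # Hypothetical ring network: each agent averages itself and its two neighbors.
      W = np.zeros((n_agents, n_agents))
      for i in range(n_agents):
          W[i, i] = W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 1 / 3

      x_hat = np.zeros(n_agents)          # local estimates
      gain = 0.3                          # fixed gain, standing in for the Kalman update

      for t in range(50):
          # Stage 1: local measurement update (no communication needed).
          y = true_state + rng.normal(0, meas_noise_sd, size=n_agents)
          x_hat = x_hat + gain * (y - x_hat)
          # Stage 2: consensus step (each agent mixes its neighbors' estimates).
          x_hat = W @ x_hat

      print(x_hat)                        # all agents end up near 1.0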

      For perspective, note that a modern iPhone can download files at more than 400 Mbps—we must be doing something right.

      Bob

      • Engineering of course is the application of science to resource-constrained economic problems (i.e. how best to achieve some business goal, such as building a bridge, supplying water to a town, or reducing battery consumption of an iPhone). In many cases the empirical science is “god given” to the engineers, and the real question is what the previously verified theory implies practically for the purpose at hand. Many engineers just don’t have to study the world to discover much of anything (that is, everything they rely on was previously verified by others, often decades or centuries ago). In such an environment the main issues are:

        1) are you following the god given model correctly (methodological rigor)
        2) did your application pay off in a way that people care about

        My impression is that academic engineers are all about (1) but (2) is swept under the rug in many cases. The result is an academic field full of extremely sophisticated, methodologically impeccable solutions to problems that largely don’t exist. It’s not everything, but it’s FAR from rare. If given a choice between applying for a grant to study some very real but complex problem, such as how best to reduce the risk of mosquito-borne illness in developing countries while doing minimal damage to the local ecology, vs. methodologically rigorous application of a favorite method to a non-problem such as machine learning techniques for the use of cell phone sensor data to detect the location of urban heat islands… the latter wins far, far too often. (Note: even if you really care about urban heat islands, you can probably find them using very simple techniques and some satellite imagery that already exists.)

        Of course, in some areas there is no comprehensive god-given physical model. For example, my graduate studies were in soil mechanics, where it became clear almost immediately that very basic early assumptions in the field had never been well thought out in the first place. In areas like this, where no comprehensive god-given physical model is handed down, engineers have the same empirical bumbling around that everyone else has. Some basic examples I’m aware of:

        1) “Attenuation models” for earthquake shaking. If an earthquake occurs on a fault at point X predict the statistics of the shaking at point Y (peak ground acceleration, duration, peak velocity, total energy, etc)

        2) Mechanism of soil liquefaction: “During shaking, stress is transferred from the grains to the water; since water cannot drain during the time-scale of the earthquake, the water pressure increases and this causes loss of strength” (sounds plausible but is actually circular reasoning and rests on an empirically wrong assumption: water pressure *can* diffuse due to flow even during a single cycle of shaking, and in fact this is the dominant issue)

        3) Models of the strength of materials. Far too many of these are just regression curves fit through some laboratory data. The extrapolation to real-world conditions may or may not be justified, and the models have no mechanism built in to help constrain the behavior under other conditions. Furthermore, sophisticated 3D computer models are built on this stuff giving the impression of a “solved problem”. Examples include the shear strength of concrete, and the behavior of soil in foundations.
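
        On point 3, here is a toy illustration in Python (all values invented) of the failure mode: a regression curve fit through laboratory data over a narrow range can look fine in that range and still be wildly wrong when extrapolated to field conditions.

        import numpy as np

        rng = np.random.default_rng(1)

        def true_strength(stress):
            # Pretend the real material response saturates at high stress.
            return 100.0 * stress / (10.0 + stress)

        lab_stress = np.linspace(5, 50, 20)               # narrow "tabletop" test range
        lab_strength = true_strength(lab_stress) + rng.normal(0, 2.0, lab_stress.size)

        fit = np.polyfit(lab_stress, lab_strength, 2)     # the "regression curve"

        field_stress = 300.0                              # well outside the lab range
        print(np.polyval(fit, field_stress))              # extrapolated prediction (nonsense)
        print(true_strength(field_stress))                # actual behaviour, about 97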

        So, when Engineers are confronted with a problem where they need to develop a theory that has both mechanism and matches empirical reality, they seem to have the same problems everyone else does on average as a field. Empirical development of scientific models is hard. With Engineers, it doesn’t help that you can publish a lot more papers (and thus play the academic game better) solving many non-problems using god-given physical models than you can by developing a scientific research agenda to solve problems where no established model exists.

    • BenK says:

      I would like to remark that the ‘real’ reproducibility crisis was (for many of us) about the attempts by pharma to use drug lead compounds discovered in academic assays with supporting methodological data – and finding that not only did the results not generalize most of the time, but they didn’t even reproduce a fair amount of the time. This was where the rubber met the road – not in epidemiology or social cognition, but in pharma, where, for better or worse, the FDA eventually stands athwart claims of progress yelling ‘Show me your data!’ Big Pharma needs to get by the FDA and can’t somehow skate by, because even if the initial studies somehow pass, post-market data will eventually catch them.

      It isn’t just social science. Far from it.

      There are claims that areas close to engineering should get a pass – maybe even particle physics. Certainly mathematics.
      I’m inclined to agree. As for areas with no experimental component at all – pre-digital humanities, say – I guess they have completely different problems.

      • Mathematics should get a pass, but mathematics is not an empirical field (the philosophy of intuitionism notwithstanding). That is, whether Fermat’s Last Theorem is true or not does not rely on observations of experiments.

        I doubt particle physics is as clean and pure as you assume. I’ve been reading “Speakable and Unspeakable in Quantum Mechanics” by John Stewart Bell (of Bell’s inequalities fame). He’s pretty harsh on the fundamental philosophies embodied in QM, in particular “wave collapse” and the doubling down on some of the really odd ideas such as many worlds or the Copenhagen interpretation. No one denies that QM makes accurate predictions, but Copenhagen says that this is all that is possible: there *does not exist an actual state of a particle* in between measurements. Of course, this is philosophical weirdness in which physicists tie themselves in knots in order to avoid giving up on the idea of locality. Bell’s inequalities prove *there are no local hidden variable theories*, and he was actually on the side that QM is non-local, but the world has taken the approach of axiomatizing the idea that *there are no hidden variables* (i.e. particles have no inherent state that determines the outcomes of experiments).

        His book discusses how this creates a gulf between the quantum world and the classical world that isn’t bridged by the theory. It’s a bit obscure but my impression is that the way QM is taught these days is a mathematical axiomatic approach that simply takes the Copenhagen interpretation as given. You can’t make much progress on this area if you’ve been taught that the axioms are unassailable and the theorems therefore are proven facts about the universe.

    • Simon Gates says:

      Daniel – yes I believe you are right. Medicine is thoroughly infected with the same problems. There have been lots of efforts to clean up methodology of things like clinical trials and systematic reviews over the years, which have certainly improved things, though lots of issues remain. But I have real concerns over lab studies and small-scale clinical studies up to small clinical trials – the sorts of things that are usually done by individual clinicians or small groups of clinicians, and that fill up lots of space in medical journals. I don’t really know this field well but I come into contact with it sometimes in planning of clinical trials, and I see many of the same things as in psychology. This worries me because a lot of the time the justification for clinical trials (taking years to run and costing millions) is based on these sorts of studies – so if we’re getting these wrong, we’re probably wasting a lot of time and effort.

    • Wonks Anonymous says:

      The Excel errors actually had a very minor effect on that paper. Most of the differences with the Herndon, Ash & Pollin results were due to deliberate choices about which data to include and how to analyze it, but the Excel stuff is easier for people to understand, so that’s what gets referenced. The bigger problem with the paper, though, is that it didn’t even attempt to establish causality rather than correlation. After all, it should seem obvious that low growth can cause debt rather than merely the other way around. Miles Kimball has written about this.

  14. zbicyclist says:

    A friendly amendment to your timeline:

    1967: Psychology Today magazine is started, to popularize psychological research. It succeeded perhaps too well.

    (from Wikipedia) “By 1976 Psychology Today sold 1,026,872 copies…. From June 2010 to June 2011, it was the second top consumer magazine by newsstand sales.[5] In recent years, while many magazines have suffered in readership declines, Adweek, in 2013, noted Psychology Today’s 36 percent increase in number of readers.[6]”

    It’s not just that Psychology Today could take obscure social scientists and make them semi-public figures. The success of the magazine meant that people liked to hear about this stuff, and so other news outlets started paying more attention. This was not lost on university PR departments. But whereas the real news (like that state budget proposal) might be scrutinized by experts or at least political opponents, the “gee whiz” findings of social science tended to be reported with uncritical enthusiasm.

  15. David says:

    Genuine question from a new lecturer.

    “All the incentives fall in the other direction.”

    Let’s say I publish a paper in a high impact journal showing an interesting result with a suitably small p value. But then a replication study with a large sample finds no effect; the initial effect was probably just an outlier. No scientific fraud going on – just unfortunate that the initial result was illusory.

    What are the incentives and disincentives for a tenure-track researcher to admit that their published finding is probably wrong? Would they get through tenure review? Would they be able to find another academic job?

    I would guess that the result of admitting your findings were wrong would be no tenure and possibly no future job? If so then we’re actively selecting for researchers who won’t change their minds.

    • Keith O'Rourke says:

      My guess is that until fairly recently, the tenure stuff would have been over and done before any failed (and published) replications came forward.

      Also, one needs to keep in mind that adequate replication attempts of actual true findings should fail to be significant x% of the time (where x depends on the unknown but true underlying power of the replication attempt). So unless some sloppiness, poor methods or misreporting is uncovered – there _should_ be little to worry about.
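
      A quick simulation of that point in Python (the effect size and sample sizes here are invented, just to show the mechanism): even when an effect is real, a replication only reaches p < 0.05 about as often as its power.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)

      true_effect = 0.3        # standardized effect size, assumed to be real
      n_per_group = 50         # sample size of the hypothetical replication
      n_sims = 10_000

      significant = 0
      for _ in range(n_sims):
          a = rng.normal(0.0, 1.0, n_per_group)
          b = rng.normal(true_effect, 1.0, n_per_group)
          _, p = stats.ttest_ind(a, b)
          significant += (p < 0.05)

      print(significant / n_sims)   # roughly the replication's power, about 0.3 here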

      Much more likely that researchers who were sloppy and flashy were inadvertently selected for – and that’s part of the current challenge.

      (I do remember having to console a very careful researcher when they were feeling inadequate because their studies of clinical treatments – treatments that were considered promising by their specialty – had so far all ended up with no promising difference apparent. I told them it was because their studies were properly designed and carefully done that they showed no effects, and that at least patients and their families would appreciate this if they understood it. They might actually have been thinking of leaving research – fortunately they didn’t.)

  16. Paul Gowder says:

    Part of the problem here is that the ridiculous hyperbole (methodological terrorists? Wtf?) just ruins the potentially reasonable parts of her argument. But. But. Surely there are potentially reasonable parts? In particular, if people are choosing to critique psychological research not by publishing failed replications (even informally), publishing methodological critiques, etc., but by doing things like contacting tenure committees, trying to get people disinvited as speakers, etc., surely that *is* improper? That kind of behavior strikes me as just as much a violation of scientific norms as things like p-hacking: rather than openly criticize research in a public discussion, just taking direct aim at people’s careers in private.

    That being said, as you point out, Andrew, these are really nonspecific accusations. And it may be that there’s no actual evidence of anyone using these tactics. Still, if anyone is doing so, they should stop it.

    • Huffy says:

      The hypothetical junk you made up there, I have never heard of it happening.

    • Karl Kopiske says:

      I don’t think it is lost – people are very aware of her call for civility, just also very happy to point out the irony that her column lacks the same civility she demands. Whether the general tone will change, we will see.

      Also, I do not think that it is just her tone and the hyperbole. It is also the way she lumps together very obviously improper behaviour and very much not improper behaviour. One gets the impression that she genuinely thinks discussion without peer-review (i.e., blogging) is as bad as harassing family and colleagues of another scientist. Either that, or she uses the latter to make a case against the former.

  17. Rachael Meager says:

    Amazing post. Thanks for keeping this history alive and pushing for better science.

  18. Llewelyn Richards-Ward says:

    Andrew, this piece is simply brilliant and gives a fantastic timeline of what I and many of my clinical colleagues have been concerned about for many years. Thank you. Science is about openness, not ego-driven patch protection. Isn’t it odd really that people who might know better have such a defensive attitude to critique and questions? I worry for you in this context though. I just hope you don’t join the likes of Galileo, who once proposed that the Earth moves around the Sun. Let me know when the heresy trial is on and I’ll pop over!!!

  19. Galen says:

    Interesting to note that Susan Fiske was also the editor for PNAS of the ill-fated Facebook emotional contagion experiment from 2014. She had only a boilerplate response to my methodological critique (http://dx.doi.org/10.1080/1369118X.2015.1093525), which I sent to her after it was published. Given the national spotlight the piece received, I assume she got a large volume of responses that were not constructive, and I have a lot of sympathy for that. I do wish she had chosen to engage more with legitimate critics of the piece regarding its ethical and methodological problems, however.

    • Noah Motion says:

      I had dismissed that paper as little more than a clear illustration of the difference between statistical and substantive significance (and the influence of sample size on p values), but your paper is very interesting. I’m glad you dug into it in more detail. Thanks for the comment and the link.

      • Galen says:

        Yeah, they brush off the substantive significance by noting they have a bazillion users (I believe that was the exact figure), so, basically, an effect of almost any size could be considered substantively important.

        “After all, an effect size of d = 0.001 at Facebook’s scale is not negligible: In early 2013, this would have corresponded to hundreds of thousands of emotion expressions in status updates per day.”

        Thanks for reading my paper!
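
        A rough back-of-envelope version of that quoted sentence, in Python (the per-post spread and daily post volume below are invented orders of magnitude, not Facebook’s actual numbers):

        d = 0.001                        # standardized effect size reported in the paper
        sd_emotion_words_per_post = 1.0  # assumed spread of emotion-word counts per post
        posts_per_day = 3e8              # assumed order of magnitude for early 2013

        extra_per_day = d * sd_emotion_words_per_post * posts_per_day
        print(extra_per_day)             # about 300,000, i.e. "hundreds of thousands" per day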

  20. Corey says:

    I will not be giving any sort of point-by-point refutation of Fiske

    By a curious coincidence, the internet slang for the act of giving a point-by-point refutation is fisking. Different Fisk, though.

  21. Raphael says:

    Thank you for this. As a graduate student, one form of intimidation that I don’t hear talked about enough is the message things like this article send to people entering the field of psychology. My experience is that those of us who think psychology needs statistical reform are treated pretty openly as outsiders, with vague but fairly constant references to both the supposed meanness of methodological criticism and to the distinction between real psychologists and “methods people”. I have seen (and been told directly by way of warning) how psychology departments view such “methods people” in the hiring process. I can’t help but view Susan Fiske’s article through this lens. By saying something everyone should agree with (don’t bully people) she is helping silence voices that would otherwise criticise a system she is clearly benefitting from. And this is even if we ignore the terrorism part!

    • Llewelyn Richards-Ward says:

      Raphael —

      Being a methods person wins no friends. I once (in a senior govt. role) critiqued a nationwide reliance on some unhelpful practices within the government ministry I worked within. The practice was (very simplistically) using multiple measures of criminal risk on release and allowing clinicians to use them somewhat additively, which overestimated risk and hence delayed the release of real people. When I presented a Bayesian explanation of why this was not acceptable, the then senior advisor of research (I was a more senior operations manager), who did not even understand Bayesian thinking, had this eloquent reply to a very long critique: “Llew [me] is wrong”. This left my other manager-level colleagues a little gobsmacked.

      It is many years since I had that role. Since then the same department has had significant legal pressures placed on it to change the same practice, which it has partially done. Another colleague working on sex-offending risk prediction tools did use a Bayesian approach to try and tidy up the fundamental errors, with some success, but again with a lot of pushback from political-level managers.

      Yes, you may experience pressures and intimidation. But doing the right thing is even more important when it ultimately impacts on people’s lives, as any worthwhile research must do. I am sure you will find many like-minded and ethical people; they just may not be the doyens you look up to, necessarily. Change is never easy, but I still believe that good science and ethical practice win in the end — sorry if this sounds a bit like a homily — just encouraging you to persevere. https://rotoruapsychologist.wordpress.com/2016/03/18/slow-burning-research/

    • Neurostormer says:

      Raphael, a word of hope: I didn’t realize until reading this post how recent the “bad stats revolution” has been, especially on the timescales of academic careers. I imagine that part of the hostility you are seeing is because the “anti-method” people are feeling the squeeze, and are lashing out to protect their turf. But unless psychology departs the realm of science, the wave of progress can only roll one way and the path forward for “methods people” will only get easier in the future compared to how it is today.

      • Adversary says:

        I am also a graduate student and should graduate within a year. I work in a biomedical field. Much of the motivation for my dissertation work came from (1) the realization that there exist many “urban legends” in my field that are commonly discussed/invoked, but for which little empirical support exists (I have followed the citation trails, and they don’t end anywhere near what they are used to claim), (2) the realization that much of the empirical evidence in my field comes from underpowered studies with poor statistical practices and open-ended hypotheses, and (3) the feeling that very few people in my field want to solve problems, and would rather go in circles with open-ended experiments that produce vague results that give rise to open-ended hypotheses that inform more open-ended experiments.

        I discuss these problems in my dissertation papers, because they are truly the motivations for my project. So far, most of the feedback I have received from reviewers has consisted of (1) suggestions that I not focus so heavily on limitations and flaws in other studies, and (2) criticism that my results are not interesting or necessarily novel because they are focused on “old” questions that have already been studied. Other reviewers have written reviews that contain enough falsehoods that I am sure they did not read the paper. One review was simply an expression of disappointment at how not novel my study was.

        Colleagues tell me that I am too harsh, and that I should “think about the hard work that went into the studies I am criticizing, and how it would feel to have someone criticize your work that way”. I could not care less about how much hard work went into something if it is an obstacle to solving the problems that I care about. I could not care less about someone’s feelings being hurt because they were criticized for doing poor work. I spend time on PubPeer writing lengthy (but impersonal) reviews detailing why the desired conclusions can’t be drawn from the evidence presented in high-profile studies in my field.

        I’m coming to accept that I am an adversary, as Susan Fiske claims, because I am not in this to be successful or to gain status or to make money or to feel important. I am in this to solve problems, and I am committed to being an adversary of the culture that stands in the way of achieving that goal.

        Methods people are the martyrs of the scientific enterprise.

        • Anoneuoid says:

          >”I am in this to solve problems, and I am committed to being an adversary of the culture”

          Problem one is that they are still acting as obstacles. You are wasting all your time being “an adversary” rather than solving biomedical problems. You should just be able to ignore them.

          Problem two is that once you figure out what needs to be done to solve problems, you will discover the need to learn the skills to build the tools needed to do so. This will not happen instantaneously, so you will be seriously set back in your career, possibly even kicked out of the field, since most do not understand what type of stuff needs to be done to actually generate something useful.

          Problem three is that even if you find a way around number 2, by the time it is over your colleagues will have no idea wtf you are talking about. They are thinking in terms of significant p-values and excel spreadsheets, etc and will have no concept at all about what you are trying to explain to them.

          The culture is so unconducive to proper research that we should consider any biomedical grad student who can avoid generating an entire report of misinformation to have been a great success.

        • Keith O'Rourke says:

          I worked with this guy when he was doing his research fellowship https://en.wikipedia.org/wiki/David_Naylor and I did not understand why he put so much effort into placing earlier _famous_ work – work that was now clearly deficient – in a good light.

          Part of it was giving the benefit of the doubt that, at that earlier time and in that context, it might well have been good work.

          If you can put the criticism in terms of what has recently come to be understood as problematic, though understandably not appreciated at the time, you might well enable yourself to do a lot more in your career.

        • Rahul says:

          A part of this is asking what exactly is the problem you are trying to solve. Who’s your audience?

          Assuming it is other biomed researchers, are the majority just innocently clueless about the problems in these studies? Or are they intentionally choosing to ignore them, so as not to rock the boat? Or maybe they just don’t care?

          Sometimes I wonder if getting Fiske or Cuddy to admit they were wrong is the right goal. Are we fighting the wrong battle? That is, all smart professionals already smell out the bullshit and avoid it. Some of them smell the bullshit but prefer to feign ignorance because it serves their interests better that way.

          But is there really a cohort of professionals who are being innocently duped by this stuff?

          If you really care about this stuff, publishing more academic papers hardly helps. Perhaps what you should be doing is touring high schools and community halls etc. and preaching to the real victims, the ones who might actually innocently believe in power poses and get duped.

    • Anonymous says:

      Hi Raphael,

      I see where you’re coming from. I am a graduate student as well, and I spoke up about researcher degrees of freedom issues and p-hacking in a lab in my undergrad. I was promptly scolded and fired.

      There was probably a better way for me to have brought up these issues, but I was a 19-year-old experiencing, from a department chair, a lot of what Susan Fiske claims people were doing to her.

      It can be discouraging to hear from these people warning you about how “methods people” are seen. However, there are many schools that do not hold this attitude. Despite my experience at my undergraduate institution, I worked with many other professors that would never have used these questionable methods. As well, my graduate institution has a strong focus on methodological rigor, and my supervisors at both schools have been “methods people”.

      Although it is possible a lot of questionable work also comes from lower-tier schools, a lot of the people that have a hard time accepting what’s going on are in high-profile positions at high-ranking universities. I don’t think that’s a coincidence. Now, I’m not telling you to lower your expectations, but it’s possible that the people at the top want to hold on a lot harder to the past, whereas schools just below have to be constantly improving to be recognized. Or you can join me in Canada, not that we’re devoid of all of these issues.

  22. I want to make a comment on the timeline. The Bem paper was circulating and being widely discussed in late 2010, before Simonsohn et al’s false-positive psychology article (published in October 2011). I have always assumed that Bem was a major impetus for the FPP paper, though I’ve never actually asked any of the FPP authors.

    A major turning point in my own awareness was a blog post in January 2011 by Tal Yarkoni, which well predates Simonsohn et al. It made the argument that you could see indirect signs of little fudges all over Bem that were (are?) probably common and accepted practice. It was the first time I encountered an idea that is now in wide circulation: that Bem was probably not a product of fraud or one big error somewhere (which is what everyone was thinking at the time), but rather a demonstration of how easy it is for acceptable-seeming little fudges to have an outsized effect in combination with each other, making it fairly easy to produce an impossible result with a credible-looking analysis. False-Positive Psychology famously demonstrated the same thing, and the Garden of Forking Paths paper extended the idea. But I always remember Tal’s blog as my wake-up moment, showing me (and probably many others) the point first.

    http://www.talyarkoni.org/blog/2011/01/10/the-psychology-of-parapsychology-or-why-good-researchers-publishing-good-articles-in-good-journals-can-still-get-it-totally-wrong/

  23. H. Tailor says:

    This is an excellent ad feminem attack from the safety of male invulnerability. As a man, you may make grave statements and name names,
    and nothing will happen to you. But if a woman penned the very same words, knives would come out of every corner. And her career would be over.

    Surely you have read all the research concluding that women cannot afford to offend anyone. So you impugn Susan Fiske’s reputation, knowing full well that she cannot afford to use the same language and cynicism that you can. Shame on you.

    • Andrew says:

      H:

      I would have no problem if a woman were to pen the same words that I did. Or, I suppose, words with similar content. Indeed, it was Anna Dreber who told me about those failed replications of the power-pose studies, and I’ve just written an article with Hilde Geurts about the implications of the replication crisis for clinical neuropsychology. Susan Fiske is free to respond to what I wrote, either in the comments right here or in some other venue, perhaps PNAS. Really no problem at all, we should all be open. It happens that Fiske and her collaborators (one male, one female) made serious errors in their published data analysis, but I’ve made serious errors in my published data analyses too. Just as I feel free to point out where Fiske’s numbers don’t add up, she can feel free to read my research papers and point out any problems she finds with them. If you think that would end her career, then either you’re completely confused, or you know something that I don’t know about the employment contracts of tenured professors at Princeton.

      • Andrew, I think it’s not just about whether you can fire a tenured professor or not, it’s about whether a woman with the same level of criticism will be ostracized such that things will happen along the lines of:

        1) Losing prestigious editorships
        2) Being rejected for funding/grants
        3) Having bad responses from colleagues at conferences, inability to get collaborators
        4) Grad students / postdocs decide not to work for you
        5) Much more difficulty getting papers through anonymous review

        …etc

        I’m not saying H Tailor is necessarily right, but more that there are legitimate potentials to end someone’s career even without them losing their job and that probably a double-standard of behavior does exist in academia.

        • Andrew says:

          Daniel:

          I’m sure there are all sorts of double standards. What I was responding to was the specific remark by the commenter that “if a woman penned the very same words, knives would come out of every corner. And her career would be over.” The whole comment was ridiculous, but in particular it ignores all the women such as Bobbie Spellman who have been active in the psychology replication movement. Or if you want to consider a specific case of a woman who writes the same sorts of thing that I do, on a blog that’s kind of like mine, but still has a career, consider Dorothy Bishop: http://deevybee.blogspot.com

          So, yes, I agree completely with you on the general point, just not on the specific claim made by the above commenter, who implies that Fiske is somehow constrained in her possible responses here. Given what we’ve already seen from Fiske about terrorism etc., I don’t think she feels very constrained at all!

          • “So, yes, I agree completely with you on the general point, just not on the specific claim made by the above commenter,”

            That seems reasonable, particularly given your links to examples of women already doing this kind of thing. It is my experience though that a double-standard of women vs men in academia is in fact a big problem generally.

            • Martha (Smith) says:

              Daniel said, “It is my experience though that a double-standard of women vs men in academia is in fact a big problem generally.”

              Yes, there is a problem here. But, like every problem, we need to consider the details: How large of a problem is it? In which contexts? Have there been changes over time in the extent of the problem?

              My take (as a woman academic a few years older than Fiske) is that H.’s comments are exaggerations — especially currently. They would have been closer to the truth a few decades ago, but today, Roger’s comment (below), “we all know full well that she will not lose her endowed chair for that language and probably will not have any problems continuing her editorship, either,” applies.

              Another relevant factor that may not be widely known is that some tension arose in the eighties between (on one side) women in mathematics and the biological and physical sciences, and (on the other side) some (not all) women in psychology and sociology. Specifically, some (not all) women in the social sciences promoted the ideas of “women’s ways of knowing” and “feminist science,” based on an “intuitive” way of knowing rather than relying on evidence and logic. Many women in math and science reacted to this with an “And ain’t I a woman?” attitude. My impression (just based on what I have read in this blog) is that Fiske may have some attachment to the idea of “women’s science” that influences her view of science and prompts her to confuse valid criticism with sexist bullying.

        • Huffy says:

          What is your evidence for this double standard?

      • Belatedly come to this discussion. I realise that, as a woman I am an outlier in being outspoken, and also it is true that I am senior enough (and close enough to retirement) not to have to worry about consequences. But I do think sexism is a red herring here. We know women are generally less likely than men to engage in debate on social media and I don’t think this topic is any more gender-biased than any other.
        To me it is really quite simple: the most important thing is the science, not people’s careers. If we allow bad scientific practices to flourish, this has knock-on effects for those that come behind us. In the area I work in, developmental disorders, it can also affect the wellbeing of families affected by those disorders. I thank autistic researcher Michelle Dawson for continually reminding us of that on Twitter – we shouldn’t need reminding, but we do.
        I try to operate with the following rules: wherever possible, avoid getting personal; criticise the science, not the scientist. Basically we all have lots to learn and you should treat others as you’d like to be treated. But this precept takes second place to precept no. 1, which is that getting the science right comes first. If it is going to upset someone to draw attention to a serious flaw in their work, then try to do it as kindly as possible, but don’t hold back.
        But sometimes it is reasonable to get angry: that’s when you see people treating the whole thing as a game, where all that matters is publishing papers and getting grants, and you know that they are wilfully misleading others. Then I think we should call them out. But only if there is water-tight evidence that they are acting deceptively, rather than just lacking awareness or skills.
        I thank Andrew for his initiative in tackling the serious problems with reproducibility that are only now being recognised.

    • Anon– says:

      “So you impugn Susan Fiske’s reputation, knowing full well that she cannot afford to use the same language and cynicism that you can. Shame on you.”

      That strikes me as a pretty odd assertion / attempted shaming. Recall that Fiske seemed fine offending many people with her choices of language! “Methodological terrorism” etc. seem like language that at least matches Gelman’s.

      And it is pretty clear that Fiske has earned this reputation through her decisions not just as an author, but as an insider (and tenured professor!) in this field, such as being a member of the National Academy of Sciences.

    • It should be “ad feminam,” not “ad feminem.” This is a grave statement, and in making it I am probably digging my grave in terms of a career involving Latin. However, in that sense, I am not alone, as Latin has many graves, not least of which is “grave,” the neuter of “gravis.”

      More to the point: If intellectual critique had to go mum around vulnerable populations and personages, it wouldn’t be critique.

    • Roger Sweeny says:

      Andrew and Daniel, This is pretty obviously sarcasm. “She cannot afford to use the same language and cynicism that you can.” But her language was considerably more intemperate than Andrew’s. And we all know full well that she will not lose her endowed chair for that language and probably will not have any problems continuing her editorship, either.

      • Roger Sweeny says:

        I think he is also satirizing a certain style of bullying: “Historically, men have mistreated women. Therefore, in a dispute between a man and a woman, you must have a strong prior that the woman is being mistreated.”

    • Maz says:

      Surely you have read all the research concluding that women cannot afford to offend anyone.

      I’d like to see this research. What kind of study design? Sample size? Was it preregistered? Does it replicate?

      Given Fiske’s status in her field, I will respectfully argue that what you claim about her vulnerability is utter bullshit. It is precisely because Fiske is at the top of the traditional status hierarchy, with power to change the field for the better, that her defense of careerism against open science is so repugnant.

    • Bill Murdock says:

      Is no one getting the joke? (Or have I missed it in the thread that someone has?)

      “Surely you have read *all the research concluding* that women cannot afford to offend anyone.”

      What were the p-values and t-stats on those studies? ;)

      Let’s give Tailor a round of applause for slipping that one by you guys so quickly.

  24. Neuroskeptic says:

    Good post. Regarding my blog:

    “2008 also saw the start of the blog Neuroskeptic, which started with the usual soft targets (prayer studies, vaccine deniers), then started to criticize science hype (“I’d like to make it clear that I’m not out to criticize the paper itself or the authors . . . I think the data from this study are valuable and interesting – to a specialist. What concerns me is the way in which this study and others like it are reported, and indeed the fact that they are reported as news at all”), but soon moved to larger criticisms of the field. I don’t know that the Neuroskeptic blog per se was such a big deal but it’s symptomatic of a larger shift of science-opinion blogging away from traditional political topics toward internal criticism.”

    This is all true although I’d like to note that my very first post in 2008 was *both* aimed at a soft target (a downright crazy psychic lady) and also an internal criticism of science. As I wrote:

    “This sorry spectacle is more than just a source of cheap laughs for bored bloggers. Honestly, it is. It’s actually a fascinating case study in the psychology and sociology of science. McTaggart’s efforts to extract a positive result are far from unique – they are only marginally more strenuous than those of some respectable researchers.

    [the crazy psychic experiment with “positive” results] shows that if you look hard enough you can literally find any conclusion in any data set. All it takes is enough post hoc statistics and a willingness to overlook those parts of the data which don’t turn out the way you’d want. The problem is that in academic science, and especially in neuroscience and psychology, there is a strong pressure to do just that.”

  25. concerned scientist says:

    It is for reasons like those pointed out here that tenure, as we currently understand it, should be done away with, or at the very least heavily revised into a more sensible approach where researchers can and should be held accountable for their repeated mistakes. The hand-waving of tenured faculty at poor research should be inexcusable.

  26. Jochen Weber says:

    Thanks so much for this very detailed and thorough piece, Andrew!

    Almost all of the reasons for my own periodically over-boiling dissatisfaction with the output of academic science (particularly in psychology, but also other fields) are represented. Over the summer I read David Harvey’s take-down of capitalism, and the more I look around, the more often I see that financial (and related) incentives are close to the heart of why human endeavors in all sorts of areas, including science, fail:

    We have goals that are inherently incompatible with one another! For researchers those goals encompass the search for improved theories and better prediction of (or explanation of patterns in) data but also the desire for a successful career with many top-tier publications to boot…

    Unfortunately, true advances typically require much more care and deliberation, and with increased competition–a circumstance which, to this day, most people unquestioningly consider a guarantee that “the best will win”–people are being pushed further and further towards engaging in practices that actually bring out the worst in academia. Just as in health care, where providers are incentivized to propose (medically) useless procedures for the mere benefit of additional payment, researchers see themselves in a situation in which they can either quit (and keep their conscience relatively clean) or “cheat”.

    To what extent those who are “beyond reproach” and insist that their work is without flaws are engaging in this game consciously (i.e. they are actually aware of their role in pushing scientific outcomes in their respective fields to ever less reliable and replicable states) is hard to say.

    In my opinion, the incentive structure must change. No matter what methodological obstacles (multiple comparison correction, pre-registration, open-access reviews, etc.) are considered, in the end people who are willing to work less stringently or with less diligence will always be at a systematic advantage, so long as publishing bad work is rewarded rather than punished. I don’t have a good sense of how this could be achieved (other than by almost asking society at large to give up on money and the associated prestige and power as tools of differentiation).

    The one aspect in which I agree with Fiske is that social media and the speed with which (snarky, clever) messages self-replicate via “sharing” or re-tweeting seem to penalize careful consideration just as much as poorly regulated financial incentives. To the extent that attention-grabbing headlines are beneficial to readership and attention, a thorough statistical analysis of a paper will get less coverage than a “terrorist attack” with personal attacks. That is one of the reasons why Donald Trump can, without exception, rely on the media to cover each and every one of his insults: it makes for better spectacle…

  27. I would just like to say that I love this post and I want everyone to read it.

  28. Thomas Bowman says:

    For a particularly pertinent example of ‘researcher degrees of freedom’, see Paul Romer’s latest working paper on failures in macroeconomics.

    Excellent blog post. I feel the problem might be even worse in economics. When there are no data-generating experiments, there is no emphasis on replication.

    • Danny says:

      Romer’s paper is not about reproducibility or researcher degrees of freedom, but about identification and calibration issues. Essentially, he believes certain concepts in macroeconomics are poorly defined, often as merely the algebraic leftover of other (better understood) variables, without a concrete real-world grounding. This can lead to perfectly replicable but nonsensical results, in his view.

      • Danny says:

        As an addition, macro often has the *opposite* problem from what researcher degrees of freedom would produce. Kocherlakota has made this point – given how sparse macro data is, macroeconomic researchers should have many different models that all fit the data pretty well. That’s what you’d expect from sparse data. Instead, macro has one dominant paradigm (DSGE) that doesn’t actually fit the data very well. This should lead us to believe that the field has unjustifiably strong priors about the assumptions baked into the models, as well as pointing towards the identification/calibration issues listed above.

  29. Carl Shulman says:

    I might put Feynman’s 1974 speech on cargo cult science in the timeline:

    http://calteches.library.caltech.edu/51/2/CargoCult.pdf

  30. Larry says:

    I have to agree that I love the post and the comments. I also have to add, extraneously, that I love the use of Randy Newman’s lyrics to segment the post. One of my favorite albums, and one of my favorite songs.

  31. ex-social psychologist says:

    Former professor of social psychology here, now happily retired after an early buyout offer. If it weren’t so painful, it would almost be funny how history repeats itself: This is not the first time there has been a “crisis” in social psychology. In the late 1960s and early 1970s there was much hand-wringing over failures of replication and the “fun and games” mentality among researchers; see, for example, Gergen’s 1973 article “Social psychology as history” in JPSP, 26, 309-320, and Ring’s (1967) JESP article, “Experimental social psychology: Some sober questions about some frivolous values.” It doesn’t appear that the field ever truly resolved those issues back when they were first raised; instead, we basically shrugged, said “oh well,” and went on publishing by any means necessary.

    I’m glad to see the renewed scrutiny facing the field. And I agree with those who note that social psychology is not the only field confronting issues of replicability, p-hacking, and outright fraud. These problems don’t have easy solutions, but it seems blindingly obvious that transparency and open communication about the weaknesses in the field (and in individual studies) are a necessary first step. Fiske’s strategy of circling the wagons and adhering to a business-as-usual model is both sad and alarming.

    I took early retirement for a number of reasons, but my growing disillusionment with my chosen field was certainly a primary one.

  32. Brad Stiritz says:

    Andrew, thanks for this brilliant piece of history / opinion. Just one minor item to perhaps reconsider:

    >Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to.

    The analogy is quite apt IMHO. She wants people to believe the company (OPC = Old Paradigm Co) is worth say $100/share. You are like a short-seller going public with an argument that OPC is effectively bankrupt. In this case, OPC shares are actually worth pennies each. So you would be precisely correct: “there’s no one for her to sell her shares to”, because presumably no one is “buying” the old paradigm anymore.

    >What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on.

    Yes, exactly. As you say, perhaps she’s just too personally “invested” in OPC. Like they say, “Get married to a person, not to an idea or a stock position”

  33. joe says:

    Methodological terrorist is not the preferred nomenclature. Methodological freedom fighter, please.

  34. Jim Cox says:

    Scrutinizing the validity, reliability and generalizability of scientific work is an inherent part of the process. We shouldn’t give up our scientific objectivity just because someone’s feelings might be hurt. Conversations in pizzerias and taverns have never been peer-reviewed. The only thing different now is that casual conversations have a wider audience.

  35. Fernando says:

    Andrew:

    I would add Langmuir’s (1953!) talk to your timeline https://www.cs.princeton.edu/~ken/Langmuir/langmuir.htm

    Especially the section “Characteristic Symptoms of Pathological Science”. I quote:

    Symptoms of Pathological Science:

    1. The maximum effect that is observed is produced by a causative agent of barely detectable intensity, and the magnitude of the effect is substantially independent of the intensity of the cause.
    2. The effect is of a magnitude that remains close to the limit of detectability; or, many measurements are necessary because of the very low statistical significance of the results.
    3. Claims of great accuracy.
    4. Fantastic theories contrary to experience.
    5. Criticisms are met by ad hoc excuses thought up on the spur of the moment.
    6. Ratio of supporters to critics rises up to somewhere near 50% and then falls gradually to oblivion.

    I am not as optimistic as you appear to be that the current flood will change everything. I grant you that technology does make this new crisis different, and change more likely. But it’s hardly a foregone conclusion. This is how I put it during a recent interview:

    Q: When did your interests in academia transition into entrepreneurial endeavors?

    A: When I realized that being an academic is not necessary for being a scientist. Modern academia is a little bit like the pre-Reformation Catholic Church: cut off from society by unnecessarily complex language, and too preoccupied with its own internal affairs and individual career progression. Not surprisingly, most published research is false.

    So I decided to abandon the Church, and embrace the Reformation. As with the printing presses then, so it is with technology now. New semantics, ubiquitous software, cheap computing, and innovations like Open Source licensing and crowdfunding will enable new scientific practices that are more accessible *and* reliable.

  36. There’s also a political/ideological dimension to social psychology’s methodological problems.

    For decades, social psych advocated a particular kind of progressive, liberal, blank-slate ideology. Any new results that seemed to support this ideology were published eagerly and celebrated publicly, regardless of their empirical merit. Any results that challenged it (e.g. by showing the stability or heritability of individual differences in intelligence or personality) were rejected as ‘genetic determinism’, ‘biological reductionism’, or ‘reactionary sociobiology’.

    For decades, social psychologists were trained, hired, promoted, and tenured based on two main criteria: (1) flashy, counter-intuitive results published in certain key journals whose editors and reviewers had a poor understanding of statistical pitfalls, (2) adherence to the politically correct ideology that favored certain kinds of results consistent with a blank-slate, situationist theory of human nature, and derogation of any alternative models of human nature (see Steven Pinker’s book ‘The blank slate’).

    Meanwhile, less glamorous areas of psychology such as personality, evolutionary, and developmental psychology, intelligence research, and behavior genetics were trundling along making solid cumulative progress, often with hugely greater statistical power and replicability (e.g. many current behavior genetics studies involve tens of thousands of twin pairs across several countries). But do a search for academic positions in the APS job ads for these areas, and you’ll see that they’re not a viable career path, because most psych departments still favor the kind of vivid but unreplicable results found in social psych and cognitive neuroscience.

    So, we’re in a situation where the ideologically-driven, methodologically irresponsible field of social psychology has collapsed like a house of cards … but nobody’s changed their hiring, promotion, or tenure priorities in response. It’s still fairly easy to make a good living doing bad social psychology. It’s still very hard to make a living doing good personality, intelligence, behavior genetic, or evolutionary psychology research.

    • Carl Shulman says:

      Geoffrey Miller:

      “Meanwhile, less glamorous areas of psychology such as personality, evolutionary, and developmental psychology, intelligence research, and behavior genetics were trundling along making solid cumulative progress, often with hugely greater statistical power and replicability (e.g. many current behavior genetics studies involve tens of thousands of twin pairs across several countries).”

      My impression is that evolutionary psychology has not been a beacon of statistical power and replicability, e.g. the ovulation-and-voting and Kanazawa studies Andrew mentions above are evolutionary psychology. Also, while family studies have been large, there was an era (still ongoing, although it is now being displaced by the high quality work you mention) of underpowered and almost wholly spurious candidate gene studies in the genetics of human behavior.

      • Maz says:

        I agree that lots of shaky stuff gets published in evolutionary psychology. Maybe developmental psychology, too. As for candidate genes, it’s true that that paradigm failed, but it failed because it turned out that study results could not be replicated. The fields involved responded to this fairly rapidly by increasing sample sizes and performing GWASs with stringent significance thresholds, and the results have been good.

        I would definitely agree with Miller’s larger point that the fields of psychology that have contributed most to our understanding of human behavior are underfunded and underappreciated compared to the flashy fields like social psychology that have contributed little.

    • Steve Sailer says:

      IQ testing is of course quite replicable.

      And yet, I’m struck by how little follow-up there has been among mainstream left-of-center social scientists regarding perhaps the biggest, most unexpected pro-Blank Slate social science empirical discovery of the late 20th Century: political scientist James Flynn’s uncovering of the Flynn Effect of broadly rising raw test scores on IQ tests around the world.

      This remains an important and fascinating topic, yet I’m not aware of much momentum toward exploring it, much less explaining it.

      • We’ve also had broadly rising health, nutrition, and education in the past 100 years, so at first pass it seems like not much explanation is really needed. At second pass, it’d be interesting to see how those factors correlate in time series, especially across countries where improvements in health and welfare may have come at different times.

  37. Justin says:

    Andrew,

    Thank you for such an outstanding and detailed post. This is precisely why your blog is required reading for me.

    If you ever revise the timeline, one article you might consider adding is:

    Franco, Annie, Neil Malhotra, and Gabor Simonovits (2014). Publication bias in the social sciences: Unlocking the file drawer. Science 345: 1502-1505.

    They document the extent of publication bias in social science experiments from TESS.

  38. You’re doing god’s work here. I hope you’re feeling ambitious about pushing hard to change the field.

    I would guess that the field won’t end up changing very much without deliberate, strategic efforts to change it.

  39. Shravan says:

    One key problem that prevents communication between statisticians and social psychologists is that the latter don’t know anything (or know hardly anything) about statistics. They just use it, the equivalent of pushing buttons in SPSS. How can one get the criticism across in such circumstances?

    In psycholinguistics the situation is similar. In most cases, the sole purpose of fitting a statistical model to data is to confirm something they already believe. If the data don’t provide the evidence they want, they will be made to. Fiske, Cuddy, Tol, etc. suffer from the same problem. They will, magically, never publish evidence against their own theories. Any idea they ever had will only find confirmation in data.

  40. Jacobian says:

    I think that very few people are comfortable with the idea that we may actually just know jack in social psychology. Least of all tenured social psychology professors. There is no law guaranteeing humanity an encyclopedia of reliable knowledge on a subject just because we have been studying this subject for seven decades. Science is hard. Perhaps generating any reliable knowledge in social psychology requires a truly extraordinary effort: exquisitely designed studies with thousands of participants and the statistical analysis planned out months in advance by expert statisticians. Perhaps instead of 1000 social psych professors, we must spread the funds among only 100 to get any results worthy of the name.

    If that is so, then not only are most past results suspect, but people committed to the old style of research (a sample of 40 psych undergrads and 10 hypotheses tested at p=.05) are guaranteed not to discover any real results for the entire rest of their careers. It’s hard to admit a mistake when you feel there may be nothing left to rely on. I’m not excusing Fiske et al., just suggesting that we shouldn’t be so surprised at their recalcitrance. Their careers past and future may be at stake.
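
    A quick way to see the arithmetic behind that parenthetical is a small simulation (a sketch with assumed details, not anyone’s actual study: two groups of 20, ten independent outcomes, and no true effects anywhere):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n_sims, n_hypotheses, n_per_group = 5000, 10, 20   # 40 undergrads, two groups of 20
      false_alarms = 0
      for _ in range(n_sims):
          # every null is true: both groups are drawn from the same distribution
          p_values = [stats.ttest_ind(rng.normal(size=n_per_group),
                                      rng.normal(size=n_per_group)).pvalue
                      for _ in range(n_hypotheses)]
          false_alarms += min(p_values) < 0.05
      print(false_alarms / n_sims)   # roughly 0.4, i.e. about 1 - 0.95**10

    Even with nothing there at all, such a lab “finds” something at p < .05 in roughly two out of every five attempts, before any flexibility in the analysis is added on top.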

    • Steve Sailer says:

      One possibility is that much of what social psychology attempts to study (e.g., priming) might be fundamentally ephemeral, and thus, even if the field were to massively improve its methodologies, often unlikely to replicate reliably.

      Human beings, and especially college students (the main source of subjects in Psych Dept. studies), are prone to taking up behavioral fads and then dropping them. Moreover, some people are better at eliciting faddish behaviors (e.g., rock stars) than are others.

      By way of analogy, the more I’ve looked into the Flynn Effect of rising raw scores on IQ tests (a generally much more replicable part of psychology), the more I’ve realized the past is a different country. Why were raw IQ scores lower in the past?

      I dunno. I’ve got lots of theories but not many ways to test them.

    • Nick says:

      I regularly make your suggestion (5 or 10 times bigger samples, 5 or 10 times fewer labs and, hence, PIs) to psychologists. I’m surprised how favourably most of them respond. Maybe they all imagine that they will be among the survivors.

  41. dmk38 says:

    In the end, the issues here will be resolved not by what people say but by what they do.

    For my part, I’m confident that they’ll only do more and more of the sorts of things that come under the broad umbrella of “post-publication peer review,” including blogs. Indeed, I’m confident that traditional “peer review” will become less & less central to the filtering process that empiricists, in all fields, rely on to identify the best work. I’m confident of that b/c the value of what Andrew & others are *doing* is so obvious.

    There is tremendous value in saying why these are important things to do. The value, though, lies not in the contribution such words make to winning a debate w/ Fiske or anyone else (the contest is so lopsided that calling attention to that is not worth the time) but in the role they play in forging a shared understanding among those who are acting in concert to create a better empirical scholarly culture.

    So if you agree with Andrew, for sure say so.

    But then go out & do what he is doing. And do it & do it & do it some more. In the competition of doing, who is right will be clear to anyone who is using reason to judge.

  42. Dr. Dwayne Elizondo says:

    In case you haven’t seen it, Kahneman realized something was off with priming and had stern words for the field: http://www.nature.com/polopoly_fs/7.6716.1349271308!/suppinfoFile/Kahneman%20Letter.pdf.

  43. Keith O'Rourke says:

    Just to stir things up (as if this is needed).

    In the Statistical Society of Canada’s Code of Ethical Statistical Practice are these two statements.

    1. While question and debate are encouraged, criticism should be directed toward procedures rather than persons.
    2. Avoid publicly casting doubt on the professional competence of others.

    The first, I believe, almost everyone would agree with.

    The second seems more problematic: blogs are public, and wouldn’t valid criticisms of someone’s work and position often imply a lack of competence?

    • Eric Rasmusen says:

      “criticism should be directed toward procedures rather than persons.” I think I would disagree with this as a general principle. Sometimes it is very socially useful to point out that someone is a charlatan. Mistakes in one article aren’t enough to establish this, but pointing out a pattern in someone’s work is fair game. In fact, it needs to be heavily encouraged, since we academics are nice and timid people who usually aren’t willing to say the emperor has no clothes.

      • Curious says:

        The question is whether this method of public shaming advances the goal of improving research better than one that focuses on procedures. Targeted attacks result in the opposition digging in, not rolling over. That is simply human nature. Assumptions otherwise are simply naive. And while shaming someone for their foibles may bring a sense of fulfillment or righteous indignation, it does not actually advance the cause of improvement.

        • Eric Loken says:

          Agreed that public shaming just to feel self-righteous is bad news, and unproductive. The problem though is that there is no uniform calibration of “value” of published work. When the housing market went south, all house prices were affected in a common way (not identically of course, but the same correction mechanism was in play). As we enter a recalibration of the value of published research, it has the appearance of playing out on a case-by-case basis and therefore seems targeted and arbitrary. It’s as if the housing market corrected by first dropping the value of a few houses in the neighbourhood at a time. Hey, what do you guys have against MY house? is the natural reaction. That it plays out this way is unfortunate, and Fiske is right that it can be damaging.

          But where is the value of published research established? Fiske wants to say it was established in peer review. The house had a sale price, it was inspected… that’s the value, and don’t show the poor taste of challenging it. That’s an understandable response. But we now know that the peer-reviewed literature is far shakier than presumed. It’s over-valued, and we know this from the Meehl argument, the Ioannidis argument, failed replications, and unfortunately from what comes off as a few drive-by shamings that show the weaknesses of individual projects. “Trust the ratings agencies” (peer review) is not a great response considering where we have come from and how we got here.

    • Alex says:

      Is there a way to question someone’s procedures without questioning their professional competence? I’m trying to think of an example where you can say something like “this was done poorly” that couldn’t be taken as an indictment of the person’s competence. Does the Code give an example?

  44. Anonymous coward says:

    Do you know the Soviet definition of constructive criticism? It is criticism that does not involve tried senior executives (Solzhenitsyn, First Circle, ch. 78).

  45. Eric Rasmusen says:

    What I find surprising is the pride of some academics: their unwillingness to admit that one of their papers has a fatal flaw. It wouldn’t be surprising if that was their only publication, but why do these big-name people get so touchy? It’s like CEOs who lie about which college they went to; they lie about things that would have trivial impact on their reputations, or whose admission might even enhance them. Prof. Gelman, for instance, admits he’s written papers which are worthless, and we don’t think the less of him. If we cut his vita in half, he’d still be considered a top scholar, but if he were caught defending the indefensible even once, he’d be a laughingstock. So he doesn’t (even aside from having good principles).

  46. Steve Morgan says:

    I am an infrequent commenter on your blog, but I do weigh in when it is important. This is important. I don’t know enough about social psychology to appreciate all of the history and infighting that is occurring, but it is absolutely clear that this replication debate has been an important episode. Even for methodologists. Lots of people have understood the problems created by data snooping and all-too-convenient choices of covariates, and of course the silliness of .05 NHST. But those of us who don’t typically collect our own data have thought too little about the havoc created by self-serving choices of when to stop collecting data.

    On the Fiske piece, I think you have said what needed to be said. I very much appreciate your work on these issues.

  47. PjoombE says:

    “I’m looking at the river but thinking of the sea…”
    Great analysis, with equally great Randy Newman headers!

  48. Manoel Galdino says:

    About the timeline, maybe Ed Leamer’s paper should also be included?
    “Let’s Take the ‘Con’ Out of Econometrics,” by Ed Leamer. The American Economic Review, Vol. 73, Issue 1, (Mar. 1983), pp. 31-43.

  49. Michael J. says:

    I’m not sure it’s significant enough to belong on your timeline, but an article which was passed around in my social circles in early 2014 and which brought the issue to my attention was Slate Star Codex’s The Control Group is Out of Control, mostly on the Bem article mentioned:

    http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/

  50. Mark Fichman says:

    This has been an excellent discussion. The piece of the history that is missing from Andrew Gelman’s excellent presentation is meta-analysis. The recognition of the inherent noise in individual studies, identified tellingly by meta-analysts in many fields including psychology, was where I learned to suspect individual studies. The version I was first exposed to that nailed the issues being discussed here was the first chapter of Schmidt and Hunter’s `Methods of Meta-Analysis’, which demonstrated `completely convincingly’ how multiple low-powered studies would generate precisely the pattern of research findings and conflicts found in applied psychology, particularly personnel psychology. It is a perfect forecast of where social and applied psychology has ended up, even with no errors and no malfeasance. It is a must-read. Here is the exact cite:

    Schmidt, Frank L., and John E. Hunter. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings, 3rd edition.

    They have used the same first chapter in all their editions. It captures much of the problem in 10 pages. I use it as a demonstration in classes. Always works.
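
    For anyone who wants to see the mechanism rather than take the chapter on faith, here is a minimal sketch in the same spirit (my own assumed numbers, not Schmidt and Hunter’s): one fixed true effect studied by twenty honest, low-powered studies.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      true_effect, n_per_group, n_studies = 0.3, 30, 20   # one real effect, small honest studies
      significant = 0
      for _ in range(n_studies):
          control = rng.normal(0.0, 1.0, n_per_group)
          treatment = rng.normal(true_effect, 1.0, n_per_group)
          significant += stats.ttest_ind(treatment, control).pvalue < 0.05
      print(f"{significant} of {n_studies} studies reach p < .05")   # typically only about 3-6

    Every study here is conducted and reported perfectly, yet a narrative reviewer would see “conflicting findings” and start hunting for moderators, which is exactly the pattern the chapter describes.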

    • Keith O'Rourke says:

      Mark:

      I really doubt that meta-analysis should be in this history, which I believe is about how non-replication came to be understood as a widespread and big problem by a large percentage of those in any given discipline, and even by the wider public.

      Now, I have written something on the history of meta-analysis in clinical research http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2121629/ and, though meta-analysis may vary by discipline, I have met and talked with many of the early writers of the 1980s. I don’t remember any of them saying it’s great how most people are getting what meta-analysis’ real purpose is (not just increasing precision by pooling) and what it can do for disciplines (encourage and maintain quality research efforts and reporting); usually it was the opposite: why don’t they get why it is important? And the last time I talked to Nan Laird, she said something like: meta-analysis topics are difficult to publish, not much of a market for them.

      So, a subset in a discipline do learn about meta-analysis and a smaller subset will actually do it. Of the subset doing it, some will do it well and notice lots of problems with published papers and the inexplicable variation of their findings.

      But for most in the discipline it might just suggest that weighted averages of estimates will fix everything just fine :-(

      Also, for psychology this might have been true, and may still be:
      “These [clinical meta-analysis] publications tended to emphasize the importance of assessing the quality of the studies being considered for meta-analysis to a greater extent than the early work in social sciences had done. They also emphasized the importance of the overall scientific process (or epidemiology) involved.”

  51. Schenck says:

    ” but also through responding to constructive adversaries”

    Perhaps it’s because I’m reading Mayo’s Error & Growth, but this sounds very much in keeping with what she (or rather, on her reading, Kuhn) describes as characteristic of non-sciences. As in astrology having conflicting schools within it, all able to deeply criticize each other, largely on the fundamentals, but with no way to learn from error and criticism. The top-down way of doing things can argue and respond to the replication crisis, but it can’t learn from it and adapt. And some of its results, like the spurious papers, read like astrology readings: believable; vague-but-somehow-personal.

    • Mayo says:

      Schenck: Good point; it’s mere critical discourse that can scarcely be used to advance the theory domain. I feel that my suspicion that the ironies in the way psych was dealing with its replication crisis would undermine the effort isn’t far off https://errorstatistics.com/2014/06/30/some-ironies-in-the-replication-crisis-in-social-psychology-1st-installment/

      From the start, they should have considered whether they were willing to find that whole domains of inquiry, measurements, and experimental practices are shoddy, poor science, or pseudoscience.

      In some cases, instead (in vintage Kuhnian form), we might see the smartest practitioners staying on top by cleverly revising accepted methodological practices (so as to appear to be engaging in genuine critical discourse).

  52. Frederic Bush says:

    I think you and Fiske are talking past one another here. Your methodological critique is very strong, but you basically dismiss out of hand the stronger parts of her argument: her claims that people are getting harassed in totally unacceptable ways, and that unfiltered comments can promote this harassment and in general pollute the internet. Why do you find this unlikely? You don’t have to look far to find websites whose comments are a cesspool (there is a reason why more and more sites are getting rid of them), and there are lots of people who have been forced off the internet and sometimes into hiding due to harassment and death threats from internet hordes. (I must say, so far the commentary I see here is unusually civil and informative, so perhaps you have your own small, biased sample to judge from.)

    Your argument, that this is about ethics in methodology, is true for you and other prominent bloggers and researchers, but that doesn’t mean that trolls aren’t using it as an excuse to harass people.

    • Andrew says:

      Frederic:

      I’m as bothered as anyone by trolls etc., and I’d’ve had no problem if Fiske had written an article about trolls, abuse of communication channels, etc. (ideally with some examples). But this has nothing to do with replication. These are two unrelated topics! What Fiske seems to be doing is conflating the replication movement, which she doesn’t like, with all sorts of “terroristic” behaviors which none of us like. I’d prefer for Fiske to write two articles; then I could say I agree with her article about bad behavior and I disagree with her argument about scientific criticism.

      • Anonymouse says:

        Andrew: It’s worth noting that her article only mentions replication once, and favorably.

      • Anonymouse (2) says:

        Andrew: I would also add that, there is a risk here of what I believe you referred to as second-order availability bias — either on your part, Fiske’s, or both.

        You argue that Fiske inappropriately conflates trolls with people who consider themselves ‘data cops.’* However, there may be good reason for her to do so. If methodological rigor is independent of trolling then we would expect a similar proportion of trolls in the methodological rigor camp as in the general population. If we further intersect that with people who post online, we should expect that proportion to be larger still.

        From your perspective, people who are concerned with methodology are constructive critics and not adversarial with regards to the science. Indeed, it is likely that you surround yourself with like-minded, constructive critics. You may avoid the trolls deliberately (I would) or at least, not seek them out. Moreover, they have little reason to seek you out. Intuitively, it would be easier to underestimate the correlation (thinking, perhaps, that methodological critics are less trollish than the population at large). In contrast, we have good reason to expect that Fiske is not surrounded by people like you (otherwise, we might expect that she could have been guided to better research practices earlier in her career or have had some gentle guidance now). The methodological critics who do approach her, however, are more likely to be trolls than the population average. So people in Fiske’s camp will be likely to see a correlation between critics and trolling; whereas constructive critics are less likely to see a correlation (or are more likely to see a negative correlation).

        Another commenter made an argument about this being an ‘ad feminam’ — which I disagree with completely. On the other hand, I believe there is, perhaps, an appropriate connection to be made to the struggles that women often face. Specifically, people who complain about a problem are often viewed as doing it to stir up attention. (Some) well-meaning men in the workplace doubt that sexism exists because they and their friends personally aren’t sexist. They look at the evidence around them and it just doesn’t hold up. But they don’t realize that their sample is biased. Well-meaning methodological critics may believe that trolling doesn’t exist because they likewise do not see it in their circles. That said, I don’t think we should be so quick to dismiss it when people suggest that it might be a problem.

        *[You said ‘replication’, but, as noted previously, Fiske never explicitly makes the connection. Perhaps the juxtaposition is enough, but I’ll just report what she stated here.]

    • Simon Byrne says:

      While I agree that abuse and trolling are unacceptable, Fiske takes a much stronger stand in her letter, wanting the only public discourse to be through “monitored channels”, presumably with her and her peers doing the monitoring.

      • jrkrideau says:

        I have followed the citation trails, and they don’t end anywhere near what they are used to claim

        One of the things I have learned from reading secondary sources on historical cooking is that you should never trust a secondary source that does not include the primary, since you have no way of knowing what liberties the author may have taken in his “interpretation” of the recipe. David Friedman http://www.daviddfriedman.com/Medieval/To_Milk_an_Almond.pdf

        It is not clear to me that some researchers ever look at the original papers they cite; at most, they may have read a (potentially misleading) abstract.

    • Ed Snack says:

      Actually, I think that is an urban myth. There are perhaps a small number of sites that can be involved with such harassment, but generally the harassment runs almost entirely the other way. The establishment (Fiske being a prime example) uses its position to harass, in unacceptable ways, those who wish to criticize obvious methodological and data flaws.

      You want personal ad hom cesspools, try the pro-social research and politically committed websites. Example, try the foul personal attacks in the Climate Change supporting sites like “Skeptical Science” or “Realclimate” or “Tamino”, and the numerous and often successful attempts to have those who publish against the “scientific consensus” removed from positions and have their research unable to be published. The establishment is by far the largest source of such harassment, and by far the more successful at it.

      That isn’t her strongest argument, it’s her strongest method of getting her own way.

  53. Frederic Bush says:

    It really does seem to me to be an article about trolls, abuse of communication channels, etc. There are two cheap shots at “methodological terrorism” and “data police” but that is it for her engagement with the theoretical issue as far as I can tell.

    • Lee Jussim says:

      Andrew, this is a masterpiece. I am teaching an advanced undergrad course, The Psychology of Scientific Integrity, and had the class come to a screeching halt in order to read, consider, and critically evaluate both Fiske’s essay and your response here. Thank you. I recently posted the course syllabus here:
      https://www.psychologytoday.com/blog/rabble-rouser/201608/psychology-scientific-integrity-undergraduate-syllabus

      Frederic: One of the many exasperating (to me) aspects of Fiske’s essay is that she does not provide a shred of evidence, not a single example, of any of the things she is complaining about. I realize that does not mean they do not exist, maybe there are zillions, and my not knowing what she is writing about reflects my own ignorance.

      You wrote that the internet can be abused. Of course it can. But her essay was not a critique of internet blogs writ large. It was focused on those critical of psychological research. Which leads to my questions for you (or anyone else):

      1. Can you point to a single blog by any psychological researcher, or even any blog by any scientist with a phd and at least a couple of publications, that has constituted a personal or ad hominem attack, being a “methodological terrorist,” a “destructo-critic,” used “smear tactics,” has had a “chilling effect on discourse,” or warrants being called “vicious”?

      2. Do you (or anyone else) know of a single person who left the field because they were (or felt) harassed by bloggers? Or who chose not to come up for tenure?

      Nearly all of Andrew’s points resonated with me, but this one is particularly relevant here and worth repeating:
      “Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.”

      IDK how many people this describes. However, I strongly suspect that it describes far more than those driven from the field by critical bloggers.

      • Shravan says:

        Lee, if I were Amy Cuddy, I would definitely be feeling personally attacked, especially if she was denied tenure. I got the feeling that Fiske’s article was about her former student being denied tenure. It would have been good if she had been more specific, I guess.

        • Lee Jussim says:

          Shravan,

          1. I agree that that is probably some of what Fiske is responding to.
          2. The fact that someone “feels” attacked, however, to me, provides lots of evidence of their subjective experience, and none about whether they were actually attacked.
          3. I think many of our colleagues have an extraordinary tendency to feel extremely defensive when their pet phenomena or studies are appropriately and scientifically criticized, so “feelings” of being attacked do not mean much to me.
          4. To me:
          Personal attacks are: “Smith is an ________” (fill in pejorative of choice: asshole, fraud, charlatan, incompetent).
          Scientific criticisms are: “Smith (2015) reported that hawks weigh more than geese, which does not follow from Smith’s data showing no difference” or “Smith (2015) reported that ducks weigh more than geese, but mis-coded his species; in fact, Smith’s data show that geese weigh more than ducks.”

          If Smith gets defensive about this, feels “personally attacked,” and decides to leave the field, that is Smith’s problem; it is not a problem with the critiquer.

          So, my question stands: Can you (or anyone reading this) produce one or more specific cases in the blogosphere, NOT where someone criticized “felt” attacked (yes, there are ample cases of people “feeling” attacked), but where any bona fide scientist published a personal attack (as defined above) on some other researcher? Thanks.

          Lee

          • Carol says:

            From THE ODYSSEY ONLINE, March 21, 2016: “She [Cuddy] mentions in the last few minutes, while being interviewed by Will Cuddy (same last name but no relation) about her recent opportunity to accept a position with tenure at Harvard Business School. She ended up turning down the offer due to her desire and passion to reach broader audiences.”

            Extraordinary.

          • Carol says:

            Hi Lee,

            The blog article by Dan Graur, who posts as “Judge Starling” and is a PhD biologist at the University of Houston, on Barbara Fredrickson’s 2.9013 positivity ratio research, could be considered an example of a personal attack. See

            judgestarling.tumblr.com/post/60725232226/barbara-fredrickson-positivity-con-artist

            The “Ma’am” was originally “Bitch.”

            • herr doktor bimler says:

              Fredrickson’s career is an interesting phenomenon. The 2.9013-positivity-ratio paper was fraudulent… Losada made it all up, Fredrickson went along with it and published a high-profile book promoting the fraudulence. Then there was her PNAS paper, “A functional genomic perspective on human well-being”, which is undiluted statistical incompetence and over-fitting. And the vagal-nerve upward-spiral stuff.
              Yet she remains paramount in the positive-psychology field.

              • Carol says:

                Hi Herr Doktor Bimler,

                Barbara Fredrickson has some very powerful people (e.g., Martin Seligman, former President of the American Psychological Association) who both support her *and* denigrate (sometimes openly, sometimes not) to the positive psychology audience the efforts of people like Nick Brown, who has been the point man for several of the critiques of her work. I applaud Nick for his courage and persistence.

            • Carol says:

              Hi Lee,

              Also, on the Facebook group PsychMeth, Ulrich (Uli) Schimmack, in a critique of an article by Wolfgang Stroebe entitled
              “Are Most Published Social Psychology Findings False?” wrote “So, Stroebe is not only a liar, he is a stupid liar, who doesn’t see ….”

              • Lee Jussim says:

                Carol et al.,

                One of my grad students just showed me the Psych Methods Facebook discussion of Fiske’s essay. Although I still think Fiske is about 95% wrong, and Andrew almost 100% right, I find myself compelled to acknowledge *some* validity to Fiske’s essay. The discussion there of controversial topics, such as the Fiske essay, gave me a general feeling of “Eeeeeewwwww.” I am not saying everything there is angry or harsh; it is not. There is lots of good stuff there. Very few posts are smoking guns of Unvarnished Evil. But the tone regarding Fiske’s essay came across to me as very unpleasant (both the pro and anti sides were this weird mix of calm, reasonable points and stuff that I cringed at).

                I did not see or try to track down the Schimmack post, nor do I find it hard to believe (even though I admire Schimmack’s scholarly work and his R Index blog). One-line posts lend themselves to harsh zingers.

                It felt like a rabbit hole I did not want to go down and, when I did, just a little bit, I felt like I was in a rancid swamp. Now, lots of great wildlife lives in swamps, I just do not want to spend much of my time there, even if I occasionally want to check it out from a distance.

                Bottom line: So this search for evidence has changed my view, a little bit. Even though my view of Andrew’s essay as masterful remains unchanged, I no longer think Fiske is 100% wrong. I now think more like 90% wrong. Her invective and insults are still wrong. Her allusions without a single example are problematic. Her piece remains an exquisite example of the old ways of doing psychological science crumbling. I have yet to find a single psych blogger whose work is problematic.

                But I take the Facebook Psych Methods group posts as likely just one example of other dysfunctional communications. I would now guess that Facebook and Twitter lend themselves to exchanges that are snarky, or worse. The 10% valid point includes those and similar venues.

                Lee

              • Ulrich Schimmack says:

                Dear Carol,

                You quote a conclusion that is based on an extensive substantive review of the article.

                Stroebe makes two claims.

                1. The typical study in psychology has modest power (~ 50%).
                2. There is no evidence that publication bias contributes to the replication crisis.

                These two claims are inconsistent. How can we explain 95% success rates in published articles with 50% power?

                You may disagree with my tone, but substantively the claim that there is no evidence for the presence of publication bias is untrue (see, e.g., Sterling et al., 1995). Moreover, Stroebe has repeatedly made similar statements suggesting that we do not have a replication crisis, when the success rate of replication studies in social psychology was only 25% in the OSC replication project. I think it is important to put my quote in the proper context: as a conclusion about statements that are internally inconsistent and inconsistent with the empirical evidence of publication bias.
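
                A back-of-the-envelope sketch of that arithmetic (assumed numbers only): even if every tested effect were real and each study had 50% power, a 95% published success rate would require discarding nearly all non-significant results.

                  # published success rate = power / (power + (1 - power) * f),
                  # where f is the fraction of non-significant results that gets published
                  power, observed_success = 0.50, 0.95
                  f = power * (1 - observed_success) / (observed_success * (1 - power))
                  print(round(f, 3))   # about 0.053: roughly 1 in 20 null results ever appears in print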

                Sincerely, Ulrich Schimmack

            • Lee Jussim says:

              Carol,

              Thanks for pointing to that Judge Starling blog. It definitely meets my criteria for insulting and inappropriate, and its tone is consistent with Fiske’s objections.

              However, I am about 95% sure that it is irrelevant to Fiske’s essay for several reasons:
              1. Biologists’ blogs do not usually have much traction in most of psychology; I would be very surprised if Fiske had even heard of this guy.

              2. Fiske’s essay is to be published in the flagship magazine of the Association for Psychological Science. As such, her audience is psychologists, not biologists, physicists, or historians. She refers to social media and blogs without the qualifier “psychology,” but I have little doubt that her main, and perhaps exclusive, targets are psychological blogs and perhaps some generalist/popular blogs such as Neuroskeptic.

              3. I doubt that a single whacked diatribe would inspire an essay such as Fiske’s. She repeatedly uses plural, as in “methodological terroristS” and “destructo-criticS.” Fiske is an excellent writer. I doubt the plural was used carelessly or metaphorically.

              4. All of which leads me to rephrase my question as a challenge, albeit a more narrow one:
              Can you, or anyone else out there, identify a single psychology blog, or popular generalist blog (such as Neuroskeptic, or Ed Yong) that constituted insulting, ad hominem, personal attacks?

              I regularly read the following: Srivastava’s Hardest Science, Vazire’s Sometimes I’m Wrong, Daniel Lakens, PIG-IE (Brent Roberts), Funderstorms (David Funder), and Schimmack’s blog on replication and his R-Index and Incredibility Index. Schimmack is the most pungent of the bunch, e.g., referring to p-hacking as “doping”, but even that stops (in my view) WAY short of insults or personal ad hominem attacks. And I have never seen any of the rest of the crew come even close.

              Now, I have to admit I do not frequent the Twitter-verse or Facebook world very much. Perhaps it is harsher and more snide out there. But most of the harshest stuff I have seen has not come from the critics; it is the growing pantheon of insulting terms being thrown about by the Defenders of Business As Usual (shameless little bullies, methodological terrorists, vicious ad hominem attackers, etc.).

              So, can you point me to any psychology or widely read generalist blogs that are comparably insulting? Can anyone? As usual, I am open to exposing my ignorance, if someone out there can show me the evidence.

              Lee

              • Carol says:

                Hi Lee,

                I’ll have to defer to the other commenters. I rarely read any blog other than Andrew’s and once in a while PubPeer, so I really don’t know.

                I do recollect that on the APA Friends of Positive Psychology list serve, Martin Seligman characterized the critique of Fredrickson et al.’s (2013) well-being and gene expression PNAS article written by Nick Brown and his colleagues as a “hatchet job” when he (Seligman) could not possibly have read and understood it. Nick’s reply was a marvel; he was extremely polite but did not give an inch. Nick can provide more details.

              • Carol says:

                Continued, Lee: But Martin Seligman would be on Susan Fiske’s side, of course. My guess is that Susan Fiske has greatly overstated her claims about online vigilantes attacking individuals, research programs, and careers. There are undoubtedly a few such persons, but my guess is not many. I think that the editor of THE OBSERVER should insist that she either provide evidence or remove these accusations. Perhaps that is your goal.

              • Carol says:

                Lee, I’d really like to know what “senior faculty have retired early because of methodological terrorism.”
                Presumably Fiske is talking about full professors in social psychology or related areas.

                Few academic psychologists retire early for any reason because of the requirements of their pension plans, so this is hard to believe. It seems to me that senior social psychologists (Bargh, Baumeister, Schwarz, Fredrickson, Gilbert, etc.) have been standing up for themselves when their work has come under criticism, not throwing in the towel.

              • Carol says:

                Hi Uli and Lee,

                I said nothing about the quality of Wolfgang Stroebe’s article or your critique; indeed, I have not read them. I was simply responding to Lee Jussim’s request for examples of blogs or other social media in which a bona fide scientist made a personal attack such as “Smith is an asshole, fraud, charlatan, incompetent.” You wrote that Stroebe was “a stupid liar.” I have made no judgment, public or private, about the veracity of your description of Stroebe; indeed, I had never heard of him before.

                Carol

          • Shravan says:

            Well, it’s pretty clear by now that in matters relating to statistics and statistical inference, many of these senior and not so senior scientists *are* incompetent. That’s entailed by Andrew’s criticisms. Whether one says it in so many words or not is kind of moot. Many of them have replied directly to Andrew on this blog, and it was pretty clear they have no idea what the issues are. Nobody is born knowing stuff; they can find out what’s missing in their understanding if they choose to. And yet they hunker down and focus only on the mocking they are subjected to.

            It’s a pity that Fiske and the like do not have the courage to preface their remarks by stating that, yes, there are many problems in their work, and yes, they don’t (yet) understand the statistical issues too well, but they will try to fix this in future work. I just saw that the note she wrote is not yet published; maybe she will publish a revised version after having been subjected to this vigorous non-peer review by Andrew and others on the blogosphere. If she revises it, it would be a very interesting situation indeed.

        • Carol says:

          Hi Shravan,

          We looked into this on one of Andrew’s other postings. We learned that Amy Cuddy was offered tenure at Harvard Business School, but declined it.

          It is certainly the case that Cuddy’s work has been criticized but in my opinion at least, that criticism was justified.

      • Daniel Bradford says:

        And thank you, Lee, for your work combating these issues. I am an advanced PhD student in clinical psychology. I have been seriously considering designing a course like yours and appreciate you sharing the syllabus.

        • Lee Jussim says:

          You are more than welcome. My course is an undergrad course, and you have to assume they have no idea about any of this. They have had a hard enough time understanding the logic of experimental methods, what p<.05 means, etc.; to turn around and tell them, basically, "Everything you have learned so far is One Great Mess" is not the easiest thing for them.

          So, the news and magazine articles, though probably below the level of most of Gelman's denizens, are, I think, crucial to this sort of course, because they are written in a style accessible to the reasonably intelligent layperson — i.e., your basic senior majoring in psychology.

          And some are just really really good on their merits. If you find something else that you end up including in a course, that you think would be terrific for undergrads, would you let me know? jussim at rci dot rutgers dot edu. Thanks.

    • Martha (Smith) says:

      This looks like a good first effort, but could use some more thought — in particular, in the section “We need to do what we can to minimize the negative aspects of the climate that lead to name calling, personal attacks, and intimidation, while promoting and encouraging the positive aspects of the climate that lead to skeptical and critical discourse, productive discussions and debates, and a better, more self-correcting science.”

      Part of the problem is that phrases such as name calling, personal attacks, and intimidation can mean different things to different people, so it’s important to give specifics or examples to clarify what is meant.

      • Bill Harris says:

        Shravan, take a look at some of Chris Argyris’ work on “action science” and the “model II” communications model. I’ve found it quite effective in communicating effectively in the presence of conflict; YMMV.

        • Shravan says:

          This is interesting, I will read up on this approach, Bill.

          On the broader issue, I wonder if Andrew will consider a shift in tone on his blog; if so, Fiske’s (very reasonable) criticism about tone will be addressed, laying bare the problems on the statistical side of things. One can certainly communicate criticisms in more than one way (as Bill just points out), and I am curious to see if Andrew will change style. I think that a lot of people are alienated by Andrew’s style, and these people would be on board otherwise. I am reminded of Andrew’s ethics article in Chance in which he aggressively attacked a statistician (who, Andrew dismissively said, only had an MS in Statistics) for doing what Andrew felt was an incorrect analysis and for not releasing data. The statistician’s response (which was also published in the same magazine) was excellent, and in the end I felt Andrew came out of that fiasco looking a bit shabby. You can revisit the story here:

          Andrew:
          http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics1.pdf

          House (the statistician):
          http://statmodeling.stat.columbia.edu/wp-content/uploads/2012/03/chanceletters.pdf

          I hope Andrew learnt something from this exchange. His criticisms could be a lot more effective if couched in more guarded or nuanced language and without all the personal attacks on competence.

          Of course, Meehl and Cohen had made similar observations as Andrew and others do today, but they communicated through the peer-review system the way Fiske wants it. This had, it seems to me, zero impact on psychology. One could therefore make a case for trash-talkin’ Andrew kicking up some dust as a way to get things moving. What about Brian Nosek and EJ Wagenmakers, who are taking the problems on by writing articles about them? I guess they are doing what Fiske would like to see more of.

          I have also wondered whether it is fair that Amy Cuddy fails to get tenure (if that is what happened; this seems to be an implication of Fiske’s text, or am I reading too much into it?) when the people who taught her to do what she does go scot-free. Who is ultimately to blame for Cuddy’s overselling and statistical mistakes? She shares some of the blame, but not all of it. The problem usually is that one cannot even see what the problem is (the “you don’t know what you don’t know” problem), and someone in authority told them what the right way to do things was and what the goal of becoming a scientist was (a high h-index, lots of articles, having a media link on your home page).

    • Lee Jussim says:

      Shravan,

      This turned out much longer than I planned, so here is an abstract:
      Part I: My Reservations About that Petition
      Part II: Advantages of the Blogosphere Over Peer Reviewed Journals

      I sent this email to one of the organizers of that petition a few days ago. Here is what I wrote:

      i saw your petition, and am deeply ambivalent about signing on, and perhaps you can talk me down a bit here….
      1. There is a term common in the rightwing blogosphere and political editorials, “crybully.” It refers to people who try to stigmatize, shame, or shut down others on grounds of “look how badly you hurt me (or someone else).” http://www.wsj.com/articles/the-rise-of-the-college-crybullies-1447458587
      The petition is fine as far as it goes, but one person’s legitimate criticism is another’s personal attack. In the absence of a definition/description/distinction between the two, the petition can be used as justification for Fiske-ean views.

      I find that troubling and wonder what you think about it.

      2. I would say that several of those who have signed on already fit my description above. That deeply disturbs me because I see it as increasing the risk of legitimizing attempts to shut down and shut up the (incredibly constructive) critical movement within psychology.

      IDK what to do about that, and, again, wonder what you think.

      3. There are some strong statements in the petition, including these:
      “However, science suffers when attacks become personal or individuals are targeted to the point of harassment.”
      “We need to do what we can to minimize the negative aspects of the climate that lead to name calling, personal attacks, and intimidation.”
      “Also damaging to our scientific discourse are harassment and intimidation that happen through less visible channels. When people with a lot of influence use that influence to silence others (e.g., by using their leverage to apply pressure on the target or on third parties), and especially when they do so through nontransparent channels of communication, this harms our field.”

      I apparently do not get out enough. What is this talking about? Who has been personally attacked, targeted, harassed, called names, intimidated? What in tarnation is the “back channel” thing about? I really have no idea at all, and am wondering what you think of this…

      The person I wrote to responded first with this:
      “Those are all valid concerns and ones that I share.”
      After spending a paragraph explaining why s/he signed it despite his/her own ambivalence, the response ended with this:

      “overall I agree that our field’s biggest problem is not incivility but ridiculous levels of deference to authority and fame/status.”

      Shravan, I have, so far, come up with NOTHING in psychology’s blogosphere that rises to the level of invective that:
      a. Fiske claims is out there
      and
      b. Fiske used herself in her attempt to discredit the scientific integrity movement.

      Now, absence of evidence is not evidence of absence. Perhaps the sharpest exchanges, ones even I would consider insulting and inappropriate, occur on Twitter and Facebook. Twitter and Facebook, however interpersonally important they may be, are scientifically trivial. That is why they are called “social” media, not “scientific media.”

      Not so the blogosphere. Some of the best work exposing dysfunctions and offering solutions is coming out of the blogosphere. Long-form essays revealing bad stats, bad methods, and unjustified conclusions, and proposing or advocating particular types of solutions, are now an invaluable part of the field, and they are not going away any time soon.

      And, indeed, one can view the comments section of blogs as itself a new and emerging form of peer review, so it is, perhaps, not completely accurate to describe such contributions as circumventing peer review, although they do circumvent the traditional publication process.

      And one of the great advantages of the blogosphere is that the info gets out VERY fast. I recently had a paper come out in JESP, one of social psych’s top journals, that you can think of as revealing case after case after case validating Andrew’s point about how conclusions in social and cognitive psychology are often unhinged from methods and data. You can find that here, if you are interested:
      http://www.rci.rutgers.edu/~jussim/Interps%20and%20Methods,%20JESP.pdf

      Thing is, that took YEARS to write, submit, revise, and get out (not to mention the mild forms of hell we were put through, in which one of the reviewers accused us of engaging in “personal attacks,” a comment echoed by the editors; my collaborators thought resubmitting the paper was useless because getting it in was hopeless). A good blog post can be written in a few days (or, apparently, for Andrew, in a day…).

      I also see no evidence that there are more ERRORS in the blogosphere than in peer reviewed journal articles. Actually, experientially, I would say there are FEWER. But whether the blogosphere or peer reviewed articles yield fewer errors is actually an empirical question, to which none of us, not me, you, Andrew, or Fiske, yet have an empirical answer. Still, I would say the bloggers I read, perhaps because they care about scientific integrity and are usually relatively statistically and methodologically sophisticated (not anyone can aspire to be a methodological terrorist!), have far fewer errors than do peer reviewed journals.

      Lee

  54. Jon Frankis says:

    And yet … if the field in question were climate science and Fiske were making the same kind of complaints as above? I’d find myself on the other side of the pleasantries, sympathising with the published authors against (most of) the riff-raff.

    Proud to be a skeptic of much published medical and social science, but even more skeptical of the critics of properly peer-reviewed climate science. In short, I think: modern times are complicated.

    • Martha (Smith) says:

      My understanding is that climate science publications are super-peer-reviewed — not just sent to an editor and then off to whomever the editor chooses, but examined carefully by “peers” in several areas of science, and revised carefully to address any questions or lack of clarity.

      • Shravan says:

        You mean stuff like Richard Tol’s is carefully peer reviewed, right? ;)

        • Martha (Smith) says:

          I meant “science” in the strict sense, not in the loose sense — in particular, not including social “sciences” such as economics. ;~)

          • The essence of science is that we submit our theories to validation by observation and experiment. So let’s call it “physical sciences” rather than “science in the strict sense”. I think it’s perfectly possible to do good *real* science studying social phenomena even if perhaps much social science falls short in practice.

            • Martha (Smith) says:

              I originally was going to say “physical sciences,” but didn’t want to exclude the biological sciences, since they are involved in climate science. I wasn’t trying to say that social science can never be “real” science — I was thinking of so much of the problematic social “science” that has been discussed on this blog.

              So to try again:

              I meant science in the strict sense, where theories are submitted to validation or refutation by careful observation, experiment, and reasoning, rather than in the loose sense practiced by those who have a title or affiliation involving the word “science”, but do not submit their theories to validation or refutation by careful observation, experimentation, and reasoning.

      • Ed Snack says:

        No, no they’re not. They are most often pal-reviewed: sent to specifically friendly reviewers, with obvious errors ignored. A prime example is Mann et al. 2008. It contains gross data errors (data used upside-down relative to the original gatherers’ interpretation), inclusion of data that the original author had designated unusable and contaminated, and inclusion of data that had been recommended for exclusion (the Idso & Gray bristlecones; it should be mentioned that a more recent dataset on the same area, the bristlecone study by Linah Ababneh, is never used even though it is more comprehensive and better modelled – but it has the “wrong answers”). When those basic outright errors are corrected, the paper’s conclusions are no longer supported by the data. And yet it has not been withdrawn nor properly corrected.

        The other major issue in Climate “Science” is one that this discussion should well recognize: “Researcher Degrees of Freedom,” and in particular the post-collection screening of data before inclusion. Virtually all paleoclimate papers use data that has been specifically selected from a pool of possible data because it produces the required results, usually via some pseudo-scientific screening process. Critics of this process are not just ignored but actively and personally vilified in exactly the way that Fiske complains about here.

        I suggest that you don’t just believe what you are told, but actively investigate whether exactly the same methodological issues exist in climate science as in psychology before offering such politically and socially acceptable dismissals of its critics.