Discussion with Lee Jussim and Simine Vazire on eminence, junk science, and blind reviewing

Lee Jussim pointed me to the recent article in Psychological Science by Joseph Simmons and Uri Simonsohn, expanding on their blog post on flaws in the notorious power pose article. Jussim then commented:

I [Jussim] think that Cuddy/Fiske world is slowly shrinking. I think your “What Has Happened Here…” post was:

1. A bit premature ONLY in its claim that Fiske was working in a dead paradigm

BUT

2. Inspirationally aspirational — that is Exactly what most of us are shooting for — to kill that paradigm.

and, therefore

3. Perhaps that claim in your post, though not descriptive, is actually prescient.

In that spirit, you may enjoy this killer unpubbed paper by Vazire:

Against Eminence

And you can find a great turn of phrase in this Freakonomics Podcast:

“We were practicing eminence-based medicine,” which I have turned into a question that I sometimes use in talks and that you might enjoy:
“Do we want an eminence-based psychology or an evidence-based psychology?”

I replied that I found Vazire’s article inspirational and I agree with its general flow, but I have some concerns.

First, I could imagine the current problems with reproducibility arising even in a completely eminence-free environment. To me, it just seems like there are basic misunderstandings of statistics, along with a desire for certainty. Indeed, when Vazire writes, “Scientists constantly struggle with the tension between human desires and scientific ideals. Why make it harder for them by giving these human desires an even stronger platform?”, this makes me think of the human desire for certainty, which is amplified by statistical methods of null hypothesis significance testing that inappropriately transmute uncertainty into certainty.
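
To make this concrete, here is a toy simulation (my own made-up numbers, nothing from Vazire’s paper): a small true effect measured with a lot of noise, run through the p < 0.05 filter. The underlying reality never changes, but each study walks away with a confident-sounding binary verdict.

```python
# Toy example: a tiny true effect, noisy measurements, and the p < 0.05 filter.
# The parameters below are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
true_effect, se = 0.1, 0.5                        # small effect, large standard error
estimates = rng.normal(true_effect, se, size=20)  # 20 hypothetical studies

for i, est in enumerate(estimates, start=1):
    z = est / se
    verdict = "significant -- 'the effect is real!'" if abs(z) > 1.96 else "not significant -- 'no effect'"
    print(f"study {i:2d}: estimate = {est:+.2f}, z = {z:+.2f}  ->  {verdict}")

# Roughly 5% of these studies will declare a discovery (sometimes with the
# wrong sign), and the rest will declare "no effect": certainty manufactured
# from data that barely distinguish the hypotheses at all.
```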

I also wonder about the statement in the paper that heterosexuals are overrepresented at high levels of status. This could be true, I guess; I’ve just never even seen this claimed before.

I also am not so sure about the claim that we can “reliably distinguish rigorous science from shoddy science.” I guess I’d have to know how these are defined.

Also it’s funny to see the link to a Freakonomics podcast given that Freakonomics has a habit of hyping junk science! We have to take the good where we find it, I guess.

I also cc-ed the above to Simine Vazire, who wrote:

I [Vazire] completely agree with you that the human drive for certainty would also lead to shortcut-taking and therefore to reproducibility problems, without adequate checks in the system (which we currently lack – see another paper of mine here: http://collabra.org/articles/10.1525/collabra.74/ )

Regarding whether we can reliably distinguish rigorous science from shoddy science, I think we can do a decent job if we learn our lessons from the replicability crisis. I think we now know that there are some pretty reliable indicators of rigor, based on meta-science, simulations, and sheer logic (e.g., pre-registration is good for reasons that don’t need to be empirically demonstrated, though the empirical evidence also supports the view that results from pre-registered studies are more accurate – e.g., effect sizes are less inflated).
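
[To illustrate the inflation mechanism Vazire mentions, here is a minimal sketch with invented numbers, not anything from her paper or from the meta-science literature: the same set of noisy studies summarized two ways, reporting everything (the pre-registration analogue) versus reporting only what crosses p < 0.05.]

```python
# Toy sketch: why estimates filtered through "p < 0.05" come out inflated,
# while reporting every (pre-registered) result does not.  Invented numbers.
import numpy as np

rng = np.random.default_rng(2)
true_effect, se, n_studies = 0.2, 0.5, 100_000
estimates = rng.normal(true_effect, se, size=n_studies)   # one estimate per study

significant = estimates[np.abs(estimates / se) > 1.96]    # what a selective literature reports

print("true effect:              ", true_effect)
print("mean of all estimates:    ", round(estimates.mean(), 2))    # close to 0.2
print("mean of significant ones: ", round(significant.mean(), 2))  # several times larger
```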

Regarding whether heterosexuals are overrepresented in high status positions – I admit I was extrapolating here. I think there is pretty good evidence that sexual orientation is associated with discrimination (with heterosexuals receiving better treatment; though admittedly I am not an expert on this literature), and I am extrapolating from that to status. But of course it’s an empirical question and perhaps I should have been more careful. I’d be curious to know the data, and I’d bet money that, compared to proportions at lower levels of status within academia, heterosexuals are overrepresented at high levels of status. Perhaps not compared to proportions in the general population though, since I suspect sexual orientation minorities may be more likely to select into academia in the first place.

Jussim then added:

Here is why I think Simine’s article is killer, regardless of whatever imperfections it may have or disagreements I have with details:

1. She is absolutely right that the field is gaga-eyed over eminence. Fiske’s “bullying” essay only got the play it did because she is famous. But this problem is even worse in peer review. You all know the old Peters & Ceci paper? It is amazing. And it is the best concrete evidence for exactly Simine’s main point (hey, Simine, I just noticed this paper is not cited . . . ?). It shows that the exact same papers that had previously been published by authors at prestigious institutions, when resubmitted as unpublished manuscripts authored by people at low-status-sounding institutions, were almost always rejected.

Points that I believe to be true based on anecdotal evidence and experience:

2. The old boy/girl network is alive and well, though it has morphed with the times. Connections among eminent authors, editors, and reviewers create a hidden track for publication among the eminent that the rest of us mortals do not usually have.

3. It is even more severe with grants.

None of this is absolute. Yes, NSF in 1998 or 2005 might have rejected Fiske’s or Bargh’s proposal, but it was far less likely to do so than if the identical proposal had been submitted by someone less eminent.

4. Andrew’s points and Simine’s are not mutually exclusive. Scientists have many motivations that are either orthogonal to, or even in conflict with, the search for truth. Eminence is one. Certainly, science can go wrong out of ignorance, as well. I suspect, however, that some of that is motivated or willful ignorance.

5. The Identification of Shoddy Science question is an interesting one. Posed as the empirical question, say, “Can most social psychologists distinguish good from bad social psychology?” — IDK. Posed as the empirical question “Can good research be distinguished from bad research using known methods?” — I am pretty sure the answer is yes. Maybe not perfectly, but pretty well.

And Vazire then took the discussion in a slightly different direction:

I know the Peters & Ceci paper, and forgot to cite it! The sample size is quite small, but the paper still provides evidence that should at least make us think… I still run into a lot of people who are adamantly against blinding. Here’s some more anecdotal evidence: since I started blinding myself (as editor) to authors’ identities before making desk-reject decisions, it’s been a completely different (and somewhat frightening) experience. It just feels so, so different. Anyone who’s against it should try it both ways for a few hundred manuscripts – I’d love to see if anyone can do that and then tell me that it makes no difference.

26 thoughts on “Discussion with Lee Jussim and Simine Vazire on eminence, junk science, and blind reviewing”

  1. Why are you giving psychology so much attention on this blog? Just curious.

    On one hand, I despise the few who p-hack; on the other, you are a much bigger person than to pick on these little guys.

  2. Let me offer a more pessimistic view. I think the problems flow from the fundamental incentives each individual researcher has to advance their own academic careers. Discovering new, true, and interesting scientific facts is hard work, and there just isn’t enough scientific gold in the ground to support all the eager prospectors. Most researchers are, therefore, tempted to try to turn their meager excavations into tenurable ore with the questionable methods you document.

    My pessimism comes from my experience that this sort of thing is widespread — not just in academia, but everywhere. Every TV commercial puts its product in the best light. Every annual report tries to cast company performance in the best light. Every politician is … well … a politician.

    I was awakened to this phenomenon by reading about Steve McIntyre and Ross McKitrick’s work in climate science starting around 2000. They were early practitioners of clear-eyed auditing of empirical and statistical research, and they found many of the same things you are finding. Alas, researchers and journals reacted in much the same way you have seen.
    A good example of McIntyre’s work is his series of posts about a paper published in 2013 by Marcott et al. Ross McKitrick summarized that episode here:

    http://business.financialpost.com/fp-comment/were-not-screwed.

    On March 8, a paper appeared in the prestigious journal Science under the title A reconstruction of regional and global temperature for the past 11,300 years. … News of this finding flew around the world and the authors suddenly became the latest in a long line of celebrity climate scientists. … Their data showed a remarkable uptick that implied that, during the 20th century, our climate swung from nearly the coldest conditions over the past 11,500 years to nearly the warmest. … This uptick became the focus of considerable excitement, as well as scrutiny. …

    Stephen McIntyre of climateaudit.org began examining the details of the Marcott et al. work, and by March 16 he had made a remarkable discovery. The 73 proxies were all collected by previous researchers, of which 31 are derived from alkenones … the original researchers take care to date the core-top to where the information begins to become useful. …

    Fewer than 10 of the original proxies had values for the 20th century. Had Marcott et al. used the end dates as calculated by the specialists who compiled the original data, there would have been no 20th-century uptick in their graph, as indeed was the case in Marcott’s PhD thesis. But Marcott et al. redated a number of core tops, changing the mix of proxies that contribute to the closing value, and this created the uptick at the end of their graph. Far from being a feature of the proxy data, it was an artifact of arbitrarily redating the underlying cores.

    Worse, the article did not disclose this step. … they claimed to be relying on the original dating, even while they had redated the cores in a way that strongly influenced their results….

    When this became public, the Marcott team promised to clear matters up with an online FAQ. It finally appeared over the weekend, and contains a remarkable admission: “[The] 20th-century portion of our paleotemperature stack is not statistically robust, cannot be considered representative of global temperature changes, and therefore is not the basis of any of our conclusions.”

    Now you tell us! The 20th-century uptick was the focus of worldwide media attention, during which the authors made very strong claims about the implications of their findings regarding 20th-century warming. Yet at no point did they mention the fact that the 20th century portion of their proxy reconstruction is garbage.
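
    [To see how this kind of endpoint artifact can arise in general, here is a toy illustration with invented series, not a reanalysis of Marcott et al.: every “proxy” below is flat, yet the composite’s closing value depends entirely on which few series happen to be dated as covering the final time step.]

```python
# Toy illustration (invented data, not Marcott et al.'s data or method):
# each "proxy" is its own constant baseline plus noise, so no series has any
# late uptick.  The composite's closing value still moves a lot depending on
# WHICH series are dated as covering the final time step.
import numpy as np

rng = np.random.default_rng(3)
n_series, n_time = 30, 200
baselines = rng.normal(0, 1, n_series)                          # per-proxy offset
series = baselines[:, None] + rng.normal(0, 0.1, (n_series, n_time))

closing_all    = series[:, -1].mean()                           # every series contributes
random_subset  = rng.choice(n_series, size=5, replace=False)    # one plausible dating
redated_subset = np.argsort(baselines)[-3:]                     # only high-baseline cores remain

print("closing value, all series:       ", round(closing_all, 2))                        # ~0
print("closing value, random 5 series:  ", round(series[random_subset, -1].mean(), 2))
print("closing value, 3 redated series: ", round(series[redated_subset, -1].mean(), 2))  # spurious 'uptick'
```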

    Here are the key McIntyre posts:
    https://climateaudit.org/2013/03/16/the-marcott-shakun-dating-service/ (replication findings)
    https://climateaudit.org/2013/03/31/the-marcott-filibuster/ (the authors’ response)

    Here is a post specifically about statistical standards in climate science:
    https://climateaudit.org/2013/04/07/marcotts-dimple-a-centering-artifact/

    Here is a chronological listing of all McIntyre’s posts showing the work he put in.
    https://climateaudit.org/2013/03/13/marcott-mystery-1/
    https://climateaudit.org/2013/03/14/no-uptick-in-marcott-thesis/
    https://climateaudit.org/2013/03/15/marcotts-zonal-reconstructions/
    https://climateaudit.org/2013/03/15/how-marcottian-upticks-arise/
    https://climateaudit.org/2013/03/16/the-marcott-shakun-dating-service/
    https://climateaudit.org/2013/03/31/the-marcott-filibuster/
    https://climateaudit.org/2013/04/02/april-fools-day-for-marcott-et-al/
    https://climateaudit.org/2013/04/04/marcott-monte-carlo/
    https://climateaudit.org/2013/04/07/marcotts-dimple-a-centering-artifact/

    It is discouraging to think of how much work was required to audit this single paper, how few people are willing to do this work, and how cavalier the field and the journals were.

    I realize you are reluctant to include climate science in your ambit (except for heretics like Wegman), but there is a lot of money and prestige in modern climate science, and the temptations are very large. I am not surprised that the same things are going on in climate science that you have found in psychology.

  3. Related to eminence is product-pushing and idea-marketing. Even if a study is reasonably robust, it does not necessarily translate into quick application. For instance, if the power pose study had not been so egregiously flawed, it still would not have justified the promotion of power poses themselves. When there is a gap between a study’s findings and its possible applications, yet the researchers rush to champion the latter, this in itself constitutes bad science.

  4. The best quote on eminence (and one of the oldest): Eine neue wissenschaftliche Wahrheit pflegt sich nicht in der Weise durchzusetzen, daß ihre Gegner überzeugt werden und sich als belehrt erklären, sondern vielmehr dadurch, daß ihre Gegner allmählich aussterben und daß die heranwachsende Generation von vornherein mit der Wahrheit vertraut gemacht ist. (“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die out, and the rising generation is familiar with the truth from the start.”)

    Or, as normally paraphrased: science advances one funeral at a time.

    And Planck knew whereof he spoke.

  5. > “reliably distinguish rigorous science from shoddy science.”
    It’s a Spy vs. Spy problem: if you can distinguish it today, motivated authors will find ways to work around that.

    Pre-registration cuts off some ways of gaming (with a cost) but there are others.

    Without some audits you will have a very low chance of identifying instances of _up-scaled_ shoddy science.

    (I think the first impact of RCT quality-scoring guidelines/checklists was their use to fluff up poorly done research: authors learned how best to claim they had done things well, and so overcame editors’ concerns.)

    • Keith:

      Yeah, I know what you mean. We can imagine future Wansinks running all their programs through GRIM in order to check that their tables are possible. And checking last digits. There’s getting to be more of a motivation to go full Lacour and just make up the entire dataset.
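
      (For readers who haven’t seen it: GRIM is just a consistency check between a reported mean and the sample size. Below is a rough sketch of the idea; the function name and the simplified rounding rule are my own, and the published test handles rounding conventions and edge cases more carefully.)

```python
# Rough sketch of the GRIM idea: a mean reported to a given number of decimal
# places, computed from n integer-valued responses, can only take certain
# values.  Simplified; the real test treats rounding/tie-breaking rules more
# carefully.
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Could a mean reported to `decimals` places arise from n integer values?"""
    half = 10 ** (-decimals) / 2
    # an integer sum s is consistent if s / n falls inside the rounding
    # interval of the reported mean (ignoring tie-breaking subtleties)
    lo = math.ceil((reported_mean - half) * n)
    hi = math.floor((reported_mean + half) * n)
    return lo <= hi

print(grim_consistent(3.48, 25))  # True: an integer sum of 87 gives exactly 3.48
print(grim_consistent(3.19, 17))  # False: no integer sum over 17 rounds to 3.19
```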

      On the other hand, I’m guessing that most researchers aren’t flat-out cheaters and fabricators. Yes, people go through all sorts of contortions in order to defend their previously-published claims, but, moving forward, I think that most of these people would like to do things right.

      They just have to take better measurements. Otherwise, cheating is their only reliable route to success.

      • > most researchers aren’t flat-out cheaters and fabricators
        Agree, but it’s not that hard to get into a desperate position, e.g., if you don’t get another publication this year you might lose your position – that might have happened to me. There was someone who asked me out to lunch about a year after I left there; they wanted to talk about whether it was serious or not if they did not meet their publication targets. I said it was likely better not to find out.

        A good analogy is not having enough to pay the rent if you file your tax return honestly, while being assured when you go to file it that you won’t be audited.

      • But the problem isn’t the ratio of Lacours:Bems:Vazires in academic psychology. The problem is that academics 1) reproduce at a rate greater than replacement – they each train more than one graduate student – and 2) graduate students in academic psychology aren’t presented with a meaningful alternative to an academic job. Thus there is ever-increasing competition for scarce academic positions, which incentivizes the bad behavior.

        • AnonAnon’s point is not the entirety of “the problem,” but it is an important part of the problem.

          The problem has existed in fields like English Literature, Classics, etc. for a long time — more Ph.D.’s produced than jobs available. But it has spread more to the sciences in recent years (e.g., http://hechingerreport.org/oversupply-phds-threaten-american-science/)

          Psychology seems to have an additional twist in “demand expands to meet supply”, namely, that many universities have three psychology departments: One in Liberal Arts or Social Sciences, one in Education, and one in Business. I’m not quite sure how the “supply chain” works among these three types of departments. Does each one mainly feed its own type, or are there pathways both ways between them, or just one way, or …?

        • Martha,

          Depending on where you are, educational psychology is most closely related to clinical psychology but tailored to the school setting and population. It is sometimes, but not always, focused on granting certifications for that purpose, although some educational psychology departments also house statisticians who work on things like large-scale national tests (for example, Gene Glass was in Arizona State’s Ed Psych department). But the tracks are so well defined now that you might only see someone move from a psychology department to an Ed Psych department if they figure out that they want to work with kids in a school setting or want to work on the quant problems involved in standardized tests.

          As for the business school, most of the cross-pollination there occurs between Social psychology and Marketing/Consumer behavior. There are a few, but not many, social psychologists who have moved from a psych department into a business school. It’s typically one direction, but that could well be related to Tversky and Kahneman’s work. (Even further back in social psychology there was some debate about the differentiation between Social psychology in the psych department and Social psychology in the sociology department.)

        • Or, to put it differently, one might make the same “demand expands to meet supply” criticism regarding the proliferation of statistics, biostatistics, and machine learning programs in different departments and colleges.

        • I’m not so sure that there is a problem of “demand expands to meet supply” fueling the “proliferation of statistics, biostatistics, and machine learning programs in different departments and colleges.” A big problem there is that the statistics supply was not large enough to meet demand, so “home grown” programs in statistics arose to meet local needs (as you have mentioned, for example, statistics related to educational testing is often housed in Educational Psychology Departments). This “dissemination of groups” does have the negative effect of promoting institutionalization of less-than-ideal methodology which might not have arisen or proliferated if there had been more communication across fields of application.

        • Martha,

          I think it’s easy to see how different programs that appear to operate under the same umbrella could be evidence of “demand expands to meet supply.” At least to someone who is a disinterested outsider. But I think the points that we’ve both raised are examples of how someone who is an insider and knows more of the history and the interconnections might view it differently. That was the point of my reframing.

          But I do take your point about how a diaspora can lead to those newly fragmented groups promoting and institutionalizing poor methodology. I think the issues with p-hacking will flow from Social psychology into some of the consumer behavior research as a by-product of that spreading. I also think that you probably meant Industrial-Organizational (IO) psychology when talking about psychologists in business schools, which I believe was a splintering off of more “applied researchers” from more “pure researchers,” although I don’t know much about IO’s origins. And as another example of the problem you pointed out, I do think work in that field suffered from ignoring hierarchical structures or nesting in many research and study designs, so the lack of independence wasn’t taken into account.

        • This raises a question in my mind: to the extent that competition for scarce positions is resulting in overproduction of bad studies, can we see any similar effects in the humanities, which, as you note, have had that problem for decades? I feel like I’ve read some trend pieces about the shuttering of university presses because they can’t make enough money selling all the monographs that nobody wants to read but everyone has to produce to get tenure. But low sales potential doesn’t necessarily mean those books are wrong in the same way that bad science is wrong.

        • Erin:

          There’s also the question of bad statistics research. Statistics journals have been filled with useless crap for . . . ummm, for about as long as there’s been statistics journals. Sturgeon’s law and all that. Bad statistics papers typically have a different flavor from bad psychology papers, in that they’re typically not quite wrong (in the way that, say, the work on beauty and sex ratio, ovulation and clothing, ESP, etc., is wrong in reporting results from noisy data that we’d have no reason to expect would occur in the general population), but just useless: some irrelevant asymptotic analysis, or the computation for some model nobody should ever want to fit, or whatever.

  6. Noodling on this “human desire for certainty” idea. It certainly rings true, and it feels like a dimension reduction problem, in a way: if we can find the One True Cause for something, it reduces the number of other things we have to keep in mind as we engage with the world. Keeping a million things in mind at once is exhausting and annoying. I suppose relief from that feeling is part of what statisticians are selling. Now I sort of feel like a pusher.

    I wonder if there’s been any psychology research on this posited drive for certainty. That would bring things full circle ;)

    • Off the top of my head, Need for Cognition, Need for Structure, and Preference for Consistency are individual-difference measures that come to mind as roughly capturing a “drive for certainty.” I think they are mostly from the late ’80s and ’90s, when Social psychology seemed interested in these kinds of questions.

    • EJ: Noodling on this “human desire for certainty” idea. It certainly rings true, and it feels like a dimension reduction problem, in a way: if we can find the One True Cause for something, it reduces the number of other things we have to keep in mind as we engage with the world. Keeping a million things in mind at once is exhausting and annoying. I suppose relief from that feeling is part of what statisticians are selling. Now I sort of feel like a pusher.

      I wonder if there’s been any psychology research on this posited drive for certainty. That would bring things full circle ;)

      GS: Of course there is…but would you recognize it? Would the research have to be called “desire for certainty research”? What else might it be named? Are you sure it is a *human* desire per se? And what are “desires”? What sorts of research would go to that issue? Would “desire” be in the title? Let’s assume that you don’t have any answers to the questions I asked. Who would you ask for clarification (besides me, of course)? How many different answers do you think you would get? Which ones would be “right”? How would you know?

      Anyway…

      I’ll give you some answers. The first thing the question makes me think of is so-called “observing behavior” as it is investigated in humans as well as other species. Let’s talk pigeons. The birds are trained until the penultimate condition is reached; the birds peck a plastic “key” behind a hole in the wall, the consequence of which is the occasional delivery of food (grain). Specifically, when the key is transilluminated green, pecks produce food delivery contingent on a peck if the peck occurs after the passage of some amount of time since the last food delivery, and the amount of time changes from food delivery (reinforcement) to food delivery (i.e., a variable-interval – VI – schedule of reinforcement). Say, on average, food is delivered, contingent on a peck of course, every 120 s (when the key is green). Periods of time when the key is green (and the VI schedule is “running”) alternate “unpredictably” with periods of time in which the key is red, and key-pecks have no programmed consequences. These contingencies, of course, produce discriminated responding; the pigeon pecks at a moderate, steady rate when the key is green (VI 120 s schedule), and little or not at all when the key is red (extinction). This sort of thing has been investigated for a century.

      But now we do something interesting; we keep the VI schedule and the extinction schedules alternating unpredictably, but we remove the green and red stimuli and replace them with a white key. Again, the schedules keep operating as they always have, but the stimuli correlated with the particular schedule (VI or Ext) no longer appear – that is, until the pigeon pecks a second key (the “observing key” – call the original key the “food key”). Eventually, the pigeon pecks the observing key and the only consequence is that the response occasionally produces either the red or green key, depending on which schedule is currently programmed.

      So…appreciate that the observing response does not alter the program that alternates the VI and Ext and it does not alter the schedule that arranges contingent food deliveries during the VI. Speaking colloquially, the only consequence of the observing response is to provide information about the “state” of the schedules on the food key. So…how is this not, speaking colloquially, an experiment where the pigeon “works for information that tells it with certainty which schedule is in effect on the food key”? The pigeon, again speaking colloquially, *desires* certainty, and works to get it.

      So…stimuli that otherwise would not function as reinforcers will do so if the stimuli, speaking colloquially, “provide information about uncertain events.” I will add a caveat here – if the only consequence of the observing response is the stimulus correlated with extinction, the pigeon will stop responding – pigeons, speaking colloquially, “desire information about uncertain events – as long as it is good news.” Humans may “do work” (i.e., they “desire” such information) to obtain both “good and bad news” – probably owing to verbal behavior.

      Now…there is a reason that the observing procedure is called what it is. The procedure doesn’t *create* observing behavior, so to speak, it simply makes it easy to measure by making the relevant stimuli contingent on a key-peck…a response deliberately engineered to be easily measured in experiments. But observing behavior is always occurring – under the original conditions (before the stimuli were removed and the observing key “activated”), the pigeon need only engage in other behavior that “clarifies the situation” – it must engage in the behavior called “looking at.” The ubiquitous behavior of “visually interrogating the environment for information that clarifies uncertainty” (and that is only vision – the other senses work the same way) is learned behavior – it is, speaking technically, operant behavior.

      The main points are these: 1.) the original question (i.e., if “…there’s been any psychology research on this posited drive for certainty”) suggests an extraordinary level of naiveté – the “desire for certainty” is utterly ubiquitous! It is what – in conjunction with the “sensorimotor contingencies” – causes us to, literally, learn to perceive the world and to continue to do so! But the very ubiquity of the behavior, combined with the farce that is mainstream psychology, means that laypeople and mainstream psychological “science” alike, haven’t a clue – essentially about *anything* having to do with behavior. Mainstream psychological “science” is a joke – it is, as I like to say, a “conceptual cesspool.” Anyway, thank the Heavenly Father that Gelman and the rest here are crusading to fix psychology.

      And there’s another message here. The issue of a “desire for certainty” was brought up in the context of the behavior of scientists (or, at least, mainstream psychologists and their experimental practices). This is apropos since science, in general, is the behavior of scientists (or “scientists” in the case of psychologists). IOW, a description of “what science is” depends on a natural science – the science of behavior and, I’ll tell you something my friends, very little of mainstream psychology could be called a natural science of behavior. So…talking about the practices of psychologists (and others more deserving of the title “scientist”) does, indeed, “come full circle” when it is discussed from the standpoint of a natural science of behavior.

      A closing note: I have been talking a lot and using phrases preceded by “speaking colloquially.” The reason is obvious – learning to speak the language of a natural science of behavior requires that one learn a new language – one that is antithetical to the practices of ordinary-language speakers and the psychologies built on ordinary-language notions or the newfangled jargon of “information processing” that has the same epistemic flaws as ordinary language. Ordinary language evolved (culturally) “because” it serves a vast, utterly important purpose – it gives us the ability to control each other’s behavior (you will call it “communicating”). But ordinary language did not “evolve to serve” the needs of a science of behavior, and it is utterly ill-fitted to the task and has resulted in mainstream psychology and the fields it has corrupted (much of neuroscience). Now, do you suppose there has been any research on “the desire for certainty”? How would you, or the vast majority of mainstream psychologists (and other people concerned with behavior) and neuroscientists, recognize it if there was?
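
      [A toy sketch of the observing-response contingencies described above, with invented parameters and fixed response probabilities rather than any learning process; it only shows how the schedules, the observing key, and the correlated stimuli fit together.]

```python
# Toy sketch of an observing-response procedure (invented parameters,
# hard-coded response probabilities -- not a model of learning).
import random

random.seed(4)

SESSION = 3600        # seconds simulated
COMPONENT = 60        # each VI / extinction component lasts 60 s here
VI_MEAN = 120         # mean food set-up interval during the VI component (s)
STIM_SECONDS = 15     # how long an observing peck lights the schedule-correlated color

component, time_in_component = "VI", 0
stim_timer = 0                                   # >0: green/red is showing, else white
food_timer = random.expovariate(1 / VI_MEAN)     # time until next food set-up (runs during VI only)
food_deliveries = observing_pecks = 0
food_key_pecks = {"green": 0, "white": 0, "red": 0}

for t in range(SESSION):
    if time_in_component >= COMPONENT:           # pick the next component "unpredictably"
        component, time_in_component, stim_timer = random.choice(["VI", "EXT"]), 0, 0

    color = ("green" if component == "VI" else "red") if stim_timer > 0 else "white"

    # observing key: pecked occasionally when the food key is white; its only
    # consequence is to display the color correlated with the current schedule
    if color == "white" and random.random() < 0.05:
        observing_pecks += 1
        stim_timer = STIM_SECONDS

    # food key: peck probability depends only on the stimulus the bird can see
    if random.random() < {"green": 0.9, "white": 0.5, "red": 0.02}[color]:
        food_key_pecks[color] += 1
        if component == "VI" and food_timer <= 0:    # a set-up reinforcer is collected
            food_deliveries += 1
            food_timer = random.expovariate(1 / VI_MEAN)

    if component == "VI":
        food_timer -= 1
    stim_timer = max(0, stim_timer - 1)
    time_in_component += 1

print("observing pecks:", observing_pecks, "| food deliveries:", food_deliveries)
print("food-key pecks by stimulus:", food_key_pecks)
```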
