The both both of science reform

This is Jessica. I pay some attention to what gets discussed in methodological/statistical reform research and discussions, and I’m probably not the only one who’s watched as the movement (at least in psychology) seems to be getting more self-aware recently. The other day I jotted down what strike me as some yet-unresolved tensions worth reflecting on: 

The need for more rigor in specifying and evaluating proposed reforms, versus the simultaneous need to consider aspects of doing good science that seem harder to formalize and tangential to the dominant narratives. For example, developing the ability to be honest with oneself about how flawed one’s work is, and to value one’s progress in getting better at doing science over the external outcomes. These types of concerns seem very non-scientific, but in my experience they are just as crucial. They might correlate with interest in methodological reform, but how do we instill them without assuming an intrinsic interest in reforming one’s practices?

The fact that science reform seems to inevitably imply stretching or growing our tolerance for uncertainty in consuming results, versus the fact that many of the core challenges that are purported to have led to the current “crisis”, e.g., heavy reliance on NHST, researcher degrees of freedom, dichotomization in reporting, etc., seem pretty clearly related to bounded cognition, by which I mean all the evidence suggesting that to make progress in thinking and communicating, we can only handle so many shades of gray at once. There’s been a fair amount of focus on finding the best (i.e., most complete, transparent) representations of evidence, but the strategies that might be applied in consuming or creating these are not discussed much. I personally find this tension interesting, and see it a lot in the kinds of topics I work on, like uncertainty visualization, and in other proposals about how to do better science. Some examples: how concerned should we be that something like a multiverse analysis might overwhelm a reviewer, such that judgment calls are made using heuristics, perhaps even more so than they would have been without it? How much mental NHSTing happens even when the researchers have intentionally decided not to NHST, and are there ways to keep that NHSTing in check?
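To make the multiverse worry concrete, here is a minimal sketch of the combinatorics, using simulated data and arbitrary, purely hypothetical specification choices (nothing here comes from any real study). Even three binary analysis decisions hand a reviewer eight estimates to weigh:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-group experiment with a small true effect of 0.2.
n = 200
group = rng.integers(0, 2, n)
y = 0.2 * group + rng.normal(0, 1, n)

# Three arbitrary-but-defensible analysis decisions, multiverse-style.
exclusions = {"none": np.full(n, True),
              "trim_2sd": np.abs(y - y.mean()) < 2 * y.std()}
transforms = {"raw": lambda v: v,
              "zscore": lambda v: (v - v.mean()) / v.std()}
subsets = {"all": np.full(n, True),
           "first_half": np.arange(n) < n // 2}

estimates = []
for (_, keep), (_, f), (_, mask) in itertools.product(
        exclusions.items(), transforms.items(), subsets.items()):
    use = keep & mask
    yy = f(y[use])
    estimates.append(yy[group[use] == 1].mean() - yy[group[use] == 0].mean())

# Eight estimates from one dataset; real multiverses can run to thousands.
print(f"{len(estimates)} specifications, estimates from "
      f"{min(estimates):.2f} to {max(estimates):.2f}")
```

The open question above is whether a reviewer faced with that grid actually reasons about it, or quietly collapses it back into a heuristic like counting how many cells are “significant.”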

Somewhat related to that last point, I previously wrote something about signaling in science reform, where under certain conditions the simple fact of whether one has used certain practices may be used to make value judgments about the work. This isn’t entirely bad, of course (transparency is good, and undoubtedly correlated with more careful research, at least at this point in time!). But I think there’s room for more focus on how people use and consume some of the new practices being espoused, since one thing we can be pretty sure of is that we all like to mentally substitute simple things for hard things.

The need to acknowledge the various ways in which a diversity of methods and intellectual perspectives is naturally good for scientific progress, while coming to some agreement on which methods or mindsets for doing science are better versus worse, or which areas of research are more or less replicable. There have been several simulation-based studies showing how different types of methodological and ideological diversity can have positive effects. But many common science reform narratives imply there’s a certain set of things that are definitively best.

Similarly, the need to agree upon and advocate for practical recommendations (like transparency and preregistration) that are well motivated in the current situation, while not forgetting that critical evaluation of these proposals will be crucial if the movement is to stay intellectually malleable as evidence or theory about the limitations of the proposals grows.

The need for oversight through, for instance, stricter “auditing” of materials in paper reviewing, and for blunt, outside criticism to become more accepted, versus the facts that a) the centralization of knowledge and resources needed to enact audits has been difficult to achieve, and b) in the absence of a trusted, centralized force to audit, retract, etc., these attempts may be more likely to come from those already in power and more likely to target those who, well, seem like easy targets: scientists working on problems that are already marginalized, taking perspectives unfavored (at least within academia), or coming from backgrounds that are already marginalized.

Finally, the need to instill awareness of analysis degrees of freedom, and along with that to continue to theorize about the underspecified distinction between EDA and CDA, versus the need to redirect some focus to “design freedom,” which seems harder to theorize about and create guidelines around. Degrees of freedom in experiment design came up in a comment on a recent post, and I think of them as all the kinds of tweaking one can do to the experimental setup, prompts, etc. to get desired “evidence” for a hypothesis. With enough experience and ingenuity, I tend to think a good experimenter could probably design a compelling experiment to demonstrate many things of questionable real-world importance. True, an experimental paper has to clearly describe the tasks and interfaces, but when you look at the process of getting to whatever the final set of tasks and interfaces is, the researchers have many, many degrees of freedom. Combined with a hypothesis that isn’t clearly testable to begin with, this can look a lot like an even bigger garden of forking paths. It reminds me of when I learned experiment design by taking courses and working with economists: the more I came to understand their experiments, the more I found myself thinking, Wow, this is not what I thought it was. These were researchers doing what I would call solid empirical work, but there was a certain cleverness to getting the design to work that struck me as more art than science.

What am I missing?

P.S. The weird title comes from a phrase Ezra Pound used to refer to situations having a dual nature, e.g., the need to be both the poet and the warrior, both the actor and the critic. The phrase has stayed with me as a reference to duality as the solution. So I guess the message of this post is about simultaneously accepting and rejecting what we know about how to improve science.

P.P.S. I just learned that Ezra Pound was a fascist. No connection between fascism and science reform was intended!

22 thoughts on “The both both of science reform”

  1. Jessica:

    I agree with what you wrote above. One issue is that sometimes the same thing can be bad or good. For example, “researcher degrees of freedom” or “forking paths” can be a bad thing; Simmons et al. wrote a (justly) influential paper on how unrecognized researcher degrees of freedom can lead to persistent scientific errors. On the other hand, when modeled appropriately, researcher degrees of freedom and forking paths are a real plus, as they are tools in exploration and learning. And both the good and bad are real! It’s not just the choice of labels; these really do represent aspects of workflow that have both positive and negative sides.
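    As a toy illustration of the “modeled appropriately” point (made-up numbers, and simple empirical-Bayes shrinkage standing in for a full multilevel model): if you treat the estimates from many forking paths as exchangeable and partially pool them, the multiplicity becomes information rather than a liability.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Toy example: 20 forking-path estimates of one underlying effect (0.3),
    # each observed with sampling noise of known standard error 0.5.
    true_effect, se = 0.3, 0.5
    paths = true_effect + rng.normal(0, se, 20)

    # Cherry-picking the most impressive path overstates the effect...
    print(f"max single-path estimate: {paths.max():.2f}")

    # ...while partial pooling shrinks each path toward the grand mean,
    # with the shrinkage factor set by the estimated between-path variance.
    between_var = max(paths.var(ddof=1) - se**2, 0.0)
    shrink = between_var / (between_var + se**2)
    pooled = paths.mean() + shrink * (paths - paths.mean())
    print(f"partially pooled estimates: {pooled.min():.2f} to {pooled.max():.2f}")
    ```

    When the paths don’t really disagree, the pooled estimates collapse toward the grand mean, and the scariest-looking single path gets discounted, which is the sense in which the forking is a plus once it’s all in the model.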

    Also, your statement about the need for more rigor in specifying and evaluating proposed reforms reminds me of this recent paper, The case for formal methodology in scientific reform, by Devezer et al.

  2. 1. Jessica said,
    “For example, developing the ability to be honest with oneself about how flawed one’s work is, and to value one’s progress in getting better at doing science over the external outcomes. These types of concerns seem very non-scientific, but in my experience they are just as crucial. They might correlate with interest in methodological reform, but how do we instill them without assuming an intrinsic interest in reforming one’s practices?”

    I think that “developing the ability to be honest with oneself” about a lot of things is important. I suggest that the key here is to try to instill this value early, and by example. I realize that this is a big task, since it is something that many people are not good at; the first step is to help teachers learn to value this, so that they can teach by example rather than “by authority”.

    2. Jessica also said,
    “The fact that science reform seems to inevitably imply stretching or growing our tolerance for uncertainty in consuming results, versus the fact that many of the core challenges that are purported to have led to the current “crisis”, e.g., heavy reliance on NHST, researcher degrees of freedom, dichotomization in reporting, etc., seem pretty clearly related to bounded cognition, by which I mean all the evidence suggesting that to make progress in thinking and communicating, we can only handle so many shades of gray at once.”

    I think a big part of the problem is that many people’s views of “what science is” are very limited. To me, uncertainty is a core part of science. Two examples:

    1. The Heisenberg uncertainty principle tells us that we cannot measure both position and momentum to arbitrary accuracy — the more precise our measurement of one of these is, the more imprecise the other is (the inequality is written out after these examples). I was fortunate to have encountered this while I was still in high school.

    2. Uncertainty is also inherent in cellular biology — in particular in meiosis, where daughter cells are typically not identical.
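
    For reference, the inequality behind the first example, with ℏ the reduced Planck constant and Δx, Δp the standard deviations of position and momentum:

    ```latex
    \Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}
    ```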

    I think that we need more education, at an early stage (high school, but starting even earlier), in uncertainty and probabilistic thinking. To do this, we need to start by educating teachers (both science and math) in these thinking skills and values.

    • I agree that “developing the ability to be honest with oneself” is important, but in most cases it won’t stick unless institutions develop the ability to reward people for it.

    • > developing the ability to be honest with oneself

      Strong agreement here! It reminds me of two issues.

      First, I think this is emblematic of the more general question of how to raise children. On the one hand, we want them to learn specific skills and knowledge that will let them succeed. But on the other hand, we want them to be independent enough to know how to use those skills and knowledge without guidance—and to throw them out entirely when the time comes. Often, teaching skills/knowledge requires structure, but fostering independence requires some degree of free rein, so the challenge is how to find a balance.

      Second, a lack of self-honesty leads to two kinds of problems. We tend to focus on the problem when people *act* without self-honesty, leading to overconfidence and an inability to accept uncertainty or admit error. This problem is ultimately that people overestimate what they can do. But people can be wrong in the other direction too: If people underestimate what they can accomplish, they will stay quiet or fail to act or accept the status quo. I view this as an inability to be honest with oneself about the quality of one’s ideas or ability, and I think this problem might be even more prevalent, but by its nature goes largely unseen.

      > many people’s views of “what science is” are very limited

      Again, strong agreement here, and I also agree that correcting this will go a long way toward addressing the self-honesty issue. Science is often presented (especially in textbooks) as about making a specific hypothesis and then testing it, and students inevitably come to believe that a good scientist is one who keeps confirming their hypotheses, because that shows they were “right” or “smart”. This leads some people to abandon science because they don’t think they can make correct guesses like a good scientist can (this is the under-confidence problem). It also, more obviously, leads some people to do shoddy research with empty hypotheses so that they can keep “winning” according to the rules they were taught.

  3. Very interesting. As a journal editor with a front seat to watch all this “reform” play out in psychology, I have come to think that the attitudinal issues you describe, like being able to admit error and tolerating uncertainty (and other ones like wanting to solve a problem rather than “get a publication”), are central. Many of the practices that you refer to as signaling are, in my view, counterproductive, such as pre-registration, which often serves as a psychological bulwark against flexibility in the service of problem solving (even though it is “allowed” to explain a departure from the pre-registered plan). Others involve reporting of effect sizes, which has now gotten to the point where it is almost impossible to read the Results section of a paper, since every sentence is filled up with several different ways of reporting a result, just to be sure (e.g., F, p, c.i., eta^2_p). Often the main point is omitted, such as the proportion of subjects who did X with or without manipulation Y.

    Editors and reviewers also need to be flexible about what counts as a publishable set of results. Somewhere we need to draw the line and reject papers that are entirely full of ambiguous results, even when the paper is open about admitting the uncertainty. This decision is more difficult than it was in the days when everyone relied on null-hypothesis testing of a single main hypothesis.

    • >Others involve reporting of effect sizes, which has now gotten to the point where it is almost impossible to read the Results section of a paper, since every sentence is filled up with several different ways of reporting a result, just to be sure (e.g., F, p, c.i., eta^2_p). Often the main point is omitted, such as the proportion of subjects who did X with or without manipulation Y.

      Exactly! It drives me crazy when I’m reading or helping someone write a results section for experimental work, often in the JDM space that I work in, and it seems that everything has been reported except for what I actually want to know: what the overall differences were in terms of the response scale, how much variance there was in the raw data, how far the modal response (or sometimes the model intercept) is from what we would expect if everyone understood the decision or judgment task properly. And (related to design freedoms), if the intervention can be parameterized (e.g., an experiment that varies stimulus intensity along some scale), how “zoomed in” the range of parameter values used to create the experimental stimuli was relative to what we would expect in the world.
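
      To make that concrete, here is a minimal sketch, with made-up 7-point-scale responses and hypothetical condition names, of the plain summaries I mean, including the proportion of subjects who did X with vs. without the manipulation:

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      # Made-up responses on a 1-7 scale for two hypothetical conditions.
      control = rng.integers(1, 8, size=100)
      treatment = np.clip(control + rng.integers(0, 3, size=100), 1, 7)

      # Overall difference, in units of the response scale itself.
      print(f"mean difference: {treatment.mean() - control.mean():.2f} scale points")

      # How much variance there is in the raw data.
      print(f"SDs: control {control.std(ddof=1):.2f}, treatment {treatment.std(ddof=1):.2f}")

      # Distance of the modal response from the normative answer
      # (assume, purely for illustration, that 7 is "correct").
      modal = np.bincount(treatment).argmax()
      print(f"modal response: {modal} (distance from correct: {7 - modal})")

      # Proportion of subjects who "did X" with vs. without the manipulation
      # (here, "did X" = responded above the scale midpoint of 4).
      print(f"P(did X): treatment {(treatment > 4).mean():.2f}, control {(control > 4).mean():.2f}")
      ```

      None of this replaces the model, but a block like that up front would tell me more than a page of F’s and p’s.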

      I can also very much relate to this:
      >Somewhere we need to draw the line and reject papers that are entirely full of ambiguous results, even when the paper is open about admitting the uncertainty. This decision is more difficult than it was in the days when everyone relied on null-hypothesis testing of a single main hypothesis.

      I’ve seen this in the fields I publish in (and have even benefitted from it myself): papers that would previously have been hard to publish, because the effects just weren’t obvious or important enough, are now seen by reviewers as valuable despite the wash of uncertainty, because everyone is afraid to reject something that’s well reported, uses nice models, etc. This bothers me because the outcome of publishing everything that’s well reported, regardless of how ambiguous it is, can be that we either a) end up with a bunch of papers where the only non-tenuous claim that emerges is, “Look, this behavior is noisy,” or b) (perhaps more likely) end up with lots of papers with quite ambiguous results and questionable effects that later get interpreted and cited as though those small effects were as important as anything else in the literature.

        Meehl actually discussed a variant of this problem (dealing with pilot studies), and as far as I know he had no ideal solution, but he did proffer the idea that the situation would be much improved if tenuous studies (that is, those that are ambiguous, not even close to definitive, report failed tests, etc.) were publishable as short articles not exceeding some small number of pages. This remedy removes the publishing barrier (i.e., the record is reported honestly) but also segments the studies, so that the compartmentalization you worry readers won’t do mentally is instead built into the way the article is published (i.e., no mental effort required by the reader).

        I haven’t thought much on this, but it seems to me to be not such a bad idea? Naturally issues will arise with readers not giving a paper its due (or the converse) as a consequence of what form a paper is published in. But I don’t think you can get away from that with any form of publishing.

        • I also like the idea of having specific publication venues for this kind of thing (which shouldn’t be perceived as being of lower standard), maybe special sections of journals?

        • I agree with Allan and Christian’s comments. These “tenuous” studies provide what I think of as “yeoman” work that needs to be recognized as important beginning steps toward understanding the phenomena being studied.

        • I like these ideas. I wonder how many papers end up getting published this way. At least in psychology, I suspect a good portion of “failed studies” also end up getting tweaked until they are “successful studies,” by playing with the design of the experiment, what gets measured, etc. It would be nice to normalize reporting all the changes that were made on the road to the final version of a reported experiment as well.

        • It should be illegal to publish materials from govt-funded studies anywhere other than a pure archive of studies administered by the govt. Basically an arXiv-type thing. It could be contracted out to someone, but basically funded by and approved as the sole publication method for govt-funded studies. Voila. Problem solved.

        • Daniel:

          That works if you combine it with my idea to replace journals with recommender systems. Also, publishing government-funded work on free sites should be allowed. For example, I should be allowed to write about government-funded work on my blog.

        • The whole model is broken. Once posted, what’s the need/incentive to republish?

          It’s like having the NYT run articles already posted elsewhere. So why would I pay to read them at the NYT?

          And if the NYT can let me read for free, then who pays to run it?
