How to think scientifically about scientists’ proposals for fixing science

I wrote this article for a sociology journal:

Science is in crisis. Any doubt about this status has surely been dispelled by the loud assurances to the contrary by various authority figures who are deeply invested in the current system and have written things such as, “Psychology is not in crisis, contrary to popular rumor . . . Crisis or no crisis, the field develops consensus about the most valuable insights . . . National panels will convene and caution scientists, reviewers, and editors to uphold standards.” (Fiske, Schacter, and Taylor, 2016). When leaders go to that much trouble to insist there is no problem, it’s only natural for outsiders to worry . . .

When I say that the replication crisis is also an opportunity, this is more than a fortune-cookie cliche; it is also a recognition that when a group of people make a series of bad decisions, this motivates a search for what went wrong in their decision-making process. . . .

A full discussion of the crisis in science would include three parts:
1. Evidence that science is indeed in crisis . . .
2. A discussion of what has gone wrong . . .
3. Proposed solutions . . .
I and others have written enough on topics 1 and 2, and since this article has been solicited for a collection on Fixing Science, I’ll restrict my attention to topic 3: what to do about the problem?

Then comes the fulcrum:

My focus here will not be on the suggestions themselves but rather on what are our reasons for thinking these proposed innovations might be good ideas. The unfortunate paradox is that the very aspects of “junk science” that we so properly criticize—the reliance on indirect, highly variable measurements from nonrepresentative samples, open-ended data analysis, followed up by grandiose conclusions and emphatic policy recommendations drawn from questionable data—all seem to occur when we suggest our own improvements to the system. . . . I will now discuss various suggested solutions to the replication crisis, and the difficulty of using scientific evidence to guess at their effects.

And then the discussion of various suggested reforms:

The first set of reforms are educational . . . These may well be excellent ideas, but what evidence is there that they will “fix science” in any way? Given the widespread misunderstandings of statistical and research methods, even among statisticians, what makes us so sure that more classroom or training hours will make a difference?

The second set of reforms involve statistical methods: I am a loud proponent of some of these ideas, but, again, I come bearing no statistical evidence that they will improve scientific practice. My colleagues and I have given many examples of modern statistical methods solving problems and resolving confusions that arose from null hypothesis significance testing, and our many good stories in this vein represent . . . what’s the plural of “anecdote,” again?

A related set of ideas involve revised research practices, including open data and code, preregistration, and, more generally, a clearer integration of workflow into scientific practice. I favor all these ideas, to different extents, and have been trying to do more of them myself. But, again, I don’t see the data demonstrating their effectiveness. If the community of science critics (of which I consider myself a member) were to hold the “open data” movement to the same standards that we demand for research such as “power pose,” we would have no choice but to label all these ideas as speculative.

The third set of proposed reforms are institutional and involve altering the existing incentives that favor shoddy science and that raise the relative costs of doing good, careful work. . . . these proposals sound good to me. But, again, no evidence.

I conclude:

The foregoing review is intended to be thought-provoking, but not nihilistic. One of the most important statistical lessons from the recent replication crisis is that certainty or even near-certainty is harder to come by than most of us had imagined. We need to make some decisions in any case, and as the saying goes, deciding to do nothing is itself a decision. Just as an anxious job-interview candidate might well decide to chill out with some deep breaths, full-body stretches, and a power pose, those of us within the scientific community have to make use of whatever ideas are nearby, in order to make the micro-decisions that, in the aggregate, drive much of the direction of science. And, when considering larger ideas, such as proposals for educational requirements, recommendations for new default statistical or research methods, or reorganizations of the publishing system, we need to recognize that our decisions will necessarily rely much more on logic and theory than on direct empirical evidence. This suggests in turn that our reasoning be transparent and openly connected to the goals and theories that motivate and guide our attempts toward fixing science.

21 thoughts on “How to think scientifically about scientists’ proposals for fixing science”

  1. Research is in crisis, not science. It is in crisis because they decided to replace science (e.g., independent replications, precise predictions) with something else (e.g., peer review, statistical significance).

  2. Very interesting! I have also wondered about proposed “solutions” to current problems and whether they are really tackling the issues. Perhaps better expressed: I think there are implicit/hidden issues no one talks about.

    For instance, it has often been said that researchers are driven by the incentive to publish many papers. This in turn makes them cut corners, etc. I think this could play a part for early-career researchers, but to me it does not explain why tenured researchers, who I reason have no “pressure” to publish lots of papers, don’t adhere to some higher standard. Also, perhaps there is not necessarily anything wrong with rewarding researchers who publish a lot; the main issue is quality, not quantity.

    Also, it has been proposed to simply report everything in detail as a “solution”. This to me also does not make sense. You can find anything you want in psychology by using a few tricks (https://www.researchgate.net/publication/286363969_False-positive_psychology_Undisclosed_flexibility_in_data_collection_and_analysis_allows_presenting_anything_as_significant). Simply telling the reader you used the tricks does not affect the amount of BS you are putting out there.

    In my reasoning the solution is very simple: make sure researchers cannot use all the tricks they have been using up until now, thereby increasing the quality/replicability/usefulness of their research. From what I understand, the best method in psychology for achieving this is Registered Reports: https://osf.io/8mpji/wiki/home/ where the possibility of using tricks is minimized and where even results that do not fit the story the researchers want to tell get published.

    I think possibly important, overlooked, and not-talked-about issues in psychology are conflicts of interest (e.g., I want my book deal based on my “research”), ideologies (“I want to prove X”), and outside influence (e.g., government, interest groups, etc.). Given the potential power of the Registered Report format to also solve these possibly overlooked issues, it is no surprise that there has been a lot of backlash concerning it (https://www.timeshighereducation.com/comment/opinion/pre-registration-would-put-science-in-chains/2005954.article), and that very few psychologists/journals are using the format.

    To me, non-scientific issues like the possibly overlooked ones mentioned above explain psychological researchers’ and journals’ refusal to adopt en masse the improved research format that is Registered Reports. To me, the overlooked issues also explain why some social psychology journals (http://www.tandfonline.com/doi/full/10.1080/23743603.2015.1070611) are coming up with “alternative” so-called “pre-registration” formats in which they keep the pre-registration information secret from the reader. In my reasoning this makes it virtually useless from a scientific point of view, but more importantly it makes me wonder why a social psychology journal that claims to value good practices and transparency would hide the pre-registration information from the reader. I find it quite astonishing, and incredibly funny at the same time.

    Social psychology: Fool me once, shame on you. Fool me twice, shame on me.

  3. A well-known English professor at Johns Hopkins (now passed) wrote in the 1980s about how much the culture of his dept, college, and, indeed, higher ed (in his assessment) transformed when universities started paying higher wages in the late 1970s. Before that, it was a congregation of scholars who were attracted by (and compensated by) access to library resources, time to study and converse and teach. They all lived close to the university, regularly had graduate students to their homes, and the coin of the realm, as it were, was scholarly quality.

    Higher wages brought careerists, a topic addressed in this blog before. And that changed everything. No one wants to go back to “a scholar under a tree,” of course, but until the disproportionate economic incentives are addressed, it is hard to envision a movement to “fix” things forming on its own from the bottom up.

    At our university, new assistant professors (with only their grad school production as evidence) in some departments started this past year at $300K (including summer support, which continues for three years but somehow ends up extending farther). With this salary comes pressure to produce — the college needs it to charge higher fees, the university needs that to cut the colleges loose financially on their own, the overall budget needs that to justify grants and funded research and publicity. So they produce.

    When you need to secure a grant, assure a promotion, lock in a job offer, whatever…, scrutinizing the experimental design (aided by a statistics grad student in another college from yours) and the hypothesis tests in your last piece of work with three co-authors, all addressing their own needs, is not really on anyone’s top ten list.

    When the prize is monetary and the consequences for publishing something that’s not supported by the data don’t come close to the consequences of not publishing that same work, that is a hard temptation to resist in the moment. The real consequences of bad science occur in the aggregate but are hugely diffused at the individual level — it’s something close to the free rider model. Other than outright fraud, most of the “publish / don’t publish” decisions have some gray in them for the authors. Having them or their friends/colleagues who referee them or the friends/colleagues of their mentors who referee or review grant proposals — having this network be the filter for low-quality scientific work borders on pollyannaish.

    The economic engine driving these trends is powerful and pervasive, and as higher ed (in the main, not the elites so much) faces all kinds of financial challenges over the next decade, it’s hard to imagine individuals, departments, colleges, labs, or other university entities exerting any real leverage any time soon. But, true to form, there will be more statements, more public commitments, more conferences, more journal articles about other journal articles. Cornell is taking the lead on this as we write.

    There’s definitely hope, but economic incentives are powerful. As they say in security talks, computer security is not a technical problem, it’s a people problem. Likewise here — good science is such a precious asset, and security for that asset is ultimately not a program issue but, again, a people issue.

    • +1

      In 1925, William Carlos Williams wrote the following: “We crave filling and eagerly grab for what there is. The next step is, floating upon cash, we wish to be like the others. Now come into the universities the conformists of all colors…”

    • Andrew has written a lot about quality problems in psychology. Is it really true that salaries for an assistant professor position in psychology are anywhere near $300K? If I had to guess salaries in this discipline at this level, I would say around $75K.

      • Not in psychology, but there are ‘high demand’ assistant professor positions whose package last year was $300K. We aren’t alone, of course, because the market is specifying that’s what it takes to get people for these positions. This hire goes with a reduced teaching load. The pressure on these people to produce is significant and explicit. Dissertations now, as you know, are written up in article format; usually three or four pre-packaged articles make up the dissertation. The effort in the first year or two is focused on getting those in, reviewed, and accepted. There is no one in the entire authority ladder actively pushing for a review of hypotheses, tests, or anything in the “fix” portfolio, valuable as those are. Nothing other than almost pure logistics.

  4. “…the reliance on indirect, highly variable measurements from non-representative samples, open-ended data analysis, followed up by grandiose conclusions and emphatic policy recommendations drawn from questionable data…” A beautifully concise (and depressing) summary of the overall problem.

    I’m increasingly of the opinion that we probably need to add wilful dishonesty to the picture – at least in my research domain. Absent any real negative consequences for such dishonesty the problem will persist and become the norm. A part of me suspects that it already has become the norm.

  5. I don’t think we should ask for “evidence” for our research practices. The proof must be sought in the pudding itself. There is a big difference between asking for evidence of a pill working (and not harming) and asking for evidence for a practice. The difference, I think, lies in whether understanding is part of what makes the practice work. Research methods can’t be applied mechanically. They aren’t like taking a pill. P values have been treated a bit like a pill that you don’t have to understand in order for the effect to be felt.

    What’s that anecdote about Niels Bohr? A guest noticed a horseshoe hanging over his door. “You don’t believe in that nonsense do you, Niels?” “No,” he answered, “but I’m told it works anyway.” Well, methodology isn’t like that. Nor is research management. Nor tenure processes. But impact factors are certainly approached in that spirit.

    • Thomas:

      You write, “Research methods can’t be applied mechanically. They aren’t like taking a pill.” But social interventions aren’t like taking a pill either. For that matter, lots of medicine isn’t like taking a pill! So I think there’s a larger problem here, which is that statisticians and scientists identify causal inference with the “taking a pill” paradigm.

      • I agree. Power posing isn’t like taking a pill either. You probably have to be in the right mood for it to have an effect on your chances of getting the job, or something.

        The problem with evidence-based social policy is that it shifts responsibility onto, precisely, the evidence base. It lets policy-makers wash their hands even before a plan is implemented. What is really needed in governance is decisions that decision-makers take responsibility for and therefore follow up on. Also, you have to be responsive to the will of the people (at least in a democracy). Social processes aren’t causal; they are, for lack of a better word, moral. The effects are continuously passed through the hearts and minds of the people.

        The demographic effect of birth, death and immigration rates can be determined scientifically, of course. But the proper attitude of 61-year-old divinity professors about diversity training can’t be weighed against “the evidence”.

    • Thomas: > I don’t think we should ask for “evidence” for our research practices
      “Evidence” can be very expensive with a low probability of getting it at all.

      But I do think we should be clear where and when we don’t have much if any of it and always be on the look out for affordable ways to get some.

      More generally, we will never be sure if we have enough evidence, and its shelf life is completely unknown anyway.

      • I guess I imagine that experience will in most of these cases offer such a strong prior that no amount of collected evidence will actually contribute new knowledge. But I’m not going to rule out the possibility a priori, as it were.

        • I do mostly agree with you.

          In hindsight, this empirical research of mine was no more than ornamental (https://link.springer.com/article/10.1007%2FBF02596342?LI=true), as the math was not in question.

          Most of the empirical research on methodology done by the Cochrane group seemed questionable (are risk ratios more homogeneous than odds ratios? see the toy calculation below) or largely uninformative compared to background prior knowledge (does blinding of treatment assignment affect results?).

          Much of the stuff Metrics currently seems to be working on will likely be the same?

          A proper cost-benefit analysis of evidence _should_ help – but we have no empirical evidence for that.
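
          (A toy calculation of what that homogeneity question amounts to, with made-up numbers rather than anything from an actual Cochrane review; in this particular example the odds ratio happens to be constant across the two strata while the risk ratio is not, though other data could just as easily go the other way.)

          # Two hypothetical strata: (events_treated, n_treated, events_control, n_control)
          strata = [(10, 100, 20, 100),   # risks 0.10 vs 0.20
                    (40, 100, 60, 100)]   # risks 0.40 vs 0.60

          for a, n1, c, n0 in strata:
              p1, p0 = a / n1, c / n0
              risk_ratio = p1 / p0
              odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
              print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
          # Prints RR = 0.50, OR = 0.44 and then RR = 0.67, OR = 0.44:
          # here the odds ratio is homogeneous across strata and the risk ratio is not.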

    • Alan:

      Almost all comments on this blog are respectful and sincere, I think!

      Regarding your suggestion: Yes, I agree. But as of now, the empirical evidence is weak and we still need to make decisions. It’s fair enough to say that we’re still in the “exploration” phase and far from the “exploitation” stage of the process. That’s one of the points of my article. Another point is that it’s not clear what are good ways to gather the empirical evidence. In our statistics textbooks we emphasize the importance of gathering data using randomized experiments and random-sample surveys, yet when considering proposed methods, practices, and reforms, we don’t seem to do so much of this. Lots of mixed messages circulating here.
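
      Here is a minimal sketch, in code, of what one such randomized evaluation might look like: labs randomly assigned to a proposed reform (say, a preregistration requirement) or to business as usual, with the replication rates of their subsequent findings compared by a simple difference in proportions. Everything below is simulated; the assumed replication rates are placeholders, not data.

      import math
      import random

      random.seed(1)
      n_per_arm = 100  # hypothetical number of labs in each arm
      # Assumed "true" replication rates under each policy -- pure placeholders.
      p_reform, p_control = 0.55, 0.40

      # Simulate whether each lab's flagship finding later replicates.
      reform = [random.random() < p_reform for _ in range(n_per_arm)]
      control = [random.random() < p_control for _ in range(n_per_arm)]

      r1 = sum(reform) / n_per_arm
      r0 = sum(control) / n_per_arm
      se = math.sqrt(r1 * (1 - r1) / n_per_arm + r0 * (1 - r0) / n_per_arm)
      print(f"estimated difference in replication rate: {r1 - r0:+.2f} (se {se:.2f})")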

  6. One of the things that Taleb (the Black Swan guy) talks about a lot is “skin in the game.” If no one has much riding on the accuracy or usefulness of the results and papers are published largely as part of a pointless academic tournament, then bogus results don’t matter that much and I kind of doubt constant reminders about statistical methodology will change things very much. Marketing is similar in a lot of ways to psychology research, but marketing is greatly superior from a “skin in the game” perspective because it’s highly actionable and they need to get results. If a paper in an academic journal is wrong, it simply doesn’t matter in most cases. Sure, if something gets a lot of attention the government or Bill Gates or someone might waste a lot of money putting the “findings” into practice, but that is way downstream.

  7. This honestly gets depressing. I feel like all these methods are great, but human actors keep subverting them for career enhancement. It feels like any statistical or methodological reform would ultimately be gamed. It really feels like those institutional barriers are insurmountable sometimes.

    Apologies, just feeling pessimistic about the whole enterprise today. Gotta keep trying to do better anyway.

    • “I feel like all these methods are great, but human actors keep subverting them for career enhancement”

      Good point! Those proposing solutions to current problems should be aware of this, and should try to reason about (and subsequently test) how folks could potentially abuse things.

      But, when I take a look at Registered Reports, where researchers use high-powered designs, pre-registration (included in the paper!!), and publication of results no matter the outcome, I have a hard time coming up with ways researchers could game this. The only way to do it is to commit actual fraud, which is a big difference from the ways researchers have gamed the system up until now (e.g., p-hacking and hiding results are not (yet) seen as fraud), which is something you can never get rid of, and which I reason is a big enough step that most researchers will not take it.

      “Apologies, just feeling pessimistic about the whole enterprise today.”

      When I feel down about it all, I try to look for, and find, researchers who value good practices and try out possible improvements. Perhaps you could even work together on something. The thing I am trying to say is that perhaps even in a flawed system, researchers can build islands of sound research and simply ignore the rest.

      • Doing science and research the right way is like doing quality exercise — everyone is going to do that in the future, along with eating better and getting more sleep. Right now, I just have to get this one paper out because my retention/co-author/funder/promotion is really demanding it. But there are examples of great discipline in science, as in exercise, out there and they stand as reminders of what it looks like and motivators that it is possible, and that’s inspiring. Hopefully their influence grows as the analog-obesity problem of bad science gets more publicity. If the trend gets popular enough, TED will pick it up — they appear to be content-agnostic and read only Nielsen.
