Fixing the reproducibility crisis: Openness, Increasing sample size, and Preregistration ARE NOT ENUF!!!!

In a generally reasonable and thoughtful post, “Yes, Your Field Does Need to Worry About Replicability,” Rich Lucas writes:

One of the most exciting things to happen during the years-long debate about the replicability of psychological research is the shift in focus from providing evidence that there is a problem to developing concrete plans for solving those problems. . . . I’m hopeful and optimistic that future investigations into the replicability of findings in our field will show improvement over time.

Of course, many of the solutions that have been proposed come with some cost: Increasing standards of evidence requires larger sample sizes; sharing data and materials requires extra effort on the part of the researcher; requiring replications shifts resources that could otherwise be used to make new discoveries. . . .

This is all fine, but, BUT, honesty and transparency are not enough! Even honesty, transparency, replication, and large sample size are not enough. You also need good measurement, and some sort of good theory. Otherwise you’re just rearranging deck chairs on the . . . OK, you know where I’m heading here.

Don’t get me wrong. Sharing data and materials is a good idea in any case; replication of some sort is central to just about all of science, and larger sample sizes are fine too. But if you’re not studying a stable phenomenon that you’re measuring well, then forget about it: all those good steps of openness, replication, and sample size will just be expensive ways of learning that your research is no good.

I’ve been saying this for a while, so I know this is getting repetitive. See, for example, this post from yesterday, or this journal article from a few months back.

But I feel like I need to keep on screaming about this issue, given that well-intentioned and thoughtful researchers still seem to be missing it. I really really really don’t want people going around thinking that if they increase their sample sizes, keep their data open, and preregister, they’ll solve their replication problems. Eventually, sure, enough of this and they’ll be so demoralized that maybe they’ll be motivated to improve their measurements. But why wait? I recommend following the recommendations in section 3 of this paper right away.

28 thoughts on “Fixing the reproducibility crisis: Openness, Increasing sample size, and Preregistration ARE NOT ENUF!!!!”

  1. “will just be expensive ways of learning that your research is no good.”

    Call me cynical, but I feel that would be a massive improvement over “expensive ways to produce false positive results”.

    • Hanno:

      I agree. And I agree that prereg will supply an indirect incentive to do better work. I just don’t think prereg will, in itself, create better work. At best it will provide the space for better work to appear.

  2. “But if you’re not studying a stable phenomenon that you’re measuring well, then forget about it: all those good steps of openness, replication, and sample size will just be expensive ways of learning that your research is no good.”

    I have wondered for some time now whether large-scale (replication) research is really the optimal way to perform research. In light of this, I posted the following on the website of the “Psychological Science Accelerator” (which, if I understood things correctly, aims to perform large-scale research involving many labs, and announced they were writing a paper about the project):

    https://psysciacc.org/2017/11/08/the-psychological-science-accelerators-first-study/

    “I wondered if the following might be interesting, and useful, for you guys to investigate concerning 1) how to optimally accelerate psychological science, and 2) your possible paper about the project:

    1) Take a look at all the Registered Replication Reports (RRRs) performed thus far

    2) Randomly take 1, 2, 3, 4, 5, etc. individual labs from one of these RRRs and their individual associated confidence intervals, effect sizes, p-values, no. of participants, etc.

    3) Compare the pooling of the information of these 1, 2, 3, 4, 5, etc. individual labs to the overall results of that specific RRR

    4) Try to find out what the “optimal” no. of labs/participants could be, so as not to waste resources unnecessarily

    5) Possibly use this information to support the no. of labs/participants per PSA-study in your paper, and/or use the information coming from this investigation to come up with a possibly improved manner to optimally accelerate psychological science.”

    I do not have the computer and/or statistical skills to do this myself, but I thought it could be a useful investigation for them concerning how to optimally perform research and accelerate psychological science; something like the following is what I have in mind.
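    To be concrete, the sketch below is written in Python, uses simulated per-lab effect estimates and standard errors as a stand-in for the real RRR results, and pools them with a simple fixed-effect (inverse-variance) rule; both the numbers and the pooling rule are assumptions of mine, not anything the PSA or the RRRs have committed to.

    ```python
    # Sketch: how close does the pooled estimate from a random subset of k labs
    # get to the estimate pooled over all the labs in an RRR?
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for one RRR: per-lab effect estimates and standard errors
    # (hypothetical numbers; in practice, read these from the published report).
    n_labs = 20
    true_effect = 0.10
    lab_se = rng.uniform(0.05, 0.15, size=n_labs)   # per-lab standard errors
    lab_est = rng.normal(true_effect, lab_se)       # per-lab effect estimates

    def pooled(est, se):
        """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
        w = 1.0 / se ** 2
        return np.sum(w * est) / np.sum(w), np.sqrt(1.0 / np.sum(w))

    full_est, _ = pooled(lab_est, lab_se)

    # Steps 2-3: for each subset size k, compare random k-lab poolings to the full pooling.
    for k in range(1, n_labs + 1):
        diffs = [abs(pooled(lab_est[idx], lab_se[idx])[0] - full_est)
                 for idx in (rng.choice(n_labs, size=k, replace=False) for _ in range(2000))]
        print(f"{k:2d} labs: mean |difference from full pooled estimate| = {np.mean(diffs):.4f}")
    ```

    Step 4 would then amount to deciding how small that discrepancy (or how wide a confidence interval) one is willing to tolerate before adding further labs stops being worth the extra resources.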

    A pre-print about the “Psychological Science Accelerator” has since been posted, but I could not find any concrete calculations/statistics about the use of participants except the following general statement:

    https://psyarxiv.com/785qu/

    “First, the ability to pool resources from many institutions is a strength of the PSA, but one that comes with a great deal of responsibility. The PSA will draw on resources for each of its projects that could have been spent investigating other ideas. Our study selection process is meant to mitigate the risks of wasting valuable research resources and appropriately calibrate investment of resources to the potential of research questions. To avoid the imperfect calibration of opportunity costs, each project will have to justify its required resources, a priori, to the PSA committees and the broader community.”

    • Quote from above: “I have wondered for some time now whether large-scale (replication) research is really the optimal way to perform research.”

      What I also worry about concerning a large collaborative effort is the large data set coming from it, and how this set might be used.

      1) For instance, I wonder if (large) existing data sets, and subsequent analyses using these existing data sets, might lead to a whole new way of A) p-hacking, and B) selective reporting.

      I reason that p-hacking and selective reporting can/will still be done using (large) existing data sets, but it doesn’t seem like it at first glance. That’s because I reason the p-hacking and selective reporting (possibly) happen across separate papers, by different researchers, and over a longer period of time.

      For instance, how can researchers control for multiple analyses by adjusting the p-value when they don’t know how many others are, or have been, analyzing the open data set they are currently working on? (p-hacking?).

      And, what do you think the chances are that researchers will “explore” the data set in tons of different ways, and then, consciously or unconsciously, only “pre-register” the analyses they are subsequently “confirming” in their to-be-written paper? (p-hacking? selective reporting?).

      And, what do you think the chances are that researchers will only write about “findings” they want to find, but not those they don’t want to find, when analyzing the open data set? (selective reporting?).
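      To make the multiple-analyses worry concrete, here is a toy simulation in Python (entirely hypothetical numbers, and pure-noise data): many “teams” each run a handful of tests on the same open data set, each correcting only for its own tests, yet spurious “findings” still accumulate across teams, and no single team could have corrected for the others’ analyses.

      ```python
      # Toy example: one shared "open" data set with no true effects, analyzed by
      # many independent teams. Each team corrects for its own tests only.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)

      n, n_vars = 500, 100
      data = rng.normal(size=(n, n_vars))            # pure noise: no real effects at all

      n_teams, tests_per_team, alpha = 50, 5, 0.05
      false_findings = 0
      for _ in range(n_teams):
          # Each team picks its own variable pairs and tests the correlations,
          # Bonferroni-correcting for its own 5 tests (and only those 5).
          cols = rng.choice(n_vars, size=2 * tests_per_team, replace=False)
          pvals = [stats.pearsonr(data[:, a], data[:, b])[1]
                   for a, b in cols.reshape(tests_per_team, 2)]
          false_findings += sum(p < alpha / tests_per_team for p in pvals)

      # Each team's own correction looks fine, but across teams the expected number
      # of spurious "findings" is n_teams * alpha = 2.5 here, and it keeps growing
      # with every additional team that analyzes the same data set.
      print(f"Within-team-corrected 'significant' findings across all teams: {false_findings}")
      ```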

      2) The following quote is from Standing, Sproule, & Khouzam’s 1991 paper, “Empirical statistics IV: Illustrating Meehl’s sixth law of soft psychology: Everything correlates with everything”:

      From the abstract: “Every one of the 135 variables, save ID number, displayed more statistically significant correlations with the other variables than could be predicted from chance. With alpha set at .05 (two-tailed), a given variable correlated significantly on average with 41% of the other variables, although the absolute magnitude of the correlations averaged only .07.”
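      Here is a quick Python simulation of that kind of “crud factor” (the sample size, the number of variables, and the strength of the shared component are made-up numbers for illustration, not an attempt to reproduce Standing et al.’s data):

      ```python
      # Toy "crud factor" simulation: many variables share a weak common component,
      # the sample is largish, and most pairwise correlations end up "significant"
      # even though they are all tiny.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)

      n, n_vars = 2000, 50                                  # made-up sample and variable counts
      common = rng.normal(size=(n, 1))                      # weak factor shared by all variables
      data = 0.25 * common + rng.normal(size=(n, n_vars))   # each variable is mostly noise

      pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
      sig, abs_rs = 0, []
      for i, j in pairs:
          r, p = stats.pearsonr(data[:, i], data[:, j])
          abs_rs.append(abs(r))
          sig += p < 0.05
      print(f"{sig / len(pairs):.0%} of pairwise correlations 'significant' at .05, "
            f"mean |r| = {np.mean(abs_rs):.3f}")
      ```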

      Now imagine this was a large “open data” set. And now imagine the possible scenarios I wrote above. I can totally see p-hacking and/or selective reporting happening all over the place. I really worry it’s similar to the p-hacking and selective reporting that have happened over the last decades. With possibly one notable exception, which I fear makes it even worse: the studies that resulted in these large data sets, and the “findings” coming from them, can/will almost certainly never be replicated to verify matters, due to the large sample size, costs, etc.

      • ““Every one of the 135 variables, save ID number, displayed more statistically significant correlations with the other variables than could be predicted from chance. With alpha set at .05 (two-tailed), a given variable correlated significantly on average with 41% of the other variables, although the absolute magnitude of the correlations averaged only .07.””

        Time to bring up Tyler Vigen’s contribution to the ubiquity of correlations for those who haven’t yet seen it:
        https://tylervigen.com/spurious-correlations

      • “What I also worry about concerning a large collaborative effort is the large data set coming from it, and how this set might be used.”

        I even worry about the step before all that. I worry the initial “pre-registration” might be sub-optimal (to say the least), as seems to be the case with large-scale “collaborative” efforts involving many different labs, like “Registered Replication Reports”. Please correct me if I am wrong, but they seem to me to have a very messy and/or incomplete pre-registration.

        For example, here is the “pen-in-mouth” Strack et al. “Registered Replication Report”. On page 5 of the paper there is a link to the pre-registration, https://osf.io/h2f98/. Now, that leads to a page where you can download a few things, but the thing that (I assume) is the actual pre-registration seems to me to only talk about the analyses, and does not mention the exact labs that will participate.

        I find this very strange. If we value the goal, and function, of pre-registering the statistical analyses we are planning to perform (so we don’t leave some out and report others), we should also value pre-registering the exact labs that will perform the study (so we don’t leave some out and report others)…

  3. Good measurement and theory are subjective judgments, at least ex ante. And social phenomena are inherently unstable. Honesty and transparency are all we can rely on to converge on good methods and results ex post.

    I await the melee! :)

    • Kevin:

      Honesty and transparency are great but they won’t do it alone, except in the indirect sense of limiting the amount of time that is wasted on hopeless studies. If researchers are studying small or highly variable and unpredictable effects with noisy measurements, they’re not going to learn much of anything useful. But, yes, indirectly an honest and transparent approach will help, in that honesty and transparency should help people realize that these studies are not working.

      We also need to distinguish honesty and transparency from morality. For example, by all accounts Daryl Bem is a wonderful human being, but his studies of ESP are not transparent. He does not present his raw data, he only presents some of his data summaries, and his mode of reporting is all about proving a point, not about open exploration. These have been the standards in his field, and I’m not calling Bem “dishonest” for following these standards, but the result is that he’s not presenting a full account of his data and experimental procedures.

      So: (a) being a good person is not enough, and (b) honesty and transparency are not enough. But I do agree that a norm of honesty and transparency should help researchers give up, or modify their approaches, faster from various dead ends.

  4. Psychology faces both difficult challenges and easy ones. It will be difficult to make psychology a genuinely useful science, because that will require much better measures and much better theories than we currently have. That ain’t going to be easy. But it should, at least in principle, be easy to shift norms in ways that encourage replicability. Of course that in itself won’t solve the big challenges. But it makes sense to me to start with the easy problems and then push forward from there.

    • Eh, well if we’re just talking about USEFUL, that’s extremely easy to achieve, since some things that psychologists study are behaviors of interest in and of themselves rather than being indicators of an underlying factor. And some of those types of things have been replicated a good amount such that we know they’re real.

      Theoretical matters are probably more difficult. I get that strong theory is a good thing, I just don’t really get how you’re supposed to make it in psychology, perhaps because I’ve seen either no examples or few examples.

    • “It will be difficult to make psychology a genuinely useful science, because that will require much better measures and much better theories than we currently have”

      Perhaps this depends on what you consider to be “useful”.

      To name just one thing, psychology makes lots of people lots of money, if I am not mistaken.

      Editors, peer-reviewers, scientists, and universities all play a part in what can be considered a giant academic publication scam (e.g. https://forbetterscience.com/2017/08/24/the-costs-of-knowledge-scientists-want-their-cut-on-the-scam/)

    • “But it should, at least in principle, be easy to shift norms in ways that encourage replicability. Of course that in itself won’t solve the big challenges. But it makes sense to me to start with the easy problems and then push forward from there.”

      Hmm, I read this piece here:

      https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science

      and if I am understanding things correctly, it states that the journal Psychological Science handed out 4 pre-registration badges in 2015, 3 in 2016, and 19 in 2017.

      This seems like a very low number to me. It is even stranger to me, if I understood things correctly, after reading here that pre-registration might even be considered one of the “easy” things to tackle. This all made me wonder:

      1) How many pre-registered papers did Psychological Science receive in 2015/2016/2017?

      2) Why were these possible other pre-registered papers not published?

      3) Why does Psychological Science not simply require pre-registration?

      • I have been preregistering studies in my lab for several years and if memory serves not one of them has yet been published in a journal (although I hope some soon will be). My point is that typically there is a substantial amount of time between a researcher initially adopting preregistration and that researcher publishing preregistered work in a journal. Just because a study is preregistered does not, in my view, mean that its results warrant submission, let alone publication. My first two preregistrations (the second following up on the first) both yielded results exactly the opposite of prediction and I am still trying to figure out why. So I think it is pretty impressive that Psych Science published 19 articles that reported preregistered research in 2017. I would be surprised if any other journal would hold a candle to that. True, that’s a bit less than 10% of our empirical papers in 2017, but I think the number will be substantially higher in 2018, and higher still in 2019.

        Just because a study was preregistered does not mean that the work was worth doing or informative. It is quite easy to preregister an ill-conceived study. I don’t know how many submissions in 2017 that included one or more preregistered studies were declined, but I do know that at least some were.

        If we required preregistration at this point in time, we would not get very many submissions. But I would not be surprised if preregistration does eventually become a requirement.

        Also, standards for what is accepted as “preregistration” will gradually increase.

        Finally, preregistering is easy compared to fundamental advancements in theory and measurement. But it is not trivially easy.

        Steve

        • “Also, standards for what is accepted as “preregistration” will gradually increase. ”

          Oh, this is interesting to me! Also in light of the link above (https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science)

          Can you tell me more about why and how standards of pre-registration could increase?

          I hope this will not involve any “special” in-house “pre-registration experts” who will review the pre-registrations, and I hope this will not involve hiding the pre-registration from the reader of the actual paper.

          I can totally see how journals would like something like that to happen though, as it legitimizes their role + that of peer-reviewers and can keep them in business. I am more interested in improving psychological science, and would view that scenario as a really bad idea.

          Like I wrote in the link about Registered Reports, I sincerely hope I did not work my ass off trying to improve psychological science, only for 1) certain “Open Science” people to put all the power and responsibility back in the hands of those that screwed things up, and 2) basically do the exact opposite of what they have been talking about all this time.

        • Steve Lindsay: “Just because a study is preregistered does not, in my view, mean that its results warrant submission, let alone publication.”

          I was lightly daydreaming about this on the way to work a few weeks ago.

          Why shouldn’t preregistering be sufficient for publication? If some hypothesis seemed interesting and plausible enough to get funded for a study, then shouldn’t the fact that it *didn’t* pan out as expected also be interesting? If 30 people tried to replicate the power pose, don’t we want a record showing that it was only “successfully” replicated twice?

          You could argue that this might lead to too many papers. I would say we’re already there…plus we have the issue that your sample of studies has been greatly biased by the significance filter.

        • “Why shouldn’t preregistering be sufficient for publication?”

          Aha, interesting point!

          The funny thing is that a certain Stephen Lindsay wrote a piece about pre-registration and “Registered Reports” in the link posted above, and here it is again: https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science

          What are “Registered Reports” you may ask: well those are studies that get pre-registered and published regardless of the results. In a way, to quote you, “preregistering is sufficient for publication” with that format.

          The main difference, I think, that Steve Lindsay might argue here is that “peer-reviewers” and/or the editor will have been super helpful during the first-stage submission in that format: maybe they even completely changed the design or goal of the study, and/or will have decided whether the study “warrants publication”.

          Apparently “reviewer 2” does not exist with “Registered Reports”, there is no chance of editors and reviewers blocking certain research or otherwise manipulating things, and editors and reviewers all of a sudden do not make researchers leave out conditions and analyses and do other bad stuff!

          It’s like “Registered Reports” make all that is bad about “peer-review” and journals vanish all of a sudden! On top of that, “Registered Reports” make researchers who act as “peer-reviewers” possess super research powers concerning research design, statistical analyses, and pre-registration that they apparently do not possess as mere researchers submitting the study/proposal/paper. It’s like magic.

        • Steve Lindsay said, “standards for what is accepted as “preregistration” will gradually increase.”

          I’m not so sure about that. There is a common human tendency for “standards” to become codified quickly, which works against correcting the weaknesses that initial attempts at standards tend to have.

        • 1) “I don’t know how many submissions in 2017 that included one or more preregistered studies were declined, but I do know that at least some were.”

          Hmm, I thought you were the editor-in-chief at Psychological Science. If so, I reason you should (want to) know how many pre-registered studies were declined in 2015/2016/2017, and/or could easily find out.

          2) “If we required preregistration at this point in time, we would not get very many submissions.”

          Ah, okay, so it seems even more likely that you are the editor-in-chief, given your use of “we” here.

          Assuming this is correct, and reasoning from that point onward:

          If you just stated that you don’t know how many declined submissions in 2017 (and, I reason, also 2015 and 2016) included one or more pre-registered studies, how exactly do you know that you would not get very many submissions if you required pre-registration?

        • “I would be surprised if any other journal would hold a candle to that. True, that’s a bit less than 10% of our empirical papers in 2017, but I think the number will be substantially higher in 2018, and higher still in 2019.”

          Hmm, I’m not really impressed with that. But more importantly, maybe you could design “pre-registration percentage” badges for journal editors?

          I hear badges handed out by journals are a super useful “incentive” for researchers to pre-register (why exactly is still not clear to me), so perhaps they could also work as an incentive for journal editors to actually publish the pre-registered work.

          If Psychological Science stays on course this year, and I understood you correctly, that could mean you, as editor-in-chief of Psychological Science, could be eligible for a “10% of our publications used pre-registration” badge.

          You could hang it on your fridge at home, or do something else with it perhaps.

        • “My first two preregistrations (the second following up on the first) both yielded results exactly the opposite of prediction and I am still trying to figure out why”

          Why could that be indeed.

          While you think about that, perhaps you could read the following pre-print I wrote:

          https://psyarxiv.com/5vsqw/

          I don’t know if it makes much sense, but it may contain some links to possible answers to your ponderings (should you not be aware of them) that could perhaps be helpful in your quest for answers.

          If you don’t find the answers at first, you could take a break and look in the mirror for instance. You could then read it for a 2nd time, then look in the mirror again. Etc.

          Good luck with your quest for answers!

    • Steve:

      I agree that it will be a good step to encourage honesty and replicability. Right now we see researchers completely misrepresent data and references in social media and in published work, including in publications of the Association for Psychological Science; see for example here.

      So I think an excellent start would be for professional organizations to move toward a zero-tolerance position on lying and misrepresentation of data and references. Corrections, retractions, official apologies, the whole thing. No more attempts to talk the problems away. Instead, correction and contrition.

      • “So I think an excellent start would be for professional organizations to move toward a zero-tolerance position on lying and misrepresentation of data and references. Corrections, retractions, official apologies, the whole thing. No more attempts to talk the problems away. Instead, correction and contrition.”

        I recently started thinking that it could be the case that, by wanting to work with someone/something that is part of the problem and/or responsible for the problem, you may actually be giving them unnecessary influence and keeping them as an unnecessary, problematic part.

        (e.g. by wanting to work with publishers, “gold open access” may actually make publishers even more money and even more powerful http://bjoern.brembs.net/2016/04/how-gold-open-access-may-make-things-worse/?utm_content=buffere157b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer)

        1) Concerning research for instance:

        Researchers who want to perform their research with some higher standards could perhaps work together in small groups, and just mostly ignore other researchers with “flair” and “magical” results.

        I reason those that want to “do the right thing” are way more dependent on the “input” of their work, and I reason it could be smart for them to start to work together in small groups that still allow all of them to try and contribute their own ideas

        (e.g. see here for my best attempt at describing such a possible research-format http://statmodeling.stat.columbia.edu/2017/12/17/stranger-than-fiction/#comment-628652)

        2) Concerning the link you provide with regards to the misrepresentation of references:

        The person who makes the error is the one who possibly looks like a fool, not the person being accused of something they did not say/write. The journal that published it, and did not want to publish a correction if I understood things correctly, possibly looks like a fool. Both of them should be the ones to make the effort to correct it, not you. If they don’t, then that tells me, the reader, something.

        3) Concerning the APS in general:

        Any organization that hands out “APS rising star” awards (apparently to members only, if I understood things correctly) possibly makes itself look like a fool. It boggles my mind that psychological science organizations feel the need to hand out individual awards, but to then also dare to call it “rising star”, like we’re on “The Voice” or “American Idol”, is incomprehensible to me.

        (e.g. see my thoughts on individual awards in science here: https://psyarxiv.com/pju9c/)

        In my reasoning the important thing is that there are alternatives: concerning psychological research, researchers, publications, and how one views contributions to science.

        Perhaps it’s more useful to simply ignore a lot of stuff/researchers/journals/organizations/etc. and focus on the alternatives: e.g. researchers who want to do research with some higher standards can maybe work in groups to help themselves, pre-prints can be posted by anyone, blogs can be written, discussions happen on other media, etc.

        Psychological Science does not belong to an organization.
        Psychological Science does not belong to a journal.
        Psychological Science does not belong to academia.
