The value (or lack of value) of preregistration in the absence of scientific theory

Javier Benitez points us to this 2013 post by psychology researcher Denny Borsboom.

I have some thoughts on this article—in particular I want to compare psychology to other social science fields such as political science and economics—but first let me summarize it.

Preregistration and open science

Borsboom writes:

In the past few months, the Center for Open Science and its associated enterprises have gathered enormous support in the community of psychological scientists. While these developments are happy ones, in my view, they also cast a shadow over the field of psychology: clearly, many people think that the activities of the Center for Open Science, like organizing massive replication work and promoting preregistration, are necessary. That, in turn, implies that something in the current scientific order is seriously broken. I think that, apart from working towards improvements, it is useful to investigate what that something is. In this post, I want to point towards a factor that I think has received too little attention in the public debate; namely, the near absence of unambiguously formalized scientific theory in psychology.

Interesting. This was 6 years ago, and psychology researchers are continuing to have these arguments today.

Borsboom continues:

Scientific theories allow you to work out, on a piece of paper, what would happen to stuff in conditions that aren’t actually realized. So you can figure out whether an imaginary bridge will stand or collapse in imaginary conditions. You can do this by simply just feeding some imaginary quantities that your imaginary bridge would have (like its mass and dimensions) to a scientific theory (say, Newton’s) and out comes a prediction on what will happen. In the more impressive cases, the predictions are so good that you can actually design the entire bridge on paper, then build it according to specifications (by systematically mapping empirical objects to theoretical terms), and then the bridge will do precisely what the theory says it should do. . . .

That’s how they put a man on the moon and that’s how they make the computer screen you’re now looking at. It’s all done in theory before it’s done for real, and that’s what makes it possible to construct complicated but functional pieces of equipment. This is, in effect, why scientific theory makes technology possible, and therefore this is an absolutely central ingredient of the scientific enterprise which, without technology, would be much less impressive than it is. . . .

My [Borsboom’s] field – psychology – unfortunately does not afford much of a lazy life. We don’t have theories that can offer predictions sufficiently precise to intervene in the world with appreciable certainty. That’s why there exists no such thing as a psychological engineer. And that’s why there are fields of theoretical physics, theoretical biology, and even theoretical economics, while there is no parallel field of theoretical psychology. . . .

Interesting. He continues:

And that’s why psychology is so hyper-ultra-mega empirical. We never know how our interventions will pan out, because we have no theory that says how they will pan out . . . if we want to know what would happen if we did X, we have to actually do X. . . .

This has important consequences. For instance, as a field has less theory, it has to leave more to the data. Since you can’t learn anything from data without the armature of statistical analysis, a field without theory tends to grow a thriving statistical community. Thus, the role of statistics grows as soon as the presence of scientific theory wanes.

Not quite. Statistics gets used in atomic physics and in pharmacometrics: two fields where there is strong theory, but we still need statistics to draw inference from indirect data. The relevant buzzword here is “inverse problem.”
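To illustrate what I mean by an inverse problem, here is a minimal sketch in Python (a toy exponential-decay example with made-up numbers, standing in for a pharmacokinetic or physical model, not any real analysis): the theory fully specifies the forward model, but the parameters still have to be inferred statistically from sparse, noisy measurements.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

def decay(t, c0, k):
    # The "theory": exponential decay, c(t) = c0 * exp(-k * t).
    return c0 * np.exp(-k * t)

# Hypothetical sparse sampling times and noisy, indirect measurements.
t = np.linspace(0.5, 12, 10)
observed = decay(t, 100.0, 0.35) + rng.normal(0, 5, t.size)

# The inverse problem: infer the parameters back from the data.
(c0_hat, k_hat), _ = curve_fit(decay, t, observed, p0=[80.0, 0.2])
print(f"estimated c0 = {c0_hat:.1f}, k = {k_hat:.2f} (values used to simulate: 100, 0.35)")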

Borsboom’s comments all seem reasonable for the use of statistics within personality and social psychology, but I think he jumped too quickly to generalize about statistics in other fields.

Even within psychology, I don’t think his generalization quite works. Consider psychometrics. Theory in psychometrics isn’t quite as strong as theory in physics or pharmacometrics, but it’s not bad; indeed sometimes they can even break down the skill of solving a particular problem into relevant sub-skills. To fit such models, with all their latent parameters, you need sophisticated statistical methods.
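To give a sense of what those latent-parameter models look like, here is a toy sketch of a Rasch-type item response model, coded from scratch in Python on simulated data rather than with any real psychometric software: person abilities and item difficulties are all latent, and they have to be estimated jointly from a matrix of right/wrong answers.

import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 20

# "True" latent parameters, used only to simulate the response data.
true_ability = rng.normal(0, 1, n_persons)
true_difficulty = rng.normal(0, 1, n_items)

def logistic(x):
    return 1 / (1 + np.exp(-x))

# Rasch model: P(person i answers item j correctly) = logistic(ability_i - difficulty_j).
p = logistic(true_ability[:, None] - true_difficulty[None, :])
responses = rng.binomial(1, p)

# Crude joint maximum-likelihood estimation by gradient ascent.
ability = np.zeros(n_persons)
difficulty = np.zeros(n_items)
lr = 0.005
for _ in range(5000):
    p_hat = logistic(ability[:, None] - difficulty[None, :])
    resid = responses - p_hat                  # gradient of the Bernoulli log-likelihood
    ability += lr * resid.sum(axis=1)
    difficulty -= lr * resid.sum(axis=0)
    difficulty -= difficulty.mean()            # pin down the location to identify the model

print("correlation of estimated vs. true abilities:",
      round(np.corrcoef(ability, true_ability)[0, 1], 2))

Real psychometric models (multidimensional models, models with sub-skills) are richer than this toy version, but the basic statistical problem is the same: many latent parameters, limited discrete data.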

So, yes, they use fancy statistics in junk science like ESP and embodied cognition, but they also use fancy statistics in real science like psychometrics that involves multivariate inference from sparse data.

Borsboom continues:

It would be extremely healthy if psychologists received more education in fields which do have some theories, even if they are empirically shaky ones, like you often see in economics or biology.

Again, what about psychometrics? “Psychology” is not just personality and social psychology, right?

The post also has several comments, my favorite of which is from Greg Francis, another psychology researcher, who writes:

I [Francis] liked many of the points raised in the article, but I do not understand the implied connection between the lack of theoretical ideas and the need for preregistration. . . .

If we do not have a theory that predicts the outcome of an experiment, then what is the point of preregistration? What would be preregistered? Guesses? It does not seem helpful to know that a researcher wrote some guesses about their experimental outcomes prior to gathering the data. Would experimental success validate the guesses in some sense? What if the predictions were generated by coin flips? . . .

I agree with Francis here. Indeed, we have an example of the problems with preregistration in a study published a year ago that purportedly offered “real-world experimental evidence” that “exposure to socioeconomic inequality in an everyday setting negatively affects willingness to publicly support a redistributive economic policy.” This was a study that was preregistered, but with such weak theory that when the result came in the opposite direction as predicted (in a way consistent with noise, as discussed for example in this comment), the authors immediately turned around and proposed an opposite theory to explain the results, a theory which in turn was presented as being so strong that the study was said to “[advance] our understanding of how environmental factors, such as exposure to racial and economic outgroups, affect human behavior in consequential ways.”

Now, it’s tricky to navigate these waters. We can learn from surprises, and it’s fine to preregister theories, including those that do not end up supported by data. The point here is that the theory in this example is indeed pretty close to valueless. Or, to put it more carefully, the theory is fine for what it is, but it does not interact usefully with the experiment, as the experimental data are too noisy. This is the same way in which the theory of relativity is fine for what it is but does not interact usefully with a tabletop experiment of balls rolling down inclined planes.

So, yes, as Greg Francis says, you don’t get much scientific value from preregistering guesses.

The other thing which came up in the discussion of the above example, which was not explicitly mentioned by Borsboom, is that a lot of social scientists seem to turn off their critical thinking when a randomized experiment is involved. I wrote that Borsboom didn’t explicitly mention this point, but you could say that he was considering it implicitly, in that lots of junk science of the psychology variety involves experimentation, and there does seem to be the attitude that:

random assignment + statistically significant p-value = meaningful scientific result,

and that:

random assignment + statistically significant p-value + good story = important scientific result.

The random assignment plays a big part in this story, which is particularly unfortunate in recent psychology research, given that psychologists are traditionally trained to be concerned with validity (are you measuring something of interest?) and reliability (are your measurements stable enough to allow you to learn anything generalizable from your sample?).

There’s a lot going on in this example, but the key points are:

– I agree with Borsboom that personality and social psychology have weak theory.

– I also agree that when researchers have weak theory, they can rely on statistics.

– I disagree with the claim that “the role of statistics grows as soon as the presence of scientific theory wanes”: highly theoretical fields can also rely heavily on statistics.

– I agree with Francis that preregistration doesn’t have much value in the absence of strong theory and measurement.

Comparing preregistration to fake-data simulation

You might wonder about this last statement, given that I published a preregistered study of my own on a topic where we had zero theory. And, more generally, I recommend that, before gathering new data, we start any research project with fake-data simulation, which is a sort of preregistered hypothesis, or model, of the underlying process as well as the data-generating mechanism. I think the value of preregistration, or fake-data simulation, is that it clarifies our research plans and it clarifies our model of the world. Here I’m thinking of “model” as a statistical model or generative model of data, not as a theoretical model that would, for example, explain the direction and size of an effect.
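To be concrete about what I mean by fake-data simulation, here is a minimal sketch, with all numbers made up for illustration: write down the generative model you plan to fit, simulate data from it with known parameter values, run the planned analysis, and check that those values are recovered.

import numpy as np

rng = np.random.default_rng(0)

# Assumed generative model: y = a + b*x + error (hypothetical parameter values).
a_true, b_true, sigma = 1.0, 0.3, 2.0
n = 100
x = rng.uniform(0, 10, n)
y = a_true + b_true * x + rng.normal(0, sigma, n)

# The planned analysis: ordinary least squares.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
se_slope = np.sqrt(resid @ resid / (n - 2) / (n * x.var()))
print(f"true slope {b_true}, estimate {coef[1]:.2f}, standard error {se_slope:.2f}")

If a simulation like this shows that the planned design can’t recover the assumed parameters, better to find that out before collecting the data.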

I think it’s worth further exploring this distinction between statistical generative model and scientific theoretical model, but let’s set this aside for now.

There is, of course, another reason that’s given for preregistration, which is to avoid forking paths and thus allow p-values to have their nominal interpretations. That’s all fine but it’s not so important to me, as I’m not usually computing p-values in the first place. I like the analogy between preregistration, random sampling, and controlled experimentation.

To put it another way, preregistration is fine but it doesn’t solve your problem if studies are sloppy, variation is high, and effects are small. That said, fake-data simulation can be helpful in (a) making a study less sloppy, and (b) giving us a sense of whether variation is high compared to effect sizes. What’s needed here, though, is what might be called quantitative preregistration: the point is not to lay down a marker about prior belief in the direction of an effect, but to make quantitative assumptions about effect sizes and variation.
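Here is a rough sketch of what such a quantitative preregistration could look like, in the spirit of design analysis with Type S and Type M errors; the numbers are hypothetical:

import numpy as np

rng = np.random.default_rng(2)

# Quantitative assumptions, stated in advance (hypothetical numbers):
effect = 0.1   # assumed true effect size
se = 0.25      # standard error implied by the planned design

# Simulate the sampling distribution of the estimate under these assumptions.
estimates = rng.normal(effect, se, 100_000)
significant = np.abs(estimates) > 1.96 * se

power = significant.mean()
type_s = np.mean(estimates[significant] < 0)                      # wrong-sign rate among significant results
exaggeration = np.mean(np.abs(estimates[significant])) / effect   # typical overestimation factor (Type M)
print(f"power: {power:.2f}, Type S error rate: {type_s:.2f}, exaggeration ratio: {exaggeration:.1f}")

With a small assumed effect and a noisy design, a calculation like this tells you up front that a “statistically significant” estimate would typically be a large overestimate and would have a nontrivial chance of being in the wrong direction, which is exactly the sort of thing a directional guess alone won’t reveal.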

Comparing psychology to other sciences

Unfortunately, in recent years some political scientists have been following the lead of psychology and have been making grand claims from small experiments plus zero theory (even theory that would appear to contradict the experimental data).

On the other hand, we draw all sorts of generalizations from, say, 16 presidential elections, and it’s not like we have a lot of theory there. I mean, sure, voters react to economic performance. But that’s not quite a theory, right?

I asked Borsboom what he thought, and he wrote:

On the other hand, the culture of psychology has changed for the better and has done so rather quickly (especially given the general inertia of culture). People are really a lot more open about everything and I would say the field is in a much better state now than say ten years ago.

Yet even after all those years it strikes me that psychology’s reaction to the whole crisis has basically been the same as it’s always been: gather more data. Of course, data are now gathered in massive multicenter open reproducible settings and analyzed with Bayesian statistics, model averaging, and multiverses. But data is data is data; it doesn’t magically turn into good scientific theory if you just gather enough of it.

I sometimes think that psychology should just stop gathering data for a year or two. Just organize the data we have gathered over the past century and think deeply about how to explain the reliable patterns we have observed in these data. Or perhaps 10% of the scientists could do that. Rather than 100% of the people chasing new data 100% of the time.

He also points to this article by Paul Smaldino, Better methods can’t make up for mediocre theory.

34 thoughts on “The value (or lack of value) of preregistration in the absence of scientific theory”

  1. Borsboom is a psychometrician, right? I wonder why he apparently doesn’t think psychometrics has a strong theoretical foundation. I suspect it’s because of the idea that a strong theory allows you to intervene in a system and accurately predict the outcomes. If I’m remembering correctly, Borsboom has a paper arguing for a definition of validity based on this same idea (i.e., a measure is valid just insofar as it enables causal intervention and manipulation).

    Psychometrics is one of the more useful areas in psychology, in that it allows us to (kinda-sorta) predict some outcomes based on observed/measured characteristics of people, but it’s not clear to me that it allows for intervention and associated predictions. You can’t really intervene and manipulate people’s psychometric characteristics, and you would need psychometric-external or psychometric-adjacent (i.e., non-psychometric) theory to make predictions about how interventions might interact with these characteristics, right?

    • Noah writes: “Psychometrics is one of the more useful areas in psychology, in that it allows us to (kinda-sorta) predict some outcomes based on observed/measured characteristics of people, but it’s not clear to me that it allows for intervention and associated predictions.”

      How can this be true? How can you have prediction without intervention? Are you using “prediction” as a synonym of “correlation”? I think of prediction as “when I do x, y happens.” Doesn’t that entail intervention? What psychometric predictions are there that don’t involve interventions?

      • Anon:

        Predicting the results of an intervention is a special case of prediction. If I can predict next year’s weather patterns, for example, that’s valuable, even if I can’t intervene and change the weather.

  2. I agree with what Andrew says about pre-registration. Pre-registration drives me crazy. As a journal editor, I don’t look at pre-registration documents and don’t care much about what someone “predicted”. Part of my concern is philosophical and is summarized in this blog post: http://judgmentmisguided.blogspot.com/2018/05/prediction-accommodation-and-pre.html.

    My other concern is that pre-registration tends to inhibit authors from looking at their data. Of course it doesn’t prohibit that, but it does inhibit it. I see this happening.

    • What do you suppose the mechanism is by which pre-registration discourages authors from doing exploratory analyses? That it causes them to undervalue exploratory results? That they fear others will value them less?

      Seems like in either case, they would be strongly incentivized to use exploratory results from existing data to form pre-registerable hypotheses.

      Or are they afraid they will find some pattern in the data that contradicts their pre-registered findings? If so, not pre-registering wouldn’t prevent getting contradictory results, it would just allow them to omit one set of results from the article without criticism. In that case, the problem isn’t the influence of pre-registration, it’s the unwillingness to fully report results.

      That’s all I can think of. Other ideas?

    • Jon,
      I read your essay but it seems to be tangential to the topic you are tackling. You wrote:

      “Should your belief in the existence of the amplification effect depend on whether I predicted it or not (all else being equal)?”

      Yes. I don’t believe your finding yet because it fell out of an experiment that could not possibly have been optimized to prove or disprove it. You were expecting something else and presumably built your model based upon what you expected. If you set up an experiment specifically to prove or disprove amplification, would it look exactly like the experiment in which you found it?

      When you make a specific prediction about the outcome of an experiment, a reviewer or colleague can parse your work to see if you really challenged your prediction, or whether you tweaked the methodology to give yourself a better chance of success.

      This goes back to the old hypothesis/theory meme, which I believe has died a quiet death. Your “amplification” becomes a hypothesis, and now you set up an entirely new experiment focused on that. In this experiment, you make reasonable attempts to disprove the amplification, controlling more factors that directly relate to it. Clear this hurdle and you have yourself a theory.

    • If pre-registration inhibits people from doing exploratory analyses, more fool them. We explore regardless, but we clearly separate the planned vs. exploratory analyses. Frankly I am surprised by what the debate is all about for planned experiments. My suspicion is that a lot of the backlash is the contrarian spirit of academics kicking in (“whatever you say, I’ll say the opposite”).

      Here is an example of a pre-registered + exploratory analysis (the latter is where most of the energy was spent). See: https://psyarxiv.com/2atrh/

      As a journal editor, you are of course free to do whatever you like. And maybe as an editor you are not expected to have the time to take a close look; that’s the job of the reviewers. As a researcher, if I’m really into a topic, and someone publishes a pre-registered study, the mere fact of the time-stamped pre-reg adds a lot of value to the work in my mind.

  3. I agree with Noah above that psychometrics is one of the most useful and successful domains in psychology, probably the closest you can get to what Borsboom calls “psychological engineering”. But I’d say that to the extent that it has strong theoretical foundations, the theory is about how latent factors manifest in measurement, that is, in theories of how propensities, abilities, etc. are translated into test scores, response times, etc. And of course, that enables you to do the inverse trick of inferring the structure of those latent propensities, abilities, etc. from the measurements you have.

    But I think Borsboom is (in my view, correctly) thinking more about the fact that most of psychology makes little contact with theories of those underlying propensities, abilities, etc. And to the extent that we have such theories from cognitive and developmental psychology (e.g., how experience in a domain translates to expertise in that domain) they are often limited to fairly obscure experimental setups, so they are limited in their ability to make predictions in less constrained but more “useful” settings.

    That said, physics still can’t give an easy answer to where a baseball will land in a real baseball game, largely because of incomplete information about all the relevant factors (quirks on the surface of the bat, uneven mass distribution in the ball, air turbulence, etc.) that we would control in an experiment.

    Anyway, while I agree that statistics and theory are not de jure anti-correlated, I think Borsboom is pointing to the de facto state of many fields in psychology where statistically-derived “effects” or “factors” have indeed taken the place of theory. Merely labeling something as an “effect” or “factor” is treated as an explanation.

    • “while I agree that statistics and theory are not de jure anti-correlated, I think Borsboom is pointing to the de facto state of many fields in psychology where statistically-derived “effects” or “factors” have indeed taken the place of theory. Merely labeling something as an “effect” or “factor” is treated as an explanation.”

      +1

  4. How interesting to see two of my favorite methods people engage in a discussion about the heart of psychological research (my field), albeit asynchronously, with a time lag of 6 years (“First he said…then 6 years later the other one said…”). Keep the good discussions coming!

    I clearly agree with Borsboom on the trade-off between stats and theory in personality/social psychology, though (which should NOT be interpreted as a prejudice against stats as mindless, number-crunching machinery, only against those who use it in this way).

    When I was a young assistant professor at the University of Michigan in the early 2000s, I was a member of two tribes: the social/personality area and the biopsychology area, with the latter being strongly shaped by animal-model folks who had very clear-cut theories about how manipulations of certain parts and states of the brain would alter behavioral outcomes (Kent Berridge, Terri Lee, Jill Becker, Terry Robinson and others). The people in my field of origin (personality/social psychology) were big on weaving nomological nets with lots of sophisticated structural-equation models trying to see meaningful patterns in the cornucopia of substantial correlations that come with a heavy reliance on self-report measures. The people in biopsychology, on the other hand, rarely used more than a simple ANOVA or t-test.

    Back then, I realized that a major reason for this cultural difference was the fact that biopsychologists had clear-cut theories and derived from those rigorous experimental designs that helped to quickly separate the wheat from the chaff when it came to the validity of some predictions. In personality/social psychology — and particularly in the field of personality research — this is not the case. Although personality psychologists sometimes engage in experimental research of the personality trait x situational opportunity type, they typically don’t see how their research subject — personality traits or dispositions — could itself be amenable to experimental interventions. (It can, as David McClelland, Phebe Cramer, and others have convincingly demonstrated — but that’s another story.) Hence they try to ascertain the validity of their constructs with heavy statistical lifting after the data have been collected, not by clever research design before anything is measured.

    Hence I came to the conclusion back then that in personality/social psychology, “the level of sophistication of statistical analyses is typically inversely related to the substance and sophistication of the research they are applied to” (quoted from an email I sent on Oct 30 2002). It’s interesting to observe that apparently I wasn’t the only one to arrive at that conclusion.

    • Oliver C. Schultheiss:

      I think it is noteworthy that Paul Barrett, who studied under Hans Eysenck — another experimentalist — has come to similar conclusions about psychometrics and measurement in the area of personality psychology.

        • Curious:

          Certainly not “experimentalism”, if you mean by that a blind groping for whatever factor shows a significant difference. But certainly experimental tests of principled theories (however stringent or loose) and the potential of falsification that comes with that. In motivation science, Clark Hull’s idea that motivation is about the termination of an intrinsically unpleasant drive state was abandoned after several different experiments clearly showed that getting to a reward does not shut off neurons involved in motivational processes (quite the opposite) and that animals will work even for partial (but not full) rewards that were supposed to be inadequate to terminate the drive state. This important insight led to the incentive-motivation models that are the foundation of modern approaches to devising and validating measures of motivational dispositions in humans. I wouldn’t characterize this as experimentalism (i.e., an instance of Feynman’s cargo-cult science) but as a variant of normal science and scientific progress.

        • Oliver C. Schultheiss:

          I take your point about the distinction between approaches to experiments and interpretations within a given discipline. My concern is with the unwarranted and overly broad generalizations that often occur once the findings leave the laboratory and enter the private sector.

          Also, I have been out of the field for quite some time and haven’t kept up with the newer literature on dispositional motivation — Can you recommend a paper or study that you believe captures the current zeitgeist?

  5. ….I should also add that there ARE pockets of psychology — outside of psychometrics — where predictions made from clear-cut theories are more than just a fuzzy guess. Computational modeling is a tool frequently used by cognitive psychologists. It is typically used to model, describe, and predict “local” processes and phenomena (such as allocation or shielding of attention, visual processing, etcetera) with numerical precision. The modeled predictions are then typically compared to data from actual experiments and their fit can be evaluated. That’s a pretty sophisticated use of both theory and statistics, and we clearly need more of this in psychology, not just in some very specific subfields that have cultivated this approach.

  6. I would humbly argue that there are ‘psychological engineers’; they’re human factors and user experience researchers. The government title for them is actually ‘engineering research psychologist’. They apply theories of how the mind works to make more usable interfaces, smoother procedures/processes, etc.

      • Are they successful in their goals?
        For a fun read, see The Design of Everyday Things (read the book, not the wiki).

        One of the many reasons planes, even Boeing planes, do not crash more often is engineering research psychology or ergonomics.

        Another good read is Gerry Wilde’s Target risk: dealing with the danger of death, disease and damage in everyday decisions (ISBN 978-0-9699124-0-8), also available as a PDF: https://is.muni.cz/el/1423/podzim2016/PSY540/um/64998189/64998284/targetrisk3_1.pdf

        I think that Borsboom ignores the entire fields of learning theory and applied behavioural analysis, which the last time I looked at them had strong theory and tools.

        As Oliver C. Schultheiss points out above:

        biopsychologists had clear-cut theories and derived from those rigorous experimental designs that helped to quickly separate the wheat from the chaff when it came to the validity of some predictions.

        Come to think of it, the crisis in reproducibility in psychology seems something of a misnomer. It is a crisis in social psychology and some adjacent areas—perhaps some areas of clinical or educational psychology among others?

        Social psych is “sexy” and gets attention even when it is completely flaky[1]. I do not think I have seen much hoopla or press releases about “massed” vs “distributed” practice or Fixed versus Variable reinforcement schedules.

        1. Some areas of Social Psychology are pretty well grounded and very useful though I do not think you will see the equivalent of Newton’s equations.

  7. Noah writes: “Psychometrics is one of the more useful areas in psychology, in that it allows us to (kinda-sorta) predict some outcomes based on observed/measured characteristics of people, but it’s not clear to me that it allows for intervention and associated predictions.”

    How can this be true? How can you have prediction without intervention? Are you using “prediction” as a synonym of “correlation”? I think of prediction as “when I do x, y happens.” Doesn’t that entail intervention? What psychometric predictions are there that don’t involve interventions?

    • In psychometrics, it would be predictions of outcomes based on measured/estimated characteristics of people. For example, you might predict second language proficiency based on scores from a battery of cognitive tests. In this case, you’re not intervening or doing anything to cause one person’s test score(s) to be lower or higher, but you can still make predictions about how well they will do learning a second language.

      • I would not describe prediction (at least in the statistical sense) as “when I do x, y happens,” but rather, “Doing x increases the probability that y will happen.”
        I would also consider, “y is more likely to happen when x is the case than when x is not the case” as prediction.

        • In Carlos’ formulation, “I see x” can represent “I see such and such amount of this drug administered” and “y happens” can be “the patient being 5x as likely to get better happens”.

          But no matter how we operationalize “x” and “y” there is still an ontological difference between observing “x” and doing “x” as the antecedent. No matter what counterfactual reasoning we apply to an observational relationship it is (in my humble opinion) very different than an experimental intervention.

        • Just to be clear: I completely agree that those things are different, I just meant that we can “predict” different kinds of things (with or without intervention).

  8. > That’s how they put a man on the moon and that’s how they make the computer screen you’re now looking at. It’s all done in theory before it’s done for real

    Well I don’t think that’s true. Lots of things are done for real before they’re done in theory. For instance, sharp rocks and fire are like the basic caveman tools, but I’m a 21st century mechanical engineer and wouldn’t be able to explain them much beyond “that’s sharp” and “that’s releasing chemical energy”. It’s more than nothing but not great.

    And the idea of the theory seems a bit exaggerated too. Like, good luck working out on a piece of paper why when I press on my keyboard stuff appears on the screen.

    I think the differences between these fields are not the differences he’s pointing out. We can build complex things like monitors because that’s a problem we can break down into well-contained pieces.

    • Along the same lines, so much of our technology uses natural constants or other information that could only come from experimentation. Einstein only dropped his theory that the universe is static after it was empirically shown that the universe is expanding. The theory of particle-wave duality came from explaining the results of experiments–it’s too crazy an idea to be a totally a priori theory.

  9. I think your example of why preregistration doesn’t matter illustrates why it matters. Had those authors not preregistered their hypothesis, do we really think they would’ve so readily reported that they’d reversed their hypothesis?

    And even if (as I believe) p-values can be ignored, it is very useful to know the extent to which the researcher was doing exploratory vs confirmatory analysis. An editor could use that information to assess whether the author’s stated degree of confidence in a study’s results and implications is warranted; if it isn’t, that’s something editors and reviewers should at least push back against.

    I fully agree that the benefits of preregistration get over-sold sometimes as a silver bullet, but we shouldn’t respond by under-selling the actual benefits. I mean, we don’t want *less* information about how a study was planned, right? And telling people prior to the study keeps us from falling prey to our faulty memories and unacknowledged biases.

    • Personally, I think the pre-registration goes beyond “oversold” and rises to the level of “useless as the proverbial on a bull”. I have a hard time caring which specious theory is the “pre-registered” and which the “reversal” in many of these situations.

  10. > I think the value of preregistration, or fake-data simulation, is that it clarifies our research plans and it clarifies our model of the world.

    Pre-registration is even better: it clarifies your research plans and your model *to* the world.

  11. We cannot arrive at any coordinating ideas in any science without the ability to make observations free of theoretical support. So the question becomes: what constitutes a legitimate observation? There’s only one way to find out which observations are legitimate: test them and retest them.

    If there’s one thing that’s clear at this point, it’s that NHST isn’t normally a useful tool for confirming the legitimacy of an observation. But the only way to show that is to keep testing NHST-related claims with replication. When people start to fear publishing NHST-related claims for worry that they’ll be discredited, that will be the marker of progress. Who was it that said science proceeds one funeral at a time? :)
