This came up in comments to Jessica’s recent post.
I like preregistration. It’s not something I used to do, and I still don’t always do it. I’ve worked on hundreds of research projects, and only a few of them have had any preregistration at all.
That said, I think preregistration has value, and I’m doing it more and more.
The reason I like preregistration has nothing at all to do with hypothesis tests or p-values or p-hacking or questionable research practices or anything like that.
I like preregistration for two reasons.
1. For me, preregistration implies constructing a hypothetical world–not a “null hypothesis” of no effect, but a possible world corresponding to what I’m actually aiming to study–and then simulating fake data and proposing and trying out analysis methods on those simulated data. I find this sort of commitment–the effort of laying out a complete generative model for the process–to be helpful. It makes me think about effect sizes and their variation, among other things, and it lets me see whether the proposed analysis can recover the parameters of interest from the simulated data. This is what’s often called power analysis, although I prefer the more general term “design analysis.”
2. When other people preregister, that can be useful because then we can see discrepancies between the original plan and what actually got reported. Two examples are here and here–in both those cases, discrepancies between the preregistration and the final paper gave us doubts about the published claims. When these changes happen, it is not a moral failure on anyone’s part–we can learn from data!–it’s just relevant for understanding the theories being promulgated in these papers.
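As a minimal sketch of the fake-data simulation described in reason 1, here is a small Python example. Everything specific in it is an assumption made for illustration and does not come from any real study: a two-group comparison, normally distributed outcomes, a hypothesized effect of 0.1 with standard deviation 1, and a "more than 2 standard errors from zero" rule for calling a result clear. The function names `simulate_once` and `design_analysis` are hypothetical.

```python
import random
import statistics


def simulate_once(rng, true_effect, sd, n_per_group):
    """Simulate one two-group study; return (estimate, standard_error)."""
    # Assumed generative model: normal outcomes, same sd in both groups.
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    se = ((statistics.variance(control) + statistics.variance(treated))
          / n_per_group) ** 0.5
    return estimate, se


def design_analysis(true_effect, sd, n_per_group, n_sims=2000, seed=1):
    """Repeat the planned analysis on fake data and summarize how it behaves."""
    if true_effect == 0:
        raise ValueError("assume a nonzero effect; this is not a null model")
    rng = random.Random(seed)
    sims = [simulate_once(rng, true_effect, sd, n_per_group)
            for _ in range(n_sims)]
    # Illustrative rule: a result is "clear" if it is > 2 standard errors from 0.
    clear = [est for est, se in sims if abs(est) > 2 * se]
    result = {"power": len(clear) / n_sims,
              "wrong_sign": None,
              "exaggeration": None}
    if clear:
        # Share of clear results whose sign is opposite to the assumed effect.
        result["wrong_sign"] = sum(est * true_effect < 0 for est in clear) / len(clear)
        # Average size of clear estimates relative to the assumed effect.
        result["exaggeration"] = (statistics.fmean([abs(est) for est in clear])
                                  / abs(true_effect))
    return result


if __name__ == "__main__":
    # A small, noisy design versus a larger design with a bigger assumed effect.
    print(design_analysis(true_effect=0.1, sd=1.0, n_per_group=50))
    print(design_analysis(true_effect=0.5, sd=1.0, n_per_group=200))
```

Under these assumptions, the estimates that clear the threshold in the small design tend to overstate the hypothesized effect, which is the kind of thing that is useful to learn before gathering any real data. Changing the assumed effect size, its variation, or the sample size and rerunning is the "trying out analysis methods" step.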
I agree that preregistration is not necessary for good science. I still think it can be a useful tool, both in my own workflow in developing scientific hypotheses and gathering data to understand them, and in communication of workflow to others.
Preregistration has a valuable indirect function of making it more difficult to do bad science. It does not directly turn bad science into good science. That doesn’t make preregistration a bad idea (recently I’ve been preregistering studies and, more generally, simulating data before gathering any data); we should just be aware that this sort of procedural step can only be one small part of the story. Ultimately, science is about the substance of science, not just about the scientific method.
There’s something interesting here, though, that links the two perspectives. If you do things right, your preregistration will involve the substance of what you’re studying and will not merely be a procedural step, a form of paperwork that exists to validate the p-values that your study will produce. Rather, doing this preregistration will require simulating fake data, which in turn will require hypothesizing a full model of the underlying process.
I recognize that what I just described is not the usual thing that is meant by “preregistration,” which is more along the lines of: “We will perform this comparison and use a 2-sided test,” etc. But it could be! I think this is a useful connection.
P.S. As discussed in comments, a more precise term for what I’m recommending is fake-data simulation or simulated-data experimentation. I use the term “preregistration” above in order to connect with the many people in the science-reform movement who use that term.
Preregistration is like trying to bring down the crime rate by locking people in their houses. It wasn’t needed before, but it works, I suppose.
Anon:
As discussed in the above post, the reason I like preregistration has nothing to do with the crime rate or fraud or anything like that. I like preregistration for science for the same reason that architects like to build scale models before designing the building. Preregistration helps me form my expectations, allows me to make more reasonable design choices, and makes it easier for me to be surprised in useful ways.
Maybe “pre-thinking” is a better word.
Anon:
The term I prefer is fake-data simulation or simulated-data experimentation.
I used “preregistration” in the above post in order to connect with the many people in the science-reform movement who use that term.
Curious what your thoughts are on approaches beyond scientific method – eg strong inference, etc. https://www.science.org/doi/10.1126/science.146.3642.347
Vineet:
It’s interesting to read that article from 1964. One funny thing is that one of the examples he gives of “fields of science [that] are moving forward very much faster than others” is . . . high-energy physics! I guess that was the case sixty years ago, not so much in recent decades, though.
Overall, I think I’m in agreement with the author’s take on good science or what he calls “strong inference”; I just think he’s a bit glib in thinking that all you need is “their systematic application.” It’s a king-of-the-world attitude that physicists had after WW2 and biologists had after the discovery of the DNA structure (and this guy was a professor of biophysics!) and that economists had in the early 2000s (as discussed here, for example). When you’re the king of the world it’s easy to imagine you’re doing everything right, and everything could work just fine for everybody else if they just followed your simple life lessons.
Dating myself, I suppose, but Platt was required reading in both experimental design and science philosophy for me, and left a lasting impression. The strange takeaway for me now, in retrospect, is related to why I read this blog. All my training was at Ag schools, where Fisher ruled and frankly, NHST seems to work pretty well when a rep is a pot trial. Funny how testing complex forestry/ecosystem level field experiments tends to have low signal and high noise, and (true) replication is a real challenge.
Thanks for the response! Definitely not as “slam dunk” as Platt conveys, but I remember finding that paper late in my PhD and feeling so validated that “there had to be other ways”. I think what you’re outlining in the main blog post resonates with me – except “preregistration” in my work was more (a) explicit mathematical models of what the clinicians were expecting and (b) git commits to github that couldn’t be modified. Akin to engineers doing explicit blueprints, then tracking modifications – imho it’s aligned with your definition of preregistration, though quite different than that of the clinical/science reform folks around me.
That “strong inference” paper had a detrimental effect on community ecology in the 1980s, when it was discovered and held up as a model of scientific method. It led to a vogue for a particularly inapt form of null hypothesis significance testing. The conceptual null hypothesis adopted–that species are randomly distributed across environments and geographic space–was known not to be true, and in the absence of any model of what that hypothesis might actually mean, the “null hypothesis” was implemented by permutation of observed species distributions, which, of course, were subject to all sorts of non-random effects which could not be made to go away by permutation. This is not to say that the practices of community ecologists pre-1980 couldn’t be critiqued and strengthened, but that Platt’s influence was not helpful.
+1. When I was in undergrad and starting grad, null models were still generating lots of attention and debate. I went into ecosystem ecology more than community so have lost touch with that literature but I recall finding it a frustrating framework…
Platt should have provided more examples and elaboration, that’s all. People misinterpreted the paper.
Huge progress has been made in making models of protein folding by “his” method.
The elephant in the room is how you measure success/progress.
He leaves that out of his paper. Eg, there is no discussion of predictive skill or performing impressive feats.
So the end result is stringing together a bunch of NHST conclusions, which is not reliable in the least. In fact it is one of the most efficient misinformation-generating mechanisms known to humankind.
Here is a discussion from 2002 about where his approach leads:
https://pubmed.ncbi.nlm.nih.gov/12242150/
I’ll add that personally all my efforts to integrate molecular bio results into some kind of dynamic, quantitative model ended up requiring impossibilities. Eg, the entire surface of a cell devoted to one receptor, or volume to a single enzyme.
There is no sanity-checking required for the static and qualitative “signalling pathways” everyone thinks of, so wrong ideas can easily persist for decades.
So my personal view of molecular bio is rather dim.
There is never any reason for the surface of a cell to be devoted to one receptor; see e.g. here and the work by Berg that I note.
I didn’t think there was.
Anyway, in that case (as far as I could figure) what actually happened was that neurons require laminin to grow in a dish. And the substance stopping the growth formed a coat over that laminin, thus blocking it. There was such a perfect MW vs inhibition curve that it must be some non-specific artifact. This empirically fit curve also predicted future results.
So the putative receptors had nothing to do with those results. In fact the “receptors” seemed to be for generic “intrinsically disordered proteins” or maybe aggregates like amyloids rather than anything specific anyway.
Point is that if you try to integrate all these supposed molecular bio “facts” into a quantitative model the accepted narrative quickly falls apart.
Raghu –
The link to the video of the presentations didn’t work for me.
Of course, electrical engineering design by muntzing is somewhat like the biologist approach to fixing a radio described in that paper. See https://en.wikipedia.org/wiki/Muntzing and the references there.
Bob76
Oh my, my grandfather had a Muntz TV. It has been years since I have seen any reference to them. And yes, they really only worked in big cities.
Doesn’t seem like he tried to figure out what the components he removed actually did though.
We never had a Muntz TV, we just had some regular brand-name B&W model. At one point when my dad was laid off from GE, he thought he could make some money by repairing people’s TVs. One time he was working on someone’s color TV. I was really excited and asked if we could watch it. He said, no, it needs a new picture tube.
That pairs well with Miguel Hernan’s point that imagining an RCT helps to clarify what causal question is being asked even if the RCT isn’t feasible.
Michael:
Not to take anything away from Hernan here . . . just wanna say that the idea of imagining a controlled experiment, even when it is not possible, is a longstanding idea in statistics, much predating Hernan (or me)!
“For me, preregistration implies constructing a hypothetical world–not a “null hypothesis” of no effect, but a possible world corresponding to what I’m actually aiming to study–and then simulating fake data and proposing and trying out analysis methods on those simulated data.”
Sounds like what you’re really after is blind analysis, where data is obscured or ‘faked’ to prevent bias but still allow analysis (unlike preregistration where analysis is done when data doesn’t yet exist). See for example: https://www.nature.com/articles/526187a
But I think I’d push you more to ask for a more fundamental/basic reason for why you think working on simulated data is “helpful”. Same point applies to your second reason for liking preregistration: “discrepancies between the preregistration and the final paper gave us doubts about the published claims”
Why do discrepancies give reason to doubt published claims? Why is blinding yourself to the real data helpful? After specifying these higher-order motivations, I wonder if you’re actually in disagreement with preregistration proponents who are motivated by a distaste for p-hacking, contrary to what seems to be suggested.
Milktea:
No, I don’t want to do “blind analysis.” I want to construct a hypothetical world and simulate fake data before collecting any real data. When I collect the real data, sure, I’ll first do the analysis that I’d done in the preregistration, but then I’ll continue to do lots of other things. As I’ve written elsewhere, preregistration is a floor, not a ceiling.
The reason I’m doing fake-data simulation is not to reduce bias; it’s to more clearly explore my model of the world before going out and collecting data, and it’s a reaction to past experiences where I’ve just gone and collected data and then not been sure what to do with the data I’d gathered.
To answer your question about why discrepancies give reason to doubt published claims, I refer you to the examples here and here.