Science reform can get so personal

This is Jessica. Lately I’ve been thinking a lot about philosophy of science, motivated by both a longtime interest in methodological reform in the social sciences and a more recent interest in ethics problems and proposed reforms in computer science. The observation I want to share is not intended to support any particular stance, but just to note how personal these topics can be, and how decisions that at first glance seem trivial can bring up questions about who you think you are as a scientist and how you think empirical research works.

I’ll take a couple of examples related to methods reform. One, related to Andrew’s post the other day about how statisticians choose their methods (sometimes by convention or convenience), is doing Bayesian statistical analysis. Some of the research I do involves running controlled experiments, and I’ve always gravitated toward Bayesian philosophy despite being taught statistics by Frequentists. So shortly after becoming faculty I made a more concerted effort to use it. Since then, my lab has tended to default to it (and it helps that my close collaborator Matt Kay has done a lot more of it than I have, so we can look to him for advice when needed).

But sometimes we’re analyzing data from an experiment that uses a relatively simple design, say between-subjects, where we don’t really have useful prior information going into it. Specifying a Bayesian model seems nice for reasons like interval interpretation, but otherwise it isn’t very consequential. We end up with Bayesian models that essentially produce the same thing you’d get with the Frequentist version. Which is fine, of course, but in cases like this it strikes me as maybe more honest to use Frequentist stats, since that’s better understood in my field. That way we’re not running the risk of implying there’s some big added value to being Bayesian in this case, to others or even to ourselves.
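To make the “essentially the same thing” point concrete, here’s a minimal sketch with made-up simulated data (nothing from our actual studies; the group sizes, effect, and prior are all placeholders): a simple two-group comparison analyzed with a frequentist t-based interval and with a conjugate Bayesian update under a weakly informative prior.

```python
# Toy two-group comparison: frequentist CI vs. Bayesian interval with a weak prior.
# All numbers here are placeholders, not values from any real experiment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 50)   # simulated "control" responses
b = rng.normal(0.3, 1.0, 50)   # simulated "treatment" responses

# Frequentist: difference in means with a 95% t-based confidence interval
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, len(a) + len(b) - 2) * se

# Bayesian: normal approximation to the likelihood of the difference,
# conjugate update under a weakly informative Normal(0, 2^2) prior
prior_mu, prior_sd = 0.0, 2.0
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mu = post_var * (prior_mu / prior_sd**2 + diff / se**2)
cred = post_mu + np.array([-1, 1]) * 1.96 * np.sqrt(post_var)

print(f"frequentist estimate {diff:.3f}, 95% CI       [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"bayesian estimate    {post_mu:.3f}, 95% interval [{cred[0]:.3f}, {cred[1]:.3f}]")
```

With a prior this diffuse relative to the data, the two intervals come out nearly identical, which is exactly the situation I’m describing.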

I guess the premise here is that if you don’t expect everyone to bother thinking about whether your model choice was well motivated, but you do expect people to pay attention to who is using what methods, then you may be signaling things through your choices that influence how other people think about what good science means. I dislike this signaling or heuristic aspect, because it’s counter to properties I associate more strongly with good science, like being honest about why one is doing something and being skeptical of any method presented as a panacea.

Another example is preregistration. For the last five or so years I’ve been preregistering most of the experiments I do. I think I may even have had the first preregistered experiment at some visualization venues. I never really questioned doing this too much, since preregistration is associated with transparency and I think transparency is good, especially if it prevents authors from exploiting degrees of freedom in an analysis until they get the result they want. Also, most of my papers have lead authors who are Ph.D. students, as is common in computer science, and preregistration has been very useful as a forcing function, for them and for me, for making sure we think about, and agree ahead of time on, the modeling approach and exactly what comparisons we want to make.

But sometimes I find myself thinking about what sorts of values preregistration implies, and feel a bit conflicted about it, again because of the conflict between what might get signaled and my own values when it comes to science.

For example, we use a pattern on these projects where we design an experiment, collect pilot data, then use what we see to simulate fake data to figure out how much data we should collect to learn from the comparisons and models we think we’ll use. Preregistration is easy in that it simply involves writing down everything we’ve planned. And of course we can deviate from it if necessary.
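Here’s a stripped-down sketch of that fake-data step (every number below, the effect, the SD, the target interval width, the candidate sample sizes, is a placeholder rather than something from an actual pilot): simulate the planned comparison at a few sample sizes and check how often the analysis would be informative.

```python
# Toy simulation-based design analysis for a two-group between-subjects comparison.
# Placeholder numbers stand in for what pilot data would suggest in the real workflow.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
effect, sd = 0.25, 1.0      # placeholder pilot-informed effect and standard deviation
target_width = 0.5          # widest 95% CI we would still consider informative

def simulate_once(n_per_group):
    a = rng.normal(0.0, sd, n_per_group)
    b = rng.normal(effect, sd, n_per_group)
    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group)
    half = stats.t.ppf(0.975, 2 * n_per_group - 2) * se
    # does the 95% CI exclude zero, and is it narrower than our target width?
    return (diff - half > 0), (2 * half < target_width)

for n in [50, 100, 200, 400]:
    sims = [simulate_once(n) for _ in range(2000)]
    excludes_zero = np.mean([s[0] for s in sims])
    narrow_enough = np.mean([s[1] for s in sims])
    print(f"n={n:4d}  P(CI excludes 0)={excludes_zero:.2f}  P(width < {target_width})={narrow_enough:.2f}")
```

In the real workflow the design and model are usually richer than a two-group mean difference, but the logic is the same: plug in what the pilot suggests, simulate, and see which sample sizes make the planned comparisons worth running.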

If we didn’t preregister, we might delude ourselves into thinking we are being honest while actually tweaking things in our favor. So there’s an implied value about having a forcing function to keep us honest, as well as about transparency being correlated with good science.

But post-tenure especially, I find myself increasingly distrustful of experimental work in my field. I would like to think that these days I have no reason not to be honest, and that I can trust my judgment about whether a result is valid not to delude me. So then when I preregister, I feel like I’m admitting to myself I need external forces to keep me from being devious and I can’t make ethical decisions without needing to be policed. There’s something unpleasant about signaling that to oneself, whether or not it’s true, or about believing it of human nature in general.

I could instead tell myself that I can make decisions without preregistration, and undoubtedly many others can too, but that I’m doing it to signal to others that preregistration is important because I believe there is an overall benefit for the field if we do it. Treating it as signaling seems realistic, given that preregistering doesn’t actually hold you to anything; it’s a gesture toward transparency.

But if preregistration is about signaling the value of transparency, then maybe I should consider other things that it can signal. For instance, related to the premise above, I’m pretty certain that some people in my field who haven’t followed methods reform closely but recognize certain terms see the words preregistration or Bayesian model and think, ‘oh, that’s a good sign’ when reviewing or deciding who to pay attention to. Which is probably smart in the sense that preregistering and using Bayesian models may be correlated with paying closer attention to possible threats to the validity of the empirical claims you are presenting. But this is just another heuristic, when heuristics have in many ways been part of what’s led us so astray in empirical science.

On a personal level, I think skepticism about any easy solution to fixing science is part of the solution, and I can’t help but care about how I am or am not contributing to better science. So I wish there were a way to signal that I choose methods because I think they can help, not because I think they are necessary in any way. Maybe if what I value is honesty and transparency about where I stand with science reform, I should also be honest about why I’m preregistering or using Bayesian stats. I could say in the paper, “We preregistered to impart a sense of honesty.” Or, “We preregistered because, while we can’t say much about how important it is for good science, we think it is useful to get more people taking transparency seriously.” When we use Bayesian methods but don’t think they add anything special, we can say that, or report how close the results are to estimates from the equivalent non-Bayesian model. Nobody really seems to be doing much of this, but maybe it’s not such a crazy idea. It requires considering your stance on the methods you use, which I like, and folds some reflection on science reform into papers that aren’t really about that.

None of this is meant to bash preregistration or Bayesian stats of course. I’ve learned a lot by doing both and have undoubtedly improved my process. My bigger point is that science reform is complex and can provoke personal reflection on values. I think this is a good thing, even if it can seem hard sometimes to be thoughtful and honest about how few answers we have within the usual constraints.

44 thoughts on “Science reform can get so personal”

  1. Jessica, let me first say that I appreciate most of your postings a lot. Many of them deal with topics that I also find very interesting.

    Regarding Bayesian statistics: I’m personally not doing Bayesian statistics a lot, and despite not being against it, I think a lot of claims are made about a potential superiority of Bayesian statistics that do not hold water, for example this one: “a Bayesian model seems nice for reasons like interval interpretation”. I tend to say that in Bayesian statistics the posterior inherits meaning from the prior, meaning that you only have a “nicely interpretable” posterior if your prior makes good sense, otherwise not. If you don’t have prior information that can be translated into a prior in a well justified way, I don’t think anything is won by using a Bayesian approach at all. Realising that a strikingly large proportion of publications using Bayesian statistics comes with no or only very weak justification of the prior, I don’t think there’s any good reason for “signalling” that Bayesian statistics is the “thing to do” at all, obviously not denying the value of a good Bayesian analysis where prior information is indeed put to good use (I admire Andrew’s discussions of priors on this blog, so I’m not talking of anyone in the room of course;-).

    Regarding pre-registration: “So then when I preregister, I feel like I’m admitting to myself I need external forces to keep me from being devious and I can’t make ethical decisions without needing to be policed.” This is something I don’t quite get. I don’t think you’d worry about the existence of police because it implies that you cannot be trusted? If you value transparency, the thing about preregistration is that it is a transparent act itself, and that should be enough to motivate you doing it, shouldn’t it? It’s about documentation, not discipline, I’d say.

    • very nice position, Christian, which i agree with and share.

      i am considered a bayesian because it is the ‘school of thought’ to which my mentors belong.

      however what you’ve mentioned is key: bayesian statistics shouldn’t offer any advantage unless the prior is actually informative. in my case, it just so happened that not being conscious of its importance turned out to be a gift, and much of my “downstream” analysis worked out due to “chance” ( ;) ).

      at the risk of being excoriated due to my lack of “fundamental” calisthenics (being exiled from my field and all– hey it happens!): the no free lunch theorem, loosely speaking, would support the allegation that a bayesian method should perform no better than a frequentist analogue, unless there is something specific that should improve its performance (a prior).

      and i agree with your position on documentation. a publication is only as useful as the documentation of the process. sometimes it is hard to propose a procedure without being able to ‘explore’. if you are enslaved to preregistration, then this process may prevent useful insights that are gleaned through this exploration. the obvious risk, of course, is overfitting. however, if documented properly, and accepted after peer review, it is fair to expect the audience to assess the presence of that risk.

      nice post bud.

      • > bayesian statistics shouldn’t offer any advantage unless the prior is actually informative

        Frequentist confidence intervals can misbehave and do ugly things like covering only impossible values (say the interval contains only negative values for a parameter that can only be positive).

        Bayesian credible regions cannot contain impossible values by construction. One could argue this is one advantage.

        (Of course if the prior is wrong a Bayesian method can also give bad answers. And if the model is bad enough any statistical method is doomed and discarding all the data and picking numbers out of a hat may give more accurate answers.)
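        A toy version of the confidence-interval point, with made-up numbers: a parameter known to be nonnegative, one unlucky noisy measurement, the standard normal-theory CI, and a posterior that puts zero prior mass on impossible values.

        ```python
        # Toy illustration (hypothetical numbers): parameter theta >= 0, one noisy observation.
        import numpy as np
        from scipy import stats

        sigma = 1.0   # known measurement sd
        x = -2.1      # an unlucky observation, x ~ Normal(theta, sigma) with theta >= 0

        # Frequentist 95% CI: nothing stops it from lying entirely below zero
        ci = (x - 1.96 * sigma, x + 1.96 * sigma)

        # Bayesian: flat prior on [0, inf), normal likelihood, posterior by grid approximation
        grid = np.linspace(0, 10, 20001)
        post = stats.norm.pdf(x, loc=grid, scale=sigma)
        post /= post.sum()
        cdf = np.cumsum(post)
        cred = (grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)])

        print("95%% CI:       (%.2f, %.2f)" % ci)    # contains only impossible (negative) values
        print("95%% credible: (%.2f, %.2f)" % cred)  # respects theta >= 0 by construction
        ```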

        • > bayesian statistics shouldn’t offer any advantage unless the prior is actually informative

          Another way to say this is “Bayesian statistics always offer an advantage, very occasionally the advantage is quite small”

        • Daniel:

          Bayesian statistics doesn’t always offer an advantage. Sometimes the Bayesian inference can be worse than the non-Bayesian alternative. When this occurs, it’s from a problem with a model. But people fit models with problems all the time.

        • > Sometimes the Bayesian inference can be worse than the non-Bayesian alternative.

          I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.

          “Using a blueprint to build a house doesn’t always offer an advantage. Sometimes the blueprint-guided construction can be worse than building and raising walls randomly. For example when the blueprint does not include doors and windows.”

        • “I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.”

          That can’t happen unless you somehow restrict the space of “possible” Bayesian inferences, because if you include literally every possible Bayesian inference then that includes the maximally lucky case where the prior is concentrated completely on the best possible parameter value.

        • The example may include randomness in the true value of the parameter. One argument in favour of Bayesian probabilistic reasoning is that if the prior corresponds to that randomness the inference will be optimal.

          And maybe for any “non-Bayesian alternative” one can find a corresponding prior in a Bayesian setting so you have something equivalent but better?

          Or maybe not. But there would be no compelling reason not to be a Bayesian if the best arguments against the superiority of Bayesian inference are of the form “performing abdominal surgery with a scalpel can be worse than doing so using a hunting knife if you use the scalpel to sever the aorta”.

        • “I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.”
          That there exists a possible Bayesian inference that is better doesn’t mean that the practitioner will find it. Existence is a weak statement.

        • By the way, what does it even mean to say that Bayesian inference X is “better” than non-Bayesian inference Y? Frequentist probability statements have a different meaning than Bayesian ones (actually there are various valid interpretations for the Bayesian ones alone), so on what scale can you compare their “quality”?

        • > on what scale can you compare their “quality”?

          Predictive skill. Or really, predictive skill normalized to resources required. I use confidence intervals all the time, just because in many cases they give nearly the same answer anyway, much more quickly.

        • > on what scale can you compare their “quality”?

          To the extent that the meaning of frequentist inference includes anything about the actual world (and not just about an idealized sequence of comparable events) it can lead to some predictions/decisions/actions that can be compared with those from Bayesian inference. (A comparison for a single realization may still not be meaningful, of course. A broken clock also gives the hour perfectly if we check only at the “right” time.)

          If frequentist inference has no practical relevance there is no need for a quality comparison, really.

        • Carlos: That’s fair enough. However, unless we’re talking about situations in which the only interest is in predicting future observations that can reasonably be assumed to come from the same data generating process (regarding which I’m with Anoneuoid), such a comparison can be very complex, because decisions are not usually made based on the outcome of an isolated statistical analysis alone, and we have to ask also, what have we learnt from our analysis, what ideas may be inspired by this, how is it used by non-experts, etc. Even in situations in which prediction of future observations is of major interest, situations in which the future is in some respect different from the past on which we ran our analyses happen more often than not, so additional input and expertise is always required.

          Surely looking at isolated Bayesian and frequentist methods and saying “this inference is better than that” normally requires very strong idealisation and simplification, every aspect of which is open to criticism.

        • There’s also the fact that randomness plays a role in determining which method works best in any particular case. For almost every (reasonable) method, there are data sets on which the method works better than the alternatives and data sets on which it does worse. At least that has been my experience.

        • I think we can only compare Frequentist vs Bayesian analysis in this way, if we hold the data model constant. That is, both models agree on how the data comes about, and the only difference is between choice of prior. Otherwise, of course we’re always worse off when we do a Bayesian analysis on a terrible data model than if we do a Frequentist analysis on an acceptable data model….

          So assuming the data model is held constant,

          In practice, all likelihood-based analysis is Bayesian. The issue is that in practice all analysis uses IEEE double-precision floats, which can only represent values up to roughly ±1.8e308. This corresponds to the proper prior uniform(-a, a), with ±a the endpoints of that representable range. Sure, the floats aren’t uniform on that interval, but the roundoff rule makes each one represent a particular interval.

          In this sense the Bayesian model is virtually always better because no one chooses units of measure such that 1e298 is the answer to anything and that’s the only kind of case where “uniform on the floats” winds up winning out.

    • Christian said, “If you value transparency, the thing about preregistration is that it is a transparent act itself, and that should be enough to motivate you doing it, shouldn’t it? It’s about documentation, not discipline, I’d say.”

      I’d say it’s about both documentation and discipline.

    • Thanks Christian. I wasn’t trying to imply that I think Bayesian stats are necessarily superior (I’m not a statistician but if I were asked point blank to make a strong case for this point I would definitely back down). It’s more that I like the philosophy and have found that getting more familiar with Bayesian methods has improved my understanding in ways I appreciate, and I guess I think it might help others as well.

      On preregistration – perhaps what I’m trying to express is hard to say without sounding contradictory or anti-preregistration, which isn’t my intention at all. I’m all for documentation and transparency and occasional audits like Keith mentions below. What I’m reflecting on is something like: wouldn’t it be nice to think that in an ideal world, we as scientists could follow good practices like preplanning what we’ll do and being mindful of any ways in which we might mislead ourselves, without having to register it, because people could trust that we were careful about these things? I guess I find it hard sometimes not to think about methodological reforms “in the limit,” and so the post is about acknowledging that in that sense, it can seem to conflict with what I want to believe. I.e., that scientists can be capable of not cheating and can be trusted without a public registry (since if one buys into preregistration then it is hard to think about it as something that is opt-in rather than a default policy we all should be following). I totally acknowledge that this ideal world where it’s not necessary may never be possible and preregistration fills an important gap, so I’m not saying I or anyone else should stop doing it. Just that as an idealist these thoughts cross my mind.

      • In another ideal world science would not be about “trust” at all. At the moment it’s to some extent about trust, and probably both your world allowing for complete trust and another parallel dream world in which everything is routinely documented and published so that everyone including outsiders can check for themselves at all times what is going on are worlds that require a different kind of creature.

        I guess my point is that my view of science is somehow at odds with your “ideal world” view that allows scientists to just trust each other. I don’t think this is what science is about.

        • I don’t think it’s so much about scientists just trusting each other on everything they say as it is about trusting that the methods have been used as intended without having to provide proof. But yes, this is just one type of ideal and one could motivate other ideals that require the opposite.

    • I am someone who always fits Bayesian models no matter what. I think there are two important points that Christian didn’t mention.

      First, a frequentist analysis (meaning using p values to decide whether an effect is “reliable”) is completely nonsensical when power is low, and power is usually low. So what’s even the point?

      Second, a fact that is quite stunning is that even when a researcher has spent 40 years studying a single problem, they analyze the next experiment they run as if they know absolutely nothing about the problem. It’s just absurd to use frequentist methods when one has such prior knowledge, but this is the mainstream activity.

      • There’s much more to frequentist analysis than computing a p-value in a study with low power.

        The other point is more complex. Separating the analysis of new data from knowledge that is there before does not mean that this knowledge is ultimately ignored. Again, as I always argue, it depends on what exactly is going on in all detail. I’m not sure how many “researchers that spend 40 years studying a single problem” you’ll find but I may agree with you when it comes to what they do. However normally problems are not “the same” but related in complex and not fully understood ways, and it is well legitimate to say that we use the already existing knowledge for a careful interpretation of what is known together with the analysis of the new data that we do separately rather than lumping the knowledge together in some (potentially very controversial) way to feed it into the analysis of the new data.

  2. > gravitated toward Bayesian philosophy
    I believe that is fine, as long as it doesn’t prevent you from acknowledging that in some cases a Bayesian analysis will meet Frequency requirements (or almost), nor lead you to disparage that. That is, approximately 95% coverage of the credible interval (a toy simulation of this is sketched at the end of this comment). In line with Christian’s comments, without a prior that makes sense, you can’t hope to get something better than meeting Frequency requirements. (Exactly what makes sense is complicated.) When I worked in clinical research, use of Bayes was not “allowed” except for things like trial planning and cost/benefit analysis. But I often did Bayes to help understand the Frequentist method that needed to be used.

    Regarding pre-registration: this is primarily about lessening the need for others “to take your word for it” and putting pressure on others to lessen their requests to others “to take my word for it”.

    Now, often when I worked in groups, I would propose random checks by others of a subset of the work done. When they agreed, I always found it easier to be more careful. Should I not always be careful? We are human. In line with this, I do think research submitted for publication should be subject to random audits – say 5%. I think it would make for better research without adding much stress – once researchers get used to it.
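    Here is the toy coverage check mentioned above (a made-up normal-mean setup, nothing from clinical work): simulate many datasets, form the 95% credible interval under a weakly informative prior, and count how often it covers the true value.

    ```python
    # Toy check that a 95% credible interval under a weak prior roughly meets
    # Frequency requirements (~95% coverage). All numbers are placeholders.
    import numpy as np

    rng = np.random.default_rng(3)
    true_mu, sigma, n = 1.0, 2.0, 30     # true mean, known sd, sample size
    prior_mu, prior_sd = 0.0, 10.0       # weakly informative prior on the mean

    reps, covered = 5000, 0
    for _ in range(reps):
        x = rng.normal(true_mu, sigma, n)
        se = sigma / np.sqrt(n)
        # conjugate normal posterior for the mean (sigma treated as known)
        post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
        post_mu = post_var * (prior_mu / prior_sd**2 + x.mean() / se**2)
        lo, hi = post_mu - 1.96 * np.sqrt(post_var), post_mu + 1.96 * np.sqrt(post_var)
        covered += (lo < true_mu < hi)

    print(f"empirical coverage of the 95% credible interval: {covered / reps:.3f}")
    ```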

  3. Jessica, it’s unclear to me whether your preregistered plans tend to be like, “We will conduct a pilot and data simulation, with the following designs and decision criteria, to determine sample size…,” or more like “Our planned sample size of N is informed by a pilot and data simulation that used the following designs and decision criteria….” Can you clarify?

    My interest is not entirely related to the topic. I’ve long been interested in the role pilot studies may play in the causal model. I suspect that at least some of the infamously improbable results that fail to replicate are artifacts of forking paths not in the analysis, but in the design process. Anecdotally, theoretically-important mediators and moderators are often included or excluded from the ultimate experiment based on pilot performance without researchers ever realizing–or acknowledging–the implicit limitations on generalizability. That’s my answer to Andrew’s piranha problem: researchers are carefully transferring individual fish into their own bowls to conduct experiments, but interpreting the results as if they took place in the tank with the other fish.

    Don’t get me wrong–I’m not saying you should be preregistering pilots if you’re not. Pilot studies are very useful for weeding out bad ideas or unworkable methodologies, so sticking to an a priori plan is next to impossible.

    • Interesting, I think you’re on to something with forking paths in the design process. I haven’t really felt a strong urge to do experiments lately, other than to maybe evaluate models for potentially explaining human judgments or to ask more open-ended questions about how people perform with some system. But when we have run them it’s more like the second case you list – “Our planned sample size of N is informed by a pilot and data simulation that used the following designs and decision criteria….” The typical process is usually: 1) set up a task environment to put people in that we think should produce measurements that let us ask whatever our question is, 2) pilot this, usually already with some idea of how we’ll analyze things, but still very open to the fact that we’ve overlooked important factors or sources of noise in the measurement process that might make it hard to answer our question, 3) tweak things as needed and possibly repeat piloting to reduce whatever we perceive as messing up our measurements, and confirm that we think the results will be interesting if we run the full thing / determine how much data to collect through simulation and reasoning about what’s a practically meaningful effect, 4) preregister, 5) do the experiment and follow the preregistered plan, and if we analyze anything differently, report it as exploratory in the paper.

      What bugs me about this whole process is that it’s very easy to tweak the design of the task/environment until you see what you want. My experimental research has generally involved asking questions about how well some intervention works (a visualization, an interactive technique, etc.), so often there is a hypothesis about what we expect to see. All of the decisions about how we define and design the task give tons of degrees of freedom to nudge the results if we want to. So the thought that our results are seen as trustworthy because they were preregistered before the real data collection bugs me, because it overlooks the large amount of control we have in setting up the environment. So yeah, “researchers are carefully transferring individual fish into their own bowls to conduct experiments, but interpreting the results as if they took place in the tank with the other fish.” — I agree with that, and it’s part of why I’m sensitive about the thought of signaling that these other measures are somehow guarding against bad science.

  4. > I could instead tell myself that I can make decisions without preregistration, and undoubtedly many others can too

    The preregistration thing seems like it would help with being organized.

    Like I get lost in the weeds all the time on stuff I’m working on. It can be very convenient to come back and remember there was a higher level plan.

  5. There are so many things I’d like to say here but I want to focus on the last part. This made me think about the value of “candor” and how little of it we have in research/science. Countless times I felt like I should have written something way more candidly in my papers and then I ended up toning it down. For instance, even in more engineering-oriented research I am always, always tempted to spend way more time explaining the many ways the thing we built is limited and does not work yet. In fact, I’d be happy to conjecture that we have much more to learn from knowing when and why something does not work than from knowing when it does work!

    But candor, that is, does not seem to rank high in the list of things we prioritize when we write papers or communicate about our work. But maybe there’s a reason for that? Part of me is wondering whether, if in the end we were all way more candid, our manuscripts would be a total mess? I don’t know … I am still mulling over these things … In any case thanks so much Jessica. That was really thought provoking!

    • I agree about candor but offer that there has long been an alternative to it: repeatedly checking previous work from every different angle.

      There just isn’t nearly enough of this. Everyone wants one study = actionable result. There are studies that produce that. The problem is we often don’t find out which ones until we repeatedly test the results.

  6. About explaining why Bayesian methods were used instead of Frequentist methods when they could produce similar findings, I am of the mind that it doesn’t need justification. I feel that oftentimes justifying the use of Bayesian methods signals that there should be a special reason to use Bayesian methods over Frequentist methods, i.e., that Frequentist methods should be the default. I think that perhaps normalizing the use of Bayesian methods might encourage scientists to familiarize themselves with those methods in addition to Frequentist methods, so they can understand studies that utilize either approach. Or perhaps, if it were required to justify using Bayesian methods over Frequentist methods in a study, then it should also be justified why Frequentist methods were chosen over Bayesian methods. Just some thoughts I had after reading your post.

    • I agree with you, we shouldn’t have to motivate it. What I’m commenting on is more the possibility that when most researchers doing empirical research in a field (like mine) are doing Frequentist analysis, it’s more noticeable when people diverge from it, and I don’t really want to be contributing to beliefs that Bayesian methods were somehow clearly the right choice and added a bunch of value in cases where they really weren’t/didn’t.

      • Welcome. Are you a new writer on this blog? “Noticeable when people diverge from it” is putting it mildly. Communication is important. We have generations of applied scientists and engineers with stat methods courses who take it literally. I am working with engineers on an experiment now with 3 simple comparisons. I plan on introducing Bayesian analysis because 1) I don’t believe the errors are normal and a prior will help, and 2) the n is small and our previous experiments seemed underpowered. One of my colleagues had a paper published called “Breaking the Bayesian Ice…” so communicating a new idea is important. That said, for much of the work I do with low-variance data, frequentist techniques for estimation work quite well.

    • I’d actually agree with this in principle. The statistician doesn’t need to motivate using a Bayesian method more than a frequentist one. However, using a Bayesian method the statistician needs to motivate the prior, and with a frequentist method there is none.

      • I need to have some evidence base; or barring that I need to have some proxy for an evidence base; which is fine. I have to be interested in a particular question. I can only be interested in that question if I recall various situations in which an effect (or absence of it) was interesting or surprising; or if I have in mind various situations in which I would like very much to probe; to see whether an intervention is useful or not; and to what degree. But in every instance, unless I live in the basement on another planet, in some place where clocks and semi-rigid objects and standards of measurement are not what they appear to be to us here — then in every instance, I have in mind a reference class of some sort. And what I go looking to do as an experimentalist is a course of action always conditioned on some prior *experience* — either mine or someone else’s. If I am not delusional, my beliefs are conditioned on the projection of *experiences* into my mental record-keeping system. Rational beliefs if you will are summaries of *experiences*. Conditioning probabilities on belief is *not* different — from conditioning them on reference classes of events *in* *the* *world*.

        • I’m not sure whether I’m interpreting you correctly here, but you seem to want to make a general case for Bayesian analysis as a way of involving that existing evidence base. To which I say: It all relies on how well you get your evidence base formalised in a way compatible with the Bayesian approach. Sure there is background knowledge, but this doesn’t usually come in the form of a prior distribution, so it needs to be “translated”. This is a highly nontrivial task that can go wrong; Bayesians know about a number of pitfalls where making some “naive” translation decisions can actually push the analysis to go a certain wrong way without the modeller realising it. It’s surely a good thing when done well, but that’s a strong condition. “Objective”/informationless/”default” priors are hugely popular for a reason.

          I grant you though that choosing a frequentist model that is properly in line with what is known is not a piece of cake either.

        • > It all relies on how well you get your evidence base formalised in a way compatible with the Bayesian approach.

          The Frequentist approach solves the issue by ignoring that prior knowledge altogether.

          Frequentism 1 – Bayesianism 0, I guess.

        • My addition crossed your reply. You said a couple of comments back that “using a Bayesian method the statistician needs to motivate the prior, and with a frequentist method there is none”. Maybe your point was that fundamentally Bayesian methods are better because they take prior information into account but Frequentist methods are better in practice because they don’t. Having worse results accepted may be better than having better results questioned.

        • I don’t think that in any general sense Bayesian or frequentist methods are better. It depends on the situation; particularly to what extent the available information is reliable and can be convincingly translated into a prior. It also depends on the aim of modeling and what kind of statements one would want to make. There are also situations in which one prefers the “message from the data” alone because interests are at stake and the available knowledge is interpreted in very different ways by stakeholders. I’m a pluralist really. I do not believe that “fundamentally” Bayesian methods are better or worse. If “fundamentally” is meant to refer to an ideal world that is essentially different from the world we have, I’ve got to admit I’m not very interested in what’s “fundamentally” better, and neither am I interested much in the supposed “existence” of a superior Bayesian analysis if it isn’t clear how in a practical situation the user can find it. On the other hand of course it is not true that frequentist models are more “objective” just because they don’t require a prior; for sure a frequentist has to think hard about available prior information as well, and this has implications for frequentist analysis.

          This blog is predominantly Bayesian, so I tend to be more frequentist here than in some other discussions, just to balance things.

        • To be fair, your last line suggests it may be 0-0. As you say elsewhere, every aspect of everything is open to criticism, but I’d say the frequentist approach is more open to criticism because it lacks a clear foundation. The apparent simplicity of the Frequentist approach is not really an advantage relative to the Bayesian approach if it hides lots of implicit assumptions and ad-hoc choices.

        • If I do a Bayesian analysis and my beliefs and assumptions are brought to bear rationally — and so my analysis is intended to be persuasive at all, and perhaps a spur to consensus and perhaps to action — then my assumptions and the evidence for those assumptions, on which my analysis is conditioned, would best be made explicit; or as explicit as possible. It may be difficult to do so. But *if* in fact I want to condition my analysis on my beliefs being such-and-such (and code that into a PDF), *then*, I should think it imperative that I explain *why* my beliefs are such-and-such. And “why” means: I exhibit the evidence for my belief.

  7. Interesting post. In my lab, we have been doing pre-registration for a while now. What I have learnt from this exercise is the following:

    – We don’t spend enough time thinking about the pre-registration text. We always discover that we messed up after we have collected the data. In one extreme case, we didn’t even bother to simulate the data from the model whose predictions we were studying, and tried to guess what the model would produce as data (incorrectly, it turns out). Pretty embarrassing. I think this failure happened due to overconfidence coupled with time pressure. I think one needs special training to write a good pre-registration.

    – We rarely or never manage to find what we predicted. There’s always something new. It’s like we’re chasing a dream. I don’t really know yet what this means for my field, but it’s worrying.

    • Same here, there’s always something we kind of regret. I’m not sure I want to be good at the art of writing preregistrations though. It’s hard to say what that skill aligns with.

      We also don’t always find what we predicted. Though somehow I never feel very surprised anymore. Maybe we’ve done enough of it that our baseline expectation is that things will be a bit different than what we expected from pilot results. In grad school I was also taught that a good experiment should be interesting regardless of whether you see what you want or not so we spend a lot of time thinking through the design to try to achieve that.
