Welcome. Are you a new writer on this blog? “Noticeable when people diverge from it” is putting it mildly. Communication is important. We have generations of applied scientists and engineers with stat methods courses who take it literally. I am working with engineers on an experiment now with 3 simple comparisons. I plan on introducing Bayesian analysis because 1) I don’t believe the errors are normal and a prior will help, and 2) the n is small and our previous experiments seemed underpowered. One of my colleagues had a paper published called “Breaking the Bayesian Ice..”, so communicating a new idea is important. That said, for much of the work I do with low-variance data, frequentist techniques for estimation work quite well.

There’s much more to frequentist analysis than computing a p-value in a study with low power.

The other point is more complex. Separating the analysis of new data from pre-existing knowledge does not mean that this knowledge is ultimately ignored. Again, as I always argue, it depends on what exactly is going on in all detail. I’m not sure how many “researchers that spend 40 years studying a single problem” you’ll find, but I may agree with you when it comes to what they do. However, problems are normally not “the same” but related in complex and not fully understood ways, and it is perfectly legitimate to say that we use the already existing knowledge for a careful interpretation of what is known, together with the analysis of the new data that we do separately, rather than lumping the knowledge together in some (potentially very controversial) way to feed it into the analysis of the new data.

+1

I think we can only compare Frequentist vs Bayesian analysis in this way if we hold the data model constant. That is, both models agree on how the data come about, and the only difference is the choice of prior. Otherwise, of course we’re always worse off when we do a Bayesian analysis on a terrible data model than if we do a Frequentist analysis on an acceptable data model…

So assuming the data model is held constant,

In practice, all likelihood-based analysis is Bayesian. The issue is that in practice all analysis uses IEEE double-precision floating point, which has support on roughly (-1.8e308, 1.8e308). This corresponds to the proper prior uniform(-a, a), where ±a are the endpoints of that interval. Sure, the floats aren’t uniform on that interval, but the roundoff rule makes each one represent a particular interval.

In this sense the Bayesian model is virtually always better because no one chooses units of measure such that 1e298 is the answer to anything and that’s the only kind of case where “uniform on the floats” winds up winning out.
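A minimal sketch (Python, with invented measurement numbers) of the floating-point point above: the largest finite double is about 1.8e308, so a “flat” prior coded in double precision is effectively a proper uniform(-a, a) prior, and truncating at those endpoints makes no practical difference for any sane choice of units.

```python
import sys
import numpy as np

# The largest finite IEEE double: any "flat" prior coded in double
# precision is really a proper uniform on (-a, a) with a huge a.
a = sys.float_info.max
print(f"largest finite double: {a:.3e}")   # ~1.798e+308

# Toy illustration (invented data): normal mean with known sd = 1.
y = np.array([1.2, 0.7, 1.9, 1.4])
sigma = 1.0
ybar, n = y.mean(), len(y)

# Posterior under the improper flat prior is N(ybar, sigma^2 / n).
print(f"posterior mean {ybar:.3f}, sd {sigma / np.sqrt(n):.3f}")

# Truncating the prior to (-a, a) multiplies the normalising constant by
# the normal probability of (-a, a), which is 1 to within floating point
# error here, so the truncated (proper) and flat (improper) posteriors
# are numerically indistinguishable for any reasonable units of measure.
```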

If I do a Bayesian analysis and my beliefs and assumptions are brought to bear rationally — and so my analysis is intended to be persuasive at all, and perhaps a spur to consensus and perhaps to action — then my assumptions and the evidence for those assumptions, on which my analysis is conditioned, would best be made explicit; or as explicit as possible. It may be difficult to do so. But *if* in fact I want to condition my analysis on my beliefs being such-and-such (and code that into a PDF), *then*, I should think it imperative that I explain *why* my beliefs are such-and-such. And “why” means: I exhibit the evidence for my belief.

Same here, there’s always something we kind of regret. I’m not sure I want to be good at the art of writing preregistrations though. It’s hard to say what that skill aligns with.

We also don’t always find what we predicted. Though somehow I never feel very surprised anymore. Maybe we’ve done enough of it that our baseline expectation is that things will be a bit different than what we expected from pilot results. In grad school I was also taught that a good experiment should be interesting regardless of whether you see what you want or not so we spend a lot of time thinking through the design to try to achieve that.

I don’t think that in any general sense Bayesian or frequentist methods are better. It depends on the situation, particularly on to what extent the available information is reliable and can be convincingly translated into a prior. It also depends on the aim of modeling and what kind of statements one would want to make. There are also situations in which one prefers the “message from the data” alone because interests are at stake and the available knowledge is interpreted in very different ways by stakeholders. I’m a pluralist really.

I do not believe that “fundamentally” Bayesian methods are better or worse. If “fundamentally” is meant to refer to an ideal world that is essentially different from the world we have, I’ve got to admit I’m not very interested in what’s “fundamentally” better, nor am I much interested in the supposed “existence” of a superior Bayesian analysis if it isn’t clear how, in a practical situation, the user can find it. On the other hand, of course it is not true that frequentist models are more “objective” just because they don’t require a prior; for sure a frequentist has to think hard about available prior information as well, and this has implications for frequentist analysis.

This blog is predominantly Bayesian, so I tend to be more frequentist here than in some other discussions, just to balance things.

My addition crossed your reply. You said a couple of comments back that “using a Bayesian method the statistician needs to motivate the prior, and with a frequentist method there is none”. Maybe your point was that fundamentally Bayesian methods are better because they take prior information into account, but Frequentist methods are better in practice because they don’t. Having worse results accepted may be better than having better results questioned.

To be fair, your last line suggests it may be 0-0. As you say elsewhere, every aspect of everything is open to criticism, but I’d say the frequentist approach is more open to criticism because it lacks a clear foundation. The apparent simplicity of the Frequentist approach is not really an advantage relative to the Bayesian approach if it hides lots of implicit assumptions and ad-hoc choices.

Do you read my postings? Have I said such a thing?

> It all relies on how well you get your evidence base formalised in a way compatible with the Bayesian approach.

The Frequentist approach solves the issue by ignoring that prior knowledge altogether.

Frequentism 1 – Bayesianism 0, I guess.

I’m not sure whether I’m interpreting you correctly here, but you seem to want to make a general case for Bayesian analysis because it involves that existing evidence base. To which I say: It all relies on how well you get your evidence base formalised in a way compatible with the Bayesian approach. Sure there is background knowledge, but this doesn’t usually come in the form of a prior distribution, so it needs to be “translated”. This is a highly nontrivial task that can go wrong; Bayesians know about a number of pitfalls where “naive” translation decisions can actually push the analysis a certain wrong way without the modeller realising it. It’s surely a good thing when done well, but that’s a strong condition. “Objective”/informationless/“default” priors are hugely popular for a reason.

I grant you though that choosing a frequentist model that is properly in line with what is known is not a piece of cake either.

Carlos: That’s fair enough. However, unless we’re talking about situations in which the only interest is in predicting future observations that can reasonably be assumed to come from the same data generating process (regarding which I’m with Anoneuoid), such a comparison can be very complex, because decisions are not usually made based on the outcome of an isolated statistical analysis alone, and we have to ask also what we have learnt from our analysis, what ideas may be inspired by this, how it is used by non-experts, etc. Even in situations in which prediction of future observations is of major interest, situations in which the future is in some respect different from the past on which we ran our analyses happen more often than not, so additional input and expertise is always required.

Surely looking at isolated Bayesian and frequentist methods and saying “this inference is better than that” normally requires very strong idealisation and simplification, every aspect of which is open to criticism.

What I like about this post is that it shows very clearly how preregistration is a tool to learn something.

– We don’t spend enough time thinking about the pre-registration text. We always discover that we messed up after we have collected the data. In one extreme case, we didn’t even bother to simulate the data from the model whose predictions we were studying, and tried to guess what the model would produce as data (incorrectly, it turns out). Pretty embarrassing. I think this failure happened due to overconfidence coupled with time pressure. I think one needs special training to write a good pre-registration.

– We rarely or never manage to find what we predicted. There’s always something new. It’s like we’re chasing a dream. I don’t really know yet what this means for my field, but it’s worrying.

I am someone who always fits Bayesian models no matter what. I think there are two important points that Christian didn’t mention.

First, a frequentist analysis (meaning using p values to decide whether an effect is “reliable”) is completely nonsensical when power is low, and power is usually low. So what’s even the point?

Second, a fact that is quite stunning is that even when a researcher has spent 40 years studying a single problem, they analyze the next experiment they run as if they know absolutely nothing about the problem. It’s just absurd to use frequentist methods when one has such prior knowledge, but this is the mainstream activity.
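Regarding the first point (low power), here is a minimal simulation sketch (Python, with an invented effect size and sample size, not taken from any real study) of what low power does in practice: the test rarely detects the effect, and the estimates that do clear the significance threshold exaggerate it several-fold.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-group experiment: true effect 0.2 sd, n = 20 per group.
true_effect, n, sims = 0.2, 20, 20_000
sig_estimates, rejections = [], 0

for _ in range(sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_effect, 1.0, n)
    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    if abs(diff / se) > 1.96:          # crude z-test, for illustration only
        rejections += 1
        sig_estimates.append(diff)

print(f"power: about {rejections / sims:.2f}")   # roughly 0.10 in this setup
print(f"mean estimate among 'significant' results: {np.mean(sig_estimates):.2f}")
# The 'significant' estimates average several times the true effect of 0.2.
```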

> on what scale can you compare their “quality”?

To the extent that the meaning of frequentist inference includes anything about the actual world (and not just about an idealized sequence of comparable events) it can lead to some predictions/decisions/actions that can be compared with those from Bayesian inference. (A comparison for a single realization may still not be meaningful, of course. A broken clock also gives the hour perfectly if we check it only at the “right” time.)

If frequentist inference has no practical relevance there is no need for a quality comparison, really.

I need to have some evidence base; or barring that I need to have some proxy for an evidence base; which is fine. I have to be interested in a particular question. I can only be interested in that question if I recall various situations in which an effect (or absence of it) was interesting or surprising; or if I have in mind various situations in which I would like very much to probe; to see whether an intervention is useful or not; and to what degree. But in every instance unless I live in the basement on another planet, in some place where clocks and semi-rigid objects and standards of measurement are not what they appear to be to us here — then in every interest, I have in mind a reference class of some sort. And what I go looking to do as an experimentalist is a course of action always conditioned on some prior *experience* — either mine or someone else’s. If I am not delusional, my beliefs are conditioned on the projection of *experiences* into my mental record-keeping system. Rational beliefs, if you will, are summaries of *experiences*. Conditioning probabilities on belief is *not* different — from conditioning them on reference classes of events *in the world*.

> on what scale can you compare their “quality”?

Predictive skill. Or really, predictive skill normalized to the resources required. I use confidence intervals all the time, just because in many cases they give nearly the same answer anyway, much more quickly.
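A minimal sketch (Python, conjugate normal-mean example with invented data and a deliberately weak prior) of why the two answers can come out nearly the same: with a moderate sample size and a diffuse prior, the 95% confidence interval and the 95% credible interval agree to a few decimal places.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: normal observations with known sd = 1.
y = rng.normal(0.5, 1.0, size=50)
n, sigma = len(y), 1.0
ybar, se = y.mean(), sigma / np.sqrt(len(y))

# Classical 95% confidence interval for the mean (known sigma).
ci = (ybar - 1.96 * se, ybar + 1.96 * se)

# Conjugate Bayesian analysis with a weak normal(0, 10^2) prior:
# the posterior is normal with precision-weighted mean and variance.
prior_mean, prior_sd = 0.0, 10.0
post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
post_mean = post_var * (prior_mean / prior_sd**2 + n * ybar / sigma**2)
post_sd = np.sqrt(post_var)
cred = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)

print("95%% confidence interval: (%.3f, %.3f)" % ci)
print("95%% credible interval:   (%.3f, %.3f)" % cred)
# With n = 50 and this weak prior, the two intervals match to ~3 decimals.
```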

I’d actually agree with this in principle. The statistician doesn’t need to motivate using a Bayesian method more than a frequentist one. However, using a Bayesian method the statistician needs to motivate the prior, and with a frequentist method there is none.

By the way, what does it even mean to say that Bayesian inference X is “better” than non-Bayesian inference Y? Frequentist probability statements have a different meaning than Bayesian ones (actually there are various valid interpretations for the Bayesian ones alone), so on what scale can you compare their “quality”?

“I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.”

That there exists a possible Bayesian inference that is better doesn’t mean that the practitioner will find it. Existence is a weak statement.

I agree with you, we shouldn’t have to motivate it. What I’m commenting on is more the possibility that when most researchers doing empirical research in a field are doing Frequentist analysis (like mine), then it’s more noticeable when people diverge from it, and I don’t really want to be contributing to beliefs that Bayesian methods were somehow clearly the right choice and added a bunch of value in cases where they really weren’t/didn’t.

I agree about candor but offer that there has long been an alternative to it: repeatedly checking previous work from every different angle.

There just isn’t nearly enough of this. Everyone wants one study = actionable result. There are studies that produce that. The problem is we often don’t find out which ones until we repeatedly test the results.

But that (candor, that is) does not seem to rank high in the list of things we do when we write papers or communicate about our work. But maybe there’s a reason for that? Part of me is wondering whether, if in the end we were all way more candid, our manuscripts would be a total mess? I don’t know … I am still mulling over these things … In any case thanks so much Jessica. That was really thought provoking!

The preregistration thing seems like it would help with being organized.

Like I get lost in the weeds all the time on stuff I’m working on. It can be very convenient to come back and remember there was a higher level plan.

The example may include randomness in the true value of the parameter. One argument in favour of Bayesian probabilistic reasoning is that if the prior corresponds to that randomness, the inference will be optimal.

And maybe for any “non-Bayesian alternative” one can find a corresponding prior in a Bayesian setting so you have something equivalent but better?

Or maybe not. But there would be no compelling reason not to be a Bayesian if the best arguments against the superiority of Bayesian inference are of the form “performing abdominal surgery with a scalpel can be worse than doing so using a hunting knife if you use the scalpel to sever the aorta”.
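A minimal simulation sketch (Python, toy normal setup with invented numbers) of the first point above: when the parameter really is drawn from the prior, the posterior mean beats the plain estimate on average squared error.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: theta is genuinely drawn from N(0, tau^2) each time, and we
# observe a single noisy measurement y ~ N(theta, 1).
tau, sims = 0.5, 100_000
theta = rng.normal(0.0, tau, sims)
y = theta + rng.normal(0.0, 1.0, sims)

# Non-Bayesian point estimate: the maximum likelihood estimate is y itself.
mle = y

# Bayesian posterior mean under the matching prior N(0, tau^2):
# shrink y toward 0 by the factor tau^2 / (tau^2 + 1).
post_mean = (tau**2 / (tau**2 + 1.0)) * y

print("MSE of MLE:            %.3f" % np.mean((mle - theta) ** 2))        # ~1.0
print("MSE of posterior mean: %.3f" % np.mean((post_mean - theta) ** 2))  # ~0.2
# With a badly mismatched prior the comparison can of course come out differently.
```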

There’s also the fact that randomness plays a role in determining which method works best in any particular case. For almost every (reasonable) method, there are data sets on which the method works better than the alternatives and data sets on which it does worse. At least that has been my experience.

“I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.”

That can’t happen unless you somehow restrict the space of “possible” Bayesian inferences, because if you include literally every possible Bayesian inference then that includes the maximally lucky case where the prior is concentrated completely on the best possible parameter value.

> Sometimes the Bayesian inference can be worse than the non-Bayesian alternative.

I wonder if there is an example where every possible Bayesian inference is worse than some non-Bayesian alternative.

“Using a blueprint to build a house doesn’t always offer an advantage. Sometimes the blueprint-guided construction can be worse than building and raising walls randomly. For example when the blueprint does not include doors and windows.”

Interesting, I think you’re on to something with forking paths in the design process. I haven’t really felt a strong urge to do experiments lately, other than to maybe evaluate models for potentially explaining human judgments or ask more open-ended questions about how people perform with some system. But when we have run them it’s more like the second case you list – “Our planned sample size of N is informed by a pilot and data simulation that used the following designs and decision criteria….” The typical process is usually:

1) set up a task environment to put people in that we think should produce measurements that allow us to ask whatever our question is,
2) pilot this, usually already with some idea of how we’ll analyze things, but still very open to the fact that we’ve overlooked important factors or sources of noise in the measurement process that might make it hard to answer our question,
3) tweak things as needed and possibly repeat piloting to reduce whatever we perceive as messing up our measurements, confirm that we think the results will be interesting if we run the full thing, and determine how much data to collect through simulation and reasoning about what’s a practically meaningful effect,
4) preregister,
5) do the experiment and follow the preregistered plan, and if we analyze anything differently report it as exploratory in the paper.

What bugs me about this whole process is that it’s very easy in this kind of process to tweak the design of the process/environment until you see what you want. My experimental research has generally involved asking questions about how well some intervention works (visualization, interactive technique etc), so often there is a hypothesis about what we expect to see. All of the decisions about how we define and design the task give tons of degrees of freedom to nudge the results if we want to. So the thought that our results are seen as trustworthy because they were preregistered before the real data collection bugs me, because it overlooks the large amount of control we have in setting up the environment. So yeah, “researchers are carefully transferring individual fish into their own bowls to conduct experiments, but interpreting the results as if they took place in the tank with the other fish.” — I agree with that, and it’s part of why I’m sensitive about the thought of signaling that these other measures are somehow guarding against bad science.

Daniel:

Bayesian statistics doesn’t always offer an advantage. Sometimes the Bayesian inference can be worse than the non-Bayesian alternative. When this occurs, it’s from a problem with a model. But people fit models with problems all the time.

My interest is not entirely related to the topic. I’ve long been interested in the role pilot studies may play in the causal model. I suspect that at least some of the infamously improbable results that fail to replicate are artifacts of forking paths, not in the analysis but in the design process. Anecdotally, theoretically-important mediators and moderators are often included or excluded from the ultimate experiment based on pilot performance without researchers ever realizing–or acknowledging–the implicit limitations on generalizability. That’s my answer to Andrew’s piranha problem: researchers are carefully transferring individual fish into their own bowls to conduct experiments, but interpreting the results as if they took place in the tank with the other fish.

Don’t get me wrong–I’m not saying you should be preregistering pilots if you’re not. Pilot studies are very useful for weeding out bad ideas or unworkable methodologies, so sticking to an a priori plan is next to impossible.

> bayesian statistics shouldn’t offer any advantage unless the prior is actually informative

Another way to say this is “Bayesian statistics always offer an advantage, very occasionally the advantage is quite small”

I don’t think it’s so much about scientists just trusting each other on everything they say as it is about just trusting that the methods have been used as intended without having to provide proof. But yes, this is just one type of ideal and one could motivate other ideals that require the opposite.

> bayesian statistics shouldn’t offer any advantage unless the prior is actually informative

Frequentist confidence intervals can misbehave and do ugly things like covering only impossible values (say the interval contains only negative values for a parameter that can only be positive).

Bayesian credible regions cannot contain impossible values by construction. One could argue this is one advantage.

(Of course if the prior is wrong a Bayesian method can also give bad answers. And if the model is bad enough any statistical method is doomed and discarding all the data and picking numbers out of a hat may give more accurate answers.)
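A minimal sketch (Python with NumPy/SciPy, deliberately contrived data of my own, not taken from the comment) of the kind of misbehaviour described above: a textbook interval for a parameter that can only be positive lands entirely below zero, while a posterior built on a positive-support prior cannot.

```python
import numpy as np
from scipy import stats

# Contrived example: a quantity that is physically >= 0 (say a concentration),
# measured with so much noise that the few readings come out negative.
y = np.array([-1.4, -0.9, -2.1, -1.2, -1.6])   # invented measurements
sigma = 1.0                                     # assumed known noise sd
n, ybar = len(y), y.mean()

# Textbook 95% interval for the mean: here it covers only negative,
# i.e. impossible, values of the parameter.
half = 1.96 * sigma / np.sqrt(n)
print("confidence interval: (%.2f, %.2f)" % (ybar - half, ybar + half))

# Bayesian answer with a prior restricted to [0, inf): an exponential(1)
# prior evaluated on a grid, purely for illustration.
theta = np.linspace(0.0, 5.0, 5001)
log_post = stats.norm.logpdf(ybar, loc=theta, scale=sigma / np.sqrt(n)) - theta
post = np.exp(log_post - log_post.max())
post /= post.sum()
cdf = np.cumsum(post)
lo, hi = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print("credible interval:   (%.2f, %.2f)" % (lo, hi))
# The credible interval sits inside [0, inf) by construction.
```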

In another ideal world science would not be about “trust” at all. At the moment it’s to some extent about trust, and probably both your world, which allows for complete trust, and another parallel dream world, in which everything is routinely documented and published so that everyone including outsiders can check for themselves at all times what is going on, are worlds that require a different kind of creature.

I guess my point is that my view of science is somehow at odds with your “ideal world” view that allows scientists to just trust each other. I don’t think this is what science is about.

“bayesian statistics shouldn’t offer any advantage unless the prior is actually informative”

I would be careful saying this around Bayesians!

Thanks Christian. I wasn’t trying to imply that I think Bayesian stats are necessarily superior (I’m not a statistician but if I were asked point blank to make a strong case for this point I would definitely back down). It’s more that I like the philosophy and have found that getting more familiar with Bayesian methods has improved my understanding in ways I appreciate, and I guess I think it might help others as well.

On preregistration – perhaps what I’m trying to express is hard to say without sounding contradictory or anti-preregistration, which isn’t my intention at all. I’m all for documentation and transparency and occasional audits like Keith mentions below. What I’m reflecting on is something like: wouldn’t it be nice to think that in an ideal world, we as scientists could follow good practices like preplanning what we’ll do and being mindful of any ways in which we might mislead ourselves, without having to register it, because people could trust that we were careful about these things? I guess I find it hard sometimes not to think about methodological reforms “in the limit”, and so the post is about acknowledging that in that sense, it can seem to conflict with what I want to believe. I.e., that scientists can be capable of not cheating and can be trusted without a public registry (since if one buys into preregistration then it is hard to think about it as something that is opt-in rather than a default policy we all should be following). I totally acknowledge that this ideal world where it’s not necessary may never be possible and pre-registration fills an important gap, so I’m not saying I or anyone else should stop doing it. Just that as an idealist these thoughts cross my mind.

Christian said, “If you value transparency, the thing about preregistration is that it is a transparent act itself, and that should be enough to motivate you doing it, shouldn’t it? It’s about documentation, not discipline, I’d say.”

I’d say it’s about both documentation and discipline.

very nice position, Christian, which i agree with and share.

i am considered a bayesian because it is the ‘school of thought’ to which my mentors belong.

however what you’ve mentioned is key: bayesian statistics shouldn’t offer any advantage unless the prior is actually informative. in my case, it just so happened that i bore a gift in terms of not being conscious of its importance and much of my “downstream” analysis worked out due to “chance” ( ;) ).

at the risk of being excoriated due to my lack of “fundamental” calisthenics (being exiled from my field and all– hey it happens!): the no free lunch theorem, loosely speaking, would support the allegation that a bayesian method should perform no better than a frequentist analogue, unless there is something specific that should improve its performance (a prior).

and i agree with your position on documentation. a publication is only as useful as the documentation of the process. sometimes it is hard to propose a procedure without being able to ‘explore’. if you are enslaved to preregistration, then this process may prevent useful insights that are gleaned through this exploration. the obvious risk, of course, is overfitting. however, if documented properly, and accepted after peer review, it is fair to expect the audience to assess the presence of that risk.

nice post bud.

I believe that is fine, as long as it doesn’t prevent you from acknowledging that in some cases a Bayesian analysis will meet frequency requirements (or almost), or lead you to disparage that. That is, approximately 95% coverage of the credible interval. In line with Christian’s comments, without a prior that makes sense you can’t hope to get something better than meeting frequency requirements. (Exactly what makes sense is complicated.) When I worked in clinical research, use of Bayes was not “allowed” except for things like trial planning and cost/benefit analysis. But I often did Bayes to help understand the Frequentist method that needed to be used.

Regarding pre-registration: this is primarily about lessening the need for others “to take your word for it” and putting pressure on others to lessen their requests of others “to take my word for it”.

Now, often when I worked in groups, I would propose random checks by others of a subset of the work done. When they agreed, I always found it easier to be more careful. Should I not always be careful? We are human. In line with this, I do think research submitted for publication should be subject to random audits – say 5%. I think it would make for better research without adding much stress – once researchers get used to it.

Regarding Bayesian statistics: I’m personally not doing Bayesian statistics a lot, and despite not being against it, I think a lot of claims are made about a potential superiority of Bayesian statistics that do not hold water, for example this one: “a Bayesian model seems nice for reasons like interval interpretation”. I tend to say that in Bayesian statistics the posterior inherits meaning from the prior, meaning that you only have a “nicely interpretable” posterior if your prior makes good sense, otherwise not. If you don’t have prior information that can be translated into a prior in a well justified way, I don’t think anything is won by using a Bayesian approach at all. Realising that a strikingly large proportion of publications using Bayesian statistics comes with no or only very weak justification of the prior, I don’t think there’s any good reason for “signalling” that Bayesian statistics is the “thing to do” at all, obviously not denying the value of a good Bayesian analysis where prior information is indeed put to good use (I admire Andrew’s discussions of priors on this blog, so I’m not talking of anyone in the room of course ;-).

Regarding pre-registration: “So then when I preregister, I feel like I’m admitting to myself I need external forces to keep me from being devious and I can’t make ethical decisions without needing to be policed.” This is something I don’t quite get. I don’t think you’d worry about the existence of police because it implies that you cannot be trusted? If you value transparency, the thing about preregistration is that it is a transparent act itself, and that should be enough to motivate you doing it, shouldn’t it? It’s about documentation, not discipline, I’d say.
