“My basic question is do we really need data to be analysed by both methods?”

Posted on September 1, 2024 9:10 AM by Andrew

Ram Bajpai writes:

I’m an early career researcher in medical statistics with keen interest in meta-analysis (including Bayesian meta-analysis) and prognostic modeling. I’m conducting a methodological systematic review of Bayesian meta-analysis in the biomedical research. After reading these studies, many authors presented both Bayesian and classical results together and comparing them and usually say both methods provide similar results (trying to validate). However, being a statistician, I don’t see any point of analysing data from both techniques as these are two different philosophies and either one is sufficient if well planned and executed. Consider me no Bayesian expert, I seek your guidance on this issue. My basic question is do we really need data to be analysed by both methods?

My quick answer is that I think more in terms of methods than philosophies. Often a classical method is interpretable as a Bayesian method with a certain prior. This sort of insight can be useful. From the other direction, the frequency properties of a Bayesian method can be evaluated as if it were a classical procedure.
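One way to make the "classical method as a Bayesian method with a certain prior" point concrete is the textbook case of a normal mean with known standard deviation. The sketch below is only an illustration under that assumption; the data values, the function names, and the two priors are made up for the example and are not from the post. With a very diffuse normal prior the central posterior interval coincides numerically with the classical z-interval, while an informative prior pulls the interval toward the prior mean and narrows it.

```python
from math import sqrt
from statistics import NormalDist, mean


def classical_ci(data, sigma, level=0.95):
    """Classical z-interval for a normal mean with known sigma."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    m = mean(data)
    return (m - half_width, m + half_width)


def posterior_interval(data, sigma, prior_mean, prior_sd, level=0.95):
    """Central posterior interval under a Normal(prior_mean, prior_sd) prior."""
    n = len(data)
    prior_prec = 1.0 / prior_sd**2   # precision = 1 / variance
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * mean(data))
    post = NormalDist(post_mean, sqrt(post_var))
    return (post.inv_cdf(0.5 - level / 2), post.inv_cdf(0.5 + level / 2))


# Hypothetical measurements, sigma assumed known and equal to 1.
data = [1.2, 0.4, 2.1, 1.7, 0.9]

ci = classical_ci(data, sigma=1.0)
vague = posterior_interval(data, sigma=1.0, prior_mean=0.0, prior_sd=1e6)
informative = posterior_interval(data, sigma=1.0, prior_mean=0.0, prior_sd=0.5)

print("classical 95% interval:       ", ci)
print("posterior, near-flat prior:   ", vague)
print("posterior, informative prior: ", informative)
```

Running the same functions inside a simulation loop over many generated datasets would be one way to look at the frequency properties of the Bayesian interval, which is the "other direction" mentioned above.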

This also reminds me of a discussion I had yesterday with Aaditya Ramdas at CMU. Ramdas has done a lot of theoretical work on null hypothesis significance testing; I’ve done lots of applied and methodological work using Bayesian inference. Ramdas expressed the view that it is a bad thing that there are deep philosophical divisions in statistics regarding how to approach even the simplest problems. I replied that I didn’t see a deep philosophical divide between Bayesian inference and classical significance testing. To me, the differences are in what assumptions we are willing to swallow.

My take on statistical philosophy is that all statistical methods require assumptions that are almost always clearly false. Hypothesis testing is all about assumptions that something is exactly zero, which does not make sense in any problem I’ve studied. If you bring this up with people who work on or use hypothesis testing, they’ll say something along the lines of, Yeah, yeah, sure, I know, but it’s a reasonable approximation and we can alter the assumption when we need to. Bayesian inference relies on assumptions such as normality and logistic curves. If you bring this up with people who work on or use Bayesian inference, they’ll say something along the lines of, Yeah, yeah, sure, I know, but it’s a reasonable approximation and we can alter the assumption when we need to. To me, what appear to be different philosophies are more like different sorts of assumptions that people are comfortable with. It’s not just a “matter of taste”: different methods work better for different problems, and, as Rob Kass says, the methods that you use will, and should, be influenced by the problems you work on. I just think it makes more sense to focus on differences in methods and assumptions rather than frame them as incommensurable philosophies. I do think philosophical understanding, and misunderstanding, can make a difference in applied work; see section 7 of my paper with Shalizi.

38 thoughts on ““My basic question is do we really need data to be analysed by both methods?””

Tom Passin on September 1, 2024 9:44 AM at 9:44 am said:

It seems to me that if you had some reliable data about some subject and then you acquired some more, you would know how to combine them.

If you didn’t have that first data set but you knew something about it so you could construct some reasonable approximation, you would still combine the old (but constructed rather than measured) data set and the new in the same way. The uncertainties would be larger than before, but that effect could be calculated or estimated.

The second situation seems to me to be the essence of Bayesian methods. Not that different, really.
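As a rough illustration of "combining" old and new information, here is a minimal conjugate-normal sketch. It assumes a normal model with known measurement standard deviation; the prior, the three batches labelled A, B and C, and the `update` helper are all hypothetical. The prior plays the role of the earlier (here, constructed) data set, and folding in the batches one at a time, in any order, gives the same posterior as analyzing the pooled data once.

```python
from itertools import permutations
from math import isclose


def update(prior, data, sigma=1.0):
    """One conjugate step: Normal prior (mean, variance) plus Normal data, known sigma."""
    mean0, var0 = prior
    n = len(data)
    precision = 1.0 / var0 + n / sigma**2
    mean1 = (mean0 / var0 + sum(data) / sigma**2) / precision
    return (mean1, 1.0 / precision)


# Hypothetical prior: behaves like a quarter of an observation located at 0.
prior = (0.0, 4.0)

# Hypothetical batches arriving from three sources.
batches = {"A": [1.1, 0.7, 1.9], "B": [0.2, 1.4], "C": [2.3, 1.0, 0.8, 1.6]}

# Single analysis of everything at once.
pooled = update(prior, [x for batch in batches.values() for x in batch])

# Sequential analysis in every possible arrival order.
results = {}
for order in permutations(batches):
    post = prior
    for name in order:
        post = update(post, batches[name])  # yesterday's posterior is today's prior
    results["".join(order)] = post

for order, (m, v) in results.items():
    print(order, round(m, 6), round(v, 6))
print("pooled", round(pooled[0], 6), round(pooled[1], 6))
```

The intermediate posteriors differ by order; only the final one is guaranteed to agree, and only because the model and prior are held fixed throughout.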

Reply ↓
- Daniel Lakeland on September 1, 2024 10:32 AM at 10:32 am said:
  
  The coherence of sequential data collection is a huge huge part of Bayes. Suppose 3 different labs all do a set of experiments producing data set A, B, C in different parts of the world and on different schedules.
  
  One group wants to analyze the data and they ask the other labs to send their datasets when they’re done. Now various things could occur which result in the analysis group receiving the data in different weeks in one of the sequences:
  
  ABC or ACB or BAC or BCA or CAB or CBA
  
  If after receiving each dataset they run an analysis, they will get different posterior distributions for weeks 1 or 2 based on which dataset they receive, but by the final week the Bayes analysis will produce a consistent posterior no matter which of these sequences occurs, because the posterior is a function of the **set** of data, and sets are *unordered*.
  
  If you try to do a sequence of Frequentist analyses in which you accept or reject hypotheses after each week, you will quickly wind up in a backwater of inconsistent results, having accepted in the first week, rejected in the second, then accepted in the third… or a different sequence depending on what you do. If the third lab calls you up and says “have we rejected the hypothesis so far, maybe we shouldn’t waste our time doing this experiment?” now, depending on the probability for you to answer that question “yes we rejected already why don’t you save your resources and not bother” or “you know what, let’s just do the experiment anyway according to plan”, the accept or reject probability has or has not been compromised (it has been 100% compromised unless you have exactly 100% probability to always say “do it according to plan”). If that probability depends on the weather and your girlfriend being mad at you, then the p value of the experiment depends on whether you remembered to get your girlfriend a birthday present, or if you forgot to do the dishes on Wednesday before going to bed and left a mess in the kitchen, or butterfly wing-flaps in the Amazon a month ago…
  
  Frequentist analysis is fundamentally at the DEEPEST LEVEL flawed.
  
  Reply ↓
  - Dale Lehman on September 1, 2024 11:30 AM at 11:30 am said:
    
    In your example, I find something confusing. I would think the frequentist analysis (for the 3 lab example) would either be to analyze C on its own, or if the 3 labs really used the same procedures, analyze A+B+C, while the Bayesian analysis would clearly use the sequential information. If, as you say the Bayesian approach gives the same answer once all 3 labs’ data has been analyzed (this is not so clear to me, but I’ll take your word for that), then it doesn’t seem so different than the frequentist A+B+C answer. It does seem clear that analyzing C as if A and B don’t exist would be flawed.
    
    What you are saying seems like the “deepest level” flawed is applying NHST. I don’t see that as the same thing as frequentist analysis, but as one type of frequentist analysis (and one that is deeply flawed).
    
    Reply ↓
    - Daniel Lakeland on September 1, 2024 12:04 PM at 12:04 pm said:
      
      Yes, if you first collect all the data A,B,C and then do a frequentist analysis of the result, and there is no “probability that you decide to change your mind”, then you can have a proper p value. Note that **even if you don’t in fact change your mind** the probability that you *might have changed your mind* in the past needs to be 0. It can’t just be that you did the equivalent of a biased coin flip and it came up heads, it needs to be that it **couldn’t have come up tails** according to the mythical universal high complexity Kolmogorov sequence that determines all facts about the universe according to the Frequentist philosophy. That is because the probability that *if you reran the experiment* you’d get data as extreme or more extreme etc… is dependent on the probability that you’d get A,B and then decide not to run C in the future if the whole thing were repeated (or get B,A, then C, or get B,A then not C, or get B, call the whole thing off… or get A then C then call the whole thing off… etc etc etc).
      
      The whole thing is mumbo jumbo.
      
      We all know that in complex analyses where things really matter (for example medical treatment trials) people observe what’s going on and make decisions all the time in the process. We heard in the last few years about treatment of kids in Africa for malaria. They saw some results of some sort, and decided to call the trial off. Similar stuff happens all the time. Say in water pollution prevention trials, or regulatory trials or whatever.
    - Carlos Ungil on September 1, 2024 1:43 PM at 1:43 pm said:
      
      > the Bayesian analysis would clearly use the sequential information.
      
      It depends on what we call “Bayesian analysis”. If it’s about having a (data-independent) model and a meaningful prior distribution then it’s true that the posterior of the analysis of A could be taken as prior for the analysis of B using the same model and the result would be the same as the joint analysis of A and B.
      
      If it’s about the kind of Bayesian workflow often discussed in this blog – where “adding prior information” is optional and model checking and improvement is “absolutely necessary” – it’s not clear that the sequential information would be used and the results could be different even if it was.
      
      It’s also not clear what you call “frequentist analysis” but even if there were a way to apply it sequentially I doubt the result would be the same as for a single analysis on the aggregate data.
    - Matt Skaggs on September 1, 2024 1:55 PM at 1:55 pm said:
      
      “What you are saying seems like the “deepest level” flawed is applying NHST.”
      
      That is the only way I could make sense of what Daniel wrote as well.
      
      And now I am trying to figure out how an analysis of the flexibility of rubber at three different labs simulates a random number generator when tested by a frequentist but does not when tested by a Bayesian.
    - Carlos Ungil on September 1, 2024 2:22 PM at 2:22 pm said:
      
      > And now I am trying to figure out how an analysis of the flexibility of rubber at three different labs simulates a random number generator when tested by a frequentist but does not when tested by a Bayesian.
      
      I’m not sure what “tested by a frequentist” and “tested by a Bayesian” means there but Daniel’s comment was about the intrinsic inability of what is usually called “frequentist analysis” to do a coherent analysis of data if it is done sequentially. Maybe you have that figured out though?
    - Daniel Lakeland on September 1, 2024 2:52 PM at 2:52 pm said:
      
      Note there are two ways to do the Bayesian analysis.
      
      1) the approximate way… analyze A, get some posterior for the parameters, approximate this posterior in a way that allows you to program it into Stan, then analyze B… repeat for C… This is an approximate analysis when A,B,C are not all available as a dataset in one place. (ie. uncooperative non-data-sharing type situation)
      
      2) Analyze A, when B comes in, concatenate the A,B datasets and run the same Stan model (or whatever language you prefer) then when C comes in concatenate A,B,C and run the model again… Even if the data comes in as B,C,A or whatever other order, you will get the same final posterior… This is the full correct Bayesian model run.
      
      The point is that the “proper” Bayesian analysis (2) takes the prior and the full concatenated/unioned dataset and results in a final posterior distribution regardless of the order in which you actually collected A,B,C or if you made decisions like “let’s cut the C data collection in half to save money now that we have A,B and see that it’s giving us almost all the data we need to make a decision”. That’s because Bayesian probability doesn’t model “how often would you get x” it models “how much do you know about x?”
      
      The Frequentist analysis, even in the **full data sharing single final analysis** case which is the best case, asks “what is the probability that you would get a dataset whose test statistic t(A U B U C) is as extreme or more extreme than the one we observed?” and in order to say that, if the data collection is sequential and potentially has decision making or regulatory changes, or supply chain variability in the reagents or is affected by you getting discouraged or your funders removing the funding or … whatever… all of those things change the probability of getting a future dataset with a different t() statistic. Even if you’re willing to ignore some of them, we know for sure that some of them are not ignorable, like in cases involving malaria in Africa or Long COVID / post viral chronic fatigue etc as we’ve seen in the past on this blog.
      
      What is the proper p value for repeating the experiment on the coal policy in China? Do we really think that re-doing that experiment over the next 30 years would lead to the same frequency outcomes that occurred in the 50 years before the coal near the river experiment? If we don’t the p value and the “frequency in infinite trials” are complete and utter fictions in a deep way that makes them highly irrelevant to the scientific question. It’s an angels on the heads of pins type question.
    - Daniel Lakeland on September 1, 2024 5:24 PM at 5:24 pm said:
      
      Matt…
      
      The frequentist analysis fundamentally says that the rubber experiments arise as if they were (in Julia notation)
      
      outcomes = [f(rand(); p1,p2, p3…) for i in 1:n]
      
      for some fixed function f parameterized by some fixed parameters p1,p2,p3 and some secret random seed that only the universe knows. (this is what it means to have a stationary frequency distribution).
      
      The Bayesian analysis says they are [g(state_of_the_universe(t))] where g is largely insensitive to most aspects of the state of the universe and can be approximated as for example [g2(sulfur content, temperature, UV exposure) + g3(other_stuff)] and the information we have about g3() is that for almost all aspects of the possible state of the universe the result will be near to zero to within some ranges which get increasingly less likely the farther we are from zero. (or this at least is typical of the kind of thing Bayes models say)
      
      A lot of people believe they’ve done a “Frequentist” analysis when they build a maximum likelihood model and fit it. They haven’t. They’ve done a Frequentist analysis when they rely on the shape of the distribution of f(rand()) to compute some test statistic t and utilize a p value to determine whether the assumed f would produce data “like” the observed data or not. This fundamentally relies on the *shape* of the distribution of f(rand(); …) and the fact that it is constant through time.
    - Andrew on September 1, 2024 5:38 PM at 5:38 pm said:
      
      Daniel:
      
      Strictly speaking, a frequentist analysis refers to the process of evaluating the long-term properties of some statistical procedure, averaging over possible datasets that could arise, as specified by some model. You can use frequentist analysis to estimate the error rate of a Bayesian estimate, for example. It is not necessary for a frequentist analysis to involve tail-area probabilities or hypothesis testing at all.
    - Daniel Lakeland on September 1, 2024 6:16 PM at 6:16 pm said:
      
      Andrew. Maybe. I think you’re asking about a frequency analysis of the Bayesian procedure, not a Frequentist analysis of a scientific inference problem. Like, if we both want to find out the average breaking cycle count of a fatigue analysis of some rubber, we have a question in which the answer is something like “1554 cycles is the average breaking cycle count” not “this procedure gives an interval that contains the right answer 99% of the time you run it”. The second kind of answer is a different kind of frequency analysis, it’s an analysis of a procedure, which we could identify with for example a computer code. It’s not the analysis of a question about rubber that we undertake after collecting rubber fatigue counts.
Daniel Lakeland on September 1, 2024 10:05 AM at 10:05 am said:

In my mind the difference is immense. In one analysis scheme the mathematical description of the process that you actually believe is involved in the data production can be fleshed out in whatever way you want, and then there’s a universal method to add uncertainty to both the unknowns and the actual data. That’s Bayes.

In the second the assumption is that the world is as if there’s an unknowable sequence of random numbers which generates the data and guarantees the randomness in terms of long run frequencies and that guarantee of frequency stability can be relied upon to design tests which fundamentally only work if the frequency based randomness is true, and these tests can help you determine what region of parameter space the unknown but fixed parameters lie in.

It’s no surprise to me that people who fundamentally think cryptographic random number generators are a good substitute for describing the world leave us with so many bad science papers. In essence they’ve given up on science as a first step. Does anyone really believe that the effect of drugs on the body is best described by a Kolmogorov high complexity sequence of bits transformed through a nonlinear function? That the economy works like that? That educational effectiveness works like that? The flexibility and crack resistance of rubber polymers? Weld strength in structural steel? Hurricane resistance of roofing tiles? Agricultural yields in fertilizer and weed suppression?

Reply ↓
John G Williams on September 1, 2024 11:13 AM at 11:13 am said:

Ram asks why researchers report the results of applying both approaches. My guess is that they do so to satisfy readers who favor one approach or the other. Is this too simple minded?

Reply ↓
Anoneuoid on September 1, 2024 12:44 PM at 12:44 pm said:

You can do Bayesian or frequentist NHST. Both amount to testing an irrelevant strawman hypothesis.

Instead test *your* hypothesis and either method is fine. Usually the threat of systematic error looms so large whatever statistical differences are negligible.

Hilarious (in a gallows humor way) that, rather than doing science, now there are two parallel types of bizarro science to ignore. This crap is going to destroy civilization, just like Fisher and Lakatos warned.

Reply ↓
Simon Gates on September 1, 2024 4:09 PM at 4:09 pm said:

“Hypothesis testing is all about assumptions that something is exactly zero, which does not make sense in any problem I’ve studied.”
I agree, but this statement often gets attacked by significance-testers, with various justifications that things CAN be exactly zero. One I’ve seen is that if you randomised into two groups (and don’t treat the groups differently) then you expect zero difference. But that makes no sense because the whole point of randomising is to then treat the groups differently, and if they receive different interventions then the difference won’t be exactly zero unless the intervention is homeopathy or intercessory prayer or something like that.
I’ve seen clever people make this argument (the one I’m criticising!) which is a bit baffling. Maybe I’m just not understanding something.

Reply ↓
- Andrew on September 1, 2024 4:34 PM at 4:34 pm said:
  
  Simon:
  
  The “steelman” version of the argument is that, even if nothing’s zero, effects can be zero in practice. For example, in some area of application you could define any effect less than 0.01 (on some relevant scale) as effectively zero, and then perhaps much of the hypothesis-testing reasoning will still work out. I think, though, that this sort of effectively-zero reasoning will only work if you’re careful about defining that range, which is counter to the usual approach in hypothesis testing of just trying to reject the null hypothesis and call it a win.
  
  Reply ↓
  - Mathias Berggren on September 2, 2024 8:33 AM at 8:33 am said:
    
    Simon, Andrew:
    
    Also: If I have a composite hypothesis H1: mu > 0, against the null H0: mu <= 0, and I test mu = 0 and find that results are highly unlikely under that assumption, and in the direction of H1, then I know that results are even more unlikely under any other parameter value in H0 (assuming usual simple models, like the normal distribution for observations). Thus, if these results can be treated as evidence against mu = 0, they can also be treated as evidence against any other mu in H0.
    
    This seems to me to be how significance tests are usually treated in practice in social science, at least when there is some theory that guides the testing, as that will usually predict that the parameter is in one direction rather than another.
    
    Of course, one can question if H1 is enough to have as an informative hypothesis. H1 allows mu to be arbitrarily close to 0 here, for example. And one can ask whether H0 is not still often a strawman under this approach. If I have the hypothesis H1: "People who are more sociable will like parties more on average", versus, H0: "People who are more sociable will like parties equally much or less on average", then H0 does not seem like anything any theory or perspective would seriously predict, so it does not seem very informative to test it.
    
    Reply ↓
    - Dale Lehman on September 2, 2024 9:00 AM at 9:00 am said:
      
      I’m not entirely following your logic, but I think the idea of seeing whether people who are more sociable will like parties more on average is a ridiculous thing to be asking – but I would agree it is the type of question researchers often are asking. The “how much” question is what matters, not whether or not the directional impact is something you are willing to say. Economists are often guilty of being satisfied with a directional conclusion: e.g., do increases in price reduce the quantity demanded? Do increases in the minimum wage lead to more unemployment? Without saying anything about how large the impact is, I don’t see any value in being able to say the direction of the impact (aside from the fact that a binary yes/no answer is never of interest to me, since the answer is always maybe).
      
      So, regarding your last sentence, I’d say it does not seem very informative to test any null hypothesis, whether it is one or two sided. That doesn’t mean that the ingredients in such a test are uninformative, only that the binary conclusion tells us nothing.
    - Andrew on September 2, 2024 9:07 AM at 9:07 am said:
      
      Dale:
      
      Yeah, I’m reminded of the classic finding, “Participants reported being hungrier when they walked into the café (mean = 7.38, SD = 2.20) than when they walked out [mean = 1.53, SD = 2.70, F(1, 75) = 107.68, P < 0.001].”
    - Mathias Berggren on September 2, 2024 9:23 AM at 9:23 am said:
      
      Dale:
      
      I agree that H1 is probably not all that informative in most cases when formulated this way. My point though was mainly that researchers often do not “just” test mu = 0, but a set of parameters that do not conform to their hypothesis. It would of course be more informative if we had, say, H1: mu > 1, and H0: mu <= 1.
      
      However, I do think there is at least one time when it can be fairly informative with a directional hypothesis: When there is some other theory that does predict mu <= 0 (or mu < 0). Not that the examination should end with that test of course.
      
      That is why I lifted the sociable-example: Because that is one time when there does not seem to be much theory that would predict the other way. (And because it seems to be what is sometimes done in personality psychology, like when the existence of an Extraversion factor of personality is considered supported due to a positive relationship between sociability and liking parties, although similar predictions could come from e.g. a network perspective on personality.)
    - Dale Lehman on September 2, 2024 11:17 AM at 11:17 am said:
      
      Mathias
      I think we are in agreement, but I’d prefer not to use NHST to distinguish between theories in your example where it could go either way. Knowing which direction the effect is in is probably more dangerous than not knowing. Given the number of other factors that could confound the result and the myriad forking paths, using a directional finding to reject some theory that has the opposite effect sounds like a bad idea to me. Also, since most effects will be positive for some people and negative for others, just knowing which effect is likely to be larger (without saying anything about by how much) seems similarly dangerous.
- Christian Hennig on September 4, 2024 8:07 PM at 8:07 pm said:
  
  My defence of null hypotheses of this type would always be that frequentist models are not about how reality is anyway, but they are tools for thinking, and the use of the point null hypothesis tool is that if the data seem compatible with it, you’d have a hard time using them to convincingly argue something else. This argument does not require the possibility that the null model is really true, not even approximately (the latter would, by the way, require a definition of what “approximation” exactly means, and with different definitions one may come to different conclusions in many cases).
  
  Reply ↓
  - Andrew on September 4, 2024 8:19 PM at 8:19 pm said:
    
    Well put!
    
    Reply ↓
  - Daniel Lakeland on September 4, 2024 9:19 PM at 9:19 pm said:
    
    I feel like there’s a distinct philosophical difference between your view expressed here which is something like “this bog simple random model is not inconsistent with the data therefore if you have an alternative model it should be in some sense “dramatically better” to justify accepting it” compared to the view sometimes expressed in areas like pharma regulations where “the method has a ‘frequentist guarantee’ that only 1 in 20 confidence intervals fails to contain the true real world value”
    
    This “guarantee” stuff is often taken to be a fact about the world: not a “tool for thinking” but a tool for guaranteeing a small number of real errors in a large number of real drug trials.
    
    It is of course nothing of the kind. The logical mistake is to assume the frequentist model is true and correct but missing only a mean, or a couple other parameters.
    
    Your “tool for thinking” is right, we could certainly compete two models in a Bayesian comparison, one that looks at no covariates and just predicts randomness around a typical result, and one that uses more knowledge. If the more knowledge does better at predicting it will win out in such a comparison.
    
    I have no beef with “does this data look unusual from a random process of type X”; it’s inference of the type “since we are sampling from a random process of type X, what are the parameters we should use?” that bothers me, as it begs the question (in the philosophical sense of that phrase).
    
    Reply ↓
    - Christian Hennig on September 5, 2024 5:49 AM at 5:49 am said:
      
      Yeah, I think we have little disagreement here. I do believe that my interpretation of frequentism is philosophically nonstandard, see what I had linked before but I do it once more: https://arxiv.org/abs/2007.05748
      
      How to regulate clinical trials is a very complicated issue. As “all models are wrong but some are useful”, I do think that it can make sense to have regulations that refer to models that are not “true”, but of course when deciding how exactly they should look like, this better be acknowledged. (I’m not working in clinical trials myself, so my qualification for discussing this is somewhat limited.)
    - confused on September 5, 2024 1:37 PM at 1:37 pm said:
      
      >>I do think that it can make sense to have regulations that refer to models that are not “true”,
      
      Sadly, it’s often necessary. And not just frequentist vs Bayesian issues.
      
      If you are doing environmental stuff where you are extrapolating from effects on a small population of rodents at high dose, to a huge human population at orders of magnitude lower dose, the risk levels you get are hugely dependent on your model for the extrapolation … but you have to have some model (probably wrong, because it is not constrained by any real data in the ranges of interest) to choose any level at all. (Well, other than purely arbitrarily.)
Jesse O. on September 1, 2024 10:29 PM at 10:29 pm said:

> I [don’t] see a deep philosophical divide between Bayesian inference and classical significance testing.

I think the philosophical divide is indeed deep. The two traditions arise out of very different philosophical commitments about what probability statements mean. It might be better to say that the divide is philosophically deep but methodologically shallow.

Reply ↓
Shravan Vasishth on September 2, 2024 2:41 AM at 2:41 am said:

For me, the divide is insurmountable. One reason is that in Bayes we are talking about the uncertainty of a parameter of interest; that is something we cannot talk about in the frequentist world, even though it is actually of crucial interest to us. In the frequentist world, we can only talk about the uncertainty of the sampling distribution of the parameter, under entirely imaginary sampling.

Another thing that distinguishes Bayes from the frequentist methodology for me is that we can build on what we know already in Bayes (through prior specification). That is what we do in science anyway. Why do we want to give up incremental knowledge acquisition by starting from scratch each time?

Another unique aspect of Bayes that frequentism has no way to mimic is that we can impose regularization on parameters for which we have insufficient data to get reliable MLEs, but which we nevertheless want in the model. I’m thinking of all those convergence failures in lmer, and all the shenanigans we have to engage in in frequentism to get past the problem of +/-1 correlations on random effects.
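A far simpler analogue of the boundary problem (this is not the lmer situation itself, just a sketch of the same mechanism) is estimating a proportion from very few trials. The counts and the Beta(2, 2) prior below are assumptions chosen for illustration. With no successes or all successes the maximum likelihood estimate sits exactly on the boundary, while a weakly informative prior keeps the estimate inside the interval and has almost no influence once there is plenty of data.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a binomial proportion."""
    return successes / trials


def posterior_mean(successes, trials, a=2.0, b=2.0):
    """Posterior mean under a Beta(a, b) prior, which is conjugate to the binomial."""
    return (successes + a) / (trials + a + b)


# Hypothetical (successes, trials) pairs: two tiny samples and one larger one.
for s, n in [(0, 5), (5, 5), (50, 100)]:
    print(f"{s}/{n}: MLE = {mle(s, n):.3f}, posterior mean = {posterior_mean(s, n):.3f}")
```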

Finally, writing out models in languages like JAGS and Stan helps the researcher to actually think about the generative process. In frequentist modeling, the pull is strong to just take a canned model and somehow try to squeeze out some information from it. You then get ridiculous things like using probability as a dependent measure and then assuming that you can just treat probability as coming from a Normal distribution, leading to conclusions that you can get greater than 100% probability; or treating reading time as normally distributed, leading to models that generate negative reading time data. Frequentist tools seem to encourage madness.
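The negative reading time problem is easy to see by simulating from the assumed model. The numbers below (a mean of 300 ms and a standard deviation of 200 ms) are hypothetical and are not taken from any study; the lognormal alternative is just one common positive-valued choice, matched here to the same mean and standard deviation.

```python
import random
from math import log, sqrt
from statistics import NormalDist

# Hypothetical reading-time summary, in milliseconds.
MEAN_MS, SD_MS = 300.0, 200.0

# Share of the Normal model's probability mass that falls below zero.
p_negative = NormalDist(MEAN_MS, SD_MS).cdf(0.0)

# Lognormal parameters matched to the same mean and standard deviation.
sigma2 = log(1.0 + (SD_MS / MEAN_MS) ** 2)
mu = log(MEAN_MS) - sigma2 / 2.0

rng = random.Random(1)  # fixed seed so the sketch is reproducible
normal_draws = [rng.gauss(MEAN_MS, SD_MS) for _ in range(10_000)]
lognormal_draws = [rng.lognormvariate(mu, sqrt(sigma2)) for _ in range(10_000)]

n_neg_normal = sum(x < 0 for x in normal_draws)
n_neg_lognormal = sum(x < 0 for x in lognormal_draws)

print(f"P(reading time < 0) under the Normal model: {p_negative:.3f}")
print("negative simulated times, Normal model:   ", n_neg_normal)
print("negative simulated times, lognormal model:", n_neg_lognormal)
```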

At the same time, for fields like psych, the frequentist logic of the properties of replications seems super important, but that can be brought into Bayesian models as well, by considering what happens when we repeatedly sample and test our hypotheses (e.g., the dance of the Bayes factor, as Daniel Lakens put it). Some kind of intermediate world, taking the best of both, seems important in such fields. Like capitalism with socialist properties.

- Chris Wilson on September 4, 2024 11:08 AM at 11:08 am said:
  
  +1
  Yesterday I was helping out a colleague analyzing a straightforward 2X2 factorial lab experiment using a Gamma GLM implemented in rstanarm. Even without ‘random effects’, being able to readily construct the quantities of interest from posterior MCMC samples is so tremendously liberating compared to all the ad hoc solutions you have to futz with in non-Bayesian methods. For instance, I can readily construct marginal means and contrasts, stratified contrasts, normalized responses, etc., and get full posterior distributions over any and all of them, once I have a satisfactory underlying model fit.
  This throws the emphasis back on careful delineation of the scientific questions and hypotheses themselves, really trying to be clear about what makes sense from prior theory versus what is chasing patterns in the data we happened to observe and so on.
  
  - Shravan Vasishth on September 5, 2024 12:39 AM at 12:39 am said:
    
    I know exactly what you mean; I have experienced that feeling of liberation too.
    
Mark Palko on September 2, 2024 6:10 AM at 6:10 am said:

I know one-tailed tests are horribly out of fashion, but in almost all of the business problems I’ve been asked to perform tests for, there’s a default and a challenger, which suggests a one-tailed test. For example, I want to know if the more expensive option (E) beats the cheaper one (C). From a decision standpoint, it makes no difference whether C>E or C=E; either way I’m going with C.

Not sure where the false assumption is there.

(but everywhere I go two-tailed is the default)
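
To make the default-versus-challenger setup concrete, here is a sketch with invented conversion counts (130/1000 for the expensive option, 105/1000 for the cheaper one) and a pooled two-proportion z-test, which is one common choice among several. It only shows how the one-tailed and two-tailed p-values relate for the same data.

```python
import math
from statistics import NormalDist

# Hypothetical A/B counts: successes and trials for each option.
expensive_successes, expensive_trials = 130, 1000
cheaper_successes, cheaper_trials = 105, 1000

p_expensive = expensive_successes / expensive_trials
p_cheaper = cheaper_successes / cheaper_trials

# Pooled standard error under the null of equal success rates.
pooled = (expensive_successes + cheaper_successes) / (
    expensive_trials + cheaper_trials
)
se = math.sqrt(
    pooled * (1.0 - pooled) * (1.0 / expensive_trials + 1.0 / cheaper_trials)
)
z = (p_expensive - p_cheaper) / se

std_normal = NormalDist()
# One-tailed: the alternative is "expensive beats cheaper".
one_tailed_p = 1.0 - std_normal.cdf(z)
# Two-tailed: the alternative is "the two options differ".
two_tailed_p = 2.0 * (1.0 - std_normal.cdf(abs(z)))

print("z statistic:", round(z, 3))
print("one-tailed p:", round(one_tailed_p, 4))
print("two-tailed p:", round(two_tailed_p, 4))
```

When the observed difference is in the challenger's favor, the two-tailed p-value is exactly twice the one-tailed one, so with these made-up counts the same data fall on opposite sides of a 0.05 threshold depending on which convention is used.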

- Carlos Ungil on September 2, 2024 7:31 AM at 7:31 am said:
  
  One-tailed tests are often used for non-inferiority trials – more by convention than anything else. They are rarely used for superiority trials but it may happen when changing the rules mid-race gets you a winner.
  
kj on September 2, 2024 11:58 AM at 11:58 am said:

I feel the workflow differences between Bayesians and Frequentists are quite huge today. Particularly with model checking/CV + the ability to completely customize your model with Stan, I think modern Bayesians solve problems very differently than Frequentists. This is in no small part thanks to Andrew.

Going back 40 years, though, could small changes have led to the reverse situation, with the modern Frequentist community doing model checking and having more customizable models, while the Bayesians are stuck being doggedly subjective or insisting on well-understood non-informative priors and the like? Or is the Frequentist philosophy inherently constraining? Does the Bayesian philosophy lend itself more towards growth and practical value?

- Andrew on September 2, 2024 12:14 PM at 12:14 pm said:
  
  Kj:
  
  I think the term “frequentist” is just too vague here, as it describes so many different approaches to statistics. For example:
  
  – Anova and hypothesis testing, of the sort that you’ll see in psychology research;
  
  – Regression and confidence intervals, as is standard for causal inference in medicine and epidemiology;
  
  – Estimation and prediction using machine learning, with uncertainty estimates taken from bootstrapping or cross validation, as is common in many areas of business.
  
  All these approaches tend to use computer programs with preprogrammed models, so in that sense you’re right—there’s not a lot of model checking and customization going on. But a user of frequentist machine learning techniques might argue that, by throwing in so many predictors, they’ve removed the need for model building, model checking, and iterative workflow more generally. My impression is that the most popular ideas in frequentist statistical theory involve “super-learner”-type approaches that can automatically adapt to any data and that require minimal human input to fit and interpret. Which makes sense if you’re fitting a model to a zillion data points or if you’re repeating some process a zillion times.
  
  So, yeah, I guess I agree with you that the modern Bayesian and frequentist workflows are different. There’s also a third option, which is neither Bayesian nor frequentist and which one might call the “data science” perspective. I’m thinking here about the tidyverse and I assume similar structures in Python. In this world, the data scientist is active, with a workflow that has many similarities to the Bayesian workflow that my colleagues and I talk about—but it’s a workflow of data rather than models. A data scientist might well fit black-box machine learning models (or black-box anovas and regressions) but then do checking and customization by feeding in different datasets, and playing around with different codings or different methods.
  
fwiw on September 3, 2024 12:21 AM at 12:21 am said:

‘Bayesian inference relies assumptions such as normality and logistic curves’. It really doesn’t rely on those assumptions at all.

RNM on September 3, 2024 1:19 PM at 1:19 pm said:

I often analyze studies both ways. If the results align then I’m more comfortable with the results, and if they don’t it gives me a place to work from to better understand the assumptions that are key for the particular analysis. This isn’t the only way I check the models, but I’ve found it to be helpful.

Another nice property is that in medical research (where I work) frequentist approaches are still the norm, and this often heads off reviewer comments.
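
A small sketch of what "both ways" can look like in the simplest case, using made-up measurements and assuming a normal model with known sigma so that everything is available in closed form. The classical 95% interval for the mean is compared with the posterior interval under a very diffuse normal prior and under a deliberately strong one; the prior settings are hypothetical.

```python
import math
from statistics import NormalDist, mean

# Hypothetical measurements; sigma is assumed known to keep the algebra simple.
y = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]
sigma = 1.0
z = NormalDist().inv_cdf(0.975)  # about 1.96

def classical_interval(y, sigma):
    """95% confidence interval for the mean with known sigma."""
    ybar = mean(y)
    se = sigma / math.sqrt(len(y))
    return ybar - z * se, ybar + z * se

def posterior_interval(y, sigma, prior_mean, prior_sd):
    """Central 95% posterior interval under a conjugate normal prior."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(y) / sigma ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mean(y)) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean - z * post_sd, post_mean + z * post_sd

ci = classical_interval(y, sigma)
diffuse = posterior_interval(y, sigma, prior_mean=0.0, prior_sd=1e6)
informative = posterior_interval(y, sigma, prior_mean=0.0, prior_sd=0.5)

print("classical 95% interval:     ", ci)
print("posterior, diffuse prior:    ", diffuse)
print("posterior, informative prior:", informative)
```

With the diffuse prior the two intervals agree to many decimal places; with the strong prior centered at zero they do not, and that disagreement points directly at the assumption doing the work.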

- Christian Hennig on September 4, 2024 8:13 PM at 8:13 pm said:
  
  Agree. If we run two analyses of the same data that are supposed to tell us the same thing, and they come out differently, we can always use this to learn something about the interplay between the data and the methods. This may or may not be about the assumptions; it can also be that there is a particular feature of the data that causes one analysis to react in a different way than the other, so we can learn something more about our data. (I don’t particularly have Bayes vs. frequentist in mind here; it could also be parametric vs. nonparametric etc.)
  

Statistical Modeling, Causal Inference, and Social Science


38 thoughts on ““My basic question is do we really need data to be analysed by both methods?””
