Stephen Martin writes:

Daniel Lakens recently blogged about philosophies of science and how they relate to statistical philosophies. I thought it might be of interest to you. In particular, this statement:

From a scientific realism perspective, Bayes Factors or Bayesian posteriors do not provide an answer to the main question of interest, which is the verisimilitude of scientific theories. Belief can be used to decide which questions to examine, but it can not be used to determine the truth-likeness of a theory.

My response, TLDR:

1) frequentism and NP require more subjectivity than they’re given credit for (assumptions, belief in perfectly known sampling distributions, beta [and thus type-2 error ‘control’] requires a subjective estimate of the alternative effect size)

2) Bayesianism isn’t inherently more subjective, it just acknowledges uncertainty given the data [still data-driven!]

3) Popper probably wouldn’t like the NHST ritual, given that we use p-values to support hypotheses, not to refute an accepted hypothesis [the nil-hypothesis of 0 is not an accepted hypothesis in most cases]

4) Refuting falsifiable hypotheses can be done in Bayes, which is largely what Popper cared about anyway

5) Even in an NP or LRT framework, people don’t generally care about EXACT statistical hypotheses, they care about substantive hypotheses, which map to a range of statistical/estimate hypotheses, and YET people don’t test the /range/, they test point values; Bayes can easily ‘test’ the hypothesized range.

My [Martin’s] full response is here.

I agree with everything that Martin writes above. And, for that matter, I agree with most of what Lakens wrote too. The starting point for all of this is my 2011 article, Induction and deduction in Bayesian data analysis. Also relevant are my 2013 article with Shalizi, Philosophy and the practice of Bayesian statistics, and our response to the ensuing discussion, and my recent article with Hennig, Beyond subjective and objective in statistics.

Lakens covers the same Popper-Lakatos ground that we do, although he (Lakens) doesn’t appear to be aware of the falsificationist view of Bayesian data analysis, as expressed in chapter 6 of BDA and the articles listed above. Lakens is stuck in a traditionalist view of Bayesian inference as based on subjectivity and belief, rather than what I consider a more modern approach of conditionality, where Bayesian inference works out the implications of a statistical model or system of assumptions, the better to allow us to reveal problems that motivate improvements and occasional wholesale replacements of our models.

Overall I’m glad Lakens wrote his post because he’s reminding people of important issues that are not handled well in traditional frequentist or subjective-Bayes approaches, and I’m glad that Martin filled in some of the gaps. The audience for all of this seems to be psychology researchers, so let me re-emphasize a point I’ve made many times, the distinction between statistical models and scientific models. A statistical model is necessarily specific, and we should avoid the all-too-common mistake of rejecting some uninteresting statistical model and taking this as evidence for a preferred scientific model. That way lies madness.

Here is my post, explicitly addressing the OP author’s position (for some context, we debate on FB often ;))

Daniel Lakens: Let me first apologize for my previous comments, if they were a bit snarky ;-) I am now on a computer, so will respond more thoroughly and thoughtfully. I have been talking a lot about assumptions (here, twitter, and at my University). What does this mean? Well, of course, we have the assumptions of normality, homogeneous variances, etc. We also have the assumption of a null sampling distribution, repeated sampling, etc. Added to this, as Stephen Martin pointed out, an exact value (d_truth) is often assumed for the hypothesized effect. Now one might say the assumed value for d_truth is a belief, but I would not. Rather, it is an estimate informed by prior information. Conceptually, this is incorporating “prior” information into calculating a long-run expectation (e.g., power).

That said, I am very unclear how, for a given hypothesized effect (d_truth), you think we can know exact error probabilities. This would require many things to be true that we can never verify. By definition, the frequentist assumption is that the true effect is a fixed unknown value. Consider we had an unbiased meta-analytic estimate, d = 0.3, 95% CI [0.10, 0.40]. We may reason that d = 0.3 is our best estimate of the true effect, but, in fact, the interval contains values we would NOT reject at the alpha = 0.05 level. Accordingly, power and thus type II error for d = 0.30 cannot be exactly correct, even assuming all other assumptions are reasonable. In a Bayes context, we may reason d_truth does precisely exist, but we could compute “power” for some decision rule across the entire posterior distribution. Of course, this is also not exactly correct, but it does seem more reasonable, as it allows for a measure of uncertainty around detecting d_truth.
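That “power across the posterior” idea (sometimes called assurance, or hybrid/Bayesian power) can be sketched in a few lines. The sample size, effect size, and the N(0.30, 0.08) summary of uncertainty below are all made-up illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n = 50          # per-group sample size (hypothetical design)
alpha = 0.05
n_sims = 2000

def rejects(d_true):
    """Simulate one two-group study; True if the t-test rejects at alpha."""
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(d_true, 1.0, n)
    return stats.ttest_ind(x, y).pvalue < alpha

# Classical power: plug in a single point value d_truth = 0.30.
power_point = np.mean([rejects(0.30) for _ in range(n_sims)])

# "Power across the posterior": draw d_truth from a distribution expressing
# our uncertainty (here a made-up N(0.30, 0.08) posterior/meta-analytic summary).
d_draws = rng.normal(0.30, 0.08, size=n_sims)
assurance = np.mean([rejects(d) for d in d_draws])

print(f"power at d = 0.30:             {power_point:.2f}")
print(f"power averaged over posterior: {assurance:.2f}")
```

The second number is a single probability that already averages over how wrong the point value d = 0.30 might be, which is exactly the “measure of uncertainty around detecting d_truth” described above.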

About falsification. Like many who use Bayes, we probably all learned frequentist methods first. In pursuing Bayes, the training–often self-directed–comes with a heavy dose of the history and philosophy of science and especially statistics. You appeal to Popper, which is great. However, to my knowledge, Popper is best known for falsification, but not for falsifying the null hypothesis (please let me know if this is wrong). In Bayesian books, it often comes up that NHST probably runs contrary to what Popper had in mind. That is, the original intent was to falsify our hypothesis, not the null hypothesis. Along with Stephen Martin, Richard McElreath makes this point nicely in the Rethinking book.

Moreover, I would argue that the way Bayes is often used in practice (at least by those who follow Gelman et al.’s approach) does fit into Popper’s notion of falsification. If we posit that our model is our hypothesis for the data generating process, then the logical step is to attempt to falsify our model. This is done with posterior predictive checks, and is part of the standard workflow for Bayesian modeling (see Gelman’s book). That is, for many Bayesians, this is done for every model. To my knowledge, this kind of checking is not commonly done in frequentist frameworks, although I am sure something similar could be achieved.
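For readers who haven’t seen one, a posterior predictive check can be sketched minimally as follows. The data and model here are invented, and the fixed-sigma “posterior” is a deliberate crude simplification of a full Bayesian fit; the point is only the logic: simulate replicated data from the fitted model and ask whether a statistic of the real data looks plausible among the replications:

```python
import numpy as np

rng = np.random.default_rng(7)

# "Observed" data that are actually skewed, while the model assumes normality.
y = rng.lognormal(mean=0.0, sigma=0.6, size=100)
n = len(y)

# Crude stand-in for a posterior: sigma fixed at the sample SD, mu given its
# approximate posterior under a flat prior.
sigma = y.std(ddof=1)
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=4000)

# Posterior predictive replications, compared on a statistic sensitive to skew.
T_obs = y.max()
T_rep = np.array([rng.normal(mu, sigma, n).max() for mu in mu_draws])
ppp = np.mean(T_rep >= T_obs)

print(f"posterior predictive p-value for max(y): {ppp:.3f}")
```

An extreme posterior predictive p-value flags the normal model as a poor description of the data generating process, which is the falsificationist move described above: the model, not a nil null, is what gets put at risk.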

Finally, let us move to some slightly more complex models. That is, let’s not think of comparing two groups, but of a multi-level model, in which slopes and intercepts can vary among individuals. In addition to the error variance (e_var), we have the “random” intercept (u_0) and “random” slope (u_1) variance to consider. Here, it has been noted by many that computing power (and thus type II error) is very difficult, because we have to assume values for e_var, u_0, and u_1. Of course, we can and I do, but I also appreciate that these values cannot be considered exactly correct at best, and might be very misleading at worst. One solution might be to define d_truth not by a point estimate, but as a random variable in which our uncertainty is expressed as a probability distribution.
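One way to see how those assumed variance components drive the answer is simulation. A toy sketch, with every population value made up, and with a simple two-stage analysis (per-subject OLS slopes, then a one-sample t-test across subjects) standing in for a full mixed-model fit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed (and admittedly uncertain) population values:
beta1 = 0.30     # average slope of interest
u0_sd = 0.50     # SD of "random" intercepts
u1_sd = 0.20     # SD of "random" slopes
e_sd  = 1.00     # residual SD
n_subj, n_obs = 30, 20
alpha, n_sims = 0.05, 1000

x = np.linspace(-1, 1, n_obs)

def one_study():
    # Per-subject OLS slopes, then a one-sample t-test across subjects
    # (a crude stand-in for fitting the multilevel model directly).
    slopes = []
    for _ in range(n_subj):
        b0 = rng.normal(0.0, u0_sd)
        b1 = rng.normal(beta1, u1_sd)
        y = b0 + b1 * x + rng.normal(0.0, e_sd, n_obs)
        slopes.append(np.polyfit(x, y, 1)[0])
    return stats.ttest_1samp(slopes, 0.0).pvalue < alpha

power = np.mean([one_study() for _ in range(n_sims)])
print(f"simulated power: {power:.2f}")
```

Changing u1_sd or e_sd changes the answer substantially, which is the point: the power estimate is only as good as these assumed values. Drawing beta1 (and the variance components) from distributions rather than fixing them, as suggested above, is a one-line change to the simulation.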

I thought your post was very intriguing. However, I use Bayes often (maybe daily) and I do not see the philosophy you describe reflected in how I–or anyone else I work with–use Bayes in practice. There is a big disconnect here, and I think it might be worthwhile for you to check out Gelman’s and/or McElreath’s books to gain a better understanding of modern Bayesian data analysis.

Andrew,

Thanks again for the swift response!

I think the ‘subjective’ nomenclature should go away, much as you suggest in your linked papers. Even ‘prior’ can be misleading in modern Bayes. I tend to describe the prior not as subjective, or even necessarily based in empirical observations, but as either soft, probabilistic constraints, or as a way of encoding information into the model. That information can be quite objective (based on previous estimates), or it may serve to identify a model, or several other purposes.

I find it really intriguing that Popper’s and Fisher’s concepts are so readily abused in the modern day; honestly, it seems as though if one is Fisherian and Popperian, so to speak, then one’s null hypothesis /should/ be the hypothesis to be tested. That is, rather than ‘nil’ null hypotheses, it seems like the null should represent something substantive that one is actively attempting to disprove from a theoretical standpoint. No one seems to do this [contemporaries, anyway]. Moreover, I was especially intrigued by these two things:

1) NP-inspired model comparisons pit one hypothesis against another. It seems like if one is planning a study with an ‘alternative’ fixed point hypothesis in mind, then one should actually pit the ‘null’ against the a-priori alternative; that is, if one’s null is theta = 0 and one’s alternative is theta = .4 (when conducting an a-priori power analysis, for instance), then one’s model comparison should be p(data | theta = 0) vs p(data | theta = .4). It seems weird to me, now, from an NP standpoint to plan a study with an a-priori point hypothesis of theta = .4, obtain an estimate, then do a test with p(data | theta = 0) and p(data | theta = estimate). I’m not sure what your thoughts are on this, but from a purist NP perspective, the former seems to make more logical sense than the latter.

2) NP-inspired hypotheses don’t often take point estimates, but ranges; e.g., theta = 0 vs theta > 0. In this case, it seems like one shouldn’t, again, compare a point estimate to another point estimate, but a range of point estimates to one point estimate; or a range of estimates to another range of estimates, each range dictated by a substantive hypothesis.

These two things came to mind, because Popper doesn’t often discuss /statistical/ hypotheses or null hypotheses, but rather the falsification of substantive hypotheses. IMO, people ignore this distinction in the NHST framework. Substantive: “I think A predicts B.” Statistical: “r = .4,” “r \in (.1, .5],” “r != 0.” Popper rarely talked about the latter. This is important, because if Popper were talking solely about statistical hypotheses, then people perhaps /should/ use null hypotheses (where the null is actually the hypothesis of interest to be subjected to a severe test, if possible). On the other hand, if Popper were talking about rejecting substantive hypotheses (which, it seems like he was, fairly consistently; his example hypotheses and predictions seemingly never included some point value or range of values, but logical methods of ruling out statements), then he cares much more about gaining support against or for hypotheses when one actively seeks data that can rule out hypotheses. In this latter case, exactly /how/ one gains evidence for or against a hypothesis doesn’t seem to matter. Whether one uses BFs, p-values, CIs, PPCs, model comparisons, whatever, the goal is to assess the extent to which a substantive hypothesis mismatches the data, and the substantive hypothesis may be expressed as statistical hypotheses which the data either match or mismatch. It’s for this reason that I completely fail to see how Popper would be against any modern Bayesian method. Whether one is Fisherian, NP, likelihoodist, Bayesian, non-parametric, if one can map a falsifiable substantive hypothesis to a quantity or range of quantities, and one’s hypothesis-inspired model fails to match that quantity or range of quantities, then surely one could say that the hypothesis is unsupported and requires revision.
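Both of the two numbered points above can be sketched numerically. In this toy Python example every specific (n = 40, theta = .4, the flat prior, the (.1, .5] range) is a made-up illustration, not anyone’s actual analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical study: standardized mean, sigma known = 1, n = 40,
# data generated under theta = 0.4.
n = 40
y = rng.normal(0.4, 1.0, n)
ybar, se = y.mean(), 1.0 / np.sqrt(n)

# (1) Pit the two PRE-SPECIFIED point hypotheses against each other,
# rather than the null against a post-hoc estimate:
lik_null = stats.norm.pdf(ybar, loc=0.0, scale=se)
lik_alt = stats.norm.pdf(ybar, loc=0.4, scale=se)
lr = lik_alt / lik_null
print(f"LR(theta = .4 vs theta = 0): {lr:.1f}")

# (2) "Test" a hypothesized RANGE: under a flat prior the posterior for
# theta is N(ybar, se^2), so the substantive hypothesis theta in (.1, .5]
# gets a direct posterior probability.
draws = rng.normal(ybar, se, size=10000)
p_range = np.mean((draws > 0.1) & (draws <= 0.5))
print(f"Pr(.1 < theta <= .5 | data) ~= {p_range:.3f}")
```

The second computation is the Bayesian “test of the hypothesized range” from point 5 of the TLDR: the range comes from the substantive hypothesis, and the posterior assigns it a probability directly.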

I’ve also recently given a lot of thought to the notion of Occam’s razor and /statistical/ models. It seems to me like the important part of theoretical parsimony is to explain as much as you can with as few conditions as possible. I don’t think this necessarily applies to statistical models, or at least to some components. E.g., I suspect that many would fear the idea of hierarchically modeling variances, or of permitting heteroskedasticity, because the statistical model is not as parsimonious (more parameters); and yet, one’s theory doesn’t actually dictate, usually, that variances must match some condition. With that mindset, it seems to me that permitting more parameters that are irrelevant to one’s theory is not a breach of parsimony, but rather decreases the number of conditions one is asserting. That is, one camp would see more parameters [irrelevant to theory] and fear a lack of parsimony; another camp would see more parameters [irrelevant to theory] and assert that they are making fewer assumptions about the DGP, and thus are being more parsimonious in assumptions. I’d love to hear your input on this: what is important about parsimony in statistical models… fewer parameters = more parsimonious (in an Occam’s razor sense), or more parameters (fewer assertions/constraints about the DGP) = more parsimonious (in an Occam’s razor sense)?

Some stray remarks written on the fly:

“3) Popper probably wouldn’t like the NHST ritual, given that we use p-values to support hypotheses, not to refute an accepted hypothesis [the nil-hypothesis of 0 is not an accepted hypothesis in most cases]”

Popper was well aware of Fisherian testing and appealed to it as the basis for his account of statistical falsification. Popper was weak when it came to statistics (he told me this). That is, he was unable to cope with the obvious need for statistical falsification without Fisher. What is anathema to Popper is committing the age-old fallacy of inferring a theory T based on just a statistical effect, even if genuine. That’s why Lakatos, having heard the NHST horror stories from Meehl, wondered if the social sciences were just so much pseudo-intellectual garbage (or some words very close to those). On the other hand, Lakatos saw in Neyman-Pearson statistics the embodiment of Popperian methodological falsification (i.e., the probability is assigned to the method).

“Refuting falsifiable hypotheses can be done in Bayes, which is largely what Popper cared about anyway.”

I don’t see that falsification is accomplished or desired in Bayesian probabilism (updating or Bayes factors), but any account can be made to refute falsifiable hypotheses by adding a falsification rule, and all accounts that falsify have falsification rules. The hard part is showing it’s a good falsification rule–merely spotting anomalies and misfits isn’t enough. The method needs good capability of falsifying claims just when they’re (specifiably) false, and of corroborating claims just when they fail to be falsified by a stringent test.

Lakens on Popperian verisimilitude: This was a degenerating problem that has been rejected by Popperians.

Popper’s goal, to define closeness to the (whole) truth, was a failure, as were various other measures of distance from the whole truth that have been tried. It was intended to help explain why false theories are partially successful. (The hypothetical-deductivist corroborates the whole of the theory–they are, after all, deductivists.) Dyed-in-the-wool Popperians like Musgrave* have abandoned it in favor of locating aspects of theories that are true. How to do it (i.e., figure out which aspects of theories to hold true) was/is a central problem in the philosophy of science, and solving it was one of the central goals of (my) severity project. The idea is that the portions (or variants) of a theory that are true are those that have passed severe tests. Moreover, I argue, those parts remain through theory change, even through redefinitions, changed entities and all manner of underdetermination.

> where Bayesian inference works out the implications of a statistical model or system of assumptions, the better to allow us to reveal problems that motivate improvements

In the context of the habits of practice currently instilled in a research community – e.g., the preference for underpowered studies and near-complete inattention to intervals that cross zero (or nearly do) – the uniform coverage property of confidence intervals, a priori taken as very good, can actually be horrible.

(off topic) There’s a link to Thomas Bayes’ theological work on Marginal Revolution today

http://marginalrevolution.com/marginalrevolution/2017/06/true-thomas-bayes.html#comments

It’s interesting that Bayes, a Presbyterian minister, was likely a heretic by traditional Christian notions (an Arian).

It’s also interesting that in the Bellhouse paper linked to

http://www2.isye.gatech.edu/~brani/isyebayes/bank/bayesbiog.pdf

we see, in section 5, the importance of various views of the Trinity in the religious politics of that time — interesting since one’s view of the Trinity has little influence on one’s actual behavior, and exists only in a theological ether of apologia.

“The idea is that the portions (or variants) of a theory that are true are those that have passed severe tests. Moreover, I argue, those parts remain through theory change, even through redefinitions, changed entities and all manner of underdetermination.”

GS: Why not drop all pretense of scientific “truth”? This entails (it seems to me – I’m not an academic philosopher…not that I’m saying that most philosophers are worth much, but they sure do have to read a lot to get that Ph.D.) dropping all talk about “reality” and science as approximation of a “true reality.” Truth criteria are, essentially, replaced with prediction and control as values, and the essence of science is, thus, obtaining facts that are reliable and general. This doesn’t mean that there is no theory, but often it means that there need not be any theory about “events taking place elsewhere and measured, if at all, in different dimensions [than the phenomenon in question].”

“Truth criteria are, essentially, replaced with prediction and control as values and the essence of science is, thus, obtaining facts that are reliable and general. This doesn’t mean that there is no theory, …”

I think this is a very good point, worth careful, thorough consideration. At least some pragmatist philosophers of science take more or less exactly this approach, arguing that theories are evaluated with respect to, e.g., explanatory scope, how well they allow us to intervene and control nature, whether they make surprising predictions, how consistent they are with theories in neighboring fields, etc. Personally, I find this approach compelling.

I think it’s also interesting to think about belief and reality in this kind of pragmatist framework, since even if we don’t (or can’t) evaluate theories based on their verisimilitude, individual scientists (and non-scientists) still do or do not believe particular theories. And even if I don’t think we can ever know if any given theory is true, I believe that there are facts of the matter.

NM: I think it’s also interesting to think about belief and reality in this kind of pragmatist framework, since even if we don’t (or can’t) evaluate theories based on their verisimilitude, individual scientists (and non-scientists) still do or do not believe particular theories. And even if I don’t think we can ever know if any given theory is true, I believe that there are facts of the matter.

GS: Well, interestingly, the sort of pragmatism (or “contextualism”) that I advocate doesn’t preclude truth – it’s just that you could never know you had it in hand. Truth would mean that the verbiage in question was the most effective possible in predicting and controlling Nature. Now, that’s pretty much what you seem to be saying, though. As to *belief*…I’m not exactly sure what you are saying…I think you may be trying to imply more than I’m able to say. But, basically, the terms here (“belief” and “reality”) can be understood (like the term “understood”!) by sticking to the contextualist position. The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis. Anyway…consider the last part of your last sentence: “I believe that there are facts of the matter.” The question is “How do you predict and control whether someone says ‘belief’ or ‘believes’ etc. or ‘this is a fact?’” The things you look at to do that are the independent variables of which the speaker’s verbal behavior is a function. In that sense, there is nothing paradoxical about “…believ[ing] that there are facts of the matter” and “believing that science is a matter of prediction and control, not correspondence to ‘truth’” (and, I guess, it seems to me that that is sort of what you were implying). Talk of beliefs and facts is intimately tied to the circumstances under which we predict and control the One World (whatever that may be!) and that is what we are “talking about” when we talk about “truth.” I’m sure that’s clear as mud…

I think we (mostly? completely?) agree. I definitely agree with the point that we can never know if we have the truth in hand.

GS: not correspondence to ‘truth’

KO: Certainly not in any way adequately assessed correspondence to ‘truth’ but rather a desperate hope that there is some correspondence or a _regulative_ hope that continuing inquiry if persisted in adequately and inexhaustibly would result in correspondence to ‘truth’.

Otherwise we give up on better “predict[ing] and control[ing] the One World (whatever that may be!)” or on science as something worth doing. That is, we don’t just want to discern and report what did happen in an experiment, but rather what would continue to happen in x% of cases in future experiments – which presupposes the “One World” we have no direct access to.

GS: The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis.

KO: ??? – not that helpful to me or likely other statisticians.

The challenge is we need a thoughtful meta-statistics that inspires, informs and encourages thoughtful applications of statistics that enable better habits of inquiry in various scientific fields.

Not sure I would be convinced that we can get that from the “later W.”

GS: not correspondence to ‘truth’

KO: Certainly not in any way adequately assessed correspondence to ‘truth’ but rather a desperate hope that there is some correspondence or a _regulative_ hope that continuing inquiry if persisted in adequately and inexhaustibly would result in correspondence to ‘truth’.

GS (new): Not sure I follow the above sentence. But…it seems to me that you are trying to turn a conceptual question into an empirical one. That’s done a lot. No amount of experimental work can corroborate or cast a shadow on a conceptual question, for such questions are, you know, neither questions of fact (begging the issue) nor questions of theory.

KO: Otherwise we give up on better “predict[ing] and control[ing] the One World (whatever that may be!)” or on science as something worth doing.

GS (new): That seems an outlandish statement that is neither an alleged statement of fact nor a conceptual or theoretical statement.

KO: That is we don’t just want to discern and report what did happen in an experiment,[…]

GS (new): Geez. Now you’re just makin’ stuff up. How does anything I said suggest that the activities of a scientist would be limited to “discern[ing] and report[ing]”? However, I would say that, speaking a bit technically, verbal behavior that is under stimulus control of features of the world (and, no, that doesn’t concede the so-called “real world,” if that is supposed to mean “knowable independent of data”) is the very backbone of science. Indeed, science grows out of the persistence of our cultures in establishing such verbal behavior, which I would call “tacts” (cf. Skinner) – verbal responses under stimulus control of features of the world* and relatively free of conditions of deprivation and aversive stimulation…i.e., “motivational operations”.

*Explicitly mentioning stimulus control shows the difference between the “one world” and the so-called “real world,” if I may be allowed to adopt that terminology. The “one world” is about what can potentially control behavior. The one-world is the antithesis of an independent (you know…since it is about stimulus control of behavior) world, as well as of the wishy-washy “internal copies” (i.e., representations, computations, etc.) of indirect realism (of which you are almost certainly a member).

KO: […]but rather what would continue to happen in x% of cases in future experiments – which presupposes the “One World” we have no direct access to.

GS (new): Wow! You must do magic experiments if you can assess the reliability and generality of data with a single experiment. Now, I would argue that, with SSDs, the case for some amount of reliability can be assessed with one experiment as such experiments generally contain multiple within- and between-subject replications. This is no doubt why some fields do not suffer from much failure to replicate. Such sciences generally start very simple and move up the complexity ladder in fits and starts (I’m talking about experimental sciences here). Anyway, the reliability and generality (obviously generality) of a single data set can only be assessed by direct and systematic replication (but, again, the caveat that with SSDs and reversible phenomena, each experiment constitutes several but, alas, all within one lab).

GS (new): The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis.

KO: ??? – not that helpful to me or likely other statisticians.

GS (new): Yes…well…ahem…you have taken the above statement by me out of context, and your response seems to be about some other, perhaps fantasized, context. I was talking about how utterances concerning “belief,” “truth,” “reality” etc. can be understood from within a contextualistic framework.

KO: The challenge is we need a thoughtful meta-statistics that inspires, informs and encourages thoughtful applications of statistics that enable better habits of inquiry in various scientific fields.

GS (new): Are you running for political office or something?

KO: Not sure I would be convinced that we can get that from the “later W.”

GS (new): And I’m pretty sure that you are not talking about the context of verbal behavior as *meaning* – you know, the one thing for which later W. is most widely known? You know, “The meaning of a word is its use”?

GS: not talking about the context of verbal behavior as *meaning* … You know, “The meaning of a word is its use”?

KO: Correct – as later W. prefaced that comment “For a large class of cases – though not for all …”

Again my interest is almost exclusively in applying statistics more thoughtfully.

GS: not talking about the context of verbal behavior as *meaning* … You know, “The meaning of a word is its use”?

KO: Correct[…]

GS (new): Right…problem is, I *was* writing about the use of terms like “belief,” “reality” etc. The reason is that if you take a contextualistic view of science, you talk about prediction and control, and you are inevitably asked “Prediction and control of what?” Oh! The world? But if there’s a world, then things you say may or may not correspond to it, i.e., some things are “true” and some not. And so forth…

It was Noah that brought up the beliefs of scientists about truth. I just briefly addressed the fact that such utterances can (and must) be subject to conceptual analysis and that such a conceptual analysis could proceed via a contextualistic view. I saw Noah as saying that there was a sort of paradox – you can talk all you want about prediction and control substituting for truth criteria but, c’mon, you still believe that you are progressively approaching truth. The point is that there is no paradox because evidence of truth, even among laypeople, is prediction and control. IOW, emission of the word “truth” is *occasioned by* the successful prediction and control of the world. A similar condition prevails WRT the emission of terms like “the world” or “the real world.”

But the important aspect of what I had to say concerns the implications of adopting a contextualistic view of science, rather than the standard reductionism/mechanism that is often treated as the only reasonable view. The tension between contextualistic views and mechanism exists today, and has existed since a point in time that Prigogine puts at Fourier’s treatment of heat transfer in metals. The epitome of mechanism is reflected in (I think) Rutherford’s famous “all science is physics or it is stamp-collecting.” Lakeland expresses this view – unless it’s quantum mechanics, it can’t be “truth.”

KO: – as later W. prefaced that comment “For a large class of cases – though not for all …”

GS: Funny that those who don’t like how W. “rubs out” explanatory fictions always point to this line…but time after time in Philosophical Investigations W. does just that. And, to a great extent, the apex of this view is the philosophy called radical behaviorism and the science called behavior analysis.

KO: Again my interest is almost exclusively in applying statistics more thoughtfully.

GS: Mine too, I guess, especially since “statistics” pretty much has to do with everything from graphs to complex mathematical treatments of data – look up the definition. In that sense, statistics is part of all of science and always has been. But a great deal of discussion on this blog is about science in general (when most here can spare the time from attacking the very worst part of mainstream psychology), and that is what I was talking about. Otherwise, I don’t see what your comments have to do with much that I wrote about – unless you’re saying that the assumptions (colloquially speaking, of course – “assumptions” are more explanatory fictions) that underlie science in general don’t matter for the conduct of science.

Glen:

I was trying to make sense of your comments drawing on my background from mostly CS Peirce.

> the apex of this view is the philosophy called radical behaviorism and the science called behavior analysis

Looking into that led me to “Skinner (1979) singled out C. S. Peirce’s pragmatism as “very close … to an operant analysis”: The method of Pierce was to consider all the effects a concept might conceivably have on practical matters. The whole of our conception of an object or event is our conception of its effects. That is very close [emphasis added], I think, to an operant analysis of the way in which we respond to stimuli. (p. 48).” http://psycnet.apa.org/journals/bar/5/1/108.pdf

This might help – thanks.

The type of claim that may be corroborated will differ according to the evidence and field. In some areas it may be limited to experimental adequacy (i.e., T solves its experimental problems adequately), in others, it may well be limited to prediction, but prediction alone is a very poor basis on which to compare theories or other scientific claims.

Lakens is very confused and just spreading his confusion around. The irony is that he wants to promote better scientific practices, but he’s just regurgitating the very same ideas that took us to where we are now.

What are the “same ideas that took us to where we are now” that Lakens is regurgitating, poor confused thing?

There are several problems; his main problem is that he doesn’t understand probabilistic inference. Not long ago he was promoting equivalence testing as a solution to NHST without realizing it suffers from most of the same problems.

But we can start with the fact that he thinks frequentist hypothesis tests control error rates and provide a basis for deciding among theories.

To see that this is not true, let’s take the very same example of words and incongruent colors he gives. What would be your/his ideal way of doing science there? NHST?

Stephen:

“1) frequentism and NP require more subjectivity than they’re given credit for (assumptions, belief in perfectly known sampling distributions”

Models are idealisations, frequentist distributions are idealisations; using them doesn’t require me to “believe” in anything.

(I don’t dispute that “frequentism and NP require more subjectivity than they’re given credit for” though.)

It was bad wording, to be sure.

I was responding more 1) in direct reply to the blog and to Lakens’ comments, which frequently used the term ‘belief’, so I wanted to emphasize that if Bayesian priors are ‘beliefs’, then a frequentist ‘believes’ in known sampling distributions, since these distributions are not empirically or objectively derived: you don’t actually run an experiment 1,000,000 times, obtain a sampling distribution, and then say ‘my particular data were unlikely given this objective sampling distribution’. And 2) I would use the term ‘assumption’ instead of belief, but I had already used ‘assumptions’, and by that term I meant assumptions like IID normally distributed variables, rather than some larger assumption about the premise of frequentism.

For Popper, a scientific theory of wide scope has negligible probability, effectively 0 probability.

Do statistical models according to falsificationist Bayesianism also have 0 probability?

David:

I don’t think it makes sense to talk of the probability of a model. See this paper with Shalizi for much discussion of this point.

If models do not have probabilities, perhaps they have possibilities in the sense of possibility theory.

For example, the possibility of a model might be a function of its adequacy according to a model checking procedure:

Appendix B of https://goo.gl/5s7bS3

I would say yes and no, and here’s why. A preface to this answer: I don’t like the ‘find probability of model’ approach, BUT here are my thoughts.

IF you want to compute p(M[k] | D), you “can”. It’s just p(M[k] | D) = p(D | M[k]) p(M[k]) / sum_j p(D | M[j]) p(M[j])

The “No” answer: In practice, people have a few models of interest, in which case p(M[k] | D) gives you the probability of the model ASSUMING the list of models is exhaustive. In practice, this tells you that OF the models under consideration, some model is more probable than another. This can be useful, though in the end it’s just a metric of model misfit to the data, and there are simpler ways of assessing misfit (LOOIC, WAIC, mixtures).
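As a sketch of that computation over a fixed candidate set (the log marginal likelihoods below are made-up numbers, purely for illustration):

```python
import numpy as np

# Hypothetical log marginal likelihoods log p(D | M_k) for three candidate models
log_evidence = np.array([-104.2, -102.7, -103.5])
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform prior over the candidate set

# Posterior model probabilities, normalized over the candidate set ONLY
log_post = np.log(prior) + log_evidence
log_post -= log_post.max()  # subtract the max for numerical stability
post = np.exp(log_post)
post /= post.sum()

print(post)  # these sum to 1 over the three candidates, by construction
```

Note that the result is entirely relative to the three models in the list; it says nothing about models outside it.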

The “Yes” answer: In reality, there are an infinite number of possible models, and where there are infinitely many possibilities, the point probability of any one given model is 0. Even if you had 1,000,000 models as an exhaustive list of possible models, then surely the posterior probability of any one of those would still be essentially 0. You may still choose the ‘most probable’, but in the end, the posterior probability of one model being ‘correct’ amidst an infinite number of models is still 0. People who use the p(M[k]|D) approach seem to ignore the fact that this approach is only valid for assessing probability within a particular subset of models; the list of models is CERTAINLY not exhaustive, but OF the list of models, some p(M[k]|D) may be fairly large, even though p(M[k]|D) given all the /possible models/ is 0. I think this makes sense, anyway, but it’s also why I hate that JASP by default gives you p(M | D) for a bunch of models, as people less versed in probability will think “the probability that I’m correct is .8”, even though that’s totally wrong: it’s the probability that you’re correct assuming the models tested represent all the possible models, which they don’t.
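The dilution point can be illustrated with a toy simulation, using hypothetical log-evidence values: hold one “best” model fixed and grow the set of near-equivalent rivals, and the best model’s posterior probability shrinks toward 0 even though nothing about that model or the data has changed.

```python
import numpy as np

def posterior_of_best(n_models, seed=0):
    """Posterior probability of a fixed 'best' model when compared against
    n_models - 1 hypothetical, slightly worse rival models."""
    rng = np.random.default_rng(seed)
    # log-evidence 0 for the best model; rivals drawn a bit below it
    log_ev = np.concatenate(([0.0], rng.normal(-1.0, 0.5, size=n_models - 1)))
    w = np.exp(log_ev - log_ev.max())  # stabilized unnormalized weights
    return (w / w.sum())[0]           # normalized probability of the best model

for n in (2, 10, 100, 1000):
    print(n, round(posterior_of_best(n), 4))
```

As the candidate set grows, the same model’s p(M[k]|D) drops toward 0, which is the sense in which any single model among “all possible models” has probability 0.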

Another issue is that most people who talk about the probability of a model being true somehow think of this model in a frequentist way (modelling a data generating process) but they’d rather think about the probability of the model being true in a subjectivist or objectivist Bayesian way (modelling strength of evidence), and mixing these up in the same analysis will produce a strange bastard. (I am aware that some people have ways to justify this kind of thing but I am not convinced.)

In any case the issue is not only that there are infinitely many models; regardless of how many there are, these are all idealisations, and I doubt it makes sense to say that any one of them is “true”. (Laurie Davies observed that if you replace “true” by “being a reasonable approximation to reality”, then if any one model is true, several others are virtually indistinguishable and “true” in the same sense, and then the whole probability calculus breaks down, since it is based on the assumption that if A is true and B is not equal to A, B can’t be true as well.)

Christian:

Yes. Also the phrase “reasonable approximation” puts a big burden on “reasonable.” To consider a familiar example, Newton’s laws are a reasonable approximation in some settings and not at all reasonable in others.

In many practical settings it can be useful to assign what might be called “pseudo-probabilities” (that is, nonnegative weights that sum to 1) to models—see, for example, this recent paper—but I don’t think it makes too much sense to think of these as probabilities as they don’t represent contingency, uncertainty, or frequency.
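A minimal sketch of what such “pseudo-probabilities” might look like, using hypothetical leave-one-out predictive scores rather than the specific method of the linked paper: the weights are nonnegative and sum to 1, but they measure relative predictive performance, not the chance that any model is true.

```python
import numpy as np

# Hypothetical LOO expected log predictive densities (elpd) for three models
elpd = np.array([-250.3, -248.9, -252.1])

# Pseudo-BMA-style weights: exponentiate relative scores and normalize.
# Nonnegative, sum to 1, but not probabilities of any model being "true".
w = np.exp(elpd - elpd.max())
w /= w.sum()

print(w)
```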