For example, the plausibility of a model might be a function of its adequacy according to a model-checking procedure:

Appendix B of https://goo.gl/5s7bS3

Yes. Also the phrase “reasonable approximation” puts a big burden on “reasonable.” To consider a familiar example, Newton’s laws are a reasonable approximation in some settings and not at all reasonable in others.

In many practical settings it can be useful to assign what might be called “pseudo-probabilities” (that is, nonnegative weights that sum to 1) to models—see, for example, this recent paper—but I don’t think it makes too much sense to think of these as probabilities as they don’t represent contingency, uncertainty, or frequency.
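To illustrate the flavor of such weights, here is a hypothetical Python sketch: three models get nonnegative weights summing to 1 by softmax-normalizing made-up predictive scores. The model names and numbers are invented, and this is not necessarily the scheme used in the linked paper.

```python
import math

# Hypothetical example: expected log predictive density (elpd) style
# scores for three models. Names and numbers are invented.
elpd = {"M1": -512.3, "M2": -508.9, "M3": -530.1}

# Softmax-style normalization: higher score -> larger weight.
m = max(elpd.values())  # subtract the max for numerical stability
raw = {k: math.exp(v - m) for k, v in elpd.items()}
total = sum(raw.values())
weights = {k: r / total for k, r in raw.items()}
# The weights are nonnegative and sum to 1, but they are not
# probabilities of any model being "true".
```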

In any case the issue is not only that there are infinitely many models; regardless of how many there are, these are all idealisations, and I doubt it makes sense to say that any one of them is “true”. (Laurie Davies observed that if you replace “true” by “being a reasonable approximation to reality”, then if any one model is true, several others are virtually indistinguishable and “true” in the same sense, and then the whole probability calculus, which is based on assuming that if A is true and B is not equal to A then B can’t be true as well, breaks down.)

IF you want to compute p(M[k] | D), you “can”. It’s just p(M[k] | D) = p(D | M[k]) p(M[k]) / sum_j p(D | M[j]) p(M[j]).

The “No” answer: In practice, people have a few models of interest, in which case p(M[k] | D) gives you the probability of the model ASSUMING the list of models is exhaustive. In practice, this tells you that OF the models under consideration, some model is more probable than another. This can be useful, though in the end it’s just a metric of model misfit to the data, and there are simpler ways of assessing this (LOOIC, WAIC, mixtures).

The “Yes” answer: In reality, there are an infinite number of possible models, and where there are infinitely many possibilities, the point probability of any one given model is 0. If you have 1,000,000 models as an exhaustive list of possible models, then surely even the posterior probability of one of those is still essentially near 0. You may still choose the ‘most probable’, but in the end, the posterior probability of one model being ‘correct’ amidst an infinite number of models is still 0. People who use the p(M[k]|D) approach seem to ignore the fact that this approach is only valid for assessing the probability within a particular subset of models; the list of models is CERTAINLY not exhaustive, but OF the list of models, some p(M[k]|D) may be fairly large, though p(M[k]|D) given all the /possible models/ is 0. I think this makes sense, anyway, but it’s also why I hate that JASP by default gives you the p(M | D) for a bunch of models, as people less versed in probability will think “the probability that I’m correct is .8”, even though that’s totally wrong; it’s the probability that you’re correct assuming the models tested represent all the possible models, which they don’t.
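A tiny sketch of this point, with made-up marginal likelihoods for three candidate models and a uniform prior over the list: the posterior model probabilities are perfectly well defined, but only relative to that list.

```python
# Made-up marginal likelihoods p(D | M_k) for three candidate models
marg_lik = [2.0e-5, 8.0e-5, 1.0e-5]
prior = [1.0 / 3] * 3              # uniform p(M_k) over the list

num = [l * p for l, p in zip(marg_lik, prior)]
post = [n / sum(num) for n in num]  # p(M_k | D), *given this list*
# post[1] is about 0.73 here, but that number is conditional on the
# three-model list being exhaustive, which it never is.
```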

I don’t think it makes sense to talk of the probability of a model. See this paper with Shalizi for much discussion of this point.

Do statistical models according to falsificationist Bayesianism also have 0 probability?

I was responding more so 1) in direct reply to the blog and Lakens’ comments that frequently used the term ‘belief’, so I wanted to emphasize that if Bayesian priors are ‘beliefs’, then a frequentist ‘believes’ in known sampling distributions, since these distributions are not empirically or objectively derived; you don’t actually run an experiment 1,000,000 times and obtain a sampling distribution, then say ‘my particular data were unlikely given this objective sampling distribution’. And 2) I would use the term ‘assumption’ instead of belief, but I already used ‘assumptions’, and by that term I meant assumptions like IID normally distributed variables, rather than some larger assumption about the premise of frequentism.

“1) frequentism and NP require more subjectivity than they’re given credit for (assumptions, belief in perfectly known sampling distributions”

Models are idealisations, frequentist distributions are idealisations; using them doesn’t require me to “believe” in anything.

(I don’t dispute that “frequentism and NP require more subjectivity than they’re given credit for” though.)

I was trying to make sense of your comments drawing on my background, which is mostly from C. S. Peirce.

> the apex of this view is the philosophy called radical behaviorism and the science called behavior analysis

Looking into that led me to “Skinner (1979) singled out C. S. Peirce’s pragmatism as “very close … to an operant analysis”: The method of Pierce was to consider all the effects a concept might conceivably have on practical matters. The whole of our conception of an object or event is our conception of its effects. That is very close [emphasis added], I think, to an operant analysis of the way in which we respond to stimuli. (p. 48).” http://psycnet.apa.org/journals/bar/5/1/108.pdf

This might help – thanks.

KO: Correct[…]

GS (new): Right…problem is, I *was* writing about the use of terms like “belief,” “reality” etc. The reason is that if you take a contextualistic view of science, you talk about prediction and control and you are inevitably asked “Prediction and control of what?” Oh! The world? But if there’s a world, then things you say may or may not correspond to it, i.e., some things are “true” and some not. And so forth…

It was Noah that brought up the beliefs of scientists about truth. I just briefly addressed the fact that such utterances can (and must) be subject to conceptual analysis and that such a conceptual analysis could proceed via a contextualistic view. I saw Noah as saying that there was a sort of paradox – you can talk all you want about prediction and control substituting for truth criteria but, c’mon, you still believe that you are progressively approaching truth. The point is that there is no paradox because evidence of truth, even among laypeople, is prediction and control. IOW, emission of the word “truth” is *occasioned by* the successful prediction and control of the world. A similar condition prevails WRT the emission of terms like “the world” or “the real world.”

But the important aspect of what I had to say concerns the implications of adopting a contextualistic view of science, rather than the standard reductionism/mechanism that is often treated as the only reasonable view. The tension between contextualistic views and mechanism exists today, and has existed from a point in time that Prigogine puts at Fourier’s treatment of heat transfer in metals. The epitome of mechanism is reflected in (I think) Rutherford’s famous, “all science is physics or it is stamp-collecting.” Lakeland expresses this view – unless it’s quantum mechanics, it can’t be “truth.”

KO: – as later W. prefaced that comment “For a large class of cases – though not for all …”

GS: Funny that those who don’t like how W. “rubs out” explanatory fictions always point to this line…but time after time in Philosophical Investigations W. does just that. And, to a great extent, the apex of this view is the philosophy called radical behaviorism and the science called behavior analysis.

KO: Again my interest is almost exclusively in applying statistics more thoughtfully.

GS: Mine too, I guess, especially since “statistics” pretty much has to do with everything from graphs to complex mathematical treatments of data – look up the definition. In that sense, statistics is part of all of science and always has been. But a great deal of discussion on this blog is about science in general (when most here can spare the time out from attacking the very worst part of mainstream psychology) and that is what I was talking about. Otherwise, I don’t see what your comments have to do with much that I wrote about – unless you’re saying that the assumptions (colloquially speaking of course – “assumptions” are more explanatory fictions) that underlie science in general don’t matter for the conduct of science.

KO: Correct – as later W. prefaced that comment “For a large class of cases – though not for all …”

Again my interest is almost exclusively in applying statistics more thoughtfully.

But we can start with the fact that he thinks frequentist hypothesis tests control error rates and provide a basis for deciding among theories.

To see this is not true, let’s take the very same example of words and incongruent colors he gives. What would be your/his ideal way of doing science there? NHST?

KO: Certainly not in any way adequately assessed correspondence to ‘truth’ but rather a desperate hope that there is some correspondence or a _regulative_ hope that continuing inquiry if persisted in adequately and inexhaustibly would result in correspondence to ‘truth’.

GS (new): Not sure I follow the above sentence. But…it seems to me that you are trying to turn a conceptual question into an empirical one. That’s done a lot. No amount of experimental work can corroborate or cast a shadow on a conceptual question, for such questions are, you know, neither questions of fact (begging the issue) nor of theory.

KO: Otherwise we give up on better “predict[ing] and control[ing] the One World (whatever that may be!)” or science as something worth doing.

GS (new): That seems an outlandish statement that is neither an alleged statement of fact nor a conceptual or theoretical statement.

KO: That is we don’t just want to discern and report what did happen in an experiment,[…]

GS (new): Geez. Now you’re just makin’ stuff up. How does anything I said suggest that the activities of a scientist would be limited to “discern[ing] and report[ing]”? However, I would say, that, speaking a bit technically, verbal behavior that is under stimulus control of features of the world (and, no, that doesn’t concede the so-called “real-world” if that is supposed to mean “knowable independent of data”) is the very backbone of science. Indeed, science grows out of the persistence of our cultures in establishing such verbal behavior, which I would call “tacts” (cf. Skinner) – verbal responses under stimulus control of features of the world* and relatively free of conditions of deprivation and aversive stimulation…i.e., “motivational operations”.

*Explicitly mentioning stimulus control shows the difference between the “one world” and the so-called “real world,” if I may be allowed to adopt that terminology. The “one world” is about what can potentially control behavior. The one-world is the antithesis of an independent (you know…since it is about stimulus control of behavior) world as well as the wishy-washy “internal copies” (i.e., representations, computations etc.) of indirect realism (of which you are almost certainly a member).

KO: […]but rather what would continue to happen in x% of cases in future experiments, which presupposes the “One World” we have no direct access to.

GS (new): Wow! You must do magic experiments if you can assess the reliability and generality of data with a single experiment. Now, I would argue that, with SSDs, the case for some amount of reliability can be assessed with one experiment as such experiments generally contain multiple within- and between-subject replications. This is no doubt why some fields do not suffer from much failure to replicate. Such sciences generally start very simple and move up the complexity ladder in fits and starts (I’m talking about experimental sciences here). Anyway, the reliability and generality (obviously generality) of a single data set can only be assessed by direct and systematic replication (but, again, the caveat that with SSDs and reversible phenomena, each experiment constitutes several but, alas, all within one lab).

GS (new): The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis.

KO: ??? – not that helpful to me or likely other statisticians.

GS (new): Yes…well…ahem…you have taken the above statement by me out of context and your response seems to be about some other, perhaps fantasied, context. I was talking about how utterances concerning “belief,” “truth,” “reality” etc. can be understood from within a contextualistic framework.

KO: The challenge is we need a thoughtful meta-statistics that inspires, informs and encourages thoughtful applications of statistics that enable better habits of inquiry in various scientific fields.

GS (new): Are you running for political office or something?

KO: Not sure I would be convinced that we can get that from the “later W.”

GS (new): And I’m pretty sure that you are not talking about the context of verbal behavior as *meaning* – you know, the one thing for which later W. is most widely known? You know, “The meaning of a word is its use”?

KO: Certainly not in any way adequately assessed correspondence to ‘truth’ but rather a desperate hope that there is some correspondence or a _regulative_ hope that continuing inquiry if persisted in adequately and inexhaustibly would result in correspondence to ‘truth’.

Otherwise we give up on better “predict[ing] and control[ing] the One World (whatever that may be!)” or science as something worth doing. That is, we don’t just want to discern and report what did happen in an experiment, but rather what would continue to happen in x% of cases in future experiments, which presupposes the “One World” we have no direct access to.

GS: The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis.

KO: ??? – not that helpful to me or likely other statisticians.

The challenge is we need a thoughtful meta-statistics that inspires, informs and encourages thoughtful applications of statistics that enable better habits of inquiry in various scientific fields.

Not sure I would be convinced that we can get that from the “later W.”

GS: Well, interestingly, the sort of pragmatism (or “contextualism”) that I advocate doesn’t preclude truth – it’s just that you could never know you had it in hand. Truth would mean that the verbiage in question was the most effective possible in predicting and controlling Nature. Now, that’s pretty much what you seem to be saying, though. As to *belief*…I’m not exactly sure what you are saying…I think you may be trying to imply more than I’m able to say. But, basically, the terms here (“belief” and “reality”) can be understood (like the term “understood”!) by sticking to the contextualist position. The particular contextualistic position that does that is the Wittgensteinian-like (later W.) radical behaviorism and its scientific expression, behavior analysis. Anyway…consider the last part of your last sentence: “I believe that there are facts of the matter.” The question is “How do you predict and control whether someone says ‘belief’ or ‘believes’ etc. or ‘this is a fact?’” The things you look at to do that are the independent variables of which the speaker’s verbal behavior is a function. In that sense, there is nothing paradoxical about “…believ[ing] that there are facts of the matter” and “believing that science is a matter of prediction and control, not correspondence to ‘truth’” (and, I guess, it seems to me that that is sort of what you were implying). Talk of beliefs and facts is intimately tied to the circumstances under which we predict and control the One World (whatever that may be!) and that is what we are “talking about” when we talk about “truth.” I’m sure that’s clear as mud…

I think this is a very good point, worth careful, thorough consideration. At least some pragmatist philosophers of science take more or less exactly this approach, arguing that theories are evaluated with respect to, e.g., explanatory scope, how well they allow us to intervene and control nature, whether they make surprising predictions, how consistent they are with theories in neighboring fields, etc… Personally, I find this approach compelling.

I think it’s also interesting to think about belief and reality in this kind of pragmatist framework, since even if we don’t (or can’t) evaluate theories based on their verisimilitude, individual scientists (and non-scientists) still do or do not believe particular theories. And even if I don’t think we can ever know if any given theory is true, I believe that there are facts of the matter.

GS: Why not drop all pretense of scientific “truth”? This entails (it seems to me – I’m not an academic philosopher…not that I’m saying that most philosophers are worth much but they sure do have to read a lot to get that Ph.D.) dropping all talk about “reality” and science as approximation of a “true reality.” Truth criteria are, essentially, replaced with prediction and control as values and the essence of science is, thus, obtaining facts that are reliable and general. This doesn’t mean that there is no theory, but often it means that there need not be any theory about “events taking place elsewhere and measured, if at all, in different dimensions [than the phenomenon in question].”

http://marginalrevolution.com/marginalrevolution/2017/06/true-thomas-bayes.html#comments

It’s interesting that Bayes, a Presbyterian minister, was likely a heretic by traditional Christian notions (an Arian).

It’s also interesting that in the Bellhouse paper linked to

http://www2.isye.gatech.edu/~brani/isyebayes/bank/bayesbiog.pdf

we see, in section 5, the importance of various views of the Trinity in the religious politics of that time — interesting since one’s view of the Trinity has little influence on one’s actual behavior, and exists only in a theological ether of apologia.

In the context of the habits of practice currently instilled in a research community – e.g. the preference for underpowered studies and near-complete inattention to intervals that cross zero (or at least come close to it) – the uniform coverage property of confidence intervals, taken a priori as very good, can be actually horrible.

“3) Popper probably wouldn’t like the NHST ritual, given that we use p-values to support hypotheses, not to refute an accepted hypothesis [the nil-hypothesis of 0 is not an accepted hypothesis in most cases]”

Popper was well aware of Fisherian testing and appealed to it as the basis for his account of statistical falsification. Popper was weak when it came to statistics (he told me this). That is, he was unable to cope with the obvious need for statistical falsification without Fisher. What is anathema to Popper is committing the age-old fallacy of inferring a theory T based on just a statistical effect, even if genuine. That’s why Lakatos, having heard the NHST horror stories from Meehl, wondered if the social sciences were just so much pseudo-intellectual garbage (or some words very close to those). On the other hand, Lakatos saw in Neyman-Pearson statistics the embodiment of Popperian methodological falsification (i.e., the probability is assigned to the method).

“Refuting falsifiable hypotheses can be done in Bayes, which is largely what Popper cared about anyway.”

I don’t see that falsification is accomplished or desired in Bayesian probabilism (updating or Bayes factors), but any account can be made to refute falsifiable hypotheses by adding a falsification rule, and all accounts that falsify have falsification rules. The hard part is showing it’s a good falsification rule – merely spotting anomalies and misfits isn’t enough. The method needs good capability of falsifying claims just when they’re (specifiably) false, and of corroborating claims just when they fail to be falsified by a stringent test.

Lakens on Popperian verisimilitude: This was a degenerating problem that has been rejected by Popperians.

Popper’s goal, to define closeness to the (whole) truth, was a failure, as were various other distances from the whole truth that have been tried. It was intended to help explain why false theories are partially successful. (The hypothetical-deductivist corroborates the whole of the theory – they are, after all, deductivists.) Dyed-in-the-wool Popperians like Musgrave* have abandoned it in favor of locating aspects of theories that are true. How to do it (i.e., figure out which aspects of theories to hold true) was/is a central problem in philosophy of science, and solving it was one of the central goals of (my) severity project. The idea is that the portions (or variants) of a theory that are true are those that have passed severe tests. Moreover, I argue, those parts remain through theory change, even through redefinitions, changed entities and all manner of underdetermination.

Thanks again for the swift response!

I think the ‘subjective’ nomenclature should go away, much like you suggest in your linked papers. Even ‘prior’ can be misleading in modern Bayes. I tend to describe the prior not as subjective, or even as necessarily based in empirical observations, but as either soft, probabilistic constraints, or as encoding information into the model. That information can be very objective (based on previous estimates), or it may serve to identify a model, or several other purposes.

I find it really intriguing that Popper’s and Fisher’s concepts are so readily abused in the modern day; honestly, it seems as though if one is Fisherian and Popperian, so to speak, then one’s null hypothesis /should/ be the hypothesis to be tested. That is, rather than ‘nil’ null hypotheses, it seems like the null should represent something substantive that one is actively attempting to disprove from a theoretical standpoint. No one seems to do this [contemporaries, anyway]. Moreover, I was especially intrigued by these two things:

1) NP-inspired model comparisons pit one hypothesis against another. It seems like if one is planning a study with an ‘alternative’ fixed point hypothesis in mind, then one should actually pit the ‘null’ against the a-priori alternative; that is, if one’s null is theta = 0 and one’s alternative is theta = .4 (when conducting an a-priori power analysis, for instance), then one’s model comparison should be p(data | theta = 0) vs p(data | theta = .4); it seems weird to me, now, from a NP standpoint to plan a study with an a-priori point hypothesis of theta = .4, obtain an estimate, then do a test with p(data | theta = 0) and p(data | theta = estimate). I’m not sure what your thoughts are on this, but from a purist NP perspective, the former seems to make more logical sense than the latter.

2) NP-inspired hypotheses don’t often take point estimates, but ranges; e.g., theta = 0 vs theta > 0. In this case, it seems like one shouldn’t, again, compare a point estimate to another point estimate, but a range of point estimates to one point estimate; or a range of estimates to another range of estimates, each range dictated by a substantive hypothesis.
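The point-vs-point comparison described in 1) can be sketched minimally, here for a normal mean with known sd and invented data (theta = 0 vs the a-priori alternative theta = .4):

```python
import math

def log_lik(data, theta, sd=1.0):
    # Log-likelihood of a normal mean with known sd
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (x - theta) ** 2 / (2 * sd ** 2) for x in data)

data = [0.5, 0.3, 0.7, 0.1, 0.6]        # invented observations
llr = log_lik(data, 0.4) - log_lik(data, 0.0)
# llr > 0 favors the a-priori alternative theta = 0.4 over theta = 0.
```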

These two things came to mind, because Popper doesn’t often discuss /statistical/ hypotheses or null hypotheses, but rather the falsification of substantive hypotheses. IMO, people ignore this distinction in the NHST framework. Substantive: “I think A predicts B.” Statistical: “r = .4; r \in (.1, .5]; r != 0.” Popper rarely talked about the latter. This is important, because if Popper were talking solely about statistical hypotheses, then people perhaps /should/ use null hypotheses (where the null is actually the hypothesis of interest to be subjected to a severe test, if possible). On the other hand, if Popper were talking about rejecting substantive hypotheses (which, it seems, he was, fairly consistently; his example hypotheses and predictions seemingly never included some point value or range of values, but rather logical methods of ruling out statements), then he cares much more about gaining support for or against hypotheses when one actively seeks data that can rule out hypotheses. In this latter case, exactly /how/ one gains evidence for or against a hypothesis doesn’t seem to matter. Whether one uses BFs, p-values, CIs, PPCs, model comparisons, whatever, the goal is to assess the extent to which a substantive hypothesis mismatches the data, and the substantive hypothesis may be expressed as statistical hypotheses which the data either match or mismatch. It’s for this reason that I completely fail to see how Popper would be against any modern Bayesian method. Whether one is Fisherian, NP, likelihoodist, Bayesian, or non-parametric, if one can map a falsifiable substantive hypothesis to a quantity or range of quantities, and one’s hypothesis-inspired model fails to match that quantity or range of quantities, then surely one could say that the hypothesis is unsupported and requires revision.

I’ve also recently given a lot of thought to the notion of Occam’s razor and /statistical/ models. It seems to me like the important part of theoretical parsimony is to explain as much as you can with as few conditions as possible. I don’t think this necessarily applies to statistical models, or at least to some components. E.g., I suspect that many would fear the idea of hierarchically modeling variances, or permitting heteroskedasticity, because the statistical model is not as parsimonious (more parameters); and yet one’s theory doesn’t actually dictate, usually, that variances must match some condition. With that mindset, it seems to me that permitting more parameters that are irrelevant to one’s theory is not a breach of parsimony, but rather decreases the number of conditions one is asserting. That is, one camp would see more parameters [irrelevant to theory] and fear a lack of parsimony; another camp would see more parameters [irrelevant to theory] and assert that they are making fewer assumptions about the DGP, and thus are being more parsimonious in assumptions. I’d love to hear your input on this: what is important about parsimony in statistical models… fewer parameters = more parsimonious (in an Occam’s razor sense), or more parameters (fewer assertions/constraints about the DGP) = more parsimonious (in an Occam’s razor sense)?

Daniel Lakens: Let me first apologize for my previous comments, if they were a bit snarky ;-) I am now on a computer, so will respond more thoroughly and thoughtfully. I have been talking a lot about assumptions (here, on Twitter, and at my university). What does this mean? Well, of course, we have the assumptions of normality, homogeneous variances, etc. We also have the assumption of a null sampling distribution, repeated sampling, etc. Added to this, as Stephen Martin pointed out, an exact value (d_truth) is often assumed for the hypothesized effect. Now one might say the assumed value for d_truth is a belief, but I would not. Rather, it is an estimate informed by prior information. Conceptually, this is incorporating “prior” information into calculating a long-run expectation (e.g., power).

That said, I am very unclear how, for a given hypothesized effect (d_truth), you think we can know exact error probabilities. This would require many things to be true that we cannot ever know. By definition, the frequentist assumption is that the effect is a fixed unknown value. Consider we had an unbiased meta-analytic estimate, d = 0.3, 95% CI [0.10, 0.40]. We may reason that d = 0.3 is our best estimate of the true effect, but, in fact, the interval contains values we would NOT reject at the alpha = 0.05 level. Accordingly, power and thus the type II error rate for d = 0.30 cannot be exactly correct, even assuming all other assumptions are reasonable. In a Bayesian context, we may reason d_truth does precisely exist, but we could compute “power” for some decision rule across the entire posterior distribution. Of course, this is also not exactly correct, but it does seem more reasonable, as it allows for a measure of uncertainty around detecting d_truth.
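A rough sketch of what “power across the posterior” might look like, using a normal-approximation power formula for a two-sample test. The stand-in posterior for d, Normal(0.3, 0.075), and the sample size are invented to loosely match the d = 0.3, CI [0.10, 0.40] example; this is an illustration of the idea, not anyone’s prescribed procedure.

```python
import math
import random

random.seed(1)

def power_at(d, n_per_group, z_crit=1.96):
    # Normal-approximation power for a two-sample z-test at alpha = .05
    # (upper tail only, which dominates for positive d).
    se = math.sqrt(2.0 / n_per_group)
    return 0.5 * math.erfc((z_crit - d / se) / math.sqrt(2.0))

n = 50
point_power = power_at(0.3, n)      # power conditional on one d_truth

# Stand-in "posterior" for d (numbers invented for illustration)
draws = [random.gauss(0.3, 0.075) for _ in range(10_000)]
avg_power = sum(power_at(d, n) for d in draws) / len(draws)
# avg_power propagates uncertainty about d instead of conditioning
# on a single assumed value.
```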

About falsification. Like many that use Bayes, we probably all learn frequentist methods first. In pursuing Bayes, the training–often self-directed–comes with a heavy dose of the history and philosophy of science, and especially statistics. You appeal to Popper, which is great. However, to my knowledge, Popper is best known for falsification, but not for falsifying the null hypothesis (please let me know if this is wrong). In Bayesian books, it often comes up that NHST probably runs contrary to what Popper had in mind. That is, the original intent was to falsify our hypothesis, not the null hypothesis. Along with Stephen Martin, Richard McElreath makes this point nicely in the Rethinking book.

Moreover, I would argue that the way Bayes is often used (at least by those who follow Gelman et al.’s approach) in practice does fit into Popper’s notion of falsification. If we posit that our model is our hypothesis for the data generating process, then the logical step is to attempt to falsify our model. This is done with posterior predictive checks, and is part of the standard workflow for Bayesian modeling (see Gelman’s book). That is, for many Bayesians, this is done for every model. To my knowledge, this kind of stuff is not commonly done in frequentist frameworks, although I am sure something similar could be achieved.
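For readers unfamiliar with posterior predictive checks, a toy sketch of the logic: simulate replicated datasets from posterior draws and compare a test statistic to the observed one. The data, the stand-in posterior draws, and the choice of statistic below are all fabricated for illustration, not output from any real fitted model.

```python
import random

random.seed(2)

y = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3]   # "observed" data (invented)
post_mu = [random.gauss(1.1, 0.1) for _ in range(2000)]  # stand-in posterior

obs_stat = max(y)            # test statistic: the sample maximum
rep_stats = []
for mu in post_mu:
    y_rep = [random.gauss(mu, 0.3) for _ in y]  # replicated dataset
    rep_stats.append(max(y_rep))

# Posterior predictive p-value: fraction of replicated statistics at
# least as extreme as the observed one.
ppp = sum(s >= obs_stat for s in rep_stats) / len(rep_stats)
# A ppp near 0 or 1 would flag model misfit for this statistic.
```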

Finally, let us move to some slightly more complex models. That is, let’s not think of comparing two groups, but of a multi-level model, in which slopes and intercepts can vary among individuals. In addition to the error variance (e_var), we have the “random” intercept (u_0) and “random” slope (u_1) variances to consider. Here, it has been noted by many that computing power (and thus type II error) is very difficult, because we have to assume values for e_var, u_0, and u_1. Of course, we can and I do, but I also appreciate that these values cannot be considered exactly correct at best, and might be very misleading at worst. One solution might be to define d_truth not by a point estimate, but as a random variable in which our uncertainty is expressed as a probability distribution.

I thought your post was very intriguing. However, I use Bayes often (maybe daily) and I do not recognize the philosophy behind how you describe Bayes in how I–or everyone else that I work with–use Bayes in practice. There is a big disconnect here, and I think it might be worthwhile for you to check out Gelman’s and/or McElreath’s book to gain a better understanding of modern Bayesian data analysis.
