Until fairly recently, abduction was overlooked by many in science, philosophy and statistics.

]]>> we formalize the next moves in model formulation space?

C. S. Peirce did claim (~1905) that abduction should be a logic (normative, providing oughts), but only in the vaguest sense.

I’m not aware of anyone having gotten any further.

]]>ojm: I’m not up on constructivist intuitionist mathematics, other than to know that it exists, and it denies the law of excluded middle. So here I have to pause with all of that, and potentially come back to it when I have sufficient background.

However, what I will say, and now I really want a book on model theory or something… I’m not sure what it is that is the relevant field… is that I can imagine a logic in which we assume statements like “the drag coefficient is 3.31 at time t=0” are decidable only externally to Bayes (at a scientific level, not a formal level), and the meaning of a statement like that is “under whatever restrictions on our view of the world that lead us to accept the Navier Stokes equations as realistic, including any granularification / round-off / scaling / homogenization or other mathematical modeling assumptions, we can’t tell the difference in the context of that model between whatever the best most exact description of the appropriate drag coefficient value is and the value 3.31.”

This plays nicely into IST, in which we assume there is some “infinitesimal” scale beyond which we accept that dragcoeff ~= a means standard_part(dragcoeff) = standard_part(a). Of course in the mathematics, we idealize this and talk about 1/N being infinitesimal only if N is nonstandard… but it’s an idealization of the basic concept that in science, beyond some level of precision, differences just don’t matter.

Then, in a model that has something like

p(Data | model1) p(model1) + p(Data | model2)p(model2)

what we really mean is “probability that Data is “essentially equal” to its observed value given model1 is “essentially equal” to the correct model * probability that model1 is essentially equal to the correct model”

When we do this with a finite set of models, we then demand that the observer for the moment is willing to accept that they can resolve the question of “is essentially the right model” and “is essentially the value Data” at an external level.

Model checking then can be seen as verification of the notion of “essential equality” implicit in the Bayesian construct. If none of the samples of the parameters in the high probability region produce a model which we are willing to call “essentially correct” then we can reject the particular finite selection of models. In essence the logic of Bayes becomes:

if (Models are scientifically sufficient) then (parameters are probably the ones in the high probability region of the posterior).

It becomes then our external observer logic requirement to evaluate whether (Models are scientifically sufficient) is a true fact in observer logic.
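A minimal sketch of that external check, under assumptions of my own rather than anything stated in the thread: the object-level model is y_i ~ normal(mu, 1) with a flat prior on mu, the data are actually skewed (exponential), and the observer's notion of "essentially correct" is expressed through one test statistic, the sample minimum. All names here are hypothetical.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical data: skewed (exponential), so a normal model should fail.
y = [random.expovariate(1.0) for _ in range(200)]
n = len(y)
y_bar = statistics.fmean(y)

def posterior_draw():
    # Assumed model: y_i ~ normal(mu, 1) with a flat prior on mu,
    # so the posterior for mu is normal(mean(y), 1/sqrt(n)).
    return random.gauss(y_bar, 1.0 / math.sqrt(n))

def replicate(mu):
    # Simulate one replicated data set from the model at this mu.
    return [random.gauss(mu, 1.0) for _ in range(n)]

# Test statistic: the sample minimum. Exponential data never go below
# zero, while the normal model routinely predicts negative values.
t_obs = min(y)
t_rep = [min(replicate(posterior_draw())) for _ in range(1000)]

# Fraction of replications whose minimum is at least the observed one.
# Values near 0 or 1 say that no parameter value in the high probability
# region reproduces this feature of the data.
ppp = sum(t >= t_obs for t in t_rep) / len(t_rep)
print(f"observed min = {t_obs:.3f}, posterior predictive p = {ppp:.3f}")
```

Inside the model the posterior for mu is perfectly coherent; it is only the observer-level comparison of replications against the data that says the model is not scientifically sufficient.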

Bayes then becomes a restricted component of observer logic which operates on assumptions about the decidability of “TRUE” at the external level.

at least that’s the sketch of what seems intuitive to me.

]]>(Or to empirical Bayes)

]]>It also seems entirely possible that constructive Bayes reduces to some form of likelihoodism (in the general sense, not the Bayes minus a prior sense).

]]>It’s probably possible to see ‘falsificationist bayes’ as ‘constructive bayes’ where ‘constructive procedures’ are perhaps akin to generative models.

A key issue, as Andrew acknowledges, is clarifying exactly which things can and can’t be assigned probability statements/distributions. It’s clear that not everything that classical Bayes assigns standard probability statements to would have such statements in constructive Bayes, but it’s not clear where the line is drawn and how that affects practice.

I don’t think it’s sufficient to say ‘within the model, Bayes, without non-Bayes’ though, especially when dealing with hierarchical structures. So what would be sufficient then? Again, there seems to me to be some sort of connection to the concept of identifiability. In this case it would be nice to have some (supplementary) method of identifiability analysis.

]]>Here is an attempt to develop a constructive/intuitionistic Bayes/probability theory:

http://brian.weatherson.org/conprob.pdf

What I am not sure of is the upshot – it seems as if we can only then apply probability to observable statements, which would severely restrict Bayes as a universal logic of uncertainty/inference.

]]>Quick comment.

Kosko seems to say that the law of non-contradiction and the law of the excluded middle are equivalent and fuzzy logic requires violation of both.

But constructive logic satisfies the law of non-contradiction while denying the general validity of the excluded middle.

]]>Carlos, I see we crossed paths and both wound up at Kosko.

What I’m thinking of is an alternate model of the Cox proof, not an alternate proof. Simply anonymize the meaning of the elements of, say, Van Horn’s treatment: instead of “propositions” say “glarbs”, instead of “known to be true” say “known to be frob”, etc. Formally the structure is the same. Can we plug meaning back in such that, in observer logic, we still buy the statements but also buy the new meaning of the statement p(model1), even though in observer logic we all agree that model1 is something like continuum mechanics, which is known not to be true at some fundamental level?

]]>Kosko shows in this paper http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf that probability theory is a special case of a fuzzy subset notion. He gets into a bit of a diatribe about Lindley and Cox and so forth, but in the context of our discussion here, the important point is that the functional relations used in Cox’s derivation also apply to an alternative interpretation. In other words, the degree to which some set of points S is a subset of “the points of type A” is more or less the fraction of points in S that are type A.
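To make the counting interpretation concrete, here is a small sketch on a made-up finite set of points (the sets and the function name are my own illustration, not Kosko's notation). The degree of subsethood is computed as a fraction of points, and the product and sum rules of probability hold for these fractions without any mention of truth values.

```python
from fractions import Fraction

# Hypothetical universe of 12 labelled points.
universe = set(range(12))
A = {0, 1, 2, 3, 4, 5}   # "points of type A"
B = {4, 5, 6, 7}         # "points of type B"

def subsethood(S, T):
    """Degree to which S is a subset of T: the fraction of S lying in T."""
    return Fraction(len(S & T), len(S))

# Read subsethood(universe, A) as p(A) and subsethood(A, B) as p(B | A).
p_A = subsethood(universe, A)
p_B_given_A = subsethood(A, B)
p_AB = subsethood(universe, A & B)
p_not_A = subsethood(universe, universe - A)

print("p(A) =", p_A, " p(B|A) =", p_B_given_A, " p(A and B) =", p_AB)
print("product rule holds:", p_AB == p_A * p_B_given_A)
print("sum rule holds:", p_A + p_not_A == 1)
```

This only exhibits one alternative reading of the same algebra on crisp sets; it does not by itself settle whether the Cox derivation can be rerun without Boolean propositions.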

In this context, I’ve mentioned how we can interpret single term likelihoods as boolean questions about the “truth” of some error, which can be model or measurement error, so I think the important question is what is meant by something like a likelihood p(Data | q1, Model1) p(Model1) + p(Data | q2, Model2) p(Model2) when both Model1 and Model2 are known to be false at some outer-Godelian-onion layer (i.e. I know continuum mechanics is wrong because I know molecules exist).

In the context of probability theory, where p(Model[i]) is a simplex vector which has a prior, this is a forced choice among alternatives, it is in essence: “assuming one of these is right, which one?”

This can be seen as a restricted “shootout” between model+measurement errors. If one model definitely dominates another in terms of giving higher probability to the actual results observed in data, the observed result of calculations is that this model will dominate and in the limit of large data have posterior probability 1 within this overall model.
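A minimal sketch of such a shootout, with every number an assumption chosen for illustration: Model1 says the errors are normal(0, 1), Model2 says they are normal(0, 3), and the errors are actually drawn from normal(0, 1.2), so neither model is "true". The models get equal prior probability.

```python
import math
import random

random.seed(1)

def normal_logpdf(x, sigma):
    """Log density of normal(0, sigma) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - x**2 / (2 * sigma**2)

# Neither model is "true": the errors come from normal(0, 1.2).
errors = [random.gauss(0.0, 1.2) for _ in range(1000)]

def posterior_model1(errs, sigma1=1.0, sigma2=3.0, prior1=0.5):
    """Posterior probability of Model1 in the forced choice between the two."""
    ll1 = sum(normal_logpdf(e, sigma1) for e in errs)
    ll2 = sum(normal_logpdf(e, sigma2) for e in errs)
    log_odds = math.log(prior1) - math.log(1.0 - prior1) + ll1 - ll2
    # Numerically stable logistic transform of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

for n in (10, 100, 1000):
    print(n, posterior_model1(errors[:n]))
```

In this setup the posterior probability of Model1 heads toward 1 as n grows, even though Model1 is wrong about the error scale. It wins only because it assigns higher probability to the errors actually observed than its rival does.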

I think the mistake is in interpreting something like P(Model1) as a probability “that model 1 is the true science”. Why does Model 1 wind up with all the posterior probability? Because, among the various models considered, Model1 assigns high probability to the actually observed errors. Model 1 makes the *best* assessment of its own limitations.

Even though, in each model, there is a “true” measurement + model error, if one of them consistently assigns true errors to have higher probability, it will dominate the posterior. There’s a kind of asymptotic selection for most accuracy. I think the role of this selection for most accuracy isn’t based on “scientific truth” of the model but rather something else. It being 4am here, I’ll have to leave this for now to come back to it later, but my intuition is that we may be able to construct boolean statements, one of which is true, about which the p(Model1) are everyday Cox probabilities. Doing so would help us figure out what fact is being inferred, and / or what are the “external meanings” of non-identifiability.

]]>> But alter “known to be false given the information in X” and “known to be true given the information in X” to “known to have no membership in the set Q” or “known to have full membership in the set Q” and you could get a restricted fuzzy logic out of 14 it seems.

Starting from different assumptions and using a different derivation you may be able to prove a different theorem, we agree.

> I am not going to buy into the requirement that Cox requires an underlying boolean truth value

What do you mean by “Cox”? The 1946 paper? The 1961 book? The rigorous variants of his theorem? All of those are built on Boolean propositions. Of course there could potentially exist an alternative Cox reaching alternative conclusions. But why should it be the case?

It’s not like people have not tried to find more general results. For example, Kosko (1990) concludes that fuzzy theory is an extension of probability theory. http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf

]]>I acknowledge that probability isn’t formally forced on you. It’s only formally forced on you once you accept Cox’s axioms. But I am not going to buy into the requirement that Cox requires an underlying boolean truth value incompatible with “all models are wrong some are useful” without a lot of careful examination of that assertion. Since probability *is* compatible with *frequency in countably infinite sequences* it isn’t the case that there is only one interpretation of the meaning of probability theory, and that it’s degree of plausibility of propositions with definite boolean truth values. Yes, Cox’s program was to show that it was *compatible* with that interpretation, but his program didn’t show that it’s incompatible with other interpretations.

Still, I’m very happy to have found a more precise formulation of at least one of your objections ojm, so now I have a thing to do some research on.

]]>> The thing that concerns me about using “truth 0/1” as the basis of interpretation of probability theory, is the “all models are wrong, some are useful” adage

This is sort of the point – probability theory _is_ fundamentally tied to Boolean propositions, and so is an awkward fit with ‘all models are wrong’. This is the exact argument many people have made against using probability as a ‘logic of science’. One of many, as Christian has noted, is Laurie who points out that replacing ‘true’ with ‘adequate’ breaks the whole _formal_ logic of Bayes.

You want to have your cake – probability for uncertainty forced via eg the Cox derivation – and eat it too – all models are wrong. You can use probability and claim all models are wrong but you can’t say that using probability/Bayes is formally forced via eg Cox.

]]>The thing that concerns me about using “truth 0/1” as the basis of interpretation of probability theory, is the “all models are wrong, some are useful” adage. I know for sure that my model of a bearing ball being dropped through a thick syrup is false, because my model involves the Navier Stokes equations of a continuum, whereas the syrup is a bunch of molecules.

So long as I say

(Pred(t) – Measured(t)) ~ normal(0,s)

Expresses the probability that the true modeling and measurement error takes on some value… we can get away with just talking about say true values of this error, and we could wave our hands about the problem of how do you measure the location of a bearing ball… does it even have a location at the fundamental level?…

but as soon as we want to compare say two models of the drag coefficient as a function of reynolds number… and we do it

p(Data | parameters, Model1) p(Model1) + p(Data|parameters,Model2) p(Model2)

now what is our interpretation of p(Model1), p(Model2) given that we know “Model1 is a literally true expression of the actual drag coefficient function” is exactly zero, since at the fundamental level, there *is no* drag coefficient, only a lot of molecules colliding with each other…

So, it may be possible to restrict ourselves to propositions about the goodness of approximations or something, but it’s a technicality I don’t think people usually look too deeply into. There be dragons, and we’re busy, we’ve got to deliver some statistical report to some client, or publish some paper, or get some grant. Maybe next year, on sabbatical we can discuss what the heck does it mean to evaluate the boolean truth of a model we know is false to begin with.

]]>On theorem 14:

Sure, he’s looking for an algebra of plausible knowledge about facts, so he states Theorem 14 in that language… But alter “known to be false given the information in X” and “known to be true given the information in X” to “known to have no membership in the set Q” or “known to have full membership in the set Q” and you could get a restricted fuzzy logic out of 14 it seems. Or am I being obtuse?

At this point anything I say should be considered as speculative, because I haven’t looked deep into the possibility that there are yet other alternative interpretations of the meaning of probability algebra. We know one meaning is “frequency of occurrence within countably infinite sequences of numbers”, and another meaning is “plausibility that a statement in a formal language turns out to be true, under a restricted set of information”, it’s not clear to me that we can’t get the same exact algebra from “degree of membership in a set, where the total membership across all sets is constrained to be equal to 1”, or “degree of accordance of a theoretical prediction with the measured value given a theoretical accordance function (think measurement error)”

Just Frequentist vs Orthodox Cox/Jaynes Bayesianism shows that there isn’t *one true interpretation of the formal structure*. How else / widely can it be interpreted? I don’t know.

]]>How does the enunciation of Theorem 14 (let alone the proof) make any sense without the previous definitions of Boolean propositions and operators?

]]>To me, the real advantages of using probability in a Bayesian context are that we have a uniqueness proof in Cox’s theorem (provided you agree with the assumed properties), and that Kolmogorov exhibited a constructive model, so via model theory we have a proof of consistency. A consistent, unique algebra of quantities representing degrees of *something*, which has as limiting cases both “definite truth/boolean logic” and “definite membership in a set” (a restricted subset of fuzzy logic) and possibly an “accordance with a theory” interpretation, gives you assurance that you’re not going to calculate contradictory information after you plug in whatever your input information is: each set of input objects results in a well-defined set of output objects. How you interpret *the meaning* of what the output of your Stan code tells you is in some sense up to you, and depends on how you interpreted the meaning of the input values you constructed. In the same way, if you calculate the number 4 as the radius of a crushed-up ball of paper, you have to interpret that somehow, because it’s not a perfect sphere, so what does 4 refer to? At least, if you calculated the number 4 using the algebra of the real numbers, you are pretty sure the rules of addition, multiplication, division, etc. are consistent, and someone else who accepted the logic of your input quantities would calculate the same number 4 as you did.

]]>Carlos, I see where he says that, but it’s much less clear to me that there is anywhere he relies on this interpretation for the proof of any of the mathematical theorems. I would have to read it very carefully to be sure.

In fact, I remember from the 90s, when fuzzy logic was a big deal, that there was a proof that it was equivalent to probability theory under some restrictions, mainly related to a sum-to-one type restriction.

I should look for that and see if I can find a citation.

In the end I think the interpretation of meaning has to be different from the formal proof of certain properties of the formal system. If there are multiple interpretations of the meaning which all satisfy the same formal rules, then in some sense it’s up to the user to decide what they mean.

]]>Daniel, a few quotes appearing in the first three pages of the paper by Van Horn that you cited:

“We stress that we are concerned with degrees of plausibility, as opposed to degrees of truth. Fuzzy logic (with the exception of possibility theory) and various other multivalued logics deal with the latter, and hence have aims distinct from ours.”

“Our logic shall be restricted to statements such as P, which are either true or false, although we may not know which.”

“A proposition is an unambiguous statement that is either true or false.”

]]>Interesting question I now have: can we justify probability as a measure of accordance, not of plausibility? That is, remember we know essentially all scientific theories are completely false. Newtonian physics, for example, doesn’t respect relativity. But Newtonian physics predictions accord extremely well with reality provided the relative velocity between two objects is much less than the speed of light, and the mass of the objects is much more than a hydrogen atom.

Now, if there is an underlying quantity that only exists in some kind of imprecise aggregate sense, like say “the location of a baseball” then it can’t be said that the failure of the underlying quantity to actually take on the predicted value at given time t is a refutation of the theory. Only some kind of “large deviation” from the predicted quantity refutes the theory. Furthermore, refutation should be continuous. It’s not like things are just fine for some range, and then when the baseball moves an additional atomic diameter… it all goes to hell. From this perspective the likelihood could be viewed as a measure of accordance… the smaller the likelihood is, the less we believe that the prediction accords with the theory.

we combine this with prior information on parameters to get a measure of “accordance with theory and data”.

just ignore the plausibility/boolean algebra/formal language stuff for the moment.

In an algebra of accordance, the finiteness (hence, normalizability) of the accordance measure seems obviously desirable, the functional inverse relationship between accordance and not accordance seems desirable. The factorizing conditionally seems desirable: accordance of the data with the prediction obviously depends on the prediction… and hence on the inputs to the prediction: parameters.

Then we don’t need a “true” underlying theory in any way. Bayes becomes simply a measure of which portions of theory space (parameters) accord most well with data and the theoretical precision (the likelihood).

]]>My impression is that Cox’s original treatment was imperfect and that Van Horn covers a more modern minimalist version. I don’t see a requirement in Van Horn’s exposition that there be an underlying Boolean algebra, only that if there is one the plausibility algebra respects it.

If you can exhibit a uniqueness proof for an uncertainty measure based on generalizing Heyting algebra and a model for it… Hey I’m willing to give it a very deep look. Even better if it comes with a powerful computational inference engine….

]]>Suppose you treat propositions as undefined terms. They are at least required, in Cox’s treatment, to satisfy a Boolean algebra right? That’s what I remember from last time I looked at Cox’s book.

So in the geometry analogy, I see you as asserting the parallel line postulate and claiming all geometry is Euclidean.

I’m saying if you instead use eg a Heyting algebra (which could be motivated intuitively for scientific applications eg based on a preference for constructive falsification rather than proof by contradiction), then you get a generalisation to something more akin to a non-Euclidean geometry. Without an underlying Boolean algebra, you don’t have the Cox argument, coz he explicitly uses it.

So Bayes = Euclid, non-Bayes = non-Euclid. Appropriately, I sometimes find Bayes too rigid.

(I also generally find ‘quantitative logics’ to be pretty unappealing these days but that’s another issue)

]]>ojm: If you could formalize your argument somewhat it would make it so I didn’t have to guess at what the concern is so much.

The fact is the Cox axioms give you uniqueness of a certain algebra of plausibility. Certain plausibilities are assigned (from outside Bayes, priors and likelihoods) and then once you assign those and you accept that you want your calculations to obey the Cox axioms, the unique rules for manipulating these plausibilities are the probability rules, and they guarantee that you wind up with a unique mapping that has certain consistencies, input to output

and that this mapping accords with p(A) = 1 implies p(~A) = 0 and so forth.

I don’t see how this implies some “underlying” anything. There’s just formal rules for manipulating things called “states of information” which are formally probability distributions.

In particular your statement: “In a context where multiple parameters could be ‘correct’ (or all are incorrect) then logically you are saying it has to be the case that either parameter = 1 or parameter != 1. But then you are saying that it can’t be the case that both parameter = 1 and parameter = 1.01 etc. Or even parameter = 10. This can be thought of as an identifiability constraint.”

but I disagree that I have to accept a=1 or a!=1 as acceptable propositions. The meaning of the word “proposition” is undefined in the Cox axioms. So I can simply say that a=1 isn’t a valid proposition about which I have to be able to assign probability.

So, if that kind of thing is your concern, it seems like your issue is something related to measure theory and the meaning of “proposition”. You are imbuing the purely formal “proposition” with particular meaning, in particular including things like “a=1” in your propositions.

In fact, measure-theoretic probability works with Borel algebras, and under any continuous distribution the event a=1 gets probability zero, so it never assigns a=1 a meaningful positive probability.

which is to say, in Cox/Bayes the meaning of the word “proposition” is undefined, just like in geometry the meaning of the word “line” is undefined. and so we can have euclidean or non-euclidean geometry based on what we take “line” to mean, and whether or not we add a parallel postulate.

So, I admit, I don’t get it. But someday I hope you will formalize some concern in a way I could actually understand it… and I also admit to making progress just by thinking about the issue… sigh… but we never seem to be able to stop posting.

]]>…one last comment.

A proposition like a is in [0,1] is fine. And you can even assign it a probability if you want, like 0.5. This is not the same as assigning a uniform probability distribution over the interval [0,1], as I assume you know?
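A small sketch of that distinction, using two distributions picked purely for illustration: uniform(0, 2) and an exponential with rate ln 2. Both assign probability 0.5 to the proposition "a is in [0,1]", yet they disagree about sub-intervals, so the single number 0.5 does not pin down a distribution over the interval.

```python
import math

def cdf_uniform(x, lo=0.0, hi=2.0):
    """CDF of uniform(lo, hi)."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

def cdf_exponential(x, rate=math.log(2.0)):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def interval_prob(cdf, a, b):
    """Probability the distribution assigns to the interval [a, b]."""
    return cdf(b) - cdf(a)

for name, cdf in [("uniform(0, 2)", cdf_uniform),
                  ("exponential(ln 2)", cdf_exponential)]:
    whole = interval_prob(cdf, 0.0, 1.0)   # the proposition "a in [0,1]"
    half = interval_prob(cdf, 0.0, 0.5)    # a finer proposition
    print(f"{name}: P(a in [0,1]) = {whole:.4f}, P(a in [0,0.5]) = {half:.4f}")
```

The first column agrees at 0.5 for both; the second is 0.25 for the uniform and about 0.29 for the exponential.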

]]>Yeah fair enough to stop for now. For the record I find your characterisations of what I ‘want’ very strange and quite far off. I’m just working with the definitions, not on what I ‘feel’ like Bayes should be or what the derivation ‘should’ say.

(BTW statements and propositions are usually distinguished in formal logic. And yes of course propositions can be convoluted or about complex objects, but they are required to ultimately be declarative T/F statements. There are whole chapters in books on philosophical logic dealing with the question ‘what is a proposition’.)

]]>A proposition is just a statement, such as “a = 1”, but it could also be a statement such as “a is in the interval [0,1]”. If the posterior of your Bayesian model + background knowledge + large quantities of data assigns uniform(0,1), then it seems you consider this to be a fault: that there should be an underlying atomic proposition of the form a=X for some single X which gets p(a=X | Background, LargeQuantitiesOfData) = 1.

But there is nothing in the axioms which requires a proposition of this form a=X to get p=1. In fact, often this doesn’t occur. It seems as if you find this an illegitimate thing because deep inside it all “there can be only one” in other words, there is an equality proposition which we would all agree to assign truth to, if we just had enough knowledge K. I find that odd, particularly considering that we actually know ahead of time “all models are wrong, some are useful”.

anyway I think we’ve gone far enough for now. Thanks for playing again :-)

]]>(I’m assuming the usual definition of a proposition as a declarative sentence that is either true or false)

]]>(unless you have a different definition of proposition and associated Cox derivation?)

]]>Anyway, my general argument is

a) Cox-Jaynes is not the unique unobjectionable logical system for reasoning about uncertainty.

b) One probably shouldn’t want one anyway

]]>> assigns real numbers to [Boolean] propositions.

It’s right there.

]]>In particular, I think your assumption of an underlying 2 valued logic gets cast into Cox/Bayes as an additional axiom that goes something like:

“There exists a unique state of knowledge K which contains all possible correct scientific facts about the real world, and in each model, an *atomic* proposition A about the parameter vector of the model, such that p(A | K) = 1.”

And this is a *strong* axiom about *Science* that is *definitely outside* the Cox axioms. Also I think it’s obviously wrong (in particular, there is basically just one correct model of the whole universe, involving some kind of Quantum particle reality, and none of our scientific models are equal to this model, so given K every actual A we might come up with has p(A|K) = 0 ).

]]>“BTW probability is not a multi-valued logic, it assigns real numbers to propositions built on single-valued logic. So there is an underlying single-valued logic.”

Formally this is nowhere in the axioms. Probability assigns real numbers to propositions. Period. Restricted to the case where it assigns p(A) = 1 it also assigns p(~A) = 0 and the like, yes. But there is no underlying 2 valued logic required by the axioms.

]]>You might be interested in Halpern’s ‘Reasoning about uncertainty’ https://mitpress.mit.edu/books/reasoning-about-uncertainty

]]>I’ve read Van Horn – I think it begs all the same questions.

BTW probability is not a multi-valued logic, it assigns real numbers to propositions built on single-valued logic. So there is an underlying single-valued logic. It is the ‘plausibility that this is true’ not plausibility replacing T/F.

]]>ojm:

Cox Bayes only says essentially that “when your knowledge gives you complete certainty, the mathematics is the same as Boolean logic” it doesn’t say that “there is a really and truly, actually true value in every question and we just don’t know what it is”

See Kevin Van Horn’s R2 http://ksvanhorn.com/bayes/Papers/rcox.pdf

it requires only compatibility with Boolean logic, not equivalence to it in the limit of infinite data for example.

I think outside the Bayesian machinery you can claim “I will only accept a model as scientifically meaningful if there really and truly is one parameter value that is True ™” but if you do this, this is externally imposed by you, not by a sufficient set of Cox axioms.

Suppose in the limit of infinite data you come up with p(a = 1) = 0.8 and p(a = 1/100) = 0.2 does this mean your application of Bayes was illegitimate, it failed to satisfy some of the Cox axioms? No. No axiom is violated.

]]>This latter reasoning involves no probability. And no, restricting your initial search to a range is not the same as imposing a prior _probability_ distribution.

]]>When restricted to parameters in eg an ode your propositions are eg parameter = 1, parameter = 1.1 etc.

In a context where multiple parameters could be ‘correct’ (or all are incorrect) then logically you are saying it has to be the case that either parameter = 1 or parameter != 1. But then you are saying that it can’t be the case that both parameter = 1 and parameter = 1.01 etc. Or even parameter = 10. This can be thought of as an identifiability constraint.

An alternative is to say ‘I will consider the parameters in some range. For each individual value I will evaluate its consistency with the data (eg via its implied predictions). I make no assumption that only one value can be the right answer’.

]]>ojm: if in pseudocode Stan I do:

ypred = predict_ode_results(paramvec);

y ~ normal(ypred,1);

What are the “propositions” that I’m considering? They are basically of the form

particular value of paramvec makes the vector of observations y be in a ball of a given size around ypred

There’s no “negation of the ode” it’s all propositions about particular parameter vectors and the resulting nearness to the ball of given size
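Here is a Python stand-in for that pseudocode, offered as a sketch under stated assumptions rather than as what Stan does internally. The ODE is taken to be dy/dt = -k*y with y(0) = 10, solved in closed form; `predict_ode_results` is a hypothetical helper; the prior is flat over a grid of k values; and the data are simulated with k = 0.7.

```python
import math
import random

random.seed(2)

def predict_ode_results(k, times, y0=10.0):
    # Hypothetical stand-in: closed-form solution of dy/dt = -k*y.
    return [y0 * math.exp(-k * t) for t in times]

times = [0.5 * i for i in range(11)]
true_k = 0.7  # assumed value used only to simulate the observations
y = [yp + random.gauss(0.0, 1.0)
     for yp in predict_ode_results(true_k, times)]

def log_lik(k):
    # y ~ normal(ypred, 1), up to an additive constant: each value of k
    # is scored by how near its prediction lands to the observations,
    # measured on the declared error scale sigma = 1.
    ypred = predict_ode_results(k, times)
    return sum(-0.5 * (yi - ypi) ** 2 for yi, ypi in zip(y, ypred))

grid = [0.01 * i for i in range(1, 201)]   # flat prior over k in (0, 2]
log_post = [log_lik(k) for k in grid]
m = max(log_post)
weights = [math.exp(lp - m) for lp in log_post]  # subtract max for stability
total = sum(weights)
post = [w / total for w in weights]

k_map = grid[post.index(max(post))]
print("most probable k on the grid:", k_map)
```

Every entry of `post` answers a proposition of the form "this k puts y within the declared ball around ypred"; nothing in the computation refers to a negation of the ODE itself.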

I think this makes things more explicit about the object and observer logic. Why the particular size ball? Because in observer logic we believe this is the right measurement error size. Why the particular parameter values handed to us by Stan? Because in the object logic (the bayesian model) it best satisfies the declared requirements.

The Bayesian model doesn’t give us truth in the observer sense (the scientific sense) it gives us truth in the restricted object-model sense (best satisfies the formal requirements of the model).

If the model is bad, GIGO

]]>See linguistic reclamation/reappropriation

]]>FYI – another good Dover book on logic that gets at some of the subtle issues at play is Topoi: The Categorial Analysis of Logic by Robert Goldblatt.

]]>“…it is important to review the model to make sure it fits the data. If it does not, learn as to why it doesn’t work and make the necessary adjustments to reevaluate the model.”

GS: Such practice can lead to reasonable prediction, to be sure. Ptolemy was good at that, too, if you remember. Not all prediction is science even though prediction is a goal of science along with control and interpretation of complex cases. Simply continuously modifying “models” post-hoc is not science though it is frequently done in the name of science. Or am I missing something?

]]>Andrew,

> there are people who call themselves Bayesians

why not just “there are Bayesians”?

]]>ojm: Hmm, here’s what I was thinking of:

Suppose I specify a model for some science (toothbrushing or whatever) and I collect some data. I am committed to the idea that I want my analysis of this data and model together to have a certain consistency. I take Cox’s axioms as being sufficient for the consistency properties that I want. So, I choose priors over parameters that express my scientific knowledge, and conditional probabilities over data given my priors, and I code it in Stan and I get some results. Within this logic, everything I’ve done is consistent. But there are scientifically true facts which are nevertheless not provable within the Bayesian model, namely, for example “my model sucks and it doesn’t explain what actually happens very well”. Within the model, there is no way to examine this question, so I need to examine it outside the Bayesian model.

One method is model expansion. Andrew advocates this often. Now, our model is more flexible and can explain more things, so there might be a region of parameter space where in fact the fit is such that, outside the model, I will say “my model doesn’t suck, and does explain things pretty well”.

In the absence of that, I may need to simply try alternative models. How do I decide what to try?

In his book “Mathematical Logic” (see, look what you’ve done, now I’ve bought another great Dover Kindle book) Stephen Kleene introduces the subject by asking:

“Now we are proposing to study logic, and indeed by mathematical methods. Here we are confronted by a bit of a paradox. For, how can we treat logic mathematically (or in any systematic way) without using logic in the treatment? The solution of this paradox is simple, though it will take some time before we can appreciate fully how it works. We simply put the logic that we are studying into one compartment, and the logic that we are using to study it in another. Instead of “compartments”, we can speak of “languages”. When we are studying logic, the logic we are studying will pertain to one language, which we call the object language… Our study of this language and its logic… we regard as taking place in another language, which we call the observer’s language. Or we may speak of the object logic and the observer’s logic.”

The scientific “goodness” of a Bayesian model must be evaluated at an observer level, but once the model has been formalized out of the scientific knowledge, the relative goodness of the various unknown parameter values conditional on the data is precisely formalized through the sum and product rules of Bayesian probability theory in such a way as to give you an answer that has certain strong logical consistencies. For example, you will never get a result “the expected brushing time of kindergartners is -3 minutes” because this is excluded in the prior.
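To make the “excluded in the prior” point concrete, here is a minimal sketch using a grid approximation in plain Python. Everything specific in it is an assumption for illustration only: the made-up brushing times in `data`, the known `sigma`, and the Exponential(1) prior on the mean `mu`. The point is just that wherever the prior is zero, the product rule keeps the posterior at zero, no matter what the data say.

```python
import math

# Hypothetical brushing times in minutes (made-up illustration data).
data = [1.5, 2.0, 0.5, 1.0, 2.5]
sigma = 1.0  # assumed known measurement standard deviation

# Grid over the mean brushing time mu, deliberately including negative values.
grid = [i / 100 for i in range(-300, 601)]

def prior(mu):
    # Assumed prior: Exponential(1) on mu > 0, exactly zero for mu <= 0.
    return math.exp(-mu) if mu > 0 else 0.0

def likelihood(mu):
    # Normal likelihood with known sigma, up to a constant factor.
    return math.exp(-sum((y - mu) ** 2 for y in data) / (2 * sigma ** 2))

# Product rule: posterior is proportional to prior times likelihood.
unnorm = [prior(mu) * likelihood(mu) for mu in grid]
total = sum(unnorm)
posterior = [w / total for w in unnorm]

post_mean = sum(mu * p for mu, p in zip(grid, posterior))
mass_nonpositive = sum(p for mu, p in zip(grid, posterior) if mu <= 0)

print("posterior mean of mu:", post_mean)
print("posterior mass on mu <= 0:", mass_nonpositive)
```

Whether an exponential prior is a sensible statement of scientific knowledge about brushing times is exactly the kind of question that has to be settled at the observer level, outside this calculation.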

]]>…both in…

]]>I agree with the have-your-cake-and-eat-it-too nature of a lot of this. Both in the positive sense and the negative sense, appropriately…

]]>Leon:

Heckman and Singer wrote, “Bayesians have no way to cope with the totally unexpected.” The term “Bayesian” refers to people who use Bayesian statistics, no? BDA is a 22-year-old book, and Heckman and Singer’s article came out just this year. If they’d said something like, “25 years ago, most Bayesians had no way to cope with the totally unexpected,” then, sure, fine. But that’s not what they said. They did not criticize the “orthodoxies of the past.” They used the present tense.

As a more technical matter, I disagree with your claim that Bayesian model checking as discussed in chapter 6 of BDA is “non-Bayesian.” What we do is completely Bayesian: it’s a working out of implications of the posterior distribution. You can call it modern Popperian (in that we are using the implications of the model to decide to (probabilistically) reject it), you can call it Lakatosian (in that we’re using problems in our model to motivate improvements), you can call it Cantorian (in that we recognize the impossibility of laying out all possible model choices ahead of time), but in any case it’s Bayesian in its use of the posterior distribution. You can read my 2003 paper for elaboration of this point.
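As a rough illustration of what “working out implications of the posterior distribution” can look like, here is a minimal sketch of a posterior predictive check in plain Python. Everything specific in it is an assumption for illustration: the made-up data `y` (with one extreme value), the normal model with known `sigma`, the flat prior on `mu` (under which the posterior for `mu` is normal with mean `ybar` and standard deviation `sigma / sqrt(n)`), and the choice of the maximum as the test statistic `T`. It is not code from BDA.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Made-up data: mostly near zero, plus one extreme observation.
y = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 8.0]
sigma = 1.0  # assumed known standard deviation in the model
n = len(y)
ybar = statistics.fmean(y)

def T(sample):
    # Test statistic: the largest observation.
    return max(sample)

n_sims = 2000
exceed = 0
for _ in range(n_sims):
    # Draw mu from its posterior (flat prior, known sigma).
    mu = random.gauss(ybar, sigma / n ** 0.5)
    # Draw a replicated dataset from the model given that mu.
    y_rep = [random.gauss(mu, sigma) for _ in range(n)]
    if T(y_rep) >= T(y):
        exceed += 1

# Posterior predictive p-value: fraction of replications at least as extreme.
p_value = exceed / n_sims
print("posterior predictive p-value for max(y):", p_value)
```

A p-value near zero here says that replicated data from the fitted model almost never produce a maximum as large as the observed one, which is a reason to go back and improve the model rather than a statement about any parameter inside it.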

Of course, that’s just me talking. You have as much of a right to call me non-Bayesian as I have to call myself Bayesian. And book sales aren’t everything. But to the extent that words are defined in part by their use, I think it’s fair, when talking about what “Bayesians have no way to cope with,” to recognize that the tens of thousands of people who’ve read our book *do* have a way!

(1) Bayes = inference = adjusting a distribution over a hidden quantity by conditioning on what’s observed.

Opposed to this is

(1′) Non-Bayes = testing/checking = considering *separately* different possible values of a hidden quantity, & how well each fits what’s observed (or predicts a different hidden quantity). Model checking, p-values, and confidence intervals are all instances of this.

On the other hand, in the broader sense, “(modern) Bayesian” is used to mean

(2) The applied statistical philosophy of the BDA authors, i.e.: use (Bayesian) inference (1) liberally for parameter values, and (non-Bayesian) testing/checking (1′) for models.

Opposed to this might be:

(2′) Use Bayes for everything — all hidden quantities are either assumed fixed or given a distribution. Never consider multiple models without adjusting a distribution over them.

or

(2”) Never model/”treat as random” a quantity you don’t observe.
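To make the (1) versus (1′) distinction concrete, here is a small sketch in plain Python for a coin with unknown heads probability. The numbers (`n = 10` flips, `k = 7` heads), the grid `thetas`, the uniform prior, and the 0.05 cutoff are all assumptions chosen for illustration. Part (1) adjusts one distribution over all candidate values by conditioning on the data; part (1′) takes each candidate value on its own and asks how well it fits.

```python
from math import comb

n, k = 10, 7  # made-up data: 7 heads in 10 flips
thetas = [i / 20 for i in range(1, 20)]  # candidate values 0.05, ..., 0.95

def binom_pmf(j, n, theta):
    # Probability of j heads in n flips when P(heads) = theta.
    return comb(n, j) * theta ** j * (1 - theta) ** (n - j)

# (1) Inference: one distribution over theta, updated by conditioning on k.
weights = [binom_pmf(k, n, t) for t in thetas]  # uniform prior on the grid
total = sum(weights)
posterior = [w / total for w in weights]

# (1') Checking: each theta considered separately, with no distribution over theta.
def p_value(theta):
    # Two-sided p-value: total probability of outcomes no more likely than the observed one.
    p_obs = binom_pmf(k, n, theta)
    return sum(
        binom_pmf(j, n, theta)
        for j in range(n + 1)
        if binom_pmf(j, n, theta) <= p_obs * (1 + 1e-9)
    )

# Values of theta not rejected at the 0.05 level form a confidence set.
not_rejected = [t for t in thetas if p_value(t) >= 0.05]

print("posterior mode on grid:", thetas[posterior.index(max(posterior))])
print("thetas not rejected at 0.05:", not_rejected)
```

The posterior weights sum to one across the grid, so raising the weight on one value lowers it on others; the p-values are computed independently for each value and carry no such constraint.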

The real question seems to be: should we use the word “Bayesian” primarily to describe a way to treat hidden quantities, or primarily to describe a (fairly ecumenical/moderate) applied statistical philosophy? I think some strong arguments in favor of the first option are

– Consumers of statistics need to understand the drastically different interpretations of (1) vs (1′). So preserving that distinction is important. And IMO both have their place in practice. One should be both Bayesian and non-Bayesian.

– The distinction between model (collections) and parameter (values) is extremely superficial and at times pedagogically confusing.

– Calling an applied philosophy “Bayesian” that is actually pretty ecumenical smacks to me of gerrymandering. It’s like the opposite of “no true Scotsman”: “of course *modern* Bayesians are willing to use model checking”. Ok, so why not just acknowledge that a BDA-ish philosophy is straightforwardly broader than the orthodoxies of the past?

]]>According to the Gödel metaphor then, one should be non-Bayesian _within_ the model and Bayesian _outside_ the model, since only _externally_ (e.g. ‘God’s eye view’) do we have access to all true/false statements.

]]>Jadagul: Yes, exactly. The Gödel sentence is obviously true, but only in the metamathematics of the system; within the system it’s unprovable.

]]>One implication of _data analysis_ seems to me to be that there is no such thing as a confirmatory analysis…

Bayes without step 3/data analysis is exactly a logic of confirmation…

]]>