Deborah Mayo quotes me as saying, “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive.” She then follows up with:

Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree.

Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian?

My reply:

I was influenced by reading a toy example from Jaynes’s book where he sets up a model (for the probability of a die landing on each of its six sides) based on first principles, then presents some data that contradict the model, then expands the model.
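That check-then-expand loop can be sketched in code. The following is a minimal, hypothetical version (the die weights, sample size, and critical value are my choices for illustration, not Jaynes's actual numbers): posit a uniform model from first principles, observe rolls from a die that is in fact biased, and let a simple chi-square discrepancy signal that the model needs expanding.

```python
import random
from collections import Counter

random.seed(0)

# "First-principles" model: each face equally likely.
model_probs = {face: 1 / 6 for face in range(1, 7)}

# Hypothetical data from a slightly biased die (an assumption for
# illustration; heavier faces land up more often).
faces = [1, 2, 3, 4, 5, 6]
weights = [1.0, 1.0, 1.0, 1.0, 1.3, 1.6]
n = 600
rolls = random.choices(faces, weights=weights, k=n)

# Chi-square discrepancy between observed counts and model expectations.
counts = Counter(rolls)
chi2 = sum((counts[f] - n * p) ** 2 / (n * p) for f, p in model_probs.items())

# Critical value for chi-square with 5 df at the 5% level is about 11.07.
print(f"chi-square statistic: {chi2:.1f}")
if chi2 > 11.07:
    print("data contradict the uniform model -> expand the model")
```

The point is the workflow, not the test: when the data push back, the response is not to discard modeling but to build a richer model (e.g., unequal face probabilities) and check again.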

I’d seen very little of this sort of reasoning before in statistics! In physics it’s the standard way to go: you set up a model based on physical principles and some simplifications (for example, in a finite-element model you assume the various coefficients aren’t changing over time, and you assume stability within each element), then if the model doesn’t quite work, you figure out what went wrong and you make it more realistic.

But in statistics we weren’t usually seeing this. Instead, model checking typically was placed in the category of “hypothesis testing,” where the rejection was the goal. Models to be tested were straw men, built up only to be rejected. You can see this, for example, in social science papers that list research hypotheses that are *not* the same as the statistical “hypotheses” being tested. A typical research hypothesis is “Y causes Z,” with the corresponding statistical hypothesis being “Y has no association with Z after controlling for X.” Jaynes’s approach—or, at least, what I took away from Jaynes’s presentation—was more simpatico to my way of doing science. And I put a lot of effort into formalizing this idea, so that the kind of modeling I talk and write about can be the kind of modeling I actually do.

I don’t want to overstate this—as I wrote earlier, Jaynes is no guru—but I do think this combination of model building and checking is important. Indeed, just as a chicken is said to be an egg’s way of making another egg, we can view inference as a way of sharpening the implications of an assumed model so that it can better be checked.

P.S. In response to Larry’s post here, let me give a quick +1 to this comment and also refer to this post, which remains relevant 3 years later.

P.P.S. See here for years and years of Popper-blogging. And here’s my article with Shalizi and our rejoinder to the discussion.

Andrew, your first link (the one that references Larry’s post “here”) is a link to this article, not to Larry’s post (wherever that is). This is the link you gave:

http://statmodeling.stat.columbia.edu/2013/09/03/popper-and-jaynes/

Fixed; thanks.

As I wrote in a comment on my blog:* “there’s no warrant to infer a particular model that happens to do a better job fitting the data x–at least on x alone. Insofar as there are many alternatives that could patch things up, an inference to one particular alternative fails to pass with severity. I don’t understand how it can be that some of the critics of the (bad) habit of some significance testers to move from rejecting the null to a particular alternative, nevertheless seem prepared to allow this in Bayesian model testing. But maybe they carry out further checks down the road; I don’t claim to really get the methods of correcting Bayesian priors (as part of a model)”.

*http://errorstatistics.com/2013/08/31/overheard-at-the-comedy-hour-at-the-bayesian-retreat-2-years-on/

Mayo:

I’m not using model checks to “infer a model.” I’m using model checks to examine possible problems with a model that I’d already like to use.

Andrew, but don’t you then tweak your model, possibly rather slightly, as a result of those model checks? I take that as the meaning of “alternatives that could patch things up.”

Mark:

Sometimes I tweak, sometimes I’m not quite sure what to do and I stick with what I have until I can figure out where to go next. Keith put it well in his comment below when he wrote, “you stop either when you no longer (seriously) doubt the adequacy of the model (representation) for its purposefulness or you are really in doubt as to how to improve it.” I’m often in that “really in doubt” position.

… this might be a good place to ask a question I have been wondering about for a while…

When people compute p-values they generally seem to want to reject the null. And the rejection of the null is reported as a finding.

I assume when p-values are computed for a posterior predictive check that the model-building process is continued until the statistician fails to reject the null at some level. Then the failure to reject the null is reported… This seems to be quite a big difference… Is that a fair comment?…

Thanks for any clarification / answers….

David:

No, it is standard practice to find that a model does not fit the data but to stick with the model, perhaps for computational reasons or because a more complicated model would be too difficult to understand at this time.

Presumably Andrew’s sticking with models is for reasons such as their making better predictions on out-of-sample data than anything else we have. In situations like this, knowing where a model’s predictions go wrong can in and of itself be useful in practical applications of the model, as well as in guiding future model development. You often see this called “error analysis” in machine learning.

You see this overall “model choice” behavior all the time in machine learning applications, where the focus is almost exclusively on predictive accuracy for new data, and nobody is trying to make a scientific hypothesis about the model itself, except perhaps that it’s epsilon better than the last person’s model at predictions and perhaps faster to estimate.

To answer Deborah Mayo’s question, you see “prior tuning” in this situation using cross-validation. So it’s not a proper prior in the strict sense to begin with, but more of a poor man’s hierarchical model. The other place you often see tuning of “priors” is when there’s computational instability from a very vague prior, though we’re working on this on two fronts. One, Andrew’s trying to convince people to use what he calls “weakly informative priors,” and two, we’re trying our best to eliminate the need for this kind of fiddling for purely computational reasons in Stan.
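Bob’s “poor man’s hierarchical model” point can be made concrete with a small sketch. Everything here is hypothetical (the one-predictor normal model, the tau grid, and the simulated data are mine, not anything from Stan or the thread): treat the scale of a Normal(0, tau²) “prior” on a regression coefficient as a tuning parameter and choose it by cross-validated predictive squared error.

```python
import random

random.seed(3)

# Hypothetical data: y ~ Normal(b*x, 1) with true b = 0.7.
n = 60
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.7 * xi + random.gauss(0, 1) for xi in x]

def ridge_fit(xs, ys, tau):
    # Posterior mean of b under a Normal(0, tau^2) prior and unit noise:
    # the ridge estimate sum(x*y) / (sum(x^2) + 1/tau^2).
    return sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + 1 / tau**2)

def cv_error(tau, k=5):
    # k-fold cross-validation of out-of-sample squared error.
    err = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        b = ridge_fit([x[i] for i in train], [y[i] for i in train], tau)
        err += sum((y[i] - b * x[i]) ** 2 for i in test)
    return err / n

# "Tune the prior" over a grid of scales.
taus = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
best_tau = min(taus, key=cv_error)
print(f"cross-validated prior scale tau = {best_tau}")
```

The design point is exactly the one in the comment: a prior chosen this way is not a prior in the strict subjective sense but a regularization parameter, i.e., a hand-rolled stand-in for a hierarchical model.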

While I agree that the focus of modelling should not be exclusively on predictive accuracy, I feel that in a lot of academic modelling exercises predictive accuracy gets less focus than it deserves.

The underlying scientific hypothesis sure is important, but I feel it ought to be a second-order criterion used when comparing models of fairly close predictive power.

What irks me is the tendency to accord too much importance to the elegance of the model itself, or often its mathematical features or asymptotic behavior.

I think you stop either when you no longer (seriously) doubt the adequacy of the model (representation) for its purposefulness or you are really in doubt as to how to improve it.

But I also thought a lot about this long before I took my first statistics course, and my favourite question then was “what makes a good representation (model)?” When I went into statistics one of my first impressions was “they always try to evaluate a model where it’s obviously not appropriate and then conclude – don’t ever use that model!”

Most of my early thinking came from CS Peirce and his 1,2,3ing –

1. This might be a good model for that

2. That model implies various things, but the brute reality encountered pushes back against many of them, implying/suggesting the model is actually not very good

3. Given 1 and 2, this other model _should_ be a better model for that (back to another round of 1,2,3ing)

And you stop when you achieve the face validity described above – so Mayo is correct in pointing out it (usually) is not (all) deductive (but some deduction is involved in step two).

I also think Larry’s post has some validity: some folks are overly enthusiastic and dismissive of those who don’t think like them (and sometimes very rude). You often do see a kind of Bayesian cheerleading in introductory Bayes courses that to me is like providing _kids_ with a couple days of training on bicycles (with training wheels), assuring them bicycles always work wonderfully, and then sending them out into traffic (without training wheels).

Keith,

I thought Larry’s post had validity too, but (a) I thought it was funny when he wrote, “I don’t think anyone’s career was ever adversely affected by being a Bayesian. I don’t know of a single example”; and (b) I’m not a fan of the use of “religion” as a term of insult.

I certainly got point (a) from George’s comment, and for point (b), I thought it was funny that Lindley used the word “sacrosanct” for the likelihood principle (though he did retract the religiosity), given that Mike Evans seems to have shown it’s just a misconceived proof (valid but unlikely acceptable): http://arxiv.org/abs/1302.5468

I show something stronger: that the argument is unsound: http://errorstatistics.com/2013/07/26/new-version-on-the-birnbaum-argument-for-the-slp-slides-for-jsm-talk/ (Link to the full paper is there).

David, yes, this struck me too at some point. Obviously a non-significant p-value is not a confirmation of a model in the same way that a significant one is evidence against it. The logic seems to be, “OK, we use it because we have no evidence against it” (and no better idea, I guess), and how good this is, I think, depends strongly on whether we looked in the right corner for reasons to reject. (“Bad” model deviations are those that will make the model, or the methodology based on the model, lead us to wrong conclusions regarding what we’re interested in; some aspects of the model are usually harmless and others are not, depending on what we use the model for.) And on how robust our method is against unobserved/unobservable violations of the model.

One needs to take explicit account of the violations (type and magnitude) that could not have been uncovered by the given probe.

This question seems a bit misconceived, and I’m not sure whether to be charitable and say it’s due to unfamiliarity or whether you are trying to lay a trap that is only going to be visible to you.

The null is false. The model is wrong. There are precious few situations in real life where this isn’t – clearly and a priori – so. Reporting “failure to reject the null” would be only a comment on your experimental technique, since we know the null is false. Iterating until you can’t reject it is, likewise, a statement about your experiments and how well they worked, and the sample size and whatever, but doesn’t say much by itself about the null (we know it’s not true). Some might say that working the data over and over again risks increasing the odds of a type-1 error (this is why your statement seems like a bit of a trap – but one only someone in the trap already would appreciate), because in the real world this is a non-issue. An entirely data-free statement “the null is false” has – outside some really strange cases – a type-1 error of zero. We always know, to any practical certainty, that the model is wrong.

Recognizing the model is wrong (it IS wrong) lets one focus on the question of “OK, but is it _too_ wrong, for such-and-such a purpose?”. This question may well make iteration on hypothesis testing (something our host likes but I don’t appreciate) reasonable. But to then report “we failed to reject the null”? That’s crazy; we can almost always give data-free and compelling arguments why the null is not just not-rejected but indeed false. Why would we invoke data and statistics to make an even murkier statement than what is already obvious?

Thanks for the answers. Maybe there was some ambiguity in the question but Christian seemed to get what I was thinking about…

bxg… I am a bit confused about the laying-a-trap comment… I was only using the standard language of p-values, but noting that they are usually used in a falsificationist sense to find evidence against a model, not something you usually want to do in a Bayesian setting (at least at the end point, where you arrive at a model you want to keep).

I didn’t express any opinion on “all models are wrong” or p-values but FWIW…

I think plotting P(y_rep|y) against real data is a really good heuristic, although I am doubtful that a satisfying higher-level philosophical explanation can be given for this step. In general it need not be a problem if most samples from P(y_rep|y) don’t look like y; equally, it doesn’t rule out problems if they do. Similarly, computing a Bayesian p-value might or might not be useful; I am in favour of Prof Gelman’s graphical-test approach as a heuristic for model exploration.
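For concreteness, that kind of posterior predictive check can be sketched in a few lines. This is a hypothetical beta-binomial example of my own (the data, the Beta(1, 1) prior, and the longest-run test statistic are illustrative, not anything from the thread): simulate replicated data sets y_rep from the posterior predictive distribution and ask how often T(y_rep) is at least as extreme as T(y).

```python
import random

random.seed(1)

# Hypothetical data: 20 binary trials, 17 successes.
y = [1] * 17 + [0] * 3
n, s = len(y), sum(y)

def longest_run(seq):
    # Test statistic T: length of the longest run of identical outcomes.
    best = cur = 0
    prev = None
    for v in seq:
        cur = cur + 1 if v == prev else 1
        prev = v
        best = max(best, cur)
    return best

t_obs = longest_run(y)

# Beta(1, 1) prior on the success probability; posterior is Beta(1+s, 1+n-s).
# Draw y_rep from the posterior predictive and compare T(y_rep) with T(y).
exceed = 0
n_rep = 2000
for _ in range(n_rep):
    theta = random.betavariate(1 + s, 1 + n - s)
    y_rep = [1 if random.random() < theta else 0 for _ in range(n)]
    if longest_run(y_rep) >= t_obs:
        exceed += 1

ppp = exceed / n_rep  # posterior predictive p-value
print(f"T(y) = {t_obs}, posterior predictive p-value ~ {ppp:.2f}")
```

In practice one would plot a handful of y_rep draws next to y rather than rely on the single p-value; the number summarizes what the graphical check shows.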

There are two possible interpretations of “all models are wrong” and both are contestable. A model might represent:

1) a coherent person’s point of indifference to decisions

or

2) long-run frequency behaviour, i.e., does the histogram resemble the model pdf?

for 1) where the model ‘models’ a coherent Bayesian

Particularly if imprecise probability is used I think a model encoding 1 could be correct. I don’t think it is clearly wrong to say probability of heads is between 0.4 and 0.6 or even to say it is 0.5. In a two coin tossing scenario I think it is _very_ hard to argue against (say) P(Heads,Heads) >= P(Heads,Tails).

for 2) where the model ‘models’ a long run frequency.

There is the obvious argument that unless the model assigns zero to an event that did occur, in what sense does it make sense to say the model is wrong?… Also there are fundamental objections to the frequency interpretation of probability, memorably described by David Freedman (quoting Keynes and Heraclitus out of context) as “In the long run we are all dead” and, even more devastatingly, “you can never step in the same river twice”. These problems with the interpretation make it difficult to claim the model makes sense, let alone is right or wrong.

When people say “all models are wrong” I am usually unclear what is meant and why.

David:

You write, “When people say ‘all models are wrong’ I am usually unclear what is meant and why.”

I can’t be sure what others mean by that statement, but when I write it, I mean that models like this logistic regression model for vote intention and this differential equation model for toxicology and this model for responses in a lab experiment and this model for dilution assays and just about every other model I’ve ever fit are wrong. They are full of approximations; there’s no way they could be otherwise.

Andrew: Thanks for the response!

As I said there are two meanings for model, so I am still unclear about which one you mean.

Do you mean these models don’t encode your subjective probabilities?… and do you really believe it is impossible to construct a model that does?

..or..

Do you mean that the model doesn’t ‘fit the real world’ in some sense?.. Surely these models did not assert the probabilities of events that did occur to be zero, so in what sense are they wrong?…

I guess that a number of Bayesians see themselves as neither personalist nor frequentist and wouldn’t subscribe to either your 1 or your 2. But perhaps that’s for such people to explain.

Anyway, from the point of view of a frequentist, don’t you think that what you cited from Freedman is a good explanation? Actually there are a number of levels on which a typical frequentist model can be wrong.

a) As Freedman says, there is no infinite repetition (neither infinite, nor repetition, actually).

b) Sometimes data have obvious features that disagree with the model (eg, continuous models for not really continuous data; infinite value range).

c) We may believe (for example in the sense of N. Cartwright’s “Dappled World”) that what is really going on is just a bit more complex than the simple straightforward thinking going into our models. Even in the simplest cases: Are coin tosses really, really 100.000% exchangeable? Quite certainly not.

From the point of view of a personalist Bayesian it should be in some sense a similar story: one could spend ages finding out that, in terms of the researcher’s thinking or the available information, the chosen prior doesn’t exactly reflect the researcher’s prior uncertainty (exchangeability is again a good example).

…actually, such a model may be correct and not wrong, for sufficiently simple minded people…

You may be interested in my view on these things, which is here:

C. Hennig: Mathematical Models and Reality – a Constructivist Perspective. Foundations of Science 15, 29-49 (2010).

Free preprint from my webpage: http://www.homepages.ucl.ac.uk/~ucakche/papers/modelpaper.pdf

Thanks for the answer, Christian. I have had a skim of that paper both now and previously; I want to take time to read it carefully in the near future. Some of the material is challenging to me.

I am reasonably comfortable with saying “some models are wrong,” with the caveat that you need to be able to say what a model is and what it means for it to be right or wrong. I think Freedman’s comments point to the difficulty of defining with clarity what a frequentist model is.

I think if you adopt an operational subjective approach it is (reasonably) clear what a model is. As you say, a model might be right for simple-minded people. I would add it might also be right for simple situations. I think a case can be made that complex people, modelling complex situations, could use an imprecise probability model that is not wrong. Finally, imagine somebody considers data relevant to a complex real-world problem, then models it in some way, maybe in a non-Bayesian way, certainly without specifying elicited subjective probabilities over the full partition; then they arrive at a decision and act. In what sense is a model of their beliefs, one that states only that their chosen decision is the best one (on the basis of their previous information and beliefs), wrong?

Exchangeability is, I think, worried about too much, at least at the philosophical level (maybe too little at the applied level). For simple situations I think it can easily hold absolutely, i.e. for sequences of length 1 or 2. For realistic (even very long) sequences I think it isn’t perfect, but actually isn’t far off, and I think your point c) is applicable; again this might be fixable with imprecise probability…

From the operational subjective perspective I have taken “model” to mean the probability specification over observables with all the parameters integrated out. A frequentist clearly would refer only to the model indexed by a parameter. Another problem emerges if the model is very rich (or rich relative to the problem): how can it be wrong to model a single instance of a categorical variable with a discrete distribution? Similarly, how can a non-parametric model be wrong except in the i.i.d. assumption? If you are willing to entertain a non-parametric non-i.i.d. model, you are really willing to entertain a model that a Popperian would see as unfalsifiable, and hence it is difficult to say what it could mean to say such a model is wrong.

I actually think that people who use the line “all models are wrong” are usually pretty cluey but there is really a lot of underlying ambiguity in that statement.

David:

I don’t have subjective probabilities to be encoded. To me, a model is a mathematical description of reality. It captures some aspects of reality but not all. For example, in the logistic regression example I linked to, I don’t think that if I had all the data for all population subgroups, that this would be quite characterized by the logistic regression I am assuming.

So is it correct to say that your interpretation of the prior or posterior predictive distribution is a reflection of what is not captured in the model?

I’m still not sure how to make such a definition precise. I can see how that definition works in the predictive distribution of outcomes (ie simulating observable outputs of the model from samples of the posterior predictive distribution), but I don’t see how that definition works with regard to the prior/posterior predictive distribution on internal model parameters (e.g. the coefficient in a logistic distribution).

Andrew: Is characterisation really the goal? I find this a dubious end goal for two reasons: firstly what could possibly characterise the dataset better than the dataset itself?… secondly for what reason would you want to characterise a dataset?… When there is no uncertainty what role is there for probability at all?

I think a more reasonable goal is to be as articulate as possible about uncertainty in order to guide decision making.

If your goal was only to characterise the data, I think as a consequence you will struggle to explain either what exactly this means or why you should do it. When the goal of modelling is so unclear, how can it make any sense to say “all models are wrong”?

As a final note… It brings me quite a bit of discomfort to find myself disagreeing with you even on what is ultimately a small, and perhaps to some pedantic, philosophical issue… I am in general deeply impressed with your contributions as well as your generosity and interest in interacting with other researchers.

David:

Perhaps I see the confusion. You write, “what could possibly characterise the dataset better than the dataset itself? … for what reason would you want to characterise a dataset?” But my purpose in modeling is not to “characterise a dataset.” My purpose is to form a link between data and population (or, between data and the unknowns that I want to predict). The model necessarily goes outside the data in some way. In my logistic regression example, the model is about the entire population of American voters. In my differential-equation example, the model is about the concentration of a compound in each of several compartments within the body. And so on.

David: Apparently we have exhausted the reply levels so this refers to your posting today 11:56.

As a constructivist I think that making a confident statement that a model is “wrong” can always only refer to comparing the model with the perceptions of reality (including accepted data and what all kinds of other people think) that the person has who makes the statement. There is no difference in this between frequentists and Bayesians of any kind. Frequentists will use models that are more simplistic than how they themselves see the modelled situation. Personalists will use models that are more simplistic than how they themselves see their personal beliefs, when they try to get into them as deeply as they can (the “simple minds” comment was about the lack of will to do this that I sometimes observe). But there are good reasons for using these models anyway.

Which brings me to perhaps the most important point in this discussion, actually more directly relevant to your posting. If people say “all models are wrong,” why do they say this? At least if it’s statisticians who use models all the time, they don’t say it because they want to convince us not to use the wrong models, but rather to stop people from expecting the wrong thing of the models that we are using.

You write: “In what sense is a model of their beliefs, that states only that their chosen decision is the best one (on the basis of their previous information and beliefs)… wrong?”

In the sense that if you ask them whether they really think that this reflects their beliefs 100% they will (hopefully!) say “no”. Which in no way is against saying that it can be useful.

“For realistic (even very long) sequences I think (exchangeability) isn’t perfect, but actually isn’t far off” – it’s still wrong as long as you don’t believe in it 100%, but this shouldn’t stop you from using it. That’s the point.

“All models are wrong” is not about criticising models, but it’s rather expressing that “is the model correct?” is the wrong question to ask.

By the way:

” I think a case can be made that complex people, modelling complex situations, could use an imprecise probability model that is not wrong.”

They’d have to specify upper and lower probabilities where Bayesians only have to specify one, and they’d need to be convinced that all these specifications are precise. That’s actually a bigger task than the one of the Bayesian. Although it is not that big an issue if you realise that “all models are wrong” is not a criticism of the model.

“For simple situations I think (exchangeability) can easily hold absolutely i.e. for sequences of length 1 or 2.”

For length 1 there is nothing to hold, I think, and I don’t see how, if it can hold “absolutely” for length 2, it cannot hold “absolutely” for any length; or, in other words, how any argument that challenges “perfect exchangeability” for large n does not apply to n=2.

“how can a non-parametric model be wrong except in the i.i.d assumption”

That’s tough enough an assumption really, and on top of that you can criticise any further assumption that is made such as smoothness of regression curves etc.

” If you are willing to entertain a non-parametric non-i.i.d. model you are really willing to entertain a model that a Popperian would see as unfalsifiable and hence difficult to say what it could mean to say such a model is wrong.”

– well, perhaps you can’t say that the model is wrong but you can say that it’s useless, because no structure => no target for inference. (On top of that, continuity can still be falsified in a very straightforward manner.)

The whole point is that the question “is the model wrong?” is the wrong question for assessing the quality of a model.

David:

I am putting this here but have read further down.

I find it helpful to think of models as representations – something that stands for something else for someone for some purpose.

When these representations are not self-referential they are wrong (by Heraclitus).

In mathematics representations are self-referential and cannot be wrong: when you sketch a triangle it represents a perfect triangle perfectly (at least for mathematical purposes). Whereas representations of anything in this universe (including your thoughts) cannot be correct (though they can be adequate for certain purposes and “someone,” and very inadequate for other purposes and “someone”).

These ideas are fleshed out here

Paul Forster. Peirce and the Threat of Nominalism, Cambridge University Press, 2011

But the first few chapters (explaining how nominalism is _evil_) seem to put non-philosophy readers off, perhaps because they seem overly philosophical. Also, Peirce was (self-admittedly) wrong about everything: he understood philosophy widely and started everything but finished nothing, with the result that nobody really knows which was the final version (not necessarily the last attempt at revision) of most of his writings, which were never published. He was a practicing statistician, so his insight might be more directly relevant to our concerns.

Andrew: As you are neither in the personalist nor in the frequentist camp, how would you explain what it means that a certain probability model is correct? (Which I think is required in order to say that a certain model is wrong.)

Christian:

I think a model is wrong to the extent that it does not describe reality. For example, our toxicology model describes a person as having four compartments, each of which is in instantaneous equilibrium. Of course this is wrong. For another example, consider a model such as Pr(y=1|x) = logit^(-1)(a+bx), where y is death of a lab rat and x is dose of a toxin (we have such an example in our book). Here I’m also sure the model is false: if we get enough rats and measure the relation between Pr(y=1) and x, of course it won’t exactly be a logistic function.
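That second example can be illustrated with a sketch. Everything below is hypothetical (the plateau at 0.85, the doses, and the rat counts are my inventions, not the data from the book): if the true dose-response curve levels off below 1, then no logistic curve matches it exactly, and with enough rats the misfit shows up when you compare empirical death frequencies to the fitted logistic function.

```python
import math
import random

random.seed(2)

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def true_prob(x):
    # Made-up "reality": plateaus at 0.85 rather than 1, so the logistic
    # model Pr(y=1|x) = inv_logit(a + b*x) is wrong by construction.
    return 0.85 * inv_logit(3 * (x - 1.5))

doses = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
n_per_dose = 500  # "enough rats" for the misfit to be visible
deaths = {x: sum(random.random() < true_prob(x) for _ in range(n_per_dose))
          for x in doses}

# Crude maximum-likelihood fit of the logistic model by grid search.
def neg_loglik(a, b):
    nll = 0.0
    for x in doses:
        p = min(max(inv_logit(a + b * x), 1e-12), 1 - 1e-12)
        k = deaths[x]
        nll -= k * math.log(p) + (n_per_dose - k) * math.log(1 - p)
    return nll

_, a_hat, b_hat = min(((neg_loglik(a / 10, b / 10), a / 10, b / 10)
                       for a in range(-80, 1) for b in range(0, 41)),
                      key=lambda t: t[0])

# Compare empirical frequencies with the best-fitting logistic curve.
for x in doses:
    print(f"dose {x:3.1f}: empirical {deaths[x] / n_per_dose:.2f}, "
          f"fitted {inv_logit(a_hat + b_hat * x):.2f}")
```

The fitted curve is forced toward 1 at high doses while the empirical frequencies hover near the plateau, which is exactly the sense in which the model is false yet may still be adequate for a given purpose.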

So this is a rather frequentist conception of being wrong, referring to observable relative frequencies?

Christian:

As discussed in Chapter 1 of Bayesian Data Analysis, I think that probability can be addressed from several directions, depending on context. In the example of the lab rats, a frequency definition of probability seems to me to make the most sense. In the example of the four-compartment model, the approximate aspects of the model are obvious, and there are various ways that the inexact nature of the model can be shown from data. For example, in our paper we compared model predictions to data from an experiment conducted over 15 minutes, which I think was too small a time period for approximate equilibrium to be attained.

So Andrew suspects “reality” is more complicated than, or not represented exactly in, his current model; his prior probability for this being true is close to 1, and anytime he has had a chance to check that (the prior), if anything its probability has always increased?

(Of course we can’t get beyond our perceptions of reality but we do encounter it (sometimes) brutally in our work and life.)

The stuff you’re talking about came naturally to Jaynes (and to me) because he thought of probabilistic models in ways very different from most statisticians of whatever variety.

Instead of thinking of these as proto-physics models used to model some kind of weird physical force called randomness, he thought of them more as devices for connecting physical facts. Models connect some input facts A with some output facts B. If that connection has a high probability (i.e. the model has a high probability), that was interpreted as saying:

“There are lots of things we don’t know, but for almost every possible value of those unknowns it’s the case that A and B are related in the stated way”.

“Probability” is the natural tool for doing this because the sum and product rules are all you need to count things like “how many unknowns are compatible with a given relationship” and to propagate those counts correctly and coherently. Indeed, it’s provably true that if you don’t use probabilities, you’re going to get the “counts” wrong sometimes.

When you compare the model to reality, either A and B will be related in the stated way and this will be highly robust (almost every possible unknown implies this relationship, after all), or A and B will turn out not to be related in the stated way. If the relationship turns out to be false, then that’s strong evidence that something like a new law or effect of physics/biology/economics/political science or whatever was confining the system to that tiny section of possibilities. Moreover, such new laws/effects are going to be important because they have big effects.

So either way the scientist is in business, either they have a highly robust relationship or they’ve discovered something new and important.

Incidentally, it would be nice if people dropped that “guru” or “high priest” (as Mayo would say) straw man. It’s about as ridiculous as saying “I’m not like those fanatical physicists who worship Newton; he got a couple of things right, but I’m not fooled into thinking he was a guru like they are.” It’s a cheap load of crap. You’re not the only ones aware that everyone is fallible.

Joseph:

Fair enough on the “guru” point. In my defense, I was never claiming that I’m the only one aware of Jaynes’s fallibility. What I’ve seen on occasion is people taking Jaynes’s word as truth with no reflection. But I agree with you that saying someone “is no guru” is a rhetorical trick that unfairly puts supporters of that person’s work on the defensive. So I’ll try to avoid that phrase in the future.

Nicely put and it’s the ease and convenience of Bayes in getting some “evidence that something like a new law or effect of physics/biology/economics/political science or whatever” that I value a lot.

Entsophy: “Instead of thinking these as proto-physics models used to model some kind of weird physical force called randomness, he thought of them more as a device for connecting physical facts. Models connect some input facts A with some output facts B.”

Why should the connection having a high probability mean the model has a high probability? I’ve always seen this as a slip. The connection having a high probability refers to those physical facts you mention, that is, to what would be expected were a type of experiment carried out—or something like that. The point is that what is probable is the observable correlation, not the model which asserts something about a conjectured underlying causal origin. I think it’s a mistake to suppose statisticians view their models as literally modeling weird underlying forces of randomness (read, for example, David Cox on models). Recognizing this doesn’t make one a Bayesian or a Jaynesian. Is it possible that by holding (and popularizing) this idiosyncratic view of statistical modeling, he encouraged people to think they had to reject that view? But then, ironically, he advocates a more metaphysical conception than is found in ordinary statistical modeling.

I’m not being as clear as I’d like, but I’m pondering this, and figured someone out there would see what I’m driving at.

> The point is that what is probable is the observable correlation

Given the data generating model—which is just a representation, not what is being represented.

Entsophy’s view, if I am not getting him too wrong, is given a parameter generating model and a data generating model – again, just a representation.

To me they are both just devices to help get less wrong about the world (what we are trying to fallibly represent) and for me using a parameter generating model and a data generating model often seems to serve my purposes more efficiently.

The way we can and do fallibly represent the world is by recognizing our deliberate interactions with it via models and data generating processes. That’s why the data generating models of the error statistician are concrete. But what is the probability of a parameter generating model? It’s back to the “slip” in my last comment. It’s a level (of metaphysics) that’s not needed, and whose meaning cannot in general be pinned down.

Given the choice between spending a few days to get credible intervals and assessing the relevant repeated-use performance (e.g. prior/data model conflict and average coverage given a passable prior and data model), versus six months to get higher-order corrected profile confidence intervals (as I did in my DPhil thesis, where the examiners described just getting regular profile confidence intervals as a notable achievement) – intervals whose true coverage I actually had little confidence in – I’ll risk being labelled metaphysical and not debate that.

Applications can be very, very hard to do correctly frequentistly (e.g. getting uniform coverage for all possible parameter values). It can require mathematical skills that few ever achieve or maybe even appreciate. Take a two-group randomized experiment with binary outcomes – there is a reason why mathematical statisticians have argued over the _correct_ analysis for about 100 years. Do a simulation using your favourite method for getting a confidence interval for a relative risk (I get to insist on the parameter of interest) and plot the coverage rate as a function of the proportion of successes in one group – it will be interesting. (By the way, Antony Davison did mention he thought he could do this to third order. And I think he has since published something on that. If so, then I’ll insist on a different parameter of interest.)
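A minimal version of that simulation (my own sketch: a Wald interval on the log relative risk with a 0.5 continuity correction, n = 50 per group; all of these choices are illustrative, not anyone’s recommended method) already shows the coverage wobble:

```python
import numpy as np

def rr_wald_ci(x1, x2, n, z=1.96):
    """Wald 95% CI for the relative risk p1/p2 on the log scale,
    with a 0.5 continuity correction to guard against zero cells."""
    a1, a2 = x1 + 0.5, x2 + 0.5
    m = n + 0.5
    log_rr = np.log(a1 / a2)            # equal group sizes cancel
    se = np.sqrt(1 / a1 - 1 / m + 1 / a2 - 1 / m)
    return np.exp(log_rr - z * se), np.exp(log_rr + z * se)

def coverage(p1, p2, n=50, sims=2000, seed=0):
    """Fraction of simulated trials whose interval contains the true RR."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(n, p1, sims)
    x2 = rng.binomial(n, p2, sims)
    lo, hi = rr_wald_ci(x1, x2, n)
    true_rr = p1 / p2
    return float(np.mean((lo < true_rr) & (true_rr < hi)))

# Coverage of the nominal 95% interval as a function of p1:
for p1 in (0.05, 0.1, 0.3, 0.5, 0.9):
    print(p1, coverage(p1, p2=0.3))
```

The nominal 95% is not attained uniformly across the proportion of successes, which is exactly the point of the comment above.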

Mayo, I think you’re reading a bit more into it than was intended. It’s easier to think about in cases where the facts are very precise and not diffuse. An example of a precise relationship would be, “if A is true then some variable f will be very, very close to f_0”. Whenever that happens there will appear to be a functional relationship (or law of nature) connecting them, such as f_0=F(A).

A real life example is the Ideal Gas law p=nRT/V. Not every microstate having the given energy (related to T) and confined to a given volume V will satisfy this relationship. It’s possible, in fact, to find microstates where p=0. But it’s also the case that the vast majority of microstates compatible with T,V will have a pressure equal to p=nRT/V.
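That microstate-counting claim is easy to check numerically (my own sketch, with Boltzmann’s constant and particle mass set to 1 and arbitrary T and V): sampling x-velocities from the Maxwell–Boltzmann distribution for a temperature T, the kinetic-theory pressure concentrates ever more tightly around the ideal-gas value as N grows, even though individual microstates can deviate.

```python
import numpy as np

rng = np.random.default_rng(1)
kB = m = 1.0          # units with Boltzmann's constant and particle mass = 1
T, V = 2.0, 1.0       # arbitrary temperature and volume

ratios = []
for N in (100, 10_000, 1_000_000):
    # Draw one microstate compatible with temperature T: x-velocities
    # from the Maxwell-Boltzmann distribution.
    vx = rng.normal(0.0, np.sqrt(kB * T / m), size=N)
    p_micro = N * m * np.mean(vx ** 2) / V   # kinetic-theory pressure
    p_ideal = N * kB * T / V                 # ideal gas law (N k T / V)
    ratios.append(p_micro / p_ideal)
    print(N, ratios[-1])                     # ratio tends to 1 as N grows
```

The relative spread shrinks like 1/√N, so for macroscopic N essentially every compatible microstate gives the ideal-gas pressure.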

In that sense this functional relationship is highly probable. Since the Ideal Gas law is a physical model, we could also say “this physical model is highly probable given what we know or assumed”. To carry out the “counting up” of these microstates, we introduce probability distributions which in effect describe our ignorance about the true microstate. Statisticians are usually talking about these distributions when they say “model”. If you object to calling this “statistical model” highly probable, then I have no problem with that since whether it’s “probable” is meaningless.

It’s meaningless because the “statistical model” is just an accounting mechanism which allows us to take a given state of knowledge and count states compatible with an outcome of interest. It’s ‘good’ or ‘bad’ depending on how well it achieves this goal. Whether those distributions do their job well is purely a mathematical question once a state-of-knowledge is given (for the Ideal Gas case this would be the Energy and Volume and their functional relationship to the microstates).

By the way, there’s a difference between someone who is found fallible and someone who shows a serious and persistent lack of integrity when it comes to treating and interpreting the work of others. When that person at the same time sets himself up as an oracular wise man, then I say it’s no straw man to call him a “guru” or, for that matter, a shaman–even if it’s limited to his criticisms of others. I am just one of many people to observe this about the high priest in question. I nevertheless can learn from him about where people get some of their ideas…

“someone who shows a serious and persistent lack of integrity”

What? A couple of times you looked at some throwaway verbal one-liner of Jaynes and completely ignored the mass of technical detail and mathematics surrounding it, and all the legitimate points being made. Lack of integrity indeed.

With equal ease I could call Andrew a “Slavish Rubin Zombie” or call you a “Pearson cult groupie”. That’s the problem with name-calling when Marines are around – no matter how low you go, they can always go even lower.

Incidentally, the latest example of your “analysis” was to take a line which Jaynes explicitly and repeatedly stated was tongue-in-cheek, and after which he went on to say that no Bayesian can object to frequentist procedures when certain technical conditions are met, and that Bayesians should legitimately use them then.

That’s your big example of his “persistent lack of integrity”. On the other hand, you took the tongue-in-cheek comment seriously and made fun of him for it. And then completely ignored the mathematics backing the claims that things go haywire for frequentist procedures when those “technical conditions” aren’t met.

So who’s the one with a “serious and persistent lack of integrity” here? You or Jaynes?

Now, now. We don’t need to postulate a lack of integrity on Mayo’s part to explain how she might have that impression of Jaynes — for people not used to it, the way physicists throw sharp elbows on the intellectual playing field can come as a shock.

Corey:

I was thinking of Entsophy’s comment “Academia is little more than welfare for small men who like to think they’re something special” ( http://normaldeviate.wordpress.com/2013/09/01/is-bayesian-inference-a-religion/#comments ) and though a bit overstated and overly blunt it makes sense to me (perhaps it’s even largely unavoidable).

I remember overhearing a sociologist once say, “people are saying two things when they ask questions at conferences – I am smart and I know important things”. All too often the speaker will respond with “I am smarter and the things I know are more important”. It’s the jungle of academia (OK, maybe lions and large tigers don’t need to do this.)

As my MSc Biostats supervisor once told me, “People would not remain in research if they did not have big egos that they took somewhat too seriously”.

Or as someone was told after they were thrown out of a research institute – “All I am allowed to tell you is that you bumped heads with someone in senior management and surprise, surprise you are no longer there.”

I believe the definitive statement on this is Sayre’s Law: “In any dispute the intensity of feeling is inversely proportional to the value of the issues at stake. That is why academic politics are so bitter.”

O’Rourke, oddly I encountered something very different in the Marines despite all the outward bombast: they were regular people who never lost a battle because they were held to a higher standard.

Corey, I’m tired at this point of defending Jaynes. I didn’t read his papers because I thought he was correct. When I want “correct” I go read Gelman or Wasserman or Shalizi or any of a thousand other people. I read them because they contain vastly more top-notch research ideas per paragraph than any other writer around. This point is not only lost on those like Mayo who don’t seem to understand much of what he was saying, but also on the Jaynes boosterism I’ve seen, which seems to consist mostly of mining Jaynes for ammunition in the style of Medieval theologians reading Aristotle.

Moreover, I have a strong personal revulsion to cliquish behavior and hero worship. That “MIRI, CFAN and Less Wrong” stuff and anything similar is extraordinarily off-putting regardless of what they’re saying. I’m not interested in defending it in any way.

Really, I think statisticians got themselves trapped into an intellectual cul-de-sac which they can’t back out of and they’re dragging down any field that depends heavily on probabilities (Economics, Social Sciences, and increasingly Biology and Physics). And in truth it makes little difference what they think or what they do with their welfare checks (grants). They’re just spinning their wheels generating lots of heat and precious little forward motion. I’m going to follow up a few of those research ideas that benefit me personally and leave it at that.

Entsophy’s statement is a perfect example of the harm of foisting poor arguments on a literature (and of people being told to accept them without criticism). He hasn’t read Jaynes, yet he’s so sure that people like me “don’t seem to understand much of what he was saying”. I understand Jaynes, and aside from the entropy business along the lines of Jeffreys (who managed to avoid all the embarrassing, over-the-top polemics), his book is loaded with logical howlers. Or does he want to defend Jaynes’ allegation that, when H entails e and observing ~e leads to rejecting H, the original entailment disappears? Nobody has to mine Jaynes for fallacies; they’re right on the surface, I’m afraid (I held off raising this one for 2 years).

Clark Glymour’s remark the other day echoes those I’ve heard from well-known statisticians and logicians over the years: “I started Jaynes’ book once, but found on so many points he was logically inept and dogmatic that I quit.” But Entsophy knows he’s correct, and the problem is in us: we just refuse to view his fallacies as non-fallacious! It’s our fault.

http://errorstatistics.com/2013/08/31/overheard-at-the-comedy-hour-at-the-bayesian-retreat-2-years-on/

To clarify here, Entsophy is saying that the reason he reads Jaynes is not because he thinks Jaynes is the only correct person, Entsophy reads plenty of people who are likely to be correct (including Gelman, Wasserman and Shalizi). The reason Entsophy DOES read Jaynes is because he finds Jaynes has a high density of “top-notch research ideas”.

So you saying that “He hasn’t read Jaynes yet he’s so sure that people like me ‘don’t …understand…[him]'” shows that you were misreading Entsophy. I found Entsophy’s sentence construction slightly confusing as well but on second reading I think my above reinterpretation of what he says is what he meant. It’s a blog, lots of people are busy, things don’t always get said in the most elegant way. But Entsophy is not a fool, just perhaps more frustrated than average.

To clarify further, Entsophy also claims that the real value (to Entsophy) of Jaynes, his unique insight into certain “research topics” is also lost on the “Jaynes boosters” who DO treat Jaynes as a dogmatically correct figure. One takes this to mean that Entsophy doesn’t count himself among Jaynes boosters. He seems more or less frustrated with the assumption that because he believes that Jaynes had a lot of good ideas, and the correct outlook on probability, everyone else believes that Entsophy is just a “Jaynes booster” who believes everything Jaynes says dogmatically.

Your response basically seems to confirm his viewpoint.

Oh please, I’m used to genuine tough criticism, not these rude and unconstructive ad hominem defenses. I’m out of here….

Hey Mayo, I’ve always approached you with mathematical and substantive points and not ad hominem points. You’re the one who takes it down a notch.

How’s this for some constructive criticism:

http://www.entsophy.net/blog/?p=163

“Instead, model checking typically was placed in the category of “hypothesis testing,” where the rejection was the goal. Models to be tested were straw men, build up only to be rejected. You can see this, for example, in social science papers that list research hypotheses that are not the same as the statistical “hypotheses” being tested.”

This is a point of difference between disciplines and, I think, an important part of why much social research is not very useful or quantitative. I recommend Meehl (1967) on this:

“In the physical sciences, the usual result of an improvement in experimental design, instrumentation, or numerical mass of data, is to increase the difficulty of the “observational hurdle” which the physical theory of interest must successfully surmount; whereas, in psychology and some of the allied behavior sciences, the usual effect of such improvement in experimental precision is to provide an easier hurdle for the theory to surmount.”

http://mres.gmu.edu/pmwiki/uploads/Main/Meehl1967.pdf
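Meehl’s asymmetry can be seen in a back-of-the-envelope power calculation (my own sketch using the normal approximation, not from the paper): against a substantively trivial true effect, the probability that a two-sample test rejects the nil null climbs toward 1 as the sample size grows, so improved “precision” makes the significance hurdle easier, not harder, for a vague theory to clear.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(d, n, z_crit=1.96):
    """Approximate power of a two-sided two-sample z-test with n per
    group and unit variance against a true standardized effect d
    (the negligible far-tail rejection region is ignored)."""
    return phi(d * sqrt(n / 2.0) - z_crit)

d = 0.05  # a substantively trivial true effect
for n in (100, 1_000, 10_000, 1_000_000):
    print(n, round(power_two_sample(d, n), 3))
```

With enough data, rejecting the nil null is nearly guaranteed even when the effect means nothing, which is why such a rejection is weak evidence for the substantive theory.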

Dean: Yes but Meehl is pointing up a glaring fallacy: taking a rejection of a null as evidence for a substantive hypothesis that entails (or renders expected) a significant effect. H entails stat sig difference, stat sig difference, therefore evidence for H. Anyone who commits this most glaring (affirming the consequent) fallacy is violating the significance test (even aside from the statistical-substantive distinction). It is simply to mistake what it means to increase the severity of testing H. Meehl of course knew this, and was harping against his fellow psychologists.

Yes, agreed. I meant this as a critical comment on how low the severity of statistical tests in social psychology is.

[…] Mayo finds Jaynes’s point so absurd it doesn’t need analysis, only a gleeful counter slogan. Some less charitable souls have suggested that Mayo didn’t understand his mathematics, but Mayo insists she does. Moreover, Dr. Mayo takes interpreting the works of others very seriously: […]

I realize I am late to the party here, but as I have commented here before, Wasserman is a huge believer in the ad hominem argument. His arguments about Bayesians tend to be about “them” and “they” and “someone” and so-on, never giving a specific example. At this point, to an outsider, Wasserman looks like a crank. I am sure he isn’t! I am told his insights are great. And I don’t employ the ad-hominem argument in assessing statistical practice! But to me, as a non-statistician, the laundry list of “bad things about Bayesians” looks like unspecific slander! We should argue about specific results by specific people; we are scientists and intellectuals, not Members of Congress.

David,

I think you’re misreading Wasserman. In that “Bayes is a religion” stuff he’s complaining about Bayesians, not Bayes. When he does talk about Bayes it’s pretty much the extreme opposite of a crank or ad hominem.

True, he seems to suffer from that congenital defect that all Frequentists have – they just can’t imagine a probability distribution as anything other than a frequency distribution – so it’s not too difficult for him to find non-frequency distributions which seem bizarre if they’re interpreted as frequencies. Of course we are free to return the favor:

http://www.entsophy.net/blog/?p=115

But that defect is hardly unique to him. Most statisticians of whatever variety suffer from it in truth. At least Wasserman reduces the issue down to clear, unequivocal mathematics, which makes it the exact opposite of ad-hominem.

That’s a cheap shot about a “defect”; I think that Larry W can “imagine” pretty much anything to do with probability or frequency, just fine. Lack of agreement with you (and/or Jaynes) does not imply lack of having thought about your argument.

What do you mean cheap? I paid good money for that shot.

Sure.

I never know if your cheap shots are your own idea or just some form of inheritance, from Jaynes’ style.

I don’t think, overall, they helped Jaynes get his points across. Also don’t think they help you.

I’m sure Wasserman can easily understand my point of view since it’s simpler in several senses than his. But that doesn’t mean he’s actually done so. There are dozens of passages on his blog that wouldn’t be there if he had. One example is quoted in the link above.

Since my livelihood and wellbeing in no way depend on what legions of academic knuckleheads think about anything, I’m surprisingly ok with their continued befuddlement.

For someone professing not to give a damn what academics think, you post a lot of material – cheap shots and all – trying to persuade them to reason in your preferred manner.

That’s for my internal benefit not theirs. And it’s not a cheap shot. I said it because I believe it. Somehow I doubt Larry’s crying in his cornflakes over it.

C’mon. Writing in order to show folks what our goal in modeling errors should be, to show them how it’s achieved, proclaiming your view of the essence of statistics and just how superficial specific other people are is hardly just for your internal benefit.

And if you don’t think it’s cheap to impugn Larry W’s ability to understand what distributions mean, or indeed to label people you’ve never met as “knuckleheads”, “small men” and engage in a race to the bottom in name-calling (above), academia – and scholarly discussion in general – isn’t missing much by your absence.

As Andrew and many others over the years have noted, communication is a key part of science because of the way it forces you to clarify your own ideas. So yes it really is for my benefit.

Academia is pretty hopeless and in retrospect the professionalization of “Natural Philosophy” has been a disaster. The world currently spends vastly more on research than in the past, produces vastly more research papers than in the past, all for a tiny fraction of the effect. It’s degenerated into a scam whose main effect is to convert taxpayers’ dollars into professors’ self-esteem and huge piles of completely forgettable verbiage.

“Knuckleheads” and “small men” is pretty mild all things considered.

But none of that refers to content. When it comes to content I didn’t call anyone names. In point of fact, I defended Wasserman’s content and stated something, which I have a good deal of evidence for, that largely explains why Hogg got the impression he did.

Oh, and I meant exactly what I said in reference to “small men”. Trying to win debates in hiring/tenure committees instead of a stand-up fight is the act of small men. It’s far from the only such act they engage in, but it’s definitely an example.