# When confidence intervals include unreasonable values . . . When confidence intervals include only unreasonable values . . .

Robert Kaestner writes:

Economists’ love affair with randomized controlled trials (RCTs) is growing stronger by the day.

But what should we make of an RCT that produces a point estimate and confidence interval that largely includes values that most would consider implausible?

The Goldin et al. article on effects of health insurance on mortality (QJE) provides a good example. The point estimate suggests that 6 months of extra insurance coverage reduces mortality by 100%. Most of the confidence interval consists of possible estimates that are also implausible.

Of course the confidence interval is wide and includes smaller, perhaps plausible values.

My first response is this has nothing to do with randomized clinical trials; it’s just a general issue about confidence intervals, which Sander Greenland refers to as “compatibility intervals” because, when they work like they’re supposed to, they give an interval of parameter values that are compatible with the data.

But there are problems with this idea.

The first problem, as noted above, is that confidence intervals can contain lots of values that may be compatible with the data but which make no sense, i.e., they’re not compatible with our prior information. But that’s why we use Bayesian methods. Throw in your prior and these outlandish estimates should no longer be in the interval—unless the data and model really really support them.

The second problem is that confidence intervals can exclude reasonable values that are compatible with the data. We saw this with the notorious beauty-and-sex-ratio example, where the 95% confidence interval was entirely composed of unreasonable parameter values. The trouble here is that you can get unlucky, or you can sift through your data to get unlucky on purpose, as it were, by finding confidence intervals that just happen to include only unreasonable values.
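A small simulation can make the "unlucky on purpose" point concrete. The sketch below assumes a tiny true effect and a much larger standard error (both numbers are illustrative assumptions, not figures from any study), then compares how often 95% intervals cover the truth overall with how often they cover it among the intervals that exclude zero, which stands in here for the results that get selected for attention.

```python
import random

def simulate_coverage(true_effect=0.001, se=0.03, n_sims=100_000, seed=1):
    """Compare overall coverage with coverage among 'significant' intervals.

    true_effect and se are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = 1.96  # normal multiplier for a 95% interval
    covered = 0
    significant = 0
    covered_and_significant = 0
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        lower, upper = estimate - z * se, estimate + z * se
        covers = lower <= true_effect <= upper
        excludes_zero = lower > 0 or upper < 0
        covered += covers
        significant += excludes_zero
        covered_and_significant += covers and excludes_zero
    return covered / n_sims, covered_and_significant / significant

overall, conditional = simulate_coverage()
print(f"coverage over all intervals:             {overall:.3f}")
print(f"coverage among intervals excluding zero: {conditional:.3f}")
```

Under these assumptions the procedure keeps its advertised coverage across all replications, but the subset of intervals that exclude zero rarely contains the true value. The selection rule is a simplification of how results get filtered in practice.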

Ultimately the problem is that we have uncertainty, and it’s a mistake to take an estimate—even an interval estimate—as a statement of certainty.

P.S. Just to clarify: It’s my impression that the usual way to work with confidence intervals is to take the 95% interval and act as if it contains the true value, with the understanding that you’ll be wrong 5% of the time. My criticism of this approach is the usual Bayesian criticism, that sometimes you know the interval entirely contains bad values, you know the true value is in the interval with something like 0% probability. What are you supposed to do then—just sit there and take it? That way lies Zillow madness.

## 63 thoughts on “When confidence intervals include unreasonable values . . . When confidence intervals include only unreasonable values . . .”

1. Confidence intervals are a way of measuring and expressing uncertainty. If the confidence interval includes a large number of unreasonable values, isn’t that a way of saying that the results are so uncertain that we really don’t know anything?

• No, it’s a way of saying we know a lot more than this data tells us already.

• +1

2. Gerald A Belton writes: “Confidence intervals are a way of measuring and expressing uncertainty.”

I think talking about statistics generally or confidence intervals specifically as “measuring uncertainty” is the problem. If we can measure uncertainty, then we can know how close we are to the truth. However, statistical methods like confidence intervals do not give us this kind of information. You can see this by reflecting on the fact that had your experiment reached a different point estimate, you would get different confidence intervals as well. Nevertheless, generations of medical and social scientists hold the belief that the “true effect” must lie within the confidence intervals.

• What is the source of this “must?” It’s certainly not in any teaching. There is nothing other than Bayesian priors to keep at least part of any confidence interval from being unreasonable, but even when a frequentist does frequentist statistics *perfectly* there is a substantial chance that the entire confidence interval can be unreasonable if the zone of reasonability is small enough.

So there’s just no “must” in there, and there never has been.

• Jonathan writes: “What is the source of this “must?” It’s certainly not in any teaching.”

I am not sure what your point is. I am not saying that the “true effect” must lie within the confidence intervals. I am only saying that it is a common misinterpretation. Are you saying that it is not a common misinterpretation?

Here is just one example from the internet: https://online.stat.psu.edu/stat200/lesson/4/4.2/4.2.1 The authors write, “The correct interpretation of a 95% confidence interval is that ‘we are 95% confident that the population parameter is between X and X.’” Is that the correct interpretation? It sounds to me like they are teaching students that the true value lies within the confidence interval with 95% confidence, whatever that means.

• No one has a real a-priori definition of confidence, so they just define confidence to mean “what you have after you get a 95% confidence interval” and so yes, that is what “confidence” means in statistics just like “significant” doesn’t mean “important, or of particular interest” it just means “the p value was less than an arbitrary threshold Ronald Fisher once thought was small enough”

Bayes is actually the system where probability is set up specifically to have meanings that match some external meaning, since with Cox’s theorem as probabilities approach 1 or 0 the meaning is the same as the Boolean meaning of True or False.

• But that is the problem. People do ascribe ordinary meanings to words like “confidence” and “significant”, and the result is they think statistical tests eliminate the possibility that the results are due to chance.

• Yes, I agree a big part of the problem is that Statistics coopted ordinary words. One suspects this wasn’t an accident.

• I don’t doubt that it’s a common misinterpretation (though I suspect a lot of people actually know better but are trying to navigate publishability and conveniently forget what a confidence interval is). All I’m saying is that nobody anywhere *taught* them that confidence intervals *must* contain the value. Nobody anywhere… not even in the worst statistics course I can imagine. So where did they get it? Not from the profession, that’s for sure…

• I think that the quote that I have above (“The correct interpretation of a 95% confidence interval is that ‘we are 95% confident that the population parameter is between X and X’”) is pretty standard and will sound to most people like we are 95% certain that the true value lies in the confidence interval, and 95% certain will mean “certain.” So, I think that the misinterpretation is the product of bad textbooks and science communication.

• Jonathan:

• I read your PS, and I agree with it. But 95%<100%, no matter what your paradigm. Even Bayesian intervals, except in special cases, are less than 100%. They are different than frequentist intervals, and they may well be superior, but they are no more certain, on their own terms, of containing the "true value." Indeed, as you say above, the data eventually trumps any prior, so what do you do when the data is utterly incompatible with a strongly held prior? One answer is: lie there and take it. The other is to change your mind about what is possible.

• According to the Cambridge Dictionary, one of the definitions of “must” is “used to show that something is very likely, probable, or certain to be true”. I think that I used the word correctly.

3. One proposal is to view the confidence interval as an object that we have freedom to optimize over rather than a deterministic output from the posterior distributions. We have tried to seek the shortest interval, but there are other goals to pursue such as the most matched coverage, the most “reasonable” values, etc., which ultimately lead to some interval based scoring rules.

• What do you mean by confidence interval? I wouldn’t say that anyone views a [frequentist] confidence interval as a deterministic output from the posterior distributions.

• Yes, even Gauss and Laplace argued over that ;-)

4. I don’t know… I feel this is an unfair account of confidence intervals. You say:

> The first problem, as noted above, confidence intervals can contain lots of values that may be compatible with the data but which make no sense

I don’t see this as a problem, really. Unreasonably large CIs indicate that the model may be inappropriate and/or the data is too scarce to draw conclusions, right? I think this is a good feature because the receiver of the results is made aware of the limitations of the analysis.

> But that’s why we use Bayesian methods. Throw in your prior and these outlandish estimates should no longer be in the interval—unless the data and model really really support them.

Following my previous point: Why would I want to do that? If the data or model are inappropriate, adding priors feels like hiding the limitations of the study. Or worse, what if the prior is off and the data is scarce? I can see reasons to do that in some cases but not in general. Since my collaborators know more than me about the underlying biology, I prefer to give them a summary of the results (CIs and all) and take it from there rather than presenting an opaque merge between my prior and the data.

Regarding the beauty and sex ratio, the author reports a significance level between 0.05 and 0.01 (do I get it right?). I would say p-values/confidence intervals are doing a good job here because a reader can ask: “Given a similar dataset where the null hypothesis holds, how hard would I need to try to achieve that significance level?”. The answer would probably be “not very hard” given all the options to analyze the data. So p-values are correctly saying that the results are not to be taken too seriously.

Also, I don’t see how a Bayesian approach can fix the problem in a general sense. Nothing prevents you from selecting the prior and model that gives astonishing results and publishing only that selection.

Loosely related, I would consider all the literature about the misuse and misinterpretations of p-values/CIs an asset of these methods since we know how they fail and how to spot misuses. In contrast, we don’t really know how Bayesian statistics would behave in the hands of people without too much experience and expertise on these methods. For example, correct me if I’m wrong, a posterior is valid provided you correctly capture the ways the data can be produced (i.e. the denominator of the Bayes theorem) – I could imagine this being an issue for many inexperienced practitioners?

• Dario:

In the beauty and sex ratio example, the published estimate is something like 8 percentage points (that is, beautiful parents being 8 percentage points more likely to have girls) with standard error of about 3 percentage points. So the 95% interval is something like [0.02, 0.14]. This is an interval that only contains unreasonable values. All the reasonable values (roughly, the range [-0.002, 0.002]) are outside the interval. And this comes up a lot in applications, that the 95% interval only contains unreasonable values. I understand that there’s selection bias—we hear about some of the worst cases!—but I also think we can do better than using a method that regularly spews out bad results.

Or, to flip it around, if people are going to create classical confidence intervals, we should realize the problem with treating these intervals as if they contain reasonable values.
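The arithmetic in the beauty-and-sex-ratio reply above can be sketched in a few lines. The estimate and standard error are the figures quoted in that reply. The prior is an assumption added here for illustration: a normal distribution centered at zero with standard deviation 0.001, chosen so that roughly 95% of its mass sits in the "reasonable" range of about [-0.002, 0.002]. The posterior uses the standard normal-normal updating formula.

```python
import math

estimate, se = 0.08, 0.03  # figures quoted in the comment above
z = 1.96                   # normal multiplier for a 95% interval

# Classical 95% interval.
ci = (estimate - z * se, estimate + z * se)

# Assumption: prior effect ~ Normal(0, 0.001^2).
prior_mean, prior_sd = 0.0, 0.001

# Normal-normal update: precisions add, means are precision-weighted.
prior_prec = 1 / prior_sd ** 2
data_prec = 1 / se ** 2
post_var = 1 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
post_sd = math.sqrt(post_var)
post_interval = (post_mean - z * post_sd, post_mean + z * post_sd)

print(f"classical 95% interval: [{ci[0]:.4f}, {ci[1]:.4f}]")
print(f"posterior mean:         {post_mean:.5f}")
print(f"posterior 95% interval: [{post_interval[0]:.4f}, {post_interval[1]:.4f}]")
```

With a prior this tight relative to the standard error, the posterior interval stays close to the prior range. A different prior would of course give a different answer, which is the concern raised in comment 4.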

5. My two bucks:
1. The examples may give a sense of why compatibility is a much weaker condition than confidence, and far more appropriate a label for a span passed off as “confidence interval” in what at least one Bayesian decried as a con game at the time Neyman promoted the method in the 1930s.
As psychiatrists have noted, the delusional system of a paranoid schizophrenic is often logically coherent and perfectly compatible with all the facts at hand; the primary empirical distinction of the system is that only the patient believes it – as opposed to having a socially influential population of believers (which graduates the system to that of an established philosophy, religion, or school of statistics). The confidence religion in statistics equates compatibility to confidence, under the delusion that this misidentification enables one to make contextually sound inferences and decisions without consulting contextual information (including about the study’s conduct as well as external evidence).
2. “Ultimately the problem is that we have uncertainty, and it’s a mistake to take an estimate – even an interval estimate – as a statement of certainty.”
– I agree that’s a practical mistake, yet it is a hallmark of “statistical inference”. One academic response is that operational subjective Bayesians can in theory produce intervals that capture warranted (un)certainty if they spend an extraordinary amount of time carefully incorporating all available information into an entropy-maximized joint distribution for all known uncertainty sources. Of course, in practice to get our work done we deploy a mess of incoherent tools and heuristics. Each and all of those ignore important uncertainty sources, making it all the more important to consult contextual information at length and in depth before crunching any numbers, let alone offering any inferences or decisions.

• Based on your response, what are your thoughts on this procedure (an example I’m working on in a write-up)?

Imagine you want to compute height differences between 2 groups but instead of using 95% or more generally some “cut off” informed by sampling variability and confidence procedure, you pick the range of values you want to compute p and/or s values for.

Data shows that the Avg Domestic NBA height is a couple inches shorter than Avg International NBA height. We could tighten up what we think is the best range of values but imagine if we choose more extreme bounds for a sense of precaution or “comprehensiveness”.

What are the most extreme bounds for human height when comparing at the group level? My answer is comparing those who have dwarfism (4 ft 10 inches or less) vs [insert sport that maximizes height].

I choose NBA international which average is around 6 ft 11 inches. Convert both to inches:

Dwarfism – 58 in
Inter NBA – 83 in

83 – 58 = 25 inches.

P or S Value Graph/Plot would show “compatibility” for mean diff for values from

-25 to 25 inch

Get p values using non-central t distribution.

While there are “unreasonable” and “reasonable” values, this decouples confidence procedure & compatibility.

• Sorry, I’m not clear on how what you’re proposing is different in substance from simply examining contextually specified points on a P-value function (P-function or compatibility curve) or S-value function.

Also, I don’t see why you are using a noncentral distribution;
it looks to me all that is required here is a location shifted score:
|observed difference – test difference|/SE(obs|test)
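The location-shifted score in the reply above can be turned into a rough compatibility curve. The sketch below is one way to do it, under stated assumptions: the observed difference and its standard error are hypothetical numbers in inches, the p-value uses a normal approximation (a t reference distribution would need degrees of freedom that the thread does not give), and the S-value is taken as the base-2 surprisal of the p-value.

```python
import math

def p_value(observed_diff, test_diff, se):
    """Two-sided p-value for the score |observed - test| / se (normal approx.)."""
    z = abs(observed_diff - test_diff) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

def s_value(p):
    """Surprisal in bits; infinite if the p-value has underflowed to zero."""
    return float("inf") if p == 0.0 else abs(math.log2(p))

# Hypothetical inputs: mean height difference and its standard error, in inches.
observed_diff, se = 2.0, 0.5

# Contextually chosen test differences spanning -25 to 25 inches.
for test_diff in range(-25, 26, 5):
    p = p_value(observed_diff, test_diff, se)
    print(f"test diff {test_diff:>4} in: p = {p:.3g}, S = {s_value(p):.1f} bits")
```

The grid is chosen from context, as in the proposal, rather than from a 95% cutoff. The p-value equals 1 when the test difference matches the observed difference and falls off on either side.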

• > Sorry, I’m not clear on how what you’re proposing is different in substance from simply examining contextually specified points on a P-value function (P-function or compatibility curve) or S-value function.

Sorry that’s what I’m saying to do. I’m not proposing something different but in support of contextually specifying values and plugging them in.

I was giving an example of a conservative way to define the range of values to plug in and plot. I.e., we can think about values and not be religiously tied to the confidence interval.

Also, true about the non-central distribution: we can always shift, e.g., for a difference of 5:

X1 – X2 – 5

I guess I like to think about getting away from 0 if that makes sense?

• If you are for example discussing people who have a circulatory disease related to being very tall due to excess growth hormones then it would be stupid to include in your height estimate anyone with dwarfism. This is the analogous procedure to lots of other cases where frequentist confidence intervals fail. We can in this example construct a prior for height by for example utilizing an interval between the median human height and the tallest recorded human height and we will exclude people who have very short heights presumably because they don’t have the cardiovascular issue at hand.

6. Andrew — It’s possible you’re slowly converting me (or I’ve been mugged by reality enough to have developed Bayesian sympathies), but the issue appears to be two-fold:

1. These people are morons, hacks, or moronic hacks. How stuff gets published that doesn’t pass a simple smell test (especially around effect sizes that are 10x too big or more) is beyond me.

2. Economists mostly have frequentist stats crammed into our heads, and/or — like many who use stats but aren’t statisticians — don’t quite understand core principles of uncertainty.

Should we just fundamentally rethink how we teach stats? (yes)

• Rethinking how we teach stats is ongoing by many – but far from resolved especially for different backgrounds.

I have put a lot of effort into material to help others to better grasp the logic of statistics where I think the biggest challenge is getting across that what you observed is of little scientific interest as it is – as that is just the dead past of what happened whereas science is future oriented. That is, what happened is only of scientific interest if it credibly informs what would repeatedly happen in the future. Confidence intervals are an attempt to do that but seen widely as far too successful. Note some of the comments on this post.

Unfortunately what repeatedly would happen in the future is counterfactual, not empirical, and can only be grasped through abstractions such as probability models and many people have a lot of difficulty with those as well as thinking abstractly. Now, I know that is not just a lack of mathematics as I have cast it all through as diagrams, simulations (no formulas) and assessments of compatibility and people still have challenges getting the important points and some don’t. For some reason, people with backgrounds in chemistry have an easier time with the material, even than those with statistics. So the bottom line of this getting lengthy paragraph is that like in the land of the blind where the one eyed man is king, in the land of those who can’t critically evaluate abstractions they will be ruled by rules and procedures that guarantee far more than they can deliver. Certainty about uncertainties.

• I’m not one for central planning (in before “Hoover Institution”!), but the problem is made worse by the explosion in specific fields that all use stats, but learn them in a domain-specific way. How do we avoid a future where _every_ discipline is social psychology-level “assume it’s laugh-out-loud garbage until you can prove otherwise” instead of a world where only pocket disciplines are poorly trained? How do we fix the problem at its root?

While getting my Econ training (at an R1 research university in the midwest), I was taught Probit regression _before_ I was taught logistic regression — all because of discipline-specific inertia. Econometrics was taught as if the 1970s GLM revolution happened in the early 1990s instead (or so it felt!) Andrew is quick to remind us that stats is hard, but that’s cold comfort when we’re churning out PhDs at a concerning rate and they’re ill-equipped to do good research because their tools are crappy.

• Keith said, “So the bottom line of this getting lengthy paragraph is that like in the land of the blind where the one eyed man is king, in the land of those who can’t critically evaluate abstractions they will be ruled by rules and procedures that guarantee far more than they can deliver. Certainty about uncertainties.”

When I have taught a continuing education course called “Common Mistakes in Using Statistics: Spotting them and Avoiding them,” I say pretty early in the course, that I think “the most common mistake in using statistics is expecting too much certainty.”

7. To quote Jaynes after giving a real example of CI guaranteed to *not* contain the true value, where “guaranteed” means “follows from the same assumptions used to calculate the CI”:

start quote: “It is perfectly true that, *if* the distribution is indeed identical with the limiting frequencies of various sample values, and *if* we could repeat all this an indefinitely large number of times, then the use of the CI *would* lead us, in the long run, to a correct statement 90 percent of the time. But it would lead us to a wrong answer 100 percent of the time in the subclass of cases where [given condition], and we know from the sample whether we are in that subclass. …

Our job is not to follow blindly a rule which would prove correct 90 percent of the time in the long run; there are an infinite number of radically different rules, all with this property. Our job is to draw the conclusions that are most likely to be right in the specific case at hand …

This does not mean that there are no connections at all between individual case and long-run performance; for if we have found the procedure which is ‘best’ in each individual case, it is hard to see how it would fail to be ‘best’ also in the long run.

The point is that the converse does not hold; having found a rule whose long-run performance is proved to be as good as can be obtained, it does not follow that this rule is necessarily the best in any particular individual case. One can trade off increased reliability for one class of samples against decreased reliability for another, in a way that has no effect on long-run performance; but has a very large effect on performance in the individual case.” end quote.

Those first two “if”‘s are carrying a lot of water. They are almost never true. The model “tests” and “verifications” frequentists use to “guarantee” that they hold do no such thing the vast majority of the time.

• A confidence interval makes extra assumptions for computational efficiency. When it “works”, that is because the result happens to correspond to a credible interval using a uniform prior.

The confidence interval is a heuristic that can be used instead of the credible interval. It works for almost every simple stats 101 problem but, like any heuristic, can fail on more complex problems.

That is all it is.
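For the simplest case, a normal estimate with known standard error, the correspondence claimed in this reply can be checked numerically. The sketch below is only that special case, not a general proof: it builds the posterior under a flat prior on a grid and compares its central 95% interval with the usual estimate ± 1.96 standard errors. The estimate and standard error are illustrative assumptions, and `flat_prior_interval` is a hypothetical helper written for this example.

```python
import math

def flat_prior_interval(estimate, se, level=0.95, n_grid=20001, width=8.0):
    """Central credible interval under a flat prior, by grid approximation."""
    lo = estimate - width * se
    hi = estimate + width * se
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    # Flat prior: the posterior is proportional to the normal likelihood.
    weights = [math.exp(-0.5 * ((estimate - t) / se) ** 2) for t in grid]
    total = sum(weights)
    tail = (1 - level) / 2
    cum, lower, upper = 0.0, None, None
    for t, w in zip(grid, weights):
        cum += w / total
        if lower is None and cum >= tail:
            lower = t
        if upper is None and cum >= 1 - tail:
            upper = t
    return lower, upper

estimate, se = 0.08, 0.03  # illustrative numbers
credible = flat_prior_interval(estimate, se)
confidence = (estimate - 1.96 * se, estimate + 1.96 * se)
print(f"flat-prior credible interval: [{credible[0]:.4f}, {credible[1]:.4f}]")
print(f"classical confidence interval: [{confidence[0]:.4f}, {confidence[1]:.4f}]")
```

The two intervals agree up to grid error here. In models with constraints or nuisance parameters the match need not hold, which is where the "heuristic can fail" remark applies.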

8. I don’t know about the original article but the post about Zillow is incredible. And here I thought AI/ML plus lots and lots of data can never go wrong!!!

If you haven’t followed that link you should.

• yes, it’s a great read, especially since i have a personal grudge against zillow

9. I like a point estimate with a CI better than one without.

The CI usually only expresses random sampling error. The result can contain other errors (e.g. sampling bias, p-hacking, …), and I’d look for those when faced with an unexpected result (I look for these anyway, but then I’d look harder).

In other words, the 95% wager on the “truth” in the CI is only fair if the study is otherwise perfect. Which few are.

10. Stark (2015) makes the counterpoint that if all you have are bounds on your estimate, using any prior at all builds much more information than you actually possess into your estimate.

It seems perfectly reasonable to take the agnostic stance that sometimes your a priori knowledge comes in the form of constraints (more naturally represented without a prior) and sometimes it comes as information about the relative plausibility of values in addition to constraints (more naturally represented with a prior). It doesn’t seem like there is a point to be scored for either frequentists or Bayesians here, just that there are two different tools that are most naturally useful on different problems.

Stark, Philip B. “Constraints versus Priors.” SIAM/ASA Journal on Uncertainty Quantification 3, no. 1 (January 2015): 586–98. https://doi.org/10.1137/130920721.

• I’m fairly familiar with Stark’s works. He simply cannot comprehend any meaning for a probability distribution other than “frequency”. So if a parameter is bound by -1 to +1 he sees a uniform prior on [-1,+1] as implying a positive value for the parameter physically occurs 50 percent of the time, which he naturally interprets as an entirely unwarranted yet massive assumption.

It’s amusing to consider what would happen if you asked a Frequentist to do a sensitivity analysis for this parameter (take a non-trivial example where the ‘results’ aren’t monotonic in the parameter). They’d have to vary it, but keeping it within the -1,+1 bounds. The vast majority of the time they would use a uniform distribution to select the test parameter values for the sensitivity analysis, and when they did so, no one would suggest they were adding in unwarranted assumptions. Indeed, they wouldn’t have any philosophical qualms about it at all.

Anyway, as is usual with frequentists, you can learn a great deal about Stark from his papers mentioning Bayes, but you’ll learn precious little about Bayes from them.

• Anon:

I agree with Stark’s statement, “absent a meaningful prior, Bayesian uncertainty measures lack meaning.” I would just make this general by replacing the word “prior” with the word “model” and replace the word “Bayesian” with the word “statistical.”

• Stark would probably say that a statistical model without a prior can be meaningful.

I also agree with him that for an uncertainty measure to be meaningful in a probabilistic sense a prior is required though. But a frequentist uncertainty measure is not meaningful on its own. You may not need a prior but you need a context.

• > Stark (2015) makes the counterpoint that if all you have are bounds on your estimate, using any prior at all builds much more information than you actually possess into your estimate.

One could also argue that if all you have are bounds, using frequentist methods you also put more information than you actually have into your inference.

For example, say that you are estimating the size of circles and “all you have” is the bound corresponding to size being positive. Using the radius or the area you will get different results. If your only a priori knowledge is that size is not negative should you be using the radius or the area for your inference?
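One possible reading of the circle example, sketched with made-up measurements: averaging on the radius scale and averaging on the area scale lead to different answers for the "typical size", so the choice of scale is itself an input to the analysis. The three radii below are hypothetical.

```python
import math

radii = [1.0, 2.0, 3.0]  # hypothetical measurements

# Work on the radius scale, then convert to an area.
mean_radius = sum(radii) / len(radii)
area_from_mean_radius = math.pi * mean_radius ** 2

# Work on the area scale, then convert back to a radius.
areas = [math.pi * r ** 2 for r in radii]
mean_area = sum(areas) / len(areas)
radius_from_mean_area = math.sqrt(mean_area / math.pi)

print(f"area implied by mean radius: {area_from_mean_radius:.3f}")
print(f"mean of the areas:           {mean_area:.3f}")
print(f"mean radius:                 {mean_radius:.3f}")
print(f"radius implied by mean area: {radius_from_mean_area:.3f}")
```

Because area is a convex function of radius, the mean of the areas exceeds the area of the mean radius whenever the radii vary. Neither scale is privileged by the knowledge that size is positive.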

• Carlos:

Sure, Bayesian inferences can be uncalibrated too. The point of my above comment is not to say that Bayes will necessarily solve your problem; rather, I’m just saying that if you take unconditional probability statements out into the world, you can get burned on the conditional miscalibration.

• I just found the Zillow example funny given that they shot themselves in the foot with a Bayesian shotgun (I guess, I don’t know about that particular model but they have published blog posts about using hierarchical models and whatnot).

I think that the reasons for that debacle run much deeper. They are not the kind of issues that are solved with a nicer prior and better calibration.

• I haven’t followed the zillow issue. Isn’t it that they tried to use some kind of average rather than sending someone out to assess how much it would cost to repair the houses they were buying?

• > They are not the kind of issues that are solved with a nicer prior and better calibration.

Indeed, even a perfect oracle with exactly true posteriors conditional on available data would fail in the zillow case. As long as there’s any information available to agents and not to the oracle, the agents will choose to sell when the oracle is overpriced and buy when the oracle is underpriced.

The available data is stuff like number of bedrooms, square feet, bathroom, location. But there are many things that obviously matter a lot to buyers that don’t admit an obvious parameterization for ML applications, like overall geometry and appearance. The information asymmetry is inherent to this problem domain.

On top of that, there’s a complex interdependency between the values of different houses at many levels of hierarchy. The values of houses conditional on a particular intersection might look independent for small daily swings but correlated for bigger ones, bigger swings will move an entire town at once, the biggest ones will move a country. Solving the estimation problem here to just pick the right houses to buy is impossible, you *have* to hedge your portfolio with some anticorrelated assets.

This whole idea was doomed from the start, the kind of idea an executive has when they can’t tell the difference between “algorithms” and magic. It’s astounding that so much money can be thrown at an idea this obviously bad.
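The adverse-selection argument in this comment can be illustrated with a toy simulation. Everything in it is an assumption made for the sketch: values are in hypothetical thousands of dollars, the hidden component is independent noise known only to the seller, the oracle offers exactly the expected value given the observed features, and a seller accepts only when the offer exceeds the true value. Fees and resale costs are ignored.

```python
import random

def simulate_oracle(n_houses=50_000, hidden_sd=20.0, seed=7):
    """Toy model: a calibrated buyer facing sellers with private information."""
    rng = random.Random(seed)
    errors_all = []        # offer minus true value, over every house
    profits_accepted = []  # true value minus offer, over accepted deals only
    for _ in range(n_houses):
        observed = rng.gauss(300.0, 50.0)   # value explained by listed features
        hidden = rng.gauss(0.0, hidden_sd)  # information only the seller has
        true_value = observed + hidden
        offer = observed                    # E[true value | observed features]
        errors_all.append(offer - true_value)
        if offer > true_value:              # seller accepts only when overpaid
            profits_accepted.append(true_value - offer)
    mean_error = sum(errors_all) / len(errors_all)
    mean_profit = sum(profits_accepted) / len(profits_accepted)
    return mean_error, mean_profit

mean_error, mean_profit = simulate_oracle()
print(f"mean pricing error over all houses: {mean_error:.2f}")
print(f"mean profit on accepted offers:     {mean_profit:.2f}")
```

In this toy setup the offers are unbiased across all houses, yet the deals that actually close lose money on average, because acceptance is driven by the information the buyer lacks.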

• Somebody:

Well, there is the vig, so they don’t have to be perfect.

• Can you give a quick TLDR of where you see “madness” in their methodology? What was the “debacle” you mention later?

11. In general, there is a prior so that the credible interval equals the confidence interval. If the prior is unreasonable, the credible interval will probably be unreasonable. The confidence interval doesn’t magically become reasonable just because you pretended that you weren’t using a prior.

Priors are not optional, as shown by considering the lady tasting tea experiment versus an ESP experiment versus a guessing the composer of a musical work experiment. The three experiments might produce the same data, but the conclusions differ.

12. This post and most of the comments section are emblematic of why I stopped being ‘a Bayesian’, even if I occasionally use Bayes (mostly when collaborators or clients want me to)

• Ojm:

Could you please explain why poor performance of confidence intervals would make you not want to be a Bayesian?

I do appreciate the point that all statistical methods, Bayesian and otherwise, have problems, which would motivate anyone to not want to be “an X,” where X represents any rigid philosophical position. But such an objection is general enough that I wouldn’t see why you would need anything like this post to push you toward a philosophical or methodological agnosticism.

• The critiques, claims and alternatives offered throughout are just so off base and tunnel vision to me I’m like ‘man I don’t want any part of this’. Then I look into alternatives and I’m like ‘yeah that actually seems alright’. And here I am. It’s not the most rational or coherent path I admit…

• Oliver,

You haven’t quite gone all-in with your hipster statistician shtick. Here’s some suggestions to perfect the genre:

“You haven’t really heard Neyman-Pearson until you’ve listened to them on vinyl”

“Is that a vegan credibility interval?”

“It’s kinda like Bono meets Weezer, but it’s just me with a p-value”

“I was into Severity before it went corporate.”

“It’s Dempster-Shafer, I doubt you’ve heard of it”

“I only use locally sourced statistical philosophy”

• I think ojm has a point, many of the conversations about being Bayesian have not progressed much here in the last 10 or 15 years.

The posterior is just a conditional “subset” of the prior and if the prior is not credible there is no reason to expect the posterior to be. The good properties that a posterior might have are just the frequentist properties of its credible intervals.

There has been increasing acceptance with Bayesian work flow such as checking priors and some calibration (aka frequencies). But most Bayesians continue to hear no frequencies, see no frequencies and speak no frequencies.

However, the question of how to get a credible prior and assess it as credible remains largely an open question.

• While I think ojm’s thoughts could have been explicated in greater detail (and I hope that they are) that doesn’t warrant whatever the hell this list is. I haven’t seen ojm comment nearly as often as he used to, and I feel that is a shame for this blog and its readership. His sparring with Lakeland and Carlos – among others – always led me to think about things from a completely different perspective and that is something I value highly.

Point being ojm really could have expanded on why he thought the comments were so tunnel visioned (and I hope that he does), but you could ask him to do so without being dramatic about it

• Haha thanks Allan, nice of you to say. In all honesty I don’t think I have the appetite for it anymore. I just legitimately feel like the Bayes take doesn’t resonate with me anymore for various reasons. I think the whole Jaynes logical probability stuff is flawed (who cares about building stat on propositional logic when theories aren’t propositions etc), the subjective stuff never grabbed me, possibly for similar reasons etc. Then you have all the ‘of course if we want to include prior info we need to represent it as a prior probability’, the dismissal of Frequentist inference without understanding of some of what to me turned out to be fascinating history and motivation when I went back to it (eg re-reading Peirce, Neyman, Hacking etc) etc etc.

Anyway, someone linked this on Twitter so I came by to see what I’d been missing, got depressed reading the same old cliche stuff and posted an ill advised and not particularly constructive comment to this effect.

Feel free to ignore/delete this whole little comment thread :-)

• I second this. OJM always has an interesting take on things. Have a look at his “Tropical Bayes” paper on arXiv, for instance.

• I always liked ojm pushback as well. He made me finally buy and read a book on set theory, which in the end was a good thing, and though he never convinced me to waver from my basic Bayesian position, he did convince me that the boundaries of the theory still have problems, namely that it’s always the case that there are theories outside the ones included in any given analysis and that we should be open to reanalysis when new theories come into our line of sight.

But my impression is that the “M-open” problem isn’t ojm’s only objection, and perhaps some of his objection isn’t mathematical or scientific but rather social. That us Bayesians are just too smug so to speak. That’s too bad.

if that really was Joseph above, dude check your Signal messages more often!