Frequentist compatibility is conditional on the specific tested parameter: if that parameter value were true, how often would possible data be as discrepant as the observed data, or more so?
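To make that question concrete, here is a minimal sketch. It assumes a normal approximation with a known standard error, which the comment itself does not specify; the function name and the numbers are hypothetical.

```python
import math

def compatibility_p(estimate, std_error, tested_value):
    """Two-sided p-value for tested_value, assuming a normal approximation."""
    z = abs(estimate - tested_value) / std_error
    # P(|Z| >= z) for a standard normal Z
    return math.erfc(z / math.sqrt(2))

# Hypothetical numbers: an estimate of 1.2 with standard error 0.5
for tested in (0.0, 0.5, 1.2, 2.5):
    print(tested, round(compatibility_p(1.2, 0.5, tested), 3))
```

Each printed p-value answers the question for one tested parameter value; the value equal to the estimate is the most compatible, and compatibility falls off as the tested value moves away from it.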

Bayesian compatibility is conditional on the observed data: what is the distribution of parameter values that would each generate possible data exactly matching the observed data at least this often (or this plausibly)?
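One way to read that description is as rejection sampling: draw a parameter from a prior, simulate data from it, and keep the parameter only if the simulated data match the observed data exactly. The sketch below assumes binomial data and a uniform prior, neither of which comes from the comment; the function name and counts are hypothetical.

```python
import random

def matching_parameters(observed_successes, n_trials, n_draws=20000, seed=1):
    """Keep prior draws whose simulated data exactly match the observed count."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = rng.random()  # assumed prior: uniform on (0, 1)
        simulated = sum(rng.random() < theta for _ in range(n_trials))
        if simulated == observed_successes:
            kept.append(theta)
    return kept

# Hypothetical data: 7 successes in 10 trials
kept = matching_parameters(observed_successes=7, n_trials=10)
print(len(kept), sum(kept) / len(kept))
```

Under these assumptions the kept draws follow a Beta(8, 4) distribution, whose mean is 2/3, so the printed average should land near that value.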

Actually allows one to mitigate the degree of uncertainty laundering with Frequentist methods while introducing Bayesian inference in a way that is inoculated against uncertainty laundering using a workflow introduced first with Frequentist methods.

Or so I hope.

]]>HDPI may be OK insofar as it sounds hard to misinterpret, but researchers are creative so may prove me wrong if they adopt it (which does not seem likely any time soon in my field).

]]>My wife was discussing the budget for a grant with one of her colleagues, and she proposed putting something in the budget explicitly for data analysis. Her colleague just said they should find some collaborator whose lab would do it for free; after all, it’s only a few hours of a grad student’s time or something to press the buttons on the bioinformatics software and write up the results, right?

:-|

]]>No surprise then that the labor involved in modeling out uncertainty sources in detail is well beyond that budgeted for analyses, and far beyond the training or competence of most teams. Worst of all, the incentives are all stacked to do no such thing, because it will inevitably lead to weaker conclusions not even worth a press release let alone acceptance in a high-status journal.

I strongly doubt the situation is any better in other health sciences or social sciences or psychology. In the face of such harsh reality, I see no alternative but to try to force honest description of conventional outputs. At least get away from terms promoting overconfidence, like “significance”, “confidence”, “coverage”, “credibility”, etc. in favor of less sensational, more modest ordinary-language descriptions, as illustrated in Chow & Greenland, http://arxiv.org/abs/1909.08579

]]>For this iteration, in response:

First: “CI” is just an abbreviation for “confidence”, “coverage”, “credible”, “compatibility” etc. (e.g., “crap”) interval. It solves only a speed-typing problem. What they share is that none of them capture uncertainty outside of stylized (and in my work, unrealistic) examples. Otherwise we should face the fact that the interval estimates in research articles and textbooks do not deserve labels as strong as “confidence”, “coverage”, or “credible”. The key question is: Why should we care about uncertainty (or coverage, confidence, or credibility) given unrealistic models? At best we are only getting compatibility with those models (distinguished from the other Cs only in that it is not a hypothetical conditional; see Greenland & Chow, http://arxiv.org/abs/1909.08583).

Second: Fully agree that “confidence intervals” rarely have their claimed coverage properties and so are not coverage intervals (thus their name is a confidence trick, as Bowley said upon seeing them in 1934). That’s why I call them “compatibility intervals” in my work. And fully agree that “credible intervals” rarely warrant credibility near what is stated (e.g., 95%) and often contain incredible values, so that at least one modern Bayesian text (McElreath) also calls them “compatibility intervals” (albeit here the compatible models include an explicit prior).

Third: If you agree that all these CIs are model-based and thus do not capture total uncertainty, then you’ve made my point: “Uncertainty interval” (UI) is a very bad term for them because (apart from very special cases) CIs do not capture total uncertainty. Worse, CIs often capture only a minority of uncertainty, for the reasons I stated.

Adding those up: You have been a leader in condemning uncertainty laundering, hence I’m baffled as to why you’d continue to promote labeling CIs as UIs. It seems obvious (to me anyway) from past researcher performance that they already take CIs as representing total uncertainty; thus relabeling CIs as “uncertainty intervals” will only dig in this misinterpretation even deeper. At best, they could be labeled as “MINIMAL-uncertainty intervals” with a massive emphasis on “minimal”, but then we should caution that they may be WAY too narrow, and may be biased WAY off to an unknown side.

]]>having done all that, we won’t be perfect, but we won’t be fooling ourselves either, and now, with those components in our models, we can discuss them explicitly and argue over what a good model for them is…

Anything else is, I agree, fooling ourselves, and like Feynman said in his cargo cult lecture, the first thing we need to do is not fool *ourselves*.

]]>Statistical interval estimates are used in different ways, including to express confidence in a conclusion, to express a range of credible values, and to express uncertainty about an inference. In that sense, all three terms, “confidence interval,” “credibility interval,” and “uncertainty interval,” are reasonable, as they represent three different goals that are served by interval estimation. Separating these concepts can help, as there are examples of confidence intervals that do not include credible values and do not summarize uncertainty, there are examples of credible intervals that do not convey confidence and do not capture uncertainty, and there are examples of uncertainty intervals that are not interpretable as confidence or credibility statements.

Regarding your point: all three of these concepts—“confidence interval,” “credibility interval,” and “uncertainty interval”—are model-based, and all of our models are wrong. So, sure, I agree, except in some rare cases, uncertainty intervals do not capture total uncertainty. But the same is the case for confidence and credibility intervals. Except in some rare cases, confidence intervals do not have the claimed confidence properties, and, except in some rare cases, credible intervals can exclude credible values and include incredible values.

If you want to call the term “uncertainty interval” a “sales gimmick,” fine. I’d prefer to say it’s a mathematical statement conditional on a model, which is what I’d also say of “confidence interval” or “credibility interval.” I don’t see how calling it a “CI” solves this problem.

]]>Now, if you’d only stop claiming that “confidence” and “credibility” intervals (CI) are “uncertainty intervals” we might approach stat nirvana. Until then, that “uncertainty” label is conning the reader and ourselves. Why? Because CI do NOT capture total uncertainty (outside of the highly idealized examples that characterize the toy universe of math stat). That means calling either kind of CI an “uncertainty interval” is part of the usual stat sales gimmick of empty quality assurance (AKA “error control”).

Look, we always compute CI from a data model. In 100% of my work (and I bet about the same % of yours) there’s serious uncertainty about the underlying physical data-generating process. That process has important features not captured by our model, like measurement errors and selection biases. In that case the CI flopping out of our software (whether SAS, Stan or Stata) are OVERCONFIDENCE intervals, and should not be assigned anything near either the numeric confidence or credibility shown alongside them.

Unless you carry out the arduous task of including all important uncertainty sources in the model, CI do NOT account for our actual uncertainties about the mechanisms producing the data. And that uncertainty can far exceed any uncertainty from the “random variation” allowed by the assumed model; see for example Greenland, S. (2005). Multiple-bias modeling for analysis of observational data (with discussion). J Royal Statist Soc A, 168, 267-308.
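A toy simulation can illustrate the point. It is not the multiple-bias modeling of the cited paper, only a sketch under assumed numbers: each simulated study shares one unmodeled systematic error (think measurement or selection bias), and the analyst computes the usual nominal 95% interval for a mean whose true value is 0. All names and parameter values are hypothetical.

```python
import math
import random

def nominal_coverage(bias_sd, n=100, n_studies=2000, seed=1):
    """Share of nominal 95% intervals that contain the true mean (0 here)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        bias = rng.gauss(0.0, bias_sd)  # systematic error shared by the whole study
        data = [bias + rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(data) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
        half_width = 1.96 * sd / math.sqrt(n)
        if mean - half_width <= 0.0 <= mean + half_width:
            hits += 1
    return hits / n_studies

print(nominal_coverage(bias_sd=0.0))  # assumed model matches the data-generating process
print(nominal_coverage(bias_sd=0.3))  # unmodeled bias with SD three times the standard error
```

When the bias is absent the intervals cover the truth at roughly the stated rate; with the unmodeled bias they cover far less often, even though the software output looks identical in both cases.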

Note well: Model averaging and so-called “robust” (another con word) methods don’t address this uncertainty problem. Those methods only address uncertainty about the “best” mathematical form for combining the observations, not problems with the observations like measurement error, selection bias, and (in allegedly causal analyses) uncontrolled confounding.

At best then, we can only say that CI show us a range of good-fitting models (models “highly compatible with the data”) within the very restricted model family used to combine the observations.

]]>It’s not a fallacy, it’s an assumption! But I agree that assumptions should be clear. So I’ve rewritten that entry; it now says, “16: You need this much more of a sample size to estimate an interaction that is half the size of a main effect.”

I agree that the entry as written was potentially misleading, so thanks for giving me the push to fix it.
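The arithmetic behind the “16” entry can be sketched as follows. The sketch assumes a balanced design with two binary factors and a common residual standard deviation, none of which is stated in the comment; the function names are hypothetical.

```python
import math

def se_main(n, sigma=1.0):
    # Difference between two group means, n/2 observations per group
    return sigma * math.sqrt(1 / (n / 2) + 1 / (n / 2))

def se_interaction(n, sigma=1.0):
    # Difference of two differences, n/4 observations in each of four cells
    return sigma * math.sqrt(4 / (n / 4))

def sample_size_multiplier(interaction_to_main_ratio):
    # Factor by which n must grow so the interaction has the same
    # estimate-to-standard-error ratio that the main effect has at the original n
    se_ratio = se_interaction(1.0) / se_main(1.0)  # equals 2 for any n
    return (se_ratio / interaction_to_main_ratio) ** 2

print(sample_size_multiplier(0.5))
```

Under these assumptions the interaction's standard error is twice the main effect's at the same sample size, and halving the effect size doubles the required precision again, so the sample size multiplies by 4 × 4 = 16.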

]]>As an antidote to this fallacy go to this exchange:

https://statmodeling.stat.columbia.edu/2020/02/10/evidence-based-medicine-eats-itself/#comment-1242382 ]]>

It would be great if you got John Ioannidis here to discuss the p-value debate. What is its status? Everyone seems to go off just shy of making an impact, debate-wise. Is one to conclude that this debate is on the back burner?

]]>I have identified some individuals who I think can make superb contributions. This forum too can be helpful.

]]>On the one hand, I love learning about ALL of this stuff, especially the more subtle fallacies.

But on the other hand, my list of things to read just exploded exponentially.

So, thank you. Jerk.

]]>So just skip the earlier parts?

]]>