Comments on: Handy statistical lexicon

By: Sameera Daniels

Sameera Daniels — Sat, 26 Feb 2022 03:06:50 +0000

In reply to Andrew. KEWL Andrew. I'd love to go through them. Thanks

By: Andrew

Andrew — Sat, 26 Feb 2022 02:47:41 +0000

In reply to Peter Burkimsher. Peter: Thanks! And you might also like the 77 best lines from my Bayesian Data Analysis course from a few years ago.

By: Peter Burkimsher

Peter Burkimsher — Sat, 26 Feb 2022 02:08:37 +0000

Thank you for a useful lexicon!
I learned about the meaning of Kangaroo via “The claimed effect size is about a zillion times higher than is plausible” which was posted to Hacker News.

As it is my duty as a peer to review:

A bunch of hoops to jump through: Research methods and statistics, as viewed my many researchers.
->
A bunch of hoops to jump through: Research methods and statistics, as viewed by many researchers.

Yes, it’s nitpicky. But I forgive you, the typo is p < 0.03 ;-)

By: bbis

bbis — Thu, 09 Sep 2021 02:01:09 +0000

In reply to Matt VE. Control + F will allow a search of the page. It is pretty easy. It only took me half a dozen visits before it occurred to me.

By: Andrew

Andrew — Thu, 09 Sep 2021 01:32:33 +0000

In reply to Matt VE. Matt: They're in chronological order of when I thought of adding them to the list.

By: Matt VE

Matt VE — Thu, 19 Aug 2021 13:40:45 +0000

Have you considered organizing these in alphabetical (or some other informative) order?

By: Mike

Mike — Sun, 18 Jul 2021 20:07:55 +0000

Not sure if this is a legit lexicon entry but…

Schrodinger’s data: “Source X’s credibility exists in a fuzzy superposition of ‘totally credible’ and ‘entirely untrustworthy’ until we find out whether its claim fits comfortably within our politics [or other biases], at which point its status collapses conveniently into one state or the other.”

https://jessesingal.substack.com/p/how-science-based-medicine-botched

By: Manuel Onate

Manuel Onate — Wed, 16 Dec 2020 15:17:43 +0000

This is not strictly a statistical fallacy, but it is my favorite geometry axiom:

Axiom of the fat point

Given a line and a point outside it, there are as many lines on the plane containing the point and the line that pass through the point and don’t intersect with the line, provided that the point is sufficiently fat.

This axiom and the many postulates that derive from it make geometry so much easier …

By: Jessica Hullman

Jessica Hullman — Wed, 25 Nov 2020 17:23:08 +0000

In reply to Sander Greenland.

I’m a fan of this idea – such a small change in label but a rather drastic change in what it implies about how to think about model estimates. Assuming people take the time to understand why it’s called a compatibility interval, it’s a shift in emphasis so that we’re leading with the unquantifiable/hard to quantify uncertainty over the assumption that the specific model is good, which so often gets overlooked but can explain a lot of seemingly implausible estimates.

By: Keith O'Rourke

Keith O'Rourke — Fri, 14 Feb 2020 23:06:29 +0000

In reply to Zad.

As Sander put it – only the “compatibility” criterion differs.

Frequentist compatibility is conditional on the specific tested parameter – how often would possible data be this or more discrepant than the observed data, with the specific tested parameter (if the specific tested parameter was true).

Bayesian compatibility is conditional on the observed data – what’s the distribution of parameters that each would generate the exact same possible data as the observed data at least this often (or plausibly) or more.

Actually allows one to mitigate the degree of uncertainty laundering with Frequentist methods while introducing Bayesian inference in a way that is inoculated against uncertainty laundering using a workflow introduced first with Frequentist methods.

Or so I hope.
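
The frequentist half of that description can be sketched in a few lines. This is only an illustration under assumptions that are not in the comment above: a normal sampling model with known standard error, and hypothetical values for `xbar` and `se`. For each tested parameter value it reports how often data at least as discrepant as the observed data would arise if that value were true.

```python
from statistics import NormalDist

def p_value(theta0, xbar, se):
    """Two-sided p-value: probability of data at least this discrepant
    from theta0, computed as if theta0 were the true parameter."""
    z = abs(xbar - theta0) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))

def compatibility_interval(xbar, se, level=0.95):
    """Tested values whose p-value is at least 1 - level
    (normal model, known standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return xbar - z * se, xbar + z * se

# Hypothetical observed estimate and standard error, for illustration only.
xbar, se = 1.2, 0.5
for theta0 in (0.0, 0.5, 1.2, 2.0):
    print(theta0, round(p_value(theta0, xbar, se), 3))
print(compatibility_interval(xbar, se))
```

The interval endpoints are the tested values at which the p-value drops to 0.05, so the interval is a summary of compatibility with the assumed model and nothing more.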

By: Sander Greenland

Sander Greenland — Fri, 14 Feb 2020 20:14:56 +0000

In reply to Zad. I share concerns about confusing types of intervals, but there is a sense in which they are more alike than different whenever the model is only hypothetical. In that case there is no real coverage validity (calibration), and both types of intervals are only showing compatibility of the data with their assumed model; only the "compatibility" criterion differs. This raises the possibility of other criteria, but then the resulting interval functions have usually turned out to be numerically the same as particular coverage or credibility functions (as with pure likelihood). HDPI may be OK insofar as it sounds hard to misinterpret, but researchers are creative so may prove me wrong if they adopt it (which does not seem likely any time soon in my field).

By: Daniel Lakeland

Daniel Lakeland — Fri, 14 Feb 2020 18:52:15 +0000

In reply to Sander Greenland.

From a getting paid to do work perspective of course you are absolutely right. I’d just say that being open and up-front about those particular issues, and telling people what *should* be done even if it can’t be done is a task we should bend over backwards to do. Of course, it doesn’t make getting contracts any easier… let me agree entirely on that. It’s a pleasure when you find someone who will buy the real deal.

My wife was discussing the budget for a grant with one of her colleagues, and she proposed putting something in the budget explicitly for data analysis. Her colleague just said they should find some collaborator whose lab would do it free; after all, it’s only a few hours of a grad student’s time or something to press the buttons on the bioinformatics software and write up the results, right?

:-|

By: Zad

Zad — Fri, 14 Feb 2020 18:46:59 +0000

In reply to Sander Greenland. I do like that McElreath calls Bayesian intervals "compatibility intervals", but I think it might open the doors for possible confusion with frequentist intervals. What do you think about "95% highest density posterior intervals" (HDPI) as a name for Bayesian intervals? It's one that John Kruschke uses in his Bayes book.

By: Sander Greenland

Sander Greenland — Fri, 14 Feb 2020 18:20:35 +0000

In reply to Daniel Lakeland.

Daniel: I agree to all that in principle. Unfortunately as with so much that is good “in principle”, it’s simply not practical (apart from infrequent exceptions), at least in my main application field (medical drug, device, and practice surveillance). There, few researchers can correctly interpret a P-value or CI let alone comprehend in detail the ordinary unrealistic model generating those; some high-prestige journals like JAMA even force authors to misinterpret P-values and CIs!

No surprise then that the labor involved in modeling out uncertainty sources in detail is well beyond that budgeted for analyses, and far beyond the training or competence of most teams. Worst of all, the incentives are all stacked to do no such thing, because it will inevitably lead to weaker conclusions not even worth a press release let alone acceptance in a high-status journal.

I strongly doubt the situation is any better in other health sciences or social sciences or psychology. In the face of such harsh reality, I see no alternative than to try and force honest description of conventional outputs. At least get away from terms promoting overconfidence, like “significance”, “confidence”, “coverage”, “credibility”, etc. in favor of less sensational, more modest ordinary-language descriptions, as illustrated in Chow & Greenland, http://arxiv.org/abs/1909.08579

By: Sander Greenland

Sander Greenland — Fri, 14 Feb 2020 17:55:26 +0000

In reply to Andrew.

Thanks Andrew! I’d like to think we are getting closer…

For this iteration, in response:

First: “CI” is just an abbreviation for “confidence”, “coverage”, “credible”, “compatibility” etc. (e.g., “crap”) interval. It solves only a speed-typing problem. What they share is that none of them capture uncertainty outside of stylized (and in my work, unrealistic) examples. Otherwise we should face the fact that the interval estimates in research articles and textbooks do not deserve labels as strong as “confidence”, “coverage”, or “credible”. The key question is: Why should we care about uncertainty (or coverage, confidence, or credibility) given unrealistic models? At best we are only getting compatibility with those models (distinguished from the other Cs only in that it is not a hypothetical conditional; see Greenland & Chow, http://arxiv.org/abs/1909.08583).

Second: Fully agree that “confidence intervals” rarely have their claimed coverage properties and so are not coverage intervals (thus their name is a confidence trick, as Bowley said upon seeing them in 1934). That’s why I call them “compatibility intervals” in my work. And fully agree that “credible intervals” rarely warrant credibility near what is stated (e.g., 95%) and often contain incredible values, so that at least one modern Bayesian text (McElreath) also calls them “compatibility intervals” (albeit here the compatible models include an explicit prior).

Third: If you agree that all these CIs are model-based and thus do not capture total uncertainty, then you’ve made my point: “Uncertainty interval” (UI) is a very bad term for them because (apart from very special cases) CIs do not capture total uncertainty. Worse, CIs often capture only a minority of uncertainty, for the reasons I stated.

Adding those up: You have been a leader in condemning uncertainty laundering, hence I’m baffled as to why you’d continue to promote labeling CIs as UIs. It seems obvious (to me anyway) from past researcher performance that they already take CIs as representing total uncertainty; thus relabeling CIs as “uncertainty intervals” will only dig in this misinterpretation even deeper. At best, they could be labeled as “MINIMAL-uncertainty intervals” with a massive emphasis on “minimal”, but then we should caution that they may be WAY too narrow, and may be biased WAY off to an unknown side.

By: Daniel Lakeland

Daniel Lakeland — Fri, 14 Feb 2020 17:31:02 +0000

In reply to Sander Greenland.

Sander, shouldn’t we be advocating that people actually model those often unmodeled uncertainties? I mean, for example, unless your measurement apparatus is quite good, you should probably have a measurement error in your model, and unless you’re doing an extraordinary job of recruiting a wide variety of patients to match the demographics of your country, you should be including some kind of sample bias or something in your model, and when there are generating process issues, you should add reasonable “width” to your likelihoods, which can be accomplished through informative priors that bias the error scales away from zero intentionally…

Having done all that, we won’t be perfect, but we won’t be fooling ourselves either, and now, with those components in our models, we can discuss them explicitly and argue over what a good model for them is…

Anything else is, I agree, fooling ourselves, and like Feynman said in his cargo cult lecture, the first thing we need to do is not fool *ourselves*.

By: Andrew

Andrew — Fri, 14 Feb 2020 16:46:47 +0000

In reply to Sander Greenland.

Sander:

Statistical interval estimates are used in different ways, including to express confidence in a conclusion, to express a range of credible values, and to express uncertainty about an inference. In that sense, all three terms, “confidence interval,” “credibility interval,” and “uncertainty interval,” are reasonable, as they represent three different goals that are served by interval estimation. Separating these concepts can help, as there are examples of confidence intervals that do not include credible values and do not summarize uncertainty, there are examples of credible intervals that do not convey confidence and do not capture uncertainty, and there are examples of uncertainty intervals that are not interpretable as confidence or credibility statements.

Regarding your point: all three of these concepts—“confidence interval,” “credibility interval,” and “uncertainty interval”—are model-based, and all of our models are wrong. So, sure, I agree, except in some rare cases, uncertainty intervals do not capture total uncertainty. But the same is the case for confidence and credibility intervals. Except in some rare cases, confidence intervals do not have the claimed confidence properties, and, except in some rare cases, credible intervals can exclude credible values and include incredible values.

If you want to call the term “uncertainty interval” a “sales gimmick,” fine. I’d prefer to say it’s a mathematical statement conditional on a model, which is what I’d also say of “confidence interval” or “credibility interval.” I don’t see how calling it a “CI” solves this problem.

By: Sander Greenland

Sander Greenland — Fri, 14 Feb 2020 16:34:07 +0000

In reply to Andrew.

Great!
Now, if you’d only stop claiming that “confidence” and “credibility” intervals (CI) are “uncertainty intervals” we might approach stat nirvana. Until then, that “uncertainty” label is conning the reader and ourselves. Why? Because CI do NOT capture total uncertainty (outside of the highly idealized examples that characterize the toy universe of math stat). That means calling either kind of CI an “uncertainty interval” is part of the usual stat sales gimmick of empty quality assurance (AKA “error control”).

Look, we always compute CI from a data model. In 100% of my work (and I bet about the same % of yours) there’s serious uncertainty about the underlying physical data-generating process. That process has important features not captured by our model, like measurement errors and selection biases. In that case the CI flopping out of our software (whether SAS, Stan or Stata) are OVERCONFIDENCE intervals, and should not be assigned anything near either the numeric confidence or credibility shown alongside them.

Unless you carry out the arduous task of including all important uncertainty sources in the model, CI do NOT account for our actual uncertainties about the mechanisms producing the data. And that uncertainty can far exceed any uncertainty from the “random variation” allowed by the assumed model; see for example Greenland, S. (2005). Multiple-bias modeling for analysis of observational data (with discussion). J Royal Statist Soc A, 168, 267-308.

Note well: Model averaging and so-called “robust” (another con word) methods don’t address this uncertainty problem. Those methods only address uncertainty about the “best” mathematical form for combining the observations, not problems with the observations like measurement error, selection bias, and (in allegedly causal analyses) uncontrolled confounding.

At best then, we can only say that CI show us a range of good-fitting models (models “highly compatible with the data”) within the very restricted model family used to combine the observations.
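
To make the overconfidence point concrete, here is a minimal simulation sketch. It is not taken from the cited paper: the shared study-level bias term, its standard deviation `bias_sd`, and every other name and number are assumptions chosen only for illustration. Each simulated study computes the usual 95% interval for a mean from a model that ignores a systematic error common to all of its observations.

```python
import random
import statistics

def coverage(bias_sd, n=50, reps=2000, true_mean=0.0, seed=1):
    """Fraction of nominal 95% intervals that contain the true mean
    when every observation in a study shares one unmodeled bias."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Systematic error (e.g. miscalibration or selection), drawn once
        # per study and invisible to the within-study spread.
        bias = rng.gauss(0.0, bias_sd)
        data = [true_mean + bias + rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = statistics.fmean(data)
        se = statistics.stdev(data) / n ** 0.5
        # Interval from the assumed model, which has no bias term.
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / reps

print("no unmodeled bias:", coverage(bias_sd=0.0))
print("shared bias, sd 0.5:", coverage(bias_sd=0.5))
```

Under these assumptions the interval covers the true mean close to 95% of the time when `bias_sd` is 0 and far less often once a shared bias is present, even though the computed interval width barely changes, because the within-study standard deviation carries no information about the bias.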

By: Andrew

Andrew — Fri, 14 Feb 2020 15:21:01 +0000

In reply to Sander Greenland.

Sander:

It’s not a fallacy, it’s an assumption! But I agree that assumptions should be clear. So I’ve rewritten that entry; it now says, “16: You need this much more of a sample size to estimate an interaction that is half the size of a main effect.”

I agree that the entry as written was potentially misleading, so thanks for giving me the push to fix it.
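
One common way to arrive at the factor of 16 can be sketched as follows. The setup is an assumption, not something stated in this thread: a balanced two-by-two design with N observations in total, a common error standard deviation `sigma`, a main effect estimated as a difference between two halves of the data, and an interaction estimated as a difference of differences across the four cells. The function names are hypothetical.

```python
import math

def se_main_effect(sigma, n):
    """Standard error of a difference between two group means, n/2 per group."""
    return sigma * math.sqrt(1.0 / (n / 2) + 1.0 / (n / 2))

def se_interaction(sigma, n):
    """Standard error of a difference of differences, n/4 per cell."""
    return sigma * math.sqrt(4.0 / (n / 4))

def sample_size_ratio(relative_size, sigma=1.0, n=1000.0):
    """Multiple of the sample size needed to estimate an interaction with
    the same signal-to-noise ratio as the main effect, where the interaction
    is `relative_size` times as large as the main effect."""
    se_ratio = se_interaction(sigma, n) / se_main_effect(sigma, n)  # equals 2
    return (se_ratio / relative_size) ** 2

print(round(sample_size_ratio(0.5), 6))  # interaction half the size of the main effect
print(round(sample_size_ratio(1.0), 6))  # interaction as large as the main effect
```

The interaction's standard error is twice that of the main effect, which alone costs a factor of 4 in sample size; assuming the interaction is half as large costs another factor of 4. Dropping that second assumption leaves only the factor of 4, which is why the assumption needs to be stated in the entry.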

By: Sander Greenland

Sander Greenland — Fri, 14 Feb 2020 15:12:25 +0000

“16: You need this much more of a sample size to estimate an interaction than to estimate a main effect.”
As an antidote to this fallacy go to this exchange:
https://statmodeling.stat.columbia.edu/2020/02/10/evidence-based-medicine-eats-itself/#comment-1242382

By: Evidence-based medicine eats itself « Statistical Modeling, Causal Inference, and Social Science

Mon, 10 Feb 2020 14:39:37 +0000

[…] effects for individuals or population subsets is difficult. A quick calculation finds that it takes 16 times the sample size to estimate an interaction as a main effect, and given that we are lucky if […]

By: Advice for a Young Economist at Heart « Statistical Modeling, Causal Inference, and Social Science

Thu, 06 Feb 2020 14:48:34 +0000

[…] Again, though, expect that most things will not be statistically significant—remember 16—but that doesn’t mean they’re not important. Instead of thinking of your study as […]

By: Schoolmarms and lightning bolts: Data faker meets Edge foundation in an unintentional reveal of problems with the Great Man model of science « Statistical Modeling, Causal Inference, and Social Science

Wed, 02 Oct 2019 13:04:51 +0000

[…] You can see how this could create big problems for Hauser. To start with, if you think all that matters are the lightning bolts of intuition, then you’re putting yourself under a lot of pressure to stand in just the right place in that rain cloud, to be where the voltage is highest so you can throw that lightning bolt. Second, once you become a celebrated Harvard professor, then you’re under even more pressure, either to come up with that damn bolt of lightning, or to play the part and act as if you’ve already discovered it. Remember the Armstrong principle. […]

By: “Less Wow and More How in Social Psychology” « Statistical Modeling, Causal Inference, and Social Science

Tue, 01 Oct 2019 14:21:18 +0000

[…] mistake, which is to just assume that the claims of the 1996 study are correct. Remember the time-reversal heuristic? Pretend the large, careful study with its null finding came first, followed by the small, […]

By: “Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science” « Statistical Modeling, Causal Inference, and Social Science

Tue, 17 Sep 2019 13:01:11 +0000

[…] in the meantime, decisions need to be made, and are being made, every day. This is related to the Chestertonian principle that extreme skepticism is a form of […]

By: Sameera Daniels

Sameera Daniels — Sat, 12 May 2018 18:05:04 +0000

Andrew,

It would be great if you got John Ioannidis here to debate the p-value debate. What is its disposition? Everyone goes off, leaving just shy of making an impact debate-wise. Is one to conclude that this debate is on the back burner?

By: Sameera Daniels

Sameera Daniels — Fri, 11 May 2018 16:38:16 +0000

One can just relegate thinking to the dustbin of history b/c much thinking, more generally is constituted from these concepts & methods. Statistics if enabling such thinking will be futile. That’s what I myself have been trying to convey to my circles. I think we are due for new epistemics/epistemology. I can visualize some dimensions already. But how to communicate it is my challenge.

I have identified some individuals who I think can make superb contributions. This forum too can be helpful.

By: Zack

Zack — Thu, 30 Nov 2017 16:25:21 +0000

I can’t decide if I’m very happy or very annoyed that this exists.

On the one hand, I love learning about ALL of this stuff, especially the more subtle fallacies.

But on the other hand, my list of things to read just exploded exponentially.

So, thank you. Jerk.

By: Proposal of a new term: "DOCO" | Stephen R. Martin

Proposal of a new term: "DOCO" | Stephen R. Martin — Thu, 21 Sep 2017 19:25:37 +0000

[…] am proposing a new term: DOCO. I will, in spirit, add it to the already impressive list of useful terminology. DOCO stands for Data(or datum) Otherwise Considered […]

By: New Media in Psychology and Philosophy - Daily Nous

New Media in Psychology and Philosophy - Daily Nous — Thu, 22 Sep 2016 15:11:07 +0000

[…] using abundant researcher degrees of freedom. It’s the paradigm of the theory that in the words of sociologist Jeremy Freese, is “more vampirical than empirical—unable to be killed by […]

By: The New Dogs of Politics | ansurs

The New Dogs of Politics | ansurs — Tue, 28 Apr 2015 18:42:26 +0000

[…] analysis, and concomitant immersion in the internet. I landed on Andrew Gelman’s stat blog and remembered that ‘humor’ is a great approach and natural response to dealing with […]

By: Andrew Gelman

Andrew Gelman — Tue, 16 Mar 2010 13:42:52 +0000

By: Ken Williams

Ken Williams — Tue, 16 Mar 2010 12:15:48 +0000

I'm not grokking what "WWJD" stands for. "What Would Jennifer Do"?

By: jonathan

jonathan — Tue, 26 May 2009 11:22:09 +0000

Mister P, huh? Isn't that reflective of the old male dominant paradigm?

By: Andrew Gelman

Andrew Gelman — Mon, 25 May 2009 19:23:10 +0000

Marcel: When I say "through chapter 10," I mean, "from chapters 1 through 10." And in the last sentence above, I meant "optional," not "optimal." I'll fix that.

By: marcel

marcel — Mon, 25 May 2009 16:54:34 +0000

In WWJD, you say, "My quick answer is, Yeah, I think it would be excellent for an econometrics class if the students have applied interests. Probably I'd just go through chapter 10 (regression, logistic regression, glm, causal inference), with the later parts being optimal."

So just skip the earlier parts?