Archive of posts filed under the Miscellaneous Statistics category.

“Valid t-ratio Inference for instrumental variables”

A couple people pointed me to this recent econometrics paper, which begins: In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an […]

Body language and machine learning

Riding on the street, I can usually tell what cars in front of me are going to do, based on their “body language”: how they are positioning themselves in their lane. I don’t know that I could quite articulate what the rules are, but I can tell what’s going on, and I know that I […]

Between-state correlations and weird conditional forecasts: the correlation depends on where you are in the distribution

Yup, here’s more on the topic, and this post won’t be the last, either . . . Jed Grabman writes: I was intrigued by the observations you made this summer about FiveThirtyEight’s handling of between-state correlations. I spent quite a bit of time looking into the topic and came to the following conclusions. In order […]

Reference for the claim that you need 16 times as much data to estimate interactions as to estimate main effects

Ian Shrier writes: I read your post on the power of interactions a long time ago and couldn’t remember where I saw it. I just came across it again by chance. Have you ever published this in a journal? The concept comes up often enough and some readers who don’t have methodology expertise feel more […]
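
A rough version of the arithmetic behind the factor of 16, sketched under assumptions that the excerpt above does not spell out: a balanced two-by-two design with N observations, a common residual standard deviation \sigma, and an interaction assumed to be half the size of the main effect.

```latex
\begin{align*}
\operatorname{se}(\text{main effect})
  &= \sqrt{\frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2}}
   = \frac{2\sigma}{\sqrt{N}} \\
\operatorname{se}(\text{interaction})
  &= \sqrt{4 \cdot \frac{\sigma^2}{N/4}}
   = \frac{4\sigma}{\sqrt{N}}
\end{align*}
```

Under these assumptions the interaction has twice the standard error and half the magnitude, so its estimate-to-standard-error ratio is one quarter of the main effect's. Because the standard error shrinks like 1/\sqrt{N}, recovering that factor of 4 takes 4^2 = 16 times as many observations.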

She’s wary of the consensus based transparency checklist, and here’s a paragraph we should’ve added to that zillion-authored paper

Megan Higgs writes: A large collection of authors describes a “consensus-based transparency checklist” in the Dec 2, 2019 Comment in Nature Human Behavior. Hey—I’m one of those 80 authors! Let’s see what Higgs has to say: I [Higgs] have mixed emotions about it — the positive aspects are easy to see, but I also have […]

Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph?

How many data points are in each bar of the top graph above? (See here for background.) It’s from this article: Milewski MD, Skaggs DL, Bishop GA, Pace JL, Ibrahim DA, Wren TA, Barzdukas A. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. Journal of Pediatric Orthopaedics. 2014 Mar 1;34(2):129-33. […]

Reasoning under uncertainty

John Cook writes, “statistics is all about reasoning under uncertainty.” I agree, and I think this is a good way to put it. Statistics textbooks sometimes describe statistics as “decision making under uncertainty,” but that always bothered me, because there’s very little about decision making in statistics textbooks. “Reasoning” captures it much more than “decision […]

Uri Simonsohn’s Small Telescopes

I just happened to come across this paper from 2015 that makes an important point very clearly: It is generally very difficult to prove that something does not exist; it is considerably easier to show that a tool is inadequate for studying that something. With a small-telescopes approach, instead of arriving at the conclusion that […]

It’s kinda like phrenology but worse. Not so good for the “Nature” brand name, huh? Measurement, baby, measurement.

Federico Mattiello writes: I thought you might find this thread interesting, it’s about a machine learning paper building a “trustworthiness score” from faces databases and historical (mainly British) portraits. It checks many bias boxes I believe, but my biggest complaint (I know it shouldn’t be) is the linear regression of basically spherical clouds of points: […]

Randomized but unblinded experiment on vitamin D as a coronavirus treatment. Let’s talk about what comes next. (Hint: it involves multilevel models.)

Under the heading, “Here we go again,” Dale Lehman writes: If you want to blog on the continuing theme – try this (it’s from Marginal Revolution, the citation): https://marginalrevolution.com/marginalrevolution/2020/09/a-vitamin-d-bet.html https://www.sciencedirect.com/science/article/pii/S0960076020302764 Vitamin D Can Likely End the COVID-19 Pandemic What is striking is the analysis by the Rootclaim group – repeated reliance on p values as […]

A question of experimental design (more precisely, design of data collection)

An economist colleague writes in with a question: What is your instinct on the following. Consider at each time t, 1999 through 2019, there is a probability P_t for some event (e.g., it rains on a given day that year). Assume that P_t = P_1999 + (t-1999)A. So P_t has a linear time trend. What […]

Does this fallacy have a name?

Rafa Irizarry writes: What do we call it when someone thinks cor(Y,X) = 0 because lim h -> 0 cor( X, Y | X \in (x-h, x+h) ) = 0 Example: Steph, Kobe, and Jordan are average (or below average) height in the NBA so height does not predict being good at basketball. GRE math […]
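
The pattern Irizarry describes can be reproduced in a small simulation. The sketch below is not from the post: the linear data-generating model, the slope, the window center and width, and the helper names (`pearson`, `simulate`, `windowed_cor`) are all assumptions chosen for illustration. It draws X and Y with a clear positive correlation, then recomputes the correlation using only the points whose X falls in a narrow window (x-h, x+h).

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=200_000, slope=0.5, seed=1):
    """Hypothetical model: X ~ N(0, 1), Y = slope * X + N(0, 1) noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def windowed_cor(xs, ys, center, h):
    """Correlation using only points with X in (center - h, center + h)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if center - h < x < center + h]
    window_x, window_y = zip(*pairs)
    return pearson(window_x, window_y), len(pairs)


if __name__ == "__main__":
    xs, ys = simulate()
    print("overall correlation:", round(pearson(xs, ys), 3))
    # A window two standard deviations above the mean, loosely analogous
    # to looking only at people who are already tall enough for the NBA.
    local, count = windowed_cor(xs, ys, center=2.0, h=0.1)
    print("correlation inside the window:", round(local, 3), "from", count, "points")
```

In this setup the correlation inside the window is close to zero even though the overall correlation is clearly positive. Conditioning on a narrow range of X removes almost all of the variation in X, so there is little left for Y to correlate with.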

We want certainty even when it’s not appropriate

Remember the stents example? An experiment was conducted comparing two medical procedures; the difference had a p-value of 0.20 (after a corrected analysis the p-value was 0.09), and so it was declared that the treatment had no effect. In other cases, of course, “p less than 0.10” is enough for publication in PNAS and multiple […]

Everything that can be said can be said clearly.

The title, as many may know, is a quote from Wittgenstein. It is one that has haunted me for many years. As a first year undergrad, I had mistakenly enrolled in a second year course that was almost entirely based on Wittgenstein’s Tractatus. Alarmingly, the drop date had passed before I grasped I was supposed […]

Taking the bus

Bert Gunter writes: This article on bus ridership is right up your alley [it’s a news article with interactive graphics and lots of social science content]. The problem is that they’re graphing the wrong statistic. Raw ridership is of course sensitive to total population. So what they should have been graphing is rates per person, not […]

What are my statistical principles?

Jared Harris writes: I am not a statistician but am a long time reader of your blog and have strong interests in most of your core subject matter, as well as scientific and social epistemology. I’ve been trying for some time to piece together the broader implications of your specific comments, and have finally gotten […]

Who are you gonna believe, me or your lying eyes?

This post is by Phil Price, not Andrew. A commenter on an earlier post quoted Terence Kealey, who said this in an interview in Scientific American in 2003: “But the really fascinating example is the States, because it’s so stunningly abrupt. Until 1940 it was American government policy not to fund science. Then, bang, the […]

Rethinking Rob Kass’ recent talk on science in a less statistics-centric way.

Reflection on a recent post on a talk by Rob Kass has led me to write this post. I liked the talk very much and found it informative, perhaps especially for its call to clearly distinguish abstract models from brute force reality. I believe that is a very important point that has often been lost […]

FDA statistics scandal update

The other day we reported on the director of the FDA who got embarrassed after garbling some statistics at a news conference. At the time, I wrote: The commissioner of the FDA might well be too busy to be carefully reading the individual studies. I assume the fault is with whatever assistant prepared the numbers for […]

Statistics is hard, especially if you don’t know any statistics (FDA edition)

Paul Alper shares this story: From the NYT: Dr. Stephen M. Hahn, the commissioner of the Food and Drug Administration, said 35 out of 100 Covid-19 patients “would have been saved because of the administration of plasma.” He later walked this back because of confusion between Absolute Risk Reduction and Relative Risk Reduction, a common […]
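
To make the distinction between the two measures concrete, here is a minimal sketch. The function name `risk_reductions` and the two mortality rates are hypothetical and chosen only so the arithmetic is easy to follow; they are not the figures from the plasma study.

```python
def risk_reductions(control_risk, treated_risk):
    """Return (absolute, relative) risk reduction for two event probabilities."""
    if not (0 < control_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be probabilities, with control_risk > 0")
    absolute = control_risk - treated_risk   # difference in event probability
    relative = absolute / control_risk       # that difference as a share of baseline risk
    return absolute, relative


# Hypothetical inputs, NOT taken from the plasma study:
# 10% mortality without treatment, 6.5% with treatment.
arr, rrr = risk_reductions(control_risk=0.10, treated_risk=0.065)
print(f"Relative risk reduction: {rrr:.0%}")
print(f"Absolute risk reduction: {arr * 100:.1f} per 100 patients")
```

With these made-up inputs the relative reduction is 35 percent, while the absolute reduction is 3.5 deaths per 100 patients. Saying that 35 out of 100 patients would be saved reads a relative reduction as if it were an absolute one.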