Archive of posts filed under the Miscellaneous Statistics category.

“RA Fisher and the science of hatred”

Mark Brown points us to this thoughtful article by Richard Evans regarding the controversy over Ronald Fisher, who during the twentieth century made huge contributions to genetics and statistical theory and methods and who also had serious commitments to racism and eugenics. The controversy made its way into statistics. The Committee of Presidents of Statistical […]

Thinking about election forecast uncertainty

Some twitter action: Elliott Morris, my collaborator (with Merlin Heidemanns) on the Economist election forecast, pointed me to some thoughtful criticisms of our model from Nate Silver. There’s some discussion on twitter, but in general I don’t find twitter to be a good place for careful discussion, so I’m continuing the conversation here. Nate writes: […]

Job opportunity: statistician for carbon credits in agriculture

Charlie Brummitt: I’d like to share a job opportunity to pass on to your students and colleagues: to do survey statistics and uncertainty quantification for carbon credits in agriculture. We’re planning on using post-stratification techniques like those you used with Wei Wang. (Wei and I were interns together at Microsoft Research in 2013 when you […]

“Frequentism-as-model”

Christian Hennig writes: Most statisticians are aware that probability models interpreted in a frequentist manner are not really true in objective reality, but only idealisations. I [Hennig] argue that this is often ignored when actually applying frequentist methods and interpreting the results, and that keeping up the awareness for the essential difference between reality and […]

What can be our goals, and what is too much to hope for, regarding robust statistical procedures?

Gael Varoquaux writes: Even for science and medical applications, I am becoming weary of fine statistical modeling efforts, and believe that we should standardize on a handful of powerful and robust methods. First, analytic variability is a killer, e.g. in “standard” analysis for brain mapping, for machine learning in brain imaging, or more generally in […]

Probabilities for action and resistance in Blades in the Dark

Later this week, I’m going to be GM-ing my first session of Blades in the Dark, a role-playing game designed by John Harper. We’ve already assembled a crew of scoundrels in Session 0 and set the first score. Unlike most of the other games I’ve run, I’ve never played Blades in the Dark, I’ve only […]

If variation in effects is so damn important and so damn obvious, why do we hear so little about it?

Earlier today we posted, “To Change the World, Behavioral Intervention Research Will Need to Get Serious About Heterogeneity,” and commenters correctly noted that this point applies not just in behavioral research but also in economics, public health, and other areas. I wanted to follow this up with a question: If variation in effects is so […]

Adjusting for Type M error

Erik Drysdale discusses and gives some formulas, demonstrating on an example that will be familiar to regular readers of this blog.

The “scientist as hero” narrative

We’ve talked about the problems with the scientist-as-hero paradigm; see “Narrative #1” discussed here. And, more recently, we’ve considered how this narrative has been clouding people’s thinking regarding the coronavirus; see here and here. That latter example is particularly bad because it involved a reporter with an undisclosed conflict of interest. But the scientist-as-hero narrative […]

Regression and Other Stories is available!

This will be, without a doubt, the most fun you’ll have ever had reading a statistics book. Also I think you’ll learn a few things reading it. I know that we learned a lot writing it. Regression and Other Stories started out as the first half of Data Analysis Using Regression and Multilevel/Hierarchical Models, but […]

Inference for coronavirus prevalence by inverting hypothesis tests

Panos Toulis writes: The debate on the Santa Clara study actually led me to think about the problem from a finite sample inference perspective. In this case, we can fully write down the density f(S | θ) in known analytic form, where S = (vector of) test positives, θ = parameters (i.e., sensitivity, specificity and prevalence). […]

This one quick trick will allow you to become a star forecaster

Jonathan Falk points us to this wonderful post by Dario Perkins. It’s all worth a read, but, following Falk, I want to emphasize this beautiful piece of advice, which is #5 on their list of 10 items: How to get attention: If you want to get famous for making big non-consensus calls, without the danger […]

The two most important formulas in statistics

0.5/sqrt(n) (which in turn is short for sqrt(p*(1-p)/n))

5^2 + 12^2 = 13^2

With an honorable mention to 16.
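As a quick check on the first formula, here is a minimal Python sketch. It assumes the usual reading, in which sqrt(p*(1-p)/n) is the standard error of a sample proportion and 0.5/sqrt(n) is its largest possible value, reached at p = 0.5. The function names and the sample size of 1000 are illustrative choices, not taken from the post.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def worst_case_se(n: int) -> float:
    """Conservative bound 0.5/sqrt(n), which equals proportion_se at p = 0.5."""
    return 0.5 / math.sqrt(n)

n = 1000  # hypothetical sample size, chosen only for illustration

# At p = 0.5 the two formulas coincide; for any other p the bound is larger.
print(worst_case_se(n), proportion_se(0.5, n), proportion_se(0.3, n))

# The second formula is the 5-12-13 Pythagorean triple.
print(5**2 + 12**2 == 13**2)
```

Because p*(1-p) never exceeds 0.25, the shorter formula is a safe upper bound when p is unknown.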

(Some) forecasting for COVID-19 has failed: a discussion of Taleb and Ioannidis et al.

Nassim Taleb points us to this pair of papers: “On single point forecasts for fat tailed variables,” by Nassim Taleb, and “Forecasting for COVID-19 has failed,” by John Ioannidis, Sally Cripps, and Martin Tanner. The two articles agree in their mistrust of media-certified experts. Here’s Taleb: Both forecasters and their critics are wrong: At the onset […]

Am I missing something here? This estimate seems off by several orders of magnitude!

A reporter writes: I’m writing about a new preprint by doctors at Stanford University and UCLA on relative COVID-19 risk, in which they assert the risk is much less than most people might think. One author in an interview compared it to the risk of food poisoning. It’s a preprint so it’s obviously not fully […]

Using the rank-based inverse normal transformation

Luca La Rocca writes: You may like to know that the approach suggested in your post, Don’t do the Wilcoxon, is qualified as “common practice in Genome-Wide Association Studies”, according to this forthcoming paper in Biometrics to which I have no connection (and which I didn’t inspect beyond the Introduction). The idea is that, instead […]
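The excerpt is cut off before it describes the method, so the following is only a sketch of one common form of the rank-based inverse normal transformation, not necessarily the variant used in the paper. It replaces each value by its rank and maps the rank to a standard normal quantile. The Blom offset c = 3/8, the average-rank handling of ties, and the function name are assumptions made for illustration.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=0.375):
    """Map values to normal scores via ranks: z = Phi^{-1}((rank - c) / (n - 2c + 1)).

    c = 0.375 is the Blom offset (an assumption; other offsets are also used).
    Tied values receive the average of their 1-based ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# Hypothetical skewed data: the outlier 300.0 is pulled in to an ordinary normal score.
scores = rank_inverse_normal([10.0, 2.0, 300.0, 2.5, 7.0])
print(scores)
```

The transformed scores keep the ordering of the data but discard the spacing, which is what makes the subsequent analysis insensitive to outliers.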

Statistical Workflow and the Fractal Nature of Scientific Revolutions (my talk this Wed at the Santa Fe Institute)

Wed 3 June 2020 at 12:15pm U.S. Mountain time: “Statistical Workflow and the Fractal Nature of Scientific Revolutions.” How would an A.I. do statistics? Fitting a model is the easy part. The other steps of workflow—model building, checking, and revision—are not so clearly algorithmic. It could be fruitful to simultaneously think about automated inference and […]

Age-period-cohort analysis.

Chris Winship and Ethan Fosse write with a challenge: Since its beginnings nearly a century ago, Age-Period-Cohort analysis has been stymied by the lack of identification of parameter estimates resulting from the linear dependence between age, period, and cohort (age = period – cohort). In a series of articles, we [Winship and Fosse] have developed a […]
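The identification problem named in the excerpt can be seen in a few lines of Python. This is a minimal sketch of the linear dependence only, not of Winship and Fosse's approach. Because age = period – cohort, the quantity age – period + cohort is always zero, so any multiple of it can be added to a linear predictor without changing a single fitted value. The coefficients, the shift t, and the years below are made up for illustration.

```python
def linear_predictor(age, period, cohort, a, b, c):
    """Linear age-period-cohort predictor: a*age + b*period + c*cohort."""
    return a * age + b * period + c * cohort

# Hypothetical (age, period, cohort) rows; age is determined by period - cohort.
rows = [
    (period - cohort, period, cohort)
    for period in (1990, 2000, 2010)
    for cohort in (1940, 1960, 1980)
]

a, b, c = 2, -1, 3  # one set of slopes (illustrative values)
t = 5               # arbitrary shift
shifted = (a + t, b - t, c + t)  # adds t * (age - period + cohort) = 0 to every row

original_fit = [linear_predictor(*row, a, b, c) for row in rows]
shifted_fit = [linear_predictor(*row, *shifted) for row in rows]

# Two different sets of slopes give identical predictions on every row.
print(original_fit == shifted_fit)
```

Since the data cannot distinguish the two sets of slopes, the linear effects are not identified without an additional constraint or assumption.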

“Banishing ‘Black/White Thinking’: A Trio of Teaching Tricks”

Richard Born writes: The practice of arbitrarily thresholding p values is not only deeply embedded in statistical practice, it is also congenial to the human mind. It is thus not sufficient to tell our students, “Don’t do this.” We must vividly show them why the practice is wrong and its effects detrimental to scientific progress. […]

What a difference a month makes (polynomial extrapolation edition)

Someone pointed me to this post from Cosma Shalizi conveniently using R to reproduce the famous graph endorsed by public policy professor and A/Chairman @WhiteHouseCEA. Here’s the original graph that caused all that annoyance: Here’s Cosma’s reproduction in R (retro-Cosma is using base graphics!), fitting a third-degree polynomial on the logarithms of the death counts: […]