Regarding this horrible Table 4: Eric Loken writes: The clear point of your post was that p-values (and even worse the significance versus non-significance) are a poor summary of data. The thought I’ve had lately, working with various groups of really smart and thoughtful researchers, is that Table 4 is also a model of their […]

**Miscellaneous Statistics** category.

## Simulation-based statistical testing in journalism

Jonathan Stray writes: In my recent Algorithms in Journalism course we looked at a post which makes a cute little significance-type argument that five Trump campaign payments were actually the $130,000 Daniels payoff. They summed to within a dollar of $130,000, so the simulation recreates sets of payments using bootstrapping and asks how often there’s […]
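
Here is a minimal Python sketch of the kind of simulation the excerpt describes. The payment amounts are randomly generated stand-ins, not the campaign's actual payments. The function names, the resampling scheme, and the choice to look for any five payments summing to within a dollar of the target are all assumptions, since the excerpt is cut off before it says exactly what the simulation counts.

```python
import random
from itertools import combinations


def has_close_subset(payments, target, tol=1.0, size=5):
    """Return True if any `size` payments sum to within `tol` of `target`."""
    return any(abs(sum(combo) - target) <= tol
               for combo in combinations(payments, size))


def bootstrap_hit_rate(payments, target, tol=1.0, size=5, n_sims=200, seed=0):
    """Resample the payments with replacement and report how often a
    resampled set contains `size` payments summing close to `target`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        resampled = rng.choices(payments, k=len(payments))
        if has_close_subset(resampled, target, tol, size):
            hits += 1
    return hits / n_sims


# Made-up payment amounts for illustration only (NOT real data).
demo_rng = random.Random(1)
fake_payments = [round(demo_rng.uniform(1_000, 60_000), 2) for _ in range(15)]

rate = bootstrap_hit_rate(fake_payments, target=130_000)
print(f"Share of resampled sets with a near-$130,000 sum: {rate:.3f}")
```

A small share would suggest that such a close match rarely arises by chance under this resampling scheme. The answer depends heavily on how many payments are in the pool and on how the match is defined, so it is worth varying `size` and `tol` before reading much into the number.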

## Michael Crichton on science and storytelling

Javier Benitez points us to this 1999 interview with techno-thriller writer Michael Crichton, who says: I come before you today as someone who started life with degrees in physical anthropology and medicine; who then published research on endocrinology, and papers in the New England Journal of Medicine, and even in the Proceedings of the Peabody […]

## Should he go to grad school in statistics or computer science?

Someone named Nathan writes: I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several […]

## Our hypotheses are not just falsifiable; they’re actually false.

Everybody’s talkin bout Popper, Lakatos, etc. I think they’re great. Falsificationist Bayes, all the way, man! But there’s something we need to be careful about. All the statistical hypotheses we ever make are false. That is, if a hypothesis becomes specific enough to make (probabilistic) predictions, we know that with enough data we will be […]

## When doing regression (or matching, or weighting, or whatever), don’t say “control for,” say “adjust for”

This comes up from time to time. We were discussing a published statistical blunder, an innumerate overconfident claim arising from blind faith that a crude regression analysis would control for various differences between groups. Martha made the following useful comment: Another factor that I [Martha] believe tends to promote the kind of thing we’re talking […]

## How post-hoc power calculation is like a shit sandwich

Damn. This story makes me so frustrated I can’t even laugh. I can only cry. Here’s the background. A few months ago, Aleksi Reito (who sent me the adorable picture above) pointed me to a short article by Yanik Bababekov, Sahael Stapleton, Jessica Mueller, Zhi Fong, and David Chang in Annals of Surgery, “A Proposal […]

## Published in 2018

R-squared for Bayesian regression models. *American Statistician*. (Andrew Gelman, Ben Goodrich, Jonah Gabry, and Aki Vehtari) Voter registration databases and MRP: Toward the use of large scale databases in public opinion research. *Political Analysis*. (Yair Ghitza and Andrew Gelman) Limitations of “Limitations of Bayesian leave-one-out cross-validation for model selection.” *Computational Brain and* […]

## Combining apparently contradictory evidence

I want to write a more formal article about this, but in the meantime here’s a placeholder. The topic is the combination of apparently contradictory evidence. Let’s start with a simple example: you have some ratings on a 1-10 scale. These could be, for example, research proposals being rated by a funding committee, or, umm, […]

## “Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations”

Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins: Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated […]

## What is probability?

This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc? I wrote: Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are […]

## June is applied regression exam month!

So. I just graded the final exams for our applied regression class. Lots of students made mistakes which gave me the feeling that I didn’t teach the material so well. So I thought it could help lots of people out there if I were to share the questions, solutions, and common errors. It was an […]

## Carol Nickerson explains what those mysterious diagrams were saying

A few years ago, James Coyne asked, “Can you make sense of this diagram?” and I responded, No, I can’t. At the time, Carol Nickerson wrote up explanations for two of the figures in the article in question. So if anyone’s interested, here they are: Carol Nickerson’s explanation of Figure 2 in Kok et al. […]

## The causal hype ratchet

Noah Haber informs us of a research article, “Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review,” that he wrote with Emily Smith, Ellen Moscoe, Kathryn Andrews, Robin Audy, Winnie Bell, Alana Brennan, Alexander Breskin, Jeremy Kane, Mahesh Karra, Elizabeth McClure, and Elizabeth Suarez, and […]

## Exploring model fit by looking at a histogram of a posterior simulation draw of a set of parameters in a hierarchical model

Opher Donchin writes in with a question: We’ve been finding it useful in the lab recently to look at the histogram of samples from the parameter combined across all subjects. We think, but we’re not sure, that this reflects the distribution of that parameter when marginalized across subjects and can be a useful visualization. It […]
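
To make the two kinds of histogram concrete, here is a small Python sketch. It uses simulated numbers as a stand-in for posterior draws (nothing here comes from a fitted model), and all names are hypothetical. It contrasts pooling every draw across all subjects, as the question describes, with taking a single simulation draw of the whole set of subject-level parameters, as in the post title.

```python
import math
import random
from collections import Counter


def fake_posterior_draws(n_draws=200, n_subjects=20, seed=0):
    """Simulated stand-in for posterior draws: one row per draw,
    one column per subject. Not output from a real model fit."""
    rng = random.Random(seed)
    subject_centers = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    return [[rng.gauss(center, 0.3) for center in subject_centers]
            for _ in range(n_draws)]


def bin_counts(values, bin_width=0.5):
    """Return sorted (left_edge, count) pairs for a fixed-width histogram."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return [(b * bin_width, counts[b]) for b in sorted(counts)]


def print_histogram(values, bin_width=0.5, max_bar=40):
    """Print a text histogram with bars scaled to the tallest bin."""
    bins = bin_counts(values, bin_width)
    tallest = max(count for _, count in bins)
    for left, count in bins:
        bar = "#" * max(1, round(max_bar * count / tallest))
        print(f"{left:6.1f} | {bar}")


draws = fake_posterior_draws()

# Option 1: pool every draw for every subject into one long list.
pooled = [value for row in draws for value in row]

# Option 2: one simulation draw of the full set of subject-level parameters.
single_draw = draws[-1]

print("Pooled across draws and subjects:")
print_histogram(pooled)
print("One posterior draw across subjects:")
print_histogram(single_draw)
```

The pooled histogram mixes between-subject variation with posterior uncertainty about each subject, while the single draw shows one plausible set of subject-level values. Repeating the second plot for a few different draws gives a sense of how much it varies.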

## Classifying yin and yang using MRI

Zad Chow writes: I wanted to pass along this study I found a while back that aimed to see whether there was any possible signal in an ancient Chinese theory of depression that classifies major depressive disorder into “yin” and “yang” subtypes. The authors write the following, The “Yin and Yang” theory is a fundamental […]

## “How should they carry out repeated cross-validation? They would like a third expert opinion…”

Someone writes: I’m a postdoc studying scientific reproducibility. I have a machine learning question that I desperately need your help with. . . . I’m trying to predict whether a study can be successfully replicated (DV), from the texts in the original published article. Our hypothesis is that language contains useful signals in distinguishing reproducible […]
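
For readers unfamiliar with the term, here is a bare-bones Python sketch of repeated k-fold cross-validation. It is a generic illustration, not the correspondent's procedure: the labels are made up, the "model" is a placeholder that always predicts the majority class in the training folds, and the function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv(labels, fit, predict, k=5, repeats=10, seed=0):
    """Run k-fold cross-validation `repeats` times, reshuffling the folds
    each time, and return one accuracy score per repeat."""
    rng = random.Random(seed)
    n = len(labels)
    scores = []
    for _ in range(repeats):
        correct = 0
        for test_idx in k_fold_indices(n, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            model = fit(train_idx)  # fit on the training folds only
            correct += sum(predict(model, i) == labels[i] for i in test_idx)
        scores.append(correct / n)
    return scores


# Made-up outcome labels (1 = replicated, 0 = did not), for illustration only.
fake_labels = [1] * 30 + [0] * 20


def fit_majority(train_idx):
    """Placeholder model: the majority label in the training data (ties go to 1)."""
    ones = sum(fake_labels[i] for i in train_idx)
    return 1 if 2 * ones >= len(train_idx) else 0


def predict_majority(model, i):
    """The placeholder ignores the item and returns the stored majority label."""
    return model


scores = repeated_cv(fake_labels, fit_majority, predict_majority)
print(f"Mean accuracy over {len(scores)} repeats: {mean(scores):.2f}")
```

The important design point is that everything learned from data, including any text preprocessing or tuning, should happen inside `fit` so that it only ever sees the training folds. Averaging over repeats reduces the noise that comes from any one random split.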

## Latour Sokal NYT

Alan Sokal writes: I don’t know whether you saw the NYT Magazine’s fawning profile of sociologist of science Bruno Latour about a month ago. I wrote to the author, and later to the editor, to critique the gross lack of balance (and even of the most minimal fact-checking). No reply. So I posted my critique […]

## My talk tomorrow (Tues) noon at the Princeton University Psychology Department

Integrating collection, analysis, and interpretation of data in social and behavioral research. Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University. The replication crisis has made us increasingly aware of the flaws of conventional statistical reasoning based on hypothesis testing. The problem is not just a technical issue with p-values, nor can […]

## The p-value is 4.76×10^−264

Jerrod Anderson points us to Table 1 of this paper: It seems that the null hypothesis that this particular group of men and this particular group of women are random samples from the same population, is false. Good to know. For a moment there I was worried. On the plus side, as Anderson notes, the […]