Archive of posts filed under the Miscellaneous Statistics category.

June is applied regression exam month!

So. I just graded the final exams for our applied regression class. Lots of students made mistakes which gave me the feeling that I didn’t teach the material so well. So I thought it could help lots of people out there if I were to share the questions, solutions, and common errors. It was an […]

Carol Nickerson explains what those mysterious diagrams were saying

A few years ago, James Coyne asked, “Can you make sense of this diagram?” and I responded, No, I can’t. At the time, Carol Nickerson wrote up explanations for two of the figures in the article in question. So if anyone’s interested, here they are: Carol Nickerson’s explanation of Figure 2 in Kok et al. […]

The causal hype ratchet

Noah Haber informs us of a research article, “Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): A systematic review,” that he wrote with Emily Smith, Ellen Moscoe, Kathryn Andrews, Robin Audy, Winnie Bell, Alana Brennan, Alexander Breskin, Jeremy Kane, Mahesh Karra, Elizabeth McClure, and Elizabeth Suarez, and […]

Exploring model fit by looking at a histogram of a posterior simulation draw of a set of parameters in a hierarchical model

Opher Donchin writes in with a question: We’ve been finding it useful in the lab recently to look at the histogram of samples from the parameter combined across all subjects. We think, but we’re not sure, that this reflects the distribution of that parameter when marginalized across subjects and can be a useful visualization. It […]

Classifying yin and yang using MRI

Zad Chow writes: I wanted to pass along this study I found a while back that aimed to see whether there was any possible signal in an ancient Chinese theory of depression that classifies major depressive disorder into “yin” and “yang” subtypes. The authors write the following, The “Yin and Yang” theory is a fundamental […]

“How should they carry out repeated cross-validation? They would like a third expert opinion…”

Someone writes: I’m a postdoc studying scientific reproducibility. I have a machine learning question that I desperately need your help with. . . . I’m trying to predict whether a study can be successfully replicated (DV), from the texts in the original published article. Our hypothesis is that language contains useful signals in distinguishing reproducible […]

Latour Sokal NYT

Alan Sokal writes: I don’t know whether you saw the NYT Magazine’s fawning profile of sociologist of science Bruno Latour about a month ago. I wrote to the author, and later to the editor, to critique the gross lack of balance (and even of the most minimal fact-checking). No reply. So I posted my critique […]

My talk tomorrow (Tues) noon at the Princeton University Psychology Department

Integrating collection, analysis, and interpretation of data in social and behavioral research. Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University. The replication crisis has made us increasingly aware of the flaws of conventional statistical reasoning based on hypothesis testing. The problem is not just a technical issue with p-values, nor can […]

The p-value is 4.76×10^−264

Jerrod Anderson points us to Table 1 of this paper: It seems that the null hypothesis that this particular group of men and this particular group of women are random samples from the same population, is false. Good to know. For a moment there I was worried. On the plus side, as Anderson notes, the […]

Stephen Wolfram explains neural nets

It’s easy to laugh at Stephen Wolfram, and I don’t like some of his business practices, but he’s an excellent writer and is full of interesting ideas. This long introduction to neural network prediction algorithms is an example. I have no idea if Wolfram wrote this book chapter himself or if he hired one of […]

These 3 problems destroy many clinical trials (in context of some papers on problems with non-inferiority trials, or problems with clinical trials in general)

Paul Alper points to this news article in Health News Review, which says: A news release or story that proclaims a new treatment is “just as effective” or “comparable to” or “as good as” an existing therapy might spring from a non-inferiority trial. Technically speaking, these studies are designed to test whether an intervention is […]

Hey, check this out: Columbia’s Data Science Institute is hiring research scientists and postdocs!

Here’s the official announcement: The Institute’s Postdoctoral and Research Scientists will help anchor Columbia’s presence as a leader in data-science research and applications and serve as resident experts in fostering collaborations with the world-class faculty across all schools at Columbia University. They will also help guide, plan and execute data-science research, applications and technological innovations […]

The State of the Art

Christie Aschwanden writes: Not sure you will remember, but last fall at our panel at the World Conference of Science Journalists I talked with you and Kristin Sainani about some unconventional statistical methods being used in sports science. I’d been collecting material for a story, and after the meeting I sent the papers to Kristin. […]

Robustness checks are a joke

Someone pointed to this post from a couple years ago by Uri Simonsohn, who correctly wrote: Robustness checks involve reporting alternative specifications that test the same hypothesis. Because the problem is with the hypothesis, the problem is not addressed with robustness checks. Simonsohn followed up with an amusing story: To demonstrate the problem I [Simonsohn] […]

Hey! Here’s what to do when you have two or more surveys on the same population!

This problem comes up a lot: We have multiple surveys of the same population and we want a single inference. The usual approach, applied carefully by news organizations such as Real Clear Politics and Five Thirty Eight, and applied sloppily by various attention-seeking pundits every two or four years, is “poll aggregation”: you take the […]

Watch out for naively (because implicitly based on flat-prior) Bayesian statements based on classical confidence intervals! (Comptroller of the Currency edition)

Laurent Belsie writes: An economist formerly with the Consumer Financial Protection Bureau wrote a paper on whether a move away from forced arbitration would cost credit card companies money. He found that the results are statistically insignificant at the 95 percent (and 90 percent) confidence level. But the Office of the Comptroller of the Currency […]

“35. What differentiates solitary confinement, county jail and house arrest” and 70 others

Thomas Perneger points us to this amusing quiz on statistics terminology: Lots more where that came from.

“Statistical and Machine Learning forecasting methods: Concerns and ways forward”

Roy Mendelssohn points us to this paper by Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos, which begins: Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose […]

The purported CSI effect and the retroactive precision fallacy

Regarding our recent post on the syllogism that ate science, someone points us to this article, “The CSI Effect: Popular Fiction About Forensic Science Affects Public Expectations About Real Forensic Science,” by N. J. Schweitzer and Michael J. Saks. We’ll get to the CSI Effect in a bit, but first I want to share the […]

“We are reluctant to engage in post hoc speculation about this unexpected result, but it does not clearly support our hypothesis”

Brendan Nyhan and Thomas Zeitzoff write: The results do not provide clear support for the lack-of control hypothesis. Self-reported feelings of low and high control are positively associated with conspiracy belief in observational data (model 1; p