Archive of posts filed under the Miscellaneous Statistics category.

Don’t talk about hypotheses as being “either confirmed, partially confirmed, or rejected”

Kevin Lewis points us to this article by Paige Shaffer et al., “Gambling Research and Funding Biases,” which reports, “Gambling industry funded studies were no more likely than studies not funded by the gambling industry to report either confirmed, partially confirmed, or rejected hypotheses.” The paradox is that this particular study was itself funded by […]

My review of Ian Stewart’s review of my review of his book

A few months ago I was asked to review Do Dice Play God?, the latest book by mathematician and mathematics writer Ian Stewart. Here are some excerpts from my review: My favorite aspect of the book is the connections it makes in a sweeping voyage from familiar (to me) paradoxes, through modeling in human affairs, […]

The latest Perry Preschool analysis: Noisy data + noisy methods + flexible summarizing = Big claims

Dean Eckles writes: Since I know you’re interested in Heckman’s continued analysis of early childhood interventions, I thought I’d send this along: The intervention is so early, it is in their parents’ childhoods. See the “Perry Preschool Project Outcomes in the Next Generation” press release and the associated working paper. The estimated effects are huge: […]

Call for proposals for a State Department project on estimating the prevalence of human trafficking

Abby Long points us to this call for proposals for a State Department project on estimating the prevalence of human trafficking: The African Programming and Research Initiative to End Slavery (APRIES) is pleased to announce a funding opportunity available through a cooperative agreement with the U.S. Department of State, Office to Monitor and Combat Trafficking […]

Linear or logistic regression with binary outcomes

Gio Circo writes: There is a paper currently floating around which suggests that, when estimating causal effects, OLS is better than any kind of generalized linear model (e.g., binomial). The author draws a sharp distinction between causal inference and prediction. Having gotten most of my statistical learning using Bayesian methods, I find this distinction […]
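One piece of the linear-probability-model argument can be seen in a small simulation (a sketch with made-up numbers, not the analysis from the paper under discussion): with a binary treatment and an intercept, the OLS slope on a binary outcome is exactly the difference in proportions, i.e., a treatment effect directly on the probability scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: binary treatment x, binary outcome y with
# P(y = 1) = 0.3 + 0.2 * x, so the true effect on the probability scale is 0.2.
n = 100_000
x = rng.integers(0, 2, n)
y = (rng.random(n) < 0.3 + 0.2 * x).astype(float)

# OLS (linear probability model): regress y on an intercept and x.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The OLS slope coincides with the raw difference in proportions.
diff_in_props = y[x == 1].mean() - y[x == 0].mean()
print(beta[1], diff_in_props)
```

With a binary predictor the two quantities agree to floating-point precision; a logistic fit to the same data would report the effect on the log-odds scale instead, which then needs converting back to probabilities to be interpretable as a causal effect.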

Exciting postdoc opening in spatial statistics at Michigan: Coccidioides is coming, and only you can stop it!

Jon Zelner is a collaborator who does great work on epidemiology using Bayesian methods, Stan, Mister P, etc. He’s hiring a postdoc, and it looks like a great opportunity: Epidemiological, ecological and environmental approaches to understand and predict Coccidioides emergence in California. One postdoctoral fellow is sought in the research group of Dr. Jon Zelner […]

Do we still recommend average predictive comparisons? Click here to find the surprising answer!

Usually these posts are on 6-month delay but this one’s so quick I thought I’d just post it now . . . Daniel Habermann writes: Do you still like/recommend average predictive comparisons as described in your paper with Iain Pardoe? I [Habermann] find them particularly useful for summarizing logistic regression models. My reply: Yes, I […]
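For readers unfamiliar with the Gelman and Pardoe idea mentioned above, an average predictive comparison for an input averages the change in predicted probability over the observed values of the other inputs. A minimal sketch, with made-up coefficients standing in for a fitted logistic regression:

```python
import numpy as np

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Hypothetical fitted model: logit P(y = 1) = b0 + b1*u + b2*v.
# These coefficients and data are illustrative, not from any real fit.
b0, b1, b2 = -1.0, 0.8, 0.5
v = rng.normal(size=1000)  # observed values of the other input

# Average predictive comparison for u moving from 0 to 1, averaging
# the probability-scale difference over the empirical distribution of v.
p_hi = inv_logit(b0 + b1 * 1 + b2 * v)
p_lo = inv_logit(b0 + b1 * 0 + b2 * v)
apc = (p_hi - p_lo).mean()
print(round(apc, 3))
```

The point Habermann raises is visible here: the raw coefficient b1 lives on the logit scale, while the average predictive comparison is a single interpretable number on the probability scale.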

Lorraine Daston (1994): “How Probabilities Came to Be Objective and Subjective”

Sander Greenland points us to a paper by Lorraine Daston from 1994, How Probabilities Came to Be Objective and Subjective. Also relevant are the papers by Glenn Shafer and Michael Cowles and Caroline Davis that we linked to a few months ago.

The long pursuit

In a comment on our post, Using black-box machine learning predictions as inputs to a Bayesian analysis, Allan Cousins writes: I find this combination of techniques exceedingly useful when I have a lot of data on an indicator that informs me about the outcome of interest but where I have relatively sparse data about the […]

“But when we apply statistical models, do we need to care about whether a model can retrieve the relationship between variables?”

Tongxi Hu writes: Could you please answer a question about the application of statistical models? Let’s take regression models as an example. In the real world, we use statistical models to find out relationships between different variables because we do not know the true relationship: for example, among crop yield, temperature, and precipitation. But when […]
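One way to pose the question in the teaser concretely is a simulation check: generate data from a known relationship and see whether regression retrieves it. A sketch using the crop-yield example (all numbers in the data-generating process are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true relationship (coefficients are made up):
# yield = 5 + 0.3 * temperature + 0.5 * precipitation + noise
n = 5000
temperature = rng.normal(20, 5, n)
precipitation = rng.normal(100, 20, n)
crop_yield = (5 + 0.3 * temperature + 0.5 * precipitation
              + rng.normal(0, 2, n))

# Fit by least squares and compare the estimates to the known truth.
X = np.column_stack([np.ones(n), temperature, precipitation])
beta = np.linalg.lstsq(X, crop_yield, rcond=None)[0]
print(np.round(beta, 2))  # should be close to (5, 0.3, 0.5)
```

When the model matches the data-generating process, the coefficients are recovered; the harder question in the post is what happens when, as in the real world, we don't know the true relationship.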

External vs. internal validity of causal inference from natural experiments: The example of charter school lottery studies

Alex Hoffman writes: I recently was discussing/arguing about the value of charter schools lottery studies. I suggested that their validity was questionable because of all the data that they ignore. (1) They ignore all charter schools (and their students) that are not so oversubscribed that they need to use lotteries for admission. (2) They ignore […]

Oscar win probability as a function of age. And many other things . . .

I received the book “Oscarmetrics: The Math Behind the Biggest Night in Hollywood,” by Ben Zauzmer. I liked it; it was a lot of fun, a good mixture of stories and graphs (two of my favorites are shown in the original post). I also passed the book over to a student to review.

What does it mean when they say there’s a 30% chance of rain?

Gur Huberman points us to this page [link fixed] from the National Weather Service, “Explaining ‘Probability of Precipitation,’” which says: Probability of Precipitation = C x A where “C” = the confidence that precipitation will occur somewhere in the forecast area, and where “A” = the percent of the area that will receive measurable precipitation, […]
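The NWS formula quoted above is simple enough to write down directly. A minimal sketch (the example numbers are invented):

```python
# PoP = C * A, per the NWS definition quoted above: C is the forecaster's
# confidence that precipitation occurs somewhere in the forecast area, and
# A is the fraction of the area expected to receive measurable precipitation.
def probability_of_precipitation(confidence, area_fraction):
    return confidence * area_fraction

# e.g. 50% confident that rain will cover 60% of the area -> 30% PoP
print(probability_of_precipitation(0.5, 0.6))
```

So a "30% chance of rain" can mean very different things: near-certain rain over a third of the area, or a one-in-three shot at area-wide rain.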

Postdoctoral research position on survey research with us at Columbia School of Social Work

Here it is: The Center on Poverty and Social Policy at the Columbia University School of Social Work, the Columbia Population Research Center, and the Institute for Social and Economic Research and Policy are seeking a postdoctoral scholar with a PhD in statistics, economics, political science, public policy, demography, psychology, social work, sociology, or a […]

How many Stan users are there?

This is an interesting sampling or measurement problem that came up in a Discourse thread started by Simon Maskell: It seems we could look at a number of pre-existing data sources (eg discourse views and contributors, papers, StanCon attendance etc) to inform an inference of how many people use Stan (and/or use things that use […]
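One standard tool for this kind of "how many users are there?" problem, given partially overlapping lists such as Discourse contributors and paper authors, is capture-recapture with the Lincoln–Petersen estimator. This is a sketch of that general idea, not what the Discourse thread settled on, and all the counts below are made up:

```python
# Lincoln-Petersen: if a fraction (overlap / n2) of list 2 also appears on
# list 1, and list 1 is a random draw from the population, then the total
# population is estimated as n1 * n2 / overlap.
def lincoln_petersen(n1, n2, overlap):
    """Estimate total population size from two samples and their overlap."""
    return n1 * n2 / overlap

# e.g. 1200 names on one list, 800 on another, 150 appearing on both
print(round(lincoln_petersen(1200, 800, 150)))
```

The hard part in practice, as the thread notes, is that these data sources are nothing like independent random samples of Stan users, so the independence assumption behind this estimator would need serious adjustment.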

What’s wrong with null hypothesis significance testing

Following up on yesterday’s post, “What’s wrong with Bayes”: My problem is not just with the methods—although I do have problems with the methods—but also with the ideology. My problem with the method You’ve heard this a few zillion times before, and not just from me. Null hypothesis significance testing collapses the wavefunction too soon, […]

What’s wrong with Bayes

My problem is not just with the methods—although I do have problems with the methods—but also with the ideology. My problem with the method It’s the usual story. Bayesian inference is model-based. Your model will never be perfect, and if you push hard you can find the weak points and magnify them until you get […]

What’s wrong with Bayes; What’s wrong with null hypothesis significance testing

This will be two posts: tomorrow: What’s wrong with Bayes day after tomorrow: What’s wrong with null hypothesis significance testing My problem in each case is not just with the methods—although I do have problems with the methods—but also with the ideology. A future post or article: Ideologies of Science: Their Advantages and Disadvantages.

Amazing coincidence! What are the odds?

This post is by Phil Price, not Andrew Several days ago I wore my cheapo Belarusian one-hand watch. This watch only has an hour hand, but the hand stretches all the way out to the edge of the watch, like the minute hand of a normal watch. The dial is marked with five-minute hash marks, […]

Why “bigger sample size” is not usually where it’s at.

Aidan O’Gara writes: I realized when reading your JAMA chocolate study post that I don’t understand a very fundamental claim made by people who want better social science: Why do we need bigger sample sizes? The p-value is always going to be 0.05, so a sample of 10 people is going to turn up a […]
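The answer to O'Gara's question can be seen in a quick simulation (a sketch with an invented effect size, not from the chocolate study): the 0.05 threshold stays fixed, but the standard error shrinks like 1/sqrt(n), so with small samples only wildly exaggerated estimates clear the significance bar, while with large samples significant estimates sit near the truth.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative true effect (made up); the estimate's sampling distribution
# is approximated as N(true_effect, 1/sqrt(n)).
true_effect = 0.1
avg_significant = {}
for n in (10, 1000):
    se = 1 / np.sqrt(n)
    sims = rng.normal(true_effect, se, size=10_000)
    # Keep only estimates that would be "statistically significant" (p < .05).
    significant = sims[np.abs(sims) > 1.96 * se]
    avg_significant[n] = significant.mean()

print(avg_significant)
```

With n = 10, the average significant estimate is several times the true 0.1 (the exaggeration, or "type M error," problem); with n = 1000, significant estimates average close to 0.1. Bigger samples buy precision, not a different threshold.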