Archive of posts filed under the Miscellaneous Statistics category.

“The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.”

Ricardo Vieira writes: I recently came upon this study from Princeton published in PNAS, “Implicit model of other people’s visual attention as an invisible, force-carrying beam projecting from the eyes,” in which the authors asked people to demonstrate how much you have to tilt an object before it falls. They show that when a human […]

“I feel like the really solid information therein comes from non or negative correlations”

Steve Roth writes: I’d love to hear your thoughts on this approach (heavily inspired by Arindrajit Dube’s work, linked therein): This relates to our discussion from 2014: My biggest takeaway from this latest: I feel like the really solid information therein comes from non or negative correlations: • It comes before • But it doesn’t […]

What can be learned from this study?

James Coyne writes: A recent article co-authored by a leading mindfulness researcher claims to address the problems that plague meditation research, namely, underpowered studies; lack of or meaningful control groups; and an exclusive reliance on subjective self-report measures, rather than measures of the biological substrate that could establish possible mechanisms. The article claims adequate sample […]

Amending Conquest’s Law to account for selection bias

Robert Conquest was a historian who published critical studies of the Soviet Union and whose famous “First Law” is, “Everybody is reactionary on subjects he knows about.” I did some searching on the internet, and the most authoritative source seems to be this quote from Conquest’s friend Kingsley Amis: Further search led to this elaboration […]

Here are some examples of real-world statistical analyses that don’t use p-values and significance testing.

Joe Nadeau writes: I’ve followed the issues about p-values, signif. testing et al. both on blogs and in the literature. I appreciate the points raised, and the pointers to alternative approaches. All very interesting, provocative. My question is whether you and your colleagues can point to real world examples of these alternative approaches. It’s somewhat […]

You are invited to join Replication Markets

Anna Dreber writes: Replication Markets (RM) invites you to help us predict outcomes of 3,000 social and behavioral science experiments over the next year. We actively seek scholars with different voices and perspectives to create a wise and diverse crowd, and hope you will join us. We invite you – your students, and any other […]

Are supercentenarians mostly superfrauds?

Ethan Steinberg points to a new article by Saul Justin Newman with the wonderfully descriptive title, “Supercentenarians and the oldest-old are concentrated into regions with no birth certificates and short lifespans,” which begins: The observation of individuals attaining remarkable ages, and their concentration into geographic sub-regions or ‘blue zones’, has generated considerable scientific interest. Proposed […]

Causal Inference and Generalizing from Your Data to the Real World (my talk tomorrow, Sat., 6pm in Berlin)

For the Berlin Bayesians meetup, organized by Eren Elçi: Causal Inference and Generalizing from Your Data to the Real World Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University Learning from data involves three stages of extrapolation: from sample to population, from treatment group to control group, and from measurement to the […]

I don’t have a clever title but this is an interesting paper

Why do we, as a discipline, have so little understanding of the methods we have created and promote? Our primary tool for gaining understanding is mathematics, which has obvious appeal: most of us trained in math and there is no better form of information than a theorem that establishes a useful fact about a method. […]

Just forget the Type 1 error thing.

John Christie writes: I was reading this paper by Habibnezhad, Lawrence, & Klein (2018) and came across the following footnote: In a research program seeking to apply null-hypothesis testing to achieve one-off decisions with regard to the presence/absence of an effect, a flexible stopping-rule would induce inflation of the Type I error rate. Although our […]

Swimming upstream? Monitoring escaped statistical inferences in wild populations.

Anders Lamberg writes: In my mails to you [a few years ago], I told you about the Norwegian practice of monitoring proportion of escaped farmed salmon in wild populations. This practice results in a yearly updated list of the situation in each Norwegian salmon river (we have a total of 450 salmon rivers, but not […]

What’s published in the journal isn’t what the researchers actually did.

David Allison points us to these two letters: Alternating Assignment was Incorrectly Labeled as Randomization, by Bridget Hannon, J. Michael Oakes, and David Allison, in the Journal of Alzheimer’s Disease. Change in study randomization allocation needs to be included in statistical analysis: comment on ‘Randomized controlled trial of weight loss versus usual care on telomere […]

Calibrating patterns in structured data: No easy answers here.

“No easy answers” . . . Hey, that’s a title that’s pure anti-clickbait, a veritable kryptonite for social media . . . Anyway, here’s the story. Adam Przedniczek writes: I am trying to devise new or tune up already existing statistical tests assessing rate of occurrences of some bigger compound structures, but the most tricky […]

The garden of 603,979,752 forking paths

Amy Orben and Andrew Przybylski write: The widespread use of digital technologies by young people has spurred speculation that their regular use negatively impacts psychological well-being. Current empirical evidence supporting this idea is largely based on secondary analyses of large-scale social datasets. Though these datasets provide a valuable resource for highly powered investigations, their many […]

Harvard dude calls us “online trolls”

Story here. Background here (“How post-hoc power calculation is like a shit sandwich”) and here (“Post-Hoc Power PubPeer Dumpster Fire”). OK, to be fair, “shit sandwich” could be considered kind of a trollish thing for me to have said. But the potty language in this context was not gratuitous; it furthered the larger point I […]

We’re done with our Applied Regression final exam (and solution to question 15)

We’re done with our exam. And the solution to question 15: 15. Consider the following procedure. • Set n = 100 and draw n continuous values x_i uniformly distributed between 0 and 10. Then simulate data from the model y_i = a + bx_i + error_i, for i = 1,…,n, with a = 2, b […]

Pharmacometrics meeting in Paris on the afternoon of 11 July 2019

Julie Bertrand writes: The pharmacometrics group led by France Mentre (IAME, INSERM, Univ Paris) is very pleased to host a free ISoP Statistics and Pharmacometrics (SxP) SIG local event at Faculté Bichat, 16 rue Henri Huchard, 75018 Paris, on Thursday afternoon the 11th of July 2019. It will feature talks from Professor Andrew Gelman, Univ […]

Question 15 of our Applied Regression final exam (and solution to question 14)

Here’s question 15 of our exam: 15. Consider the following procedure. • Set n = 100 and draw n continuous values x_i uniformly distributed between 0 and 10. Then simulate data from the model y_i = a + bx_i + error_i, for i = 1,…,n, with a = 2, b = 3, and independent errors […]
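
The simulation setup described in this excerpt can be sketched roughly as follows. This is only an illustration, not the exam's solution: the excerpt is cut off before it states the error distribution, so the normal errors and the `sigma = 1.0` default below are assumptions made purely so the code runs, and the function names `simulate` and `least_squares` are hypothetical.

```python
import random


def simulate(n=100, a=2.0, b=3.0, sigma=1.0, seed=0):
    """Draw x uniformly on [0, 10] and simulate y = a + b*x + error.

    ASSUMPTION: normal errors with standard deviation `sigma`; the excerpt
    is truncated before the error distribution is given.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def least_squares(x, y):
    """Return (intercept, slope) of the simple least-squares line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = ybar - b_hat * xbar
    return a_hat, b_hat


x, y = simulate()
a_hat, b_hat = least_squares(x, y)
print(f"estimated intercept {a_hat:.2f}, estimated slope {b_hat:.2f}")
```

Changing `seed` or `sigma` shows how much the fitted intercept and slope vary around the true values a = 2 and b = 3 from one simulated dataset to the next.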

Question 14 of our Applied Regression final exam (and solution to question 13)

Here’s question 14 of our exam: 14. You are predicting whether a student passes a class given pre-test score. The fitted model is, Pr(Pass) = logit^−1(a_j + 0.1x), for a student in classroom j whose pre-test score is x. The pre-test scores range from 0 to 50. The a_j’s are estimated to have a normal […]
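
To make the fitted model in this excerpt concrete, here is a minimal sketch that evaluates Pr(Pass) = logit^−1(a_j + 0.1x) over the stated pre-test range. The excerpt is truncated before it gives the distribution of the a_j's, so the classroom intercept `a_j = -2.5` used below is a hypothetical value chosen only for illustration, and the function names are likewise assumptions.

```python
import math


def inv_logit(z):
    """Inverse logit: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def pass_probability(a_j, x, slope=0.1):
    """Pr(Pass) for a student with pre-test score x in a classroom with intercept a_j."""
    return inv_logit(a_j + slope * x)


# HYPOTHETICAL classroom intercept; the excerpt does not give the a_j values.
a_j = -2.5

# Pre-test scores range from 0 to 50 in the excerpt.
for score in (0, 25, 50):
    print(score, round(pass_probability(a_j, score), 3))

# The logistic curve is steepest where the probability is 0.5, and there its
# slope is the coefficient divided by 4: 0.1 / 4 = 0.025 per pre-test point.
max_slope = 0.1 / 4
```

With this made-up intercept the curve crosses 50% at a pre-test score of 25, and near that point each additional pre-test point adds about 2.5 percentage points to the pass probability.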

Question 13 of our Applied Regression final exam (and solution to question 12)

Here’s question 13 of our exam: 13. You fit a model of the form: y ∼ x + u full + (1 | group). The estimated coefficients are 2.5, 0.7, and 0.5 respectively for the intercept, x, and u full, with group and individual residual standard deviations estimated as 2.0 and 3.0 respectively. Write the […]
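
For reference, the fitted model in this excerpt can be written out as an equation. This is a sketch of one standard way to express a varying-intercept model, not necessarily the form the exam solution uses. It assumes that “u full” names a single predictor, written u^full below, and that group and individual errors are normal and independent.

```latex
\begin{align*}
y_i &= 2.5 + 0.7\,x_i + 0.5\,u^{\text{full}}_i + \alpha_{j[i]} + \epsilon_i, \\
\alpha_j &\sim \mathrm{N}(0,\ 2.0^2), \\
\epsilon_i &\sim \mathrm{N}(0,\ 3.0^2),
\end{align*}
```

Here j[i] denotes the group containing observation i, so α_{j[i]} is the group-level deviation (standard deviation 2.0) and ε_i is the individual-level residual (standard deviation 3.0).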