Archive of posts filed under the Miscellaneous Statistics category.

How to interpret inferential statistics when your data aren’t a random sample

Someone named Adam writes: I’m having a bit of a ‘crisis’ of confidence regarding inferential statistics. I’ve been reading some of the work by Stephen Gorard (e.g. “Against Inferential Statistics”) and David Freedman and Richard Berk (e.g. “Statistical Assumptions as empirical commitments”). These authors appear to be saying this: (1) Inferential statistics assume random sampling […]

A regression puzzle . . . and its solution

Alex Tabarrok writes: Here’s a regression puzzle courtesy of Advanced NFL Stats from a few years ago and pointed to recently by Holden Karnofsky from his interesting new blog, ColdTakes. The nominal issue is how to figure out whether Aaron Rodgers is underpaid or overpaid given data on salaries and expected points added per game. […]

Workflow and the role of hypothesis-free data analysis

In our discussion a couple days ago on the role of hypotheses in science, Lakeland wrote: Even “this data is relevant to the question we’re studying” is already a hypothesis. There’s no such thing as hypothesis free data analysis. I’ve sometimes said similar things, in that I like to interpret exploratory graphics as model checks, […]

Theoretical Statistics is the Theory of Applied Statistics: Two perspectives

After watching my video, Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do, Ron Kenett points us to these articles: Conceptual Thinking in Statistics and Data Science Education: Interactive Formative Assessment with Meaning Equivalence Reusable Learning Objects (MERLO): Computer age statistics, machine learning, data science and in general, data […]

The Xbox before its time: Using the famous 1936 Literary Digest survey as a positive example of statistical adjustment rather than a negative example of non-probability sampling

In this article from 2017, Sharon Lohr and J. Michael Brick write: The Literary Digest poll of 1936 is a byword for bad survey research. Textbooks have long used it as a prime example of how sampling goes bad . . . The story of the 1936 poll is well known. Ten million ballots were […]

“Using Benford’s Law to Detect Bitcoin Manipulation”

Economist Gary Smith sends along this post with the above title and the subtitle, “Market prices are not invariably equal to intrinsic values.” Here’s Smith: For a while, there was a popular belief among finance professors that the stock market is “efficient” in the sense that stock prices are always correct — the prices that […]

Can statistical software do better at giving warnings when you apply a method when maybe you shouldn’t?

Gaurav Sood writes: There are legions of permutation-based methods which permute the value of a feature to determine whether the variable should be added (e.g., Boruta Algorithm) or its importance. I couldn’t reason for myself why that is superior to just dropping the feature and checking how much worse the fit is or what have […]

Thoughts on “The American Statistical Association President’s Task Force Statement on Statistical Significance and Replicability”

Megan Higgs writes: The statement . . . describes establishment of the task force to “address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The 2019 editorial recommended eliminating the use of ‘p

Is this a refutation of the piranha principle?

Jonathan Falk points to this example of a really tiny stimulus having a giant effect (in brain space) and asks if it’s a piranha violation. I don’t think it is, but the question is amusing.

Top 10 Ideas in Statistics That Have Powered the AI Revolution

Aki and I put together this listicle to accompany our recent paper on the most important statistical ideas of the past 50 years. Kim Martineau at Columbia, who suggested making this list, also had the idea that you all might have suggestions for other important articles and books; tweet your thoughts at @columbiascience or put them […]

Not being able to say why you see a 2 doesn’t excuse your uninterpretable model

This is Jessica, but today I’m blogging about a blog post on interpretable machine learning that co-blogger Keith wrote for his day job and shared with me. I agree with multiple observations he makes. Some highlights: The often suggested simple remedy for this unmanageable complexity is just finding ways to explain these black box models; […]

Ira Glass asks. We answer.

This post is a rerun. I was listening to This American Life on my bike today and heard Ira say: There’s this study done by the Pew Research Center and Smithsonian Magazine . . . they called up one thousand and one Americans. I do not understand why it is a thousand and one rather […]

This awesome Pubpeer thread is about 80 times better than the original paper

This came up already, but in the meantime this paper in the Journal of Surgical Research has been just raked over the coals, over and over and over again, in this delightful Pubpeer thread. 31 comments so far, all of them just slamming the original published paper and many with interesting insights of their own. […]

Evidence-based medicine eats itself in real time

Robert Matthews writes: This has just appeared in BMJ Evidence Based Medicine. It addresses the controversial question of whether lowering LDL using statins leads to reduced mortality and CVD rates. The researchers pull together 35 published studies, and then assess the evidence of benefit – but say a meta-analysis is inappropriate, given the heterogeneity of […]

Guttman points out another problem with null hypothesis significance testing: It falls apart when considering replications.

Michael Nelson writes: Re-reading a classic from Louis Guttman, What is not what in statistics, I saw his “Problem 2” with new eyes given the modern replication debate: Both estimation and the testing of hypotheses have usually been restricted as if to one-time experiments, both in theory and in practice. But the essence of science […]

Job opening at the U.S. Government Accountability Office

Sam Portnow writes: I am a statistician at the U.S. Government Accountability Office, and we are hiring for a statistician. The full job announcement is below. Personally, I think our office is a really great place to do social science research within the federal government. ———————————————————————- The U.S. Government Accountability Office (GAO) has two vacancies […]

The University of California statistics department paid at least $329,619.84 to an adjunct professor who did no research, was a terrible teacher, and engaged in sexual harassment

I have one of the easy jobs at the university, well paid with pleasant working conditions. It’s not so easy for adjuncts. Ideally, an adjunct professor has a main job and teaches a course on the side, to stay connected to academia and give back something to the next generation. But in an all-too-common non-ideal […]

He wants to test whether his distribution has infinite variance. I have other ideas . . .

Evan Warfel asks a question: Let’s say that a researcher is collecting data on people for an experiment. Furthermore, it just so happens that due to the data collection procedure, data is gathered and recorded in 100-person increments. (Making it so that the researcher effectively has a time series, and at some point t, they […]

Impressions of differential privacy for supreme court justices

This is Jessica. A couple weeks ago Priyanka Nanayakkara pointed me to the fact that Alabama is suing the Census Bureau on the grounds that by using differential privacy it is “intentionally skew[ing] the population tabulations provided to States to use for redistricting” and “forc[ing] Alabama to redistrict using results that purposefully count people in […]

Network of models

Ryan Bernstein shows this demo of a prototype of the network of models visualization in Stan. This is related to the topology of models, an idea that we’ve discussed on occasion and is a key part of statistical workflow that I don’t think is handled well by existing theory or software. What Ryan is doing […]
