Our first post was on 12 Oct 2004: A weblog for research in statistical modeling and applications, especially in social sciences; followed by The Electoral College favors voters in small states; Why it’s rational to vote; Bayes and Popper; and Overrepresentation of small states/provinces, and the USA Today effect.
Later that month we had our first post on the Red State/Blue State Paradox, a guest post on statistical issues in modeling social space, and a stab at partial pooling of interactions.
On 27 Oct came one of my favorite early posts, The blessing of dimensionality. Early the next month came Sam Cook's post on her now-classic work on Bayesian software validation, which we now call SBC, for simulation-based calibration checking, along with an early post on cross-validation for Bayesian multilevel modeling, a topic on which we have made lots of progress in the decades since.
Also in Nov 2004 came our first formulation of the important (to me) concept of institutional decision analysis, my discovery about correlations in before-after data, and a debunking of a claim of possible election fraud in Florida.
Those were the days!
Here’s my question for you!
What are your favorite posts? It would be fun to compile a list in commemoration of 20 years and over 12,000 posts. So post titles and links to your favorites in the comments below. You can include as many as you want.
Also if you have any favorite posts from the past 20 years on other blogs, share those too. Thanks!
I’m first? “What Has Happened Down Here Is, the Winds Have Changed” is in my opinion one of the greatest posts of the blog era, and Jessica has said here that she agrees. That was a master class in employing links, stylish prose, and thematic unity.
I concur.
+1. This blog has been part of my morning routine for many years now. I've forgotten more posts than I remember, but this one was a bit of a classic.
Here’s the link:
https://statmodeling.stat.columbia.edu/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/
https://statmodeling.stat.columbia.edu/2024/02/12/torment-executioners-in-reno-nevada/
Torment executioners!
You ask tough questions, sir. I do not have a favourite post. But I can tell you that the post that convinced me to actively follow this blog was ‘The most-cited statistics papers ever’. It was not so much the post as the discussion in the comments. The commenters discussed the importance of a paper being highly cited, and the difference (and relevance) between applied and theoretical papers. I do not think I read a single paper from the list, but the discussion gave me a lot of food for thought.
And then there were funny comments like this thread:
https://statmodeling.stat.columbia.edu/2014/03/31/cited-statistics-papers-ever/#comment-156135
One of my favorite posts is: https://statmodeling.stat.columbia.edu/2016/12/13/bayesian-statistics-whats/.
Comments are still a little slow, so from other blogs I’ll nominate:
Best short post: “tire rims and anthrax.” https://balloon-juice.com/2009/02/05/youll-never-get-this-21-minutes-of-your-life-back/
(Another nominee could be Matt Yglesias’s 2009 post coining the Green Lantern theory of the presidency, but the site and the link are dead.)
Kyle:
I don’t really get the “tire rims and anthrax” post—it looks too much like what we see so much of on twitter nowadays.
Yglesias’s Green Lantern post from 2016 is here on the internet archive.
If we’re gonna be citing that, we also should point to Mickey Kaus’s post on the Feiler Faster Thesis. That one’s from 2000—the very dawn of the blogging era!
Along these general lines, my absolute favorite is the 2004 post from Dan Davies with the line, “Good ideas do not need lots of lies told about them in order to gain public acceptance.”
There’s also Brian Wansink’s classic post from 2016, “The Grad Student Who Never Said ‘No,'” but that’s in a different category: it’s just of historical interest as something to laugh at or be horrified by, not an interesting contribution in its own right.
+3
I think you have an overly high assessment of our ability to remember things, Andrew.
But, as I searched for something I vaguely remembered related to the power pose debacle, I was reminded of your “time-reversal heuristic”, which I think is excellent: https://statmodeling.stat.columbia.edu/2016/01/26/more-power-posing/
Raghu:
Good point. Of course I remember many of my thousands of posts—no surprise, given that (a) I wrote them, and (b) each post was motivated by something I’d read or seen or some experience I’ve had, so they are all salient to me. But for the rest of you . . . yeah, it would all be a vague blur, or not even that!
This one is a classic, straight to the point: https://statmodeling.stat.columbia.edu/2004/12/29/type_1_type_2_t/
It's hard to choose, but here are four that I found referenced in my notes from the last time I did a complete overhaul of how I was teaching introductory statistics:
https://statmodeling.stat.columbia.edu/2015/03/02/what-hypothesis-testing-is-all-about-hint-its-not-what-you-think/
https://statmodeling.stat.columbia.edu/2016/11/17/thinking-more-seriously-about-the-design-of-exploratory-studies/
https://statmodeling.stat.columbia.edu/2017/03/09/preregistration-like-random-sampling-controlled-experimentation/
https://statmodeling.stat.columbia.edu/2015/04/28/whats-important-thing-statistics-thats-not-textbooks/
The recent thread on election forecasting reminded me of this post that I liked:
https://statmodeling.stat.columbia.edu/2021/03/12/the-social-sciences-are-useless-so-why-do-we-study-them-heres-a-good-reason/
One of the things I like most about this blog is the many different topics that come up. Some of them come from the blog-author(s) of course, but some of them also come from commenters who provide links and information to many different things.
I remember starting to read this blog about 10 years ago or so (?), as a result of coming across problematic issues in academia and Psychological Science in particular. Even though I dislike statistical modeling and statistical analyses, and am not competent in them, I appreciate the many other topics on this blog. I also appreciate the capability and willingness of both blog author(s) and commenters to phrase things in a way that might be clearer, and more to the point, than in many other places. I think this is of tremendous value, and its importance might be overlooked.
Anyway, not so much one particular post or comment but more of a bigger picture view. I have been annoyed many times on this blog, and I have had fun reading certain posts and comments. Although I don't come here often anymore, I hop in from time to time, and it's posts like this one that make me feel like I might as well say something. I think this blog has helped me with my view on academia and Psychological Science in particular, and has helped me reach certain conclusions with regard to these things, including my personal involvement in them. I think it has also done this by giving me the opportunity to (anonymously) comment on things, and thereby helped me learn what I like.
I have learned that I like writing, playing with words, painting a picture, shining a light on certain things, and trying to make things clear. Participating on this blog has helped me realize I may have done all I can and want to do concerning academia and Psychological Science. It may also have helped me do this in the best way I could have. If this is the case, thanks for that go out to this blog, the author(s), and the commenters. I like writing lyrics and poetry, and have shared some of that on here in the past. I hope it's okay to share what I think is the best I have written. I sent it to a musician, but haven't heard from her in a while. Perhaps it's fitting in some way, shape, or form to post it here now.
“Because of what you told me”
You said I could look for you at the lake
by watching the surface of the water
And see our reflections looking down
like we used to do, beside each other
But when I go now and sit on the dock
murky water is all there seems to be
That, and ripples on the surface
instead of the picture of you and me
You said I could look for you in the field
by sitting down against the tree
And see the sunlight through the branches
like we used to do, just you and me
But when I go now and walk over there
to see if I can see a ray of light
I can’t find space to see through
the leaves and tangled branches block my sight
You said I could look for you on the hill
by looking up at the moonlit sky
And search for the brightest star
like we used to do, in that place up high
But when I go now to that same spot
clouds and darkness are all I see
I can’t see any of the lights
unlike what I saw when it was you and me
I’m starting to wonder whether you lied
whether you just said those things because I cried
I’m starting to wonder why I can’t see
maybe you told the truth
but it’s just me
Whatever may be the case
Whatever may be
I’ll keep looking
because of what you told me
Very powerful and evocative poem!
Thank you for your comment! I appreciate it very much.
It also made me think about some things concerning academic and/or scientific writing which might be interesting, amusing, and/or useful for the readers to read. Here we go:
1) The poem you refer to was originally intended to be a set of lyrics for a song. I can't sing or play an instrument, though, so I sent the text to a musician, and later also to a couple of online poetry sites. The whole thing reminded me of sending a manuscript to an editor and having a few people decide whether or not to publish it at that point in time, in that place. I don't like that. I just want the stuff I have written to be out there so it can possibly be found and appreciated and used. Perhaps posting the lyrics and/or poem here is similar to posting my manuscripts on a preprint server somewhere.
2) There are some double meanings/interpretations of several key words in the lyrics/poem which I am pretty sure I was NOT at all conscious of when writing. Words like "see through", "murky water", and "space" all have multiple meanings that fit both the more literal and the more interpretative understanding of the lyrics/poem. I like that in lyrics or poetry, but also in scientific writing. It might be very useful when a scientific text is written so that it can be understood at multiple levels. These multiple levels may concern understanding and comprehension, but may also be at a more fun or creative level, like using words with double meanings, or humor, or stuff like that. The latter are sort of hidden gems for some folks to find and see.
3) Writing the lyrics/poem involved certain rules and frameworks which reminded me of writing academic and/or scientific stuff. I tried to be mindful of the number of syllables for instance, and of course rhyming. And there is the structure of the verses, and the matching content within these verses. And all these things must combine and interact and result in “painting a picture”. This all reminds me a lot of scientific writing with its rules and frameworks. It’s sort of like a puzzle to me, where you use rules and a framework to work at small parts but have to keep the “bigger picture” in mind when doing so.
4) Your comment about the poem being evocative to you made me think about writing scientific manuscripts. I think I like to get to the point as quickly as possible, and use certain language to try and help me make things clear or paint a picture or set the scene in the introduction of a paper. Although I don't think scientific writing should resemble poetry or a novel, I do think some more creative or evocative writing is scientifically "proper" and useful. This can perhaps also help in saying or writing things as clearly and simply as possible, which I think might be very important but difficult. I think the writing style of the author(s), and commenters, of this blog helped me realize all this more and more over the years.
5) This interaction on this blog is what I like about this blog and what I referred to in my first reply. We’re jumping from 20 years of blogging to some links to top blog-posts to some random person talking about some personal lessons learned to sharing a set of lyrics and/or a poem to another person giving feedback on the poem to the random person trying to tie it all back to scientific writing to hopefully provide some amusement and/or some interesting or perhaps even useful thoughts, reasoning, ideas, views, experiences, etc. I think that all might best sum up and describe this blog for me.
I’ve learned a lot from the blog, which I’ve read consistently since I started graduate school in 2015. A few posts that stand out in my memory:
“How did white people vote? Updated maps and discussion.” https://statmodeling.stat.columbia.edu/2009/05/11/discussion_and/
I believe these maps made it into Red State Blue State. This was one of the earliest posts that I came across, and I bookmarked it as I was learning how to display data on maps.
“4 different meanings of p-value (and how my thinking has changed)” https://statmodeling.stat.columbia.edu/2023/04/14/4-different-meanings-of-p-value-and-how-my-thinking-has-changed/
“Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation” https://statmodeling.stat.columbia.edu/2013/07/15/forward-causal-inference-is-about-estimation-reverse-causal-inference-is-about-model-checking-and-hypothesis-generation/
Research projects in political science (my field) are often motivated by questions of the form, “What explains X?” In general, this type of question does not have a single well-defined answer, both because most things are multicausal and because there are multiple levels at which explanations can operate. This post (and the eventual paper with Imbens) helped clarify some of these issues for me, and starts to provide a framework for unifying causal inference with these big-picture questions about the world.
“This is what ‘power = 0.06’ looks like. Get used to it.” https://statmodeling.stat.columbia.edu/2014/11/17/power-06-looks-like-get-used/
A small post with a very memorable graphic!
“Simple Bayesian analysis inference of coronavirus infection rate from the Stanford study in Santa Clara county.”
https://statmodeling.stat.columbia.edu/2020/05/01/simple-bayesian-analysis-inference-of-coronavirus-infection-rate-from-the-stanford-study-in-santa-clara-county/
I remember this post for three reasons: 1) at the time, like everyone else, I was totally consumed with concerns over covid; 2) it's a relatively simple analysis that shows the power of Bayesian inference in quantifying uncertainty in the face of imperfect data; 3) I had done a similar modeling exercise before I saw this post, producing similar results (it's always nice when others independently come to the same conclusion).
My personal favorite post is “Gathering of philosophers and physicists unaware of modern reconciliation of Bayes and Popper” (https://statmodeling.stat.columbia.edu/2015/12/17/gathering-of-philosophers-and-physicists-unaware-of-modern-reconciliation-of-bayes-and-popper/).
I probably liked it because it directed me to the excellent Gelman and Shalizi paper (http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf), which is one of three or four papers I re-read regularly.
At the time, I was working my first job out of college, and I was learning Bayesian statistics. I don’t exactly recall the details, but I was pretty much having an existential crisis about the validity of the Bayesian approach. This post and the linked paper addressed these concerns and gave me a lot of clarity on the practice of Bayesian analysis. I subsequently presented this paper at a journal club, and I remember the vivid discussion that followed, and really connecting for the first time with some of my colleagues.
Is there a category for "Most citations for a publication which was eventually withdrawn"? Possible subcategories would depend on the reason for withdrawal: for example, fabricated data, a numerical calculation mistake, the largest number of authors, and the very modern Photoshop manipulation of western blots.
Or “most citations for a publication that was never withdrawn but should’ve been.” I guess that’s a tough call because some people would place much of the writings of Marx and Freud into that category . . .
I’m not sure that ‘favourite’ is the right word, but an article that I know I’ve re-read a few times is: https://statmodeling.stat.columbia.edu/2022/04/05/confidence-intervals-compatability-intervals-uncertainty-intervals/
These, though nothing has improved in the decade since they appeared.
http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics8.pdf
https://statmodeling.stat.columbia.edu/2015/08/11/its-hard-to-replicate-that-is-duplicate-analyses-in-sociology/
This too, but very few followed Andrew’s 10 statistical tips.
https://statmodeling.stat.columbia.edu/2022/06/14/statistics-and-science-reform-my-conversation-with-economist-noah-smith/
Joey, I thought the comments on the first one were especially interesting. I wonder if Andrew has a favorite comment thread of the past 20 years.
Can’t speak for Andrew but for myself Alex Z’s comment summarizing views in his field is on the money:
“Every paper is just one argument for its conclusion and the conclusion is accepted only if many papers support the conclusion from a variety of angles.”
Dear Ndrew,
first of all, kudos and thank you for writing and running this blog and adding some much-needed intellectual stimulation to my dreary everyday academic routine (lots of teaching, even more administration, very little actual lab work and research). I realize that over the years the work you have invested in this blog and in moderating all of our responses must be the equivalent of writing at least two additional textbooks or dozens of insightful papers in scholarly journals, whose growing cumulative citation count would have secured you an appointment at Harvard long ago. Forgoing this in order to have a lively daily exchange on this blog with other researchers shows the true colors of the scientist Andrew Gelman! Your blog is frequently my first stop in the morning after grabbing a cup of coffee and sitting down at my desk.
And now to my favorite blog entry — that was “What has happened down here is the winds have changed”. What a lyrical and at the same time sharp-minded reckoning of the sins of psychology!
Keep up the great work. And thank you again!
Oliver
…and of course I didn't spot the missing A in your name when I checked the post for typos before sending it off. Sorry about that! For hapless, impulsive guys like me, a correction option on this blog would be truly great!
The “Undrew” option!
I went back through the posts on regression to the mean. They were very educational for me, although I’m not sure I can pick out a favorite.
For many years, I had cast a jaundiced eye towards any invocation of RttM, assuming folks were using the term primarily to appear smart without any clear sense of whether it really applied. But ever since Andrew posted a response back to me on this blog that finally got me to understand it, I now suspect those folks who invoked it probably understood what was going on better than I did.
One thing blogging provides is an opportunity for researchers to post work that for whatever reason will not appear in the peer-reviewed literature. Jim Bouldin was originally a member of the climate communication team that included Gavin Schmidt and Michael Mann. Bouldin became disaffected with some of the practices in dendrochronology. So, like any good scientist, he spent two years formulating a highly technical statistical critique, "Severe Analytical Problems in Dendroclimatology." It was never going to get published. Jim explains what happened with his attempts in parts nine and ten. Here is a link to Part 1:
https://ecologicallyoriented.wordpress.com/page/2/?s=Severe+analytical+problems+in+dendroclimatology%2C+part+1&submit=Search
There are 15 parts. You can pull up the rest using the search bar at his blog.
It has been grimly-amusing-slash-disturbing to read so many comments here over the years, apparently by credentialed specialists (although usually, how would I really know) in the nature of, “Oh, your field is fundamentally messed up? So’s mine! Get this.”
I think this post, or maybe a similar one, got my attention.
The difference between “statistically significant” and “not statistically significant” is not in itself necessarily statistically significant
https://statmodeling.stat.columbia.edu/2005/06/14/the_difference/
I thought that it was wrong. Then I thought that it was obvious. Then I started seeing the error all over the place.
The best ever? Clearly this one: Everything I need to know about Bayesian statistics, I learned in eight schools. Bayesian statistics in a nutshell, clearly explained. It convinced me definitively that Bayes is the way to go.
There are also a couple of posts on confidence intervals that should stand on the podium.
+1, that was a great post