
What if it’s never decorative gourd season?

If it rains, now we’ll change
We’ll hold and save all of what came
We won’t let it run away
If it rains — Robert Forster

I’ve been working recently as part of a team of statisticians based in Toronto on a big, complicated applied problem. One of the things about working on this project is that, in a first for me, we know that we need to release all code and data once the project is done. And, I mean, I’ve bolted on open practices to the end of an analysis, or just released a git repo at the end (sometimes the wrong one!). But this has been my first real opportunity to be part of a team that is weaving open practices all the way through an analysis. And it is certainly a challenge.

It’s worth saying that, notoriously “science adjacent” as I am, the project is not really a science project. It is a descriptive, explorative, and predictive study, rather than one that is focussed on discovery or confirmation. So I get to work my way through open and reproducible science practices without, say, trying desperately to make Neyman-Pearson theory work.

A slight opening

Elisabeth Kübler-Ross taught us that there are five stages in the transition to more open and reproducible scientific practices: 

  • Denial (I don’t need to do that!)
  • Anger (How dare they not do it!)
  • Bargaining (A double whammy of “Please let this be good enough” and “Please let other people do this as well”)
  • Depression (Open and reproducible practices are so hard and no one wants to do them properly)
  • Acceptance (Open and reproducible science is not a single destination, but a journey and an exercise in reflective practice)

And, really, we’re often on many parts of the journey simultaneously. (Although, like, we could probably stop spending so long on Anger, because it’s not that much fun for anyone.)  

And a part of this journey is to carefully and critically consider the shibboleths and touchstones of open and reproducible practice. Not because everyone else is wrong, but more because these things are complex and subtle, and working out how to weave them into our idiosyncratic research practices takes work.

So I’ve found myself asking the following question.

Should we release code with our papers?

Now to friends and family who are also working their way through the Kübler-Ross stages of Open Science, I’m very sorry but you’re probably not going to love where I land on this. Because I think most code that is released is next to useless. And that it would be better to release nothing than release something that is useless. Less digital pollution.

It’s decorative gourd season!

A fairly well known (and operatically sweary) piece in McSweeney’s Internet Tendency celebrates the Autumn every year by declaring It’s decorative gourd season, m**********rs! And that’s the piece. A catalogue of profane excitement at the chance to display decorative gourds. Why? Because displaying them is enough!

But is that really true for code? While I do have some sympathy for the sort of “it’s been a looonnng day and if you just take one bite of the broccoli we can go watch Frozen again”-school of getting reluctant people into open science, it’s a desperation move and at best a stop-gap measure. It’s the type of thing that just invites malicious compliance or, perhaps worse, indifferent compliance.

Moreover, making a policy (even informally) that “any code release is better than no code release” is in opposition to our usual insistence that manuscripts reach a certain level of (academic) clarity and that our analyses are reported clearly and conscientiously. It’s not enough that a manuscript or a results section or a graph simply exist. We have much higher standards than that.

So what should the standard for code be?

The gourd’s got potential! Even if it’s only decorative, it can still be useful.

One potential use of purely decorative code is the idea that the code can be read to help us understand what the paper is actually doing.

This is potentially true, but it definitely isn’t automatically true. Most code is too hard to read to be useful for this purpose. Just like most gourds aren’t the type of decorative gourd you’d write a rapturous essay about.

So unless code meets a certain standard, it’s going to need to do something more than just sit there and look pretty, which means we will need our code to be at least slightly functional.

A minimally functional gourd?

This is actually really hard to work out. Why? Well there are just so many things we can look at. So let’s look at some possibilities. 

Good code “runs”. Why the scare quotes? Well because there are always some caveats here. Code can be good even if it takes some setup or a particular operating system to run. Or you might need a Matlab license. To some extent, the idea of whether “the code runs” is an ill-defined target that may vary from person to person. But in most fields there are common computing set ups and if your code runs on one of those systems it’s probably fine.

Good code takes meaningful input and produces meaningful output: It should be possible to, for example, run good code on similar-but-different data.  This means it shouldn’t require too much wrangling to get data into the code. There are some obvious questions here about what is “similar” data. 

Good code should be somewhat generalizable. A simple example of this: good code for a regression-type problem should not assume you have exactly 7 covariates, making it impossible to use when the data has 8 covariates. This is vital for dealing with, for instance, the reviewer who asks for an extra covariate to be added, or for a graph to change.
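To make the covariate point concrete, here is a minimal sketch in Python (not code from the project described above). It fits ordinary least squares by solving the normal equations, and it reads the number of covariates off the data rather than hard-coding it. The function name `fit_least_squares` and the toy numbers are made up for illustration.

```python
def fit_least_squares(rows, y):
    """Fit y ~ intercept + covariates by ordinary least squares.

    `rows` is a list of observations, each a list of covariate values.
    The number of covariates is read from the data, never hard-coded.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n_cov = len(rows[0])
    if any(len(r) != n_cov for r in rows):
        raise ValueError("every observation needs the same number of covariates")
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = n_cov + 1
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Two covariates today...
two = fit_least_squares([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]],
                        [1, 3, 4, 6, 8])
# ...and three after the reviewer asks for one more. Same function.
three = fit_least_squares(
    [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1], [2, 1, 0], [0, 2, 1]],
    [1, 2, 4, 5, 8, 6])
```

The only thing that changes between the two calls is the data. That is the whole point: the reviewer's extra covariate costs you a column, not a rewrite.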

How limited can code be while still being good? Well that depends on the justification. Good code should have justifiable limitations.

Code with these 4 properties is no longer decorative! It might not be good, but it at least does something.  Can we come up with some similar targets for the written code to make it more useful? It turns out that this is much harder because judging the quality of code is much more subjective.

Good gourd! What is that smell?

The chances that a stranger can pick up your code and, without running it, understand what the method is doing are greatly increased with good coding practice. Basically, if it’s code you can come back to a year later and modify as if you’d never put it down, then your code is possibly readable. 

This is not an easy skill to master. And there’s no agreed upon way to write this type of code. Like clearly written prose, there are any number of ways that code can be understandable. But like writing clear prose, there are a pile of methods, techniques, and procedures to help you write better code.

Simple things like consistent spacing and doing whatever RStudio’s auto-format does (like adding spaces on each side of “+”) can make your code much easier to read. But it’s basically impossible to list a set of rules that would guarantee good code. Kinda like it’s impossible to list a set of rules that would make good prose.

So instead, let’s work out what is bad about code. Again, this is a subjective thing, but we are looking for code that smells.

If you want to really learn what this means (with a focus on R), you should listen to Jenny  Bryan’s excellent keynote presentation on code smell (slides etc here). But let’s summarize.

How can you tell if code smells? Well if you open a file and are immediately moved to not just light a votive candle but realize in your soul that without intercessory prayer you will never be able to modify even a corner of the code, then the code smells.  If you can look at it and at a glance see basically what the code is supposed to do, then your code smells nice and clean.

If this sounds subjective, it’s because it is. Jenny’s talk gives some really good advice about how to make less whiffy code, but her most important piece of advice is not about a specific piece of bad code. It’s the following:

Your taste develops faster than your ability. 

To say it differently, as you code more you learn what works and what doesn’t. But a true frustration is that (just like with writing) you tend to know what you want to do before you necessarily have the skills to pull it off. 

The good thing is that code for academic work is iterative. We do all of our stuff, send it off for review, and then have to change things. So we have a strong incentive to make our code better and we have multiple opportunities to make it so.

Because what do you do when you have to add a multilevel component to a model? Can you do that by just changing your code in a single place? Or do you have to change the code in a pile of different places? Because good smelling code is often code that is modular and modifiable.

But because we build our code over the full lifecycle of a project (rather than just once after which it is never touched again), we can learn the types of structures we need to build into our code and we can share these insights with our friends, colleagues, and students.

A gourd supportive lab environment is vital to success

The frustration we feel when we want to be able to code better than our skills allow is awful. I think everyone has experienced a version of it. And this is where peers and colleagues and supervisors have their chance to shine. Because just as people need to learn how to write scientific reports and people need to learn how to build posters and people need to learn how to deliver talks, people need to learn how to write good code.

Really, the only teacher is experience. But you can help experience along. Work through good code with your group. Ask for draft code. Review it. Just like the way you’ll say “the intro needs more “Piff! Pop! Woo!” because right now I’m getting “*Sad trombone*” and you’ve done amazing work so this should reflect that”, you need to say the same thing about the code. Fix one smell at a time. Be kind. Be present. Be curious. And because you most likely were also not trained in programming, be open and humble.

Get members of your lab to swap code and explain it back to the author. This takes time. But this time is won back when reviews come or when follow up work happens and modifications need to be made. Clean, nice code is easy to modify, easy to change, and easy to use.

But trainees who are new at programming are nervous about programming.

They’re usually nervous about giving talks too. Or writing. Same type of strategy.

But none of us are professional programmers

Sadly, in the year of our lord two thousand and nineteen if you work in a vaguely quantitative field in science, social science, or the vast mire that surrounds them, you are probably being paid to program. That makes you a professional programmer.  You might just be less good at that aspect of your job than others.

I am a deeply mediocre part time professional programmer. I’ve been doing it long enough to learn how code smells, to have decent practices, and to have a bank of people I can learn from. But I’m not good at it. And it does not bring me joy. But neither does spending a day doing forensic accounting on the university’s bizarre finance system. But it’s a thing that needs to be done as part of my job and for the most part I’m a professional who tries to do my best even if I’m not naturally gifted at the task.

Lust for gourds that are more than just decorative

In Norwegian, the construct “to want” renders “I want a gourd” as “Jeg har lyst på en kalebas” and it’s really hard, as an English speaker, not to translate that to “I have lust for a gourd”. And like that’s the panicking Norwegian 101 answer (where we can’t talk about the past because it’s linguistically complex or the future because it’s hard, so our only verbs can be instantaneous. One of the first things I was taught was “Finn er sjalu.” (Finn is jealous.) I assume because jealousy has no past or future).

But it also really covers the aspect of desiring a better future. Learning to program is learning how to fail to program perfectly. Just like learning to write is learning to be clunky and inelegant. To some extent you just have to be ok with that. But you shouldn’t be ok with the place you are being the end of your journey.

So did I answer my question? Should we release code with our papers?

I think I have an answer that I’m happy with. No in general. Yes under circumstances.

We should absolutely release code that someone has tried to make good code. Even though they will have failed. We should carry each other forward even in our imperfection. Because the reality is that science doesn’t get more open by making arbitrary barriers. Arbitrary barriers just encourage malicious compliance.

When I lived in Norway as a newly minted gay (so shiny) I remember once taking a side trip to Gay’s The Word, the LGBTQIA+ bookshop in London and buying (among many many others) a book called Queering Anarchism. And I can’t refer to it because it definitely got lost somewhere in the nine times I’ve moved house since then.

The thing I remember most about this book (other than being introduced to the basics of intersectional trans-feminism) was its idea of anarchism as a creative force. Because after tearing down existing structures, anarchists need to have a vision of a new reality that isn’t simply an inversion of the existing hierarchy (you know. Reducing the significance threshold. Using Bayes Factors instead of p-values. Pre-registering without substantive theory.) A true anarchist, the book suggested, needs to queer rather than invert the existing structures and build a more equitable version of the world.

So let’s build open and reproducible science as a queer reimagining of science and not a small perturbation of the world that is. Such a system will never be perfect. Just lusting to be better.

Extra links:

Instead of replicating studies with problems, let’s replicate the good studies. (Consider replication as an honor, not an attack.)

Commenter Thanatos Savehn pointed to an official National Academy of Sciences report on Reproducibility and Replicability that included the following “set of criteria to help determine when testing replicability may be warranted”:

1) The scientific results are important for individual decision-making or for policy decisions.
2) The results have the potential to make a large contribution to basic scientific knowledge.
3) The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.
4) There is controversy about the topic.
5) There was potential bias in the original investigation, due, for example, to the source of funding.
6) There was a weakness or flaw in the design, methods, or analysis of the original study.
7) The cost of a replication is offset by the potential value in reaffirming the original results.
8) Future expensive and important studies will build on the original scientific results.

I’m ok with items 1 and 2 on this list, and items 7 and 8: You want to put in the effort to replicate on problems that are important, and where the replications will be helpful. One difficulty here is determining if “The scientific results are important . . . potential to make a large contribution to basic scientific knowledge.” Consider, for example, Bem’s notorious ESP study: if the claimed results are true, they could revolutionize science. If there’s nothing there, though, it’s not so interesting. This sort of thing comes up a lot, and it’s not clear how we should answer questions 1 and 2 above in the context of such uncertainty.

But the real problem I have is with items 3, 4, 5, and 6, all of which would seem to favor replications of studies that have problems.

In particular consider item 6: “There was a weakness or flaw in the design, methods, or analysis of the original study.”

I’d think about it the other way: If a study is strong, it makes sense to try to replicate it. If a study is weak, why bother?

Here’s the point. Replication often seems to be taken as a sort of attack, something to try when a study has problems, an attempt to shoot down a published claim. But I think that replication is an honor, something to try when you think a study has found something, to confirm something interesting.

ESP, himmicanes, ghosts, Bigfoot, astrology etc.: all very interesting if true, not so interesting as speculations not supported by any good evidence.

So I recommend changing items 3, 4, 5, and 6 of the National Academy of Sciences criteria. Instead of replicating studies with problems, let’s replicate the good studies.

To put it another way: The problem with the above guidelines is that they implicitly assume that if a study doesn’t have obvious major problems, it should be believed. Thus, they see the point of replications as checking up on iffy claims. But I’d say it the other way: unless a study’s design, data collection, and results are unambiguously clear, we should default to skepticism, hence replication can be valuable in giving support to a potentially important claim.

Tomorrow’s post: Is “abandon statistical significance” like organically fed, free-range chicken?

Participate in Brazilian Reproducibility Initiative—even if you don’t live in South America!

Anna Dreber writes:

There’s a big reproducibility initiative in Brazil on biomedical research led by Olavo Amaral and others, which is an awesome project where they are replicating 60 studies in Brazilian biomedical research. We (as usual lots of collaborators) are having a prediction survey and prediction markets for these replications – would it be possible for you to post something on your blog about this to attract participants? I am guessing that some of your readers might be interested.

Here’s more about the project and here is how to sign up:

Sounds like fun.

To do: Construct a build-your-own-relevant-statistics-class kit.

Alexis Lerner, who took a couple of our courses on applied regression and communicating data and statistics, designed a new course, “Jews: By the Numbers,” at the University of Toronto:

But what does it mean to work with data and statistics in a Jewish studies course? For Lerner, it means not only teaching her students to work with materials like survey results, codebooks, archives and data visualization, but also to understand the larger context of data. . . .

Lerner’s students are adamant that the quantification and measurement they performed on survivor testimonies did not depersonalize the stories they examined, a stereotype often used to criticize quantitative research methods.

“Once you learn the methods that go into statistical analysis, you understand how it’s not reductionist,” says Daria Mancino, a third-year student completing a double major in urban studies and the peace, conflict and justice program. “That’s really the overarching importance of this course for the social sciences or humanities: to show us why quantifying something isn’t necessarily reductionist.” . . .

Lerner hopes her students will leave her class with a critical eye for data and what goes into making it. Should survey questions be weighted, for example? How large of a sample size is large enough for results to be reliable? How do we know that survey respondents aren’t lying? How should we calculate margins of error?

Lerner’s students will leave the course with the tools to be critical analysts, meticulous researchers and – perhaps most importantly – thoughtful citizens in an information-heavy world.

This sounds great, and of course the same idea could be used to construct a statistics course based on any minority group. You could do it for other religious minorities or ethnic groups or states or countries or political movements or . . . just about anything.

So here’s what I want someone to do: Take this course, abstract it, and make it into a structure that could be expanded by others to fit their teaching needs. Wouldn’t it be great if there were hundreds of such classes, all over the world, wherever statistics is taught?

A build-your-own-relevant-statistics-class kit.

Let’s take Lerner’s course as a starting point, because we have it already, and from there abstract what is needed to create a structure that others can fill in.

Tomorrow’s post: Instead of replicating studies with problems, let’s replicate the good studies. (Consider replication as an honor, not an attack.)

The hot hand and playing hurt

So, was chatting with someone the other day and it came up that I sometimes do sports statistics, and he told me how he read that someone did some research finding that the hot hand in basketball isn’t real . . .

I replied that the hot hand is real, and I recommended he google “hot hand fallacy fallacy” to find out the full story.

We talked a bit about that, and then I was thinking of something related, which is that I’ve been told that professional athletes play hurt all the time. Games are so intense, and seasons are so long, that they just never have time to fully recover. If so, I could imagine that much of the hot hand has to do with temporarily not being seriously injured, or with successfully working around whatever injuries you have.

I have no idea; it’s just a thought. And it’s related to my reflection from last year:

The null model [of “there is no hot hand”] is that each player j has a probability p_j of making a given shot, and that p_j is constant for the player (considering only shots of some particular difficulty level). But where does p_j come from? Obviously players improve with practice, with game experience, with coaching, etc. So p_j isn’t really a constant. But if “p” varies among players, and “p” varies over the time scale of years or months for individual players, why shouldn’t “p” vary over shorter time scales too? In what sense is “constant probability” a sensible null model at all?

I can see that “constant probability for any given player during a one-year period” is a better model than “p varies wildly from 0.2 to 0.8 for any player during the game.” But that’s a different story. The more I think about the “there is no hot hand” model, the more I don’t like it as any sort of default.
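Here is a quick simulation sketch of the two stories, just to see what they look like side by side. It is an illustration, not an analysis of real shooting data. The switching model, the probabilities 0.35 and 0.65, the switch rate, and all function names are invented for the example.

```python
import random


def simulate_shots(probs, rng):
    """One simulated shot (1 = make, 0 = miss) per probability in probs."""
    return [1 if rng.random() < p else 0 for p in probs]


def drifting_probs(n, rng, low=0.35, high=0.65, switch=0.02):
    """Hypothetical alternative to the null: p flips between two levels.

    Before each shot the player changes state with probability `switch`,
    so p is roughly constant over short stretches but not over a season.
    """
    probs, current = [], high
    for _ in range(n):
        if rng.random() < switch:
            current = low if current == high else high
        probs.append(current)
    return probs


def streak_gap(shots):
    """P(make | previous make) minus P(make | previous miss)."""
    after_make = [b for a, b in zip(shots, shots[1:]) if a == 1]
    after_miss = [b for a, b in zip(shots, shots[1:]) if a == 0]
    return sum(after_make) / len(after_make) - sum(after_miss) / len(after_miss)


rng = random.Random(2019)
n = 100_000  # far more shots than any real player takes; this is a toy

# Null model: the same p for every shot.
gap_constant = streak_gap(simulate_shots([0.5] * n, rng))

# Alternative: p varies on a short time scale, with the same long-run average.
gap_drifting = streak_gap(simulate_shots(drifting_probs(n, rng), rng))
```

Under the constant-p model the gap should sit near zero, and under the drifting model it should come out positive. Both models have the same long-run shooting percentage, so the difference only shows up in the sequence of makes and misses. With realistic numbers of shots per player the gap would be much noisier than it is here.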

Hey! Participants in survey experiments aren’t paying attention.

Gaurav Sood writes:

Do survey respondents account for the hypothesis that they think people fielding the survey have when they respond? The answer, according to Mummolo and Peterson, is not much.

Their paper also very likely provides the reason why—people don’t pay much attention. Figure 3 provides data on manipulation checks—the proportion guessing the hypothesis being tested correctly. The change in proportion between control and treatment ranges from -.05 to .25, with a bulk of the differences in Qualtrics between 0 and .1. (In one condition, authors even offer an additional 25 cents to give an answer consistent with the hypothesis. And presumably, people need to know the hypothesis before they can answer in line with it.) The faint increase is especially noteworthy given that on average, the proportion of people in the control group who guess the hypothesis correctly—without the guessing correction—is between .25–.35 (see Appendix B).

So, the big thing we may have learned from the data is how little attention survey respondents pay. The numbers obtained here are in a similar vein to those in Appendix D of Jonathan Woon’s paper. The point is humbling and suggests that we need to: (a) invest more in measurement, and (b) have yet larger samples, which is an expensive way to overcome measurement error.

(The two fixes are points you have made before. I claim no credit. I wrote this because I don’t think I fully grasped how much noise there is on online surveys. And I think it is likely useful to explore the consequences carefully.)

P.S. I think my Mturk paper gives one potential explanation for why things are so noisy—not easy to judge quality on surveys:

Here’s one relevant bit of datum from the turk paper: we got our estimates after recruiting workers “with a HIT completion rate of at least 95%.” This latter point also relates to a recent “reputation inflation” online paper.

So, if I’m understanding this correctly, Mummolo and Peterson are saying we don’t have to worry about demand effects in psychology experiments, but Sood is saying that this is just because the participants in the experiments aren’t really paying attention!

I wonder what this implies about my own research. Nothing good, I suppose.

P.S. Sood adds three things:

1. The numbers on compliance in M/P aren’t adjusted for guessing—some people doubtlessly just guessed the right answer. (We can back it out from proportion incorrect after taking out people who mark “don’t know.”)

2. This is how I [Sood] understand things: Experiments tell us the average treatment effect of what we manipulate. And the role of manipulation checks is to shed light on compliance.

If conveying experimenter demand clearly and loudly is a goal, then the experiments included probably failed. If the purpose was to know whether clear but not very loud cues about “demand” matter—and for what it’s worth, I think it is a very reasonable goal; pushing further, in my mind, would have reduced the experiment to a tautology—the paper provides the answer. (But your reading is correct.)

3. The key point that I took from the experiment, Woon, etc. was still just about how little attention people pay on online surveys. And compliance estimates in M/P tell us something about the amount of attention people pay because compliance in their case = reading something—simple and brief—that they quiz you on later.
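Sood’s first point, about backing out guessing from the proportion incorrect, can be sketched with the textbook correction for guessing. This is an assumption about what he has in mind, not something taken from the Mummolo and Peterson paper. It supposes that everyone who doesn’t know the hypothesis picks uniformly at random among `n_options` choices, and that “don’t know” responses have already been dropped. The function name and the counts are made up.

```python
def guess_corrected_share(n_correct, n_incorrect, n_options):
    """Estimated share of respondents who actually knew the answer.

    Assumes non-knowers guess uniformly among n_options choices, so for
    every (n_options - 1) wrong answers we expect one lucky right answer.
    """
    if n_options < 2:
        raise ValueError("need at least two answer options")
    answered = n_correct + n_incorrect
    if answered == 0:
        raise ValueError("no responses to correct")
    lucky_guesses = n_incorrect / (n_options - 1)
    return (n_correct - lucky_guesses) / answered


# Hypothetical counts: 40 right and 60 wrong on a four-option question.
knew_it = guess_corrected_share(40, 60, 4)

# Pure guessing on four options looks like 25 right and 75 wrong.
pure_guessing = guess_corrected_share(25, 75, 4)
```

With these invented counts the raw 40% correct shrinks to an estimated 20% who knew the hypothesis, and the pure-guessing case comes out at zero, as it should.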

Tomorrow’s post: To do: Construct a build-your-own-relevant-statistics-class kit.

When Prediction Markets Fail

A few years ago, David Rothschild and I wrote:

Prediction markets have a strong track record and people trust them. And that actually may be the problem right now. . . . a trader can buy a contract on an outcome, such as the Democratic nominee to win the 2016 presidential election, and it will be worth $1 if the outcome occurs and $0 if the outcome does not occur. The price at which people are willing to buy and sell that contract can be interpreted as the probability of the outcome occurring, or at least the collective subjective probability implicitly assigned by the crowd of people who trade in these markets. . . .

But more recently, prediction markets have developed an odd sort of problem. There seems to be a feedback mechanism now whereby the betting-market odds reify themselves. . . .

Traders are treating market odds as correct probabilities and not updating enough based on outside information. Belief in the correctness of prediction markets causes them to be too stable. . . . pollsters and pundits were also to some extent anchoring themselves off the prediction odds. . . .

And that’s what seems to have happened in the recent Australian election. As Adrian Beaumont wrote:

The poll failure was caused in part by “herding”: polls were artificially too close to each other, afraid to give results that may have seemed like outliers.

While this was a failure for the polls, it was also a failure of the betting markets, which many people believe are more accurate than the polls. . . . the Betfair odds . . . implying that the Coalition had only an 8% chance of winning. . . . It is long past time that the “betting markets know best” wisdom was dumped. . . .

I don’t want to overstate the case here. The prediction markets are fine for what they are. But they’re a summary of what goes into them, nothing more.

P.S. Yes, if all is calibrated, if the stated probability is 8%, then the event will occur 8% of the time. You can’t demonstrate lack of calibration from one prediction. So let me flip it around: why should we assume that the prediction markets are some sort of oracle? Prediction markets are a particular information aggregation tool that can be useful, especially if you don’t take them too seriously. The same goes for any other approach to information aggregation, including those that I’ve promoted.
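Calibration is a property of many forecasts, not one, and a small sketch shows what checking it would involve. The function name `calibration_table`, the binning scheme, and the forecasts and outcomes below are all invented for illustration; none of it comes from a real market.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns {bin index: (mean stated probability, observed frequency, count)}.
    A calibrated forecaster has the first two numbers close in every bin
    that holds enough forecasts to judge.
    """
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, happened))
    table = {}
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(h for _, h in pairs) / len(pairs)
        table[idx] = (mean_p, freq, len(pairs))
    return table


# Hypothetical record: 25 events priced at 8%, 10 events priced at 92%.
forecasts = [0.08] * 25 + [0.92] * 10
outcomes = [1, 1] + [0] * 23 + [1] * 9 + [0]
table = calibration_table(forecasts, outcomes)
```

In this made-up record the 8% events happen 2 times in 25, which is exactly 8%. Take away all but one of those forecasts and the table has nothing to say, which is the P.S. in code form.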

Australian polls failed. They didn’t do Mister P.

Neil Diamond writes:

Last week there was a federal election in Australia. Contrary to expectations and to opinion polls, the Government (a coalition between the Liberal (actually conservative) and National parties, referred to as LNP or the Coalition) was returned with an increased majority defeating the Australian Labor Party (ALP or Labor, no “u”).

Voting in Australia is a bit different since we have compulsory voting, that is you get fined if you don’t vote, and we have preferential voting. Allocation of preferences is difficult and sometimes based on what happened last election and the pollsters all do it differently.

Attached is a graph of the two party preferred vote over the last three years given by Kevin Bonham, one of the most highly regarded poll analysts in Australia. Note that in Australia Red means Labor and Blue means Liberal. The stars correspond to what actually happened at the election.

Since the election there has been much analysis of what went wrong with the polls. I’m attaching two links—one by a Nobel Laureate, Professor Brian Schmidt of the Australian National University, who pointed out that the published polls had a much lower variability than was expected, and another (very long) post from Kevin Bonham which looks at what has happened and suggests among other things that the polls “may have been oversampling voters who are politically engaged or highly educated (often the same thing).”
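The point about published polls varying less than expected can be checked with a back-of-the-envelope calculation. What follows is a rough sketch, not a reconstruction of Schmidt’s analysis. It treats each poll as a simple random sample and ignores weighting and design effects. The sample size of 1000 and the five poll results are made-up numbers.

```python
import math
import statistics


def expected_poll_sd(share, sample_size):
    """Sampling standard deviation of a proportion under simple random sampling."""
    return math.sqrt(share * (1 - share) / sample_size)


def spread_ratio(poll_shares, sample_size):
    """Observed spread across polls divided by the spread sampling alone implies.

    Values well below 1 mean the polls agree with each other more than
    independent samples of this size should.
    """
    observed = statistics.stdev(poll_shares)
    expected = expected_poll_sd(statistics.mean(poll_shares), sample_size)
    return observed / expected


# Hypothetical two-party-preferred shares from five polls of 1000 people each.
polls = [0.51, 0.515, 0.52, 0.51, 0.515]
ratio = spread_ratio(polls, 1000)
```

With 1000 respondents, sampling error alone gives a standard deviation of about 1.6 percentage points, so polls that all land within a point of each other produce a ratio far below 1. A handful of polls is not much to go on, though. A real check would use many more polls and their actual sample sizes.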

Diamond also links to this news article where Adrian Beaumont writes:

The Electoral Commission’s two party preferred projection is . . . the Coalition wins by 51.5-48.5 . . . Polls throughout the campaign gave Labor between 51 and 52% of the two party preferred vote. The final Newspoll had a Labor lead of 51.5-48.5 [in the other direction as what happened; thus the polls were off by 3 percentage points] . . . I [Beaumont] believe the poll failure was caused in part by “herding”: polls were artificially too close to each other, afraid to give results that may have seemed like outliers.

While this was a failure for the polls, it was also a failure of the betting markets, which many people believe are more accurate than the polls. . . . the Betfair odds . . . implying that the Coalition had only an 8% chance of winning. . . . It is long past time that the “betting markets know best” wisdom was dumped. . . .

Another reason for the poll failure may be that pollsters had too many educated people in their samples. Australian pollsters ask for age and gender of those they survey, but not for education levels. Perhaps pollsters would have been more accurate had they attempted to stratify by education to match the ABS Census statistics. People with higher levels of education are probably more likely to respond to surveys than those with lower levels.

Compulsory voting in Australia may actually have contributed to this problem. In voluntary voting systems, the more educated people are also more likely to vote. . . .

If there is not a large difference between the attitudes of those with a high level of education, and those without, pollsters will be fine. . . . If there is a big difference, as occurred with Trump, Brexit, and now it appears the [Australian] federal election, pollsters can miss badly. If you sort the seats by two party swing, those seats that swung to Labor tended to be highly educated seats in the cities, while those that swung biggest to the Coalition were regional electorates. . . .

I’m surprised to hear that Australian polls don’t adjust for education levels. Is that really true? In the U.S., it’s been standard for decades to adjust for education (see for example here). In future, I recommend that Australian pollsters go Carmelo Anthony.
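For readers who haven’t seen the adjustment, here is a bare-bones sketch of the poststratification step: estimate support within each education group, then reweight the groups to their population shares. This is only that last step, with no multilevel regression behind the cell estimates. The group labels, support levels, and shares are all invented, not ABS Census figures.

```python
def poststratify(cell_estimates, population_shares):
    """Weight per-cell estimates by each cell's share of the population."""
    if set(cell_estimates) != set(population_shares):
        raise ValueError("cells must match between estimates and shares")
    total = sum(population_shares.values())
    return sum(cell_estimates[c] * population_shares[c]
               for c in cell_estimates) / total


# Hypothetical poll: Labor two-party-preferred support by education group.
support = {"degree": 0.56, "no degree": 0.46}

# The sample over-represents graduates relative to the (made-up) population.
sample_shares = {"degree": 0.6, "no degree": 0.4}
census_shares = {"degree": 0.3, "no degree": 0.7}

raw_estimate = poststratify(support, sample_shares)       # what the poll reports
adjusted_estimate = poststratify(support, census_shares)  # reweighted to census
```

With these invented numbers the unadjusted poll shows Labor ahead at 52%, and the education-adjusted estimate puts them behind at 49%. The size of the shift depends entirely on how far the sample’s education mix is from the population’s and on how much the groups differ.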

Battle for the headline: Hype and the effect of statistical significance on the ability of journalists to engage in critical thinking

A few people pointed me to this article, “Battle for the thermostat: Gender and the effect of temperature on cognitive performance,” which received some uncritical press coverage here and here. And, of course, on NPR.

“543 students in Berlin, Germany” . . . good enuf to make general statements about men and women, I guess! I wonder if people dress differently in different places . . . .

Padres need Stan

Cody Zupnick writes:

I’m working in baseball research for the San Diego Padres, and we’re looking for new people, potentially with Stan experience. Would you mind seeing if any of your readers have any interest?


Epic Pubpeer thread continues

Here. (background here and here)

“I’m sick on account I just ate a TV dinner.”

I recently read “The Shadow in the Garden,” a book by James Atlas that’s a mix of memoir about his experiences as a biographer of poet Delmore Schwartz and novelist Saul Bellow, and various reflections and anecdotes about biography-writing more generally.

I enjoyed the book so much that I’m pretty much just gonna have a post with long quotes from it. This is a labor of love, because (a) I don’t think these sorts of posts get many readers, and (b) it won’t even make Atlas himself happy, as he died a couple years after writing this book.

Before going on, let me say that Atlas reminds me of David Owen or, at a more exalted level, George Orwell: a common person who is a sort of stand-in for the reader. There’s something appealing about this regular-guy thing. (See here for further discussion of this concept.)

OK, now on to the quotes:


“The art is in what’s made up.” Well put.

pp.47-48, writing about details in a letter written by Schwartz:

They form not “another piece of the puzzle”—the pieces are infinite and in any case can’t be put together . . .

“And in any case can’t be put together . . .” Indeed.


I’m not naive: that I am approaching the end of my time in this world doesn’t mean the world is approaching its end. Forgive me for indulging in this common—no, universal—preconception: how many of us have the fortitude to see things as they really are?

I think all of us have the fortitude to see some things as they really are. It’s just that different people see different things.


Sometimes, out on the trail, you could go too far—like the night I got lost in the wilds of rural New Jersey during a snowstorm.

Rural New Jersey! Charmingly local of him.

On p.65 he quotes the poem During December’s Death, “one of the last poems included in [Schwartz’s] collection Summer Knowledge”:

This doesn’t quite motivate me to read more poems by Delmore Schwartz, but I do feel like I got something out of that one.


“I’m sick on account I just ate a TV dinner.”


But the kicker comes on the next page:

“Would it have killed the biographer to nail down this fact?” I love Atlas’s voice here.

p.106, in a footnote:

Another contemporary biographer of Charlemagne, the memorably named Notker the Stammerer, was almost defiantly insouciant about his editorial methods. “Since the occasion has offered itself, although they have nothing to do with my subject matter, it does not seem to be a bad idea to add these two stories to my official narrative, together with a few more which happened at the same time and are worthy of being recorded.” Note to Notker: Don’t try this at The New Yorker.

The New Yorker does make mistakes—I notice it sometimes when they mangle political statistics—still, that was a funny line.

p.114, discussing a book by biographer Ian Hamilton:

I’ve decided to quote from it at inordinate length: why struggle over some lame paraphrase with a writer as good as Hamilton?

p.138, talking about the great Dwight Macdonald:

“Just read your excellent wrecking job on that academic bronze-ass Bruccoli’s hagiography of O’Hara,” he wrote me . . .

Followed by this delightful footnote:

I’m not sure what Dwight meant by this word, which recurs in his correspondence: its dictionary definition (minus the “ass”) is “spiritual person” or “Buddhist,” but Dwight gave it a perplexingly negative spin. Maybe such types were anathema to his practical mind.

“Minus the ‘ass’” . . . I love that!

Just as an aside, I think John O’Hara is currently underrated, not so much as a writer (but he is a pretty good writer) but as an influence. A few years ago I was disappointed to see a whole article on John Updike written by the estimable Louis Menand that didn’t mention O’Hara even once. And then Patricia Lockwood did it again: an article all about Updike with no O’Hara, despite all their similarities.

p.141, continuing with Macdonald:

“A steady stream of bouillabaisse” . . . sure, in some sense this is writing by the numbers, following up a general description with telling detail. But Atlas does it so well! I’m loving it.

p.145, in a footnote:

As regular readers of this blog will recall, a “Feynman story” is any anecdote that someone tells that is structured so that the teller comes off as a genius and everyone else in the story comes off as an idiot. The above anecdote is an anti-Feynman story: it’s amusingly cringe-worthy and I admire Atlas for sharing it with us.

p.155, writing about Edmund Wilson:

This reminds me of what I wrote about Owen, Orwell—and Atlas!—above. When writing this passage about Wilson, was Atlas thinking about himself too? Maybe so, as he does write, “Wilson was my model.”


On the essay’s last page, the thought occurs to Wilson that he might be “stranded,” out of touch with his own life and times.

Maybe this is true of all of us. I’m thinking of a mathematical argument here: Culture is a high-dimensional space, and you can’t be at the center of culture in all dimensions. And, even if you could, then you wouldn’t be the “typical set,” as they say in probability theory. Someone who is central in all dimensions is, in aggregate, extremely unusual. So in that sense maybe it’s no surprise that so many of us—all of us, maybe—feel ourselves to be out of time in some way or another.
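Here is a rough simulation of that argument. It is only a sketch under strong assumptions that are mine and not part of the argument above: each cultural “dimension” is an independent standard normal trait, and “central” means falling in the middle 50% of a trait.

```python
import random
from statistics import NormalDist

# Assumption for illustration: each cultural "dimension" is an independent
# standard normal trait, and "central" means inside the middle 50% of it.
CUTOFF = NormalDist().inv_cdf(0.75)  # about 0.674


def share_central_everywhere(dims, people=20_000, seed=0):
    """Simulated share of people who are central on every one of `dims` traits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(people):
        if all(abs(rng.gauss(0.0, 1.0)) < CUTOFF for _ in range(dims)):
            hits += 1
    return hits / people


for dims in (1, 2, 5, 10):
    simulated = share_central_everywhere(dims)
    exact = 0.5 ** dims  # independence: probabilities multiply
    print(f"{dims:2d} dimensions: simulated {simulated:.4f}, exact {exact:.4f}")
```

Under these assumptions half of everyone is central on any one dimension, but only about one person in a thousand is central on ten at once. Correlated dimensions would soften the numbers without changing the direction of the effect.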

p.161, Atlas refers to his “one published novel.” That’s very gently put. The use of the word “published” suggests that he had one or more other novels that never saw the light of day. Atlas here is acting as a biographer of himself, providing relevant information to us, the readers, but in a way that is polite and respectful to the subject, which in this case happens to be him. Kinda like when they announce the wedding date and the birth date of the first child, and it’s up to you to figure out that they are less than nine months apart.

p.163, Atlas discussing a Saul Bellow novel:

What good has it done the world? What good has it done him? What does he want? “But that’s just it—not a solitary thing. I am pretty well satisfied to be, to be just as it is willed, and for as long as I may remain in occupancy.”

I’m not the world’s biggest Saul Bellow fan (that would be Martin Amis): to me, Bellow’s writing is beautiful but hard to read, kinda like Melville. (I tried to read Moby Dick once but only got through the first few chapters before giving up from exhaustion.) Nonetheless, I reacted with joy to the above quote because it’s sooooo Bellow-like. You gotta admire someone with such a strong style.

p.166, there’s more:

Back in his apartment, I brought up the matter of the fishmonger’s cluelessness. “People aren’t aware of my presence,” Bellow said with apparent unfeigned equanimity. “What am I compared to the Cubs, the Bears?”

So Bellow-like again. Wonderful!

And on page 170, there’s more:

“There are enough people with their thumbprint on my windpipe.” I don’t know what it is, exactly, but it has the unmistakeable sound of Bellow.

And this:

I’m starting to like this Harris guy. “Stat or Staps or Stat or Stap.” Great rhythm he’s got there. It’s the kind of thing I could imagine a Bellow character saying. I’m thinking that Harris spent so much time living inside Bellow that he started to write like him, or like an imitation of him.

p.171, in a footnote regarding details of biography, Atlas concludes, “Facts matter.” The stubbornness of facts: something Basbøll and I have spent a lot of time worrying over, in the course of shadowboxing with various plagiarists and bullshit artists.


Bellow saw four psychiatrists during his lifetime: Dr. Chester Raphael, a Reichian who practiced in Queens and who was the model for Dr. Sapir in his unfinished novel about Rosenfeld; Paul Meehl, a psychologist in Minneapolis he had consulted during the disintegration of his second marriage, when he was teaching at the University of Minnesota; Albert Ellis, the famous “sexologist” whom Bellow saw for what he once described as “pool room work,” or sexual technique; and Heinz Kohut.

Hey, wait a minute! Paul Meehl? Paul Meehl?? The Paul Meehl??? Yup.

And then this:

I [Atlas] had interviewed the first three, all of whom were willing, no doubt out of vanity, to violate patient/doctor (or psychologist) confidentiality.

Dayum. I guess each of us is complicated. Still, sad to hear this about one of my heroes. I would’ve hoped better of Meehl. Or maybe it was ok for him to share whatever stories he had with Atlas. Neither Meehl nor Atlas is around now to discuss it.

On p.207, Atlas shares with us that Meehl is the model for Dr. Edvig in Bellow’s novel Herzog.

I think I’ll have to read Herzog now, just to learn more about Meehl. Did anyone ever write a biography of Meehl? A quick web search doesn’t reveal anything. The closest I can find is an autobiographical essay and a book, “Twelve Years of Correspondence With Paul Meehl: Tough Notes From a Gentle Genius,” by Donald R. Peterson. I don’t think either will give the insight that I’d get from some passages from Herzog. But we’ll see. I’ll report back to you once I’ve read it.

p.211: Bellow’s lawyer is named Walter Pozen. I know a law professor named David Pozen! Walter’s grandson, perhaps? Could be, no?

p.217, a footnote relating to Samuel Johnson’s sobriquet, the Great Cham:

For a long time, I thought this nickname had something to do with “champion,” but it’s actually an Anglicization of khan, someone who rules over a domain—in this case, literature.

I had no idea!

p.223, describing a family trip to Scotland:

We stayed in drafty castles and threadbare bed-and-breakfasts that would have made no Top Hundred Resorts list. . . . we were headed for “a country where no wheel has rolled,” as Johnson put it, the inns were “verminous,” the people “savages,” and the weather “dreary.”

“A country where no wheel has rolled” . . . there’s only one Samuel Johnson!

p.228, on the writing of the Life of Johnson:

Boswell had devised an ingenious method of transcription: having memorized as much as he could of a dialogue, he would scribble down rapid condensed notes, sometimes in Johnson’s presence, abbreviating all but key words—“the heads,” he called them, the ingredients of “portable soup,” “a kind of stock cube from which I could make up a broth, when the time came to feed.” It didn’t always congeal. “I have the substance,” he confided in his journal, “but the felicity of expression, the flavor, is not fully preserved unless taken instantly.” . . .

I know that feeling. I’m bad with exact quotes. If I don’t write it down word for word when I hear it, I can never reconstruct it just right. It’s so frustrating. I don’t think I could ever be a playwright. My dialogue generator just doesn’t work so well. George V. Higgins I ain’t.

Hey—I caught a mistake! On p.235, describing the apartment of sociologist Edward Shils: “On the top shelf was a long row of the Journal of American Sociology.” No! He’s thinking of the American Journal of Sociology. Or maybe the American Sociological Review. Funny how that just stuck out like a sore thumb in my reading.

p.239, reporting a conversation with Bellow:

He [Bellow] was having fun. A famous writer who had never got over the “Trotsky worship” of the 1930s he dismissed as “a grade-school radical.” A well-known Oxford academic was “a twit.” Of a literary critic who had made a career out of the Transcendentalists: “He thinks mystique is a perfume.” I [Atlas] marveled at this unguardedness, at once so calculated and so naive. Bellow never said, “Don’t quote me” or “This is off the record.”

So, OK, then who was the “grade-school radical”? Who were the Oxford academic and the literary critic? I want to know this (extremely low-level) gossip. Or maybe Atlas is making a point by not saying who he’s talking about?

p.242, I learn that Bellow was nearly jailed for perjury, for lying about his income in a divorce proceeding. Wow! I guess that five marriages pretty much sucked up all his ready cash.

p.249, getting to some of Bellow’s political leanings:

He launched into a tirade about ‘affirmative suction’—he had a weakness for bad puns . . .

I admire Atlas for giving an actual bad pun here. So often when we hear that someone likes bad puns, we get examples that are actually funny—“groaners,” but funny in their own way. But “affirmative suction”: that’s not funny, even in a so-bad-it’s-good sense. It’s just kinda crude and stupid. No big deal—all of us say stupid things from time to time, and if a biographer followed me around all day, I’m sure I’d give him plenty of raw material to make me look bad, if he so chose—still, it’s a telling detail and I appreciate that Atlas included it, rather than just portraying Bellow as a lovable curmudgeon.

p.254: “By the time I left, I was way over my limit of Bellow exposure—the amount of time I could spend around him before I got Bellow burnout. So much concentration, combined with the suppression of self, was exhausting.” I can believe that.

Also on p.254, Atlas gives this charming slice-of-life of the biographer:

Late one night in the autumn of 1993, I flew into O’Hare and got my car from the Avis lot. I loved this part of the job [emphasis added]: tossing my suitcase into the back, hanging up my jacket on the plastic hook, and driving off in a bright-colored Chevrolet Impala, fiddling with the dial until I found WFMT, “Chicago’s classical music” station, 98.7 on the dial.

Again, he’s Everyman. David Owen, not David Foster Wallace. George Orwell, not George Gershwin. Edmund Wilson, not Vladimir Nabokov. And I’m happy to be in his company.


There is no such thing as Biography School, but if there were, Shils could have been its dean. Among the lessons he taught me: you had to place your subject in a historical context . . . you had to make people sound authentic . . . you had to listen to what people said and be skeptical about pronouncements that sounded smart but on closer scrutiny meant nothing . . .

Above all, you had to get your facts straight, however trivial they seemed (“There is no streetcar on 51st Street”), because if you got a fact wrong, even if no one noticed, it would set off a vibration of wrongness that made everything around it, all the facts and quotes and speculations, feel somehow off.

Are you listening, Marc Hauser? Brian Wansink? Susan Fiske?

Probably not. James Atlas never gave a Ted talk.

p.283, in a footnote:

“Narrative truth can be defined as the criterion we use to decide when a certain experience has been captured to our satisfaction; it depends on continuity and closure and the extent to which the fit of the pieces takes on an aesthetic finality.” Narrative Truth and Historical Truth: Meaning and Interpretation in Psychoanalysis, by Donald P. Spence, a book every biographer should read.

Interesting. I don’t like the use of the word “truth” to mean “coherence,” but on the other hand, ultimately only truth is coherent—as Mark Twain famously put it, if you tell the truth you don’t have to remember anything—so maybe this is ok after all.

p.287, footnoting a description of Nabokov as a “control freak”:

I [Atlas] have circled around this phrase, deleting and restoring it several times. It feels somehow too idiomatic, and therefore inappropriate, even faintly insulting to a master of usage like Nabokov. But isn’t the goal in writing to approximate ordinary speech? And Nabokov was a control freak. Stet.

I just love so much that Atlas cared about getting this just right. I feel the same way about each of my paragraphs—including those in my blogs.

p.295, Atlas reveals himself—briefly:

It was an older crowd, verging on the geriatric, but there were lots of younger people, too, in their thirties and forties. Bellow was read now by a new generation; he still had the goods.

You gotta be kind of old yourself to think of people in their thirties and forties as the new generation. I mean, sure, literally they are the ages of the children or grandchildren of Bellow’s first readers. But still.

p.296, Atlas talks about Bellow, Updike, and Roth. In addition to being dead white males, all three of these authors are striking to me as being perpetual children, never parents. Sure, Updike had 4 kids and Bellow had 3. But in their writings, even when they’re older, they still seem to approach the world as curious or sensitive or petulant children. They never seem to have the parental view of the world. This seems so sad to me. Having kids, if you have them, is such a central part of life. To have children but not let this affect you . . . it’s just too bad (also discussed here).

p.297, Atlas calls the Bellow home and reaches Saul’s young wife:

It was Janis who answered: “This is Mrs. Bellow.” She was friendly when I announced myself. “Hello, Jim Atlas,” she said pertly . . .

A vivid description in just a few words. Well done, Jim Atlas.

p.301, as the biography-writing continues and gets more challenging:

Bellow’s portrait was beginning to darken, like a negative exposed to light. Even his friends had unkind things to say. . . . I [Atlas] had a disagreeable interview with Mel Tumin . . . Tumin was hostile; he dwelled on his recent gallbladder operation, disparaged biography (“There’s no such thing as truth”), and assured me that Bellow’s girlfriends at the University of Chicago “weren’t pretty.” . . .

That last bit’s kinda funny, someone getting back at an old pal by disparaging the looks of his college girlfriends. But I guess the real message here is not to do any interviews right after major surgery.


“When is that book of yours coming out?” Bellow wrote me a few weeks before publication day. “I feel as if I should go off to Yemen.”

Yemen! Again, that just sounds sooooo Bellow. Amazing, that voice.

p.318: Atlas refers to someone informing him “with the kind of tedious precision that often attends recounting of wrongs . . .”

Indeed! I’ve done that sometimes, and I’m sure it’s tedious to others. I haven’t had a lot of wrongs done to me in my long life, but the ones that have, I’ll recount with tedious precision, that’s for sure.


I see Maggie Simmons, and we embrace. Maggie maintained a close relationship with Bellow for half a century and was, according to many, the love of his life.

The love of Bellow’s life! Here we are on page 326, the book is almost over, and this is the first time we hear about her? Or maybe Atlas is doing this on purpose, to keep adding twists to the story all the way to the very end? In any case, I feel a bit manipulated to have only heard about this person right now, so late in the book.

p.333: Atlas quotes critic James Wood as saying of Bellow being an inattentive parent:

“How, really, could the drama of paternity have competed with the drama of creativity?” asked Wood. For Bellow, the writing was the living.

I agree with Atlas that this is ridiculous. For one thing, it’s not like you need to be a creative artist to be a bad parent. Lots of people are bad parents without creating anything at all. Parenting takes work, that’s all.


History is ever regenerative. New subjects arise as the old ones disappear—including people we never heard of. Virginia Woolf asked: “Is not anyone who has lived a life, and left a record of that life, worthy of biography—the failures as well as the successes, the humble as well as the illustrious?” What about all the people I’ve known who didn’t leave records of their own lives? Don’t they deserve biographies, too? Sing now of Scottie A., my best friend when I was growing up in Highland Park, Illinois, who built snow forts with me in the days when there was snow, and who died of cancer at the age of fifty-eight, which maybe wasn’t such a terrible thing as he was about to be put on trial for securities fraud. . . .

That’s a cheap laugh, but a laugh nonetheless. OK, sad too. Anyway, Atlas makes this point well with that fine one-sentence mini-biography of the unfortunate Scottie A.

And, in a footnote on p.350:

It reminds me [Atlas] of the passage in Lord Jim where the young sailor on the deck of a ship bound for the East watches “the big ships departing, the broad-beamed ferries constantly on the move, the little boats floating far below his feet, with the hazy splendor of the sea in the distance, and the hope of a stirring life in the world of adventure.”

I’ve never read Lord Jim! I guess I should.

And now we’ve come to the end.

I’m so glad Atlas put in the effort to write this book. And it all makes me so nostalgic. I think I’m gonna read Catcher in the Rye again. And I really wish Atlas were still alive to read this.

Finally, if you’ll allow me a Geoff Dyer moment, I’ll say that I find Atlas’s book about his biography more compelling than Bellow’s novels—and also more compelling than I imagine Atlas’s biography of Bellow to be. But at this point I’m curious enough that I expect I will read that biography. I doubt I’ll get around to reading about Delmore Schwartz, though: his story just sounds too sad.

P.S. Jeez—I spent 2 hours writing this. Whoever of you reads this to the end . . . just remember, I wrote it for you.

Have prices risen more quickly for people at the bottom of the income distribution than for those at the top? Lefty window-breakers wait impatiently while economists struggle to resolve this dispute.

Palko points us to this post by Mike Konczal pointing to this news article by Annie Lowrey reporting on research by Christopher Wimer, Sophie Collyer, and Xavier Jaravel finding that “prices have risen more quickly for people at the bottom of the income distribution than for those at the top.”

This new result counters an earlier study that got a bit of attention back in 2008, which I’ll get back to in a bit.

Before getting to the main topic of this post, which has nothing really to do with income inequality, let me talk about all the statistical and political challenges here.

First the political challenges. All the people mentioned above are coming from the left or center-left in the U.S. context, generally supporting economic redistribution, government regulation of the economy, and taking the side of labor in disputes with business. All these positions are relative, of course—I don’t think there are any Soviet-style communists in the room—but they have a general motivation to report that things are relatively worse for the poor. This would generally be the case, and it’s even more so during a Republican administration.

If a Democrat is president, the political motivations are more mixed: on one hand, people on the left will still want to emphasize the difficulties of being poor, but at the same time, people on the right might want to talk about rising inequality as a way to discredit the Democrats.

I don’t want to overplay this political point here. People on all sides of this discussion may well be addressing the data as honestly as they can, but we should still recognize their political incentives.

The second political challenge is that I know two of the authors of this paper. I work with Chris Wimer and Sophie Collyer, and we’ve spent a lot of time talking about issues of measurement and poverty within households. I haven’t been involved in the particular work being discussed here—my contributions are with a survey of New York City families, and this appears to be a national study—but in any case I have this professional connection that you should be aware of.

The statistical challenge is that definitions of poverty depend, in large part, on survey responses and survey adjustments. I have no quick answers here, and I’ve not read this new study in detail. I just know that in economics, the data are not simply sitting there; they need to be constructed. And this can drive some of the differences in conclusions.

And there’s more. Go back to Konczal’s above-linked post and you’ll see a link to a post from Will Wilkinson in 2008 pointing to a Freakonomics blog post by Steven Levitt from that year. The link to Levitt’s post no longer seems to work, and the new link is missing the comment section, so I’ll point you to the Internet Archive version, where the rogue economist writes:

Inequality is growing in the United States. The data say so. Knowledgeable experts like Ben Bernanke say so. Ask just about any economist and they will agree. . . . According to two of my University of Chicago colleagues, Christian Broda and John Romalis, everyone is wrong.

Inequality has not grown over the last decade — at least not very much. What we think is a rise in inequality is merely an artifact of how we measure things.

As improbable as it may seem, I believe them.

Their argument could hardly be simpler. . . .

Let’s dissect this. The statement has to be an “improbable” surprise—“just about any economist” thinks the opposite—but yet it “could hardly be simpler.”

Levitt continues into a digression regarding “lefties” and “the sorts of people who break store windows in Davos,” which doesn’t seem to be so relevant, given that earlier he’d said that this new study was contradicting everyone, from Davos window-breakers to “just about any economist.”

Konczal also links to this 2008 post by sociologist Lane Kenworthy, who summarizes the argument of Broda and Romalis:

Income inequality has increased over time. But analysis of consumption data indicates that people with low incomes are more likely than those with high incomes to buy inexpensive, low-quality goods. In part because those goods increasingly are produced in China, their prices rose less between 1994 and 2005 than did the prices of goods the rich tend to consume. Hence the standard measure of inequality, which is based on income rather than consumption, greatly overstates the degree to which inequality increased. The incomes of the rich rose more than those of the poor, but because the cost of living increased more for the rich than for the poor, things more or less evened out.

The discussion then turns on the question of whether rich people are getting anything for these expensive purchases. Or, to put it another way, whether poor people are suffering for not being able to afford nice things. Kenworthy argues no.

Then again, I’ve collaborated with Kenworthy so maybe I’m more likely to hear out his arguments.

So, to summarize:

– In 2008 there was agreement, or tentative agreement, regarding the claim that the prices of things that poor people bought were going up more slowly than the prices of things that rich people bought. But there was disagreement about whether this should be taken to imply that consumption inequality was decreasing.

– As of 2019, it seems that the prices of things that poor people bought have been going up faster than the prices of things that rich people bought.

Is this a contradiction? I’m not sure. The time periods of the two studies differ: the 2008 study covers the 1994-2005 period, and the 2019 study covers the 2004-2018 period. So it’s possible that the poor people’s products had a relative decline in price for one decade, followed by a relative increase during the next. Also, the two studies are using different methods. It would be good if someone could apply the methods of the first study to the data of the second study, and vice-versa.

Putting this all together, you can see that the statistics and economics questions connect only tangentially to the political questions. Levitt was sharing an empirical claim, but it only took him a few paragraphs to start ranting about window-breaking leftists. Kenworthy accepted the empirical claim but refused to draw the same political conclusion. Now the empirical claim goes the other way, so the arguments about relevance can be spun in the opposite direction.

In saying all this, I’m not trying to imply that the economic questions are unimportant. I think it’s worth trying to measure these things carefully, even while interpretations can differ.

In this case, it’s a lot less effort for me to write a thousand words about the dispute, than to carefully read the two research articles and try to figure out exactly what’s going on. I skimmed through the Broda and Romalis article but then I got to Figure 4A which scared the hell out of me!

P.S. More here from Elena Botella.

Columbia statistics department is hiring!

Official announcement is below.

Please please apply to these faculty and postdoc positions. We really need some people who do serious applied work, especially in social sciences. Obv these will be competitive, but please give it a shot, because we’d like to have some strong applied candidates in the mix for all of these positions. Thanks!

The Department of Statistics at Columbia University is looking to fill multiple faculty positions. Please see here for full listings.

Tenure-Track Assistant Professor (review of applications begins on November 29, 2019) This is a tenure-track Assistant Professor position to begin July 1, 2020. A Ph.D. in statistics or a related field is required. Candidates will be expected to sustain an active research and publication agenda and to teach in the departmental undergraduate and graduate programs. The field of research is open to any area of statistics and probability.

Assistant Professor (limited-term) (*multiple openings*; review of applications begins on December 2, 2019) These are four-year term positions at the rank of Assistant Professor to begin July 1, 2020. A Ph.D. in statistics or a related field is required, as is a commitment to high-quality research and teaching in statistics and/or probability. Candidates will be expected to sustain an active research and publication agenda and to teach in the departmental undergraduate and graduate programs. Candidates with expertise in machine learning, big data, mathematical finance, and probability theory are particularly encouraged to apply.

Lecturer in Discipline (review of applications begins on January 6, 2020) This is a full-time faculty appointment with multi-year renewals contingent on successful reviews. This position is to contribute to the Departmental educational mission at the undergraduate and masters level.

The department currently consists of 35 faculty members and 59 Ph.D. students. The department has been expanding rapidly and, like the University itself, is an extraordinarily vibrant academic community. We are especially interested in candidates who, through their research, teaching and/or service, will contribute to the diversity and excellence of the academic community.

In addition to the above faculty positions, the department is also considering applications to our Distinguished Postdoctoral Fellowships in Statistics. Review of applications begins on January 13, 2020. See for details.

Women and minorities are especially encouraged to apply. For further information about the department and our activities, centers, research areas, and curricular programs, please go to our web page at

P.S. Thanks to Zad Chow for the above photo, which demonstrates how relaxed you’ll be as one of our colleagues here at Columbia.

The incentives are all wrong (causal inference edition)

I was talking with some people the other day about bad regression discontinuity analyses (see this paper for some statistical background on the problems with these inferences), examples where the fitted model just makes no sense.

The people talking with me asked the question: OK, we agree that the published analysis was no good. What would I have done instead? My response was that I’d consider the problem as a natural experiment: a certain policy was done in some cities and not others, so compare the outcome (in this case, life expectancy) in exposed and unexposed cities, and then adjust for differences between the two groups. A challenge here is the discontinuity (the policy was implemented north of the river but not south), but this sort of thing arises in many natural experiments. You have to model things in some way, make some assumps, no way around it. From this perspective, though, the key is that this “forcing variable” is just one of the many ways in which the exposed and unexposed cities can differ.
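To make the adjustment idea concrete, here is a minimal sketch in Python. It is not the analysis from any published paper: the eight cities, the covariate names (baseline income, kilometers north of the river), and the built-in policy effect of 2.0 years are all invented for illustration, and the outcome is generated without noise so that the adjusted estimate can be checked against a known answer. The forcing variable enters the regression as just another predictor.

```python
# Toy natural experiment. Every city and number here is hypothetical.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def mean(values):
    return sum(values) / len(values)


# (exposed, baseline_income, km_north_of_river); exposed cities lie north of the river.
cities = [
    (1, 50.0, 5.0), (1, 60.0, 10.0), (1, 70.0, 20.0), (1, 55.0, 8.0),
    (0, 30.0, -3.0), (0, 40.0, -12.0), (0, 45.0, -7.0), (0, 52.0, -15.0),
]

TRUE_EFFECT = 2.0  # assumed effect of the policy on life expectancy, in years

# Simulated outcome: the policy effect plus differences due to income and location.
life_exp = [70.0 + TRUE_EFFECT * e + 0.1 * inc + 0.05 * km for e, inc, km in cities]

# Naive comparison of exposed and unexposed cities.
raw_difference = (
    mean([y for (e, _, _), y in zip(cities, life_exp) if e == 1])
    - mean([y for (e, _, _), y in zip(cities, life_exp) if e == 0])
)

# Regression adjustment: intercept, exposure, income, and the forcing variable.
X = [[1.0, float(e), inc, km] for e, inc, km in cities]
adjusted_effect = ols(X, life_exp)[1]

print(f"raw difference:  {raw_difference:.2f}")
print(f"adjusted effect: {adjusted_effect:.2f}")
```

In this toy setup the raw difference overstates the built-in effect, because the exposed cities are also richer and farther north, while the adjusted coefficient recovers it. With real data the model is never known to be correct, which is exactly the "you have to make some assumps" part.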

After I described this possible plan of analysis, the people talking with me agreed that it was reasonable, but they argued that such an analysis could never have been published in a top journal. They argued that the apparently clean causal identification of the regression discontinuity analysis made the result publishable in a way that a straightforward observational study would not be.

Maybe they’re right.

If so, that’s really frustrating. We’ve talked a lot about researchers’ incentives to find statistical significance, to hype their claims and not back down from error, etc., as well as flat-out ignorance, as in the above example, researchers naively thinking that some statistical trick can solve their data problems. But this latest thing is worse: the idea that a better analysis would have a lower chance of being published in a top journal, for the very reasons that make it better. Talk about counterfactuals and perverse incentives. How horrible.

Filling/emptying the half empty/full glass of profitable science: Different views on retiring versus retaining thresholds for statistical significance.

Unless you are new to this blog, you likely will know what this is about.

Now, by profitable science in the title is meant repeatedly producing logically good explanations which “through subjection to the test of experiment, to lead to the avoidance of all surprise and to the establishment of a habit of positive expectation that shall not be disappointed.” CS Peirce

It all started with a Nature commentary by Valentin Amrhein, Sander Greenland, and Blake McShane. Then the discussion, then thinking about it, then an argument that it is sensible and practical, then an example of statistical significance not working and then a dissenting opinion by Deborah Mayo.

Notice the lack of finally!

However, Valentin Amrhein, Sander Greenland, and Blake McShane have responded with a focused and concise explanation of why they think retiring statistical significance will fill up the glass of profitable science, while maintaining hard default thresholds for declaring statistical significance will continue to empty it. Statistical significance gives bias a free pass. This is their just-published letter to the editor (JPA Ioannidis) on TA Hardwicke and JPA Ioannidis’ Petitions in scientific argumentation: Dissecting the request to retire statistical significance, where Hardwicke and Ioannidis argued (almost) the exact opposite.

“In contrast to Ioannidis, we and others hold that it is using – not retiring – statistical significance as a ‘filtering process’ or ‘gatekeeper’ that ‘gives bias a free pass’.”

A two sentence excerpt that I liked the most was “Instead, it [retiring statistical significance] encourages honest description of all results and humility about conclusions, thereby reducing selection and publication biases. The aim of single studies should be to report uncensored information that can later be used to make more general conclusions based on cumulative evidence from multiple studies.”

However, the full letter to the editor is only slightly longer than two pages – so should be read in full – Statistical significance gives bias a free pass.

I also can’t help but wonder how much of the discussion that ensued from the initial Nature commentary could have been avoided if less strict page limitations had been allowed.

Now it may seem strange for an editor who is also an author on the paper drawing a critical letter to the editor to accept that letter. It happens, but not always. I also submitted a letter to the editor on this same paper and the same editor rejected it without giving a specific reason. That full letter of mine is below for those who might be interested.

My letter was less focused but had three main points: someone with a strong position on a topic who undertakes a survey themselves displaces the opportunity for others without such strong positions to learn more; univariate summaries of responses can be misleading; and (minor) pre-registration violations and comments (only given in the appendix) can provide insight into the quality of the design and execution of the survey. For instance, the authors had anticipated analyzing nominal responses with correlation analysis.


“The paper has been blind peer-reviewed and published in a highly reputable journal, which is the gold standard in scientific corroboration. Thus, all protocol was followed to the letter and the work is officially supported.”

Robert MacDonald points us to this news article by Esther Addley:

It’s another example of what’s probably bad science being published in a major journal, where other researchers point out its major flaws and the author doubles down.

In this case, the University of Bristol has an interesting reaction. It’s pulled down its article praising the research, which is good, but it’s also distancing itself from him. Whereas it was originally very happy to associate itself with this work, now they’re saying it was done independently and has nothing to do with Bristol. I’m actually pretty disappointed in that, partly because they can’t have it both ways but also because it seems (to me) like it weakens the university-faculty relationship.

The author’s response (dripping with arrogance) is a concise summary of the sort of “published research is unquestionable” mentality you’ve been talking about. As quoted in the article:

The paper has been blind peer-reviewed and published in a highly reputable journal, which is the gold standard in scientific corroboration. Thus, all protocol was followed to the letter and the work is officially supported. Given time, many scholars will have used the solution for their own research of the manuscript and published their own papers, so the small tide of resistance will wane.

I find it particularly interesting that he’s arguing not that others will see he’s right, but that other people will start using his results — so I guess the resistance will dry up because his results will become embedded in the fabric of the whole field.

Yup. The research incumbency rule. Just horrible.

How to teach sensible elementary statistics to lower-division undergraduates?

Kevin Carlson writes:

Though my graduate education is in mathematics, I teach elementary statistics to lower-division undergraduates.

The traditional elementary statistics curriculum culminates in confidence intervals and hypothesis tests. Most students can learn to perform these tests, but few understand them. It seems to me that there’s a great opportunity to reform the elementary curriculum along Bayesian lines, but I also see no texts that attempt to bring Bayesian techniques below the prerequisite level of calculus and linear algebra. Do you think it’s currently possible to teach elementary stats in a Bayesian way? If not now, what might need to happen before this became possible?

My reply:

I do think there’s a better way to teach introductory statistics but I’m not quite there yet. I think we’d want to do it using simulation, but inference is a sticking point.

To start with, let’s consider three levels of intro stat:

1. The most basic, “stats for poets” class that provides an overview but few skills and no derivations. Currently this seems to usually be taught as a baby version of a theoretical statistics class, and that doesn’t make sense. Instead I’m thinking of a course where each week is a different application area (economics, psychology, political science, medicine, sports, etc.) and then the concepts get introduced in the context of applications. Methods would focus on graphics and simulation.

2. The statistics course that would be taken by students in social science or biology. Details would depend on the subject area, but key methods would be comparisons/regression/anova, simple design of experiments and bias adjustment, and, again, simulation and graphics. The challenge here is that we’d want some inference (estimates and standard errors, and, at the theoretical level, discussions of bias and variance) but this all relies on concepts such as expectation, variance, and some version of Bayesian inference, and all of these can only be taught at a shallow level.

3. A statistics class with mathematical derivations. For this you should be able to teach the material any way you want, but in practice these classes have a pretty shallow mathematical level and give pseudo-proofs of the key results. I don’t think there’s any way to teach statistics rigorously in one semester from scratch. You really need that one semester on probability theory first.

Option #2 above is closest to what I teach, and it’s what Jennifer and Aki and I do in our forthcoming book, Regression and Other Stories. We do lots of computing, and we keep the math to a minimum. Bayes is presented as a way of propagating error in predictions, and a way to include prior information in an analysis. We don’t do any integrals.
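As a rough illustration of what "propagating error in predictions" by simulation can look like, here is a small Python sketch. It is not taken from the book: the fitted line, its standard errors, the residual standard deviation, and the new predictor value are all made up, and the intercept and slope are drawn as independent normals, which ignores the correlation a real fit would have. No integrals required.

```python
import random
import statistics

random.seed(1)

# Hypothetical fitted regression y = a + b*x + error; all numbers are invented.
a_hat, a_se = 1.0, 0.2   # intercept estimate and its standard error
b_hat, b_se = 0.5, 0.1   # slope estimate and its standard error
sigma = 1.0              # residual standard deviation, treated as known
x_new = 4.0              # predictor value for the new case

n_sims = 20000
point_only = []  # predictions that ignore uncertainty in a and b
full = []        # predictions that propagate uncertainty in a and b

for _ in range(n_sims):
    # Plug in the point estimates and add only residual noise.
    point_only.append(a_hat + b_hat * x_new + random.gauss(0, sigma))

    # Draw the coefficients first (independent normals, a simplification),
    # then draw the outcome given those coefficients.
    a = random.gauss(a_hat, a_se)
    b = random.gauss(b_hat, b_se)
    full.append(a + b * x_new + random.gauss(0, sigma))

sd_point_only = statistics.stdev(point_only)
sd_full = statistics.stdev(full)

print(f"predictive sd, point estimates only:  {sd_point_only:.2f}")
print(f"predictive sd, parameter uncertainty: {sd_full:.2f}")
```

The second set of simulations is more spread out than the first, because uncertainty about the coefficients adds to the residual variation. Students can see this by making histograms of the two sets of draws, without any derivations.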

I’m not yet sure how to do the intro stat course. Regression and Other Stories starts from scratch, but the students who take that class have already taken introductory statistics somewhere else.

For that first course, I think we need to teach the methods and the concepts, without pretending to have the derivations. Students who want the derivations can go back and learn probability theory and theoretical statistics.

Hey, Stan power users! PlayStation is Hiring.

Imad writes:

The Customer Lifecycle Management team at PlayStation is looking to hire a Senior Data Modeler (i.e. Data Scientist). DM me if you like building behavioral models and working with terabytes of data. You’ll have the opportunity to use whatever tools you want (e.g. Stan) to build your models.

I’m not into videogames myself, but for the right person I’m guessing this job would be a lot of fun.

The dropout rate in his survey is over 60%. What should he do? I suggest MRP.

Alon Honig writes:

I work for a CPG company that conducts longitudinal surveys for analysis of customer behavior. In particular they wanted to know how people are interacting with our product. Unfortunately the designers of these surveys put in so many questions (100+) that the dropout rate (those that did not complete the survey) was over 60%. The researchers of the data (all had an academic background) told me that this drop rate was in fact quite normal for such studies. In the past when I did marketing analysis we would start getting concerned about a dataset when the dropout rate was above 20%. That is because we knew there was something strange about the remaining population, making inference on the general population faulty. The research team acknowledged the issue but didn’t seem to be concerned about the bias of their findings.

I wanted to know how we should think about the dropout rate after conducting a survey. Does this mean we should create a new one? Or should we adjust our results to account for this? What is a reasonable rate anyways?

My quick answer is to do multilevel regression and poststratification to adjust for known differences between sample and population. Use multilevel regression to model each outcome of interest, conditional on whatever variables you think are predictive of dropout and the outcome. In your regression, include interactions of these predictors with anything you care about in your modeling. Then use poststratification to take the predictions from your model and average them over your population.
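The poststratification arithmetic can be sketched in a few lines of Python. This is a toy, not a full MRP: the three age cells, their completer counts, observed means, and population counts are invented, and a simple precision-weighted shrinkage of cell means (with assumed values for the within-cell and between-cell standard deviations) stands in for the multilevel regression, which in a real analysis you would fit with predictors and interactions in something like Stan.

```python
# Toy version of "partially pool, then poststratify". All numbers are hypothetical.

# cell -> (completers n, mean outcome among completers, population count N)
cells = {
    "18-29": (20, 0.30, 4000),
    "30-49": (80, 0.50, 3500),
    "50+": (300, 0.70, 2500),
}

SIGMA = 0.5  # assumed within-cell sd of the outcome
TAU = 0.2    # assumed between-cell sd of the true cell means


def partially_pooled(cells, sigma, tau):
    """Shrink each cell mean toward the average of the cell means.

    Stand-in for the multilevel regression step: cells with few completers
    are pulled more strongly toward the overall level than large cells.
    """
    mu = sum(ybar for _, ybar, _ in cells.values()) / len(cells)
    pooled = {}
    for name, (n, ybar, _) in cells.items():
        w_data = n / sigma**2   # precision of the cell's own mean
        w_group = 1 / tau**2    # precision of the group-level distribution
        pooled[name] = (w_data * ybar + w_group * mu) / (w_data + w_group)
    return pooled


def poststratify(cell_estimates, cells):
    """Average cell estimates using population counts as weights."""
    total = sum(N for _, _, N in cells.values())
    return sum(cell_estimates[name] * N for name, (_, _, N) in cells.items()) / total


# Unadjusted mean of the completers, dominated by the cell with the least dropout.
raw_mean = (
    sum(n * ybar for n, ybar, _ in cells.values())
    / sum(n for n, _, _ in cells.values())
)

# Poststratification of the raw cell means (no pooling).
poststratified_raw = poststratify({k: v[1] for k, v in cells.items()}, cells)

# Poststratification of the partially pooled cell estimates.
mrp_style_estimate = poststratify(partially_pooled(cells, SIGMA, TAU), cells)

print(f"raw mean of completers:    {raw_mean:.3f}")
print(f"poststratified, no pooling: {poststratified_raw:.3f}")
print(f"pooled and poststratified:  {mrp_style_estimate:.3f}")
```

In this made-up example the completers over-represent the oldest cell, so the raw mean sits well above either poststratified estimate. The adjustment only corrects for differences captured by the cells; dropout that depends on unmeasured factors is untouched.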

And, yes, you should still try your best to minimize dropout, and to identify what factors determine dropout so that you can try to measure them and include them in your model.

P.S. Just to clarify: I’m not saying that MRP automatically solves this problem. What I’m saying is that MRP is a framework that can allow us to attack the problem.