Keith O’Rourke

From his obituary:

Keith worked in landscaping throughout his undergraduate years at the University of Toronto, then at Moore Business Forms, and in the western and northern provinces and territories in the field of compressed air before returning to UofT to complete an MBA and undertake an MSc. For many years he worked as a biostatistician at the Toronto General Hospital and the Ottawa Hospital on numerous studies in the fields of cancer, diabetes, SARS and infectious diseases research before completing a PhD at Oxford University in 2004. Having worked at Duke University, McGill and Queen’s, he joined Health Canada in the late 2000s as a biostatistician, initially in health care and then in pesticide management. A conscientious intellectual and deep-thinker, Keith endeavoured to make our world a better and safer place.

Longtime readers of this blog will recognize Keith for his occasional posts and many comments and his idealistic and skeptical take on medical statistics. We’re all sorry to hear that he is gone.

Centering predictors in Bayesian multilevel models

Someone who goes by the name John Snow writes:

I’ve been moving into Bayesian modeling and working to translate my understanding and approach to a probabilistic framework. This is a general question about using mean centering to handle time-varying predictors in hierarchical Bayesian models for longitudinal data.

To motivate this question, imagine I have a dataset where a group of people rated their sleep quality and recorded the number of alcoholic drinks they had the day before for 30 days. I want to estimate the relationship between alcohol consumption and sleep quality over time. I land on a multilevel model with random intercepts and slopes; time points are at level 1 and people are at level 2. Pretty straightforward.

One recommendation for handling a time-varying predictor like alcohol consumption would be to create two versions using mean centering: one version is person-centered, where you subtract a person’s mean alcohol consumption from all their values (Xi – X_bar); the other is grand-mean-centered, where you take the person’s mean alcohol consumption and subtract the grand mean of alcohol consumption (X_bar – X_gm). (As I understand it, the subtracted value could actually be any constant, but it seems like the grand mean is used for convenience most of the time.) You would then enter the person-centered version as a level-1 predictor and use the grand-mean-centered version to explain random intercept and slope variance. The idea is that person-centering isolates the within-person variation and grand-mean centering isolates the between-person variation. If instead you entered alcohol consumption into the model as a level-1 fixed effect without mean centering, the resulting estimate would capture a mixture of within-person and between-person variance. Lesa Hoffman has called this a “smushed” effect.

A lot has been written about mean centering in the frequentist MLM literature, and there is a lot of debate and argument about when and how to use mean centering for substantive reasons beyond just making the intercept interpretable. However, I’ve not seen any discussion of this topic in books on the subject or seen it used in code examples (I’m primarily using PyMC). I can’t help but wonder why that is. Is it because mean centering isn’t really needed in a Bayesian MLM? Or is it just a function of the way people think about and approach MLMs in Bayesian stats?

My reply:

Yes, this topic has come up from time to time, for example:

from 2006: Fitting multilevel models when predictors and group effects correlate

from 2008: “Beyond ‘Fixed Versus Random Effects’”

from 2015: Another example of why centering predictors can be good idea

from 2017: Fitting multilevel models when predictors and group effects correlate
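
To make the two kinds of centering described in the question concrete, here is a minimal sketch in Python/pandas with made-up toy data (the variable names are hypothetical; the model-fitting step in PyMC or Stan is unchanged, you just pass in the two derived columns instead of the raw predictor):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy longitudinal data: 50 people, 30 days each, daily drink counts (made up).
n_people, n_days = 50, 30
df = pd.DataFrame({
    "person": np.repeat(np.arange(n_people), n_days),
    "drinks": rng.poisson(2, n_people * n_days),
})

person_mean = df.groupby("person")["drinks"].transform("mean")
grand_mean = df["drinks"].mean()

# Person-centered (within-person) predictor, entered at level 1: X_ij - X_bar_i
df["drinks_within"] = df["drinks"] - person_mean

# Grand-mean-centered person mean (between-person), entered at level 2: X_bar_i - X_gm
df["drinks_between"] = person_mean - grand_mean
```

The coefficient on drinks_within then reflects within-person variation and the coefficient on drinks_between reflects between-person variation, which is exactly the separation the question describes; entering the raw drinks column instead gives the “smushed” mix of the two.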

Update 3 – World Cup Qatar 2022 predictions (round of 16)

World Cup 2022 is progressing, with many good matches and much entertainment. Time then for World Cup 2022 predictions of the round-of-16 matches from our DIBP model – here is the previous update. In the group-stage matches, the average of the model probabilities for the actual final results was about 0.52.

Here are the posterior predictive match probabilities for the held-out matches of the Qatar 2022 round of 16, to be played from December 3rd to December 6th, along with some ppd ‘chessboard plots’ for the exact outcomes in grayscale: ‘mlo’ in the table denotes the ‘most likely result’, whereas darker regions in the plots correspond to more likely results. In the plots below, the first team listed in each subtitle is the ‘favorite’ (x-axis), whereas the second team is the ‘underdog’ (y-axis). The 2-way grid displays the 8 held-out matches in such a way that closer matches appear at the top-left of the grid, whereas more unbalanced matches (‘blowouts’) appear at the bottom-right. The matches are then ordered from top-left to bottom-right in terms of increasing winning probability for the favorite teams. The table instead reports the matches in chronological order.

Apparently, Brazil is a strong favorite against South Korea, and Argentina seems well ahead of Australia, whereas the model predicts close matches for Japan-Croatia, Netherlands-United States, and Portugal-Switzerland. Note: keep in mind that these probabilities refer to regular time, i.e., within the 90 minutes. The model does not capture probabilities for extra time.
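
To illustrate what the chessboard of exact outcomes and the ‘mlo’ column represent, here is a minimal sketch; it is not the DIBP model itself, just two independent Poisson scoring rates with made-up values:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical scoring rates for the favorite and the underdog (made up).
lam_fav, lam_dog = 1.8, 0.7
goals = np.arange(11)  # truncate the score grid at 10 goals each

# Grid of exact-score probabilities: rows = favorite's goals, columns = underdog's.
# This is the quantity the chessboard plots show in gray scale.
p = np.outer(poisson.pmf(goals, lam_fav), poisson.pmf(goals, lam_dog))

# Most likely exact result ('mlo' in the table).
i, j = np.unravel_index(p.argmax(), p.shape)
print(f"mlo: {i}-{j}, probability {p[i, j]:.3f}")

# Win/draw/loss probabilities within regular time.
print(f"favorite win: {np.tril(p, -1).sum():.2f}, "
      f"draw: {np.trace(p):.2f}, "
      f"underdog win: {np.triu(p, 1).sum():.2f}")
```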

You can find the complete results, R code, and analysis here. Some preliminary notes and model limitations can be found here.

Next steps: we’ll update the predictions for the quarterfinals. We are still discussing the possibility of reporting some overall World Cup winning probabilities, even though I am personally not a huge fan of these ahead predictions (even coding this scenario is not straightforward…!). However, we know those predictions could be really amusing for fans, so maybe we will report them after the round of 16. We could also post some posterior predictive checks for the model and more predictive performance measures.

Stay tuned!

The “noble lie” in science reporting

David Weakliem writes:

A few days ago, the New York Times had an opinion piece by Huw Green, a clinical psychologist, which said “A clear causal link between psychiatric illness and gun violence has not been established…” I followed the link, which was an interview with Ragy Girgis, a professor of psychiatry at Columbia University. That story had a caption saying “Findings from the Columbia database help dispel the myth that having a severe psychiatric illness is predictive of who will perpetrate mass murder.” It also contained a link to an article by Dr Girgis and others using the database (an attempt to compile a comprehensive list of mass murders since 1900), which said “the prevalence of psychotic symptoms among mass murderers is much higher than that in the general population (11% v. approximately 0.3-1%).” That is, people with psychotic symptoms were between 10 and 30 times more likely to commit mass murder than people without psychotic symptoms.

Weakliem continues:

How did we go from 10 to 30 times more likely to “dispel the myth”? The interviewer asked “Are people with mental health disorders more likely to commit mass shootings or mass murder?”

Good question. Here’s what Weakliem suggests:

This could just be a case of miscommunication–anything involving probabilities can be confusing. However, I [Weakliem] think it’s an example of a more general problem: sometimes a focus on making sure that people don’t draw the wrong conclusions comes at the expense of explaining what the research actually found.

I [Weakliem] first noted this when writing a post on a study of coffee consumption, where accounts emphasized a point that wasn’t supported by the data: that benefits only occurred with moderate consumption, not high consumption. I saw another example later in the summer, when a study of diet and exercise was described as showing “that healthy eating and regular workouts do not, in isolation, stave off later health issues. They need to be done together.” In fact, the study suggested exactly the opposite—exercise and diet had additive effects on mortality, and no interactions were found. The reason seemed to be a goal of getting people to think of exercise as a way to improve one’s overall health rather than a way of getting away with a bad diet . . .

I think Weakliem has a point, that health reporting often seems to have two conflicting aims: (1) to report the science, and (2) to encourage healthy behavior, and these goals can interfere. An additional twist comes when bad science is being reported, and journalists put skepticism aside when reporting claims that support their political or scientific ideologies.

Alison Bechdel (4) vs. Willie Nelson; Hammurabi advances

We had a good exchange the other day. Manuel led off with a strong argument for Kahlo:

I can’t believe Frida is unseeded, what’s cooler than the combination of art, politics, sex & tequila shots?

Regarding the current competition, can you imagine how physically punishing it can be to attend a Hammurabi seminar? His main publication weighs almost 4 tons! What if he starts to distribute handouts and you forgot to bring a forklift?

But then Anonymous Pigeon bounced it right back:

Well then you can simply take your favorite law home with you to hang on your wall. Just jot a mental note to bring a crane.

Also strong supporting arguments from Raghu:

There have been so many blog posts here about bad / inept / fraudulent scientific studies that I’m warming to the idea of “physically punishing the perpetrator.”

With two more commenters adding a solid one-two punch:

John: I really wanna see what Hammurabi does with PowerPoint.

Jeff: More than enough bullets for everyone.

So the lawgiver it is! Just bring a crane.

Today’s matchup

Alison Bechdel is fourth seeded in the “Creators of laws or rules” category (see some blog discussion here). Willie Nelson is . . . an alleged tax cheat. (If someone’s in this competition and you can’t guess their category, then the “Alleged tax cheats” category is a good place to check.)

Who should it be? On one hand, Bechdel created the Bechdel Test; long after her books have been forgotten, we’ll still have her Test. On the other hand, the theater production of Fun Home was excellent but had mediocre music; I’m guessing Nelson could’ve helped with that. On the other other hand, with no tax money our government would fall apart: no public schools, libraries, parks, fire departments, police departments, army, navy, NSF, NIH, . . . to be honest, Columbia would probably no longer have the funds to run this seminar. So it’s a tough call, and we need your help with some witty arguments!

Again, here are the announcement and the rules.

Not frequentist enough.

I think that many mistakes in applied statistics could be avoided if people were to think in a more frequentist way.

Look at it this way:

In the usual way of thinking, you apply a statistical procedure to the data, and if the result reaches some statistical-significance threshold, and you get similar results from a robustness study, changing some things around, then you’ve made a discovery.

In the frequentist way of thinking, you consider your entire procedure (all the steps above) as a single unit, and you consider what would happen if you apply this procedure to a long series of similar problems.

The first thing to recognize is that the frequentist way of thinking requires extra effort: you need to define this potential series of similar problems and then either do some mathematical analysis or, more likely, set up a simulation on the computer.

In the usual way of teaching statistics, the extra effort required by the frequentist approach is not clear, for two reasons. First, textbooks present the general theory in the context of very simple examples such as linear models with no selection, where there are simple analytic solutions. Second, textbook examples of statistical theory typically start with an assumed probability model for the data, in which case most of the hard work has already been done. The model is just there, postulated; it doesn’t look like a set of “assumptions” at all. It’s the camel that is the likelihood (although, strictly speaking, the likelihood is not the data model; additional assumptions are required to go from an (unnormalized) likelihood function to a generative model for the data).

An example

To demonstrate this point, I’ll use an example from a recent article, Criticism as asynchronous collaboration: An example from social science research, where I discussed a published data analysis that claimed to show that “politicians winning a close election live 5–10 years longer than candidates who lose,” with this claim being based on a point estimate from a few hundred elections: the estimate was statistically significantly different from zero and similar estimates were produced in a robustness study in which various aspects of the model were tweaked. The published analysis was done using what I describe above as “the usual way of thinking.”

Now let’s consider the frequentist approach. We have to make some assumptions. Suppose to start with that losing an election has the effect of increasing your lifespan by X years, where X has some value between -1 and 1. (From an epidemiological point of view, an effect of 1 year is large, really on the high end of what could be expected from something as indirect as winning or losing an election.) From there you can work out what might happen from a few hundred elections, and you’ll see that any estimate will be super noisy, to the extent that if you fit a model and select on statistical significance, you’ll get an estimated effect that’s much higher than the real effect (a large type M error, as we say). You’ll also see that, if you want to get a large effect (large effects are exciting, right?), then you’ll want the standard error of your estimate to be larger, and you can get this by the simple expedient of predicting future length of life without including current age as a predictor. For more discussion of all these issues, see section 4 of the linked article. My point here is that whatever analysis we do, there is a benefit to thinking about it from a frequentist perspective—what would things look like if the procedure were applied repeatedly to many datasets?—rather than fixating on the results of the analysis as applied to the data at hand.
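
Here is a minimal simulation sketch of that exercise (all numbers are assumptions made up for illustration, not taken from the published analysis): assume a modest true effect, generate the sampling distribution of the estimate from a few hundred elections, select on statistical significance, and look at the size of the estimates that survive the filter.

```python
import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.5    # assumed true effect on lifespan, in years (made up)
n_elections = 400    # a few hundred close elections
sd_lifespan = 10.0   # assumed residual sd of remaining lifespan, in years
n_sims = 10_000

# Standard error of a difference in mean lifespan between winners and losers.
se = sd_lifespan * np.sqrt(2 / (n_elections / 2))

# Sampling distribution of the estimated effect, then selection on significance.
estimates = rng.normal(true_effect, se, n_sims)
significant = np.abs(estimates) > 1.96 * se

print(f"standard error: {se:.2f} years")
print(f"probability of statistical significance: {significant.mean():.2f}")
print(f"mean |estimate| among significant results: "
      f"{np.abs(estimates[significant]).mean():.1f} years")
```

Under these assumptions the estimate only reaches significance when it is several times larger than the true effect, which is the type M error in action; and inflating the standard error (for example, by not adjusting for current age) makes the surviving estimates larger still.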

“But shouldn’t we prefer these outsized delusions . . .”: Malcolm Gladwell in a nutshell

I was reading a recent New Yorker and what should I come to but a Malcolm Gladwell article. With the same spirit that leads us to gawk at car crashes, I read it.

I gotta give Gladwell some credit for misdirection on this one. It was an article about corporate executive and financial fraudster Jack Welch, and in the magazine’s table of contents the article was listed as, “General Electric’s legendary C.E.O.” The article itself was titled, “Severance: Jack Welch was the most admired C.E.O. of his day. Have we learned the right lessons from him?” And, right near the beginning of the article is the Gladwellesque line, “The great C.E.O.’s have an instinct for where to turn in a crisis, and Welch knew whom to call.”

“The great C.E.O.’s” . . . nice one! But then, as the article goes on, Gladwell ultimately gives it an anti-Welch spin, arguing that the famously ruthless executive had no values. An interesting twist, actually. As I said, a nice bit of misdirection, which the New Yorker kinda ruined in its online edition by changing the title to, “Was Jack Welch the Greatest C.E.O. of His Day—or the Worst? As the head of General Electric, he fired people in vast numbers and turned the manufacturing behemoth into a financial house of cards. Why was he so revered?” Kind of gives the game away, no?

“But shouldn’t we prefer these outsized delusions . . .”

What really jumped out at me when reading this article, though, was not the details about Welch—some guy who had a talent for corporate infighting and was willing to cheat to get what he wanted—but this bit from Gladwell:

It has become fashionable to deride today’s tech C.E.O.s for their grandiose ambitions: colonizing Mars, curing all human disease, digging a world-class tunnel. But shouldn’t we prefer these outsized delusions to the moral impoverishment of Welch’s era?

This is horrible in so many ways.

First, there’s the empty, tacky, “It has become fashionable” framing. I can just imagine this dude when Copernicus came out with his ideas. “It has become fashionable to say that the Earth goes around the Sun. But . . .” Or, during the mid-twentieth century, “It has become fashionable to claim that cigarette smoking causes cancer. But . . .” Or, more recently, “It has become fashionable to claim that university officials should take responsibility when children are being sexually abused on campus. But . . .” Or, “It has become fashionable to argue that planes take off into the wind. But . . .”

I absolutely detest when writers take an idea they disagree with and label it as “fashionable,” as if it makes them adorable rogues to take the other side. What next, a hot take that Knives Out was really a bad movie? After all, it sold a lot of tickets and the critics loved it. It’s “fashionable,” right? Let me, right here, stake out the contrarian position that a take can be unfashionable, contrarian, and dumb.

And then the three examples: “colonizing Mars, curing all human disease, digging a world-class tunnel.” Which one does not belong, huh?

– “Colonizing Mars” may never happen, and it might be a bad idea even if it could happen (to the extent that such a hugely expensive project would take resources away from more pressing concerns), but it’s undeniably cool, and it’s bold. OK, the concept of colonizing Mars isn’t so bold—it’s century-old science fiction—but to actually do it, yeah, that would be awesome.

– “Curing all human disease”: that would be absolutely wonderful. I can only assume it’s an impossible goal, but it would be amazing to get just part of the way there, and there’s no logical reason that some progress can’t be made. I can see how this would appeal to a tech C.E.O., or to just about anyone.

– “Digging a world-class tunnel” . . . Huh? That’s not much of an ambition at all! World-class tunnels already exist! It’s hardly an “outsized delusion” to want to do this. All you need is a pile of money and a right-of-way. But . . . when referring to a “world-class tunnel” Gladwell couldn’t possibly be referring to this public relations stunt, could he?

Anyway, kind of revealing that he puts digging a tunnel in the same category as colonizing Mars or curing all human disease. I guess those Hyperloop press releases really worked on him!

In any case, the idea that “outsized delusions” are a good thing: it’s just kinda funny to hear this, but maybe not such a surprise coming from Gladwell. I was curious about his take on other executives with outsized delusions, so I googled *gladwell theranos* and came across this interview where he answers a question about “The book I couldn’t finish”:

I [Gladwell] don’t finish books all the time. But the last book I couldn’t finish? I really, really wanted to finish John Carreyrou’s book on the Theranos scam, Bad Blood. But halfway through, I started saying to myself: “I get it! I get it! She made it all up!”

“Halfway through,” huh? I think all the other readers of that book caught on to what was going on within the first few pages.

Gladwell sounds like the kind of guy who turns off the Columbo episode after 45 minutes because he’s finally figured out who the killer is.

The more I thought about them, the less they seemed to be negative things, but appeared in the scenes as something completely new and productive

This is Jessica. My sabbatical year, which most recently had me in Berkeley, CA, is coming to an end. For the second time since August I was passing through Iowa. Here it is on the way out to California from Chicago and on the way back.

[Photos: a park in Iowa in August, and a park in Iowa in November.]

If you squint (like, really really squint), you can see a bald eagle overhead in the second picture.

One association that Iowa always brings to mind for me is that Arthur Russell, the musician, grew up there. I have been a fan of Russell’s music for years, but somehow had missed Iowa Dream, released in 2019 (Russell died of AIDS in 1992, and most of his music has been released posthumously). So I listened to it while we were driving last week. 

Much of Iowa Dream is Russell doing acoustic and lofi music, which can be surprising if you’ve only heard his more heavily produced disco or minimalist pop. One song, called Barefoot in New York, is sort of an oddball track even amidst the genre blending that is typical of Russell. It’s probably not for everyone, but as soon as I heard it I wanted to experience it again. 

NPR called it “newfound city chaos” because Russell wrote it shortly after moving to New York, but there’s also something about the rhythm and minutiae of the lyrics that kind of reminds me of research. The lyrics are tedious, but things keep moving like you’re headed towards something. The speech impediment evokes getting stuck at times and having to explore one’s way around the obstruction. Sometimes things get clear and the speaker concludes something. Then back to the details that may or may not add up to something important. There’s an audience of backup voices who are taking the speaker seriously and repeating bits of it, regardless of how inconsequential. There’s a sense of bumbling yet at the same time iterating repeatedly on something that may have started rough but becomes more refined.

Then there’s this part:

I really wanted to show somehow how things deteriorate

Or how one bad thing leads to another

At first, there were plenty of things to point to

Lots of people, places, things, ideas

Turning to shit everywhere

I could describe these instances

But the more I thought about them

The less they seemed to be negative things

But appeared in the scenes as something completely new and productive

And I couldn’t talk about them in the same way

But I knew it was true that there really are

Dangerous crises

Occurring in many different places

But I was blind to them then

Once it was easy to find something to deplore

But now it’s even worse than before

I really like these lyrics, in part because they make me uncomfortable. On the one hand, the idea of wanting to criticize something, but losing the momentum as things become harder to dismiss closer up, seems opposite of how many realizations happen in research, where a few people start to notice problems with some conventional approach and then it becomes hard to let them go. The replication crisis is an obvious example, but this sort of thing happens all the time. In my own research, I’ve been in a phase where I’m finding it hard to unsee certain aspects of how problems are underspecified in my field, so some part of me can’t relate to everything seeming new and productive. 

But at the same time the idea of being won over by what is truly novel feels familiar when I think about the role of novelty in defining good research. I imagine this is true in all fields to some extent, but especially in computer science, there’s a constant tension around how important novelty is in determining what is worthy of attention. 

Sometimes novelty coincides with fundamentally new capabilities in a way that’s hard to ignore. The reference to potentially “dangerous crises” brings to mind the current cultural moment we’re having with massive deep learning models for images and text. For anyone coming from a more classical stats background, it can seem easy to want to dismiss throwing huge amounts of unlabeled data at too-massive-and-ensembled-to-analyze models as a serious endeavor… how does one hand off a model for deployment if they can’t explain what it’s doing? How do we ensure it’s not learning spurious cues, or generating mostly racist or sexist garbage? But the performance improvements of deep neural nets on some tasks in the last 5 to 10 years are hard to ignore, and phenomena like how deep nets can perfectly interpolate the training data but still not overfit, or learn intermediate representations that align with ground truth even when fed bad labels, make it hard to imagine dismissing them as a waste of our collective time. Other areas, like visualization, or databases, start to seem quaint and traditional. And then there’s quantum computing, where the consensus in CS departments seems to be that we’re going all in regardless of how many years it may still be until it’s broadly usable. Because who doesn’t like trying to get their head around entanglement? It’s all so exotic and different.

I think many people gravitate to computer science precisely because of the emphasis on newness and creating things, which can be refreshing compared to fields where the modal contribution is to analyze rather than invent. We aren’t chained to the past the way many other fields seem to be. It can also be easier to do research in such an environment, because there’s less worry about treading on ground that’s already been covered.

But there’s been pushback about requiring reviewers to explicitly factor novelty into their judgments about research importance or quality, like by including a separate ranking for “originality” in a review form like we do in some visualization venues. It does seem obvious that including statements like “We are first to …” in the introduction of our papers as if this entitles us to publication doesn’t really make the work better. In fact, often the statements are wrong, at least in some areas of CS research where there’s a myopic tendency to forget about all but the classic papers and what you saw get presented in the last couple years. And I always cringe a little when I see simplistic motivations in research papers like, no one has ever looked at this exact combination (of visualization, form of analysis, etc.) yet. As if we are absolved of having to consider the importance of a problem in the world when we decide what to work on.

The question would seem to be how being oriented toward appreciating certain kinds of novelty, like an ability to do something we couldn’t do before, affects the kinds of questions we ask, and how deep we go in any given direction over the longer term. Novelty can come from looking at old things in new ways, for example developing models or abstractions that relate previous approaches or results. But these examples don’t always evoke novelty in the same way that examples of striking out in brand new directions do, like asking about augmented reality, or multiple devices, or fairness, or accessibility, in an area where previously we didn’t think about those concerns much.

If a problem is suddenly realized to be important, and the general consensus is that ignoring it before was a major oversight, then it’s hard to argue we should not set out to study the new thing. But a challenge is that if we are always pursuing some new direction, we get islands of topics that are hard to relate to one another. It’s useful for building careers, I guess, to be able to relatively easily invent a new problem or topic and study it in a few papers then move on. And I think it’s easy to feel like progress is being made when you look around at all the new things being explored. There’s a temptation I think to assume that it will all “work itself out” if we explore all the shiny new things that catch our eye, because those that are actually important will in the end get the most attention.

But beyond not being able to easily relate topics to one another, a problem with expanding, at all times, in all directions at once, would seem to be that no particular endeavor is likely to be robust, because there’s always an excitement about moving to the next new thing rather than refining the old one. Maybe all the trendy new things distract from foundational problems, like a lack of theory to motivate advances in many areas, or sloppy use of statistics. The perception of originality and creativity certainly seems better at inspiring people than obsessing over being correct.

Barefoot in New York ends with a line about how, after having asked whether it was in “our best interest” to present this particular type of music, the narrator went ahead and did it, “and now, it’s even worse than before.” It’s not clear what’s worse than before, but it captures the sort of commitment to rapid exploration, even if we’re not yet sure how important the new things are, that causes this tension.

Hammurabi (1) vs. Frida Kahlo; Seuss advances

We’re done with the first half of the bracket!

In the battle of Yertle vs. Creed, Jonathan offers a Bayesian take:

When I was a child I was frightened by Thing 1 and Thing 2. Not sure I’ve gotten over it. But statistically, are they exchangeable?

Unfortunately, I’m not sure if that argument favors Seuss (as a creator of an example that is relevant to probability modeling) or Jordan (who is, based on first and last name, exchangeable with a well-known shoe salesman).

So we’ll have to go with Dmitri’s data-based recommendation:

Sometime around junior high school, my entire class was bused down to Springfield, MA to hear a talk by Dr. Seuss, who spoke to a giant auditorium of adolescents, illustrating his words by very quickly sketching on a giant pad of paper propped on an easel. The whole room started out WAY too cool for silly kids’ stuff and ended up completely captivated. By the end you could hear a pin drop. Still one of the greatest lectures I’ve ever heard, and I’ve heard some good ones.

So, you can advance whoever you want but I’m going to wherever Seuss is talking.

There’s no point in having the seminar if Dmitri’s not in the audience, so Seuss it is. Quick, Henry, the Flit!

Today’s matchup

Hammurabi is the top seed in the very important “Creators of laws or rules” category. I just looked him up on Wikipedia, and it turns out that the Code is at the Louvre, so I could go check it out. Look on my Works, ye Mighty, and despair! On the other side is Frida Kahlo from the “Cool people” category. Frida is undeniably cool. Continuing on the Wikipedia theme: Frida has pages in 152 languages (sample item: “In 2018, Mattel unveiled seventeen new Barbie dolls in celebration of International Women’s Day, including one of Kahlo. Critics objected to the doll’s slim waist and noticeably missing unibrow.”), Hammurabi in only 103 (sample item: “Earlier Sumerian law codes had focused on compensating the victim of the crime, but the Code of Hammurabi instead focused on physically punishing the perpetrator.”). I’d guess that either of them could give a good talk—or not.

Again, here are the announcement and the rules.

Hey! Check out this short new introductory social science statistics textbook by Elena Llaudet and Kosuke Imai

Elena Llaudet points us to this new textbook for complete beginners. Here’s the table of contents:

1. Introduction [mostly on basic operations in R]

2. Estimating Causal Effects with Randomized Experiments [goes through an experiment on class sizes and test scores]

3. Inferring Population Characteristics via Survey Research [goes through the example of who in U.K. supported Brexit]

4. Predicting Outcomes Using Linear Regression [example of predicting GDP from night-time light emissions]

5. Estimating Causal Effects with Observational Data [example of estimating effects of Russian TV on Ukrainians’ voting behavior]

6. Probability [distributions, law of large numbers, central limit theorem]

7. Quantifying Uncertainty [estimation, confidence intervals, hypothesis testing]

And the whole thing is less than 250 pages! I haven’t looked at the whole book, but what I’ve seen is very impressive. Also, it’s refreshing to see an intro book proceeding from an entirely new perspective rather than just presenting the same old sequence of topics. There are lots of intro statistics books out there, with prices ranging from $0 to $198.47. This one is different, in a good way.

Seems like a great first book on statistics, especially for the social sciences—but not really limited to the social sciences either, as the general concepts of measurement, comparison, and variation arise in all application areas.

After you read, or teach out of, Llaudet and Imai’s new book, I recommend our own Regression and Other Stories, which I love so much that Aki and I have almost finished an entirely new book full of stories, activities, and demonstrations that can be used when teaching that material—but Regression and Other Stories is a lot for students who don’t have previous statistical background, so it’s good to see this new book as a starting point. As the title says, it’s a friendly and practical introduction!

Dr. Seuss (1) vs. Michael B. Jordan; Dylan advances

Jonathan gets the ball rolling:

I’m going to go with Eliot, simply because genteel antisemitism is so much more refreshing than the stuff we have today.

Adam points out a connection:

Bob Dylan sang about T. S. Eliot (in Desolation Row–which came out the same year Eliot died) but I don’t think T. S. Eliot ever sang about Bob Dylan.

Not sure which direction that points in, but surely it merits consideration.

Owen follows up:

I’d love to hear from Bob about what exactly it was that Eliot was fighting Pound for in the captain’s tower – I guess Eliot might even be able to tell us himself.

Or how about we just split the difference and get Ezra Pound to give the seminar instead: That might calm Andrew from “…Shouting, ‘Which side are you on?’”.

That’s raising the stakes from a fascist sympathizer to an actual fascist. Indeed, Pound would fit right into this competition, as an unseeded contestant in the Traitors category.

Along those lines, Manuel writes:

Please note that at some point in his career, 4th seed in “cool people” was also called like 1st seed in “traitors”, which I guess put him into the “namesakes” category.

Hey—I forgot about that story, the concert where Bob Dylan was called a traitor. That ices it for me. Being in 3 categories has to count for something, so Bob ftw.

Today’s matchup

It’s The Lorax versus The Wire. Top seed in the Children’s book authors category versus an unseeded namesake. It all comes down to whadda want: Acting or Rhyming?

Again, here are the announcement and the rules.

WikiGuidelines: A new way to set up guidelines for medical practices to better acknowledge uncertainty

John Williams points to this post by Brad Spellberg, Jaimo Ahn, and Robert Centor, who write:

Clinical guidelines greatly influence how doctors care for their patients. . . . Regulators, insurance payers and lawyers can also use guidelines to manage a doctor’s performance, or as evidence in malpractice cases. Often, guidelines compel doctors to provide care in specific ways.

We are physicians who share a common frustration with guidelines based on weak or no evidence. We wanted to create a new approach to medical guidelines built around the humility of uncertainty, in which care recommendations are only made when data is available to support the care. In the absence of such data, guidelines could instead present the pros and cons of various care options.

This sounds like a great idea.

Spellberg et al. continue:

The clinical guidelines movement first began to gain steam in the 1960s. Guideline committees, usually composed of subspecialty experts from academic medical centers, would base care criteria on randomized clinical trials, considered the gold standard of empirical evidence.

So far so good, as long as we recognize that the so-called gold standard isn’t always so good: what with dropout etc., randomized trials have biases too; there are also the usual problems with summarizing studies based on statistical significance or lack thereof; and meta-analysis can be a disaster when averaging experiments of varying quality.

In any case, misinterpretation of randomized trials can be the least of our problems. Spellberg et al. write:

Unfortunately, many committees have since started providing answers to clinical questions even without data from high-quality clinical trials. Instead, they have based recommendations primarily on anecdotal experiences or low-quality data.

Interesting point. If doctors, payers, and regulators are wanting or expecting “guidelines,” then what sort of guidelines do you get when available knowledge is insufficient for strong guidelines? You can see the problem.

Spellberg et al. give several examples where rough-and-ready guidelines led to bad outcomes, and then they give their proposal, which is a more open-ended, and open, approach to creating consensus-based guidelines that recognize uncertainty:

To create a new form of medical guideline that takes the strength of available evidence for a particular practice into account, we gathered 60 other physicians and pharmacists from eight countries on Twitter to draft the first WikiGuideline. Bone infections were voted as the conditions most in need of new guidelines.

We all voted on seven questions about bone infection diagnosis and management to include in the guideline, then broke into teams to generate answers. Each volunteer searched the medical literature and drafted answers to a clinical question based on the data. These answers were repeatedly revised in open dialogue with the group.

These efforts ultimately generated a document with more than 500 references and provided clarity to how providers currently manage bone infections. Of the seven questions we posed, only two had sufficient high-quality data to make a “clear recommendation” on how providers should treat bone infection. The remaining five questions were answered with reviews that provided pros and cons of various care options.

They contrast to the existing standard approach to medical guidelines:

The recommendations WikiGuidelines arrived at differ from current bone infection guidelines issued by professional groups for medical specialists. For example, WikiGuidelines makes a clear recommendation to use oral antibiotics for bone infections based on numerous randomized controlled trials. Current standard guidelines, however, recommend giving intravenous antibiotics, despite the evidence that giving treatment orally is not only just as effective as giving it intravenously, but is also safer and results in fewer side effects.

Interesting. I recognize that I’m hearing only one side of the story. That said, their argument sounds persuasive to me, and of course I’m generally receptive to statements about the importance of recognizing uncertainty.

So I hope this WikiGuidelines for medicine works out and that similar procedures can be done in other fields. The next question will be, who’s on the committee? It could be that it’s easier to get a nonaligned committee for a guideline on bone infections than for something more controversial such as nudge interventions.

Bob Dylan (4) vs. T. S. Eliot; Helmsley advances

Tom has no idea what joker put seven dog lice in my Iraqi fez box, but he gives a class-based analysis in favor of the man from Ithaca:

[Leona’s] use of the phrase ‘little people’ would surely make the audience want to rise up and overthrow the speaker, voiding the seminar. I will vote for the person who at least reminds me that ‘a working class hero is something to be’.

Ben follows up:

I’m not impressed by the Wikipedia article on Leona. Maybe a good character for a Knives Out #3 but doubt there’s much interesting for a seminar — so you have a lot of money and are a mean weirdo? Yawn.

For John I ended up reading the whole 404 error page on his website. If he can do a compelling 404 page, I’m sure he can swing a seminar. I also bought a couple books. John Lennon hands down.

A compelling 404 page—that really is impressive! And John’s thoughts on collaboration suggest he could give a legitimately interesting academic seminar that I’d enjoy.

So I was all ready to move John into the next round, but then I thought I’d check what Ben had said . . . I went to John’s website, clicked on some links trying to reach the fabled 404 page, and instead came across this page, which reveals that John has a podcast!

There’s too many podcasters out there already, and the last thing we need is one more in our seminar series. So Leona it is, following our (previously unstated) No Podcasters rule.

Also, a good character for a Knives Out #3? That could be fun! Maybe Rian Johnson will show up too and we can get into a big debate about whether Brick is overrated.

Today’s matchup

Two poets. The fourth-seeded person in the “Cool people” category, vs. an overrated (in my opinion) author who is known by his initials. I really wish I’d chosen P. G. Wodehouse instead. When it comes to mid-twentieth-century upper-class twit British fascist-sympathizer authors, I prefer the funny one. Too late now, though!

So what’ll it be? It’s all right now, or fear in a handful of dust? Neither Bob nor Tom is known to be a particularly nice guy, but they’re two of the most quotable people around. We’d just be sitting there in the audience waiting for Bob to say, “tangled up in blue” or Tom to say, “reasons of race and religion combine to make any large number of free-thinking Jews undesirable.” What I’m saying is that either of them would have the potential to kill with new material, but either could also coast with a greatest-hits set, and nobody would complain.

We’re relying on you, the loyal commenters, to provide the raw material for our decision on who (or, as Eliot would surely say, “whom”) to advance to the next round.

Again, here are the announcement and the rules.

Where does this quote come from? “I don’t trust anything until it runs. In fact, I don’t trust anything until it runs twice.”

Google the above quote and you’ll see it attributed to me. And it does sound like the kind of thing I might say. But I don’t recall actually having ever said it, and a search does not turn up any original appearance of this saying.

Does anyone know the original source? Is it something I said in a talk and have forgotten? Or maybe it was misattributed to me, but then who said it, and when?

P.S. I was the one who said it! Back in 2017. Commenter Ahm reports:

I put it in my “Ideas to Inflict Upon Co-authors and PhD Students” notebook after hearing it in the talk “Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do” at the 2017 New York R Conference.

The talk is on Youtube (timestamp 27:51). Link here.

The full-ish quote is “Computer programming is basically the last bastion of rigor in the modern world. I don’t trust anything until it runs. In fact, until it runs twice on the computer.”

J. Robert Lennon (3) vs Leona Helmsley; Brutus advances

Jonathan offered a randomized decision rule:

Brutus is the Ohio State mascot. As I type this, they trail by 4 against Michigan. I invoke aleatoric uncertainty to let Brutus go through or not dependent on the outcome of his football team.

Michigan won, so that implies that Mo Willems should advance.

And Manuel offers a positive case:

I confess that I ignore everything about Elephant & Piggy, but I guess the slides with the child-like drawings will keep me awake. And Willems can talk about many other things.

Knuffle Bunny!!!

But Raghu ices it with a citation to the recent scientific literature:

“Let the pigeon drive the bus: pigeons can plan future routes in a room” — Brett Gibson, Matthew Wilkinson & Debbie Kelly, *Animal Cognition* 15:379-391 (2012). Link

From the Introduction: “In the beloved children’s book — ‘Don’t let the pigeon drive the bus!’ — a human bus driver pleads with the reader not to let a pigeon drive his bus while he takes a break (Williams 2003). On the one hand, the bus driver might have been concerned that the pigeon would not be able to safely drive the vehicle. On the other hand, maybe the driver was more concerned that the pigeon would not be able to take a route that would efficiently pick up all the passengers at the various stops throughout the city. … The task of determining an optimal route to several locations or nodes is called the traveling salesperson problem (TSP). … It remains unclear whether animals, other than non-human primates, are using rigid rule-based solutions when solving TSPs, or a more flexible solution…”

Having written this, I now vote for Brutus.

Agreed. No way do I want to go to a seminar and hear about the “traveling salesperson” problem.

Today’s matchup

John Lennon is an excellent novelist and author of the unforgettable New Sentences For The Testing Of Typewriters. Leona Helmsley is famous for saying, “only the little people pay taxes.” Either of them would probably have a lot of stories to tell; on the other hand, Lennon might be saving his best material for future books, and Helmsley might stay mum on her best stories as they would just incriminate her.

What joker put seven dog lice in my Iraqi fez box?

Again, here are the announcement and the rules.

“Science as a Vocation”

Paul Alper writes:

Max Weber gave a speech at Munich University in 1917, or possibly 1918 – the date is in dispute – while his country was about to lose World War I and a pandemic was about to engulf the world. His speech was blandly entitled, Science as a Vocation, and the written version first appeared in 1919. Here is a link to an English translation of his lecture. Not bad for one talk during WW I and a pandemic. Weber died in 1920 at the young age of 56 so he did not live long enough to fully appreciate how prescient his views were in the decade/decades to follow.

This really is a great lecture. I’d heard of Max Weber and I guess read one or two things by him, but I’d never read this one all the way through. Like the best lectures, it’s densely packed with ideas, kind of like a story that keeps introducing new characters to the drama before you have time to get tired of the ones who are already there.

I’d quote from Weber’s lecture but really you just have to read the whole thing. So please do so. It relates to many of the discussions we’ve had over the years regarding scientific culture, political science, the role of assumptions and modeling, scientific practice and scientific discovery, academic life, and lots more.

The bit about “intellectual sacrifice” near the end was great. Lots of secular academics do some intellectual sacrifice too, in order to play by the rules and not offend anyone powerful, or even internally, to not disturb their preconceived notions.

Brutus (2) vs. Mo Willems; Cleary advances

Our pals at McKinsey tell us that “The business case for diversity, equity, and inclusion is stronger than ever,” and Jonathan makes a strong DEI case for the initialed contestant in yesterday’s competition:

Foyt was known as Indy car driver, but he won 7 NASCAR races as well, as well as LeMans. That’s the sort of diversity you’re looking for in a speaker, right?

Also, since he’s still alive, he might live longer than Beverly Cleary; in other words, her status as a long-lived famous person only probabilistically exceeds Foyt’s chances in a right-censored distribution sense, so her comparison there needs to be discounted. And he’s one tough dude when it comes to longevity.

Also, as Kurtis Blow pointed out all those years ago, A. J. is cool. Don’t you love America, my favorite country?

But Ben then upset the applecart with an important revelation:

I was leaning Foyt for the racing stories. Racing seems like a thing of the past. There are so many cars around these days and I don’t wanna get run over by someone hotrodding! Listening to racing stories as some sort of nostalgia trip seems about right.

Then I learned Beverly Cleary authored The Mouse and the Motorcycle — a definitive autosports book! A. J. Foyt may be a real racer, but as a consumer I’m more interested in racing as a fantasy. Beverly Cleary has a better track record here.

Bev it is. It was A. J.’s misfortune to go up against one of the few other speaker candidates with motorsports experience.

Today’s matchup

The #2 traitor of all time—what can you say, dude’s an absolute legend!—up against the author of some modern classics of the kids-book genre, along with some real stinkers. (I’m looking at you, Elephant & Piggie!) It all comes down to two questions:
1. How do you feel about regicide?
2. Will you let the pigeon drive the bus?
Let the strongest and most amusing arguments win!

Again, here are the announcement and the rules.

When a conceptual tool is used as a practical tool (Venn diagrams edition)

Everyone’s seen Venn diagrams so they’re a great entry to various general issues in mathematics and its applications.

The other day we discussed the limitations of Venn diagrams with more than 3 circles as an example of our general failures of intuitions in high dimensions.

The comment thread from that post featured this thoughtful reflection from Eric Neufeld:

It’s true that Venn diagrams are not widely applicable. But thinking about this for a few days suggests to me that Venn diagrams play a role similar to truth tables in propositional logic. We can quickly establish the truth of certain tautologies, mostly binary or ternary, with truth tables, and from there move to logical equivalences. And so on. But in a foundational sense, we use the truth tables to assert certain foundational elements and build from there.

Something identical happens with Venn diagrams. A set of basic identities can be asserted and subsequently generalized to more widely applicable identities.

Some find it remarkable that all of logic can be seen as resting on purely arbitrary definitions of two or three primitive truth tables (usually and, or and not). Ditto, the core primitives of sets agree with intuition using Venn diagrams. No intuition for gigantic truth tables or multidimensional Venn diagrams.

That’s an interesting point and it got me thinking. Venn diagrams are a great way to teach inclusion/exclusion in sets, and the fact that they can be cleanly drawn with one, two, or three binary factors underlines the point that inclusion/exclusion with interactions is a general idea. It’s great that Venn diagrams are taught in schools, and if you learn them and mistakenly generalize and imagine that you could draw complete Venn diagrams with 4 or 5 or more circles, that’s kind of ok: you’re getting it wrong with regard to these particular pictures—there’s no way to draw 5 circles that will divide the plane into 32 pieces—but you’re correct in the larger point that all these subsets can be mathematically defined and represent real groups of people (or whatever’s being collected in these sets).
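
For the record, here is the standard counting argument (not specific to this post) for why five circles can’t do it: each new circle can cross each existing circle in at most two points, so the k-th circle is cut into at most 2(k−1) arcs, and each arc adds one region. Hence

$$
R(n) \le 2 + \sum_{k=2}^{n} 2(k-1) = n^2 - n + 2,
$$

which equals 8 for n = 3 (exactly the 2^3 regions a three-set Venn diagram needs) but only 22 for n = 5, well short of the required 2^5 = 32. Venn diagrams for four or five sets do exist, but they need ellipses or other shapes, not circles.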

Where the problem comes up is not in the use of Venn diagrams as a way to teach inclusions, unions, and intersections of sets. No, the bad stuff happens when they’re used as a tool for data display. Even in the three-circle version, there’s the difficulty that the size of the region doesn’t correspond to the number of people in the subset—and, yes, you can do a “cartogram” version but then you lose the clear “Venniness” of the three-circle image. The problem is that people have in their minds that Venn diagrams are the way to display interactions of sets, and so they try to go with that as a data display, come hell or high water.

This is a problem with statistical graphics, that people have a few tools that they’ll use over and over. Or they try to make graphs beautiful without considering what comparisons are being facilitated. Here’s an example in R that I pulled off the internet.

Yes, it’s pretty—but to learn anything from this graph (beyond that there are high numbers in some of the upper cells of the image) would take a huge amount of work. Even as a look-up table, the Venn diagram is exhausting. I think an UpSet plot would be much better.
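
The underlying point is that the information a Venn diagram tries to display is just a table of counts over membership patterns, which is what an UpSet plot draws as a sorted bar chart. Here is a minimal sketch of that tabulation, with made-up sets for illustration:

```python
# Hypothetical sets (made up for illustration).
sets = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {4, 5, 6, 7, 8},
    "C": {1, 5, 8, 9},
    "D": {2, 5, 9, 10, 11},
}

# Count elements by their exact membership pattern: one count per region
# of the would-be Venn diagram.
counts = {}
for x in set().union(*sets.values()):
    pattern = frozenset(name for name, s in sets.items() if x in s)
    counts[pattern] = counts.get(pattern, 0) + 1

# Print the patterns sorted by count, the comparison an UpSet plot facilitates.
for pattern, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print("&".join(sorted(pattern)), n)
```

With 4 sets there are already up to 15 nonempty membership patterns, and with 5 sets up to 31, which is why a static picture of overlapping shapes stops being a useful data display long before the underlying table does.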

And then this got me thinking about a more general issue, which is when a wonderful conceptual tool is used as an awkward practical tool. A familiar example to tech people of a certain age would be the computer language BASIC, which was not a bad way for people to learn programming, back in the day, but was not a great language for writing programs for applications.

There must be many other examples of this sort of thing: ideas or techniques that are helpful for learning the concepts but that get people into trouble when they try to use them as practical tools. I guess we could call this Objects of the class Venn diagrams—if we could just think of a good set of examples.

A different Bayesian World Cup model using Stan (opportunity for model checking and improvement)

Maurits Evers writes:

Inspired by your posts on using Stan for analysing football World Cup data here and here, as well as the follow-up here, I had some fun using your model in Stan to predict outcomes for this year’s football WC in Qatar. Here’s the summary on Netlify. Links to the code repo on Bitbucket are given on the website.

Your readers might be interested in comparing model/data/assumptions/results with those from Leonardo Egidi’s recent posts here and here.

Enjoy, soccerheads!

P.S. See comments below. Evers’s model makes some highly implausible predictions and on its face seems like it should not be taken seriously. From the statistical perspective, the challenge is to follow the trail of breadcrumbs and figure out where the problems in the model came from. Are they from bad data? A bug in the code? Or perhaps a flaw in the model so that the data were not used in the way that was intended? One of the great things about generative models is that they can be used to make lots and lots of predictions, and this can help us learn where we have gone wrong. I’ve added a parenthetical to the title of this post to emphasize this point. Also good to be reminded that just cos a method uses Bayesian inference, that doesn’t mean that its predictions make any sense! The output is only as good as its input and how that input is processed.
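
As a concrete version of that last point, here is a minimal sketch of one posterior predictive check, with made-up numbers standing in for actual posterior draws from the fitted Stan model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior predictive draws of goals per match:
# S posterior draws x n matches (in a real workflow these come from the model).
S, n_matches = 4000, 48
sim_goals = rng.poisson(2.6, size=(S, n_matches))

observed_total_goals = 120  # hypothetical observed test statistic

# Posterior predictive p-value for the test statistic "total goals scored".
sim_totals = sim_goals.sum(axis=1)
p_value = (sim_totals >= observed_total_goals).mean()
print(f"Pr(simulated total >= observed) = {p_value:.2f}")
```

An extreme tail probability for a simple statistic like this, or for team-level statistics, is a cheap way to localize whether the problem lies in the data, the code, or the model itself.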

Beverly Cleary (2) vs. A. J. Foyt; Chvátil advances

From yesterday, Jonathan succinctly summarizes the case for Fawkes.

Chvátil makes the rules, Fawkes breaks the rules. Making is harder, but breaking is the rock to its scissors.

+1 for the rosham reference.

Raghu makes the case based on audience interest. I’ll have to share his whole story here:

A few years ago, my [Raghu’s] younger son made a diorama of the houses of parliament with space below for Guy Fawkes’ gunpowder for an elementary school assignment in which they had to say something about a holiday. I took a photo of this. This is of interest, of course, to no one but me. However: I thought about this today and wanted to find the photo; I couldn’t remember when it was taken. I typed the word “cardboard” into Google Photos and, like magic, it came up (along with about 10 other photos I’ve taken of cardboard in its many manifestations). Squabbling about AI comes up a lot on this blog, hence this comment. I was stunned, even though I know how this works. The cleverness of machine learning algorithms and the sheer volume of training data is really amazing.

+1 for referring to recent blog discussions.

And Anonymous gives a strong argument against Chvátil:

If I asked a bunch of people who created Codenames, half of them would say: “Is it on Netflix?” and the other half would say no. There would be one guy who would say “Vladimir something” and a tiny percentage of people who would know. All in all, Guy Fawkes would be interesting while Vladimir is so overrated.

But . . . how can Vladimir be overrated if almost nobody has heard of him? You can’t have it both ways, Anon!

The deciding comment, though, comes from bbis:

Rules are made to be broken and rulers to be blown up. One suggestion would be to go with ‘da bomb’. While that might be a blast, it would probably end too quickly. Also, if you want to be a guy outstanding in your field and invited to give important seminars, you shouldn’t be tunneling under London. Chvatil may be able to discuss how to get a good balance between rigidity and flexibility in rules to get good outcomes.

I’d like to hear about that balance! Also, yeah, tunneling under London, not cool for a seminar.

Today’s matchup

The Sage of Klickitat Street vs. an Indy racing legend. Beverly Cleary was one of the longest-lived famous people ever; A. J. Foyt could drive really really fast. Neither of these things is particularly relevant for a seminar, but both of them would probably have some good stories to share.

Again, here are the announcement and the rules.