
Causal inference data challenge!

Susan Gruber, Geneviève Lefebvre, Tibor Schuster, and Alexandre Piché write:

The ACIC 2019 Data Challenge is Live!
Datasets are available for download (no registration required) at (bottom of the page).
Check out the FAQ at
The deadline for submitting results is April 15, 2019.

The fourth Causal Inference Data Challenge is taking place as part of the 2019 Atlantic Causal Inference Conference (ACIC) to be held in Montreal, Canada. The data challenge focuses on computational methods of inferring causal effects from quasi-real world data. This year there are two tracks: low dimensional and high dimensional data. Participants will analyze 3200 datasets in either Track 1 or Track 2 to estimate marginal additive treatment effects and associated 95% confidence intervals. Entries will be evaluated with respect to bias, variance, mean squared error, and confidence interval coverage across a variety of data generating processes.
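To make the four evaluation criteria concrete, here is a minimal sketch of how one might score a single method across many simulated datasets. The function name, the inputs, and the choice to define variance as the spread of the estimation errors are my assumptions; the announcement does not spell out the organizers' exact definitions.

```python
from statistics import mean, pvariance

def evaluate(estimates, lowers, uppers, truths):
    """Score one method across datasets (hypothetical scoring, not the official one)."""
    errors = [est - truth for est, truth in zip(estimates, truths)]
    bias = mean(errors)
    variance = pvariance(errors)           # spread of the errors around the bias
    mse = mean(e * e for e in errors)      # equals bias**2 + variance
    coverage = mean(
        1.0 if lo <= truth <= hi else 0.0  # did the interval contain the truth?
        for lo, hi, truth in zip(lowers, uppers, truths)
    )
    return {"bias": bias, "variance": variance, "mse": mse, "coverage": coverage}

# Made-up numbers for three datasets, purely for illustration.
scores = evaluate(
    estimates=[1.0, 2.0, 3.0],
    lowers=[0.0, 1.8, 1.0],
    uppers=[2.0, 3.0, 4.0],
    truths=[1.5, 1.5, 2.0],
)
print(scores)
```

A 95% interval procedure that is working as advertised should show coverage near 0.95 when averaged over many datasets, which is why coverage is scored separately from bias and mean squared error.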

I’m not a big fan of 95% intervals, and I am aware of the general problems arising from this sort of competition: the problems in the contest are not necessarily similar to the problems to which a particular method might be applied. That said, Jennifer has assured me that she and others learned a lot from the results of previous competitions in this series, so on that basis I encourage all of you to take a look and check out this one.

M. F. K. Fisher (1) vs. Serena Williams; Oscar Wilde advances

The best case yesterday was made by Manuel:

Leave Joe Pesci at home alone. Wilde’s jokes may be very old, but he can use slides from The PowerPoint of Dorian Gray.

As Martha put it, not great, but the best so far in this thread.

On the other side, Jonathan wrote, “I’d definitely rather hear Wilde, but I hate it when speakers aren’t live, and the video connections with Reading Gaol are lousy.”—which wasn’t bad—but then he followed it up with, “Please, though. No Frankie Valli stories.” If even the best Pesci endorsement is so lukewarm, we’ll have to go with Oscar to face off against hot dog guy in the next round.

Today our contest features the #1 food writer of all time vs. an unseeded GOAT. I’ve never actually read anything by M. F. K. Fisher but the literary types rave about her, hence her top seeding in that category. As for Serena Williams, I did go to the U.S. Open once but only to see some of those free qualifying rounds. So this particular matchup is a bit of a mystery to me. Whaddya got?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Data partitioning as an essential element in evaluation of predictive properties of a statistical method

In a discussion of our stacking paper, the point came up that LOO (leave-one-out cross validation) requires a partitioning of data—you can only “leave one out” if you define what “one” is.

It is sometimes said that LOO “relies on the data-exchangeability assumption.” I don’t think that’s quite the right way to put it, but LOO does assume the relevance of a data partition. We discuss this briefly in section 3.5 of this article. For regular Bayes, p(theta|y) proportional to p(y|theta) * p(theta), there is no partition of data. “y” is just a single object. But for LOO, y can be partitioned. At first this bothered me about LOO, but then I decided that this is a fundamental idea, related to the idea of “internal replication” discussed by Ripley in his spatial statistics book. The idea is that with just “y” and no partitions, there is no internal replication and no statistically general way of making reliable statements about new cases.
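Here is a small sketch of what it means to say that LOO needs a definition of “one.” The same six data points give different cross-validation folds depending on whether the unit is the individual observation or the group it belongs to. The helper name and the school example are hypothetical, made up for illustration.

```python
from collections import defaultdict

def leave_one_out_folds(n_items, unit_of=None):
    """Yield (held_out, kept) index lists, one fold per unit.

    unit_of maps item index -> unit label; None means each item is its own unit.
    """
    units = defaultdict(list)
    for i in range(n_items):
        label = i if unit_of is None else unit_of[i]
        units[label].append(i)
    for held_out in units.values():
        held = set(held_out)
        kept = [i for i in range(n_items) if i not in held]
        yield held_out, kept

# Hypothetical data: six test scores from three schools.
school = ["A", "A", "B", "B", "C", "C"]

by_student = list(leave_one_out_folds(6))                 # "one" = a student
by_school = list(leave_one_out_folds(6, unit_of=school))  # "one" = a school

print(len(by_student), len(by_school))
print(by_school[0])
```

Leaving out a student asks how well the model predicts a new student in an existing school; leaving out a school asks about a new school. Neither is implied by y alone; the partition is an extra piece of information.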

This is similar to (but different from) the distinction in chapter 6 of BDA between the likelihood and the sampling distribution. To do inference for a given model, all we need from the data is the likelihood function. But to do model checking, we need the sampling distribution, p(y|theta), which implies a likelihood function but requires more assumptions (as can be seen, for example, in the distinction between binomial and negative binomial sampling). Similarly, to do inference for a given model, all we need is p(y|theta) with no partitioning of y, but to do predictive evaluation we need a partitioning.
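The binomial versus negative binomial distinction can be checked numerically. In the sketch below (the data, 3 successes in 12 trials, are made up), the two sampling models give likelihoods that differ only by a constant factor, so inference about theta is the same, but the tail probabilities used for a simple check differ because the two models disagree about what other data could have occurred.

```python
from math import comb

def binomial_pmf(y, n, theta):
    """P(y successes in a fixed number n of trials)."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

def negbin_pmf(n, r, theta):
    """P(the r-th success arrives exactly on trial n)."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta) ** (n - r)

y, n = 3, 12  # illustrative data: 3 successes observed in 12 trials

# Same likelihood up to a constant: the ratio does not depend on theta.
ratios = [binomial_pmf(y, n, t) / negbin_pmf(n, y, t) for t in (0.1, 0.3, 0.5, 0.7)]

# Different sampling distributions: "data at least this extreme" under theta = 0.5.
theta0 = 0.5
tail_binomial = sum(binomial_pmf(k, n, theta0) for k in range(y + 1))  # P(Y <= 3 | n = 12)
tail_negbin = sum(binomial_pmf(k, n - 1, theta0) for k in range(y))    # P(N >= 12 | r = 3)

print(ratios)
print(tail_binomial, tail_negbin)
```

The second tail uses the fact that needing at least 12 trials to reach 3 successes is the same event as seeing at most 2 successes in the first 11 trials.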

Oscar Wilde (1) vs. Joe Pesci; the Japanese dude who won the hot dog eating contest advances

Raghuveer gave a good argument yesterday: “The hot dog guy would eat all the pre-seminar cookies, so that’s a definite no.” But this was defeated by the best recommendation we’ve ever had in the history of the Greatest Seminar Speaker contest, from Jeff:

Garbage In, Garbage Out: Mass Consumption and Its Aftermath
Takeru Kobayashi

Note: Attendance at both sessions is mandatory.

Best. Seminar. Ever.

So hot dog guy is set to go to the next round, against today’s victor.

It’s the wittiest man who ever lived, vs. an unseeded entry in the People from New Jersey category. So whaddya want: some 125-year-old jokes, or a guy who probably sounds like a Joe Pesci imitator? You think I’m funny? I’m funny how, I mean funny like I’m a clown, I amuse you?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Does Harvard discriminate against Asian Americans in college admissions?

Sharad Goel, Daniel Ho and I looked into the question, in response to a recent lawsuit. We wrote something for the Boston Review:

What Statistics Can’t Tell Us in the Fight over Affirmative Action at Harvard

Asian Americans and Academics

“Distinguishing Excellences”

Adjusting and Over-Adjusting for Differences

The Evolving Meaning of Merit

Character and Bias

A Path Forward

The Future of Affirmative Action

Carol Burnett (4) vs. the Japanese dude who won the hot dog eating contest; Albert Brooks advances

Yesterday was a tough matchup, but ultimately John “von” Neumann was no match for a very witty Albert Einstein.

The deciding argument, from Martha:

I’d like to see Von Neumann given four parameters and making an elephant wiggle his trunk. And if he could do it, there would be the chance that Jim Thorpe could do it if they met in a later round.

No way do I think that Neumann could fit that elephant. As I wrote earlier, that elephant quote just seems like bragging! For one thing, I can have a model with a lot more than five parameters and still struggle to fit my data.

I almost want to invite Neumann to speak, just so we can put him on the spot, ask him to fit the damn elephant, and watch him fail. But that’s not cool, to invite a speaker just for the purpose of seeing him crash and burn. That way lies madness.

Today’s contest features two unique talents. Carol Burnett was the last of the old-time variety-show hosts, she can sing, she can dance, and according to Wikipedia, she was “the first celebrity to appear on the children’s series Sesame Street.” But she’s facing stiff competition, from the Japanese dude who won the hot dog eating contest. That’s an accomplishment, to have done something so impressive that this one feat defines you. So I think that whoever advances to the next round will be a strong competitor. Neither Carol Burnett nor the Japanese dude who won the hot dog eating contest is a top seed, but both of them are interesting dark horse candidates.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Storytelling: What’s it good for?

A story can be an effective way to send a message. Anna Clemens explains:

Why are stories so powerful? To answer this, we have to go back at least 100,000 years. This is when humans started to speak. For the following roughly 94,000 years, we could only use spoken words to communicate. Stories helped us survive, so our brains evolved to love them.

Paul Zak of the Claremont Graduate University in California researches what stories do to our brain. He found that once hooked by a story, our brain releases oxytocin. The hormone affects our mood and social behaviour. You could say stories are a shortcut to our emotions.

There’s more to it; stories also help us remember facts. Gordon Bower and Michal Clark from Stanford University in California let two groups of subjects remember random nouns. One group was instructed to create a narrative with the words, the other to rehearse them one by one. People in the story group recalled the nouns correctly about six to seven times more often than the other group.

But my collaborator Thomas Basboll is skeptical:

It seems to me that a paper that has been written to mimic the most compelling features of Hollywood blockbusters (which Anna explicitly invokes) is also, perhaps unintentionally, written to avoid critical engagement. Indeed, when Anna talks about “characters” she does not mention the reader as a character in the story, even though the essential “drama” of any scientific paper stems from the conversation that reader and writer are implicitly engaged in. The writer is not simply trying to implant an idea in the mind of the reader. In a research paper, we are often challenging ideas already held and, crucially, opening our own thinking to those ideas and the criticism they might engender.

Basboll elaborates:

Anna promises that storytelling can produce papers that are “concise, compelling, and easy to understand”. But I’m not sure that a scientific paper should actually be compelling. . . . A scientific paper should be vulnerable to criticism; it should give its secrets away freely, unabashedly. And the best way to do that is, not to organise it with the aim of releasing oxytocin in the mind of the reader, but by clearly identifying your premises and your conclusions and the logic that connects them. You are not trying to bring your reader to a narrative climax. You are trying to be upfront about where your argument will collapse under the weight of whatever evidence the reader may bring to the conversation. Science, after all, is not so much about what Coleridge called “the suspension of disbelief” as what Merton called “organised skepticism”.

In our article from a few years ago, Basboll and I wrote about how we as scientists learn from stories. In discourse about science communication, stories are typically presented as a way for scientists to frame, explain, and promote their already-formed ideas; in our article, Basboll and I looked from a different direction, considering how it is that scientists can get useful information from stories. We concluded that stories are a form of model checking, that a good story expresses true information that contradicts some existing model of the world.

Basboll’s above exchange with Clemens is interesting in a different way: Clemens is saying that stories are an effective way to communicate because they are compelling and memorable. Basboll replies that science shouldn’t always be compelling: so much of scientific work is mistakes, false starts, blind alleys, etc., so you want the vulnerabilities of any scientific argument to be clear.

The resolution, I suppose, is to use stories—but not in a way that hides the potential weaknesses of a scientific argument. Instead, harness the power of storytelling to make it easier for readers to spot the flaws.

The point is that there are two dimensions to scientific communication:

1. The medium of expression. Storytelling can be more effective than a dry sequence of hypothesis, data, results, conclusion.

2. The goal of communication. Instead of presenting a wrapped package of perfection, our explanation should have lots of accessible points: readers should be able to pull the strings so the arguments can unravel, if that is possible.

P.S. More on this from Basboll here.

Coursera course on causal inference from Michael Sobel at Columbia

Here’s the description:

This course offers a rigorous mathematical survey of causal inference at the Master’s level. Inferences about causation are of great importance in science, medicine, policy, and business. This course provides an introduction to the statistical literature on causal inference that has emerged in the last 35-40 years and that has revolutionized the way in which statisticians and applied researchers in many disciplines use data to make inferences about causal relationships. We will study methods for collecting data to estimate causal relationships. Students will learn how to distinguish between relationships that are causal and non-causal; this is not always obvious. We shall then study and evaluate the various methods students can use — such as matching, sub-classification on the propensity score, inverse probability of treatment weighting, and machine learning — to estimate a variety of effects — such as the average treatment effect and the effect of treatment on the treated. At the end, we discuss methods for evaluating some of the assumptions we have made, and we offer a look forward to the extensions we take up in the sequel to this course.

Last year Bob Carpenter and I started to put together a Coursera course on Bayesian statistics and Stan, but we ended up deciding we weren’t quite ready to do so. In any case, causal inference is a (justly) popular topic, and I expect that this online version of Michael’s course at Columbia will be good.

John von Neumann (3) vs. Albert Brooks; Paul Erdos advances

We had some good arguments on both sides yesterday.

For Erdos, from Diana Senechal:

From an environmental perspective, Erdos is the better choice; his surname is an adjectival form of the Hungarian erdő, “forest,” whereas “Carson” clearly means “son of a car.” Granted, the son of a car, being rebellious and all, might prove especially attentive to the quality of the air, but we have no evidence of this.

On the other side Stephen Oliver had an excellent practical point:

Johnny Carson, because if Erdos gave a talk it would be overrun by mathematicians trying to get a paper with him.

But I had to call it for Erdos after this innovative argument from Ethan Bolker, who said, “I have a good argument for Erdos but will save it for a later round. If he loses this one you’ll never know . . .” I think you can only use that ploy once ever—but he used it!

Our next bout features two people who changed their own names. In one corner, one of the most brilliant mathematicians of all time, but a bit of a snob who enjoyed hobnobbing with government officials and apparently added “von” to his name to make himself sound more upper-class. In the other corner, a very funny man who goes by “Brooks” because he didn’t feel like going through life with the name Albert Einstein.

From what I’ve read about von Neumann, I find him irritating and a bit of a braggart. But, if we want to go negative, we can get on Brooks’s case for not fulfilling his early comedic promise. So maybe we should be looking for positive things to say about these two guys.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

How post-hoc power calculation is like a shit sandwich

Damn. This story makes me so frustrated I can’t even laugh. I can only cry.

Here’s the background. A few months ago, Aleksi Reito (who sent me the adorable picture above) pointed me to a short article by Yanik Bababekov, Sahael Stapleton, Jessica Mueller, Zhi Fong, and David Chang in Annals of Surgery, “A Proposal to Mitigate the Consequences of Type 2 Error in Surgical Science,” which contained some reasonable ideas but also made a common and important statistical mistake.

I was bothered to see this mistake in an influential publication. Instead of blogging it, this time I decided to write a letter to the journal, which they pretty much published as is.

My letter went like this:

An article recently published in the Annals of Surgery states: “as 80% power is difficult to achieve in surgical studies, we argue that the CONSORT and STROBE guidelines should be modified to include the disclosure of power—even if <80%---with the given sample size and effect size observed in that study”. This would be a bad idea. The problem is that the (estimated) effect size observed in a study is noisy, especially so in the sorts of studies discussed by the authors. Using estimated effect size can give a terrible estimate of power, and in many cases can lead to drastic overestimates of power . . . The problem is well known in the statistical and medical literatures . . . That said, I agree with much of the content of [Bababekov et al.] . . . I appreciate the concerns of [Bababekov et al.] and I agree with their goals and general recommendations, including their conclusion that “we need to begin to convey the uncertainty associated with our studies so that patients and providers can be empowered to make appropriate decisions.” There is just a problem with their recommendation to calculate power using observed effect sizes.

I was surgically precise, focusing on the specific technical error in their paper and separating this from their other recommendations.

And the letter was published, with no hassle! Not at all like my frustrating experience with the American Sociological Review.

So I thought the story was over.

But then my blissful slumber was interrupted when I received another email from Reito, pointing to a response in that same journal by Bababekov and Chang to my letter and others. Bababekov and Chang write:

We are greatly appreciative of the commentaries regarding our recent editorial . . .

So far, so good! But then:

We respectfully disagree that it is wrong to report post hoc power in the surgical literature. We fully understand that P value and post hoc power based on observed effect size are mathematically redundant; however, we would point out that being redundant is not the same as being incorrect. . . . We also respectfully disagree that knowing the power after the fact is not useful in surgical science.

No! My problem is not that their recommended post-hoc power calculations are “mathematically redundant”; my problem is that their recommended calculations will give wrong answers because they are based on extremely noisy estimates of effect size. To put it in statistical terms, their recommended method has bad frequency properties.

I completely agree with the authors that “knowing the power after the fact” can be useful, both in designing future studies and in interpreting existing results. John Carlin and I discuss this in our paper. But the authors’ recommended procedure of taking a noisy estimate and plugging it into a formula does not give us “the power”; it gives us a very noisy estimate of the power. Not the same thing at all.

Here’s an example. Suppose you have 200 patients: 100 treated and 100 control, and post-operative survival is 94 for the treated group and 90 for the controls. Then the raw estimated treatment effect is 0.04 with standard error sqrt(0.94*0.06/100 + 0.90*0.10/100) = 0.04. The estimate is just one s.e. away from zero, hence not statistically significant. And the crudely estimated post-hoc power, using the normal distribution, is approximately 16% (the probability of observing an estimate at least 2 standard errors away from zero, conditional on the true parameter value being 1 standard error away from zero). But that’s a noisy, noisy estimate! Consider that effect sizes consistent with these data could be anywhere from -0.04 to +0.12 (roughly), hence absolute effect sizes could be roughly between 0 and 3 standard errors away from zero, corresponding to power being somewhere between 5% (if the true population effect size happened to be zero) and roughly 85% (if the true effect size were three standard errors from zero). That’s what I call noisy.
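For readers who want to check the arithmetic, here is a minimal sketch that recomputes the quantities in this example under the same crude normal approximation, declaring significance when the estimate is at least 2 standard errors from zero. The function name and the 2-standard-error default are choices made for this sketch, not part of the published letter.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power(true_effect_in_se, z_crit=2.0):
    """P(|estimate| > z_crit s.e.) when the true effect is this many s.e. from zero."""
    upper = 1 - Z.cdf(z_crit - true_effect_in_se)  # significant in the right direction
    lower = Z.cdf(-z_crit - true_effect_in_se)     # significant in the wrong direction
    return upper + lower

# The surgical example: 94/100 treated and 90/100 controls survive.
p_t, p_c, n = 0.94, 0.90, 100
estimate = p_t - p_c
se = sqrt(p_t * (1 - p_t) / n + p_c * (1 - p_c) / n)

post_hoc_power = power(estimate / se)  # plugs in the noisy observed effect

# Power across true effects that are all roughly consistent with the data.
for k in (0, 1, 2, 3):
    print(k, round(power(k), 3))
print(round(se, 4), round(post_hoc_power, 3))
```

The point of the loop is that a single dataset cannot tell these scenarios apart, so the plug-in number is one draw from a very wide range rather than “the power.”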

Here’s an analogy that might help. Suppose someone offers me a shit sandwich. I’m not gonna want to eat it. My problem is not that it’s a sandwich, it’s that it’s filled with shit. Give me a sandwich with something edible inside; then we can talk.

I’m not saying that the approach that Carlin and I recommend—performing design analysis using substantively-based effect size estimates—is trivial to implement. As Bababekov and Chang write in their letter, “it would be difficult to adapt previously reported effect sizes to comparative research involving a surgical innovation that has never been tested.”

Fair enough. It’s not easy, and it requires assumptions. But that’s the way it works: if you want to make a statement about power of a study, you need to make some assumption about effect size. Make your assumption clearly, and go from there. Bababekov and Chang write: “As such, if we want to encourage the reporting of power, then we are obliged to use observed effect size in a post hoc fashion.” No, no, and no. You are not obliged to use a super-noisy estimate. You were allowed to use scientific judgment when performing that power analysis you wrote for your grant proposal, before doing the study, and you’re allowed to use scientific judgment when doing your design analysis, after doing the study.

The whole thing is so frustrating.

Look. I can’t get mad at the authors of this article. They’re doing their best, and they have some good points to make. They’re completely right that authors and researchers should not “misinterpret P > 0.05 to mean comparison groups are equivalent or ‘not different.'” This is an important point that’s not well understood; indeed my colleagues and I recently wrote a whole paper on the topic, actually in the context of a surgical example. Statistics is hard. The authors of this paper are surgeons and health policy researchers, not statisticians. I’m a statistician and I don’t know anything about surgery; no reason to expect these two surgeons to know anything about statistics. But, it’s still frustrating.

P.S. After writing the above post a few months ago, I submitted it (without some features such as the “shit sandwich” line) as a letter to the editor of the journal. To its credit, the journal is publishing the letter. So that’s good.

Johnny Carson (2) vs. Paul Erdos; Babe Didrikson Zaharias advances

OK, our last matchup wasn’t close. Adam Schiff (unseeded in the “people whose name ends in f” category) had the misfortune to go against the juggernaut that was Babe Didrikson Zaharias (seeded #2 in the GOATs category). Committee chair or not, the poor guy never had a chance. As Diana Senechal wrote, “From an existential standpoint, If Schiff won this match, life would be absurd. Perhaps it is, but I still look for interludes of logic and meaning: for instance, right here. Let this battle be such an interlude, and let Babe claim the victory she deserves.”

Next up is Johnny Carson, #2 in the TV personalities category and arguably the best talk-show host ever, against Paul Erdos, one of the weirdest and most prolific mathematicians of all time. I’m guessing that the commenters here will side with Erdos, but I dunno. From everything I’ve read about Erdos, he’s always seemed irritating to me. In some ways, I can relate to the guy: like me, he liked to solve research problems with lots of different collaborators, but there’s something about all those indulgent descriptions of the guy that rubs me the wrong way. In contrast, Johnny Carson is just brilliant. But, in any case, it’s up to you, not me, to give the most compelling arguments on both sides.

Remember, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

This is one offer I can refuse

OK, so this came in the email today:

Dear Contributor,


[978 1 78347 485 1]

Regular price: $455.00

Special Contributor price: $113.75 (plus shipping)

We are pleased to announce the publication of the above title. Due to the limited print run of this collection and the high number of contributing authors, we are unable to offer a complimentary copy. In recognition of your contribution, however, we are delighted to offer you one copy of this title at a discount of 75% off the list price (excluding postage and packing). Please note that these purchases should be for personal use and not for resale.

If you would like to take advantage of this offer, please visit our website at the link below. To receive your 75% discount on one copy of this title enter ( FRANZESE75 ) in the discount code field during checkout.


You can also purchase further copies of this title and other titles from the Elgar list at a 50% author discount.

As a thank you to our authors and contributors, Edward Elgar Publishing offers a 50% discount on all titles. Orders must be prepaid and are for personal use only. To take advantage of this offer at any time, please enter the discount code ‘EEAUTHOR’ on the payment page of our website: Please note only one discount code is allowed per order. Any further questions please feel free to contact us.

With best wishes,

Research Collections Department
Edward Elgar Publishing

Independent Publisher of the Year 2017- Independent Publishers Guild
Academic & Professional Publisher of the Year 2017 & 2014 – Independent Publishers Guild
Digital Publisher of the Year 2015 – Independent Publishers Guild
Independent, Academic, Educational and Professional Publisher of the Year 2014 & 2013 – The Bookseller

Wow, a mere $113.75 (plus shipping), huh? I guess that’s what it takes to be named Digital Publisher of the Year.

Also, I just love it that this extremely-low price of $113.75 excludes “postage and packing.” No free lunches here, no siree!

New blog hosting!

Hi all. We’ve been having some problems with the blog caching, so that people were seeing day-old versions of the posts and comments. We moved to a new host and a new address, and all should be better.

Still a couple glitches, though. Right now it doesn’t seem to be possible to comment. We hope to get that fixed soon (unfortunately it’s Friday evening and I don’t know if anyone’s gonna look at it over the weekend), will let you know when comments work again. Regularly scheduled posts will continue to appear.

Comments work too now!

NYC Meetup Thursday: Under the hood: Stan’s library, language, and algorithms

I (Bob, not Andrew!) will be doing a meetup talk this coming Thursday in New York City. Here’s the link with registration and location and time details (summary: pizza unboxing at 6:30 pm in SoHo):

After summarizing what Stan does, this talk will focus on how Stan is engineered. The talk follows the organization of the Stan software.

Stan math library: differentiable math and stats functions, template metaprograms to manage constants and vectorization, matrix derivatives, and differential equation derivatives.

Stan language: block structure and execution, unconstraining variable transforms and automatic Jacobians, transformed data, parameters, and generated quantities execution.

Stan algorithms: Hamiltonian Monte Carlo and the no-U-turn sampler (NUTS), automatic differentiation variational inference (ADVI).

Stan infrastructure and process: Time permitting, I can also discuss Stan’s developer process, how the code repositories are organized, and the code review and continuous integration process for getting new code into the repository.
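As a taste of the “unconstraining variable transforms and automatic Jacobians” item above, here is a toy sketch in plain Python. It is not Stan code and not how Stan is implemented; the target density (an Exponential(1) on a positive parameter) and all function names are made up for illustration. The idea is that a positive parameter sigma is sampled on the scale u = log(sigma), and the log density picks up a log-Jacobian term so that it still describes the same distribution.

```python
from math import exp

def log_density_constrained(sigma):
    """Hypothetical target: sigma ~ Exponential(1), defined only for sigma > 0."""
    return -sigma

def log_density_unconstrained(u):
    """The same target expressed on u = log(sigma), which ranges over the real line."""
    sigma = exp(u)    # inverse transform back to the constrained scale
    log_jacobian = u  # log |d sigma / d u| = log(exp(u)) = u
    return log_density_constrained(sigma) + log_jacobian

def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule, good enough for a sanity check."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# With the Jacobian term, the density on u integrates to 1 as it should.
total = integrate(lambda u: exp(log_density_unconstrained(u)), -30.0, 5.0)
print(round(total, 4))
```

Dropping the log-Jacobian term would leave a function of u that does not even integrate to a finite value, which is why the adjustment has to be applied automatically for every constrained parameter.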

Becker on Bohm on the important role of stories in science

Tyler Matta writes:

During your talk last week, you spoke about the role of stories in scientific theory. On page 104 of What Is Real: The Unfinished Quest for the Meaning of Quantum Physics, Adam Becker talks about stories and scientific theory in relation to alternative conceptions of quantum theory, particularly between Bohm’s pilot-wave interpretation and Bohr’s Copenhagen interpretation:

The picture of the world that comes along with a physical theory is an important component of that theory. Two theories that are identical in their predictions can have wildly different pictures of the world… and those pictures, in turn, determine a lot about the daily practice of science… The story that comes along with a scientific theory influences the experiments that scientists choose to perform, the way new evidence is evaluated, and ultimately, guides the search for new theories as well.

Anyways, I just wanted to share the passage as I think Becker has done a nice job of connecting the two.

A lot of things came up in my talk, but at the beginning I did discuss how in science we learn from stories. For researchers, stories are not just a way for us to vividly convey our findings to others. Stories also frame our understanding of the world. I discussed the idea of stories being anomalous and immutable (see second link above for more on this); the above Becker quote is interesting in that it captures the importance of story-like structures in our understanding as well as in our communication.

Babe Didrikson Zaharias (2) vs. Adam Schiff; Sid Caesar advances

And our noontime competition continues . . .

We had some good arguments on both sides yesterday.

Jonathan writes:

In my experience, comedians are great when they’re on-stage and morose and unappealing off-stage. Sullivan, on the other hand, was morose and unappealing on-stage, and witty and charming off-stage, or so I’ve heard. This comes down, then, to deciding whether the speaker treats the seminar as a stage or not. I don’t think Sullivan would, because it’s not a “rilly big shew.”

That’s some fancy counterintuitive reasoning: Go with Sullivan because he won’t take it seriously so his pleasant off-stage personality will show up.

On the other hand, Zbicyclist goes with the quip:

Your Show of Shows -> Your Seminar of Seminars.

Render unto Caesar.

I like it. Sid advances.

For our next contest, things get more interesting. In one corner, the greatest female athlete of all time, an all-sport trailblazer. In the other, the chairman of the United States House Permanent Select Committee on Intelligence, who’s been in the news lately for his investigation of Russian involvement in the U.S. election. He knows all sorts of secrets.

If the seminar’s in the statistics department, Babe, no question. For the political science department, it would have to be Adam. But this is a university-wide seminar (inspired by this Latour-fest, remember?), so I think they both have a shot.

MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about

Someone pointed me to this thread where I noticed some issues I’d like to clear up:

David Shor: “MRP itself is like, a 2009-era methodology.”

Nope. The first paper on MRP was from 1997. And, even then, the component pieces were not new: we were just basically combining two existing ideas from survey sampling: regression estimation and small-area estimation. It would be more accurate to call MRP a methodology from the 1990s, or even the 1970s.

Will Cubbison: “that MRP isn’t a magic fix for poor sampling seems rather obvious to me?”

Yep. We need to work on both fronts: better data collection and better post-sampling adjustment. In practice, neither alone will be enough.

David Shor: “2012 seems like a perfect example of how focusing on correcting non-response bias and collecting as much data as you can is going to do better than messing around with MRP.”

There’s a misconception here. “Correcting non-response bias” is not an alternative to MRP; rather, MRP is a method for correcting non-response bias. The whole point of the “multilevel” (more generally, “regularization”) in MRP is that it allows us to adjust for more factors that could drive nonresponse bias. And of course we used MRP in our paper where we showed the importance of adjusting for non-response bias in 2012.

And “collecting as much data as you can” is something you’ll want to do no matter what. Yair used MRP with tons of data to understand the 2018 election. MRP (or, more generally, RRP) is a great way to correct for non-response bias using as much data as you can.

Also, I’m not quite clear what was meant by “messing around” with MRP. MRP is a statistical method. We use it, we don’t “mess around” with it, any more than we “mess around” with any other statistical method. Any method for correcting non-response bias is going to require some “messing around.”

In short, MRP is a method for adjusting for nonresponse bias and data sparsity to get better survey estimates. There are other ways of getting to basically the same answer. It’s important to adjust for as many factors as possible and, if you’re going for small-area estimation with sparse data, that you use good group-level predictors.
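To make the adjustment concrete, here is a minimal sketch of the two steps in Python. Everything in it is an assumption for illustration: the cells, sample sizes, and population counts are made up, and the `partial_pool` function is a crude shrinkage toward the overall mean standing in for a real multilevel regression (which would be fit with proper modeling software and would use group-level predictors).

```python
# Hypothetical survey: older respondents are over-represented in the sample.
# All numbers below are invented for illustration.
sample_means = {"young": 0.60, "old": 0.40}   # raw proportion supporting, by cell
sample_ns = {"young": 20, "old": 80}          # respondents per cell
population = {"young": 500, "old": 500}       # known population counts per cell


def partial_pool(cell_means, cell_ns, prior_strength=5.0):
    """Shrink each cell's raw mean toward the overall sample mean.

    A stand-in for the multilevel-regression step: small cells are pulled
    more strongly toward the overall mean than large cells.
    `prior_strength` is a hypothetical tuning constant, not an estimated one.
    """
    total_n = sum(cell_ns.values())
    grand_mean = sum(cell_means[c] * cell_ns[c] for c in cell_means) / total_n
    return {
        c: (cell_ns[c] * cell_means[c] + prior_strength * grand_mean)
        / (cell_ns[c] + prior_strength)
        for c in cell_means
    }


def poststratify(cell_estimates, population_counts):
    """Average the cell estimates, weighting by population (not sample) counts."""
    total = sum(population_counts.values())
    return sum(cell_estimates[c] * population_counts[c] for c in cell_estimates) / total


# Unadjusted estimate: weights each cell by how many people happened to respond.
raw_estimate = poststratify(sample_means, sample_ns)

# Poststratified estimate using the raw cell means.
ps_estimate = poststratify(sample_means, population)

# Poststratified estimate using the partially pooled cell means.
pooled_estimate = poststratify(partial_pool(sample_means, sample_ns), population)

print(raw_estimate, ps_estimate, pooled_estimate)
```

With only two cells the pooling barely matters. It starts to earn its keep when you adjust for many factors at once and most cells contain only a handful of respondents, which is exactly the sparse-data setting described above.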

MRP is a 1970s-era method that still works. That’s fine. Least squares regression is a 1790s-era method, and it still works too! In both cases, we continue to do research to improve and better understand what we’re doing.

Ed Sullivan (3) vs. Sid Caesar; DJ Jazzy Jeff advances

Yesterday’s battle (Philip Roth vs. DJ Jazzy Jeff) was pretty low-key. It seems that this blog isn’t packed with fans of ethnic literature or hip-hop. Nobody in comments even picked up on my use of the line, “Does anyone know these people? Do they exist or are they spooks?” Isaac gave a good argument in favor of Roth: “Given how often Uncle Phil threw DJ Jazzy Jeff out of the house, it seems like he should win here,” but I’ll have to give it to Jazz, based on Jrc’s comment: “From what I hear, Roth was only like the 14th coolest Jew at Weequahic High School (which, by my math, makes him about the 28th coolest kid there). And we all know DJ Jazzy Jeff was the second coolest kid at Bel-Air Academy.” Good point.

Our next contest features two legendary TV variety show hosts who, at the very least, can tell first-hand stories about Elvis Presley, the Beatles, Mel Brooks, Woody Allen, and many others. Should be fun.

The full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Reproducibility and Stan

Aki prepared these slides which cover a series of topics, starting with notebooks, open code, and reproducibility of code in R and Stan; then simulation-based calibration of algorithms; then model averaging and prediction. Lots to think about here: there are many aspects to reproducible analysis and computation in statistics.

Philip Roth (4) vs. DJ Jazzy Jeff; Jim Thorpe advances

For yesterday’s battle (Jim Thorpe vs. John Oliver), I’ll have to go with Thorpe. We got a couple arguments in Oliver’s favor—we’d get to hear him say “Whot?”, and he’s English—but for Thorpe we heard a lot more, including his uniqueness as greatest athlete of all time, and that we could save money on the helmet if that were required. We also got the following bad reason: “the chance to hear him say, ‘I’ve been asked to advise those of you who are following this talk on social media, whatever that means, to use “octothorpe talktothorpe.”‘” Even that bad reason ain’t so bad, also it’s got 3 levels of quotation nesting, which counts for something right there. What iced it for Thorpe was this comment from Tom: “Seeing as he could do everything better than everyone else, just by giving it a go, he would surely give an incredible seminar.”

And for our next contest, it’s the Bard of Newark vs. a man who’s only in this contest because it was hard for me to think of 8 people whose name ended in f, whose entire fame comes from the decades-old phrase, “Fresh Prince and DJ Jazzy Jeff.” So whaddya want: riffs on Anne Frank and suburban rabbis, or some classic 80s beats? I dunno. I think Roth would be much more entertaining when question time comes along, but he can’t scratch.

Does anyone know these people? Do they exist or are they spooks?

The full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!