
Huge partisan differences in who wants to get vaccinated

Jonathan Falk writes:

This piece by Noah Rothman argues (appropriately hedged with “It’s just one poll, and the breakdown of subsamples to the narrowest possible margins forces us to be cautious when citing the findings,” which is always something good to see) that vaccine hesitancy, which shows a pronounced Republican/Democratic split, may not be a Republican/Democratic split at all, but may simply be a rural/urban split with party affiliation serving as a proxy. But since polls so frequently focus on the party subtabs rather than the geographic subtabs, and since the correlation of these two slices has risen, we run the risk, do we not, of misattributing to Trumpiness what is in fact a population-density driver.

Just as we “learned” in 2016 that we couldn’t understand the Trump vote until we disaggregated by education, maybe we can’t understand vaccine reluctance until we disaggregate by geography.

There’s a polling journalism point here as well, which is that topline numbers are complicated enough to give the public; at best you can disaggregate by one dichotomous variable and then repeat for other slices of the data. Of course, if the correlation is high enough, you can’t ever quantify the true drivers, but at least acknowledging alternative explanations ought to be part of the journalists’ toolkit.

I wasn’t sure what to think about this, so I passed it over to polling expert David Weakliem, whom we last heard from when discussing attitudes toward coronavirus restrictions—it turned out that some prominent pundits were way off on that one.

Regarding the issue discussed above of partisanship or urban/rural splits, Weakliem said:

I couldn’t access the original article (it’s subscribers only) but I found an Axios/Ipsos survey from late Feb. which asked “How likely, if at all, are you to get the first generation COVID-19 vaccine, as soon as it’s available?” The percent who said “not at all likely” (of those who hadn’t already had it):

Republican 40%
Democratic 11%
Indep. 25%
Other 41%

Urban 21%
Suburban 27%
Rural 41%

Breaking it down by both variables:

Urb Suburb Rural
Republican 28% 39% 54%
Dem. 9% 12% 11%
Ind. 23% 24% 27%
Other 33% 40% 57%

So in that survey, partisanship is a much more important factor than urban/rural. Of course, urban/rural can be defined in different ways, but given the size of the partisan divide I don’t think the general conclusion would change (they also had a variable for in/not in an MSA and the results were very similar).

Partisanship is usually the biggest division on things related to covid, so my guess is that the survey Commentary cites is just an outlier. But I agree with the general point about the limitations of relying on tabulations in published reports. Surveys should routinely make the data available for online analysis—although journalists aren’t going to do elaborate analysis, they could at least get used to doing three-way crosstabs.

Falk then replied:

The poll Rothman was talking about is here. The question he talks about appears on page 23, and I do think, glancing at it without any more analysis, that it seems to square with the Axios/Ipsos poll, though I’m not entirely sure what you do with people who tell you they already got vaccinated.

I agree that the poll you cite shows more than just a rural/urban divide, as is easily seen from the fact that Democrats and Independents have nearly identical proclivities across geography. But something is clearly going on among Republicans, isn’t it? After all, the leap from 28% among urban Republicans (within shouting distance of independents) to 54% among rural Republicans is pretty stark.

It’s striking that the urb/suburb/rural variable matters for Republicans but not Democrats or Independents. It also seems to matter a lot for Others, but I guess there aren’t so many Others out there? Weakliem writes, “The other (the label is ‘something else’) group is about 15% of the sample, which is bigger than I’ve seen in other surveys, but I haven’t looked at the codebook to get the exact question. But it does seem that the urban/rural divide matters more among Republicans than among Democrats or independents.”
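
Following up on Weakliem’s suggestion that journalists get used to doing three-way crosstabs: here’s a minimal sketch of what that looks like when the microdata are available. This assumes a respondent-level data file with columns named party, geography, and hesitant; the file name and column names are hypothetical, just for illustration.

import pandas as pd

# hypothetical survey microdata, one row per respondent
df = pd.read_csv("survey.csv")  # assumed columns: party, geography, hesitant (0/1)

# the two-way tabulations that typically get published
print(df.groupby("party")["hesitant"].mean())
print(df.groupby("geography")["hesitant"].mean())

# the three-way view: percent hesitant within each party-by-geography cell
tab = pd.crosstab(df["party"], df["geography"], values=df["hesitant"], aggfunc="mean")
print((100 * tab).round(0))

The point is that the last table is no harder to produce than the first two, once the data are posted for online analysis.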

Answers to your questions about polling and elections.

1. David Callaway writes:

I read elsewhere (Kevin Drum) that the response rate to telephone polling is around 5%. It seems to me that means you are no longer dealing with a random sample; what you have instead is a self-selected pool. I understand that to an extent you can correct a model for data problems, but 5% response? How do you know what to correct for? Take another poll to determine what errors to correct for and how much? Use the time-honored “listen to my gut” method?

Instead of going to a lot of trouble to gin up a random list sized to leave an adequate-sounding sample after 95% hangups, pollsters might instead be honest and just say the current polling methods don’t return usable data, take their ball, and go home.

My reply: I’ve heard that the response rate is closer to 1% than to 5%. As to your question of whether pollsters should just “go home” . . . I don’t think so! This is their business. And, hey, the polls were off by about 2.5 percentage points on average. For most applications, that’s not bad at all. I do feel that we have too many polls, but you have to remember that the main economic motivation for polls is not the horse-race questions. Those are just the way that pollsters get attention. They make their money by asking questions for businesses. And if you’re a business trying to learn what percentage of people have heard of your product or whatever, then an error of 2.5 percentage points is not bad at all.

Also, if pollsters were all gonna quit just because their polls are off by more than their stated margin of error, then they should’ve already quit years ago.
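
To put rough numbers on that last point, here’s a back-of-the-envelope sketch. The sample size of 1000 and the 20-poll average are my assumptions, just for illustration; the point is that the stated margin of error only covers sampling variation, so a 2.5-point average miss implies most of the error is coming from somewhere else (nonresponse, turnout, and so on).

import math

n = 1000                                            # assumed sample size for one poll
p = 0.5                                             # vote share near 50%
moe_one_poll = 1.96 * math.sqrt(p * (1 - p) / n)    # nominal 95% margin of error
moe_20_poll_avg = moe_one_poll / math.sqrt(20)      # nominal MOE for an average of 20 such polls

print(f"nominal MOE, one poll:        {100 * moe_one_poll:.1f} points")     # about 3.1
print(f"nominal MOE, 20-poll average: {100 * moe_20_poll_avg:.1f} points")  # about 0.7
# Compare with the roughly 2.5-point average error of recent poll averages:
# most of that error is nonsampling error, which the stated margin doesn't cover.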

2. Another person writes:

I respectfully suggest that you owe it to readers of The Economist (and others) to comment on how you figure it is that your forecasting model erred so badly re the electoral count and the popular vote for the presidency as well.

For a quick answer, we forecast Biden to receive between 259 and 415 electoral votes, and it seems that he’ll end up with about 300, so I wouldn’t say the model “erred so badly.” Similarly for the popular vote. But there were some issues, which we discuss here and here. I do think our model had problems, but there’s a big difference between having problems and “erred so badly.” If you think that what happened was “erred so badly,” I think the problem is that your expectation is too high: the point of the wide uncertainty interval is that our model can’t make a precise prediction.

3. Ricardo Vieira asks what I think about this post, which stated:

Over this and past USA presidential elections my memory says that many states have been won by <<1% between the major parties. I've heard it suggested that this is due to parties modifying their platforms to appeal to just enough voters to win in relevant swing states. It makes sense that they try to rebalance this way, but as a mechanism for the near perfect splits we often observe it is insufficient. If any party had the techniques to know where 50% appeal is, pollsters would too, so we'd also have much more accurate polls. And the "omniscience party" would surely give themselves enough cushion to rarely ever lose. The alternate explanation is chance, i.e.: the results of each state's election are a random sampling over the distribution of possible % splits; in our age the mean of that distribution happens to have shifted to be near 50%; and the "swing states" are those that fall very close to the mean and are thus often decided by <<1%. That's fine in principle, but some of the margins are mind-bogglingly small. The most notable example is Florida in 2000 (wikipedia):

After an intense recount process and the United States Supreme Court’s decision in Bush v. Gore, Bush won Florida’s electoral votes by a margin of only 537 votes out of almost six million cast and, as a result, became the president-elect.

Georgia this year is currently <1500 votes difference (<.1%) with >99% votes counted. PA probably won’t be that close in the end but it’s still gonna be pretty damn close, within a few tenths of a point.

Having grown up in this era and never having been a student of political science, how normal is this? Can anyone link a study where folks have looked at the distributions and shown how likely it is for a vote to be decided by 1k or less margins this frequently over 50 states?

Actually Gore won Florida by 30,000 votes, or he would have if all the votes had been counted. Getting to the larger question: I haven’t looked at the data carefully, but, yes, I think the explanation for the occasional very close election in a state is just that there are a fair number of states that could end up close, so every once in a while you get something very close.
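
Here’s a minimal simulation sketch of that last point. The numbers (15 potentially close states, state outcomes spread around 50% with a 3-point standard deviation, turnout of a few million per state) are my assumptions, just to show how often a sub-1000-vote margin turns up by chance:

import numpy as np

rng = np.random.default_rng(0)
n_sims = 10_000
n_close_states = 15        # assumed number of states that could plausibly end up close
sd = 0.03                  # assumed sd of the Democratic two-party share around 50%
turnout = 3_000_000        # assumed votes cast per state

shares = rng.normal(0.50, sd, size=(n_sims, n_close_states))
votes_apart = np.abs(shares - 0.50) * 2 * turnout   # two-party margin in votes

p_any_razor_thin = np.mean((votes_apart < 1000).any(axis=1))
print(f"P(at least one state decided by under 1000 votes): {p_any_razor_thin:.2f}")

Under these made-up settings you get a razor-thin state every decade or two of presidential elections, which doesn’t seem far off from what we’ve observed.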

We’ve had enough election posts for now so I’ll put this one on lag. Once it appears, maybe no one will care. I hope in the future we can move to a system where all votes get counted, but that’s probably too much to hope for. Recently it seems we’ve moved toward a position where whether the votes should be counted is itself subject to partisan debate.

Postdoctoral opportunity with Sarah Cowan and Jennifer Hill: causal inference for Universal Basic Income (UBI)

See below from Sarah Cowan:

I write to announce the launch of the Cash Transfer Lab. Our mission is to build an evidence base regarding cash transfer policies like a Universal Basic Income. We answer the fundamental questions of how a Universal Basic Income policy would transform American families, communities and economies. The first major initiative is to study the effects of the permanent fund dividend (PFD), a policy in Alaska that has paid every Alaskan resident a substantial amount of cash annually since 1982. This is the closest policy in the world to a Universal Basic Income.
You can read about the Cash Transfer Lab here.
We are hiring a postdoctoral associate to work on causal inference modeling. This scholar will work primarily with me and Jennifer Hill. Details on the position and how to apply are here. We are looking for someone who has expertise in causal inference and has earned a PhD or will by September 1, 2021.
Please spread the word!

The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time

Kevin Lewis points us to this article by Joachim Vosgerau, Uri Simonsohn, Leif Nelson, and Joseph Simmons, which begins:

Several researchers have relied on, or advocated for, internal meta-analysis, which involves statistically aggregating multiple studies in a paper . . . Here we show that the validity of internal meta-analysis rests on the assumption that no studies or analyses were selectively reported. That is, the technique is only valid if (a) all conducted studies were included (i.e., an empty file drawer), and (b) for each included study, exactly one analysis was attempted (i.e., there was no p-hacking).

This is all fine, and it’s consistent with the general principle that statistical analysis must take into account data collection, in particular that you should condition on all information involved in measurement and selection of observed data (see chapter 8 of BDA3, or chapter 7 of the earlier editions, for derivation and explanation from a Bayesian perspective).

I just want to point out one little thing.

This bit is wrong:

“exactly one analysis was attempted (i.e., there was no p-hacking)”

There is still a problem even if only one analysis was performed on the given data. What is required is that the analysis would have been done the same way, had the data been different (i.e., there were no forking paths). As Eric Loken and I put it, multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.
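
To make the distinction concrete, here’s a minimal simulation sketch (entirely hypothetical, not from the Vosgerau et al. paper): each simulated researcher runs exactly one test per dataset, but which test gets run depends on what the data look like, and the type 1 error rate inflates anyway.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 10_000, 50
rejections = 0

for _ in range(n_sims):
    y = rng.normal(0, 1, size=(n, 2))   # two outcomes measured; both true effects are zero
    # The forking path: only one test is run, but which outcome gets tested
    # depends on the data (the researcher picks whichever looks more promising).
    chosen = y[:, np.argmax(np.abs(y.mean(axis=0)))]
    if stats.ttest_1samp(chosen, 0).pvalue < 0.05:
        rejections += 1

print(f"type 1 error rate with one data-dependent test per dataset: {rejections / n_sims:.3f}")
# Comes out well above the nominal 0.05, even though each dataset got exactly one analysis.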

Vosgerau et al. clarify this point in the last sentence of their abstract, where they emphasize that “preregistrations would have to be followed in all essential aspects”—so I know they understand the above point about forking paths. I just wouldn’t want people to read the first part and mistakenly think that, because they did only one analysis on their data, they’re not “p-hacking” and so they have nothing to worry about.

How much granularity do you need in your Mister P?

Matt Kosko writes:

I had a question for you about the appropriate number of groups in an MRP model. I’m currently working on streamlining some of the code we use to estimate state-level political opinions from our surveys. I have state-level predictors and Census data for poststratification (i.e., population totals in each age-sex-state-education cell), but I’ve also found some information about Congressional districts.

My question is, is there anything gained by including Congressional districts as another group if we just want to get state-level estimates? I know having cell populations with Congressional districts as another group is necessary for poststratifying and getting district-level opinions (which don’t seem to be very interesting for presidential elections), but does it do anything to increase the efficiency of any of the parameter estimates or is there another benefit?

My reply: I guess that including information at the congressional district level won’t really help you with state-level inferences—but it could. The way that CD-level info could make a difference is if there is nonresponse that varies by CD and is correlated with political outcomes, beyond whatever variables you’re already adjusting for in your MRP analysis. For example, suppose your survey undersamples rural whites, and you did not adjust for urban/rural/suburban in your model. Then it could be that including CD will fix some of this. In such a case I think a better solution would be to include urban/rural/suburban, as adjusting for CD is kind of a crude tool. So my recommendation is to start by thinking carefully about including relevant variables in your MRP model, and also to include relevant state-level predictors.
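
To make the poststratification step concrete, here’s a minimal sketch, assuming you already have cell-level predictions from the multilevel regression and Census counts for each age-sex-state-education cell. The file name and column names are hypothetical:

import pandas as pd

# hypothetical poststratification table: one row per age-sex-state-education cell,
# with the model's predicted outcome probability ("pred") and the Census count ("N")
cells = pd.read_csv("poststrat_cells.csv")

# poststratify: population-weighted average of the cell predictions within each state
state_est = (
    cells.assign(weighted=cells["pred"] * cells["N"])
         .groupby("state")
         .apply(lambda g: g["weighted"].sum() / g["N"].sum())
)
print(state_est.head())

Adding congressional district as another grouping just means building the cell table at the age-sex-state-education-district level and doing the same weighted average; whether it helps depends on whether it changes the cell predictions, as discussed above.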

PhD student and postdoc positions in Norway for doing Bayesian causal inference using Stan!

Guido Biele writes:

I have two positions for a postdoc and PhD student open in a project where we will use observational data from Norwegian National registries, structural models (or the potential outcomes framework, the main thing is that we want to think systematically about identification), and Bayesian estimation in Stan to estimate causal effects of treatments for ADHD on school performance.

We are looking especially for candidates with a background in causal inference or more broadly statistics and probabilistic modeling. We do not require domain knowledge about ADHD and its treatment, which can be acquired on the job.

The positions are in Oslo, at the Norwegian Institute of Public Health. Norwegian language skills are not required for the position.

Here is more information about the project: https://gbiele.github.io/project/nfrsea/
And here is a link to the application form: https://945000.webcruiter.no/Main/Recruit/Public/4320447363?language=en&link_source_id=0

I think the project is especially interesting for people interested in applied statistics, and those who want to do Bayesian statistical modeling with large datasets.

Application deadline is already March 23rd.

Cool! If you’re interested in Bayesian causal inference, I recommend you start by working through this case study, “Model-based Inference for Causal Effects in Completely Randomized Experiments,” by Joon-Ho Lee, Avi Feller, and Sophia Rabe-Hesketh.

Webinar: On Bayesian workflow

This post is by Eric.

This Wednesday, at 12 pm ET, Aki Vehtari is stopping by to talk to us about Bayesian workflow. You can register here.

Abstract

We will discuss some parts of the Bayesian workflow with a focus on the need and justification for an iterative process. The talk is partly based on a review paper by Gelman, Vehtari, Simpson, Margossian, Carpenter, Yao, Kennedy, Gabry, Bürkner, and Modrák with the following abstract: “The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using these models, along with many remaining challenges in computation. Using Bayesian inference to solve real-world problems requires not only statistical skills, subject matter knowledge, and programming, but also awareness of the decisions made in the process of data analysis. All of these aspects can be understood as part of a tangled workflow of applied Bayesian statistics. Beyond inference, the workflow also includes iterative model building, model checking, validation and troubleshooting of computational problems, model understanding, and model comparison. We review all these aspects of workflow in the context of several examples, keeping in mind that applied research can involve fitting many models for any given problem, even if only a subset of them are relevant once the analysis is over.” The pre-print is available here.

A Bayesian state-space model for the German federal election 2021 with Stan

I didn’t do anything on this, just stood still and listened while others talked. I’ll share the whole thread with you, just to give you a sense of how these research conversations go.

This post is for you if:

– You’re interested in MRP, or

– You’re interested in German elections, or

– You want to overhear this particular email exchange.

It started when I received a message from Marcus Groß:

We published a project on forecasting the German federal election outcome using a Bayesian state-space model. Similar to your forecast for the US presidential elections 2020, the model is written in Stan and freely available on github. A webpage is here (in German, English version follows soon; you may use google webpage translate in the meantime).

The model consists of two parts:

– A model for the election outcome, i.e. the vote share for each party based on poll data.

– A model to assign a probability to the coalition of the actual government using expert survey data. For a given election outcome it is not certain which parties are going to be part of the government coalition, as in most cases multiple options are conceivable and this is subject to negotiations between parties.

On the first part:

I think that everybody concerned with this topic agrees that accounting for important sources of biases and uncertainties is key to deriving sound estimates of election outcomes. The model for the German federal election accounts for the following sources:

a) Uncertainty about future events and their correlations between parties, e.g. an event positive for party A might have a strongly negative effect for party B, but much less so for party C.

b) Long-term and short term bias (“house effects”) of a certain pollster for a certain party, i.e. a certain pollster might generally favour a specific party (long-term) whereas the size of this bias might fluctuate over time (short-term).

c) Common pollster bias for specific parties and correlations between parties. The entirety of all pollsters might be wrong / biased. Following an election the pollsters will likely adjust their poll schemes and correction methods to account for it. Afterwards the bias magnitude will slowly creep up again.

d) Polling error due to finite sample size.

The basic model idea follows a Bayesian state-space model, such as Simon Jackman’s model for the Australian election. However, a major change is that the latent voter intention is not modeled by a random walk, but rather uses a “long-short term memory” extension. The concept behind this is that a specific event that influences the vote share of a specific party (scandals, nomination of candidates, . . .) is first spread through the media, which causes an overreaction or overshoot (short-term memory) with a maximum effect at around four weeks. Medium to long term, however, we see that this effect declines somewhat as other topics are covered by the media and voters forget about the event in question. As the remaining long-term effect is smaller than the initial one, we see a regression-to-the-mean behaviour. This is modeled by a time-series model structure resembling a mixture of autoregressive processes. This concept can also be confirmed empirically and improves the forecast quite a bit (between 5% and 15%, depending on the forecast horizon).

On the second part:

Here, we asked experts for their opinions on coalition preferences between parties. They had to rank coalitions under the premise that the parties were free to choose, independently of the election result. (Imagine a situation where party representatives are locked into a room to form coalitions. Who would find each other first, second, and so on . . .) Thus, these rankings are orthogonal to the election outcome or vote share simulations, which makes it quite straightforward (under certain assumptions) to derive probabilities for final government coalitions.

The model description is here and the Github link is here.
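
As an aside, here’s a minimal sketch of the overshoot-and-decay dynamic Marcus describes, simulated here with simple exponential impulse-response curves rather than his actual mixture-of-autoregressive-processes formulation. All parameter values (peak at roughly four weeks, partial regression to the mean) are my own choices for illustration, not the values in his model:

import numpy as np

days = np.arange(0, 200)                 # days since some event
tau_rise, tau_decay = 14.0, 70.0         # assumed time scales: media build-up vs. fading attention
peak_size, long_term_size = 0.03, 0.01   # assumed overshoot and persistent effect on vote share

overshoot = np.exp(-days / tau_decay) - np.exp(-days / tau_rise)
overshoot = peak_size * overshoot / overshoot.max()            # short-term memory: rises, then fades
persistent = long_term_size * (1 - np.exp(-days / tau_rise))   # long-term memory: builds and stays
effect = overshoot + persistent

print(f"peak effect of {effect.max():.3f} at day {days[effect.argmax()]}")
print(f"effect at day 199: {effect[-1]:.3f}   (regressing toward {long_term_size})")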

I sent this to Merlin Heidemanns who worked with me to set up the Economist model and is also German. I guess Germany is more difficult to model because you have more than two major parties.

Merlin had a question for Marcus:

Something I found weird looking at the website. You say that Jamaica has a 99.8% chance of receiving a majority of the vote share but that the coalition will be CDU/CSU/Greens with 63%? Do the experts make their prediction based on the model probabilities? The probabilities for the different coalitions don’t necessarily seem congruent with each other. Maybe I am missing something.

Marcus replied:

This is because different questions are answered. Yes, Jamaica has an estimated chance of 99.8% of getting a majority, and CDU/CSU/Greens of 83.5% (see the bottom of the web page). These numbers are not influenced by the expert rankings and are derived from the “first part” of the model that uses only poll data. The 63% for CDU/CSU/Greens in the first graph, however, does not answer the question of getting a majority, but rather gives the probability of being in charge after the elections (i.e., constituting the government coalition). These probabilities therefore sum up to 100%! To derive these probabilities, the expert rankings are combined with the election outcome simulation results (see the last section of the notebook.pdf for a detailed explanation).

Merlin then had further thoughts:

1. I had thought about the German elections for a while and have so far thought that some MRP-based approach, as YouGov did it, would probably be better than a model based only on essentially the second vote choice. Also with regard to the Ueberhangsmandate.

I like the model (and that it’s public)! Some things I would be thinking about:

2. How predictive are polling errors across elections? I have recently begun to think that a mean-zero assumption for polling errors is not sensible. I obviously am unaware of how the situation is in Germany, but at least here in the US the year-to-year correlation is between 0.4 and 0.6. Maybe one can do better by predicting polling errors for the upcoming election rather than sampling them anew.

3. Similarly, I’ve been concerned about house effects that assume that the average is equal to 0. I would look at patterns in the polling errors by pollster, decompose the polling error year by year into an average error and a polling-house-specific error, and see whether there is a pattern that one could use to derive priors for the next election cycle from the previous one.

4. I am not a fan of fixed hyperparameters for the short- and mid-term decay. I’d either put priors on them or try to estimate them with auxiliary data, though my fear is that the data you fit will affect them and cause some undesirable behavior that inflates them as the model tries to fit random excess variation in the polls.

5. Both epsilon and pollError can probably be decomposed with the standard normal Cholesky decomposition, and that could be faster? A week is a lot to fit a model like this.

6. What do you gain from estimating the polling error for every election week? Is the assumption that polling error is larger or smaller at different points in the election season?

Marcus replied:

1. You mean using polls at the state/county level in an MRP model? At the state level, the poll data is quite sparse in Germany, so we used only the second-vote-choice poll data. But yes, this might be something to look at. With the Ueberhangsmandate, I think there won’t be a problem, because there was a change in the voting system some years ago that guarantees that the second-vote shares and the Bundestag seat shares are the same (ignoring discretization).

2. That might actually be worth looking into, as the errors are currently assumed to be uncorrelated within the model. My assumption was that the pollsters learn after each election and try to recalibrate and correct their polling methods. At first glance, there does not seem to be much correlation, but I’ll look into that in more detail.

3. I think I would run into an identification problem otherwise. The total bias, i.e., the difference between the published poll number for a certain party from a specific pollster and the true value, is decomposed into:
total_bias_party_pollster_election = poll_error_party_election + house_bias_party_pollster + house_bias_party_pollster_election. So the house bias of a specific pollster for a party has two parts: the long-term house bias (“housebias” parameter) and the short-term house bias (“housebiasEl” parameter). A specific pollster can have a house effect != 0 for a specific party, but the average over all pollsters is 0 (otherwise it would not be possible to differentiate it from the poll error and the short-term house effect).

4. I tried that as well, but the model got much less stable. I think, however, that fixing them is okay, as the extent of the decay is estimated by the model; only the half-time parameters are fixed. (I see that as somewhat similar to fixed knots in a spline model. I also tried to introduce a long-term decay, but this didn’t change much.)

5. The multi_normal_cholesky statement is currently used, actually, or do you mean some other formulation? I tried a lot of other model formulations in Stan, but the current one was the fastest I could do. The model takes only about 18 hours to fit currently, not a week; the web page is updated every day. (Maybe you took a look at the poster? It shows an outdated version of the model.)

6. The poll error is only estimated per election and party, not by week (see line 54 in https://github.com/INWTlab/lsTerm-election-forecast/blob/master/stan_models/lsModelMulti.stan). This was also changed some months ago (the info on the poster is outdated).

Merlin responded:

That clarifies a lot. Thanks!

YouGov essentially did a very large poll and fit an MRP model, poststratifying on census variables to get win probabilities for each electoral district and the second vote choice. I am not aware of how frequent vote splitting is or how voters are generally distributed in Germany based on party identification, so I am not certain whether or not that approach is comparatively better.

Yes, I looked at the poster. 18 hours is a lot but if you just have it run on a server somewhere I guess it doesn’t matter.

We used the non-centered parameterization for our multivariate random variables like so

mu_b[:,t] = cholesky_cov_mu_b * raw_mu_b[:, t] + mu_b[:, t - 1]

That could help speed things up (or not), but if you haven’t already, it might be worth trying.

On whether pollsters correct after each election, it’s hard to say. Do you have graphs of pollster-party level error by election?

And that makes sense on the stability issue when not fixing those decay parameters. It’s probably also related to identifiability.

And then Marcus replied:

To fit an electoral-district-level model, it is not easy to get poll data, and I don’t think it would make much difference at the federal level. The reason is that the first vote hasn’t had any influence on the relative seat distribution in the parliament since 2011 (https://de.wikipedia.org/wiki/%C3%9Cberhangmandat#Reform_wegen_Verfassungswidrigkeit), when so-called Ausgleichsmandate were introduced. Thus for the seat distribution in the parliament we effectively have a popular vote (only the second vote counts). Überhangmandate still exist, but the new Ausgleichsmandate increase the parliament size such that the seat distribution is identical to the second-vote result. Still, getting estimates at the electoral-district level, including winning probabilities of the candidates, would be very nice.

I tried that non-centered parameterization, but it didn’t matter performance-wise in this case. These kinds of hacks do unfortunately make the code less readable (e.g., the “epsilon” variable is also multivariate-t, which is anything but obvious), one of the drawbacks of Stan. However, I’m really happy that the model runs at all! Stan really was the only thing that worked. State-space models with this level of convolution are notoriously hard to fit.

On the correlations of poll errors, maybe a short explanation would help:

For a given party:

– There is a positive correlation for errors of a specific pollster as the (long-term) house effect parameter is shared (“housebias” variable), even more so if the polls are from the same election cycle. In this case the election specific (short-term) house effect parameter (“housebiasEl” variable) is shared as well.

– There is also a positive correlation for errors between different pollsters for a given election cycle, as they share the common pollster bias variable (“pollError” in the Stan model, which expresses a global bias of all pollsters within an election cycle).

The assumed pollster correction after an election is to be interpreted globally, i.e., the common pollster bias is reset. For example, in the 2017 election all pollsters overestimated the CDU/CSU vote share. As of now, the model doesn’t assume any persistence of this kind of bias, i.e., for the 2021 election the model draws this bias from a (multi-)normal with mean zero. However, I’ll look into that next week; maybe there is some room for improvement.

That’s where it is so far. You can follow the links at the top of the post to see Marcus’s model, data, and documentation.
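
P.S. For readers who haven’t seen the non-centered parameterization trick Merlin mentions, here’s a minimal NumPy sketch of the idea (the variable names echo his snippet, but all the numbers are made up): instead of drawing mu_b[:,t] directly from a multivariate normal centered at mu_b[:,t-1], you draw standard-normal “raw” values and shift and scale them, which gives the same distribution but is often easier for the sampler.

import numpy as np

rng = np.random.default_rng(2)
K = 3                                          # number of parties, say
Sigma = 0.01 * np.array([[1.0, 0.5, 0.2],
                         [0.5, 1.0, 0.3],
                         [0.2, 0.3, 1.0]])     # made-up innovation covariance
L = np.linalg.cholesky(Sigma)
mu_prev = np.array([0.35, 0.25, 0.15])         # made-up latent vote shares at time t-1

# centered: draw mu_t directly from a multivariate normal around mu_{t-1}
centered = rng.multivariate_normal(mu_prev, Sigma, size=100_000)

# non-centered: draw standard normals ("raw"), then shift and scale
raw = rng.standard_normal((100_000, K))
non_centered = raw @ L.T + mu_prev

print(np.cov(centered, rowvar=False).round(4))
print(np.cov(non_centered, rowvar=False).round(4))   # same covariance, up to simulation noise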

The 5-sigma rule in physics

Eliot Johnson writes:

You’ve devoted quite a few blog posts to challenging orthodox views regarding statistical significance. If there’s been discussion of this as it relates to the 5-sigma rule in physics, then I’ve missed that thread. If not, why not open up a critical discussion about it?

Here’s a link to one blog post about 5-sigma.

My reply: Physics is an interesting realm to consider for hypothesis testing, because it’s one area where the null hypothesis really might be true, or at least true to several decimal places. On the other hand, with experimental data there will always be measurement error, and your measurement error model will be imperfect.

It’s hard for me to imagine a world in which it makes sense to identify 5 sigma as a “discovery”—but maybe that just indicates the poverty of my imagination!

In all seriousness, I guess I’d have to look more carefully at some particular example. Maybe some physicist could help on this one. My intuition would be that in any problem for which we might want to use such a threshold, we’d be better off fitting a hierarchical model.
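
For reference, here’s the arithmetic behind the threshold, a quick sketch of what 5 sigma corresponds to as a tail probability under a normal model:

from scipy import stats

one_sided = stats.norm.sf(5)                           # P(Z > 5) for a standard normal
print(f"one-sided tail probability: {one_sided:.2e}")  # about 2.9e-07
print(f"two-sided: {2 * one_sided:.2e}")               # about 5.7e-07, roughly 1 in 1.7 million

Of course, that calculation only covers sampling variation; it says nothing about the imperfections in the measurement-error model mentioned above.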

No, I don’t like talk of false positive false negative etc but it can still be useful to warn people about systematic biases in meta-analysis

Simon Gates writes:

Something published recently that you might consider blogging: a truly terrible article in Lancet Oncology.

It raises the issue of interpreting trials of similar agents and the issue of multiplicity. However, it takes a “dichotomaniac” view and so is only concerned about whether results are “significant” (=”positive”) or not, and suggests applying Bonferroni-type multiplicity adjustments to them. I found it amazing that such an approach was being seriously suggested in 2019 – it’s like decades of systematic reviews, the Cochrane Collaboration and so on never happened.

The first author has a podcast called Plenary Session that is widely listened to by doctors, so he’s quite influential. From what I’ve heard it’s usually pretty good, and raises some really important points about research and clinical trials. But I think he’s got it badly wrong here. The consequences concern me; this is in one of the top journals in oncology; it’s read by doctors treating people for very serious conditions, who make life and death decisions, so it’s really important to get these issues right and not mislead ourselves. I’ve written a letter to the journal (just heard it was accepted this morning) but I don’t think they really have any impact. Most people who read the paper won’t see the letter.

I guess most people who read the paper won’t see this blog either, but here goes . . .

I followed the link and I didn’t think the paper in question was so terrible. I mean, sure, the method they propose is not something that I would ever recommend—but it seemed to me that the main purpose of the paper was not so much to recommend Bonferroni-type multiplicity adjustments (a bad idea, I agree) but rather to warn people not to take published significance tests and confidence intervals seriously when there are forking paths in the data processing and analysis.

So, yes, I disagree (see here and here) with their statement, “The rare so-called positive trial within a sea of negative studies is more likely a false positive than a true positive.”

And I disagree with their statement that “we need to correct for the portfolio of trials not within a single pharmaceutical company but across all companies”—I think it’s good to analyze the whole portfolio of trials, not to skim the statistically significant results and then try to use statistical methods to try to estimate the rest of the iceberg.

But I agree with their conclusion:

To deliver high-quality cancer care in a sustainable health ecosystem, clinicians, investigators, and policy makers will need to identify therapies that offer benefits that are substantial in magnitude and not statistical artifacts.

So I would not call this a “truly terrible article.” I’d say that it’s a reasonable article that is imperfect in that it is operating under an outdated statistical framework (unfortunately a framework that remains dominant in theoretical and applied statistics!). But, as a whole, it still seems reasonable for doctors to read this sort of paper.

P.S. I wrote this post in part to avoid selection bias. I get lots of emails from people pointing out things they read and liked, and I share these with you. And I get lots of emails from people pointing out things they read and hated, and I share those too. So when I get such emails and I disagree with the reader’s assessment, I should let you know that too!

The social sciences are useless. So why do we study them? Here’s a good reason:

Back when I taught at Berkeley, you could always get a reaction from the students by invoking Stanford. The funny thing is, though, if you’re at Stanford and mention Berkeley, nobody cares. You have to bring up Harvard to get a reaction.

Similarly, MIT students have a chip on their shoulder about Harvard, but Harvard students don’t really think much about MIT. There’s an asymmetry. Harvard and Yale, that’s more symmetric.

This came up the other day in conversation with a biologist who was saying that, when they consider the scientific method, they think of physics as the gold standard. But when we do social science, we often look up to biology.

The thing about social science is that it hasn’t produced much. We social scientists don’t have an inferiority complex; we really are inferior. Physics has produced locomotives, semiconductors, and the atomic bomb. Chemistry has produced amazing new materials. Biology has produced the coronavirus vaccine, and lots more. Social science has produced . . . what, exactly? A method of evaluating redistricting plans? Better polling? The Big Five? The Implicit Association Test? A better auction rule? Some cool marketing tricks? The past two hundred years of social science have given us nothing as useful and important as what gets produced every day in biology, chemistry, and physics.

But then the question arises: What’s the point of social science? Why do we do it at all? That’s a good question for me, given that I teach in the political science department and write papers on districting, voting power, social penumbras, gaydar, and all the rest.

Here’s my answer. We study the natural sciences because they help us understand the natural world and they also solve problems, from vaccines to the building of bridges to more efficient food production. We study the social sciences because they help us understand the social world and because, whatever we do, people will engage in social-science reasoning.

The baseball analyst Bill James once said that the alternative to good statistics is not no statistics, it’s bad statistics. Similarly, the alternative to good social science is not no social science, it’s bad social science.

The reason we do social science is because bad social science is being promulgated 24/7, all year long, all over the world. And bad social science can do damage.

In summary: the utilitarian motivation for the natural sciences is that they can make us healthier, happier, and more comfortable. The utilitarian motivation for the social sciences is that they can protect us from bad social-science reasoning. It’s a lesser thing, but that’s what we’ve got, and it’s not nothing.

Regression discontinuity analysis is often a disaster. So what should you do instead? Here’s my recommendation:

Summary

If you have an observational study with outcome y, treatment variable z, and pre-treatment predictors X, and treatment assignment depends only on X, then you can estimate the average causal effect by regressing y on z and X and looking at the coefficient of z. If there is lack of complete overlap in X between the treatment and control groups, then your inference can be highly sensitive to the form of the model for E(y|z,X) as a function of X.

A special case is discontinuity analysis, where the treatment assignment depends entirely on one of the pre-treatment variables, call it x, with z=1 or 0 when x is above or below some threshold. Here, when running your regression of y on z, you’ll definitely want to include this “running variable” x in your pre-treatment predictors—but in general you’ll also want to adjust for other X variables. Just because the treatment assignment depends on x, this doesn’t ensure overlap and balance across other variables in X. The other thing is that there’s no overlap on x, so your inference is sensitive to the functional form of how x enters the regression model. That’s just the way it is. Deciding to use a local regression or a polynomial or whatever doesn’t resolve this problem; these models are nothing more than tools that allow you to try to construct a reasonable fit, and if the fit is unreasonable, there’s no reason to trust the result. In some settings, you can fit your regression discontinuity analysis only adjusting for x and no other variables in X, but that’s only in the special case where x is a really important predictor and you can assume something close to balance on all the other pre-treatment variables, for example if y is post-test score and x is the score on a highly predictive pre-test. This is as with any observational study: If you have a really good pre-treatment predictor, you might be able to get away with just adjusting for that and nothing else, but this is not a general principle. In general you need to be concerned with balance on all pre-treatment predictors, and when there’s lack of overlap, the form of the regression function can be important.

Considering many of the bad regression discontinuity analyses we’ve looked at in recent years, some common features are:

– The running variable x is not a strong predictor of the outcome;

– The fitted functional form for E(y|z,x) lacks face validity;

– The analysis does not always adjust for other pre-treatment variables (what I’m calling the rest of X);

– The people who did the analysis think they’re doing everything right, so they don’t question the results.

The point of this post is (a) to talk about how to do a better analysis using the general perspective of observational studies, and also (b) to free people from thinking that the simplistic regression discontinuity (in which only x is adjusted for, and in which there’s no concern about the fitted functional form of the regression) is the right thing to do. I’m hoping that when released of that attitude, researchers can be liberated to do better analyses.

All of this is separate from the concerns of forking paths and summaries based on statistical significance. These topics are also important, and they also come up with regression discontinuity analysis, but I won’t be discussing them today.

Background

The other day I was speaking with some economics students and we were discussing problems with regression discontinuity analyses. For background see here, here, here, here, here, here, here, here, here, and here. One interesting thing about these examples is that the analyses are obviously wrong, to the extent that the students are surprised they were ever taken seriously—but yet these examples keep on coming.

The purpose of today’s talk is not to explain what went wrong in all those analyses—you can see the above links for that—but rather to outline the analysis I’d recommend instead.

The trick is to take the good part of the regression discontinuity design but not the bad part.

The good part is that you have a natural experiment: everyone with x above some threshold was exposed, everyone with x below that threshold was unexposed. So no need to worry about selection bias in the way that it is often a concern with observational studies.

The bad part is the idea that you’re supposed to model y given x and the discontinuity and nothing else: y_i = a + theta*z_i + f(x_i, phi) + error, where theta is the treatment effect, z_i is the treatment variable (1 if exposed, 0 if not), x is the running variable, and phi is the vector of parameters governing E(y|x) in the absence of any treatment.

There’s lots of focus on what functional form to use in the above expression, and Guido and I have contributed to this discussion, but really the problem is not with any particular family of curves but rather with the idea that you’re only supposed to adjust for the running variable x and nothing else. That’s the mistake right there.

My advice

So here’s how I recommend attacking the problem of causal inference in a discontinuity design:

1. It’s an observational study. You’re comparing outcomes for exposed and unexposed units, and you want to adjust for pre-treatment differences between the two groups.

2. It’s a natural experiment. The treatment assignment only depends on x. That’s great news! But you still need to adjust for pre-treatment differences between the two groups.

3. Adjusting for a functional form f(x, phi) does not in general adjust for pre-treatment differences between the two groups. It adjusts for differences in x but not for anything else.

4. It makes sense to adjust for x and to fit a reasonable smooth function to do this. The treatment and control groups have zero overlap on x, so you want to think hard about how to do this adjustment. “Think hard” includes using an appropriate functional form and also looking at the fit to see if it makes sense.

The punch line is: Adjust for x and also adjust for other relevant pre-treatment variables. It’s an observational study! No reason to expect balance for pre-treatment characteristics that don’t happen to be captured by the running variable.
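
Here’s a minimal simulated example of that punch line, with made-up numbers: another pre-treatment variable u happens to differ across the threshold (think of geography in the north-of-the-river example below). Adjusting for the running variable alone leaves a biased estimate of the treatment effect; adding the other pre-treatment predictor fixes it.

import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, n)                 # running variable
z = (x > 0).astype(float)                 # treatment assigned at the threshold
u = 0.5 * z + rng.normal(0, 1, n)         # another pre-treatment variable that happens to
                                          # differ across the threshold (think: geography)
theta = 0.2                               # true treatment effect
y = theta * z + 0.3 * x + 0.8 * u + rng.normal(0, 1, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_x_only  = ols(np.column_stack([np.ones(n), z, x]), y)
b_x_and_u = ols(np.column_stack([np.ones(n), z, x, u]), y)

print(f"true effect:            {theta:.2f}")
print(f"adjusting for x only:   {b_x_only[1]:.2f}")   # picks up the imbalance in u as well
print(f"adjusting for x and u:  {b_x_and_u[1]:.2f}")

This is just a toy setup under my own assumptions, but it shows the general issue: the threshold rule buys you clean assignment on x, not balance on everything else.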

We discuss regression discontinuity in section 21.3 of Regression and Other Stories. We have an example there and we give some good advice. But now I’m wishing we had something punchier like what I just wrote above. Sometimes it’s worth putting in some words to dispel misconceptions.

I’m trying to help here!

Sometimes people get annoyed when I criticize these papers, either because they’re written by important people and so who am I to question, or because they’re written by less important people and so why am I picking on them.

The reason why I criticize is the same as the reason why I offer advice. It’s because I think policy analysis is important! I’m glad that youall are uncovering these natural experiments and doing these studies. I just want to help you do a better job of it. What’s the point of making avoidable errors? Sure, in the short term if you do a bad analysis and nobody notices you can get some twitter action and maybe even a published paper out of it. But long term you’re just wasting everyone’s time, and for your own career development it’s better to learn how to do things right.

This has come up before

Here’s what I wrote a couple years ago:

I was talking with some people the other day about bad regression discontinuity analyses . . . The people talking with me asked the question: OK, we agree that the published analysis was no good. What would I have done instead? My response was that I’d consider the problem as a natural experiment: a certain policy was done in some cities and not others, so compare the outcome (in this case, life expectancy) in exposed and unexposed cities, and then adjust for differences between the two groups. A challenge here is the discontinuity—the policy was implemented north of the river but not south—but this sort of thing arises in many natural experiments. You have to model things in some way, make some assumps, no way around it. From this perspective, though, the key is that this “forcing variable” is just one of the many ways in which the exposed and unexposed cities can differ.

After I described this possible plan of analysis, the people talking with me agreed that it was reasonable, but they argued that such an analysis could never have been published in a top journal. They argued that the apparently clean causal identification of the regression discontinuity analysis made the result publishable in a way that a straightforward observational study would not be.

If so, that’s really frustrating, the idea that a better analysis would have a lower chance of being published in a top journal, for the very reasons that makes it better. Talk about counterfactuals and perverse incentives.

What would be helpful

It can be hard to communicate with economists—they use a different language. To really make the points in this article, it would be helpful to translate to econ-speak and write a paper with a couple of theorems. That could make a difference, maybe.

It was a year ago today . . .

We posted the following item: “We taught a class using Zoom yesterday. Here’s what we learned.”

I was full of earnest thoughts. If you’d asked me whether I’d still be teaching on Zoom a year later, what would I have said? I’m not sure.

The most relevant piece of information I can share with you is that my classes have got worse and worse since then. We’re all much more used to Zoom, but the initial excitement and enthusiasm level has gone way down.

Here’s the last paragraph of my post from 10 Mar 2020, overconfident and full of hope:

I’ve probably missed a few things. So far I’ve been happy with the remote teaching experience. The challenge will be keeping the students engaged. I have a horrible feeling that half of them are texting or reading the news on the web while half-listening to the class. I appreciate students’ patience with our technology struggles, but going forward I want them to be even more engaged. I don’t want to be wasting their time and attention. Any suggestions?

Read it and weep.

“The presumption of wisdom and/or virtue causes intellectuals to personalize situations where contending ideas are involved.”

Mark Tuttle writes:

A friend recommended the book Intellectuals and Society, by Thomas Sowell. The book is from 2010, but before this recommendation I hadn’t heard of it.

Note the last paragraph, below, in the Wikipedia entry:

Ego-involvement and personalization

The presumption of wisdom and/or virtue causes intellectuals to personalize situations where contending ideas are involved. This often results in: (a) the demonization of opponents, and (b) personal fulfillment serving as a substitute for debate and evidence. Sowell does not make it clear if intellectuals acquired these traits from politicians, or the other way around.

It reminded me of some of your observations.

My reply: I hadn’t heard of this book either. I’ve read a few op-eds by Sowell over the years but not a whole book. Based on the wikipedia description, this one looks interesting.

The above-quoted paragraph reminds me of the defensive attitudes that have led leading academics to label their critics as Stasi, terrorists, second-stringers, data thugs, etc. Or you could say the description applies to me, when I make fun of Gremlin Man, Albedo Boy, Pizzagate, Weggy, himmicanes, and all the rest.

I think my name-calling is more legitimate than theirs, though.

When they call us Stasi, terrorists, second-stringers, data thugs, they offer no evidence, no reason for these labels. For example, none of these name-callers has ever given an actual example of Stasi-like behavior, terrorism, second-string work, or thuggery that any of us have done. When I laugh at their claims on albedo, or their misclassified data points, or their ridiculous claims from the statistical equivalent of reading tea leaves, or their disappearing data, or their unwillingness to correct or even admit their errors, I give clear evidence. Yes, I’m mocking them, but (a) I don’t think I’m “demonizing” them (I’m just pointing out what they did and expressing my frustration and annoyance), and (b) whatever personalizing is done is not “a substitute for debate and evidence,” it’s a dramatization of existing debate and evidence.

In summary: Not all mocking/criticism/name-calling is the same. Name-calling as a substitute for debate and evidence is not the same as name-calling that dramatizes debate and evidence.

It’s possible that name-calling is a bad idea, even when it is legitimate, as it can lower the discourse, induce defensiveness, etc. On the other hand, a bit of name-calling can make a dry scientific debate a bit more entertaining. And don’t forget the Javert paradox.

I’m not quite sure what to say about “personalization.” I think that intellectual discussion should be about ideas and evidence, not personalities; but ideas come from and are presented by people. Sometimes both the ideas and the people are relevant. When David Brooks, say, refuses to correct an error, part of the problem is the error and part of the problem is that he is given a platform to make authoritative-sounding pronouncements to an audience of millions without any duty to check his facts. He gets to do this in part because of his status as David Brooks, New York Times columnist.

But, yeah, the personalities can be distracting, and sometimes we can do without them. For example, in our recent discussion of the criminology journal scandal, I used some humor, but I focused on the events, not the personalities involved. So it can be done.

OK, we’ve gone through this issue before on the blog.

But what about Sowell’s more general point, that the presumption of wisdom or virtue causes intellectuals to personalize situations where contending ideas are involved?

I’ll say two things here.

First, I think he’s right, I think it’s a real issue that pollutes intellectual discourse. People hold on to a position and just don’t let go. A bit of stickiness is fine—we need to have a diversity of intellectual views, so it’s good that people have different thresholds for being swayed by any particular bit of evidence—but a lot of people take it too far, holding on to theories long after they’ve been deprived of whatever evidence was originally taken to support them.

Second, it’s not just intellectuals. In my experience, everybody personalizes situations where contending ideas are involved. No “presumption of wisdom or virtue” is required. Just consider any political debate at a prototypical bar or country club. Lots of over-certainty.

From this perspective, the problem with intellectuals is not that they’re worse than everyone else. The problem is that they’re not enough better than everyone else. It’s frustrating when tenured professors, with all their education, job security, and avowed ethos of openness, shut their ears to criticism and personalize disagreements as a way to avoid intellectual discussion and debate.

I guess I’ll actually have to read the book to see Sowell’s full take on this one. At this point I’m just engaging with the general ideas as summarized on that Wikipedia page; I’ll be interested to see the full argument.

P.S. Zad sent in the above picture of two adorable baby cats who would never demonize their opponents. They just want to play!

“From Socrates to Darwin and beyond: What children can teach us about the human mind”

This talk is really interesting. I like how she starts off with the connections between psychological essentialism and political polarization, as an example of the importance of these ideas in so many areas of life.

Ahhhh, Cornell!

What’s up with that place?

From his webpage:

Sternberg’s main research interests are in intelligence, creativity, wisdom, thinking styles, teaching and learning, love, jealousy, envy, and hate.

That pretty much covers it.

Yes, there is such a thing as Eurocentric science (Gremlins edition)

Sometimes we hear stories about silly cultural studies types who can’t handle the objective timeless nature of science. Ha ha ha, we laugh—and, indeed, we should laugh if we don’t cry because some of that stuff really is ridiculous.

But let us not forget that science really can be culture-bound.

Not just silly psychology journals that act as if a study of 24 psychology students and 100 people on the internet can give general insights into the human condition.

Culture-bound research also appears in more quantitative research.

Recall this, published in the Review of Environmental Economics and Policy:

This review of estimates in the literature indicates that the impact of climate change on the economy and human welfare is likely to be limited, at least in the twenty-first century. . . . negative impacts will be substantially greater in poorer, hotter, and lower-lying countries . . . climate change would appear to be an important issue primarily for those who are concerned about the distant future, faraway lands, and remote probabilities.

“Faraway lands” . . . this is a laughably Eurocentric perspective. In one sense, this is fine given that this appeared in “the official journal of the Association of Environmental and Resource Economists and the European Association of Environmental and Resource Economists.” On the other hand, it’s called “the Review of Environmental Economics and Policy,” not “the European Review of Environmental Economics and Policy.” And the article in question is called, “Economic Impacts of Climate Change,” not “European Economic Impacts of Climate Change.”

So, yeah, even physical sciences can suffer from an implicit Eurocentric perspective. Again, if you’re European and you want to present your own perspective, that’s fine, and it’s natural. I write lots of things from an American perspective! The error here is in considering one’s own perspective as default or universal. The “faraway lands” quote is an amusing tell, but it also represents a larger issue.

What is the landscape of uncertainty outside the clinical trial’s methods?

I live in the province of British Columbia in the country of Canada (right, this post is not by Andrew, it is by Lizzie). Recently one of our top provincial health officials, Dr. Bonnie Henry, has received extra scrutiny based on her decision to delay second doses of the vaccine. The general argument against this is the one I have heard from Dr. Fauci of the US, who has been various levels of adamant that you do what the trial did (I would say very adamant, very adamant, very adamant, then slightly less adamant after the kerfuffle with the UK). You don’t deviate from the methods of the trial.

This has got me wondering what the landscape of uncertainty looks like as you move away from the methods of a clinical trial, and what progress we've made on this, if any, in the last couple of decades since I first realized how stark the divide between inside and outside the trial's methods can be for many patients.

Over 15 years ago I was helping take care of a 50-year-old family member who had cancer and was struggling to get through a six-week regimen of radiation + chemotherapy at a major cancer institute in Boston. She had gotten through the first couple of weeks okay, even driving herself the 6-8+ round-trip hours from her home to the institute five days a week for her daily radiation appointments. But things got progressively worse in the third and following weeks (when I was her trusty chauffeur and companion). By her fifth week she was in and out of the ER with various major issues and was receiving infusions to try to prop up her system so she could survive the next dose of radiation. Every day before radiation she needed a series of tests followed by a visit to her oncologist to get approval for that day's dose, and this did not seem out of the ordinary for the later weeks of high-dose radiation therapy.

At one visit, when things were going particularly poorly, the radiation oncologist was brought in to consult on whether to continue treatment. He was advising for continuing, though it would be hard. It took a lot of her energy to speak, so she was often quiet, but on this day she asked him: ‘why do I have to do this? I have done this for most of the 30 visits, why do these last few matter so much?’ And he told her the truth — “because we only have data on the people who get the full dose. We don’t know what happens if you don’t take the full dose, or you take a few days off before continuing.” It was very helpful. I remember she said something along the lines of ‘okay,’ and we drove home in semi-shock, but at least we knew why they were pushing for this now. It was always her choice, but until then neither of us realized how gaping the uncertainty was between, say, going for 27 of your total 30 radiation visits, and going for all 30.

Clearly, the ethics matter, and that's especially clear with a highly infectious and deadly disease like Covid. I assume that many of these deviant public health officials who have delayed second doses have done the simple SIR model math and figured out that (a higher number of people vaccinated at X% efficacy given Y weeks of delay on the second dose) × (the black-box uncertainty as you deviate from the trial methods) = likely more lives saved. Henry has cited studies showing >90% efficacy for the three weeks after the first dose of the Moderna and Pfizer vaccines, so I suspect she's feeling good that her X, with the second dose extended to four months, is still fairly high; that is, she has some internal estimate of the landscape of uncertainty beyond the methods of the trial, and there's growing data on this.
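For concreteness, here is what that back-of-the-envelope comparison might look like in code. This is a minimal sketch, not anything from Henry's team or any published analysis; the dose supply, baseline risk, and efficacy numbers are all made up for illustration, and the single-dose efficacy is exactly the "black box" term the sketch cannot fill in.

```python
# Back-of-the-envelope comparison of dosing strategies over one planning window.
# All numbers are hypothetical, for illustration only.
doses = 1_000_000        # vaccine doses available in the window
risk = 0.05              # assumed infection risk per unvaccinated person in the window
eff_two_dose = 0.95      # assumed efficacy on the trial's two-dose schedule
eff_one_dose = 0.80      # assumed efficacy with the second dose delayed

# Strategy A: follow the trial schedule, two doses per person.
covered_a = doses // 2
averted_a = covered_a * risk * eff_two_dose

# Strategy B: delay second doses, so twice as many people get a first dose now.
covered_b = doses
averted_b = covered_b * risk * eff_one_dose

print(f"Trial schedule:      ~{averted_a:,.0f} infections averted")
print(f"Delayed second dose: ~{averted_b:,.0f} infections averted")
# The whole comparison hinges on eff_one_dose, i.e., on how well single-dose
# protection holds up outside the conditions of the trial.
```

Under these made-up numbers the delayed-second-dose strategy comes out ahead, but the conclusion flips if single-dose protection decays faster than the trials' short follow-up can tell us, which is the landscape-of-uncertainty question again.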

But when I listen to various interviews with Fauci and other public health officials, I start drifting into memories of discussions that start with, 'What is the variance of a fixed effect? It's either 0 or infinity.' Now, I don't mean exactly that, but I do mean there seems to be a large gap in perspectives here. In one perspective, the clinical trial methods must be followed to a T until a new or properly vetted trial of any deviation is approved, conducted, and reviewed. In the other, some adjustments are made given the potential for lives saved despite the uncertainty, and 'population health data' are then used to make further adjustments on the fly (in conjunction with other ways of viewing the clinical trial data you have).

These debates have made me wonder what progress we have made in addressing this uncertainty, both from a bioethics standpoint and from a data-collection-and-design standpoint. I am not (at all) a bioethicist, but rigid adherence to the trial methods doesn't feel terribly ethical to me, and I think Covid has highlighted that. So I wonder how much has changed in the last 10, 20, or 30 years in how those who deviate from or drop out of clinical trials are handled as data points. Are they required to be tracked? Or is it better to save money by focusing only on those who follow the trial perfectly? Is there an incentive for research, new methods, or databases that compile these deviants, to start fleshing out that landscape of uncertainty beyond the clinical trial methods? Or is everything beyond the trial basically zero, or maybe infinity? Or maybe somewhere in between?

Bayesian methods and what they offer compared to classical econometrics

A well-known economist who wishes to remain anonymous writes:

Can you write about this agent? He’s getting exponentially big on Twitter.

The link is to an econometrician, Jeffrey Wooldridge, who writes:

Many useful procedures—shrinkage, for example—can be derived from a Bayesian perspective. But those estimators can be studied from a frequentist perspective, and no strong assumptions are needed.

My [Wooldridge’s] hesitation with Bayesian methods—when they differ from classical ones—is that they are not “robust” in the econometrics sense.

Suppose I have a fractional response and I have panel data. I’m interested in effects on the mean. I want to allow y(i,t) to have any pattern of serial correlation and any distribution. I want to allow heterogeneity correlated with covariates.

I know how I would approach this: pooled quasi-MLE with a Chamberlain device and using cluster-robust inference.

How does a Bayesian solve this problem under the same assumptions plus a prior? I think it’s possible, but are such methods out there and in use?

My reaction to this will be milder than you might expect. Compared to the remarks of some other anti-Bayesians (see for example here and here), Wooldridge is pretty modest in his claims. He’s not saying that Bayesian methods are bad, just that they give him some hesitation.

Wooldridge’s main point seems to be that he and his colleagues have had success with non-Bayesian methods and, on the occasions that they’d looked around to see if Bayesian ideas could help, they haven’t been clear on where to start.

This suggests a need for a short paper taking some of his classical models and expressing them in Bayesian terms with Stan code. Wooldridge appears to be a Stata user, so it could also be useful to include some Stata code using StataStan to call the Stan program.

Somebody other than me will have to do that, as I don’t know what is meant by a fractional response, or pooled quasi-MLE, or a Chamberlain device. It won’t be possible to exactly duplicate these models Bayesianly—he wants it to work for “any pattern of serial correlation and any distribution,” and a Bayesian model would need some parametric form. But these parametric forms can be very flexible (splines, Gaussian processes, etc.), and I don’t think you really need the procedure to work for any pattern of serial correlation and any distribution. There are some patterns you’re never gonna see, and some distributions with such long tails that no procedure would work to estimate effects on the mean. So I think his procedures must have some implicit constraints. In any case, I expect that it should be possible to set up a Bayesian model that pretty much does what Wooldridge wants, without taking too long to compute.
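To make that last claim a little more concrete, here is a minimal sketch in Python using PyMC (rather than Stan, just to keep the example self-contained), under two guesses about the terminology: that a fractional response is an outcome strictly between 0 and 1 modeled through a logit-type mean, and that the Chamberlain (Mundlak) device amounts to including unit-level time averages of the covariates so that the heterogeneity can be correlated with them. This is not Wooldridge's pooled quasi-MLE with cluster-robust inference; it swaps in a fully parametric Beta likelihood with unit effects, handles within-unit dependence only through those effects, and runs on simulated data. The model name and priors are illustrative.

```python
import numpy as np
import pymc as pm
import arviz as az

# Simulated fractional-response panel: N units, T periods, y strictly in (0, 1).
rng = np.random.default_rng(1)
N, T = 60, 8
x = rng.normal(size=(N, T))
u_true = 0.5 * x.mean(axis=1) + rng.normal(scale=0.3, size=N)   # heterogeneity correlated with x
mu_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x + u_true[:, None])))
y = np.clip(rng.beta(20 * mu_true, 20 * (1 - mu_true)), 1e-4, 1 - 1e-4)

xbar = x.mean(axis=1)   # unit-level time averages: the Mundlak/Chamberlain-style device

with pm.Model() as frac_panel:
    alpha = pm.Normal("alpha", 0, 2)
    beta = pm.Normal("beta", 0, 1)            # covariate effect on the logit scale
    gamma = pm.Normal("gamma", 0, 1)          # coefficient on the unit means of x
    sigma_u = pm.HalfNormal("sigma_u", sigma=1)
    u = pm.Normal("u", 0, sigma_u, shape=N)   # remaining unit-level heterogeneity
    mu = pm.math.invlogit(alpha + beta * x + gamma * xbar[:, None] + u[:, None])
    phi = pm.Gamma("phi", alpha=2, beta=0.1)  # precision of the Beta likelihood
    pm.Beta("y", alpha=mu * phi, beta=(1 - mu) * phi, observed=y)
    idata = pm.sample(1000, tune=1000, chains=4, target_accept=0.9)

print(az.summary(idata, var_names=["alpha", "beta", "gamma", "sigma_u", "phi"]))
```

A Stan version callable from Stata via StataStan, a more flexible within-unit error structure (an autoregressive term or a Gaussian process on the residuals), and a side-by-side comparison against the quasi-MLE estimates would be natural next steps for the tutorial described above.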

Regarding the claim that Bayesian methods are not robust in the econometrics sense . . . I dunno. I guess I’d have to see some simulation studies. I guess his claim must be true in the following sense: by construction, Bayesian inference maximizes statistical efficiency under the assumptions of the model. Efficiency is only one of many goals of inference; thus, if you’re maximizing efficiency you must be losing somewhere else. We could just as well flip it around and say that I have hesitation with any statistical procedure X, because it will be flawed when its assumptions fail.

As we’ve discussed in the past, one common failure mode of purportedly conservative or robust-but-inefficient methods is that users want results. They don’t want confidence intervals that are robust but are a mile wide. The way to get reasonable-sized confidence intervals with a statistically inefficient procedure is to throw in more data. For example, when fitting a time-series cross-section model, you might pool data from 40 years rather than just 10 years, so that you can estimate the average treatment effect with a desired level of precision. The trouble is, then you’re estimating some average over 40 years, and this might not be what you’re interested in. People will take this average treatment effect and act as if it applies in new cases, even though it’s not clear at all what to do with this average. Or, to put it another way, this parameter is answering the questions you want to answer—as long as you’re willing to make some strong assumptions about stability of the treatment effect.
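Here is a toy simulation of that tradeoff. The drift in the treatment effect, the per-year sample size, and the noise level are all invented for illustration; the point is only the contrast between the widths of the intervals and what they are intervals for.

```python
import numpy as np

# Toy simulation: the treatment effect drifts over time, but pooling more years
# shrinks the standard error. All numbers are invented for illustration.
rng = np.random.default_rng(0)
years = np.arange(40)
true_effect = 0.5 + 0.02 * years     # effect drifts from 0.5 up to about 1.3
n_per_year = 50                      # observations per year
sigma = 3.0                          # outcome noise

# One noisy effect estimate per year.
yearly_est = rng.normal(true_effect, sigma / np.sqrt(n_per_year))

se_10 = sigma / np.sqrt(10 * n_per_year)   # standard error pooling the last 10 years
se_40 = sigma / np.sqrt(40 * n_per_year)   # standard error pooling all 40 years

print(f"Recent (last-10-year) true effect: {true_effect[-10:].mean():.2f}")
print(f"10-year pooled estimate: {yearly_est[-10:].mean():.2f} (se ~ {se_10:.2f})")
print(f"40-year pooled estimate: {yearly_est.mean():.2f} (se ~ {se_40:.2f})")
# Pooling 40 years roughly halves the standard error, but the estimate is now
# centered on a 40-year average effect, not on the recent effect you care about.
```

The 40-year interval is tighter, but it is an interval for a 40-year average, which only answers your question under a strong stability assumption.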

So, ultimately, you’re trading off one set of assumptions for another. I’d typically rather make strong assumptions about something minor like the covariance structure of an error term and then be flexible about the things I really care about, like treatment interactions. But I guess the best choice will depend on the particular problems you work on, along with what can be done with the tools you’re familiar with.

I respect Wooldridge’s decision to stick with the methods he’s familiar with. I do that too! It makes sense. There’s a learning curve with any new approach, and I can well believe that Wooldridge using classical econometrics techniques will do better data analysis than Wooldridge using Bayesian methods, especially given that the tutorial I’ve outlined above does not yet exist.

Also, I agree with him that Bayesian methods can be studied from a frequentist perspective. That’s a point that Rubin often made. Rubin described Bayesian inference as a way of coming up with estimators and decision rules, and frequentist statistics as a framework for evaluating them. And remember that Bayesians are frequentists.

I recommend that Wooldridge continue to use the methods he’s comfortable with. What would motivate him to try out Bayesian methods? If he’s working on a problem where strong prior information is available (as here) or where he has lots of data in scenario A and wants to generalize to similar-but-not-identical scenario B (as here) or where he wants to pipe his inferences into a decision analysis (as here) or where he’s interested in small-area estimation (as here) or various other settings. But until he ends up working on such problems, there’s no immediate need for him to switch away from what works for him. And we end up working on problems that our methods work on. Pluralism!

Being able to go into detail on this is a big reason I prefer blogs to twitter. I enjoy a good quip as much as the next person, but it’s also good to have space to explain myself and not just have to take a position.

My talk’s on April Fool’s but it’s not actually a joke

For the Boston chapter of the American Statistical Association, I’ll be speaking on this paper with Aki:

What are the most important statistical ideas of the past 50 years?

We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common features of these ideas, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science.
