
Linear or logistic regression with binary outcomes

Gio Circo writes:

There is a paper currently floating around which suggests that when estimating causal effects, OLS is better than any kind of generalized linear model (i.e. binomial). The author draws a sharp distinction between causal inference and prediction. Having gotten most of my statistical learning using Bayesian methods, I find this distinction difficult to understand. As part of my analysis I am always evaluating model fit, posterior predictive checks, etc. In what cases are we estimating a causal effect and not interested in what could happen later?

I am wondering if you have any insight into this. In fact, it seems like economists have a much different view of statistics than other fields I am more familiar with.

The above link is to a preprint, by Robin Gomila, “Logistic or linear? Estimating causal effects of treatments on binary outcomes using regression analysis,” which begins:

When the outcome is binary, psychologists often use nonlinear modeling strategies such as logit or probit. These strategies are often neither optimal nor justified when the objective is to estimate causal effects of experimental treatments. . . . I [Gomila] draw on econometric theory and established statistical findings to demonstrate that linear regression is generally the best strategy to estimate causal effects of treatments on binary outcomes. . . . I recommend that psychologists use linear regression to estimate treatment effects on binary outcomes.

I don’t agree with this recommendation, but I can see where it’s coming from. So for researchers who are themselves uncomfortable with logistic regression, or who work with colleagues who get confused by the logistic transformation, I could modify the above advice, as follows:

1. Forget about the data being binary. Just run a linear regression and interpret the coefficients directly.

2. Also fit a logistic regression, if for no other reason than many reviewers will demand it!

3. From the logistic regression, compute average predictive comparisons. We discuss the full theory here, but there are also simpler versions available automatically in Stata and other regression packages.

4. Check that the estimates and standard errors from the linear regression in step 1 are similar to the average predictive comparisons and corresponding standard errors in step 3. If they differ appreciably, then take a look at your data more carefully—OK, you already should’ve taken a look at your data!—because your results might well be sensitive to various reasonable modeling choices.
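The four steps above can be sketched in a few lines. This is a minimal illustration on simulated data, not the analysis from any paper: the data-generating values, the helper names (`fit_linear`, `fit_logistic`, `average_predictive_comparison`), and the choice of a simple difference in predicted probabilities as the average predictive comparison are all assumptions made for the example. It compares point estimates only; standard errors are left out.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (y - p)
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + delta
    return beta

def average_predictive_comparison(X, beta, col):
    """Average change in Pr(y=1) when the binary predictor in column `col` goes from 0 to 1."""
    X0, X1 = X.copy(), X.copy()
    X0[:, col], X1[:, col] = 0.0, 1.0
    p0 = 1.0 / (1.0 + np.exp(-X0 @ beta))
    p1 = 1.0 / (1.0 + np.exp(-X1 @ beta))
    return float(np.mean(p1 - p0))

# Simulated data (hypothetical): intercept, binary treatment z, one covariate x.
rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), z, x])
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * z + 0.6 * x)))
y = (rng.random(n) < true_p).astype(float)

linear_effect = fit_linear(X, y)[1]                          # step 1
logit_beta = fit_logistic(X, y)                              # step 2
apc = average_predictive_comparison(X, logit_beta, col=1)    # step 3

# Step 4: the two numbers should be close when the data are well behaved.
print(f"linear coefficient on z: {linear_effect:.3f}")
print(f"average predictive comparison from logistic: {apc:.3f}")
```

Note that the raw logistic coefficient on z is on the log-odds scale and is not comparable to the linear coefficient; the comparison in step 4 is between the linear coefficient and the average predictive comparison.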

Don’t get me wrong—when working with binary data, there are reasons for preferring logistic regression to linear. Logistic should give more accurate estimates and make better use of the data, especially when data are sparse. But in many cases, it won’t make much of a difference.

To put it another way: in my work, I’ll typically just do steps 3 and 4 above. But, arguably, if you’re only willing to do one step, then step 1 could be preferable to step 2, because the coefficients in step 1 are more directly interpretable.

Another advantage of linear regression, compared to logistic, is that linear regression doesn’t require binary data. Believe it or not, I’ve seen people discretize perfectly good data, throwing away tons of information, just because that’s what they needed to do to run a chi-squared test or logistic regression.

So, from that standpoint, the net effect of logistic regression on the world might well be negative, in that there’s a “moral hazard” by which the very existence of logistic regression encourages people to turn their outcomes into binary variables. I have the impression this happens all the time in biomedical research.

A few other things

I’ll use this opportunity to remind you of a few related things. My focus here is not on the particular paper linked above but rather on some of these general questions on regression modeling.

First, if the goal of regression is estimating an average treatment effect, and the data are well behaved, then linear regression might well behave just fine, if a bit inefficiently. The time when it’s important to get the distribution right is when you’re making individual predictions. Again, even if you only care about averages, I’d still generally recommend logistic rather than linear for binary data, but it might not be such a big deal.

Second, any of these methods can be a disaster if the model is far off. Both linear and logistic regression assume a monotonic relation between E(y) and x. If E(y) is a U-shaped function of x, then linear and logistic could both fail (unless you include x^2 as a predictor or something like that, and then this could introduce new problems at the extremes of the data). In addition, logistic assumes the probabilities are 0 and 1 at the extremes, and if the probabilities asymptote out at intermediate values, you’ll want to include that in your model too, which is no problem in Stan but can be more difficult with default procedures and canned routines.
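To make the point about asymptotes concrete, here is one way to write a logistic curve whose probabilities level off at intermediate values rather than at 0 and 1. The function name and the `lower` and `upper` parameters are assumptions for illustration; in a real analysis they would be estimated along with the other coefficients.

```python
import numpy as np

def bounded_logistic(x, a, b, lower, upper):
    """Pr(y=1 | x) that rises from `lower` to `upper` rather than from 0 to 1."""
    return lower + (upper - lower) / (1.0 + np.exp(-(a + b * x)))

# Far to the left the curve sits near `lower`, far to the right near `upper`.
x = np.array([-50.0, 0.0, 50.0])
probs = bounded_logistic(x, a=0.0, b=1.0, lower=0.1, upper=0.8)
print(probs)
```

Setting `lower=0` and `upper=1` recovers the usual logistic curve, so the standard model is a special case.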

Third, don’t forget the assumptions of linear regression, ranked in decreasing order of importance. The assumptions that first come to mind are, for many purposes, the least important assumptions of the model. (See here for more on assumptions.)

Finally, the causal inference thing mentioned in the linked paper is a complete red herring. Regression models make predictions, regression coefficients correspond to average predictions over the data, and you can use poststratification or other tools to use regression models to make predictions for other populations. Causal inference using regression is a particular sort of prediction having to do with potential outcomes. There’s no reason that linear modeling is better or worse for causal inference than for other applications.
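As a small illustration of the poststratification step mentioned above: given model-based predictions for each cell of a population, the estimate for any population is the average of those predictions weighted by that population's cell counts. All numbers below are made up for the example, and `poststratify` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical model-based predictions of E(y) for four population cells.
cell_predictions = np.array([0.20, 0.35, 0.50, 0.65])

# Hypothetical cell counts in the sampled population and in a different target population.
sample_counts = np.array([400, 300, 200, 100])
target_counts = np.array([100, 200, 300, 400])

def poststratify(predictions, counts):
    """Population-weighted average of cell-level predictions."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(predictions * counts) / np.sum(counts))

estimate_sample = poststratify(cell_predictions, sample_counts)
estimate_target = poststratify(cell_predictions, target_counts)
print(estimate_sample, estimate_target)
```

The same fitted model gives different population averages because the two populations have different mixes of cells. Nothing in this step depends on whether the predictions came from a linear or a logistic regression.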

Exciting postdoc opening in spatial statistics at Michigan: Coccidioides is coming, and only you can stop it!

Jon Zelner is a collaborator who does great work on epidemiology using Bayesian methods, Stan, Mister P, etc.

He’s hiring a postdoc, and it looks like a great opportunity:

Epidemiological, ecological and environmental approaches to understand and predict Coccidioides emergence in California.

One postdoctoral fellow is sought in the research group of Dr. Jon Zelner (Dept. of Epidemiology, University of Michigan School of Public Health) who is leading a cluster of epidemiologic modeling studies as part of a new NIH-funded study (R01AI148336; 2020-2025) examining the emergence of coccidioidomycosis (cocci) in Southwestern states, which are currently experiencing among the highest incidence rates ever recorded. This project will be completed in collaboration with Dr. Justin Remais (Environmental Health Sciences, UC Berkeley). The research will fill critical gaps in our understanding of the environmental transmission of cocci, including the dust exposures that pose the highest risk of infection; the role of zoonotic hosts in sustaining Coccidioides spp. survival and growth in soil; how land use, drought, and wind influence pathogen dynamics in soil and air; and how variation in dust and pathogen exposure interact with sociodemographic risk factors.

The postdoctoral fellow will have the opportunity to contribute to a cluster of studies that integrate machine learning/predictive modeling of the spatiotemporal distribution of pathogens in soil and air; whole genome sequencing to discover how pathogens adapt to changes in their physical environment; metagenomic analysis to determine taxonomic and phylogenetic relationships of known and newly emerged Coccidioides strains; precise methods of dust and exposure estimation; and spatiotemporal analysis of population-level data to elucidate fundamental aspects of cocci epidemiology. These activities will support two core epidemiologic studies in California: a retrospective cohort study of over 65,000 cocci cases (2000-2018) to determine key environmental and demographic drivers of cocci transmission foci at a high spatial resolution; and a case-crossover study with prospective surveillance for incident cases to estimate the pathogen exposure-response relationship within key high-risk subgroups. A major goal is to inform the public health response to the current epidemic through the design of improved surveillance and environmental intervention.

The postdoc would join an exceptional team at Michigan, along with collaborators at Berkeley working on a collection of related studies examining the consequences of environmental change for the dynamics of infectious diseases. Ideal applicants would have a PhD and a demonstrated record of scientific achievement in infectious disease epidemiology, statistics, geospatial analysis, disease dynamics, or similarly quantitative fields in biology and medicine. Applicants should be proficient at statistical and scientific programming, for transmission modeling and data analysis (e.g., R, Python, Julia, or similar). Experience with Bayesian hierarchical modeling, spatio-temporal modeling, modeling dynamical systems, and high-performance computing [emphasis added] is especially valuable.

Candidates with backgrounds in statistics [emphasis added], applied mathematics, computer science, engineering, the quantitative environmental sciences, and physics are also encouraged to apply. A track record of research excellence and strong quantitative skills is essential, as well as a strong interest in the area of applied infectious disease epidemiology.

Interested applicants should submit a curriculum vitae, a 1-2 page letter that describes the professional qualifications for the above-described activities, and contact information for three references to Jon Zelner (

In case you’re wondering what Coccidioides is, I looked it up for you on Wikipedia. Coccidioides is “a genus of dimorphic ascomycetes in the family Onygenaceae.” So there you have it.

Ann Arbor is the site of a world-class university with the world’s best psychology department, a world-class institute for social research, excellent departments of statistics and biostatistics, and the best college football team west of Ohio, north of Indiana, and east of Illinois and Wisconsin.

No, I don’t think that this study offers good evidence that installing air filters in classrooms has surprisingly large educational benefits.

In a news article on Vox, entitled “Installing air filters in classrooms has surprisingly large educational benefits,” Matthew Yglesias writes:

An emergency situation that turned out to be mostly a false alarm led a lot of schools in Los Angeles to install air filters, and something strange happened: Test scores went up. By a lot. And the gains were sustained in the subsequent year rather than fading away.

That’s what NYU’s Michael Gilraine finds in a new working paper titled “Air Filters, Pollution, and Student Achievement” that looks at the surprising consequences of the Aliso Canyon gas leak in 2015. . . .

If Gilraine’s result holds up to further scrutiny, he will have identified what’s probably the single most cost-effective education policy intervention — one that should have particularly large benefits for low-income children. . . .

He finds that math scores went up by 0.20 standard deviations and English scores by 0.18 standard deviations, and the results hold up even when you control for “detailed student demographics, including residential ZIP Code fixed effects that help control for a student’s exposure to pollution at home.”

I clicked through the link, and I don’t believe it. Not the thing about air filters causing large improvements in test scores—I mean, sure, I’m skeptical of claims of large effects, but really I have no idea about what’s going on with air pollution and the brain—no, what I don’t believe is that the study in question provides the claimed evidence.

Here’s the key graph from the paper:

The whole thing is driven by one data point and a linear trend which makes no theoretical sense in the context of the paper (from the abstract: “Air testing conducted inside schools during the leak (but before air filters were installed) showed no presence of natural gas pollutants, implying that the effectiveness of air filters came from removing common air pollutants”) but does serve to create a background trend to allow a big discontinuity with some statistical significance.

We’ve been down this road before. When we discussed it earlier (see also here) it was in the context of high-degree polynomials, but really it’s a more general problem with analysis of observational data.

Given what we know about statistics, how should we think about this problem?

It goes like this: The installation of air filters can be considered as a natural experiment, so the first step is to compare outcomes in schools in the area with and without air filters: in statistics terms, a regression with one data point per school, with the outcome being average post-test score and the predictors being average pre-test score and an indicator for air filters. Make a scatterplot of post-test vs. pre-test with one point per school, displaying treated schools as open circles and control schools as dots. Make a separate estimate and graph for each grade level if you’d like, but I’m guessing that averages will give you all the information you need. Similarly, you can do a multilevel model using data from individual students—why not, it’s easy enough to do—but I don’t think it will really get you much of anything beyond the analysis of school-level averages. If you know the local geography, that’s great, and to gain insight you could make plots of pre-test scores, post-test scores, and regression residuals on a map, using color intensities. I don’t know that this will reveal much either, but who knows. I’d also include the schools in the neighborhood that were not part of the agreement (see caption of figure 1 in the linked paper).
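The school-level regression described above is simple enough to sketch directly. The data here are simulated with no true treatment effect, purely to show the shape of the analysis; the number of schools, the noise levels, and the variable names are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools = 40

# One row per school (all values simulated for illustration).
pre = rng.normal(0.0, 1.0, size=n_schools)             # average pre-test score
treated = (np.arange(n_schools) < 18).astype(float)    # 1 if the school got air filters
post = 0.1 + 0.8 * pre + rng.normal(0.0, 0.3, size=n_schools)  # no true effect of treatment

# Regress average post-test on average pre-test and the air-filter indicator.
X = np.column_stack([np.ones(n_schools), pre, treated])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)

# Classical least-squares standard errors.
resid = post - X @ coef
sigma2 = resid @ resid / (n_schools - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print(f"estimated air-filter coefficient: {coef[2]:.2f} (se {se[2]:.2f})")
```

With real data, the natural companion is the scatterplot of post-test versus pre-test with treated and control schools marked differently, which this sketch leaves out to stay free of plotting dependencies.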

I’m not saying that the proposed analysis plan is perfect or anything like it; it’s a starting point. If you take the above graphs and ignore the distracting lines on them, you won’t see much—it all pretty much looks like noise, which is no surprise given the variability of test scores and the variability among schools—but it’s fine to take a look. Observational data analysis is hard, especially when all you have is a small sample, in a small area, at just one time. That’s just the way it is.

What about that regression discontinuity analysis? As noted above, it doesn’t make a lot of sense given the context of the problem, and it’s a bit of a distraction from the challenges of data analysis from an observational study. Just for example, Table 3 reports that they adjust for “a cubic of lagged math and English scores interacted with grade dummies” as well as latitude and longitude?? Lots of researcher degrees of freedom here. Yes, there are robustness claims, but the results are still being driven by a data-based adjustment which creates that opportunity for the discontinuity.

Again, the point is not that the paper’s substantive conclusion—of the positive effects of air filters on cognitive performance—is wrong, but rather that the analysis presented doesn’t provide any real evidence for that claim. What we see is basically no difference, that becomes a large and possibly statistically significant difference after lots of different somewhat arbitrary knobs are twisted in the analysis. Also, and this is a related point, we should not by default believe this result just because it’s been written up and put on the internet, any more than we should by default believe something published in PNAS. The fact that the analysis includes an identification strategy (in this case, regression discontinuity) does not make it correct, and we have to watch out for the sort of behavior by which such claims are accepted by default.

What went wrong? What could the reporter have done better?

I don’t want to pick on the author of the above paper, who’s studying an important problem with a unique dataset using generally accepted methods. And it’s just a preprint. It’s good practice for people to release preliminary findings in order to get feedback. Indeed, that’s what’s happening here. That’s the way to go: be open about your design and analysis, share your results, and engage the hivemind. It was good luck to get the publicity right away so now there’s the opportunity to start over on the analysis and accept that the conclusions probably won’t be so clear, once all is said and done.

If there’s a problem, it’s with the general attitude in much of economics, in which it is assumed that identification strategy + statistical significance = discovery. That’s a mistake, and it’s something we have to keep talking about. But, again, it’s not about this particular researcher. Indeed, now the data are out there—I assume that at least the average test scores for each school and grade can be released publicly?—other people can do their own analyses. Somebody had to get the ball rolling.

There was a problem with the news article, in that the claims from the research paper were reported completely uncritically, with no other perspectives. Yes, there were qualifiers (“If Gilraine’s result holds up to further scrutiny . . . it’s too hasty to draw sweeping conclusions on the basis of one study”) but the entire article was written from the perspective that the claims from this study were well supported by the data analysis, which turns out not to have been the case.

What could the reporter have done? Here’s what I recommended a couple years ago:

When you see a report of an interesting study, contact the authors and push them with hard questions: not just “Can you elaborate on the importance of this result?” but also “How might this result be criticized?”, “What’s the shakiest thing you’re claiming?”, “Who are the people who won’t be convinced by this paper?”, etc. Ask these questions in a polite way, not in any attempt to shoot the study down—your job, after all, is to promote this sort of work—but rather in the spirit of fuller understanding of the study.

I think the biggest problem with that news article was its misleading title, “Installing air filters in classrooms has surprisingly large educational benefits.”

Looking forward

To his credit Yglesias follows up with another caveat and comes up with a reasonable conclusion:

And while it’s too hasty to draw sweeping conclusions on the basis of one study, it would be incredibly cheap to have a few cities experiment with installing air filters in some of their schools to get more data and draw clearer conclusions about exactly how much of a difference this makes.

Perhaps school systems are already experimenting with pollution-control devices; I don’t know. They’re installing heating, ventilation, and air conditioning units all over the place, so I’m guessing this is part of it.

P.S. But let me be clear.

I don’t think the correct summary of the above study is: “A large effect was found. But this was a small study, it’s preliminary data, so let’s gather more information.” Rather, I think a better summary is: “The data showed no effect. A particular statistical analysis of these data seemed to show a large effect, but that was a mistake. Perhaps it’s still worth studying the problem because of other things we know about air pollution, in which case this particular study is irrelevant to the discussion.”

Or, even better:

New study finds no evidence of educational benefits from installing air filters in classrooms

A new study was performed of a set of Los Angeles schools and found no effects on test scores, comparing schools with and without newly-installed air filters.

However, this was a small study, and even though it found null effects, it could still be worth exploring the idea of installing air filters in classrooms, given all that we believe about the bad effects of air pollution.

We should not let this particular null study deter us from continuing to explore this possibility.

The point is that these data, analyzed appropriately, do not show any clear effect. So if it’s a good idea to keep on with this, it’s in spite of, not because of these results.

And that’s fine. Just cos a small study found a null result, that’s no reason to stop studying a problem. It’s just that, if you’re gonna use that small study as the basis of your news article, you should make it clear that the result was, in fact, null, which is no surprise given high variability etc.

The Generalizer

I just saw Beth Tipton speak at the Institute of Education Sciences meeting on The Generalizer, a tool that she and her colleagues developed for designing education studies with the goal of getting inferences for the population. It’s basically MRP, but what is innovative here is the application of these ideas at the design stage.

The online app looks pretty cool. Here’s the demo version. It’s set up specifically for experiments in schools, so it won’t be relevant for most of you, but the general principle should also apply to medical experiments, economic interventions, A/B tests, etc.

This seems like the right way to go. There is no substitute for careful measurement and design—and ultimately measurements and designs should be evaluated based on the goals of analysis.

How to “cut” using Stan, if you must

Frederic Bois writes:

We had talked at some point about cutting inference in Stan (that is, for example, calibrating PK parameters in a PK/PD [pharmacokinetic/pharmacodynamic] model with PK data, then calibrating the PD parameters, with fixed, non updated, distributions for the PK parameters). Has that been implemented?

(PK is pharmacokinetic and PD is pharmacodynamic.)

I replied:

This topic has come up before, and I don’t think this “cut” is a good idea. If you want to implement it, I have some ideas of how to do it—basically, you’d first fit model 1 and get posterior simulations, then approx those simulations by a mixture of multivariate normal or t distributions, then use that as a prior for model 2. But I think a better idea is to fit both models at once while allowing the PK parameters to vary, following the general principles of partial pooling as in this paper with Sebastian Weber.

It would be fun to discuss this in the context of a particular example.

I also cc-ed Aki, who wrote:

Instead of parametric distribution a non-parametric would be possible with the approach presented here by Diego Mesquita, Paul Blomstedt, and Samuel Kaski. This could be also used so that PK and PD are fitted separately, but combined afterwards (this is a bit like 1-step EP with a non-parametric distribution).

So, lots of possibilities here.
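The two-stage idea from the reply above (fit model 1, approximate its posterior simulations, use the approximation as a prior for model 2) can be sketched as follows. This is a simplified illustration under stated assumptions: the "posterior draws" are simulated stand-ins rather than output from a real PK model, and a single multivariate normal is used where the reply suggests a mixture of multivariate normal or t distributions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior simulations of two PK parameters from model 1 (hypothetical).
true_mean = np.array([1.0, -0.5])
true_cov = np.array([[0.04, 0.01],
                     [0.01, 0.09]])
pk_draws = rng.multivariate_normal(true_mean, true_cov, size=4000)

# Approximate the draws by a single multivariate normal.
prior_mean = pk_draws.mean(axis=0)
prior_cov = np.cov(pk_draws, rowvar=False)

def log_prior(theta):
    """Log density of the fitted normal approximation, usable as a prior in model 2."""
    diff = theta - prior_mean
    _, logdet = np.linalg.slogdet(prior_cov)
    quad = diff @ np.linalg.solve(prior_cov, diff)
    return -0.5 * (len(theta) * np.log(2 * np.pi) + logdet + quad)

print(prior_mean)
print(prior_cov)
```

In a Stan workflow, one option would be to pass `prior_mean` and `prior_cov` to the second model as data and put a `multi_normal` prior on the PK parameters. A single normal will miss skewness or multimodality in the first-stage posterior, which is the reason to consider a mixture or the non-parametric approach Aki mentions.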

This graduate student wants to learn statistics to be a better policy analyst

Someone writes:

I’m getting a doctoral degree in social science. I previously worked for a data analytics and research organization where I supported policy analysis and strategic planning. I have skills in post-data visualization analysis but am not able to go into an organization, take raw data, and turn it into something usable. I’m planning to use my elective credits to focus on statistical analysis so that I can do just that.

I heard about the work you’re doing after listening to your EconTalk episode and want to learn more about issues using quantitative research in social sciences (and try to connect it as much as possible to the field of education). I have the option to create an independent study but there isn’t anyone at my institution who is familiar with this work and able to construct a plan for it. I would love some advice as to how you think I might construct an independent study focused on these concepts (as well as thoughts on the background knowledge and skills in stats I would need to be able to understand the material). Any suggestions you can send my way would be much appreciated.

My reply: I’m not sure, but as a start you might try working with my forthcoming book, Regression and Other Stories (coauthors Jennifer Hill and Aki Vehtari) and my edited book from a few years ago, A Quantitative Tour of the Social Sciences (with Jeronimo Cortina).

Maybe the commenters have additional suggestions?

Royal Society spam & more

Just a rant about spam (and more spam) from pay-to-publish and closed-access journals. Nothing much to see here.

The latest offender is from something called the “Royal Society.” I don’t even know to which king or queen this particular society owes allegiance, because they have a .org URL. Exercising their royal prerogative, they created an account for me without my permission.

I clicked through to their web site to see if the journal was both free and open, but it turns out to be a pay-to-publish journal. The sovereign in question must have rather shallow pockets, as they’re asking authors to pony up a $1260 (plus VAT!) “article processing charge”.

Whenever I get review requests for pay-to-publish or closed-access journals, I just decline. Not so easy here as I can’t reply to their email. Instead, they indicate they’re going to keep spamming me and I have to actively opt out. I think I’m going to change my policy to never review for a journal that signs me up for their reviewing site without my permission.

Now onto the details. Here’s the text of the mail they sent (my bold face).

Dear Dr Carpenter,

An account has been created for you on the Royal Society Open Science ScholarOne Manuscripts site for online manuscript submission and review. You are likely to be invited to contribute to the journal as a reviewer or author in the near future. For information about the journal please visit

Your account login details are as follows:

PASSWORD: To set your password please click the link below. Clicking the link will take you directly to the option for setting your permanent password.

Please log in to to:
– Choose a memorable user ID and password for easy login
– Associate your ORCID ID with your account
– Add your affiliation details
– Tell us when you are unavailable, or delete your account
– Opt out of emails relating to the Royal Society and/or its journal content

When you log in for the first time, you will be asked to complete your profile.

The Royal Society is committed to your privacy and to the protection of your personal information. View our privacy policy at

Please contact us if you have any questions about why an account has been created.


Royal Society Open Science Editorial Office

Keep up-to-date:
Content and keyword alerts:
Log in to Remove This Account –

To avoid more spam, I thought I’d log in to remove the account, but wait, they have quirky weak password requirements.

Create New Password
Password Requirements:

Cannot be a recently used password
Cannot be the same as your username
Minimum of 8 characters
Minimum of 2 numbers
Minimum of 1 letter (Upper or lower case)
More Information May be Required

If the site requires more information, you will be taken to your profile next.

* = Required Fields

Of course it required more information.

Your Profile Needs to be Updated
The following profile item(s) need to be updated before you can access the site:

Salutation is a required field
Country / Region is a required field

Thankfully, a country and salutation were pre-entered by whoever decided to put me into this system. After this, I had to click through several more screens of profile creation for reviewers until I found a “delete your account” section. Done. Finally.

Done, not done

No sooner had I shaken off a pay-to-publish journal than I got hit with the same thing from a closed-access IEEE journal (the name “ScholarOne” is repeated across spams, so maybe that’s the name of the spam tool).

Dear Dr. Carpenter:

Welcome to the Transactions on Computer-Aided Design of Integrated Circuits and Systems – ScholarOne Manuscripts site for online manuscript submission and review. Your name has been added to our reviewer database in the hopes that you will be available to review manuscripts for the Journal which fall within your area of expertise.

Your USER ID for your account is as follows:

PASSWORD: To set your password please click the link below. Clicking the link will take you directly to the option for setting your permanent password.

When you log in for the first time, you will be asked to complete your full postal address, telephone, and fax number. You will also be asked to select a number of keywords describing your particular area(s) of expertise.

Thank you for your participation.

Of book reviews and selection bias

Publishers send me books to review. I guess I’m on the list of potential reviewers, which is cool because I often enjoy reading books. And, even if I don’t get much out of a book myself, I can give it to students. A book is a great incentive/reward for class participation. For any book, if you have a roomfull of students, there will just about always be somebody who’s interested in the topic.

Anyway, most of the time I don’t write or publish any review, either because I don’t have anything to say, or I don’t find the book interesting, or because I think I’m too far from the intended audience, or sometimes just because I don’t get around to reviewing it—this happens even for books I really like.

The other day I received a book in the mail on a topic that does not fascinate me, but I took a look anyway. The book was boring—even given his topic. The book had some historical content which seemed unbalanced (too much focus on recent events and a shallow, episodic treatment of what came before), also it had what seemed to me to be a smug tone—not that the author seemed like a bad person, exactly, just a bit too complacent. I just didn’t like the book, enough so that I didn’t even give it to a student, I just set it outside somewhere for some stranger to read.

I forgot about this book entirely, and then I happened to be looking at a blog post by someone else who, like me, receives a lot of books from publishers, reads a lot of books, and reviews a lot of books. He reviews more than I do, actually, often quick one-paragraph blurbs. And I noticed that he mentioned this book! I was surprised: did this blogger actually like that book? Maybe not: it’s not actually clear, as his one-paragraph review was descriptive but not actually complimentary.

Anyway, this brings us to selection bias. Most of my book reviews are positive. Why? For one thing, when I don’t like a book, often it’s on a topic I don’t care about and don’t know much about, so I’m not so interested in writing a review and I don’t feel so competent to write one either. Also, when I get a book for free, I guess I feel like I owe something to the publisher.

I do sometimes write negative reviews of books I receive for free, but there’s some selection bias here.

The funny thing is, I often write negative reviews of journal articles that people send to me. Why is that different? For one thing, books are usually sent to me by their authors or publishers, whereas I’ll often hear about a journal article from a third party who already doesn’t like the article. Also an article is typically making a smaller, more specific point, so it makes more sense to criticize it on specific grounds.

Votes vs. $

Open forensic science, and some general comments on the problems of legalistic thinking when discussing open science

Jason Chin, Gianni Ribeiro, and Alicia Rairden write:

The mainstream sciences are experiencing a revolution of methodology. This revolution was inspired, in part, by the realization that a surprising number of findings in the bioscientific literature could not be replicated or reproduced by independent laboratories. In response, scientific norms and practices are rapidly moving towards openness. . . .

In this article, the authors suggest that open science reforms are distinctively suited to addressing the problems faced by forensic science. Openness comports with legal and criminal justice values, helping ensure expert forensic evidence is more reliable and susceptible to rational evaluation by the trier of fact. In short, open forensic science allows parties in legal proceedings to understand and assess the strength of the case against them, resulting in fairer outcomes. Moreover, several emerging open science initiatives allow for speedier and more collaborative research.

This all sounds good, but I imagine there will be a lot of resistance. The adversarial nature of the legal system seems so much different from the collaborative spirit of open science. Indeed, one of the many problems I’ve seen in trying to promote open science is that it sometimes seems to have a kind of legalistic framework, focusing for example on arguments about p-values rather than on the substance of what is being studied. For example, when those ovulation-and-clothing researchers criticized my criticisms of their work, they didn’t address questions of measurement and effect size which, in my opinion, were central to understanding what went wrong with their study. One problem with a legalistic attitude toward science is that it can encourage the attitude that, if you just follow the protocols, you’ll do good science. Actually, though, honesty and transparency are not enough.

Related: The purported CSI effect and the retroactive precision fallacy.

Smoothness, or lack thereof, in MRP estimates over time

Matthew Loop writes:

I’m taking my first crack at MRP. We are estimating the probability of an event over 30 years, adjusting for sampling stratum using a multilevel model with varying intercepts for stratum.

When we fit the model, the marginal predicted probability vs. year is a smooth function, since the mean of the varying intercepts is 0. So far so good.

However, when we get the conditional predictions in order to do post-stratification, the poststratified predicted probability vs. year is a “jagged” function. The math of this makes sense, as the mean of the random effects is now no longer 0, given you do a weighted average.

My questions are:

1. Is there a way to force the function to be smooth, like in the usual multilevel model with no poststratification?

2. If the answer to #1 is “no”, then how do we interpret the predicted probabilities? It seems like they no longer have the simple interpretation of marginal probabilities.

My reply: First, let me just say that you have an excellent name for someone who wants to use iterative algorithms!

More seriously, I don’t think you should expect smoothness from your inferences, unless smoothness is specified in the data. If you have a multilevel time series model with an intercept that varies by year, then you can get big year-to-year jumps, unless you control these with a spline or random walk or autoregression or some other smoothness-inducing prior distribution.

Your other question involves smooth parameter estimates that become jumpy when piped through the poststratification step—but you’ll only see that happening if the N’s are jumpy over time. If the N_j’s are smooth over time and the theta_j’s are smooth over time, then sum_j(N_j*theta_j) / sum_j(N_j) will be smooth too.
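Here is a minimal sketch of that last point in Python. Everything in it is made up for illustration: two strata, 30 years, linear trends for the theta_j, and the function names poststratify and roughness. It is not Matthew's model, just the weighted average sum_j(N_j*theta_j) / sum_j(N_j) computed year by year, once with stable N_j and once with N_j that bounce around.

```python
import numpy as np

def poststratify(theta, N):
    """Poststratified estimate per year: sum_j(N_j*theta_j) / sum_j(N_j).

    theta, N: arrays of shape (n_years, n_strata).
    """
    theta = np.asarray(theta, dtype=float)
    N = np.asarray(N, dtype=float)
    return (N * theta).sum(axis=1) / N.sum(axis=1)

def roughness(series):
    """Largest absolute second difference: a crude measure of jaggedness."""
    return np.abs(np.diff(series, n=2)).max()

years = np.arange(30)

# Hypothetical stratum-level probabilities: a smooth linear trend in each stratum.
theta = np.column_stack([0.10 + 0.002 * years, 0.40 + 0.004 * years])

# Case 1: stratum counts that are constant (hence smooth) over time.
N_smooth = np.column_stack([np.full(30, 600.0), np.full(30, 400.0)])

# Case 2: stratum counts that alternate from one year to the next.
N_jumpy = np.column_stack([
    np.where(years % 2 == 0, 800.0, 300.0),
    np.where(years % 2 == 0, 200.0, 700.0),
])

est_smooth = poststratify(theta, N_smooth)
est_jumpy = poststratify(theta, N_jumpy)

print("roughness with smooth N:", roughness(est_smooth))
print("roughness with jumpy N: ", roughness(est_jumpy))
```

The theta_j are identical in the two cases, so any jaggedness in the second series comes entirely from the weights.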

Why I Rant

Someone pointed me to an over-the-top social science paper that is scheduled to be published soon. I then wasted 2 hours writing some combination of statistical commentary and rant.

I expect that, once the paper is published, there will be major controversy, as its empirical findings, such as they are, are yoked to political opinions which seem pretty much targeted to offend lots of people in its academic field.

Fortunately, by the time my post appears, the furor should have quieted. That’s one of the advantages of the blog delay: I can write a comment before something becomes a big deal, and then it appears after the controversy has blown over.

And, yeah, sure, I know, I know, that was not a good use of 2 hours—maybe more accurate to say this was a half hour of blog writing, interspersed with 1.5 hours of other things. But it wasn’t even a good use of a half hour . . . nor was writing the above paragraphs a good use of 10 minutes . . . nor was writing this paragraph a good use of, umm, 2 minutes . . . ok, we’re getting real Zeno here.

Whatever. We gotta do what we gotta do. In all seriousness, I do think these rants have value, not merely in allowing me to vent by typing rather than shouting at the TV or whatever, and not merely in letting those of you who agree with me know that others share your feelings, and not merely in the highly unlikely event that they convince any open-minded people to my position (I’m pretty sure a rant isn’t the best way to go about that, if your goal is persuasion).

No, the value to me of such rants is that they allow me to explore my thoughts. Writing is more rigorous than daydreaming, writing in public in complete sentences is more rigorous than writing notes to myself, and writing in a forum that allows comments from people who might disagree with me (along with people who agree with me but can help me refine my ideas) . . . that’s the best. And it’s not just about me. I write my blog posts, but these discussions express some sort of zeitgeist. I don’t claim ownership of these ideas or even these slogans. As scholars, we act as scribes for the ideas that are out there. I’d like to think George Orwell felt the same way, at a much higher level.

P.S. The above is not to be taken to imply that my ranting, or even my blogging, has net positive value. First off, it could have negative value by pissing people off, discrediting my statistical work, crowding out more subtle commentary by others, etc etc. Second, there’s the opportunity cost: instead of 5 rants I could be publishing one article for Slate; instead of 50 rants I could be publishing a research paper; instead of 500 rants I could be publishing a book, etc. Who’s to know? The above post is titled Why I Rant, not Why My Ranting is a Good Thing.

On deck for the first half of 2020

Here goes:

  • Smoothness, or lack thereof, in MRP estimates over time
  • Open forensic science, and some general comments on the problems of legalistic thinking when discussing open science
  • Votes vs. $

  • Of book reviews and selection bias
  • This graduate student wants to learn statistics to be a better policy analyst
  • How to “cut” using Stan, if you must

  • Two good news articles on trends in baseball analytics
  • Linear or logistic regression with binary outcomes
  • Of Manhattan Projects and Moonshots

  • Four projects in the intellectual history of quantitative social science
  • Making fun of Ted talks
  • Steven Pinker on torture

  • Will decentralised collaboration increase the robustness of scientific findings in biomedical research? Some data and some causal questions.
  • Against overly restrictive definitions: No, I don’t think it helps to describe Bayes as “the analysis of subjective beliefs” (nor, for that matter, does it help to characterize the statements of Krugman or Mankiw as not being “economics”)
  • Is it accurate to say, “Politicians Don’t Actually Care What Voters Want”?

  • In Bayesian inference, do people cheat by rigging the prior?
  • Graphs of school shootings in the U.S.
  • Some Westlake quotes

  • Pocket Kings by Ted Heller
  • Top 5 literary descriptions of poker
  • What are the famous dogs? What are the famous animals?

  • Are the tabloids better than we give them credit for?
  • The latest Perry Preschool analysis: Noisy data + noisy methods + flexible summarizing = Big claims
  • The intellectual explosion that didn’t happen

  • Deterministic thinking meets the fallacy of the one-sided bet
  • My review of Ian Stewart’s review of my review of his book
  • Are GWAS studies of IQ/educational attainment problematic?

  • David Leavitt and Meg Wolitzer
  • They added a hierarchical structure to their model and their parameter estimate changed a lot: How to think about this?
  • Don’t talk about hypotheses as being “either confirmed, partially confirmed, or rejected”

  • Forget about multiple testing corrections. Actually, forget about hypothesis testing entirely.
  • The fallacy of the excluded rationality
  • Is there any scientific evidence that humans don’t like uncertainty?

  • The importance of measurement in psychology
  • The importance of descriptive social science and its relation to causal inference and substantive theories
  • Advice for a Young Economist at Heart

  • The hot hand fallacy fallacy rears its ugly ugly head
  • A Collection of Word Oddities and Trivia
  • Evidence-based medicine eats itself

  • “It just happens to be in the nature of knowledge that it cannot be conserved if it does not grow.”
  • Intended consequences are the worst
  • The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time

  • “Repeating the experiment” as general advice on data collection
  • “Sometimes research just has to start somewhere, and subject itself to criticism and potential improvement.”
  • Researcher offers ridiculous reasons for refusing to reassess work in light of serious criticism

  • How many patients do doctors kill by accident?
  • This study could be just fine, or not. Maybe I’ll believe it if there’s an independent preregistered replication.
  • An article in a statistics or medical journal, “Using Simulations to Convince People of the Importance of Random Variation When Interpreting Statistics.”

  • Making differential equation models in Stan more computationally efficient via some analytic integration
  • “MIT Built a Theranos for Plants”
  • David Spiegelhalter wants a checklist for quality control of statistical models?

  • Do we trust this regression?
  • Question on multilevel modeling reminds me that we need a good modeling workflow (building up your model by including varying intercepts, slopes, etc.) and a good computing workflow
  • Deep Learning workflow

  • We should all routinely criticize our own work.
  • Hilarious reply-all loop
  • Birthdays!

  • Different challenges in replication in biomedical vs. social sciences
  • The Paterno Defence: Gladwell’s Tipping Point?
  • Conditioning on a statistical method as a “meta” version of conditioning on a statistical model

  • Theorizing, thought experiments, fake-data simulation
  • As usual, I agree with Paul Meehl: “It is not a reform of significance testing as currently practiced in soft-psych. We are making a more heretical point than any of these: We are attacking the whole tradition of null-hypothesis refutation as a way of appraising theories.”
  • “What is the conclusion of a clinical trial where p=0.6?”

  • You don’t want a criminal journal… you want a criminal journal
  • Junk Science Then and Now
  • They want “statistical proof”—whatever that is!

  • “Non-disclosure is not just an unfortunate, but unfixable, accident. A methodology can be disclosed at any time.”
  • Woof! for descriptive statistics
  • Why we kept the trig in golf: Mathematical simplicity is not always the same as conceptual simplicity

  • No, you’re not (necessarily) obliged to share your data and methods when someone asks for them. Better to post it all publicly.
  • Computer-generated writing that looks real; real writing that looks computer-generated
  • 100 Things to Know, from Lane Kenworthy

  • The Road Back
  • “Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples”
  • Breaking the feedback loop: When people don’t correct their errors

  • We want certainty even when it’s not appropriate
  • The New Yorker fiction podcast: how it’s great and how it could be improved
  • His data came out in the opposite direction of his hypothesis. How to report this in the publication?

  • My best thoughts on priors
  • He’s annoyed that PNAS desk-rejected his article.
  • “As a girl, she’d been very gullible, but she had always learned more that way.”

  • (1) The misplaced burden of proof, and (2) selection bias: Two reasons for the persistence of hype in tech and science reporting
  • “Pictures represent facts, stories represent acts, and models represent concepts.”
  • This awesome Pubpeer thread is about 80 times better than the original paper

  • The value (or lack of value) of preregistration in the absence of scientific theory
  • Let’s do preregistered replication studies of the cognitive effects of air pollution—not because we think existing studies are bad, but because we think the topic is important and we want to understand it better.
  • I’m still struggling to understand hypothesis testing . . . leading to a more general discussion of the role of assumptions in statistics

  • A question of experimental design (more precisely, design of data collection)
  • Structural equation modeling and Stan
  • “Everybody wants to be Jared Diamond”

  • What is the relevance of “bad science” to our understanding of “good science”?
  • Noise-mining as standard practice in social science
  • Toward understanding statistical workflow

  • Career advice for a future statistician
  • “Men Appear Twice as Often as Women in News Photos on Facebook”
  • “The Generalizability Crisis” in the human sciences

  • Should we judge pundits based on their demonstrated willingness to learn from their mistakes?
  • Upholding the patriarchy, one blog post at a time
  • Effects of short-term exposure to null hypothesis significance testing on cognitive performance

  • Stanford prison experiment
  • Given that 30% of Americans believe in astrology, it’s no surprise that some nontrivial percentage of influential American psychology professors are going to have the sort of attitude toward scientific theory and evidence that would lead them to have strong belief in weak theories supported by no good evidence.
  • The checklist manifesto and beyond

  • She’s wary of the consensus based transparency checklist, and here’s a paragraph we should’ve added to that zillion-authored paper
  • Yes, there is such a thing as Eurocentric science
  • Body language and machine learning

  • Marc Hauser: Victim of statistics?
  • Megan Higgs (statistician) and Anna Dreber (economist) on how to judge the success of a replication
  • Statistical fallacies as they arise in political science (from Bob Jervis)

  • MRP with R and Stan; MRP with Python and Tensorflow
  • We have really everything in common with machine learning nowadays, except, of course, language.
  • An odds ratio of 30, which they (sensibly) don’t believe

  • Lakatos was a Stalinist
  • Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.
  • Frustrating science reporting: I get quoted but misunderstood

  • The return of red state blue state
  • Public health researchers: “Death by despair” is a thing, but not the biggest thing
  • Ahhhh, Cornell!

  • Why is this graph actually ok? It’s the journey, not just the destination.
  • “Then the flaming sheet, with the whirr of a liberated phoenix, would fly up the chimney to join the stars.”
  • Hey, you. Yeah, you! Stop what you’re doing RIGHT NOW and read this Stigler article on the history of robust statistics

  • How scientists perceive advancement of knowledge from conflicting review reports
  • The 5-sigma rule in physics
  • No, I don’t like talk of false positive false negative etc but it can still be useful to warn people about systematic biases in meta-analysis

  • Here’s why rot13 text looks so cool.
  • The problem with p-hacking is not the “hacking,” it’s the “p”
  • We need better default plots for regression.

  • Stop-and-frisk data
  • Standard deviation, standard error, whatever!
  • Uncertainty and variation as distinct concepts

  • Claim of police shootings causing low birth weights in the neighborhood
  • The accidental experiment that saved 700 lives
  • The rise and fall and rise of randomized controlled trials (RCTs) in international development

  • You don’t need a retina specialist to know which way the wind blows
  • How much granularity do you need in your Mister P?
  • Are informative priors “[in]compatible with standards of research integrity”? Click to find out!!

  • Authors repeat same error in 2019 that they acknowledged and admitted was wrong in 2015
  • No, average statistical power is not as high as you think: Tracing a statistical error as it spreads through the literature
  • Best comics of 2010-2019?

  • Today in spam
  • Some thoughts on another failed replication in psychology
  • Estimating the college wealth premium: Not so easy

  • Create your own community (if you need to)
  • “Banishing ‘Black/White Thinking’: A Trio of Teaching Tricks”
  • Be careful when estimating years of life lost: quick-and-dirty estimates of attributable risk are, well, quick and dirty.

  • A very short statistical consulting story
  • Basbøll’s Audenesque paragraph on science writing, followed by a resurrection of a 10-year-old debate on Gladwell
  • Blast from the past

  • “The Moral Economy of Science”
  • “Note sure what the lesson for data analysis quality control is here is here, but interesting to wonder about how that mistake was not caught pre-publication.”
  • Three unblinded mice


P.S. I listed the posts in groups of 3 just for easier readability. There’s no connection between the three posts in each batch.

Progress in the past decade

It’s been a busy decade for our research.

Before going on, I’d like to thank hundreds of collaborators, including students; funders from government, nonprofits, and private industry; blog commenters and people who have pointed us to inspiring research, outrages, beautiful and ugly graphs, cat pictures, and all the rest; all those of you who have shared your disagreements, pointing out my errors and my failures in communication; and, most of all, family and friends for your love and support.

Bayes and Stan

Our biggest contribution was Stan, which represents a major research effort in itself (thanks, Bob, Matt, Daniel, and so many others!), has motivated lots of research in Bayesian inference and computation, has facilitated tons of applied work by ourselves and others, and has inspired other probabilistic programming languages targeted to particular classes of models and applications.

Relatedly, during the past decade we completed the third edition of BDA (thanks, Aki!) and Regression and Other Stories (thanks, Jennifer and Aki!). Here’s a list of our published papers on Bayesian methods and computation in the past decade, in reverse chronological order:

That method developed in that paper with Pasarica did not directly come to much. But then a few years later the idea of maximizing expected squared jumped distance helped motivate Matt Hoffman to develop the very useful Nuts algorithm. This demonstrates the potential benefit of pushing through our research ideas, even when they don’t lead to anything right away.

Voting, public opinion, and sample surveys

The motivation for all the above work on Bayesian methods and computing was to make progress in applied problems. Here’s our recent published work in voting, public opinion, and sample surveys:

Wow! I’d forgotten about a lot of that.

Other applied work

And here’s our recent published work in other applied areas:

I included the zombies paper in the above list, but I really could’ve counted it as survey methods.

Open science and ethics

Recently we’ve been thinking a lot about open science and ethics:

There are a ton of papers in that list, in part in response to recent concerns about scientific replication, and in part because at the beginning of the decade I had the idea of running a regular column on ethics and statistics for Chance magazine, with the idea of putting the columns into a book. I doubt I’ll write a book specifically on ethics and statistics—I just don’t think there would be that much of an audience for it—but I’ve learned a lot from thinking about these issues.

My favorite of my articles on open science is What has happened down here is the winds have changed, from 2016; it’s not on the above list because I forgot to ever send it to a magazine or journal to be officially published, so it exists only as a blog entry.

Understanding the statistical properties of statistical methods as they are used

Related to work in open science is our research into the statistical properties of the statistical methods that people actually use. Theoretical statistics is the theory of applied statistics, so this all might be labeled real-world frequentist statistics:

History, philosophy, and statistics education

My collaborators and I have also written some things on history, philosophy, and statistics education. Much of this represents ideas that my colleagues and I have been discussing for decades, that we finally got an opportunity to write up and discuss formally. Others were responses to new ideas or developments:

Also, Deb Nolan and I came out with the second edition of Teaching Statistics: A Bag of Tricks.

Causal inference

We’ve also done some research on causal inference:

Not a lot of papers on the topic, as I’m not always clear on what I can add to these discussions—there’s a reason that Jennifer is the main author of the causal chapters in our books—but causal inference is central to statistics (recall the title of this blog!), so I’m glad to contribute to it in some way.

Statistical graphics and visualization

Visualization is one of my favorite topics that we keep coming back to, as part of our larger effort to incorporate statistical practice into formal statistical theory and methods:

That’s all only part of the story

The above list is incomplete, in that it does not include unpublished papers, blogging (we’ve had something like 6000 posts and 100,000 comments in the past decade), case studies, wiki pages, and other modes of research communication.

Let me emphasize that all this work is collaborative. Even the articles published only under my name are collaborative in that they are the results of lots of reading and discussions with others. Let’s remember to avoid the scientist-as-hero narrative.

It’s been an eventful decade in the world: economic development, environmental challenges, social and political opportunities, and nearly a billion new people. Statistical modeling, causal inference, and social science is only a tiny part of all of this, and the work of my collaborators and myself is only a tiny part of statistical modeling, causal inference, and social science—but we still try in some way to develop tools for people to be able to understand and improve our physical and social environments. My colleagues and I have been privileged to have a working environment that has allowed us to make efforts in these directions, and we’ve also worked hard—with books, research articles, journalism, blogging, software, documentation, and online forums—to engage with and build communities of people who can do similar work.

It happens all the time

Under the subject line, “Here is another one for your archive,” someone points me to a news article and writes:

What would have happened had the guy not discovered his coding error? Or what if he had, but the results were essentially unchanged? My guess is that nothing would happen until someone got the data and tested the robustness of the statistical procedures used to produce the published results.

My reply: Oh, yes, I’ve blogged this one. What with the lag, I don’t think it has appeared yet. When I wrote the post, there had been no retraction but I still expressed skepticism for the usual reasons. Then I added a P.S. to discuss the retraction.

I purposely am not giving the details here because I want to make the larger point that pre-publication review doesn’t mean much. Even post-publication review doesn’t mean much. Review is an ongoing process. Yes, sometimes we can find the smoking gun that says, Don’t trust this guy!, but usually it’s not so simple—and we shouldn’t expect it to be so simple. To put it another way, don’t trust stuff just cos it’s been published and promoted and no obvious problems have been found—yet.

Criminologists be (allegedly) crimin’ . . . and a statistical New Year’s toast for you.

Someone who wishes to remain anonymous points us to this video, writing:

It has to do with Stewart at FSU, in criminology. Couldn’t produce a survey that was the basis for 5 papers, all retracted. FSU though still failed to do a complete investigation. The preliminary investigation had a 3-person panel, 2 of whom were repeat co-authors with Stewart, violating their own protocols. So as far as FSU is concerned he is cleared. Story is bad on every level. And as usual, it is the whistleblower who suffers, in this instance a co-author who began to have second thoughts.

Thought this might interest you, not just the fraud but also the regulatory capture.

I replied that I’m too impatient to watch videos, so he sent me this text link, which starts with the following quick summary:

1. No funding source for $100,000+ survey
2. 60% response rate for a telephone survey
3. Changing origin story from Stewart: Grad school buddies (2018) -> Research Network (2019)
4. Marc Gertz cryptic validation of the 2013 survey (2019)
5. Marc Gertz in 2018 denies doing the survey
6. Stewart never provided the RAW data to FSU, John Smith or his co-authors
7. Jake Bratton went on the record saying all public surveys available were from 2000-2009
8. Jake Bratton says TRN closed in 2010 -> impossible for TRN to do the 2013 survey
9. FSU ignored its own policies of data sequestration
10. FSU ignored its own policies of avoiding co-authors on an investigation committee.

This all sounded familiar, something I’d posted on. But I googled all sorts of things and couldn’t find it on the blog. Then I realized: the blog delay! Someone had emailed me about this story in June 2019, I posted on it in October 2019, and the post is scheduled for March 2020.

The post is titled, “You don’t want a criminal journal… you want a criminal journal,” and it begins as follows:

“You don’t want a criminal lawyer… you want a criminal lawyer.” — Jesse Pinkman.

In what sense is it a “blood sport” to ask someone for their data?

That’s our question for the day. But it’ll take us a few paragraphs to get there.

You can read the full story in March. But maybe events will have overtaken it by then.

I told my correspondent about the forthcoming post and he responded:

Glad you are covering it. It is not just the fraud by the scientist, more so it is the institutional response. The video does have a quick interview with a vice-chancellor from Duke on why they didn’t act sooner on the fraud there. Doesn’t exactly give you an encouraging feeling.

The good news

It’s good to know that some people care enough about this to go to the trouble of making videos and writing articles about this and other cases. Yes, there are the corporate and university bureaucrats and pencil pushers, Association for Psychological Science-style apparatchiks who duck, dodge, and retaliate against dissent and who, ironically, call us names like terrorist and Stasi when we’re so uncivil as to point out errors in the published work of bigshots and their friends. But there are Javerts out there who are bothered by this network of deniers and apologists.

It’s horrible to be ending the year with all these stories about fraud in science. Let’s hope that 2020 is, as Simine Vazire hopes, “the year in which we value those who ensure that science is self-correcting.”

This new year’s eve, let’s toast to a reproducible generative Bayesian workflow! This works also for those of you who don’t use Bayesian methods. The key step is planning and understanding your methods using fake-data simulation: in that sense, replicability is central to the ideas of frequentist and Bayesian statistics.


They’re playing My Morning Jacket on the radio. I think Off the Record sounds just like the Ramones, but nobody agrees with me. Please tell me I’m not insane.

DAGS in Stan

Macartan Humphreys writes:

As part of a project with Alan Jacobs we have put together a package that makes it easy to define, update, and query DAG-type causal models over binary nodes. We have a draft guide and illustrations here.

Now I know that you don’t care much for the DAG approach BUT this is all implemented in Stan so I’m hoping at least that it’ll make you go huh?.

The basic approach is to take any DAG over binary nodes, figure out the set of possible “causal types”, and update over this set using data. So far everything is done with a single but very flexible Stan model. The main limitations we see are that it is working only with binary nodes at the moment and that the type space blows up very quickly making computation difficult for complex models. Even still you can do quite a bit with it and quickly illustrate lots of ideas.

In Berlin we were also talking about case level explanation. If of interest this piece figures out the bounds that can be obtained on case level effects using mediators and moderators. Different to the approach you were discussing but maybe of interest.

Any comments welcome, as always (including a better name for the package)!

I followed the second link, and in the abstract it says, “We are now interested in assessing, for a case that was exposed and exhibited a positive outcome, whether it was the exposure that caused the outcome.” This doesn’t seem like a meaningful question to ask!

But maybe some of you will feel differently. And, as Macartan says, their method uses Stan, so I’m sharing this with all of you. Feel free to download the package, try out the methods, comment, etc.
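To get a feel for why “the type space blows up very quickly,” here is a rough sketch in Python. It is not the package's code or API. It assumes that a “causal type” for a binary node means a response function, that is, a rule assigning an outcome of 0 or 1 to every configuration of the node's parents; the function names causal_types and n_types are made up for this illustration.

```python
from itertools import product

def causal_types(n_parents):
    """All response functions of a binary node with n_parents binary parents.

    Each type maps every parent configuration to an outcome in {0, 1},
    so there are 2 ** (2 ** n_parents) types.
    """
    configs = list(product([0, 1], repeat=n_parents))
    return [dict(zip(configs, outcomes))
            for outcomes in product([0, 1], repeat=len(configs))]

def n_types(parent_counts):
    """Size of the joint type space for a DAG, given each node's number of parents."""
    total = 1
    for k in parent_counts:
        total *= 2 ** (2 ** k)
    return total

# For the DAG X -> Y, node Y has one parent and therefore four types:
# never 1, 1 only if X=1, 1 only if X=0, always 1.
for t in causal_types(1):
    print({x[0]: y for x, y in t.items()})

print("X -> Y:           ", n_types([0, 1]))
print("X -> M -> Y:      ", n_types([0, 1, 1]))
print("Y with 3 parents: ", n_types([0, 0, 0, 3]))
```

Under this assumption a node with k parents has 2^(2^k) types, so a single node with three parents already has 256 of them, before multiplying across the other nodes.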

Knives Out

Since I just ran a post without the 6-month delay, I might as well do another, this time to recommend Knives Out to you. We saw it a few days before Christmas, and it was our most enjoyable time at the movies since . . . actually, I can’t remember the last time we had so much fun. It was comparable to when we saw The Drowsy Chaperone.

I more often have this experience of being enjoyably transported when reading a book, for example the works of Meg Wolitzer and Jonathan Coe. I guess it’s easier to get lost in the world of a book than the world of a movie or play. But the movie or play experience is special when the whole audience is laughing.

P.S. I should add that, in addition to being well acted, well shot, well written, and lots of fun, Knives Out also had some deep moments—or, at least, moments that seemed deep to me. To avoid spoilers, I’ll put these thoughts in rot13: V yvxrq gung ovg arne gur ortvaavat jurer Qnavry Penvt gnyxrq nobhg tbvat gb gur raq bs gur envaobj jurer gurer vf gehgu, naq gura jnvgvat sbe gur riragf bs gur fgbel gb trg gurer. Guvf frrzf gb zr gb qrfpevor n ybg bs jung erfrnepu srryf yvxr, naq vg nyfb svg va jvgu gur fcrpvsvpf bs gur fgbel fhpu nf uvf abgvpvat gur oybbq ba Nan qr Neznf’f fubr. Nyfb gur ovg ng gur raq jurer ur fnlf gung fur jba ol abg cynlvat gurve tnzr. Nyfb V yvxrq gur srry bs gur raqvat, ubj lrf fur tbg gur zbarl ohg vg jnfa’g cerfragrq nf fbzr sha pncre gung fur chyyrq bss. Gurer jnf fnqarff, cnegyl sebz Rqv Cnggrefba qlvat naq nyfb orpnhfr vg’f abg pyrne jung jvyy unccra gb Nan’f zbgure, ohg nyfb vg jnf whfg pbafvfgrag jvgu gur gurzr bs gur zbivr gung gur fvkgl zvyyvba qbyyne sbeghar jnf zber bs n ubg cbgngb guna n znthssva.

Do we still recommend average predictive comparisons? Click here to find the surprising answer!

Usually these posts are on 6-month delay but this one’s so quick I thought I’d just post it now . . .

Daniel Habermann writes:

Do you still like/recommend average predictive comparisons as described in your paper with Iain Pardoe?

I [Habermann] find them particularly useful for summarizing logistic regression models.

My reply: Yes, I do still recommend this!
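For readers who haven't seen the idea, here is a simplified sketch in Python for the case of a binary input in a logistic regression. It is not the full method from the paper with Pardoe, which handles general inputs using weights and propagates uncertainty in the coefficients. Here the coefficients are treated as known, the data are simulated, and the names (avg_predictive_comparison, beta, X) are made up for the example.

```python
import numpy as np

def invlogit(z):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def avg_predictive_comparison(beta, X, col, lo=0.0, hi=1.0):
    """Average change in predicted probability per unit change in one input.

    Column `col` of X is set to `lo` and then to `hi` for every row, with
    all other inputs held at their observed values; the differences in
    predicted probability are averaged over the rows.
    """
    X = np.asarray(X, dtype=float)
    X_lo = X.copy()
    X_hi = X.copy()
    X_lo[:, col] = lo
    X_hi[:, col] = hi
    diff = invlogit(X_hi @ beta) - invlogit(X_lo @ beta)
    return diff.mean() / (hi - lo)

# Simulated inputs: intercept, binary treatment, one continuous covariate.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    np.ones(n),
    rng.integers(0, 2, size=n),
    rng.normal(size=n),
])

# Hypothetical coefficients on the logit scale (assumed, not estimated here).
beta = np.array([-1.0, 0.8, 0.5])

apc = avg_predictive_comparison(beta, X, col=1)
print("average predictive comparison for the treatment:", round(apc, 3))
```

The result is on the probability scale, which is what makes it handy as a summary: a logit coefficient of 0.8 becomes an average difference in probability, necessarily less than 0.8/4 = 0.2.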