Overall, I remain a fan of Oliver Sacks.

Pedro Franco writes:

I just saw that the noted Oliver Sacks has had what seems a very damning article published in the New Yorker (https://www.newyorker.com/magazine/2025/12/15/oliver-sacks-put-himself-into-his-case-studies-what-was-the-cost). Here’s a couple of selected tidbits:

In a letter to one of his three brothers, Marcus, Sacks enclosed a copy of “The Man Who Mistook His Wife for a Hat,” which was published in 1985, calling it a book of “fairy tales.”

The case study is presented as an ode to the power of understanding a patient’s life as a narrative, not as a collection of symptoms. But in the transcripts of their conversations—at least the ones saved from the year that followed, as well as Sacks’s journals from that period—Rebecca never joins a theatre group or emerges from her despair.

Obviously very damning quotes. Another stab at the “scientist as a hero” myth to boot.

What’s been interesting to me are the reactions. Both Tyler Cowen (https://marginalrevolution.com/marginalrevolution/2025/12/which-published-results-can-you-trust.html) and Steven Pinker (https://www.facebook.com/Stevenpinkerpage/posts/a-lesson-from-the-debunking-of-oliver-sacks-and-other-nonreplicable-findings-tru/1431125978383100/) stress the importance of looking at literatures over individual papers/books, obviously sound advice. And yet.

As other comments on the Marginal Revolution post and you’ve stressed so much, that’s not enough. Whole literatures have been decimated in the replication crisis and to emphasise just the literature review process, even when applied on a large scale, seems incorrect to me. Unfortunately, there seems no solution except, you know, doing the hard work of thinking and reading and things.

Anyway. I thought you might want to jump in, as this “consensus” advice, although better than just despairing, doesn’t sit right with me. And, I’m guessing, doesn’t sit right with you either.

Finally, both Pinker and Cowen reference a Bryan Caplan post referencing a Noah Smith post (https://www.econlib.org/no-paper-is-that-good/) which, ironically, does seem to suggest a better advice akin to what I wrote above.

My first reaction is that I’m a huge Oliver Sacks fan. The article, by Rachel Aviv, is fascinating.

Here’s one bit:

Mort Doran, a surgeon with Tourette’s syndrome whom Sacks profiled in “Anthropologist,” told me that he was happy with the way Sacks had rendered his life. He said that only one detail was inaccurate–Sacks had written that the brick wall of Doran’s kitchen was marked from Doran hitting it during Tourette’s episodes. “I thought, Why would he embellish that? And then I thought, Maybe that’s just what writers do.” Doran never mentioned the error to Sacks. He was grateful that Sacks “had the gravitas to put it out there to the rest of the world and say, ‘These people aren’t all nuts or deluded. They’re real people.’”

I’m surprised that Doran is so mellow. If someone interviewed me and reported that a wall in my kitchen was marked from me hitting it, I’d be really annoyed!

Also this:

In his journal, reflecting on his work with Tourette’s patients, Sacks described his desire to help their illness “reach fruition,” so that they would become floridly symptomatic. “With my help and almost my collusion, they can extract the maximum possible from their sickness—maximum of knowledge, insight, courage,” he wrote. “Thus I will FIRST help them to get ill, to experience their illness with maximum intensity; and then, only then, will I help them get well!”

Maybe we don’t want to “get well,” dude!

Overall, though, after reading the Aviv article, I remain very positive about Sacks. OK, not quite so much as before, and I do wish he’d clearly labeled in his books what was real and what was invention, but I’m still a fan.

Yes, it seems that he made stuff up, but he still seemed to be describing real things, unlike people like Wansink and Ariely who made up data (or had the misfortune to keep doing research projects with data-faking colleagues), or people like the beauty-and-sex ratio guy or the ovulation-and-voting researchers who did the equivalent of going through piles of random numbers in order to tell stories with no real evidential basis, or etc etc etc. I still feel like I got a lot out of the Oliver Sacks books that I’ve read.

And, yeah, he was barbaric on Tourette’s syndrome, but those were barbaric attitudes expressed privately, in his journal. Sacks in his public writings on the topic was reasonable and civilized. So I think he deserves credit for channeling some of his worst impulses into his diary and keeping them out of his books.

Overall, I remain a fan of Oliver Sacks.

P.S. As a separate matter, I’m not convinced by the above-linked post by psychologist Steven Pinker, given that Pinker doesn’t seem to distinguish between criticism of unreplicable published research and what he’s elsewhere called “social media hate mobs” (see part 3 of this post). Also, elsewhere he expressed a naive (in my opinion) take on pure crap social psychology research (in this case, the ridiculously bad ovulation-and-voting study).

It’s easy to make fun of Pinker cos he puts himself out there (see here) but overall I appreciate his willingness to engage with the world. So I’m not saying he’s a bad guy or that nothing he says should be trusted, just that it’s hard to know what to think when he starts writing about scientific skepticism.

Handbook of Markov chain Monte Carlo, second edition

Radu Craiu, Dootika Vats, Galin Jones, Steve Brooks, Xiao-Li Meng, and I edited the second edition of the Handbook of Markov chain Monte Carlo. Dootika set up a GitHub page for the book, listing all the chapters and including links to most of them in arXiv form. (Chapter 4, “For how many iterations should we run Markov chain Monte Carlo?”, is by Charles Margossian and me.) For some reason, a few of the chapters are not yet on arXiv but I guess they’ll get there soon.

I recommend the whole book, but especially chapter 24, “Running Markov chain Monte Carlo on modern hardware and software,” by Pavel Sountsov, Colin Carroll, and Matthew Hoffman. But really, just dive in and read whatever chapters interest you.

My only regrets are that we didn’t include chapters on the following topics:
– Probabilistic programming (Stan, etc.)
– Sequential Monte Carlo (particle filtering)
– Divide-and-conquer algorithms (expectation propagation, etc.)

But, hey, no project is ever done.

I’m glad to have been part of this, and special thanks to Radu and Dootika, who joined the project for the second edition and added a lot.

Is government policy actually “virtually unrelated to the desires of the low- and middle-income citizens”?

Peter Enns shares a new article, which states:

The finding that government policy is “virtually unrelated to the desires of the low- and middle-income citizens” (Gilens 2005:789) is one of the most influential social science results of the last two decades. This article offers a new perspective on this finding. I [Enns] show that the seemingly innocuous decision to restrict analyses to data where different income groups’ policy support differs (i.e., a preference gap exists) introduced Simpson’s paradox, leading to misleading conclusions about whose preferences policy reflects. The same concerns apply to analyses of responsiveness to men and women and to partisan groups. I also present evidence that other common approaches for evaluating policy responsiveness can produce equally misleading conclusions. These findings suggest a need to reconsider conventional wisdom about political influence. The conclusion offers methodological recommendations and discusses implications related to understanding social and economic inequality and support for populist candidates.

Enns writes:

I believe this research overturns Marty Gilens’ argument that “actual government policy does not respond to the preferences of the median voter” and Gilens and Ben Page’s conclusion that, “average citizens have little or no independent influence.” I show that the core result in Gilens’ original research stems from Simpson’s paradox (the short videos in this blog post illustrate this paradox with Gilens’ data).
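To make the Simpson’s paradox idea concrete, here is a toy sketch with made-up numbers. It is not Gilens’ data and not Enns’s model; the variable names, the two subsets, and every value are assumptions chosen only for illustration. Two subsets of policies share the same positive slope of adoption on group support but have different intercepts, and the pooled slope comes out with the opposite sign.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical values: x is a group's support for a policy (0 to 1),
# y is the chance that the policy is adopted.
# Subset A: low support from the group, high intercept.
x_a = [0.2, 0.3, 0.4]
y_a = [0.5 + 0.5 * x for x in x_a]
# Subset B: high support from the group, low intercept.
x_b = [0.6, 0.7, 0.8]
y_b = [0.2 + 0.5 * x for x in x_b]

slope_a = ols_slope(x_a, y_a)        # slope within subset A
slope_b = ols_slope(x_b, y_b)        # slope within subset B
slope_pooled = ols_slope(x_a + x_b, y_a + y_b)  # slope ignoring the subsets

print("within A:", round(slope_a, 3))
print("within B:", round(slope_b, 3))
print("pooled:  ", round(slope_pooled, 3))
```

Within each subset the slope is 0.5, but the pooled slope is negative (about -0.14), because the subset with higher support also has the lower intercept. Whether the real data behave like this is the empirical question the article addresses; the sketch only shows that the arithmetic can work this way.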

This all reminds me of something we found when working on the Red State, Blue State project: the political differences between red and blue states (and thus between states controlled by Republicans and Democrats) were much more correlated with the political positions of upper-income voters than with lower-income voters. So, from that perspective, it seemed to us that one could observe a pattern in which the positions of elected officials were better predicted by the positions of upper-income voters in their state, without this being directly a responsiveness issue. It could just be correlational, having to do with the interaction between income and geographic variation in party preferences.

I looked at the paper by Enns and had some questions, to which Enns responded. I’ll give my questions below with his responses interspersed:

Me: You seem to have two different criticisms of the work of Gilens and others:

1. When they talk about upper-income voters, it’s pretty much just the upper 20%, not the truly rich.

Enns:

Yes, this is important for our theoretical expectations and mechanisms of influence.

2. There’s the Simpson’s paradox issue you discuss.

Enns:

Yes.

Me: But, substantively, are the Gilens et al. conclusions so wrong?

In your paper, you write, “The probability of policy change does, however, differ when the affluent support a policy more or less than the low-income group. Specifically, proposed policies are more likely to become law (i.e., α is greater) when the affluent prefer the policy more than other income groups.” This sounds pretty close to the general understanding of Gilens etc.

Enns:

There are two important differences:

(1) If this is a policy advantage, it is a completely different mechanism. Gilens (and Gilens and Page), like most other scholarship, focus on responsiveness, which is captured by beta. The conclusion that policy does not respond to the preferences of the middle or low-income voters no longer receives support. Some groups may have political advantages, but past conclusions about unequal responsiveness or some group’s preferences not mattering need to be revised. I think this distinction holds large implications.

(2) More importantly, I don’t think we can interpret this as an advantage at all. Figure 5 touches on this by plotting policy responsiveness to the affluent. When the affluent overwhelmingly oppose a policy, are they really advantaged when the unpopular policy is more likely to become law if they want the policy more than the poor do (orange dots on left side of Fig 5)? Conversely, when the affluent are extremely supportive of a policy, does it benefit them that the policy is less likely to become law when the poor want the policy more (blue dots on right of Fig 5)? I don’t think the different alpha when the affluent support policies more or less than other income groups maps neatly onto any conception of policy advantage as understood by political scientists. But I’d be eager to get your take on this.

Me: Later, you write, “Headlines based on Gilens and Page’s research, such as ‘Rich people rule!’ (Bartels 2014), ‘The Politics of Always Ignoring What Average Americans Want’ (Coy 2014), and ‘Politicians listen to rich people, not you’ (Prokop 2015), align with their original findings. But we have seen that the data cannot support these conclusions.” But it does seem like rich people rule, no? And, sure, politicians listen to all voters, and non-rich voters have influence (look at the recent NYC mayor election where all the rich people in the world weren’t enough to buy the election for Andrew Cuomo), but don’t you think people have always taken these “Politicians listen to rich people, not you” sorts of claims as relative statements?

Enns:

Here’s a two-minute video by Ezra Klein that I think very much aligns with my interpretation of these articles. I think a lot of other coverage takes the no responsiveness to average citizens claims very literally and I believe this really matters. If we take Gilens and Page at face value, there’s nothing the general public can do – policymakers have consistently completely ignored most voters. By contrast, the results in my article suggest: (a) we do not have evidence that policy makers ignore average citizens or even the lowest income group, and (b) we have strong evidence that policy responds to the general public (Table 4), so more political engagement could, theoretically, have an even greater influence.

Me: Also, I agree that these survey data don’t tell us about the actual rich, but, again, it’s hard to imagine that political influence doesn’t go up a lot when you go from the 90th to the 100th percentile of income or wealth.

Enns:

I totally agree about the different political influence on this spectrum, but we should no longer conclude that a study of the 90th income percentile can tell us about the super rich. In other work, Page and his co-authors have made this exact point, writing: “For systematic evidence on the policy preferences of really wealthy Americans—such as the top 1 percent or the top one tenth of 1 percent of wealth-holders—it is necessary to design special surveys that explicitly target those groups” (Page, Bartels, and Seawright 2013, 52).

Me: So, I’m not knocking your methodological point, and it does help make sense of objections raised by Bob Erikson and others regarding the strong claims made by Bartels etc. from those regressions; I just don’t know that the political conclusions are changed so much?

Enns:

If we take the narrowest conclusion of my findings, I think we have to update our understanding of who policymakers respond to (i.e., the interpretation of beta). I think this is important, and as Figure 4d shows, reverses existing conclusions about negative responsiveness to women.

I think this also changes the study of political influence. Different alphas would be a different type of policy advantage that has not been theorized and social scientists need to study and understand. I think it is absolutely implausible that policymakers pay attention to the difference in group preferences, enacting policies when their affluent constituents support a policy more than their middle-income constituents, and not enacting policies when their affluent constituents support the policy less (regardless of the actual total amount of affluent support). Policymakers don’t have this level of data and it seems theoretically absurd. If policymakers were considering group opinions, wouldn’t they look for policies with broad support across groups instead of the difference between groups? Something else must account for alpha.

Ultimately, I hope three things change in the field based on this article.

1. Scholars no longer conclude that there is near-zero responsiveness to the economic majority and to women.

2. Scholars find new ways to study responsiveness, so current methodological misinterpretations no longer happen.

3. Scholars start to theorize and study alpha. If this is a policy advantage, what accounts for it? If it’s not a policy advantage (because unpopular policies are more likely to pass when the affluent prefer them more and popular policies are less likely to pass when the affluent prefer them less), what does alpha represent?

I think these would be large and important changes for the field, but it’s okay if they aren’t.

For once, some scientific fraudsters have to pay their money back to the government.

Chuck Jackson points us to this news item:

Dana-Farber Cancer Institute Inc. (Dana-Farber) has agreed to pay $15,000,000 to resolve allegations that, between 2014 and 2024, it violated the False Claims Act by making materially false statements and certifications related to National Institutes of Health (NIH) research grants….

As part of the settlement, Dana-Farber admitted that its researchers used funds from six NIH grants to conduct research that resulted in 14 publications in scientific journals containing misrepresented and/or duplicated images and data. The publications reused images to represent different experimental conditions; duplicated images to represent different testing conditions, mice, and/or timepoints; or rotated, magnified, or stretched images. Further, Dana-Farber admitted that a supervising researcher failed to exercise sufficient oversight over these researchers, and that Dana-Farber spent funds from those six NIH grants that were unallowable.

As part of the settlement, Dana-Farber also admitted that another researcher received four NIH grants after submitting grant applications that discussed a journal article authored by the researcher but did not disclose that certain images and data in that article were misrepresented and/or duplicated. The United States contends that Dana-Farber caused the submission of false claims to NIH by falsely certifying compliance with grant terms and conditions, spending grant funds on unallowable expenses, and obtaining grants through false and misleading statements. . . .

Good to see this sort of thing.

Meanwhile, I don’t recall hearing that Columbia University had to pay back the money on this guy’s grants. And I don’t know if any of this guy’s research was federally funded. Although, now that the government is pardoning fraudsters all the time and cutting down on tax enforcement, I suppose that scientific fraud is really the least of it.

An economist writes: “the fulminations over the #1 pick seem overheated to me.”

Jonathan Falk writes:

I [Falk] am always amazed at the amount of (digital) ink spilled on the perverse incentives involved in tanking to get the #1 draft pick. The current local woes of the Giants and Jets obviously contribute a lot to these discussions, but they happen all the time. As an economist, it’s clear to me that the value of a draft pick is the incremental value, not the absolute value. I’m completely aware that the upper tails of distributions have much more dispersion than the center, or even the 80th-90th percentile does, but the fulminations over the #1 pick still seem overheated to me.

First, of course, is the fact that assessment is made with error, and there are plenty of #1 busts in every sport. #2s can be busts as well, of course, but that merely lowers the expected difference between #1 and #2 as the true value of both is attenuated towards 0 — #1 loses more.

Second, there is the issue of team fit. Greatness is a vector, not a number, and if the teams ahead of you in draft order need something else, you still stand a chance of getting the player optimized for your needs. Going the other way, of course, is that higher draft picks absolutely lower the number of teams that can steal your guy.

Third, teams are… teams. One person can only contribute so much. So the relevant assessment is not how much better A is than B, but how much the addition of A versus the addition of B will change the prospects of your team — which I think is pretty obviously a lower difference, though I guess your rationale for voting runs in the other direction — you ought to judge a small incremental addition by the gigantic difference between winning a championship or not.

Fourth, more narrowly economic, every incremental pick costs more. I don’t think that effect is huge in the context of overall payrolls, but isn’t that then another anomaly? If #1 picks are so dramatically better than, say, #5 picks, why aren’t they paid multiples more?
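Falk’s first point, that assessment error shrinks the expected gap between #1 and #2, can be checked with a small simulation. This is a rough sketch under assumed conditions, not a model of any real draft: talent and scouting error are both taken to be normal, and the pool size, noise level, and function name are all hypothetical.

```python
import random

def mean_true_gap(noise_sd, n_players=30, n_sims=10000, seed=0):
    """Average true-talent gap between the players ranked #1 and #2.

    Players are ranked by assessed talent (true talent plus error);
    the gap is measured in true talent.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        talent = [rng.gauss(0, 1) for _ in range(n_players)]
        # Assessment = true talent + independent scouting error (assumed normal).
        seen = [t + rng.gauss(0, noise_sd) for t in talent]
        order = sorted(range(n_players), key=lambda i: seen[i], reverse=True)
        total += talent[order[0]] - talent[order[1]]
    return total / n_sims

gap_perfect = mean_true_gap(noise_sd=0.0)  # scouts see talent exactly
gap_noisy = mean_true_gap(noise_sd=1.0)    # error as large as the talent spread

print("expected true gap, perfect assessment:", round(gap_perfect, 2))
print("expected true gap, noisy assessment:  ", round(gap_noisy, 2))
```

Under these assumptions the expected true gap between the first and second picks is smaller when assessment is noisy, which is the attenuation Falk describes. How much smaller depends entirely on the assumed noise level.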

I don’t really have anything to say here, because I have no sense of how much teams are paying for #1 or #2 picks. I do remember a couple years ago that everyone was talking about Wemby, but basketball’s different than football because there are only 5 players on the court, so one player can make more of a difference.

The case of Wemby makes me think that one way this could be studied would be to compare different years. In some years there is a clear consensus #1 pick, other years not.

John Carlin says, “‘Identifying variables that independently predict…’ is not a well-defined research task”

John “Bayesian Data Analysis” Carlin writes:

Recent developments in the methodology of epidemiological research have emphasized the importance of achieving clarity of purpose by classifying research questions into one of three types: descriptive, predictive, and causal. . . .

I [Carlin] do not believe that studies aiming to “identify” independent predictors or “prognostic factors” are addressing well-defined research questions. Indeed, beyond the issues already raised, there is a broader question of the extent to which it is ever sensible to frame a research question as if it could be answered dichotomously, as in “is this an (independent) prognostic factor?” Prediction questions, which include prognosis, are those that involve the development of a model or algorithm to provide predictions of outcomes using available variables that are potential predictors.

This all makes sense. I kinda think that descriptive, predictive, and causal are all the same thing–or, more precisely, that “descriptive” and “causal” are special cases of “predictive,” under different conditions. But if you want to divide them into three tasks, sure, go for it. Personally, I’d rather divide statistics into the goals of exploration, estimation, and discrimination, but I think that’s because I’m thinking from a more general “data science” perspective, whereas John is focusing on the more traditional problem of inference.

But, yes, I agree with him 100% on avoiding dichotomization, a topic that Sander Greenland, I, and others have been screaming about for a long time–indeed, John and I contributed to the anti-dichotomization theme in our book Bayesian Data Analysis, in that we focused on model building and inference within a model, rather than on the then-fashionable problem of choosing among or comparing models using Bayes factors. So, yes on that.

John continues:

Some variables may have greater predictive value than others, but this should be assessed by comparing the predictive value of the model or algorithm with and without the use of that variable, not by examining its “independent effect” in a multivariable regression model.

I’m confused on this point. I mean, sure, I agree that you shouldn’t label a regression coefficient as an “independent effect”; indeed, I always use the terms “predictors” and “outcome” rather than “independent and dependent variables.” Beyond this, I’m not quite sure what John is suggesting. Suppose you have a predictor of interest, x3, and you’ve fit the model y ~ x1 + x2 + x3 (for convenience using standard R notation). I guess John is saying, don’t just look at the coefficient for x3 in that model; also compare it to the model y ~ x1 + x2. Maybe this is a good idea–it’s not something I’ve thought about for a while. Is this the same as what used to be called “partial regression coefficients”? I remember from the statistical literature in the 1960s and 1970s that there was a lot of work on methods for understanding what happens in linear regression when you add one variable at a time. Perhaps it would be good to revisit some of those ideas, and maybe it’s a mistake that we don’t cover them in Regression and Other Stories.
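Here is a minimal sketch of one reading of that comparison: fit y ~ x1 + x2 and y ~ x1 + x2 + x3 on a training set and compare their prediction error on held-out data. The data are simulated, the coefficients are made up, and the helper names (`solve`, `fit_ols`, `rmse`, `split`) are hypothetical; this may or may not be exactly what John has in mind.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients via the normal equations; intercept first."""
    Z = [[1.0] + row for row in X]
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)

def rmse(coefs, X, y):
    """Root mean squared prediction error of a fitted linear model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in X]
    return (sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)) ** 0.5

def split(data, cols):
    """Return (predictor rows using the given columns, outcomes)."""
    return [[r[c] for c in cols] for r in data], [r[3] for r in data]

# Simulated data with made-up coefficients; x3 is the predictor of interest.
rng = random.Random(1)
rows = []
for _ in range(400):
    x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * x3 + rng.gauss(0, 1)
    rows.append((x1, x2, x3, y))
train, test = rows[:200], rows[200:]

X_tr_small, y_tr = split(train, [0, 1])     # y ~ x1 + x2
X_te_small, y_te = split(test, [0, 1])
X_tr_full, _ = split(train, [0, 1, 2])      # y ~ x1 + x2 + x3
X_te_full, _ = split(test, [0, 1, 2])

rmse_without = rmse(fit_ols(X_tr_small, y_tr), X_te_small, y_te)
rmse_with = rmse(fit_ols(X_tr_full, y_tr), X_te_full, y_te)
print("held-out RMSE without x3:", round(rmse_without, 2))
print("held-out RMSE with x3:   ", round(rmse_with, 2))
```

The quantity being compared is the change in out-of-sample prediction error, not the size of the x3 coefficient. In this simulation x3 is independent of x1 and x2, so the two summaries tell a similar story; with correlated predictors they can diverge.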

I also want to plug my paper with Guido Imbens (also included as Section 21.5 in Regression and Other Stories), Why ask why? Forward causal inference and reverse causal questions. Our point there is that it can be a good idea to search for prognostic factors in observational data, not with the idea this will identify causal effects but rather as a way of understanding what’s missing from our existing models.

Finally, John writes:

More broadly, debates on whether to “adjust” or not for certain variables in a regression model can only be answered by situating the analysis within a sharply defined research question and a sharply defined rationale for specifying a regression model in the first place.

I don’t get this at all. First I don’t get why “adjust” is in scare quotes; second, ummm, yeah, it’s always good to have a sharply defined research question, but in the meantime people are always making comparisons, and so let’s do what adjusting we can. For example, in an epidemiology study it should pretty much always be a good idea to adjust for age and smoking history. Or maybe John would say that the rationale for adjusting for age and smoking history is sharply defined, in which case maybe we’re in agreement.

To put it another way, it’s often a good idea to have a sharply defined research question–but that applies in general, not just for statistical adjustments. I think it’s also true that it’s better to have a sharply defined research question when performing a randomized clinical trial. A randomized clinical trial gives identification for the sample average treatment effect in any case–but without a sharply defined research question, it’s not clear what can be done with such an estimate.

So I’m wary of John singling out adjustment in his criticisms, as I fear his article will be taken as implying that, if you don’t try to adjust, everything will be ok.

What advice do you have for this student who’s in his first year of college and interested in both statistics and political science?

Joey Jennings writes:

I’m a first-year statistics major and wanted to reach out because statistics and political science were my two main options when choosing a major, and I’m still considering law school down the line.

I’m very interested in how statistical thinking intersects with politics, public policy, and legal reasoning, and your career seems to embody that combination. I was hoping to ask whether you have any general advice for a student early in college who is trying to keep these paths open and build a strong foundation.

My response: I think I’m too old and too privileged to offer much useful advice to a young student just starting out. My own experience is that I always loved math but I didn’t want to do pure math (it just seemed pointless to try to prove theorems, knowing that there would be other mathematicians who were better than me, proving better theorems), so I studied physics, but then I took some classes in probability and statistics and the subject really grooved with me. I also took some political science classes, and it was interesting to see the relevance of mathematical and statistical ideas in understanding various aspects of voting and political representation. Back then the state of the art in political analytics was pretty low. There was some good work, but also lots of unthinking applications of inappropriate models, so there were lots of openings for a student to do innovative work. I guess things are even better now, in the sense that you can do innovative work at a much higher level, making use of what’s already out there.

As for advice: ok, yeah, I still think it’s a good idea to “learn to code.” Coding is the most rigorous thing out there, and it’s how we understand our statistical models (as discussed in our Bayesian Workflow book). Work on real applications where you can. And choose your courses more based on the quality of the teachers than on the descriptions of the classes.

And, ummm, anyone else out there have any further advice to offer?

A study is retracted after it turns out that its authors were misrepresented as “third-party experts” even though they were actually paid by the company?

Gur Huberman points to this news article:

A Study Is Retracted, Renewing Concerns About the Weedkiller Roundup

Problems with a 25-year-old landmark paper on the safety of Roundup’s active ingredient, glyphosate, have led to calls for the E.P.A. to reassess the widely used chemical.

In 2000, a landmark study claimed to set the record straight on glyphosate, a contentious weedkiller used on hundreds of millions of acres of farmland. The paper found that the chemical, the active ingredient in Roundup, wasn’t a human health risk despite evidence of a cancer link.

Last month, the study was retracted by the scientific journal that published it a quarter century ago . . .

The 2000 paper, a scientific review conducted by three independent scientists, was for decades cited by other researchers as evidence of Roundup’s safety. It became the cornerstone of regulations that deemed the weedkiller safe.

But since then, emails uncovered as part of lawsuits against the weedkiller’s manufacturer, Monsanto, have shown that the company’s scientists played a significant role in conceiving and writing the study.

Oh, what was that significant role?

Monsanto employees praised each other for their “hard work” on the paper, which included data collection, writing and review. One Monsanto employee expressed hope that the study would become “‘the’ reference on Roundup and glyphosate safety.” . . .

In retracting the study last month, the journal, Regulatory Toxicology and Pharmacology, cited “serious ethical concerns regarding the independence and accountability of the authors.” Martin van den Berg, the journal’s editor in chief, said the paper had based its conclusions largely on unpublished studies by Monsanto. . . . There was no disclosure of a conflict of interest on the part of the authors beyond a mention in the acknowledgments that Monsanto had provided scientific support.

There seems to be some controversy about the safety of this pesticide:

Dr. Philip J. Landrigan, who is a pediatrician and epidemiologist and the director of the Program in Global Public Health at Boston College . . . recently chaired an advisory committee for a global glyphosate study that found that even low doses of glyphosate-based herbicides caused leukemia in rats. . . .

Laboratory tests first flagged potential risks posed by exposure to glyphosate as far back as the early 1980s, and soon after, studies of Midwestern farmers exposed to herbicides started to show an increase in certain cancers. A U.S.-backed effort to eradicate coca fields in Colombia by spraying glyphosate from planes onto hundreds of thousands of acres of cropland led to widespread reports of illnesses among residents.

The 2000 paper declaring glyphosate safe was published against that backdrop. . . .

Bayer has paid out more than $10 billion to settle approximately 100,000 Roundup claims . . .

And then there’s the bigger picture:

The retraction points to a wider problem of research secretly funded by industries like tobacco and lead, said David Rosner, co-director of the Center for the History and Ethics of Public Health at Columbia University. “Shading the science to favor the corporate interest,” he said, was likely “the rule rather than the exception.” Journals needed to “press scientists more forcefully to identify conflicts of interest,” he said. “Huge financial interests are at stake.”

The most disturbing thing in the linked emails was that the Monsanto people referred to the authors of that paper as “third party experts” and as “independent experts.”

But if they were paid by Monsanto, then it doesn’t seem accurate to characterize them as “third party” or “independent” experts.

The research article appeared in 2000. The emails were released in 2017 in the process of a lawsuit. The article was retracted in 2025 (although the official publication date of the retraction is February, 2026, i.e., a month after the writing of this post).

I don’t know what to think about all this. On one hand, how much can you trust research on a controversial topic that was written, funded, and reviewed by one of the parties to the controversy? They do say this in the paper, “In this effort, the authors have had the cooperation of Monsanto Company that has provided complete access to its database of studies and other documentation,” but it sounds like Monsanto provided more than data access.

I guess I could try to read the original article . . . OK, let’s take a look:

The paper goes into details on three studies from 1988, 1991, and 1992 of oral doses in rats over 10 or 15 days. Then it looks like there was another study from 1973 on oral doses in rats for 15 days, and then three studies of skin exposure from 1983 and 1991, two on monkeys and one on humans. Then there’s a mouse study from 1992, rat studies from 1987 and 1992, a dog study from 1985, a rat study from 1979, a mouse study from 1983, a rat study from 1981, . . . ok, I’m getting tired now. There’s not really much for me to chew on here as a statistician. It does seem that belief in these results is going to boil down to your trust in the research team, and so the undisclosed conflicts of interest are a big deal.

On the other hand . . . I’ve done research funded by Novartis–they paid my colleagues and they paid me directly too. We published a paper based on that work–two of the authors were Novartis employees and two of the other authors had worked for me at the time (more precisely, they’d worked at Columbia under my supervision). That project used Novartis data, but it was a little different from the above-discussed Roundup article in that its purpose was methods rather than policy.

Also I did some consulting for Monsanto at one point, I think! I can’t remember the details, I think I was on the scientific advisory board of some company that was doing some agricultural stuff, I went to one of their meetings and then I stopped hearing from them, actually I can’t even remember if they paid me. So I’m not gonna get on my high horse and denounce industry-funded or pharma-funded research in general terms.

Taking one more swing at the foolish nudgelords who associate the Soviet Union with environmental protection

Hey, wait a second? The authors of Nudge . . . they’re not idiots! One of them won the Nobel Prize, and people keep telling me that the other guy is really smart. You don’t get to be Henry Kissinger’s pal by being a dummy, right?

And yet . . . they keep saying some really dumb things.

Whassup with that?

My guess is that they have no editor. Even the best of us make mistakes, even the best of us write some stupid things from time to time. If we’re lucky, though, we can show our writings to some trusted person who can point out where we’re wrong. Or to some complete strangers–like you blog commenters!–who feel free to point out where we’re wrong, or where they think we’re wrong.

I absolutely looove when youall disagree with me. It’s a no-lose situation: either you find a legitimate error and then I can fix it and recalibrate my thinking as needed, or you’re wrong in your correction, but in that case it’s still a useful sign that I’ve failed to communicate clearly.

I’m speaking here of sincere criticism, not trolls or Russian agents or people who are otherwise trying to muddy the waters. But the vast majority of you do seem to be sincere, and even the trolls often have good points, and the agents and equivalents usually go away once it’s clear that we’re not going to give them twitter-style engagement.

Anyway, back to the Nudgelords . . . I think their problem is they’re too successful, so they don’t need to listen to critics. Also, not listening to critics is a contributing factor to their success! One thing that made them Lords rather than just Commoners is their unshakeable confidence.

I thought about this because I happened to come across this 2013 review by Samuel Freeman of Cass Sunstein’s book, “Simpler: The Future of Government.” Here’s Freeman:

Simpler is a follow-up to Nudge. Sunstein draws from his experiences as head of the Office of Information and Regulatory Affairs (OIRA) from 2009 to 2012. . . .

Sunstein contends that “the future of government” largely lies in policies that preserve freedom of choice. Such policies, which he and Thaler dubbed “nudges,” would encourage people to make decisions that benefit rather than harm them. . . .

“To count as a mere nudge,” Sunstein writes, “the intervention must be easy and cheap to avoid. Nudges are not mandates. Putting the fruit at eye level [in a school cafeteria, for example] counts as a nudge. Banning junk food does not.”

Uh oh . . . he’s citing the work of the discredited business-school professor Brian Wansink! As we’ve discussed before, the problem is not that Sunstein got conned by that now-disgraced food researcher, but rather that, after the problems with Wansink’s work came out, they removed all references to it from the second edition of Nudge–without reflection on how they’d been fooled. That’s where the idiocy happened.

But now let me show you the place where Sunstein really brings on the stupid. Here’s Freeman again:

Finally, rather than “Soviet-style” national restrictions on major sources of pollution, they advocate incentive-based approaches that increase freedom of choice, ideally, for example, a cap-and-trade system in which “rights” to pollute could be purchased or given away and then traded on the market.

What an idiot, to refer to environmental protection laws as “Soviet-style”! Hasn’t he heard about the environmental devastation in the Soviet Union? Soviet-style is to let factories pollute because they’re run by well-connected people, and there were no independent executive, legislative, and judicial branches to make and enforce the rules. To think of pollution restrictions as “Soviet” . . . that’s just nuts, it’s both illogical and ahistorical.

It’s really frustrating that this sort of thing is taken seriously.

P.S. You might think, Sure, but that was 2013, and since then we’ve had the replication crisis, the winds have changed, and nobody takes that crap seriously anymore. But nooooo, here it is in 2023: bullshit nudge numbers in the New York Times in 2023. I’m not blaming Sunstein for that one; my point only is that there are a lot of people who want to believe this stuff.

P.P.S. To be fair, Sunstein can be a thoughtful writer sometimes, for example in this review of a biography of the economist Albert Hirschman. I suspect that Sunstein’s thinking is clearest when it is detached from his personal ambitions, so that instead of trying to stake out some position, he can just step back and tell it like it is. I’d like to think he could do more of that going forward.

Two Health Economists Walk into a Bar: What bothered me in that conversation of Jay Bhattacharya and Emily Oster

Last week I was at a conference on enhancing scientific integrity (as I reported here), and one of the sessions was an interview of Jay Bhattacharya, the current director of the National Institutes of Health, and Emily Oster, a professor of economics at Brown University.

I referred to that session in a post the other day regarding the recent case of a report from the Centers for Disease Control and Prevention that was pulled by Bhattacharya, in his additional capacity as acting director of the CDC. I’ll get back to that story in a bit, but here I wanted to talk about some larger things that bothered me in the interview.

Before getting to my disagreements, let me give my positive take, which is that both the people in the interview had an air of moral seriousness.

This is important. So much of the discourse in politics and social science these days is polluted with cynicism, whether it be from history professor Niall Ferguson decrying the “wokeness” on college campuses when he’s not encouraging college students to do “oppo research” on each other, or Lawrence Summers sleazing around with a sex trafficker and then trying to enlist his rich friends to intimidate student journalists, or Cass Sunstein writing an entire book on a topic he knows nothing about, or Sunstein’s friend Adrian Vermeule promoting election denial, or Mehmet Oz and Andrew Huberman trading off their medical and scientific credentials to hawk dietary supplements, or Steven Levitt promoting dubious claims on mind-body healing and global warming denialism (presumably because they’re cool and transgressive, respectively), or Matthew Walker torturing the data, etc etc. I’m talking about researchers who see science as a path to glory, not to understanding, and politically-minded academics who will happily promote stupid ideas that push their agenda. Beyond that there are straight-up politicians who lie, cheat, and steal, and that’s bad too–but here I’m talking about that nexus between government, policy, and the human sciences.

Anyway, Bhattacharya and Oster weren’t like that. They recognize that we’re talking about serious issues here. When asked about disruptions to NIH funding, Bhattacharya emphasized the larger goal of improving public health, making the point that they want to fund a portfolio of projects to address health challenges. I have no sense of how things are run internally within NIH, so I’m not saying I agree or disagree with his particular administrative directions, but I appreciated that he kept his eye on the ball by emphasizing ultimate goals. For her part, Oster questioned Bhattacharya on a number of issues. She too gave the sense that this is a serious topic, not just a political game.

How to do better is another question! Last month Oster wrote positively about some silly dietary guidelines recently released by the FDA, and if you read her op-ed carefully she doesn’t actually seem to agree with most of those guidelines (the best thing she could say about them was that they were “not crazy”), so I take it that in writing that piece she was making a sort of persuasion calculation that the best way to be effective is to mix the criticism with a gallon of sugar. That’s not my style. So, Oster uses a different approach than I do, and I’m sure we’d have our differences in how to interpret statistical evidence. But, again, I think she’s engaging with moral seriousness.

And it’s possible to be morally serious while still having fun. Consider Nate Silver. Nate’s an entertaining writer–I try to be too!–and I’ve had my disagreements with him regarding statistics and communication, but I think he’s coming from a place of intellectual and moral seriousness that shows respect for the challenges of political analytics and the stakes involved. Indeed, sometimes when he’s disagreed with me, it’s on the implicit grounds that he’s making progress in understanding the real world, doing some analytical engineering that is outpacing the statistical theory. I still think there’s a benefit to interrogating the edge cases where our methods break down . . . anyway, my point is that I’m not just using the term “moral seriousness” to refer to things that I agree with. I’m talking about an attitude that I see in Bhattacharya, Oster, and Silver that I don’t see in, say, Niall Ferguson or Andrew Huberman.

Now, to return to our main thread, these are the parts of last week’s interview that bothered me:

1. When asked about some news reports regarding the NIH and CDC, Bhattacharya dismissed them as “fake news.” This annoyed me for two reasons. First, he offered no evidence that the reports were untrue. Second, he was appointed by a man who spews out false statements at an amazing rate, including on the topic of public health. Who are we supposed to trust here? News reports or a political appointee? Also, Bhattacharya himself has a record of being sloppy with the facts, as I happen to know because it happened to me.

Now, don’t get me wrong, I’m not saying that Bhattacharya was lying or misinformed regarding recent NIH and CDC policies. It could well be that the news items were erroneous or misleading–and, if so, I can see how Bhattacharya would be legitimately annoyed. And he should feel free to express his annoyance! But just dismissing the reports as “fake news” . . . that’s not a serious response.

As I wrote above, I appreciate that Bhattacharya treats the nation’s public health spending with the seriousness it deserves. As a statistician, I think information needs to be treated with respect as well. Which means he should be addressing serious news reports and, for that matter, respecting the institution of journalism. Which he wasn’t doing here.

2. When the topic of vaccines came up, Bhattacharya came out strongly in favor of vaccination, and he expressed the view that it is better for vaccination to be voluntary rather than mandatory. This could be. I guess it depends on the context. For almost all my life, childhood vaccines were mandatory, just about everybody got vaccinated, and just about nobody complained about it. So mandatory vaccination can work just fine–we have decades of experience on this one. The bad news is that in the past few years, vaccination has become politicized and anti-vax attitudes have become embedded in right-wing politics. So it could be that Bhattacharya is right and the mandates will have to go, we’ll just have to accept more sick and dead kids and adults, just the price to pay for this aspect of political dysfunction. I don’t know, but it could be, so I’m not going to criticize Bhattacharya for his hot take on this issue.

What bothered me was . . . if you are going to go with a voluntary vaccination strategy, I think you’d want a strong strategy of encouraging people to choose vaccination for themselves and their kids. So I think his response would’ve been stronger if he’d also said something about how to vigorously promote vaccine usage. That’s part of public health policy too. Also, Bhattacharya doesn’t have a great track record on this issue: just a few years ago he was part of an anti-vax organization. See here for the ugly story. OK, fine, everybody makes mistakes and has lapses in judgment. But then at least he should address that, in the past, he’s been part of the problem. To just say that you want vaccines to be optional but without addressing that history, that’s not right.

3. The un-publishing of that CDC report. Bhattacharya said he stopped the CDC from publishing the report because it was using an approach called a test-negative design, which he thinks is a bad statistical method. When he said this, Oster jumped in and said that she too thought it was a bad method. It was only a brief exchange and there was no time for either of them to give a reference or to explain why they think the method is bad. In the meantime, it seems that the report has been leaked; see here. One of the authors of the report said, “I’m strongly opposed to this kind of censorship . . . It should be out in the world at large for the scientific community to judge it for what it is.”

I think the best next step would be for the CDC to release the report officially, along with a critical response from a statistician explaining how the method is flawed. Bhattacharya said it was common knowledge that the method was terrible; on the other hand, it seems that this “test-negative design” is a standard approach for studying the effect of vaccines in the population after they have been released; see also here. So at the very least it would be a valuable educational opportunity to see this article that was on the verge of publication, and to understand its purported problems. Publishing the report along with a companion article discussing its problems, that could make sense. Canceling the report without explaining why (and, no, just saying you don’t like this method isn’t enough of an explanation) . . . that’s not serious science. Scientific integrity is not being advanced by this sort of behavior.

I was also upset that Oster just jumped into the discussion to say that she, too, hates the test-negative design. Neither Bhattacharya nor Oster are statisticians. They’re health economists. It’s fine for a health economist to have an opinion on a statistical method, but, to be so sure about it, that doesn’t seem right to me. To the extent that Bhattacharya and Oster have legitimate concerns about the statistical method, they can work with a statistician to express these concerns openly and scientifically.

I’m not saying that statisticians or epidemiologists are always right or that other professionals should defer to them. Statisticians can be wrong, really wrong, and the errors can be compounded by a presumption that they know what they’re doing. So question these reports all you want. But then is the time to bring in an expert of your own, not to wing it.

Above I talked about moral seriousness regarding outcomes. There’s also moral seriousness regarding methods, and neither of the two people in that interview were displaying it. Also important is moral seriousness about communication, which has not been displayed by Bhattacharya, who has yet to come to grips with the fact that he was on the board of an anti-vax organization.

P.S. Dorothy Bishop provides a detailed discussion of this event.

Hey! Try out the RMET (“the Reading the Mind in the Eyes test”).

Dan Luu is interested in this RMET thing so he set up a survey here. Click on the link and try it out!

Dan has some thoughts on this, and so do I. I have a post scheduled for October (that’s the current end of the queue) with our thoughts, but he’d like you to try it out now without being influenced by our takes.

Dan did the No Vehicles in the Park survey a while ago and got some interesting results, so I think you’d be contributing in some small way to the public good by trying out this new survey and giving him some data. Enjoy.

Should French pollsters be using Mister P?

An anonymous statistics student from France sends in the above plots (click twice to see big versions) and writes:

I’m trying to push French pollsters to start doing MRP.

I made a poll aggregator and applied it to the last 100 days of the last five French presidential elections.

I did some smoothing using an algorithm from a paper of Aki Vehtari. It is Kalman-RTS with cross-validated levels of noises.

I tested it on some simulated data to confirm it is fitting properly.

I put the data and the code on my blog.

What I shared as “the data” is the smoothed result. I fitted it on the Wikipedia pages of the French polls.

On the plots, the same parties (with changed names or fusions) are on the same position horizontally to allow comparisons.

I see some periodic movements in opinion that I think may be coming from a periodic non-response.
Also, the movements seem far too large to me. I can believe 10% increase for a candidate in five years, but not in less than 100 days.

The French polling industry is in profound need of reform. A fun fact: They allow themselves to change the final result by plus or minus one point based on the feelings of the person in charge of the poll. They call that the “pifomètre” or nosemeter. I heard about this in an interview with sociologist Hugo Touzet on his book, “Produire l’Opinion: Une Enquête Sur Le Travail Des Sondeurs.” I trust his descriptions of their methods since he has interviewed their workers.

I think MRP would allow the pollsters to do predictions for the legislative elections and municipal elections, which have been largely ignored because they are too difficult and expensive with quota sampling.

I know next to nothing about French polling, but, yeah, I do think they should be using Mister P (multilevel regression and poststratification; MRP).
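For a sense of what the smoothing step in the email involves, here is a minimal sketch of a Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass for a single candidate’s polling series. It assumes a simple local-level model (true support follows a random walk, each poll is the truth plus noise). This is not the student’s code: the function name, the simulated data, and the noise variances are all hypothetical, and the cross-validation used to choose the noise levels is left out.

```python
import numpy as np


def kalman_rts_smooth(y, q, r, p0=10.0):
    """Kalman filter + RTS smoother for a local-level model.

    y  : 1-D series of poll results (np.nan marks days with no poll)
    q  : process-noise variance (how far true opinion can move per day)
    r  : observation-noise variance (sampling error of a poll)
    p0 : variance of the initial state
    Returns the smoothed means and smoothed variances.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)

    m, p = y[~np.isnan(y)][0], p0  # start at the first observed poll
    for t in range(n):
        # Predict: a random walk keeps the mean and adds process noise.
        m_pred[t], p_pred[t] = m, p + q
        if np.isnan(y[t]):
            m, p = m_pred[t], p_pred[t]  # no poll today, carry the prediction
        else:
            k = p_pred[t] / (p_pred[t] + r)  # Kalman gain
            m = m_pred[t] + k * (y[t] - m_pred[t])
            p = (1.0 - k) * p_pred[t]
        m_filt[t], p_filt[t] = m, p

    # RTS backward pass: revise each day using information from later days.
    m_s, p_s = m_filt.copy(), p_filt.copy()
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + g**2 * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s


# Simulated check, in the spirit of "I tested it on some simulated data".
rng = np.random.default_rng(0)
n_days = 100
truth = 25 + np.cumsum(rng.normal(0, 0.2, n_days))  # true support, in percent
polls = truth + rng.normal(0, 1.5, n_days)          # noisy daily polls
polls[rng.random(n_days) < 0.3] = np.nan            # some days have no poll

smoothed, smoothed_var = kalman_rts_smooth(polls, q=0.2**2, r=1.5**2)
```

In this model the ratio of q to r controls how much movement the smoother lets through, so a choice of noise levels that is too permissive could be one reason for swings that look too large.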

P.S. Here’s a fun cranky post from this student.

Why isn’t it possible to play a fun and serious game of poker not for money?

Dan Luu writes that, as a newcomer to poker, something puzzles him about how the game is played:

Poker players have collectively decided it’s not possible to play the game without trolling unless you play for “serious” money. The reasoning is something like, “obviously, people will make stupid plays like going all in every hand unless there’s real money on the line”. Outside of the implicit collective agreement to do so, this is patently absurd — people play all sorts of games where there’s no money on the line and they don’t, in general, purposely make troll moves, so there shouldn’t be an inherent reason poker can’t be played seriously when there isn’t serious money on the line, but since people have agreed to buy into this collective delusion, it seems fairly difficult to find a poker game where people actually want to play well without putting an amount of money up that’s meaningful to the people playing.

As a poker player myself, this rings true to me. OK, I’ve never been serious about the game–in grad school we had a weekly nickel-dime-quarter dealer’s choice game, mostly seven-card stud (this was before the popularity of table stakes Texas hold ’em, and “going all in” wasn’t a possibility in our game), and in the decades since then I’ve only played a few times, most recently over ten years ago. That last game included some political scientists and also some actual politicos who fit the stereotype (they were cynical and cursed a lot). It was pretty stressy, not a pleasant experience. I won a couple hundred bucks, probably more from luck than anything else, and one of the politicos was annoyed at me about that. I still think about the game, though. It’s a point of reference for me, as here, for example.

Anyway, yeah, in grad school we weren’t broke, but throwing $4 into the pot counted for something; it’s not a move we’d do just for laughs. Playing for pennies wouldn’t have been enough. And playing just to win, in the way that you might play a game of Scrabble, or chess, or ping-pong, or Uno . . . Nah, that just doesn’t work in poker.

The question is, why? Luu argues that this is just a convention, just one of the unwritten rules of the game, just as players avoid strategies using grid positions in Codenames. There’s an implicit agreement in poker not to play seriously unless the stakes compel it, and without this convention, people could play happily for low or even zero stakes, just as they do with chess or bridge. Luu:

There’s often some specific argument like “it’s more fun to play than to fold”, but most people would say this about declaring vs. defending in bridge, and yet you don’t see people randomly bidding 7NT (the maximum bid) in bridge all the time so their team is declaring and not defending, the way you see people randomly going all in in poker when money isn’t on the line (or only a very small amount of money is on the line).

I don’t know about that. I mean, yeah, I think Luu is right about people being willing to play serious bridge or Scrabble or whatever for zero stakes but not doing so with poker, but I don’t think it’s just a convention.

Some possible reasons

So let me throw out a few reasons why it’s essentially impossible to play a fun and serious game of poker not for money, even though people have no problem doing this for many other games:

1. There’s a historical relation between recreational game-playing and gambling. I’m not an expert here, but my impression is that if you went back a hundred years, when people played bridge, gin rummy, poker, cribbage, or pretty much any card game, it was usual to play for money. Not to mention dice games, which are only played for money. Nowadays I don’t think anyone plays gin rummy–it’s just too damn boring, and there are too many other competing leisure activities.

2. Low effort, high risk, high reward strategies (what Luu calls “trolling”) exist in poker more than in other games. What would be the equivalent in Scrabble, for example? Maybe trading in your letters more often in the hope of getting a seven-letter word? But that’s a lot of work, especially if you’re not a top player. (If you are a good player, then trading in can be a legitimate strategy, just as going all-in can be a serious play for a good poker player.) In chess, you can play more wildly, more offense and less defense, sacrificing pieces for a positional advantage—and players are more likely to do these fun plays in a home game with no stakes than in a tournament where rating points are on the line. There is some “trolling” in chess too–for example, goofy openings where you purposely block off your own pieces, just to get to an interesting position unlike anything your opponent is familiar with–but that’s not quite the same as going all-in; the poker equivalent would be more like a strategy of betting in a slightly irrational way to throw off the other players.

Or what about Uno? Uno’s a boring game but it has the pleasant feature that it requires no thought to play; it can be relaxing in the same way that it’s relaxing to watch a baseball game on a sunny afternoon. When you play Uno for no money, I guess you play with less focus than if you’re playing for money, but it’s pretty much the same game.

I guess my point is that, in any game, the lower the stakes, the more opportunity for silly play, but poker is one of the few games where trolling can be exciting. The closest analogy would be ping pong. Slamming it on every point is like going all-in in poker: it’s exciting, you’ll probably miss, but it’s very satisfying when you win.

3. Poker is a multi-player game. In ping-pong you can have a friendly game where both players are slamming every point, or a friendly game where both players are trying their hardest to win, or a friendly game where both players are just hitting it back and forth–any of these are possible. But in zero-stakes or low-stakes poker, it only takes one player to troll and it throws off the whole game.

4. Poker’s a skill game but not completely a skill game. Luu writes:

I would’ve thought that playing in the largest public cash games around would be the equivalent of joining a local open chess tournament, where anyone who started as an adult, let alone as a middle aged adult, will get demolished by IMs/FMs/NMs (I looked up one random local chess tournament, and there was an IM who placed 3rd). But you can play poker for two weeks and sit down at the biggest public games in town and do fine (there are, supposedly, some well-known private games that are a bit bigger than the largest casino games and I have no idea what the level of skill in those games is). Part of that may be down to variance, but part of that seems to be that the local level of play in poker isn’t all that high, at least in the largest public cash games around. . . .

I strongly suspect the best poker players are much better at poker than the best modern board game players. But, for some reason, you don’t see this difference expressed in local games in the same way that you would if you went down to the local chess club.

I just think the range of abilities, from beginner to intermediate to expert, is much wider in chess than in poker. I’ve played poker with some people who are clearly worse than me and some who are clearly better than me–but these differences are nothing like the difference between me and a really bad chess player, or the difference between me and a really good chess player.

5. The structure of the game. Poker’s much more interesting when you play it for money. An 8-hour poker session is commonplace, but people usually would not want to play a board game for 8 hours. And nobody would play 8 hours of poker if not for money (unless, say, you’re trying to get practice for a future money game)–it would just be too boring.

There’s a scene in Valis, I believe, where Dick is in a mental hospital and they’re playing games like Go Fish. There’s the opportunity to play poker, but not for money, and Dick says that poker is not a card game, it’s a money game. And he’s got a point. Money is central to poker in a way that it’s not in chess or Scrabble or even bridge. In poker, you’re not just playing for money; the game is built around betting. Money is involved at every stage of the game play.

6. In money poker, the goal is not to win; it’s to improve your bank balance. This makes a difference. For example, suppose it’s the end of the night, you’re down by a lot, and you’re in one last big hand. If your only goal was to end up a winner, you might be motivated to risk a big outlay even if it only gave you a small chance of winning that final pot. But it doesn’t work that way with money. Being down $100 is bad, but being down $300 is worse. It’s not like football where you might as well throw that Hail Mary pass because, if you don’t try, you’ll lose, and getting that pass intercepted won’t make things any worse.
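The arithmetic in point 6 can be sketched with made-up numbers. Suppose you are down $100 going into the last hand, calling costs another $200, and you have a 20% chance of winning a pot that would net you $300 on the hand. All of these values and the helper names are hypothetical, chosen only to match the “down $100 versus down $300” example.

```python
def expected_balance(outcomes):
    """Expected end-of-night balance from a list of (probability, balance) pairs."""
    return sum(p * balance for p, balance in outcomes)


def prob_ending_ahead(outcomes):
    """Probability of finishing the night as a net winner."""
    return sum(p for p, balance in outcomes if balance > 0)


current = -100      # down $100 before the last hand
cost_to_call = 200  # extra money put at risk by calling
net_if_win = 300    # net gain on the hand if the call wins
p_win = 0.2         # small chance of winning the pot

fold = [(1.0, current)]
call = [(p_win, current + net_if_win), (1 - p_win, current - cost_to_call)]

for name, outcomes in [("fold", fold), ("call", call)]:
    print(
        f"{name}: P(end the night ahead) = {prob_ending_ahead(outcomes):.2f}, "
        f"expected balance = {expected_balance(outcomes):+.0f}"
    )
```

With these numbers, calling is the only way to finish ahead, so a player who only cares about being the winner calls. A player who cares about money folds, because calling lowers the expected balance from -$100 to -$200.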

All said and done, though, I think Luu is on to something when he talks about the culture of the game. I could imagine a version of poker that’s played for points, just like Scrabble, and the goal is to be the winner at the end of the game. I guess the point is that such a game would be kind of boring, closer to gin rummy than to Scrabble.

He’s a music educator evaluating K–12 music education systems, and he wants someone to look at his measurements and statistics.

Ned Kellenberger writes:

I’m a music education researcher working on a book that constructs an international index of K–12 music education systems — indicators, weights, and rankings across 20 developed countries. (Measuring Music: An International Framework for Comparing K–12 Music Education Systems)

I understand your expertise in statistics, and how they intersect with the humanities.

My question is what’s the best way to have the measurement and statistics competently checked? Should I be looking for a particular kind of applied statistician or methodologist, or is there a vetting approach you would recommend for a project like this?

My recommendation is that he should talk with a statistician or methodologist at a school of education.

But if anyone reading this is interested in helping on this project, you can contact him directly: [email protected]

Blogging and writing style

I was invited to pay a visit this month to something called the Inkhaven Residency in California “to talk about the craft, advise, and give feedback on the writing.”

It happens that I was already going to be in the area so I agreed to stop by Wednesday morning.

As the organizer, Ben Pace, describes the program, “We’re bringing 40 writers to Lighthaven to write-and-publish an essay online every day.” So, it’s for people who want to blog. That sounds cool–I’m a big fan of blogging–beyond the evidence of the 12,000 posts and 200,000 comments here, you can see my various posts on the topic, including:

Update on that un-published CDC report on covid vaccines

The other day I criticized the Centers for Disease Control and Prevention for canceling the publication of a report on vaccine effectiveness. Apparently this move to unpublish was unusual; from a news report, “‘I’ve never seen a case where an article in the [Morbidity and Mortality Weekly Report] that got to that stage was not published,’ said Dr. Michael Iademarco, who led the center that included the publication’s operations from 2014 to 2022.”

But then today I was at a conference where Jay Bhattacharya, the acting director of the CDC, was asked about this unpublished report, and he said it was because it was using a really bad statistical method. It was only a brief exchange and there was no time for him to give a reference or to explain why he thinks the method is bad. I still haven’t myself seen a copy of the report so it’s hard for me to judge.

I think the best next step would be for the CDC to release the report in question, along with a critical response from a statistician explaining how the method is flawed. Bhattacharya said it was common knowledge that the method was terrible, so at the very least it would be a valuable educational opportunity to see this article that was on the verge of publication, and to understand its fatal problems. As a citizen, as well as in my role as statistician, I find it frustrating to hear about this dispute and not be able to see the controversial document and an explanation for why it’s not to be trusted.

It may be that the CDC is in the process of doing this. There could be a statistician right now writing that document explaining the problems with the almost-published paper.

Or maybe they sent it back to the original researchers to redo using a better analysis.

It’s kinda scary if the CDC was routinely using a terrible statistical method. Or maybe there’s more to the story. I just don’t know, which is why I’d like to see the study and also to see the criticism.

Three things I forgot to say today at the National Academy of Sciences Workshop on Enhancing Scientific Integrity

Uri Simonsohn and I spoke here:

This workshop will bring together researchers, journal editors, publishers, funders, and scientific association leaders to identify practical, forward-looking strategies for strengthening data integrity and transparency in the social and behavioral sciences. Participants will explore innovative tools and frameworks to detect and prevent errors, promote accountability, and reinforce public trust in research. Discussions will also consider how journals, institutions, and professional societies can adopt fair, sustainable practices that support scientific rigor while ensuring accessibility for researchers across many contexts and settings.

I brought up some relevant points, including:

1. The science-reform movement as an awkward alliance between reformers (who anticipated that failed replications would cause people to move away from some bad published ideas) and status-quo people (who anticipated that successful replications would validate various now-controversial studies from the past).

2. It’s not always clear at first that a paper is bad, but then in retrospect its problems jump out at you. The analogy I gave is that Arthur Conan Doyle was fooled by photos of garden fairies that, years later, were obvious fakes. This is one rationale for post-publication review.

3. We should consider variation when hypothesizing effect sizes, and this connects to the point that researchers and the public should be more accepting of uncertainty. This is really the most important point. Later in the conference the health economist Jay Bhattacharya discussed the problem that people don’t know whether published research is true. It’s a good point, and it leads to the next step, to move beyond the expectation that research results should produce certainty. Even the cleanest and best study is only telling you about some set of people under some conditions at some particular time. Future effects on new people in new settings will differ.

4. The Bayesian cringe.

5. The role of the field of statistics in improving science (multilevel modeling!), and at times making science worse (null hypothesis significance testing!). As I said to Uri, I do think we’ve developed a better statistical understanding in the past fifteen years, and this has allowed us to understand and address replication concerns in ways that were not done in previous decades.

But there were some other things I meant to talk about but I never got around to saying:

1. My frustration that many of the people who promote bad research don’t seem to even care about the work that they’re promoting. For example, consider the physicist who pushed the ridiculous claim that scientific citations are worth $100,000 each, or the biologist who pushed the ridiculous claim that chess players burn 6000 calories per day. If they really cared about these things, they could try to study them! For example, they could try to trace where this $100,000 is going, or study the variation in the value of a paper. Or they could try to understand physiologically where those 6000 calories are going. But nooooo . . . they just want these B.S. factoids. Or the people who studied ovulation and voting, but got the dates of ovulation wrong. So often it seems that the critics care more about the topic than the people who are out there pushing these claims.

2. The role of the National Academy of Sciences. The Academy is sponsoring this workshop, and the workshop has gone well, but I also wanted to point out that the National Academy of Sciences is part of the problem too! Their journal has published some notably bad articles (air rage, himmicanes, ages ending in 9, etc etc). I have no reason to believe that PNAS is worse than other journals, but it does get some attention.

3. The problem of junk science as a betrayal of trust. Later at the conference, the political scientist Skip Lupia made the point that research is expensive and it’s the responsibility of the academic community to justify this to the taxpayer, especially in a modern information-rich environment where many people might feel that they don’t need academic institutions at all because they can find everything online. And I agree with this. But, even beyond waste of resources, it seems to me that when credentialed scientists promote junk, this degrades the reputational coin of the realm. As it should be. Every time a researcher at Harvard or Stanford or the University of California or wherever is promoting ridiculous work, every time an academic podcaster plays the promotion game, etc., this does its part to discredit the scientific enterprise. And this makes me mad.

I guess I can see why that last point never came up in the discussion, because I expect that everyone in this meeting is, like me, incensed by that sort of scientific careerism. So I didn’t need to say that.

I do like the idea of replication being a norm.

For example, imagine a world in which, when this psychology professor tells his Stanford class that chess players burn 6000 calories per day, some student would raise their hand and ask, “Where did that number come from?” And then, if the professor were to supply some reference or rationale, another student could ask, “Is there any outside confirmation of that claim?”

I don’t know how the professor would answer such a question. Maybe he’d give another reference, maybe he’d say he doesn’t know, maybe he’d just ignore the question and move on . . . there are many possible responses. The real point is to set up the expectation that there be a response. The goal is to move beyond the pattern of strong claims supported by vague references. When you look carefully, you’ll often find that the evidence claimed in support isn’t there.

Beyond the direct value of the replications themselves, there are benefits from thinking about replication, in part because it moves you to think about evidence and to think about how the conditions of an experiment can vary. If, instead of thinking of that $100,000 per citation or those 6000 calories as cool numbers, you think seriously about their variation and you think seriously about replication, the claims themselves will crumble. And then, pushing it back one step, maybe you’d think twice about promoting those sorts of stupid claims in the first place.

So, I guess the thing I’d like to have added is a clearer discussion of the links between the procedures of science and science reform (publication, replication, etc.) and the particular claims being made.

Trump 1 vs. Trump 2: The role of the two other branches of government (legislative and judicial)

There’s a lot of discussion about how the second Trump administration is much more out of control than the first, and I’ve seen lots of reasons offered for this. In no particular order:

– This time there are no “grownups” in cabinet positions to talk the president out of bad ideas. Instead, the government is run by some combination of ideologues and airheads–sometimes both at once!–who actively come up with bad ideas themselves.

– Trump is getting older and more incoherent and delusional, which diminishes his common sense, increases his susceptibility to whatever stupid or criminal idea is proposed to him, and reduces his ability to resist bad ideas or to weigh options.

– Lack of serious consequences for the 6 Jan 2021 insurrection emboldens extremists within the government to break more laws.

– The news media environment is more fragmented, so it’s easier for the government to dodge bad coverage.

– At this point Trump is so unpopular that his party is likely to lose lots of seats in Congress no matter what he does, so they’ve moved to full bust-out mode, just pushing all the buttons they can before they lose access to the public treasury.

I’m framing this in a negative way, but if you’re a Trump fan, the same arguments apply: less of a “deep state” to stop the government from shooting up boats that might harbor terrorists, arresting people without the usual red tape, shooting protesters, starting wars, pardoning patriots who just happened to have committed crimes, etc., the hell with the legacy media, doing whatever it takes to swing the pendulum back to the center after the far left excursions of the Obama-Biden years.

In any case, here’s an important explanation that I don’t think is being mentioned often enough, and that is that the right wing of the Republican party controls all three branches of government.

By “all three branches of government,” I don’t mean the presidency, the House of Representatives, and the Senate. I mean the legislative, executive, and judicial. The last time one wing of one party controlled all three branches of government was the left wing of the Democratic party in the mid-1960s, and they did a lot, indeed they did some things that took decades to roll back.

The Republicans did control all three branches of government in 2017-2018 after Trump’s first election, but Congress and the courts were not dominated by the right wing as they are now. Moderates had some influence. I’m not saying that Congress and the courts always say yes to the president right now–nor, for that matter, did liberal Democrats in the 1960s always get what they wanted–but, despite the occasional conflict, there’s an ideological and partisan uniformity.

In short, I think that if the executive branch had tried to do what it’s doing now, back in 2017 and 2018, it wouldn’t have gone through. The Senate wouldn’t have confirmed Pete Hegseth. The courts would’ve been less forgiving. The president’s office would’ve moved more slowly because of the recognition that they’d have to get congressional approval for wars, tariffs, domestic surveillance, etc.

None of what I’m saying here is new. Everybody knows about the three branches of government. But I think it’s easy to focus on the colorful personalities and crazy doings in the executive branch and forget about the constraints–or, in this case, the relative lack of constraints–that they face.

P.S. Lots of discussion in comments about the political positions of the Democratic and Republican parties–which, don’t forget, can differ a lot from the positions of Democratic and Republican voters, not to mention independents. That’s all fine; these are important topics to discuss.

I just want to emphasize that the point of the above post is not about whether particular policies are good or bad, or even where they stand on the left-right spectrum. Rather, my point is that the unleashed nature of the second Trump administration can, to a large extent, be explained by the lack of constraint resulting from a unified control of all three branches of government. It’s easy to get lost in the details and then to forget this simple point.

Probability theory corner: My favorite birthday-problem story

Since we’re on the topic of the birthday problem, I wanted to share a story from my review of Dan Davies’s book, Lying for Money:

On p.124, Davies shares an amusing story of the unraveling of a scam involving counterfeit Portuguese banknotes: “While confirming them to be genuine, the inspector happened to find two notes with the same serial numbers—a genuine one had been stacked next to its twin. Once he knew what to look for, it was not too difficult to find more pairs. . . .”

The birthday problem in the wild!

P.S. I sent this story to John Cook, saying that I think this was the first time I’d ever seen this particular problem come up in real life. John replied that birthday problems come up all the time in cryptography, e.g. hash collisions, and he pointed to this post from 2017:

Ideally, a secure hash is “indistinguishable from a random mapping.” So if a hash function has a range of size N, how many items can we send through the hash function before we can expect two items to have same hash value? By the pigeon hole principle, we know that if we hash N + 1 items, two of them are certain to have the same hash value. But it’s likely that a much smaller number of inputs will lead to a collision, two items with the same hash value.

The famous birthday problem illustrates this. . . . Variations on the birthday problem come up frequently. For example, in seeding random number generators. And importantly for this post, the birthday problem is the basis for birthday attacks against secure hash functions. . . .

I had no idea! John also points to Pollard’s rho algorithm as another real-world application of the birthday problem.
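
To make the arithmetic in John's quote concrete, here is a minimal sketch in Python. It is only an illustration of the standard birthday calculation, not code from John's post. It assumes the idealized setting he describes, with each item mapped uniformly and independently to one of N values. The function names are my own, made up for this example.

```python
import math


def collision_probability(n_items: int, n_values: int) -> float:
    """Exact probability that at least two of n_items share a value,
    assuming each is drawn uniformly and independently from n_values."""
    if n_items > n_values:
        return 1.0  # pigeonhole principle: a collision is certain
    p_all_distinct = 1.0
    for i in range(n_items):
        # The (i+1)-th item must avoid the i values already taken.
        p_all_distinct *= (n_values - i) / n_values
    return 1.0 - p_all_distinct


def approx_collision_probability(n_items: int, n_values: int) -> float:
    """Usual approximation 1 - exp(-n(n-1)/(2N)), good when n is much smaller than N."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * n_values))


def items_for_even_odds(n_values: int) -> float:
    """Approximate number of items at which a collision becomes a coin flip:
    sqrt(2 N ln 2), that is, on the order of sqrt(N) rather than N."""
    return math.sqrt(2 * n_values * math.log(2))


if __name__ == "__main__":
    # Classic birthday problem: 23 people, 365 possible birthdays.
    print(collision_probability(23, 365))
    print(approx_collision_probability(23, 365))
    # An idealized 64-bit hash: even odds after roughly sqrt(N) items.
    print(items_for_even_odds(2**64))
```

The point is the square root: for a range of size N you need N + 1 items to guarantee a collision, but only something on the order of the square root of N items before a collision is about as likely as not. Real hash functions and real serial numbers are not perfectly uniform, so treat this as a rough guide and not as a statement about any particular system.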

If that CDC report had just included some fake citations and some crazy dietary advice, the boss would surely have approved it for publication.

From a news article, “C.D.C. Cancels Publication of Study Showing Benefits of Covid Vaccines”:

The acting head of the Centers for Disease Control and Prevention has canceled the publication of a study that found that the Covid vaccine sharply cut the odds of hospitalizations and emergency visits last winter, a Health Department spokesman said. . . .

The study, conducted by C.D.C. scientists, calculated the effectiveness of Covid shots by looking at the vaccination status of people who had sought care at hospitals and emergency rooms. It found that vaccination cut the likelihood of emergency visits due to Covid by 50 percent and of hospitalizations by 55 percent, according to a summary of the study viewed by The New York Times.

It was scheduled to be published on March 19 in The Morbidity and Mortality Weekly Report, the C.D.C.’s flagship journal. News of its cancellation was reported earlier by The Washington Post.

Some former C.D.C. officials said it was unusual for the head of the agency to cancel a scientific publication that had already been cleared by the agency’s staff scientists and had been scheduled for publication.

So what happened?

Andrew Nixon, a spokesman for the Department of Health and Human Services . . . said that assessment “identified concerns regarding the methodological approach to estimating vaccine effectiveness, and the manuscript was not accepted for publication.”

But:

“I’ve never seen a case where an article in the M.M.W.R. that got to that stage was not published,” said Dr. Michael Iademarco, who led the center that included the publication’s operations from 2014 to 2022.

And:

The approach employed in this research has been used for years by scientists at the C.D.C. and elsewhere to gauge the real-world performance of flu and Covid vaccines, said Dr. Fiona Havers, a vaccine expert who resigned from the agency in June.

No link to the report itself. Maybe the authors should anonymously email it to [email protected] and then it can appear in the next file dump.

It must be horrible to be working for CDC right now. They were literally shot at by an anti-vax terrorist, and now the in-house anti-vaxxers are suppressing their reports. Meanwhile the government is releasing health-related reports with fake citations and is releasing dietary guidelines which are so bad that even a supporter of these guidelines can do no better than describing them as “not crazy.”

So, that’s the way it’s going. The report with fake citations is released. The “not crazy” (actually, crazy) advice is promoted. The CDC report is suppressed. I guess it doesn’t meet the government’s high standards. Maybe if they’d thrown in some fake citations and some nutty health advice, it would’ve been approved for publication. That’s how you get “gold standard science,” right?

P.S. More here. I hope that future updates are coming.