A tool for learning about Fourier transforms

Eric Novik came by my talk the other day and we were chatting about a number of things, including how much we forget as the years go by. I remarked that I used to be very comfortable with Fourier analysis and was able to use it as a research tool—see section 2.2 of my Ph.D. thesis, and it also came up in my research leading to R-hat (although it didn’t make it into the writeup)—but at this point I only understand Fourier analysis on a conceptual level. It’s not one of these things that stuck with me.

In response, Eric pointed to this app that he created (with chatbot assistance) to help him understand some things about Fourier series. Maybe it will be useful to some of you too. The source code is here.

Gray Davis, Grover Norquist, and a rabbi walk into Peter Thiel’s Dialog conference . . . and get no press coverage!

You know that Oscar Wilde saying, “There is only one thing in the world worse than being talked about, and that is not being talked about”?

This came to mind with respect to three once-famous people: Gray Davis, Grover Norquist, and a rabbi.

Act 1 (2021-2022): I receive emails from some sort of, ummm, I don’t want to call it a “scam” exactly . . . let’s call it a “networking event,” featuring luminaries such as “Gray Davis – Of Counsel, Loeb & Loeb. Fmr. Governor, California. [Los Angeles],” “Grover Norquist – President, Americans for Tax Reform. [Washington, D.C.],” and “David Wolpe – Rabbi, Sinai Temple. [Los Angeles].”

It seemed to be a great opportunity–just look at the email:

Hello Andrew,

We’ve heard a lot of great things about you, which is why you’ve been selected for membership. Dialog members–ranging from scientists to elected politicians, CEOs, artists, economists, media figures, and political dissidents–regularly convene to intellectually challenge each other in off-the-record conversations exploring pressing issues. We think you’d add an exciting perspective!

On the other hand, they were charging $16,846, which, as you may have heard, would cover the cost of a lot of Jamaican beef patties.

If they’d really heard a lot of great things about me, and they thought I’d add an exciting perspective, you wouldn’t think they’d charge me for the privilege, right?

I asked the organizers, who replied:

I absolutely get it; the majority of those who are invited to Dialog typically only attend conversations or gatherings as the keynote speaker, and if money is involved, it’s typically because they’re being paid to attend.

To keep Dialog fully independent and off the record, it is 100% participant funded–everyone who attends pays to do so.

Wow! So Gray Davis, Grover Norquist, and the rabbi were paying thousands of dollars to mingle with each other? It kinda makes you wonder. One of the other listed members was “Turki Al Faisal Al Saud, Former Minister of Intelligence, Saudi Arabia”–no, I’m not kidding there! I wonder if they let him take the bone saw on the plane? I bet he had a great conversation with “Zeke Emanuel – Vice Provost for Global Initiatives, Professor & Chair, Department of Medical Ethics and Health Policy, University of Pennsylvania.” And what about “Lawrence Summers – President Emeritus & Professor, Harvard University. Fmr. Secretary of the Treasury, United States”: did he really pay? It’s hard for me to imagine Larry paying for anything out of his own pocket. Maybe he got some friendly Harvard donor to fork over the money?

As I discussed in the above-linked blog post, I could see reasons why Gray Davis or Grover Norquist might want to talk with me, and I could see reasons why I might want to talk with Gray Davis or Grover Norquist, but I can’t figure out why each of us needs to spend $16,846 to do it. We could just talk on the phone for free!

Act 2 (2026): This arrives in the inbox:

I am reaching out on behalf of the WIRED team. We are working on a story about Dialog, the private, invite-only organization co-founded by Peter Thiel.

WIRED has obtained internal Dialog records, exposed by its website, including a membership directory and the registration list for the group’s 2026 retreat. Your name appears in them.

We wanted to give you the opportunity to comment before we publish. We’d welcome any response, including whether you’d confirm your affiliation with Dialog and anything you’d like to say about the group or your involvement.

Our deadline is 1pm EST, but if you’d need more time to prepare a response, please let me know as soon as possible.

As many of you know, I never check my email before 4pm. It’s actually daylight time here in New York, not standard time, but either way it’s before 4.

In any case, another email arrived soon after:

My sincerest apologies for this mixup! Please ignore our previous email. Your name was mixed up with a list of Dialog attendees we are trying to reach. We’re actually reaching out because we saw your 2022 blog post about being invited to the event, wanted to mention it in our story, and thought you should have the opportunity to comment on it, if you wanted to.

That evening I saw the message and replied:

Hi, sure, feel free to quote me. I stand by what I wrote before. I’ve never actually attended the Dialog event, as I have better uses for my $16,000.

The news article appeared soon after, under the title, “Leak Exposes Members of Peter Thiel’s Secretive ‘Dialog’ Society,” with subtitle, “More than 200 of the world’s elites registered for a retreat whose agenda runs from panels on cult-building and sex to prepping for World War III. An associated app offers matchmaking.”

Wow–I had no idea! I have to say, the idea of seeing Gray Davis, Grover Norquist, and a rabbi talking about cult-building and sex . . . ok, still not worth $16,846, but maybe there’s some entertainment value there.

Act 3 (2026): The story was picked up by other news organizations. I know this because a few of them contacted me directly and asked if I had anything more. I forwarded them three of the emails I’d received back in 2021 and 2022. There was also a story in the Hollywood Reporter (Palko pointed me to it) mentioning anti-free-press warrior Peter Thiel and a bunch of movie stars and executives, the most notable of whom was Benj Pasek, one of the composers of the music for La La Land.

It’s hard for me to picture Benj Pasek forking over $16,846 for the opportunity to mingle with Gray Davis, Grover Norquist, a rabbi, and the head of Saudi intelligence. But maybe his agent paid for it? I dunno.

Act 4 (2026): Here’s what I’m wondering. What do Gray Davis, Grover Norquist, and the rabbi think about all this? Each of them is a bigshot in his own field (failed politician, political lobbyist, religious leader), but none of them is important enough to be mentioned in any of these news articles.

How humiliating!

There was a time when the name Gray Davis meant something, a time when Grover Norquist had armies at his command, a time when a rabbi could call down thunderbolts. And now they’re just anonymous names in a list. What a comedown. Here’s my advice to these three guys: Fire your publicist.

I hope at least that they enjoyed the conferences. $16,846 is real money!

The only thing I don’t get is why the news organizations are making such a big deal about all of this. It’s an annual conference where rich guys spend thousands of dollars to be in each others’ company. No joke, it doesn’t sound much different from a country club.

And there’s this whole bit about the membership list being a secret. I don’t get why this is supposed to be a big thing either. Country clubs keep their membership lists secret too–it’s part of the whole exclusivity cachet. They’re not the public library, y’know!

Summary

Gray Davis, Grover Norquist, and a rabbi got the worst of all worlds. They had to go to a boring conference, they paid $16,846, they got no press coverage out of the deal, and they didn’t get any Jamaican beef patties.

I have no idea what food they serve at the Dialog conferences. I’m guessing it’s standard crappy upscale catering food, nothing nearly as good as you could get for $2.85 at Golden Krust here on 125 St.

LLM-generated Stan case study on Galileo’s inclined plane experiment

This post is from Bob.

I’ve been planning for at least a couple years to generate a case study around Galileo’s use of an inclined plane instrumented with water clocks to estimate the terrestrial gravitational constant. Here are some photographs of a replica in the Museo Galileo (click to blow them up). And here’s a video simulation of the experiment. We replace his clever pendulum apparatus explained in the video and the web page with simple Bayesian statistics so we can actually estimate the gravitational constant.
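To make the inverse problem concrete, here is a minimal sketch in plain Python. It is not the Stan model from the case study: it assumes an idealized body sliding without friction, so that distance d = (g sin θ / 2) t², and it swaps Bayesian inference for least squares through the origin. The ramp angle, distances, timing noise, and function names are all made up for illustration. For a rolling ball the acceleration would be scaled down by a constant factor, which this sketch ignores.

```python
import math
import random


def simulate_times(g, angle_deg, distances, sigma, rng):
    """Descent times t = sqrt(2 d / (g sin(theta))) plus Gaussian timing noise."""
    accel = g * math.sin(math.radians(angle_deg))
    return [math.sqrt(2 * d / accel) + rng.gauss(0, sigma) for d in distances]


def estimate_g(angle_deg, distances, times):
    """Least squares through the origin for d = g * x, with x = sin(theta) * t^2 / 2."""
    s = math.sin(math.radians(angle_deg))
    x = [0.5 * s * t ** 2 for t in times]
    return sum(xi * di for xi, di in zip(x, distances)) / sum(xi * xi for xi in x)


rng = random.Random(42)
angle_deg = 5.0                                    # hypothetical ramp angle
distances = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0] * 5     # metres, five runs per mark
times = simulate_times(9.81, angle_deg, distances, sigma=0.02, rng=rng)
g_hat = estimate_g(angle_deg, distances, times)
print(f"estimated g: {g_hat:.2f} m/s^2")
```

The Bayesian version replaces the point estimate with a posterior for g (and for the timing noise), but the forward model connecting g to the measured times is the same idea.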

The case study

Here is a draft.

Bob Carpenter. 2026. Estimating g from Galileo’s Water Clock: A scientific Bayesian inverse problem with Stan and CmdStanPy. GitHub.

I list myself as the author here because I’m responsible and AIs can’t own copyright in the U.S., but 100% of the text and code was written by Claude Opus 4.8 (medium or high effort, but I can’t recall which). I used the desktop app, which doesn’t allow sharing, but you can try it yourself.

The prompt

Here’s the sloppy prompt I used, which I just typed in without much thought in a couple minutes to get a feel for what it could do on its own.

I would like to generate a case study written in Quarto and using CmdStanPy to demonstrate solving scientific Bayesian inverse problems. I want to use a simulation of Galileo’s water clock experiment, which can be used to estimate the gravitational constant. I would like you to start by generating the mathematical model description in LaTeX, the model code in Stan to solve the inverse problem, and a simulation driver in Python using CmdStanPy and plotnine for plotting. Please just `import plotnine as pn` and use `pn.geom…`, etc. All I need in the output now is a call to `.summary()` on the fit returned by `.sample()`. Wrap this all up in a quarto document for me from which I can generate HTML by calling `quarto render galileo.qmd`.

It was done before I got back to my desk with a cup of coffee (well under five minutes). So not quite the several hours Andrew said it took him to write his case study on the New York Knicks basketball team, which he posted earlier today. Of course, this was much simpler and I didn’t have to think through any details before generating it.

Is it right?

What Claude produced looks really good to me. If a student had done this, I’d have given them an A. I can’t object to the way it described Galileo’s experiment, wrote the math, wrote the Stan code, wrote the Python simulation, or plotted the raw data as Andrew is always urging us to do.*

The source

You can find the source .qmd file on my GitHub:

https://github.com/bob-carpenter/case-studies/tree/master/galileo-gravity

It’s short, so I would have just included it, but the blog software blocked my post after considering it an attack on the site. To get it to render with resources embedded, I had to ask Claude a follow-up question and manually insert a single line of config into the .yaml header for the markdown document.

Putting this blog post together took longer than writing the prompt and checking the results.


*   Maybe Claude runs a little simulation of Andrew like I do. Andrew himself claims to run a simulation of Jennifer Hill: it’s the basis of his handy statistical lexicon entry for “WWJD,” which he told me stands for “What would Jennifer do?” Unfortunately, neither the lexicon entry nor its underlying link explains the acronym.

Gambling provides a gentle rocking of the emotions to put you in a pleasant baby-like state

A commenter recommended the book, Addiction by Design: Machine Gambling in Las Vegas, by the anthropologist Natasha Dow Schüll, and I checked it out of the library. It’s a study of people who play slot machines and video poker, focusing on the locals: Vegas residents who have some low-level gambling addictions as part of their lives.

Nowadays, I guess that much of this business has been supplanted by machine gambling that you can do on your phone in the comfort of your own home. But the market for gambling must be far from being tapped: I imagine that there are many millions of potential gambling addicts out there, available to be hooked by some form of gambling or another.

As a statistician, I have mixed feelings about gambling. Ever since I was a kid, I’ve thought that probability is cool, and I like to bet. When we were kids we had a toy roulette set that we would play (just betting chips, not real money) and I’ve enjoyed poker and informal sports betting. The last time I bet on anything was about 20 years ago, but that’s just more me getting older than anything else.

At the same time, there are all these addicts, and all the people who might not be addicts but who still degrade their standard of living, not to mention reward evil people (even if they’re pleasant as individuals, they’re in an evil business; sorry, Nate!). And it just keeps getting worse.

To a statistician, this is all an endlessly fascinating topic: the odds and all that, but also whatever it is in people’s brains that motivates them to spend thousands of dollars on lottery tickets, etc.

As Schüll writes in her book, the popularity of machine gambling (which she says is the source of the majority of casino gambling profits in Vegas) is particularly puzzling in that people are just pulling the lever over and over again, without the sense of human context or any feeling of agency.

There’s also the interaction between the players and the people who make money from the machines:

For extreme machine gamblers, the experience of play is an end in itself–an “autotelic” zone beyond value as such, in that “no other reward than continuing the experience is required to keep it going.” Conversely, for the gambling industry the zone is a means to an end; although it carries no value in and of itself, it is possible to derive value from it. . . . In effect, gamblers’ drive to remain indefinitely suspended in the zone is rerouted, via the technological detours of the gambling industry, toward a destination of complete depletion.

It’s not just “the technological detours of the gambling industry,” it’s also politics: the industry doing what it takes to keep all this going, a gradual effort over many decades that continues to this day.

Later, Schüll summarizes:

Gambling addicts play machines to suspend themselves in a state of equilibriated affect.

This seems pretty accurate.

I would just add two things.

First, this equilibrium is not flat. It’s periods of stress, punctuated with the occasional excitement of winning and the frequent relaxing calm of losing. The best analogy I can think of is the way that a baby is calmed, not by lying completely still, but by being rocked in a somewhat irregular fashion.

Second, stakes matter. That “state of equilibriated affect” can only be achieved when real money is involved. I guess this is related to the phenomenon of habituation in drug exposure. Schüll talks with someone who started on a zero-stakes poker video game but then moved to the machines that take real dollars. We discussed this general idea recently in our post, Why isn’t it possible to play a fun and serious game of poker not for money?

It’s a good thing that babies don’t work that way–you can rock them a reasonable amount and they’ll be happy. No need to keep upping the stakes until the crib does a loop and the baby flies out the window. Although I guess that might happen if there were money in it.

Elmore Leonard.

With Leonard’s reputation as a Western author growing, [Detroit-based advertising agency] Campbell-Ewald saw fit to match Leonard with their truck division, writing copy geared toward the same rough-and-tumble demographic that, essentially, would read like a Western paperback. “Truck ads I had an easier time with,” he later admitted. “You could be straightforward with a truck . . . I’ve never been any good at similes and metaphors.” Much like his father before him, Leonard was soon sent traveling around the country for company “field work,” gathering customer testimonials from satisfied truckers. “I would call on the Chevrolet dealer, who would then introduce me to a truck owner who had some fantastic story to tell about his trucks,” he would later claim, prompting the owner to “say something colloquial,” in the hopes of shaking loose some down-home phrases to tinker with. However, Leonard’s favorite–“You don’t wear that sonofabitch out, you just get tired of looking at it and buy a new one”–proved too gritty for Chevy.
— from Cooler than Cool: The Life and Work of Elmore Leonard, by C. M. Kushins.

As the above quote illustrates, this is an interesting, well-researched, and well-written biography, much better than the biography of John D. Macdonald that we discussed a few years ago. Kushins begins with a brisk and effective overview of Leonard’s childhood and then moves quickly into the career, bouncing between the style and themes of Leonard’s stories and books; the details of writing schedule, agents, and contracts; and enough of his activities outside of the writing to give a sense of how his life fit together. Thanks to Leonard’s long and stable career, Kushins is also able to spread the details uniformly through the decades, unlike for example any biography of J. D. Salinger, which won’t have much to say for the final decades of that writer’s life.

The main weakness of Kushins’s book for me is that it doesn’t talk so much about the novels themselves. There’s a lot on how they were written and on their general themes (good guys and bad guys, the roles of the women characters, religious themes, some other things) and on their style (notably, Leonard’s move from Westerns to crime capers and his ear for dialogue), and on movie adaptations and helpful literary agents and how he did his research and where many of the character names came from and all sorts of fascinating things–overall I enjoyed the biography and I recommend it–, but I would’ve liked to see more actual literary criticism, some detailed discussions of the novels and what made them work, as well as, sometimes, what didn’t.

I first learned of Elmore Leonard around 1981, it must have been from a book review in the Washington Post. Phil and I read a bunch of his books with pleasure–my favorite is Swag, which I actually read a few years later–and I also learned about George V. Higgins, an author to whom Leonard was often compared. Over the years, I’ve read almost everything by Higgins that I could find.

Elmore Leonard vs. George V. Higgins . . . what to say? Leonard had a long and successful career, whereas Higgins started at the top and worked his way down. And on a sentence-by-sentence level, Leonard was a better writer: Higgins had a lot of clunky sentences and was notorious for not rewriting. But I think that, of the two, Higgins was more of an artist. There’s something special about Higgins that makes me really love his writing, despite the flaws. Leonard was great too–I think Swag is a close-to-perfect crime novel–but, I don’t know, I don’t have the same feeling of being transported. I want to say that Leonard is Wings and Higgins is the Velvet Underground . . . no, that’s not quite right . . .

What else? Both Leonard and Higgins wrote about loquacious lowlifes. Leonard wrote with more affection, Higgins with more cynicism, but both had a habit of playing favorites with their characters, liking some and finding others irritating. Which can lead to some absolutely wonderful things, such as the pitch-perfect final line of Swag.

The other thing is . . . oddly enough, neither Leonard nor Higgins had great plots, or great characters. A crime novel needs a plot, and both authors’ plots were serviceable, often excellent in the details (for example, the robberies in Swag and The Friends of Eddie Coyle), but for both authors the plots weren’t much more than vehicles to allow for stunning set-pieces of dialogue and the development of themes of friendship, betrayal, etc. As to the characters: it might seem odd to describe these authors’ characters as empty, given that they were portrayed by great actors in memorable films, but . . . ok, let me put this more carefully . . . I wouldn’t say the main characters in their books are one-dimensional, but rather that they are blank. Not completely blank, of course–they have characteristics–and they have a lot more personality than the killers in Agatha Christie books–but not what I’d call memorable characters.

If it’s not the plots, and it’s not the characters, then what are we reading Leonard and Higgins for? The juicy dialogue, sure, but also the situations. Swag doesn’t have an elaborate plot, but the setup of these two criminals with rules for robbery, that’s great. Similarly with The Switch: it’s a great setup. Or The Digger’s Game, with all the events spooling out with a sense of inevitability. I guess you could label all of this as “plot,” but these are not cool plots in the manner of The ABC Murders.

OK, so here you have it: Leonard and Higgins place real (if sometimes blurrily-defined) people into compelling situations, and they make it all run on sharp, hilarious, compelling dialogue. This is actually very cinematic! It’s the setup more than the plot, but the setup only works because you’re throwing (some version of) real people into it.

And why do I find Higgins more compelling than Leonard? Because with Higgins the stakes are higher. Not just that it’s life and death–lots of bodies hit the floor with Leonard too–but, even in the presence of humor, Higgins’s stories are ultimately more serious.

OK, back to the Kushins book, which, again, I like a lot. I’m just bummed that he doesn’t engage with Leonard’s writing, even to the extent that I do above, or to the extent that those book reviewers did, 45 years ago. I’m not saying that Kushins has to say that Leonard isn’t as good as Higgins–he’s a Leonard fan, and I’d expect him to make the case for the author–; I’d just like some discussion of the novels themselves along with all the fascinating details of how they were constructed.

I often enjoy literary biographies and I appreciate that new ones keep being published, given that they must not sell a lot of copies! That said, it seems likely that Elmore Leonard will outlast some other great once-bestselling authors. Younger readers still appreciate his books, at least for now. Higgins, though, unfortunately I think he has no chance. He’s an innovator and I don’t think he’ll ever be completely forgotten, but I think his books are a little too hard to read to ever sustain a revival.

Maybe it helps to write in genre. Readers of crime and science fiction seem loyal to past bestsellers in a way that we don’t always see with mainstream literature.

P.S. I also recommend this review by J. Robert Lennon of a few of Leonard’s books. I don’t agree with everything Lennon says, but that’s fine; it’s what I was looking for, which is a serious engagement with what makes Leonard’s books work. My main disagreement with that review is that Lennon says that Leonard’s strength is creating memorable characters, whereas I think that, as with many crime and suspense novelists, what Leonard does best is to create memorable situations and then work out their logical implications.

R wins statistics award.

Elena Belogolovsky writes:

Congratulations to the R Core Team on receiving the 2026 Rousseeuw Prize for Statistics.

R has made creative, open-ended statistical analysis and graphics accessible to generations of statisticians and applied researchers. It has also been central to statistical research, methodology, and applications during decades when statistics became more computational and more important across science, engineering, business, and public health.

One of the great strengths of R is that it is not just a software platform. It is also a community. The system of R packages allows anyone to implement a new method and share it with the world, helping make statistical research more open, useful, and alive. R has also been the medium for major developments in statistical graphics, transforming applied statistics and the way people work with data.

The volunteers who have developed, guided, and maintained R and the R community are richly deserving of this major award.

I agree with the committee that the R team is an excellent recipient of this award. I say this for several reasons:

– Most obviously, R is super-useful and it’s changed statistics, both by enabling more complicated and reliable analysis and by establishing a common language for statistical coding.

– R integrates statistical modeling with graphics, which traditionally (but, in my opinion, mistakenly) have been thought of as in opposition.

– R is open source. This might sound like no big deal, but its predecessor was Splus, which was a commercial package. Before that came S, which was open but was not set up to expand in a scalable way.

– With its system of packages, R became modular: different groups of users (including me!) could write their own packages and develop new and useful tools without needing to get tangled in core R issues. For example, we have cmdstanr, which lets you run Stan programs from R. This is super-useful for Bayesian workflow.

– R is a programming language, not a menu-based set of commands. This is no big deal now, given that the natural comparison to R is Python, but, back in the day, when R’s competitors were Sas, Spss, Stata, etc., it was a big deal that with R you write programs, you don’t just push buttons. A big deal for workflow in statistics and data science.

– Regarding the R community . . . ok, this gets complicated. Still and all, the R core team is very helpful to outsiders and has been a clear net benefit to the communities of developers, statisticians, and users.

I’m sure I’m missing a few things. My only disagreement with the award citation is that it doesn’t mention S, the statistical software environment developed by John Chambers and others at Bell Labs back in the 1980s. R is a rewrite of S. With lots of improvements, but I do think the S team deserves credit for setting up the template.

Call for invited session proposals for the upcoming BayesComp conference

Lu Zhang writes:

As a member of the BayesComp 2027 conference committee, I would like to share the announcement of the call for invited session proposals for the upcoming BayesComp conference, which will be held in College Station, Texas, on May 18–20, 2027.

The scientific committee is currently soliciting proposals for invited sessions. Each invited session will consist of three speakers, and proposals should focus on timely, important, and broadly engaging topics in Bayesian computation and related areas.

The submission deadline (as of now) for invited session proposals is August 15, 2026.

Proposal form: https://forms.gle/wpYvkkjKGZ5vHqhF6

Additional details are available in the official announcement:

The LOC for BayesComp 2027 is pleased to announce that the next edition of BayesComp will take place in College Station, TX during May 18–20, 2027. The scientific committee is now opening calls for invited sessions. Each invited session will consist of 3 speakers. Proposals should highlight timely, important, and broadly engaging topics in Bayesian computation and related areas. Each speaker may be listed as a speaker in only one invited or contributed session proposal.

Lu is the first author on the Pathfinder paper and continues to do interesting work on Bayesian statistics and computing. Based on what I’ve heard about past BayesComps, the conference should be really interesting.

Survey Statistics: using MRP in later analyses (pride edition)

Happy pride!

One way I celebrated was by reading Lax & Phillips 2009, Gay Rights in the States: Public Opinion and Policy Responsiveness. It’s on-theme, an example in the MrPlew paper (which I also still need to digest), and I wanted examples of using MRP in later analyses.

Lax & Phillips 2009 studied the relationship between state-level public opinion and state adoption of policies affecting gays and lesbians. Andrew blogged about this work in Nov 2008, Jan 2009, and June 2009, when he wrote:

Fancy statistical analysis can indeed lead to better understanding. Jeff Lax and Justin Phillips used the method of multilevel regression and poststratification (“Mister P”…

The paper’s appendix includes a NYT article and an almost-rainbow-colored plot:

Lax & Phillips 2009 used MRP to estimate state-level public opinion E(y | s). Let

  • y_i = 1 if person i supports laws to protect against discrimination in job opportunities (for example), = 0 otherwise
  • s[i] = state where person i lives, e.g. NY
  • L_s = 1 if state s has laws to protect against discrimination in job opportunities, = 0 otherwise

Their Multilevel Regression (“MR” of MRP) model had race, gender, age, education, state, and poll effects:

They modeled the state effect with state-level predictors (% religious conservatives, % Democratic voters in 2004):

Then they Poststratified (“P” of MRP) to the population:

Then they used the MRP estimate of public opinion as a predictor of whether the state adopts the policy:
Pr(L_s = 1) = logit^-1(a + b * y_s^pred)
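
For readers who want the steps above in symbols, here is a hedged sketch in the notation already defined. It is a generic MRP form built from the effects listed in the text, not necessarily the exact specification in Lax & Phillips 2009. The index names j, k, l, m, p, the coefficients β and γ, the cell probabilities θ_c, and the population cell counts N_c are assumptions introduced for illustration.

```latex
% Multilevel regression: individual-level model (generic form, assumed)
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(
  \beta_0 + \alpha^{\text{race}}_{j[i]} + \alpha^{\text{gender}}_{k[i]}
  + \alpha^{\text{age}}_{l[i]} + \alpha^{\text{edu}}_{m[i]}
  + \alpha^{\text{state}}_{s[i]} + \alpha^{\text{poll}}_{p[i]} \right)

% State effect modeled with state-level predictors
\alpha^{\text{state}}_{s} \sim \mathrm{N}\!\left(
  \gamma_0 + \gamma_1 \,\text{relig}_s + \gamma_2 \,\text{dem}_s,\;
  \sigma^2_{\text{state}} \right)

% Poststratification: weight cell-level predictions \theta_c by population counts N_c
y_s^{\text{pred}} = \frac{\sum_{c \in s} N_c \,\theta_c}{\sum_{c \in s} N_c}

% Later analysis: MRP estimate as a predictor of policy adoption
\Pr(L_s = 1) = \operatorname{logit}^{-1}\!\left( a + b \, y_s^{\text{pred}} \right)
```

Written this way, Question 1 below is about the last line: y_s^pred enters as if it were known, although it is itself an estimate.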

From their Figure 1:

Questions:

  1. (How) did Lax & Phillips 2009 incorporate uncertainty in the MRP estimate of public opinion y_s^pred in their later analysis of its effect on policy adoption L_s ?
    Footnote 7 says they incorporated uncertainty for non-MRP estimates:

    if we use an opinion index based on disaggregation instead of MRP estimates, correcting for reliability using an error-in-variables approach (eivreg in Stata)…

  2. Are results sensitive to whether policy adoption L_s is a state-level predictor in the MRP model ?

The New York Knicks and the martingale property of calibrated probability forecasts (with some simulation and R code)

This long post covers four topics:

1. The Knicks’ stunning series of come-from-behind victories to win the NBA title in 5 games;

2. The martingale property of probability forecasts;

3. An example of learning from simulation;

4. How we (sometimes) do research in probability and statistics.

I don’t know enough about this blog’s audience to know which of the four topics will appeal to most of you. For the internet as a whole, it’s #1; for most of you, it might be #3.

I’m interested in all four, which is why I’m writing this all up right now. I’m embarrassed to say that it took several hours to do this. I was originally planning to post this Sunday morning after the game but it took time for me to get to the task. Most of the effort came from writing the code, not from writing the text. And there’s actually not much code, as you can see if you scroll to the end of this post. The main effort was not figuring out the syntax or even debugging (although there was some of that) but in working out what I wanted to be coding in the first place.

On the plus side, this is research I’ve been wanting to do for a while, so (a) I don’t think this effort is wasted, even beyond whatever educational and entertainment value it has for you, and (b) I learned a bit from this already. Looking at data is always good; experimenting with simulation is always good.

Ok, here goes.

The NBA finals

Hey, remember this, from game 4 of the recent NBA finals:

Or the trajectory of the game that came after:

Just for completeness, here are the traces for games 3, 2, and 1, also courtesy of ESPN:

In game 4, the Spurs at one point were estimated to have a 99.6% chance of winning. But, as you might have heard, they lost.

Extreme win probabilities

Were those stated win probabilities too extreme?

On one hand, sure, unusual events happen on occasion. If you have a 0.4% chance of losing, that’s something that should happen 1 in 250 times, and there were a lot more than 250 basketball games just in this past season. On the other hand, very unusual events are supposed to happen only very rarely, and there was a point in the third quarter of game 4 where ESPN’s algorithm gave the Spurs a 97.1% chance of winning, a point in game 1 where the Spurs were given a 94.1% chance. There was a moment in game 2 where the Knicks were assigned a 98.2% chance of winning, and, sure, they did win that one, but given that the final score was 105-104, after being tied 97-97 and 104-104, it seems in retrospect that this 98.2% was a bit overconfident.
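
The “1 in 250” arithmetic can be checked in a few lines. This is a back-of-the-envelope sketch under a loud assumption: that every game in a 1,230-game regular season (30 teams times 82 games, divided by 2) reaches a 99.6% moment and that games are independent. Neither is true in practice, so read the result as an upper-end illustration, not an estimate.

```python
p_upset = 1 - 0.996   # chance the 99.6% favorite loses
n_games = 1230        # assumption: 30 teams x 82 games / 2

one_in = 1 / p_upset                               # "1 in how many"
expected_upsets = n_games * p_upset                # expected count if every game had such a moment
prob_at_least_one = 1 - (1 - p_upset) ** n_games   # chance of seeing at least one such upset

print(f"1 in {one_in:.0f}; expected upsets {expected_upsets:.1f}; "
      f"P(at least one) {prob_at_least_one:.3f}")
```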

Should we be suspicious of these probabilities? One way to ask this question is to check calibration: if we collect all game situations where a team has a 99.6% chance of winning, are they winning 99.6% of the time?

On the other hand, I’m picking the most extreme values of these win probabilities. You should get calibration of win probabilities at any time, and it’s ok to condition on them, but only to condition on what came before.

That is, if we look at win probabilities at the end of the first quarter, or at the end of the first half, or at the end of the third quarter, they should be calibrated. And if you look at win probabilities only when they’re greater than 99%, they should be calibrated. And if you look only at win probabilities when they are the maximum for the game so far, they should be calibrated. But it’s not clear to me that you should expect calibration for win probabilities selected to be the maximum for the entire game, because if the win probability at time t is p(t), and you condition on the event p(t) < p(t_0) for t > t_0, that could provide information. It’s tricky.

The martingale property of probability forecasts

We wrote about this in section 1.6 of our 2020 article, Information, incentives, and goals in election forecasts:


And it also came up in some blog posts:

from 2020: Do we really believe the Democrats have an 88% chance of winning the presidential election?

from 2020: More on martingale property of probabilistic forecasts and some other issues with our election model

from 2024: “Unusual Betting Patterns With Several Temple Games”: It’s martingale time, baby!

also from 2024: It’s martingale time, baby! How to evaluate probabilistic forecasts before the event happens? Rajiv Sethi has an idea. (Hint: it involves time series.)

I’d expect ESPN’s win probabilities to be closer to calibrated than prediction-market odds or model-based election forecasts. Prediction markets depend on the bettors and there’s no reason to expect calibration, at least not until the market is fully mature in some way. Model-based election forecasts are based on approximate models that have known pathologies (for example here), so they won’t be universally calibrated. ESPN’s probabilities won’t be calibrated either–they too are based on an imperfect model–but I assume its model has been trained on tons of data so I don’t think it should be far off.

If someone could send me the moment-by-moment estimated win probabilities from some large database of basketball games, we could take a look.

In the meantime we can get some intuition by simulating from a mathematical model where we can compute win probabilities exactly.

Simulating the process

Assume a simple Brownian motion with drift, where the score differential y(t) starts at y(0) = 0 and then takes a continuous random walk so that y(t) ~ normal(delta*t, sigma*sqrt(t)). We’ll scale t to be in minutes, so the game goes from t=0 to t=48, with the winner being determined by y(48). The drift is then delta=point_spread/48, because this is the expected final score differential before the game has started. And we’ll set sigma=2, which seems reasonable: 2*sqrt(48)=13.9, so that the sd of the final score differential is approximately 14 points.

One cool thing about this model is that the win probability can be trivially computed given the score differential at any point in the game.
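Here is a minimal Python sketch of that computation (it is not the R code from this post, and the function names are made up for illustration). Under the model, the final differential y(48) given y(t) is normal with mean y(t) + delta*(48-t) and sd sigma*sqrt(48-t), so the win probability is a single normal cdf evaluation:

```python
from math import erf, sqrt

GAME_MINUTES = 48.0


def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def win_probability(y, t, point_spread=0.0, sigma=2.0):
    """Pr(y(48) > 0 | y(t) = y) under Brownian motion with drift.

    y is the current score differential, t is minutes elapsed.
    """
    remaining = GAME_MINUTES - t
    if remaining <= 0:
        # Game over: the outcome is known (0.5 only for an exact tie).
        return 1.0 if y > 0 else (0.0 if y < 0 else 0.5)
    delta = point_spread / GAME_MINUTES
    return normal_cdf((y + delta * remaining) / (sigma * sqrt(remaining)))


# Evenly matched teams at tip-off, then a 10-point lead with 18 minutes left.
print(win_probability(0.0, 0.0))
print(win_probability(10.0, 30.0))
```

The first call prints 0.5, as it should for a point spread of 0 at the start of the game.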

How wrong can you be?

To demonstrate, I’ll show the results–the score and the win probability during the game–for 18 independently simulated games. For simplicity I’ll assume the point spread is 0, so the two teams are always assumed to be evenly matched. And I’ll step through the game 10 times per minute, thus approximating the game as a sum of 480 independent increments.

The code is below; here are the results:

I don’t know enough about basketball to have a sense of how plausible these are as game outcomes (setting aside the lack of discreteness in the score; we used a continuous model so that we could more easily compute the relevant probabilities analytically). They don’t look too much like the Knicks-Spurs game except for that one simulation near the lower left of the plot, where the “Spurs” led by 10 points into the third quarter, maxing out with a win probability of 95.6% before eventually losing.

To get a broader picture, I simulated 10,000 games. (Just as a reference point, there are 30 NBA teams, so there are 82*30/2=1230 regular season games each year.)

For each game, I computed “max_p_wrong”: the highest win probability assigned to the game’s eventual loser. In my simulation, every game starts with a 50/50 probability–remember, for simplicity I’m always assuming a point spread of 0–so max_p_wrong must be somewhere between 0.5 and 1. Here’s what comes out:

So, extreme wrong probabilities are not unheard of. How common are they? Out of these 10,000 games, 61 had max_p_wrong greater than 99%. That is, in 0.6% of games, the eventually-losing team exceeds the threshold of 99% win probability during some point in the game.
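For readers who want to try this themselves, here is a hedged Python sketch of the max_p_wrong simulation as described above (point spread 0, sigma=2, 10 updates per minute). It assumes NumPy is available, it is not the R code behind the numbers in this post, and the function name and seed are arbitrary, so the simulated fraction will differ somewhat from the figure quoted above.

```python
import numpy as np
from statistics import NormalDist


def simulate_max_p_wrong(n_games=10_000, steps_per_minute=10, sigma=2.0,
                         minutes=48, seed=0):
    """Highest win probability given to each game's eventual loser."""
    rng = np.random.default_rng(seed)
    n_steps = minutes * steps_per_minute
    dt = 1.0 / steps_per_minute
    # Score differential after each step (point spread 0, so no drift).
    steps = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_games, n_steps))
    y = np.cumsum(steps, axis=1)
    final = y[:, -1]
    # Minutes remaining after each step, leaving out the final step.
    remaining = minutes - dt * np.arange(1, n_steps)
    z = y[:, :-1] / (sigma * np.sqrt(remaining))
    # Flip signs so z is from the eventual loser's point of view.
    z_loser = -np.sign(final)[:, None] * z
    # The loser's win probability is Phi(z_loser). Phi is increasing, so the
    # max probability is Phi of the max z. Every game starts at 50/50 (z = 0).
    max_z = np.maximum(z_loser.max(axis=1), 0.0)
    phi = NormalDist().cdf
    return np.array([phi(v) for v in max_z])


max_p_wrong = simulate_max_p_wrong()
print("Pr(max_p_wrong > 0.99) is roughly", (max_p_wrong > 0.99).mean())
```

Changing steps_per_minute is the way to look at how the answer depends on the updating schedule.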

This result should go up if we move to continuous updating. But we’re already updating 10 times a minute. Increasing this schedule to 50 times a minute increases Pr(max_p_wrong > 0.99) to 0.0075, and increasing to 100 times a minute takes it to 0.0076, so my guess is that this is roughly the continuous limit.

OK, just to check, I’ll simulate 100,000 games, and now Pr(max_p_wrong > 0.99) is 0.0072 with 10 updates a minute, or 0.0084 with 50 updates per minute. So I’ll go out on a limb and say that if we were to compute the exact probability under continuous updating, we’d get 0.0085.
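As a possible cross-check on that extrapolation, here is a sketch of an optional-stopping argument. It assumes that the win probability p(t) is a continuous martingale that starts at 1/2 and ends at 0 or 1, which holds for the Brownian model above with a point spread of 0 but not for a discretized version. The threshold a is a symbol introduced here for illustration.

```latex
\begin{align*}
\Pr\bigl(\text{team A's } p(t) \text{ ever reaches } a\bigr) &= \frac{1/2}{a}, \\
\Pr\bigl(\text{A loses} \mid \text{A's } p(t) \text{ has reached } a\bigr) &= 1 - a, \\
\Pr(\text{max\_p\_wrong} \ge a) &= 2 \cdot \frac{1/2}{a} \cdot (1 - a) = \frac{1-a}{a},
\qquad \tfrac{1}{2} \le a < 1.
\end{align*}
```

The first line stops the martingale when it first hits a (if it never does, team A must lose, so p ends at 0), the second line is the martingale property applied after that stopping time, and the factor of 2 covers the two teams. At a = 0.99 this gives 0.01/0.99, or about 0.0101. If the argument is right, the simulated values approach the continuous limit from below, because updating at discrete times misses brief excursions above the threshold, and the limit would be somewhat above 0.0085.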

This was a surprise. Before doing this simulation, I was assuming that the probability of p_win exceeding 99% for the eventual loser at any time in the game would be more than 1% because of selection. I guess my intuition was wrong. Maybe it has to do with the fact that I’m conditioning on which team wins. (Of course, if you go the other way, the probability of p_win exceeding 99% for the eventual winner is 100% in the continuous limit, because with epsilon of a second left in the game the winner will almost certainly be known.)

So, yeah, the above graph is kind of interesting. Under our model, most games won’t stray too far into retrospectively-embarrassing probability estimates, but it can happen sometimes.

It would be interesting to compare the above graph with what you’d get from a database of game-odds data from ESPN or whatever.

Just to be clear: there’s no reason to think that the above graph represents any sort of universal property of martingales. It’s a very specific model! But you have to start somewhere. Also, the existence of various central limit theorems makes me hold out the hope that this could be a general result under some appropriately restricted class of continuous martingale processes. It’s a research question!

A surprising uniform distribution

To get some further understanding of the process, I gathered the win probabilities after the end of each of the three quarters for the 10,000 simulated games. Below are histograms of these probabilities and calibration plots:

Unsurprisingly, the calibration is fine. After all, the probabilities are computed from the same model that the data are drawn from. Indeed, even the apparent anomaly in the lower-left plot is just a small-sample artifact which disappears when we up the number of simulations to 100,000.
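A calibration check of this sort can be sketched in Python as follows, assuming NumPy and a point spread of 0. This is an illustration rather than the R code used for the plots; the function names, the number of bins, and the seed are arbitrary choices. Because only the quarter-end scores are needed, the sketch draws the four 12-minute increments directly instead of stepping through the game.

```python
import numpy as np
from statistics import NormalDist


def quarter_end_probabilities(n_games=10_000, sigma=2.0, seed=0):
    """Win probabilities after quarters 1-3, plus whether the team won."""
    rng = np.random.default_rng(seed)
    # Four independent 12-minute increments; cumulative sums give
    # y(12), y(24), y(36), y(48).
    increments = rng.normal(0.0, sigma * np.sqrt(12.0), size=(n_games, 4))
    y = np.cumsum(increments, axis=1)
    won = y[:, 3] > 0
    remaining = np.array([36.0, 24.0, 12.0])  # minutes left after each quarter
    phi = np.vectorize(NormalDist().cdf)
    p = phi(y[:, :3] / (sigma * np.sqrt(remaining)))
    return p, won


def calibration_table(p, won, n_bins=10):
    """Rows of (bin, count, mean predicted probability, observed win rate)."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            rows.append((b, int(in_bin.sum()),
                         float(p[in_bin].mean()), float(won[in_bin].mean())))
    return rows


p, won = quarter_end_probabilities()
for quarter in range(3):
    print("End of quarter", quarter + 1)
    for row in calibration_table(p[:, quarter], won):
        print("  bin %d: n=%d, mean p=%.3f, win rate=%.3f" % row)
```

If the probabilities are calibrated, the last two columns should agree within sampling error in each bin.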

More interesting are the histograms. It makes sense that, as the game goes on, the distribution of win probabilities starts at 0.5, then gradually bunches up at 0 and 1. Indeed, at the end of the fourth quarter the win probabilities are exactly 0 and 1.

But it’s funny how the distribution of win probabilities is exactly uniform at halftime. There must be a direct mathematical argument giving intuition for that result; it’s too perfect to just be an accident.
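One candidate for such an argument, sketched under the model above with a point spread of 0 (the standard normal variable Z is introduced here for illustration):

```latex
\begin{align*}
y(t) &= \sigma \sqrt{t}\, Z, \qquad Z \sim \mathrm{normal}(0, 1), \\
p(t) &= \Phi\!\left(\frac{y(t)}{\sigma \sqrt{48 - t}}\right)
      = \Phi\!\left(Z \sqrt{\frac{t}{48 - t}}\right), \\
p(24) &= \Phi(Z) \sim \mathrm{uniform}(0, 1).
\end{align*}
```

At halftime the variance already accumulated equals the variance still to come, so the ratio under the square root is 1 and the win probability is the normal cdf applied to a standard normal draw, which is uniform by the probability integral transform. Before halftime the ratio is less than 1 and the distribution concentrates around 0.5; after halftime it is greater than 1 and the distribution piles up near 0 and 1, which matches the histograms.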

Lots more research to be done here:

– Generalizing beyond the continuous model to allow discrete scoring changes.

– Generalizing beyond the random walk; there’s no reason the model needs to be Markovian.

– Are there general statements that can be made about these distributions of win probabilities under arbitrary martingale processes? I’m guessing there are some results. At least, there should be some inequalities and limit theorems.

– Looking at real data from basketball, other sports, and other realms, including election forecasts and prediction markets.

Our ultimate aim here is to come up with a general measure of departure from the martingale property of probability forecasts. We want something that can be applied to any dataset, obviously with more precision as the series get longer, more finely-spaced in time, and when replications are available (as in those thousands of basketball games).

P.S. Here’s the R code to make the above simulations and graphs:

Ph.D. student opening in Sweden on Earth Observation, Data Science, and AI for poverty estimation

Adel Daoud writes:

I’m writing to ask for your help circulating a PhD opening in my group at Chalmers, the AI and Global Development Lab (www.aidevlab.org). The position is in Earth Observation, Data Science, and AI for poverty estimation, the Data Science and AI division (Department of Computer Science and Engineering). We are looking for candidates with a strong grounding in data science, computer science, deep learning, statistics, or similar— remote sensing experience and causal inference are welcome bonus.

Ad and application portal: https://www.chalmers.se/en/about-chalmers/work-with-us/vacancies/?rmpage=job&rmjob=14818&rmlang=UK
Deadline: 20 June 2026.

Here’s the description of their center:

The AI & Global Development Lab fuses AI with Earth Observation to illuminate the causes and consequences of human development across time and space.

Our interdisciplinary team, comprising data scientists, computer scientists, and social scientists, develops methods to better understand the multi-scale dynamics of pressing global issues, including poverty, conflict, sustainability, and the effectiveness of policy interventions.

By analyzing satellite imagery from 1984 to the present, AI search agent swarms for large-scale knowledge discovery, and other planetary-scale sources, we are reconstructing historical and geographical development trajectories at a level of detail never before possible, working to offer new insights into the changing face of development worldwide.

We also invite you to visit PlanetaryCausalInference.org for more information about the causal arm of our project.

They call it “Planetary causal inference,” which seems to fit the themes of this blog.

Capitalism: On its last legs or healthy enough to be milked?

In The Strange Death of Tory England, a book full of great lines, Geoffrey Wheatcroft writes,

Just as the labour movement had never been quite sure whether the capitalist system was on its last legs and needed only a final push to be toppled, or was healthy enough to be milked over and again, so the cultural-intellectual left had never quite decided whether it liked increasing prosperity or not.

I like the above quote, and I would add something analogous for conservatives, that they have never been quite sure whether the capitalist system is an amazing wealth machine with even low-income people being rich on an absolute scale, or whether the system is so fragile that people can barely afford to pay their taxes and that any particular tax or regulation will bankrupt the system. Unfortunately, try as I might, I can’t manage to phrase this as aphoristically as Wheatcroft did.

I suppose that every political movement must balance between triumphalism and alarmism. For another example, environmentalists will announce their progress in protecting the environment and warn of all the horrible things that will happen if more isn’t done. From the other direction, business groups will say that we can’t afford to protect the environment (we want jobs, not owls) but at the same time insist that the environment is better than ever.

The political science research project in all this would be to study these ideologies more systematically and see which groups follow different patterns in their statements.

“Are prediction markets causing more harm than good?”

The other day I was invited to an “anti-debate” on the above topic, scheduled for this afternoon. I’d not heard about the concept of an anti-debate before; here’s the description:

The Anti-Debate is a new format for debate where participants build on each other’s insights, so that greater complexity can emerge.

Despite its name, the Anti-Debate is not anti-debate. It actually starts out like a traditional debate, with opening statements and rebuttals. But then it goes further — guiding participants to explore how they might integrate their perspectives into a bigger picture. Hence our tagline: First Debate, Then Elevate.

Sounds reasonable to me. They refer to the concept of steel-manning, and I’m skeptical of that, but I agree that standard debate formats have problems (just read The Topeka School!) and I’m very open to this sort of alternative.

The organizer, Winter Ku, referred to my posts on “the statistical skepticism about betting markets versus polls (self-reinforcing prices, thin volume), and more recently the integrity and harm concerns in your ‘Uh oh prediction markets’ writing, e.g. manipulation, the absence of insider-trading rules, and the gambling-like risks to vulnerable users,” and it seemed like it would be fun to have a chance to speak on this with several hundred people who might well be inclined to disagree with me. At the very least, I’d get some good questions, lots of pushback, and I’d probably change my mind about a few things.

The anti-debate was to be held at Manifest, an annual festival about prediction markets and forecasting at the same California location that had this blogging workshop a couple months ago. Unfortunately I was only invited to the Manifest thing a couple days ago and I wasn’t able to fly out on such short notice.

I hope the anti-debate goes well without me! Actually, it’ll probably go better without me than with me. I think I’m a careful and interesting writer with lots of good ideas, but I don’t know how well I’d do in a live debate. I imagine I’d get flustered. On the other hand, sharing objections to prediction markets, in front of a crowd coming from a much different perspective than me, but open to listening, could possibly do some good, as well as being a learning experience for me.

So maybe next year! I don’t know if they’ll put the anti-debate up on youtube or whatever; if so, it would be interesting to see the arguments on both sides.

To what extent is it true that “All intelligence, human or artificial, must extract structure from correlational data”?

Someone pointed me to this article, “Does AI already have human-level intelligence?” You can click through to read the whole thing; spoiler alert: their answer is Yes.

I don’t have much to say about the main argument of the article–it’s a topic we’ve gone over all too much in past comment threads–also, as a non-user of chatbots, I’m really the worst person to ask for an opinion on the topic. Indeed, the other day I was contacted by a reporter for a story about “vibe analytics” where people use chatbots to write code to perform data analysis. I shared my thoughts for a few minutes but then referred the reporter to Bob and Jessica, as they both have thought a lot more about this than I have. I continue to (a) think that it can make sense to consider chatbots and ping-pong playing robots as having human-level intelligence, and (b) agree with Gary Smith that it remains a big problem when people think chatbots have a level of understanding that they don’t actually have. But, again, my thoughts on this shouldn’t count for much.

But there is one thing in this new article that I did want to comment on. It was just an aside, not the main point by any means, but interesting:

“All intelligence, human or artificial, must extract structure from correlational data.”

Is this true? I don’t know about that, for two reasons. First, I can’t think of many cases where I (that is, my human intelligence) have extracted structure from correlational data. Setting aside my professional life as a statistician and social scientist, when have I done this? I’m not sure. Yes, I’ve estimated parameters from correlational data–for example, if I’m playing sports I make inferences about the abilities of other players based on what they’ve done on the field in the past. But that’s not structure, exactly. There is structure in the world, like the difference between cats and dogs. You can dress a dog up like a cat but it’s still a dog. Essentialism and natural kinds and all that. But that’s not anything I extracted from correlational data: I know it because people told me.

One way that I’ve extracted structure from correlational data is that as a kid I heard lots of talking and read lots of books and I extracted lots of structure of the language from that. But that’s just one example–an important example, sure, but I don’t know that it’s a characteristic of “all intelligence.”

Another way to look at this is that, as a community, we’ve extracted a lot of structure in the world–it’s called doing science–and some of this is from correlational data (Kepler figuring out planetary orbits, Galton and his table of heights, etc.) but lots of the structure we’ve extracted comes either from logical reasoning (Newtonian mechanics, relativity theory) or from experimentation–they say Galileo did a bit of that.

This doesn’t invalidate the argument made in the linked article (after all, there’s no reason a computer program can’t do pure theory or conduct experiments); I just thought it was interesting. Speaking in some fundamental sense, it seems to me that experimentation, not just observation, is a crucial part of how we often extract structure. We experiment a lot when speaking. On the other hand, sometimes, as with Kepler or with someone learning a language from reading books, the information is all, or almost, correlational.

It’s an interesting thing to think about. We could throw this at a chatbot and see what it would say–or, more precisely, we could see what it could extract from what humans have said about related topics. But humans have said a lot; it’s a mark of intelligence to be able to read a million books and then extract their key points.

P.S. After reading a bunch of comments, I realize that I kind of missed the point of the passage I was quoting.

My argument above is that intelligence doesn’t learn about structure only by extracting structure from correlational data. Intelligence also learns about structure from logical reasoning and experiment.

But my argument doesn’t refute the quoted line, “All intelligence, human or artificial, must extract structure from correlational data.” That quote doesn’t posit that intelligence only learns from correlations. It just says that learning from correlation is part of the mix, and I agree with that.

So, as long as that passage is interpreted as saying that “extracts structure from correlational data” is necessary for “intelligence,” I’m ok with it. My problem was my interpretation (or misreading) that correlational analysis was sufficient.

Jazz and quantum mechanics: Eventually Dmitri realized that they are kind of similar

Dmitri Tymoczko pointed me to this article by John Baez explaining general relativity. I replied that this seems like some very important stuff, but I’m devoting all of that part of my brain to being confused by quantum mechanics. I have no room to be confused by gravity too!

Dmitri responded:

When I was 13, there were two things I wanted to understand more than anything else in the world: jazz and quantum mechanics.

Eventually I realized they are kind of similar. In both cases, you start with this fabulously complicated 19th-century language — Lagrangian and Hamiltonian mechanics in the one case, and romantic harmony in the other. Then you “twist” it. In the one case, you turn variables into operators, while in the other you add this scale-based improvisational component. But they are both difficult in kind of the same way because you have to learn this whole other language, and then apply this massive conceptual twist.

But quantum mechanics is genuinely mysterious — there’s some basic stuff we don’t know. General relativity is just straightforward geometry, no mysteries to solve.

All I can say regarding the connection between jazz and quantum mechanics is . . . wow. I wish I could play music, hold music in my mind, and read music. I guess that with a lot of effort I could make some progress in all three of these, but I can’t see myself putting in the time, so I’ll just be wistful about it, and I’ll continue to listen to lots of music and read a lot about music.

Here are some relevant posts (on music, not on quantum mechanics or jazz):

In music, literature, and technical writing, the relation of large-scale structure to the local action

Books by Charles Rosen and Jeremy Denk on piano playing and the nature of music

Playing music, listening to music, background music, talking about music

How Music Works by David Byrne, and Sweet Anticipation by David Huron

Why do we prefer familiarity in music and surprise in stories?

The revelation came while hearing a background music version of Iron Butterfly’s “In A Gadda Da Vida” at a Mr. Steak restaurant in Colorado

Luc Sante reviews books by Nick Hornby and Geoffrey O’Brien on pop music

This guy is to music as I am to statistical graphics

“Song for Aki”: Prof reportedly clears a half million bucks by requiring online students to pay $89.99 each for his self-published course notes

Why is modern poetry so hard to read? Adam Kirsch offers a clue.

Causality and Crime: In science as in genre storytelling, the thrill of the unexpected can only come with reference to (and in confounding) some preexisting norm.

And, finally:

My desert island discs

Adjusting for nonrepresentativeness in continuous norming using multilevel regression and poststratification.

Klazien de Vries, Marieke E. Timmerman, Anja F. Ernst, and Casper J. Albers write:

In psychological test norming, nonrepresentativeness in background variables in the normative sample can lead to bias in the normed score estimates. Because representativeness is difficult to establish in practice, adjustment methods are needed to combat this bias. As a candidate adjustment method, we investigated generalized additive models for location, scale, and shape with multilevel regression and poststratification (GAMLSS + MRP), the combination of MRP and continuous norming with GAMLSS. This adjustment method was then compared to current adjustment methods in continuous norming using weighted regression: GAMLSS + P (with poststratification) and cNORM + R (with raking). The results of our simulation showed that GAMLSS + MRP was generally more efficient than GAMLSS + P and cNORM + R. Furthermore, GAMLSS + MRP was better than the current methods at reducing bias in samples where the nonrepresentativeness was age-dependent. We argue that GAMLSS + MRP is a valid adjustment method in continuous norming and recommend this adjustment method to mitigate bias in nonrepresentative normative samples. To facilitate the use of GAMLSS + MRP in practice, we provide a step-wise approach for the implementation of GAMLSS + MRP. We illustrate this approach by deriving normed scores from the normative data of the third Schlichting language test.

I don’t recall how I came across this paper, and I haven’t actually read it, but I wanted to share it with you, just because it’s cool to see the different ways that multilevel regression and poststratification (MRP) can be used.

Ultimately, MRP is the inevitable consequence of three things:

1. We are interested in generalizing to populations of interest.

2. Available data are typically unrepresentative of the population. This is the case even with simple random sampling–Hello, random variation! Hello, small-area estimation!–and is even more so with selected samples, nonresponse, dropout, etc. In some settings such as medical experimentation there’s not even an attempt to get a representative sample: you’re directly aiming to include in the study the groups of people who might get the greatest benefit from the treatment.

3. When adjusting for differences between sample and population, many variables can be relevant–for example, demographic and geographical variables in a survey of people–and so simple adjustments such as raw poststratification or non-multilevel regression adjustment won’t do the job.

Put this together and you’ll want to do MRP (or, more generally, RPP). It’s not just for survey research. It comes up everywhere in statistics and machine learning, whenever there is a concern with population prediction, or generalization, or transportability, or whatever you want to call it.

It can seem like a hassle that to do this you need to know (or estimate, or postulate) a distribution of predictors in your population, but (a) this is often work that’s well worth the effort, if you really care about the population, (b) dependence of the result on the choice of population is important, and where this dependence is strong you should be aware of it, and (c) if you want to take the easy way out you can always bootstrap to get inference for the hypothetical population of which your data are considered to be a random sample.
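To make the poststratification step concrete, here is a minimal Python sketch with made-up numbers. The cell-level estimates would come from a multilevel regression in a real MRP analysis; that step is omitted here, and all cell names, estimates, and counts are hypothetical.

```python
def poststratify(cell_estimates, cell_counts):
    """Average cell-level estimates, weighting by the given cell counts."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell]
                   for cell in cell_estimates)
    return weighted / total


# Hypothetical estimates of some outcome within three age groups.
cell_estimates = {"young": 0.40, "middle": 0.50, "old": 0.60}
# The sample over-represents older people relative to the population.
sample_counts = {"young": 100, "middle": 300, "old": 600}
population_counts = {"young": 400_000, "middle": 350_000, "old": 250_000}

print("weighted by sample:    ", poststratify(cell_estimates, sample_counts))
print("weighted by population:", poststratify(cell_estimates, population_counts))
```

With these numbers the sample-weighted average is 0.55 and the population-weighted average is 0.485, up to floating-point rounding. The gap comes entirely from the choice of population, which is the dependence that point (b) says you should be aware of.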

“The Data Analyst’s Guide to Cause and Effect”

Theiss Bendixen and Benjamin Grant Purzycki wrote this book. He writes:

The website holds:

– All data and code used in the book
– Free sample chapters
– Bonus material

These aren’t quite the same methods for causal inference that I’m inclined to use (for my own approach, see chapters 18-21 of Regression and Other Stories), but their presentation is clear and has code, and it’s always good to see another perspective.

Survey Statistics: should MRP workflow include LOCO-CV ?

Due tomorrow (June 10): Enter a contest for Alexandre Andorra’s interview of Aki, Richard, and Andrew about their new book Bayesian Workflow.

I hope folks ask about evaluating MRP models. We’ve seen:

At Andrew Gelman’s 60-ish Birthday workshop Aki gave a great talk about loo’s 10ish birthday. The loo R package computes approximate leave-one-out (loo) cross-validation. Aki covered a huge range of work across the Bayesian workflow. He said there will soon be a new version of their paper about evaluating MRP models, Kennedy et al. 2024.


Kennedy et al. 2024 pivot from the usual individual-level Loss(y_i, yhat_i) to a population-level Loss(E(Y), E(yhat_i)). We don’t have the true E(Y), so they replace it with a classical poststratification estimate (see the post on poststratification). To avoid overfitting, this classical estimate should be calculated on different data than the MRP model itself.

They use leave-one-cell-out (LOCO) cross-validation, a version of leave-one-group-out (LOGO) that we mentioned in “design-based cross validation (dCV)”. In “dCV for MRP ?” we asked if we should be assessing how well the MRP model predicts new groups (e.g. new cells).

Should MRP workflow include LOCO-CV ?

Naming a jail after a convicted criminal

Here’s the background:

Mayor Giuliani took the unusual step of naming the Manhattan Detention Complex, the Lower Manhattan central lockup known informally as the Tombs, after a still-living person: Kerik. Giuliani’s police commissioner at the time, Kerik had previously served two years as his correction commissioner, after first getting to know the mayor as his bodyguard and driver and moving up through the ranks under his patronage.

Naming the jail facility after Kerik became somewhat awkward a few years later in 2006, when he was charged with the first of a series of state and federal crimes ranging from receiving undisclosed and improper gifts to lying to White House officials.

Then-mayor Michael Bloomberg recognized the awkward optics, and Kerik’s name came off the building. “After Bernie Kerik pleaded guilty, it was not appropriate to have that facility named after him,” Bloomberg said. “I informed the [Correction] commissioner of my decision and he expeditiously changed the naming of the sign.”

And here’s the funny part:

Nearly 20 years after Kerik’s name was stripped from the Tombs, in July the DOC quietly reinstalled signage designating the building at 125 White Street the “Bernard Kerik Courts.”

“The late Bernard Kerik served as First Deputy Commissioner of the NYC Department of Correction from 1995 until 1997 and served as Commissioner from 1997 to 2000,” a DOC spokesperson told Hell Gate when asked about the new signage. “The Manhattan Detention Center was previously named in his honor and signage was re-installed on the DOC side of the Manhattan Courts upon his passing.” . . .

Kerik’s professional biography is long, fascinating, and so chock-a-block with outrageous and alarming episodes of moral failure that his life takes on a sort of mythic scale, a tall tale of rolling skullduggery.

An extremely incomplete accounting might include: abandoning his daughter and her mother in Korea; commandeering a Battery Park City apartment donated for the use of tired police and rescue workers after 9/11 to conduct one of two simultaneous extramarital affairs; acting as a sex-police enforcer for a Saudi hospital; taking multi-million-dollar payouts from Taser; tasking police under his command to do book research for him and harass Fox News employees suspected of stealing his lover’s jewelry later found in her bag; and acting as the interim interior minister of Iraq, where he took a quarter-million-dollar, no-interest personal loan from an Israeli billionaire with Defense Department contracts. . . .

In 2009, Kerik pleaded guilty to eight federal corruption charges including tax fraud and lying to White House officials about having helped a company suspected of mob connections get a license in exchange for free renovations to his Riverdale home. For those crimes, Kerik did three and a half years in federal prison. . . .

So, yeah, if you’re gonna name something after this guy, it might as well be a jail! “Named in his honor,” indeed.

This is appropriate in the same sense that it was appropriate for them to name an airport near D.C. after someone who overthrew democratic governments in multiple foreign countries.

But BATF doesn’t take the bait

Also, amusingly I found this news article suggesting that the headquarters of the Bureau of Alcohol, Tobacco, and Firearms be named after a politician whose most famous act was to kill someone while under the influence of alcohol. I don’t think they did it, though. According to Wikipedia, they named it after Ariel Rios, an ATF undercover special agent who was killed in action in 1982. The BATF just doesn’t have the sense of irony possessed by the NYC Department of Correction.

Stein’s method, learning and inference -or- how to really monitor convergence and thin chains

This post is from Bob.

I’ve been thinking a lot about scores (gradients of the log density function) and how they can be used for convergence monitoring. We know that the expected value of the score is zero. Stein generalized this with Stein operators. In the monomial case, the Stein operators give you functions in increasing degrees, all of which have zero expectation in the posterior. Here theta is the variable being sampled and S is the score function, so that S(theta) is the gradient of the target log density evaluated at theta.

    Order 0: S(theta)

    Order 1: 1 + theta .* S(theta)

    Order 2: 2 * theta + theta^2 .* S(theta)

This leads to a natural test for convergence of first, second, and third moments. Just compute Monte Carlo estimates of these quantities and see if they’re zero. We’d want to standardize for standard deviation to make the result scale-free like R-hat. To develop some intuitions, in a standard normal distribution p(theta) = normal(theta | 0, I), we have S(theta) = -theta, and thus S(theta) converges to zero at the same rate as our variable theta converges to its true value; the order 1 test is 1 - theta^2, which we know has expectation zero because theta^2 has a ChiSquared(1) distribution with expectation 1. The order 1 case corresponds to equipartition in physics and the form D + theta’ * S(theta) also naturally has zero expectation as shown in the virial theorem in physics in the 1870s.
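Here is a rough Python sketch of such a check for a one-dimensional target, assuming NumPy. The standardization used (Monte Carlo mean divided by a naive standard error that treats the draws as independent) is an assumption made for illustration, not a worked-out diagnostic; for real MCMC output the standard error would need to account for autocorrelation. The function name is hypothetical.

```python
import numpy as np


def stein_checks(draws, score):
    """Monte Carlo means of the order 0, 1, 2 Stein quantities, with z-scores.

    draws: 1-d array of draws of theta.
    score: function returning the gradient of the target log density.
    """
    theta = np.asarray(draws, dtype=float)
    s = score(theta)
    quantities = {
        "order 0": s,
        "order 1": 1.0 + theta * s,
        "order 2": 2.0 * theta + theta**2 * s,
    }
    results = {}
    for name, values in quantities.items():
        mean = values.mean()
        # Naive standard error; assumes independent draws.
        se = values.std(ddof=1) / np.sqrt(values.size)
        results[name] = (float(mean), float(mean / se))
    return results


def standard_normal_score(theta):
    """Score of the standard normal target: S(theta) = -theta."""
    return -theta


rng = np.random.default_rng(0)
good = stein_checks(rng.normal(0.0, 1.0, 10_000), standard_normal_score)
# Draws from the wrong distribution (shifted and too wide).
bad = stein_checks(rng.normal(0.3, 1.5, 10_000), standard_normal_score)

for name in good:
    print(name, "z-score, correct draws: %.1f, wrong draws: %.1f"
          % (good[name][1], bad[name][1]))
```

For draws from the target, the z-scores should look like standard normal noise; for the mismatched draws they should be far from zero.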

Diving into this a bit more led me back to Jackson Gorham and Lester Mackey’s work on Stein’s method. They haven’t been sitting still since introducing the basic idea, which kernelizes the idea above. Mackey et al. have produced an absolutely wonderful summary of this body of work in two forms. The first is a dense, 41-slide deck with all the key definitions and results. I’d suggest at least skimming this first.

Lester Mackey. April 2026. Stein’s Method, Learning, and Inference. GitHub.

Mackey along with Chris Oates and Qiang Liu, who have also worked heavily in this area, put together a definitive monograph. They’ve presented a great deal of difficult material in a way that I can digest (though it’s going to be rough going if you’re not well versed in sampling and how MCMC is traditionally measured and evaluated).

Qiang Liu, Lester Mackey, Chris Oates. March 2026. Probabilistic Inference and Learning with Stein’s Method. arXiv.

In particular, they go over Stein variational inference, which seems to me like it would be the ideal way to perform quasi Monte Carlo-like inference for statistical models if we could only get a robust version to scale. The idea’s to initialize a bunch of points, then use optimization to minimize a kernelized Stein discrepancy of the empirical distribution of those points to the true distribution.
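To give a feel for the quantity being minimized, here is a hedged Python sketch that computes a kernelized Stein discrepancy for a set of points against a one-dimensional target, assuming NumPy. It uses a Gaussian (RBF) kernel with an arbitrary bandwidth h and the simple V-statistic estimate; it only evaluates the discrepancy and does not implement Stein variational inference or any method from the monograph. The function name is made up.

```python
import numpy as np


def ksd_squared(points, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy.

    points: 1-d array of points.
    score: gradient of the target log density.
    h: bandwidth of the kernel k(x, y) = exp(-(x - y)^2 / (2 h^2)).
    """
    x = np.asarray(points, dtype=float)
    s = score(x)
    d = x[:, None] - x[None, :]          # d[i, j] = x_i - x_j
    k = np.exp(-d**2 / (2.0 * h**2))
    dk_dx = -d / h**2 * k                # derivative in the first argument
    dk_dy = d / h**2 * k                 # derivative in the second argument
    d2k = (1.0 / h**2 - d**2 / h**4) * k  # mixed second derivative
    stein_kernel = (s[:, None] * s[None, :] * k
                    + s[:, None] * dk_dy
                    + s[None, :] * dk_dx
                    + d2k)
    return float(stein_kernel.mean())


def standard_normal_score(x):
    """Score of the standard normal target: S(x) = -x."""
    return -x


rng = np.random.default_rng(0)
on_target = ksd_squared(rng.normal(0.0, 1.0, 500), standard_normal_score)
off_target = ksd_squared(rng.normal(1.0, 1.0, 500), standard_normal_score)
print("draws from target:", on_target)
print("shifted draws:    ", off_target)
```

The discrepancy needs only the score of the target and the points themselves, so it can be evaluated without knowing the normalizing constant; points that match the target should give a much smaller value than the shifted ones.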