How far can exchangeability get us toward agreeing on individual probability?

This is Jessica. What’s the common assumption behind the following? 

    • Partial pooling of information over groups in hierarchical Bayesian models 
    • In causal inference of treatment effects, saying that the outcome you would get if you were treated (Y^a) shouldn’t change depending on whether you are assigned the treatment (A) or not
    • Acting as if we believe a probability is the “objective chance” of an event even if we prefer to see probability as an assignment of betting odds or degrees of belief to an event

The question is rhetorical, because the answer is in the post title. These are all examples where statistical exchangeability is important. Exchangeability says the joint distribution of a set of random variables is unaffected by the order in which they are observed. 

Exchangeability has broad implications. Lately I’ve been thinking about it as it comes up at the ML/stats intersection, where it’s critical to various methods: achieving coverage in conformal prediction, using counterfactuals in analyzing algorithmic fairness, identifying independent causal mechanisms in observational data, etc. 

This week it came up in the course I’m teaching on prediction for decision-making. A student asked whether exchangeability was of interest because often people aren’t comfortable assuming data is IID. I could see how this might seem like the case given how application-oriented papers (like those on conformal prediction) sometimes talk about the exchangeability requirement as an advantage over the usual assumption of IID data. But this misses the deeper significance, which is that it provides a kind of practical consensus between different statistical philosophies. This consensus, and the ways in which it’s ultimately limited, is the topic of this post.

Interpreting the probability of an individual event

One of the papers I’d assigned was Dawid’s “On Individual Risk,” which, as you might expect, talks about what it means to assign probability to a single event. Dawid distinguishes “groupist” interpretations of probability that depend on identifying some set of events, like the frequentist definition of probability as the limiting frequency over hypothetical replications of the event, from individualist interpretations, like a “personal probability” reflecting the beliefs of some expert about some specific event conditioned on some prior experience. For the purposes of this discussion, we can put Bayesians (subjective, objective, and pragmatic, as Bob describes them here) in the latter personalist-individualist category. 

On the surface, the frequentist treatment of probability as an “objective” quantity appears incompatible with the individualist notion of probability as a descriptor of a particular event from the perspective of the particular observer (or expert) ascribing beliefs. If you have a frequentist and a personalist thinking about the next toss of a coin, for example, you would expect the probability the personalist assigns to depend on their joint distribution over possible sequences of outcomes, while the frequentist would be content to know the limiting frequency. But de Finetti’s theorem shows that if you believe a sequence of events to be exchangeable, then your beliefs about those random variables cannot be distinguished from conceiving of independent events with some underlying probability. Given a sequence of exchangeable Bernoulli random variables X1, X2, X3, … , you can think of a draw from their joint distribution as sampling p ~ mu, then drawing X1, X2, X3, … from Bernoulli(p) (where mu is a distribution on [0,1]). So the frequentist and personalist can both agree, under exchangeability, that p is meaningful for decision making. David Spiegelhalter recently published an essay on interpreting probability that he ended by commenting on how remarkable this pragmatic consensus is.
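To make the representation concrete, here’s a minimal simulation sketch (the Beta(2,2) mixing distribution is just an arbitrary choice for illustration): generating sequences by first drawing p and then drawing i.i.d. Bernoulli(p) outcomes produces a joint distribution that is invariant to reordering, even though the outcomes are not independent.

import numpy as np

rng = np.random.default_rng(0)

def exchangeable_sequence(n, rng):
    # de Finetti's recipe: draw p ~ mu (here mu = Beta(2, 2), an arbitrary
    # illustrative choice), then X_1, ..., X_n i.i.d. Bernoulli(p).
    p = rng.beta(2, 2)
    return rng.binomial(1, p, size=n)

draws = np.array([exchangeable_sequence(3, rng) for _ in range(200_000)])

# Exchangeability: every reordering of a pattern has (about) the same probability.
for pattern in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
    print(pattern, np.mean(np.all(draws == pattern, axis=1)).round(3))

# But the X_i are not independent: observing X_1 = 1 is informative about
# the latent p, and hence about X_2.
print("P(X2=1)        ", draws[:, 1].mean().round(3))
print("P(X2=1 | X1=1) ", draws[draws[:, 0] == 1, 1].mean().round(3))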

But Dawid’s goal is to point out ways in which the apparent alignment is not as satisfactory as it may seem in resolving the philosophical chasm. It’s more like we’ve thrown a (somewhat flimsy) plank over it. Exchangeability may sometimes get us across by allowing the frequentist and personalist to coordinate in terms of actions, but we have to be careful how much weight we put on this.  

The reference set depends on the state of information

One complication is that the personalist’s willingness to assume exchangeability depends on the information they have. Dawid uses the example of trying to predict the exam score of some particular student. If they have no additional information to distinguish the target student from the rest, the personalist might be content to be given an overall limiting relative frequency p of failure across a set of students. But as soon as they learn something that makes the individual student unique, p is no longer the appropriate reference for that student’s probability of failing the exam.

As an aside, this doesn’t mean that exchangeability is only useful if we think of members of some exchangeable set as identical. There may still be practical benefits of learning from the other students in the context of a statistical model, for example. See, e.g., Andrew’s previous post on exchangeability as an assumption in hierarchical models, where he points out that assuming exchangeability doesn’t necessarily mean that you believe everything is indistinguishable, and if you have additional information distinguishing groups, you can incorporate that in your model as group-level predictors.

But for the purposes of personalists and frequentists agreeing on a reference for the probability of a specific event, the dependence on information is not ideal. Can we avoid this by making the reference set more specific? What if we’re trying to predict a particular student’s score on a particular exam in a world where that particular student is allowed to attempt the same exam as many times as they’d like? Now that the reference group refers to the particular student and particular exam, would the personalist be content to accept the limiting frequency as the probability of passing the next attempt? 

The answer is, not necessarily. This imaginary world still can’t get us to the generality we’d need for exchangeability to truly reconcile a personalist and frequentist assessment of the probability. 

Example where the limiting frequency is constructed over time

Dawid illustrates this by introducing a complicating (but not at all unrealistic) assumption: that the student’s performance on their next try on the exam will be affected by their performance on the previous tries. Now we have a situation where the limiting frequency of passing on repeated attempts is constructed over time. 

As an analogy, consider drawing balls from an urn, where when we draw our first ball, there is 1 red ball and 1 green ball in it. Upon drawing a ball, we immediately return it and add an additional ball of the same color. At each draw, each ball in the urn is equally likely to be drawn, and the sequence of colors is exchangeable.

Given that p is not known, which do you think the personalist would prefer to consider as the probability of a red ball on the first draw: the proportion of red balls currently in the urn, or the limiting frequency of drawing a red ball over the entire sequence? 

Turns out in this example, the distinction doesn’t actually matter: the personalist should just bet 0.5. So why is there still a problem in reconciling the personalist assessment with the limiting frequency?

The answer is that we now have a situation where knowledge of the dynamic aspect of the process makes it seem contradictory for the personalist to trust the limiting frequency. If they know it’s constructed over time, then on what grounds is the personalist supposed to assume the limiting frequency is the right reference for the probability on the first draw? This gets at the awkwardness of using behavior in the limit to think about individual predictions we might make.
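To see both points at once, here’s a rough simulation of the urn just described (not from Dawid’s paper, just an illustration): the personalist’s 0.5 bet on the first draw matches both the urn’s current composition and the expected limiting frequency, but the limiting frequency itself is a random quantity that only gets constructed as the sequence unfolds.

import numpy as np

rng = np.random.default_rng(1)

def urn_run(n_draws, rng):
    # Start with 1 red and 1 green ball; after each draw, return the ball
    # and add another of the same color. Record 1 for red, 0 for green.
    red, green = 1, 1
    draws = []
    for _ in range(n_draws):
        is_red = rng.random() < red / (red + green)
        draws.append(int(is_red))
        red, green = red + is_red, green + (not is_red)
    return np.array(draws)

runs = np.array([urn_run(2000, rng) for _ in range(1000)])

# The first draw is red with probability 1/2, matching the urn's current
# composition -- and also the expected value of the limiting frequency.
print("P(first draw red):", runs[:, 0].mean().round(2))

# But the limiting frequency itself is random: each run settles to a
# different value (for this urn, roughly uniform on [0, 1]).
print("long-run red frequency in five runs:", runs.mean(axis=1)[:5].round(2))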

Why this matters in the context of algorithmic decision-making

This example is related to some of my prior posts on why calibration does not satisfy everyone as a means of ensuring good decisions. The broader point in the context of the course I’m teaching is that when we’re making risk predictions (and subsequent decisions) about people, such as in deciding who to grant a loan or whether to provide some medical treatment, there is inherent ambiguity in the target quantity. Often there are expectations that the decision-maker will do their best to consider the information about that particular person and make the best decision they can. What becomes important is not so much that we can guarantee our predictions behave well as a group (e.g., calibration) but that we understand how we’re limited by the information we have and what assumptions we’re making about the reference group in an individual case. 
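As a toy illustration of that last point (everything below is invented for the example, not taken from any of the papers discussed): two risk predictors can both be calibrated while disagreeing about the same individual, because they condition on different information and so implicitly use different reference groups.

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Invented population: a binary feature x splits people into two groups
# with true risks 0.3 and 0.7.
x = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, np.where(x == 1, 0.7, 0.3))

# Predictor A ignores x and predicts the overall base rate for everyone;
# predictor B conditions on x and predicts the group-specific rate.
pred_a = np.full(n, y.mean())
pred_b = np.where(x == 1, y[x == 1].mean(), y[x == 0].mean())

def calibration(pred, y):
    # For each distinct predicted value, report the observed event rate.
    return {round(float(q), 2): round(float(y[pred == q].mean()), 2)
            for q in np.unique(pred)}

print("A:", calibration(pred_a, y))   # roughly {0.5: 0.5}
print("B:", calibration(pred_b, y))   # roughly {0.3: 0.3, 0.7: 0.7}
# Both predictors are calibrated, yet for an individual with x = 1 they
# disagree (about 0.5 vs. 0.7): the "individual probability" depends on the
# information, i.e., the reference group, used to define it.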

7 steps to junk science that can achieve worldly success

More than a decade after the earthquake that was the replication crisis (for some background, see my article with Simine Vazire, Why did it take so many decades for the behavioral sciences to develop a sense of crisis around methodology and replication?), it is frustrating to see junk science still being published, promoted, and celebrated, even within psychology, the field that was at the epicenter of the crisis.

The crisis continues

An example that I learned about recently was an article out of Harvard, Physical healing as a function of perceived time, published in 2023 and subsequently promoted in the news media, that claimed to demonstrate that healing of bruises could be sped or slowed by manipulating people’s subjective sense of time. All things are possible, and never say never, but, yeah, this paper offered no good evidence for its extraordinary claims. It was standard-issue junk science: a grabby idea, a statistically significant p-value extracted from noisy data, and big claims.

Someone pointed me to this paper, and for some reason that I can no longer remember, Nick Brown and I decided to figure out exactly what went wrong with it. We published our findings in this article, How statistical challenges and misreadings of the literature combine to produce unreplicable science: An example from psychology, which will appear in the journal Advances in Methods and Practices in Psychological Science.

In short, the published article was flawed in two important ways, first in its statistical analysis (see section 2.4 of our paper, where we write, “We are skeptical that this study reveals anything about the effect of perceived time on physical healing, for four reasons”) and second in its interpretation of its cited literature (see section 3 of our paper, where we write, “Here we discuss three different examples of this sort of misinterpretation of the literature cited in the paper under discussion”).

I don’t have any particular interest in purported mind-body healing, but Nick and I went to the trouble to shepherd our article through the publication process, with two goals in mind:
– Providing an example of how we, as outsiders, could look carefully at a research article and its references and figure out what went wrong. This is important, because it’s pretty common to see papers that make outlandish claims but seem to be supported by data and the literature.
– Exploring what exactly goes wrong–in this case, it was a mis-analysis of a complex data structure, researcher degrees of freedom in decisions of what to report, and multiple inaccurate summaries of the literature.

What does it take for junk science to be successful?

All this got me thinking about what it takes for researchers to put together a successful work of junk science in the modern era, which is the subject of today’s post.

Before going on, let me emphasize that I have no reason to suspect misconduct on the part of the authors of the paper in question. It’s a bad paper, and it’s bad science, but that happens given how people are trained, and given the track record of what gets published in leading journals (Psychological Science, PNAS), what gets rewarded in academia, and what gets publicity from NPR, Ted, Freakonomics, and the like. As we’ve discussed many times, you can do bad science without being a bad person and without committing what would usually be called research misconduct. (I actually don’t think that bad data analysis and inaccurate description of the literature would usually be put in the “research misconduct” category.)

This is also why I’m not mentioning the authors’ names here. The names are no secret–just click on the above link and the paper is right there!–I’m just not including them in this post, so as to emphasize that I’m writing here about the process of bad science and its promotion; it’s not about these particular authors (or any particular authors).

7 steps to junk science

So here they are, 7 things that allow junk science to thrive:

1. Bad statistical analysis. Statistics is hard; there are a lot of ways to make mistakes, and often these mistakes can lead to what appears to be strong evidence.

2. Researcher degrees of freedom. Garden of forking paths. As always, the problem is not with the forking paths–there really are a lot of ways to collect, code, and analyze data!–but rather with selection in what is reported. As Simmons et al. (2011) unforgettably put it, “undisclosed flexibility in data collection and analysis allows presenting anything as significant.” And, as Loken and I emphasized in our paper on forking paths, “undisclosed flexibility” could be undisclosed to the authors themselves: the problem is with data-dependent analysis choices, even if the data at hand were analyzed only once. (For a toy simulation of how this kind of selection inflates apparent evidence, see the sketch after this list.)

3. Weak or open-ended substantive theory. Theories such as evolutionary psychology, embodied cognition, and mind-body healing are vague enough to explain just about anything. As Brown and I wrote in our above-linked article, “The authors refer to ‘mind–body unity’ and ‘the importance of psychological factors in all aspects of health and wellbeing,’ and we would not want to rule out the possibility of such an effect, but no mechanisms are examined in this study, so the result seems at best speculative, even taking the data summaries at face value. During the half hour of the experimental conditions, the participants were performing various activities on the computer that could affect blood flow, and these activities were different in each condition . . . there are many alternative explanations for the results which we find just as scientifically plausible as the published claim.”

4. Inaccurate summaries of the literature. This is a big deal, a huge deal, and something we don’t talk enough about.

It’s a lot to expect the journal editors and reviewers to check citations and literature reviews. It’s your job as an author to read and understand the work you’re citing before using those papers to make unsupported claims. For example, don’t make the claim, “If a person who does not exercise weighed themselves, checked their blood pressure, took careful body measurements, wrote everything down, maintained their same diet and level of physical activity, and then repeated the same measures a month later, few would expect exercise-like improvements. But in a study involving hotel housekeepers, that is effectively what the researchers found,” if you’re citing a study that does not support this claim.

5. Institutional support. Respectable journals are willing to publish articles that make outlandish claims based on weak evidence. Respected universities give Ph.D.’s for such work. Again, I’m not suggesting malfeasance on the part of the authors; they’re just playing by the rules that they’ve learned.

6. External promotion. This work was featured in Freakonomics, Scientific American, and other podcasts and news outlets (see here and here). This external promotion has three malign effects:
– Most directly, it spreads the (inaccurate) word about the bad research.
– The publicity also provides an incentive for people to do more sloppy work that can yield these sorts of strong claims from weak evidence.
– Also, publicity for sloppy, bad science can crowd out publicity for careful, good science and reduce the incentives to do it.

7. Celebrity culture. This is a combination of items 5 and 6 above: many celebrity academic and media figures prop each other up. Some of it’s from converging interests, as when the Nudgelords presented the work of Brian Wansink as “masterpieces,” but often I think it’s more just a sense that all these media-friendly scientists and podcasters and journalists feel that they’re part of some collective project of science promotion, and from that perspective it doesn’t really matter if the science is good or bad, as long as it’s science-like, by their standards.
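Here’s the toy simulation promised under item 2, in the spirit of the Simmons et al. demonstrations (all the numbers and the particular flexibilities are invented; the point is only the qualitative pattern): even a little unreported flexibility produces “significant” findings from pure noise far more often than the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def trim(z):
    # Drop the most extreme 10% of observations (an "outlier" rule an
    # analyst might plausibly apply after seeing the data).
    return z[np.abs(z) < np.quantile(np.abs(z), 0.9)]

def one_null_study(n_per_group=50, n_outcomes=3):
    # True effect is exactly zero, but the analyst has some flexibility:
    # three outcomes, each analyzed with and without trimming "outliers."
    # Report the smallest p-value across these specifications.
    pvals = []
    for _ in range(n_outcomes):
        treat = rng.normal(size=n_per_group)
        control = rng.normal(size=n_per_group)
        pvals.append(stats.ttest_ind(treat, control).pvalue)
        pvals.append(stats.ttest_ind(trim(treat), trim(control)).pvalue)
    return min(pvals)

rate = np.mean([one_null_study() < 0.05 for _ in range(2000)])
print("share of pure-noise studies with some p < 0.05:", round(rate, 2))
# A single pre-specified analysis would give 0.05; even this modest,
# seemingly innocuous flexibility pushes the rate well above that.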

Anyway, this continues to bug the hell out of me, which is why I keep chewing on it and writing about it from different angles. I’m glad that Nick and I wrote that paper–it took some effort to track down all the details and express ourselves both clearly and carefully.

Why I like preregistration (and it’s not about p-hacking). When done right, it unifies the substance of science with the scientific method.

This came up in comments to Jessica’s recent post.

I like preregistration. It’s not something I used to do, and I still don’t always do it. I’ve worked on hundreds of research projects, and only a few of them had had any preregistration at all.

That said, I think preregistration has value, and I’m doing it more and more.

The reason I like preregistration has nothing at all to do with hypothesis tests or p-values or p-hacking or questionable research practices or anything like that.

I like preregistration for two reasons.

1. For me, preregistration implies constructing a hypothetical world–not a “null hypothesis” of no effect, but a possible world corresponding to what I’m actually aiming to study–and then simulating fake data and proposing and trying out analysis methods on those simulated data. I find this sort of commitment–the effort of laying out a complete generative model for the process–to be helpful. It means thinking about effect sizes and their variation, all sorts of things, and also seeing whether the proposed analysis can recover the parameters of interest from the simulated data, which is what’s often called power analysis, although I prefer the more general term “design analysis.” (A minimal sketch of this sort of fake-data simulation appears after the next point.)

2. When other people preregister, that can be useful because then we can see discrepancies between the original plan and what actually got reported. Two examples are here and here–in both those cases, discrepancies between the preregistration and the final paper gave us doubts about the published claims. When these changes happen, it is not a moral failure on anyone’s part–we can learn from data!–it’s just relevant for understanding the theories being promulgated in these papers.
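Here’s a minimal sketch of the fake-data simulation I described in point 1 (the assumed effect size, sample size, and model are placeholders; the point of the exercise is writing them down and seeing whether the proposed analysis can recover them):

import numpy as np

rng = np.random.default_rng(4)

# Hypothetical world: a randomized treatment with an assumed effect of 0.2
# on an outcome with residual sd 1, plus a pre-treatment covariate.
true_effect, n = 0.2, 200

def simulate_and_fit():
    x = rng.normal(size=n)               # pre-treatment covariate
    z = rng.binomial(1, 0.5, size=n)     # randomized treatment indicator
    y = 0.5 * x + true_effect * z + rng.normal(size=n)
    # Proposed analysis: least-squares regression of y on treatment and x.
    X = np.column_stack([np.ones(n), z, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    se = np.sqrt(resid @ resid / (n - 3) * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se

est, se = np.array([simulate_and_fit() for _ in range(1000)]).T
print("mean estimate:", est.mean().round(2), "(true value 0.2)")
print("sd of estimates:", est.std().round(2), "| mean reported se:", se.mean().round(2))
# If the sd of the estimates is comparable to the assumed effect, the design
# is too noisy to learn what we want -- and we know that before collecting data.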

I agree that preregistration is not necessary for good science. I still think it can be a useful tool, both in my own workflow of developing scientific hypotheses and gathering data to understand them, and in communicating that workflow to others.

Preregistration has a valuable indirect function of making it more difficult to do bad science. It does not directly turn bad science into good science. That doesn’t make preregistration a bad idea–recently I’ve been preregistering studies and, more generally, simulating data before gathering any real data–we should just be aware that this sort of procedural step can be only one small part of the story. Ultimately, science is about the substance of science, not just about the scientific method.

There’s something interesting here, though, that links the two perspectives. If you do things right, your preregistration will involve the substance of what you’re studying and will not merely be a procedural step, a form of paperwork that exists to validate the p-values that your study will produce. Rather, doing this preregistration will require simulating fake data, which in turn will require hypothesizing a full model of the underlying process.

I recognize that what I just described is not the usual thing that is meant by “preregistration,” which is more along the lines of: “We will perform this comparison and use a 2-sided test,” etc. But it could be! I think this is a useful connection.

P.S. As discussed in comments, a more precise term for what I’m recommending is fake-data simulation or simulated-data experimentation. I use the term “preregistration” above in order to connect with the many people in the science-reform movement who use that term.

“The terror among academics on the covid origins issue is like nothing we’ve ever seen before”

Michael Weissman sends along this article he wrote with a Bayesian evaluation of Covid origins probabilities. He writes:

It’s a peculiar issue to work on. The terror among academics on the covid origins issue is like nothing we’ve ever seen before.

I was surprised he was talking about “terror” . . . People sometimes send me stuff about covid origins and it all seems civil enough. I guess I’m too far out of the loop to have noticed this! That said, there have been times that I’ve been attacked for opposing some aspect of the scientific establishment, so I can believe it.

I asked Weissman to elaborate, and he shared some stories:

A couple of multidisciplinary researchers from prestigious institutions were trying to write up a submittable paper. They were leaning heavily zoonotic, at least before we talked. They said they didn’t publish because they could not get any experts to talk with them. They said they prepared formal legal papers guaranteeing confidentiality but it wasn’t enough. I guess people thought that their zoo-lean was a ruse.

The extraordinarily distinguished computational biologist Nick Patterson tells me that a prospective collaborator cancelled their collaboration because Patterson had blogged that he thought the evidence pointed to a lab leak. It is not normal for a scientist to drop an opportunity to collaborate with someone like Patterson over a disagreement on an unrelated scientific question. You can imagine the effect of that environment on younger, less established scientists.

Physicist Richard Muller at Berkeley tried asking a biology colleague about an origins-related technical issue. The colleague blew him off. Muller asked if a student or postdoc could help. No way: far too risky, it would ruin their career (see around minute 43 here).

Come to think about it, I got attacked (or, at least, misrepresented) for some of my covid-related research too; the story is here. Lots of aggressive people out there in the academic research and policy communities.

Also, to put this in the context of the onset of covid in 2020, whatever terror we have been facing by disagreeing with powerful people in academia and government is nothing compared to the terror faced by people who were exposed to this new lethal disease. Covid is now at the level of a bad flu season, so still pretty bad but much less scary than a few years ago.

Genre fiction: Some genres are cumulative and some are not.

I’ve been reading some books about the history of twentieth-century mystery novels and science fiction stories, and one thing that struck me was that each of these genres had a sense of continuity. If you used a gimmick in a story, then it was “yours,” and future authors were kind of obliged to come up with something new or, if they were going to use your idea, they were supposed to give it a new twist or else refer back to your original story. And there was a sense of progression, as the mystery puzzles became more elaborate and the science fiction scenarios became more deeply realized.

With an expectation of progression comes a fear of stagnation, and I think that one reason that genre fans talk about a “golden age” (that would be the 1930s for mysteries or the 1940s for science fiction) is the idea that you can’t keep coming up with new tricks. At some point you need to change the rules, which for mysteries included directions such as incorporating psychology and sociology (Ross Macdonald, etc.) or focusing on local color and how the world really works (John D. Macdonald, George V. Higgins, etc.), and for science fiction meant a move away from the poles of horror on one side and techno-optimism on the other. Even as the genres expanded, though, I think there remained a sense of them as cumulative, with new writers building upon what had been done in the past. You weren’t supposed to write a mystery novel or science fiction story where you ripped off some previously-published plots without adding something.

What about other genres?

Let’s start with “mundane” or “mimetic” fiction—that is, non-genre writing that follows conventions of realism. There, it’s considered ok to reuse plots, sometimes very openly as with Jane Smiley’s remake of King Lear, other times just with standard plot structures of happy families and unhappy families and affairs and business reversals and all sorts of other stories. In mimetic fiction, the plot is not the main focus, and even if the plot is a driver of the story (as with Jonathan Franzen, for example), nobody would really care if it’s taken from somewhere else.

Other genres commonly mentioned in the same breath as mystery and science fiction are romance, western, porn, and men’s adventures (war stories, etc.). I don’t know much about these genres, so maybe readers can correct me on this, but it’s my impression that the twentieth-century versions of these genres have not been cumulative in the way of mystery and science fiction. It was not expected that a romance story or a war story would need a new plot twist or a new idea. Westerns might be different just because they were so popular for a while that maybe their high profile pushed authors to come up with new twists, I’m not sure. I’m not saying these genres are static—they will change over time as readers’ expectations change—but they’re not expected to offer novelty or innovation in the way of mystery or science fiction.

Why would these other genres not be cumulative? Perhaps because they are offering different sorts of pleasures. Traditionally, mystery and science fiction stories have the form of puzzles; you’re reading them for the pleasure of trying, and often failing, to figure them out, and then maybe rereading for the pleasure of understanding how the mechanisms were put together. In contrast, mimetic fiction and genres such as romance/western/etc. are read more for their emotional impact. OK, I guess people would also read mimetic fiction as a way to learn about the world—but to learn, you don’t need a new twist in the story, you just need a clear presentation.

I’m not saying that mystery and science fiction are read merely for their puzzle aspects. Mysteries also offer the thrill of suspense and the twin satisfactions of lawbreaking and justice, science fiction has the sense of wonder, and both genres offer some form of social commentary that is gained by looking at society from a distance. I’m just saying that the puzzle is a big part of Golden Age mystery and science fiction, and this could help explain their cumulative natures.

What other genres could we consider?

There’s writing for children (recall Orwell’s classic essay on boys’ weeklies) and young adults, but with rare exceptions these books are read by a new audience every few years, so there’s no need for continuity. It’s no problem if you steal a plot idea from a twenty-year-old book that today’s kids are no longer reading.

There’s also modernist fiction, where the innovation is supposed to come in the form, not the content. You can steal an old plot but you’re supposed to present it from some new perspective, and that’s a puzzle in a different way.

So that’s how I see things as of 1950 or so (with a few forward references). What’s been happening since?

So many more books get published each year than before, and nobody can keep track of them. In the meantime, it seems that fewer people are interested in reading for the puzzle. Yes, Agatha Christie remains popular, and I’m guessing that some classic science fiction continues to sell, but I get the impression that, for a long time now, readers of mystery and science fiction aren’t looking for clever puzzles anymore, with rare exceptions such as Everyone in My Family Has Killed Someone and The Martian. Mystery and science fiction novels are now more like mystery and science fiction movies, which, again, with rare exceptions, are mostly about delivering thrills, with a side of philosophical reflection.

So, between the disappearance of the past, the diminishing interest in puzzles, and the sheer impossibility of remaining aware of all the earlier books in the genre, I’m guessing that mystery and science fiction are no longer cumulative endeavors the way they used to be. Instead of trying to top what came before, their authors are just writing books.

There’s also the economics of it all: 50 or 100 years ago, you could make an OK living churning out books of any sort, you could make a good living writing successful books, and you had an outside chance of getting rich (by the standards of the day; I’m talking “millionaire,” not “billionaire”) by plugging into the zeitgeist and writing bestsellers. Nowadays, with very rare exceptions, even successful authors don’t sell many books; many literary authors are reduced to supporting themselves through academic jobs; and pretty much the only way they’ll make real money from writing is through movie or TV contracts.

To return to the main topic of this post, the transition from a cumulative to a static literature: this happens in other fields too. Music, for example: this happens in different genres at different times, but it seems that in many genres of music there is a period where different composers and artists are feeding on each other’s work and feel the need to do something new (recall Brian Wilson’s attitude with respect to the Beatles), and a period where there is no longer the sense of cumulative building of a genre.

The theory crisis in physics compared to the replication crisis in social science: Two different opinion-field inversions that differ in some important ways

Yesterday we discussed an “opinion-field inversion” in theoretical physics: in the prestige news media and publicity complex (NPR, Ted, etc.), string theory reigns supreme; in elite physics departments, string theory is where it’s at; but then there’s a middle range of skeptics, ranging from my Columbia colleague Peter “Not Even Wrong” Woit to xkcd cartoonist Randall Munroe, who characterize string theory as an overhyped nothingburger. I guess that Woit would be cool with some elite physicists studying string theory as part of the overall portfolio of theoretical research, but he and others in this middle ground think that string theory’s profile, both in academia and in public perceptions of physics, should be much lower.

I referred to this sort of layered difference in public opinion as an opinion-field inversion, by analogy to the phenomenon of temperature inversion that is a precursor to tornadoes.

The theory crisis in physics

Before going on, let me emphasize that I know nothing of theoretical physics. I find Woit’s writing on the topic persuasive, but I’ve not tried to understand any of the debate in physics terms. I’m only discussing this from the perspective of sociology of science.

I’ll call it the theory crisis in physics, by analogy to the replication crisis in social science and medical research. The string theory thing isn’t a replication crisis—indeed, one of the main criticisms of string theory is that it does not make new testable predictions, so there’s no possibility of replication or falsification–but it’s still a crisis. I think the term “theory crisis” is about right.

Arguably the replication crisis in psychology and economics is also a theory crisis, in that the work is based on broad theories such as embodied cognition and evolutionary psychology that have major problems of the sort that they can be used to explain any possible result, but it was the failed replications that were the convincer to many people, hence the term “replication crisis” rather than “theory crisis” or “methods crisis.”

The replication crisis as another opinion-field inversion

In any case, another example of an opinion-field inversion in science, at least until recently, was woo-woo psychology such as social priming and walking speed, ovulation and voting, air rage, power pose, himmicanes, ages ending in 9, signing at the top of the form, etc. The news media and associated institutions (Ted, Freakonomics, etc.) were all-in on these things; informed scientists such as Uri Simonsohn, Anna Dreber, and various other so-called methodological terrorists were very skeptical; and the power centers at Harvard, PNAS, etc., were a mix of head-in-the-sand true believers (claiming the replication rate “is statistically indistinguishable from 100%”) and I’ve-got-mine-don’t-rock-the-boat nudgelords, who seemed to be more concerned about keeping their Henry Kissinger party invitations and positions on NPR speed-dial than in cleaning up their house.

With junk science, things have changed–more and more reporters seem to be tired of having their chains yanked by whatever Psychological Science and PNAS happen to be promoting this week–and I guess that’s partly a consequence of the opinion inversion. Shaking up the power centers might be more of a challenge.

Differences between the theory crisis in physics and the replication crisis in psychology and economics

What will happen with string theory, I don’t know. One difference is that in psychology and economics, my impression is that the people who do this sort of headline-bait are not taken very seriously at an intellectual level. They may have institutional power (hi, Robert Sternberg!) and others in their field may enjoy the reflected glow of their TV appearances, but nobody would consider them to be the brightest lights in the chandelier. In contrast, it seems that many of the physicists who work in string theory are considered to be brilliant, most notably Ed Witten—I know nothing of his work, it’s all beyond me, but he’s always described in superlatives.

So if string theory really is a hyped dead end, it’s a much sadder story than junk econ and junk psychology, which seem more like the products of ambitious careerists who, for a couple of decades, stumbled upon a way to hack the system of scientific publication and publicity at the nexus of academia and the news media.

The other twist is that even the opponents of string theory still characterize it as having some mathematical interest–their criticism is not so much that string theory is being done as that too much is being claimed for it. That’s different from research on himmicanes, air rage, beauty-and-sex-ratio, extra-sensory perception, etc., which is pretty much unmitigated crap, whose only contribution to science has been to reveal the rotten core of science as it is often practiced.

This also gives different flavors to the discussions in these two fields. With psychology and economics, the frustration is mostly external, with observers being bothered that bad work gets so much publicity, that methodological criticisms and unsuccessful replications get less notice than problematic work, etc. With physics, the frustration seems mostly internal, with people bothered that the top physics students are going into what they perceive as a dead-end world. In physics, the concern is with the misuse of human resources in the form of brilliant Ph.D. students. In psychology and econ, the concern is with the bad work giving the general public a misleading view of science and perhaps leading to bad policies (Excel error, anyone?). Nobody’s saying it’s too bad Brian Wansink and Dan Ariely got into this social priming stuff or otherwise they could’ve made major discoveries.

String theory wars: An opinion-field inversion.

From my Columbia math department colleague Peter Woit:

Brian Greene’s The Elegant Universe is being reissued today, in a 25th anniversary edition. It’s the same text as the original, with the addition of a 5 page preface and a 36 page epilogue. . . .

One thing I [Woit] was looking for in the new material was Greene’s response to the detailed criticisms of string theory that have been made by me and others such as Lee Smolin and Sabine Hossenfelder over the last 25 years. It’s there, and here it is, in full:

There is a small but vocal group of string theory detractors who, with a straight face, say things like “A long time ago you string theorists promised to have the fundamental laws of quantum gravity all wrapped up, so why aren’t you done?” or “You string theorists are now going in directions you never expected,” to which I respond, in reverse order “Well, yes, the excitement of searching into the unknown is to discover new directions” and “You must be kidding.”

As one of the “small but vocal group” I’ll just point out that this is an absurd and highly offensive straw-man argument. The arguments in quotation marks are not ones being made by string theory detractors, and the fact that he makes up this nonsense and refuses to engage with the real arguments speaks volumes.

Woit follows up in the comments section:

It’s interesting to see that the comments back up an argument I’ve often heard from physicists when I criticize books like this. They tell me “sure, the material in that book about string theory is nonsense and misleading the public, but it’s getting young people excited about physics and interested in becoming physicists. Once they do become physicists they’ll realize this is nonsense and go on to do some real physics.”

Woit’s a persuasive writer, and he, or someone, seems to have convinced Randall Munroe. I have no idea, as my physics days are long gone and I never learned any particle physics in my studies. When I worked at the Laboratory for Cosmic Ray Physics we dealt with muons and stuff like that, but nothing so theoretical as string theory or its predecessors. So I offer no comment on the technical side of all this, except to note that debates remain even in much simpler areas of quantum physics, as I am reminded whenever the discussion comes up of joint distributions in the two-slit experiment (as in section 2 of our article on holes in Bayesian statistics): some people will pop up and say I’m completely wrong, and others will agree with me. Physics is hard, and even in examples where the theory is well understood, it can be a challenge to come up with a definitive experiment, or even a definitive thought experiment, to resolve fundamental disagreements.

But the sociology-of-science part of the string theory story, that I can comment on. To start with, Woit and Greene are both in the Columbia math department. They’re not far apart in age, and they both studied physics at Harvard. But, from the quotes above, I get the impression that they can’t stand each other! Or, at least, they don’t respect each other. That’s fine—there’s no reason the mathematics faculty at Columbia should agree on everything, indeed some healthy disagreement is a good thing. Still, it’s interesting.

It’s also interesting that in some aspects of the court of scientifically-informed public opinion, Woit and the anti-stringers represent the standard or accepted view, whereas Greene and the stringers seem to have locked up the two extremes—the general public (as represented by PBS, Ted, etc.) and the academic theoretical physics community.

You know how tornadoes come from “temperature inversions”? I think of this sort of thing as an “opinion inversion”: an unstable pattern in which opinions among more informed people are different from those of the general public. In this case it’s a double inversion, with the general public and the institutional power centers at the two extremes and generally scientifically-informed people in the middle.

Muckraking at the University of Oregon

Check it out. I was amused by these posts:

From 2014: Crap-free UO homepage

From 2024: Provost Chris Long is paid $540K plus $130K per year start-up & alcohol budget

Columbia should have a crap-free homepage too! And the alcohol budget reminded me of this story from Columbia a few years ago.

UO Matters is run by economics professor Bill Harbaugh, who seems to be a busy person. Every city, town, and institution should have this sort of news outlet—or, ideally, more than one.

“The king, sir, is much better!”

Algis Budrys wrote this in 1983:

Budrys brings this up in the context of how our reading of a book is affected by whatever reviews and publicity materials we’ve seen–an interesting point which I became more aware of after going to the bookstore in France and picking out books without having a sense of what they’d be like (see the entry under Crédit Illimité in this post). In any case, the main thing is that I just like Budrys’s story and how he tells it. He was a true blogger, which I mean in the best sense of the word.

Postdoc, doctoral student, and summer intern positions, Bayesian methods, Aalto

Postdoc and doctoral student positions in developing Bayesian methods at Aalto University, Finland! This job post is by Aki (maybe someday I’ll write an actual blog post).

The positions are funded by the Finnish Center for Artificial Intelligence (FCAI), and there are many other topics, but if you specify me as the preferred supervisor then it’s going to be Bayesian methods, workflow, cross-validation, and diagnostics. See my video on Bayesian workflow, the Bayesian workflow paper, my publication list, and my talk list for more about what I’m working on.

There are also plenty of other topics and supervisors in

  1. Reinforcement learning
  2. Probabilistic methods
  3. Simulation-based inference
  4. Privacy-preserving machine learning
  5. Collaborative AI and human modeling
  6. Machine learning for science

Join us, and learn why Finland has been ranked the happiest country in the world for seven years in a row!

See how to apply at fcai.fi/winter-2025-researcher-positions-in-ai-and-machine-learning

I might also hire one summer intern (BSc or MSc level) to work on the same topics. Applications go through the Aalto Science Institute call.

I would add three words to this statement by Uri Simonsohn on preregistration

Uri writes:

Pre-registrations should only contain information that helps demarcate confirmatory vs exploratory statistical analyses (i.e., that would help a reader identify harking and p-hacking), and should generally avoid other information.

I disagree, partly because I think that confirmatory and exploratory statistical analyses are the same thing (see here), partly because I very rarely care about p-values anyway, and mostly because preregistration is super useful to me, not because of harking or p-hacking but because I think that the work we put into preregistration improves our research. See my discussion here and Jessica’s here. For still more, I refer you to my post from 2022, What’s the difference between Derek Jeter and preregistration?.

That said, I can well believe that, for Uri, the best preregistrations contain only the information that helps demarcate etc. etc.

So my suggested alteration to Uri’s above-quoted statement is just to add “for Uri Simonsohn” before the word “should.” Problem solved!

Junk science becomes more professionalized. Meanwhile, conspiracy theories are being more associated with the center-right and right, politically. How does all this fit together? I’m not sure.

A few years ago I wrote a post, Junk Science Then and Now, discussing the movement of junk science from the periphery of elite culture to the core:

The junk science (by which I mean work that has some of the forms of scientific research but is missing key elements such as valid and reliable measurements, transparency, and openness to criticism) of the mid-twentieth century came from cranks and outsiders, often self-educated people with no academic positions, and even those who were in academia were peripheral figures, for example, the ESP researcher J. B. Rhine at Duke University, who according to Wikipedia was trained as a botanist and was not a central figure in the psychology profession. Immanuel “Worlds in Collision” Velikovsky had lots of scientist friends, but he was an outsider to the scientific community. And those guys from the 1970s who wrote books about ancient astronauts and the Bermuda triangle, I don’t think they even claimed to have any scientific backing. Yes, there were some missteps within academic science from N-rays to cold fusion, but these were minor storms that blew up and went away.

Nowadays, though, the pseudoscientists are well ensconced in the academy, they play power games in the field of psychology, and they get to publish in the Proceedings of the National Academy of Sciences (air rage, himmicanes, ages ending in 9, etc etc) whenever they want. The call is coming from inside the house, as it were. Many of them are still considered by the news media to be the legitimate representatives of the scientific community. Even absolutely ridiculous ideas like the $100,000 citations. There’s also the related phenomenon of . . . not “junk science” exactly, but bad science: scientific errors that then persist because the scientific community refuses to come to terms with corrections. An example is the contagion-of-obesity story.

When discussing this, I wrote that the above-described shift represents a sort of gentrification of scientific error, mirroring the professionalism that has come into so many other aspects of our intellectual life. Instead of some wacky guy somewhere claiming to have developed a perpetual motion machine or whatever, you’ve got a Stanford professor promoting junk science on cold showers.

I thought about all this recently when reading a post by political journalist Matthew Yglesias on what he calls “the crank realignment”:

Robert F. Kennedy Jr.’s transition from semi-prominent Democrat to third party spoiler to Donald Trump endorser is emblematic of a broader, decade-long “crank realignment” in American politics.

Trump himself, of course, used to be a Democrat. He switched parties in a blaze of birther conspiracy theories, and only then came to embrace conservative views on topics like gun control and abortion. And RFK Jr. was into election fraud conspiracy theories long before January 6, but his version was about George W. Bush stealing the 2004 election in Ohio. That wasn’t a mainstream Democratic Party view (there’s a reason there was no Kerry-led insurrection), but it was mainstream enough to be published in Rolling Stone and for Kennedy to continue to be a player in progressive politics.

Twenty years later, that’s no longer the case. Democrats are much more buttoned-up, and the GOP is much more accepting of cranks and know-nothings like Kennedy.

The partisan shifts of both Trump and RFK Jr. are part of a long term cycle in which educated professionals have gravitated toward the Democratic Party coalition and a generic suspicion of institutions and the people who run them has come to be associated with conservative politics.

I think Yglesias is on to something here. I agree with him that from a logical point of view, there’s no reason why conspiracy theories should be concentrated on the right half of the political spectrum. Indeed, from a logical perspective you might expect conspiracy theories to be more popular on the left, as this would be consistent with a general leftist anti-powerful-people, anti-big-business take.

One thing I’ve noticed in the past is that commentators have been tied to the idea of anti-science leftists even when the data don’t bear that out. Here’s an example from a couple years ago, where political scientist Chris Blattman made the offhanded remark that opposition to genetically modified organisms (GMOs) was “mostly left,” even though actually opposition was about the same on the left and right. It’s a convenient story to pair anti-vaccine attitudes on the right to anti-GMO attitudes on the left and ask why can’t we all get along, but that’s not what public opinion happens to look like. I agree with Yglesias that this is kinda too bad, as it makes it harder to have a bipartisan push against anti-science.

It’s still hard for me to put all this together in my head. One challenge is that I have the impression that most of the prominent purveyors of junk science and bad science in academia are on the left, or the center-left. OK, not Dr. Oz, and maybe not that cold-shower dude at Stanford. And not those covid-minimizers. Or the climate-change denialists being promoted by Freakonomics. But the mainstream NPR/Ted/PNAS world . . . they’re mostly on the left, right? We do hear about right-wing people in science, but they get some attention because they are exceptions.

So the professionalization of bullshit—as exemplified by Gladwell’s prominence at the New Yorker, the UFO’s-as-space-aliens theories promoted by elite journalists, the Association for Psychological Science promotion of superstition, and various wacky stuff coming out of Harvard, Stanford, etc.—runs counter to the movement of conspiracy theorizing from the political fringes to the core of the political right.

I don’t know where this will all lead. It seems kind of unstable.

Treasure trove of forensic details in arXiv’s LaTeX source code

There’s gold in them thar hills*

When you submit a paper to arXiv, you send them a bundle including the LaTeX source, figures, etc. These are all available for download through the arXiv site. This morning, I was downloading the source** for the original Hoffman and Gelman no-U-turn sampler paper. If you want to follow along, here’s the arXiv link, but you have to click through to the “TeX Source” link under the “Access Paper:” header on the top right side under the banner. What I found was a treasure trove of comments that never made it to the paper, some of which I will share below.
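If you’d rather script it than click through, here’s a rough sketch of pulling down a submission’s source and fishing out the comments (the ID below is the NUTS paper’s arXiv ID, 1111.4246; note that the e-print bundle is usually a gzipped tarball, but some submissions are a single gzipped .tex file, in which case the tarfile call will fail):

import io
import tarfile
import urllib.request

arxiv_id = "1111.4246"  # Hoffman and Gelman, the no-U-turn sampler paper
url = f"https://export.arxiv.org/e-print/{arxiv_id}"
req = urllib.request.Request(url, headers={"User-Agent": "latex-comment-miner/0.1"})
raw = urllib.request.urlopen(req).read()

# Most e-print bundles are gzipped tarballs of the submitted source.
with tarfile.open(fileobj=io.BytesIO(raw), mode="r:*") as tar:
    for member in tar.getmembers():
        if not member.name.endswith(".tex"):
            continue
        text = tar.extractfile(member).read().decode("utf-8", errors="replace")
        # The buried notes live in the commented-out lines.
        for line in text.splitlines():
            if line.lstrip().startswith("%"):
                print(f"{member.name}: {line}")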

Examples

Returning to Hoffman and Gelman’s arXiv source LaTeX, what struck me was the following comment right after the algorithm itself.

%% Algorithm ?? is more efficient than algorithm
%% ??, but the policy of sampling uniformly from
%% $\cC$ leaves something to be desired. We would prefer to select an
%% element of $\cC$ that is farther away from the initial position
%% $\theta^t$, rather than face the possibility of performing many costly
%% gradient evaluations just to wind up choosing an element of $\cC$ that
%% is close to where we started. Algorithm ?? addresses
%% this issue by giving preference to points subtrees that do not include
%% the starting point $\{\theta^t, r^t\}$. [To do: explain why this is
%%   valid. Probably proof by induction is the easiest way to go.]

Neither the arXiv preprint nor the final JMLR paper has a clearly delineated inductive proof. In both versions, we get “this is equivalent to a Metropolis-Hastings kernel with proposal …, and it is straightforward to show that it obeys detailed balance” (second-to-last sentence on page 1604 of the JMLR paper).

Presumably on the principle of minimizing the surface area for reviewers to gripe about, the following useful comment from the abstract didn’t make the final cut.

%% This issue is compounded when the
%% target distribution depends on a set of parameters that cannot be
%% updated by HMC (such as discrete parameters) and are updated
%% independently of the parameters updated by HMC. 
%% In this case, optimal settings of $L$ may change from iteration to
%% iteration.

Here’s another useful comment that wound up on the cutting-room floor. I’m not saying these should all be in the paper—usually there are so many things you can add and qualify that it requires some judgement. But for the dedicated and interested reader, the paper would have been more useful with the elided comments.

%% Even if we assume that there exists some transformation of the
%% parameter space under which all parameters are i.i.d. and that this
%% transformation can be applied cheaply (i.e. in $O(D)$ time, for
%% example using a low-rank transformation matrix to avoid the $O(D^2)$
%% cost of dense matrix multiplication and the $O(D^3)$ cost of dense
%% matrix inversion), the cost of obtaining an effectively independent
%% sample using RWM is still $O(D^2)$ \citep{Creutz:1988}. Gibbs also
%% requires $O(D^2)$ operations per effectively independent sample in
%% this setting, since it must update $D$ parameters and it must perform
%% a transformation costing $O(D)$ operations after each update.

There are also useful explanations of figures that never made the final cut, like this one, which expands the diagram from the one of “naive NUTS” in the paper figures to what the paper calls “efficient NUTS.”

%% %% %% Figure ?? illustrates how an iteration of NUTS might
%% %% %% proceed once the slice and initial momentum variables have been
%% %% %% resampled. Initially (a), we have only one node. We double the size of
%% %% %% the tree to two nodes by taking a single step forward (b), and since
%% %% %% the new point is valid tentatively set $w^{t+1}$ to that new point
%% %% %% (with probability $1/1=1$). We then redouble the size of the tree to
%% %% %% four nodes, taking two steps forward (c). Only one of the two new
%% %% %% nodes is valid, so the probability of choosing a node from the new
%% %% %% half-tree is $1/2$ (the ratio of the number of valid new nodes to
%% %% %% valid old nodes). In this example, we randomly choose to stick with
%% %% %% the old value of $w^{t+1}$. Next, we again double the size of the tree
%% %% %% by taking four steps backward from $w^-$ (d). We discover that the new
%% %% %% half-tree satisfies the stopping criterion, and so we cannot select
%% %% %% any points from it. Finally, we double the tree one more time, this
%% %% %% time going forward (e). This half-tree contains some valid points and
%% %% %% does not satisfy the stopping criterion, but a subtree of it does
%% %% %% satisfy the stopping criterion, so we invalidate the points in that
%% %% %% subtree. The number of valid points in this half-tree (3) is the same
%% %% %% as the number of valid points in the old half-tree (3), so we choose a
%% %% %% point uniformly at random from the new half-tree for $w^{t+1}$. At
%% %% %% this point, the end points $w^-$ and $w^+$ satisfy the stopping
%% %% %% criterion, and we return $w^{t+1}$ as the new position-momentum pair.

The following would have been nice.

%% Also, we should probably have a scatterplot showing target versus
%% realized criteria (mean acceptance probability, mean energy change)
%% that shows that the stochastic approximation scheme pretty much works,
%% and maybe a plot showing convergence speed.

There’s more where these came from—I was just cherry picking from the algorithm, abstract, intro, and conclusion.

Who knew?

I’ve never heard anyone mention diving into the source of papers, so I wonder just what’s out there to be mined. I also wonder how many authors realize that comments in their arXiv LaTeX are forever.


* An American idiom meaning there’s value to be found from exploring in a particular place; see Wiktionary for a definition and etymology.

** I downloaded the LaTeX source of Hoffman and Gelman’s paper in order to produce a ChatGPT(o1[plus]) translation of the efficient NUTS algorithm to Python. I need to code a similar algorithm for a new sampler we’re exploring and wanted to make sure I had understood the structure of the NUTS algorithm, because it’s a very subtle recursion. GPT continues to impress!

Truth is more realistic than fiction, and what this tells us about odious thought experiments

In 2010, economist Robin Hanson gained some notoriety by writing about “gentle silent rape”:

Imagine a woman was drugged into unconsciousness and then gently raped, so that she suffered no noticeable physical harm nor any memory of the event, and the rapist tried to keep the event secret. Now drugging someone against their will is a crime, but the added rape would add greatly to the crime in the eyes of today’s law, and the added punishment for this addition would be far more than for cuckoldry. . . . A colleague of mine suggests this is gender bias, pure and simple; women seem feminist, and men chivalrous, by railing against rape, but no one looks good complaining about cuckoldry. What other explanations you got?

Hanson’s wikipedia entry contains this quote from Nate Silver from 2012:

He is clearly not a man afraid to challenge the conventional wisdom. Instead, Hanson writes a blog called Overcoming Bias, in which he presses readers to consider which cultural taboos, ideological beliefs, or misaligned incentives might constrain them from making optimal decisions.

Taking these phrases one at a time:

– “Cultural taboos” = attitudes that many people have but which you don’t share.
– “Ideological beliefs” = ideologies that many people hold that you don’t share.
– “Misaligned incentives” = incentives for people to do things that make you unhappy.
– “Optimal decision” = decisions that you approve of.

In any case, the “gentle silent rape” thing sounded like a bizarre thought experiment. But then a news item recently appeared in which it really happened: a man had drugged and raped his wife and kept it secret for decades. The result was neither gentle nor silent, which I guess might lead Hanson to say that this real-world case wasn’t an example of what he was talking about, but I would take this argument in the opposite direction and say that the real-world horror story demonstrates a problem with the thought experiment, which is that gentle silent rape isn’t really a thing—the phrase is a way of minimizing a real crime by giving it impossible modifying qualifiers.

The point of this post is not to have some sort of gotcha on Nate. Rather, it’s just a horribly vivid demonstration of a general issue with thought experiments in social science, which is that to work they should be internally coherent and also consistent with reality.

Progress in 2024 (Aki)

Here’s my 2024 progress report. Five of the 2024 publications are joint with Andrew.

The Active Statistics book is the biggest in size, but personally, getting the Pareto smoothed importance sampling paper published nine years after the first submission was a big event, too. I think I only blogged the 2023 progress report and job ads (I sometimes have blog post ideas, but as I’m a slow writer, it’s difficult to find time to turn them into actual posts). I’m very happy with the progress in 2024, but also excited about what we are going to get done in 2025!

Book

Papers published or accepted for publication in 2024

  • Yann McLatchie, Sölvi Rögnvaldsson, Frank Weber, and Aki Vehtari (2025). Advances in projection predictive inference. Statistical Science, accepted for publication. arXiv preprint arXiv:2306.15581. Software: projpred, kulprit.

  • Christopher Tosh, Philip Greengard, Ben Goodrich, Andrew Gelman, Aki Vehtari, and Daniel Hsu (2025). The piranha problem: Large effects swimming in a small pond. Notices of the American Mathematical Society, 72(1):15-25. arXiv preprint arXiv:2105.13445.

  • Kunal Ghosh, Milica Todorović, Aki Vehtari, and Patrick Rinke (2025). Active learning of molecular data for task-specific objectives. The Journal of Chemical Physics, doi:10.1063/5.0229834.

  • Charles C. Margossian, Matthew D. Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari, and Andrew Gelman (2024). Nested Rhat: Assessing the convergence of Markov chain Monte Carlo when running many short chains. Bayesian Analysis, doi:10.1214/24-BA1453. Software: posterior.

  • Yann McLatchie and Aki Vehtari (2024). Efficient estimation and correction of selection-induced bias with order statistics. Statistics and Computing, 34(132). doi:10.1007/s11222-024-10442-4.

  • Frank Weber, Änne Glass, and Aki Vehtari (2024). Projection predictive variable selection for discrete response families with finite support. Computational Statistics, doi:10.1007/s00180-024-01506-0. Software: projpred.

  • Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, and Jonah Gabry (2024). Pareto smoothed importance sampling. Journal of Machine Learning Research, 25(72):1-58. Online. Software: loo, posterior, ArviZ.

  • Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, and Jonathan H. Huggins (2024). A framework for improving the reliability of black-box variational inference. Journal of Machine Learning Research, 25(219):1-71. Online.

  • Noa Kallioinen, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari (2024). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. Statistics and Computing, 34(57). Online.
    Supplementary code.
    Software: priorsense

  • Erik Štrumbelj, Alexandre Bouchard-Côté, Jukka Corander, Andrew Gelman, Håvard Rue, Lawrence Murray, Henri Pesonen, Martyn Plummer, and Aki Vehtari (2024). Past, present, and future of software for Bayesian inference. Statistical Science, 39(1):46-61. Online.

  • Alex Cooper, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari (2024). Cross-validatory model selection for Bayesian autoregressions with exogenous regressors. Bayesian Analysis, doi:10.1214/23-BA1409.

  • Marta Kołczyńska, Paul-Christian Bürkner, Lauren Kennedy, and Aki Vehtari (2024). Trust in state institutions in Europe, 1989–2019. Survey Research Methods, 18(1). doi:10.18148/srm/2024.v18i1.8119.

  • Alex Cooper, Aki Vehtari, Catherine Forbes, Lauren Kennedy, and Dan Simpson (2024). Bayesian cross-validation by parallel Markov chain Monte Carlo. Statistics and Computing, 34:119. doi:10.1007/s11222-024-10404-w.

  • Ryoko Noda, Michael Francis Mechenich, Juha Saarinen, Aki Vehtari, Indrė Žliobaitė (2024). Predicting habitat suitability for Asian elephants in non-analog ecosystems with Bayesian models. Ecological Informatics, 82:102658. doi:10.1016/j.ecoinf.2024.102658.

  • Petrus Mikkola, Osvaldo A. Martin, Suyog Chandramouli, Marcelo Hartmann, Oriol Abril Pla, Owen Thomas, Henri Pesonen, Jukka Corander, Aki Vehtari, Samuel Kaski, Paul-Christian Bürkner, Arto Klami (2024). Prior knowledge elicitation: The past, present, and future. Bayesian Analysis, 19(49):1129-1161. doi:10.1214/23-BA1381.

arXived in 2024

  • Marvin Schmitt, Chengkun Li, Aki Vehtari, Luigi Acerbi, Paul-Christian Bürkner, and Stefan T. Radev (2024). Amortized Bayesian Workflow (Extended Abstract). arXiv preprint arXiv:2409.04332.

  • Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, Bob Carpenter, and Aki Vehtari (2024). posteriordb: Testing, benchmarking and developing Bayesian inference algorithms. arXiv preprint arXiv:2407.04967. Database and software: posteriordb

  • David Kohns, Noa Kallioinen, Yann McLatchie, and Aki Vehtari (2024). The ARR2 prior: flexible predictive prior definition for Bayesian auto-regressions. arXiv preprint arXiv:2405.19920.

  • Anna Elisabeth Riha, Nikolas Siccha, Antti Oulasvirta, and Aki Vehtari (2024). Supporting Bayesian modelling workflows with iterative filtering for multiverse analysis. arXiv preprint arXiv:2404.01688.

  • Guangzhao Cheng, Aki Vehtari, and Lu Cheng (2024). Raw signal segmentation for estimating RNA modifications and structures from Nanopore direct RNA sequencing data. bioRxiv preprint.

Software

Case studies

  • Aki Vehtari (2024). Nabiximols. Model checking and comparison, comparison of continuous and discrete models, LOO-PIT checking, calibration plots, prior sensitivity analysis, model refinement, treatment effect, effect of model mis-specification.

  • Aki Vehtari (2024). Birthdays. Workflow example for iterative building of a time series model. In 2024, added demonstration of Pathfinder for quick initial results and MCMC initialization.

FAQ

Video

This looks like an excellent new business line for Wolfram Research!

Can you catch the trick in the above letter? I was staring and staring and couldn’t figure out the scam. Yes, I get it that “Mushens and Churchill” is a fake literary agency (according to this post from Victoria Strauss, which is where I found this story, this particular scammer is taking the name of the legitimate literary agent Juliet Mushens, which is a really horrible thing to do), and they’re preying on the hopes of authors. I get that “we cannot promise the moon and the stars” is classic soft-sell. What I couldn’t figure out is what’s the motivation for the scammer. They get someone to send them their unpublished manuscript? Later they ask for money to publish the book? But that doesn’t make sense—the author is already self-publishing. (And, yes, there’s no shame in paying money to publish your own work—do you think this blog hosting comes for free?)

Strauss explains how the scam works:

Although I haven’t yet heard from anyone who has actually signed up with MCLit, and therefore don’t know what they’re charging, the fifth paragraph of the solicitation above gives away what they’re selling: an “International Literary Registration Seal and Bookstore Access Code”. Both of these are completely bogus items that scammers have invented to enable them to drain writers’ bank accounts.

Ha! I didn’t catch that at all.

Here’s another:

Strauss spells it out for us:

Story Arc Literary Groups employs an approach common to many fake literary agency scams: promising to work on commission only, with no other fees due (note especially paragraph 5, which helpfully explains that “a reputable literary agent should not charge upfront fees”). The aim of such solicitations, however, is always money, and writers who sign up with Story Arc soon discover this. In order for Story Arc to successfully pitch a book to traditional publishers, authors are told they must first “re-license” their book (a requirement that, as I’ve explained in another blog post, is completely fictional). As is typical for this type of scam, they’re referred to a “trusted” company to perform the service–in this case, an outfit called CreativeIP. The price tag: $5,000.

Ouch!

Here’s another:

Typical of fake literary agency scams, Zenith Literary is an aggressive solicitor. One writer who responded to this solicitation was told that in order to snag a traditional publisher’s interest, they needed to gather various “action items”, including “ten editorial reviews and endorsements” (hint: reviews and endorsements are nice, but they are absolutely not required by traditional publishers). To obtain these, the writer was referred to Verse Bound Solutions, a company with no apparent existence beyond a Wyoming business registration but active enough to phone the author and offer them ten book reviews for $3,000.

And another:

The author who was targeted with [a solicitation from “ImplicitPress Literary Agency”] was asked to supply a variety of necessary “documents”; note #5, which is what this scam is hoping to sell (no publisher requires or cares about a book trailer):

If you’re itching for more such stories, just go here:

What a world we live in.

P.S. In case you’re wondering about the title of this post, see here for the relevant background.

What are my goals? What are their goals? (How to prepare for that meeting.)

Corresponding with someone who had a difficult meeting coming up, where she was not sure how much to trust the person she was meeting with, I gave the following advice:

Proceed under the assumption that they want to do things right. I say this because if they’re gonna be defensive, then it doesn’t matter what you say; it’s not like you’re gonna sweet-talk them into opening up. But if they do want to do better, then maybe there is some hope.

My correspondent responded that the person she was meeting hadn’t been helpful up to this point: “I always assume (and hope for) good intentions and a desire to do better. But I’ll admit I’m feeling less positive after a few days of not getting an answer.”

I continued:

Many of these sorts of meetings require negotiation, and good negotiation often involves withholding of information or outright deception, and I’m not good at either of these things, so I don’t even try. Instead I try some of the classic “Getting to Yes” strategies:
(1) Before the meeting, I ask myself what are my goals: my short-term goals for the meeting and my medium and long-term goals that I’m aiming for.
(2) During the meeting, I explicitly ask the other parties what their goals are.

When I think of various counterproductive interactions I’ve had in the past, often it seems this has come in part because I was not clear on my goals or on the goals of the other parties; as a result we butted heads when we could’ve found a mutually-beneficial solution. I’m including here some interactions with bad actors: liars, cheats, etc. Even when working with people you can’t trust, the general principles can apply.

It does not always make sense to tell the other parties what your goals are! But, don’t worry, most people won’t ever ask, as they will typically be focused on trying to stand firm on some micro-issue or another. Kinda like how amateur poker players are notorious for looking over and over again at their own hole cards and not looking enough at you.

The above advice may seem silly because you’re not involved in a negotiation at all! Even so, if you have a sense of what your goals are and what their goals are, this could be helpful. And be careful to distinguish goals from decision options. A goal is “I would like X to happen”; a decision option is “I will do Y.” It’s natural to think in terms of decision options, but I think this is limiting, compared to thinking about goals.

Anyway, that’s just my take from a mixture of personal experience and reading on decision making; I’ve done no direct research on the topic.

The above techniques are not any sort of magic; they’re just an attempt to focus on what is important.

Echoing Eco: From the logic of stories to posterior predictive simulation

“When I put Jorge in the library I did not yet know he was the murderer. He acted on his own, so to speak. And it must not be thought that this is an ‘idealistic’ position, as if I were saying that the characters have an autonomous life and the author, in a kind of trance, makes them behave as they themselves direct him. That kind of nonsense belongs in term papers. The fact is that the characters are obliged to act according to the laws of the world in which they live. In other words, the narrator is the prisoner of his own premises.” — Umberto Eco, Postscript to The Name of the Rose (translated by William Weaver)

Perfectly put. As I wrote less poetically a few years ago, the development of a story is a working-out of possibilities, and that’s why it makes sense that authors can be surprised at how their own stories come out. In statistics jargon, the surprise we see in a story is a form of predictive check, a recognition that a scenario, if carried out logically, can lead to unexpected places.

In statistics, one reason we make predictions is to do predictive checks, to elucidate the implications of a model, in particular what it says (probabilistically) regarding observable outcomes, which can then be checked with existing or new data.
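To make this concrete, here’s a minimal sketch of a posterior predictive check in Python. It’s a toy Poisson example with conjugate posterior draws standing in for the output of a real fitted model; the data, model, and test statistic are all made up for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy "observed" data: counts that we choose to model as Poisson(lambda).
    y = rng.poisson(lam=3.0, size=50)

    # Stand-in posterior draws of lambda: with a Gamma(1, 1) prior, the
    # posterior is Gamma(shape = sum(y) + 1, rate = n + 1), i.e. scale = 1 / (n + 1).
    lambda_draws = rng.gamma(shape=y.sum() + 1, scale=1.0 / (len(y) + 1), size=1000)

    # Replicated datasets from the posterior predictive distribution, with a
    # test statistic (here, the maximum count) computed for each replication.
    t_rep = np.array([rng.poisson(lam=lam, size=len(y)).max() for lam in lambda_draws])
    t_obs = y.max()

    # If the observed statistic sits far out in the tail of the replications,
    # the model is missing something about the data.
    print("Pr(T_rep >= T_obs) =", np.mean(t_rep >= t_obs))

If the observed maximum looks like a typical draw from the replicated maxima, that aspect of the data is consistent with the model’s premises; an extreme tail probability says the premises, worked out to their implications, don’t match what we actually observed.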

To put it in storytelling terms, if you tell a story and it leads to a nonsensical conclusion, this implies there’s something wrong with your narrative logic or with your initial scenario.

Again, I really like how Eco frames the problem, reconciling the agency of the author (who is the one who comes up with the premise and the rules of the game and who works out their implications) and the apparent autonomy of the character (which is a consequence of the logic of the story).

This also connects to a discussion we had a year ago about chatbots. As I wrote at the time, a lot of what I do at work—or when blogging!—is a sort of autocomplete, where I start with some idea and work out its implications. Indeed, an important part of the writing process is to get into a flow state where the words, sentences, and paragraphs come out smoothly, and in that sense there’s no other way to do this than with some sort of autocomplete. Autocomplete isn’t everything—sometimes I need to stop and think, make plans, do some math—but it’s a lot.

Different people do autocomplete in different ways. Just restricting ourselves to bloggers here, give the same prompt to Jessica, Bob, Phil, and Lizzie, and you’ll get four very different posts; similarly, Umberto Eco’s working out of the logic of a murder in a medieval monastery will come out differently from yours. And that’s not even considering the confounding factor that we get to choose the “prompts” for our blog posts, and that Eco picked his scenario because he thought it would be fruitful ground for a philosophical novel. Writing has value, in the same way that prior or posterior simulation has value: we don’t know how things will come out any more than we can know the millionth digit of pi without doing the damn calculation.

Progress in 2024 (Jessica)

2024 was an enjoyable year. Below are a few things I did.

Conference and journal papers published

Three of my collaborators above were Ph.D. students I advised, who graduated in 2024! Congrats to Priyanka Nanayakkara (now a postdoc at Harvard CRCS), Hyeok Kim (now a postdoc at University of Washington CS, on the academic job market), and Dongping Zhang (now a research scientist at NREL). 

Talks (that are available online)

I gave several other talks that I think exist somewhere online, but I can’t find the links. 

Workshop participation/organization 

The highlights of the year for me were workshops I attended over the summer. Getting to travel to interesting places to think deeply about topics you find fascinating is truly a privilege. 

I co-organized a few other workshops I enjoyed:

My goal for next year is to do more creative writing. I would be thrilled if I could write even a couple poems I’m happy with.