What major works of literature were written after age of 85? 75? 65?!

Posted on March 25, 2026 9:55 PM by Witold Więcek

EDIT: removed a couple of false positives based on comments from people. 2/04/2026: added a GitHub repository with data for this post

This is Witold.

The other day Andrew and I were discussing whether there are major works of fiction published by authors over the age of 85. [A few years ago I wrote a post, “What’s the best novel ever written by an 85-year-old?” - AG]

Thomas Pynchon is 88 and published a new novel a few months back. I’m yet to read it, but I understand the consensus is that it’s far from his best. (Then again, people have been saying that about Pynchon for a long time, so maybe age has nothing to do with it.)

In trying to come up with some good examples I asked LLMs. It turns out that Sophocles lived into his 90s and at least two plays can be dated to his last few years. One of them is Philoctetes. And Goethe finished Faust at 82. Other than that, I did not see any great LLM suggestions. So I tried to cast the net more broadly and asked LLMs to compile a list of 10-20 writers considered canon in each decade since 1800, then identify all their notable works and years of publication. After some iterations with coding agents I got over 2,000 works by 200 authors.

It looks something like this:

[edit: you can get data at wwiecek/author_age]

They are definitely getting older with time (side question for another time: can all of the increase here be explained simply by gains in life expectancy?), but there are few points above 80.

When I checked closely, these points turned out to be mostly minor works. (EDIT: also hunted down several mistakes, as one would expect from LLMs; thanks to commenters.) Still, you can see some traces in the graph for Jules Verne, Lev Tolstoy, and Jose Saramago. Borges wrote “Shakespeare’s Memory”, a great short story—albeit you can say not very innovative—at 82/83 [thanks to Igor for this correction]

(Also interestingly, the trend in that graph keeps going up in recent years… but it looks to me like this is driven by lack of major works from young authors. It may be how my sample is constructed.)

Since I wasn’t seeing any titles that could be considered canon, I asked an LLM to narrow my dataset to major works only:

There were so few above the age of sixty-five that you can just write them down:

1874 – V. Hugo Ninety-three (72)
1947 – T. Mann Doctor Faustus (72)
1957 – B. Pasternak Doctor Zhivago (67)
1962 – K. A. Porter Ship of Fools (72)
1995 – J. Saramago Blindness (73)
2005 – C. McCarthy No Country for Old Men (72)
2006 – C. McCarthy The Road (73)
2020 – H. Mantel The Mirror and the Light (68)
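The ages in parentheses are consistent with a simple "publication year minus birth year" calculation. Here is a minimal sketch of that check. The rows are copied from the list above, but the birth years are an assumption added for illustration (they are not taken from the post's dataset), and the subtraction ignores birth month, so it can be off by one year in general.

```python
# Rows copied from the list above: (publication year, author, title, age listed in the post).
WORKS = [
    (1874, "V. Hugo", "Ninety-three", 72),
    (1947, "T. Mann", "Doctor Faustus", 72),
    (1957, "B. Pasternak", "Doctor Zhivago", 67),
    (1962, "K. A. Porter", "Ship of Fools", 72),
    (1995, "J. Saramago", "Blindness", 73),
    (2005, "C. McCarthy", "No Country for Old Men", 72),
    (2006, "C. McCarthy", "The Road", 73),
    (2020, "H. Mantel", "The Mirror and the Light", 68),
]

# Assumption: birth years supplied here for illustration, not from the post's data.
BIRTH_YEAR = {
    "V. Hugo": 1802,
    "T. Mann": 1875,
    "B. Pasternak": 1890,
    "K. A. Porter": 1890,
    "J. Saramago": 1922,
    "C. McCarthy": 1933,
    "H. Mantel": 1952,
}


def age_at_publication(author, year):
    """Approximate age: publication year minus birth year (ignores birth month)."""
    return year - BIRTH_YEAR[author]


computed_ages = [age_at_publication(author, year) for year, author, _, _ in WORKS]
listed_ages = [age for _, _, _, age in WORKS]
oldest = max(computed_ages)

for (year, author, title, _), age in zip(WORKS, computed_ages):
    print(f"{year} {author}, {title}: {age}")
```

The same subtraction could be applied to the full dataset to flag every work above a chosen age cutoff.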

I’ve read Mantel, McCarthy and Saramago from this list. “Ship of Fools” I’ve never even heard of. They’re probably great novels, but the list tops out at 73! That’s quite stark!

And it’s such a short list, too. Sure, it is very incomplete, because I started from a kind of “balanced” panel over time, not a list of all major writers… But I couldn’t think of any further additions! Asking chatbots for suggestions I’ve learned that Saul Bellow, Doris Lessing, and Don DeLillo kept writing into their 80s, but are they good books? I haven’t read them. People should put their contenders in comments.

FDA guidance on Bayesian clinical trials

Posted on January 15, 2026 11:38 AM by Witold Więcek

This is Witold.

The Food & Drug Administration just released a guidance for industry document on “Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products”:

This draft guidance provides guidance to sponsors and applicants submitting investigational new drug applications (INDs), new drug applications (NDAs), biologics licensing applications (BLAs), or supplemental applications on the appropriate use of Bayesian methods in clinical trials. Bayesian methods can be used in various ways in clinical trials. For example, Bayesian calculations can be used to govern the timing and adaptation rules for an interim analysis in an adaptive design, to inform design elements (e.g., dose selection) for subsequent clinical trials, or to support primary inference in a trial. The primary focus of this guidance is on the use of Bayesian methods to support primary inference in clinical trials intended to support the effectiveness and safety of drugs.

Find it here. It’s not a very long read, about 20 pages of pretty accessible text.

I saw several people on Twitter talking it up, with someone claiming that “FDA is Bayesian now!”. Well, apparently that level of analysis is the entry bar for punditry, so that instantly made me feel qualified.

More seriously though, I bet some of the readers will have interesting takes on this, so I thought it would be good to post it and have a discussion in the comments. For my part, I will try to give a short explanation of what such a document is and a few impressions.

First of all, what is a guidance document? The FDA issues and updates dozens of such documents every year. The first time you encounter one, it is a bit entertaining, because it opens with a Big Disclaimer stating not only that this sets out the FDA’s “thinking” and is not binding, but even going all the way down to explaining the meaning of the word “should”. Why? Well, as a gov’t agency the FDA sensibly doesn’t want to give the impression of creating a law. More broadly, however, it wants to keep its options open in what it approves and what it doesn’t.

But then why issue guidance at all? In some sense it is pharma companies that want to have their hands tied. Drug developers are very risk averse—and for a good reason. Given the scale of investment involved in clinical trials, high uncertainty, and timelines that can stretch into decades, producers just crave predictability. So instead of taking time to interact with the FDA to vet every trial decision, it’s good to have a set of rules.

By the way, each FDA division can have its own guidance. This new one deals with drugs and vaccines. The FDA released a guidance for Bayesian stats in medical device clinical trials back in 2010. This makes sense: these trials are quite different from drug or vaccine trials and people have used Bayesian techniques there for a long time, or so I’m told. I have not read that 2010 guidance (although I can report that the actual header for the doc says “baysian statistics”, sic).

Is this related to the current US gov’t and recent changes at the FDA? That’s very unlikely. (Sorry to disappoint another Twitter commenter who concluded that “Trump admin is Bayesian”.) When I blogged about this three years ago, the guidance was already scheduled to come out in 2025. Documents like this do take a long while to create.

Onto the document. As I said, I will just share a few impressions and hope that others will have more interesting comments. (Or I may also just write a second blog post on this.) Here is what jumps out scrolling through the doc:

(1) There are some basic definitions to start with, but I don’t think this matters much. It’s interesting, however, to see what type of uses the guidance lists in the second section. That probably covers the type of trials that drug and vaccine developers want to run—or that the FDA would like to see more of.

(2) The list of use cases has six categories, five of which are roughly about “borrowing”: across clinical trial phases, across subpopulations of patients, across age groups (especially extrapolating to children), or across diseases. It’s very nice that this general idea is broken down across several practical categories, with actual real-life examples given in each. Lastly, there is a slightly different category of problem, which is dose-finding, especially in cancer drugs, where you try to balance toxicity with the drug’s effect.

(3) What about “fancier” adaptive types of trials? For example, trials where you add or remove treatment or dosage arms dynamically and move seamlessly across clinical trial phases. They are already covered elsewhere and the FDA is working on a separate broader guidance on “complex innovative designs”. So I don’t think there is too much about them here, although they are covered in the discussion of priors.

(4) Then there are sections on success criteria and operating characteristics, which set out how to design the trials. The first proposed approach to design is… “Calibration to Type I Error Rate”. That does not sound encouraging! Frank Harrell had a great short blog post that gave a practical example of a trial where mixing Bayesian and frequentist reasoning can lead to a mess. So at first glance this is a bit disappointing…

(5) …but right after that the document talks about cases where the sponsor and the FDA agree not to calibrate to Type I error. There is guidance on doing things more “Bayesianly”: for example, there is even “Success Criteria Based on Benefit-Risk Assessment or Decision-Theoretic Approaches”. “A decision-theoretic approach might include assessment of the potential negative consequences of approving an ineffective drug or of not approving an effective drug.” That’s very nice! However, at the same time it’s also highly generic: the bit I just quoted is limited to a single paragraph. It’s hard for me to imagine someone just reading this and confidently concluding that they can build a trial around it.

OK, I am skimming at this point, as I think this is 1,000 words already. This will probably be a follow-up blog.

(6) Then there is a long section on priors. There are many reassuring bits, because the examples seem quite specific. There is a discussion of what should be considered in the construction of priors. There is a discussion of “dynamic discounting”. Pretty complex and detailed. But then in talking about borrowing, the guidance says: “Areas where informative priors have been most often proposed include pediatrics and rare diseases. Additional areas can be considered on a case-by-case basis, and FDA advises early discussion of such proposals with the Agency.” Again, if this is on a “case-by-case” basis, apart from two areas where we already are Bayesian, then the guidance does not seem very helpful.
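To make the “Calibration to Type I Error Rate” idea from point (4) concrete, here is a minimal sketch under strong simplifying assumptions: a single normally distributed effect estimate with known standard error, a conjugate normal prior, and success declared when the posterior probability of a positive effect exceeds a threshold. All function names and numbers are hypothetical and are not taken from the guidance. The sketch shows that an optimistic informative prior with the familiar 0.975 threshold inflates the frequentist Type I error, and that “calibrating” means raising the threshold until the error rate is back at the target.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior_sd(prior_sd, se):
    """Posterior SD of the effect under a normal prior and a normal likelihood."""
    return (1 / (1 / prior_sd**2 + 1 / se**2)) ** 0.5


def prob_effect_positive(ybar, prior_mean, prior_sd, se):
    """Posterior probability that the effect is above zero, given estimate ybar."""
    post_sd = posterior_sd(prior_sd, se)
    post_mean = post_sd**2 * (prior_mean / prior_sd**2 + ybar / se**2)
    return STD_NORMAL.cdf(post_mean / post_sd)


def type_one_error(threshold, prior_mean, prior_sd, se):
    """P(success | true effect = 0), where success is P(effect > 0 | data) > threshold."""
    post_sd = posterior_sd(prior_sd, se)
    z_c = STD_NORMAL.inv_cdf(threshold)
    # Smallest observed estimate that still clears the posterior threshold.
    critical_ybar = se**2 * (z_c / post_sd - prior_mean / prior_sd**2)
    # Under the null the estimate is distributed N(0, se^2).
    return 1 - STD_NORMAL.cdf(critical_ybar / se)


def calibrated_threshold(alpha, prior_mean, prior_sd, se):
    """Posterior threshold that makes the Type I error exactly alpha."""
    post_sd = posterior_sd(prior_sd, se)
    z_c = post_sd * (STD_NORMAL.inv_cdf(1 - alpha) / se + prior_mean / prior_sd**2)
    return STD_NORMAL.cdf(z_c)


# Hypothetical design: optimistic prior N(0.2, 0.2^2), standard error 0.1.
PRIOR_MEAN, PRIOR_SD, SE = 0.2, 0.2, 0.1
naive_error = type_one_error(0.975, PRIOR_MEAN, PRIOR_SD, SE)
threshold = calibrated_threshold(0.025, PRIOR_MEAN, PRIOR_SD, SE)
calibrated_error = type_one_error(threshold, PRIOR_MEAN, PRIOR_SD, SE)
print(naive_error, threshold, calibrated_error)
```

Note what calibration does in this toy setting: the more the prior helps, the higher the bar is raised, which is one way to see why mixing the two modes of reasoning can feel awkward.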

So… is the FDA Bayesian now? Well, the real question should be: does a guidance document give trial designers enough confidence to run Bayesian clinical trials in situations where there is a rationale to run a Bayesian trial? Time will tell. For now there is probably a great “the guidance is directionally correct, but is it significant?” joke in there somewhere.

GiveWell’s Change Our Mind contest, cost-effectiveness, and water quality interventions

Posted on April 8, 2023 6:13 AM by Witold Więcek

Some time ago I wrote about a new meta-analysis pre-print where we estimated that providing safe drinking water led to a 30% mean reduction in deaths in children under 5, based on data from 15 RCTs. Today I want to write about water, but from the perspective of cost-effectiveness analyses (CEA).

A few months ago GiveWell (GW), a major effective altruism charity, hosted a Change Our Mind contest. Its purpose was to critique and improve on GW’s process/recommendations on how to allocate funding. This type of contest is obviously a fantastic idea (if you’re distributing tens of millions of dollars to charitable causes, even a fraction of a percent improvement in the efficiency of your giving is worth paying good money for) and GW also provided pretty generous rewards for the top entries. There were two winners and I think both of them are worth blogging about:

1. Noah Haber’s “GiveWell’s Uncertainty Problem”
2. An examination of cost-effectiveness of water quality interventions by Matthew Romer and Paul Romer Present (MRPRP henceforth)

I will post separately on the uncertainty analysis by Haber sometime soon, but today I want to write a bit on MRPRP’s analysis.

As I wrote last time, back in April 2022 GW recommended a grant of $65 million for clean water, in a “major update” to their earlier assessment. The decision was based on a pretty comprehensive analysis by GW, which estimated the cost-benefit of specific interventions aimed at improving water quality in specific countries.[1] (Scroll down for footnotes. Also, I’m flattered to say that they cited our meta-analysis as a motivation for updating their assessment.) MRPRP redo GW’s analysis and find effects that are 10-20% smaller in some cases. This is still highly cost-effective, but (per the logic I already mentioned) even small differences in cost-effectiveness will have large real-world implications for funding, given that the funding gap for provision of safe drinking water is calculated in hundreds of millions of dollars.

However, my intention is not to argue what the right number should be. I’m just wondering about one question that this kind of cost-effectiveness analysis raises, which is how to combine different sources of evidence.

When trying to estimate how clean water reduces mortality in children, we can estimate these reductions due to clean water either by looking at direct experimental evidence (e.g. in our meta-analysis) or indirectly: first you look at the estimates of reductions in disease (diarrhea episodes), then at evidence on how it links to mortality. The direct approach is the ideal (it is the ultimate outcome we care about; it is objectively measured and clearly defined, unlike diarrhea), but deaths are rare. That is why researchers studying water RCTs historically focused on reductions in diarrhea and often chose not to capture/report deaths. So we have many more studies of diarrhea.

Let’s say you go the indirect evidence route. To obtain an estimate, we need to know or make assumptions on (1) the extent of self-reporting bias (e.g. “courtesy” bias), (2) how many diseases can be affected by clean water, and (3) the potentially larger effect of clean water on severe cases (leading to death) than on “any” diarrhea. Each of these is obviously hard. The direct evidence model (meta-analysis of deaths) doesn’t require any of these steps.

And once we have the two estimates (indirect and direct), then what? I describe GW’s process in the footnotes (I personally think it’s not great but want to keep this snappy).[2] Suffice it to say that they use the indirect evidence to derive a “plausibility cap”, the maximum size of the effect they are willing to admit into the CEA. MRPRP do it differently, by putting distributions on parameters in the direct and indirect models and then running both in Stan to arrive at a combined, inverse-variance weighted estimate.[3] For example, for point (2) above (which diseases are affected by clean water), they look at a range of scenarios and put a Gaussian distribution with a mean at the most probable scenario and the most optimistic scenario being 2 SDs away. MRPRP acknowledge that this is an arbitrary choice.

A priori a model-averaging approach seems obviously better than taking a model and imposing an arbitrary truncation (like in GW’s old analysis). However, depending on how you weight the direct vs indirect evidence models, you can get a ~50% reduction or a ~40% increase in the estimated benefits compared to GW’s previous analysis; a more extensive numerical example is below.[4] So you want to be very careful in how you weight! E.g. for one of the programs MRPRP’s estimate of benefits is ~20% lower than GW’s, because in their model 3/4 of the weight is put on the (lower variance) indirect evidence model and it dominates the result.
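The weighting arithmetic can be sketched with the Kenya numbers from footnote [4] (8% direct, 3.5% indirect, 4.6% combined). This is a simplified illustration, not MRPRP’s Stan model: it treats each model as a single Gaussian estimate and pools them by inverse variance. The standard deviations below are hypothetical, chosen only so that the variance ratio is 3:1, which is what a 3/4 weight on the indirect model implies.

```python
def inverse_variance_pool(estimates, sds):
    """Pool estimates with weights proportional to 1 / variance."""
    weights = [1 / sd**2 for sd in sds]
    total = sum(weights)
    shares = [w / total for w in weights]
    pooled = sum(share * est for share, est in zip(shares, estimates))
    return pooled, shares


def implied_weight_on_second(first, second, pooled):
    """Weight on `second` that makes a two-model weighted mean equal `pooled`."""
    return (first - pooled) / (first - second)


# Mortality reductions in percent, taken from footnote [4].
DIRECT, INDIRECT, REPORTED_POOLED = 8.0, 3.5, 4.6

# Back out the weight on the indirect model implied by the reported numbers.
w_indirect = implied_weight_on_second(DIRECT, INDIRECT, REPORTED_POOLED)

# Hypothetical SDs: only their ratio matters. A direct-model SD that is
# sqrt(3) times the indirect-model SD puts 3/4 of the weight on indirect.
pooled, shares = inverse_variance_pool([DIRECT, INDIRECT], [3**0.5, 1.0])

print(round(w_indirect, 3), round(pooled, 3), [round(s, 2) for s in shares])
```

The point of the exercise is that the pooled answer is driven entirely by the relative uncertainties: shrink the stated uncertainty on the direct model and the result moves back toward 8%.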

In the long term the answer is to collect more data on mortality. In the short term probabilistically combining several models makes sense. However, putting 75% weight on a model of indirect evidence rather than the one with a directly measured outcome strikes me as a very strong assumption and the opposite of my intuition. (Maybe I’m biased?) Similarly, why would you use Gaussians as a default model for encoding beliefs (e.g. in share of deaths averted)? I had a look at using different families of distributions in Stan and got quite different results. (If you want to follow the details, my notes are here.)

More generally, when averaging over two models that are somewhat hard to compare, how should we think about model uncertainty? I think it would be a good idea in principle to penalise both models, because there are many unknown unknowns in water interventions. So they’re both overconfident! But how to make this penalty “fair” across two different types of models, when they vary in complexity and assumptions?

I’ll stop here for now, because this blog is already a bit long. Perhaps this will be of interest to some of you.

Footnotes:

[1] There are many benefits of clean water interventions that a decision maker should consider (and the GW/MRPRP analyses do): in addition to reductions in deaths there are also medical costs, developmental effects, and reductions in disease. For this post I am only concerned with how to model reductions in deaths.

[2] GW’s process is, roughly, as follows: (1) Meta-analyse data from mortality studies, take a point estimate, and adjust it for internal and external validity to make it specific to the contexts where they want to consider their program (e.g. baseline mortality, predicted take-up etc.). (2) Using indirect evidence, hypothesise what the maximum impact on mortality is (“plausibility cap”). (3) If the benefits from direct evidence exceed the cap, set benefits to the cap’s value. Otherwise use the direct evidence.

[3] By the way, as far as I saw, neither model accounts for the fact that some of our evidence on mortality and diarrhea comes from the same sources. This is obviously a problem, but I ignore it here, because it’s not related to the core argument.

[4] To illustrate with numbers, I will use GW’s analysis of Kenya Dispensers for Safe Water (a particular method of chlorination at water source), one of several programs they consider. (The impact of using the MRPRP approach on other programs analysed by GiveWell is much smaller.) In GW’s analysis, the direct evidence model gave a 6.1% mortality reduction, but the plausibility cap was 5.6%, so they set it to 5.6%. Under the MRPRP model, the direct evidence suggests about an 8% reduction, compared to 3.5% in the indirect evidence model. The unweighted mean of the two would be 5.75%, but because of the higher uncertainty on the direct effect the final (inverse-variance weighted) estimate is a 4.6% reduction. That corresponds to putting 3/4 of the weight on indirect evidence. If we applied the “plausibility cap” logic to the MRPRP estimates, rather than weighting two models, the estimated reduction in mortality for the Kenya DSW program would be 8% rather than 4.6%, a whopping 40% increase on GW’s original estimate.
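The capping step from footnote [2] and the percentage comparisons in footnote [4] amount to a few lines of arithmetic. A minimal sketch, using only the numbers quoted in the footnotes (the function names are made up for illustration):

```python
def apply_plausibility_cap(direct_estimate, cap):
    """GW-style rule: use the direct estimate unless it exceeds the cap."""
    return min(direct_estimate, cap)


def relative_change(new, old):
    """Change of `new` relative to `old`, as a fraction."""
    return (new - old) / old


# Mortality reductions in percent, from footnote [4].
gw_final = apply_plausibility_cap(6.1, 5.6)        # GW's direct estimate vs. cap
unweighted_mean = (8.0 + 3.5) / 2                  # MRPRP direct and indirect
weighted_vs_gw = relative_change(4.6, gw_final)    # MRPRP weighted vs. GW
direct_vs_gw = relative_change(8.0, gw_final)      # MRPRP direct vs. GW

print(gw_final, unweighted_mean, round(weighted_vs_gw, 3), round(direct_vs_gw, 3))
```

The two relative changes correspond to the “~20% lower” and “~40% increase” figures quoted in the main text.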

Water Treatment and Child Mortality: A Meta-analysis and Cost-effectiveness Analysis

Posted on January 25, 2023 5:14 AM by Witold Więcek

This post is from Witold.

I thought some of you may find this pre-print (that I am a co-author of) interesting. It’s a meta-analysis of improving water quality in low and middle income countries. We estimated that this reduced the odds of child mortality by 30%, based on 15 RCTs. That’s obviously a lot! If true, this would have very large real-world implications, but there are of course statistical considerations of power, publication bias etc. So I thought that maybe some of the readers will have methodological comments while others may be interested in the public health aspect of it. It also ties to a couple of follow-up posts I’d like to write here on effective altruism and finding cost-effective interventions.

First, a word on why this is an important topic. Globally, for each thousand births, 37 children will die before the age of 5. Thankfully, this is already half of what it was in 2000. But it’s still about 5 million deaths per year. One of the leading causes of death in children is diarrhea, caused by waterborne diseases. While chlorinating [1, scroll down for footnotes] water is easy, inexpensive, and proven to remove pathogens from water, there are many countries where most people still don’t have access to clean water (the oft-cited statistic is that 2 billion people don’t have access to safe drinking water).

What is the magnitude of impact of clean water on mortality? There is a lot of experimental evidence for reductions in diarrhea, but making a link between clean water and mortality requires either an additional, “indirect”, model connecting disease to deaths, which is hard [2], or directly measuring deaths, which are rare (hence also hard) [3].

In our pre-print [4], together with my colleagues Michael Kremer, Steve Luby, Ricardo Maertens, and Brandon Tan we identify 53 RCTs of water quality treatments. Contacting the authors of each study resulted in 15 estimates that could be meta-analysed, with about 25,000 children. (Why only 15 out of 53? Apparently because the studies were not powered for mortality, with each one of them contributing just a handful of deaths, in some cases the authors decided to not collect, retain or report deaths.) As far as we are aware, this is the first attempt to meta-analyse experimental evidence on mortality and water quality.

We conduct a Bayesian meta-analysis of these 15 studies using a logit model and find a 30% reduction in odds of all-cause mortality (OR = 0.70, with a 95% interval 0.49 to 0.93), albeit with high (and uncertain) heterogeneity across studies, which means the predictive distribution for a new study has a much wider interval and slightly higher mean (OR=0.75, 95% interval 0.29 to 1.50). This heterogeneity is to be expected because we compare different types of interventions in different populations, across a few decades.[5] (Typically we would want to address this with a meta-regression, but that is hard due to a small sample.)
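For readers who want the structure spelled out, a generic random-effects logit model for this kind of data looks as follows. This is a textbook form written down as an assumption; the pre-print’s exact parameterisation and priors may differ. Here $d_{ij}$ is the number of deaths and $n_{ij}$ the number of children in arm $j$ of study $i$.

```latex
\begin{aligned}
d_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}), \qquad j \in \{\text{control}, \text{treatment}\},\\
\operatorname{logit}(p_{i,\text{control}}) &= \beta_i,\\
\operatorname{logit}(p_{i,\text{treatment}}) &= \beta_i + \theta_i,\\
\theta_i &\sim \mathcal{N}(\mu,\, \tau^2),\\
\mathrm{OR}_{\text{mean}} &= e^{\mu}, \qquad \theta_{\text{new}} \sim \mathcal{N}(\mu,\, \tau^2).
\end{aligned}
```

In this notation the headline OR is a summary of $e^{\mu}$, while the predictive OR for a new study is a summary of $e^{\theta_{\text{new}}}$, which also carries the between-study spread $\tau$. That is why its interval is much wider. Its mean can also sit somewhat higher, since for fixed $\mu$ and $\tau$ the mean of $e^{\theta_{\text{new}}}$ is $e^{\mu + \tau^2/2}$.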

The whole analysis is implemented in baggr, an R package that provides a meta-analysis interface for Stan. There are some interesting methodological questions related to modeling of rare events, but repeating this analysis using frequentist methods (a random-effects model on Peto ORs has a mean OR of 0.72) as well as the various sensitivity analyses we could think of all lead to similar results. We also think that publication bias is unlikely. Still, perhaps there are things we missed.

Based on this we calculate about $3,000 cost per child death averted, or under $40 per DALY. It’s hard to convey how extremely cost-effective this is (a typical cost-effectiveness threshold is the equivalent of one year’s GDP per DALY; this is reached at a 0.6% reduction in mortality), but basically it is on par with the most cost-effective child health interventions such as vaccinations.

Since the cost-effectiveness is potentially so high, there are obviously big real-world implications. Some funders have been reacting to the new evidence already. For example, some months ago GiveWell, an effective altruism non-profit that many readers will already be familiar with, conducted their own analysis of water quality interventions and in a “major update” of their assessment recommended a grant of $65 million toward a particular chlorination implementation [6]. (GiveWell’s assessment is an interesting topic for a blog post of its own, so I hope to write about it separately in the next few days.)

Of course in the longer term more RCTs will contribute to the precision of this estimate (several are being worked on already), but generating evidence is a slow and costly process. In the short term the funding decisions will be driven by the existing evidence (and our paper is still a pre-print), so it would be fantastic to see if readers have comments on the methods and their real-world implications.

Footnotes:

[1] For simplicity I say “chlorination” but this may refer to chlorinating at home, at the point from which water is drawn, or even using a device in the pipe, if households have piped water which may be contaminated. Each of these will have different effectiveness (primarily due to how convenient it is to use) and costs. So differentiating between them is very important for a policy maker. But in this post I group all of this to keep things simple. There are also other methods of improving quality, e.g. filtration. If you’re interested, this is covered in more detail in the meta-analyses that I link to.

[2] Why is extrapolating from evidence on diarrhea into mortality hard? First, it is possible that reduction in severe disease is higher (in the same way that vaccine may not protect you from infection, but it will almost definitely protect you from dying). Second, clean water also has lots of other benefits, e.g. it likely makes children less susceptible to other infections, nutritional deficiencies, and also makes their mothers healthier (which could in turn lead to fewer deaths during birth). So while these are just hypotheses, it’s hard a priori to say how a reduction in diarrhea would translate to a reduction in mortality.

[3] If you’re aiming for 80% power to detect a 10% reduction in mortality you will need RCT data on tens of thousands of children. The exact number of course depends on how high the baseline mortality rate is in the studies.

[4] Or, to be precise, an update to a version of this pre-print which we released in February 2022. If you happened to read the previous version of the paper, both main methods and results are unchanged, but we added extra publication bias checks, characterization of the sample and rewrote most of the paper for clarity.

[5] That last aspect of heterogeneity seems important, because some have argued that the impact of clean water may diminish with time. There is a trace of that in our data (see supplement), but with 15 studies the power to test for this time trend is very low (which I show using a simulation approach).

[6] GiveWell’s analysis included their own meta-analysis and led to more conservative estimates of mortality reductions. As I mention at the end of this post, this is something I will try to blog about separately. Their grant will fund Dispensers for Safe Water, an intervention which gives people access to chlorine at the water source. GW’s analysis also suggested a much larger funding gap in water quality interventions, of about $350 million per year.
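The “tens of thousands of children” claim in footnote [3] can be sanity-checked with the standard normal-approximation sample size formula for comparing two proportions. This is a rough sketch under stated assumptions: individual randomisation (no clustering, which would push the number up), a two-sided 5% test, and a baseline mortality of 3.7%, borrowed from the global under-5 figure quoted in the post purely for illustration. Mortality over the follow-up period of an actual trial would typically differ.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Children needed per arm to detect a relative reduction in mortality.

    Normal-approximation formula for two independent proportions,
    two-sided test at level alpha.
    """
    p_treat = p_control * (1 - relative_reduction)
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil(z_total**2 * variance_sum / (p_control - p_treat) ** 2)


# Assumed baseline mortality of 3.7% and a 10% relative reduction.
total_children = 2 * n_per_arm(0.037, 0.10)
print(total_children)

# A higher baseline mortality needs fewer children for the same relative effect.
total_high_baseline = 2 * n_per_arm(0.10, 0.10)
print(total_high_baseline)
```

Under these assumptions the total lands in the tens of thousands, and it shrinks as baseline mortality rises, in line with the footnote.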

Statistical Modeling, Causal Inference, and Social Science

Author Archives: Witold Więcek
