Remembering David MacKay

Pilgrim Beart informs us that the Cambridge Philosophical Society is organizing a meeting on Energy and Information to mark the 10th anniversary of David MacKay’s untimely death. Beart asked me to share any tribute to MacKay, and I sent him this:

In his classic 2003 book on information theory, David wrote that, “in many problems, we really only need about twelve independent samples.” He explains: “Now, how accurately would a manager like to know [a parameter] Φ? I would suggest there is little point in knowing Φ to a precision finer than about σ/3. After all, the true cost is likely to differ by ±σ from Φ. If we obtain R = 12 independent samples from P(x), we can estimate Φ to a precision of σ/√12 – which is smaller than σ/3. So twelve samples suffice.”

This led me to wonder . . . where did the “12” come from in MacKay’s passage? When I first saw the number there, I assumed it would have something to do with the variance of the uniform distribution. But that doesn’t seem to come up at all. Accepting MacKay’s stipulation that σ/3 would be enough precision in this sort of example, shouldn’t he have said that 9 random draws would suffice? I could see him rounding that up to 10. Or even 16, if he wanted to get all base-2 on us. But why 12? I didn’t get this at all.
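The arithmetic behind that question is easy to check. Here is a minimal sketch, assuming the usual standard error σ/√R for the mean of R independent draws; the function name `standard_error` and the list of candidate values of R are just for illustration.

```python
import math

def standard_error(sigma: float, r: int) -> float:
    """Standard error of the mean of r independent draws, each with sd sigma."""
    return sigma / math.sqrt(r)

sigma = 1.0          # work in units of sigma
target = sigma / 3   # the precision MacKay suggests is enough

for r in (9, 10, 12, 16):
    se = standard_error(sigma, r)
    meets = se <= target + 1e-12  # small tolerance for floating-point equality at R = 9
    print(f"R = {r:2d}: se = {se:.3f} sigma, within sigma/3: {meets}")
```

Under these assumptions R = 9 hits σ/3 exactly, and 10, 12, and 16 all clear it with room to spare, which is why the choice of 12 looked arbitrary.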

So I asked David. Here was his reply: “You said that [you] can imagine rounding up 9 to 10 – which would be elegant if we worked in base 10. But in the UK we haven’t switched to base 10 yet, we still work in dozens and grosses. . . . Probably in an earlier draft of the book in 2001 I said ‘a dozen’, rather than ‘12’. Then some feedbacker may have written and said ‘I don’t know what a dozen is’; so then I sacrificed elegant language and replaced ‘dozen’ by ‘12’, which leads to your mystification.”

This reminds me that, when I was a kid, my dad told me that in Britain there was a Duodecimal Society, devoted to promoting base 12. This was told to me in a “There will always be an England” sort of way. In writing this note, I was curious so I looked it up on the web, and it turns out that there is such a society, but it was first established in the United States. So there! I guess David was more American than he realized.

I thought that would be better than ranting about why I agree with Radford Neal and disagree with David MacKay about Occam’s razor (although David claimed I actually agreed with him). I guess I can send them that story in the next commemoration, a decade from now.

It’s actually a good story in that I thought we disagreed, but David disagreed with me on whether there was a disagreement.

David MacKay was a smart, generous, and committed person. It’s good to remember that such people exist: not every important person in science is like this guy or these guys or these guys or these guys. The greedy people who did all those things often seem to feel like their behavior is ok because everybody does it. David MacKay is a reminder that, no, better things are possible.

A link to the conference is here. There’s also a form where you can share your memories of him.

Keith O’Rourke’s final published paper: “Statistics as a social activity: Attitudes toward amalgamating evidence”

Keith O’Rourke passed away two years ago. Here’s his obituary, which was sent to me by Bart Harvey:

Lover of earth, wood, water and fire, Keith left us after a brief illness on November 27, 2022. Born to Evelyn and Frank O’Rourke, he was the second of four sons. He met Marlene and they shared their life together for 39 years.

Keith worked in landscaping throughout his undergraduate years at the University of Toronto, then at Moore Business Forms, and in the western and northern provinces and territories in the field of compressed air before returning to UofT to complete an MBA and undertake an MSc. For many years he worked as a biostatistician at the Toronto General Hospital and the Ottawa Hospital on numerous studies in the fields of cancer, diabetes, SARS and infectious diseases research before completing a PhD at Oxford University in 2004. Having worked at Duke University, McGill and Queen’s, he joined Health Canada in the late 2000’s as a biostatistician, initially in health care and then in pesticide management. A conscientious intellectual and deep-thinker, Keith endeavoured to make our world a better and safer place.

Known to enjoy the occasional three fingers of Glenfiddich or an IPA, Keith pondered the mysteries of health and safety while taking long walks in nature, nurturing many maple, birch and oak saplings, cutting up downed trees at our cottage in the Lanark Highlands, building bonfires and stoking every wood stove he had access to. A black belt in Kung Fu, and an assistant coach in boxing at Oxford, he was known to jump in the ring and spar with up-and-coming kick-boxers into his mid-sixties, until Covid ended access to the ring.

We had an unpublished project together which we had not touched since 2016. I recently revised the paper, and it will be published. The article was originally titled, “Attitudes toward amalgamating evidence in statistics”; the final version is called, “Statistics as a social activity: Attitudes toward amalgamating evidence,” and here’s the abstract:

Amalgamation of evidence in statistics is done in several ways. Within a study, multiple observations are combined by averaging or as factors in a likelihood or prediction algorithm. In multilevel modeling or Bayesian analysis, population or prior information are combined with data using the weighted averaging derived from probability modeling. In a scientific research project, inferences from data analysis are interpreted in light of mechanistic models and substantive theories. Within a scholarly or applied research community, data and conclusions from separate laboratories are amalgamated through a series of steps including peer review, meta-analysis, review articles, and replication studies.

These issues have been discussed for many years in the philosophy of science and statistics, gaining attention in recent decades first with the renewed popularity of Bayesian inference and then with concerns about the replication crisis in science. In this article, we review amalgamation of statistical evidence from different perspectives, connecting the foundations of statistics to the social processes of validation, criticism, and consensus building.

I’m pretty sure that this will be Keith’s final published article. I very much regretted not having him around when revising the paper; we would’ve had lots to talk about.

Dave Krantz

“One of the things that makes scientific research hard is that one is usually not sure what hat one should be wearing in the given situation.” — David H. Krantz, 1938-2023

It’s been a bunch of years since I talked or corresponded with Dave Krantz—a quick check of my email reveals one email from 2013, an exchange from 2012, one item from 2011, one item from 2010, one from 2009, one from 2008, and, before that I guess he was still at Columbia and I was talking to him regularly in person.

After he passed away, I was asked to contribute something to his obituary. This is what I sent:

Dave had deep knowledge about everything. I treasure the many long conversations we had at Columbia. Just about every day something comes up in my teaching or research for which I’d love to have Dave’s perspective. Fortunately, he wrote well and had long conversations with many people, so his ideas and perspectives will remain with us.

I was surprised when searching online to not find any academic memoirs or obituaries of Dave—maybe the Society for Judgment and Decision Making is preparing something? To help fill the gap, I thought I’d post some of the last emails I received from Dave, as they give a sense of his depth and range.

The last email came on 11 Jul 2013. I’d sent Dave an email discussion I’d had with Dan Kahan—hey, that’s another person I haven’t heard from in many years!—regarding hypothesis testing. In one of those messages, I’d written this to Dan:

Regarding your [Dan’s] later point about evaluating competing hypotheses, that’s another (interesting) story. Dave Krantz once was telling me his research regarding how people think of evidence comparing hyps A and B. His point was that there are 4 types of information:
– supporting A
– supporting B
– opposing A
– opposing B
In usual likelihood-ratio or Bayes theory, all 4 sorts of evidence can be placed on a common scale, but in his expts people treated them differently. I think one example was a comparison between two settings: (a) weak evidence in favor of A, weak evidence in favor of B; (b) strong evidence against A, strong evidence against B. This was in a situation where the truth really had to be A or B, there was no “other” option. At least, that’s how I remember it. Anyway, I think Dave told me that people had much different reactions to settings (a) and (b). Which I believe.

And this was Dave’s reply to me (addressing the entire thread, not just my snippet above):

Thanks for sharing this interesting discussion with me. I have a few comments.

First, I do of course agree with Kahan’s point about multiple working hypotheses and good experimental design. John Platt’s Science paper on “strong inference” used to be the standard discussion of this point; but I think it has been forgotten by now …

Nonetheless, there are situations where one uses evidence to reject a hypothesis even though it does not support any alternative. In lab experiments, the best examples are those in which a study is designed to discriminate two or more scientifically interesting theories but the data reject all of them strongly. One usually concludes that something has gone seriously wrong with method or with modeling. “Something gone wrong” is a diffuse alternative, which was not originally under consideration in the design and which does not lend itself to generating conditional probabilities of the observations, given the hypothesis. Sometimes one is led a step farther: something is wrong and THAT is really interesting — new hypotheses, never before considered, are generated. It is the use of evidence to generate diffuse alternatives or interesting novel alternatives that undercuts the likelihood principle as well as the more general Platt principle of “strong inference.”

A legal example would be discovering that the prime suspect has an airtight confirmed alibi. One (provisionally) rejects the hypothesis that he or she is guilty, but the alternative, “someone else” is too diffuse to be useful.

Second, I think that your characterization of NHST is exactly right.

Third, in answer to your questions about different sorts of evidence, I think you are conflating several things that I’ve worked on; and also, it is necessary to take framing into account as to what is evidence for or against. With one prime suspect, an alibi is evidence against. With two prime suspects, an alibi for one is (logically) evidence pointing to the other, and can be framed either way.

One of the differences between strong evidence for each of two incompatible hypotheses, A and B, versus weak evidence for each, is that a moderate piece of evidence pointing to one “should” weigh fairly heavily in the weak case but not so much in the strong case. When I say “should” I am referring both to my own intuitions and to the Dempster-Shafer rule assuming cognitively independent evidence; but NOT to empirical studies. I did do some of those, but the results were too confusing and messy to publish.

The best recent work about evidence for and against, with framing, is by Elke Weber and Eric Johnson (Query Theory). You probably know about this.

I can describe my own forays into empirical studies of evidence judgment (joint work with Laura Briggs and with Rahul Dodhia) and empirical studies of the use of weak evidence in probability judgment (modifications of Tversky’s Support Theory, with Dan Osherson); but most of it is not that relevant. The most important is the study that Briggs and I published (J. Behavioral Decision Making, 1992, 5, 77-106). It is particularly important for separating evidence judgment from probability judgment and for showing at least some possibility of focusing judgment on “designated” evidence — something that is critical in many tasks, from juror decision making to preparation of arguments.

Fourth, and finally, I’m sure that I don’t need to tell Dan Kahan to be careful in judging rationality of rules of evidence. In particular, legal systems have multiple goals: not just making judgments of evidence in particular cases but also system goals. One of the latter is to deter the use of inflammatory prejudicial information within the legal system.

If X embezzled in the past, this is pretty good evidence that X will embezzle in the future — thus, relevant to hiring decisions — but it is not good evidence that X embezzled on the present occasion; and it may well have more prejudicial than probative value, taking into account the temperament of judges and juries, and also taking into account system goals as noted above. On the other hand, if X has demonstrated scrupulous honesty in the past, this is decent character evidence against an accusation on the present occasion; but it then becomes admissible to refute such character evidence by pointing to past occasions of embezzlement.

As to things in general — I’ve been well, mostly these days in Nashville, but planning to continue teaching every Winter term at Columbia for the next few years. I hope you and your family are all well.
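Dave’s remark about the Dempster-Shafer rule in the email above can be made concrete. What follows is a rough sketch, not anything from his studies: it applies Dempster’s rule of combination on the two-hypothesis frame {A, B}, treating each piece of evidence as a simple support function and assuming the pieces are independent. The evidence strengths (0.2 for weak, 0.9 for strong, 0.5 for moderate) and all function names are made up for illustration.

```python
from itertools import product

A, B = frozenset("A"), frozenset("B")
THETA = A | B  # the whole frame, "A or B": belief not committed to either

def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, renormalize away conflict."""
    raw = {}
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:  # an empty intersection is conflicting mass and is dropped
            raw[inter] = raw.get(inter, 0.0) + w1 * w2
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}

def support(strength, hypothesis):
    """Simple support function: `strength` on the hypothesis, the rest uncommitted."""
    return {hypothesis: strength, THETA: 1.0 - strength}

def shift_in_belief_A(strength):
    """Rise in belief in A when moderate evidence for A follows balanced evidence for A and B."""
    before = combine(support(strength, A), support(strength, B))
    after = combine(before, support(0.5, A))
    return after[A] - before[A]

weak_shift = shift_in_belief_A(0.2)    # weak evidence for each of A and B
strong_shift = shift_in_belief_A(0.9)  # strong evidence for each of A and B
print(f"weak case:   belief in A rises by {weak_shift:.3f}")
print(f"strong case: belief in A rises by {strong_shift:.3f}")
```

With these made-up strengths, the moderate evidence raises belief in A by about 0.38 in the weak case and about 0.18 in the strong case. That matches the direction Dave described, though the exact numbers depend entirely on the illustrative inputs.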

My second-to-last discussion with Dave was on 14 May 2012. I forwarded to him a blog post entitled, “I hate to get all Gerd Gigerenzer on you here, but . . .”, that was critical of a researcher in the judgment and decision making space. We had an exchange of several emails back and forth. Some of this involved discussion of the contributions of Gigerenzer and some of it involved Dave’s skepticism of Tversky and Kahneman’s “two-system” theory, and I connected this to my later-published work with Basbøll on the nature of stories. Dave summarized the discussion in this way:

Everyone seems to be capable of failing to apply things that they know well when they are not being tip-top-alert.

I have long puzzled about the ontogeny of mathematical reasoning. On the one hand, we almost all have one heart, two kidneys (I am down to 1 2/3), etc., and we are capable of knowing one another’s minds on the basis of our own. We do make mistakes in this; but we make mistakes in inferences about our own goals and feelings as well. So if most humans are so similar, why do we see such a range of performance in mathematics, dance, musical composition, etc.? Some of it I believe derives from huge positive-feedback loops. Each good move is satisfying and motivates the person making it to develop further.

There are many tricky little problems that have been used in the laboratory to show how stupid people are. I have hardly ever fallen for them, but it isn’t because I’m smarter, it is because I’ve developed my mathematical persona and have confidence in it, and when I see such problems I don’t respond to them as Dave Krantz, I respond to them as Dave Krantz-plus-math hat. Someone who lacks confidence in his or her math hat may respond by putting on a dunce hat instead, or more simply, no hat at all. If the situation does not cue me to put the math hat on — as, for example, with Tversky and Kahneman’s original conjunction fallacy problems — then I behave just like most people. One of the things that makes scientific research hard is that one is usually not sure what hat one should be wearing in the given situation.

People don’t like this account much and it may be wrong. Certainly it strains to account for the feats of Mozart, Capablanca, and Picasso. But “supposed feats” would be more accurate. The documentation is thin.

Dave was an incredibly thoughtful person. OK, not always—like all of us, he was capable of failing to apply things that he knew well when he was not being tip-top-alert—but he was tip-top-alert most of the times that I ever saw him. Generous, amazing insights, big picture, the whole thing.

P.S. Here are some Krantz-related items from the blog archives:

12 Jul 2005: Dave Krantz on decision analysis and quantum physics, leading to a Jim Thompson reference and then back to Penrose’s theory that consciousness is inherently quantum-mechanical

13 Jul 2007: Goals and plans in decision making

5 Oct 2007: More on significance testing in economics

25 Oct 2007: Dave Krantz on utility and value

19 May 2009: I got the idea of type M errors from Dave Krantz; apparently it was a well-known concept in psychometrics (although without the “type M” name)

Brooks Robinson, Earl Weaver, and a general principle of management

Reading this mini-obituary of the great third baseman reminds me of Earl Weaver’s autobiography. It has a great passage . . . I don’t remember the exact words, but it’s relating some important game where Brooks Robinson was up at bat, and Earl told him, “Swing away, Brooksie.” Weaver always maintained that it was the players who won the game, not the manager, and that the manager’s job was to get the best out of each of his players. This is a good general principle, I think, and similar to Deming’s views on management.

Also it was good that when Weaver came back in 1985, his team did really badly. I mean, no, it wasn’t good, it was bad, it made me sad. But from a statistical perspective it was good to be reminded that even though Weaver was famous for being a genius, that wasn’t enough. Just cos you’re a genius, it doesn’t mean your teams will automatically win.

Carl Morris: Man Out of Time [reflections on empirical Bayes]

Carl Morris recently passed away, so I will repost this article from 2015:

When Carl Morris came to our department in 1989, I and my fellow students were so excited. We all took his class. The funny thing is, though, the late 1980s might well have been the worst time to be Carl Morris, from the standpoint of what was being done in statistics at that time—not just at Harvard, but in the field in general. Carl has made great contributions to statistical theory and practice, developing ideas which have become particularly important in statistics in the last two decades. In 1989, though, Carl’s research was not in the mainstream of statistics, or even of Bayesian statistics.

When Carl arrived to teach us at Harvard, he was both a throwback and ahead of his time.

Let me explain. Two central aspects of Carl’s research are the choice of probability distribution for hierarchical models, and frequency evaluations in hierarchical settings where both Bayesian calibration (conditional on inferences) and classical bias and variance (conditional on unknown parameter values) are relevant. In Carl’s terms, these are “NEF-QVF” and “empirical Bayes.” My point is: both of these areas were hot at the beginning of Carl’s career and they are hot now, but somewhere in the 1980s they languished.

In the wake of Charles Stein’s work on admissibility in the late 1950s there was an interest, first theoretical but with clear practical motivations, to produce lower-risk estimates, to get the benefits of partial pooling while maintaining good statistical properties conditional on the true parameter values, to produce the Bayesian omelet without cracking the eggs, so to speak. In this work, the functional form of the hierarchical distribution plays an important role—and in a different way than had been considered in statistics up to that point. In classical distribution theory, distributions are typically motivated by convolution properties (for example, the sum of two independent gamma-distributed variables with a common scale parameter is itself gamma), or by stable laws such as the central limit theorem, or by some combination or transformation of existing distributions. But in Carl’s work, the choice of distribution for a hierarchical model can be motivated based on the properties of the resulting partially pooled estimates. In this way, Carl’s ideas are truly non-Bayesian because he is considering the distribution of the parameters in a hierarchical model not as a representation of prior belief about the set of unknowns, and not as a model for a population of parameters, but as a device to obtain good estimates.

So, using a Bayesian structure to get good classical estimates. Or, Carl might say, using classical principles to get better Bayesian estimates. I don’t know that they used the term “robust” in the 1950s and 1960s, but that’s how we could think of it now.

The interesting thing is, if we take Carl’s work seriously (and we should), we now have two principles for choosing a hierarchical model. In the absence of prior information about the functional form of the distribution of group-level parameters, and in the absence of prior information about the values of the hyperparameters that would underlie such a model, we should use some form with good statistical properties. On the other hand, if we do have good prior information, we should of course use it—even R. A. Fisher accepted Bayesian methods in those settings where the prior distribution is known.

But, then, what do we do in those cases in between—the sorts of problems that arose in Carl’s applied work in health policy and other areas? I learned from Carl to use our prior information to structure the model, for example to pick regression coefficients, to decide which groups to pool together, to decide which parameters to model as varying, and then use robust hierarchical modeling to handle the remaining, unexplained variation. This general strategy wasn’t always so clear in the theoretical papers on empirical Bayes, but it came through in Carl’s applied work, as well as that of Art Dempster, Don Rubin, and others, much of which flowered in the late 1970s—not coincidentally, a few years after Carl’s classic articles with Brad Efron that put hierarchical modeling on a firm foundation that connected with the edifice of theoretical statistics, gradually transforming these ideas from a parlor trick into a way of life.

In a famous paper, Efron and Morris wrote of “Stein’s paradox in statistics,” but as a wise man once said, once something is understood, it is no longer a paradox. In un-paradoxing shrinkage estimation, Efron and Morris finished the job that Gauss, Laplace, and Galton had begun.
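For readers who have not seen the shrinkage result in action, here is a small simulation sketch. It uses the plain James-Stein estimator shrinking toward zero, under the textbook assumption of independent unit-variance normal observations; the ten true means, the seed, and the number of replications are arbitrary choices for illustration.

```python
import random

def james_stein(y):
    """Plain James-Stein estimate, shrinking toward zero, for unit-variance normal data."""
    p = len(y)
    shrink = 1.0 - (p - 2) / sum(v * v for v in y)
    return [shrink * v for v in y]

def squared_error(estimate, truth):
    """Total squared error across all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

rng = random.Random(0)                   # fixed seed so the run is repeatable
theta = [0.5 * j for j in range(-5, 5)]  # ten hypothetical true means
n_sims = 2000
loss_raw = loss_js = 0.0
for _ in range(n_sims):
    y = [rng.gauss(t, 1.0) for t in theta]  # one noisy observation per mean
    loss_raw += squared_error(y, theta)
    loss_js += squared_error(james_stein(y), theta)

print(f"average loss, raw estimates: {loss_raw / n_sims:.2f}")
print(f"average loss, James-Stein:   {loss_js / n_sims:.2f}")
```

The raw estimates have expected total squared error equal to the number of means, and the shrunken estimates should come in lower on average. How much lower depends on how spread out the true means are.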

So far, so good. We’ve hit the 1950s, the 1960s, and the 1970s. But what happened next? Why do I say that, as of 1989, Carl’s work was “out of time”? The simplest answer would be that these ideas were a victim of their own success: once understood, no longer mysterious. But it was more than that. Carl’s specific research contribution was not just hierarchical modeling but the particular intricacies involved in the combination of data distribution and group-level model. His advice was not simply “do Bayes” or even “do empirical Bayes” but rather had to do with a subtle examination of this interaction. And, in the late 1980s and early 1990s, there wasn’t so much interest in this in the field of statistics. On one side, the anti-Bayesians were still riding high in their rejection of all things prior, even in some quarters a rejection of probability modeling itself. On the other side, a growing number of Bayesians—inspired by applied successes in fields as diverse as psychometrics, pharmacology, and political science—were content to just fit models and not worry about their statistical properties.

Similarly with empirical Bayes, a term which in the hands of Efron and Morris represented a careful, even precarious, theoretical structure intended to capture classical statistical criteria in a setting where the classical ideas did not quite apply, a setting that mixed estimation and prediction—but which had devolved to typically just be shorthand for “Bayesian inference, plugging in point estimates for the hyperparameters.” In an era where the purveyors of classical theory didn’t care to wrestle with the complexities of empirical Bayes, and where Bayesians had built the modeling and technical infrastructure needed to fit full Bayesian inference, hyperpriors and all, there was not much of a market for Carl’s hybrid ideas.

This is why I say that, at the time Carl Morris came to Harvard, his work was honored and recognized as pathbreaking, but his actual research agenda was outside the mainstream.

As noted above, though, I think things have changed. The first clue—although it was not at all clear to me at the time—was Trevor Hastie and Rob Tibshirani’s lasso regression, which was developed in the early 1990s and which has of course become increasingly popular in statistics, machine learning, and all sorts of applications. Lasso is important to me partly as the place where Bayesian ideas of shrinkage or partial pooling entered what might be called the Stanford school of statistics. But for the present discussion what is most relevant is the centrality of the functional form. The point of lasso is not just partial pooling, it’s partial pooling with a double-exponential (Laplace) prior. As I said, I did not notice the connection with Carl’s work and other Stein-inspired work back when lasso was introduced—at that time, much was made of the shrinkage of certain coefficients all the way to zero, which indeed is important (especially in practical problems with large numbers of predictors), but my point here is that the ideas of the late 1950s and early 1960s again become relevant. It’s not enough just to say you’re partial pooling—it matters _how_ this is being done.
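To see why the functional form matters, consider the simplest possible case: one observation y with unit-variance normal error around a parameter θ. The sketch below compares the posterior mode under a double-exponential prior (the one-parameter analogue of lasso) with the mode under a normal prior (the analogue of ridge). The prior scales `lam` and `tau` and the function names are illustrative assumptions, not anything from the lasso paper.

```python
def mode_laplace_prior(y, lam):
    """Posterior mode of theta for y ~ N(theta, 1), prior density proportional to exp(-lam*|theta|).

    Minimizing 0.5*(y - theta)**2 + lam*abs(theta) gives soft thresholding.
    """
    magnitude = max(abs(y) - lam, 0.0)
    return magnitude if y >= 0 else -magnitude

def mode_normal_prior(y, tau):
    """Posterior mode of theta for y ~ N(theta, 1), prior theta ~ N(0, tau**2)."""
    return y * tau**2 / (1.0 + tau**2)

lam, tau = 1.0, 1.0  # hypothetical prior scales
for y in (0.5, 1.0, 2.0, 4.0):
    print(f"y = {y:3.1f}: Laplace prior -> {mode_laplace_prior(y, lam):.2f}, "
          f"normal prior -> {mode_normal_prior(y, tau):.2f}")
```

Both priors pull the estimate toward zero, but in different ways. The double-exponential prior subtracts a fixed amount and sets small values exactly to zero, while the normal prior shrinks everything by the same proportion and never reaches zero.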

In recent years there’s been a flood of research on prior distributions for hierarchical models, for example the work by Nick Polson and others on the horseshoe distribution, and the issues raised by Carl in his classic work are all returning. I can illustrate with a story from my own work. A few years ago some colleagues and I published a paper on penalized marginal maximum likelihood estimation for hierarchical models using, for the group-level variance, a gamma prior with shape parameter 2, which has the pleasant feature of keeping the point estimate off of zero while allowing it to be arbitrarily close to zero if demanded by the data (a pair of properties that is not satisfied by the uniform, lognormal, or inverse-gamma distributions, all of which had been proposed as classes of priors for this model). I was (and am) proud of this result, and I linked it to the increasingly popular idea of weakly informative priors. After talking with Carl, I learned that these ideas were not new to me, indeed these were closely related to the questions that Carl has been wrestling with for decades in his research, as they relate both to the technical issue of the combination of prior and data distributions, and the larger concerns about default Bayesian (or Bayesian-like) inferences.
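The behavior described above, keeping the point estimate off zero, can be seen in a toy example. This is a simplified sketch and not the code from the paper: it assumes a normal hierarchical model with known standard errors, puts the gamma prior with shape 2 on the group-level standard deviation `tau`, and finds the maximum by grid search. The data and the prior rate of 0.1 are made up.

```python
import math

# Hypothetical group estimates, each with standard error 1. Their spread is
# smaller than sampling noise alone would produce.
y = [0.5, -0.3, 0.1, -0.2, 0.4]
se = [1.0] * len(y)

def profile_loglik(tau):
    """Marginal log-likelihood of tau for y_j ~ N(mu, se_j^2 + tau^2), maximized over mu."""
    var = [s * s + tau * tau for s in se]
    mu = sum(yj / v for yj, v in zip(y, var)) / sum(1.0 / v for v in var)
    return -0.5 * sum(math.log(v) + (yj - mu) ** 2 / v for yj, v in zip(y, var))

def log_gamma2_prior(tau, rate=0.1):
    """Log density, up to a constant, of a gamma(shape=2, rate) prior: log(tau) - rate*tau."""
    return math.log(tau) - rate * tau if tau > 0 else -math.inf

grid = [i / 1000 for i in range(0, 5001)]  # candidate tau values from 0 to 5
tau_ml = max(grid, key=profile_loglik)
tau_pen = max(grid, key=lambda t: profile_loglik(t) + log_gamma2_prior(t))

print(f"marginal maximum likelihood estimate of tau: {tau_ml:.3f}")
print(f"penalized estimate with gamma(2) prior:      {tau_pen:.3f}")
```

With these made-up data the unpenalized estimate lands on the boundary at zero, while the penalized estimate comes out at roughly 0.5. The log(tau) term rules out zero, and because the prior density rises only linearly near zero, data that strongly favored a small tau could still pull the estimate close to it.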

In short: in the late 1980s, it was enough to be Bayesian. Or, perhaps I should say, Bayesian data analysis was in its artisanal period, and we tended to be blissfully ignorant about the dependence of our inferences on subtleties of the functional forms of our models. Or, to put a more positive spin on things: when our inferences didn’t make sense, we changed our models, hence the methods we used (in concert with the prior information implicitly encoded in that innocent-sounding phrase, “make sense”) had better statistical properties than one would think based on theoretical analysis alone. Real-world inferences can be superefficient, as Xiao-Li Meng might say, because they make use of tacit knowledge.

In recent years, however, Bayesian methods (or, more generally, regularization, thus including lasso and other methods that are only partly in the Bayesian fold) have become routine, to the extent that we need to think of them as defaults, which means we need to be concerned about . . . their frequency properties. Hence the re-emergence of truly empirical Bayesian ideas such as weakly informative priors, and the re-emergence of research on the systematic properties of inferences based on different classes of priors or regularization. Again, this all represents a big step beyond the traditional classification of distributions: in the robust or empirical Bayesian perspective, the relevant properties of a prior distribution depend crucially on the data model to which it is linked.

So, over 25 years after taking Carl’s class, I’m continuing to see the centrality of his work to modern statistics: ideas from the early 1960s that were in many ways ahead of their time.

Let me conclude with the observation that Carl seemed to us to be a “man out of time” on the personal level as well. In 1989 he seemed ageless to us both physically and in his personal qualities, and indeed I still view him that way. When he came to Harvard he was not young (I suppose he was about the same age as I am now!) but he had, as the saying goes, the enthusiasm of youth, which indeed continues to stay with him. At the same time, he has always been even-tempered, and I expect that, in his youth, people remarked upon his maturity. It has been nearly fifty years since Carl completed his education, and his ideas remain fresh, and I continue to enjoy his warmth, humor, and insights.

Here is a video from his retirement event.

Keith O’Rourke

From his obituary:

Keith worked in landscaping throughout his undergraduate years at the University of Toronto, then at Moore Business Forms, and in the western and northern provinces and territories in the field of compressed air before returning to UofT to complete an MBA and undertake an MSc. For many years he worked as a biostatistician at the Toronto General Hospital and the Ottawa Hospital on numerous studies in the fields of cancer, diabetes, SARS and infectious diseases research before completing a PhD at Oxford University in 2004. Having worked at Duke University, McGill and Queen’s, he joined Health Canada in the late 2000’s as a biostatistician, initially in health care and then in pesticide management. A conscientious intellectual and deep-thinker, Keith endeavoured to make our world a better and safer place.

Longtime readers of this blog will recognize Keith for his occasional posts and many comments and his idealistic and skeptical take on medical statistics. We’re all sorry to hear that he is gone.

James Loewen

Paul Alper sent me this obituary by Robert McFadden of James Loewen, author of the classic book, Lies My Teacher Told Me. I never met Loewen and can’t really say more about his book except to recommend it, but I do have a story to share.

I read Lies My Teacher Told Me when it came out, back in 1995 when I was living in California. I think I just noticed the cover in the bookstore, picked it up, took a look, and bought it. Then at some point I was chatting with someone in the political science department and I mentioned the book. I was all excited about it and I was curious what this professor of American politics thought about it. And he didn’t care at all! I was so disappointed. I’d heard about professors being snobs about popular writing, but, jeez . . . this was important stuff. Then I mentioned Loewen’s book to a political science professor at another university, and he was like, oh, James Loewen, he goes around testifying in court using some statistical method he doesn’t understand. And my reaction (which I kept to myself) was, Yeah, but you don’t understand that statistical method either—and you also didn’t write a really cool book about all the problems with the teaching of U.S. history. I guess it was better to learn sooner than later about these blind spots in academia.

The good news is that my colleagues in the Columbia poli sci dept aren’t like that; they’re much more open to political insights from all directions. As we should all be. Recall that the key idea in our 2016 QJPS paper was anticipated by a blogger (not me) four years earlier.

Wilkinson’s contribution to interactive visualization

This is Jessica. Upon learning this morning that Lee Wilkinson passed away I also felt compelled to write something on the extent to which his work has influenced interactive visualization research. 

The Grammar of Graphics was an incredibly ambitious undertaking – Wilkinson set out to create a system that could produce any statistical graphic he’d ever seen, and that could deepen understanding of the meaning of graphics. The GoG demonstrates the minimum set of components necessary to generate a statistical graphic, under an understanding that a graph is a function: data, algebra, scales, statistics, geometry, coordinates, and aesthetics. I often tell students I teach that in visualization research we hate chart taxonomies, and GoG is perhaps the best demonstration of how much more deeply we can think about visualization. Wilkinson pointed out the “deep structure” in visualization, observing, for instance, that a pie chart is just the result of passing rectangular marks through a polar transformation. He was inspired by Bertin’s work on graphical symbolism, and GoG systematizes thinking about the design space of visualizations in a way that is ultimately generative as well. You might be able to use an interactive implementation of it to make some crazy graphs but nothing that isn’t meaningful. 

From what Wilkinson describes (e.g., in this recent podcast) some of the hard work behind the Grammar of Graphics was in the editing: making sure the system was complete and correct while also minimally complex. There are only three operators in the algebra – cross, nest, and blend – but they suffice. Tableau’s underlying table algebra and ggplot2 are examples of major components of the grammar used in today’s most popular visualization tools. At the same time, the book covers uncertainty, time, graph drawing, interactive control, and just about every other major branch of visualization research in some way, synthesizing important distinctions that would otherwise take someone a while to glean from the literature. 

My own admiration for Grammar of Graphics is partly why I chose to get into visualization back as a grad student. I remember thinking his concept of a frame was really important but underappreciated in any discussions I’d heard about visualization. I read it for the first time as a Ph.D. student and have been calling it my favorite book for years. Whenever I go back to reread chapters I always come away with some new appreciation. I even bring in a copy to pass around in my interactive visualization course, trying to get students to sense its influence and hopefully read it. Just looking at the examples is like an education in visualization.

I didn’t know Lee well at all, but recall meeting him for the first time at the IEEE VIS doctoral colloquium, back in 2012. I remember he came in very late, but just in time for my presentation on uncertainty visualization, and his enthusiasm for the ideas (basically hypothetical outcome plots) was the highlight of my week. A few years later I remember talking to him at Tableau about the same ideas, and that time he was more critical, arguing that they would never catch on, but I appreciated that too. Lee has been critical of the visualization research community at times over the years, and while it was sometimes tough to hear, it was always clear he cared deeply and was optimistic about the field’s progress and open-minded intellectual attitude. His perspective will be missed.

Lee Wilkinson

Lee Wilkinson is most famous for his book, The Grammar of Graphics, whose ideas were implemented in the widely used R package ggplot2 as well as Tableau, the popular commercial graphics program, and all sorts of other places. Arguably as important as the book itself is its title. The idea that graphics has a “grammar”—that’s a real breakthrough. It’s related to ideas in statistics (graphs as comparisons, graphs as model checks, graphs as exploration and being surprised, with surprise being implicitly defined relative to expectations and modeling) and to ideas in computer science (here I’m not so familiar with the literature, but I’m thinking of the idea of a graph being defined not by what it looks like but rather by the steps used to create it). The modern era of statistical graphics was begun by Tukey in the 1960s and 1970s and continued by Cleveland, Tufte, and others in the 1980s. I think of Wilkinson as the key figure in the next wave of work in this area: once the general messages (exploratory data analysis is important; graphs can be clear and beautiful) had been absorbed, there was space for new ways of thinking about the process of creating statistical graphics. Along these lines, I think it was valuable that Lee straddled the worlds of academia and commerce.

I did not know Lee well—I’m not sure exactly how many times we actually spoke, but it was less than ten times—but I considered him a friend. We corresponded by email from time to time. He was a sincere and thoughtful person, both soft-spoken and lively, if that is possible. He could be critical (see for example here) but only because he cared so much about statistics and its applications that it bothered him when people were being annoyingly stupid. I think it’s safe to say that Lee’s ideas will continue to influence data analysis for many decades to come.

“Maybe we should’ve called it Arianna”

Katie Hafner wrote this obituary of Arianna Rosenbluth, original programmer of what is known as the Metropolis algorithm:

Arianna Rosenbluth Dies at 93; Pioneering Figure in Data Science

Dr. Rosenbluth, who received her physics Ph.D. at 21, helped create an algorithm that has become a foundation of understanding huge quantities of data. She died of complications of the coronavirus. . . .

Despite her extraordinary work and despite earning her Ph.D. from Harvard at 21, Dr. Rosenbluth left the field in her mid-20s and rarely talked about her scientific achievement afterward. . . .

Arianna Wright was born on Sept. 15, 1927, in Houston to Augustus and Leffie (Woods) Wright. Her mother was a schoolteacher, and her father was an office manager for a flower company. . . .

Dr. Rosenbluth was an accomplished fencer, winning not only women’s championships but also men’s. But her plans to compete in the Olympics were foiled, first by the cancellation of the 1944 summer Games during World War II, then by a lack of funds for travel to the London Games in 1948, her daughter said. . . .

After completing her dissertation, she traveled west for a postdoctoral fellowship at Stanford University, funded by the Atomic Energy Commission. There she met Marshall Rosenbluth, another physicist. They married in 1951 and moved to New Mexico to work at Los Alamos. . . .

The group published its seminal paper, “Equation of State Calculations by Fast Computing Machines,” in The Journal of Chemical Physics in 1953. . . .

Arianna Rosenbluth’s contribution was crucial. “She actually did all the coding, which at that time was a new art for these new machines,” Marshall Rosenbluth said in 2003. Sophisticated programming tools were still years away, so Dr. Rosenbluth programmed in machine language . . .

Metropolis the man was involved in the Monte Carlo method in 1946, but the “Metropolis algorithm” (I guess we should call it the Rosenbluth and Rosenbluth algorithm) was from the early 1950s, at least that’s what Mr. Rosenbluth said. Metropolis in a 1987 memoir attributed the Monte Carlo algorithm to Ulam and Teller in 1947 or maybe 1946: they developed it for their H-bomb work. He also said that Fermi had independently discovered the principle in the early 1930s. Then again, maybe Laplace had some idea of this method back around 1800. The method becomes more useful once computing is available. This Metropolis article also has an amazing picture of a physical randomization device that was used to perform Monte Carlo simulations:

In that article, Metropolis mentions the method that is now called the Metropolis algorithm: “During this study a strategy was developed that led to greater computing efficiency for equilibrium systems obeying the Boltzmann distribution function. According to this strategy, if a statistical ‘move’ of a particle in the system resulted in a decrease in the energy of the system, the new configuration was accepted. On the other hand, if there was an increase in energy, the new configuration was accepted only if it survived a game of chance biased by a Boltzmann factor. Otherwise, the old configuration became a new statistic.” He describes this as “a collaborative effort with the Tellers, Edward and Mici, and the Rosenbluths, Marshall and Arianna.” But, as we know, Marshall said that Metropolis and Mici Teller didn’t do anything on it at all. So maybe this was just Metropolis being a diplomatic lab director.
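
The acceptance rule in that quote is short enough to write down. Here is a minimal sketch in Python, emphatically not the 1953 program (which, as noted above, was written in machine language). The toy energy function, the uniform random-walk proposal, the choice beta = 1, and all the function names are my own assumptions for illustration:

```python
import math
import random

def metropolis_step(x, energy, propose, beta, rng):
    """One update targeting a density proportional to exp(-beta * energy(x))."""
    x_new = propose(x, rng)
    delta = energy(x_new) - energy(x)
    # A move that lowers the energy is always accepted. A move that raises
    # it is accepted with probability exp(-beta * delta), the Boltzmann
    # factor: the "game of chance" in Metropolis's description.
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return x_new
    # Rejected: the old configuration is counted again as a new sample.
    return x

def run_chain(n_steps, seed=0):
    rng = random.Random(seed)
    energy = lambda x: 0.5 * x * x                   # toy harmonic well (assumption)
    propose = lambda x, r: x + r.uniform(-1.0, 1.0)  # symmetric proposal (assumption)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x = metropolis_step(x, energy, propose, beta=1.0, rng=rng)
        samples.append(x)
    return samples

samples = run_chain(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

With this toy energy the target is a standard normal distribution, so the sample mean and variance should land near 0 and 1. Note that the rejected moves still contribute a sample; dropping them would bias the averages.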

In any case, the contributions of the Rosenbluths have endured. An algorithm becomes real when it is programmed.

Conway II

Following up on our post on John “Game of Life” Conway, Paul Alper points us to this informative obituary by Siobhan Roberts:

John Horton Conway was born on Dec. 26, 1937, in Liverpool, England, the third child and only son of Cyril and Agnes (Boyce) Conway. His father, an autodidact, had left school at age 14 and, with his photographic memory, made a living playing cards. Later he was a technician in the chemistry lab at the Liverpool Institute High School for Boys, setting up experiments for students, among them George Harrison and Paul McCartney.

Wow! That’s something. And then on the sadder side, a reminder of the complexity of real life, even for someone who was famous for being brilliant and playful:

His first two marriages, to Eileen Howe and Larissa Queen, ended in divorce. . . . Dr. Conway persevered in finding the fun through triple bypass surgery, a suicide attempt and a number of strokes. . . .

Also:

And there were ever more games of Phutball, which Dr. Conway was not very good at.

This gives a whole new twist to the story in the P.P.S. from our earlier post.

Carol Nickerson

Nick Brown informed me that Carol Nickerson passed away. Nick writes:

Carol was unemployed for the last five years of her life. She had been associate/adjunct faculty at UIUC for some time, but when I got to know her she was being let go after she refused to do something unethical for the person who signed off on her contracts. Still, she went along to the UIUC library every day to do research. When she lost her library privileges too, she switched to the Champaign Public Library. . . .

She had a tremendous eye for detail. . . . She printed out the whole of my [Nick’s] 2014 translation of Diederik Stapel’s book (https://nick.brown.free.fr/stapel ), annotated every page by hand for typos or incorrect American usage, then scanned them & mailed them back to me. . . .

Nick also mentions that he and Carol had exchanged 10,000 emails, which sounded like a lot—but then I checked my own mailbox and found approximately 1000 from Carol (not just to me; these were part of threads with many participants). So I guess emails just pile up, and they can easily run into the four or five figures.

The news of Carol’s passing is very sad. Like Nick, I’d never met Carol in person, but she had many thoughtful things to say over email. Our last interactions were in 2018, summarized here and here. In both cases, she put a lot of effort into tracking down details of some things that arguably weren’t worth her effort. She wanted to get to the bottom of things.

It’s also poignant to think of Carol in light of our recent discussions of the problems with the “great man” or “heroic mode” of science reporting, where some pathbreaking genius tells us how he broke the rules and revolutionized how we think about the world. Carol was the opposite of this, in that she put her 10,000 hours into getting the details right. And she deserves our thanks for that.

“Richard Jarecki, Doctor Who Conquered Roulette, Dies at 86”

[relevant video]

Thanatos Savehn is right. This obituary, written by someone named “Daniel Slotnik” (!), is just awesome:

Many gamblers see roulette as a game of pure chance — a wheel is spun, a ball is released and winners and losers are determined by luck. Richard Jarecki refused to believe it was that simple. He became the scourge of European casinos in the 1960s and early ′70s by developing a system to win at roulette. And win he did, by many accounts accumulating more than $1.2 million, or more than $8 million in today’s money . . . He and his wife honed his technique at dozens of casinos, including in Monte Carlo; Divonne-les-Bains, France; Baden-Baden, Germany; San Remo, on the Italian Riviera; and, briefly, Las Vegas.

How did they do it?

At the time, Dr. Jarecki told reporters that he had cracked roulette with the help of a powerful computer at the University of London. But the truth was more prosaic. He accomplished his improbable lucky streak through painstaking observation, with no electronic assistance.

Ms. Jarecki said in a telephone interview on Monday that she, Dr. Jarecki and a handful of other people helping them would record the results of every turn of a given roulette wheel to discover its biases, or tendency to land on some numbers more frequently than others, usually because of a minute mechanical defect caused by shoddy manufacturing or wear and tear.

Here’s some juicy statistical detail:

Ms. Jarecki said that watching, or “clocking,” a wheel, as Mr. Barnhart described it, could mean observing more than 10,000 spins over as long as a month. Sometimes a wheel would yield no observable advantage. But when Dr. Jarecki and company did find a wheel with a discernible bias, he would have an edge over the house. “It isn’t something he invented,” Ms. Jarecki said. “It’s something he perfected.”

Wow. This obit has more statistical sophistication than most of the PNAS papers I’ve seen.
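
To get a feel for what “clocking” involves, here is a rough simulation sketch in Python. Everything specific in it is an assumption of mine rather than anything from the obituary: a 37-pocket wheel, a single defective pocket (number 17) that comes up 1.5 times as often as the others, which is a deliberately exaggerated defect, and the function names. The idea is just to tally 10,000 spins, check the counts against a uniform wheel with a chi-square statistic, and estimate the edge on the most frequent pocket given the usual 35-to-1 payout on a single number:

```python
import random

PAYOUT = 35  # a winning single-number bet pays 35 to 1

def clock_wheel(weights, n_spins, rng):
    """Tally how often each pocket comes up over n_spins simulated spins."""
    pockets = list(range(len(weights)))
    counts = [0] * len(weights)
    for pocket in rng.choices(pockets, weights=weights, k=n_spins):
        counts[pocket] += 1
    return counts

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equally likely pockets."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def estimated_edge(counts, pocket):
    """Estimated profit per unit staked on one pocket: (PAYOUT + 1) * p_hat - 1."""
    p_hat = counts[pocket] / sum(counts)
    return (PAYOUT + 1) * p_hat - 1

rng = random.Random(1)
weights = [1.0] * 37
weights[17] = 1.5  # hypothetical, exaggerated defect
counts = clock_wheel(weights, 10_000, rng)
best = max(range(37), key=counts.__getitem__)
print(best, round(chi_square_uniform(counts), 1), round(estimated_edge(counts, best), 3))
```

On a fair wheel each pocket has probability 1/37, just under the break-even probability of 1/36, so the estimated edge is slightly negative. With a more realistic, smaller defect, 10,000 spins may well not be enough to separate the bad pocket from the noise, which fits the remark that sometimes a wheel yielded no observable advantage.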

Jarecki was bi-cultural: He was born in Germany, then his family moved to the U.S. when he was a child, then after graduating from college he moved back to Germany, then he met his wife, an American, during a medical residency in New Jersey, then not long after that they returned to live in Germany together.

Also this:

In addition to his wife, with whom he also had a home in Las Vegas, he is survived by a brother, Henry, a billionaire psychiatrist, commodities trader and entrepreneur; two daughters, Divonne Holmes a Court and Lianna Jarecki; a son, John, a chess prodigy who became a master at 12; and six grandchildren.

Two nephews of Dr. Jarecki are the award-winning documentarians Andrew Jarecki (“Capturing the Friedmans” and the HBO series “The Jinx: The Life and Deaths of Robert Durst”) and Eugene Jarecki (“Why We Fight” and “The House I Live In”).

And, finally:

Dr. Jarecki moved to Manila about 20 years ago, his wife said, because he liked the lifestyle there and preferred the city’s casinos to those run by Americans.

His touch at the roulette wheel endured until nearly the end. Ms. Jarecki said he last played in December, at a tournament in Manila. He came in first.

Roulette tournaments? Who knew??

A style of argument can be effective in an intellectual backwater but fail in the big leagues—but maybe it’s a good thing to have these different research communities

Following on a post on Tom Wolfe’s evolution-denial trolling, Thanatos Savehn pointed to this obituary, “Jerry A. Fodor, Philosopher Who Plumbed the Mind’s Depths, Dies at 82,” which had lots of interesting items, including this:

“We think that what is needed,” they wrote, “is to cut the tree at its roots: to show that Darwin’s theory of natural selection is fatally flawed.” . . .

The book loosed an uproar among scientists. (Its review in the magazine Science appeared under the headline “Two Critics Without a Clue.”)

“He and Chomsky had a modus operandi which was ‘Bury your opponents as early as possible,’ ” Dr. [Ernie] Lepore said, speaking of Dr. Fodor. “And when he went up against the scientific community, I don’t think Fodor was ready for that. He basically told these guys that natural selection was bogus. The arguments are interesting, but he didn’t win a lot of converts.”

That’s an interesting idea, that a style of argument can be effective in an intellectual backwater such as academic linguistics but fail in the big leagues of biology. It’s not so bad to have these different academic communities: we can think of academic linguistics as a “safe space” where scholars can pursue ridiculous ideas that might still become useful.

If we crudely model scientific hypotheses as being true/false, or reasonable/unreasonable, then it can at times be a good research strategy to start in “reasonable” territory and then deliberately wander into the “unreasonable” zone as a way of better traversing the space of theories. The best way to get to new reasonable hypotheses might be to entertain some silly ideas, considering these ideas seriously enough to fully work through their implications. And perhaps that is what Fodor was doing in his thought experiment of cutting the evolutionary tree “at its roots.”

At the same time, you can’t expect biologists to just sit there and take it. Hence the value of distinct research communities. As long as we’re not using linguists’ theories of evolution to fight disease, I guess we’re ok.

P.S. Peter Erwin convincingly makes the case that the above post is “massively and bizarrely unfair to linguistics, especially by taking a single, controversial theorist (that is, Chomsky; Fodor is a philosopher) as being somehow representative of the field.”

“Each computer run would last 1,000-2,000 hours, and, because we didn’t really trust a program that ran so long, we ran it twice, and it verified that the results matched. I’m not sure I ever was present when a run finished.”

Bill Harris writes:

Skimming Michael Betancourt’s history of MCMC [discussed yesterday in this space] made me think: my first computer job was as a nighttime computer operator on the old Rice (R1) Computer, where I was one of several students who ran Monte Carlo programs written by (the very good) chemistry prof Dr. Zevi Salsburg and his grad students.  As I recall, each computer run would last 1,000-2,000 hours, and, because we didn’t really trust a program that ran so long, we ran it twice, and it verified that the results matched.  I’m not sure I ever was present when a run finished.

I did a quick search and turned up Monte Carlo Procedure for Statistical Mechanical Calculations in a Grand Canonical Ensemble of Lattice Systems, which has an abstract that ends, “A comparison with the exact analytical results (B= ∞, Δ=0) indicates that the accuracy of the Monte Carlo procedure for the grand ensemble can be reliably estimated by a statistical analysis of partial averages over the Markov chain.” That sounds a bit like MCMC!  If so, what’s up with worries about a few days of HMC sampling?

Here are a few pictures of the Rice Computer, along with the USAEC Bessel Function Generator.  Wikipedia has more, as does Google.

Thinking a bit more, I was told we were running it twice ’cause the hardware might make an error (or so I recall), but perhaps we were simply running two chains on a room-sized single processor with 32K words.

If you want more on Salsburg or on the R1, just ask.

Except that I obviously couldn’t remember how to spell his name right in 2011, here’s a short anecdote about him: https://makingsense.facilitatedsystems.com/2011/12/thinking-for-yourself.html.  https://ricehistorycorner.com/2010/11/18/zevi-salsburg/ is a bit more about him (and, looking at the picture, he’s third from the left, not right).  Limiting Polytope geometry for Rigid Rods, Disks, and Sphere appears to describe some of his research, although it’s too late for me to even pretend to skim it and make much sense of it tonight (the paper to the abstract I sent previously appears to be paywalled).

https://ricehistorycorner.com/2012/01/31/new-info-on-the-rice-computers/ is a bit more on the Rice Computer and Salsburg, and https://archive.li/opI1Y is perhaps the definitive online documentation about the machine.  It indicates that (apparently) some or much of Salsburg’s work on the Rice Computer was done on the bare machine, which means no Genie programming language; I don’t know if it meant no assembler.  The computer’s ability to do dynamic memory allocation using tagged memory and codewords is the reason I always heard Salsburg wanted this machine; the IBM machines of the time ran out of memory and didn’t, apparently, have the ability to reclaim unused memory.

And it’s still my favorite computer!  Real superscripts and subscripts, thanks to the Friden Flexowriter and the Genie language, and flashing neon lamps everywhere, which made an impressive sight at night, especially if you turned the room lights off.

Oooh, I love this sort of thing. I guess that’s a sign that I’m getting old. 2018 is in the future, after all.

P.S. After doing some more digging, Harris adds:

I found reference 69 in chapter 4 of Heermann’s Computer Simulation Methods in Theoretical Physics (printed page 83), which refers to some of Salsburg’s research.  Maybe that makes it clearer whether he was doing what you’d call MCMC today.  (I’m not sure that book should be online, but it is.)  He does have works listed in a list of LASL research.

Computer-Simulationen zu Strukturen und Phasenumwandlungen in Modell-Kolloiden (in English) mentions his research in several places.

I also found a brief obituary at the bottom of the second page of https://physicstoday.scitation.org/doi/pdf/10.1063/1.3021804.  It appears that he was active in statistical mechanics and related fields, but I haven’t found anything I recognize as MCMC integration.  The best I’ve seen is stuff possibly related to the non-statistical work Michael related.

If you see a connection, great.  Otherwise, perhaps it’s a false alarm.  I may ask Melissa Kean if she’s got contacts at Rice who would know.

Ooh—bingo!?!  Scroll down a bit on https://ethw.org/Oral-History:Martin_Graham, and you’ll find Metropolis and Salsburg mentioned in the same paragraph.  The Rice Computer was a descendant of the MANIAC.  At any rate, it sounds as if Salsburg was working for Metropolis at the time (at least during the summers).  https://mobile-hi-mobiles.blogspot.com/2009/04/pressures-and-goals.html makes it clear that the R1 was not the MANIAC II.

Robert Gelman, 1923-2017

Bob Gelman, beloved husband of Jane for 67 years, proud father of Alan, Nancy, Susan, and Andy, and adoring grandparent of Stephanie, Noah, Adam, Jamie, Ben, Zacky, Jakey, and Sophie, passed away peacefully on the morning of 27 Aug 2017 at the age of 94. A child of immigrants, Bob grew up playing stickball in the streets of Brooklyn, studied physics at City College and Columbia University, taught at Champlain College in Plattsburgh, and served his country during World War II and after, when he built machines to compute missile trajectories, and later in his work at the Environmental Protection Agency. Bob was a gentle, careful man who loved life, a fiercely liberal Democrat who delighted in puns and the English language, music, tennis, and, above all, his family.

Irwin Shaw, John Updike, and Donald Trump

So. I read more by and about Irwin Shaw. I read Shaw’s end-of-career collection of short stories and his most successful novel, The Young Lions, and also the excellent biography by Michael Shnayerson. I also read Adam Begley’s recent biography of John Updike, which was also very good, and it made me sad that probably very few people actually read it. Back in the old days, a major biography of a major writer would’ve had a chance of attracting some readers.

John Updike was a master of the slice of life and also created one very memorable character in Rabbit. Irwin Shaw was known as a “storyteller” but I’m not quite sure what that means, as his stories didn’t have such memorable plots. Kinda like a composer whose music is engrossing but at the same time has no memorable tunes. The guy was no John Le Carre or Stephen King.

In his New York Times obituary, Herbert Mitgang wrote, “Stylistically, Mr. Shaw’s short stories were noted for their directness of language, the quick strokes with which he established his different characters, and a strong sense of plotting.” Well put. Quick strokes. His characters didn’t come to life, but their situations and predicaments did. In that way he had a lot in common with Updike.

One thing Shaw did have was a combination of emotional sympathy, real-world grit, and social observation. Some similarity here with John O’Hara, but O’Hara’s situations always seemed a bit more schematic to me, whereas Shaw’s characters seem to be in real situations (even if they’re not, ultimately, real characters).

Updike and Shaw had different career trajectories. Updike started at the top and stayed there. Shaw started at the top and worked his way down. OK, even at the end he was selling lots of copies, but his books weren’t getting much respect (and, at least according to his biographer, they had some strong moments but they weren’t great; I can’t bring myself to try to read these novels myself). On the other hand, I’ve tried to read a couple of Updike’s later novels and I wasn’t so impressed. From my perspective, Updike redeemed himself by writing a lot of excellent literary journalism. As they got older, both Updike and Shaw reduced their output of short stories, maintaining the high quality in both cases.

Speaking of John Updike, if he were around today I expect he’d’ve had something to say about those rural Pennsylvanians who voted for Donald Trump. Being a rural Pennsylvanian. And John O’Hara, as a Pennsylvanian, and Roman Catholic, and an all-around resentful person: he would’ve had something to say about Trump voters from all those groups. Then we could bring in Lorrie Moore to explain Hillary Clinton voters to us. Hey, here it is—ok, that didn’t work: Moore doesn’t like Clinton. Hmmm, lots of people don’t like Hillary Clinton, but she did get 51% of the two-party vote. We’ll have to find some expert to explain those voters to us.

Wolfram on Golomb

I was checking out Stephen Wolfram’s blog and found this excellent obituary of Solomon Golomb, the mathematician who invented the maximum-length linear-feedback shift register sequence, characterized by Wolfram as “probably the single most-used mathematical algorithm idea in history.” But Golomb is famous to me, and to other readers of Martin Gardner, for inventing polyominoes.
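
For readers who haven’t run into a linear-feedback shift register before, here is a small sketch in Python. It is a toy illustration, not anything taken from Wolfram’s post: the 4-bit width, the choice of feedback taps (corresponding to the polynomial x^4 + x^3 + 1), and the function name are my own assumptions. The register shifts right one bit at a time, and the new top bit is the XOR of the tapped bits; “maximum-length” means the register cycles through every nonzero state before repeating.

```python
def lfsr_states(width, tap_shifts, seed):
    """Yield the states of a Fibonacci LFSR until the seed state recurs."""
    if 0 not in tap_shifts:
        # Without a tap on the outgoing bit the update is not invertible,
        # so the seed need not recur and this loop could run forever.
        raise ValueError("tap_shifts must include 0")
    state = seed
    while True:
        yield state
        bit = 0
        for shift in tap_shifts:
            bit ^= (state >> shift) & 1        # XOR of the tapped bits
        state = (state >> 1) | (bit << (width - 1))
        if state == seed:
            return

states = list(lfsr_states(width=4, tap_shifts=(0, 1), seed=0b0001))
output_bits = [s & 1 for s in states]          # the bit shifted out at each step
print(len(states), output_bits)
```

With these taps the register visits all 15 nonzero 4-bit states, so the output sequence has period 2^4 - 1. Tapping all four bits instead, tap_shifts=(0, 1, 2, 3), gives a cycle of length only 5, which is why the choice of taps matters.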

The whole thing’s a good read, and it even includes this cool nonperiodic tiling from Wolfram’s 2002 book:

There’s also some interesting stuff on cellular automata, itself a fascinating topic. Wolfram should hire someone to prove some theorems about it!

P.S. Wolfram’s blog has lots of good stuff. In fact, I just added it to the blogroll! For example, here’s a long post from a few months ago on cellular automata and physics. It’s a funny thing, though: Wolfram seems to have an extreme aversion to talking about his collaborators. With Wolfram, it’s all through the day, I me mine, I me mine, I me mine. Don’t get me wrong, I like to talk about myself too. But science as I experience it is soooo collaborative, it’s hard for me to imagine being in Wolfram’s situation: he has all the resources in the world but he works all on his own. So lonely. On one hand, he has these interesting ideas that he wants to share with the world, with complete strangers on his blog. On the other hand, he doesn’t seem to be able to collaborate with people directly. In literature, this would not be surprising—we don’t demand or even expect that Matthew Klam, Francis Spufford, Alison Bechdel, etc., find collaborators—but in science it seems like a mistake to work alone. Then again, what do I know. Andrew Wiles didn’t seem to require a research team or even a research partner.

Steve Fienberg

I did not know Steve Fienberg well, but I met him several times and encountered his work on various occasions, which makes sense considering his research area was statistical modeling as applied to social science.

Fienberg’s most influential work must have been his books on the analysis of categorical data, work that was ahead of its time in being focused on the connection between models rather than hypothesis tests. He also wrote, with William Mason, the definitive paper on identification in age-period-cohort models, and he worked on lots of applied problems including census adjustment, disclosure limitation, and statistics in legal settings. The common theme in all this work is the combination of information from multiple sources, and the challenges involved in taking statistical inferences using these to make decisions in new settings. These ideas of integration and partial pooling are central to Bayesian data analysis, and so it makes sense that Fienberg made use of Bayesian methods throughout his career, and that he was a strong presence in the Carnegie Mellon statistics department, which has been one of the important foci of Bayesian research and education during the past few decades.

Fienberg’s CMU obituary quotes statistician and former Census Bureau director Bob Groves as saying,

Steve Fienberg’s career has no analogue in my [Groves’s] lifetime. . . . He contributed to advancements in theoretical statistics while at the same time nurturing the application of statistics in fields as diverse as forensic science, cognitive psychology, and the law. He was uniquely effective in his career because he reached out to others, respected them for their expertise, and perceptively saw connections among knowledge domains when others couldn’t see them. He thus contributed both to the field of statistics and to the broader human understanding of the world.

I’d say it slightly differently. I disagree that Fienberg’s career is unique in the way that Groves states. Others of Fienberg’s generation such as Don Rubin and Nan Laird have similarly made important theoretical or methodological contributions while also actively working on a broad variety of live applications. One can also point to researchers such as James Heckman and Lewis Sheiner who have come from outside to make important contributions to statistics while also doing important work in their own fields. And, to go to the next generation, I can for example point to my collaborators John Carlin and David Dunson, both of whom have had deep statistical insights while also contributing to the reform and development of their fields of application.

But please don’t take my qualification of Groves’s statement to be a criticism of Fienberg. Rather consider it as a plus. Fienberg is a model of an important way to be a statistician: to be someone deeply engaged with a variety of applied projects while at the same time making fundamental contributions to the core of statistics. Or, to put it another way, to work on statistical theory and methodology in the context of a deep engagement with a wide range of applications.

Lionel Trilling famously wrote this about George Orwell:

Orwell, by reason of the quality that permits us to say of him that he was a virtuous man, is a figure in our lives. He was not a genius, and this is one of the remarkable things about him. His not being a genius is an element of the quality that makes him what I am calling a figure. . . . if we ask what it is he stands for, what he is the figure of, the answer is: the virtue of not being a genius, of fronting the world with nothing more than one’s simple, direct, undeceived intelligence, and a respect for the powers one does have, and the work one undertakes to do. . . . what a relief! What an encouragement. For he communicates to us the sense that what he has done any one of us could do.

Or could do if we but made up our mind to do it, if we but surrendered a little of the cant that comforts us, if for a few weeks we paid no attention to the little group with which we habitually exchange opinions, if we took our chance of being wrong or inadequate, if we looked at things simply and directly, having only in mind our intention of finding out what they really are . . . He tells us that we can understand our political and social life merely by looking around us, he frees us from the need for the inside dope.

George Orwell is one of my heroes. I am not saying that Steve Fienberg is the George Orwell of statistics, whatever that would mean. What I do think is that the above traits identified by Trilling are related to what I admire most about Fienberg, and this is why I think it’s a fine accomplishment indeed for Fienberg to have not been a unique example of a statistician contributing both to theory and applications but an exemplar of this type. Laplace, Galton, and Fisher also fall in this category but none of us today can hope to match the scale of their contributions. Fienberg through his efforts changed the world in some small bit, as we all should hope to do.