Progress in 2023, Leo edition

Posted on January 23, 2024 6:00 PM by Leonardo Egidi

Following Andrew, Aki, Jessica, and Charles, and based on Andrew’s proposal, I list my research contributions for 2023.

Published:

Egidi, L. (2023). Seconder of the vote of thanks to Narayanan, Kosmidis, and Dellaportas and contribution to the Discussion of ‘Flexible marked spatio-temporal point processes with applications to event sequences from association football’. Journal of the Royal Statistical Society Series C: Applied Statistics, 72(5), 1129.
Marzi, G., Balzano, M., Egidi, L., & Magrini, A. (2023). CLC Estimator: a tool for latent construct estimation via congeneric approaches in survey research. Multivariate Behavioral Research, 58(6), 1160-1164.
Egidi, L., Pauli, F., Torelli, N., & Zaccarin, S. (2023). Clustering spatial networks through latent mixture models. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(1), 137-156.
Egidi, L., & Ntzoufras, I. (2023). Predictive Bayes factors. In SEAS IN. Book of short papers 2023 (pp. 929-934). Pearson.
Macrì Demartino, R., Egidi, L., & Torelli, N. (2023). Power priors elicitation through Bayes factors. In SEAS IN. Book of short papers 2023 (pp. 923-928). Pearson.

Preprints:

Consonni, G., & Egidi, L. (2023). Assessing replication success via skeptical mixture priors. arXiv preprint arXiv:2401.00257. Submitted.

Software:

CLC estimator

free and open-source app to estimate latent unidimensional constructs via congeneric approaches in survey research (Marzi et al., 2023)

footBayes package (CRAN version 0.2.0)

diagonal inflated bivariate Poisson model (Karlis and Ntzoufras, 2003)
zero-inflated Skellam model (Karlis and Ntzoufras, 2009)

pivmet package (CRAN version 0.5.0)

sparse finite mixtures implementation (Frühwirth-Schnatter and Malsiner-Walli, 2019)
Stan implementation

I hope and guess that the paper dealing with the replication crisis, “Assessing replication success via skeptical mixture priors” with Guido Consonni, has good potential for the Bayesian assessment of replication success in the social and hard sciences; this paper can be seen as an extension of the paper by Leonhard Held and Samuel Pawel entitled “The Sceptical Bayes Factor for the Assessment of Replication Success”. Moreover, I am glad that the paper “Clustering spatial networks through latent mixture models”, which focuses on a model-based clustering approach defined in a hybrid latent space, has finally been published in JRSS A.

Regarding software, the footBayes package, a tool to fit the most well-known soccer (football) models through Stan and maximum likelihood methods, has been substantially developed and enriched with new functionality (2024 objective: incorporate CmdStan with VI/Pathfinder algorithms and write a package paper in JSS/R Journal format).

Here’s how to subscribe to our new weekly newsletter:

Posted on January 23, 2024 4:40 PM by Andrew

Just a reminder: we have a new weekly newsletter. We posted on it a couple weeks ago; I’m just giving a reminder here because the goal of the newsletter is to reach people who wouldn’t otherwise go online to read the blog.

Subscribing is free, and then in your inbox each Monday morning you’ll get a list of our scheduled posts for the forthcoming week, along with links to the past week’s posts. Enjoy.

P.S. To subscribe, click on the link and follow the instructions from there.

Learning from mistakes (my online talk for the American Statistical Association, 2:30pm Tues 30 Jan 2024)

Posted on January 23, 2024 9:25 AM by Andrew

Here’s the link:

Learning from mistakes

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

We learn so much from mistakes! How can we structure our workflow so that we can learn from mistakes more effectively? I will discuss a bunch of examples where I have learned from mistakes, including data problems, coding mishaps, errors in mathematics, and conceptual errors in theory and applications. I will also discuss situations where researchers have avoided good learning opportunities. We can then try to use all these cases to develop some general understanding of how and when we learn from errors in the context of the fractal nature of scientific revolutions.

The video is here.

It’s sooooo frustrating when people get things wrong, the mistake is explained to them, and they still don’t make the correction or take the opportunity to learn from their mistakes.

To put it another way . . . when you find out you made a mistake, you learn three things:

1. Now: Your original statement was wrong.

2. Implications for the future: Beliefs and actions that flow from that original statement may be wrong. You should investigate your reasoning going forward and adjust to account for your error.

3. Implications for the past: Something in your existing workflow led to your error. You should trace your workflow, see how that happened, and alter your workflow accordingly.

In poker, they say to evaluate the strategy, not the play. In quality control, they say to evaluate the process, not the individual outcome. Similarly with workflow.

As we’ve discussed many many times in this space (for example, here), it makes me want to screeeeeeeeeeam when people forgo this opportunity to learn. Why do people, sometimes very accomplished people, give up this opportunity? I’m speaking here of people who are trying their best, not hacks and self-promoters.

The simple answer for why even honest people will avoid admitting clear mistakes is that it’s embarrassing for them to admit error; they don’t want to lose face.

The longer answer, I’m afraid, is that at some level they recognize issues 1, 2, and 3 above, and they go to some effort to avoid confronting item 1 because they really really don’t want to face item 2 (their beliefs and actions might be affected, and they don’t want to hear that!) and item 3 (they might be going about everything all wrong, and they don’t want to hear that either!).

So, paradoxically, the very benefits of learning from error are scary enough to some people that they’ll deny or bury their own mistakes. Again, I’m speaking here of otherwise-sincere people, not of people who are willing to lie to protect their investment or make some political point or whatever.

In my talk, I’ll focus on my own mistakes, not those of others. My goal is for you in the audience to learn how to improve your own workflow so you can catch errors faster and learn more from them, in all three senses listed above.

P.S. Planning a talk can be good for my research workflow. I’ll get invited to speak somewhere, then I’ll write a title and abstract that seems like it should work for that audience, then the existence of this structure gives me a chance to think about what to say. For example, I’d never quite thought of the three ways of learning from error until writing this post, which in turn was motivated by the talk coming up. I like this framework. I’m not claiming it’s new—I guess it’s in Pólya somewhere—, just that it will help my workflow. Here’s another recent example of how the act of preparing an abstract helped me think about a topic of continuing interest to me.

Intro to BridgeStan: The new in-memory interface for Stan

Posted on January 22, 2024 3:07 PM by Eric Novik

This is Eric.

Brian Parbhu took over the Bayesian Data Analysis meetup (formerly NYC Stan Users Group), and he is running an event in NYC this Friday, during which Brian Ward is going to discuss BridgeStan, the new in-memory interface for Stan. You can register here.

P.S. This is the presentation Brian gave at the Meetup.

“My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. . . . What I hear, instead, is the following . . .”

Posted on January 22, 2024 9:16 AM by Andrew

Economic historian Tim Guinnane writes:

I have a general question that I have not seen addressed on your blog. Often this question turns into a narrow question about retracting papers, but I think that short-circuits an important discussion.

Like many in economic history, I am increasingly worried that much research in recent years reflects p-hacking, misrepresentation of the history, useless data, and other issues. I realize that the technical/statistical issues differ from paper to paper.

What I see is something like the following. You can use this paper as a concrete example, but the problems are much more widespread. We document a series of bad research practices. The authors played games with controls to get the “right” answer for the variable of interest. (See Table 1 of the paper.) In the text they misrepresent the definitions of variables used in regressions; we show that if you use the stated definition, their results disappear. They use the wrong degrees of freedom to compute error bounds (in this case, they had to program the bounds by hand, since Stata automatically uses the right df). There are other and to our minds more serious problems involved in selectively dropping data, claiming sources do not exist, etc.

Step back from any particular problem. How should the profession think about claims such as ours? My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. The journals may not want to retract such work, but there should be support for publishing articles that point out such problems.

What I hear, instead, is the following. A paper estimates beta as .05 with a given SE. Even if we show that this is cooked—that is, that beta is a lot smaller or the SE a lot larger if you do not throw in extraneous regressors, or play games with variable definitions—then ours is not really a result. It is instead, I am told, incumbent on the critic to start with beta=.05 as the null, and show that doing things correctly rejects that null in favor of something less than .05 (it is characteristic of most of this work that there really is no economic theory, so the null is always “X does not matter” which boils down to “this beta is zero.” And very few even tell us whether the correct test is one- or two-sided).

This pushback strikes me as weaponizing the idea of frequentist hypothesis testing. To my mind, if I can show that beta=.05 comes from a cooked regression, then we need to start over. That estimate can be ignored; it is just one of many incorrect estimates one can generate by doing things inappropriately. It actually gives the unscrupulous an incentive to concoct more outlandish betas which are then harder to reject. More generally, it puts a strange burden of proof on critics. I have discussed this issue with some folks in natural sciences who find the pushback extremely difficult to understand. They note what I think is the truth: it encourages bad research behavior by suppressing papers that demonstrate that bad behavior.

It might be opportune to have a general discussion of these sorts of issues on your website. The Gino case raises something much simpler, I think. I fear that it will in some ways lower the bar: so long as someone is not actively making up their data (which I realize has not been proven, in case this email gets subpoenaed!) then we do not need to worry about cooking results.
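The burden-of-proof asymmetry described in the email can be made concrete with a small numerical sketch in Python. The published beta of .05 comes from the email; the corrected estimate and its standard error are hypothetical numbers invented for illustration, and the test uses a plain normal approximation rather than any particular paper's procedure.

```python
from statistics import NormalDist


def two_sided_p(estimate, null_value, se):
    """Two-sided p-value for H0: beta == null_value (normal approximation)."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Value quoted in the email.
published_beta = 0.05

# Hypothetical values, assumed for illustration only.
corrected_beta = 0.01  # estimate after fixing the specification
corrected_se = 0.03    # its standard error

# Usual question: is the corrected estimate distinguishable from zero?
p_vs_zero = two_sided_p(corrected_beta, 0.0, corrected_se)

# The burden placed on the critic: reject the published value as the null.
p_vs_published = two_sided_p(corrected_beta, published_beta, corrected_se)

print(f"p-value against beta = 0:    {p_vs_zero:.3f}")
print(f"p-value against beta = 0.05: {p_vs_published:.3f}")
```

With these made-up numbers, both p-values exceed 0.05: the corrected analysis gives no evidence of an effect, yet it also fails to "reject" the published value, so under the pushback rule the critique would not count as a result.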

My reply: You raise several issues that we’ve discussed on occasion (for some links, see here):

1. The “Research Incumbency Rule”: Once an article is published in some approved venue, it is taken as truth. Criticisms which would absolutely derail a submission in pre-publication review can be brushed aside if they are presented after publication. This is what you call “the burden of proof on critics.”

2. Garden of forking paths.

3. Honesty and transparency are not enough. Work can be non-fraudulent but still be crap.

4. “Passive corruption” when people know there’s bad work but they don’t do anything about it.

5. A disturbingly casual attitude toward measurement; see here for an example: https://statmodeling.stat.columbia.edu/2023/10/05/no-this-paper-on-strip-clubs-and-sex-crimes-was-never-gonna-get-retracted-also-a-reminder-of-the-importance-of-data-quality-and-a-reflection-on-why-researchers-often-think-its-just-fine-to-publ/ Many economists and others seem to have been brainwashed into thinking that it’s ok to have bad measurement because attenuation bla bla . . . They’re wrong.

He responded: If you want an example of economists using stunningly bad data and making noises about attenuation, see here.

The paper in question has the straightforward title, “We Do Not Know the Population of Every Country in the World for the Past Two Thousand Years.”

Michael Wiebe has several new replications written up on his site.

Posted on January 21, 2024 9:02 PM by Andrew

Michael Wiebe writes:

I have several new replications written up on my site.

Moretti (2021) studies whether larger cities drive more innovation, but I find that the event study and instrumental variable results are due to coding errors. This means that the main OLS results should not be interpreted causally.

Atwood (2022) studies the long-term economic effects of the measles vaccine. I run an event study and find that the results are explained by trends, instead of a treatment effect of the vaccine.

I [Wiebe] am also launching a Patreon, so that I can work on replications full-time.

Interesting. We’ve discussed some of Wiebe’s investigations and questions in the past; see here, here, here, and here (on the topics of promotion in China, election forecasting, historical patents, and forking paths, respectively). So, good to hear that he’s still at it!

What’s the problem, “math snobs” or rich dudes who take themselves too seriously and are enabled in that by the news media?

Posted on January 21, 2024 9:00 AM by Andrew

Chris Barker, the chair of the Statistical Consulting Section of the American Statistical Association, writes:

I’m curious about your reaction/opinion to a Financial Times article I read today about Sam Bankman-Fried (“SBF,” charged with fraud in the loss of several billion in crypto) with pointless insults about mathematicians (“mathematical chauvinists,” “math snobs,” “mental arithmetic,” and what seems to be a claim that a “math snob” caused a decline in the UK economy). And disparagement of the potential use or abuse of Bayesian statistics. From the Financial Times article:

We must leave it to the criminal courts to decide the future of Sam Bankman-Fried. He denies the various charges against him. For now, I am less concerned with his specific doings than with his worldview, which is a sort of mathematical chauvinism. A theme in Michael Lewis’s new book about “SBF” is the subject’s mistrust of what cannot be quantified. Shakespeare’s supposed primacy in literature, for example. “What are the odds that the greatest writer was born in 1564?” SBF is quoted as asking, citing the billions of people who have been born since then, and the higher share of them who are educated. These are his “Bayesian priors”. I hope to never encounter a starker case of abstract reasoning getting in the way of practical observation.

He is, if nothing else, of his time. A year ago this weekend, Liz Truss, a maths snob who assailed colleagues with mental arithmetic questions, fell as UK premier, almost taking the economy with her. If we consider, too, the dark, Kremlin-partial end of finance bro politics, these are the most embarrassing times for maths chauvinists since Robert McNamara, who even looked geometric and dug America ever deeper into the pit of Vietnam on the back of data.

I defer to the lexicographers or the relevant experts as to whether this is the first time, or if not, for how long, these insults against mathematicians and statisticians have appeared in the media.

I replied that: Yes, the Financial Times article seems pretty bad to me, indeed just seems too stupid to deserve a response! Is the author of the column well-respected in Britain? I did some googling and he seems just like a generic hack political columnist.

Barker replied:

The demonization of math and statistics was disappointing. There is no way to ever know what the editors were thinking by permitting publication of that article, nor do I particularly care. In other areas of the internet that article might simply be called “click bait.”

He also points to this quote from “SBF”:

I could go on and on about the failings of Shakespeare and the constitution and Stradivarius violins, and at the bottom of this post I do, but really I shouldn’t need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote almost all of Europeans were busy farming, and very few people attended university; few people were even literate—probably as low as about ten million people. By contrast there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564? The Bayesian priors aren’t very favorable.

I agree with everyone else that this represents a misapplication of Bayesian methods but for a kind of subtle reason. The numerator/denominator thing is ok; the real problem is in the premise, which is that there’s something called “the greatest writer.” Was William Shakespeare a greater writer than Veronica Geng? How could you even answer such a question? And, mathematically, you can only apply Bayesian inference to a problem that is well defined.
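For readers who want the numerator/denominator arithmetic spelled out, here is a small Python sketch. It takes the two rough counts from the quote at face value (about ten million literate people in Shakespeare's time, about a billion now), ignores everyone in between, and assumes, purely for illustration, that every literate person is equally likely to be "the greatest writer." That equal-chance premise is exactly the ill-defined part; the code only shows that the division itself is unproblematic.

```python
# Rough counts taken from the quote; not independently verified.
literate_then = 10_000_000
literate_now = 1_000_000_000


def share_of_pool(group_size, *other_groups):
    """Fraction of the combined pool that belongs to the first group."""
    total = group_size + sum(other_groups)
    return group_size / total


# Assumption for illustration: a uniform prior over all literate people,
# so the prior for a cohort is just its share of the pool.
prior_then = share_of_pool(literate_then, literate_now)

print(f"Prior that the 'greatest writer' is in the earlier cohort: {prior_then:.4f}")
```

The result is a prior of about one percent for the earlier cohort, which is only as meaningful as the assumption that "greatest writer" is a well-defined quantity drawn uniformly from that pool.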

The general problem to me is not SBF’s asinine “Bayesian priors” quote—if it wasn’t that, he’d be wielding other mysterious power phrases such as “the subconscious” or “comparative advantage” or “quantum leap” or “inflection point” or whatever—, but rather the well-known phenomenon of rich people thinking they know what they’re talking about when they’re actually just making things up in some nonsensical way.

P.S. But, yeah, there is a history of stupid arguments being made with a Bayesian connection.

Regarding the use of “common sense” when evaluating research claims

Posted on January 20, 2024 9:38 AM by Andrew

I’ve often appealed to “common sense” or “face validity” when considering unusual research claims. For example, the statement that single women during certain times of the month were 20 percentage points more likely to support Barack Obama, or the claim that losing an election for governor increases politicians’ lifespan by 5-10 years on average, or the claim that a subliminal smiley face flashed on a computer screen causes large changes in people’s attitudes on immigration, or the claim that attractive parents are 36% more likely to have girl babies . . . these claims violated common sense. Or, to put it another way, they violated my general understanding of voting, health, political attitudes, and human reproduction.

I often appeal to common sense, but that doesn’t mean that I think common sense is always correct or that we should defer to common sense. Rather, common sense represents some approximation of a prior distribution or existing model of the world. When our inferences contradict our expectations, that is noteworthy (in a chapter 6 of BDA sort of way), and we want to address this. It could be that addressing this will result in a revision of “common sense.” That’s fine, but if we do decide that our common sense was mistaken, I think we should make that statement explicitly. What bothers me is when people report findings that contradict common sense and don’t address the revision in understanding that would be required to accept that.

In each of the above-cited examples (all discussed at various times on this blog), there was a much more convincing alternative explanation for the claimed results, given some mixture of statistical errors and selection bias (p-hacking or forking paths). That’s not to say the claims are wrong (Who knows?? All things are possible!), but it does tell us that we don’t need to abandon our prior understanding of these things. If we want to abandon our earlier common-sense views, that would be a choice to be made, an affirmative statement that those earlier views are held so weakly that they can be toppled by little if any statistical evidence.
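The idea that common sense acts as an approximate prior can be sketched with a normal-normal update in Python. The numbers below are hypothetical, chosen only to resemble a noisy study reporting a large effect; they are not taken from any of the studies mentioned above, and the normal prior is itself an assumption made for convenience.

```python
def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Posterior mean and sd for a normal prior combined with a normal likelihood."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / se**2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var**0.5


# Hypothetical inputs, in percentage points:
# "common sense" prior centered at 0 with sd 2,
# and a study estimate of 20 with standard error 8.
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=2.0, estimate=20.0, se=8.0)

print(f"posterior mean: {post_mean:.2f}, posterior sd: {post_sd:.2f}")
```

Under these assumed inputs the posterior mean stays close to the prior rather than the reported estimate. Accepting the reported effect at face value would amount to using a much wider prior, which is the explicit revision of common sense that the post asks researchers to state.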

P.S. Perhaps relevant is this recent article by Mark Whiting and Duncan Watts, “A framework for quantifying individual and collective common sense.”

Progress in 2023, Charles edition

Posted on January 19, 2024 3:00 PM by Charles Margossian

Following the examples of Andrew, Aki, and Jessica, and at Andrew’s request:

Published:

Variational Inference with Gaussian Score Matching. Neural Information Processing Systems.
(Chirag Modi, CM, Yuling Yao, Robert Gower, David Blei and Lawrence Saul)
The Shrinkage-Delinkage Trade-off: An Analysis of Factorized Gaussian Approximations for Variational Inference. Uncertainty in Artificial Intelligence (Oral).
(CM and Lawrence Saul)
Adaptive Tuning for Metropolis Adjusted Langevin Trajectories. Artificial Intelligence and Statistics.
(Lionel Riou-Durand, Pavel Sountsov, Jure Vogrinc, CM and Sam Power)

Unpublished:

For how many iterations should we run Markov chain Monte Carlo?
(CM and Andrew Gelman)
Amortized Variational Inference: When and Why?
(CM and Dave Blei)
Nested Rhat: Assessing the convergence of Markov chain Monte Carlo when running many short chains. (Revised).
(CM, Matthew Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari and Andrew Gelman)
General Adjoint-Differentiated Laplace approximation. Technical Report.
(CM)

This year, I also served on the Stan Governing Body, where my primary role was to help bring back the in-person StanCon. StanCon 2023 took place at Washington University in St. Louis, MO, and we got the ball rolling for the 2024 edition, which will be held at Oxford University in the UK.

It was also my privilege to be invited as an instructor at the Summer School on Advanced Bayesian Methods at KU Leuven, Belgium, and teach a 3-day course on Stan and Torsten, as well as teach workshops at StanCon 2023 and at the University at Buffalo.

The free will to repost

Posted on January 19, 2024 9:07 AM by Andrew

Jonathan “no Trump” Falk points to this press release and writes:

Scientist, after decades of study, concludes: We don’t have free will. Does that include the decision to write a book about free will?

PS … A quick mention of the replication crisis to silence some doubters.

PPS: I couldn’t help sending this to you.

I replied with a link to this post from last year, which in turn linked to a post by Kevin Mitchell, who wrote:

Gotta hand it to Sapolsky here . . . it’s quite ballsy to uber-confidently assert we do not have “the slightest scrap of agency” and then support that with one discredited social psych study after another…

Postdoc at Washington State University on law-enforcement statistics

Posted on January 18, 2024 5:49 PM by Andrew

This looks potentially important:

The Center for Interdisciplinary Statistical Education and Research (CISER) at Washington State University (WSU) is excited to announce that it has an opening for a Post-Doctoral Research Associate (statistical scientist) supporting a new state-wide public data project focused on law enforcement. The successful candidate will be part of a team of researchers whose mission is to modernize public safety data collection through standardization, automation, and evaluation. The project will actively involve law enforcement agencies, state and local policymakers, researchers, and the public in data exploration and discovery. This effort will be accomplished in part by offering education and training opportunities fostering community-focused policing and collaborative learning sessions. The statistical scientist in this role will develop comprehensive educational materials, workshops, online courses, and training manuals designed to equip and empower law enforcement agencies, state and local policymakers, researchers, and the public with data and statistical literacy skills that will enable them to maximize the utility of the data project.

Data, education, and policy. Interesting.

Storytelling and Scientific Understanding (my talks with Thomas Basbøll at Johns Hopkins on 26 Apr)

Posted on January 18, 2024 9:28 AM by Andrew

Storytelling and Scientific Understanding

Andrew Gelman and Thomas Basbøll

Storytelling is central to science, not just as a tool for broadcasting scientific findings to the outside world, but also as a way that we as scientists understand and evaluate theories. We argue that, for this purpose, a story should be anomalous and immutable; that is, it should be surprising, representing some aspect of reality that is not well explained by existing models of the world, and have details that stand up to scrutiny.

We consider how this idea illuminates some famous stories in social science involving soldiers in the Alps, Chinese boatmen, and trench warfare, and we show how it helps answer literary puzzles such as why Dickens had all those coincidences, why authors are often so surprised by what their characters come up with, and why the best alternative history stories have the feature that, in these stories, our “real world” ends up as the deeper truth. We also discuss connections to chatbots and human reasoning, stylized facts and puzzles in science, and the millionth digit of pi.

At the center of our framework is a paradox: learning from anomalies seems to contradict usual principles of science and statistics where we seek representative or unbiased samples. We resolve this paradox by placing learning-within-stories into a hypothetico-deductive (Popperian) framework, in which storytelling is a form of exploration of the implications of a hypothesis. This has direct implications for our work as a statistician and a writing coach.

Progress in 2023, Aki’s software edition

Posted on January 18, 2024 6:05 AM by Aki Vehtari

Andrew, Jessica, and I (and I hope we get more) listed papers for progress in 2023, but many papers would be much less useful without software, so I also list software I’m contributing to, with the most interesting improvements added in 2023 (in addition, there is always a huge amount of work that improves the software somehow but is not that visible).

Stan (including Stan math + Stan core + Stanc + CmdStan)

Laplace approximation (see a case study)
Jacobian adjustment for optimization (see a case study)
Tuples and tuple versions of functions
Pathfinder (Zhang et al., 2022)

posterior R package

support for discrete draws
summarise_draws accepts tibble::num arguments
nested-Rhat for many-short-chains (Margossian et al., 2022)
in github:
- Pareto-diagnostics and -smoothing (Vehtari et al., 2022)
- automatic thinning (Säilynoja et al., 2022)

loo R package

loo_predictive_metric (MAE, MSE, RMSE, ACC, BACC)
(scaled) continuously ranked probability score (Bolin and Wallin, 2022)
Mixture IS leave-one-out cross-validation vignette (Silva and Zanella, 2022)
in github:
- order statistic warning (McLatchie and Vehtari, 2023)

projpred R package

augmented-data projection to add support for more model families (Weber et al., 2023)
latent projection to add support for more model families (Catalina et al., 2021)
enhanced verbose output
improved user interface and summary tables

P.S. I was surprised that there were no major updates to priorsense R package or bayesplot R package in 2023, but there are some great usability improvements coming soon to both of these.

Bad stuff going down at the American Sociological Association

Posted on January 17, 2024 9:01 AM by Andrew

I knew the Association for Psychological Science, the American Psychological Association, the American Political Science Association, the American Statistical Association, and the National Academy of Sciences had problems. It turns out the American Sociological Association does some bad things too.

Philip Cohen has the story. It starts back in 2019, when the American Sociological Association, along with “many other paywall-dependent academic societies” (in Cohen’s words) sent an open letter to the president to oppose open science. Here’s Cohen:

At the time, there was a rumor that OSTP [the U.S. Office of Science and Technology Policy] would require agencies to make public the results of research funded by the federal government without a 12-month delay — the cherished “embargo” that allowed these associations to profit from delaying access to public knowledge . . .

They wrote: “We are writing to express our concerns about a possible change in federal policies that could significantly threaten a vibrant American scientific enterprise.” That is, by requiring free access to research, OSTP would threaten the “financial stability that enables us to support peer review that ensures the quality and integrity of the research enterprise.” If ASA lost their journal subscription profits, in other words, American science would die. “To take action to shorten the 12-month embargo… risks the continued international leadership for the U.S. scientific enterprise.”

Uh huh. I agree with Cohen that this is some combination of ridiculous and offensive. He continues:

Despite a petition signed by many ASA members, and a resolution from its own Committee on Publications “to express opposition to the decision by the ASA to sign the December 18, 2019 letter” — which the ASA leadership never even publicly acknowledged — ASA has not uttered a word to alter its anachronistic and unpopular position.

It’s starting to make me wonder if academic cartels sometimes act like . . . cartels?

Just to be clear, this does not seem to be a problem with academic sociology as a profession. As Cohen notes, the ASA’s own Committee on Publications opposed the ASA’s horrible recommendation to keep science closed.

Putting it all into perspective

We live in a world where political leaders start wars, companies and governments dump toxic waste, church leaders cover up child abuse, etc. In comparison, universities and academic societies faking statistics, rewarding plagiarism and other scientific misconduct, restricting data, and otherwise mucking up the process of scholarly inquiry . . . that barely registers on the scale of institutionalized evil.

So what is it that’s so irritating about academic institutions behaving badly? I can think of a few things:

1. I work in academia so I’m made aware of these issues and feel some bit of collective responsibility for them.

2. Academia is more open than much of business, government, and organized religion, so it’s easier for us to see the problems.

3. So much of the enabling of cheating in academia just seems so pointless. It’s not cool when companies pollute, but, hey, you can see the reason$ they’ll want to do so. But what does the American Sociological Association get out of fighting against open science, what does the University of California get out of tolerating research misconduct, what do the American Statistical Association and American Political Science Association get out of giving rewards for plagiarists? Nothing. That’s what’s so damn pitiful.

When Lysenko did his part to destroy Soviet agriculture, at least he personally got something out of it. These American Sociological Association etc dudes, they get nothing.

It’s really pitiful, when you think about it. These people aren’t evil, they’re pathetic.

Progress in 2023, Jessica Edition

Posted on January 16, 2024 2:10 PM by Jessica Hullman

Since Aki and Andrew are doing it…

Published:

Dongping Zhang, Jason Hartline, and Jessica Hullman (2024). Designing Shared Information Displays for Agents of Varying Strategic Sophistication. ACM Transactions on Computer-Supported Cooperative Work (CSCW).
Yifan Wu, Ziyang Guo, Michalis Mamakos, Jason Hartline, and Jessica Hullman (2023). The rational agent benchmark for data visualization. IEEE Transactions on Visualization & Computer Graphics (Proc. VIS ‘23).
Alex Kale, Ziyang Guo, Xiao Li Qiao, Jeff Heer, Jessica Hullman (2023). EVM: Incorporating model checking into exploratory visual analysis. IEEE Transactions on Visualization & Computer Graphics (Proc. VIS ‘23).
Hari Subramonyam and Jessica Hullman (2023). Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML Research. IEEE Transactions on Visualization & Computer Graphics (Proc. VIS ‘23).
Hyeok Kim, Ryan Rossi, Jessica Hullman, and Jane Hoffswell (2023). Dupo: A Mixed-Initiative Authoring Tool for Responsive Visualization. IEEE Transactions on Visualization & Computer Graphics (Proc. VIS ‘23).
Fumeng Yang, Mandi Cai, Chloe Mortenson, Hoda Fakhari, Ayse Deniz Lokmanoglu, Jessica Hullman, Steven Franconeri, Nicholas Diakopoulos, Erik Nisbet, and Matthew Kay (2023). Swaying the Public? Impacts of Election Forecast Visualizations on Emotion, Trust, and Intention in the 2022 US Midterms. IEEE Transactions on Visualization & Computer Graphics (Proc. VIS ‘23).
Andrew Gelman, Jessica Hullman, and Lauren Kennedy (2023). Causal quartets: Different ways to attain the same average treatment effect. American Statistician.
Priyanka Nanayakkara and Jessica Hullman (2023). What’s driving conflicts around differential privacy for the US Census. IEEE Security & Privacy.
Alex Kale, Sarah Lee, T.J. Goan, Beth Tipton, and Jessica Hullman (2023). MetaExplorer: Facilitating Reasoning with Epistemic Uncertainty in Meta-analysis. ACM Transactions on Computer-Human Interaction (Proc. CHI ‘23).
Abhraneel Sarma, Alex Kale, Michael Jongho Moon, Nathan Taback, Fanny Chevalier, Jessica Hullman, and Matthew Kay (2023). multiverse: Multiplexing alternative data analyses in R notebooks. ACM Transactions on Computer-Human Interaction (Proc. CHI ‘23).

Unpublished/Preprints:

Jake Hofman, Angelos Chatzimparmpas, Amit Sharma, Duncan Watts, and Jessica Hullman. Pre-registration for Predictive Modeling.
Jessica Hullman, Ari Holtzman, and Andrew Gelman. Artificial Intelligence and Aesthetic Judgment.
Sayash Kapoor, Emily Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, and Arvind Narayanan. Reforms: Reporting standards for machine learning based science.
Jessica Hullman. Some problems with zooming out as science reform (commentary on Almaatouq et al.).

Performed:

Andrew Gelman and Jessica Hullman. Recursion (a play). Performed by a cast of Northwestern University Theater students at ACM Conference on Fairness, Accountability, and Transparency in Artificial Intelligence (FAccT ’23). Chicago, IL.

If I had to choose a favorite (beyond the play, of course) it would be the rational agent benchmark paper, discussed here. But I also really like the causal quartets paper. The first aims to increase what we learn from experiments in empirical visualization and HCI through comparison to decision-theoretic benchmarks. The second aims to get people to think twice about what they’ve learned from an average treatment effect. Both have influenced what I’ve worked on since.

This post is not really about Aristotle.

Posted on January 16, 2024 9:56 AM by Andrew

I just read this magazine article by Nikhil Krishnan on the philosophy of Aristotle. As a former physics student, I’ve never had anything but disdain for that ancient philosopher, who’s famous for getting just about everything in physics wrong, as well as notoriously claiming that men had more teeth than women (or maybe it was the other way around). Dude just liked to make confident pronouncements. He also thought slavery was cool.

I guess that if Aristotle were around today, he’d be a long-form blogger.

That all said, Krishnan’s article was absolutely wonderful, and I can’t wait to read his forthcoming book. And I say this even though this article didn’t convince me one bit that Aristotle is worth reading! It did convince me that Krishnan is worth reading, which I guess is more relevant right now.

I’m not gonna go through and summarize the article here (it’s short, and you can follow the above link and read it online); instead, I’ll just point out some thoughts that it inspired in me. It’s the sign of a good work of literature that it makes us reflect.

1. Krishnan discusses AITA. One thing I’d never thought about before is the framing in terms of the asshole, the idea that in any dispute there is exactly one asshole. No room for honest misunderstandings or, from the other direction, a battle between two assholes. This kind of implies that the “asshole” trait is a property of the interaction, so that an unresolvable or unresolved conflict even between two wonderful caring people can inevitably lead to one of them becoming “the asshole.”

2. Krishnan writes:

[Aristotle] says, for instance, that people in politics who identify flourishing with honor can’t be right, for honor “seems to depend more on those who honor than on the one honored.” This has been dubbed the “Coriolanus paradox”: seekers of honor “tend to defeat themselves by making themselves dependent on those to whom they aim to be superior,” as Bernard Williams notes.

I’ve never heard of Bernard Williams, but I have heard of Coriolanus, and what really struck me about that bit is how much it reminded me of someone I know who absolutely loves honors; he lusts after them, I’d say, to the extent that this lust has warped his life. Nothing wrong with honors as long as they don’t cause that distortion. But that point about honor-seekers defeating themselves by making themselves dependent on those to whom they aim to be superior . . . That hits the nail on the head for this guy, his frustration at not being recognized by his perceived inferiors. I’d never thought about that angle before. (Please don’t try to guess this person’s name in the comments; that’s not the point here.)

3. Krishnan writes that his college instructor gave this recommendation: “How to read Aristotle? Slowly.” This reminds me of the principle that God is in every leaf of every tree. In short: Sure, read Aristotle very slowly and you’ll learn a lot, in the same way that if you put any bug under a microscope and look at it carefully, or if you sit in front of a painting for a few hours and really force yourself to stare, or if you hold a long conversation with anyone, you’ll learn a lot.

4. Krishnan writes:

There is such a thing as the difference between right and wrong. But reliably telling them apart takes experience, the company of wise friends, and the good luck of having been well brought up.

Well put. It also helps not to be hungry or in pain.

Progress in 2023, Aki Edition

Posted on January 15, 2024 9:05 AM by Aki Vehtari

Following Andrew, here is my (Aki’s) list of published papers and preprints in 2023 (20% together with Andrew)

Published

Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, and Jonathan H. Huggins (2023). Robust, Automated, and Accurate Black-box Variational Inference. Journal of Machine Learning Research, accepted for publication.
arXiv preprint arXiv:2203.15945.
Alex Cooper, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari (2023). Cross-validatory model selection for Bayesian autoregressions with exogenous regressors. Bayesian Analysis, accepted for publication.
arXiv preprint arXiv:2301.08276.
Noa Kallioinen, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari (2023). Detecting and diagnosing prior and likelihood sensitivity with power-scaling. Statistics and Computing, 34(57).
Online
arXiv preprint arXiv:2107.14054.
Supplementary code.
priorsense: R package
Martin Modrák, Angie H. Moon, Shinyoung Kim, Paul Bürkner, Niko Huurre, Kateřina Faltejsková, Andrew Gelman, and Aki Vehtari (2023). Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity. Bayesian Analysis, doi:10.1214/23-BA1404.
arXiv preprint arXiv:2211.02383.
Code
SBC R package
Erik Štrumbelj, Alexandre Bouchard-Côté, Jukka Corander, Andrew Gelman, Håvard Rue, Lawrence Murray, Henri Pesonen, Martyn Plummer, and Aki Vehtari (2023). Past, Present, and Future of Software for Bayesian Inference. Statistical Science, accepted for publication. preprint
Marta Kołczyńska, Paul-Christian Bürkner, Lauren Kennedy, and Aki Vehtari (2023). Trust in state institutions in Europe, 1989–2019. Survey Research Methods, accepted for publication.
SocArXiv preprint doi:10.31235/osf.io/3v5g7.
Juho Timonen, Nikolas Siccha, Ben Bales, Harri Lähdesmäki, and Aki Vehtari (2023). An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models. Stat, doi:10.1002/sta4.614.
arXiv preprint arXiv:2205.09059.
Petrus Mikkola, Osvaldo A. Martin, Suyog Chandramouli, Marcelo Hartmann, Oriol Abril Pla, Owen Thomas, Henri Pesonen, Jukka Corander, Aki Vehtari, Samuel Kaski, Paul-Christian Bürkner, Arto Klami (2023). Prior knowledge elicitation: The past, present, and future. Bayesian Analysis, doi:10.1214/23-BA1381.
arXiv preprint arXiv:2112.01380.
Peter Mikula, Oldřich Tomášek, Dušan Romportl, Timothy K. Aikins, Jorge E. Avendaño, Bukola D. A. Braimoh-Azaki, Adams Chaskda, Will Cresswell, Susan J. Cunningham, Svein Dale, Gabriela R. Favoretto, Kelvin S. Floyd, Hayley Glover, Tomáš Grim, Dominic A. W. Henry, Tomas Holmern, Martin Hromada, Soladoye B. Iwajomo, Amanda Lilleyman, Flora J. Magige, Rowan O. Martin, Marina F. de A. Maximiano, Eric D. Nana, Emmanuel Ncube, Henry Ndaimani, Emma Nelson, Johann H. van Niekerk, Carina Pienaar, Augusto J. Piratelli, Penny Pistorius, Anna Radkovic, Chevonne Reynolds, Eivin Røskaft, Griffin K. Shanungu, Paulo R. Siqueira, Tawanda Tarakini, Nattaly Tejeiro-Mahecha, Michelle L. Thompson, Wanyoike Wamiti, Mark Wilson, Donovan R. C. Tye, Nicholas D. Tye, Aki Vehtari, Piotr Tryjanowski, Michael A. Weston, Daniel T. Blumstein, and Tomáš Albrecht (2023). Bird tolerance to humans in open tropical ecosystems. Nature Communications, 14:2146. doi:10.1038/s41467-023-37936-5.
Gabriel Riutort-Mayol, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing, 33(17):1-28. doi:10.1007/s11222-022-10167-2.
arXiv preprint arXiv:2004.11408.

Pre-prints

Lauren Kennedy, Aki Vehtari, and Andrew Gelman (2023). Scoring multilevel regression and poststratification based population and subpopulation estimates. arXiv preprint arXiv:2312.06334.
Alex Cooper, Aki Vehtari, Catherine Forbes, Lauren Kennedy, and Dan Simpson (2023). Bayesian cross-validation by parallel Markov chain Monte Carlo. arXiv preprint arXiv:2310.07002.
Yann McLatchie and Aki Vehtari (2023). Efficient estimation and correction of selection-induced bias with order statistics. arXiv preprint arXiv:2309.03742.
Yann McLatchie, Sölvi Rögnvaldsson, Frank Weber, and Aki Vehtari (2023). Robust and efficient projection predictive inference. arXiv preprint arXiv:2306.15581.
Frank Weber, Änne Glass, and Aki Vehtari (2023). Projection predictive variable selection for discrete response families with finite support. arXiv preprint arXiv:2301.01660.

jd asked Andrew “which paper from 2023 do you like best?”, and I also find it difficult to choose one. I highlight two papers, but I’m proud of all of them!

“Detecting and diagnosing prior and likelihood sensitivity with power-scaling” is based on an idea that had been on my todo list for a very long time, and seeing that it works so well and can have practical software implementation was really nice.

In “Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming” we didn’t come up with a new GP approximation, but we were able to develop simple diagnostics to tell whether we have enough basis functions. I just love when diagnostics can answer frequently asked questions like “How do I choose the number of basis functions?”

A feedback loop can destroy correlation: This idea comes up in many places.

Posted on January 15, 2024 9:02 AM by Andrew

The people who go by “Slime Mold Time Mold” write:

Some people have noted that not only does correlation not imply causality, no correlation also doesn’t imply no causality. Two variables can be causally linked without having an observable correlation. Two examples of people noting this previously are Nick Rowe offering the example of Milton Friedman’s thermostat and Scott Cunningham’s Do Not Confuse Correlation with Causality chapter in Causal Inference: The Mixtape.

We realized that this should be true for any control system or negative feedback loop. As long as the control of a variable is sufficiently effective, that variable won’t be correlated with the variables causally prior to it. We wrote a short blog post exploring this idea if you want to take a closer look. It appears to us that in any sufficiently effective control system, causally linked variables won’t be correlated. This puts some limitations on using correlational techniques to study anything that involves control systems, like the economy, or the human body. The stronger version of this observation, that the only case where causally linked variables aren’t correlated is when they are linked together as part of a control system, may also be true.

Our question for you is, has anyone else made this observation? Is it recognized within statistics? (Maybe this is all implied by Peston’s 1972 “The Correlation between Targets and Instruments”? But that paper seems totally focused on economics and has only 14 citations. And the two examples we give above are both economists.) If not, is it worth trying to give this some kind of formal treatment or taking other steps to bring this to people’s attention, and if so, what would those steps look like?
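The claim in the note above can be checked with a small simulation. What follows is only a sketch under made-up assumptions, not anything from the linked post: a toy thermostat in which the indoor temperature deviation y is the sum of an outside disturbance x and a furnace output u, the disturbance drifts slowly (the `persistence` parameter), and the controller reacts to last period's error (the `gain` parameter). All names and numbers are hypothetical. Shuffling the same furnace outputs breaks the feedback link and gives an open-loop comparison.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def simulate(n=50_000, persistence=0.99, gain=1.0, seed=1):
    """Toy thermostat (hypothetical): y = x + u, with u set by feedback on past y."""
    rng = random.Random(seed)
    x_prev = u_prev = y_prev = 0.0
    xs, us, ys = [], [], []
    for _ in range(n):
        # Outside disturbance: a slowly drifting AR(1) process.
        x = persistence * x_prev + rng.gauss(0.0, 1.0)
        # Integral feedback: adjust the furnace by last period's deviation.
        u = u_prev - gain * y_prev
        # Both x and u causally move the indoor deviation y.
        y = x + u
        xs.append(x)
        us.append(u)
        ys.append(y)
        x_prev, u_prev, y_prev = x, u, y
    return xs, us, ys


xs, us, ys = simulate()
corr_u_y_closed = pearson(us, ys)   # furnace vs. indoor temperature, with feedback
corr_x_y_closed = pearson(xs, ys)   # outside vs. indoor temperature, with feedback
corr_x_u_closed = pearson(xs, us)   # the controller mirrors the disturbance

# Open-loop comparison: the same furnace outputs, shuffled so they no longer
# respond to the indoor temperature.
us_shuffled = us[:]
random.Random(2).shuffle(us_shuffled)
ys_open = [x + u for x, u in zip(xs, us_shuffled)]
corr_u_y_open = pearson(us_shuffled, ys_open)

print(f"closed loop: corr(u, y) = {corr_u_y_closed:.2f}, corr(x, y) = {corr_x_y_closed:.2f}")
print(f"closed loop: corr(x, u) = {corr_x_u_closed:.2f}")
print(f"open loop:   corr(u, y) = {corr_u_y_open:.2f}")
```

In this setup the furnace output has a direct causal effect on the indoor temperature in both cases, but only the open-loop version shows a sizable correlation between the two. The leftover closed-loop correlation shrinks as the disturbance gets slower relative to the controller, which is one way to read "sufficiently effective."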

My response: Yes, this has come up before. It’s a subtle point, as can be seen in some of the confused comments to this post. In that example, the person who brought up the feedback-destroys-correlation example was economist Rachael Meager, and it was a psychologist, a law professor, and some dude who describes himself as “a professor, writer and keynote speaker specializing in the quality of strategic thinking and the design of decision processes” who missed the point. So it’s interesting that you brought up an example of feedback from the economics literature.

Also, as I like to say, correlation does not even imply correlation.

The point you are making about feedback is related to the idea that, at equilibrium in an idealized setting, price elasticity of demand should be -1, because if it’s higher or lower than that, it would make sense to alter the price accordingly and slide up or down that curve to maximize total $.
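To spell out that elasticity argument, here is a short derivation under the idealized assumption that the seller picks the price p to maximize revenue R(p) = p q(p), ignoring costs, with q(p) a differentiable demand curve. The symbols R, q, and ε are notation introduced here for illustration.

```latex
\begin{align*}
R(p) &= p\, q(p), \\
\frac{dR}{dp} &= q(p) + p\, q'(p) = q(p)\,\bigl(1 + \varepsilon(p)\bigr),
\qquad \varepsilon(p) = \frac{p}{q(p)}\,\frac{dq}{dp}, \\
\frac{dR}{dp} = 0 \;&\Longrightarrow\; \varepsilon(p^{*}) = -1 .
\end{align*}
```

If ε < -1, then dR/dp < 0 and lowering the price raises revenue; if -1 < ε < 0, raising the price does. Either way the seller keeps adjusting until the elasticity sits at -1, which is the feedback at work.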

I’m not up on all this literature; it’s the kind of thing that people were writing about a lot back in the 1950s related to cybernetics. It’s also related to the idea that clinical trials exist on a phase transition where the new treatment exists but has not yet been determined to be better or worse than the old. This is sometimes referred to as “equipoise,” which I consider to be a very sloppy concept.

The other thing is that everybody knows how correlations can be changed by selection (Simpson’s paradox, the example of high school grades and SAT scores among students who attend a moderately selective institution, those holes in the airplane wings, etc etc.). Knowing about one mechanism for correlations to be distorted can perhaps make people less attuned to other mechanisms such as the feedback thing.
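The grades-and-SAT example can be sketched in a few lines. This is a toy setup with invented numbers: grades and SAT scores share a common "ability" component, and a moderately selective school enrolls students whose combined score falls in a middle band (stronger applicants go elsewhere, weaker ones are not admitted). The band limits `LOW` and `HIGH` and all other names are assumptions for illustration.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


rng = random.Random(4)
population = []
for _ in range(50_000):
    ability = rng.gauss(0.0, 1.0)            # shared component
    grades = ability + rng.gauss(0.0, 1.0)   # standardized high school grades
    sat = ability + rng.gauss(0.0, 1.0)      # standardized SAT score
    population.append((grades, sat))

# Hypothetical admission band on the combined score.
LOW, HIGH = 0.5, 1.5
enrolled = [(g, s) for g, s in population if LOW < g + s < HIGH]

corr_population = pearson([g for g, _ in population], [s for _, s in population])
corr_enrolled = pearson([g for g, _ in enrolled], [s for _, s in enrolled])

print(f"all students:      corr(grades, SAT) = {corr_population:.2f}")
print(f"enrolled students: corr(grades, SAT) = {corr_enrolled:.2f}")
```

Within the band, a student with high grades must have a lowish SAT score to land there at all, so the correlation among enrolled students turns negative even though it is positive in the full population.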

So, yeah, a lot going on here.

“Theoretical statistics is the theory of applied statistics”: A scheduled conference on the topic

Posted on January 14, 2024 4:57 PM by Andrew

Ron Kenett writes:

We are planning a conference on 11/4 that might be of interest to your blog followers.

It is a hybrid format event on the foundations of applied statistics. Discussion inputs will be most welcome.

The registration link and other information are here.

I think that “11/4” refers to 11 Apr 2024; if not, I guess that someone will correct me in comments.

Kenett’s paper on the theory of applied statistics reminds me of my dictum that theoretical statistics is the theory of applied statistics. For an example of how this principle can inform both theory and applications, see this comment at the linked post:

There are lots of ways of summarizing a statistical analysis, and it’s good to have a sense of how the assumptions map to the conclusions. My problem with the paper [on early-childhood intervention; see pages 17-18 of this paper here for background] was that they presented a point estimate of an effect size magnitude (42% earnings improvement from early childhood intervention) which, if viewed classically, is positively biased (type M error) and, if viewed Bayesianly, corresponds to a substantively implausible prior distribution in which an effect of 84% is as probable as an effect of 0%.

If we want to look at the problem classically, I think researchers who use biased estimates should (i) recognize the bias, and (ii) attempt to adjust for it. Adjusting for the bias requires some assumption about plausible effect sizes; that’s just the way things are, so make the assumption and be clear about what assumption you’re making.

If we want to look at the problem Bayesianly, I think researchers should have to justify all aspects of their model, including their prior distribution. Sometimes the justification is along the lines of, “This part of the model doesn’t materially impact the final conclusions so we can go sloppy here,” which can be fine, but it doesn’t apply in a case like this where the flat prior is really driving the headline estimate.

The point is that theoretical concepts such as “unbiased estimation” or “prior distribution” don’t exist in a vacuum; they are theoretically relevant to the extent that they connect to applied practice.
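The classical half of that comment, the type M (magnitude) error, is easy to see in simulation. The sketch below uses made-up numbers, not the ones from the early-childhood paper: a hypothetical true effect of 0.10 estimated with a standard error of 0.20. It keeps only the estimates that clear the usual 1.96-standard-error threshold and asks how much they overstate the true effect on average.

```python
import random


def type_m_simulation(true_effect=0.10, std_error=0.20, n_sims=100_000, seed=3):
    """Return (power, exaggeration ratio) for a normally distributed estimate.

    The default true_effect and std_error are hypothetical values chosen
    for illustration only.
    """
    rng = random.Random(seed)
    threshold = 1.96 * std_error  # two-sided 5% significance cutoff
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > threshold:
            significant.append(estimate)
    power = len(significant) / n_sims
    # Average magnitude of the significant estimates, relative to the truth.
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    return power, mean_abs / true_effect


power, exaggeration = type_m_simulation()
print(f"share of estimates reaching significance: {power:.3f}")
print(f"exaggeration ratio among significant estimates: {exaggeration:.1f}")
```

Under these assumptions, any estimate that reaches significance has magnitude above 1.96 × 0.20 = 0.392, so a published significant estimate overstates the assumed true effect by nearly a factor of four at minimum. Adjusting for that requires an assumption about plausible effect sizes, which is the point of the comment.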

I assume that such issues will be discussed at the conference.

“And while I don’t really want a back-and-forth . . .”

Posted on January 14, 2024 9:18 AM by Andrew

A few months ago we had an interesting discussion about evaluation of pollsters, following up on some thoughts of Elliott Morris and Nate Silver, two analysts I respect and with whom I’ve collaborated (on separate occasions). In recent years I’ve become annoyed with Nate from time to time, but, hey, nobody’s perfect and I still think he’s generally a reasonable person.

I had a new feeling of frustration, though, when in one of his recent posts involving the pollsters, Nate wrote, “So take that as a signal that I don’t intend this a back-and-forth.” And then, more recently, in the context of a completely different dispute with someone else, Nate wrote, “And while I don’t really want a back-and-forth . . .”

I get it that everyone’s busy and you don’t have time to respond to every argument that comes across your desk, but . . . back-and-forths are good, no?

Some googling turned up this quote from 2012, which I agree with:

Silver tells TechCrunch that intelligent prediction is messy, biased, and iterative — all the characteristics that don’t lend themselves to grand pronouncements in 30-second soundbites. Blogs, instead, lend themselves to an honest back-and-forth about the sausage of statistical conclusions, which can, hopefully, create a more respected class of experts and a more informed public. . . .

The “iterative” thing is good too. In complicated problems, our methods are always flawed. So, wherever we happen to be right now, we should welcome criticism and opportunities for improvement.

This is a point that I and others have made over the years, for example:

Blogging also has the benefit that the discussion can go back and forth. In contrast, the journal reviewing process is very slow, and once an article is published, it typically just sits there. . . .

This was before twitter, which has the different problem that most of the volume of posts is people cheering or booing. See enough of that and you too will want to cut short all the back-and-forth.

What happened to Nate between 2012, when he explicitly talked about the benefits of “an honest back-and-forth about the sausage of statistical conclusions,” and 2020, when he avoided discussions about problems with our forecasting models, and 2023, when he “doesn’t really want a back-and-forth”?

I don’t have any specific information regarding Nate, but in any case I’m more interested in the general phenomenon of when it is that public figures want to engage in back-and-forths and when they don’t.

My theory is that if you’re a pundit and you become famous, you attract lots and lots of stupid criticism. This happened to Paul Krugman, it happened to David Brooks, and it happened to Nate Silver too. You get lots of uninformed people attacking you because they misunderstand what you’re doing or you were pooh-poohing their favorite JFK-assassination theory or UFO’s-as-space-aliens theory or you’re not taking their exact political position on the issue of the day or whatever. And so you develop . . . not a thick skin, exactly, but an acceptance that you have neither the time nor the interest to respond to all the uninformed and possibly insincere criticism that you’re receiving.

At some point you realize that you’re piloting a submarine through a poop-filled sea and you can’t spend the rest of your life trying to keep the hull clean.

At the same time, you’re getting some thoughtful criticism! Some of it is framed in a very deferential way to you, some of it is direct and polite, and some of it is downright rude but still thoughtful. But it doesn’t matter: you’ve already turned off your reply instinct. So even when you feel forced to reply (for example, person A criticizes something you said and then B, C, D, E, F, etc. join in the twitter thread and ask what is your response), you do so reluctantly, with annoyance, and you reiterate that you “don’t really want a back-and-forth.”

In summary, the “I don’t really want a back-and-forth” attitude makes me sad, but I think I understand where it is coming from. And if people don’t want that sort of discussion, that’s their choice.