This is Jessica. This winter I’m teaching a new graduate seminar on prediction for decision-making, intended primarily for Computer Science Ph.D. students. The goal of the course is to consider various perspectives on what it means to predict for the purpose of decision-making. We’ll look at this question in two contexts: predictive modeling, whether for automated decisions or to inform expert decisions, and causal estimation to inform policy. I’m trying to include a mix of theoretical and applied papers, with an emphasis on philosophical and ethical challenges to evaluating decision-making and applying formal methods in practice, especially in contexts where human experts currently make decisions and/or the decisions involve people. Technically the course title is Prediction for Decision-making. But one of the motivations is that we have yet to adequately address the gap between conventional machine learning, where we optimize loss over aggregates, and the needs of human decision-makers in practice, where we often care about doing right by individual cases. Hence the reference to “individualized.”
Suggestions welcome if this is your cup of tea and you think I missed something important. A few of the listed papers are already coming from pointers I’ve gotten from readers here. I’m especially interested in papers that help illustrate the gaps in current methods when it comes to good individual decisions.
Course Schedule
Week 1 – Introduction and background on statistical decision rules
Background: Statistical decision theory, randomized controlled trials
- Berger, J. O. (2013). Statistical decision theory and Bayesian analysis. Springer Science & Business Media. Chapter 1.
- Hernan, M. A., & Robins, J. M. (2023). Causal inference: what if. CRC Press. Chapters 1, 2.
Examples
- Tarabichi, Y., Cheng, A., Bar-Shain, D., McCrate, B. M., Reese, L. H., Emerman, C., … & Hecker, M. T. (2022). Improving timeliness of antibiotic administration using a provider and pharmacist facing sepsis early warning system in the emergency department setting: a randomized controlled quality improvement initiative. Critical care medicine, 50(3), 418-427.
- Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
- Widner, K., Virmani, S., Krause, J., Nayar, J., Tiwari, R., Pedersen, E. R., … & Webster, D. R. (2023). Lessons learned from translating AI from development to deployment in healthcare. Nature Medicine, 29(6), 1304-1306.
- Kawakami, A., Sivaraman, V., Cheng, H. F., Stapleton, L., Cheng, Y., Qing, D., … & Holstein, K. (2022). Improving human-AI partnerships in child welfare: understanding worker practices, challenges, and desires for algorithmic decision support. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-18).
- Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1).
Week 2 – Prediction versus decision-making
- Fernández-Loría, C., & Provost, F. (2022). Causal decision making and causal effect estimation are not the same… and why it matters. INFORMS Journal on Data Science, 1(1), 4-16.
- Mitzenmacher, M., & Vassilvitskii, S. (2022). Algorithms with predictions. Communications of the ACM, 65(7), 33-35.
- Liu, L., Barocas, S., Kleinberg, J., and Levy, K. (2024). On the actionability of outcome prediction. Proceedings of the AAAI Conference on Artificial Intelligence 38 (20).
Optional
- Perdomo, J. C. (2024). The Relative Value of Prediction in Algorithmic Decision Making.
- Elmachtoub, A. N., & Grigas, P. (2022). Smart “predict, then optimize”. Management Science, 68(1), 9-26.
- Liu, L. T., Wang, S., Britton, T., & Abebe, R. (2023). Reimagining the machine learning life cycle to improve educational outcomes of students. Proceedings of the National Academy of Sciences, 120(9), e2204781120.
Week 3 – Human versus statistical judgment
- Meehl, P. Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
- Felin, T., & Holweg, M. (2024). Theory Is All You Need: AI, Human Cognition, and Causal Reasoning. Strategy Science.
Optional
- Spengler, P. M. (2013). Clinical versus mechanical prediction. Handbook of psychology: Assessment psychology, 26-49.
- Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: a meta-analysis. Psychological assessment, 12(1), 19.
- Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., … & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The counseling psychologist, 34(3), 341-382.
- Colunga-Lozano, L. E., Foroutan, F., Rayner, D., De Luca, C., Hernández-Wolters, B., Couban, R., … & Guyatt, G. (2024). Clinical judgment shows similar and sometimes superior discrimination compared to prognostic clinical prediction models: a systematic review. Journal of Clinical Epidemiology, 165, 111200.
- Razzaki, S., Baker, A., Perov, Y., Middleton, K., Baxter, J., Mullarkey, D., … & Johri, S. (2018). A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis.
- Boone, C. (2024). Discretion in Clinical Decision Making: Evidence from Bunching.
- Kawakami, A., Sivaraman, V., Stapleton, L., Cheng, H. F., Perer, A., Wu, Z. S., … & Holstein, K. (2022, June). “Why Do I Care What’s Similar?” Probing Challenges in AI-Assisted Child Welfare Decision-Making through Worker-AI Interface Design Concepts. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (pp. 454-470).
Week 4 – Evaluating (individual) predictions and decisions
- Dawid, P. (2017). On Individual Risk.
- Selbst, A. (2019). Negligence and AI’s Human Users.
- Wang, A., Kapoor, S., Barocas, S., & Narayanan, A. (2024). Against predictive optimization: On the legitimacy of decision-making algorithms that optimize predictive accuracy. ACM Journal on Responsible Computing, 1(1), 1-45.
Optional
- van Royen, F. S., Moons, K. G., Geersing, G. J., & van Smeden, M. (2022). Developing, validating, updating and judging the impact of prognostic models for respiratory diseases. European Respiratory Journal, 60(3).
- Ben-Michael, E., Greiner, D. J., Huang, M., Imai, K., Jiang, Z., & Shin, S. (2024). Does AI help humans make better decisions? A methodological framework for experimental evaluation. arXiv preprint arXiv:2403.12108.
- Coston, A., Kawakami, A., Zhu, H., Holstein, K., & Heidari, H. (2023). A validity perspective on evaluating the justified use of data-driven decision-making algorithms. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 690-704). IEEE.
- Karusala, N., Upadhyay, S., Veeraraghavan, R., & Gajos, K. Z. (2024). Understanding Contestability on the Margins: Implications for the Design of Algorithmic Decision-making in Public Services. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-16).
Week 5 – Data shifts and causality
- Subbaswamy, A., & Saria, S. (2020). From development to deployment: Dataset shift, causality, and shift-stable models in health AI. Biostatistics, 21(2), 345-352.
- Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5), 947-1012.
- Mendler-Dünner, C., Ding, F., & Wang, Y. (2022). Anticipating performativity by predicting from predictions. Advances in Neural Information Processing Systems, 35, 31171-31185.
Optional
- Wald, Y., Feder, A., Greenfeld, D., & Shalit, U. (2021). On calibration and out-of-domain generalization. Advances in neural information processing systems, 34, 2215-2227.
- Guo, L. L., Pfohl, S. R., Fries, J., Johnson, A. E., Posada, J., Aftandilian, C., … & Sung, L. (2022). Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Scientific reports, 12(1), 2726.
- Guerdan, L., Coston, A., Holstein, K., & Wu, Z. S. (2023). Counterfactual prediction under outcome measurement error. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23) (pp. 1584-1598). https://doi.org/10.1145/3593013.3594101
- Van Parys, B. P., Esfahani, P. M., & Kuhn, D. (2021). From data to decisions: Distributionally robust optimization is optimal. Management Science, 67(6), 3387-3402.
Week 6 – Personalization and fairness
- Shalit, U. (2020). Can we learn individual-level treatment policies from clinical data? Biostatistics, 21(2), 359-362.
- Curth, A., Peck, R. W., McKinney, E., Weatherall, J., & van Der Schaar, M. (2024). Using machine learning to individualize treatment effect estimation: Challenges and opportunities. Clinical Pharmacology & Therapeutics, 115(4), 710-719.
- Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., & Weinberger, K. Q. (2017). On fairness and calibration. Advances in neural information processing systems, 30.
Optional
- Hedges, L. (2024). Chapter 6: Planning Experimental Designs. Unpublished manuscript.
- Suriyakumar, V. M., Ghassemi, M., & Ustun, B. (2023). When personalization harms performance: reconsidering the use of group attributes in prediction. In International Conference on Machine Learning. PMLR.
Week 7 – Calibration for decision-making
- Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(2), 243-268.
- Hébert-Johnson, U., Kim, M., Reingold, O., & Rothblum, G. (2018, July). Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning (pp. 1939-1948). PMLR.
- Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., & Wieder, U. (2021). Omnipredictors. arXiv preprint arXiv:2109.05389.
Optional
- Dawid, P. (1982). The well-calibrated Bayesian. Journal of the American Statistical Association.
- Dwork, C., Kim, M. P., Reingold, O., Rothblum, G. N., & Yona, G. (2021). Outcome indistinguishability. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (pp. 1095-1108)
- Gopalan, P., Hu, L., Kim, M. P., Reingold, O., & Wieder, U. (2022). Loss minimization through the lens of outcome indistinguishability. arXiv preprint arXiv:2210.08649.
- Van Calster, B., Nieboer, D., Vergouwe, Y., De Cock, B., Pencina, M. J., & Steyerberg, E. W. (2016). A calibration hierarchy for risk models was defined: from utopia to empirical data. Journal of clinical epidemiology, 74, 167-176.
- Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2022). Conformal risk control.
Week 8 – Communicating prediction uncertainty
- Cortes-Gomez, S., Patiño, C., Byun, Y., Wu, S., Horvitz, E., & Wilder, B. (2024). Decision-Focused Uncertainty Quantification. arXiv preprint arXiv:2410.01767.
- Corvelo Benz, N., & Rodriguez, M. (2024). Human-aligned calibration for ai-assisted decision making. Advances in Neural Information Processing Systems, 36.
- Marx, C., Calmon, F., & Ustun, B. (2020). Predictive multiplicity in classification. International Conference on Machine Learning. PMLR.
Optional
- Zhang, D., Chatzimparmpas, A., Kamali, N., & Hullman, J. (2024). Evaluating the utility of conformal prediction sets for ai-advised image labeling. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 1-19).
Week 9 – Designing human-AI workflows
- Guo, Z., Wu, Y., Hartline, J. D., & Hullman, J. (2024). A Decision Theoretic Framework for Measuring AI Reliance. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 221-236).
- Alur, R., Laine, L., Li, D. K., Shung, D., Raghavan, M., & Shah, D. (2024). Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework. arXiv preprint arXiv:2410.08783.
- Collina, N., Goel, S., Gupta, V., & Roth, A. (2024). Tractable Agreement Protocols. arXiv preprint arXiv:2411.19791.
Optional
- Punzi, C., Pellungrini, R., Setzu, M., Giannotti, F., & Pedreschi, D. (2024). AI, Meet Human: Learning paradigms for hybrid decision making systems. arXiv preprint arXiv:2402.06287.
- Mozannar, H., Lang, H., Wei, D., Sattigeri, P., Das, S., & Sontag, D. (2023). Who should predict? exact algorithms for learning to defer to humans. In International conference on artificial intelligence and statistics (pp. 10520-10545). PMLR.
- Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J., & Parkes, D. (2021, July). Learning representations by humans, for humans. In International conference on machine learning (pp. 4227-4238). PMLR
- Karimi, A. H., Muandet, K., Kornblith, S., Schölkopf, B., & Kim, B. (2022). On the relationship between explanation and prediction: A causal view. arXiv preprint arXiv:2212.06925.
- Fok, R., & Weld, D. S. (2024). In search of verifiability: Explanations rarely enable complementary performance in AI‐advised decision making. AI Magazine, 45(3), 317-332.
- Buçinca, Z., Swaroop, S., Paluch, A. E., Doshi-Velez, F., & Gajos, K. Z. (2024). Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills. arXiv preprint arXiv:2410.04253.
What an excellent list of references, most of which I was not familiar with but which are directly relevant to things I think about a lot. I will offer one observation about many of these sources: they are unnecessarily laden with technical jargon, in my opinion. There are a number of examples of what I consider academic writing styles that adhere to our academic conventions but make the substantive issues harder to understand. I think this applies especially to the calibration readings (which have been hard for me to digest on this blog as well). There is a lot on this list! Too much for the time I have! But thanks for putting it together and providing much reading for the coming months.
I agree! Glad you appreciate the list.
This looks like fun and there is always more to add. My main concern would be that it’s a lot of technical material per week. If these aren’t stats or very dedicated ML Ph.D. students, this is going to be rough.
I would want to talk about Pareto optimality as it relates to decision making for groups and individuals. And then something like the trolley problem, which clearly highlights that these problems don’t have a “solution”. Then I’d want to talk about Arrow’s impossibility theorem on voting for decision making. But just this topic is the basis of a whole semester-long class, so I don’t know how you’re going to balance all this material—good luck!
I’d be interested to know any references about Arrow’s impossibility theorem in the context of decision making. I am only vaguely familiar with the theorem from reading several of Donald Saari’s works about a decade ago. I came away with the impression that it was more molehill than mountain, but I don’t remember any specifics.
It should also be mentioned that Arrow’s theorem requires that you vote based on ranks. It doesn’t apply if you’re voting based on cardinality/utility estimates. Of course other “perverse/strategic voting theorems” apply perhaps. Everything has some kind of strategic voting possibility. But Arrow’s theorem is a molehill for sure. Unfortunately we can’t seem to get people to understand that score voting is the one true path ;-)
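To make the rank-versus-score distinction concrete, here is a minimal sketch in Python. It uses the Borda count as one example of a rank-based rule (my choice for illustration; the comment above does not name a specific rule) and made-up ballots for three hypothetical candidates A, B, and C. In the ranked version, two voters move C from above A to below A without changing how they rank A relative to B, and the A-versus-B outcome flips. That is a failure of the independence condition in Arrow’s theorem. In the score version, the A and B totals do not depend on C’s scores at all, under the strong assumption that voters do not rescale their scores when the field of candidates changes.

```python
from collections import defaultdict


def borda_totals(ballots):
    """Borda count. ballots is a list of (voter_count, ranking), best first."""
    totals = defaultdict(int)
    for count, ranking in ballots:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Top rank earns n-1 points, last rank earns 0.
            totals[candidate] += count * (n - 1 - position)
    return dict(totals)


def score_totals(ballots):
    """Score voting. ballots is a list of (voter_count, {candidate: score})."""
    totals = defaultdict(int)
    for count, scores in ballots:
        for candidate, score in scores.items():
            totals[candidate] += count * score
    return dict(totals)


# Hypothetical ranked ballots. Only C's position changes between the two
# profiles; every voter keeps the same preference between A and B.
ranked_before = [(3, ["A", "B", "C"]), (2, ["B", "C", "A"])]
ranked_after = [(3, ["A", "B", "C"]), (2, ["B", "A", "C"])]

# Hypothetical score ballots (0-10 scale). Only C's score changes.
scored_before = [(3, {"A": 10, "B": 6, "C": 0}), (2, {"A": 0, "B": 10, "C": 5})]
scored_after = [(3, {"A": 10, "B": 6, "C": 0}), (2, {"A": 0, "B": 10, "C": 0})]

if __name__ == "__main__":
    print("Borda before:", borda_totals(ranked_before))
    print("Borda after: ", borda_totals(ranked_after))
    print("Score before:", score_totals(scored_before))
    print("Score after: ", score_totals(scored_after))
```

This says nothing about strategic voting, which the comment notes is a separate issue that affects every rule, including score voting.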
Daniel,
Yes, everything has some kind of strategic voting possibility. Also, the rules of the election can affect who decides to run and how many serious candidates there will be. So naive analyses of voting paradoxes based on a static set of candidates can go wrong.
You’re right that it will be a challenge given the material. I usually assign groups of 3 or so students to cover each week, and it’s a big chunk of their grade, so the presentations are expected to be technically correct and well prepared. This year I plan to do practice sessions with them to make sure the presentations hit all the important points. In the past in courses like this I have sometimes taken on some of the technical material (and the important points from the optional material) to help out. I’m ok with students not grasping everything if they are motivated to try to keep up and sincerely interested in the material. I prefer to have some diversity of perspectives in a course like this.
On Arrow’s impossibility theorem – not including anything on computational social choice is probably an oversight, particularly given the interest in ML right now around “pluralistic alignment”/social choice for RLHF.
Your mention of Pareto optimality reminds me of a fairness paper I may add: https://dl.acm.org/doi/pdf/10.1145/3490486.3538237
Jessica,
If you want to talk about fairness, I kinda like my 2002 paper, Voting, fairness, and political representation (with discussion). Chance 15 (3), 22-26.
I like this particular paper because it’s simple and it demystifies some fairness issues that I sometimes think are presented in too technical a manner.
(I keep referring to my own work not because it’s the best but because I’m most familiar with it, so when any topic comes up, some article of mine will come to mind.)
– For week 1 on decision analysis, I recommend chapter 9 of BDA3. Or, for a softer read, this 2015 article with Phil Price, Centralized analysis of local data, with dollars and lives on the line: Lessons from the home radon experience. In Data Science for Politics, Policy and Government, ed. R. Michael Alvarez. Cambridge University Press. I prefer these real examples to the abstract discussions of mean squared error etc. in Berger’s book.
– For week 1 on causal inference, I recommend my 2011 article, Causality and statistical learning. American Journal of Sociology 117, 955-966, not because it offers any deep theory (it doesn’t!) but because it discusses several different perspectives on causal inference.
– For week 5 on generalization, I recommend chapter 19 of Regression and Other Stories and this 2021 article with Lauren Kennedy, Know your population and know your model: Using model-based regression and poststratification to generalize findings beyond the observed sample. Psychological Methods 26, 547-558.
– For week 8 on communicating predictive uncertainty, I recommend our article! from 2020, Information, incentives, and goals in election forecasts. Judgment and Decision Making 15, 863-880.
Thanks, these are all good!
This list already looks great! Here is some relevant writing that might be interesting.
For a history (of science) perspective on how formalized decision policies are put to work in social practice, I really enjoyed Lorraine Daston’s book “Rules: A Short History of What We Live By”. It focuses on how we have historically approached two tasks across a range of domains: first generalizing from particulars (cases) to the universal (the rule/policy), and second applying the universal (rule/policy) to the particular (individual decision). The book really gets at how the application of formal procedures inevitably runs into improvisation (or what we sometimes call judgment and intuition). Street-Level Algorithms (https://hci.stanford.edu/publications/2019/streetlevelalgorithms/streetlevelalgorithms-chi2019.pdf) also talks about the same improvisation in application. Lorraine’s work highlights how for a long time improvisation was thought to be necessary (rules or policies were thought to work best when *someone* is tasked with applying them, rather than being applied automatically), but over time the concepts of “judgement” and “discernment” have shifted from being desirable (accommodating the uniqueness of each decision) to undesirable (introducing human biases) in decision-making processes. Street-Level Algorithms talks about how the improvisation can be accounted for (e.g. by provisioning recourse).
For philosophical perspectives on individual decisions, I like Ruth Chang’s work on the limits of practical reason (such as expected utility calculus or cost-benefit analysis). This is an example: https://philarchive.org/archive/CHAHC-8
Finally, a self-plug on decisions about people, where the decision-maker cares about individual outcomes (and how current methods are limited): https://dl.acm.org/doi/10.1145/3613904.3642685 . We study what kinds of “decision support” managers wanted when making team decisions. It is similar to Anna’s work on child welfare decisions in that it deals with decisions about people, but the key difference is that we look at decisions that warrant explanation (e.g. where convincing stakeholders about a course of action and choosing the course of action are concomitant). So shaping collective beliefs is part of the decision-making process and, as a result, also constrains how decisions proceed. This is also explored to some extent in https://hal.science/hal-03199715/document
Thanks, these recs are quite relevant but I was not aware of them. The Daston book sounds very intriguing.
The Daston book is great. Kenneth R. Hammond’s (1996) “Human Judgement and Social Policy” could be relevant, especially to the topics in week 3, but I’m not sure I would recommend it even as background material for this reading list since it’s a rather long book. Many thanks for sharing this list of references. It looks fantastic and I’m looking forward to reading!
For the causal inference section:
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=cuKCFmcAAAAJ&citation_for_view=cuKCFmcAAAAJ:2osOgNQ5qMEC
Causal inference in the Social and behavioral sciences
Hello, Jessica! On “individualized”, I happen to be watching some old YouTube videos where Michael Jordan was talking about computational thinking, inferential/statistical thinking, and big data (around 2015), e.g.,
https://www.youtube.com/watch?v=j1PS6T6yAoY
https://dl.acm.org/doi/10.1145/2745754.2745782 (there is a video under Supplementary Material)
The latter is linked to:
Michael I. Jordan. 2015. Computational Thinking, Inferential Thinking and “Big Data”. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS ’15). Association for Computing Machinery, New York, NY, USA, 1. https://doi.org/10.1145/2745754.2745782.
I’m not sure if there is any decision-making content in the paper, but just in case.
This is a really solid list. Some work I haven’t come across before but I’ll come back to this post to find some new stuff, I think.
As an economist by background I have also been meaning to start delving into some behavioural economics reading as the entire discipline is more or less about decision-making under uncertainty. If anyone has anywhere they would suggest to get started, I’d appreciate it!
Darn, what a list! Would definitely enjoy working with graduates from this class. I mentioned it recently in another thread and will highlight it here again as an excellent dissection of individualized inference that would fit well in such a course:
Liu K & Meng XL (2016). There Is Individualized Treatment. Why Not Individualized Inference? Annual Review of Statistics and Its Application, Vol. 3, Issue 1, pp. 79-111: https://www.annualreviews.org/content/journals/10.1146/annurev-statistics-010814-020310
Thanks, this is very relevant!
Lots of people above have said it – let me say it again. Very useful – thanks.
I don’t know if this really fits in your existing course but take a look at Fleder, Daniel M. and Hosanagar, Kartik, Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity (September 1, 2007). Management Science, Vol. 55, No. 5, pp. 697-712, May 2009, Available at SSRN: https://ssrn.com/abstract=955984 or https://dx.doi.org/10.2139/ssrn.955984. This paper is looking at how interactions with a recommender system affect the sales of various products over the course of a sequence of decisions, and that doesn’t necessarily line up with what you might expect from just looking at a single period.
Jessica: Have you considered including a section on legal issues for AI systems that will in all likelihood be used to violate the 4th Amendment along with state and federal legislation related to stalking and harassment?
I am interested in including more on the legal side related to individual-level protections. Open to suggestions.
My thought is that you will likely be covering AI algorithmic employment decisions under your section on fairness. A good exercise might be to identify a specific technology and work out its potential for abuse and for unconstitutional or illegal use. One technology that has been around for a long time but is just becoming commercially available is hypersonic speaker and parabolic microphone technology. You could assign students to research the technology and come up with how it will likely be misused given AI technology such as voice generators.
This looks like a really interesting set of papers. I am interested in exploring these. The following paper could be relevant too, in case you’re not already aware of it:
Tim Miller. 2023. Explainable AI is dead, Long live Explainable AI! Hypothesis-driven decision support. https://arxiv.org/abs/2302.12389
I know this one! It makes some good points.