Active learning and decision making with varying treatment effects!

In a new paper, Iiris Sundin, Peter Schulam, Eero Siivola, Aki Vehtari, Suchi Saria, and Samuel Kaski write:

Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE). This work studies the reliability of prediction-based decision-making in a task of deciding which action a to take for a target unit after observing its covariates x̃ and predicted outcomes p̂(ỹ∣x̃,a). An example case is personalized medicine and the decision of which treatment to give to a patient. A common problem when learning these models from observational data is imbalance, that is, difference in treated/control covariate distributions, which is known to increase the upper bound of the expected ITE estimation error. We propose to assess the decision-making reliability by estimating the ITE model’s Type S error rate, which is the probability of the model inferring the sign of the treatment effect wrong. Furthermore, we use the estimated reliability as a criterion for active learning, in order to collect new (possibly expensive) observations, instead of making a forced choice based on unreliable predictions. We demonstrate the effectiveness of this decision-making aware active learning in two decision-making tasks: in simulated data with binary outcomes and in a medical dataset with synthetic and continuous treatment outcomes.

Decision making, varying treatment effects, type S errors, Stan, validation... this paper has all of my favorite things!
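To make the type S error idea concrete: given posterior draws of a unit's individual treatment effect (say, from a Stan fit), the type S error probability is just the posterior mass on the sign opposite to whichever action the model would take, and that reliability estimate can then guide which unit to query next. Here is a minimal sketch in Python, with simulated draws standing in for a real fit; the greedy "query the least reliable unit" step is a paraphrase of the idea, not necessarily the paper's exact acquisition criterion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws of the individual treatment effect (ITE)
# for 5 target units, 4000 draws each -- in practice these would come
# from a fitted model (e.g. a Stan fit), not from a simulation.
true_means = np.array([0.8, 0.05, -0.4, -0.02, 0.3])
ite_draws = rng.normal(loc=true_means, scale=0.5, size=(4000, 5))

# Sign-based decision per unit: treat if the posterior mean ITE > 0.
decisions = np.sign(ite_draws.mean(axis=0))

# Type S error probability per unit: posterior probability that the
# true ITE has the opposite sign of the action the model would take.
type_s = (np.sign(ite_draws) != decisions).mean(axis=0)

print("posterior mean ITE:", np.round(ite_draws.mean(axis=0), 3))
print("type S probability:", np.round(type_s, 3))

# Decision-making-aware active learning, in caricature: rather than act
# on the least reliable prediction, query new data for that unit first.
print("query next: unit", int(np.argmax(type_s)))
```

In the paper's setting, the queried observation would be a new (possibly expensive) labeled outcome for that unit, after which the model is refit and the reliabilities recomputed.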

2 thoughts on “Active learning and decision making with varying treatment effects!”

  1. Don’t forget causality, propensity scoring, automatic relevance determination, and lots of definitions, theorems, and proofs!

    The abstract says that type-S error is “the probability of the model inferring the sign of the treatment effect wrong.” Is this something that only makes sense in a causal context? What does it mean to infer the sign of a treatment effect? Is it something like bounding the posterior away from zero, as in an NHST? What’s the difference between 0 + epsilon and 0 − epsilon as an estimate in a regression? Don’t they both have roughly no effect?

    I tried to read the paper, but got lost in the examples, definitions, and theorems.

    • My understanding is that the “difference between 0 + epsilon and 0 − epsilon” is that you’d give different treatments to a particular person. Suppose you had a treatment and a control and found that the individual treatment effect for a particular person is +epsilon. Then you should give the person the treatment. I can see this making a difference if you want to compute a ‘correct treatment’ accuracy stat or something like that, but it seems irrelevant for really small ITEs.
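To spell out that reply: under a sign-based rule, estimates of +epsilon and −epsilon lead to opposite actions, even though both say the effect is roughly zero. The saving grace is that the regret from a sign error is bounded by the size of the true effect. A toy sketch (the decision rule here is the obvious one implied by the discussion, not code from the paper):

```python
def choose_action(ite_estimate: float) -> str:
    # Sign-based rule: treat whenever the estimated ITE is positive,
    # control otherwise -- no matter how small the margin.
    return "treatment" if ite_estimate > 0 else "control"

eps = 1e-3
for est in (+eps, -eps):
    print(f"estimated ITE = {est:+.4f} -> {choose_action(est)}")

# The two decisions differ, but the cost of a sign error is at most
# |true ITE|, so for near-zero effects a type S error is nearly harmless.
```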
