This is Jessica, but today I’m blogging about a blog post on interpretable machine learning that co-blogger Keith wrote for his day job and shared with me. I agree with multiple observations he makes. Some highlights:
The often-suggested simple remedy for this unmanageable complexity is just to find ways to explain these black box models; however, those explanations can miss key information. Rather than being directly connected with what is going on in the black box model, they end up being “stories” that merely produce concordant predictions. Given that the concordance is not perfect, they can be very misleading in many situations.
I blogged a bit about this before, giving examples like inconsistency in explanations for the same inputs and outputs. Thinking about these complications, I’m reminded of a talk by Chris Olah that I saw back in 2018, where he argued that feature visualizations of the activations that fire for different image inputs to a deep neural net let us seriously consider what’s going on inside, in a way analogous to how the invention of the microscope opened up a whole new world of microorganisms. I wonder if this idea has lost favor given that these explanations sometimes don’t behave the way we would hope.
I can also buy that not-quite-correct explanations can cause problems downstream, since I see it in the human context. The other day I had to ask a collaborator to refrain from offering explanations in place of methodological details for unexpected analysis results: delivered passionately, an explanation can seem good enough that I don’t question it at first, and we only realize later that we wasted time because a better explanation existed. Plus, when every unexpected result comes with an explanation, expressing skepticism about the explanation can feel like undercutting everything the person says, even if you want to encourage discussion of these things.
While we need to accept what we cannot understand, we should never overlook the advantages of what we can understand. For example, we may never fully understand the physical world, nor how people think, interact, create, and decide. In ML, Geoffrey Hinton’s 2018 YouTube video drew attention to the fact that people are unable to explain exactly how they decide, in general, whether something is the digit 2 or not. This fact was pointed out long ago by Herbert Simon and has not been seriously disputed (Ericsson and Simon, 1980). However, prediction models are just abstractions, and we can understand the abstractions created to represent that reality, which is complex and often beyond our direct access. So not being able to understand people is not a valid reason to dismiss the desire to understand prediction models.
In essence, abstractions are diagrams or symbols that can be manipulated, in error-free ways, to discern their implications. Usually referred to as models or assumptions, they are deductive and hence can be understood in and of themselves, simply for what they imply. That is, until they become too complex. For instance, triangles on the plane are understood by most, while triangles on the sphere are understood by fewer. Reality may always be too complex, but models that adequately represent reality for some purpose need not be. Triangles on the plane are for navigating short distances; triangles on the sphere, for long distances. Emphatically, it is the abstract model that is understood, not necessarily the reality it attempts to represent.
I’m not sure I totally grasp the distinction Keith is trying to get at here. To me the passage above implies we should be careful about assuming that some aspects of reality are too complex to explain. But given the earlier point about concordances being misleading, applying this recursively can lead to problems: when the deep model is the complex thing we want to explain, we have to be careful about isolating what we think are simpler units of abstraction to capture what it’s doing. For instance, a node in a neural network is a relatively simple abstraction (i.e., a linear regression wrapped in a non-linear activation function), but is thinking at that level of abstraction useful as a means of trying to understand the much more complex behavior of the network as a whole? Maybe Keith is trying to motivate considering interpretability in your choice of model, which he talks about later.
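To make the “simple unit of abstraction” point concrete, here is a minimal sketch (my own example, not from Keith’s post) of a single neural network node: a linear combination of inputs passed through a non-linear activation. Each unit is easy to understand in isolation; the difficulty is composing millions of them.

```python
import numpy as np

def neuron(x, w, b):
    """A single network node: a linear regression (w . x + b)
    wrapped in a non-linear activation (here, ReLU)."""
    return np.maximum(0.0, np.dot(w, x) + b)

# Illustrative inputs and (hypothetical) learned parameters.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.3])
b = 0.05
print(neuron(x, w, b))
```

Understanding this one unit is trivial; the interpretability question is whether reasoning at this level scales to the composed behavior of the whole network.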
Related to people not being able to say how they recognize a 2, one thing people can potentially do is point to the processor they think is responsible; e.g., I can’t describe succinctly why it’s a 2 based on low-level properties like edge detection, but maybe I could say something higher level like, “I would guess it’s something like visual word form memory.” It’s not a complete explanation, but that sort of meta statement could potentially be useful, since the first step in debugging is figuring out where to start looking.
[A] persistent misconception has arisen in ML that models for accurate prediction usually need to be complex. To build on previous examples, there remain some application areas where simple models have yet to achieve accuracy comparable to black box models. In many others, however, simple models predict as accurately as any state-of-the-art black box model, and thus the question, as posed in the 2019 article by Rudin and Radin, is: “Why Are We Using Black Box Models in AI When We Don’t Need To?”
The referenced paper describes how the authors entered a NeurIPS competition on explainability, but then realized they didn’t need a black box at all to do the task; they could just use one of many simpler, interpretable models. Oops. Some of the interpretability work coming out of ML does seem like what you get when complexity enthusiasts excitedly latch onto a new problem that can motivate more of what they’re good at (e.g., optimization), without necessarily questioning the premise.
Interpretable models are far more trustworthy in that it can be more readily discerned where, when, and in what ways they should be trusted or not. But how can one do this without understanding how the model works, especially for a model that is patently not trustworthy? This is especially important in cases where the underlying distribution of the data changes, where it is critical to troubleshoot and modify without delay, as noted in the 2020 article by Hamamoto et al. It is arguably much more difficult to remain successful throughout the full ML life cycle with black box models than with interpretable models.
Agreed; debugging calls for some degree of interpretability. And often the more people you can get helping debug something, the more likely you are to find the problem.
There is increasing understanding based on considering numerous possible prediction models in a given prediction task. The not-too-unusual observation of simple models performing well for tabular data (a collection of variables, each of which has meaning on its own) was noted over 20 years ago and was labeled the “Rashomon effect” (Breiman, 2001). Breiman posited the possibility of a large Rashomon set in many applications; that is, a multitude of models with approximately the same minimum error rate. A simple check for this is to fit a number of different ML models to the same data set. If many of these are as accurate as the most accurate (within the margin of error), then many other untried models might be as well. A recent study (Semenova et al., 2019) now supports running a set of different (mostly black box) ML models to determine their relative accuracy on a given data set in order to predict the existence of a simple, accurate, interpretable model—that is, a way to quickly identify applications where it is a good bet that an accurate interpretable prediction model can be developed.
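The simple check described above can be sketched in a few lines: fit several different model families on the same tabular data set and compare their held-out accuracies. This is an illustrative sketch, not code from the post; the data set and the particular models are my own choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# A tabular data set where each column has meaning on its own.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One simple/interpretable model family and several more complex ones.
models = {
    "logistic": LogisticRegression(max_iter=5000),
    "shallow tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}

# Held-out accuracy for each family.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {s:.3f}")
```

If several families land within the margin of error of the best, that is evidence of a large Rashomon set, and, per Semenova et al., a good bet that a simple interpretable model can match the black boxes.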
I like the idea of trying to estimate how many different ways there are to achieve good accuracy on some inference problem. I’m reminded of a paper I recently read that does basically the inverse: generate a bunch of hypothetical datasets and see how well a model intended to explain human behavior does across them, to understand when you just have a very flexible model versus when it’s actually providing some insight into behavior.
The full data science life-cycle process is likely different when using interpretable models. More input is needed from domain experts to produce an interpretable model that makes sense to them. This should be seen as an advantage. For instance, it is not too unusual at a given stage to find numerous equally interpretable and accurate models. To the data scientist, there may seem to be little to guide the choice between these. But when shown to domain experts, they may easily discern opportunities to improve constraints, as well as indications of which models are less likely to generalize well. All equally interpretable and accurate models are not equal in the eyes of domain experts.
I definitely agree with this and other comments Keith makes about the need to consider interpretability early in the process. I was involved in a paper a few years ago where my co-authors had interviewed a bunch of machine learning developers about interpretability. One of the more surprising things we found was that, in contrast to ML literature implying that interpretability can be applied after model development, many of the developers saw it as a more holistic thing related to how much others in their organization trusted their work at all, and consequently many thought about it from the beginning of model development.
There is now a vast and confusing literature that conflates interpretability and explainability. In this brief blog, the degree of interpretability is taken simply as how easily the user can grasp the connection between the input data and what the ML model would predict. Erasmus et al. (2020) provide a more general and philosophical view. Rudin et al. (2021) avoid trying to provide an exhaustive definition, instead offering general guiding principles to help readers avoid common but problematic ways of thinking about interpretability. The term “explainability,” on the other hand, often refers to post hoc attempts to explain a black box by using simpler ‘understudy’ models that predict the black box’s predictions.
I’ve always found interesting the simple definition of interpretability as the ability to simulate what a model will predict. At one point I was thinking about how, if interpretability is mainly aimed at building trust in model predictions, maybe a “deeper” proxy for trust could be called internalizability: the person (after using the model) is simulating the model without knowing it.
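To make the simulability definition concrete, here is a toy point-scoring model (entirely hypothetical, my own illustration) of the kind a user can run in their head: count risk factors, compare to a threshold. Grasping the connection between inputs and prediction takes seconds, which is the sense of interpretability in Keith’s working definition.

```python
def risk_score(age_over_60, prior_event, smoker):
    """Hypothetical three-feature point score: one point per risk factor,
    predict 'high risk' at two or more points. Simple enough that a user
    can simulate the model's prediction by hand."""
    points = int(age_over_60) + int(prior_event) + int(smoker)
    return "high risk" if points >= 2 else "low risk"

print(risk_score(True, False, True))   # two risk factors
```

A model like this is trivially simulable; the open question in the post is when such a model can match the accuracy of a black box.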