The Mythos of Model Interpretability Zachary C. Lipton https://arxiv.org/abs/1606.03490
Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways
What is Interpretability? • Many papers make axiomatic claims that some model is interpretable and therefore preferable • But what interpretability is, and precisely which desiderata it serves, are seldom defined • Does interpretability hold a consistent meaning across papers?
Inconsistent Definitions • Papers use the words interpretable, explainable, intelligible, transparent, and understandable, both interchangeably (within papers) and inconsistently (across papers) • One common thread, however, is that interpretability is something other than performance
We want good models [Figure: Evaluation Metric]
We also want interpretable models [Figure: Evaluation Metric, Interpretation]
The Human Wants Something the Metric Doesn't [Figure: Evaluation Metric, Interpretation]
What Gives? • So either the metric captures everything and people seeking interpretable models are crazy, or… • The metrics / loss functions we optimize are fundamentally mismatched with real-life objectives • We hope to refine the discourse on interpretability, introducing more specific language • Through the lens of the literature, we create a taxonomy of both objectives & methods
Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways
Trust • Does the model know when it's uncertain? • Does the model make the same mistakes as a human would? • Are we comfortable with the model?
Causality • We may want models to tell us something about the natural world • Supervised models are trained simply to make predictions, but are often used to take actions • Caruana et al. (2015) describe a pneumonia mortality predictor (intended for triage) that assigns lower risk to asthma patients, an artifact of the aggressive care such patients historically received rather than a causal effect
Transferability • Idealized training setups often differ from the real world • The real problem may be non-stationary, noisier, etc. • We want sanity checks that the model doesn't depend on weaknesses in the setup
Informativeness • We may train a model to make a decision • But its real purpose is to aid a person in making a decision • Thus an interpretation may be valuable simply for the extra bits of information it carries
Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways
Transparency • Proposed solutions for conferring interpretability tend to fall into two categories • Transparency addresses understanding how the model works • Explainability concerns the model's ability to offer some (potentially post-hoc) explanation
Simulatability • One notion of transparency is simplicity • This accords with papers advocating small decision trees • A model is transparent if a person can step through the algorithm in reasonable time
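To make simulatability concrete, here is a minimal, hypothetical sketch (not from the talk): a hand-written rule list small enough that a reader can trace any prediction by hand. The feature names and thresholds are invented for illustration.

```python
# A hypothetical rule list for a triage-style prediction. Every prediction
# can be reproduced by a person reading the rules from top to bottom.
def predict_high_risk(age: int, systolic_bp: int, has_asthma: bool) -> bool:
    if systolic_bp < 90:           # rule 1: hypotension
        return True
    if age > 75 and has_asthma:    # rule 2: elderly asthmatic
        return True
    return False                   # default: low risk

print(predict_high_risk(age=80, systolic_bp=120, has_asthma=True))  # True, via rule 2
```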
Decomposability • A relaxed notion requires understanding the individual components of a model • Such as the weights of a linear model or the nodes of a decision tree
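As a sketch of decomposability (an illustration, not part of the original slides), the code below fits a linear model with scikit-learn and prints one weight per named input feature, so each parameter admits an individual reading; the data and feature names are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: 3 named features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)
feature_names = ["age_scaled", "blood_pressure_scaled", "asthma"]

model = LogisticRegression().fit(X, y)

# Each coefficient can be inspected (and, with caution, interpreted) on its own.
for name, w in zip(feature_names, model.coef_[0]):
    print(f"{name}: {w:+.3f}")
```

Note the caveat from the talk: this kind of per-weight reading is only meaningful to the extent that the features themselves are meaningful.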
Transparent Algorithms • A yet weaker notion requires only that we understand the behavior of the learning algorithm • E.g., convergence of convex optimization, generalization bounds
Post-Hoc Interpretability • "Ah yes, something cool is happening in node 750,345,167… maybe it sees a cat?" • "Maybe we'll see something awesome if we jiggle the inputs?"
Verbal Explanations • Just as people generate explanations (absent transparency), we might train a (possibly separate) model to generate explanations • We might consider image captions as interpretations of object predictions (Image: Karpathy et al 2015)
Saliency Maps • While the full relationship between input and output might be impossible to describe succinctly, local explanations are potentially useful. (Image: Wang et al 2016)
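One common recipe for such a local explanation (a sketch under assumptions, not necessarily the method behind the cited image) is to take the gradient of the predicted class score with respect to the input pixels; large-magnitude gradients mark locally influential regions. The model below is an untrained placeholder.

```python
import torch
import torch.nn as nn

# Placeholder convolutional classifier; in practice this would be a trained model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # stand-in for an input image
score = model(image)[0].max()   # score of the top predicted class
score.backward()                # d(score) / d(pixels)

# Collapse the per-channel gradients into a single saliency map over pixels.
saliency = image.grad.abs().max(dim=1).values
print(saliency.shape)           # torch.Size([1, 32, 32])
```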
Case-Based Explanations • Another way to generate a post-hoc explanation might be to retrieve labeled items that are deemed similar by the model • For some models, we can retrieve histories from similar patients (Image: Mikolov et al 2014)
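A minimal sketch of this idea (not the authors' implementation): embed each training example with the model's learned representation, then explain a new prediction by returning the most similar training cases under cosine similarity. The representations and labels here are random placeholders standing in for real patient histories.

```python
import numpy as np

def explain_by_neighbors(query_vec, train_vecs, train_labels, k=3):
    """Return the k training examples most similar to the query in the
    model's representation space, ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    T = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = T @ q
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i]), int(train_labels[i])) for i in top]

# Placeholder "learned" representations for 100 past patients.
rng = np.random.default_rng(0)
train_vecs = rng.normal(size=(100, 16))
train_labels = rng.integers(0, 2, size=100)
print(explain_by_neighbors(rng.normal(size=16), train_vecs, train_labels))
```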
Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways
Discussion Points • Linear models are not strictly more interpretable than deep learning • Claims about interpretability must be qualified • Transparency may be at odds with the goals of AI • Post-hoc interpretations can mislead
Thanks! Acknowledgments: Zachary C. Lipton was supported by the Division of Biomedical Informatics at UCSD, via training grant (T15LM011271) from the NIH/NLM. Thanks to Charles Elkan, Julian McAuley, David Kale, Maggie Makar, Been Kim, Lihong Li, Rich Caruana, Daniel Fried, Jack Berkowitz, & Sepp Hochreiter
References:
The Mythos of Model Interpretability (ICML Workshop on Human Interpretability 2016) - ZC Lipton http://arxiv.org/abs/1606.03490
Directly Modeling Missing Data with RNNs (MLHC 2016) - ZC Lipton, DC Kale, R Wetzel http://arxiv.org/abs/1606.04130
Learning to Diagnose (ICLR 2016) - ZC Lipton, DC Kale, C Elkan, R Wetzel http://arxiv.org/abs/1511.03677
Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission (KDD 2015) - R Caruana et al http://dl.acm.org/citation.cfm?id=2788613