

  1. The Mythos of Model Interpretability Zachary C. Lipton https://arxiv.org/abs/1606.03490

  2. Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways

  3. What is Interpretability? • Many papers make axiomatic claims that some model is interpretable and therefore preferable • But what interpretability is and precisely what desiderata it serves are seldom defined • Does interpretability hold consistent meaning across papers?

  4. Inconsistent Definitions • Papers use the words interpretable, explainable, intelligible, transparent, and understandable, both interchangeably (within papers) and inconsistently (across papers) • One common thread, however, is that interpretability is something other than performance

  5. We want good models (figure label: Evaluation Metric)

  6. We also want interpretable models (figure labels: Evaluation Metric, Interpretation)

  7. The Human Wants Something the Metric Doesn't (figure labels: Evaluation Metric, Interpretation)

  8. What Gives? • So either the metric captures everything and people seeking interpretable models are crazy, or… • The metrics / loss functions we optimize are fundamentally mismatched with real-life objectives • We hope to refine the discourse on interpretability, introducing more specific language • Through the lens of the literature, we create a taxonomy of both objectives & methods


  9. Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways

  10. Trust • Does the model know when it's uncertain? (a minimal calibration sketch follows below) • Does the model make the same mistakes as a human? • Are we comfortable with the model?
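A minimal sketch (not from the deck) of one way to probe the first question: check whether the model's predicted probabilities are calibrated, i.e. whether a stated confidence of 0.8 corresponds to being right about 80% of the time. The data and model here are stand-ins.

```python
# Hypothetical sketch: probe whether a classifier "knows when it's uncertain"
# by comparing its stated confidence with its observed accuracy (calibration).
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)        # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)                       # stand-in model
probs = model.predict_proba(X_te)[:, 1]

# Bin the predictions and compare mean predicted probability with observed frequency.
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")
```

A well-calibrated model tracks the diagonal; large gaps are one concrete, checkable sense in which a model does not know when it is uncertain.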

  11. Causality • We may want models to tell us something about the natural world • Supervised models are trained simply to make predictions, but often used to take actions • Caruana (2015) shows a mortality predictor (for use in triage) that assigns lower risk to asthma patients

  12. Transferability • The idealized training setup often differs from the real world • The real problem may be non-stationary, noisier, etc. • We want sanity checks that the model doesn't depend on weaknesses in the setup

  13. Informativeness • We may train a model to make a decision • But its real purpose is to aid a person in making a decision • Thus an interpretation may simply be valuable for the extra bits it carries

  14. Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways

  15. Transparency • Proposed solutions conferring interpretability tend to fall into two categories • Transparency addresses understanding how the model works • Explainability concerns the model’s ability to offer some (potentially post-hoc) explanation

  16. Simulatability • One notion of transparency is simplicity • This accords with papers advocating small decision trees • A model is transparent if a person can step through the algorithm in reasonable time (see the sketch below)
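A minimal sketch, assuming scikit-learn, of what "stepping through the algorithm" can mean in practice: a depth-two decision tree whose entire decision logic fits on a few lines. The dataset is just a placeholder.

```python
# Hypothetical sketch: a tree this small can be simulated by hand in reasonable time.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the complete set of rules; a reader can trace any input through them.
print(export_text(tree, feature_names=["sepal length", "sepal width",
                                       "petal length", "petal width"]))
```

The same model class stops being simulatable once the tree grows to thousands of nodes, which is why simplicity, not the model family, is doing the work here.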

  17. Decomposability • A relaxed notion requires understanding the individual components of a model • Such as the weights of a linear model or the nodes of a decision tree (see the sketch below)
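A minimal sketch (stand-in data and model) of reading off those individual components for a linear model: each coefficient is meant to be understandable on its own as one feature's contribution.

```python
# Hypothetical sketch: inspect the per-feature weights of a linear model.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
X = StandardScaler().fit_transform(data.data)    # put features on a comparable scale
model = Ridge(alpha=1.0).fit(X, data.target)

# Rank features by the magnitude of their learned weight.
for name, w in sorted(zip(data.feature_names, model.coef_), key=lambda t: -abs(t[1])):
    print(f"{name:>4s}: {w:+7.2f}")
```

Note the standardization step: without comparable feature scales the weights are not individually meaningful, which previews the later caveat that linear models are not automatically interpretable.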

  18. Transparent Algorithms • A yet weaker notion would require only that we understand the behavior of the algorithm • E.g. convergence of convex optimization, generalization bounds (an example bound follows)
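As one concrete instance of the kind of guarantee meant here (not taken from the deck), a standard generalization bound for a finite hypothesis class $\mathcal{H}$ states that, with probability at least $1-\delta$ over an i.i.d. sample of size $n$,

$$\sup_{h \in \mathcal{H}} \bigl| R(h) - \hat{R}_n(h) \bigr| \;\le\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}},$$

so we understand how far the training risk $\hat{R}_n$ can stray from the true risk $R$ even when we cannot interpret any individual prediction.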


  19. Post-Hoc Interpretability • "Ah yes, something cool is happening in node 750,345,167… maybe it sees a cat?" • "Maybe we'll see something awesome if we jiggle the inputs?"

  20. Verbal Explanations • Just as people generate explanations (absent transparency), we might train a (possibly separate) model to generate explanations • We might consider image captions as interpretations of object predictions (Image: Karpathy et al 2015)

  21. Saliency Maps • While the full relationship between input and output might be impossible to describe succinctly, local explanations are potentially useful (see the gradient sketch below). (Image: Wang et al 2016)
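A minimal sketch, assuming PyTorch, of the most common recipe for such a local explanation: the gradient of the winning class score with respect to the input pixels. The model here is an untrained stand-in.

```python
# Hypothetical sketch: gradient-based saliency map for one input image.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in classifier
model.eval()

x = torch.rand(1, 3, 32, 32, requires_grad=True)                  # stand-in image
scores = model(x)
top_class = scores.argmax(dim=1).item()

# Back-propagate the top class score to the input; the gradient magnitude at each
# pixel is a local measure of how much that pixel influenced the prediction.
scores[0, top_class].backward()
saliency = x.grad.abs().max(dim=1).values                         # (1, 32, 32) map
print(saliency.shape)
```

The map explains only this input in a small neighbourhood; it says little about the model's global behaviour, which is exactly the trade-off the slide points to.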

  22. Case-Based Explanations • Another way to generate a post-hoc explanation might be to retrieve labeled items that are deemed similar by the model • For some models, we can retrieve histories from similar patients (see the retrieval sketch below). (Image: Mikolov et al 2014)
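A minimal sketch (hypothetical encoder, data, and labels) of that retrieval step: embed the query with the model's own representation, find its nearest neighbours in the training set, and return those labeled cases as the explanation.

```python
# Hypothetical sketch: case-based explanation via nearest neighbours in the
# model's learned representation space. `embed` stands in for whatever hidden
# layer or encoder the real model exposes.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16))                 # stand-in for learned encoder weights

def embed(x):
    # Placeholder for the model's representation function (e.g. a hidden layer).
    return x @ W

train_x = rng.standard_normal((500, 32))          # stand-in labeled corpus
train_labels = rng.integers(0, 2, size=500)

def explain_by_cases(query, k=3):
    """Return the k training items whose representation is closest to the query's."""
    reps = embed(train_x)                          # (500, 16)
    q = embed(query)                               # (16,)
    # Cosine similarity between the query and every training item.
    sims = reps @ q / (np.linalg.norm(reps, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(-sims)[:k]
    return [(int(i), int(train_labels[i]), float(sims[i])) for i in top]

print(explain_by_cases(rng.standard_normal(32)))
```

In the clinical setting the slide mentions, the returned items would be similar patient histories rather than random vectors.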

  23. Outline • What is interpretability? • What are its desiderata? • What model properties confer interpretability? • Caveats, pitfalls, and takeaways

  24. Discussion Points • Linear models are not strictly more interpretable than deep neural networks • Claims about interpretability must be qualified • Transparency may be at odds with the goals of AI • Post-hoc interpretations may mislead

  25. Thanks! Acknowledgments: Zachary C. Lipton was supported by the Division of Biomedical Informatics at UCSD, via training grant (T15LM011271) from the NIH/NLM. Thanks to Charles Elkan, Julian McAuley, David Kale, Maggie Makar, Been Kim, Lihong Li, Rich Caruana, Daniel Fried, Jack Berkowitz, & Sepp Hochreiter. References: • The Mythos of Model Interpretability (ICML Workshop on Human Interpretability 2016) - ZC Lipton https://arxiv.org/abs/1606.03490 • Directly Modeling Missing Data with RNNs (MLHC 2016) - ZC Lipton, DC Kale, R Wetzel http://arxiv.org/abs/1606.04130 • Learning to Diagnose (ICLR 2016) - ZC Lipton, DC Kale, C Elkan, R Wetzel http://arxiv.org/abs/1511.03677 • Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission (KDD 2015) - R Caruana et al. http://dl.acm.org/citation.cfm?id=2788613
