Sanity Checks for Saliency Maps


  1. Sanity Checks for Saliency Maps Julius Adebayo *+ , Justin Gilmer # , Michael Muelly # , Ian Goodfellow # , Moritz Hardt ^ # , Been Kim # * Work was done during the Google AI residency program, + MIT, ^ UC Berkeley, # Google Brain.

  2. Interpretability: to use machine learning more responsibly.

  3. Investigating post-training interpretability methods. Given a fixed model, find the evidence for its prediction.

  4. Investigating post-training interpretability methods. Given a fixed model (e.g., a trained neural network) that outputs a "Junco bird-ness" score, find the evidence for its prediction. Why was this classified as a Junco bird?

  5. One of the most popular techniques: saliency maps. Given a trained model (e.g., a neural network) and its "Junco bird-ness" score, the promise is: these pixels are the evidence for the prediction.
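As a minimal sketch of what a vanilla-gradient saliency map is, the snippet below computes |d score/d input| by hand for a tiny two-layer ReLU network. The network, its random weights, and the 4x4 "image" are all toy placeholders, not the real CNNs used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" and a tiny two-layer network with random placeholder weights.
x = rng.normal(size=16)          # flattened 4x4 "image"
W1 = rng.normal(size=(8, 16))    # hidden layer
W2 = rng.normal(size=(3, 8))     # 3 output classes

def forward(x, W1, W2):
    h = np.maximum(W1 @ x, 0.0)  # ReLU hidden activations
    return W2 @ h, h

def gradient_saliency(x, W1, W2, cls):
    """|d score_cls / d x|: the vanilla-gradient saliency map."""
    _, h = forward(x, W1, W2)
    relu_mask = (h > 0).astype(float)
    # Chain rule through the ReLU layer for the chosen class score.
    grad = (W2[cls] * relu_mask) @ W1
    return np.abs(grad)

logits, _ = forward(x, W1, W2)
sal = gradient_saliency(x, W1, W2, int(np.argmax(logits)))
print(sal.reshape(4, 4))         # per-pixel "evidence" for the top class
```

The gradient magnitudes are then reshaped back to image shape and shown as a heatmap; the many published variants (Guided Backprop, Integrated Gradients, etc.) differ mainly in how this gradient is computed or post-processed.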

  6. Sanity check question. A trained model (e.g., a neural network) predicts "Junco bird-ness". The promise: these pixels are the evidence for the prediction.

  7. Sanity check question. If these pixels really are the evidence for the prediction, then when the prediction changes, the explanation should change. Extreme case: if the prediction is random, the explanation should REALLY change.

  8. Sanity check: When prediction changes, do explanations change? Saliency map

  9. Sanity check: When prediction changes, do explanations change? Saliency map Randomized weights! Network now makes garbage predictions.

  10. Sanity check: when prediction changes, do explanations change? Randomized weights: the network now makes garbage predictions, yet the saliency map looks almost the same (!?).

  11. Sanity check: when prediction changes, do explanations change? Randomized weights: the network now makes garbage predictions, yet the saliency map is nearly unchanged. Can these pixels really be the evidence for the prediction?

  12. Sanity check 1: when the prediction changes, do the explanations change? No! [Figure: saliency maps before vs. after weight randomization, for Backprop, Guided Backprop, and Integrated Gradients.]
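The model-parameter randomization test above can be sketched in a few lines. This is a toy stand-in, assuming a linear model (so the class gradient is just a weight row) and using Spearman rank correlation to compare maps; the paper runs this on real CNNs and a battery of saliency methods:

```python
import numpy as np

rng = np.random.default_rng(0)

def saliency(W, x, cls):
    """Vanilla-gradient saliency of a linear model: |d (W @ x)[cls] / d x|."""
    return np.abs(W[cls])

def spearman(a, b):
    """Rank correlation between two flattened saliency maps."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

x = rng.normal(size=64)              # toy "image"
W = rng.normal(size=(10, 64))        # "trained" weights (placeholder)
cls = int(np.argmax(W @ x))

sal_trained = saliency(W, x, cls)

# Model-parameter randomization test: re-initialize the weights so the
# model makes garbage predictions, then recompute the explanation.
W_random = rng.normal(size=W.shape)
sal_random = saliency(W_random, x, cls)

# A method that passes the sanity check produces a very different map
# here; a high correlation would be a red flag.
print(f"rank correlation: {spearman(sal_trained, sal_random):.3f}")
```

For the plain gradient the map tracks the weights, so randomizing weights destroys it; the paper's finding is that several popular methods (e.g., Guided Backprop) stay visually similar even after this randomization.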

  13. Sanity check 2: for networks trained on true labels vs. random labels, do the explanations deliver different messages? No!
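A minimal sketch of this data-randomization test, assuming a toy logistic-regression "network" on synthetic data where only feature 0 carries the label (all names and data here are illustrative, not the paper's setup). A sound explanation method should look very different when the model was trained on permuted labels and therefore learned nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label depends only on feature 0.
n, d = 500, 10
X = rng.normal(size=(n, d))
y_true = (X[:, 0] > 0).astype(float)
y_rand = rng.permutation(y_true)          # randomized labels

def train_logreg(X, y, lr=0.5, steps=300):
    """Plain gradient-descent logistic regression (toy stand-in for a CNN)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# For a linear model, gradient saliency is just |w|.
sal_true = np.abs(train_logreg(X, y_true))
sal_rand = np.abs(train_logreg(X, y_rand))

print("true labels   -> most salient feature:", int(np.argmax(sal_true)))
print("random labels -> saliency norm:",
      round(float(np.linalg.norm(sal_rand)), 3))
```

Trained on true labels, the model concentrates its saliency on feature 0; trained on permuted labels, it learns almost nothing and the saliency mass collapses. The paper's point is that several saliency methods fail to show this difference on real networks.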

  14. Conclusion
  • Confirmation bias: just because an explanation "makes sense" to humans doesn't mean it reflects the evidence for the prediction.
  • Do sanity checks on your interpretability methods! (e.g., TCAV [Kim et al. '18])
  • Others who independently reached the same conclusions: [Nie, Zhang, Patel '18], [Ulyanov, Vedaldi, Lempitsky '18]
  • Some of these methods have been shown to be useful for humans. Why? More studies needed.
  Poster #30, 10:45am - 12:45pm @ Room 210
