Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
Marco Ancona 1, Cengiz Öztireli 2, Markus Gross 1,2
1 Department of Computer Science, ETH Zurich, Switzerland
2 Disney Research, Zurich, Switzerland
[Diagram: a pre-trained model and a target output are fed to an attribution method.]
Attribution methods
• Simple occlusion (Zeiler et al., 2014)
• Gradient * Input (Shrikumar et al., 2016)
• Saliency Maps (Simonyan et al., 2015)
• Layer-wise Relevance Propagation (LRP) (Bach et al., 2015)
• Meaningful Perturbation (Fong et al., 2017)
• Integrated Gradients (Sundararajan et al., 2017)
• Guided Backpropagation (Springenberg et al., 2014)
• Prediction Difference Analysis (Zintgraf et al., 2017)
• DeepLIFT (Shrikumar et al., 2017)
• Grad-CAM (Selvaraju et al., 2016)
• KernelSHAP / DeepSHAP (Lundberg et al., 2017)
• LIME (Ribeiro et al., 2016)
• …
Evaluating attribution methods
• No ground-truth explanation → not easy to evaluate empirically
• Often based on heuristics → not easy to justify theoretically

"Axiomatic approach": from a set of desired properties to the method definition
(Some) desirable properties
• Completeness: attributions should sum up to the output of the function being considered, for comprehensive accounting.
• Symmetry: if two features play exactly the same role in the model, they should receive the same attribution.
• Linearity: attributions generated for a linear combination of two models should be the same linear combination of the attributions of the original models.
• Continuity: attributions for two nearly identical inputs on a continuous function should be nearly identical.
Shapley Values (Shapley, Lloyd S., 1953)
The only attribution method that satisfies all the aforementioned properties.
The Shapley value of feature i is

$$\phi_i(f) = \sum_{S \subseteq P \setminus \{i\}} \frac{|S|! \, (|P| - |S| - 1)!}{|P|!} \left[ f(S \cup \{i\}) - f(S) \right]$$

where:
• f is the function to analyze (e.g. the map from the input layer to a specific output neuron in a DNN);
• P is the set of all input features, and the sum runs over all unique subsets S of P \ {i};
• f(S ∪ {i}) − f(S) is the marginal contribution of feature i to the subset S;
• the factorial weights turn the sum into an average over all feature orderings.

"The average marginal contribution of a feature with respect to all subsets of other features"
Issue: testing all subsets is infeasible! The number of subsets grows exponentially (2^|P|) with the number of features.
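To make the formula concrete, here is a minimal brute-force sketch in Python (not the authors' code; the toy function `f`, the input `x`, and the `baseline` used to represent "absent" features are all hypothetical):

```python
# Brute-force Shapley values for a toy function -- a sketch, not the paper's code.
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """f maps a feature vector to a scalar; features outside the coalition S
    are replaced by the corresponding `baseline` values (a modeling assumption)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight |S|! (|P|-|S|-1)! / |P|!
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                x_S = [x[j] if j in S else baseline[j] for j in range(n)]
                x_Si = list(x_S)
                x_Si[i] = x[i]  # add feature i to the coalition
                phi[i] += w * (f(x_Si) - f(x_S))
    return phi

# Toy check: a linear model recovers its own coefficients as attributions
f = lambda v: 3 * v[0] + 2 * v[1]
phi = exact_shapley(f, x=[1.0, 1.0], baseline=[0.0, 0.0])
assert abs(sum(phi) - (f([1.0, 1.0]) - f([0.0, 0.0]))) < 1e-9  # completeness
print(phi)  # [3.0, 2.0]
```

Note the nested loops: with |P| features the inner evaluation runs over all 2^(|P|−1) subsets per feature, which is exactly why exact computation does not scale.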
Shapley value sampling (Castro et al., 2009)
[Animation: marginal contributions of a feature are sampled along random feature orderings (e.g. 0.16, 0.10, 0.25, −0.35) and averaged into the estimate.]
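A minimal sketch of this kind of permutation sampling (in the spirit of Castro et al., 2009; names and the baseline convention are illustrative, not from the paper):

```python
# Monte Carlo Shapley estimation via random feature orderings.
import random

def sampled_shapley(f, x, baseline, n_samples=1000):
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        perm = random.sample(range(n), n)  # one random feature ordering
        x_cur = list(baseline)             # start with all features "off"
        prev = f(x_cur)
        for i in perm:
            x_cur[i] = x[i]                # switch feature i on
            cur = f(x_cur)
            phi[i] += cur - prev           # marginal contribution of i
            prev = cur
    return [p / n_samples for p in phi]    # average -> unbiased estimate
```

Each sampled permutation costs |P| evaluations of f, and the estimator is unbiased, but the variance can be high for deep networks, hence many evaluations.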
Pros: Shapley value sampling is unbiased.
Cons: it might require a lot of samples (network evaluations) to produce an accurate result.
Can we avoid sampling?
Deep Approximate Shapley Propagation (DASP)
[Diagram: a coalition with k out of N input features "on", the remaining features set to 0; under this uncertainty each pre-activation becomes a random variable, and passing a normal distribution through a ReLU yields a "rectified" normal distribution.]
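This propagation is motivated by an exact regrouping of the Shapley sum by coalition size (a standard identity, restated here for context); with N = |P|, the factorial weights collapse to 1/N per size, giving

$$\phi_i = \frac{1}{N} \sum_{k=0}^{N-1} \mathbb{E}_{S \subseteq P \setminus \{i\},\; |S| = k} \left[ f(S \cup \{i\}) - f(S) \right]$$

so it suffices to estimate one expected marginal contribution per coalition size k, which DASP does by propagating distributions through the network instead of sampling subsets.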
Deep Approximate Shapley Propagation
To propagate distributions through the network layers we use Lightweight Probabilistic Deep Networks (Gast et al., 2018), which provide probabilistic versions of common layers:
• Affine transformation
• Rectified Linear Unit
• Leaky Rectified Linear Unit
• Mean pooling
• Max pooling
• …
The use of other probabilistic frameworks is also possible.
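As an illustration of moment propagation of this kind, the sketch below pushes a mean/variance pair through an affine layer and a ReLU using the closed-form moments of a rectified Gaussian, under the usual assumption of independent Gaussian activations (the helper names and the example layer are ours, not from the LPDN code):

```python
# Propagate (mean, variance) of independent Gaussian activations through
# an affine layer and a ReLU (moments of a rectified Gaussian).
import numpy as np
from scipy.stats import norm

def affine_moments(mu, var, W, b):
    # A linear map of independent Gaussians stays Gaussian:
    # mean -> W @ mu + b, variance -> (W**2) @ var (cross-terms vanish).
    return W @ mu + b, (W ** 2) @ var

def relu_moments(mu, var, eps=1e-12):
    # Closed-form mean and variance of max(0, X) for X ~ N(mu, var).
    sigma = np.sqrt(var + eps)
    alpha = mu / sigma
    mean = mu * norm.cdf(alpha) + sigma * norm.pdf(alpha)
    second = (mu ** 2 + var) * norm.cdf(alpha) + mu * sigma * norm.pdf(alpha)
    return mean, np.maximum(second - mean ** 2, 0.0)  # guard tiny negatives

# Example: a 3-d Gaussian input through one hypothetical 4-unit layer
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(4)
mu, var = affine_moments(np.ones(3), 0.5 * np.ones(3), W, b)
mu, var = relu_moments(mu, var)
```

Stacking such layer-wise rules yields the output distribution needed for each expected marginal contribution in a single forward pass.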
DASP vs other methods
• Gradient-based methods: ✓ (very) fast / ✗ poor Shapley value estimation
• Sampling-based methods: ✓ unbiased Shapley value estimator / ✗ slow
• DASP sits between the two, aiming to combine speed with an accurate Shapley value approximation
For details, come to the poster: Pacific Ballroom #63
Lightweight Probabilistic Deep Networks (Keras): github.com/marcoancona/LPDN
Deep Approximate Shapley Propagation: github.com/marcoancona/DASP
Thank you!

References
• Shapley, Lloyd S., A value for n-person games, 1953
• Castro et al., Polynomial calculation of the Shapley value based on sampling, 2009
• Fatima et al., A linear approximation method for the Shapley value, 2014
• Ribeiro et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016
• Sundararajan et al., Axiomatic attribution for deep networks, 2017
• Shrikumar et al., Learning important features through propagating activation differences, 2017
• Lundberg et al., A Unified Approach to Interpreting Model Predictions, 2017
• Gast et al., Lightweight Probabilistic Deep Networks, 2018