Machine Learning
Lecture 09: Explainable AI (II)
Nevin L. Zhang
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
This set of notes is based on internet resources and references listed at the end.
Outline
1 Pixel-Level Explanations
2 Feature-Level Explanations
    LIME
    SHAP Values
3 Concept-Level Explanations
    TCAV
    ACE
4 Instance-Level Explanations
    Counterfactual Explanations
Feature-Level Explanations
Features
In this lecture, features refer to:
Super-pixels in image data, obtained by standard image segmentation algorithms such as SLIC (Achanta et al. 2012).
Presence or absence of words in text data.
Input variables in tabular data.
Interpretable Data Representations
The original representation of an image x is a tensor of pixels.
Its simplified/interpretable representation z_x is a binary vector over the M super-pixels whose components are all 1's.
A sparse binary vector z ∈ {0, 1}^M corresponds to an image consisting of a subset of the super-pixels; this image is denoted x_z.
The original and interpretable representations of text are related in a similar manner.
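To make the mapping z ↦ x_z concrete, the sketch below builds x_z from a binary vector z. It assumes a precomputed array `segments` assigning each pixel a super-pixel label (e.g., produced by SLIC) and replaces absent super-pixels with a grey baseline; these names and the baseline choice are illustrative assumptions, not part of LIME itself.

```python
import numpy as np

def image_from_z(image, segments, z, baseline=0.5):
    """Build x_z: keep super-pixels where z[m] == 1, replace the rest by a baseline value.

    image:    H x W x 3 float array (the original x)
    segments: H x W int array of super-pixel labels in {0, ..., M-1}
    z:        length-M binary vector (the interpretable representation)
    """
    masked = np.full_like(image, baseline)      # start from the baseline image
    for m in np.flatnonzero(z):                 # super-pixels that are "present"
        masked[segments == m] = image[segments == m]
    return masked

# z_x, the interpretable representation of the original image, is the all-ones vector:
# z_x = np.ones(M, dtype=int); image_from_z(image, segments, z_x) returns the image unchanged.
```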
LIME (Ribeiro et al. 2016)
LIME stands for Local Interpretable Model-Agnostic Explanations.
It is for explaining a binary classifier. f(x): the probability that input x belongs to the class.
LIME explains f(x) using a surrogate model g(z), a linear model or a decision tree on the simplified features.
The surrogate model should be faithful to f(.) in the neighborhood of x.
Ideally, g(z_x) = f(x), and g(z) ≈ f(x_z) if x_z is close to x.
LIME (Ribeiro et al. 2016)
The surrogate model g is determined by minimizing:

\xi(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g)

G is a family of surrogate models.
L(f, g, π_x) measures how unfaithful g(z) is in approximating f(x_z) in the neighborhood of x.
Ω(g) is a penalty for model complexity.
LIME with a Sparse Linear Model
Researchers often use a sparse linear model for the surrogate model: g(z) = w^T z.
In this case, Ω(g) is the number of non-zero weights, and

L(f, g, \pi_x) = \sum_{z:\, \text{samples around } z_x} \pi_x(z) \, \big( f(x_z) - g(z) \big)^2

where π_x(z) = exp(−D(z_x, z)/σ) is a proximity measure between x and x_z, and D is the L2 distance for images and the cosine distance for text.
L(f, g, π_x) + Ω(g) is minimized using K-LASSO to ensure that no more than K super-pixels are used in the explanation.
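A minimal sketch of this procedure for an image classifier, assuming a `predict_fn` that returns the probability of the class of interest for one image, plus the `segments` array and `image_from_z` helper from the earlier sketch. Two simplifications are made and should be noted: the distance D is computed in the binary z-space rather than pixel space, and the exact K-LASSO step is replaced by a weighted ridge fit followed by selecting the K largest-magnitude coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(image, segments, predict_fn, num_samples=1000, K=5, sigma=0.25, rng=None):
    """Return the indices and weights of the K super-pixels most responsible for f(x)."""
    rng = np.random.default_rng(rng)
    M = segments.max() + 1

    # 1. Sample binary vectors z around z_x (the all-ones vector).
    Z = rng.integers(0, 2, size=(num_samples, M))
    Z[0] = 1                                             # include z_x itself

    # 2. Evaluate the black-box model on the perturbed images x_z.
    y = np.array([predict_fn(image_from_z(image, segments, z)) for z in Z])

    # 3. Proximity weights pi_x(z) = exp(-D/sigma); D is a normalized distance to z_x
    #    in z-space (a simplification of the pixel-space L2 distance on the slide).
    D = np.sqrt(((1 - Z) ** 2).sum(axis=1)) / M
    pi = np.exp(-D / sigma)

    # 4. Fit a weighted linear surrogate g(z) = w^T z and keep the K largest weights.
    g = Ridge(alpha=1.0).fit(Z, y, sample_weight=pi)
    top = np.argsort(np.abs(g.coef_))[::-1][:K]
    return top, g.coef_[top]
```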
LIME Intuition
Although a model might be very complex globally, it can still be faithfully approximated using a linear model locally.
LIME Example
LIME Example
LIME reveals that the classification is based on the wrong reasons.
LIME Evaluation (Ribeiro et al. 2016)
Local faithfulness:
Train interpretable models f with a small number of features, and check how many of those features are considered important in the LIME explanation g.
Remove some features and see how well the resulting prediction changes under g match those under f.
Interpretability:
How much can LIME explanations help users choose a better model?
How much can LIME explanations help users improve a classifier by removing features that do not generalize?
Anchors (Ribeiro et al. 2018)
LIME does not allow humans to predict model behaviour on unseen instances.
Anchors: another model-agnostic explanation method.
It gives a rule that sufficiently "anchors" the prediction locally, such that changes to the rest of the feature values of the instance do not matter.
In other words, for instances on which the anchor holds, the prediction is (almost) always the same.
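The full Anchors method searches for such rules with a bandit-based procedure; the sketch below only illustrates the defining property, estimating the precision of a given candidate anchor on tabular data by perturbing the non-anchored features. The names `predict_fn`, `X_background`, and the anchor encoding are assumptions made for illustration.

```python
import numpy as np

def anchor_precision(x, anchor_idx, predict_fn, X_background, num_samples=1000, rng=None):
    """Estimate P(prediction unchanged | anchored features fixed at their values in x).

    x:            1-D array, the instance being explained
    anchor_idx:   indices of the features fixed by the candidate anchor rule
    X_background: 2-D array of training rows used to perturb the remaining features
    predict_fn:   takes a 2-D array of rows, returns predicted class labels
    """
    rng = np.random.default_rng(rng)
    target = predict_fn(x[None, :])[0]                   # the prediction being "anchored"

    rows = X_background[rng.integers(0, len(X_background), size=num_samples)].copy()
    rows[:, anchor_idx] = x[anchor_idx]                  # keep anchored features fixed
    preds = predict_fn(rows)
    return float(np.mean(preds == target))               # high precision => behaves like an anchor
```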
The Shapley Values (Wikipedia)
The Shapley value is a solution concept in cooperative game theory.
It was named in honor of Lloyd Shapley, who introduced it in 1951 and won the Nobel Prize in Economics for it in 2012.
To each cooperative game it assigns a unique distribution (among the players) of the total surplus generated by the coalition of all players.
The Shapley value is characterized by a collection of desirable properties.
Cooperative Games (Knight 2020)
A characteristic function game is a set function f : 2^{1,...,M} → R.
For any subset S of players, f(S) is their payoff if they act as a coalition.
Question: What is a fair way to divide the total payoff f({1, ..., M})?
Example: 3 persons share a taxi. Here are the costs of the individual journeys:
Person 1: 6
Person 2: 12
Person 3: 42
How much should each individual contribute?
Cooperative Games: Example
Define a set function f over coalitions S:

S            f(S)
{1}            6
{2}           12
{3}           42
{1, 2}        12
{1, 3}        42
{2, 3}        42
{1, 2, 3}     42

Question: How to divide the total cost 42 among the three persons?
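In code, such a game can be represented as a plain mapping from coalitions to payoffs. The sketch below hard-codes the taxi example above; the empty coalition is assigned cost 0, which is an assumption needed later for the Shapley computation (the values in the table happen to equal the largest individual fare in each coalition, but only the table itself is taken from the slides).

```python
# Characteristic function of the taxi game, keyed by frozenset of players.
taxi_game = {
    frozenset():          0,   # assumption: the empty coalition costs nothing
    frozenset({1}):       6,
    frozenset({2}):      12,
    frozenset({3}):      42,
    frozenset({1, 2}):   12,
    frozenset({1, 3}):   42,
    frozenset({2, 3}):   42,
    frozenset({1, 2, 3}): 42,
}

def f(S):
    """Payoff (here: cost) of coalition S."""
    return taxi_game[frozenset(S)]
```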
The Shapley Values
A fair way to divide the total payoff f({1, ..., M}) among the players is to use the Shapley values.
The Shapley value of player i is:

\phi_i(f) = \frac{1}{M!} \sum_{\pi:\, \text{permutation of } \{1,\dots,M\}} \Delta f_\pi(i)
          = \frac{1}{M!} \sum_{\pi:\, \text{permutation of } \{1,\dots,M\}} \big[ f(S_\pi^i \cup \{i\}) - f(S_\pi^i) \big]

where S_\pi^i is the set of predecessors of i in \pi, and \Delta f_\pi(i) is the marginal contribution of player i w.r.t. \pi.
The Shapley values satisfy:

\sum_{i=1}^{M} \phi_i(f) = f(\{1, \dots, M\})
The Shapley Values: Example

π            Δf_π(1)   Δf_π(2)   Δf_π(3)
(1, 2, 3)       6         6        30
(1, 3, 2)       6         0        36
(2, 1, 3)       0        12        30
(2, 3, 1)       0        12        30
(3, 1, 2)       0         0        42
(3, 2, 1)       0         0        42
φ_i(f)          2         5        35
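A brute-force sketch of this computation, enumerating all M! orderings. It uses the `f` and `taxi_game` defined in the previous sketch and reproduces the table above, giving φ = (2, 5, 35).

```python
from itertools import permutations
from math import factorial

def shapley_values(f, players):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    players = list(players)
    M = len(players)
    phi = {i: 0.0 for i in players}
    for pi in permutations(players):
        S = set()                                # predecessors of the current player in pi
        for i in pi:
            phi[i] += f(S | {i}) - f(S)          # marginal contribution Delta f_pi(i)
            S.add(i)
    return {i: phi[i] / factorial(M) for i in players}

print(shapley_values(f, [1, 2, 3]))   # -> {1: 2.0, 2: 5.0, 3: 35.0}
```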
Use of Shapley Values in XAI (Lundberg and Lee 2017)
Consider explaining the prediction f(x) of a complex model f on an input x.
We regard each feature i as a player. Their joint "payoff" is f(x).
How do we divide the "payoff" f(x) among the features? Answer: Shapley values.
To apply Shapley values, we need a set function f_x : 2^{1,...,M} → R.
Obviously, f_x({1, ..., M}) = f(x). How about f_x(S) for a proper subset S of {1, ..., M}?
Use of Shapley Values in XAI
Given an input x, x_S denotes the subset of feature values for features in S.
Define f_x(S) using a conditional expectation:

f_x(S) = \mathbb{E}_{x'} \big[ f(x') \mid x'_S = x_S \big]

To estimate f_x(S), sample a set of input examples x^1, ..., x^N such that x^i_S = x_S, and set:

f_x(S) \approx \frac{1}{N} \sum_{i=1}^{N} f(x^i)

Another way is to independently sample values x^1_{\bar S}, ..., x^N_{\bar S} for the features not in S, and set:

f_x(S) \approx \frac{1}{N} \sum_{i=1}^{N} f(x_S, x^i_{\bar S})
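A sketch of the second (independent-sampling) estimate for tabular data, filling in the features outside S with rows drawn from a background dataset; `model_predict` and `X_background` are assumed names, not part of the slides.

```python
import numpy as np

def f_x_of_S(x, S, model_predict, X_background, num_samples=200, rng=None):
    """Estimate f_x(S) as the average of f(x_S, x'_{S-bar}) over background samples x'."""
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, len(X_background), size=num_samples)
    rows = X_background[idx].copy()        # independent samples of the features outside S
    rows[:, list(S)] = x[list(S)]          # clamp the features in S to their values in x
    return float(np.mean(model_predict(rows)))
```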
Use of Shapley Values in XAI
Yet another (seemingly more common) way is to set

f_x(S) \approx f(x_S, x^r_{\bar S})

where x^r_{\bar S} are reference values for the features not in S:
For images, researchers usually use 0 (black) as the reference values.
For tabular data, the reference values can be the data mean, the data median, a representative training example, etc. (Watzman 2020).
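This reference-value estimate is the single-sample special case of the previous sketch; with the hypothetical names used there, it can be written directly as follows.

```python
def f_x_of_S_reference(x, S, model_predict, x_ref):
    """Estimate f_x(S) by filling the features outside S with fixed reference values x_ref."""
    row = x_ref.copy()
    row[list(S)] = x[list(S)]
    return float(model_predict(row[None, :])[0])

# Equivalently: f_x_of_S(x, S, model_predict, X_background=x_ref[None, :], num_samples=1)
```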
Use of Shapley Values in XAI
Given a set function f_x : 2^{1,...,M} → R, the Shapley value of feature i is:

\phi_i(f, x) = \frac{1}{M!} \sum_{\pi:\, \text{permutation of } \{1,\dots,M\}} \big[ f_x(S_\pi^i \cup \{i\}) - f_x(S_\pi^i) \big]

where S_\pi^i is the set of predecessors of i in \pi.
φ_0 = f(x) − \sum_{i=1}^{M} φ_i(f, x) is the base value, the value of f when no feature is present, i.e., f_x(∅).
Additive Feature Attribution:

f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(f, x)

φ_i(f, x) is the contribution of feature i to f(x). It is called the SHapley Additive exPlanation (SHAP) value of feature i.
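Putting the pieces together: a sketch that computes SHAP values for one tabular instance by combining the brute-force `shapley_values` routine with the reference-value estimate `f_x_of_S_reference` from the earlier sketches. It enumerates all M! orderings, so it only illustrates the definition; practical SHAP implementations use sampling or model-specific algorithms. `model_predict`, `x`, and `x_ref` are assumed to be defined as before.

```python
import numpy as np

def shap_values_brute_force(x, model_predict, x_ref):
    """SHAP values of a single instance, using reference values for absent features."""
    M = len(x)

    def f_x(S):                                   # value function f_x(S) ~ f(x_S, x^r_{S-bar})
        return f_x_of_S_reference(x, S, model_predict, x_ref)

    phi = shapley_values(f_x, range(M))           # exact Shapley values of the game f_x
    phi0 = f_x(set())                             # base value: no feature present
    return phi0, phi

# Additivity check: phi0 + sum_i phi_i should equal f(x).
# phi0, phi = shap_values_brute_force(x, model_predict, x_ref)
# assert np.isclose(phi0 + sum(phi.values()), model_predict(x[None, :])[0])
```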