Contrastive Relevance Propagation for Interpreting Predictions by a Single-Shot Object Detector
Hideomi Tsunakawa¹, Yoshitaka Kameya¹, Hanju Lee², Yosuke Shinya², and Naoki Mitsumoto²
¹ Department of Information Engineering, Meijo University
² DENSO CORPORATION
IJCNN-19
Outline
• Background
• Proposed method: CRP
• Experiments
Background: SSD (1)
• Object detection is a well-known task in computer vision
• SSD (Single-Shot MultiBox Detector) [Liu+ ECCV-16]:
  – Known for its high speed and accuracy
  – Outputs:
    • Classification: confidences for classes
    • Localization: location offsets (center on x-axis, center on y-axis, width, height)
[Figure: example input image and the resulting detection output]
Background: SSD (2)
• SSD:
  – Based on a (large) single convolutional network
  – Layers for classification (Cls) and layers for localization (Loc) are connected from several convolutional layers of different resolutions
[Figure: SSD architecture — a VGG-16 base (up to the Pool5 layer, 300×300 input) followed by extra convolutional layers Conv6–Conv11_2; classification/localization heads (Cls4/Loc4, Cls7/Loc7, …, Cls11/Loc11) branch from Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2 at feature-map resolutions 38, 19, 10, 5, 3, and 1, followed by non-maximum suppression]
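As an illustrative sketch of what the four localization outputs mean, the snippet below decodes SSD-style offsets relative to a default (anchor) box. The variance constants and the exact encoding follow the common SSD reference implementation and are an assumption here, not something stated on the slide:

```python
import math

def decode_ssd_offsets(default_box, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode SSD location offsets (dcx, dcy, dw, dh) predicted for a
    default box (cx, cy, w, h), all in normalized image coordinates.
    Variance constants follow the common SSD reference implementation
    (an assumption; check your model's configuration)."""
    cx, cy, w, h = default_box
    dcx, dcy, dw, dh = offsets
    vx, vy, vw, vh = variances
    out_cx = cx + dcx * vx * w      # shift the center by a fraction of box size
    out_cy = cy + dcy * vy * h
    out_w = w * math.exp(dw * vw)   # scale width/height exponentially
    out_h = h * math.exp(dh * vh)
    return out_cx, out_cy, out_w, out_h

# Zero offsets reproduce the default box exactly
print(decode_ssd_offsets((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))
```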
Background: LRP (1)
• LRP (Layer-wise Relevance Propagation) [Bach+ 15]:
  – Often used for interpreting predictions of DNNs
  – Propagates relevance backward from the output to the input features
  – Creates a heatmap from the relevance at the input features
[Figure: relevance propagated backward through the SSD network, producing a heatmap of relevance to "dog" over the input image]
Background: LRP (2)
• LRP is equipped with several propagation rules:
  – Common: the relevance R_j^(l+1) of unit j in layer l+1 is distributed to the units i of layer l connected to it:
      R_i^(l) := Σ_j R_{i←j}
    where R_{i←j} is the relevance passed through the connection from j to i
  – Simple LRP:
      R_{i←j} = (x_i w_{ij} / Σ_{i'} x_{i'} w_{i'j}) · R_j^(l+1)
  – ε-LRP: adds a stabilizer ε·sign(Σ_{i'} x_{i'} w_{i'j}) to the denominator above to avoid division by zero
  – αβ-LRP: treats positive and negative contributions separately:
      R_{i←j} = (α · (x_i w_{ij})⁺ / Σ_{i'} (x_{i'} w_{i'j})⁺ − β · (x_i w_{ij})⁻ / Σ_{i'} (x_{i'} w_{i'j})⁻) · R_j^(l+1)
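As a minimal sketch (not the authors' code), the simple-LRP and αβ-LRP rules for a single fully connected layer with activations `x` and weights `W` could look like:

```python
import numpy as np

def lrp_simple(x, W, R_out, eps=1e-9):
    """Simple LRP for a dense layer: relevance R_out at the outputs is
    redistributed to the inputs in proportion to the contributions
    z_ij = x_i * w_ij (eps avoids division by zero)."""
    z = x[:, None] * W                      # z_ij, shape (in, out)
    denom = z.sum(axis=0) + eps             # z_j = sum_i z_ij
    return (z / denom * R_out).sum(axis=1)  # R_i = sum_j (z_ij / z_j) R_j

def lrp_alphabeta(x, W, R_out, alpha=1.0, beta=0.0, eps=1e-9):
    """alpha-beta LRP: positive and negative contributions are
    redistributed separately; alpha=1, beta=0 is the w+-rule
    used later in the slides."""
    z = x[:, None] * W
    zp = np.clip(z, 0, None)                # positive parts (z_ij)^+
    zn = np.clip(z, None, 0)                # negative parts (z_ij)^-
    Rp = zp / (zp.sum(axis=0) + eps) * R_out
    Rn = zn / (zn.sum(axis=0) - eps) * R_out
    return (alpha * Rp - beta * Rn).sum(axis=1)
```

Note that simple LRP conserves relevance: summing the returned `R_i` recovers (up to `eps`) the total relevance in `R_out`.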
Background: Indistinguishable Heatmaps (1)
• Heatmaps are almost invariant even when the target class is changed
• Heatmaps obtained with αβ-LRP (α = 1, β = 0):
  – Target class: "dog" (actually predicted)
  – Target class: "cat" ("what-if" analysis)
[Figure: the two heatmaps are nearly identical]
Background: Indistinguishable Heatmaps (2)
• Relevance propagated in each layer:
[Figure: the amount of relevance decreases exponentially as it is propagated toward the input]
Background: Indistinguishable Heatmaps (3)
• Recent works that seem to support our observation:
  – [Adebayo+ NeurIPS-18]:
    • Uses Inception v3 (a large network)
    • If relevance = gradient ⊙ input, the input part dominates
      → heatmaps will be invariant (since the input is of course fixed)
  – [Ancona+ ICLR-18]:
    • Several methods tend to return similar heatmaps (theoretically or empirically):
      – Gradient ⊙ input
      – DeepLIFT (Rescale)
      – Integrated Gradients
      – Simple LRP
Background: Our Motivation
• We introduce contrastive relevance, which highlights the parts more important to the target class than to the other classes
[Figure: distinct heatmaps for target class "dog" vs. target class "cat"]
• We design the meaning of relevance to be consistent across the two heterogeneous tasks in SSD:
  – Classification
  – Localization (regression)
Outline
✓ Background
• Proposed method: CRP
• Experiments
Contrastive Relevance Propagation (CRP)
• CRP: LRP tailored for SSD
  – Classifies SSD's layers into 4 types (classification layers, localization layers, the high-level feature layer, and the low-level feature layers)
  – Applies a semantically appropriate propagation rule to each layer type
  – The meaning of "relevance" is the same in both classification and localization
[Figure: three examples on the SSD architecture — (1) for a detected box, relevance to the class k of interest, starting from a classification layer; (2) for the same box, relevance to shifting it to the right, starting from a localization layer; (3) for another detected box, relevance to another class k' of interest — each propagated back through the high-level feature layer and the low-level feature layers]
CRP: Propagation Rules in Classification
[Figure: low-level feature layers → high-level feature layer → classification layer, whose units correspond to class 1, …, class k* (target), …, class K]
CRP: Propagation Rules in Classification
• Initial relevance in the classification layer: 1 for the target class k*, 0 for all other classes
CRP: Propagation Rules in Classification
• From the classification layer to the high-level feature layer, we use the w⁺-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to the target class k*
CRP: Propagation Rules in Classification
• At this moment, we can compute a class-specific relevance R_i[k*] for the target class k* by summing up the passed relevance
CRP: Propagation Rules in Classification
• In the high-level feature layer, we compute the contrastive relevance
    R_i[k*] − (average relevance over the other classes)
  to find units that make a significantly positive or a significantly negative contribution to the target class k*
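A minimal sketch of this contrastive step, assuming the class-specific relevance R_i[k] has already been computed for every class k with the w⁺-rule (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def contrastive_relevance(R, k_star):
    """R: array of shape (K, num_units) holding class-specific relevance
    R[k, i] at each high-level feature unit i. Returns R[k_star] minus
    the average relevance over the other K-1 classes, so units that are
    relevant to every class cancel out."""
    others = np.delete(R, k_star, axis=0)  # relevance of the other classes
    return R[k_star] - others.mean(axis=0)

# A unit equally relevant to all classes gets contrastive relevance 0,
# while a unit specific to the target class keeps a positive score.
R = np.array([[1.0, 2.0],
              [1.0, 0.5],
              [1.0, 0.5]])
print(contrastive_relevance(R, k_star=0))  # → [0.  1.5]
```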
CRP: Propagation Rules in Classification
• From the high-level feature layer down to the input layer, we use the w⁺-rule to distribute the positivity or the negativity of the contrastive relevance (the activations x_i are non-negative due to ReLU)
CRP: Propagation Rules in Localization
[Figure: low-level feature layers → high-level feature layer → localization layer, whose units correspond to center on x-axis, center on y-axis (target), width, and height]
CRP: Propagation Rules in Localization
• Initial relevance in the localization layer: 1 for the target output (here, center on y-axis), 0 for the others
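The one-hot initialization used in both the classification and localization slides can be sketched as follows (illustrative, not the authors' code):

```python
import numpy as np

def initial_relevance(num_outputs, target_index):
    """One-hot initial relevance: 1 at the output of interest, 0 elsewhere.
    Works for both heads: a class index in the classification layer, or
    one of the four box outputs (cx, cy, w, h) in the localization layer."""
    R = np.zeros(num_outputs)
    R[target_index] = 1.0
    return R

# Localization head: target the y-center (index 1 of [cx, cy, w, h])
print(initial_relevance(4, 1))  # → [0. 1. 0. 0.]
```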