The 2nd Learning from Imperfect Data (LID) Workshop
Revisiting Class Activation Mapping for Learning from Imperfect Data
Wonho Bae*, Junhyug Noh*, Jinhwan Seo, and Gunhee Kim
Challenge Results
• 1st place, Track 3: Weakly Supervised Object Localization
• 2nd place, Track 1: Weakly Supervised Semantic Segmentation
Weakly-Supervised Object Localization
[Figure: an input image and the output localization for the class "monkey".]
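Class Activation Mapping (CAM)
[Figure: CAM pipeline for the class c: monkey. A CNN produces feature maps F_1, F_2, F_3, ...; global average pooling (GAP) feeds a classifier with per-channel weights w_{1,c}, w_{2,c}, w_{3,c}, ... The class activation map is the weighted sum w_{1,c} ∗ F_1 + w_{2,c} ∗ F_2 + w_{3,c} ∗ F_3 + ⋯ over all channels. The map is resized, thresholded (> τ_loc), and converted into the localization result.]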
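The pipeline on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the array layout (K channels, H × W grid, C classes), the toy values, and the `tau_loc` value are all assumptions, and the resize step is omitted for brevity.

```python
import numpy as np

def gap_logits(features, weights):
    """GAP followed by a linear classifier. features: (K, H, W), weights: (K, C)."""
    pooled = features.mean(axis=(1, 2))   # one average per channel, shape (K,)
    return pooled @ weights               # class scores, shape (C,)

def class_activation_map(features, weights, class_idx):
    """Weighted sum of the feature maps for one class, shape (H, W)."""
    return np.tensordot(weights[:, class_idx], features, axes=([0], [0]))

def bounding_box(cam, tau_loc):
    """Tightest box (x_min, y_min, x_max, y_max) around pixels above tau_loc * max."""
    ys, xs = np.nonzero(cam > tau_loc * cam.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example (hypothetical values): 2 channels on a 4x4 grid, 1 class.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 4.0   # channel 0 fires strongly on a 2x2 block
features[1, 0:3, 0:4] = 1.0   # channel 1 fires weakly on a wider region
weights = np.array([[0.5], [0.25]])

cam = class_activation_map(features, weights, class_idx=0)
box = bounding_box(cam, tau_loc=0.5)
```

Because GAP and the weighted sum are both linear, the spatial mean of the CAM equals the class score computed from the pooled features, which is why the same weights can be reused for localization.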
Class Activation Mapping (CAM) for Track 3
[Figure: the CAM pipeline (CNN, GAP, class activation map, resize, threshold at τ_loc) producing the localization result.]
How to Grasp the Whole Object Region?
• [HaS] Singh et al., ICCV 2017
• [AE] Wei et al., CVPR 2017
• [ACoL] Zhang et al., CVPR 2018
• [ADL] Choe et al., CVPR 2019
Our Approach
• Motivation: The information needed to capture the whole area of the object already exists in the feature maps.
• Problem: The three modules of CAM (M1-M3) do not take the phenomena (P1-P3) observed in the feature maps into account. As a result, localization is limited to small discriminative regions of an object.
• Solution: Correctly utilize the information by simply modifying the three modules.
[Figure: CAM pipeline annotated with the three modules, M1: Global Average Pooling (GAP), M2: Class Activation Maps (CAM), and M3: Thresholding, alongside the phenomena P1, P2, and P3 observed in the feature maps.]
Our Approach (1): Thresholded Average Pooling
• Problem: Global Average Pooling (GAP) under P1
[Figure: CAM pipeline with module M1: Global Average Pooling (GAP) and phenomenon P1 highlighted.]
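Our Approach (1): Thresholded Average Pooling
• Problem: Global Average Pooling (GAP) under P1
[Figure: two channels compared across the classification and localization phases.
Classification phase: the first channel has a GAP output of 2.5 and a weight of 0.04; the second has a GAP output of 9.9 and a weight of 0.01. Their contributions to the class score are nearly equal: 0.100 + 0.099 + ⋯
Localization phase: the same weights multiply the feature maps, whose maxima are 64.7 (first channel) and 59.2 (second channel).]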
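The numbers on this slide can be checked with a short calculation. The pairing of each value with a channel is read off the slide layout and is an assumption, as are the variable names; the weighted peaks are derived here and do not appear on the slide.

```python
# Values read off the slide; the channel pairing is assumed from the layout.
gap_output = [2.5, 9.9]     # GAP output per channel
weight = [0.04, 0.01]       # classifier weight per channel
peak = [64.7, 59.2]         # maximum activation of each feature map

# Classification phase: each channel contributes GAP output * weight.
logit_terms = [g * w for g, w in zip(gap_output, weight)]

# Localization phase: the same weights scale the feature maps themselves,
# so the weighted peak of each channel is max activation * weight.
cam_peaks = [p * w for p, w in zip(peak, weight)]

ratio = cam_peaks[0] / cam_peaks[1]
```

The two channels contribute almost equally to the class score (about 0.100 and 0.099), yet in the CAM the first channel's peak is roughly four times the second's. A channel with a small GAP output is compensated by a larger weight during training, and that weight then over-emphasizes the channel at localization time.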
Our Approach (1): Thresholded Average Pooling
• Problem: Global Average Pooling (GAP) under P1
• Solution: Thresholded Average Pooling (TAP)
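The TAP formula did not survive on this slide, so the following is only a sketch of one plausible reading: each channel is averaged over the activations that exceed a fraction `tau_tap` of that channel's maximum, rather than over every spatial position. The function name, the array layout, and the relative-to-maximum threshold are assumptions.

```python
import numpy as np

def thresholded_average_pooling(features, tau_tap=0.05):
    """Average each channel over activations above tau_tap * (channel max).

    features: array of shape (K, H, W). Returns an array of shape (K,).
    """
    k = features.shape[0]
    flat = features.reshape(k, -1)
    threshold = tau_tap * flat.max(axis=1, keepdims=True)
    keep = flat > threshold
    total = np.where(keep, flat, 0.0).sum(axis=1)
    count = keep.sum(axis=1)
    # Channels with no activation above the threshold pool to 0.
    return total / np.maximum(count, 1)

# Toy example (hypothetical values): a narrow channel and a broad channel.
features = np.zeros((2, 4, 4))
features[0, 0, 0] = 8.0      # a single strongly activated position
features[1, :, :] = 2.0      # uniform activation everywhere

gap = features.mean(axis=(1, 2))
tap = thresholded_average_pooling(features, tau_tap=0.05)
```

In this toy case GAP dilutes the narrow channel to 0.5 while TAP reports 8.0; the uniformly activated channel pools to 2.0 either way. Under this reading, the pooled value no longer depends on how large the activated area is.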
Our Approach (2): Negative Weight Clamping
• Problem: Class Activation Maps (CAM) under P2
[Figure: CAM pipeline with module M2: Class Activation Maps (CAM) and phenomenon P2 highlighted.]
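Our Approach (2): Negative Weight Clamping
• Problem: Class Activation Maps (CAM) under P2
[Figure: CAMs computed with both weight signs ("Both"), with positive weights only ("Positive only"), and with negative weights only ("Negative only").]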
Our Approach (2): Negative Weight Clamping
• Problem: Class Activation Maps (CAM) under P2
[Figure: IoA between the ground truth boxes and the CAMs, shown separately for positive weights and negative weights.]
Our Approach (2): Negative Weight Clamping
• Problem: Class Activation Maps (CAM) under P2
• Solution: Negative Weight Clamping (NWC)
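As a rough illustration of negative weight clamping, the sketch below sets negative class weights to zero before forming the weighted sum of feature maps. The function name, the array layout, and the toy values are assumptions; the slide's own formula was not recoverable.

```python
import numpy as np

def cam_with_nwc(features, weights, class_idx):
    """CAM using only the non-negative weights of the chosen class.

    features: (K, H, W), weights: (K, C). Returns an (H, W) map.
    """
    clamped = np.clip(weights[:, class_idx], 0.0, None)  # negatives become 0
    return np.tensordot(clamped, features, axes=([0], [0]))

# Toy example (hypothetical values): channel 1 carries a negative weight.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 4.0
features[1, 0:3, 0:4] = 1.0
weights = np.array([[0.5], [-0.25]])

plain_cam = np.tensordot(weights[:, 0], features, axes=([0], [0]))
nwc_cam = cam_with_nwc(features, weights, class_idx=0)
```

Here the plain CAM is pushed down wherever channel 1 is active, including inside the region covered by channel 0, while the clamped version keeps only the positively weighted evidence.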
Our Approach (3): Percentile as a Thresholding Standard
• Problem: Maximum as a Standard (MaS) under P3
[Figure: CAM pipeline with module M3: Thresholding and phenomenon P3 highlighted.]
Our Approach (3): Percentile as a Thresholding Standard
• Problem: Maximum as a Standard (MaS) under P3
[Figure: plots with axes "threshold (τ_loc)" and "100 − percentile (%)"; panels titled "Num of channels", "Result with CAM", and "CAM values (descending order)".]
Our Approach (3): Percentile as a Thresholding Standard
• Problem: Maximum as a Standard (MaS) under P3
• Solution: Percentile as a Standard (PaS)
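The contrast between MaS and PaS can be sketched as follows, assuming the localization threshold is `tau_loc` times a reference value: the CAM maximum for MaS, and a high percentile of the CAM values for PaS. The function names, the `tau_loc` value, and the toy map are assumptions, not taken from the slides.

```python
import numpy as np

def mas_mask(cam, tau_loc):
    """Maximum as a Standard: threshold relative to the CAM maximum."""
    return cam > tau_loc * cam.max()

def pas_mask(cam, tau_loc, percentile=98):
    """Percentile as a Standard: threshold relative to a percentile of the CAM."""
    reference = np.percentile(cam, percentile)
    return cam > tau_loc * reference

# Toy example (hypothetical values): a 6x6 object with one extreme peak.
cam = np.zeros((10, 10))
cam[2:8, 2:8] = 1.0      # moderately activated object region (36 pixels)
cam[4, 4] = 100.0        # a single very high activation inside the object

mas = mas_mask(cam, tau_loc=0.2)
pas = pas_mask(cam, tau_loc=0.2, percentile=98)
```

With the maximum as the reference, the single peak sets the threshold so high that only one pixel survives. With the 98th percentile as the reference, the peak is ignored and the whole 6 × 6 object region is kept.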
Experimental Setting
• Backbone: ResNet50-SE
• Batch size: 210
• Input size: 384 × 384
• Random crop size: 336 × 336
• TAP threshold (τ_tap): 0.05
• PaS percentile (i): 98
Results on Validation Set
• Results with different components
• To preserve the details of masks, we also applied a fully connected CRF.
• The performance gradually improves as each component is added.
Leaderboard
• Track 3: Weakly Supervised Object Localization
Qualitative Results
[Figure: four example pairs, each labeled "CAM" and "+ Ours".]
Expansion to Track 1
[Figure: overview of the Track 1 pipeline, with one stage marked "Our target!".]
Class Activation Mapping (CAM) for Track 1
[Figure: the CAM pipeline (CNN, GAP, class activation map, resize, threshold at τ_loc) producing the localization result.]
Leaderboard
• Track 1: Weakly Supervised Semantic Segmentation
Thank You!