Dual-Gradients Localization Framework for Weakly Supervised Object Localization
Chuangchuang Tan 1*, Tao Ruan 1*, Guanghua Gu 2, Shikui Wei 1, Yao Zhao 1
1 Beijing Jiaotong University
2 Yanshan University
Target
o Weakly Supervised Object Localization (WSOL)
  o WSOL aims to understand an image at the pixel level using only image-level annotations.
  o Image-level annotations are much cheaper to obtain than pixel-level ones.
WSOL
o Steps of previous works:
  o Force the classification network to focus on more regions of the feature map.
  o Produce the localization map on the last convolutional layer by applying CAM.
o Problems:
  o They ignore the localization ability of the other layers.
  o Both the localization and the classification tasks are trained online.
[Figure callout: "I can produce WSOL, too"]
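The CAM step used by previous works can be sketched as follows. This is a minimal NumPy illustration, assuming a head of the form GAP followed by one linear layer; the function names, array shapes and the min-max normalization are choices made for this example and do not come from the slides.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of last-layer feature maps with the FC weights of one class.

    features:   (K, H, W) feature maps of the last convolutional layer
    fc_weights: (C, K) weights of the linear layer that follows GAP
    """
    weights = fc_weights[class_idx]                 # (K,)
    return np.tensordot(weights, features, axes=1)  # (H, W)

def normalize_map(cam, eps=1e-8):
    """Min-max scale a map to [0, 1] (an illustrative post-processing choice)."""
    cam = cam - cam.min()
    return cam / (cam.max() + eps)

# Toy data standing in for real network activations.
rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))
fc_weights = rng.standard_normal((3, 4))

raw_cam = class_activation_map(features, fc_weights, class_idx=1)
cam = normalize_map(raw_cam)
```

Because the map is computed only from the last convolutional layer, earlier layers contribute nothing to it, which is the limitation listed above.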
Dual-Gradients Localization (DGL) framework
o Main ideas:
  o Use the gradients of the classification loss function to mine the entire target object region.
  o Use the gradients of the target class to identify how strongly each pixel correlates with the target class within any convolutional feature map.
o Characteristics:
  o Simple: DGL is an offline approach and needs no training for localization.
  o Effective: it achieves localization on any convolutional layer.
[Figure panels: Source image, Mixed_6f, Mixed_6e]
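Overview of the DGL framework
[Diagram. Classification model: feature maps $S_i$, ..., $S_{n-1}$, $S_n$, followed by GAP, FC, softmax and cross-entropy loss. Class-aware Enhanced Map Branch: the loss gradient $\partial J(p, \alpha y^c)/\partial S$ and the feature maps $S_n$ are each l2-normalized and combined by ⊖ into the enhanced map. Pixel-level Selection Branch: the class gradient $\partial y^c/\partial S$ is combined with the enhanced map by ⊗, followed by sum and resize. Inputs are labeled Feature Maps, outputs Localization Maps.]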
Classification model
[Diagram: classification model with feature maps $S_i$, ..., $S_{n-1}$, $S_n$, followed by GAP, FC, softmax and cross-entropy loss]
o Classification model architecture:
  o Use a customized InceptionV3, i.e., SPG-plain.
  o Remove the layers after the second Inception block, i.e., the third Inception block, the pooling layer and the linear layer.
  o Add two convolutional layers.
  o Add a GAP layer and a softmax layer.
Class-aware Enhanced Map Branch
o Feature maps predicted as class c capture only the discriminative parts of objects when they lie close to the boundary of the classification region.
o Feature maps located at the center of the classification region can highlight more object regions.
[Diagram: Class-aware Enhanced Map Branch. The feature maps $S_n$ and the loss gradient $\partial\,\mathrm{cost}(p, \alpha y^c)/\partial S$ are each l2-normalized and combined by ⊖ into the enhanced map A.]
Class-aware Enhanced Map Branch
o The key idea of the Class-aware Enhanced Map is to pull the feature maps toward the inside of the classification region of a specific class, along the gradients of the classification loss function.
[Diagram: Class-aware Enhanced Map Branch, as on the previous slide]
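One possible reading of this branch is sketched below: l2-normalize the feature maps and the loss gradient, then subtract the gradient so the features move in the direction that lowers the classification loss. The toy head (GAP, one linear layer, softmax), the normalization over the whole tensor, and all function names are assumptions for illustration. The scaling factor $\alpha$ from the diagram is taken as 1 because the slides do not spell out its role.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def l2_normalize(x, eps=1e-8):
    """Scale a tensor to unit l2 norm (whole-tensor norm is an assumption)."""
    return x / (np.linalg.norm(x) + eps)

def loss_gradient(features, fc_weights, class_idx):
    """Gradient of cross-entropy w.r.t. the feature maps for a toy
    GAP -> linear -> softmax head, derived analytically."""
    K, H, W = features.shape
    p = softmax(fc_weights @ features.mean(axis=(1, 2)))  # (C,)
    target = np.zeros_like(p)
    target[class_idx] = 1.0
    d_logits = p - target                  # dJ/dlogits for softmax + CE
    d_pooled = fc_weights.T @ d_logits     # (K,)
    # GAP spreads each channel gradient evenly over the H*W positions.
    return np.broadcast_to(d_pooled[:, None, None] / (H * W), features.shape).copy()

def enhanced_map(features, grad):
    """Normalized features minus normalized loss gradient (the diagram's minus node)."""
    return l2_normalize(features) - l2_normalize(grad)

# Tiny deterministic example: two channels, two classes, identity classifier.
features = np.ones((2, 2, 2))
fc_weights = np.eye(2)
grad = loss_gradient(features, fc_weights, class_idx=0)
A = enhanced_map(features, grad)
```

In this toy case the channel tied to the target class is amplified and the channel tied to the other class is suppressed, which matches the intuition of moving the features toward the inside of the target class region.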
Pixel-level Selection Branch
o Gradients or weights?
  o CAM actually achieves localization through a weighted sum of the feature maps with the gradients of the target class on the last convolutional layer, rather than with the weights of the final FC layer.
  o Pixel-level Selection is a generalization of CAM.
[Diagram: Pixel-level Selection Branch. The class gradient $\partial y^c/\partial S$ is combined with the enhanced map A by ⊗, followed by sum and resize.]
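The relation between gradients and CAM weights can be checked numerically. For a toy head of GAP followed by one linear layer, the gradient of the class score with respect to channel $k$ at any position is $w_k^c/(HW)$, so a gradient-weighted sum over channels equals CAM up to the constant $1/(HW)$. The sketch below assumes that $y^c$ is the pre-softmax class score and uses plain feature maps in place of the enhanced map; the finite-difference gradient, the nearest-neighbor resize and all names are illustrative stand-ins.

```python
import numpy as np

def class_score(features, fc_weights, class_idx):
    """Class score y^c of a toy GAP + linear head (pre-softmax, an assumption)."""
    return float(fc_weights[class_idx] @ features.mean(axis=(1, 2)))

def numerical_gradient(fn, x, h=1e-5):
    """Central-difference gradient of a scalar function fn at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (fn(x + step) - fn(x - step)) / (2 * h)
    return grad

def pixel_level_selection(feature_maps, class_grad):
    """Weight each pixel of each channel by its class gradient, then sum channels."""
    return (class_grad * feature_maps).sum(axis=0)

def resize_nearest(loc_map, factor):
    """Nearest-neighbor upsampling by an integer factor (interpolation is assumed)."""
    return np.kron(loc_map, np.ones((factor, factor)))

rng = np.random.default_rng(0)
features = rng.random((3, 4, 4))
fc_weights = rng.standard_normal((2, 3))
class_idx = 1

class_grad = numerical_gradient(
    lambda s: class_score(s, fc_weights, class_idx), features)
loc_map = pixel_level_selection(features, class_grad)
cam = np.tensordot(fc_weights[class_idx], features, axes=1)
full_res = resize_nearest(loc_map, 8)
```

Since the gradient can be taken with respect to any intermediate feature map, the same selection step is not tied to the last convolutional layer, whereas the FC weights used by CAM are.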
Results on the Validation Set of LID
MS: multi-scale inputs at test time
MC: morphological closing applied to the localization map at test time

MS   MC   mIoU
✘    ✘    58.23
✔    ✘    61.46
✔    ✔    62.22

o The localization maps of branch1 and branch2 are fused on the Mixed_6e layer.
o Input size: 324
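The MC post-processing and the IoU metric can be illustrated on a toy mask. This sketch assumes a binarized localization map, a 3x3 structuring element and a plain intersection-over-union; the challenge's exact mIoU protocol and the kernel used by the authors are not given in the slides, so these are stand-ins.

```python
import numpy as np

def _neighborhoods(mask, pad_value):
    """Return the nine 3x3-shifted views of a boolean mask."""
    padded = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]

def dilate(mask):
    return np.logical_or.reduce(_neighborhoods(mask, False))

def erode(mask):
    return np.logical_and.reduce(_neighborhoods(mask, True))

def morph_close(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

# Ground truth: a filled 3x3 square. Prediction: the same square with a hole.
gt = np.zeros((7, 7), dtype=bool)
gt[2:5, 2:5] = True
pred = gt.copy()
pred[3, 3] = False

iou_before = iou(pred, gt)
iou_after = iou(morph_close(pred), gt)
```

Closing fills small holes inside the predicted region without growing its outline, which is one plausible reason for the gain in the MC row of the table.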
Qualitative Results
o Examples of DGL on the test set
Thanks