Neural Information Processing Systems (NeurIPS) 2018
Recurrent Transformer Networks for Semantic Correspondence
Seungryong Kim 1, Stephen Lin 2, Sangryul Jeon 1, Dongbo Min 3, Kwanghoon Sohn 1
Dec. 05, 2018
Introduction: Semantic Correspondence
• Establishing dense correspondences between semantically similar images, i.e., different instances within the same object or scene categories
• For example, the wheels of two different cars, or the bodies of people or animals
Introduction: Challenges in Semantic Correspondence
• Photometric deformations: intra-class appearance and attribute variations, etc.
• Geometric deformations: different viewpoints or baselines, non-rigid shape deformations, etc.
• Lack of supervision: annotation is labor-intensive and degraded by subjectivity, etc.
Problem Formulation: Objective
• Estimate a locally-varying affine transformation field T_i = [A_i, f_i] at each pixel i, composed of an affine matrix A_i and a translation f_i, such that the correspondence of pixel i is i' = T_i(i)
• How can these locally-varying affine transformation fields be estimated without ground-truth supervision?
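To make the notion of a locally-varying affine field concrete, here is a minimal PyTorch sketch (not from the paper) that applies per-pixel affine parameters A_i and f_i to pixel coordinates; the tensor shapes and the `apply_affine_field` helper are assumptions introduced only for this illustration.

```python
import torch

def apply_affine_field(A, f, coords):
    """Apply a locally-varying affine field to pixel coordinates.

    A: (H, W, 2, 2) per-pixel affine matrices A_i
    f: (H, W, 2)    per-pixel translations f_i
    coords: (H, W, 2) pixel coordinates i
    Returns the correspondences i' = A_i @ i + f_i, shape (H, W, 2).
    """
    return torch.einsum('hwab,hwb->hwa', A, coords) + f

# Hypothetical toy field: identity matrices plus a constant one-pixel shift.
H, W = 4, 4
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing='ij')
coords = torch.stack([xs, ys], dim=-1)            # (H, W, 2)
A = torch.eye(2).expand(H, W, 2, 2)               # A_i = I everywhere
f = torch.tensor([1.0, 0.0]).expand(H, W, 2)      # shift one pixel to the right
print(apply_affine_field(A, f, coords)[0, 0])     # tensor([1., 0.])
```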
Background: Methods for Geometric Invariance in the Feature Extraction Step
• Spatial Transformer Networks (STN)-based methods [Jaderberg et al., NeurIPS'15]
• The affine matrix A_i is learned without ground truth A_i*, but the translation f_i is learned with ground truth f_i*
• Geometric inference is based on only the source or target image
• Examples: UCN [Choy et al., NeurIPS'16], CAT-FCSS [Kim et al., TPAMI'18], etc.
Background: Methods for Geometric Invariance in the Regularization Step
• The transformation field T_i is learned without ground truth T_i*, using self- or meta-supervision
• Geometric inference uses both the source and target images
• However, the inference is only globally-varying, and it uses only fixed, untransformed versions of the features
• Examples: GMat. [Rocco et al., CVPR'17], GMat. w/Inl. [Rocco et al., CVPR'18], etc.
Recurrent Transformer Networks (RTNs): Network Configuration
• RTNs weave together the advantages of STN-based methods and geometric matching methods by recursively estimating geometric transformation residuals, using geometry-aligned feature activations
Recurrent Transformer Networks (RTNs): Feature Extraction Networks
• Input images I^s and I^t are passed through Siamese convolutional networks with shared parameters W_F, such that D_i = F(I; W_F)
• Backbones: CAT-FCSS, VGGNet (conv4-4), or ResNet (conv4-23)
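A minimal sketch of the Siamese idea, assuming a toy two-layer encoder in place of the CAT-FCSS / VGG / ResNet backbones actually used: the `FeatureExtractor` module and its dimensions are illustrative; the point is only that the same weights W_F produce both D^s and D^t.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Stand-in for the shared backbone F(.; W_F); the real model uses
    CAT-FCSS, VGGNet (conv4-4), or ResNet (conv4-23) features."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, image):
        return self.net(image)

F_net = FeatureExtractor()
I_s = torch.randn(1, 3, 128, 128)   # source image I^s
I_t = torch.randn(1, 3, 128, 128)   # target image I^t
D_s = F_net(I_s)                    # D^s = F(I^s; W_F)
D_t = F_net(I_t)                    # D^t = F(I^t; W_F), same weights (Siamese)
print(D_s.shape, D_t.shape)         # torch.Size([1, 64, 32, 32]) twice
```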
Recurrent Transformer Networks (RTNs): Recurrent Geometric Matching Networks
• Constrained correlation volume construction: the similarity between the source feature D_i^s and the transformed target feature D^t(T_j) is computed as
  C(D_i^s, D^t(T_j)) = ⟨D_i^s, D^t(T_j)⟩ / ‖⟨D_i^s, D^t(T_j)⟩‖_2,
  with matching candidates j constrained to a local window around i
[Figure: source and target images illustrating the constrained matching window]
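The sketch below shows one plausible way to build such a constrained correlation volume in PyTorch, restricting candidates to a local window and L2-normalizing the scores over those candidates; the window radius, the `constrained_correlation` helper, and the exact normalization are assumptions for this example rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def constrained_correlation(D_s, D_t_warped, radius=2):
    """Correlation between each source feature D^s_i and the (already
    transformation-aligned) target features D^t(T_j) inside a local
    (2*radius+1)^2 window, L2-normalized over the candidates.

    D_s, D_t_warped: (B, C, H, W) feature maps.
    Returns: (B, (2*radius+1)^2, H, W) correlation volume C.
    """
    B, C, H, W = D_s.shape
    # Gather the window of candidate target features around every pixel.
    patches = F.unfold(D_t_warped, kernel_size=2 * radius + 1, padding=radius)
    patches = patches.view(B, C, (2 * radius + 1) ** 2, H, W)
    corr = (D_s.unsqueeze(2) * patches).sum(dim=1)          # <D^s_i, D^t(T_j)>
    corr = corr / (corr.norm(dim=1, keepdim=True) + 1e-6)   # normalize over candidates j
    return corr

corr = constrained_correlation(torch.randn(1, 64, 32, 32),
                               torch.randn(1, 64, 32, 32))
print(corr.shape)   # torch.Size([1, 25, 32, 32])
```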
Recurrent Transformer Networks (RTNs): Recurrent Geometric Matching Networks
• Recurrent geometric inference: at each iteration k, the field is updated by a residual predicted from the correlation volume,
  T_i^k − T_i^{k−1} = F(C(D_i^s, D^t(T_i^{k−1})); W_G)
[Figure: source and target images with intermediate estimates at Iter. 1, 2, 3, 4]
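A rough sketch of this recurrent residual update, under two simplifying assumptions: only the translation part of T_i is kept (the paper estimates full locally-varying affine fields), and `constrained_correlation` from the previous sketch is reused. The `warp_by_flow` and `GeometricMatcher` names are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_by_flow(D_t, flow):
    """Warp target features by the translation part of the current field.
    flow: (B, 2, H, W) offsets in pixels; the full method uses affine warps."""
    B, _, H, W = D_t.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=D_t.device, dtype=torch.float32),
                            torch.arange(W, device=D_t.device, dtype=torch.float32),
                            indexing='ij')
    grid = torch.stack([xs, ys], dim=0).unsqueeze(0) + flow   # sampling positions
    grid_x = 2.0 * grid[:, 0] / (W - 1) - 1.0                 # rescale to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / (H - 1) - 1.0
    return F.grid_sample(D_t, torch.stack([grid_x, grid_y], dim=-1),
                         align_corners=True)

class GeometricMatcher(nn.Module):
    """Predicts a residual update of the field from the correlation volume."""
    def __init__(self, num_candidates=25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_candidates, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),   # 2 = translation only (6 for full affine)
        )

    def forward(self, corr):
        return self.net(corr)

# Recurrent inference: T^k = T^{k-1} + F(C(D^s, D^t(T^{k-1})); W_G)
matcher = GeometricMatcher()
D_s, D_t = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
T = torch.zeros(1, 2, 32, 32)                    # start from the identity field
for k in range(4):
    corr = constrained_correlation(D_s, warp_by_flow(D_t, T))  # previous sketch
    T = T + matcher(corr)                                      # residual update
```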
Recurrent Transformer Networks (RTNs): Weakly-Supervised Learning
• Intuition: the matching score between the source feature D_i^s at each pixel i and the target feature D^t(T_i) should be maximized, while keeping the scores of the other candidates low
• Loss function:
  L(D_i^s, D^t(T*)) = − Σ_{j∈M_i} p_j* log p(D_i^s, D^t(T_j)),
  where p(D_i^s, D^t(T_j)) is a softmax probability,
  p(D_i^s, D^t(T_j)) = exp(C(D_i^s, D^t(T_j))) / Σ_{l∈M_i} exp(C(D_i^s, D^t(T_l))),
  and p_j* denotes a class label defined as 1 if j = i and 0 otherwise
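This loss is a cross-entropy over the candidates in each window M_i in which the center candidate (j = i) is the positive class. Below is a minimal sketch under the assumption that the candidates are ordered so the center sits at index (K*K)//2; `weakly_supervised_loss` is a hypothetical helper name for this example.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_loss(corr):
    """Classification-style loss over the candidates in each window M_i:
    the softmax probability of the center candidate (j = i) should be high,
    all other candidates low. corr: (B, K*K, H, W) constrained correlation,
    with candidates ordered so the center sits at index (K*K) // 2."""
    B, num_candidates, H, W = corr.shape
    log_p = F.log_softmax(corr, dim=1)        # p(D^s_i, D^t(T_j)) over the window
    center = num_candidates // 2              # label p*_j = 1 iff j = i
    return -log_p[:, center].mean()           # - sum_j p*_j log p(D^s_i, D^t(T_j))

loss = weakly_supervised_loss(torch.randn(1, 25, 32, 32))
print(loss.item())
```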
Experimental Results: Results on the TSS Benchmark
[Figure: source and target images with qualitative comparisons of SCNet [Han et al., ICCV'17], GMat. w/Inl. [Rocco et al., CVPR'18], and RTNs]
Experimental Results: Results on the PF-PASCAL Benchmark
[Figures (two slides): source and target images with qualitative comparisons of SCNet [Han et al., ICCV'17], GMat. w/Inl. [Rocco et al., CVPR'18], and RTNs]
Concluding Remarks
• RTNs learn to infer locally-varying geometric fields for semantic correspondence in an end-to-end, weakly-supervised fashion
• The key idea is to utilize and iteratively refine the transformations and convolutional activations through matching between the image pair
• A technique is presented for the weakly-supervised training of RTNs
Thank you! See you at 210 & 230 AB #119 Seungryong Kim, Ph.D. Digital Image Media Lab. Yonsei University, Seoul, Korea Tel: +82-2-2123-2879 E-mail: srkim89@yonsei.ac.kr Homepage: http://diml.yonsei.ac.kr/~srkim/