
Transferability vs. Discriminability: Batch Spectral Penalization - PowerPoint PPT Presentation

Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
Xinyang Chen, Sinan Wang, Mingsheng Long, Jianmin Wang
School of Software, BNRist, Research Center for Big Data, Tsinghua University


1. Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
Xinyang Chen, Sinan Wang, Mingsheng Long, Jianmin Wang
School of Software, BNRist, Research Center for Big Data, Tsinghua University
International Conference on Machine Learning, 2019
X. Chen et al. (Tsinghua Univ.) | BSP: Batch Spectral Penalization | June 12, 2019 | slide 1 / 9

2. Motivation: Transfer Learning and Unsupervised Domain Adaptation
Non-IID distributions: P ≠ Q, i.e. P(x, y) ≠ Q(x, y).
Only unlabeled data are available in the target domain.
Example: a source domain of 2D renderings and a target domain of real images.
The same model f : x → y must work on both domains despite the shift in representation.

3. Motivation: Adversarial Domain Adaptation
Goal: match distributions across source and target domains such that P ≈ Q.
Adversarial adaptation: learn features indistinguishable across domains.

    min_{F,G} E(F, G) + γ · dist_{P↔Q}(F, D)        (1)
    max_{D} dist_{P↔Q}(F, D)

    E(F, G) = E_{(x_i^s, y_i^s) ~ P} L(G(F(x_i^s)), y_i^s)
    dist_{P↔Q}(F, D) = E_{x_i^s ~ P} log[D(f_i^s)] + E_{x_i^t ~ Q} log[1 − D(f_i^t)]        (2)

We analyze the features extracted by DANN [1] with a ResNet-50 [2] pretrained on ImageNet.

[1] Ganin et al. Unsupervised domain adaptation by backpropagation. ICML '15.
[2] He et al. Deep residual learning for image recognition. CVPR '16.
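The discriminator term dist_{P↔Q}(F, D) of Eq. (2) can be sketched numerically. This is a minimal NumPy illustration, not the authors' code: `d_src` and `d_tgt` are hypothetical stand-ins for the discriminator outputs D(f^s) and D(f^t) on one batch.

```python
import math
import numpy as np

def domain_adversarial_loss(d_src, d_tgt):
    """Discriminator term dist_{P<->Q}(F, D) of Eq. (2):
    E_{x^s ~ P} log D(f^s) + E_{x^t ~ Q} log(1 - D(f^t)),
    where d_src / d_tgt hold D's probability that source / target
    features come from the source domain."""
    eps = 1e-12  # numerical guard against log(0)
    return np.mean(np.log(d_src + eps)) + np.mean(np.log(1.0 - d_tgt + eps))

# A perfectly confused discriminator (D = 0.5 everywhere) yields 2*log(0.5):
d = np.full(8, 0.5)
loss = domain_adversarial_loss(d, d)
print(loss)  # ~ -1.386 = 2 * log(0.5)
```

In the minimax game of Eq. (1), D maximizes this quantity while the feature extractor F minimizes it (in DANN, via a gradient reversal layer), driving the two feature distributions together.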

4. Motivation: Discriminability of Feature Representations
Two key criteria characterize the goodness of feature representations:
Transferability: the ability of features to bridge the discrepancy across domains.
Discriminability: the ease of separating different categories with a supervised classifier trained on the feature representations.
Analyzing the features extracted by DANN, worse discriminability is found:
[Figure: analysis of feature discriminability. (a) LDA criterion max J(W) for ResNet-50 vs. DANN on A→W, W→A, A→D, and D→A; (b) classification error rate of an MLP on source, target, and average. DANN scores worse than ResNet-50 on both measures.]
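The LDA criterion max J(W) of panel (a) can be sketched as the classical Fisher ratio: the largest eigenvalue of S_w⁻¹ S_b for between-class scatter S_b and within-class scatter S_w. This is an illustrative NumPy sketch under that standard definition, not the authors' exact measurement code; the function name and ridge constant are assumptions.

```python
import numpy as np

def lda_criterion(X, y):
    """Fisher/LDA criterion: largest eigenvalue of S_w^{-1} S_b,
    with S_b / S_w the between- / within-class scatter matrices.
    Larger values mean more linearly separable (more discriminable)
    features."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)       # between-class scatter
        S_w += (Xc - mc).T @ (Xc - mc)         # within-class scatter
    # Small ridge keeps S_w invertible for degenerate batches.
    eigvals = np.linalg.eigvals(np.linalg.solve(S_w + 1e-6 * np.eye(d), S_b))
    return float(np.max(eigvals.real))

# Well-separated classes score far higher than overlapping ones:
rng = np.random.default_rng(0)
a, b = rng.standard_normal((20, 3)), rng.standard_normal((20, 3))
y = np.array([0] * 20 + [1] * 20)
j_far = lda_criterion(np.vstack([a, b + 10.0]), y)
j_near = lda_criterion(np.vstack([a, b + 0.1]), y)
```

Under this measure, a drop in max J(W) from ResNet-50 features to DANN features is exactly the loss of discriminability the slide reports.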

5. Method: Why Is Discriminability Weakened?
Corresponding angles: a corresponding angle is the angle between the two singular vectors that share the same singular-value index in the source and target feature matrices; such pairs are equally important within their respective feature matrices.
[Figure: SVD analysis of ResNet and DANN features (source and target). (a) singular values σ (normalized); (b) corresponding angles cos(ψ) (unnormalized); (c) corresponding angles cos(ψ) (normalized). In the normalized versions, all values are scaled so that the largest value is 1.]
Only the singular vectors with the largest singular values tend to carry transferable knowledge.
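One plausible way to reproduce this SVD analysis, sketched in NumPy: the feature matrices below are synthetic stand-ins (sizes and contents are assumptions, not the paper's data), and the corresponding angles are read from the right singular vectors at matching indices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for source/target batch feature matrices
# (batch_size x feature_dim) produced by the extractor F.
F_s = rng.standard_normal((64, 32))
F_t = rng.standard_normal((64, 32))

# Batch SVD: singular values (descending) and right singular vectors.
_, sigma_s, Vt_s = np.linalg.svd(F_s, full_matrices=False)
_, sigma_t, Vt_t = np.linalg.svd(F_t, full_matrices=False)

# Panel (a): singular values, normalized so the largest is 1.
sigma_s_norm = sigma_s / sigma_s[0]

# Panels (b)/(c): cos(psi_i), cosine of the angle between the source
# and target singular vectors sharing the same index i.
cos_psi = np.abs(np.sum(Vt_s * Vt_t, axis=1))
```

Plotting `cos_psi` against the index reproduces the qualitative reading of the figure: only the first few indices (largest singular values) stay well aligned across domains.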

6. Method: BSP, Batch Spectral Penalization
[Architecture: feature extractor F feeds the classifier G (cross-entropy loss L) and, through a gradient reversal layer (GRL), the domain discriminator D (binary cross-entropy); a batch SVD of the source and target features f yields the penalty L_bsp.]
BSP is combined with DANN to strengthen the discriminability of features:

    min_{F,G} E(F, G) + γ · dist_{P↔Q}(F, D) + β · L_bsp(F)        (3)
    max_{D} dist_{P↔Q}(F, D)

    L_bsp(F) = Σ_{i=1}^{k} ( σ_{s,i}² + σ_{t,i}² )        (4)

where σ_{s,i} and σ_{t,i} are the i-th largest singular values of the source and target batch feature matrices.
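Eq. (4) itself is straightforward to compute from a batch SVD. A minimal NumPy sketch follows; the function name and the k = 1 default are illustrative (the slide leaves k unspecified, and k = 1 penalizes only the dominant spectral component).

```python
import numpy as np

def bsp_penalty(F_s, F_t, k=1):
    """Batch Spectral Penalization of Eq. (4):
    L_bsp = sum_{i=1..k} (sigma_{s,i}^2 + sigma_{t,i}^2),
    the squared top-k singular values of the source and target
    batch feature matrices."""
    sigma_s = np.linalg.svd(F_s, compute_uv=False)  # sorted descending
    sigma_t = np.linalg.svd(F_t, compute_uv=False)
    return float(np.sum(sigma_s[:k] ** 2) + np.sum(sigma_t[:k] ** 2))

# Diagonal toy matrices make the singular values explicit:
# top singular values are 3 and 2, so L_bsp = 3^2 + 2^2 = 13 for k = 1.
penalty = bsp_penalty(np.diag([3.0, 1.0]), np.diag([2.0, 1.0]), k=1)
```

In training, this term is added with weight β as in Eq. (3) and must flow gradients back through the SVD, so a framework with a differentiable SVD (which plain NumPy does not provide) is required.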

7. Experiments: Results
Table: Accuracy (%) on Office-31 for unsupervised domain adaptation

Method                A→W          D→W          W→D           A→D          D→A          W→A          Avg
ResNet-50             68.4 ± 0.2   96.7 ± 0.1   99.3 ± 0.1    68.9 ± 0.2   62.5 ± 0.3   60.7 ± 0.3   76.1
DAN                   80.5 ± 0.4   97.1 ± 0.2   99.6 ± 0.1    78.6 ± 0.2   63.6 ± 0.3   62.8 ± 0.2   80.4
DANN                  82.0 ± 0.4   96.9 ± 0.2   99.1 ± 0.1    79.7 ± 0.4   68.2 ± 0.4   67.4 ± 0.5   82.2
JAN                   85.4 ± 0.3   97.4 ± 0.2   99.8 ± 0.2    84.7 ± 0.3   68.6 ± 0.3   70.0 ± 0.4   84.3
GTA                   89.5 ± 0.5   97.9 ± 0.3   99.8 ± 0.4    87.7 ± 0.5   72.8 ± 0.3   71.4 ± 0.4   86.5
CDAN                  93.1 ± 0.2   98.2 ± 0.2   100.0 ± 0.0   89.8 ± 0.3   70.1 ± 0.4   68.0 ± 0.4   86.6
CDAN+E                94.1 ± 0.1   98.6 ± 0.1   100.0 ± 0.0   92.9 ± 0.2   71.0 ± 0.3   69.3 ± 0.3   87.7
BSP+DANN (proposed)   93.0 ± 0.2   98.0 ± 0.2   100.0 ± 0.0   90.0 ± 0.4   71.9 ± 0.3   73.0 ± 0.3   87.7
BSP+CDAN (proposed)   93.3 ± 0.2   98.2 ± 0.2   100.0 ± 0.0   93.0 ± 0.2   73.6 ± 0.3   72.6 ± 0.3   88.5

8. Experiments: Analysis
[Figure: SVD analysis of ResNet, DANN, and BSP+DANN features (source and target). (a) singular values σ; (b) corresponding angles cos(ψ), unnormalized; (c) corresponding angles cos(ψ), normalized. (d) Classification error rate on source, target, and average for ResNet-50, DANN, and BSP+DANN; (e) A-distance on A→W and D→W.]

9. Summary
Thanks! Poster: tonight at Pacific Ballroom #256.
