Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation



  1. Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation. Xiang Jiang 1,2, Qicheng Lao 1,4, Stan Matwin 1,3, Mohammad Havaei 1 (1 Imagia, 2 Dalhousie University, 3 Polish Academy of Sciences, 4 Mila, Université de Montréal). June 13, 2020.

  2. Introduction: Unsupervised Domain Adaptation (UDA). The setup of UDA involves an observed variable $X$, a labeling function $f$ with labels $Y = f(X)$, and a domain variable $D$. Given a labeled source domain $\mathcal{D}_S = \{(x_i, f_S(x_i))\}_{i=1}^{n}$ and an unlabeled target domain $\mathcal{D}_T = \{x_j\}_{j=1}^{m}$, where the labeling functions agree ($f_S = f_T$), the goal is to learn $p(y \mid x)$. [Figure: medical-imaging example in which the image is the observed variable, the disease to predict is the label, and the scanner is the domain variable.]
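
A minimal sketch of this data setup, with made-up shapes and names for illustration only: the source domain carries labels, the target domain does not, and the target marginal is shifted.

```python
# Minimal sketch of the UDA data setup; all names and shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, num_classes, dim = 1000, 800, 10, 64

# Labeled source domain: D_S = {(x_i, f_S(x_i))}, i = 1..n
X_source = rng.normal(size=(n, dim))
y_source = rng.integers(0, num_classes, size=n)   # labels produced by f_S

# Unlabeled target domain: D_T = {x_j}, j = 1..m; f_T = f_S but is never observed.
X_target = rng.normal(loc=0.5, size=(m, dim))     # shifted marginal p_T(x)

# Goal: learn p(y | x) that predicts well on X_target using only y_source supervision.
```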

  3. Related Work. Adversarial domain-discriminator based approaches [Ganin et al., 2016]:
$$\min_{\theta}\; \mathcal{L}(\mathcal{D}_S) + \lambda\, \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (1)$$
$$\max_{f}\; \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (2)$$
Limitation: aligning the marginals does not align the class-conditionals, i.e., $p_S(x) = p_T(x) \nRightarrow p_S(x \mid y) = p_T(x \mid y)$.
Prototype-based class-conditioned explicit alignment [Luo et al., 2017, Xie et al., 2018]:
$$\min_{\theta}\; \mathcal{L}(\mathcal{D}_S) + \lambda_1\, \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) + \lambda_2\, \mathcal{L}_{\text{explicit}} \quad (3)$$
$$\max_{f}\; \mathrm{dis}(\mathcal{D}_S, \mathcal{D}_T) \quad (4)$$
where
$$\mathcal{L}_{\text{explicit}} = \mathbb{E}\big[\,\lVert c^j_S - c^j_T \rVert\,\big], \quad (5)$$
$$c^j_S = \frac{1}{N_j} \sum_{(x_i, y_i) \in \mathcal{D}_S} \mathbb{1}\{y_i = j\}\, f_{\phi}(x_i). \quad (6)$$
Limitation: error accumulation from explicitly optimizing on pseudo-labels.
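
A minimal PyTorch sketch of the explicit prototype-alignment loss of Eqs. (5)-(6); the function names and tensor arguments are illustrative, not the cited authors' code.

```python
# Sketch of explicit prototype alignment (Eqs. 5-6), assuming PyTorch feature
# tensors `feat_s`, `feat_t` with source labels `y_s` and target pseudo-labels
# `y_t_hat`. Empty classes keep a zero centroid, a simplification.
import torch

def class_centroids(features, labels, num_classes):
    """c^j = mean of the features whose label equals j (Eq. 6)."""
    centroids = torch.zeros(num_classes, features.size(1), device=features.device)
    for j in range(num_classes):
        mask = labels == j
        if mask.any():
            centroids[j] = features[mask].mean(dim=0)
    return centroids

def explicit_alignment_loss(feat_s, y_s, feat_t, y_t_hat, num_classes):
    c_s = class_centroids(feat_s, y_s, num_classes)
    c_t = class_centroids(feat_t, y_t_hat, num_classes)  # built from pseudo-labels:
    # labeling errors feed straight into this loss and its gradient, which is the
    # error-accumulation limitation noted on the slide.
    return (c_s - c_t).norm(dim=1).mean()                # Eq. (5)
```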

  4. Motivations. Two motivations for the proposed approach: an applied motivation and a theoretical motivation.

  5. Applied Motivation. Challenges for applying UDA in real-world applications [Tan et al., 2019]: within-domain class imbalance; between-domain class-distribution shift, a.k.a. prior probability shift. [Figure: illustration of within-domain class imbalance and between-domain class-distribution shift.]

  6. Theoretical Motivation: Empirical Domain Divergence.
Definition ([Ben-David et al., 2010]). The $\mathcal{H}\Delta\mathcal{H}$ divergence between two domains is
$$d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) = 2 \sup_{h, h' \in \mathcal{H}} \big|\, \mathbb{E}_{\mathcal{D}_T}[h \neq h'] - \mathbb{E}_{\mathcal{D}_S}[h \neq h'] \,\big|. \quad (7)$$
Definition (mini-batch based empirical domain discrepancy). Let $\mathcal{B}_S \subseteq \mathcal{U}_S$ and $\mathcal{B}_T \subseteq \mathcal{U}_T$ be minibatches from $\mathcal{U}_S$ and $\mathcal{U}_T$, respectively, with $|\mathcal{B}_S| = |\mathcal{B}_T|$. The empirical estimation of $d_{\mathcal{H}\Delta\mathcal{H}}$ over the minibatches $\mathcal{B}_S, \mathcal{B}_T$ is defined as
$$\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{B}_S, \mathcal{B}_T) = \sup_{h, h' \in \mathcal{H}} \Big|\, \hat{\mathbb{E}}_{\mathcal{B}_T}[h \neq h'] - \hat{\mathbb{E}}_{\mathcal{B}_S}[h \neq h'] \,\Big|. \quad (8)$$
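
The inner term of Eq. (8) is easy to compute for one fixed hypothesis pair; the supremum over $\mathcal{H} \times \mathcal{H}$ is what adversarial training approximates with a discriminator. A sketch for a single pair, with illustrative tensor names:

```python
# Empirical disagreement gap of Eq. (8) for one fixed hypothesis pair (h, h').
# Each `*_logits_*` tensor holds that hypothesis's logits on one minibatch.
import torch

def empirical_disagreement_gap(h_logits_s, hp_logits_s, h_logits_t, hp_logits_t):
    """| E_hat_{B_T}[h != h'] - E_hat_{B_S}[h != h'] | for one (h, h') pair."""
    dis_s = (h_logits_s.argmax(1) != hp_logits_s.argmax(1)).float().mean()
    dis_t = (h_logits_t.argmax(1) != hp_logits_t.argmax(1)).float().mean()
    return (dis_t - dis_s).abs()
```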

  7. Theoretical Motivation: The Decomposition.
Theorem (the decomposition of $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{B}_S, \mathcal{B}_T)$). Define three disjoint sets on the label space: $\mathcal{Y}^C := \mathcal{Y}_S \cap \mathcal{Y}_T$, $\mathcal{Y}^{\bar{C}}_S := \mathcal{Y}_S - \mathcal{Y}^C$, and $\mathcal{Y}^{\bar{C}}_T := \mathcal{Y}_T - \mathcal{Y}^C$. Define the corresponding disjoint sets on the input space: $\mathcal{B}^C_S := \{x \in \mathcal{B}_S \mid y \in \mathcal{Y}^C\}$, $\mathcal{B}^{\bar{C}}_S := \{x \in \mathcal{B}_S \mid y \notin \mathcal{Y}^C\}$, $\mathcal{B}^C_T := \{x \in \mathcal{B}_T \mid y \in \mathcal{Y}^C\}$, and $\mathcal{B}^{\bar{C}}_T := \{x \in \mathcal{B}_T \mid y \notin \mathcal{Y}^C\}$. The empirical divergence $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{B}_S, \mathcal{B}_T)$ can be decomposed as
$$\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{B}_S, \mathcal{B}_T) = \sup_{h, h' \in \mathcal{H}} \big|\, \xi_C(h, h') + \xi_{\bar{C}}(h, h') \,\big|, \quad (9)$$
where
$$\xi_C(h, h') = \frac{1}{|\mathcal{B}_T|} \sum_{x \in \mathcal{B}^C_T} \mathbb{1}[h \neq h'] - \frac{1}{|\mathcal{B}_S|} \sum_{x \in \mathcal{B}^C_S} \mathbb{1}[h \neq h'], \quad (10)$$
$$\xi_{\bar{C}}(h, h') = \frac{1}{|\mathcal{B}_T|} \sum_{x \in \mathcal{B}^{\bar{C}}_T} \mathbb{1}[h \neq h'] - \frac{1}{|\mathcal{B}_S|} \sum_{x \in \mathcal{B}^{\bar{C}}_S} \mathbb{1}[h \neq h'], \quad (11)$$
i.e., a class-aligned term $\xi_C$ over the shared classes and a class-misaligned term $\xi_{\bar{C}}$ over the domain-specific classes.
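
A sketch of this decomposition for a fixed $(h, h')$ pair: split each batch by shared versus domain-specific classes and normalize by the full batch sizes, so the two terms add up to the gap of Eq. (8). The set `shared` stands for $\mathcal{Y}^C$; names are illustrative.

```python
# Decomposition of Eqs. (9)-(11) for one fixed (h, h') pair. `dis_s`, `dis_t`
# are boolean tensors 1[h(x) != h'(x)] over each batch; `y_s`, `y_t` are labels.
import torch

def xi_terms(dis_s, y_s, dis_t, y_t, shared):
    in_c_s = torch.tensor([int(y) in shared for y in y_s])
    in_c_t = torch.tensor([int(y) in shared for y in y_t])
    n_s, n_t = len(y_s), len(y_t)  # full batch sizes, so xi_c + xi_c_bar
    # exactly recovers the empirical gap inside Eq. (8).
    xi_c = dis_t[in_c_t].float().sum() / n_t - dis_s[in_c_s].float().sum() / n_s
    xi_c_bar = dis_t[~in_c_t].float().sum() / n_t - dis_s[~in_c_s].float().sum() / n_s
    return xi_c, xi_c_bar
```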

  8. Theoretical Motivation: Domain-Discriminator Shortcut.
[Figure: with label space {3, 4, 5, 6}, a class-misaligned pair such as (3, 6) lets the domain discriminator predict the (source, target) labels from the class labels alone (the shortcut), whereas a class-aligned pair such as (4, 4) forces it toward the actual goal of domain alignment.]
Remark (the domain discriminator shortcut). Let $f_c$ be a classifier that maps $x$ to a class label $y_c$, and let $f_d$ be a domain discriminator that maps $x$ to a binary domain label $y_d$. For the empirical class-misaligned divergence $\xi_{\bar{C}}(h, h')$ with samples $x \in \mathcal{B}^{\bar{C}}_S \cup \mathcal{B}^{\bar{C}}_T$, there exists a domain-discriminator shortcut function
$$f_d(x) = \begin{cases} 1 & f_c(x) \in \mathcal{Y}^{\bar{C}}_S \\ 0 & f_c(x) \in \mathcal{Y}^{\bar{C}}_T, \end{cases} \quad (12)$$
such that the domain label can be solely determined by the domain-specific class labels. The shortcut is more pronounced under class imbalance and class-distribution shift.
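
A toy illustration of the shortcut of Eq. (12); the class sets are made up to mirror the slide's figure.

```python
# Shortcut of Eq. (12): when minibatches contain domain-specific classes, class
# identity alone predicts the domain label, so the discriminator never needs to
# compare feature distributions. Class sets are illustrative.
Y_S_ONLY = {3}   # classes appearing only in the source minibatch
Y_T_ONLY = {6}   # classes appearing only in the target minibatch

def shortcut_discriminator(predicted_class):
    """'Perfect' domain discriminator that never looks at domain style."""
    if predicted_class in Y_S_ONLY:
        return 1     # source
    if predicted_class in Y_T_ONLY:
        return 0     # target
    return None      # shared classes: the shortcut gives no answer

# On class-misaligned samples the discriminator loss can be driven to zero via
# this shortcut, without ever aligning class-conditional representations.
```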

  9. Proposed Approach.
[Figure: (a) pseudo-labels on the target data, (b) class-conditioned sampling, (c) implicit alignment of the two domains, (d) domain-invariant representations fed to the classifier.]
For $p_S(x)$, we sample $x \sim p_S(x \mid y)\, p(y)$ based on the alignment distribution $p(y)$. For $p_T(x)$, we sample a class-aligned minibatch $x \sim p_T(x \mid \hat{y})\, p(y)$ using the identical $p(y)$, with the help of pseudo-labels $\hat{y}_T$.
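
A minimal sketch of such a class-conditioned sampler, assuming a uniform alignment distribution $p(y)$ over a given class list; all names are illustrative, not the authors' code.

```python
# Sketch of the implicit-alignment sampler: draw the SAME classes for both
# domains, indexing the source by ground-truth labels and the target by
# pseudo-labels, so each minibatch is class-aligned by construction.
import random
from collections import defaultdict

def index_by_class(labels):
    idx = defaultdict(list)
    for i, y in enumerate(labels):
        idx[int(y)].append(i)
    return idx

def aligned_minibatch(src_idx, tgt_idx, class_list, n_classes, k):
    """Sample n_classes classes from a uniform p(y), then k examples per class per domain."""
    classes = random.sample(class_list, n_classes)
    batch_s, batch_t = [], []
    for c in classes:
        if not src_idx[c] or not tgt_idx[c]:
            continue  # skip classes the pseudo-labeler never predicted
        batch_s += random.choices(src_idx[c], k=k)  # x ~ p_S(x | y = c)
        batch_t += random.choices(tgt_idx[c], k=k)  # x ~ p_T(x | y_hat = c)
    return batch_s, batch_t
```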

  10. Proposed Approach: Algorithm.
Input: dataset $S = \{(x_i, y_i)\}_{i=1}^{N}$, $T = \{x_i\}_{i=1}^{M}$, label space $\mathcal{Y}$, label alignment distribution $p(y)$, classifier $f_c(\,\cdot\,; \theta)$.
while not converged do
    # predict pseudo-labels for T
    $\hat{T} \leftarrow \{(x_i, \hat{y}_i)\}_{i=1}^{M}$ where $x_i \in T$ and $\hat{y}_i = f_c(x_i; \theta)$
    # sample N unique classes in the label space
    $\mathcal{Y}' \leftarrow$ draw $N$ samples in $\mathcal{Y}$ from $p(y)$
    # sample K examples conditioned on each $y_j \in \mathcal{Y}'$
    for $y_j$ in $\mathcal{Y}'$ do
        $(X'_S, Y'_S) \leftarrow$ draw $K$ samples in $S$ from $p_S(x \mid y = y_j)$
        $X'_T \leftarrow$ draw $K$ samples in $\hat{T}$ from $p_T(x \mid \hat{y} = y_j)$
    end for
    # domain adaptation training on this minibatch
    train on minibatch $(X'_S, Y'_S, X'_T)$
end while
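
A sketch of this loop in PyTorch, reusing `index_by_class` and `aligned_minibatch` from the sampler sketch above; `adapt_step` is a hypothetical stand-in for any discrepancy-based UDA update (e.g., DANN or MDD) applied to the minibatch.

```python
# Sketch of the training loop: pseudo-labels are refreshed each iteration and
# used only to SELECT target examples; no loss is computed on them directly,
# which is what makes the alignment implicit.
import torch

@torch.no_grad()
def pseudo_label(classifier, X_t):
    return classifier(X_t).argmax(dim=1).tolist()

def train(classifier, adapt_step, X_s, y_s, X_t, class_list,
          n_classes=16, k=4, iters=1000):
    src_idx = index_by_class(y_s)
    for _ in range(iters):
        tgt_idx = index_by_class(pseudo_label(classifier, X_t))
        bs, bt = aligned_minibatch(src_idx, tgt_idx, class_list, n_classes, k)
        adapt_step(X_s[bs], y_s[bs], X_t[bt])  # class-aligned minibatch
```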

  11. Advantages of the Proposed Approach.
1. Minimizes the class-misaligned divergence $\xi_{\bar{C}}(h, h')$, providing a more reliable empirical estimation of domain divergence.
2. Provides balanced training across all classes.
3. Removes the need to optimize model parameters from pseudo-labels explicitly.
4. Simple to implement and orthogonal to different domain discrepancy measures (e.g., DANN and MDD).
