1. Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. Hongliang Yan, 2017/06/21.

2. Domain Adaptation (DA)
Problem: the training (source) and test (target) sets are related but under different distributions.
Methodology:
• Learn a feature space that combines discriminativeness and domain invariance: minimize source error + domain discrepancy.
Figure 1. Illustration of dataset bias.
[1] https://cs.stanford.edu/~jhoffman/domainadapt/

3. Maximum Mean Discrepancy (MMD)
• Represents the distance between distributions as the distance between mean embeddings of features:
$\mathrm{MMD}^2(\mathcal{X}_s, \mathcal{X}_t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \left\| \mathbb{E}_{x^s \sim \mathcal{X}_s}[\phi(x^s)] - \mathbb{E}_{x^t \sim \mathcal{X}_t}[\phi(x^t)] \right\|^2$
• An empirical estimate:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$
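For concreteness, here is a minimal numpy sketch of the empirical estimate (our illustration, not the paper's code): it assumes samples already embedded by an explicit feature map $\phi$, whereas in practice MMD is usually evaluated with the kernel trick.

```python
import numpy as np

def empirical_mmd(phi_s, phi_t):
    """Empirical squared MMD between two sets of embedded samples.

    phi_s: (M, d) array of source embeddings phi(x_i^s)
    phi_t: (N, d) array of target embeddings phi(x_j^t)
    Returns || (1/M) sum_i phi(x_i^s) - (1/N) sum_j phi(x_j^t) ||^2.
    """
    diff = phi_s.mean(axis=0) - phi_t.mean(axis=0)
    return float(diff @ diff)

# Toy check: same distribution -> near zero; shifted target -> clearly positive.
rng = np.random.default_rng(0)
phi_s = rng.normal(size=(500, 16))
phi_t = rng.normal(size=(400, 16))
print(empirical_mmd(phi_s, phi_t))        # ~0 (sampling noise only)
print(empirical_mmd(phi_s, phi_t + 1.0))  # large: distributions differ
```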

4. Motivation
• Class weight bias across domains remains unsolved but ubiquitous. The empirical MMD
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$
can be rewritten in terms of class-conditional mean embeddings:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \sum_{c=1}^{C} w_c^s \, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2, \quad w_c^s = \frac{M_c}{M}, \; w_c^t = \frac{N_c}{N}$
so any mismatch between the class weights $w_c^s$ and $w_c^t$ contributes to the MMD even when the class-conditional distributions are aligned.
• The effect of class weight bias should be removed, because:
① it can stem from changes in sample selection criteria;
② applications are not concerned with the class prior distribution.
Figure 2. Class prior distributions of three digit recognition datasets.
• Consequently, MMD can be minimized either by learning a domain-invariant representation or by preserving the class weights of the source domain; a numerical check of the decomposition follows below.
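The class-weighted decomposition above is an exact algebraic identity, which the following toy numpy sketch verifies (all names and the synthetic data are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
C, d, M, N = 3, 8, 600, 500
mu = rng.normal(size=(C, d))                   # shared per-class embedding means
ys = rng.integers(0, C, size=M)                # source labels (roughly uniform prior)
yt = rng.choice(C, size=N, p=[0.6, 0.3, 0.1])  # target labels (different prior)
phi_s = mu[ys] + 0.1 * rng.normal(size=(M, d))
phi_t = mu[yt] + 0.1 * rng.normal(size=(N, d))

# Direct form: || (1/M) sum_i phi(x_i^s) - (1/N) sum_j phi(x_j^t) ||^2
direct = np.sum((phi_s.mean(0) - phi_t.mean(0)) ** 2)

# Class-weighted form: || sum_c w_c^s E_c[phi(x^s)] - sum_c w_c^t E_c[phi(x^t)] ||^2
ws = np.bincount(ys, minlength=C) / M          # w_c^s = M_c / M
wt = np.bincount(yt, minlength=C) / N          # w_c^t = N_c / N
Es = np.stack([phi_s[ys == c].mean(0) for c in range(C)])
Et = np.stack([phi_t[yt == c].mean(0) for c in range(C)])
decomposed = np.sum((ws @ Es - wt @ Et) ** 2)

print(np.isclose(direct, decomposed))          # True: the two forms agree
# Even with identical class-conditional means mu, the differing priors
# (ws vs. wt) keep the MMD well above zero: this is the class weight bias.
```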

5. Weighted MMD
Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.
• Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain, which turns the MMD into the weighted MMD:
$\mathrm{MMD}_w^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \alpha_{y_i^s} \, \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$
Both sums are now weighted by the target class weights $w_c^t$, so the class weight bias no longer contributes to the discrepancy; a sketch of this computation follows below.
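A numpy sketch of the weighted estimate (`weighted_mmd` and `class_weights` are our hypothetical helpers; the true target weights used in the usage note are, in the actual method, estimated from pseudo-labels as described later):

```python
import numpy as np

def class_weights(labels, C):
    """Empirical class priors: w_c = (# samples of class c) / (# samples)."""
    return np.bincount(labels, minlength=C) / len(labels)

def weighted_mmd(phi_s, ys, phi_t, alpha):
    """Weighted empirical MMD^2 with per-class source weights alpha_c.

    Each source sample is reweighted by alpha_{y_i^s}, so the source class
    weights are effectively replaced by the target ones (alpha_c w_c^s = w_c^t).
    """
    src_mean = (alpha[ys][:, None] * phi_s).mean(axis=0)  # (1/M) sum_i alpha_{y_i} phi(x_i^s)
    diff = src_mean - phi_t.mean(axis=0)
    return float(diff @ diff)

# Usage (given embeddings phi_s/phi_t and labels ys/yt as in the previous sketch):
#   alpha = class_weights(yt, C) / class_weights(ys, C)  # alpha_c = w_c^t / w_c^s
#   weighted_mmd(phi_s, ys, phi_t, alpha)                # ~0: class weight bias removed
```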

6. Weighted DAN (WDAN)
1. Replace the MMD term in DAN [4] with the weighted MMD term:
$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}^2(D_s^l, D_t^l)$
$\Rightarrow \min_{W, \alpha} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
2. To further exploit the unlabeled data in the target domain, the empirical risk on the target domain is added, as in the semi-supervised model of [5]; the full objective is sketched in code below:
$\min_{W, \alpha, \{\hat y_j^t\}_{j=1}^{N}} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
[4] Long, M., Cao, Y., Wang, J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.
[5] Amini, M.-R., Gallinari, P. Semi-supervised Logistic Regression. Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
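A single-layer PyTorch sketch of this objective (a sketch under assumptions: `wdan_loss` and its interface are ours, a linear kernel stands in for the multi-kernel MMD used by DAN, and only one adapted layer is shown instead of the sum over layers $l_1, \dots, l_L$):

```python
import torch
import torch.nn.functional as F

def wdan_loss(logits_s, ys, logits_t, yt_hat, feats_s, feats_t, alpha, lam=1.0):
    """Source risk + target risk on pseudo-labels + weighted MMD (one layer)."""
    risk_s = F.cross_entropy(logits_s, ys)        # (1/M) sum_i l(x_i^s, y_i^s; W)
    risk_t = F.cross_entropy(logits_t, yt_hat)    # (1/N) sum_j l(x_j^t, y_hat_j^t; W)
    # Linear-kernel weighted MMD^2 between the layer activations.
    src_mean = (alpha[ys].unsqueeze(1) * feats_s).mean(0)
    mmd_w = ((src_mean - feats_t.mean(0)) ** 2).sum()
    return risk_s + risk_t + lam * mmd_w
```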

7. Optimization: an extension of CEM [6]
The parameters to be estimated comprise three parts: $W$, $\alpha$, and $\{\hat y_j^t\}_{j=1}^{N}$. The model is optimized by alternating between three steps (a skeleton of the loop follows below):
• E-step: with $W$ fixed, estimate the class posterior probability of each target sample: $p(y_j^t = c \mid x_j^t) = g_c(x_j^t; W)$.
• C-step:
① Assign pseudo-labels on the target domain: $\hat y_j^t = \arg\max_c p(y_j^t = c \mid x_j^t)$;
② Update the auxiliary class-specific weights for the source domain: $\alpha_c = \hat w_c^t / w_c^s$, where $\hat w_c^t = \frac{1}{N} \sum_{j=1}^{N} 1_c(\hat y_j^t)$ and $1_c(x)$ is an indicator function that equals 1 if $x = c$ and 0 otherwise.
• M-step: with $\alpha$ and $\{\hat y_j^t\}_{j=1}^{N}$ fixed, update $W$. The problem is reformulated as:
$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
The gradients of the three terms are all computable, so $W$ can be optimized with mini-batch SGD.
[6] Celeux, Gilles, and Gérard Govaert. A Classification EM Algorithm for Clustering and Two Stochastic Versions. Computational Statistics & Data Analysis 14.3 (1992): 315-332.
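The whole alternation can be sketched as the loop below (schematic only: `model` is a hypothetical object exposing `predict_proba` and `train_step`, neither of which comes from the paper):

```python
import numpy as np

def fit_wdan(model, Xs, ys, Xt, C, rounds=10):
    """CEM-style alternating optimization of W, alpha, and the pseudo-labels."""
    ws = np.bincount(ys, minlength=C) / len(ys)     # source class weights w_c^s
    for _ in range(rounds):
        # E-step: class posteriors p(y_j^t = c | x_j^t) under the current W.
        p = model.predict_proba(Xt)                 # (N, C)
        # C-step 1: pseudo-labels y_hat_j^t = argmax_c p(y_j^t = c | x_j^t).
        yt_hat = p.argmax(axis=1)
        # C-step 2: alpha_c = w_hat_c^t / w_c^s from pseudo-label frequencies.
        wt_hat = np.bincount(yt_hat, minlength=C) / len(yt_hat)
        alpha = wt_hat / np.maximum(ws, 1e-12)
        # M-step: update W by mini-batch SGD on the full WDAN objective.
        model.train_step(Xs, ys, Xt, yt_hat, alpha)
    return model
```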

8. Experimental results
• Comparison with state-of-the-art methods.
Table 1. Experimental results on Office-10 + Caltech-10.

9. Experimental results
• Empirical analysis.
Figure 3. Performance of various models under different class weight bias.
Figure 4. Visualization of the learned features of DAN and weighted DAN.

10. Summary
• Introduce class-specific weights into MMD to reduce the effect of class weight bias across domains.
• Develop the WDAN model and optimize it within a CEM framework.
• Weighted MMD can be applied to other scenarios where MMD is used to measure distribution distance, e.g., image generation.
