Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation
Hongliang Yan, 2017/06/21
Domain Adaptation (DA)
Problem: the training (source) and test (target) sets are related but follow different distributions.
Methodology:
• Learn a feature space that combines discriminativeness and domain invariance: minimize source error + domain discrepancy.
Figure 1. Illustration of dataset bias.
[1] https://cs.stanford.edu/~jhoffman/domainadapt/
Maximum Mean Discrepancy (MMD)
• Represents the distance between distributions as the distance between mean embeddings of features:
$\mathrm{MMD}^2(\mathcal{D}_s, \mathcal{D}_t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \left\| E_{x^s \sim \mathcal{D}_s}[\phi(x^s)] - E_{x^t \sim \mathcal{D}_t}[\phi(x^t)] \right\|_{\mathcal{H}}^2$
• An empirical estimate:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$
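As a toy sketch (not the paper's code), the empirical estimate can be computed as the squared distance between sample means of an explicit feature map; in practice MMD is usually computed with a kernel, but the mean-embedding form is enough to illustrate the idea. The Gaussian toy data below is an assumption for illustration:

```python
import numpy as np

def mmd_sq(phi_s, phi_t):
    # Empirical squared MMD with an explicit feature map:
    # || (1/M) sum_i phi(x_i^s) - (1/N) sum_j phi(x_j^t) ||^2,
    # where the rows of phi_s (M x d) and phi_t (N x d) are embeddings.
    diff = phi_s.mean(axis=0) - phi_t.mean(axis=0)
    return float(diff @ diff)

rng = np.random.default_rng(0)
# Identical distributions -> MMD near zero; a mean shift inflates it.
same = mmd_sq(rng.normal(0.0, 1.0, (2000, 4)), rng.normal(0.0, 1.0, (2000, 4)))
shifted = mmd_sq(rng.normal(0.0, 1.0, (2000, 4)), rng.normal(1.0, 1.0, (2000, 4)))
```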
Motivation
• Class weight bias across domains remains unsolved but ubiquitous.
The empirical MMD can be decomposed by class:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^s E_c^s[\phi(x^s)] - \sum_{c=1}^{C} w_c^t E_c^t[\phi(x^t)] \right\|_{\mathcal{H}}^2$
where the class weights are $w_c^s = M_c / M$ and $w_c^t = N_c / N$.
The effect of class weight bias should be removed because:
① the sample selection criteria may change across domains;
② many applications are not concerned with the class prior distribution.
Figure 2. Class prior distributions of three digit recognition datasets.
Consequently, MMD can be minimized either by learning a domain-invariant representation or by (undesirably) preserving the class weights of the source domain.
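The decomposition above implies that a class-prior mismatch alone produces a nonzero MMD even when the class-conditional features are identical across domains. A minimal numerical sketch, with assumed synthetic features:

```python
import numpy as np

# Toy sketch: the mean embedding equals the class-conditional means weighted
# by class priors, so biased priors alone yield MMD > 0 even when the
# class-conditional feature distributions match exactly.
rng = np.random.default_rng(1)
feat = {0: rng.normal(0.0, 1.0, (500, 3)), 1: rng.normal(2.0, 1.0, (500, 3))}

def mean_embedding(w0):
    # w0 is the prior of class 0; (1 - w0) is the prior of class 1
    return w0 * feat[0].mean(axis=0) + (1 - w0) * feat[1].mean(axis=0)

gap = mean_embedding(0.9) - mean_embedding(0.5)  # same features, biased priors
mmd_from_bias = float(gap @ gap)
```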
Weighted MMD
Main idea: reweight the classes in the source domain so that they share the class weights of the target domain.
• Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain:
$\mathrm{MMD}_w^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M} \alpha_{y_i^s}\,\phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^t E_c^s[\phi(x^s)] - \sum_{c=1}^{C} w_c^t E_c^t[\phi(x^t)] \right\|_{\mathcal{H}}^2$
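A hedged sketch of the reweighting, on assumed toy features: the source has an 80/20 class split and the target 50/50, with identical class-conditional features, so the plain MMD is large while the weighted MMD vanishes. Here the target class weights are handed in as oracle values; in the method they are unknown and estimated from pseudo labels.

```python
import numpy as np

def weighted_mmd_sq(phi_s, y_s, phi_t, w_t, w_s):
    # alpha_c = w_c^t / w_c^s reweights every source sample by its class,
    # so the effective source class weights match the target ones.
    alpha = w_t / w_s
    src_mean = (alpha[y_s][:, None] * phi_s).mean(axis=0)
    diff = src_mean - phi_t.mean(axis=0)
    return float(diff @ diff)

rng = np.random.default_rng(2)
feats = lambda c, n: rng.normal(2.0 * c, 0.1, (n, 3))   # tight class clusters
phi_s = np.vstack([feats(0, 800), feats(1, 200)])       # source: 80/20 priors
y_s = np.concatenate([np.zeros(800, int), np.ones(200, int)])
phi_t = np.vstack([feats(0, 500), feats(1, 500)])       # target: 50/50 priors

plain = float(np.square(phi_s.mean(axis=0) - phi_t.mean(axis=0)).sum())
weighted = weighted_mmd_sq(phi_s, y_s, phi_t, np.array([0.5, 0.5]), np.array([0.8, 0.2]))
```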
Weighted DAN
1. Replace the MMD term in DAN [4] with the weighted MMD:
$\min_{W} \frac{1}{M}\sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l=l_1}^{l_L} \mathrm{MMD}^2(D_s^l, D_t^l)$
$\Rightarrow \min_{W,\alpha} \frac{1}{M}\sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l=l_1}^{l_L} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
2. To further exploit the unlabeled data in the target domain, the empirical risk is extended to a semi-supervised model as in [5]:
$\min_{W,\alpha,\{\hat y_j^t\}} \frac{1}{M}\sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N}\sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l=l_1}^{l_L} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
[4] Long M., Cao Y., Wang J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.
[5] Amini M.-R., Gallinari P. Semi-supervised Logistic Regression. Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
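A minimal sketch of how the three objective terms combine, with a linear map `x @ W` standing in for the CNN, the weighted MMD applied to a single layer (the logits), and `lam` an illustrative trade-off weight; all of these are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def wdan_loss(W, x_s, y_s, x_t, y_t_pseudo, alpha, lam=1.0):
    # Source risk + pseudo-labeled target risk + weighted MMD on one layer.
    p_s, p_t = softmax(x_s @ W), softmax(x_t @ W)
    ce_s = -np.log(p_s[np.arange(len(y_s)), y_s]).mean()                # source CE
    ce_t = -np.log(p_t[np.arange(len(y_t_pseudo)), y_t_pseudo]).mean()  # pseudo-label CE
    f_s, f_t = x_s @ W, x_t @ W                                         # adapted-layer features
    diff = (alpha[y_s][:, None] * f_s).mean(axis=0) - f_t.mean(axis=0)
    return ce_s + ce_t + lam * float(diff @ diff)

rng = np.random.default_rng(4)
W = rng.normal(size=(3, 2))
x_s, x_t = rng.normal(size=(20, 3)), rng.normal(size=(20, 3))
y_s, y_t = rng.integers(0, 2, 20), rng.integers(0, 2, 20)
loss = wdan_loss(W, x_s, y_s, x_t, y_t, np.ones(2))
```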
Optimization: an Extension of CEM [6]
The parameters to be estimated consist of three parts: $W$, $\alpha$, and the pseudo labels $\{\hat y_j^t\}_{j=1}^{N}$.
The model is optimized by alternating among three steps:
• E-step: with $W$ fixed, estimate the class posterior probabilities of the target samples:
$p(y_j^t = c \mid x_j^t) = g_c(x_j^t; W)$
[6] Celeux G., Govaert G. A Classification EM Algorithm for Clustering and Two Stochastic Versions. Computational Statistics & Data Analysis 14.3 (1992): 315-332.
• C-step:
① Assign pseudo labels on the target domain: $\hat y_j^t = \arg\max_c p(y_j^t = c \mid x_j^t)$
② Update the auxiliary class-specific weights for the source domain:
$\alpha_c = \hat w_c^t / w_c^s$, where $\hat w_c^t = \frac{1}{N}\sum_{j=1}^{N} \mathbb{1}_c(\hat y_j^t)$
and $\mathbb{1}_c(x)$ is an indicator function that equals 1 if $x = c$ and 0 otherwise.
• M-step: with $\{\hat y_j^t\}_{j=1}^{N}$ and $\alpha$ fixed, update $W$. The problem is reformulated as:
$\min_{W} \frac{1}{M}\sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N}\sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l=l_1}^{l_L} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
The gradients of all three terms are computable, so $W$ can be optimized with mini-batch SGD.
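The E- and C-steps above can be sketched in a few lines; the linear classifier, the 2-class setup, and the uniform source class weights below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))          # stand-in classifier parameters (fixed)
x_t = rng.normal(size=(100, 3))      # unlabeled target features
w_s = np.array([0.5, 0.5])           # assumed source class weights

post = softmax(x_t @ W)              # E-step: p(y_j^t = c | x_j^t) = g_c(x_j^t; W)
y_hat = post.argmax(axis=1)          # C-step (1): pseudo labels by argmax
w_t_hat = np.bincount(y_hat, minlength=2) / len(y_hat)  # C-step (2): w_hat_c^t
alpha = w_t_hat / w_s                # auxiliary weights alpha_c = w_hat_c^t / w_c^s
```

The M-step would then take a mini-batch SGD step on the reformulated objective with these `y_hat` and `alpha` held fixed.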
Experimental Results
• Comparison with the state of the art
Table 1. Experimental results on Office-10 + Caltech-10.
Experimental Results
• Empirical analysis
Figure 3. Performance of various models under different class weight bias.
Figure 4. Visualization of the features learned by DAN and weighted DAN.
Summary
• Introduce a class-specific weight into MMD to reduce the effect of class weight bias across domains.
• Develop the WDAN model and optimize it within a CEM framework.
• Weighted MMD can be applied to other scenarios where MMD serves as a distribution distance, e.g., image generation.