1. Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. Hongliang Yan, 2017/06/21.

2. Domain Adaptation (DA)
Problem: the training (source) and test (target) sets are related but under different distributions.
Methodology:
• Learn a feature space that combines discriminativeness and domain invariance: minimize source error + domain discrepancy.
Figure 1. Illustration of dataset bias.
[1] https://cs.stanford.edu/~jhoffman/domainadapt/

3. Maximum Mean Discrepancy (MMD)
• Represents the distance between distributions as the distance between mean embeddings of features:
$\mathrm{MMD}^2(\mathcal{X}_s, \mathcal{X}_t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \left\| \mathbb{E}_{x^s \sim \mathcal{X}_s}[\phi(x^s)] - \mathbb{E}_{x^t \sim \mathcal{X}_t}[\phi(x^t)] \right\|^2$
• An empirical estimate:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$
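For concreteness, here is a minimal numpy sketch of the empirical estimate (our illustration, not the paper's code): it assumes samples already embedded by an explicit feature map $\phi$, whereas in practice MMD is usually evaluated with the kernel trick.

```python
import numpy as np

def empirical_mmd(phi_s, phi_t):
    """Empirical squared MMD between two sets of embedded samples.

    phi_s: (M, d) array of source embeddings phi(x_i^s)
    phi_t: (N, d) array of target embeddings phi(x_j^t)
    Returns || (1/M) sum_i phi(x_i^s) - (1/N) sum_j phi(x_j^t) ||^2.
    """
    diff = phi_s.mean(axis=0) - phi_t.mean(axis=0)
    return float(diff @ diff)

# Toy check: same distribution -> near zero; shifted target -> clearly positive.
rng = np.random.default_rng(0)
phi_s = rng.normal(size=(500, 16))
phi_t = rng.normal(size=(400, 16))
print(empirical_mmd(phi_s, phi_t))        # ~0 (sampling noise only)
print(empirical_mmd(phi_s, phi_t + 1.0))  # large: distributions differ
```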

4. Motivation
• Class weight bias across domains remains unsolved but ubiquitous. The empirical MMD
$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2$
can be rewritten in terms of class-conditional mean embeddings:
$\mathrm{MMD}^2(D_s, D_t) = \left\| \sum_{c=1}^{C} w_c^s \, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2, \quad w_c^s = \frac{M_c}{M}, \; w_c^t = \frac{N_c}{N}$
so any mismatch between the class weights $w_c^s$ and $w_c^t$ contributes to the MMD even when the class-conditional distributions are aligned.
• The effect of class weight bias should be removed, because:
① it can stem from changes in sample selection criteria;
② applications are not concerned with the class prior distribution.
Figure 2. Class prior distributions of three digit recognition datasets.
• Consequently, MMD can be minimized either by learning a domain-invariant representation or by preserving the class weights of the source domain; a numerical check of the decomposition follows below.
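The class-weighted decomposition above is an exact algebraic identity, which the following toy numpy sketch verifies (all names and the synthetic data are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
C, d, M, N = 3, 8, 600, 500
mu = rng.normal(size=(C, d))                   # shared per-class embedding means
ys = rng.integers(0, C, size=M)                # source labels (roughly uniform prior)
yt = rng.choice(C, size=N, p=[0.6, 0.3, 0.1])  # target labels (different prior)
phi_s = mu[ys] + 0.1 * rng.normal(size=(M, d))
phi_t = mu[yt] + 0.1 * rng.normal(size=(N, d))

# Direct form: || (1/M) sum_i phi(x_i^s) - (1/N) sum_j phi(x_j^t) ||^2
direct = np.sum((phi_s.mean(0) - phi_t.mean(0)) ** 2)

# Class-weighted form: || sum_c w_c^s E_c[phi(x^s)] - sum_c w_c^t E_c[phi(x^t)] ||^2
ws = np.bincount(ys, minlength=C) / M          # w_c^s = M_c / M
wt = np.bincount(yt, minlength=C) / N          # w_c^t = N_c / N
Es = np.stack([phi_s[ys == c].mean(0) for c in range(C)])
Et = np.stack([phi_t[yt == c].mean(0) for c in range(C)])
decomposed = np.sum((ws @ Es - wt @ Et) ** 2)

print(np.isclose(direct, decomposed))          # True: the two forms agree
# Even with identical class-conditional means mu, the differing priors
# (ws vs. wt) keep the MMD well above zero: this is the class weight bias.
```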

5. Weighted MMD
Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.
• Introduce an auxiliary weight $\alpha_c = w_c^t / w_c^s$ for each class $c$ in the source domain, which turns the MMD into the weighted MMD:
$\mathrm{MMD}_w^2(D_s, D_t) = \left\| \frac{1}{M} \sum_{i=1}^{M} \alpha_{y_i^s} \, \phi(x_i^s) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t \, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$
Both sums are now weighted by the target class weights $w_c^t$, so the class weight bias no longer contributes to the discrepancy; a sketch of this computation follows below.
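A numpy sketch of the weighted estimate (`weighted_mmd` and `class_weights` are our hypothetical helpers; the true target weights used in the usage note are, in the actual method, estimated from pseudo-labels as described later):

```python
import numpy as np

def class_weights(labels, C):
    """Empirical class priors: w_c = (# samples of class c) / (# samples)."""
    return np.bincount(labels, minlength=C) / len(labels)

def weighted_mmd(phi_s, ys, phi_t, alpha):
    """Weighted empirical MMD^2 with per-class source weights alpha_c.

    Each source sample is reweighted by alpha_{y_i^s}, so the source class
    weights are effectively replaced by the target ones (alpha_c w_c^s = w_c^t).
    """
    src_mean = (alpha[ys][:, None] * phi_s).mean(axis=0)  # (1/M) sum_i alpha_{y_i} phi(x_i^s)
    diff = src_mean - phi_t.mean(axis=0)
    return float(diff @ diff)

# Usage (given embeddings phi_s/phi_t and labels ys/yt as in the previous sketch):
#   alpha = class_weights(yt, C) / class_weights(ys, C)  # alpha_c = w_c^t / w_c^s
#   weighted_mmd(phi_s, ys, phi_t, alpha)                # ~0: class weight bias removed
```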

6. Weighted DAN (WDAN)
1. Replace the MMD term in DAN [4] with the weighted MMD term:
$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}^2(D_s^l, D_t^l)$
$\Rightarrow \min_{W, \alpha} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
2. To further exploit the unlabeled data in the target domain, the empirical risk on the target domain is added, as in the semi-supervised model of [5]; the full objective is sketched in code below:
$\min_{W, \alpha, \{\hat y_j^t\}_{j=1}^{N}} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
[4] Long, M., Cao, Y., Wang, J. Learning Transferable Features with Deep Adaptation Networks. ICML, 2015.
[5] Amini, M.-R., Gallinari, P. Semi-supervised Logistic Regression. Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
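A single-layer PyTorch sketch of this objective (a sketch under assumptions: `wdan_loss` and its interface are ours, a linear kernel stands in for the multi-kernel MMD used by DAN, and only one adapted layer is shown instead of the sum over layers $l_1, \dots, l_L$):

```python
import torch
import torch.nn.functional as F

def wdan_loss(logits_s, ys, logits_t, yt_hat, feats_s, feats_t, alpha, lam=1.0):
    """Source risk + target risk on pseudo-labels + weighted MMD (one layer)."""
    risk_s = F.cross_entropy(logits_s, ys)        # (1/M) sum_i l(x_i^s, y_i^s; W)
    risk_t = F.cross_entropy(logits_t, yt_hat)    # (1/N) sum_j l(x_j^t, y_hat_j^t; W)
    # Linear-kernel weighted MMD^2 between the layer activations.
    src_mean = (alpha[ys].unsqueeze(1) * feats_s).mean(0)
    mmd_w = ((src_mean - feats_t.mean(0)) ** 2).sum()
    return risk_s + risk_t + lam * mmd_w
```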

7. Optimization: an extension of CEM [6]
The parameters to be estimated comprise three parts: $W$, $\alpha$, and $\{\hat y_j^t\}_{j=1}^{N}$. The model is optimized by alternating between three steps (a skeleton of the loop follows below):
• E-step: with $W$ fixed, estimate the class posterior probability of each target sample: $p(y_j^t = c \mid x_j^t) = g_c(x_j^t; W)$.
• C-step:
① Assign pseudo-labels on the target domain: $\hat y_j^t = \arg\max_c p(y_j^t = c \mid x_j^t)$;
② Update the auxiliary class-specific weights for the source domain: $\alpha_c = \hat w_c^t / w_c^s$, where $\hat w_c^t = \frac{1}{N} \sum_{j=1}^{N} 1_c(\hat y_j^t)$ and $1_c(x)$ is an indicator function that equals 1 if $x = c$ and 0 otherwise.
• M-step: with $\alpha$ and $\{\hat y_j^t\}_{j=1}^{N}$ fixed, update $W$. The problem is reformulated as:
$\min_{W} \frac{1}{M} \sum_{i=1}^{M} \ell(x_i^s, y_i^s; W) + \frac{1}{N} \sum_{j=1}^{N} \ell(x_j^t, \hat y_j^t; W) + \lambda \sum_{l \in \{l_1, \dots, l_L\}} \mathrm{MMD}_w^2(D_s^l, D_t^l)$
The gradients of the three terms are all computable, so $W$ can be optimized with mini-batch SGD.
[6] Celeux, Gilles, and Gérard Govaert. A Classification EM Algorithm for Clustering and Two Stochastic Versions. Computational Statistics & Data Analysis 14.3 (1992): 315-332.
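The whole alternation can be sketched as the loop below (schematic only: `model` is a hypothetical object exposing `predict_proba` and `train_step`, neither of which comes from the paper):

```python
import numpy as np

def fit_wdan(model, Xs, ys, Xt, C, rounds=10):
    """CEM-style alternating optimization of W, alpha, and the pseudo-labels."""
    ws = np.bincount(ys, minlength=C) / len(ys)     # source class weights w_c^s
    for _ in range(rounds):
        # E-step: class posteriors p(y_j^t = c | x_j^t) under the current W.
        p = model.predict_proba(Xt)                 # (N, C)
        # C-step 1: pseudo-labels y_hat_j^t = argmax_c p(y_j^t = c | x_j^t).
        yt_hat = p.argmax(axis=1)
        # C-step 2: alpha_c = w_hat_c^t / w_c^s from pseudo-label frequencies.
        wt_hat = np.bincount(yt_hat, minlength=C) / len(yt_hat)
        alpha = wt_hat / np.maximum(ws, 1e-12)
        # M-step: update W by mini-batch SGD on the full WDAN objective.
        model.train_step(Xs, ys, Xt, yt_hat, alpha)
    return model
```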

8. Experimental results
• Comparison with state-of-the-art methods.
Table 1. Experimental results on Office-10 + Caltech-10.

9. Experimental results
• Empirical analysis.
Figure 3. Performance of various models under different class weight bias.
Figure 4. Visualization of the learned features of DAN and weighted DAN.

10. Summary
• Introduce class-specific weights into MMD to reduce the effect of class weight bias across domains.
• Develop the WDAN model and optimize it within a CEM framework.
• Weighted MMD can be applied to other scenarios where MMD is used to measure distribution distance, e.g., image generation.
