Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation
Boqing Gong, University of Southern California
Joint work with Kristen Grauman and Fei Sha
The perils of mismatched domains
[Figure: TRAIN vs. TEST images drawn from different domains; images from [Saenko et al. '10]]
Poor cross-domain generalization
Different underlying distributions
Overfit to datasets' idiosyncrasies
Common to many areas
Computer vision, text processing, speech recognition, language modeling, etc.
Unsupervised domain adaptation
Setup: a source domain (with labeled data) and a target domain (no labels for training), with different distributions
Objective: learn a classifier that works well on the target
Many existing works
Correcting sampling bias: [Shimodaira '00], [Sethy et al. '06, '09], [Huang et al. '07], [Bickel et al. '07], [Sugiyama et al. '08]
Adjusting mismatched models: [Evgeniou and Pontil '05], [Duan et al. '09], [Duan et al. '10], [Daumé III et al. '10], [Saenko et al. '10], [Kulis et al. '11], [Chen et al. '11]
Inferring domain-invariant features: [Blitzer et al. '06], [Daumé III '07], [Argyriou et al. '08], [Pan et al. '09], [Gopalan et al. '11], [Gong et al. '12], [Chen et al. '12], [Muandet et al. '13], and [This work]
Snags
Forced adaptation: attempting to adapt all source data points, including "hard" ones
Implicit discrimination: the learned discrimination is biased to the source, rather than optimized w.r.t. the target
Our key insights
Forced adaptation ➔ select the best instances for adaptation
Implicit discrimination ➔ approximate the discriminative loss on the target
Landmarks
Landmarks are labeled source instances distributed similarly to the target domain.
Roles:
- Ease the difficulty of adaptation
- Provide discrimination (biased toward the target)
Key steps
1. Identify landmarks at multiple scales (coarse to fine-grained).
2. Construct auxiliary domain adaptation tasks.
3. Obtain domain-invariant features.
4. Predict target labels.
Key step 1: Identify landmarks at multiple scales (coarse to fine-grained).
Identifying landmarks
Objective: select labeled source instances whose empirical distribution matches that of the target domain, measured by the maximum mean discrepancy (next slide).
Maximum mean discrepancy (MMD) [Gretton et al. '06]
Empirical estimate between the selected landmarks and the target:
\[ \mathrm{MMD}^2 = \Big\| \tfrac{1}{\sum_m \alpha_m} \sum_m \alpha_m \phi(x_m) - \tfrac{1}{N} \sum_n \phi(z_n) \Big\|_{\mathcal{H}}^2 \]
where \(\phi(\cdot)\) is the feature map of a universal RKHS kernel, \(\alpha_m \in \{0,1\}\) indicates whether the \(m\)-th source instance is selected as a landmark, and \(z_1,\dots,z_N\) are the target instances.
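To make the estimate concrete, here is a minimal sketch of the empirical (biased) MMD² between two samples under a Gaussian RBF kernel; the function names (gaussian_kernel, mmd_squared) are mine, not the authors' code.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs."""
    sq_dists = (
        (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    )
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd_squared(X_src, X_tgt, sigma):
    """Biased empirical estimate of MMD^2:
    mean k(src, src) + mean k(tgt, tgt) - 2 mean k(src, tgt)."""
    k_ss = gaussian_kernel(X_src, X_src, sigma).mean()
    k_tt = gaussian_kernel(X_tgt, X_tgt, sigma).mean()
    k_st = gaussian_kernel(X_src, X_tgt, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```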
Method for identifying landmarks
Integer program:
\[ \min_{\alpha} \Big\| \tfrac{1}{\sum_m \alpha_m} \sum_m \alpha_m \phi(x_m) - \tfrac{1}{N} \sum_n \phi(z_n) \Big\|_{\mathcal{H}}^2 \quad \text{s.t. } \alpha_m \in \{0,1\} \text{ and class balance,} \]
where the class balance constraint keeps the class proportions among the selected landmarks equal to those of the source.
Convex relaxation: substitute \(\beta_m = \alpha_m / \sum_{m'} \alpha_{m'}\), so that \(\sum_m \beta_m = 1\); the objective becomes a convex quadratic in \(\beta\) (a standard QP in kernel form), and the landmarks are recovered from the nonzero entries of the solution.
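A hedged sketch of the relaxed selection step, assuming the substitution above and writing the MMD² objective in kernel form; select_landmarks and its arguments are illustrative names, and the thresholding at the end is one simple way to recover a binary selection.

```python
import cvxpy as cp
import numpy as np

def select_landmarks(K_ss, K_st, Y_src, threshold=1e-4):
    """K_ss: M x M kernel among source instances; K_st: M x N kernel
    between source and target; Y_src: M x C one-hot source labels."""
    M, N = K_st.shape
    beta = cp.Variable(M, nonneg=True)
    K = K_ss + 1e-8 * np.eye(M)            # tiny ridge for numerical PSD-ness
    # Kernel form of MMD^2 (up to a constant): b'Kb - (2/N) b'K_st 1
    objective = cp.quad_form(beta, K) - (2.0 / N) * (K_st @ np.ones(N)) @ beta
    prior = Y_src.mean(axis=0)             # class proportions in the source
    constraints = [cp.sum(beta) == 1,      # beta_m = alpha_m / sum(alpha)
                   Y_src.T @ beta == prior]  # class balance constraint
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return np.flatnonzero(beta.value > threshold)  # indices of landmarks
```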
How to choose the kernel functions?
Gaussian kernels
Plus: universal (characteristic)
Minus: how to choose the bandwidth?
Our solution: treat the bandwidth as a granularity
Examine the distributions at multiple granularities: multiple bandwidths give multiple sets of landmarks (a sketch follows).
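A sketch of the multi-scale sweep, reusing the helpers above; the median-distance base bandwidth is a common heuristic I am assuming, and the default exponents mirror the scales σ = 2^6, 2^0, 2^-3 shown later in the talk.

```python
import numpy as np
from scipy.spatial.distance import pdist

def multiscale_landmarks(X_src, X_tgt, Y_src, exponents=(6, 0, -3)):
    """One landmark set per bandwidth sigma = 2^q * sigma0."""
    sigma0 = np.median(pdist(np.vstack([X_src, X_tgt])))  # heuristic base
    landmark_sets = {}
    for q in exponents:
        sigma = (2.0 ** q) * sigma0
        K_ss = gaussian_kernel(X_src, X_src, sigma)
        K_st = gaussian_kernel(X_src, X_tgt, sigma)
        landmark_sets[q] = select_landmarks(K_ss, K_st, Y_src)
    return landmark_sets
```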
Other details
Class balance constraint on the selected landmarks
Recovering the binary indicators α from the relaxed solution β
(See paper for details.)
What do landmarks look like?
[Figure: landmarks selected from the source for the target classes Headphone and Mug at scales σ = 2^6, 2^0, and 2^-3, alongside unselected source instances.]
Key step 2: Construct auxiliary domain adaptation tasks.
Constructing easier auxiliary tasks
At each scale σ, the selected landmarks move from the source to the target: the remaining source instances form a new source, and the original target plus the landmarks forms a new target.
Intuition: the new distributions are closer (cf. Theorem 1). A sketch of the construction follows.
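A minimal sketch of this construction, assuming the move-to-target reading above; auxiliary_task and landmark_idx are illustrative names.

```python
import numpy as np

def auxiliary_task(X_src, Y_src, X_tgt, landmark_idx):
    """Move the landmarks from the source into the target."""
    mask = np.zeros(len(X_src), dtype=bool)
    mask[landmark_idx] = True
    new_src = (X_src[~mask], Y_src[~mask])     # source minus landmarks
    new_tgt = np.vstack([X_tgt, X_src[mask]])  # target plus landmarks
    return new_src, new_tgt
```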
Auxiliary tasks
For each auxiliary task, learn a new basis of features by a geodesic flow kernel (GFK) based method [Gong et al. '12]:
- Integrate out domain changes along the geodesic
- Obtain a domain-invariant representation
A numerical sketch of the GFK follows.
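A sketch of the GFK of [Gong et al. '12] under my own simplifications: the paper derives G in closed form, while the code below approximates it by numerically averaging Φ(t)Φ(t)ᵀ along the geodesic between the source and target PCA subspaces.

```python
import numpy as np
from scipy.linalg import null_space, svd

def gfk_matrix(Ps, Pt, n_steps=50):
    """Ps, Pt: D x d orthonormal bases of the source/target subspaces.
    Returns the D x D matrix G defining the kernel <x, y> = x' G y."""
    D, d = Ps.shape
    Rs = null_space(Ps.T)                     # complement of source, D x (D-d)
    U1, cos_t, Vt = svd(Ps.T @ Pt)            # principal angles via SVD
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    sin_t = np.maximum(np.sin(theta), 1e-12)  # guard near-zero angles
    U2 = -(Rs.T @ Pt @ Vt.T) / sin_t          # from Rs' Pt = -U2 sin(Theta) V'
    A, B = Ps @ U1, Rs @ U2                   # both D x d
    G = np.zeros((D, D))
    for t in np.linspace(0.0, 1.0, n_steps):  # Phi(0) spans source, Phi(1) target
        Phi = A * np.cos(t * theta) - B * np.sin(t * theta)
        G += Phi @ Phi.T / n_steps
    return G
```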
Key step 3: Obtain domain-invariant features.
Combining features discriminatively
Multiple kernel learning (MKL) on the labeled landmarks:
- Arrives at a domain-invariant feature space
- Gives a discriminative loss biased toward the target (the landmarks are distributed like the target)
A simplified sketch follows.
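A simplified stand-in for the MKL step, not the paper's solver: it grid-searches convex weights over the per-scale GFK kernels by cross-validated accuracy on the labeled landmarks, then trains an SVM on the combined precomputed kernel. combine_and_train and grid are illustrative names.

```python
import itertools
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def combine_and_train(kernels, y_landmarks, grid=(0.0, 0.5, 1.0)):
    """kernels: list of L x L kernel matrices over the landmarks."""
    best_score, best_w = -np.inf, None
    for w in itertools.product(grid, repeat=len(kernels)):
        if not np.isclose(sum(w), 1.0):        # convex combinations only
            continue
        K = sum(wi * Ki for wi, Ki in zip(w, kernels))
        score = cross_val_score(SVC(kernel="precomputed"), K,
                                y_landmarks, cv=3).mean()
        if score > best_score:
            best_score, best_w = score, w
    K = sum(wi * Ki for wi, Ki in zip(best_w, kernels))
    return SVC(kernel="precomputed").fit(K, y_landmarks), np.array(best_w)
```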
Key step 4: Predict target labels.
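A sketch of the final prediction step under the setup above: score each target point with the landmark-trained SVM using the same learned kernel weights; predict_target and its arguments are illustrative names.

```python
import numpy as np

def predict_target(clf, tgt_landmark_kernels, w):
    """tgt_landmark_kernels: list of N_tgt x L cross-kernel matrices
    between target points and landmarks, one per scale."""
    K_tl = sum(wi * Ki for wi, Ki in zip(w, tgt_landmark_kernels))
    return clf.predict(K_tl)  # precomputed-kernel SVC accepts N_tgt x L
```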
Experimental study
Visual object recognition: four vision datasets/domains [Griffin et al. '07, Saenko et al. '10]
Sentiment analysis: four types of product reviews (books, DVDs, electronics, kitchen appliances) [Blitzer et al. '07]
Comparing with
Correcting sampling bias: [Huang et al. '07]
Inferring domain-invariant features: [Blitzer et al. '06], [Pan et al. '09], [Gopalan et al. '11], [Gong et al. '12] (GFK)
Adjusting mismatched models: [Saenko et al. '10]
Object recognition
[Bar chart: accuracy (%) on domain pairs A→C, A→D, C→A, C→W, W→A, W→C for No adaptation, Pan et al. '09, Gopalan et al. '11, GFK, and Landmark (ours); Landmark obtains the best accuracy.]
Sentiment analysis
[Bar chart: accuracy (%) on domain pairs K→D, D→B, B→E, E→K for Blitzer et al. '06, Huang et al. '07, Pan et al. '09, Saenko et al. '10, Gopalan et al. '11, GFK, and Landmark (ours).]
Auxiliary tasks easier to solve
Empirical results on visual object recognition
[Figure: accuracies on the auxiliary tasks vs. the original tasks; the auxiliary tasks are solved more accurately.]
Landmarks: a good proxy to target discrimination
[Bar chart: accuracy (%) on domain pairs A→C, A→D, A→W, C→A, C→D, C→W, W→A, W→C, W→D when training on non-landmarks, a random selection, or the landmarks.]
Summary
Landmarks: an intrinsic structure shared between domains; labeled source instances distributed similarly to the target
Auxiliary tasks: provably easier to solve
Discriminative loss despite the unlabeled target
Outperformed the state-of-the-art