Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation


  1. Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation. Boqing Gong, University of Southern California. Joint work with Kristen Grauman and Fei Sha.

  2. The perils of mismatched domains. Train on one domain, test on another: cross-domain generalization is poor because the underlying distributions differ and models overfit to each dataset's idiosyncrasies. Images from [Saenko et al. '10].

  3. Common to many areas: computer vision, text processing, speech recognition, language modeling, etc.

  4. Unsupervised domain adaptation. Setup: a source domain with labeled data and a target domain with no labels for training, drawn from different distributions. Objective: learn a classifier that works well on the target.

  6. Many existing works, in three broad groups. Correcting sampling bias: [Shimodaira '00], [Sethy et al. '06], [Huang et al. '07], [Bickel et al. '07], [Sugiyama et al. '08], [Sethy et al. '09]. Adjusting mismatched models: [Evgeniou and Pontil '05], [Duan et al. '09], [Duan et al. '10], [Daumé III et al. '10], [Saenko et al. '10], [Kulis et al. '11], [Chen et al. '11]. Inferring domain-invariant features: [Blitzer et al. '06], [Daumé III '07], [Argyriou et al. '08], [Pan et al. '09], [Gopalan et al. '11], [Gong et al. '12], [Chen et al. '12], [Muandet et al. '13]. [This work] falls in the last group: it selects source instances (landmarks) and uses them to learn domain-invariant features.

  10. Snags. Forced adaptation: existing methods attempt to adapt all source data points, including "hard" ones. Implicit discrimination: the discriminative signal is biased toward the source rather than optimized w.r.t. the target.

  11. Our key insights. Forced adaptation → select the best instances for adaptation. Implicit discrimination → approximate the discriminative loss on the target.

  12. Landmarks: labeled source instances distributed similarly to the target domain.

  15. Landmarks play two roles: they ease the difficulty of adaptation, and they provide discrimination (biased toward the target).

  16. Key steps. (1) Identify landmarks at multiple scales, from coarse to fine-grained. (2) Construct auxiliary domain adaptation tasks. (3) Obtain domain-invariant features. (4) Predict target labels.

  20. Key steps, step 1: identify landmarks at multiple scales, from coarse to fine-grained.

  21. Identifying landmarks. Objective: select the subset of labeled source instances whose distribution best matches the target domain.

  24. Maximum mean discrepancy (MMD): compare two distributions by the distance between their mean embeddings in a universal RKHS, using the empirical estimate of [Gretton et al. '06]; each term is computed through the kernel function induced by the l-th landmark (from the source domain).
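
The slide's equation was an image and did not survive extraction. A standard form of the squared empirical MMD between a candidate landmark set L and the target sample T, under a universal kernel with feature map φ (following [Gretton et al. '06]), would be:

```latex
\widehat{\mathrm{MMD}}^{2}(\mathcal{L}, \mathcal{T})
  = \Bigl\| \frac{1}{|\mathcal{L}|} \sum_{x \in \mathcal{L}} \phi(x)
          - \frac{1}{|\mathcal{T}|} \sum_{x' \in \mathcal{T}} \phi(x') \Bigr\|_{\mathcal{H}}^{2}
```

Expanding the norm turns every term into kernel evaluations k(·,·), which is what makes the selection objective computable.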

  25. Method for identifying landmarks: minimize the MMD between the selected landmarks and the target sample over binary selection indicators, an integer program.
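
The program itself was also an image. A plausible reconstruction, with α_m ∈ {0,1} indicating whether source instance x_m is selected and x_n^T denoting the N target points, is:

```latex
\min_{\alpha_m \in \{0,1\}} \;
  \Bigl\| \frac{1}{\sum_m \alpha_m} \sum_m \alpha_m \,\phi(x_m)
        - \frac{1}{N} \sum_{n=1}^{N} \phi\bigl(x_n^{\mathcal{T}}\bigr) \Bigr\|_{\mathcal{H}}^{2}
```

subject to the class-balance constraint discussed on the "Other details" slide below.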

  26. Method for identifying landmarks: relax the binary indicators to continuous variables, turning the integer program into a convex (quadratic) program that can be solved efficiently.
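
A minimal sketch of this relaxation in Python, assuming normalized variables β_m = α_m / Σ_m α_m (so Σ_m β_m = 1) and using cvxpy for the QP; the helper names, the rounding rule, and the choice of cvxpy are illustrative rather than from the talk, and the class-balance constraint is omitted for brevity:

```python
import numpy as np
import cvxpy as cp

def gaussian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def select_landmarks(Xs, Xt, sigma, top_k):
    """Relaxed MMD-minimizing selection of source instances (landmarks)."""
    m = Xs.shape[0]
    K_ss = gaussian_kernel(Xs, Xs, sigma)              # source-source kernel
    K_ss = 0.5 * (K_ss + K_ss.T) + 1e-6 * np.eye(m)    # keep it numerically PSD
    K_st = gaussian_kernel(Xs, Xt, sigma)              # source-target kernel
    beta = cp.Variable(m)
    # Squared MMD as a quadratic in beta; the constant target-target term is dropped.
    objective = cp.Minimize(cp.quad_form(beta, K_ss)
                            - 2.0 * K_st.mean(axis=1) @ beta)
    cp.Problem(objective, [beta >= 0, cp.sum(beta) == 1]).solve()
    # Simple rounding back to a binary selection: keep the largest betas.
    return np.argsort(beta.value)[-top_k:]
```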

  29. How to choose the kernel functions? Gaussian kernels. Plus: they are universal (characteristic). Minus: how do we choose the bandwidth? Our solution: treat bandwidth as granularity. Examine the distributions at multiple granularities; multiple bandwidths yield multiple sets of landmarks.
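
Continuing the sketch above (same hypothetical helpers), one landmark set per bandwidth could be computed on a geometric grid; the median-distance base and the exact exponent range are assumptions, though the σ = 2^q values shown two slides below suggest the same spacing:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Median pairwise source-target distance as a base bandwidth (heuristic).
base_sigma = np.median(cdist(Xs, Xt))
sigmas = [base_sigma * 2.0 ** q for q in range(-3, 7)]
landmark_sets = {s: select_landmarks(Xs, Xt, sigma=s, top_k=50) for s in sigmas}
```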

  31. Other details: a class-balance constraint on the selected landmarks, and recovering the binary solution from the relaxed one (see the paper for details).
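
The constraint is not written out on the slide; a plausible form in the relaxed variables β (with Σ_m β_m = 1 as above) keeps the class proportions of the landmarks matched to those of the source:

```latex
\sum_{m \,:\, y_m = c} \beta_m \;=\; \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\bigl[\, y_m = c \,\bigr]
\qquad \text{for every class } c
```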

  32. What do landmarks look like? Example images (headphone, mug): the target set alongside the landmarks selected from the source at scales σ = 2^6, σ = 2^0, and σ = 2^-3, plus the unselected source images.

  35. Key steps, step 2: construct auxiliary domain adaptation tasks.

  36. Constructing easier auxiliary tasks. At each scale σ, the landmarks are regrouped from the source side to the target side, defining a new source and a new target. Intuition: the new distributions are closer (cf. Theorem 1).
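
One plausible reading of the slide's diagram in code, continuing the earlier sketch (ys denotes the source labels; all names are illustrative):

```python
aux_tasks = {}
for sigma, idx in landmark_sets.items():
    is_lm = np.zeros(len(Xs), dtype=bool)
    is_lm[idx] = True
    aux_tasks[sigma] = {
        "source": (Xs[~is_lm], ys[~is_lm]),    # landmarks leave the source ...
        "target": np.vstack([Xt, Xs[is_lm]]),  # ... and join the target side
    }
```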

  38. Each auxiliary task yields a new basis of features via a geodesic flow kernel (GFK) based method [Gong et al. '12]: integrate out the domain changes to obtain a domain-invariant representation.
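
For reference (not spelled out on the slide), the GFK of [Gong et al. '12] measures similarity through the geodesic Φ(t) connecting the source subspace Φ(0) to the target subspace Φ(1) on the Grassmann manifold; G has a closed form:

```latex
\langle x_i, x_j \rangle_{G} = x_i^{\top} G \, x_j,
\qquad
G = \int_{0}^{1} \Phi(t)\, \Phi(t)^{\top} \, dt
```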

  39. Key steps, step 3: obtain domain-invariant features.

  41. Combining features discriminatively: multiple kernel learning over the per-scale features, trained on the labeled landmarks. This arrives at a domain-invariant feature space whose discriminative loss is biased toward the target.
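
A sketch of the combination, assuming one GFK kernel K_q per auxiliary task (scale) and weights learned on the landmark set L; the exact loss and regularization in the paper may differ:

```latex
K_{w} = \sum_{q} w_q K_q, \quad w_q \ge 0, \; \sum_{q} w_q = 1,
\qquad
\min_{w} \; \min_{f \in \mathcal{H}_{K_w}}
  \sum_{(x, y) \in \mathcal{L}} \ell\bigl(y, f(x)\bigr)
  + \lambda \, \| f \|_{\mathcal{H}_{K_w}}^{2}
```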

  42. Key steps, step 4: predict target labels.
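
A minimal sketch of this last step under the assumptions above: hypothetical precomputed kernels K_land (landmark vs. landmark, from the weighted combination) and K_cross (target vs. landmark) feed a standard SVM; none of these names come from the talk:

```python
from sklearn.svm import SVC

# Train on the labeled landmarks with the combined precomputed kernel,
# then predict labels for the unlabeled target instances.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_land, y_land)               # K_land: (n_landmarks, n_landmarks)
target_labels = clf.predict(K_cross)  # K_cross: (n_target, n_landmarks)
```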

  44. Experimental study. Visual object recognition: four vision datasets/domains [Griffin et al. '07, Saenko et al. '10]. Sentiment analysis: four types of product reviews (books, DVDs, electronics, kitchen appliances) [Blitzer et al. '07].

  45. Comparing with methods from the groups above: correcting sampling bias [Huang et al. '07]; inferring domain-invariant features [Blitzer et al. '06], [Pan et al. '09], [Gopalan et al. '11], and [Gong et al. '12] (GFK).

  47. Object recognition results. (Bar chart: accuracy (%), 0-60, for No adaptation, Gopalan et al. '11, Pan et al. '09, GFK, and Landmark over the domain pairs A→C, A→D, C→A, C→W, W→A, and W→C.)

  50. Sentiment analysis results. (Bar chart: accuracy (%), 55-85, for Pan et al. '09, Gopalan et al. '11, GFK, Saenko et al. '10, Blitzer et al. '06, Huang et al. '07, and Landmark over the domain pairs K→D, D→B, B→E, and E→K.)

  51. Auxiliary tasks are easier to solve: empirical results on visual object recognition comparing the original tasks with the auxiliary tasks.

  54. Landmarks are a good proxy for target discrimination. (Bar chart: accuracy (%), 25-80, for Non-landmarks, Random selection, and Landmark over A→C, A→D, A→W, C→A, C→D, C→W, W→A, W→C, and W→D.)

  55. Summary. Landmarks: an intrinsic structure shared between domains; labeled source instances distributed similarly to the target. Auxiliary tasks: provably easier to solve. A discriminative loss despite the unlabeled target. Outperformed the state of the art.
