A Two-Stage Approach to Domain Adaptation for Statistical Classifiers
Jing Jiang & ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign

What is domain adaptation?
Example: named entity recognition (persons, locations, organizations, etc.)

Standard NER setting (supervised learning):
  train: New York Times (labeled)
  test:  New York Times (unlabeled)
  -> classifier achieves 85.5%

Non-standard (realistic) NER setting:
  train: New York Times (labeled; labeled Reuters data not available)
  test:  Reuters (unlabeled)
  -> classifier achieves 64.1%
Domain difference -> performance drop

NER:
  ideal setting:     train on NYT, test on NYT     -> 85.5%
  realistic setting: train on NYT, test on Reuters -> 64.1%

Another NER example (gene name recognition):
  ideal setting:     train on mouse, test on mouse -> 54.1%
  realistic setting: train on fly,   test on mouse -> 28.1%
Other examples
- Spam filtering: public email collection -> personal inboxes
- Sentiment analysis of product reviews: digital cameras -> cell phones; movies -> books

Can we do better than standard supervised learning?

Domain adaptation: designing learning methods that are aware of the difference between the training and test domains.

How do we solve the problem in general?
Observation 1: domain-specific features

Fly gene names: wingless, daughterless, eyeless, apexless, ...
- they describe a phenotype
- they follow fly gene nomenclature
- so the feature "-less" is weighted high

Is this feature still useful for other organisms (e.g., mouse genes such as CD38, PABPC5, ...)? No!
Observation 2: generalizable features

(Example sentences from the fly and mouse literature, each containing a gene
name in the pattern "... X is expressed in ...".)

The feature "X be expressed" is useful across organisms.
General idea: two-stage approach

(Diagram: source domain and target domain over a shared feature space, split
into domain-specific features and generalizable features.)

Goal

(Diagram: the same source/target feature space, showing the feature coverage
the final classifier should achieve.)
Regular classification

(Diagram: the classifier uses source-domain features only.)

Stage 1. Generalization: emphasize generalizable features in the trained model.

(Diagram: the classifier concentrates on the generalizable features shared by
the source and target domains.)
Stage 2. Adaptation: pick up domain-specific features for the target domain.

(Diagram: the classifier extends from the generalizable features into the
target domain's specific features.)

Regular semi-supervised learning

(Diagram: the classifier uses source-domain features plus unlabeled
target-domain data, without an explicit generalization stage.)
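To make the two stages concrete, here is a minimal, self-contained Python sketch using scikit-learn. The restriction to a precomputed set of generalizable feature indices and the 0.9 confidence threshold are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_stage(X_src, y_src, X_tgt, gen_idx, threshold=0.9):
    # Stage 1 (generalization): train on labeled source data restricted to
    # the generalizable features, so source-specific cues carry no weight.
    stage1 = LogisticRegression(max_iter=1000).fit(X_src[:, gen_idx], y_src)

    # Pseudo-label the target instances the Stage-1 model is confident about.
    proba = stage1.predict_proba(X_tgt[:, gen_idx])
    confident = proba.max(axis=1) >= threshold
    pseudo_y = stage1.classes_[proba.argmax(axis=1)][confident]

    # Stage 2 (adaptation): retrain on the pseudo-labeled target data with
    # the full feature set, letting target-specific features pick up weight.
    stage2 = LogisticRegression(max_iter=1000).fit(X_tgt[confident], pseudo_y)
    return stage2
```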
Comparison with related work
- We explicitly model generalizable features; previous work models them only
  implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].
- We do not need labeled target data, but we do need multiple source
  (training) domains; some prior work requires labeled target data
  [Daumé III 2007].
- We add a second stage of adaptation that uses semi-supervised learning;
  previous work does not incorporate semi-supervised learning
  [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

Implementation of the two-stage approach with logistic regression classifiers
Logistic regression classifiers

With a binary feature vector x (features such as "-less" and "X be expressed",
e.g., from "... and wingless are expressed in ...") and one weight vector w_y
per class y:

  p(y | x; w) = exp(w_y^T x) / Σ_{y'} exp(w_{y'}^T x)

Learning a logistic regression classifier:

  ŵ = argmin_w [ λ ||w||² − (1/N) Σ_{i=1}^N log ( exp(w_{y_i}^T x_i) / Σ_{y'} exp(w_{y'}^T x_i) ) ]

- regularization term λ ||w||²: penalizes large weights, controls model complexity
- second term: the average log likelihood of the training data
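A small numpy sketch of the model and objective on this slide; the shapes and toy numbers are illustrative.

```python
import numpy as np

def predict_proba(W, x):
    """p(y | x; w) = exp(w_y^T x) / sum_{y'} exp(w_{y'}^T x)."""
    scores = W @ x                  # one score per class
    scores = scores - scores.max()  # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def objective(W, X, y, lam):
    """lambda * ||w||^2 minus the average log-likelihood of the data."""
    reg = lam * np.sum(W ** 2)      # penalizes large weights
    ll = sum(np.log(predict_proba(W, xi)[yi]) for xi, yi in zip(X, y))
    return reg - ll / len(y)

# Toy check: 2 classes, 3 binary features (e.g., "-less", "X be expressed", ...)
W = np.array([[0.2, -0.3,  0.4],
              [4.5,  3.0, -0.9]])
x = np.array([1, 0, 1])
print(predict_proba(W, x))          # class probabilities for this instance
```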
Generalizable features in weight vectors

Train one classifier per source domain D_1, ..., D_K, giving weight vectors
w^1, w^2, ..., w^K:
- domain-specific features get inconsistent weights across domains
  (e.g., 0.2 / 3.2 / 0.1);
- generalizable features get consistently large weights across domains
  (e.g., 5 / 4.5 / 4.2 and 3.0 / 3.5 / 3.2).

We want to decompose w into a shared part with h non-zero entries (one per
generalizable feature) plus a domain-specific part, e.g.

  (0.2, 4.5, 5, -0.3, 3.0, ..., 2.1, -0.9, 0.4)^T
    = (0, 0, 4.6, 0, 3.2, ..., 0, 0, 0)^T                   (generalizable part)
    + (0.2, 4.5, 0.4, -0.3, -0.2, ..., 2.1, -0.9, 0.4)^T    (domain-specific part)
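The cross-domain consistency intuition can be made concrete with a small numpy sketch. The scoring rule here (mean weight magnitude penalized by cross-domain standard deviation) is an illustrative stand-in, not the paper's exact criterion.

```python
import numpy as np

# Rows = features, columns = the weight vectors w^1, w^2, w^3 learned on
# three source domains (toy values from the slide).
W = np.array([
    [0.2, 3.2, 0.1],   # inconsistent across domains -> domain-specific
    [5.0, 4.5, 4.2],   # consistently large -> generalizable
    [3.0, 3.5, 3.2],   # consistently large -> generalizable
    [2.1, 0.1, 1.7],   # inconsistent across domains -> domain-specific
])

# Score each feature: large average magnitude, low cross-domain variance.
score = np.abs(W).mean(axis=1) - W.std(axis=1)
h = 2
generalizable = np.argsort(-score)[:h]
print(generalizable)   # -> indices 1 and 2, the two consistent rows
```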
Feature selection matrix A

A is an h × F matrix of 0/1 entries in which each row selects one of the h
generalizable features, so z = A x maps the full feature vector x to the
reduced vector z of generalizable features.

Decomposition of w

  w^T x = v^T z + u^T x = (A^T v + u)^T x

where v holds the weights of the h generalizable features (in the reduced
space) and u holds the weights of the domain-specific features.
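A small numpy illustration of z = A x and the identity w^T x = v^T z + u^T x; the concrete numbers follow the decomposition example above but are otherwise arbitrary.

```python
import numpy as np

F, h = 5, 2                          # total features, generalizable features

A = np.zeros((h, F))                 # each row of A selects one
A[0, 2] = 1                          # generalizable feature: here,
A[1, 4] = 1                          # features 2 and 4

x = np.array([1., 0., 1., 1., 1.])   # full binary feature vector
z = A @ x                            # reduced generalizable-feature vector

v = np.array([4.6, 3.2])             # weights on generalizable features
u = np.array([0.2, 4.5, 0.4, -0.3, -0.2])  # weights on domain-specific features

w = A.T @ v + u                      # combined weight vector
assert np.isclose(w @ x, v @ z + u @ x)
print(w)                             # -> [0.2 4.5 5. -0.3 3.], as on the slide
```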