Multi-task Tri-training
1. Train one model with 3 objective functions.
2. Use predictions on unlabeled data as pseudo-labels for the third objective if the other two agree.
3. Restrict the final layers to use different representations.
4. Train the third objective function only on pseudo-labeled data to bridge the domain shift.
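The agreement rule in step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Classifier` type, the `pseudo_label_for` helper, and the toy keyword models are all hypothetical stand-ins for the three output layers.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: anything that maps an input string to a label id.
Classifier = Callable[[str], int]


def pseudo_label_for(
    target: int,
    models: Sequence[Classifier],
    unlabeled: Sequence[str],
) -> List[Tuple[str, int]]:
    """Collect pseudo-labeled examples for models[target].

    An unlabeled example is kept only when the two *other* models
    predict the same label; that shared label becomes the pseudo-label.
    """
    others = [m for i, m in enumerate(models) if i != target]
    assert len(others) == 2, "tri-training uses exactly three models"
    selected = []
    for x in unlabeled:
        a, b = others[0](x), others[1](x)
        if a == b:
            selected.append((x, a))
    return selected


# Toy keyword classifiers (assumptions for illustration only).
m1 = lambda x: int("good" in x)
m2 = lambda x: int("good" in x or "great" in x)
m3 = lambda x: 0

unlabeled = ["good phone", "great case", "bad cable"]
# Examples on which m1 and m2 agree become training data for m3.
pseudo = pseudo_label_for(2, [m1, m2, m3], unlabeled)
```

Here "great case" is dropped because the first two models disagree on it; the other two examples are passed to the third objective with the agreed label.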
Multi-task Tri-training
[Figure: output layers m1, m2, m3 above a BiLSTM tagger (Plank et al., 2016) with character-level (char) and word-level (w1, w2, w3) inputs]
Orthogonality constraint (Bousmalis et al., 2016): $\mathcal{L}_{orth} = \lVert W_{m_1}^{\top} W_{m_2} \rVert_F^2$
Loss: $\mathcal{L}(\theta) = -\sum_i \sum_{1,..,n} \log P_{m_i}(y \mid \vec{h}) + \gamma \mathcal{L}_{orth}$
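A small numeric sketch of the two formulas on this slide may help. It assumes that the outer sum runs over the output layers m_i and the inner sum over n tokens, and that each probability is the one assigned to the gold label; the function names and the list-of-lists matrix representation are illustrative choices, not part of the original work.

```python
import math
from typing import List

Matrix = List[List[float]]


def orthogonality_penalty(w_m1: Matrix, w_m2: Matrix) -> float:
    """Squared Frobenius norm of W_m1^T W_m2."""
    rows = len(w_m1)
    assert rows == len(w_m2), "both matrices need the same number of rows"
    total = 0.0
    for i in range(len(w_m1[0])):        # columns of W_m1
        for j in range(len(w_m2[0])):    # columns of W_m2
            # Entry (i, j) of W_m1^T W_m2 is the dot product of two columns.
            entry = sum(w_m1[k][i] * w_m2[k][j] for k in range(rows))
            total += entry ** 2
    return total


def total_loss(gold_probs: List[List[float]], l_orth: float, gamma: float) -> float:
    """Negative log-likelihood summed over layers and tokens, plus gamma * L_orth.

    gold_probs[i][t] is assumed to hold P_{m_i}(y_t | h_t) for the gold label.
    """
    nll = -sum(math.log(p) for layer in gold_probs for p in layer)
    return nll + gamma * l_orth


# Orthogonal columns give a zero penalty; identical matrices do not.
w_a = [[1.0], [0.0]]
w_b = [[0.0], [1.0]]
penalty_orthogonal = orthogonality_penalty(w_a, w_b)
penalty_identical = orthogonality_penalty(w_a, w_a)
```

The penalty is zero exactly when every column of the first weight matrix is orthogonal to every column of the second, which is what pushes the two output layers toward different representations.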
Data & Tasks
Two tasks:
‣ Sentiment analysis on the Amazon reviews dataset (Blitzer et al., 2006)
‣ POS tagging on the SANCL 2012 dataset (Petrov and McDonald, 2012)
Domains: [figure]
Sentiment Analysis Results
[Bar chart: accuracy (y-axis 75 to 82), averaged over 4 target domains. Systems: VFAE*, DANN*, Asym*, Source only, Self-training, Tri-training, Tri-training-Disagr., MT-Tri]
* result from Saito et al. (2017)
‣ Multi-task tri-training slightly outperforms tri-training, but has higher variance.
POS Tagging Results
Trained on 10% labeled data (WSJ)
[Bar chart: accuracy (y-axis 88.7 to 89.8), averaged over 5 target domains. Systems: Source (+embeds), Self-training, Tri-training, Tri-training-Disagr., MT-Tri]
‣ Tri-training with disagreement works best with little data.
POS Tagging Results
Trained on full labeled data (WSJ)
[Bar chart: accuracy (y-axis 89 to 92), averaged over 5 target domains. Systems: TnT, Stanford*, Source (+embeds), Tri-training, Tri-training-Disagr., MT-Tri]
* result from Schnabel & Schütze (2014)
‣ Tri-training works best in the full data setting.
POS Tagging Analysis
Accuracy on out-of-vocabulary (OOV) tokens
[Chart: % OOV tokens (axis 0 to 11) and accuracy on OOV tokens (axis 50 to 80) per target domain: Answers, Emails, Newsgroups, Reviews, Weblogs. Series: OOV tokens, Src, Tri, MT-Tri]
‣ Classic tri-training works best on OOV tokens.
‣ MT-Tri does worse than the source-only baseline on OOV tokens.
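The two quantities plotted on this slide can be computed along the following lines. This is only a sketch under stated assumptions: a token counts as OOV when it does not occur in the training vocabulary, and the `oov_stats` helper and the toy tags are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple


def oov_stats(
    train_vocab: Set[str],
    tokens: Sequence[str],
    gold: Sequence[str],
    pred: Sequence[str],
) -> Tuple[float, Optional[float]]:
    """Return (share of OOV tokens, tagging accuracy on those tokens).

    Accuracy is None when the test data contains no OOV token.
    """
    oov_idx = [i for i, tok in enumerate(tokens) if tok not in train_vocab]
    rate = len(oov_idx) / len(tokens) if tokens else 0.0
    if not oov_idx:
        return rate, None
    correct = sum(1 for i in oov_idx if gold[i] == pred[i])
    return rate, correct / len(oov_idx)


# Toy example: "lol" and "omg" are unseen in training.
vocab = {"the", "cat"}
tokens = ["the", "lol", "cat", "omg"]
gold = ["DET", "UH", "NOUN", "UH"]
pred = ["DET", "UH", "NOUN", "NOUN"]
rate, acc = oov_stats(vocab, tokens, gold, pred)
```

Running the same function on the predictions of each system (Src, Tri, MT-Tri) per target domain would give one accuracy value per bar in a chart of this kind.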
POS Tagging Analysis
POS accuracy per binned log frequency
[Line chart: accuracy delta vs. src-only baseline (y-axis -0.005 to 0.018) against binned frequency (x-axis 0 to 14). Series: MT-Tri, Tri]
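A per-bin accuracy delta of this kind can be sketched as below. The slide does not state the log base or the corpus used for counting, so this example assumes base 2 and training-data counts, and it skips tokens unseen in training; all function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def accuracy_by_log_freq_bin(
    train_tokens: Sequence[str],
    tokens: Sequence[str],
    gold: Sequence[str],
    pred: Sequence[str],
) -> Dict[int, float]:
    """Tagging accuracy per bin, where bin = floor(log2(training frequency))."""
    freq = Counter(train_tokens)
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for tok, g, p in zip(tokens, gold, pred):
        f = freq[tok]
        if f == 0:
            continue  # unseen tokens have no log frequency; skipped here
        b = f.bit_length() - 1  # exact integer floor(log2(f))
        total[b] += 1
        correct[b] += int(g == p)
    return {b: correct[b] / total[b] for b in total}


def accuracy_delta(model: Dict[int, float], baseline: Dict[int, float]) -> Dict[int, float]:
    """Per-bin difference between a model and the source-only baseline."""
    return {b: model[b] - baseline[b] for b in model if b in baseline}


# Toy data: "a" occurs 4 times in training (bin 2), "b" once (bin 0).
train_tokens = ["a"] * 4 + ["b"]
tokens = ["a", "b", "a", "c"]
gold = ["X", "Y", "X", "Z"]
base_pred = ["X", "N", "N", "Z"]
model_pred = ["X", "Y", "X", "Z"]
delta = accuracy_delta(
    accuracy_by_log_freq_bin(train_tokens, tokens, gold, model_pred),
    accuracy_by_log_freq_bin(train_tokens, tokens, gold, base_pred),
)
```

A positive value in a bin means the model tags words of that frequency range more accurately than the baseline does.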