Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
Poorya Zaremoodi, Wray Buntine, Gholamreza (Reza) Haffari
Monash University
Slides:
Roadmap
• Introduction & background
• Adaptive knowledge sharing in multi-task learning
• Experiments & analysis
• Conclusion
Improving NMT in Low-Resource Scenarios
• NMT is notoriously data-hungry!
• Bilingually low-resource scenario: large amounts of bilingual training data are not available.
• IDEA: use existing resources from other tasks and train one model for all tasks using multi-task learning.
• This effectively injects inductive biases that help improve the generalisation of NMT.
• Auxiliary tasks: semantic parsing, syntactic parsing, named-entity recognition.
Encoders-Decoders for Individual Tasks
[Figure: one encoder-decoder per task]
• Machine translation: "I went home" → "من به خانه رفتم" ("I went home" in Farsi)
• Semantic parsing: "Obama was elected and his voters celebrated" → AMR graph
• Syntactic parsing: "The burglar robbed the apartment" → (S (NP (DT The) (N burglar)) (VP (V robbed) (NP (DT the) (N apartment))))
• Named-entity recognition: "Jim bought 300 shares of Acme Corp. in 2006" → B-PER O O O O B-ORG I-ORG O B-MISC
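All four tasks can be phrased as sequence-to-sequence problems by linearising each output into a token sequence. A minimal sketch of that casting, assuming simple bracketed trees and per-token tags (my illustration, not the authors' code):

```python
# Minimal sketch: cast each auxiliary task as sequence-to-sequence by
# linearising its output into tokens (illustration, not the authors' code).

def linearise_ner(tags):
    # NER target: one tag token per source token, e.g.
    # ["B-PER", "O", "O", "O", "O", "B-ORG", "I-ORG", "O", "B-MISC"]
    return list(tags)

def linearise_tree(tree):
    # Syntactic-parsing target: bracketed, depth-first linearisation.
    if isinstance(tree, str):          # terminal word
        return [tree]
    label, children = tree
    seq = ["(" + label]
    for child in children:
        seq += linearise_tree(child)
    return seq + [")"]

burglar = ("S", [("NP", [("DT", ["The"]), ("N", ["burglar"])]),
                 ("VP", [("V", ["robbed"]),
                         ("NP", [("DT", ["the"]), ("N", ["apartment"])])])])
print(" ".join(linearise_tree(burglar)))
# (S (NP (DT The ) (N burglar ) ) (VP (V robbed ) (NP (DT the ) (N apartment ) ) ) )
```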
Sharing Scenario
[Figure: a single multi-task seq2seq model covering all four encoder-decoder tasks]
• Machine translation: sentence → translation
• Semantic parsing: sentence → semantic graph
• Syntactic parsing: sentence → parse tree
• Named-entity recognition: sentence → tag sequence
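A single model can serve all tasks if each example announces which task it belongs to. A hedged sketch of that convention, assuming the task tag is prepended on the source side (the "<translation>" symbol on the next slide suggests such a tag, but the exact placement here is my assumption):

```python
# Sketch: prefix each example with a task tag so one multi-task seq2seq
# model can tell the tasks apart. Tag placement is an assumption.
def tag_example(task, source_tokens, target_tokens):
    return ["<%s>" % task] + source_tokens, target_tokens

src, tgt = tag_example("translation", "I went home".split(),
                       "من به خانه رفتم".split())
print(src)  # ['<translation>', 'I', 'went', 'home']
```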
Partial Parameter Sharing
[Figure: a 3-layer stacked encoder-decoder translating "<translation> I went home" → "من به خانه رفتم"; some of the stacked layers of hidden states (h in the encoder, g in the decoder) are shared across tasks, the rest are task-specific]
Zaremoodi & Haffari, NAACL 2018
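A hedged PyTorch sketch of partial sharing in a stacked encoder, assuming one shared layer below one task-specific layer (the class name, dimensions and the exact sharing pattern are my assumptions, not the released code):

```python
import torch.nn as nn

# Sketch: a stacked recurrent encoder where the bottom layer is shared by
# all tasks and the top layer is private per task (pattern is an assumption).
class PartiallySharedEncoder(nn.Module):
    def __init__(self, tasks, emb_dim=400, hid_dim=400):
        super().__init__()
        self.shared = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.private = nn.ModuleDict(
            {t: nn.GRU(hid_dim, hid_dim, batch_first=True) for t in tasks})

    def forward(self, task, embedded):    # embedded: (batch, time, emb_dim)
        h, _ = self.shared(embedded)      # parameters reused by every task
        h, _ = self.private[task](h)      # task-specific layer on top
        return h
```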
Roadmap
• Introduction & background
• Adaptive knowledge sharing in multi-task learning
• Experiments & analysis
• Conclusion
Adaptive Knowledge Sharing in MTL
• Sharing the parameters of the recurrent units among all tasks leads to:
  ▪ task interference;
  ▪ inability to leverage commonalities among subsets of tasks.
• IDEA: share knowledge by controlling the information flow in the hidden states:
  ▪ multiple experts, each handling a different kind of information;
  ▪ adaptively share the experts among the tasks.
Adaptive Knowledge Sharing in MTL
• IDEA:
  ▪ multiple experts, each handling a different kind of information;
  ▪ adaptively share the experts among the tasks.
• Extend the recurrent units with multiple blocks; each block has its own information flow through time.
• Routing mechanism: softly directs the input to these blocks (sketched after the next slide).
[Figure: a recurrent unit split into blocks, with per-task routing]
Adaptive Knowledge Sharing
[Figure: the proposed recurrent unit; at each time step a routing network computes soft weights over the blocks, and each block keeps its own hidden state]
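A hedged PyTorch sketch of such a block-based unit: several GRU blocks with independent states, and a router that softly directs the input to them. How the router is conditioned and how the weights are applied are my assumptions; this is a sketch of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the block-based recurrent unit: each block keeps its own hidden
# state through time; a routing network softly directs the input to blocks.
class RoutedRecurrentUnit(nn.Module):
    def __init__(self, input_dim, hid_dim, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.GRUCell(input_dim, hid_dim) for _ in range(n_blocks)])
        self.router = nn.Linear(input_dim + hid_dim, n_blocks)

    def forward(self, x, states):
        # x: (batch, input_dim); states: one (batch, hid_dim) tensor per block
        summary = torch.stack(states).mean(dim=0)
        weights = F.softmax(
            self.router(torch.cat([x, summary], dim=-1)), dim=-1)
        # "softly direct the input": block i sees the input scaled by weight i
        return [blk(weights[:, i:i + 1] * x, h)
                for i, (blk, h) in enumerate(zip(self.blocks, states))]
```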
Adaptive Knowledge Sharing
We use the proposed recurrent unit inside both the encoder and the decoder.
[Figure: the encoder-decoder for "<translation> I went home" → "من به خانه رفتم", now built from the routed recurrent units]
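A usage sketch, running the unit above as a unidirectional encoder over an embedded sentence (dimensions borrowed from the experiments slide; the loop itself is my illustration):

```python
import torch

unit = RoutedRecurrentUnit(input_dim=400, hid_dim=400, n_blocks=3)
batch, steps = 2, 5
x = torch.randn(batch, steps, 400)                     # embedded source tokens
states = [torch.zeros(batch, 400) for _ in range(3)]   # one state per block
for t in range(steps):                                 # unroll through time
    states = unit(x[:, t], states)
context = torch.stack(states).sum(dim=0)               # combine block states
```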
Roadmap
• Introduction & background
• Adaptive knowledge sharing in multi-task learning
• Experiments & analysis
• Conclusion
Experiments
• Language pairs: English to Farsi / Vietnamese
• Datasets:
  ▪ English to Farsi: TED corpus & LDC2016E93
  ▪ English to Vietnamese: IWSLT 2015 (TED and TEDx talks)
  ▪ Semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations)
  ▪ Syntactic parsing: Penn Treebank
  ▪ NER: CoNLL NER corpus (newswire articles from the Reuters Corpus)
• NMT architecture: GRUs for the blocks; 400-dimensional RNN hidden states and word embeddings
• NMT best practice:
  ▪ Optimisation: Adam
  ▪ Byte-pair encoding (BPE) on both source and target
  ▪ Evaluation metrics: PPL, TER and BLEU
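For scoring, any standard BLEU implementation will do; a tiny sketch with sacrebleu (my choice of tool here, not necessarily what was used for the reported numbers):

```python
import sacrebleu  # assumption: sacrebleu as the BLEU scorer

hypotheses = ["من به خانه رفتم"]
references = [["من به خانه رفتم"]]  # one reference stream
print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 here
```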
Experiments
[Figure: BLEU scores for English → Farsi and English → Vietnamese; the bar values are not recoverable from the extraction]
Experiments (English to Farsi)
[Figure: average usage of Block 1, Block 2 and Block 3 by each task (MT, semantic parsing, syntactic parsing, NER)]
• Average block usage shows blocks specialisation. Block 1: MT and semantic parsing; Block 2: syntactic/semantic parsing; Block 3: NER.
Conclusion
• We address the task-interference issue in MTL by:
  ▪ extending the recurrent units with multiple blocks;
  ▪ using a trainable routing network to adaptively share them among tasks.
Questions?
Paper: