

  1. Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation. Poorya Zaremoodi, Wray Buntine, Gholamreza (Reza) Haffari. Monash University. Slides:

  2. Roadmap • Introduction & background • Adaptive knowledge sharing in multi-task learning • Experiments & analysis • Conclusion

  3. Improving NMT in Low-Resource Scenarios • NMT is notoriously data-hungry • Bilingually low-resource scenario: large amounts of bilingual training data are not available • IDEA: use existing resources from other tasks and train one model for all tasks using multi-task learning • This effectively injects inductive biases that help improve the generalisation of NMT • Auxiliary tasks: semantic parsing, syntactic parsing, named-entity recognition
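Training one model on several tasks typically boils down to interleaving mini-batches from the different tasks. A minimal sketch of such a task-sampling schedule in Python (the task names, weights, and `mtl_schedule` helper are invented for illustration, not taken from the paper):

```python
import random

def mtl_schedule(tasks, steps, seed=0):
    """Return a task schedule that interleaves the main and auxiliary tasks.

    tasks: dict mapping task name -> sampling weight.
    """
    rng = random.Random(seed)
    names = list(tasks)
    weights = [tasks[n] for n in names]
    # Sample one task name per training step, proportional to its weight.
    return [rng.choices(names, weights=weights)[0] for _ in range(steps)]

# The translation task gets a higher sampling weight than the auxiliary tasks
# (these particular weights are made up for the example).
schedule = mtl_schedule(
    {"mt": 0.5, "semantic": 0.2, "syntactic": 0.2, "ner": 0.1}, steps=1000
)
```

At training time, each step would then draw a mini-batch from the scheduled task and update the (partially shared) model on it.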

  4. Encoders-Decoders for Individual Tasks • [Figure: a separate encoder-decoder pair for each task] • Machine translation: "I went home" → Farsi "من به خانه رفتم" • Semantic parsing: "Obama was elected and his voter celebrated" → semantic graph • Syntactic parsing: "The burglar robbed the apartment" → parse tree • Named-entity recognition: "Jim bought 300 shares of Acme Corp. in 2006" → B-PER 0 0 0 0 B-ORG I-ORG 0 B-MISC

  5. Sharing Scenario • [Figure: one multi-task seq2seq model serves all tasks] • Machine translation: sentence → translation • Semantic parsing: sentence → semantic graph • Syntactic parsing: sentence → parse tree • Named-entity recognition: sentence → tag sequence

  6. Partial Parameter Sharing • [Figure: stacked encoder-decoder translating "<translation> I went home" into Farsi; some recurrent layers are shared across tasks while others remain task-specific] • Zaremoodi & Haffari, NAACL 2018
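The idea of partial parameter sharing, i.e. keeping some layers task-specific while others are shared across all tasks, can be sketched with a toy two-layer encoder. All names, shapes, and the choice of which layer is shared are invented for this sketch, not the paper's configuration:

```python
import numpy as np

def build_params(tasks, hidden=8, seed=0):
    """One task-specific layer per task, plus one layer shared by all tasks."""
    rng = np.random.default_rng(seed)
    params = {"shared": rng.normal(size=(hidden, hidden))}
    for t in tasks:
        params[t] = rng.normal(size=(hidden, hidden))
    return params

def encode(params, task, x):
    # First pass through the task-specific layer, then through the
    # layer whose parameters are shared among all tasks.
    h = np.tanh(x @ params[task])
    return np.tanh(h @ params["shared"])

params = build_params(["mt", "semantic", "syntactic", "ner"])
x = np.ones(8)
h_mt = encode(params, "mt", x)    # uses mt-specific + shared weights
h_ner = encode(params, "ner", x)  # uses ner-specific + shared weights
```

Gradients from every task flow into `params["shared"]`, which is how the auxiliary tasks influence the translation model.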

  7. Roadmap • Introduction & background • Adaptive knowledge sharing in multi-task learning • Experiments & analysis • Conclusion

  8. Adaptive Knowledge Sharing in MTL • Sharing the parameters of the recurrent units among all tasks causes task interference and an inability to leverage commonalities among subsets of tasks • IDEA: share knowledge by controlling the information flow in the hidden states • Multiple experts handle different kinds of information • Experts are adaptively shared among the tasks

  9. Adaptive Knowledge Sharing in MTL • IDEA: multiple experts handle different kinds of information, and experts are adaptively shared among the tasks • Extend the recurrent units with multiple blocks; each block has its own information flow through time • Routing mechanism: softly directs the input to these blocks
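A recurrent unit with multiple blocks and a soft router can be sketched as below. Plain tanh blocks stand in for the GRU blocks used in the paper, and every weight name and initialisation choice here is made up for the illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class RoutedCell:
    """Recurrent unit with n_blocks blocks, each keeping its own hidden state,
    and a routing network that softly directs the input to the blocks."""

    def __init__(self, n_blocks, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_blocks, d_in, d_h))
        self.W_h = rng.normal(scale=0.1, size=(n_blocks, d_h, d_h))
        self.W_route = rng.normal(scale=0.1, size=(d_in, n_blocks))

    def step(self, x, h):
        # h has shape (n_blocks, d_h): one hidden state per block.
        gates = softmax(x @ self.W_route)  # soft routing weights over blocks
        h_cand = np.tanh(
            np.einsum("i,bij->bj", x, self.W_in)
            + np.einsum("bi,bij->bj", h, self.W_h)
        )
        # Each block's state is updated in proportion to its routing weight,
        # so lightly-routed blocks mostly keep their previous state.
        h = gates[:, None] * h_cand + (1 - gates)[:, None] * h
        return h, gates

cell = RoutedCell(n_blocks=3, d_in=4, d_h=5)
h = np.zeros((3, 5))
for x in np.eye(4):  # a toy 4-step input sequence
    h, gates = cell.step(x, h)
```

Because the routing weights are a softmax, they are differentiable, so the router can be trained jointly with the blocks; different tasks can then learn to route their inputs to different subsets of blocks.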

  10. Adaptive Knowledge Sharing • [Figure: the proposed recurrent unit at time t, with a routing network softly directing the input to multiple blocks]

  11. Adaptive Knowledge Sharing • We use the proposed recurrent unit inside both the encoder and the decoder • [Figure: encoder-decoder translating "I went home" into Farsi, with routed blocks at each timestep]

  12. Roadmap • Introduction & background • Adaptive knowledge sharing in multi-task learning • Experiments & analysis • Conclusion

  13. Experiments • Language pairs: English to Farsi/Vietnamese • Datasets: English-Farsi: TED corpus & LDC2016E93; English-Vietnamese: IWSLT 2015 (TED and TEDX talks); semantic parsing: AMR corpus (newswire, weblogs, web discussion forums and broadcast conversations); syntactic parsing: Penn Treebank; NER: CoNLL NER corpus (newswire articles from the Reuters Corpus) • NMT architecture: GRU for blocks, 400-dimensional RNN hidden states and word embeddings • NMT best practice: Adam optimiser; Byte Pair Encoding (BPE) on both source and target • Evaluation metrics: PPL, TER and BLEU
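BPE, mentioned in the setup above, builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. A toy sketch of one merge step in pure Python (real NMT pipelines use an existing implementation such as subword-nmt; the tiny vocabulary here is invented):

```python
from collections import Counter

def most_frequent_pair(words):
    """words: dict mapping a symbol tuple to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

vocab = {("l", "o", "w", "</w>"): 5, ("l", "o", "w", "e", "r", "</w>"): 2}
pair = most_frequent_pair(vocab)   # ("l", "o"): frequency 5 + 2 = 7
vocab = merge_pair(vocab, pair)
```

Repeating this for a fixed number of merges yields the subword vocabulary applied to both source and target sides.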

  14. Experiments • [Figure: BLEU scores for English → Farsi and English → Vietnamese]

  15. Experiments (English to Farsi) • [Figure: average block usage per task (MT, semantic parsing, syntactic parsing, NER), with bars for Blocks 1-3] • Blocks specialise: Block 1: MT and semantic parsing; Block 2: syntactic/semantic parsing; Block 3: NER
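A block-usage analysis like the one above can be computed by averaging the router's softmax weights over all timesteps of a task's data. A small NumPy sketch (the routing weights here are random toy values, not the paper's measurements):

```python
import numpy as np

def average_block_usage(routing_weights):
    """routing_weights: (timesteps, n_blocks) array whose rows sum to 1."""
    return routing_weights.mean(axis=0)

# Toy stand-in for one task's routing decisions over 100 timesteps.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 3))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

usage = average_block_usage(weights)  # one usage value per block
```

Computing this separately per task and comparing the profiles reveals which blocks each task relies on, i.e. the specialisation pattern reported on the slide.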

  16. Conclusion • We address the task-interference issue in MTL by extending the recurrent units with multiple blocks and a trainable routing network

  17. Questions? Paper:
