

  1. Context to Sequence: Typical Frameworks and Applications. Piji Li, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. FDU-CUHK, 2017.

  2. Outline: 1. Introduction; 2. Frameworks (Overview, Teacher Forcing, Adversarial, Reinforce, Tricks); 3. Applications; 4. Conclusions.

  3. Introduction

  4. Introduction. Typical ctx2seq frameworks have obtained significant improvements in: neural machine translation; abstractive text summarization; dialog/conversation systems (chatbots); caption generation for images and videos. Various strategies exist to train a better ctx2seq model: improving teacher forcing; adversarial training; reinforcement learning; tricks (copy, coverage, dual training, etc.); plus interesting applications.

  5. Frameworks

  6. Outline: 1. Introduction; 2. Frameworks (Overview, Teacher Forcing, Adversarial, Reinforce, Tricks); 3. Applications; 4. Conclusions.

  7. Overview. Figure 1: Seq2seq framework with attention mechanism and teacher forcing. (Figure source: https://github.com/OpenNMT)

  8. Outline: 1. Introduction; 2. Frameworks (Overview, Teacher Forcing, Adversarial, Reinforce, Tricks); 3. Applications; 4. Conclusions.

  9. Teacher Forcing. Feed the ground-truth sample y_t back into the model to be conditioned on for the prediction of later outputs. Advantages: forces the decoder to stay close to the ground-truth sequence; faster convergence. Disadvantages: at prediction time the model must rely on sampling, greedy decoding, or beam search, so there is a mismatch between training and testing, and errors accumulate during the decoding phase.
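For concreteness, a minimal sketch of a teacher-forced decoding loop (assumed PyTorch; the names embed, decoder_cell, and project are illustrative, not from the slides). The key point is that the gold token, not the model's own prediction, is fed into the next step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, emb_dim, hid_dim = 1000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)
decoder_cell = nn.GRUCell(emb_dim, hid_dim)
project = nn.Linear(hid_dim, vocab_size)

def teacher_forced_loss(targets, h, bos_id=0):
    """targets: (T,) gold token ids; h: (1, hid_dim) initial decoder state."""
    loss = 0.0
    prev = torch.tensor([bos_id])
    for t in range(targets.size(0)):
        h = decoder_cell(embed(prev), h)      # one decoder step
        logits = project(h)                   # (1, vocab_size)
        loss = loss + F.cross_entropy(logits, targets[t:t+1])
        prev = targets[t:t+1]                 # feed the ground truth back
    return loss / targets.size(0)
```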

  10. Teacher Forcing - Improving the Performance. Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. "Scheduled sampling for sequence prediction with recurrent neural networks." NIPS, 2015. [Google Research] Lamb, Alex M., Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. "Professor forcing: A new algorithm for training recurrent networks." NIPS, 2016. [University of Montreal] Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical reparameterization with Gumbel-Softmax." ICLR, 2017. Gu, Jiatao, Daniel Jiwoong Im, and Victor O.K. Li. "Neural Machine Translation with Gumbel-Greedy Decoding." arXiv, 2017.

  11. Teacher Forcing. Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. "Scheduled sampling for sequence prediction with recurrent neural networks." NIPS, 2015. [Google Research]

  12. Teacher Forcing - Scheduled Sampling [1] - Framework. Overview of the scheduled sampling method. Figure 2: Illustration of the scheduled sampling approach, where one flips a coin at every time step to decide whether to use the true previous token or one sampled from the model itself. [1]
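A hedged sketch of that per-step coin flip (assumed PyTorch; eps is the probability of keeping the gold token, decayed over training with the linear, exponential, or inverse-sigmoid schedules from [1]):

```python
import torch

def choose_prev_token(gold_token, logits, eps):
    """gold_token: (1,) true previous token; logits: (1, vocab_size)."""
    if torch.rand(1).item() < eps:
        return gold_token                           # use the true previous token
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).squeeze(-1)  # sample from the model itself
```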

  13. Teacher Forcing - Scheduled Sampling [1] - Experiments. Results on image captioning (MSCOCO) and constituency parsing (WSJ Section 22).

  14. Teacher Forcing. Lamb, Alex M., Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. "Professor forcing: A new algorithm for training recurrent networks." NIPS, 2016. [University of Montreal]

  15. Teacher Forcing - Professor Forcing [3] - Framework. Architecture of Professor Forcing. Figure 3: Match the dynamics of free running with teacher forcing. [3]

  16. Teacher Forcing - Professor Forcing [3] - Adversarial Training. Adversarial training paradigm: the discriminator is a Bi-RNN + MLP over the decoder's behavior (hidden-state) sequences. D: trained to distinguish teacher-forced dynamics from free-running dynamics. G: trained with the usual likelihood objective plus a term that pushes free-running dynamics to fool D.
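The D and G objectives on this slide did not survive extraction; a rough sketch under the paper's setup, assuming a hypothetical discriminator D that maps a sequence of decoder hidden states to the probability it was produced under teacher forcing:

```python
import torch

def discriminator_loss(D, h_teacher, h_free):
    # D should output ~1 on teacher-forced dynamics, ~0 on free-running ones.
    return -(torch.log(D(h_teacher)) + torch.log(1.0 - D(h_free))).mean()

def generator_fooling_loss(D, h_free):
    # Added to the usual NLL term: make free-running hidden-state
    # dynamics indistinguishable from teacher-forced ones.
    return -torch.log(D(h_free)).mean()
```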

  17. Teacher Forcing - Professor Forcing [3] - Experiments. Character-level language modeling, Penn Treebank. Figure 4: Training negative log-likelihood. The training cost decreases faster, but wall-clock training time is about three times longer.

  18. Teacher Forcing. Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical reparameterization with Gumbel-Softmax." ICLR, 2017. Gu, Jiatao, Daniel Jiwoong Im, and Victor O.K. Li. "Neural Machine Translation with Gumbel-Greedy Decoding." arXiv, 2017.

  19. Teacher Forcing - Gumbel-Softmax [2]. The Gumbel-Max trick (Gumbel, 1954) provides a simple and efficient way to draw samples z from a categorical distribution with class probabilities π: z = one_hot(argmax_i [g_i + log π_i]), where each g_i is i.i.d. Gumbel(0, 1) noise, obtained by drawing u ∼ Uniform(0, 1) and setting g = −log(−log(u)). Replacing the argmax with a temperature-τ softmax, y_i = exp((log π_i + g_i)/τ) / Σ_j exp((log π_j + g_j)/τ), gives the Gumbel-Softmax, which is differentiable and interpolates between a softmax and a one-hot sample. Example: char-RNN.
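A minimal NumPy sketch of both tricks, matching the formulas above:

```python
import numpy as np

def sample_gumbel(shape):
    u = np.random.uniform(size=shape)
    return -np.log(-np.log(u))                  # g ~ Gumbel(0, 1)

def gumbel_max(log_pi):
    """Exact categorical sample via the Gumbel-Max trick."""
    return np.argmax(log_pi + sample_gumbel(log_pi.shape))

def gumbel_softmax(log_pi, tau=0.5):
    """Differentiable relaxation; approaches one-hot as tau -> 0."""
    y = (log_pi + sample_gumbel(log_pi.shape)) / tau
    e = np.exp(y - y.max())
    return e / e.sum()
```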

  20. Teacher Forcing - Discussion. Teacher forcing is good enough. Teacher forcing is indispensable.

  21. Outline: 1. Introduction; 2. Frameworks (Overview, Teacher Forcing, Adversarial, Reinforce, Tricks); 3. Applications; 4. Conclusions.

  22. Adversarial Training. Generative Adversarial Network (GAN). (Figure source: https://goo.gl/uPxWTs)
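For reference, the standard minimax objective the GAN figure depicts (Goodfellow et al., 2014):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```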

  23. Adversarial Training. Bahdanau, Dzmitry, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. "An actor-critic algorithm for sequence prediction." arXiv, 2016. (Basic work; connects actor-critic with GANs.) Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI, 2017. Li, Jiwei, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. "Adversarial learning for neural dialogue generation." EMNLP, 2017. Wu, Lijun, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. "Adversarial Neural Machine Translation." arXiv, 2017.

  24. Adversarial Training - SeqGAN [9]. Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI, 2017.

  25. Adversarial Training - SeqGAN [9] - Framework. Overview of the framework. Figure 5: Left: D is trained over the real data and the data generated by G. Right: G is trained by policy gradient, where the final reward signal is provided by D and is passed back to the intermediate action values via Monte Carlo search. [9]

  26. Adversarial Training - SeqGAN [9] - Training. Discriminator: CNN with highway layers. Training via policy gradient: (1) pre-train the generator and discriminator; (2) adversarial training.
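A hedged sketch of the policy-gradient (REINFORCE) step: per-step rewards come from the discriminator via Monte Carlo rollouts, abstracted here as a precomputed rewards sequence since the rollout machinery is beyond a short sketch:

```python
import torch

def reinforce_loss(log_probs, rewards):
    """log_probs: per-step log p(y_t | y_<t) for the generated tokens;
    rewards: per-step scalars from D via Monte Carlo rollouts."""
    loss = torch.zeros(())
    for lp, r in zip(log_probs, rewards):
        loss = loss - lp * r       # gradient ascent on E[r * log p]
    return loss / len(log_probs)
```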

  27. Adversarial Training - SeqGAN [9] - Experiments. Results on three tasks. Policy gradient is also used in: Wang, Jun, et al. "IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models." SIGIR, 2017.

  28. Adversarial Training - Adversarial Dialog [4]. Li, Jiwei, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. "Adversarial learning for neural dialogue generation." EMNLP, 2017.

  29. Adversarial Training - Adversarial Dialog [4] - Framework. G: seq2seq. D: a hierarchical recurrent encoder. Training: policy gradient, with teacher forcing added back.

  30. Adversarial Training - Adversarial NMT [8]. Wu, Lijun, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. "Adversarial Neural Machine Translation." arXiv, 2017.

  31. Adversarial Training - Adversarial NMT [8] - Framework. G: seq2seq. D: CNN. Training: policy gradient.

  32. Adversarial Training - Adversarial NMT [8] - Experiments. Figure 6: Different NMT systems' performances on En → Fr translation.

  33. Adversarial Training - Discussion. Useful for fine-tuning; produces more robust models; difficult to train.

  34. Outline: 1. Introduction; 2. Frameworks (Overview, Teacher Forcing, Adversarial, Reinforce, Tricks); 3. Applications; 4. Conclusions.

  35. Tricks. Copy mechanism (see the sketch below); coverage or diversity; dual learning or reconstruction; CNN-based seq2seq.
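As an illustration of the first trick, a minimal sketch of a pointer-generator style copy mixture (in the spirit of See et al., 2017; p_gen, attn, and src_ids are illustrative names, not from the slides). The final distribution mixes the decoder's vocabulary distribution with the attention distribution over source tokens: P(w) = p_gen * P_vocab(w) + (1 − p_gen) * Σ_{i: x_i = w} a_i.

```python
import torch

def copy_mixture(p_vocab, attn, src_ids, p_gen):
    """p_vocab: (V,) vocab distribution; attn: (S,) attention over source
    positions; src_ids: (S,) source token ids; p_gen: scalar in [0, 1]."""
    out = p_gen * p_vocab
    # Scatter the copy probability mass onto the source tokens' vocab ids.
    return out.index_add(0, src_ids, (1.0 - p_gen) * attn)
```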
