SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient


  1. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Lantao Yu†, Weinan Zhang†, Jun Wang‡, Yong Yu†. †Shanghai Jiao Tong University, ‡University College London

  2. Attribution. Multiple slides taken from: Hung-yi Lee, Paarth Neekhara, Ruirui Li, and the original authors at AAAI 2017. Presented by: Pratyush Maini

  3. Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation

  4. All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo More than 500 species in the zoo (not updated since 2018.09)

  5. All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo GAN, ACGAN, BGAN, CGAN, DCGAN, EBGAN, fGAN, GoGAN, …… Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, "Variational Approaches for Auto-Encoding Generative Adversarial Networks", arXiv, 2017

  6. Three Categories of GAN. 1. Generation: random vector → Generator → image. 2. Conditional Generation: text (e.g. "blue eyes, red hair, short hair") → Generator → image of a girl with red hair, learned from paired text-image data. 3. Unsupervised Conditional Generation: object in domain x → Generator → object in domain y (e.g. photo → Vincent van Gogh's style), learned from unpaired data.

  7. Anime Face Generation. [Example faces drawn by the generator; demo powered by http://mattya.github.io/chainer-DCGAN/]

  8. Basic Idea of GAN. The generator is a neural network (NN), i.e. a function: it maps a low-dimensional input vector to a high-dimensional output vector (an image). Each dimension of the input vector represents some characteristic of the output, e.g. changing one dimension gives longer hair, another an open mouth, another blue hair.

  9. Basic Idea of GAN. The discriminator is also a neural network (a function): it maps an image to a scalar. A larger value means the image looks real (e.g. 1.0); a smaller value means it looks fake (e.g. 0.1).
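To make the two functions concrete, here is a minimal PyTorch sketch; the layer sizes and architectures are illustrative assumptions, not from the slides.

```python
# Minimal sketch: generator and discriminator are just neural networks,
# i.e. functions. Sizes and architectures here are illustrative assumptions.
import torch
import torch.nn as nn

LATENT_DIM, IMG_DIM = 100, 64 * 64 * 3   # hypothetical dimensions

# Generator: low-dimensional random vector -> high-dimensional image vector.
G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),
)

# Discriminator: image vector -> scalar in (0, 1);
# larger value means "real", smaller value means "fake".
D = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(1, LATENT_DIM)   # each input dimension steers a characteristic
fake_image = G(z)                # e.g. hair length, mouth, hair color
realness = D(fake_image)         # scalar score
```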

  10. Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation

  11. Algorithm. Initialize generator G and discriminator D. In each training iteration, Step 1: fix generator G, and update discriminator D. Sample real objects from the database, and feed randomly sampled vectors through the fixed G to produce generated objects. The discriminator learns to assign high scores (1) to real objects and low scores (0) to generated objects.

  12. Algorithm. Step 2: fix discriminator D, and update generator G. The generator learns to "fool" the discriminator: G followed by D forms one large network whose output is a scalar score (e.g. 0.13), and G's parameters are updated by gradient ascent on that score while D's parameters stay fixed.

  13. Algorithm. Both steps in each iteration: sample some real objects and generate some fake objects; update D so the real objects score 1 and the fake objects score 0 (G fixed); then update G so that its generated images get high scores from D (D fixed). A code sketch of the full loop follows.
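A minimal sketch of this alternating loop, reusing G, D, and LATENT_DIM from the sketch above; the optimizers, learning rate, and `data_loader` (assumed to yield batches of real images flattened to IMG_DIM) are assumptions, not given in the slides.

```python
# Sketch of the alternating GAN training loop on slides 11-13.
# Reuses G, D, LATENT_DIM from the earlier sketch; `data_loader` is assumed
# to yield batches of real images flattened to IMG_DIM.
import torch
import torch.nn.functional as F

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

for real in data_loader:
    b = real.size(0)
    z = torch.randn(b, LATENT_DIM)

    # Step 1: fix G, update D. Real objects -> 1, generated objects -> 0.
    fake = G(z).detach()                       # detach keeps G fixed
    loss_D = (F.binary_cross_entropy(D(real), torch.ones(b, 1))
              + F.binary_cross_entropy(D(fake), torch.zeros(b, 1)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 2: fix D, update G. G learns to make D output large scores.
    loss_G = F.binary_cross_entropy(D(G(z)), torch.ones(b, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```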

  14. Anime Face Generation 100 updates Source of training data: https://zhuanlan.zhihu.com/p/24767059

  15. Anime Face Generation 1000 updates

  16. Anime Face Generation 2000 updates

  17. Anime Face Generation 5000 updates

  18. Anime Face Generation 10,000 updates

  19. Anime Face Generation 20,000 updates

  20. Anime Face Generation 50,000 updates

  21. In 2019, with StyleGAN …… Source of video: https://www.gwern.net/Faces

  22. The first GAN [Ian J. Goodfellow, et al., NIPS, 2014]

  23. Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 1. Reinforcement Learning 2. GAN + RL 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation

  24. NLP tasks usually involve Sequence Generation. How can we use GAN to improve sequence generation?

  25. Reinforcement Learning. Learn to maximize the expected reward, e.g. with policy gradient. Chatbot example: an encoder-decoder maps the input sentence c "How are you?" to a response sentence x; the response "Not bad" gets reward +1, "I'm John" gets reward -1. A human provides the reward R(c, x) for each (input, response) pair. [Li, et al., EMNLP, 2016]

  26. Policy Gradient. Update rule: $\theta^{t+1} \leftarrow \theta^{t} + \eta\,\nabla\bar{R}_{\theta^{t}}$, with $\nabla\bar{R}_{\theta} \approx \frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\,\nabla\log P_\theta(x^i \mid c^i)$ over sampled pairs $(c^1, x^1), (c^2, x^2), \dots, (c^N, x^N)$. If $R(c^i, x^i)$ is positive, update $\theta$ to increase $P_\theta(x^i \mid c^i)$; if $R(c^i, x^i)$ is negative, update $\theta$ to decrease $P_\theta(x^i \mid c^i)$.
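A minimal sketch of one such update in PyTorch; the chatbot model's sampling interface (`model.sample_with_log_prob`) and the reward function are hypothetical placeholders.

```python
# Sketch of one policy-gradient step:
#   theta <- theta + eta * (1/N) * sum_i R(c_i, x_i) * grad log P_theta(x_i | c_i)
# `model.sample_with_log_prob` and `reward_fn` are hypothetical placeholders.
def policy_gradient_step(model, optimizer, contexts, reward_fn):
    N, loss = len(contexts), 0.0
    for c in contexts:
        x, log_prob = model.sample_with_log_prob(c)  # x ~ P_theta(. | c)
        R = reward_fn(c, x)                          # e.g. +1 / -1 from a human
        loss = loss - R * log_prob   # minimizing this ascends expected reward
    optimizer.zero_grad()
    (loss / N).backward()
    optimizer.step()
```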

  27. Policy Gradient vs. Maximum Likelihood.
  Objective function: maximum likelihood maximizes $\frac{1}{N}\sum_{i=1}^{N}\log P_\theta(\hat{x}^i \mid c^i)$; policy gradient maximizes $\frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\,\log P_\theta(x^i \mid c^i)$.
  Gradient: $\frac{1}{N}\sum_{i=1}^{N}\nabla\log P_\theta(\hat{x}^i \mid c^i)$ vs. $\frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\,\nabla\log P_\theta(x^i \mid c^i)$.
  Training data: a fixed dataset $\{(c^1,\hat{x}^1),\dots,(c^N,\hat{x}^N)\}$ vs. pairs $\{(c^1,x^1),\dots,(c^N,x^N)\}$ obtained from interaction, weighted by $R(c^i, x^i)$. Maximum likelihood is the special case where every example has weight $R(c^i,\hat{x}^i)=1$.

  28. Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 1. Reinforcement Learning 2. GAN + RL 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation

  29. Why do we need GAN? Chatbot as example: a seq2seq encoder-decoder is trained on dialogue pairs (e.g. A: "How are you?" B: "I'm good.") by maximizing likelihood. For the input sentence c "How are you?", the maximum-likelihood criterion calls an output "better" when it is closer to the training response "I'm good.", while a human judges directly which output sentence x (e.g. "Not bad" vs. "I'm John") is better. The training criterion and human judgment of "better" need not agree.

  30. Conditional GAN. Replace human evaluation with machine evaluation: a discriminator takes the input sentence c and the chatbot's response sentence x (e.g. "I am busy.") and provides the reward R(c, x). However, there is an issue when you train your generator this way. [Li, et al., EMNLP, 2017]

  31. Three Categories of Solutions. Gumbel-softmax: [Matt J. Kusner, et al., arXiv, 2016][Weili Nie, et al., ICLR, 2019]. Continuous Input for Discriminator: [Sai Rajeswar, et al., arXiv, 2017][Ofir Press, et al., ICML workshop, 2017][Zhen Xu, et al., EMNLP, 2017][Alex Lamb, et al., NIPS, 2016][Yizhe Zhang, et al., ICML, 2017]. Reinforcement Learning: [Yu, et al., AAAI, 2017][Li, et al., EMNLP, 2017][Tong Che, et al., arXiv, 2017][Jiaxian Guo, et al., AAAI, 2018][Kevin Lin, et al., NIPS, 2017][William Fedus, et al., ICLR, 2018].

  32. Continuous Input for Discriminator. Use the generator's output distribution over words, rather than sampled tokens (A, B, ...), as the input of the discriminator. This avoids the sampling process, so we can now do backpropagation end to end (decoding starts from <BOS>) and update the generator's parameters directly. A sketch follows.
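A minimal sketch of this idea, assuming a GRU decoder and a small vocabulary (sizes are hypothetical): at each step the softmax distribution, not a sampled token, is both fed back into the decoder and handed to the discriminator, so gradients flow all the way through.

```python
# Sketch of "use the distribution as discriminator input": no sampling step,
# so backpropagation goes straight through. Sizes and modules are assumptions.
import torch
import torch.nn as nn

VOCAB, HIDDEN, T = 1000, 128, 20            # hypothetical sizes

decoder = nn.GRU(VOCAB, HIDDEN, batch_first=True)
to_vocab = nn.Linear(HIDDEN, VOCAB)

def generate_soft_sequence(batch=4):
    """Return per-step word distributions, shape (batch, T, VOCAB)."""
    inp = torch.zeros(batch, 1, VOCAB)      # <BOS> placeholder input
    h, steps = None, []
    for _ in range(T):
        out, h = decoder(inp, h)
        probs = torch.softmax(to_vocab(out), dim=-1)
        steps.append(probs)
        inp = probs                          # feed the distribution back
    return torch.cat(steps, dim=1)           # goes to the discriminator as-is
```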

  33. What is the problem? Each word of a real sentence is an exact 1-hot vector, while each step of the generated output is a soft distribution (e.g. 0.9/0.1) that can never be 1-hot. The discriminator can therefore immediately find the difference, regardless of the content. A discriminator with a constraint (e.g. WGAN) can be helpful.
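A tiny numerical illustration of why this fails: the per-position maximum alone already separates the two, whatever the sentences say.

```python
# Illustration: real rows are exactly 1-hot (max = 1.0); softmax outputs are
# soft (max < 1.0), so a discriminator separates them trivially.
import torch

real = torch.eye(1000)[torch.randint(1000, (20,))]   # 20 one-hot word vectors
gen = torch.softmax(torch.randn(20, 1000), dim=-1)   # 20 generated distributions

print(real.max(dim=-1).values)   # all exactly 1.0
print(gen.max(dim=-1).values)    # all strictly below 1.0
```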

  34. Three Categories of Solutions (repeated, now focusing on the third). Gumbel-softmax: [Matt J. Kusner, et al., arXiv, 2016][Weili Nie, et al., ICLR, 2019]. Continuous Input for Discriminator: [Sai Rajeswar, et al., arXiv, 2017][Ofir Press, et al., ICML workshop, 2017][Zhen Xu, et al., EMNLP, 2017][Alex Lamb, et al., NIPS, 2016][Yizhe Zhang, et al., ICML, 2017]. Reinforcement Learning: [Yu, et al., AAAI, 2017][Li, et al., EMNLP, 2017][Tong Che, et al., arXiv, 2017][Jiaxian Guo, et al., AAAI, 2018][Kevin Lin, et al., NIPS, 2017][William Fedus, et al., ICLR, 2018].

  35. Tips for Sequence Generation GAN. RL is difficult to train. GAN is difficult to train. A sequence generation GAN (RL + GAN) combines both difficulties.

  36. Tips for Sequence Generation GAN. Typical setup: the discriminator scores only the complete sentence, e.g. "You is good" gets 0.1, so the generator does not know which part is wrong. Instead, give a reward for every generation step: "You" gets 0.9, "You is" gets 0.1, "You is good" gets 0.1, which tells the generator that the sentence goes wrong at "is".

  37. Tips for Sequence Generation GAN. Reward for every generation step ("You" 0.9, "You is" 0.1, "You is good" 0.1). Method 1: Monte Carlo (MC) search [Yu, et al., AAAI, 2017]. Method 2: discriminator for partially decoded sequences [Li, et al., EMNLP, 2017]. Method 3: step-wise evaluation [Tuan, Lee, TASLP, 2019][Xu, et al., EMNLP, 2018][William Fedus, et al., ICLR, 2018]. A sketch of Method 1 follows.
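A minimal sketch of Method 1: the reward of a partial sequence is estimated by rolling it out to full length several times with the current generator and averaging the discriminator's scores on the completions. The `generator.rollout` and `D` interfaces are assumptions, not the authors' code.

```python
# Sketch of Monte Carlo search for step-wise rewards [Yu, et al., AAAI, 2017]:
# complete the partial sequence n_rollouts times with the current generator
# and average the discriminator scores of the completions.
# `generator.rollout(prefix, T)` and `D(sequence)` are assumed interfaces.
def step_reward(prefix, generator, D, T, n_rollouts=16):
    """Estimated reward for the partial sequence `prefix` of a length-T task."""
    if len(prefix) == T:                      # already complete: score directly
        return D(prefix)
    total = 0.0
    for _ in range(n_rollouts):
        full = generator.rollout(prefix, T)   # sample the remaining tokens
        total += D(full)
    return total / n_rollouts
```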

  38. Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation

  39. Task. 1. Given a dataset of real-world structured sequences, train a generative model $G_\theta$ to produce sequences that mimic the real ones. 2. We want $G_\theta$ to fit the unknown true data distribution $p_{\text{true}}(y_t \mid Y_{1:t-1})$, which is only revealed by the given dataset $\mathcal{D} = \{Y_{1:T}\}$.
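For reference, the generator objective from the SeqGAN paper in this notation: maximize the expected end-of-sequence reward from the start state $s_0$, with the action value $Q$ estimated from the discriminator $D_\phi$ via Monte Carlo rollouts (as sketched above).

```latex
% SeqGAN generator objective: expected final reward from the start state,
% with Q estimated by the discriminator D_phi through Monte Carlo rollouts.
J(\theta)
  = \mathbb{E}\bigl[ R_T \mid s_0, \theta \bigr]
  = \sum_{y_1 \in \mathcal{Y}} G_\theta(y_1 \mid s_0)\,
    Q_{D_\phi}^{G_\theta}(s_0, y_1)
```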
