SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Lantao Yu†, Weinan Zhang†, Jun Wang‡, Yong Yu†
†Shanghai Jiao Tong University, ‡University College London
Attribution
• Multiple slides taken from Hung-yi Lee, Paarth Neekhara, Ruirui Li, and the original authors at AAAI 2017
• Presented by: Pratyush Maini
Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation
All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo More than 500 species in the zoo (not updated since 2018.09)
All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo
GAN, ACGAN, BGAN, CGAN, DCGAN, EBGAN, fGAN, GoGAN, ……
Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, "Variational Approaches for Auto-Encoding Generative Adversarial Networks", arXiv, 2017
Three Categories of GAN
1. Generation: random vector → Generator → image
2. Conditional Generation: text condition (e.g. "Girl with red hair, blue eyes, short hair") → Generator → image, trained from paired data
3. Unsupervised Conditional Generation: object in domain x → Generator → object in domain y, trained from unpaired data (e.g. photo → Vincent van Gogh's style)
Anime Face Generation: examples drawn by the generator.
Powered by: http://mattya.github.io/chainer-DCGAN/

Basic Idea of GAN
The generator is a neural network (NN), i.e. a function: it maps a high-dimensional vector to an image. Each dimension of the input vector represents some characteristic of the output, e.g. longer hair, open mouth, blue hair.
Basic Idea of GAN
The discriminator is also a neural network (NN), i.e. a function: it maps an image to a scalar score. A larger value means the image looks real; a smaller value means it looks fake (e.g. a real face scores 1.0, a crude drawing scores 0.1).
Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation
Algorithm
• Initialize generator G and discriminator D
• In each training iteration:
Step 1: Fix generator G, and update discriminator D. Sample real objects from the database (target score 1) and feed randomly sampled vectors through the fixed G to get generated objects (target score 0). The discriminator learns to assign high scores to real objects and low scores to generated objects.
Algorithm
• Initialize generator G and discriminator D
• In each training iteration:
Step 2: Fix discriminator D, and update generator G. The generator learns to "fool" the discriminator: viewing the generator followed by the fixed discriminator as one large network (with the generated image as a hidden layer), update the generator's parameters by gradient ascent on the discriminator's output score.
Algorithm
• Initialize generator G and discriminator D
• In each training iteration:
D learning: sample some real objects (target 1) and generate some fake objects with G (target 0); update D with G fixed.
G learning: feed random vectors through G into D; update G to raise D's score on the resulting images, with D fixed.
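The alternating procedure above can be sketched in a few lines of code. This is a hypothetical toy illustration, not the models from the slides: the "generator" is a 1-D shift of Gaussian noise by a learnable offset, and the "discriminator" is a logistic score sigmoid(w·x + b).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "generator": shifts input noise by a learnable offset theta_g.
# Toy "discriminator": logistic score sigmoid(w * x + b).
theta_g, w, b = 0.0, 1.0, 0.0
lr = 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for it in range(500):
    real = rng.normal(3.0, 1.0, size=64)   # samples from the database
    noise = rng.normal(0.0, 1.0, size=64)
    fake = noise + theta_g                 # generated samples

    # Step 1: fix G, update D (real -> high score, fake -> low score)
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    grad_w = np.mean((1 - d_real) * real) - np.mean(d_fake * fake)
    grad_b = np.mean(1 - d_real) - np.mean(d_fake)
    w, b = w + lr * grad_w, b + lr * grad_b

    # Step 2: fix D, update G (raise D's score on generated samples)
    d_fake = sigmoid(w * fake + b)
    theta_g += lr * np.mean((1 - d_fake) * w)   # d/dtheta log D(G(z))

# The generator's offset drifts toward the real data mean (around 3.0).
```

The gradients are those of the standard logistic GAN losses; the generator step uses the non-saturating form (maximize log D(G(z))).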
Anime Face Generation 100 updates Source of training data: https://zhuanlan.zhihu.com/p/24767059
Anime Face Generation 1000 updates
Anime Face Generation 2000 updates
Anime Face Generation 5000 updates
Anime Face Generation 10,000 updates
Anime Face Generation 20,000 updates
Anime Face Generation 50,000 updates
In 2019, with StyleGAN …… Source of video: https://www.gwern.net/Faces
The first GAN [Ian J. Goodfellow, et al., NIPS, 2014]
Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 1. Reinforcement Learning 2. GAN + RL 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation
NLP tasks usually involve Sequence Generation. How can GAN be used to improve sequence generation?
Reinforcement Learning
Learn to maximize the expected reward, e.g. by policy gradient.
Input sentence c ("How are you?") → Chatbot (encoder-decoder) → response sentence x: "Not bad" (reward +1) or "I'm John" (reward −1).
A human evaluates the pair and provides the reward $R(c, x)$. [Li, et al., EMNLP, 2016]
Policy Gradient
$\theta^{t+1} \leftarrow \theta^t + \eta \nabla \bar{R}_{\theta^t}$
$\nabla \bar{R}_{\theta} = \frac{1}{N} \sum_{i=1}^{N} R(c^i, x^i) \nabla \log P_{\theta}(x^i \mid c^i)$
Given sampled pairs $(c^1, x^1), (c^2, x^2), \ldots, (c^N, x^N)$:
• If $R(c^i, x^i)$ is positive, update $\theta$ to increase $P_{\theta}(x^i \mid c^i)$.
• If $R(c^i, x^i)$ is negative, update $\theta$ to decrease $P_{\theta}(x^i \mid c^i)$.
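The reward-weighted log-probability update above can be illustrated with a minimal REINFORCE sketch. The 3-action stateless policy and the fixed reward table standing in for $R(c, x)$ are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy policy: logits theta over 3 possible "responses";
# a fixed reward table stands in for R(c, x) (the context is ignored).
theta = np.zeros(3)
rewards = np.array([-1.0, 0.0, 1.0])
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(200):
    probs = softmax(theta)
    actions = rng.choice(3, size=32, p=probs)   # N sampled interactions
    grad = np.zeros(3)
    for a in actions:
        # gradient of log softmax prob of action a, weighted by its reward
        grad += rewards[a] * (np.eye(3)[a] - probs)
    theta += lr * grad / len(actions)

final_probs = softmax(theta)
# The policy concentrates on the highest-reward response (index 2).
```

Positive-reward samples have their probability pushed up, negative-reward samples pushed down, exactly as on the slide.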
Policy Gradient: Maximum Likelihood vs. Reinforcement Learning

Maximum Likelihood:
• Objective function: $\frac{1}{N} \sum_{i=1}^{N} \log P_{\theta}(\hat{x}^i \mid c^i)$
• Gradient: $\frac{1}{N} \sum_{i=1}^{N} \nabla \log P_{\theta}(\hat{x}^i \mid c^i)$
• Training data: $\{(c^1, \hat{x}^1), \ldots, (c^N, \hat{x}^N)\}$; every example is weighted equally, as if $R(c^i, \hat{x}^i) = 1$.

Reinforcement Learning (Policy Gradient):
• Objective function: $\frac{1}{N} \sum_{i=1}^{N} R(c^i, x^i) \log P_{\theta}(x^i \mid c^i)$
• Gradient: $\frac{1}{N} \sum_{i=1}^{N} R(c^i, x^i) \nabla \log P_{\theta}(x^i \mid c^i)$
• Training data: $\{(c^1, x^1), \ldots, (c^N, x^N)\}$, obtained from interaction; each example is weighted by $R(c^i, x^i)$.
Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 1. Reinforcement Learning 2. GAN + RL 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation
Why do we need GAN?
Chat-bot as example: a seq2seq model (encoder-decoder) is trained on dialogue data such as "A: How are you? B: I'm good." with the criterion of maximizing the likelihood of the reference response given the input. But maximum likelihood is not the same as what a human means by a better sentence: given "How are you?", a human would rank the output "Not bad" above "I'm John", while likelihood training encodes no such preference.
Conditional GAN
Replace human evaluation with machine evaluation: a discriminator takes the input sentence c and the chatbot's response sentence x (e.g. "I am busy.") and returns the reward $R(c, x)$. However, there is an issue when you train your generator this way. [Li, et al., EMNLP, 2017]
Three Categories of Solutions
1. Gumbel-softmax: [Matt J. Kusner, et al., arXiv, 2016][Weili Nie, et al., ICLR, 2019]
2. Continuous input for discriminator: [Sai Rajeswar, et al., arXiv, 2017][Ofir Press, et al., ICML workshop, 2017][Zhen Xu, et al., EMNLP, 2017][Alex Lamb, et al., NIPS, 2016][Yizhe Zhang, et al., ICML, 2017]
3. Reinforcement learning: [Yu, et al., AAAI, 2017][Li, et al., EMNLP, 2017][Tong Che, et al., arXiv, 2017][Jiaxian Guo, et al., AAAI, 2018][Kevin Lin, et al., NIPS, 2017][William Fedus, et al., ICLR, 2018]
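As a rough sketch of the first category, the Gumbel-softmax trick draws an approximately one-hot but still differentiable sample from a categorical distribution. This NumPy version shows only the forward sampling; the logits and temperature `tau` below are assumed values, not from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    # Add Gumbel noise, then apply a temperature-controlled softmax:
    # as tau -> 0 the output approaches a one-hot sample, while the
    # whole operation stays differentiable in the logits.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

sample = gumbel_softmax(np.array([1.0, 2.0, 0.5]), tau=0.5)
# `sample` is a valid probability vector, typically close to one-hot.
```

This is why the trick sidesteps the non-differentiable word-sampling step: the "sampled word" is a soft vector that gradients can flow through.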
Continuous Input for Discriminator
Instead of sampling a word at each step (e.g. <BOS> → A → B), use the generator's output distribution over the vocabulary as the input to the discriminator, which outputs a scalar score. This avoids the sampling process, so we can do backpropagation now and update the generator's parameters directly.
What is the problem?
A real sentence is a sequence of one-hot vectors:
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
A generated sequence of word distributions can never be one-hot:
0.9 0.1 0.1 0 0
0.1 0.9 0.1 0 0
0 0 0.7 0.1 0
0 0 0.1 0.8 0.1
0 0 0 0.1 0.9
The discriminator can immediately find the difference. A discriminator with a constraint (e.g. WGAN) can be helpful.
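The problem above can be made concrete: a trivial hand-written "discriminator" separates one-hot rows from soft distributions perfectly, without learning anything about language. The max-equals-1 rule below is illustrative, not from the paper:

```python
import numpy as np

# Real sentences are sequences of one-hot word vectors; a generator
# that outputs word distributions can never produce exact one-hots.
real = np.eye(5)                                   # 5 one-hot rows
fake = np.array([[0.9, 0.1, 0.0, 0.0, 0.0],
                 [0.1, 0.8, 0.1, 0.0, 0.0],
                 [0.0, 0.1, 0.7, 0.2, 0.0]])

def looks_real(rows):
    # A trivial hand-written "discriminator": exact one-hot check.
    return rows.max(axis=1) == 1.0

# Every real row passes and every generated row fails, so an
# unconstrained discriminator wins without judging sentence quality.
```

A Lipschitz-constrained critic (WGAN-style) cannot exploit such a sharp decision boundary, which is why the constraint helps here.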
Three Categories of Solutions
1. Gumbel-softmax: [Matt J. Kusner, et al., arXiv, 2016][Weili Nie, et al., ICLR, 2019]
2. Continuous input for discriminator: [Sai Rajeswar, et al., arXiv, 2017][Ofir Press, et al., ICML workshop, 2017][Zhen Xu, et al., EMNLP, 2017][Alex Lamb, et al., NIPS, 2016][Yizhe Zhang, et al., ICML, 2017]
3. Reinforcement learning: [Yu, et al., AAAI, 2017][Li, et al., EMNLP, 2017][Tong Che, et al., arXiv, 2017][Jiaxian Guo, et al., AAAI, 2018][Kevin Lin, et al., NIPS, 2017][William Fedus, et al., ICLR, 2018]
Tips for Sequence Generation GAN
RL is difficult to train. GAN is difficult to train. Sequence generation GAN (RL + GAN) inherits both difficulties.
Tips for Sequence Generation GAN
• Typical setup: the discriminator scores only the complete sentence, e.g. D("You is good") = 0.1. The generator then does not know which part of the sentence is wrong.
• Reward for every generation step: score every prefix instead, e.g. D("You") = 0.9, D("You is") = 0.1, D("You is good") = 0.1.
Tips for Sequence Generation GAN
• Reward for every generation step: D("You") = 0.9, D("You is") = 0.1, D("You is good") = 0.1.
Method 1: Monte Carlo (MC) search [Yu, et al., AAAI, 2017]
Method 2: Discriminator for partially decoded sequences [Li, et al., EMNLP, 2017]
Method 3: Step-wise evaluation [Tuan, Lee, TASLP, 2019][Xu, et al., EMNLP, 2018][William Fedus, et al., ICLR, 2018]
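Method 1 (MC search) can be sketched as follows. The `rollout_policy` and `discriminator` functions here are hypothetical stand-ins (random completion, a reward for 0-tokens); in SeqGAN the rollouts come from the generator itself and the score from the learned discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, T = 4, 6   # assumed vocabulary size and sequence length

def rollout_policy(prefix):
    # Stand-in generator: completes the prefix with random tokens.
    tail = rng.integers(0, VOCAB, size=T - len(prefix))
    return prefix + list(tail)

def discriminator(seq):
    # Stand-in discriminator: rewards sequences with many 0-tokens.
    return seq.count(0) / len(seq)

def mc_step_reward(prefix, n_rollouts=50):
    # MC search: the intermediate reward of a partial sequence is the
    # average discriminator score over complete rollouts from it.
    return np.mean([discriminator(rollout_policy(prefix))
                    for _ in range(n_rollouts)])

good = mc_step_reward([0, 0, 0])   # prefix this "discriminator" likes
bad = mc_step_reward([1, 2, 3])
```

This gives every prefix its own reward signal, so the generator learns which step went wrong instead of receiving one score for the whole sentence.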
Outline 1. Introduction to GANs 2. Brief theoretical overview of GANs 3. Overview of GANs in Sequence Generation 4. SeqGAN 5. Other recent work: Unsupervised Conditional Sequence Generation
Task
1. Given a dataset of real-world structured sequences, train a generative model $G_\theta$ to produce sequences that mimic the real ones.
2. We want $G_\theta$ to fit the unknown true data distribution $p_{true}(y_t \mid Y_{1:t-1})$, which is only revealed by the given dataset $\mathcal{D} = \{Y_{1:T}\}$.
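As a tiny illustration of a conditional distribution $p_{true}(y_t \mid Y_{1:t-1})$ being "revealed by the dataset", one can count next-token occurrences in a toy corpus to estimate a first-order (bigram) approximation. The corpus `D` and helper `p_est` below are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy dataset D of sequences (invented for illustration).
D = [["a", "b", "c"], ["a", "b", "b"], ["b", "c", "a"]]

# Count next-token occurrences to estimate p(y_t | y_{t-1}), a
# first-order stand-in for the full conditional p_true(y_t | Y_1:t-1).
counts = defaultdict(Counter)
for seq in D:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

def p_est(nxt, prev):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0
```

A generative model $G_\theta$ plays the same role with a learned, full-history conditional instead of simple counts.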