CoT: Cooperative Training for Generative Modeling of Discrete Data
https://github.com/desire2020/CoT
Sidi Lu, Lantao Yu, Siyuan Feng, Yaoming Zhu, Weinan Zhang, and Yong Yu
Shanghai Jiao Tong University
Autoregressive Models
• Autoregressive models factorize the joint distribution sequentially to build a fully tractable density function:
• p_θ(y_0, y_1, …, y_{n−1}) = p_θ(y_0) · p_θ(y_1 | s_[0:1)) · p_θ(y_2 | s_[0:2)) · p_θ(y_3 | s_[0:3)) · … · p_θ(y_{n−1} | s_[0:n−1))
• where s_[0:t) denotes the prefix (y_0, …, y_{t−1})
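As a concrete illustration of the factorization above, the chain rule can be sketched in a few lines of numpy; the toy logits and the vocabulary size are made up for the example and stand in for a real sequence model:

```python
import numpy as np
from itertools import product

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_log_prob(step_logits, tokens):
    """Chain rule: log p(y_0,...,y_{n-1}) = sum_t log p(y_t | s_[0:t)).
    step_logits[t] stands in for the model's output at step t,
    already conditioned on the prefix s_[0:t)."""
    log_probs = np.log(softmax(step_logits))
    return float(sum(log_probs[t, y] for t, y in enumerate(tokens)))

# Toy setting: vocabulary of 4 tokens, sequence length 3.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4))
lp = sequence_log_prob(logits, [1, 3, 0])

# The factorization defines a proper density: the probabilities of
# all 4^3 possible sequences sum to one.
total = sum(np.exp(sequence_log_prob(logits, list(seq)))
            for seq in product(range(4), repeat=3))
```

Full tractability is exactly this property: any sequence's likelihood is an explicit product of per-step conditionals.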
Teacher Forcing and Exposure Bias
• For each sequence in the training set, maximize the estimated likelihood on the log scale.
• [Figure: the model is unrolled from the initial state; at every step it is fed the forced ground-truth observation and outputs an estimate p(x|s); example prefix: "I have"]
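The teacher-forcing loss described above can be sketched as follows; `model_step`, a function mapping a prefix of token ids to logits, is a hypothetical stand-in for a real recurrent generator:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax for a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def teacher_forcing_nll(model_step, sequence, init_state):
    """Average negative log-likelihood under teacher forcing: at every
    step the model conditions on the *observed* token, never on its
    own sample."""
    state, nll = list(init_state), 0.0
    for y in sequence:
        probs = softmax(model_step(state))
        nll -= np.log(probs[y])
        state.append(y)              # force the ground-truth token
    return float(nll / len(sequence))

# A uniform toy model over 4 tokens gives NLL = log(4) at every step.
uniform_step = lambda prefix: np.zeros(4)
nll = teacher_forcing_nll(uniform_step, [2, 0, 3], [])
```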
Teacher Forcing and Exposure Bias
• When used to generate a random sample:
• [Figure: at inference the model is unrolled from the initial state; each step draws a stochastic sample from p(x|s) and feeds it back as its own next input; example output: "Billie Jean"]
Teacher Forcing and Exposure Bias
• Exposure Bias [Ranzato et al., 2015]:
• The intermediate states the model conditions on during training (real prefixes) and during inference (its own generated prefixes) are inconsistent.
• The resulting distribution shift accumulates along the timeline.
• [Figure: teacher forcing conditions the model on a real training prefix; random sampling conditions it on a generated inference prefix]
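The inference-time counterpart of teacher forcing is free-running sampling, where the model is fed its own stochastic samples. A minimal sketch, using the same hypothetical `model_step` interface as a stand-in for a real generator:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax for a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def free_running_sample(model_step, init_state, length, rng):
    """Free-running generation: each step conditions on the model's own
    stochastic samples, so an early sampling error shifts every later
    conditional -- the root of exposure bias."""
    state, out = list(init_state), []
    for _ in range(length):
        probs = softmax(model_step(state))
        y = int(rng.choice(len(probs), p=probs))
        out.append(y)
        state.append(y)              # self-feed the sampled token
    return out

sample = free_running_sample(lambda prefix: np.zeros(4), [], 5,
                             np.random.default_rng(0))
```

Comparing this loop with the teacher-forcing loop makes the mismatch concrete: the only difference is which token is appended to the state, yet that difference compounds over time.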
Exposure Bias and Kullback-Leibler Divergence
• Exposure bias can also be regarded as a consequence of optimizing via minimization of the Kullback-Leibler divergence, denoted KL(P||Q) for distributions P and Q.
Kullback-Leibler Divergence, Symmetry of Divergences
• For any P, Q, KL(P||Q) does not necessarily equal KL(Q||P).
• Smoothing and symmetrizing KL yields the Jensen-Shannon Divergence:
• JSD(P||Q) = 0.5 · KL(P||M) + 0.5 · KL(Q||M)
• where M = 0.5 · (P + Q)
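The asymmetry of KL and the symmetry of JSD are easy to check numerically for a pair of toy discrete distributions:

```python
import numpy as np

def kl(p, q):
    """KL(P||Q) = sum_x P(x) * log(P(x) / Q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """JSD(P||Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) with M = 0.5*(P+Q)."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.9, 0.1], [0.5, 0.5]
kl_pq, kl_qp = kl(p, q), kl(q, p)   # differ: KL is asymmetric
j = jsd(p, q)                       # symmetric, bounded by log(2)
```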
GAN, SeqGAN and Language GANs
• Ian Goodfellow proposed the Generative Adversarial Network [2014].
• Ideally, GAN minimizes the JSD.
• It can't be directly applied to discrete sequence generation, since sampling discrete tokens is non-differentiable.
• SeqGAN uses the REINFORCE gradient estimator to resolve this.
Problems of SeqGAN
• Not trivially able to work from scratch.
• SeqGAN's work-around: pre-training via teacher forcing.
• Trades diversity for quality (mode collapse), according to previous reports [Lu et al., 2018; Caccia et al., 2018].
Problems of SeqGAN
• The training signal is too sparse: the discriminator returns only a single scalar score for the whole generated sequence.
• [Figure: the generator is unrolled with stochastic self-fed samples ("Billie Jean"); the discriminator sends back a single-point signal for the complete sequence]
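The REINFORCE work-around, and why its signal is sparse, can be sketched with a tabular toy policy; the (T, V) logit table and the scalar reward are hypothetical stand-ins for a real generator and discriminator:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reinforce_grad(theta, sampled_seq, reward):
    """REINFORCE for a discrete sequence: the same scalar reward
    (e.g. one discriminator score for the *whole* sequence) multiplies
    grad log p_theta(y_t | prefix) at every step -- a single-point,
    high-variance signal. theta is a (T, V) logit table standing in
    for a real autoregressive generator."""
    grad = np.zeros_like(theta)
    for t, y in enumerate(sampled_seq):
        g = -softmax(theta[t])       # d log-softmax / d logits ...
        g[y] += 1.0                  # ... is one-hot minus probs
        grad[t] = reward * g
    return grad

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))
g1 = reinforce_grad(theta, [1, 0, 2], 1.0)
g2 = reinforce_grad(theta, [1, 0, 2], 2.0)
```

Note how the single scalar `reward` only rescales the whole gradient: every step receives the same credit regardless of which token actually helped or hurt.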
Cooperative Training: Back to the Formula!
• Reconsider the algorithm as estimating and minimizing the JSD:
• JSD(P||G) = 0.5 · KL(P||M) + 0.5 · KL(G||M), where M = 0.5 · (P + G)
• Instead of using a discriminator to achieve this, use another sequence model, called the "Mediator", to approximate the mixture density M.
Cooperative Training: More Information from the Mediator
• Key Idea: the mediator provides a DISTRIBUTION-level signal at each time step.
• [Figure: the generator is unrolled ("Billie Jean"); at every step the mediator's conditional M(x|s) is compared against the generator's G(x|s), yielding a per-step signal]
Cooperative Training: Factorizing the Cumulative Gradient Through Time, Final Objectives
• Generator gradient:
• ∇_θ J_g(θ) = E_{s_t ~ G_θ} [ ∇_θ π_g(s_t)^T (log π_m(s_t) − log π_g(s_t)) ]
• where π_g(s_t) = G_θ(y_t | s_t), π_m(s_t) = M_φ(y_t | s_t)
• Mediator objective:
• J_m(φ) = E_{s ~ 0.5·(P + G_θ)} [ −log M_φ(s) ], i.e. maximum likelihood on the balanced mixture of real and generated sequences.
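Under the objectives above, the per-step losses reduce to a few lines. This is a sketch for a single state s_t with made-up logits, not the authors' implementation:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax for a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def generator_step_loss(g_logits, m_logits):
    """Negative of pi_g(s_t)^T (log pi_m(s_t) - log pi_g(s_t)),
    i.e. KL(pi_g || pi_m): a full-distribution signal from the
    mediator at every step, not one end-of-sequence scalar."""
    pi_g, pi_m = softmax(g_logits), softmax(m_logits)
    return float(-np.dot(pi_g, np.log(pi_m) - np.log(pi_g)))

def mediator_step_nll(m_logits, token):
    """Mediator loss per step: plain negative log-likelihood; trained
    on the balanced mixture of real and generated sequences, the
    mediator approximates M = 0.5 * (P + G)."""
    return float(-np.log(softmax(m_logits)[token]))

rng = np.random.default_rng(0)
g, m = rng.normal(size=5), rng.normal(size=5)
loss = generator_step_loss(g, m)        # a KL term, so >= 0
loss_self = generator_step_loss(g, g)   # zero when G matches M
```

Because the per-step loss is a KL divergence, it vanishes exactly when the generator's conditional matches the mediator's, which is the fixed point the max-max game drives toward.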
Experiment: Synthetic Turing Test
Experiment: Real World Data
• Quality test on the EMNLP2017 WMT News section
• Diversity test on the EMNLP2017 WMT News section
Poster #44
Conclusion
• Key Ideas:
• Use a max-max game to replace the min-max game of GANs, while still focusing on minimization of the JSD.
• Use a distribution-level signal from the introduced mediator at each step.
• Advantages:
• Works from scratch.
• Trade-off-invariant performance gain while still being computationally cheap.