

  1. ControlVAE: Controllable Variational Autoencoder. Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher. University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc. at Seattle

  2. Background: VAE. Applications include machine translation and disentangled representation learning.

  3. The VAE model. An encoder $q_\phi(z|x)$ maps an input $x$ to a latent code $z$; a decoder $p_\theta(x|z)$ reconstructs $x$. (Fig.: the basic VAE model.) The ELBO objective function is
  $\mathcal{L}_{ELBO} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,p(z))$,
  i.e., a reconstruction term minus a KL-divergence term.
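  To make the two ELBO terms concrete, here is a minimal sketch (not the authors' code) of the loss for a VAE with a diagonal-Gaussian posterior and a Bernoulli decoder; the names (elbo_loss, recon_logits, mu, logvar) are our assumptions.

```python
import torch
import torch.nn.functional as F

def elbo_loss(recon_logits, x, mu, logvar):
    """Return (ELBO, KL) for one batch; training minimizes -ELBO."""
    # Reconstruction term E_q[log p(x|z)], here under a Bernoulli likelihood.
    recon = -F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon - kl, kl
```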

  4. Background
  • KL vanishing (posterior collapse)
  Ø The KL term tends to zero during model training.
  • Trade-off between the KL-divergence and reconstruction quality
  Ø Lowering the KL-divergence hurts reconstruction accuracy, and vice versa.

  5. Related work

  | Study | Description | Cons |
  |---|---|---|
  | Cost annealing (bowman2015) | increases the weight γ on the KL term from 0 to 1 using a sigmoid function over N steps | still suffers from KL vanishing |
  | β-VAE (higgins2017) | assigns a large, fixed weight to the KL term in the VAE objective | fixed weight leads to high recon. error |
  | TamingVAE (rezende2018) | formulates the reconstruction loss as a constrained optimization using a Lagrangian multiplier | suffers from local minima; KL vanishing |
  | FactorVAE (kim2018) | decomposes the KL into three terms: index-code mutual information, total correlation, and dimension-wise KL | fixed weight; high recon. error |
  | InfoVAE (zhao2017) | adds a mutual-information maximization term to encourage mutual information between x and z | fixed weight; cannot explicitly control the KL value |

  Drawbacks of existing work: 1. a fixed weight on the KL term leads to high recon. error; 2. KL vanishing (posterior collapse).

  6. Motivation. [1] Language modeling: KL vanishing. [2] Disentanglement: controlling the information capacity (KL-divergence).

  7. ControlVAE framework. (Fig.: framework of ControlVAE via dynamic learning.) A feedback controller compares the actual KL-divergence of the encoder $q_\phi(z|x)$ against a desired value (set point) and dynamically tunes the weight on the KL term of the VAE objective during training.

  8. ControlVAE model. Objective function:
  $\mathcal{L} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$,
  where $\gamma(t)$ is the output of a controller at training step $t$.
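  As a hedged sketch of how the controller output enters training (using the hypothetical elbo_loss above and the PI-controller class sketched after slide 11 below), the weight γ(t) simply rescales the KL term at each step:

```python
# One ControlVAE-style training step; all helper names here are assumptions.
elbo, kl = elbo_loss(recon_logits, x, mu, logvar)
recon = elbo + kl                                  # recover the reconstruction term
gamma_t = controller.update(kl.item(), set_point)  # gamma(t) from the feedback controller
loss = -(recon - gamma_t * kl)                     # minimize -(recon - gamma(t) * KL)
loss.backward()
```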

  9. PID control algorithm. The PID algorithm computes its output from the error $e(t)$ via proportional (P), integral (I), and derivative (D) terms:
  $\gamma(t) = K_p\, e(t) + K_i \sum_{j=0}^{t} e(j) + K_d\,(e(t) - e(t-1))$
  § e(t) is the error between the real KL-divergence and the set point
  § K_p is the coefficient of the proportional (P) term
  § K_i is the coefficient of the integral (I) term
  § K_d is the coefficient of the derivative (D) term
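  For reference, a textbook discrete-time PID update matching the formula above; this is a generic illustration, not the authors' implementation (ControlVAE ultimately drops the D term, as the next slide shows).

```python
class PID:
    """Generic discrete PID: u(t) = Kp*e(t) + Ki*sum(e) + Kd*(e(t) - e(t-1))."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running sum of errors (I term)
        self.prev_error = 0.0    # previous error, for the discrete derivative (D term)

    def update(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```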

  10. Non-linear PI controller (the set point is an application-specific constant). Insight of the PI controller:
  § When e(t) > 0: the output KL(t) is below the set point, so reduce γ(t) to boost the KL value;
  § When e(t) < 0: the output KL(t) is above the set point, so increase γ(t) to optimize (penalize) the KL term.
  (Fig.: KL converging to the set point over training steps t.)

  11. Non-linear PI controller. (Fig.: block diagram of the PI controller, with the desired KL value as set point, the observed KL as feedback, and the P and I terms summed into the objective weight.)
  $\gamma(t) = \dfrac{K_p}{1 + \exp(e(t))} - K_i \sum_{j=0}^{t} e(j)$
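  A minimal Python sketch of the non-linear PI controller on this slide; the clamping range [0, 1] and the variable names are our assumptions rather than details taken from the slide.

```python
import math

class NonlinearPI:
    """gamma(t) = Kp / (1 + exp(e(t))) - Ki * sum_j e(j), clamped to [gamma_min, gamma_max]."""
    def __init__(self, kp=0.01, ki=0.0001, gamma_min=0.0, gamma_max=1.0):
        self.kp, self.ki = kp, ki
        self.gamma_min, self.gamma_max = gamma_min, gamma_max
        self.i_term = 0.0    # accumulates Ki * e(j)

    def update(self, kl, set_point):
        error = set_point - kl                      # e(t) > 0 when KL is below the target
        p_term = self.kp / (1.0 + math.exp(error))  # non-linear P term: small when e(t) >> 0
        self.i_term += self.ki * error              # I term enters with a minus sign below
        gamma = p_term - self.i_term
        return min(max(gamma, self.gamma_min), self.gamma_max)
```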

  12. Evaluation
  Applications:
  • Language modeling: text and dialog generation
  • Disentangled representation learning
  • Image generation
  Benchmark datasets:
  o Language modeling: [1] Penn Tree Bank (PTB); [2] Switchboard (SW) telephone conversations
  o Disentanglement: dSprites
  o Image generation: CelebA

  13. Evaluation: language modeling (PTB data). (Fig. panels: (a) KL divergence, (b) recon. loss, (c) weight γ(t).)
  Baselines: 1) Cost annealing: gradually increases the weight on the KL-divergence from 0 to 1 over N steps using a sigmoid function. 2) Cyclical annealing: splits the training process into M cycles, each increasing the weight from 0 to 1 using a linear function (a sketch of this schedule follows).
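  As promised above, a hedged sketch of the cyclical annealing baseline schedule; the cycle count M and the ramp proportion r = 0.5 are illustrative defaults, not values from the slide.

```python
def cyclical_weight(step, total_steps, m_cycles=4, r=0.5):
    """KL weight under cyclical annealing: M cycles, each a linear ramp 0 -> 1, then flat."""
    period = total_steps / m_cycles
    tau = (step % period) / period   # position within the current cycle, in [0, 1)
    return min(tau / r, 1.0)         # ramp linearly for the first r of the cycle, then hold 1
```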

  14. Evaluation: language modeling on Switchboard (SW), measuring the diversity of the generated text.

  15. Evaluation: disentanglement (dSprites data). (Fig. panels: (a) recon. error, (b) weight γ(t), (c) disentangled factors.)
  Baselines: 1) β-VAE: Burgess, C. P., Higgins, I., Pal, A., Matthey, L., et al. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599, 2018. 2) FactorVAE: Kim, Hyunjik, and Andriy Mnih. Disentangling by Factorising. In International Conference on Machine Learning, pp. 2649-2658, 2018.

  16. Evaluation: disentanglement. β-VAE (β = 100), FactorVAE (γ = 10), ControlVAE (KL = 16); latent factors: x, y, scale, orientation, shape. (Fig.: latent traversals, varying a single latent dimension over the range [-3, 3].)

  17. Evaluation: image generation. (Fig. panels: (a) recon. loss, (b) KL divergence.)

  18. Conclusion
  • Proposed ControlVAE, a new controllable VAE that combines a PI controller with the basic VAE model.
  • Designed a new non-linear PI controller to automatically tune the weight on the KL term in the VAE objective.
  • ControlVAE not only averts KL vanishing but also controls the diversity of generated text.
  • Achieves better disentangling and reconstruction quality than existing methods.

  19. Thank you very much! Q&A

  20. Backup

  21. PI parameter tuning
  • Tune K_p for the P term: when the output KL(t) is very small, the error e(t) >> 0; e.g., K_p = 0.01.
  • Tune K_i for the I term: when the output KL(t) is very large, e(t) < 0; e.g., K_i = 0.001 or 0.0001.
  (Fig.: KL approaching the set point over training steps t.)
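  A usage sketch with the tuning values suggested on this slide (NonlinearPI is the hypothetical class sketched after slide 11, and the set point of 16 is illustrative):

```python
controller = NonlinearPI(kp=0.01, ki=0.0001)
# Early in training the observed KL is far below the set point,
# so e(t) >> 0 and the controller drives gamma(t) down to let the KL grow.
gamma_t = controller.update(kl=0.5, set_point=16.0)
```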

  22. Set-point guideline
  • The set point of the KL-divergence is largely application-specific.
  Ø Text generation: slightly increase the KL-divergence, denoted KL_vae, produced by the basic VAE or by the cost-annealing method.
  Ø ELBO improvement: the KL should be increased within the following bound.

  23. ELBO improvement
