

  1. ControlVAE: Controllable Variational Autoencoder. Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher. University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc. at Seattle

  2. Background: VAE. Applications include machine translation and disentangled representation learning.

  3. The VAE model. An encoder $q_\phi(z|x)$ maps an input $x$ to a latent code $z$; a decoder $p_\theta(x|z)$ reconstructs $x$. (Fig.: the basic VAE model.) The ELBO objective function is
  $\mathcal{L}_{ELBO} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,p(z))$,
  i.e., a reconstruction term minus a KL-divergence term.
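  To make the two ELBO terms concrete, here is a minimal sketch (not the authors' code) of the loss for a VAE with a diagonal-Gaussian posterior and a Bernoulli decoder; the names (elbo_loss, recon_logits, mu, logvar) are our assumptions.

```python
import torch
import torch.nn.functional as F

def elbo_loss(recon_logits, x, mu, logvar):
    """Return (ELBO, KL) for one batch; training minimizes -ELBO."""
    # Reconstruction term E_q[log p(x|z)], here under a Bernoulli likelihood.
    recon = -F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon - kl, kl
```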

  4. Background
  • KL vanishing (posterior collapse)
  Ø The KL term tends to zero during model training.
  • Trade-off between the KL-divergence and reconstruction quality
  Ø Lowering the KL-divergence hurts reconstruction accuracy, and vice versa.

  5. Related work

  | Study | Description | Cons |
  |---|---|---|
  | Cost annealing (bowman2015) | increases the weight γ on the KL term from 0 to 1 using a sigmoid function over N steps | still suffers from KL vanishing |
  | β-VAE (higgins2017) | assigns a large, fixed weight to the KL term in the VAE objective | fixed weight leads to high recon. error |
  | TamingVAE (rezende2018) | formulates the reconstruction loss as a constrained optimization using a Lagrangian multiplier | suffers from local minima; KL vanishing |
  | FactorVAE (kim2018) | decomposes the KL into three terms: index-code mutual information, total correlation, and dimension-wise KL | fixed weight; high recon. error |
  | InfoVAE (zhao2017) | adds a mutual-information maximization term to encourage mutual information between x and z | fixed weight; cannot explicitly control the KL value |

  Drawbacks of existing work: 1. a fixed weight on the KL term leads to high recon. error; 2. KL vanishing (posterior collapse).

  6. Motivation. [1] Language modeling: KL vanishing. [2] Disentanglement: controlling the information capacity (KL-divergence).

  7. ControlVAE framework. (Fig.: framework of ControlVAE via dynamic learning.) A feedback controller compares the actual KL-divergence of the encoder $q_\phi(z|x)$ against a desired value (set point) and dynamically tunes the weight on the KL term of the VAE objective during training.

  8. ControlVAE model. Objective function:
  $\mathcal{L} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$,
  where $\gamma(t)$ is the output of a controller at training step $t$.
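  As a hedged sketch of how the controller output enters training (using the hypothetical elbo_loss above and the PI-controller class sketched after slide 11 below), the weight γ(t) simply rescales the KL term at each step:

```python
# One ControlVAE-style training step; all helper names here are assumptions.
elbo, kl = elbo_loss(recon_logits, x, mu, logvar)
recon = elbo + kl                                  # recover the reconstruction term
gamma_t = controller.update(kl.item(), set_point)  # gamma(t) from the feedback controller
loss = -(recon - gamma_t * kl)                     # minimize -(recon - gamma(t) * KL)
loss.backward()
```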

  9. PID control algorithm. The PID algorithm computes its output from the error $e(t)$ via proportional (P), integral (I), and derivative (D) terms:
  $\gamma(t) = K_p\, e(t) + K_i \sum_{j=0}^{t} e(j) + K_d\,(e(t) - e(t-1))$
  § e(t) is the error between the real KL-divergence and the set point
  § K_p is the coefficient of the proportional (P) term
  § K_i is the coefficient of the integral (I) term
  § K_d is the coefficient of the derivative (D) term
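  For reference, a textbook discrete-time PID update matching the formula above; this is a generic illustration, not the authors' implementation (ControlVAE ultimately drops the D term, as the next slide shows).

```python
class PID:
    """Generic discrete PID: u(t) = Kp*e(t) + Ki*sum(e) + Kd*(e(t) - e(t-1))."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running sum of errors (I term)
        self.prev_error = 0.0    # previous error, for the discrete derivative (D term)

    def update(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```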

  10. Non-linear PI controller (the set point is an application-specific constant). Insight of the PI controller:
  § When e(t) > 0: the output KL(t) is below the set point, so reduce γ(t) to boost the KL value;
  § When e(t) < 0: the output KL(t) is above the set point, so increase γ(t) to optimize (penalize) the KL term.
  (Fig.: KL converging to the set point over training steps t.)

  11. Non-linear PI controller. (Fig.: block diagram of the PI controller, with the desired KL value as set point, the observed KL as feedback, and the P and I terms summed into the objective weight.)
  $\gamma(t) = \dfrac{K_p}{1 + \exp(e(t))} - K_i \sum_{j=0}^{t} e(j)$
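  A minimal Python sketch of the non-linear PI controller on this slide; the clamping range [0, 1] and the variable names are our assumptions rather than details taken from the slide.

```python
import math

class NonlinearPI:
    """gamma(t) = Kp / (1 + exp(e(t))) - Ki * sum_j e(j), clamped to [gamma_min, gamma_max]."""
    def __init__(self, kp=0.01, ki=0.0001, gamma_min=0.0, gamma_max=1.0):
        self.kp, self.ki = kp, ki
        self.gamma_min, self.gamma_max = gamma_min, gamma_max
        self.i_term = 0.0    # accumulates Ki * e(j)

    def update(self, kl, set_point):
        error = set_point - kl                      # e(t) > 0 when KL is below the target
        p_term = self.kp / (1.0 + math.exp(error))  # non-linear P term: small when e(t) >> 0
        self.i_term += self.ki * error              # I term enters with a minus sign below
        gamma = p_term - self.i_term
        return min(max(gamma, self.gamma_min), self.gamma_max)
```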

  12. Evaluation
  Applications:
  • Language modeling: text and dialog generation
  • Disentangled representation learning
  • Image generation
  Benchmark datasets:
  o Language modeling: [1] Penn Tree Bank (PTB); [2] Switchboard (SW) telephone conversations
  o Disentanglement: dSprites
  o Image generation: CelebA

  13. Evaluation: language modeling (PTB data). (Fig. panels: (a) KL divergence, (b) recon. loss, (c) weight γ(t).)
  Baselines: 1) Cost annealing: gradually increases the weight on the KL-divergence from 0 to 1 over N steps using a sigmoid function. 2) Cyclical annealing: splits the training process into M cycles, each increasing the weight from 0 to 1 using a linear function (a sketch of this schedule follows).
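  As promised above, a hedged sketch of the cyclical annealing baseline schedule; the cycle count M and the ramp proportion r = 0.5 are illustrative defaults, not values from the slide.

```python
def cyclical_weight(step, total_steps, m_cycles=4, r=0.5):
    """KL weight under cyclical annealing: M cycles, each a linear ramp 0 -> 1, then flat."""
    period = total_steps / m_cycles
    tau = (step % period) / period   # position within the current cycle, in [0, 1)
    return min(tau / r, 1.0)         # ramp linearly for the first r of the cycle, then hold 1
```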

  14. Evaluation: language modeling on Switchboard (SW), measuring the diversity of the generated text.

  15. Evaluation: disentanglement (dSprites data). (Fig. panels: (a) recon. error, (b) weight γ(t), (c) disentangled factors.)
  Baselines: 1) β-VAE: Burgess, C. P., Higgins, I., Pal, A., Matthey, L., et al. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599, 2018. 2) FactorVAE: Kim, Hyunjik, and Andriy Mnih. Disentangling by Factorising. In International Conference on Machine Learning, pp. 2649-2658, 2018.

  16. Evaluation: disentanglement. β-VAE (β = 100), FactorVAE (γ = 10), ControlVAE (KL = 16); latent factors: x, y, scale, orientation, shape. (Fig.: latent traversals, varying a single latent dimension over the range [-3, 3].)

  17. Evaluation: image generation. (Fig. panels: (a) recon. loss, (b) KL divergence.)

  18. Conclusion
  • Proposed ControlVAE, a new controllable VAE that combines a PI controller with the basic VAE model.
  • Designed a new non-linear PI controller to automatically tune the weight on the KL term in the VAE objective.
  • ControlVAE not only averts KL vanishing but also controls the diversity of generated text.
  • Achieves better disentangling and reconstruction quality than existing methods.

  19. Thank you very much! Q&A

  20. Backup

  21. PI parameter tuning
  • Tune K_p for the P term: when the output KL(t) is very small, the error e(t) >> 0; e.g., K_p = 0.01.
  • Tune K_i for the I term: when the output KL(t) is very large, e(t) < 0; e.g., K_i = 0.001 or 0.0001.
  (Fig.: KL approaching the set point over training steps t.)
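  A usage sketch with the tuning values suggested on this slide (NonlinearPI is the hypothetical class sketched after slide 11, and the set point of 16 is illustrative):

```python
controller = NonlinearPI(kp=0.01, ki=0.0001)
# Early in training the observed KL is far below the set point,
# so e(t) >> 0 and the controller drives gamma(t) down to let the KL grow.
gamma_t = controller.update(kl=0.5, set_point=16.0)
```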

  22. Set-point guideline
  • The set point of the KL-divergence is largely application-specific.
  Ø Text generation: slightly increase the KL-divergence, denoted KL_vae, produced by the basic VAE or by the cost-annealing method.
  Ø ELBO improvement: the KL should be increased within the following bound.

  23. ELBO improvement
