1. Variational Laplace Autoencoders. Yookoon Park, Chris Dongjoo Kim and Gunhee Kim. Vision and Learning Lab, Seoul National University, South Korea

  2. Introduction - Variational Autoencoders - Two Challenges of Amortized Variational Inference - Contributions

3. Variational Autoencoders (VAEs)
• Generative network $\theta$: $p_\theta(x \mid z) = \mathcal{N}(h_\theta(z), \sigma^2 I)$, with prior $p(z) = \mathcal{N}(0, I)$
• Inference network $\phi$: amortized inference of $p_\theta(z \mid x)$, $q_\phi(z \mid x) = \mathcal{N}(\mu_\phi(x), \mathrm{diag}(\sigma^2_\phi(x)))$
• The networks are jointly trained by maximizing the Evidence Lower Bound (ELBO):
$\mathcal{L}(x) = \mathbb{E}_q[\log p_\theta(x, z) - \log q_\phi(z \mid x)] = \log p_\theta(x) - D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)) \le \log p_\theta(x)$
References: Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. In ICLR, 2014. Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.
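A minimal PyTorch sketch of this objective (a single-sample Monte Carlo estimate; `encoder` and `decoder` are hypothetical modules, and the decoder variance $\sigma^2$ is a fixed scalar as on the slide):

    import math
    import torch

    def gaussian_vae_elbo(x, encoder, decoder, sigma2=1.0):
        # Single-sample estimate of L(x) = E_q[log p(x|z) - log q(z|x)]
        # for p(x|z) = N(h(z), sigma2 * I) and prior p(z) = N(0, I).
        mu, logvar = encoder(x)                               # q(z|x) parameters
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterization trick
        x_hat = decoder(z)                                    # h_theta(z)
        log_px_z = -0.5 * ((x - x_hat) ** 2 / sigma2
                           + math.log(2 * math.pi * sigma2)).sum(dim=-1)
        # Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(dim=-1)
        return (log_px_z - kl).mean()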

4. Two Challenges of Amortized Variational Inference
1. Enhancing the expressiveness of $q_\phi(z \mid x)$
• The fully-factorized Gaussian assumption is too restrictive to capture complex posteriors
• E.g. normalizing flows (Rezende & Mohamed, 2015; Kingma et al., 2016)
2. Reducing the amortization error of $q_\phi(z \mid x)$
• The error due to the inaccuracy of the inference network
• E.g. gradient-based refinement of $q_\phi(z \mid x)$ (Kim et al., 2018; Marino et al., 2018; Krishnan et al., 2018)
References: Rezende, D. J. and Mohamed, S. Variational inference with normalizing flows. In ICML, 2015. Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. Improved variational inference with inverse autoregressive flow. In NeurIPS, 2016. Kim, Y., Wiseman, S., Miller, A. C., Sontag, D., and Rush, A. M. Semi-amortized variational autoencoders. In ICML, 2018. Marino, J., Yue, Y., and Mandt, S. Iterative amortized inference. In ICML, 2018. Krishnan, R. G., Liang, D., and Hoffman, M. D. On the challenges of learning with inference networks on sparse, high-dimensional data. In AISTATS, 2018.

5. Contributions
• The Laplace approximation of the posterior to improve the training of deep latent-variable generative models, with:
1. Enhanced expressiveness from a full-covariance Gaussian posterior
2. Reduced amortization error, since the covariance is computed directly from the behavior of the generative network
• A novel posterior inference procedure exploiting the local linearity of ReLU networks

  6. Approach - Posterior Inference using Local Linear Approximations - Generalization: Variational Laplace Autoencoders

7. Observation 1: Probabilistic PCA
• A linear Gaussian model (Tipping & Bishop, 1999): $p(z) = \mathcal{N}(0, I)$, $p_\theta(x \mid z) = \mathcal{N}(Wz + b, \sigma^2 I)$
• The posterior distribution is exact: $p_\theta(z \mid x) = \mathcal{N}\left(\frac{1}{\sigma^2}\Sigma W^\top (x - b),\; \Sigma\right)$, where $\Sigma = \left(\frac{1}{\sigma^2} W^\top W + I\right)^{-1}$
• Toy example: 1-dim pPCA on 2-dim data
Reference: Tipping, M. E. and Bishop, C. M. Probabilistic principal component analysis. J. R. Statist. Soc. B, 61(3):611–622, 1999.
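Since the posterior is available in closed form here, a small NumPy sketch can compute it directly (names are illustrative):

    import numpy as np

    def ppca_posterior(x, W, b, sigma2):
        # Exact posterior p(z|x) = N(m, S) for p(z) = N(0, I),
        # p(x|z) = N(Wz + b, sigma2 * I).
        d = W.shape[1]
        S = np.linalg.inv(W.T @ W / sigma2 + np.eye(d))  # posterior covariance
        m = S @ W.T @ (x - b) / sigma2                   # posterior mean
        return m, S

    # Toy example mirroring the slide: 1-dim latent, 2-dim data
    W = np.array([[2.0], [1.0]])
    m, S = ppca_posterior(np.array([1.5, 0.5]), W, b=np.zeros(2), sigma2=0.1)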

8. Observation 2: Piece-wise Linear ReLU Networks
• ReLU networks are piece-wise linear (Pascanu et al., 2014; Montufar et al., 2014): $h_\theta(z) \approx W_z z + b_z$
• Hence a ReLU decoder is locally equivalent to probabilistic PCA: $p_\theta(x \mid z) \approx \mathcal{N}(W_z z + b_z, \sigma^2 I)$
• Toy example: 1-dim ReLU VAE on 2-dim data
References: Pascanu, R., Montufar, G., and Bengio, Y. On the number of response regions of deep feedforward networks with piecewise linear activations. In ICLR, 2014. Montufar, G., Pascanu, R., Cho, K., and Bengio, Y. On the number of linear regions of deep neural networks. In NeurIPS, 2014.
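The local linear form $(W_z, b_z)$ can be read off any ReLU decoder with automatic differentiation. A short sketch (hypothetical toy decoder, single unbatched input) that also checks the linearity numerically:

    import torch
    import torch.nn as nn

    # Toy ReLU decoder: 1-dim latent to 2-dim observation
    decoder = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 2))

    def local_linear_form(f, z):
        # W_z is the Jacobian of f at z; b_z the offset. Within the
        # linear region containing z, f(z') = W_z z' + b_z exactly.
        W_z = torch.autograd.functional.jacobian(f, z)
        b_z = f(z).detach() - W_z @ z
        return W_z, b_z

    z = torch.randn(1)
    W_z, b_z = local_linear_form(decoder, z)
    # A small perturbation (almost surely) stays in the same region:
    dz = 1e-4 * torch.randn(1)
    assert torch.allclose(decoder(z + dz), W_z @ (z + dz) + b_z, atol=1e-5)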

9. Posterior Inference using Local Linear Approximations
• Observation 1: linear Gaussian models admit an exact posterior distribution
• Observation 2: ReLU networks are locally linear
• Combining the two yields a posterior approximation based on local linearity

10. Posterior Inference using Local Linear Approximations
1. Iteratively find the posterior mode $\mu$, where the density is concentrated:
• Solve under the local linear assumption $h_\theta(\mu_t) \approx W_t \mu_t + b_t$:
$\mu_{t+1} = \left(\frac{1}{\sigma^2} W_t^\top W_t + I\right)^{-1} \frac{1}{\sigma^2} W_t^\top (x - b_t)$
• Repeat for T steps
2. Approximate the posterior using $p_\theta(x \mid z) \approx \mathcal{N}(W_\mu z + b_\mu, \sigma^2 I)$:
$q(z \mid x) = \mathcal{N}(\mu, \Sigma)$, where $\Sigma = \left(\frac{1}{\sigma^2} W_\mu^\top W_\mu + I\right)^{-1}$
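A compact sketch of this procedure (single unbatched example; `decoder` is a hypothetical ReLU decoder, re-linearized with autograd at each iterate as in the snippet above):

    import torch

    def vlae_posterior(decoder, x, z0, sigma2=1.0, T=4):
        jac = torch.autograd.functional.jacobian
        z = z0.detach()
        I = torch.eye(z.shape[0])
        for _ in range(T):
            W = jac(decoder, z)                     # local weights W_t
            b = decoder(z).detach() - W @ z         # local offset b_t
            A = W.T @ W / sigma2 + I                # posterior precision
            z = torch.linalg.solve(A, W.T @ (x - b) / sigma2)  # mu_{t+1}
        # Full-covariance Gaussian q(z|x) = N(mu, Sigma) at the mode
        W = jac(decoder, z)
        Sigma = torch.linalg.inv(W.T @ W / sigma2 + I)
        return z, Sigma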

11. Generalization: Variational Laplace Autoencoders
1. Find the posterior mode such that $\nabla_z \log p(x, z)\,\big|_{z=\mu} = 0$
• Initialize $\mu_0$ using the inference network
• Iteratively refine $\mu_t$ (e.g. by gradient descent)
2. The Laplace approximation defines the posterior as
$q(z \mid x) = \mathcal{N}(\mu, \Sigma)$, where $\Sigma^{-1} = \Lambda = -\nabla_z^2 \log p(x, z)\,\big|_{z=\mu}$
3. Evaluate the ELBO using $q(z \mid x)$ and train the model
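In this general form, both the mode search and the curvature can be delegated to autograd. A hedged sketch, where `log_joint` is a hypothetical scalar-valued function $z \mapsto \log p(x, z)$:

    import torch

    def laplace_posterior(log_joint, mu0, steps=10, lr=0.1):
        # 1. Refine the mode by gradient ascent on log p(x, z),
        #    starting from the inference network's output mu0.
        mu = mu0.detach().clone().requires_grad_(True)
        opt = torch.optim.SGD([mu], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            (-log_joint(mu)).backward()
            opt.step()
        # 2. Precision = negative Hessian of log p(x, z) at the mode.
        mu = mu.detach()
        Lambda = -torch.autograd.functional.hessian(log_joint, mu)
        Sigma = torch.linalg.inv(Lambda)
        return mu, Sigma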

  12. Results - Posterior Covariance - Log-likelihood Results

13. Experiments
• Image datasets: MNIST, OMNIGLOT, Fashion MNIST, SVHN, CIFAR10
• Baselines:
• VAE
• Semi-Amortized (SA) VAE (Kim et al., 2018)
• VAE + Householder Flows (HF) (Tomczak & Welling, 2016)
• Variational Laplace Autoencoder (VLAE)
• T = 1, 2, 4, 8 (number of iterative updates or flows)

  14. Posterior Covariance Matrices

15. Log-likelihood Results on CIFAR10
[Bar chart: test log-likelihood for VAE, SA-VAE, VAE+HF, and VLAE at T = 1, 2, 3, 4; y-axis spans roughly 2350 to 2390]

16. Thank you! Visit our poster session at Pacific Ballroom #2. Code available at: https://github.com/yookoon/VLAE
