MCMC and Variational Inference for AutoEncoders


  1. MCMC and Variational Inference for AutoEncoders
  Achille Thin¹, Alain Durmus², Eric Moulines¹
  ¹Ecole Polytechnique, ²ENS Paris-Saclay
  September 9, 2020

  2. Outline
  ◮ Introduction
  ◮ Deep Latent Generative Models (DLGMs)
  ◮ MetFlow and MetVAE: MCMC & VI
  ◮ From classical to Flow-based MCMC
  ◮ Experiments

  3. Problem

  4. Generative modelling objective
  ◮ Objective: learn and sample from a model of the true underlying data distribution p∗, given a dataset {x_1, ..., x_n} where x_i ∈ R^P with P ≫ 1.
  ◮ Two steps:
    - Specify a class of models {p_θ, θ ∈ Θ}.
    - Find the best θ̂_n by maximizing the likelihood: θ̂_n = argmax_θ Σ_{i=1}^n log p_θ(x_i).
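
  To make the two-step recipe concrete, here is a minimal sketch (not from the talk) of maximum likelihood by stochastic gradient ascent on a toy model where log p_θ(x) is tractable; the names mu and log_sigma are illustrative.

```python
import torch

# Toy dataset {x_1, ..., x_n} and a simple model class p_theta = N(mu, sigma^2).
x = torch.randn(1000) * 2.0 + 3.0
mu = torch.zeros(1, requires_grad=True)          # theta = (mu, log_sigma)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    batch = x[torch.randint(0, len(x), (64,))]   # mini-batch for stochastic gradients
    log_lik = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(batch).sum()
    loss = -log_lik                               # maximize sum_i log p_theta(x_i)
    opt.zero_grad(); loss.backward(); opt.step()
```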

  5. Deep Latent Generative Models (DLGMs)
  ◮ Markov Chain Monte Carlo (MCMC)
  ◮ Variational Inference
  ◮ Implementation & Deep Learning

  6. Latent variable modelling
  ◮ Autoencoders assume the existence of a latent variable whose dimension D is much smaller than the dimension P of the observation.
  ◮ Attached to the latent variable z ∈ R^D is a prior distribution π from which we can sample.
  ◮ The specification of the model is completed by the conditional distribution of the observation x given the latent variable z: x | z ∼ p_θ(x | z).
  ◮ The marginal likelihood of the observations is obtained by first forming the joint distribution of the observation and the latent variable, p_θ(x, z) = p_θ(x | z) π(z), and then marginalizing w.r.t. the latent variable z: p_θ(x) = ∫ p_θ(x | z) π(z) dz.
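
  Since this integral has no closed form in general, a naive Monte Carlo estimate draws z from the prior; a hedged sketch, where decoder_log_prob is a hypothetical function returning log p_θ(x | z) for a batch of latent samples.

```python
import torch

def log_marginal_mc(x, decoder_log_prob, D, n_samples=1000):
    z = torch.randn(n_samples, D)                 # z_i ~ pi = N(0, I_D)
    log_px_given_z = decoder_log_prob(x, z)       # log p_theta(x | z_i), shape (n_samples,)
    # log p_theta(x) ~= log (1/N) sum_i p_theta(x | z_i), computed stably in log space
    return torch.logsumexp(log_px_given_z, dim=0) - torch.log(torch.tensor(float(n_samples)))
```

  In practice this prior-based estimator has very high variance in high dimension, which is part of the motivation for the posterior-based approaches discussed next.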

  7. Data generation with latent variables
  ◮ Draw a latent variable z ∼ π.
  ◮ Draw an observation x | z ∼ p_θ(x | z).
  ◮ Each region of the latent space is associated with a particular form of observation.
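
  As an illustration, ancestral sampling from the generative model takes two lines; a sketch assuming a standard Gaussian prior and a hypothetical decoder network mapping z to Bernoulli logits (e.g. for binarized images).

```python
import torch

def generate(decoder, n, D):
    z = torch.randn(n, D)                              # z ~ pi = N(0, I_D)
    logits = decoder(z)                                # parameters of p_theta(x | z)
    return torch.bernoulli(torch.sigmoid(logits))      # x | z ~ Bernoulli(sigmoid(logits))
```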

  8. Optimisation of the model
  ◮ Estimation: perform maximum likelihood estimation with stochastic gradient techniques.
  ◮ This requires unbiased estimators of the gradient of p_θ(x) = ∫ p_θ(x | z) π(z) dz.
  ◮ Usually intractable!

  9. Fisher's identity
  ◮ Idea: take advantage of Fisher's identity:
    ∇_θ log p_θ(x) = ∫ [∇_θ p_θ(x, z) / p_θ(x)] dz
                   = ∫ ∇_θ log p_θ(x, z) [p_θ(x, z) / p_θ(x)] dz
                   = ∫ ∇_θ log p_θ(x, z) p_θ(z | x) dz.
  ◮ The gradient of the incomplete likelihood of the observations is thus computed using the complete likelihood (which is tractable!).
  ◮ However, we need to sample from the posterior p_θ(z | x).
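
  In code, Fisher's identity turns posterior samples into an unbiased gradient estimate; a minimal sketch, where log_joint is a hypothetical differentiable function computing log p_θ(x, z) = log p_θ(x | z) + log π(z), and posterior_samples are draws from p_θ(z | x) (e.g. produced by the MCMC methods below).

```python
import torch

def grad_log_evidence(log_joint, params, x, posterior_samples):
    # grad_theta log p_theta(x) ~= (1/N) sum_i grad_theta log p_theta(x, z_i), z_i ~ p_theta(. | x)
    objective = torch.stack([log_joint(x, z) for z in posterior_samples]).mean()
    return torch.autograd.grad(objective, params)
```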

  10. Markov Chain Monte Carlo
  ◮ Idea: build an ergodic Markov chain whose invariant distribution is the target, known up to a normalization constant: p_θ(z | x) ∝ π(z) p_θ(x | z).
  ◮ The Metropolis-Hastings (MH) algorithm is one option:
    - Draw a proposal z′ from q_θ(z′ | z, x).
    - Accept/reject the proposal with probability α_θ(z, z′) = 1 ∧ [p_θ(z′ | x) q_θ(z | z′, x)] / [p_θ(z | x) q_θ(z′ | z, x)].
  ◮ Figure: Markov chain targeting a correlated Gaussian distribution.
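
  A minimal random-walk Metropolis-Hastings sketch targeting the unnormalized posterior; log_target is a hypothetical function returning log π(z) + log p_θ(x | z), and the symmetric Gaussian proposal makes q cancel in the acceptance ratio.

```python
import torch

def metropolis_hastings(log_target, z0, n_steps=1000, step_size=0.5):
    z, log_p, samples = z0.clone(), log_target(z0), []
    for _ in range(n_steps):
        z_prop = z + step_size * torch.randn_like(z)   # proposal z' ~ q(. | z), symmetric
        log_p_prop = log_target(z_prop)
        # accept with probability alpha = 1 ^ p_theta(z' | x) / p_theta(z | x)
        if torch.rand(()) < torch.exp(log_p_prop - log_p):
            z, log_p = z_prop, log_p_prop
        samples.append(z.clone())
    return torch.stack(samples)
```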

  11. Markov Chain Monte Carlo
  ◮ Many recent advances in efficient MCMC methods, using Langevin dynamics or Hamiltonian Monte Carlo.
  ◮ Pros: a theoretically sound framework to sample from p_θ(z | x) ∝ p_θ(x | z) π(z) (known up to a constant).
  ◮ Cons:
    - mixing times in high dimensions,
    - convergence assessment,
    - multimodality (metastability).
  ◮ But the cons do not always outweigh the pros, see [HM19].

  12. Variational Inference
  ◮ Idea: introduce a parametric family of probability distributions Q = {q_φ, φ ∈ Φ}.
  ◮ Goal: minimize a divergence between q_φ and the intractable posterior p_θ(· | x).
  ◮ Each observation x has a different target posterior p_θ(z | x).
  ◮ Idea: use amortized variational inference: x ↦ q_φ(z | x).

  13. Variational Inference
  ◮ Evidence Lower BOund (ELBO):
    ELBO(θ, φ; x) = ∫ log[p_θ(x, z) / q_φ(z | x)] q_φ(z | x) dz
                  = ∫ log[p_θ(z | x) p_θ(x) / q_φ(z | x)] q_φ(z | x) dz
                  = log p_θ(x) − KL(q_φ(· | x) ‖ p_θ(· | x)) ≤ log p_θ(x).
  ◮ The ELBO is a lower bound on the incomplete-data likelihood, also referred to as the evidence.
    - The bound is tight if Q contains the true posterior p_θ(· | x).
  ◮ The KL divergence measures the discrepancy incurred when approximating the posterior with the variational distribution.
    - It can be replaced by an f-divergence.
  ◮ The ELBO is tractable and can be easily optimized using the reparameterization trick, crucial for stochastic gradient descent.
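
  A hedged sketch of a one-sample ELBO estimate using the reparameterization trick, assuming a Gaussian mean-field encoder returning (mu, log_var) and a Bernoulli decoder returning logits; the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def elbo(x, encoder, decoder):
    mu, log_var = encoder(x)
    z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)    # reparameterized z ~ q_phi(z | x)
    logits = decoder(z)
    log_px_given_z = -F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    # closed-form KL(q_phi(z | x) || N(0, I)) for a diagonal Gaussian encoder
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return log_px_given_z - kl                                # one-sample estimate of ELBO(theta, phi; x)
```

  Because z is written as a deterministic function of (mu, log_var) and the noise, gradients flow back to the encoder parameters, which is exactly what the reparameterization trick buys for stochastic gradient descent.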

  14. Variational Auto Encoder
  The Variational Auto Encoder (VAE) builds on the representational power of (deep) neural networks to implement a very flexible class of encoders q_φ(z | x) and decoders p_θ(x | z).
  ◮ The encoder q_φ is parameterized by a deep neural network which takes as input the observation x and outputs the parameters of the distribution q_φ(· | x).
  ◮ The decoder p_θ(x | z) is built symmetrically, as a neural network which takes as input a latent variable z and outputs the parameters of the distribution p_θ(x | z).
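
  A sketch of the two networks, assuming flattened observations of dimension P and latent dimension D; the hidden size and architecture are illustrative, not the exact ones from the paper.

```python
import torch.nn as nn

class Encoder(nn.Module):                     # x -> parameters (mu, log_var) of q_phi(. | x)
    def __init__(self, P, D, hidden=400):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(P, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, D)
        self.log_var = nn.Linear(hidden, D)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):                     # z -> parameters (Bernoulli logits) of p_theta(. | z)
    def __init__(self, P, D, hidden=400):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(D, hidden), nn.ReLU(), nn.Linear(hidden, P))

    def forward(self, z):
        return self.body(z)
```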

  15. "Classical" implementation
  ◮ In most examples, the dimension P of the observation x is large.
  ◮ The dimension D of the latent space is typically much smaller.
  ◮ The distribution π of the latent variable is Gaussian.
    - More sophisticated priors can be considered: Gaussian mixtures or hierarchical priors.
  ◮ In the vanilla implementation, the variational distribution q_φ(· | x) is q_φ(z | x) = N(z; μ_φ(x), σ_φ(x) Id), where μ_φ(x) and σ_φ(x) are the outputs of a neural network taking the observation x as input. This parameterization is often referred to as the mean-field approximation.
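
  Putting the pieces together, a hedged sketch of a vanilla VAE training loop using the illustrative Encoder, Decoder, and elbo sketched above; loader stands for a hypothetical DataLoader of flattened, binarized images, and P = 784, D = 20 are just example sizes.

```python
import torch

encoder, decoder = Encoder(P=784, D=20), Decoder(P=784, D=20)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(10):
    for x in loader:                          # x: (batch, 784) binarized images
        loss = -elbo(x, encoder, decoder)     # maximize the ELBO jointly in (theta, phi)
        opt.zero_grad(); loss.backward(); opt.step()
```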
