
  1. Implicit Reparameterization Gradients Michael Figurnov, Shakir Mohamed, Andriy Mnih Poster: Room 210 #33

  2. Reparameterization gradients
  Core part of variational autoencoders, automatic variational inference, etc.: backpropagation in graphs with continuous random variables. A sample from a continuous distribution (Normal, ...) is expressed as a continuous, differentiable transformation of parameter-free noise, so the objective (ELBO, …) can be backpropagated through the sample. This requires a tractable inverse transformation, available only for a few distributions (Normal, Logistic, …). We show how to use implicit differentiation for reparameterization of other continuous random variables, such as Gamma and von Mises.
  Implicit Reparameterization Gradients — Michael Figurnov
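As an illustration of the standard (explicit) reparameterization described above, here is a minimal sketch (not from the slides; plain NumPy, with made-up parameter values) of a Normal sample written as a differentiable transform of parameter-free noise:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8  # illustrative parameter values

# Explicit reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
# All randomness lives in eps, so z is a differentiable function of (mu, sigma).
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Pathwise (reparameterization) gradients of E[z^2] w.r.t. the parameters,
# using dz/dmu = 1 and dz/dsigma = eps:
grad_mu = np.mean(2 * z * 1.0)     # analytic value: 2 * mu = 3.0
grad_sigma = np.mean(2 * z * eps)  # analytic value: 2 * sigma = 1.6
```

The Monte Carlo estimates match the analytic gradients up to sampling noise, which is exactly what backpropagating through the sampled `z` computes.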

  6. Explicit and implicit reparameterization
  Cumulative distribution function: F(z; φ).
  Explicit: sampling (forward pass) inverts the CDF, z = F⁻¹(ε; φ); gradients (backward pass) differentiate the inverse CDF, which is often not implemented in numerical libraries.
  Implicit: sampling (forward pass) uses any sampler (e.g., rejection sampling); gradients (backward pass) follow by implicit differentiation of the CDF: ∇φ z = −∇φ F(z; φ) / q(z; φ).
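As a quick numerical check of the implicit approach (not from the slides; a sketch using SciPy's Gamma distribution, which has no tractable inverse CDF), the gradient of a sample with respect to the shape parameter can be computed from the CDF and density alone, and compared against differentiating the inverse CDF directly:

```python
from scipy.stats import gamma

alpha, u = 3.0, 0.7      # illustrative shape parameter and fixed uniform noise
z = gamma.ppf(u, alpha)  # the sample z = F^{-1}(u; alpha)
eps = 1e-5

# Implicit gradient: dz/dalpha = -(dF/dalpha) / q(z; alpha).
# Only the CDF (here differentiated by central finite differences) and the
# density are needed; no inverse CDF appears in the gradient formula.
dF_dalpha = (gamma.cdf(z, alpha + eps) - gamma.cdf(z, alpha - eps)) / (2 * eps)
implicit_grad = -dF_dalpha / gamma.pdf(z, alpha)

# Explicit check: differentiate the inverse CDF w.r.t. alpha at fixed u.
explicit_grad = (gamma.ppf(u, alpha + eps) - gamma.ppf(u, alpha - eps)) / (2 * eps)
```

By the implicit function theorem the two estimates coincide; the implicit form is preferable whenever the inverse CDF (or its derivative) is unavailable.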

  13. How to compute ∇φ F(z; φ)? Relative metrics (lower is better):

  Method                                     | Gamma: Error | Time | Von Mises: Error | Time | Notes
  Automatic differentiation of the CDF code  | 1x           | 1x   | 1x               | 1x   |
  Finite difference                          | 832x         | 2x   | 514x             | 1.2x |
  Jankowiak & Obermeyer (2018)               | 18x          | 5x   | -                | -    | concurrent work; closed-form approximation
  Knowles (2015)                             | 2840x        | 63x  | -                | -    | approximate explicit reparameterization

  Knowles, “Stochastic gradient variational Bayes for Gamma approximating distributions.” arXiv, 2015.
  Jankowiak, Obermeyer, “Pathwise Derivatives Beyond the Reparameterization Trick.” ICML, 2018.

  15. Variational Autoencoder: 2D latent spaces for MNIST
  [Figure: latent space over [-3, 3] × [-3, 3]; Normal prior and posterior]
  [Figure: latent space on a torus, angles over [-π, π] × [-π, π]; Uniform prior, von Mises posterior. Torus adapted from https://en.wikipedia.org/wiki/Torus#/media/File:Sphere-like_degenerate_torus.gif]
  Also in the paper: Latent Dirichlet Allocation.
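The von Mises posterior used here likewise has no tractable inverse CDF, yet its samples admit implicit reparameterization gradients. A sketch (not from the slides; using SciPy's `vonmises`, with an illustrative concentration value) of the gradient of a sample with respect to the concentration κ, checked against numerically differentiating the inverse CDF:

```python
from scipy.stats import vonmises

kappa, u = 2.0, 0.7         # illustrative concentration and fixed uniform noise
z = vonmises.ppf(u, kappa)  # sample in (-pi, pi], via numerical CDF inversion
eps = 1e-4

# Implicit gradient: dz/dkappa = -(dF/dkappa) / q(z; kappa),
# using only the CDF (central finite differences in kappa) and the density.
dF_dkappa = (vonmises.cdf(z, kappa + eps) - vonmises.cdf(z, kappa - eps)) / (2 * eps)
implicit_grad = -dF_dkappa / vonmises.pdf(z, kappa)

# Check: differentiate the (numerically inverted) CDF w.r.t. kappa at fixed u.
explicit_grad = (vonmises.ppf(u, kappa + eps) - vonmises.ppf(u, kappa - eps)) / (2 * eps)
```

Increasing κ concentrates mass around the mode, so a quantile above the median moves toward it; both estimates agree on this negative gradient.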

  18. Implicit Reparameterization Gradients
  Michael Figurnov, Shakir Mohamed, Andriy Mnih
  ● A more general view of the reparameterization gradients
    ○ Decouple sampling from gradient estimation
  ● Reparameterization gradients for Gamma, von Mises, Beta, Dirichlet, ...
    ○ Faster and more accurate than the alternatives
  ● Implemented in TensorFlow Probability: tfp.distributions.{Gamma,VonMises,Beta,Dirichlet,...}
  ● Move away from making modelling choices for computational convenience
  Poster: Room 210 #33
