CS 4803 / 7643: Deep Learning
Topics: Variational Auto-Encoders (VAEs), Reparameterization Trick
Dhruv Batra, Georgia Tech


  1. CS 4803 / 7643: Deep Learning Topics: – Variational Auto-Encoders (VAEs) – Reparameterization trick Dhruv Batra Georgia Tech

  2. Administrativia • HW4 Grades Released – Regrade requests close: 12/03, 11:55pm – Please check solutions first! • Grade histogram: 7643 – Max possible: 100 (regular credit) + 40 (extra credit) (C) Dhruv Batra 2

  3. Administrativia • HW4 Grades Released – Regrade requests close: 12/03, 11:55pm – Please check solutions first! • Grade histogram: 4803 – Max possible: 100 (regular credit) + 40 (extra credit) (C) Dhruv Batra 3

  4. Recap from last time (C) Dhruv Batra 4

  5. Variational Autoencoders (VAE)

  6. So far... PixelCNNs define a tractable density function and directly optimize the likelihood of the training data; VAEs define an intractable density function with latent variable z, so the likelihood cannot be optimized directly. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
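The two density functions named above appeared as equation images on the original slide; a reconstruction of the standard forms from the CS 231n material:

```latex
% PixelCNN: tractable density, decomposed by the chain rule over pixels
p_\theta(x) = \prod_{i=1}^{n} p_\theta\big(x_i \mid x_1, \ldots, x_{i-1}\big)

% VAE: intractable density, marginalizing over the latent variable z
p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz
```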

  7. Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 7

  8. Autoencoders • Train such that the features can be used to reconstruct the original data. L2 loss function between input and reconstruction (doesn't use labels!). Diagram: Input data → Encoder (4-layer conv) → Features → Decoder (4-layer upconv) → Reconstructed input data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
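For concreteness, here is a minimal PyTorch sketch of the autoencoder just described; the fully-connected layer sizes are illustrative assumptions (the slide used a 4-layer conv encoder and 4-layer upconv decoder):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal autoencoder: input -> features -> reconstructed input."""
    def __init__(self, input_dim=784, feature_dim=32):
        super().__init__()
        # Encoder compresses the input to a low-dimensional feature vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )
        # Decoder reconstructs the input from the features
        self.decoder = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(16, 784)        # a batch of flattened inputs
x_hat = model(x)
loss = nn.MSELoss()(x_hat, x)   # L2 reconstruction loss -- no labels needed
loss.backward()
```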

  9. Autoencoders • Autoencoders can reconstruct data, and can learn features to initialize a supervised model. Features capture factors of variation in the training data. Can we generate new images from an autoencoder? Diagram: Input data → Encoder → Features → Decoder → Reconstructed input data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  10. Variational Autoencoders • Probabilistic spin on autoencoders - will let us sample from the model to generate data! Encoder: q_φ(z|x); Decoder: p_θ(x|z). Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

  11. Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 11

  12. Key problem • Computing the posterior P(z|x), which is intractable (C) Dhruv Batra 12

  13. What is Variational Inference? • Key idea – Reality is complex – Can we approximate it with something "simple"? – Just make sure the simple thing is "close" to the complex thing. (C) Dhruv Batra 13

  14. Intuition (C) Dhruv Batra 14

  15. The general learning problem with missing data • Marginal likelihood – x is observed, z is missing: $\ell\ell(\theta : \mathcal{D}) = \log \prod_{i=1}^{N} P(x_i \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log \sum_z P(x_i, z \mid \theta)$ (C) Dhruv Batra 15

  16. Jensen's inequality • Use: $\log \sum_z P(z)\, g(z) \;\ge\; \sum_z P(z) \log g(z)$ (C) Dhruv Batra 16

  17. Applying Jensen's inequality • Use: $\log \sum_z P(z)\, g(z) \;\ge\; \sum_z P(z) \log g(z)$ (C) Dhruv Batra 17
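The application itself was an image on the original slide; reconstructing the standard step, introduce an arbitrary distribution Q_i(z) over the missing variable and apply the inequality above to the marginal likelihood from slide 15:

```latex
\log \sum_z P(x_i, z \mid \theta)
  = \log \sum_z Q_i(z)\, \frac{P(x_i, z \mid \theta)}{Q_i(z)}
  \;\ge\; \sum_z Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}
```

Summing this per-example bound over i gives exactly the ELBO defined on the next slide.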

  18. Evidence Lower Bound • Define potential function F(θ, Q): $\ell\ell(\theta : \mathcal{D}) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_z Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}$ (C) Dhruv Batra 18

  19. ELBO: Factorization #1 (GMMs) • $\ell\ell(\theta : \mathcal{D}) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_z Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}$ (C) Dhruv Batra 19

  20. ELBO: Factorization #2 (VAEs) • $\ell\ell(\theta : \mathcal{D}) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_z Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}$ (C) Dhruv Batra 20
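The VAE factorization shown graphically on the slide takes Q_i(z) = q_φ(z | x_i) (the amortized encoder) and P(x_i, z | θ) = p(z) p_θ(x_i | z); substituting into the bound and splitting the log yields the familiar reconstruction-minus-KL form (standard algebra, reconstructed here because the slide body was an image):

```latex
F(\theta, \phi)
  = \sum_{i=1}^{N} \Big(
      \mathbb{E}_{q_\phi(z \mid x_i)}\big[\log p_\theta(x_i \mid z)\big]
      - \mathrm{KL}\big(q_\phi(z \mid x_i)\,\big\|\,p(z)\big)
    \Big)
```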

  21. Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 21

  22. Amortized Inference Neural Networks (C) Dhruv Batra 22
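Amortized inference replaces a separate optimization over Q_i(z) for every training example with a single network that maps x_i directly to the parameters of q_φ(z | x_i). A minimal PyTorch sketch, with illustrative (assumed) layer sizes and names:

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Amortized inference network: one forward pass of x yields the
    parameters (mean, log-variance) of the approximate posterior q_phi(z|x)."""
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.logvar_head = nn.Linear(256, latent_dim)  # log sigma^2 of q(z|x)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu_head(h), self.logvar_head(h)
```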

  23. VAEs (C) Dhruv Batra 23 Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

  24. Variational Autoencoders • Probabilistic spin on autoencoders - will let us sample from the model to generate data! Encoder: q_φ(z|x); Decoder: p_θ(x|z). Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

  25. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Diagram so far: Input data → Encoder network. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  26. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Diagram so far: Input data → Encoder network; make the approximate posterior distribution close to the prior. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  27. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Diagram so far: Input data → Encoder network → Sample z from the approximate posterior; make the approximate posterior distribution close to the prior. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  28. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Diagram so far: Input data → Encoder network → Sample z from the approximate posterior → Decoder network; make the approximate posterior distribution close to the prior. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  29. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Full diagram: Input data → Encoder network → Sample z from the approximate posterior → Decoder network → Sample x|z from the decoder; maximize the likelihood of the original input being reconstructed, and make the approximate posterior distribution close to the prior. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
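A compact sketch of this whole pipeline as a training objective in PyTorch; a minimal illustration assuming a Gaussian q_φ(z|x), a unit-Gaussian prior, and a Bernoulli-style reconstruction loss (not the exact architecture from the slides):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, input_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterized sample: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(x, x_logits, mu, logvar):
    # Reconstruction term: maximize log p_theta(x|z)
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl          # minimize the negative lower bound

vae = VAE()
x = torch.rand(16, 784)        # inputs scaled to [0, 1]
x_logits, mu, logvar = vae(x)
neg_elbo(x, x_logits, mu, logvar).backward()
```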

  30. Variational Auto Encoders: Generating Data • Use the decoder network, and now sample z from the prior! Diagram: Sample z from the prior → Decoder network → Sample x|z from the decoder. (Figure: data manifold for 2-d z, varying z_1 and z_2.) Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
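Reusing the hypothetical `VAE` sketch above (assume it has been trained), generation needs only the prior and the decoder:

```python
with torch.no_grad():
    z = torch.randn(64, 20)               # sample z from the prior N(0, I)
    samples = torch.sigmoid(vae.dec(z))   # decode z into new data samples
```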

  31. Variational Auto Encoders: Generating Data • Diagonal prior on z => independent latent variables. Different dimensions of z encode interpretable factors of variation. (Figure: varying z_1 changes degree of smile; varying z_2 changes head pose.) Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  32. Plan for Today • VAEs – Reparameterization trick (C) Dhruv Batra 32

  33. Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 33

  34. Variational Auto Encoders • Putting it all together: maximizing the likelihood lower bound. Full diagram: Input data → Encoder network → Sample z from the approximate posterior → Decoder network → Sample x|z from the decoder; maximize the likelihood of the original input being reconstructed, and make the approximate posterior distribution close to the prior. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  35. Variational Auto Encoders Putting it all together: maximizing the likelihood lower bound Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  36. Basic Problem • $\mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 36

  37. Basic Problem • Goal: $\min_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 37

  38. Basic Problem • Goal: $\min_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ • Need to compute: $\nabla_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 38

  39. Basic Problem • Need to compute: $\nabla_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 39
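The difficulty, expanded here in one step (the original slide body was not recovered): θ parameterizes the distribution being sampled, so the gradient does not pass inside the expectation, and the result is not itself an expectation that plain Monte Carlo sampling can estimate:

```latex
\nabla_\theta\, \mathbb{E}_{z \sim p_\theta(z)}[f(z)]
  = \nabla_\theta \sum_z p_\theta(z)\, f(z)
  = \sum_z f(z)\, \nabla_\theta\, p_\theta(z)
```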

  40. Example (C) Dhruv Batra 40

  41. Does this happen in supervised learning? • Goal: $\min_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 41

  42. But what about other kinds of learning? • Goal: $\min_\theta \mathbb{E}_{z \sim p_\theta(z)}[f(z)]$ (C) Dhruv Batra 42

  43. Two Options • Score Function based Gradient Estimator aka REINFORCE (and variants) • Path Derivative Gradient Estimator aka “reparameterization trick” (C) Dhruv Batra 43
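For a concrete feel for the second option, here is an illustrative sketch (not from the slides) of the reparameterization trick for z ~ N(μ, σ²): the sample is rewritten as a deterministic function of the parameters plus parameter-free noise, so gradients flow through z by ordinary backpropagation:

```python
import torch

mu = torch.tensor([0.5], requires_grad=True)
log_sigma = torch.tensor([0.0], requires_grad=True)

def f(z):
    return (z - 2.0) ** 2   # an arbitrary differentiable objective

# Path-derivative estimator: z = mu + sigma * eps with eps ~ N(0, 1),
# so z is a differentiable function of (mu, sigma)
eps = torch.randn(1000)
z = mu + torch.exp(log_sigma) * eps
loss = f(z).mean()          # Monte Carlo estimate of E[f(z)]
loss.backward()             # gradients w.r.t. mu and log_sigma flow through z
print(mu.grad, log_sigma.grad)
```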

  44. Option 1 • Score Function based Gradient Estimator aka REINFORCE (and variants) (C) Dhruv Batra 44
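The score-function identity behind Option 1 (the slide's own derivation was an image; this is the standard form):

```latex
\nabla_\theta\, \mathbb{E}_{z \sim p_\theta(z)}[f(z)]
  = \sum_z f(z)\, \nabla_\theta\, p_\theta(z)
  = \sum_z p_\theta(z)\, f(z)\, \nabla_\theta \log p_\theta(z)
  = \mathbb{E}_{z \sim p_\theta(z)}\big[f(z)\, \nabla_\theta \log p_\theta(z)\big]
```

Sampling z from p_θ(z) and averaging f(z) ∇_θ log p_θ(z) therefore gives an unbiased, though typically high-variance, estimate of the gradient.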
