CS 4803 / 7643: Deep Learning Topics: – Variational Auto-Encoders (VAEs) – Reparameterization trick Dhruv Batra Georgia Tech
Administrivia • HW4 Grades Released – Regrade requests close: 12/03, 11:55pm – Please check solutions first! • Grade histogram: 7643 – Max possible: 100 (regular credit) + 40 (extra credit) (C) Dhruv Batra 2
Administrivia • HW4 Grades Released – Regrade requests close: 12/03, 11:55pm – Please check solutions first! • Grade histogram: 4803 – Max possible: 100 (regular credit) + 40 (extra credit) (C) Dhruv Batra 3
Recap from last time (C) Dhruv Batra 4
Variational Autoencoders (VAE)
So far... PixelCNNs define a tractable density function and optimize the likelihood of the training data. VAEs define an intractable density function with latent z. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
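For reference, a sketch of the two density forms this slide contrasts (standard expressions, reconstructed here rather than copied from the slide image):

```latex
% PixelCNN: explicit, tractable density via the chain rule over pixels
p_\theta(x) = \prod_{i=1}^{n} p_\theta\!\left(x_i \mid x_1, \ldots, x_{i-1}\right)

% VAE: density with latent z; the integral over z is intractable,
% so a variational lower bound is optimized instead
p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz
```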
Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 7
Autoencoders. Train such that the features can be used to reconstruct the original data. Doesn't use labels! L2 loss function between input and reconstruction. Encoder: 4-layer conv. Decoder: 4-layer upconv. [Diagram: Input data → Encoder → Features → Decoder → Reconstructed input data] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
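A minimal PyTorch sketch of this encoder/feature/decoder setup with an L2 reconstruction loss; the fully-connected layer sizes are illustrative stand-ins for the slide's 4-layer conv/upconv networks:

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch (illustrative sizes, not the slide's conv architecture)
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, feature_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # features
        return self.decoder(z)   # reconstruction

model = Autoencoder()
x = torch.randn(16, 784)                 # a batch of flattened inputs
x_hat = model(x)
loss = ((x_hat - x) ** 2).mean()         # L2 reconstruction loss, no labels needed
loss.backward()
```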
Autoencoders. Autoencoders can reconstruct data, and can learn features to initialize a supervised model. Features capture factors of variation in the training data. Can we generate new images from an autoencoder? [Diagram: Input data → Encoder → Features → Decoder → Reconstructed input data] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Autoencoders. Probabilistic spin on autoencoders - will let us sample from the model to generate data! Encoder: q_φ(z|x). Decoder: p_θ(x|z). Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 11
Key problem • Computing the posterior P(z|x) is intractable (C) Dhruv Batra 12
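A sketch of why the posterior is the key problem (standard Bayes-rule form, not shown in the slide text itself):

```latex
p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(x)},
\qquad
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz
% The normalizer p_theta(x) integrates over all latent configurations z,
% which is intractable when p_theta(x|z) is a deep network; variational
% inference sidesteps this by approximating p_theta(z|x) with q_phi(z|x).
```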
What is Variational Inference? • Key idea – Reality is complex – Can we approximate it with something "simple"? – Just make sure the simple thing is "close" to the complex thing. (C) Dhruv Batra 13
Intuition (C) Dhruv Batra 14
The general learning problem with missing data • Marginal likelihood – x is observed, z is missing:
\ell(\theta : D) = \log \prod_{i=1}^{N} P(x_i \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log \sum_{z} P(x_i, z \mid \theta)
(C) Dhruv Batra 15
Jensen's inequality • Use: \log \sum_{z} P(z)\, g(z) \;\ge\; \sum_{z} P(z) \log g(z) (C) Dhruv Batra 16
Applying Jensen's inequality • Use: \log \sum_{z} P(z)\, g(z) \;\ge\; \sum_{z} P(z) \log g(z) (C) Dhruv Batra 17
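A sketch of how Jensen's inequality gets applied here (filling in the algebra the slide presumably shows graphically): multiply and divide by an arbitrary distribution Q_i(z) inside the log, then apply the inequality with g(z) = P(x_i, z | θ) / Q_i(z):

```latex
\log P(x_i \mid \theta)
  = \log \sum_{z} P(x_i, z \mid \theta)
  = \log \sum_{z} Q_i(z)\, \frac{P(x_i, z \mid \theta)}{Q_i(z)}
  \;\ge\; \sum_{z} Q_i(z)\, \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}
```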
Evidence Lower Bound • Define potential function F(θ, Q):
\ell(\theta : D) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_{z} Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}
(C) Dhruv Batra 18
ELBO: Factorization #1 (GMMs)
\ell(\theta : D) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_{z} Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}
(C) Dhruv Batra 19
ELBO: Factorization #2 (VAEs)
\ell(\theta : D) \;\ge\; F(\theta, Q) = \sum_{i=1}^{N} \sum_{z} Q_i(z) \log \frac{P(x_i, z \mid \theta)}{Q_i(z)}
(C) Dhruv Batra 20
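For the VAE factorization, the standard rearrangement of this bound into a reconstruction term and a KL term (a sketch of the per-example form the later slides optimize, with q_φ(z|x_i) playing the role of Q_i(z)):

```latex
F(\theta, \phi; x_i)
  = \mathbb{E}_{z \sim q_\phi(z \mid x_i)}\!\left[ \log p_\theta(x_i \mid z) \right]
  \;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid x_i) \,\|\, p(z) \right)
```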
Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 21
Amortized Inference Neural Networks (C) Dhruv Batra 22
VAEs (C) Dhruv Batra 23 Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder
Variational Autoencoders. Probabilistic spin on autoencoders - will let us sample from the model to generate data! Encoder: q_φ(z|x). Decoder: p_θ(x|z). Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network; make the approximate posterior distribution close to the prior] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network; make the approximate posterior distribution close to the prior → sample z from q_φ(z|x)] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network; make the approximate posterior distribution close to the prior → sample z from q_φ(z|x) → Decoder network] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network; make the approximate posterior distribution close to the prior → sample z from q_φ(z|x) → Decoder network → sample x|z from p_θ(x|z); maximize likelihood of the original input being reconstructed] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
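A minimal PyTorch sketch of this full pipeline, assuming a Gaussian q_φ(z|x), a unit Gaussian prior, and a Bernoulli decoder; the layer sizes and names are illustrative, not taken from the slides:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(input_dim, 400)
        self.enc_mu = nn.Linear(400, latent_dim)       # mean of q_phi(z|x)
        self.enc_logvar = nn.Linear(400, latent_dim)   # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, input_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)           # sample z via the reparameterization trick
        return self.dec(z), mu, logvar                 # decoder outputs logits for x|z

def neg_elbo(x, x_logits, mu, logvar):
    # Reconstruction term (Bernoulli decoder) plus closed-form KL(q_phi(z|x) || N(0, I))
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl   # minimizing this maximizes the likelihood lower bound

vae = VAE()
x = torch.rand(16, 784)                    # batch of inputs scaled to [0, 1]
x_logits, mu, logvar = vae(x)
loss = neg_elbo(x, x_logits, mu, logvar)
loss.backward()
```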
Variational Auto Encoders: Generating Data. Use the decoder network. Now sample z from the prior! [Figure: data manifold for 2-d z, with images laid out by varying z_1 along one axis and z_2 along the other; diagram: sample z from the prior → Decoder network → sample x|z from p_θ(x|z)] Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
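Generation with the (hypothetical) VAE sketched above: sample z from the unit Gaussian prior and push it through the decoder only, with no encoder involved:

```python
with torch.no_grad():
    z = torch.randn(64, 20)                 # z ~ p(z) = N(0, I), the prior
    samples = torch.sigmoid(vae.dec(z))     # decoder maps z to pixel-space means
```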
Variational Auto Encoders: Generating Data. Diagonal prior on z => independent latent variables. Different dimensions of z encode interpretable factors of variation. [Figure: varying z_1 changes the degree of smile; varying z_2 changes the head pose] Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Plan for Today • VAEs – Reparameterization trick (C) Dhruv Batra 32
Variational Auto Encoders VAEs are a combination of the following ideas: 1. Auto Encoders 2. Variational Approximation • Variational Lower Bound / ELBO 3. Amortized Inference Neural Networks 4. “Reparameterization” Trick (C) Dhruv Batra 33
Variational Auto Encoders. Putting it all together: maximizing the likelihood lower bound. [Diagram: Input data → Encoder network; make the approximate posterior distribution close to the prior → sample z from q_φ(z|x) → Decoder network → sample x|z from p_θ(x|z); maximize likelihood of the original input being reconstructed] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Variational Auto Encoders Putting it all together: maximizing the likelihood lower bound Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Basic Problem
\mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right]
(C) Dhruv Batra 36
Basic Problem • Goal: \min_{\theta} \; \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] (C) Dhruv Batra 37
Basic Problem • Goal: \min_{\theta} \; \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] • Need to compute: \nabla_{\theta} \, \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] (C) Dhruv Batra 38
Basic Problem • Need to compute: \nabla_{\theta} \, \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] (C) Dhruv Batra 39
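The difficulty, written out (a standard observation, not copied from the slide): the sampling distribution itself depends on θ, so the gradient cannot be pushed inside the expectation as a simple Monte Carlo average:

```latex
\nabla_{\theta}\, \mathbb{E}_{z \sim p_\theta(z)}\!\left[ f(z) \right]
  = \nabla_{\theta} \sum_{z} p_\theta(z)\, f(z)
  = \sum_{z} f(z)\, \nabla_{\theta}\, p_\theta(z)
% The right-hand side is no longer an expectation under p_theta(z), so it
% cannot be estimated by sampling z ~ p_theta(z) and averaging gradients of f.
```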
Example (C) Dhruv Batra 40
Does this happen in supervised learning? • Goal: \min_{\theta} \; \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] (C) Dhruv Batra 41
But what about other kinds of learning? • Goal: \min_{\theta} \; \mathbb{E}_{z \sim p_\theta(z)} \left[ f(z) \right] (C) Dhruv Batra 42
Two Options • Score Function based Gradient Estimator aka REINFORCE (and variants) • Path Derivative Gradient Estimator aka “reparameterization trick” (C) Dhruv Batra 43
Option 1 • Score Function based Gradient Estimator aka REINFORCE (and variants) (C) Dhruv Batra 44
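A toy sketch contrasting the two estimators on a Gaussian example, using the score-function identity ∇_θ E_{z∼p_θ}[f(z)] = E_{z∼p_θ}[f(z) ∇_θ log p_θ(z)]; the setup (f(z) = z², z ~ N(μ, 1)) is hypothetical and only meant to show that both give the same gradient in expectation:

```python
import torch

mu = torch.tensor(1.5, requires_grad=True)
f = lambda z: z ** 2
n = 100_000

# Option 1: score-function (REINFORCE) estimator
#   grad_mu E[f(z)] = E[f(z) * d/dmu log N(z; mu, 1)] = E[f(z) * (z - mu)]
z = mu.detach() + torch.randn(n)
score_grad = (f(z) * (z - mu.detach())).mean()

# Option 2: path-derivative (reparameterization) estimator
#   write z = mu + eps with eps ~ N(0, 1), so autograd differentiates through z
eps = torch.randn(n)
f(mu + eps).mean().backward()
reparam_grad = mu.grad

print(score_grad.item(), reparam_grad.item())   # both approximately 2 * mu = 3.0
```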