Advanced Machine Learning
Variational Auto-encoders
Amit Sethi, EE, IITB
Objectives • Learn how VAEs help in sampling from a data distribution • Write the objective function of a VAE • Derive how the VAE objective is adapted for SGD
VAE setup • We are interested in maximizing the data likelihood P(X) = ∫ P(X|z; θ) P(z) dz • Let P(X|z; θ) be modeled by a network f(z; θ) • Further, let us assume that P(X|z; θ) = N(X | f(z; θ), σ²I) Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
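A minimal sketch of this decoder assumption in PyTorch: f(z; θ) is a small MLP and log P(X|z; θ) is a Gaussian log-density around its output. The layer sizes, σ value, and function names are illustrative, not the lecture's code.

```python
# Sketch: P(X|z; theta) = N(X | f(z; theta), sigma^2 I), with f a small decoder MLP.
import torch
import torch.nn as nn

latent_dim, data_dim, sigma = 2, 784, 1.0

decoder = nn.Sequential(                       # f(z; theta)
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim),
)

def log_p_x_given_z(x, z):
    """log N(x | f(z), sigma^2 I), summed over data dimensions."""
    mean = decoder(z)
    return torch.distributions.Normal(mean, sigma).log_prob(x).sum(dim=-1)
```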
We do not care about the distribution of z • Latent variable z is drawn from a standard normal, z ~ N(0, I) • [Figure: graphical model in which z and parameters θ generate X, with a plate over the N data points] • It may represent many different variations of the data Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
Example of a variable transformation • Samples z from a 2-D standard normal are pushed through X = g(z) = z/10 + z/‖z‖, which places them approximately on a ring Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
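This transformation is easy to reproduce numerically; the short NumPy sketch below (sample count and seed are arbitrary) shows that the mapped points have nearly constant radius, i.e. they form a ring.

```python
# Doersch's example: z ~ N(0, I) in 2-D, mapped through g(z) = z/10 + z/||z||.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))                       # z ~ N(0, I)
X = z / 10 + z / np.linalg.norm(z, axis=1, keepdims=True)

radii = np.linalg.norm(X, axis=1)
print(radii.mean(), radii.std())                         # radii cluster just above 1
```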
Because of the Gaussian assumption, the most obvious variation may not be the most likely • Although the ‘2’ on the right is a better choice as a variation of the one on the left, the one in the middle is more likely under the pixel-wise Gaussian likelihood Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
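A toy numeric illustration of the same effect (random pixels stand in for the digit images on the slide, so the numbers are only indicative): a small local corruption costs far less pixel-wise Gaussian log-likelihood than shifting the whole image.

```python
# Toy: under N(X | X_ref, sigma^2 I), a locally corrupted copy can score higher
# than a globally shifted but perceptually better copy.
import numpy as np

rng = np.random.default_rng(0)
x_ref = rng.random((28, 28))                 # stands in for the reference '2'

x_corrupt = x_ref.copy()
x_corrupt[10:13, 10:13] = 0.0                # small local corruption ("middle" image)

x_shift = np.roll(x_ref, 2, axis=1)          # whole image shifted by 2 pixels ("right" image)

def gaussian_log_lik(x, ref, sigma=0.1):
    return -0.5 * np.sum((x - ref) ** 2) / sigma**2

print(gaussian_log_lik(x_corrupt, x_ref))    # higher (less negative)
print(gaussian_log_lik(x_shift, x_ref))      # much lower, despite being a "better" variation
```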
Sampling z from the standard normal is problematic • It may give samples of z that are unlikely to have produced X • Can we sample z itself intelligently? • Enter Q(z|X): compute, e.g., E z~Q P(X|z) instead of averaging over the prior • All we need is to relate E z~Q P(X|z) to P(X), which is done through the KL divergence between Q(z|X) and P(z|X) • Hence, a variational method Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
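A sketch of why naive prior sampling is wasteful, on a 1-D toy model (the "decoder" P(x|z) = N(x | z², σ²) is invented here purely for illustration): almost all prior samples contribute nothing to the Monte-Carlo estimate of P(X).

```python
# Naive Monte-Carlo estimate of P(X) = E_{z ~ N(0,I)}[P(X|z)] on a toy model.
import numpy as np

rng = np.random.default_rng(0)

def p_x_given_z(x, z, sigma=0.1):
    """Toy 'decoder': P(x|z) = N(x | z**2, sigma^2)."""
    return np.exp(-0.5 * (x - z**2) ** 2 / sigma**2) / (sigma * np.sqrt(2 * np.pi))

x_obs = 4.0                                  # only z near +/- 2 explains this x
z = rng.standard_normal(10_000)              # z ~ N(0, 1)
weights = p_x_given_z(x_obs, z)
print(weights.mean())                        # crude estimate of P(X)
print((weights > 1e-3).mean())               # tiny fraction of "useful" samples
```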
VAE Objective Setup
D[Q(z) ‖ P(z|X)] = E z~Q [log Q(z) − log P(z|X)]
= E z~Q [log Q(z) − log P(X|z) − log P(z)] + log P(X)
Rearranging some terms:
log P(X) − D[Q(z) ‖ P(z|X)] = E z~Q [log P(X|z)] − D[Q(z) ‖ P(z)]
Introducing dependency of Q on X:
log P(X) − D[Q(z|X) ‖ P(z|X)] = E z~Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
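The last identity can be verified numerically on a 1-D conjugate-Gaussian toy model where every term has a closed form. The model choice (z ~ N(0,1), X|z ~ N(z, σ²)) and the arbitrary Q are illustrative assumptions, not part of the lecture.

```python
# Check: log P(X) - D[Q(z|X)||P(z|X)] = E_{z~Q}[log P(X|z)] - D[Q(z|X)||P(z)]
# on a conjugate toy model: z ~ N(0,1), X|z ~ N(z, sigma2).
import numpy as np

sigma2 = 0.5          # decoder variance
x = 1.3               # an observed data point
m, s2 = 0.7, 0.2      # an arbitrary Q(z|X) = N(m, s2)

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

# Exact marginal and posterior for this conjugate model
log_px = -0.5 * (np.log(2 * np.pi * (1 + sigma2)) + x**2 / (1 + sigma2))
post_var = sigma2 / (1 + sigma2)
post_mean = x / (1 + sigma2)

lhs = log_px - kl_gauss(m, s2, post_mean, post_var)

# E_{z~Q}[log N(x | z, sigma2)] also has a closed form
recon = -0.5 * (np.log(2 * np.pi * sigma2) + ((x - m) ** 2 + s2) / sigma2)
rhs = recon - kl_gauss(m, s2, 0.0, 1.0)

print(lhs, rhs)       # the two sides agree
```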
Optimizing the RHS
• Q is encoding X into z; P(X|z) is decoding z
• Assume Q(z|X) on the LHS is a high-capacity NN
• For: E z~Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]
• Assume: Q(z|X) = N(z | μ(X; θ), Σ(X; θ))
• Then the KL divergence has a closed form: D[N(μ(X), Σ(X)) ‖ N(0, I)] = 1/2 [tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det(Σ(X))]
• In SGD, the objective becomes maximizing:
E X~D [log P(X) − D[Q(z|X) ‖ P(z|X)]] = E X~D [E z~Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]]
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
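With a diagonal Σ(X), the closed-form KL reduces to 0.5 Σᵢ(σᵢ² + μᵢ² − 1 − log σᵢ²). A small sketch (shapes and names are illustrative) checks this against PyTorch's built-in Gaussian KL.

```python
# Closed-form KL between the encoder's diagonal Gaussian and the prior N(0, I),
# checked against torch.distributions.
import torch
from torch.distributions import Normal, kl_divergence

mu = torch.randn(4, 8)           # batch of 4, latent dimension k = 8
logvar = torch.randn(4, 8)       # encoders usually output log-variance for stability
var = logvar.exp()

kl_closed = 0.5 * (var + mu**2 - 1.0 - logvar).sum(dim=1)
kl_library = kl_divergence(Normal(mu, var.sqrt()), Normal(0.0, 1.0)).sum(dim=1)

print(torch.allclose(kl_closed, kl_library, atol=1e-5))   # True
```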
Moving the gradient inside the expectation
• We need the gradient of: E z~Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]
• For a fixed sample z, the term log P(X|z) does not depend on the parameters of Q, but E z~Q [log P(X|z)] does, because the sampling distribution itself changes with those parameters
• So, we need to generate z that are plausible, i.e., likely to decode back to X
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
The actual model that resists backpropagation • Cannot backpropagate through a stochastic unit Source: VAEs by Kingma , Welling, et al.; “Tutorial on Variational Autoencoders ” by Carl Doersch
The actual model that resists backpropagation
• Reparameterization trick: sample e ~ N(0, I) and set z = μ(X) + Σ^{1/2}(X) ∗ e
• This works if Q(z|X) and P(z) are continuous
• The objective becomes: E X~D [E e~N(0,I) [log P(X | z = μ(X) + Σ^{1/2}(X) ∗ e)] − D[Q(z|X) ‖ P(z)]]
• Now we can backpropagate end-to-end, because the expectations are no longer taken with respect to distributions that depend on the model parameters
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
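A minimal PyTorch sketch of the reparameterized forward pass and the resulting negative ELBO, assuming a Gaussian decoder with σ = 1 as on the earlier slide; layer sizes and variable names are illustrative, not the lecture's implementation.

```python
# Reparameterized VAE forward pass: gradients flow through mu(X) and Sigma(X).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=2, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(data_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)        # mu(X)
        self.logvar = nn.Linear(hidden, latent_dim)    # log of diagonal Sigma(X)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, data_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                     # e ~ N(0, I)
        z = mu + (0.5 * logvar).exp() * eps            # z = mu(X) + Sigma^(1/2)(X) * e
        x_mean = self.dec(z)                           # f(z; theta)
        # -log P(X|z) for P(X|z) = N(X | f(z), I), up to an additive constant
        recon = 0.5 * ((x - x_mean) ** 2).sum()
        kl = 0.5 * (logvar.exp() + mu**2 - 1.0 - logvar).sum()
        return (recon + kl) / x.shape[0]               # negative ELBO per example

# x_batch = torch.rand(32, 784)              # e.g., MNIST pixels scaled to [0, 1]
# loss = VAE()(x_batch); loss.backward()     # gradients reach the encoder parameters
```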
Test-time sampling is straightforward
• The encoder pathway, including the multiplication and addition, is discarded; sample z ~ N(0, I) and run only the decoder
• To estimate the likelihood of a test sample, generate samples of z and average P(X|z)
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
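A short sketch of test-time generation, assuming the VAE class from the previous snippet; only the decoder is used, exactly as the slide describes.

```python
# Test-time generation: sample z from the prior and decode; the encoder is unused.
import torch

model = VAE().eval()                  # VAE as defined in the sketch above (untrained here)
with torch.no_grad():
    z = torch.randn(16, 2)            # z ~ N(0, I), latent_dim = 2
    samples = model.dec(z)            # decoded means f(z) for 16 generated samples
```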
Conditional VAE • Both the encoder and the decoder are additionally conditioned on a label c, giving Q(z | X, c) and P(X | z, c) Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
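A minimal sketch of the conditional variant, assuming the condition c is a one-hot label simply concatenated to the encoder and decoder inputs; sizes and names are illustrative.

```python
# Conditional VAE: condition both Q(z|X, c) and P(X|z, c) on a one-hot label c.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, data_dim=784, cond_dim=10, latent_dim=2, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(data_dim + cond_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, data_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=1))                   # Q(z | X, c)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)     # reparameterization
        return self.dec(torch.cat([z, c], dim=1)), mu, logvar    # P(X | z, c)
```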
Sample results for an MNIST VAE Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
Sample results for an MNIST CVAE Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch