TensorFlow Workshop 2018
Introduction to Deep Models, Part II: Variational Autoencoders and Latent Spaces
Nick Winovich, Department of Mathematics, Purdue University
SIAM@Purdue, July 2018
Outline

1 Variational Autoencoders
   Autoencoder Models
   Variational Autoencoders
   Reparameterization Trick

2 Latent Representations
   Bayesian Framework
   Kullback–Leibler Divergence
   Latent Space Traversal
Feature Extraction with Autoencoders

As discussed in Part I, the process of manually defining features is typically infeasible for complex datasets. The hidden layers of neural networks naturally define features to a certain degree; however, we may wish to find a collection of features which completely characterizes a given example.

To be precise, we must first clarify what it means to "completely characterize" an example. A simple, but natural, way to define this concept is to say that a set of features characterizes an example if the full example can be reproduced from those features alone.

Although it may sound rather trivial at first, this leads to a natural approach for automating feature extraction: train a neural network to learn the identity mapping, and introduce a bottleneck layer to force a reduction in the data/feature dimensions.
Autoencoder Model

[Figure: network diagram with layers Input → Hidden → Encoded → Hidden → Reconstructed]
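A minimal TensorFlow sketch of this idea follows; the 784-dimensional input, the layer widths, and the bottleneck size of 32 are illustrative assumptions rather than values taken from the slides.

import tensorflow as tf

# Placeholder for flattened input examples (e.g. 28x28 MNIST digits).
x = tf.placeholder(tf.float32, shape=[None, 784])

# Encoder: progressively reduce the dimension down to the bottleneck.
h1 = tf.layers.dense(x, 256, activation=tf.nn.relu)
encoded = tf.layers.dense(h1, 32, activation=None)          # bottleneck features

# Decoder: expand the bottleneck features back to the original dimension.
h2 = tf.layers.dense(encoded, 256, activation=tf.nn.relu)
reconstructed = tf.layers.dense(h2, 784, activation=tf.nn.sigmoid)

# Train the network to reproduce its input from the bottleneck alone.
loss = tf.reduce_mean(tf.square(reconstructed - x))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

If the reconstruction loss can be driven to zero, the 32 bottleneck features suffice to reproduce each input; that is, they completely characterize it in the sense defined above.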
Auto-Encoding Variational Bayes

Kingma, D.P. and Welling, M., 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.

A particularly effective autoencoding model, introduced by Kingma and Welling in 2013, is the variational autoencoder (VAE). The VAE model is defined in terms of a probabilistic, Bayesian framework; in this framework, the features at the bottleneck of the network are interpreted as unobservable latent variables. To approximate the underlying Bayesian model, VAE networks introduce a sampling procedure in the latent variable space.
Variational Autoencoder

[Figure: network diagram with layers Input → Hidden → Latent → Hidden → Reconstructed, with a noise sample ε feeding into the latent layer]
Variational Autoencoder Graph [TensorBoard]
Sampling Procedure

The encoder produces means {µ_k} and standard deviations {σ_k} corresponding to a collection of independent normal distributions for the latent variables. A vector ε is sampled from a standard normal distribution N(0, I), and the sampled latent vector is defined by:

z = µ + σ ⊙ ε

The introduction of the standard normal sample ε, referred to as the "reparameterization trick", is used to maintain a differentiable relation between the weights of the network and the loss function (since the sample ε is fixed at each step). This allows us to train the network end-to-end using backpropagation.
Sampling Procedure: Practical Implementation

In practice, it is not numerically stable to work with the standard deviations {σ_k} directly; instead, the network is trained to predict the values {log(σ_k)}, and the latent vector is sampled via:

z = µ + exp(log σ) ⊙ ε

This has the additional benefit of removing the restriction that the network predictions for {σ_k} must always be positive.
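As a minimal TensorFlow sketch of this sampling step (assuming the encoder outputs two tensors, mu and log_sigma, of the same shape; the function name is an illustrative choice):

import tensorflow as tf

def sample_latent(mu, log_sigma):
    """Reparameterization trick: z = mu + exp(log_sigma) * eps, with eps ~ N(0, I)."""
    eps = tf.random_normal(tf.shape(mu))       # fixed standard normal sample at each step
    return mu + tf.exp(log_sigma) * eps        # differentiable with respect to mu and log_sigma

Since ε enters only as an external input, gradients flow through µ and log σ, and the sampling step does not interrupt backpropagation.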
Variational Bayesian Model

The VAE framework aims to approximate the intractable posterior distribution p_θ(z | x) in the latent space by a recognition model:

q_φ(z | x) ∼ distribution of z given x

where φ corresponds to the model parameters of the encoder component of the network, and θ corresponds to the parameters of the network's decoder, which is used to define a generative model:

p_θ(x | z) ∼ distribution of x given z
Variational Bayesian Model

[Figure]
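In TensorFlow, the two components might be sketched as follows; the latent dimension, layer widths, and the 784-dimensional Bernoulli output are illustrative assumptions, not values from the slides.

import tensorflow as tf

LATENT_DIM = 2   # illustrative choice of latent dimension

def recognition_model(x):
    """Encoder q_phi(z | x): maps an input to the parameters of a diagonal normal distribution."""
    h = tf.layers.dense(x, 256, activation=tf.nn.relu)
    mu = tf.layers.dense(h, LATENT_DIM, activation=None)
    log_sigma = tf.layers.dense(h, LATENT_DIM, activation=None)
    return mu, log_sigma

def generative_model(z):
    """Decoder p_theta(x | z): maps a latent sample to Bernoulli parameters (pixel probabilities)."""
    h = tf.layers.dense(z, 256, activation=tf.nn.relu)
    return tf.layers.dense(h, 784, activation=tf.nn.sigmoid)

The recognition model parameterizes q_φ(z | x) as a diagonal normal distribution, while the generative model returns the Bernoulli parameters of p_θ(x | z).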
Motivation for Kullback–Leibler Divergence

When using an encoder/decoder model structure, it is helpful to anchor the input values received by the decoder during training (similar to the motivation for batch normalization).

For example, the encoder component may learn to produce latent representations distributed according to a normal distribution N(µ, Σ) for some mean vector µ and covariance matrix Σ. However, this latent distribution can be shifted arbitrarily without affecting the theoretically attainable performance of the network; in particular, there are infinitely many model configurations which can achieve the optimal level of performance.

The lack of a unique solution can be problematic during training; to address this, we can attempt to bias the encoder toward the distribution N(0, I).
Kullback–Leibler Divergence

The KL divergence is introduced to the loss as a regularization term; assuming the prior is taken to be a standard normal N_std = N(0, I) with latent dimension N:

KL( N(µ, Σ) ‖ N_std ) = ½ ( tr(Σ) + µᵀµ − N − log det(Σ) )

Model accuracy is accounted for by the "reconstruction loss" term:

E_{q_φ(z|x)} [ log p_θ(x | z) ]

The full loss function is then defined to be the negative Evidence Lower Bound (ELBO) which, after some manipulation, is given by:

−ELBO(θ, φ) = KL( q_φ(z|x) ‖ N_std ) − E_{q_φ(z|x)} [ log p_θ(x | z) ]
Kullback–Leibler Divergence: Example

For a diagonal covariance, i.e. independent latent variables, with parameters {µ_k} and {σ_k}, the KL divergence reduces to:

KL( q_φ(z|x) ‖ N_std ) = ½ Σ_{k=1}^{N} ( σ_k² + µ_k² − 1 − log σ_k² )

In the case of binary-valued data (assuming a Bernoulli observation model), the reconstruction loss coincides precisely with the negative binary cross entropy; i.e. setting x̂ = D(z), we have:

E_{q_φ(z|x)} [ log p_θ(x | z) ] = x · log(x̂) + (1 − x) · log(1 − x̂)
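A rough TensorFlow sketch of the resulting loss (assuming mu, log_sigma, the binary-valued input x, and the decoder output x_hat are batch-major tensors; the small eps constant for numerical stability is an added assumption):

import tensorflow as tf

def vae_loss(x, x_hat, mu, log_sigma, eps=1e-8):
    """Negative ELBO = KL(q_phi(z|x) || N(0, I)) - E[log p_theta(x|z)]."""
    # Closed-form KL divergence for a diagonal Gaussian against N(0, I):
    # sigma_k^2 = exp(2 * log_sigma_k), log(sigma_k^2) = 2 * log_sigma_k.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(2.0 * log_sigma) + tf.square(mu) - 1.0 - 2.0 * log_sigma, axis=1)

    # Bernoulli reconstruction term (negative binary cross entropy), summed over pixels.
    log_likelihood = tf.reduce_sum(
        x * tf.log(x_hat + eps) + (1.0 - x) * tf.log(1.0 - x_hat + eps), axis=1)

    # Average the negative ELBO over the batch.
    return tf.reduce_mean(kl - log_likelihood)

Minimizing this quantity maximizes the ELBO, trading reconstruction accuracy against closeness of q_φ(z | x) to the standard normal prior.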
Example: Latent Space Interpolation

Once the VAE model is trained, we can investigate the learned latent representations by decoding points in the latent space.

For example, after training a VAE model on the MNIST dataset, we can use the encoder (i.e. the recognition model) to retrieve the latent representations of two handwritten digits, e.g.

z_0 = E[x_0]  and  z_1 = E[x_1],  where x_0 is an image of a "3" and x_1 is an image of a "7".

Linear interpolation can then be used to visualize the path connecting the two data points:

x_θ = D( (1 − θ) · z_0 + θ · z_1 ),  θ ∈ [0, 1]
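A short sketch of this traversal is given below; the encode and decode helpers are hypothetical wrappers around session calls to the trained recognition and generative models, taking and returning NumPy arrays.

import numpy as np

def interpolate_latent(decode, z0, z1, num_steps=10):
    """Decode evenly spaced points on the line segment between two latent vectors."""
    images = []
    for theta in np.linspace(0.0, 1.0, num_steps):
        z = (1.0 - theta) * z0 + theta * z1      # linear interpolation in latent space
        images.append(decode(z))                 # x_theta = D((1 - theta) z0 + theta z1)
    return images

# Example usage (encode/decode are hypothetical wrappers around the trained VAE):
# z0, z1 = encode(x0), encode(x1)                # latent codes for a "3" and a "7"
# frames = interpolate_latent(decode, z0, z1)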