  1. Advanced Machine Learning Variational Auto-encoders Amit Sethi, EE, IITB

  2. Objectives • Learn how VAEs help in sampling from a data distribution • Write the objective function of a VAE • Derive how VAE objective is adapted for SGD

  3. VAE setup • We are interested in maximizing the data likelihood P(X) = ∫ P(X|z; θ) P(z) dz • Let P(X|z; θ) be modeled by f(z; θ) • Further, let us assume that P(X|z; θ) = N(X | f(z; θ), σ²I) Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
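
To make the generative side concrete, here is a minimal sketch of drawing X from P(X|z; θ) = N(X | f(z; θ), σ²I); the small MLP decoder and the values of `latent_dim`, `data_dim`, and `sigma` are illustrative assumptions, not part of the slides.

```python
import torch
import torch.nn as nn

latent_dim, data_dim, sigma = 2, 784, 0.1       # illustrative sizes

decoder = nn.Sequential(                        # f(z; theta): a small MLP
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, data_dim),
)

z = torch.randn(16, latent_dim)                 # z ~ P(z) = N(0, I)
mean = decoder(z)                               # f(z; theta)
X = mean + sigma * torch.randn_like(mean)       # X ~ N(f(z; theta), sigma^2 I)
```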

  4. We do not care about the distribution of z • Latent variable z is drawn from a standard normal, z ∼ N(0, I) • (Plate diagram: z generates X through parameters θ, repeated over N samples) • It may represent many different variations of the data Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  5. Example of a variable transformation • X = g(z) = z/10 + z/‖z‖ (figure: scatter of samples of z and of X = g(z)) Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
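
A tiny numerical check of this toy transformation (sample count and seed are arbitrary choices): 2-D standard-normal samples z are mapped onto a noisy ring of radius roughly 1, showing how a simple deterministic g turns N(0, I) samples into a very different distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))                         # z ~ N(0, I) in 2-D
X = z / 10 + z / np.linalg.norm(z, axis=1, keepdims=True)  # X = g(z) = z/10 + z/||z||
print(np.linalg.norm(X, axis=1).mean())                    # radii concentrate near 1
```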

  6. Because of the Gaussian assumption, the most obvious variation may not be the most likely • Although the ‘2’ on the right is a better choice as a variation of the one on the left, the one in the middle is more likely under the Gaussian assumption Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  7. Sampling z from a standard normal is problematic • It may give samples of z that are unlikely to have produced X • Can we sample z itself intelligently? • Enter Q(z|X), which lets us compute, e.g., E z∼Q P(X|z) • All we need to do is relate E z∼Q P(X|z) to P(X), which is done through the KL divergence D[Q(z) ‖ P(z|X)] • Hence, a variational method Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  8. VAE Objective Setup D[Q(z) ‖ P(z|X)] = E z∼Q [log Q(z) − log P(z|X)] = E z∼Q [log Q(z) − log P(X|z) − log P(z)] + log P(X) Rearranging some terms: log P(X) − D[Q(z) ‖ P(z|X)] = E z∼Q [log P(X|z)] − D[Q(z) ‖ P(z)] Introducing dependency of Q on X: log P(X) − D[Q(z|X) ‖ P(z|X)] = E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)] Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  9. Optimizing the RHS • Q is encoding X into z; P(X|z) is decoding z • Assume Q(z|X) on the LHS is a high-capacity NN • For: E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)] • Assume: Q(z|X) = N(z | μ(X; θ), Σ(X; θ)) • Then the KL divergence is: D[N(μ(X), Σ(X)) ‖ N(0, I)] = 1/2 [tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det(Σ(X))] • In SGD, the objective becomes maximizing: E X∼D [log P(X) − D[Q(z|X) ‖ P(z|X)]] = E X∼D [E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]] Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
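
The closed-form KL term above is straightforward to implement once Σ(X) is taken to be diagonal and parameterized by its log-variance; that parameterization is a common implementation choice assumed here, not something stated on the slide.

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """D[N(mu, Sigma) || N(0, I)] = 1/2 [tr(Sigma) + mu^T mu - k - log det(Sigma)]
    for a diagonal Sigma = diag(exp(logvar)), summed over the k latent dimensions."""
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

mu = torch.zeros(4, 2)        # batch of 4, latent dimension k = 2
logvar = torch.zeros(4, 2)    # Sigma = I, so the KL term should be exactly 0
print(kl_to_standard_normal(mu, logvar))   # tensor([0., 0., 0., 0.])
```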

  10. Moving the gradient inside the expectation • We need to compute the gradient of: log P(X|z) − D[Q(z|X) ‖ P(z)] • The first term by itself does not depend on the parameters of Q, but its expectation E z∼Q [log P(X|z)] does! • So, we need to generate z that are plausible, i.e. decodable Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  11. The actual model that resists backpropagation • Cannot backpropagate through a stochastic unit Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  12. The actual model that resists backpropagation • Reparameterization trick: e ∼ N(0, I) and z = μ(X) + Σ^{1/2}(X) ∗ e • This works if Q(z|X) and P(z) are continuous • E X∼D [E e∼N(0,I) [log P(X | z = μ(X) + Σ^{1/2}(X) ∗ e)] − D[Q(z|X) ‖ P(z)]] • Now we can backpropagate end-to-end, because the expectations are no longer with respect to distributions that depend on the model parameters Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
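
A minimal sketch of the reparameterization trick, assuming single-layer encoder/decoder networks and a diagonal Σ(X) parameterized by its log-variance (all illustrative choices): because e is drawn outside the network, gradients flow through z into μ and Σ.

```python
import torch
import torch.nn as nn

data_dim, latent_dim = 784, 2                      # illustrative sizes
encoder = nn.Linear(data_dim, 2 * latent_dim)      # Q(z|X): outputs [mu(X), log-variance of Sigma(X)]
decoder = nn.Linear(latent_dim, data_dim)          # P(X|z): mean f(z; theta)

X = torch.rand(8, data_dim)
mu, logvar = encoder(X).chunk(2, dim=-1)

eps = torch.randn_like(mu)                         # e ~ N(0, I): involves no model parameters
z = mu + (0.5 * logvar).exp() * eps                # z = mu(X) + Sigma^{1/2}(X) * e

recon = decoder(z)
recon_loss = ((recon - X) ** 2).sum(dim=-1)        # -log P(X|z) up to constants (Gaussian likelihood)
kl = 0.5 * (logvar.exp() + mu ** 2 - 1 - logvar).sum(dim=-1)
loss = (recon_loss + kl).mean()                    # negative of the objective, averaged over the batch
loss.backward()                                    # gradients reach both encoder and decoder
```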

  13. Test-time sampling is straightforward • The encoder pathway, including the multiplication and addition, is discarded • To estimate the likelihood of a test sample, generate z and then compute P(X|z) Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  14. Conditional VAE Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  15. Sample results for an MNIST VAE Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  16. Sample results for an MNIST CVAE Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

  17. Advanced Machine Learning Generative Adversarial Networks Amit Sethi, EE, IITB

  18. Objectives • Articulate how using a discriminator helps a generator • Write the objective function of a GAN • Write the training algorithm for a GAN

  19. GAN trains two networks together • (Block diagram: noise z → generator G → fake sample x′; x′ and real x → discriminator D → decision y) • GAN objective: min_G max_D V(D, G) = E x∼p_x(x) [log D(x)] + E z∼p_z(z) [log(1 − D(G(z)))] Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014
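
A minimal sketch of the value function V(D, G) with stand-in linear networks; the shapes and the placeholder “real” batch are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 64                                # illustrative sizes
G = nn.Linear(noise_dim, data_dim)                          # generator: x' = G(z)
D = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())     # discriminator: y = D(x) in (0, 1)

x = torch.randn(32, data_dim)                               # stand-in for a real data batch
z = torch.randn(32, noise_dim)                              # z ~ p_z(z)

v = torch.log(D(x)).mean() + torch.log(1 - D(G(z))).mean()  # V(D, G)
# D is trained to ascend v; G is trained to descend the second term.
```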

  20. At the solution, the transformed distribution from z will emulate p_x(x) • (Figure: four snapshots over training steps showing the data distribution p_x(x), the generator’s distribution, and the discriminator D) • As training progresses, the distributions of the transformed noise and the data will become indistinguishable Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014

  21. The trick is to allow D to catch up before improving G in each iteration
      o For each training iteration:
        o For k steps: update the discriminator by ascending its stochastic gradient ∇_θ_D (1/n) Σ_{j=1..n} [ log D(x_j) + log(1 − D(G(z_j))) ]
        o Then update the generator by descending its stochastic gradient ∇_θ_G (1/n) Σ_{j=1..n} log(1 − D(G(z_j)))
      Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014
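
A compact PyTorch sketch of this alternating procedure; the optimizers, the value of k, the layer sizes, and the random stand-in for real data are all illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

noise_dim, data_dim, k = 16, 64, 1                # k discriminator steps per generator step
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)

def real_batch(n=32):
    return torch.randn(n, data_dim)               # placeholder for minibatches of real data

for it in range(100):                             # "for training iterations"
    for _ in range(k):                            # "for k steps": update D by gradient ascent
        x, z = real_batch(), torch.randn(32, noise_dim)
        d_loss = -(torch.log(D(x)) + torch.log(1 - D(G(z).detach()))).mean()
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    z = torch.randn(32, noise_dim)                # then one generator step by gradient descent
    g_loss = torch.log(1 - D(G(z))).mean()        # descend log(1 - D(G(z)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```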

  22. An optimum exists • For a fixed generator, D*(x) = p_x(x) / (p_x(x) + p_G(x)) • Because E x∼p_x(x) [log D(x)] + E z∼p_z(z) [log(1 − D(G(z)))] = ∫ [ p_x(x) log D(x) + p_G(x) log(1 − D(x)) ] dx • And the optimum of a log y + b log(1 − y) is y = a / (a + b) Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014
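
Filling in the single derivative step behind the stated optimum:

```latex
\frac{d}{dy}\bigl[a\log y + b\log(1-y)\bigr] = \frac{a}{y} - \frac{b}{1-y} = 0
\quad\Longrightarrow\quad y^{*} = \frac{a}{a+b},
\qquad \text{with } a = p_x(x),\; b = p_G(x),\; y = D(x).
```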

  23. Generator’s optimization reduces as follows… • E x∼p_x(x) [log D*(x)] + E z∼p_z(z) [log(1 − D*(G(z)))] = E x∼p_x(x) [log ( p_x(x) / (p_x(x) + p_G(x)) )] + E x∼p_G(x) [log ( p_G(x) / (p_x(x) + p_G(x)) )] = − log 4 + KL( p_x ‖ (p_x + p_G)/2 ) + KL( p_G ‖ (p_x + p_G)/2 ), i.e., − log 4 plus twice the Jensen–Shannon divergence between p_x and p_G, which vanishes exactly when p_G = p_x • This assumes that the generator and the discriminator have high enough capacity to model the desired distributions arbitrarily well. Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014

  24. Some sample generations and interpolations of the latent vector Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014

  25. DC-GAN was designed to generate better images • No pooling – strided convolutions (stride > 1) for downsampling and fractionally-strided convolutions for upsampling • No fully connected layers • Heavy use of batchnorm • Use ReLU in G and leaky ReLU in D in all but the final layers • Use tanh in the last layer of G Source: “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” by Radford et al. in ICLR 2016
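
A sketch of a generator that follows these guidelines; the specific layer sizes and the 64×64 RGB output are illustrative assumptions loosely patterned on the paper, not a verbatim reproduction of its architecture.

```python
import torch
import torch.nn as nn

# Fractionally-strided (transposed) convolutions do the upsampling; there is no pooling
# and no fully connected layer; batchnorm + ReLU everywhere except the tanh output.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),  nn.BatchNorm2d(64),  nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),    nn.Tanh(),
)

z = torch.randn(1, 100, 1, 1)   # latent code reshaped to a 1x1 spatial map
img = G(z)                      # -> shape (1, 3, 64, 64)
```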

  26. While mode collapse isn’t evident, there is some underfitting Source: “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” by Radford et al. in ICLR 2016

  27. GAN features can directly be used for classification Source: “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” by Radford et al. in ICLR 2016

  28. GANs allow latent vector “arithmetic” Source: “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” by Radford et al. in ICLR 2016

  29. Advantages and disadvantages of GANs • Advantages: no Markov chains needed; only backprop is used; no inference is needed during training; models a wide range of functions • Disadvantages: no explicit representation of the generator’s distribution; D must be kept in sync with G; mode collapse Source: “Generative Adversarial Nets” by Goodfellow et al. in NeurIPS 2014

  30. Conditional GAN introduces another variable (e.g. class) • Instead of the GAN objective: E x∼p_x(x) [log D(x)] + E z∼p_z(z) [log(1 − D(G(z)))] • CGAN uses a modified objective: E x∼p_x(x) [log D(x|y)] + E z∼p_z(z) [log(1 − D(G(z|y)))], where y is the conditioning variable Source: “Conditional Generative Adversarial Nets” by Mirza and Osindero, arXiv 2014
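
One common way to realize the conditioning, sketched under the assumption that y is a one-hot class label concatenated to the inputs of both networks; this is an implementation choice in the spirit of the paper's MNIST setup, not its exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

noise_dim, data_dim, n_classes = 16, 64, 10                          # illustrative sizes
G = nn.Linear(noise_dim + n_classes, data_dim)                       # G(z | y)
D = nn.Sequential(nn.Linear(data_dim + n_classes, 1), nn.Sigmoid())  # D(x | y)

z = torch.randn(32, noise_dim)
y = F.one_hot(torch.randint(0, n_classes, (32,)), n_classes).float() # conditioning variable (class)

x_fake = G(torch.cat([z, y], dim=1))                                 # generator sees z and y
score = D(torch.cat([x_fake, y], dim=1))                             # discriminator sees x and y
```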

  31. Conditional GAN introduces another variable (e.g. class) Source: “Conditional Generative Adversarial Nets” by Mirza and Osindero, arXiv 2014

  32. Each row of CGAN samples is conditioned on one digit label Source: “Conditional Generative Adversarial Nets” by Mirza and Osindero, arXiv 2014
