What should be the objective function of the overall network?

[Figure: GAN architecture — the generator maps z ∼ N(0, I) to a generated image; the discriminator receives real images and generated images and outputs a Real-or-Fake score. The same figure accompanies the next few slides.]

Let's look at the objective function of the generator first.

Given an image G_φ(z) produced by the generator, the discriminator assigns it a score D_θ(G_φ(z)).

This score lies between 0 and 1 and tells us the probability that the image is real (as opposed to fake).

For a given z, the generator would therefore want to maximize log D_θ(G_φ(z)) (log-likelihood of the image being judged real) or, alternatively, minimize log(1 − D_θ(G_φ(z))).
This is just for a single z; the generator would like to do this for all possible values of z.

For example, if z were discrete and drawn from a uniform distribution (i.e., p(z) = 1/N ∀z), then the generator's objective function would be

\min_{\phi} \frac{1}{N} \sum_{i=1}^{N} \log\left(1 - D_\theta(G_\phi(z_i))\right)

However, in our case z is continuous and not uniform (z ∼ N(0, I)), so the equivalent objective function would be

\min_{\phi} \int p(z) \log\left(1 - D_\theta(G_\phi(z))\right) dz

\min_{\phi} \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right]
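In practice this expectation is approximated with a minibatch of sampled noise vectors. Below is a minimal sketch of that Monte Carlo estimate, assuming PyTorch-style modules G and D (with D ending in a sigmoid so that it outputs probabilities); the names G, D, batch_size and latent_dim are illustrative assumptions, not fixed by the lecture.

```python
import torch

def generator_loss_saturating(G, D, batch_size, latent_dim):
    # Monte Carlo estimate of E_{z ~ N(0, I)}[log(1 - D(G(z)))],
    # which the generator tries to minimize.
    z = torch.randn(batch_size, latent_dim)    # z ~ N(0, I)
    fake_scores = D(G(z))                      # D(G(z)) in (0, 1)
    return torch.log(1.0 - fake_scores).mean()
```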
Now let's look at the discriminator.

The task of the discriminator is to assign a high score to real images and a low score to fake images, and it should do this for all possible real images and all possible fake images.

In other words, it should try to maximize the following objective function:

\max_{\theta} \; \mathbb{E}_{x \sim p_{data}}\left[\log D_\theta(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right]
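This objective is, up to a sign, the familiar binary cross-entropy loss with label 1 for real images and label 0 for generated ones. A minimal minibatch sketch, again assuming illustrative PyTorch-style modules G and D and a tensor real_batch of real images:

```python
import torch

def discriminator_loss(G, D, real_batch, latent_dim):
    z = torch.randn(real_batch.size(0), latent_dim)
    fake_batch = G(z).detach()            # do not backpropagate into the generator here
    real_scores = D(real_batch)
    fake_scores = D(fake_batch)
    # Maximizing E[log D(x)] + E[log(1 - D(G(z)))] is the same as minimizing
    # the negation below (binary cross-entropy with labels 1 = real, 0 = fake).
    return -(torch.log(real_scores).mean() + torch.log(1.0 - fake_scores).mean())
```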
If we put the objectives of the generator and discriminator together we get a minimax game:

\min_{\phi} \max_{\theta} \; \left[ \mathbb{E}_{x \sim p_{data}}\left[\log D_\theta(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right] \right]

The first term in the objective depends only on the parameters of the discriminator (θ).

The second term depends on the parameters of the generator (φ) as well as those of the discriminator (θ).

The discriminator wants to maximize the second term whereas the generator wants to minimize it (hence it is a two-player game).
So the overall training proceeds by alternating between these two steps.

Step 1: Gradient ascent on the discriminator

\max_{\theta} \; \left[ \mathbb{E}_{x \sim p_{data}}\left[\log D_\theta(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right] \right]

Step 2: Gradient descent on the generator

\min_{\phi} \; \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right]

In practice, the above generator objective does not work well and we use a slightly modified objective. Let us see why.
When the sample is likely fake, we want to give feedback to the generator (using gradients).

[Figure: plot of the two candidate generator losses, log(1 − D(G(z))) and − log(D(G(z))), as functions of D(G(z)) ∈ [0, 1].]

However, in the region where D(G(z)) is close to 0, the curve of the loss log(1 − D(G(z))) is very flat and the gradient would be close to 0.

Trick: Instead of minimizing the likelihood of the discriminator being correct, maximize the likelihood of the discriminator being wrong, i.e., maximize log D_θ(G_φ(z)) (equivalently, minimize − log D_θ(G_φ(z))).

In effect, the objective remains the same but the gradient signal becomes better.
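The flatness argument is easy to check numerically. The sketch below (illustrative values only) compares the gradient of the two losses with respect to D(G(z)) when the discriminator confidently calls the sample fake:

```python
import torch

d = torch.tensor(0.01, requires_grad=True)   # D(G(z)) ≈ 0: sample judged "likely fake"

saturating = torch.log(1.0 - d)              # original generator loss (to be minimized)
saturating.backward()
print(d.grad)                                # ≈ -1.01: a weak gradient signal

d.grad = None
non_saturating = -torch.log(d)               # modified generator loss (to be minimized)
non_saturating.backward()
print(d.grad)                                # ≈ -100: a much stronger gradient signal
```

Both gradients push D(G(z)) upward (towards fooling the discriminator), but the modified loss does so with far larger magnitude exactly where the generator is doing badly.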
With that we are now ready to see the full algorithm for training GANs.

procedure GAN Training
  for number of training iterations do
    for k steps do
      • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
      • Sample a minibatch of m examples {x^(1), ..., x^(m)} from the data generating distribution p_data(x)
      • Update the discriminator by ascending its stochastic gradient:

        \nabla_\theta \frac{1}{m} \sum_{i=1}^{m} \left[ \log D_\theta\left(x^{(i)}\right) + \log\left(1 - D_\theta\left(G_\phi\left(z^{(i)}\right)\right)\right) \right]

    end for
    • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_g(z)
    • Update the generator by ascending its stochastic gradient:

      \nabla_\phi \frac{1}{m} \sum_{i=1}^{m} \log D_\theta\left(G_\phi\left(z^{(i)}\right)\right)

  end for
end procedure
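For concreteness, here is a minimal PyTorch-style sketch of this training loop. It assumes generator and discriminator modules G and D (with D ending in a sigmoid), a dataloader that yields batches of real images, and the names latent_dim, num_iters and k; all of these, and the Adam settings (a common DCGAN-style choice), are illustrative assumptions rather than part of the lecture. Note that the generator update uses the modified (non-saturating) objective discussed earlier.

```python
import torch

def train_gan(G, D, dataloader, latent_dim, num_iters, k=1, lr=2e-4):
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    data_iter = iter(dataloader)
    eps = 1e-8                                    # numerical stability inside log

    for _ in range(num_iters):
        # --- k steps of gradient ascent on the discriminator ---
        for _ in range(k):
            try:
                real = next(data_iter)
            except StopIteration:
                data_iter = iter(dataloader)
                real = next(data_iter)
            z = torch.randn(real.size(0), latent_dim)
            fake = G(z).detach()                  # freeze the generator for this update
            d_loss = -(torch.log(D(real) + eps).mean()
                       + torch.log(1.0 - D(fake) + eps).mean())
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()

        # --- one step on the generator (ascend log D(G(z))) ---
        z = torch.randn(real.size(0), latent_dim)
        g_loss = -torch.log(D(G(z)) + eps).mean() # descending -log D == ascending log D
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```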
Module 23.2: Generative Adversarial Networks - Architecture
We will now look at one of the popular neural network architectures used for the generator and discriminator: Deep Convolutional GANs (DCGANs).

For the discriminator, any CNN-based classifier with a single (real/fake) output can be used (e.g., VGG, ResNet, etc.).

[Figure: Generator (Radford et al., 2015) (left) and discriminator (Yeh et al., 2016) (right) used in DCGAN]
Architecture guidelines for stable Deep Convolutional GANs:

• Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
• Use batchnorm in both the generator and the discriminator.
• Remove fully connected hidden layers for deeper architectures.
• Use ReLU activation in the generator for all layers except for the output, which uses tanh.
• Use LeakyReLU activation in the discriminator for all layers.
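To make the guidelines concrete, here is a minimal sketch of a generator/discriminator pair that follows them, assuming 64×64 RGB images and a 100-dimensional latent code; the resolution, channel widths and other hyperparameters are illustrative choices, not the exact ones from the DCGAN paper.

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            # fractional-strided (transposed) convolutions instead of upsampling/pooling
            nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),     # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),     # 8x8 -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),         # 16x16 -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),               # 32x32 -> 64x64
            nn.Tanh(),                                 # tanh only at the generator output
        )

    def forward(self, z):                              # z: (batch, latent_dim)
        return self.net(z.view(z.size(0), z.size(1), 1, 1))


class Discriminator(nn.Module):
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            # strided convolutions instead of pooling; LeakyReLU in every layer
            nn.Conv2d(3, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),                    # no fully connected layers
            nn.Sigmoid(),                              # single real/fake probability
        )

    def forward(self, x):                              # x: (batch, 3, 64, 64)
        return self.net(x).view(-1)
```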
Module 23.3: Generative Adversarial Networks - The Math Behind it
We will now delve a bit deeper into the objective function used by GANs and see what it implies.

Suppose we denote the true data distribution by p_data(x) and the distribution of the data generated by the model by p_G(x).

What would we like to happen at the end of training? We would like p_G(x) = p_data(x).

Can we prove this formally, even though the model never explicitly computes this density?

We will try to prove this over the next few slides.
Theorem. The global minimum of the virtual training criterion C(G) = max_D V(G, D) is achieved if and only if p_G = p_data.

This is equivalent to the following two statements:

1. If p_G = p_data, then the global minimum of the virtual training criterion C(G) = max_D V(G, D) is achieved, and
2. The global minimum of the virtual training criterion C(G) = max_D V(G, D) is achieved only if p_G = p_data.
Outline of the Proof

The 'if' part: the global minimum of the virtual training criterion C(G) = max_D V(G, D) is achieved if p_G = p_data.
(a) Find the value of V(D, G) when the generator is optimal, i.e., when p_G = p_data.
(b) Find the value of V(D, G) for other values of the generator, i.e., for any p_G such that p_G ≠ p_data.
(c) Show that the value in (a) is smaller than the value in (b) for every p_G ≠ p_data (and hence the minimum of V(D, G) is achieved when p_G = p_data).

The 'only if' part: the global minimum of the virtual training criterion C(G) = max_D V(G, D) is achieved only if p_G = p_data.
Show that when V(D, G) is at its minimum, then p_G = p_data.
First let us look at the objective function again:

\min_{\phi} \max_{\theta} \; \left[ \mathbb{E}_{x \sim p_{data}}\left[\log D_\theta(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D_\theta(G_\phi(z))\right)\right] \right]

We will expand it to its integral form:

\min_{\phi} \max_{\theta} \int_x p_{data}(x) \log D_\theta(x)\, dx + \int_z p(z) \log\left(1 - D_\theta(G_\phi(z))\right) dz

Let p_G(x) denote the distribution of the x's generated by the generator. Since x is a function of z, we can replace the second integral as shown below:

\min_{\phi} \max_{\theta} \int_x p_{data}(x) \log D_\theta(x)\, dx + \int_x p_G(x) \log\left(1 - D_\theta(x)\right) dx

The above replacement follows from the law of the unconscious statistician.
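As a quick numerical illustration of that step (with a toy one-dimensional "generator" and test function, both purely illustrative choices), the expectation computed from z-samples matches the integral against the induced density p_G, even though p_G is never used in the sampling-based estimate:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Toy 1-D "generator": x = G(z) = 2z + 1 with z ~ N(0, 1), so p_G is N(1, 2^2).
G = lambda z: 2.0 * z + 1.0
f = lambda x: np.log1p(x ** 2)          # any test function of the generated sample

# Monte Carlo estimate of E_{z ~ p(z)}[f(G(z))], using only z-samples.
z = np.random.default_rng(0).standard_normal(1_000_000)
lhs = np.mean(f(G(z)))

# Integral of f(x) against the induced density p_G(x).
rhs, _ = quad(lambda x: norm.pdf(x, loc=1.0, scale=2.0) * f(x), -np.inf, np.inf)

print(lhs, rhs)                         # the two agree up to Monte Carlo error
```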
Okay, so our revised objective is given by

\min_{\phi} \max_{\theta} \int_x \left( p_{data}(x) \log D_\theta(x) + p_G(x) \log\left(1 - D_\theta(x)\right) \right) dx

Given a generator G, we are interested in finding the optimal discriminator D which will maximize the above objective function.

The above objective will be maximized when the quantity inside the integral is maximized for every x.

To find the optimum we take the derivative of the term inside the integral w.r.t. D_θ(x) and set it to zero:

\frac{d}{d\,D_\theta(x)} \left( p_{data}(x) \log D_\theta(x) + p_G(x) \log\left(1 - D_\theta(x)\right) \right) = 0

p_{data}(x) \frac{1}{D_\theta(x)} + p_G(x) \frac{-1}{1 - D_\theta(x)} = 0

\frac{p_{data}(x)}{D_\theta(x)} = \frac{p_G(x)}{1 - D_\theta(x)}

p_{data}(x)\left(1 - D_\theta(x)\right) = p_G(x)\, D_\theta(x)

D_\theta(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}
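A quick numerical sanity check of this derivation, with arbitrary illustrative values for the two densities at a fixed x:

```python
import numpy as np

p_data, p_g = 0.3, 0.7                       # illustrative density values at some fixed x

D = np.linspace(1e-6, 1 - 1e-6, 100_000)     # candidate discriminator outputs in (0, 1)
objective = p_data * np.log(D) + p_g * np.log(1.0 - D)

print(D[np.argmax(objective)])               # ≈ 0.3, found by grid search
print(p_data / (p_data + p_g))               # 0.3, the closed-form optimum derived above
```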
This means that for any given generator, the optimal discriminator is

D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}

Now the 'if' part of the theorem says "if p_G = p_data ....".

So let us substitute p_G = p_data into D^*_G(x) and see what happens to the loss function.