Wasserstein GAN
Martin Arjovsky, Soumith Chintala, Léon Bottou, ICML 2017
Presented by Yaochen Xie, 12-22-2017
Contents ❖ GAN and its applications [1] ❖ GAN vs. Variational Auto-Encoder [2] ❖ What’s wrong with GAN [3], [4] ❖ JS Divergence and KL Divergence [3], [4] ❖ Wasserstein Distance [4], [5] ❖ WGAN and its Implementation [4]
Take A Look Back at GAN
D and G play the following two-player minimax game with the value function V(D, G):
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
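As an illustration (not from the slides), a minimal PyTorch-style sketch of this value function; D, G, real, and z are hypothetical stand-ins for a discriminator that outputs probabilities, a generator, a batch of real samples, and a batch of noise:

    import torch

    def gan_losses(D, G, real, z, eps=1e-8):
        # Discriminator maximizes V(D, G), so its loss is -V(D, G);
        # the minimax generator minimizes log(1 - D(G(z))).
        fake = G(z)
        d_loss = -(torch.log(D(real) + eps).mean()
                   + torch.log(1 - D(fake.detach()) + eps).mean())
        g_loss = torch.log(1 - D(fake) + eps).mean()
        return d_loss, g_loss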
Applications of GAN - Image Translation (examples: Conditional GAN, Triangle GAN)
Applications of GAN - Super-Resolution
Applications of GAN - Image Inpainting (figure comparing Real, Input, TV, LR, and GAN results)
GAN vs. VAE - AutoEncoder
GAN vs. VAE - Variational AutoEncoder
• Add a constraint on the encoding network that forces it to generate latent vectors roughly following a unit Gaussian distribution
• Generative loss: mean squared error
• Latent loss: KL divergence
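A minimal sketch of this two-term VAE objective (illustrative, not from the slides); recon, x, mu, and logvar are hypothetical outputs of a decoder/encoder pair:

    import torch

    def vae_loss(recon, x, mu, logvar):
        # Generative loss: mean squared error between reconstruction and input.
        generative = ((recon - x) ** 2).sum()
        # Latent loss: KL divergence between N(mu, sigma^2) and the unit Gaussian N(0, I),
        # in closed form: -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2).
        latent = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return generative + latent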
GAN vs. VAE
• VAE - explicit density model; uses MSE to judge generation quality
• GAN - implicit density model; uses a discriminator to judge generation quality
Drawbacks of GAN
• Gradient vanishing
• Mode collapse
• Unstable, not converging
Kullback–Leibler divergence (Relative Entropy)
A measure of how one probability distribution differs from another.
Discrete distributions: $D_{KL}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$
Continuous distributions: $D_{KL}(P\|Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
Notice that $D_{KL}(P\|Q)$ is not equal to $D_{KL}(Q\|P)$.
Rigorously, KL divergence cannot be considered a distance (it is neither symmetric nor does it satisfy the triangle inequality).
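A small numerical illustration (not from the slides) of the definition and its asymmetry, using two hypothetical discrete distributions:

    import numpy as np

    def kl_divergence(p, q):
        # D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return np.sum(p * np.log(p / q))

    p = np.array([0.4, 0.6])
    q = np.array([0.8, 0.2])
    print(kl_divergence(p, q), kl_divergence(q, p))  # the two values differ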
Jensen-Shannon divergence
A symmetrized and smoothed version of the KL divergence:
$JS(P\|Q) = \frac{1}{2} D_{KL}(P\|M) + \frac{1}{2} D_{KL}(Q\|M)$, where $M = \frac{1}{2}(P + Q)$.
When two distributions are far from each other (e.g., their supports are disjoint), the JS divergence saturates at the constant $\log 2$.
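A quick numerical check of that saturation (illustrative only), using two hypothetical distributions with disjoint supports:

    import numpy as np

    def js_divergence(p, q, eps=1e-12):
        # JS(P || Q) = 1/2 * KL(P || M) + 1/2 * KL(Q || M), with M = (P + Q) / 2.
        p, q = np.asarray(p, float), np.asarray(q, float)
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(np.where(a > 0, a * np.log(a / (b + eps)), 0.0))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Disjoint supports: JS saturates at log 2 ≈ 0.693.
    p = np.array([1.0, 0.0])
    q = np.array([0.0, 1.0])
    print(js_divergence(p, q), np.log(2))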
Where is the loss from?
Cross entropy: $H(p, q) = -\sum_x p(x) \log q(x)$
Loss based on cross entropy (binary case, as used for the discriminator):
$L(D) = -\mathbb{E}_{x \sim P_r}[\log D(x)] - \mathbb{E}_{x \sim P_g}[\log(1 - D(x))]$
What if p and q belong to continuous distributions? The sum becomes an expectation, estimated from samples.
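A short numerical sketch (illustrative only, with a hypothetical choice of p and q) of how the cross-entropy sum turns into an expectation that can be estimated by Monte Carlo when the distributions are continuous:

    import numpy as np

    rng = np.random.default_rng(0)

    # Continuous case: H(p, q) = -E_{x~p}[log q(x)].
    # Here p is a standard normal and q a normal with shifted mean.
    def log_q(x, mu=1.0, sigma=1.0):
        return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

    samples = rng.normal(loc=0.0, scale=1.0, size=100_000)   # x ~ p
    cross_entropy = -np.mean(log_q(samples))                 # ≈ -E_p[log q(x)]
    print(cross_entropy)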
What’s going wrong?
Now we fix G, and let D be optimal:
$D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)}$
Plugging $D^*$ back into the value function gives
$V(D^*, G) = 2\, JS(P_r \| P_g) - 2\log 2$,
i.e., two times the Jensen-Shannon divergence (up to a constant).
So far, optimizing the loss is equivalent to minimizing the JS divergence between $P_r$ and $P_g$. But when $P_r$ and $P_g$ have (nearly) disjoint supports, the JS divergence sticks at the constant $\log 2$, so the generator receives almost no gradient: gradient vanishing!
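For reference, the substitution step behind that claim (the standard derivation from [1]/[3], not spelled out on the slide):

    \begin{aligned}
    V(D^*, G) &= \mathbb{E}_{x \sim P_r}\!\left[\log \tfrac{P_r(x)}{P_r(x)+P_g(x)}\right]
               + \mathbb{E}_{x \sim P_g}\!\left[\log \tfrac{P_g(x)}{P_r(x)+P_g(x)}\right] \\
              &= D_{KL}\!\left(P_r \,\Big\|\, \tfrac{P_r+P_g}{2}\right)
               + D_{KL}\!\left(P_g \,\Big\|\, \tfrac{P_r+P_g}{2}\right) - 2\log 2 \\
              &= 2\, JS(P_r \| P_g) - 2\log 2
    \end{aligned}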
What’s going wrong?
When G uses the alternative loss $\mathbb{E}_{x \sim P_g}[-\log D(x)]$ and D is optimal, this loss equals
$KL(P_g \| P_r) - 2\, JS(P_r \| P_g)$ (plus terms that do not depend on G).
• Minimizing a KL term while simultaneously maximizing a JS term pushes in opposite directions: unstable.
• The reverse KL term $KL(P_g \| P_r)$ is asymmetric: generating an implausible sample is penalized heavily (KL —> ∞), while dropping a mode of $P_r$ costs almost nothing (KL —> 0): mode collapse.
We need a weaker distance
WHY? 1. We wish the loss $\theta \mapsto \rho(P_\theta, P_r)$ to be continuous.
The most fundamental difference between such distances is their impact on the convergence of sequences of probability distributions: a sequence $(P_t)$ converges to $P$ under $\rho$ iff $\rho(P_t, P) \to 0$.
A weaker distance means it is easier for such sequences to converge.
We need a weaker distance
WHY? 2. We wish the mapping $\theta \mapsto P_\theta$ to be continuous.
Continuity means that when a sequence of parameters $\theta_t$ converges to $\theta$, the distributions $P_{\theta_t}$ also converge to $P_\theta$.
The weaker this distance, the easier it is to define a continuous mapping from $\theta$-space to $P_\theta$-space, since it is easier for the distributions to converge.
Wasserstein (Earth-Mover) Distance
$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|]$, where $\Pi(P_r, P_g)$ is the set of all joint distributions whose marginals are $P_r$ and $P_g$.
“If each distribution is viewed as a unit amount of ‘dirt’ piled on its domain, the metric is the minimum ‘cost’ of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the distance it has to be moved.”
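A quick one-dimensional illustration (not from the slides), using SciPy's built-in earth-mover distance between two hypothetical sets of samples:

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    a = rng.normal(loc=0.0, scale=1.0, size=10_000)   # samples from N(0, 1)
    b = rng.normal(loc=3.0, scale=1.0, size=10_000)   # samples from N(3, 1)

    # For two 1-D Gaussians with equal variance, the optimal transport plan is a
    # shift, so W is just the distance between the means (≈ 3 here).
    print(wasserstein_distance(a, b))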
Wasserstein Distance
• KL divergence and JS divergence are too strong for the loss function $\rho(P_r, P_\theta)$ to be continuous in $\theta$.
• Wasserstein distance is a weaker measure of distance such that:
1. $W(P_r, P_\theta)$ is continuous if the generator $g_\theta$ is continuous.
2. $W(P_r, P_\theta)$ is continuous everywhere and differentiable almost everywhere if $g_\theta$ is locally Lipschitz with finite expectation of the local Lipschitz constants, $\mathbb{E}_{z \sim p(z)}[L(\theta, z)] < +\infty$.
Optimal Transportation View of GAN: the Brenier potential
Convex Geometry: Minkowski theorem, Alexandrov theorem
Geometric Interpretation of the Optimal Transport Map
Wasserstein distance in WGAN
Kantorovich-Rubinstein duality (see https://vincentherrmann.github.io/blog/wasserstein/):
$W(\mu, \nu) = \sup_{Lip(f) \le 1} \left\{ \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)] \right\}$
when μ and ν have bounded support, where Lip(f) denotes the minimal Lipschitz constant for f.
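In WGAN [4], this supremum is approximated over a family of parameterized critics $f_w$ (kept approximately Lipschitz by clipping the weights), which gives the objective the network actually optimizes:

    \max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]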
Implementation
Compared with the original GAN, WGAN makes four changes:
- Discriminator (with sigmoid activation) —> Critic (without sigmoid output).
- The losses no longer take the log of the critic's output:
  $L_D = \mathbb{E}_{x \sim P_g}[f_w(x)] - \mathbb{E}_{x \sim P_r}[f_w(x)]$, $\quad L_G = -\mathbb{E}_{x \sim P_g}[f_w(x)]$
- Truncation (clipping) of the parameters of the Critic (Discriminator) to a fixed range $[-c, c]$ after each update.
- Do not use momentum-based optimizers for gradient descent; use RMSProp or SGD instead.
(A sketch of one training iteration follows below.)
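A minimal PyTorch-style sketch of one WGAN training iteration under these four changes; the tiny networks, the batch of random "real" data, and constants such as c = 0.01 and n_critic = 5 are illustrative placeholders, not the authors' exact code:

    import torch
    import torch.nn as nn

    z_dim, x_dim, batch_size, c, n_critic = 64, 784, 32, 0.01, 5

    # Placeholder networks: the critic outputs a raw score (no sigmoid).
    critic = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    generator = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

    opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)   # no momentum
    opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

    real_batch = torch.randn(batch_size, x_dim)  # stand-in for a batch of real data

    for _ in range(n_critic):
        fake = generator(torch.randn(batch_size, z_dim)).detach()
        # L_D = E_{x~Pg}[f(x)] - E_{x~Pr}[f(x)]
        loss_d = critic(fake).mean() - critic(real_batch).mean()
        opt_c.zero_grad(); loss_d.backward(); opt_c.step()
        # Truncate (clip) the critic's parameters to [-c, c].
        for p in critic.parameters():
            p.data.clamp_(-c, c)

    # L_G = -E_{x~Pg}[f(x)]
    loss_g = -critic(generator(torch.randn(batch_size, z_dim))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()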
Experiments
References
[1] Ian J. Goodfellow et al. Generative Adversarial Nets.
[2] Kingma, Diederik P., and Max Welling. Auto-Encoding Variational Bayes.
[3] Martin Arjovsky and Léon Bottou. Towards Principled Methods for Training Generative Adversarial Networks.
[4] Martin Arjovsky, Soumith Chintala and Léon Bottou. Wasserstein GAN.
[5] Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu. A Geometric View of Optimal Transportation and Generative Model.