

  1. Intro Tutorial on GANs
     Michela Paganini (Yale)
     Fermilab Machine Learning Group Meeting, March 21, 2018

  2. Outline
     Overview:
     • Generative modeling
     • Generative Adversarial Networks (GANs)
     Hands-on: build a vanilla GAN on FashionMNIST
     GAN improvements (f-GAN, WGAN, WGAN-GP)
     Hands-on: build a WGAN on FashionMNIST

  3. Task: given a training dataset, generate more samples from the same data distribution.
     Why do we care?
     • HEP: fast simulation
     • Domain adaptation
     • Latent representation learning
     • Simulation and planning for RL
     • …

  4. Generative Modeling
     Build a generative model with probability distribution $p_{\text{model}}(x)$ that approximates the data distribution $p_{\text{data}}(x)$.

  5. Taxonomy of Generative Models (from I. Goodfellow)
     Maximum likelihood
     • Explicit density
       - Tractable density: fully visible belief nets (NADE, MADE, PixelRNN), change of variables models (nonlinear ICA)
       - Approximate density: variational (VAE), Markov chain (Boltzmann machine)
     • Implicit density
       - Direct: GAN
       - Markov chain: GSN

  6. Generative Adversarial Networks
     A 2-player game between a generator and a discriminator.
     Generator: transforms noise into a realistic sample; the latent prior mapped to sample space implicitly defines a distribution.
     Discriminator: distinguishes real samples (real data) from fake samples; tells how fake or real a sample looks via a score.

  7. Minimax Formulation
     Construct a two-person zero-sum minimax game with value
     $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
     We have an inner maximization by D and an outer minimization by G.
     With a perfect discriminator, the generator minimizes $2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g) - \log 4$.

  8. How to Train Your GANs
     Alternate gradient descent on D and gradient ascent on G.
     Heuristic (non-saturating) loss function for G: maximize $\mathbb{E}_{z}[\log D(G(z))]$ instead of minimizing $\mathbb{E}_{z}[\log(1 - D(G(z)))]$.
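The slides do not include the notebook code; a minimal PyTorch sketch of one such alternating update with the non-saturating loss might look like the following. The names G, D, opt_G, opt_D, and real are placeholders, and the sketch assumes a discriminator that outputs probabilities of shape (batch, 1).

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, latent_dim=100):
    """One alternating GAN update with the non-saturating generator loss."""
    batch = real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # Discriminator step: push D(real) towards 1 and D(G(z)) towards 0
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()                      # block gradients into G
    d_loss = F.binary_cross_entropy(D(real), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step (non-saturating): maximize log D(G(z)),
    # i.e. minimize the BCE of D(G(z)) against the "real" label
    z = torch.randn(batch, latent_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```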

  9. Demo #1
     Switch to the notebook for the 1st hands-on activity: training a vanilla GAN on FashionMNIST.
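As a rough sketch of what such a vanilla GAN might look like, a simple MLP generator and discriminator plus a FashionMNIST loader are shown below; the layer sizes, optimizer settings, and data path are illustrative assumptions, not the notebook's actual code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

latent_dim = 100

# Generator: latent noise -> flattened 28x28 image in [-1, 1]
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 28 * 28), nn.Tanh(),
)

# Discriminator: flattened image -> probability of being real
D = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# FashionMNIST, scaled to [-1, 1] to match the generator's Tanh output,
# and flattened so the MLP models above can consume it directly
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
    transforms.Lambda(lambda x: x.view(-1)),
])
loader = DataLoader(
    datasets.FashionMNIST("data/", train=True, download=True, transform=transform),
    batch_size=64, shuffle=True,
)

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```

Each batch from loader yields (images, labels); feeding the images through an alternating update like the sketch after slide 8 completes the training loop.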

  10. f-Divergence
     Divergence = "a function which establishes the 'distance' of one probability distribution to the other on a statistical manifold".
     f-divergences: $D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx$
     Convex conjugate: $f^*(t) = \sup_{u \in \mathrm{dom}_f} \{\, ut - f(u) \,\}$
     We have no access to the distributions in functional form. Use empirical expectations instead:
     $D_f(P \,\|\, Q) \geq \sup_T \left( \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))] \right)$

  11. f-GAN
     Extend the GAN formalism: $\min_\theta \max_\omega \; \mathbb{E}_{x \sim P}[T_\omega(x)] - \mathbb{E}_{x \sim Q_\theta}[f^*(T_\omega(x))]$
     Any f-divergence can be used as the GAN objective.

  12. Problems with f-GANs (& Some Solutions)
     If at any point the supports of these distributions have no overlap, this family of measures collapses to a null or infinite distance.
     (Some) solutions, sketched below:
     • Instance noise
     • Label flipping
     • Label smoothing
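These tricks typically enter the discriminator update along the following lines; the helper below and its noise scale, smoothing target, and flip probability are illustrative assumptions, not code from the tutorial.

```python
import torch

def noisy_inputs_and_labels(real, fake, sigma=0.1, smooth=0.9, flip_prob=0.05):
    """Apply instance noise, one-sided label smoothing, and occasional label flipping."""
    # Instance noise: add Gaussian noise to real and fake inputs so the two
    # distributions overlap even when their true supports are disjoint
    real_noisy = real + sigma * torch.randn_like(real)
    fake_noisy = fake + sigma * torch.randn_like(fake)

    # One-sided label smoothing: target 0.9 instead of 1.0 for real samples
    real_labels = torch.full((real.size(0), 1), smooth)
    fake_labels = torch.zeros(fake.size(0), 1)

    # Label flipping: with small probability, swap the real/fake targets
    if torch.rand(1).item() < flip_prob:
        real_labels, fake_labels = fake_labels, real_labels
    return real_noisy, fake_noisy, real_labels, fake_labels
```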

  13. Solving the Disjoint Support Problem
     Other distance metrics: Integral Probability Metrics, Wasserstein distances, proper scoring rules.
     E.g., replace the family of f-divergences with the Wasserstein-1 distance, often referred to as the Earth Mover's Distance (EMD).
     Intuition: think of PDFs as mounds of dirt; the EMD describes how much "work" it takes to transform one mound of dirt into another.
     It accounts for both "mass" and distance, i.e., it works for disjoint PDFs!
     (Excellent WGAN review!)

  14. The Wasserstein-1 Distance
     ...is intractable! The infimum is over an uncountably infinite set of candidate distribution pairings.
     Kantorovich-Rubinstein duality to the rescue!
     What does this give us? A restriction to K-Lipschitz functions.
     (Excellent blog post about WGAN and KR duality from V. Herrmann!)
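For reference, the primal form (the intractable infimum over couplings) and its Kantorovich-Rubinstein dual, in the notation commonly used in the WGAN literature:

```latex
% Primal form: infimum over all couplings gamma with marginals P_r and P_g
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big]

% Kantorovich-Rubinstein dual: supremum over all 1-Lipschitz functions f
W(P_r, P_g) = \sup_{\lVert f \rVert_L \le 1} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big)
```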

  15. Lipschitz?
     We can constrain a neural network to be K-Lipschitz!
     This lets us parameterize f(x) as a neural network and clamp its weights to lie in a compact space, say [-c, c].
     This guarantees that the network f is K-Lipschitz, with K = r(c), for some function r.
     Great! The network can now operate in Wasserstein-1 space up to a constant factor! f is now a critic.
     (Figure: for a Lipschitz continuous function, there is a double cone (shown in white) whose vertex can be translated along the graph, so that the graph always remains entirely outside the cone.)

  16. The WGAN Algorithm
     Training the critic:
     • For each batch of real samples, we want the output of f to be as big (1.0) as possible.
     • For each batch of fake samples, we want the output of f to be as small (-1.0) as possible.
     Training the generator:
     • We want f (on generated samples) to be as big (1.0) as possible.
     (Excellent blog post about WGAN from A. Irpan!)
     A sketch of these updates follows below.
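A minimal PyTorch sketch of the critic and generator updates described above, including the weight clipping from the previous slide. The function names, the clip value c = 0.01, and the use of flattened inputs follow common WGAN practice and are assumptions, not the tutorial's exact code.

```python
import torch

def wgan_critic_step(f, G, opt_f, real, latent_dim=100, clip=0.01):
    """Critic update: maximize mean f(real) - mean f(fake), then clamp weights to [-c, c]."""
    z = torch.randn(real.size(0), latent_dim)
    fake = G(z).detach()
    loss = -(f(real).mean() - f(fake).mean())   # minimize the negated objective
    opt_f.zero_grad()
    loss.backward()
    opt_f.step()
    for p in f.parameters():                    # weight clipping keeps f K-Lipschitz
        p.data.clamp_(-clip, clip)
    return -loss.item()                         # estimate of the Wasserstein distance

def wgan_generator_step(f, G, opt_G, batch_size=64, latent_dim=100):
    """Generator update: maximize mean f(G(z))."""
    z = torch.randn(batch_size, latent_dim)
    loss = -f(G(z)).mean()
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()
```

In the full algorithm the critic step is typically repeated several times (e.g. 5) per generator step, so the critic stays close to optimal.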

  17. WGAN Introspection
     The critic loss provides a reliable metric when the critic is trained to convergence.

  18. WGAN Deficiencies
     • Restricting the neural net to have weights in a compact space restricts expressivity (capacity underutilization).
     • Gradients explode / vanish.
     Why not make the network 1-Lipschitz?

  19. WGAN-GP: WGAN with Gradient Penalty
     TL;DR: penalize the critic for having a gradient norm too far from unity. This is a better way to ensure 1-Lipschitz critics.
     For every real sample, build a fake sample, and randomly linearly interpolate between the two; the penalty is evaluated at the interpolated points (see the sketch below).
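A sketch of the gradient penalty term described above, assuming flattened (batch, features) inputs as in the earlier MLP sketches; lam = 10 is the coefficient from the WGAN-GP paper, and the helper name is a placeholder.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize the critic's gradient norm at random interpolates between real and fake."""
    eps = torch.rand(real.size(0), 1)                 # one interpolation factor per sample
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    # Two-sided penalty: push the per-sample gradient norm toward 1
    return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```

With this penalty added to the critic loss, the weight clipping from the plain WGAN sketch is dropped.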

  20. Demo #2
     Switch back to the notebook for the 2nd hands-on activity: training a WGAN on FashionMNIST.

  21. Thank You!
     Questions? You can find me at: michela.paganini@yale.edu

  22. Theoretical Dynamics of Minimax GANs for Optimal D
     From the original paper, we know that the optimal discriminator is $D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$.
     Define the generator's criterion by solving for the infinite-capacity discriminator: $C(G) = \max_D V(G, D) = V(G, D^*_G)$.
     We can rewrite the value as
     $C(G) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]$.
     Simplifying notation and applying some algebra,
     $C(G) = -\log 4 + \mathrm{KL}\!\left(p_{\text{data}} \,\middle\|\, \tfrac{p_{\text{data}} + p_g}{2}\right) + \mathrm{KL}\!\left(p_g \,\middle\|\, \tfrac{p_{\text{data}} + p_g}{2}\right)$,
     which we recognize as a summation of two KL-divergences, and we can combine these into the Jensen-Shannon divergence:
     $C(G) = -\log 4 + 2 \cdot \mathrm{JSD}(p_{\text{data}} \,\|\, p_g)$.
     This yields a unique global minimum precisely when $p_g = p_{\text{data}}$.
