Generative networks part 2: GANs


  1. Generative networks part 2: GANs

  2. Recap on generative networks
     Generative networks provide a way to sample from any distribution.
     1. Sample z ∼ µ, where µ denotes an efficiently sampleable distribution (e.g., uniform or Gaussian).
     2. Output g(z), where g : ℝ^d → ℝ^m is a deep network.
     Notation: let g#µ (the pushforward of µ through g) denote this distribution.

  3. Recap on generative networks
     Generative networks provide a way to sample from any distribution.
     1. Sample z ∼ µ, where µ denotes an efficiently sampleable distribution (e.g., uniform or Gaussian).
     2. Output g(z), where g : ℝ^d → ℝ^m is a deep network.
     Notation: let g#µ (the pushforward of µ through g) denote this distribution.
     Brief remarks:
     ◮ Can this model any target distribution ν? Yes, (roughly) for the same reason that g can approximate any f : ℝ^d → ℝ^m.
     ◮ Graphical models let us sample and estimate probabilities; what about here? Nope.
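
A minimal sketch of this two-step sampling recipe, assuming PyTorch and a hypothetical (untrained) MLP generator; the names, sizes, and architecture are illustrative, not part of the slides:

```python
import torch
import torch.nn as nn

# Hypothetical small generator g : R^d -> R^m; the architecture is illustrative only.
d, m = 2, 2
g = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, m))

def sample_pushforward(n):
    """Draw n samples from g#mu: sample z ~ mu (standard Gaussian here), output g(z)."""
    z = torch.randn(n, d)            # z ~ mu, an efficiently sampleable base distribution
    with torch.no_grad():
        return g(z)                  # g(z) ~ g#mu, the pushforward of mu through g

samples = sample_pushforward(1000)   # 1000 draws from g#mu, shape (1000, m)
```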

  4. Univariate examples
     g(x) = x, the identity function, mapping Uniform([0, 1]) to itself.
     [Figure: resulting density.]

  5. Univariate examples
     g(x) = x²/2, mapping Uniform([0, 1]) to something ∝ 1/√x.
     [Figure: resulting density.]

  6. Univariate examples
     g is the inverse CDF of the Gaussian; the input distribution is Uniform([0, 1]) and the output is Gaussian.
     [Figure: resulting density.]
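
The three univariate examples above can be reproduced directly from the sampling recipe; a sketch assuming numpy and scipy (norm.ppf is the inverse Gaussian CDF):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=10_000)   # z ~ Uniform([0, 1])

x_identity = u                 # g(x) = x: output is again Uniform([0, 1])
x_square   = u**2 / 2          # g(x) = x^2/2: output density proportional to 1/sqrt(x)
x_gauss    = norm.ppf(u)       # g = inverse Gaussian CDF: output is a standard Gaussian
# Histograms of x_identity, x_square, x_gauss recover the three plots on the slides.
```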

  7. Another way to visualize generative networks
     Given a sample from a distribution (even g#µ), here’s the “kernel density” / “Parzen window” estimate of its density:
     1. Start with a random draw (x_i)_{i=1}^n.
     2. “Place bumps at every x_i”: define
        p̂(x) := (1/n) Σ_{i=1}^n k((x − x_i)/h),
        where k is a kernel function (not the SVM one!) and h is the “bandwidth”; for example:

  8. Another way to visualize generative networks
     Given a sample from a distribution (even g#µ), here’s the “kernel density” / “Parzen window” estimate of its density:
     1. Start with a random draw (x_i)_{i=1}^n.
     2. “Place bumps at every x_i”: define
        p̂(x) := (1/n) Σ_{i=1}^n k((x − x_i)/h),
        where k is a kernel function (not the SVM one!) and h is the “bandwidth”; for example:
     ◮ Gaussian: k(z) ∝ exp(−‖z‖²/2);
     ◮ Epanechnikov: k(z) ∝ max{0, 1 − ‖z‖²}.
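
A minimal numpy sketch of the Parzen-window estimate above, using the Gaussian kernel. The kernel’s normalizing constant and an extra 1/h factor (the slide specifies k only up to proportionality) are included so the one-dimensional estimate integrates to one:

```python
import numpy as np

def kde(x_query, sample, h):
    """1-D Parzen-window estimate: average Gaussian bumps of bandwidth h centered at each x_i."""
    diffs = (x_query[:, None] - sample[None, :]) / h      # (x - x_i) / h for every query/sample pair
    bumps = np.exp(-diffs**2 / 2) / np.sqrt(2 * np.pi)    # Gaussian kernel k((x - x_i)/h)
    return bumps.mean(axis=1) / h                         # (1/(n h)) sum_i k((x - x_i)/h)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)            # pretend this is a draw from g#mu (or any distribution)
xs = np.linspace(-4.0, 4.0, 200)
density_estimate = kde(xs, sample, h=0.3)
```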

  9. Examples: univariate sampling
     Univariate sample, kernel density estimate (kde), GMM E-M.
     [Figure: kde and gmm density curves over the sample.]

  10. Examples: univariate sampling
      Univariate sample, kernel density estimate (kde), GAN kde.
      [Figure: kde and gan kde density curves over the sample.]
      This is admittedly very indirect! As mentioned, there aren’t great ways to get GAN/VAE density information.
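
The “GAN kde” curve is produced indirectly precisely because the density of g#µ is unavailable: draw a sample from g#µ and fit a kernel density estimate to it. A sketch reusing the hypothetical sample_pushforward and kde helpers from the earlier sketches (taking the first output coordinate for a univariate estimate):

```python
import numpy as np

gan_sample = sample_pushforward(2000)[:, 0].numpy()   # draws from g#mu, first coordinate only
xs = np.linspace(-2.0, 5.0, 300)
gan_density = kde(xs, gan_sample, h=0.3)              # "gan kde": density estimated from GAN samples
```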

  11. Examples: bivariate sampling
      Bivariate sample, GMM E-M.
      [Figure.]

  12. Examples: bivariate sampling
      Bivariate sample, kernel density estimate (kde).
      [Figure.]

  13. Examples: bivariate sampling
      Bivariate sample, GAN kde.
      [Figure.]
      Question: how will this plot change with network capacity?

  14. Approaches we’ve seen for modeling distributions.

  15. Approaches we’ve seen for modeling distributions.
      Let’s survey our approaches to density estimation.
      ◮ Graphical models: can be interpretable, can encode domain knowledge.
      ◮ Kernel density estimation: easy to implement, converges to the right thing, suffers a curse of dimension.
      ◮ Training: easy for KDE, messy for graphical models. Interpretability: fine for both. Sampling: easy for both. Probability measurements: easy for KDE, sometimes easy for graphical models.

  16. Approaches we’ve seen for modeling distributions.
      Let’s survey our approaches to density estimation.
      ◮ Graphical models: can be interpretable, can encode domain knowledge.
      ◮ Kernel density estimation: easy to implement, converges to the right thing, suffers a curse of dimension.
      ◮ Training: easy for KDE, messy for graphical models. Interpretability: fine for both. Sampling: easy for both. Probability measurements: easy for KDE, sometimes easy for graphical models.
      Deep networks:
      ◮ Either we have easy sampling, or we can estimate densities. Doing both seems to have major computational or data costs.

  17. Brief VAE Recap

  18. (Variational) Autoencoders
      Autoencoder:
      ◮ x_i  --f-->  latent z_i = f(x_i)  --g-->  x̂_i = g(z_i).
      Objective: (1/n) Σ_{i=1}^n ℓ(x_i, x̂_i).

  19. (Variational) Autoencoders
      Autoencoder:
      ◮ x_i  --f-->  latent z_i = f(x_i)  --g-->  x̂_i = g(z_i).
      Objective: (1/n) Σ_{i=1}^n ℓ(x_i, x̂_i).
      Variational Autoencoder:
      ◮ x_i  --f-->  latent distribution µ_i = f(x_i)  --pushforward through g-->  x̂_i ∼ g#µ_i.
      Objective: (1/n) Σ_{i=1}^n [ ℓ(x_i, x̂_i) + λ KL(µ, µ_i) ].
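
A sketch of the VAE objective above in PyTorch, under common assumptions the slide does not fix: the encoder f outputs a per-example Gaussian µ_i through a mean and log-variance, the prior µ is a standard Gaussian (so the KL term has a closed form), and ℓ is squared error:

```python
import torch
import torch.nn.functional as F

def reparameterize(mean_i, logvar_i):
    """Differentiable draw z ~ N(mean_i, diag(exp(logvar_i))), so that x_hat = g(z) ~ g#mu_i."""
    eps = torch.randn_like(mean_i)
    return mean_i + eps * (0.5 * logvar_i).exp()

def vae_objective(x, x_hat, mean_i, logvar_i, lam=1.0):
    """Reconstruction loss ell(x_i, x_hat_i) plus lam times the KL term between mu_i and the prior."""
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = 0.5 * (mean_i.pow(2) + logvar_i.exp() - 1.0 - logvar_i).sum(dim=1).mean()
    return recon + lam * kl
```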

  20. [Figure: x̂_i ∼ g#µ_i.]

  21. [Figure: x̂_i ∼ g#µ with small λ.]

  22. Generative Adversarial Networks (GANs)

  23. Generative network setup and training.
      ◮ We are given (x_i)_{i=1}^n ∼ ν.
      ◮ We want to find g so that (g(z_i))_{i=1}^n ≈ (x_i)_{i=1}^n, where (z_i)_{i=1}^n ∼ µ.
      Problem: this isn’t as simple as fitting g(z_i) ≈ x_i.

  24. Generative network setup and training.
      ◮ We are given (x_i)_{i=1}^n ∼ ν.
      ◮ We want to find g so that (g(z_i))_{i=1}^n ≈ (x_i)_{i=1}^n, where (z_i)_{i=1}^n ∼ µ.
      Problem: this isn’t as simple as fitting g(z_i) ≈ x_i.
      Solutions:
      ◮ VAE: for each x_i, construct a distribution µ_i so that x̂_i ∼ g#µ_i and x_i are close, as are µ_i and µ. To generate fresh samples, get z ∼ µ and output g(z).
      ◮ GAN: pick a distance notion between distributions (or between the samples (g(z_i))_{i=1}^n and (x_i)_{i=1}^n) and pick g to minimize that!

  25. GAN overview
      GAN approach: we minimize D(ν, g#µ) directly, where “D” is some notion of distance/divergence:
      ◮ Jensen-Shannon divergence (original GAN paper).
      ◮ Wasserstein distance (influential follow-up).

  26. GAN overview
      GAN approach: we minimize D(ν, g#µ) directly, where “D” is some notion of distance/divergence:
      ◮ Jensen-Shannon divergence (original GAN paper).
      ◮ Wasserstein distance (influential follow-up).
      Each distance is computed with an alternating/adversarial scheme:
      1. We have some current choice g_t, and use it to produce a sample (x̂_i)_{i=1}^n with x̂_i = g_t(z_i).
      2. We train a discriminator/critic f_t to find differences between (x̂_i)_{i=1}^n and (x_i)_{i=1}^n.
      3. We then pick a new generator g_{t+1}, trained to fool f_t!

  27. Jensen-Shannon divergence (original GAN)

  28. Original GAN formulation
      Let p, p_g denote the densities of the data and of the generator, and let p̃ = (p + p_g)/2. The original GAN minimizes the Jensen-Shannon divergence:
      2 · JS(p, p_g) = KL(p, p̃) + KL(p_g, p̃)
                     = ∫ p(x) ln(p(x)/p̃(x)) dx + ∫ p_g(x) ln(p_g(x)/p̃(x)) dx
                     = E_p[ ln(p(x)/p̃(x)) ] + E_{p_g}[ ln(p_g(x)/p̃(x)) ].

  29. Original GAN formulation
      Let p, p_g denote the densities of the data and of the generator, and let p̃ = (p + p_g)/2. The original GAN minimizes the Jensen-Shannon divergence:
      2 · JS(p, p_g) = KL(p, p̃) + KL(p_g, p̃)
                     = ∫ p(x) ln(p(x)/p̃(x)) dx + ∫ p_g(x) ln(p_g(x)/p̃(x)) dx
                     = E_p[ ln(p(x)/p̃(x)) ] + E_{p_g}[ ln(p_g(x)/p̃(x)) ].
      But we’ve been saying we can’t write down p_g?

  30. Original GAN formulation
      Let p, p_g denote the densities of the data and of the generator, and let p̃ = (p + p_g)/2. The original GAN minimizes the Jensen-Shannon divergence:
      2 · JS(p, p_g) = KL(p, p̃) + KL(p_g, p̃)
                     = ∫ p(x) ln(p(x)/p̃(x)) dx + ∫ p_g(x) ln(p_g(x)/p̃(x)) dx
                     = E_p[ ln(p(x)/p̃(x)) ] + E_{p_g}[ ln(p_g(x)/p̃(x)) ].
      But we’ve been saying we can’t write down p_g?
      The original GAN approach applies alternating minimization to
      inf_{g ∈ G} sup_{f ∈ F, f : X → (0,1)} [ (1/n) Σ_{i=1}^n ln f(x_i) + (1/m) Σ_{j=1}^m ln(1 − f(g(z_j))) ].
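
A sketch of this objective in PyTorch: with the discriminator f mapping into (0, 1) (e.g., via a final sigmoid), the two functions below evaluate the bracketed quantity once as a function of f (to be maximized) and once as a function of g (to be minimized; the first term does not depend on g). Names are illustrative, not from the slides:

```python
import torch

def discriminator_objective(f, x_real, x_fake):
    """(1/n) sum_i ln f(x_i) + (1/m) sum_j ln(1 - f(x_hat_j)); f maps into (0, 1). Maximize over f."""
    return torch.log(f(x_real)).mean() + torch.log(1.0 - f(x_fake)).mean()

def generator_objective(f, g, z):
    """The same quantity viewed as a function of g (the first term is constant in g). Minimize over g."""
    return torch.log(1.0 - f(g(z))).mean()
```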

  31. Original GAN formulation and algorithm.
      Original GAN objective:
      inf_{g ∈ G} sup_{f ∈ F, f : X → (0,1)} [ (1/n) Σ_{i=1}^n ln f(x_i) + (1/m) Σ_{j=1}^m ln(1 − f(g(z_j))) ].
      The algorithm alternates these two steps:
      1. Hold g fixed and optimize f. Specifically, generate a sample (x̂_j)_{j=1}^m = (g(z_j))_{j=1}^m, and approximately optimize
         sup_{f ∈ F, f : X → (0,1)} [ (1/n) Σ_{i=1}^n ln f(x_i) + (1/m) Σ_{j=1}^m ln(1 − f(x̂_j)) ].
      2. Hold f fixed and optimize g. Specifically, generate (z_j)_{j=1}^m and approximately optimize
         inf_{g ∈ G} [ (1/n) Σ_{i=1}^n ln f(x_i) + (1/m) Σ_{j=1}^m ln(1 − f(g(z_j))) ].
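
The alternating algorithm, sketched as a PyTorch training loop reusing the two objective functions above. The networks, optimizers, batch sizes, and single gradient step per phase are illustrative assumptions; in practice each inner optimization is only carried out approximately:

```python
import torch
import torch.nn as nn

d, m = 2, 2  # latent and data dimensions (illustrative)
g = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, m))                 # generator
f = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())   # discriminator, range (0, 1)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

x_real = torch.randn(256, m)  # stand-in for the data sample (x_i); replace with real data

for t in range(1000):
    # Step 1: hold g fixed, take an ascent step on f using a fresh fake sample x_hat_j = g(z_j).
    z = torch.randn(256, d)
    x_fake = g(z).detach()
    loss_f = -discriminator_objective(f, x_real, x_fake)   # negate: optimizers minimize
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()

    # Step 2: hold f fixed, take a descent step on g with fresh z_j, trying to fool f.
    z = torch.randn(256, d)
    loss_g = generator_objective(f, g, z)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```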
