Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference
Louis C. Tiao¹, Edwin V. Bonilla², Fabio Ramos¹
¹University of Sydney, ²University of New South Wales
July 22, 2018
Motivation: Unpaired Image-to-Image Translation

(Figure 1: examples of paired vs. unpaired image-to-image translation, e.g. Monet paintings ↔ photographs, zebras ↔ horses, summer ↔ winter, photograph → Monet/Van Gogh/Cezanne/Ukiyo-e. From Zhu et al. (2017).)
Cycle-Consistent Adversarial Learning (CycleGAN)
• Introduced by Kim et al. (2017); Zhu et al. (2017)

Distribution matching (GAN objectives)
• Forward and reverse mappings μ_θ : z ↦ x and m_φ : x ↦ z
• Discriminators D_α and D_β
• Yield realistic outputs in the other domain.

ℓ_gan^reverse(α; φ) = E_{p*(z)}[log D_α(z)] + E_{q*(x)}[log(1 − D_α(m_φ(x)))],
ℓ_gan^forward(β; θ) = E_{q*(x)}[log D_β(x)] + E_{p*(z)}[log(1 − D_β(μ_θ(z)))].

Cycle-consistency losses
• Encourage tighter correspondences: each mapping must be able to reconstruct its input from the other mapping's output, and vice versa. May alleviate mode collapse.

ℓ_const^reverse(θ, φ) = E_{q*(x)}[‖x − μ_θ(m_φ(x))‖_ρ^ρ],
ℓ_const^forward(θ, φ) = E_{p*(z)}[‖z − m_φ(μ_θ(z))‖_ρ^ρ].
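A minimal sketch (not the authors' code) of these two pairs of losses in PyTorch; mu_theta, m_phi, D_alpha and D_beta are assumed callables, and discriminator outputs are assumed to lie in (0, 1):

```python
import torch

def gan_losses(x, z, mu_theta, m_phi, D_alpha, D_beta):
    # Reverse GAN objective: D_alpha separates prior samples z from encodings m_phi(x).
    loss_gan_rev = (torch.log(D_alpha(z)).mean()
                    + torch.log(1 - D_alpha(m_phi(x))).mean())
    # Forward GAN objective: D_beta separates data x from generations mu_theta(z).
    loss_gan_fwd = (torch.log(D_beta(x)).mean()
                    + torch.log(1 - D_beta(mu_theta(z))).mean())
    return loss_gan_rev, loss_gan_fwd

def cycle_losses(x, z, mu_theta, m_phi, rho=1):
    # Reconstruct x through z-space and z through x-space; rho-norm penalty per example.
    loss_cyc_rev = (x - mu_theta(m_phi(x))).abs().pow(rho).flatten(1).sum(1).mean()
    loss_cyc_fwd = (z - m_phi(mu_theta(z))).abs().pow(rho).flatten(1).sum(1).mean()
    return loss_cyc_rev, loss_cyc_fwd
```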
Contributions
We cast the problem of learning inter-domain correspondences without paired data as approximate Bayesian inference in a latent variable model (LVM).
1. We introduce implicit latent variable models (ILVMs), in which the prior over latent variables is specified flexibly as an implicit distribution.
2. We develop a new variational inference (VI) algorithm based on minimizing the symmetric Kullback-Leibler (KL) divergence between a variational and the exact joint distribution.
3. We demonstrate that CycleGAN (Kim et al., 2017; Zhu et al., 2017) can be instantiated as a special case of our framework.
Implicit Latent Variable Models

Joint distribution
p_θ(x, z) = p_θ(x | z) p*(z)   (likelihood × prior)

• Likelihood p_θ(x_n | z_n) is prescribed (as usual).
• Prior p*(z) over latent variables is specified as an implicit distribution:
  • given only by a finite collection Z* = {z*_m}_{m=1}^M of its samples, z*_m ∼ p*(z);
  • offers the utmost degree of flexibility in the treatment of prior information.

(Figure: graphical model with latent z_n, observed x_n, parameters θ, plate over N.)
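A minimal sketch (an assumed arrangement, not the paper's implementation) of such a model: the likelihood is a prescribed Gaussian N(x | μ_θ(z), τ²I), while the prior exists only as a fixed tensor of samples Z_star; mu_theta is an assumed decoder network:

```python
import math
import torch

class ImplicitLVM:
    """Joint p_theta(x, z) = p_theta(x | z) p*(z): Gaussian likelihood, implicit prior."""

    def __init__(self, mu_theta, Z_star, tau=1.0):
        self.mu_theta = mu_theta  # assumed decoder network with parameters theta
        self.Z_star = Z_star      # (M, d_z) tensor of samples: all we know about p*(z)
        self.tau = tau

    def sample_prior(self, batch_size):
        # The implicit prior can only be queried through its samples z*_m ~ p*(z).
        idx = torch.randint(len(self.Z_star), (batch_size,))
        return self.Z_star[idx]

    def log_likelihood(self, x, z):
        # log N(x | mu_theta(z), tau^2 I) for flattened (batch, d_x) tensors.
        d = x.shape[-1]
        sq_err = ((x - self.mu_theta(z)) ** 2).sum(dim=-1)
        return (-0.5 * sq_err / self.tau ** 2
                - 0.5 * d * math.log(2 * math.pi) - d * math.log(self.tau))
```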
Implicit Latent Variable Models: Example

Unpaired image-to-image translation
• Prior distribution p*(z) specified by images Z* = {z*_m}_{m=1}^M from one domain.
• Empirical data distribution q*(x) specified by images X* = {x_n}_{n=1}^N from another domain.

(Figure: (a) samples from p*(z); (b) a sample from q*(x).)
Inference in Implicit Latent Variable Models

Having specified the generative model, our aims are to
• optimize θ by maximizing the marginal likelihood p_θ(x);
• infer hidden representations z by computing the posterior p_θ(z | x).

Both require the intractable p_θ(x); we must resort to approximate inference.

Classical variational inference
• Approximate the exact posterior p_θ(z | x) with a variational posterior q_φ(z | x):
  min_φ KL[q_φ(z | x) ‖ p_θ(z | x)]
• Reduces the inference problem to an optimization problem.
Symmetric Joint-Matching Variational Inference
Joint-Matching Variational Inference
• Consider instead directly approximating the exact joint with the variational joint
  q_φ(x, z) = q_φ(z | x) q*(x).
• The variational posterior q_φ(z | x) is also prescribed.

(Figure: graphical model with x_n, z_n, plate over N, and parameters φ, θ.)
Symmetric Joint-Matching Variational Inference

Minimize the symmetric KL divergence between joints,
  KL_symm[p_θ(x, z) ‖ q_φ(x, z)],
where
  KL_symm[p ‖ q] = KL[p ‖ q] (forward KL) + KL[q ‖ p] (reverse KL).

Why?
1. Because we can: KL_symm[p_θ(x, z) ‖ q_φ(x, z)] is tractable, whereas KL_symm[p_θ(z | x) ‖ q_φ(z | x)] is intractable.
2. Helps avoid under- or over-dispersed approximations (see paper for details).
Reverse KL Variational Objective
• Minimizing the reverse KL divergence between joints is equivalent to maximizing the usual evidence lower bound (ELBO):

  KL[q_φ(x, z) ‖ p_θ(x, z)] = E_{q_φ(x,z)}[log q_φ(x, z) − log p_θ(x, z)]
                            = E_{q_φ(x,z)}[log q_φ(z | x) − log p_θ(x, z)]   (= L_nelbo(θ, φ))   − H[q*(x)]   (constant)

• Recall the (negative) ELBO,
  L_nelbo(θ, φ) = E_{q*(x) q_φ(z|x)}[− log p_θ(x | z)]   (= L_nell(θ, φ))   + E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)

• The KL term is intractable, as the prior p*(z) is unavailable; we can only sample!
Forward KL Variational Objective
• Minimizing the forward KL divergence between joints:

  KL[p_θ(x, z) ‖ q_φ(x, z)] = E_{p_θ(x,z)}[log p_θ(x, z) − log q_φ(x, z)]
                            = E_{p_θ(x,z)}[log p_θ(x | z) − log q_φ(x, z)]   (= L_naplbo(θ, φ))   − H[p*(z)]   (constant)

• New variational objective, the aggregate posterior lower bound (APLBO):
  L_naplbo(θ, φ) = E_{p*(z) p_θ(x|z)}[− log q_φ(z | x)]   (= L_nelp(θ, φ))   + E_{p*(z)} KL[p_θ(x | z) ‖ q*(x)]   (intractable)

• The KL term is intractable, as the empirical data distribution q*(x) is unavailable; we can only sample!
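A minimal sketch (assumed, not from the paper) of Monte Carlo estimates of the two tractable reconstruction-style terms L_nell and L_nelp; m_phi and mu_theta are assumed encoder/decoder networks, log_p_x_given_z and log_q_z_given_x are assumed log-density helpers for the prescribed conditionals, and Gaussian reparameterisation with scales t and tau is assumed:

```python
import torch

def nell(x_batch, m_phi, log_p_x_given_z, t=1.0):
    # L_nell(theta, phi): x ~ q*(x) is the data minibatch, z ~ q_phi(z | x)
    # via the Gaussian reparameterisation z = m_phi(x) + t * eps.
    mean = m_phi(x_batch)
    z = mean + t * torch.randn_like(mean)
    return -log_p_x_given_z(x_batch, z).mean()

def nelp(z_batch, mu_theta, log_q_z_given_x, tau=1.0):
    # L_nelp(theta, phi): z ~ p*(z) is a minibatch of prior samples,
    # x ~ p_theta(x | z) via x = mu_theta(z) + tau * eps.
    mean = mu_theta(z_batch)
    x = mean + tau * torch.randn_like(mean)
    return -log_q_z_given_x(z_batch, x).mean()
```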
Density Ratio Estimation and f-Divergence Approximation

General f-divergence lower bound (Nguyen et al., 2010)
For a convex lower-semicontinuous function f : R_+ → R,

  E_{q*(x)} D_f[p*(z) ‖ q_φ(z | x)]   (intractable)   ≥   max_α L_latent^f(α; φ)   (tractable),

where

  L_latent^f(α; φ) = E_{q*(x) q_φ(z|x)}[f′(r_α(z; x))] − E_{q*(x) p*(z)}[f⋆(f′(r_α(z; x)))].

• r_α is a neural net with parameters α, with equality at r*_α(z; x) = q_φ(z | x) / p*(z).
• Turns divergence estimation into an optimization problem.
• Estimates the divergence using a lower bound that just requires samples!
KL Divergence Lower Bound

Example: KL divergence lower bound
For f(u) = u log u, we instantiate the KL lower bound

  E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)   ≥   max_α L_latent^kl(α; φ)   (tractable),

where

  L_latent^kl(α; φ) = E_{q*(x) q_φ(z|x)}[log r_α(z; x)] − E_{q*(x) p*(z)}[r_α(z; x) − 1].

Yields an estimate of the (negative) ELBO in which all terms are tractable:

  L_nelbo(θ, φ) = L_nell(θ, φ)   (tractable)   + E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)
                ≥ max_α L_nell(θ, φ) + L_latent^kl(α; φ)   (tractable).
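A minimal sketch (assumed) of a Monte Carlo estimate of L_latent^kl; log_r_alpha is an assumed unconstrained network whose exponential plays the role of the ratio r_α(z; x), and the encoder q_φ(z | x) is again taken to be Gaussian with scale t:

```python
import torch

def l_latent(x_batch, z_prior_batch, m_phi, log_r_alpha, t=1.0):
    # z ~ q_phi(z | x): reparameterised encoder sample (Gaussian, scale t assumed).
    mean = m_phi(x_batch)
    z_post = mean + t * torch.randn_like(mean)
    # E_{q*(x) q_phi(z|x)}[log r_alpha(z; x)]
    term_post = log_r_alpha(z_post, x_batch).mean()
    # E_{q*(x) p*(z)}[r_alpha(z; x) - 1]: the prior only ever provides samples.
    term_prior = (log_r_alpha(z_prior_batch, x_batch).exp() - 1.0).mean()
    # Maximised over alpha, this lower-bounds E_{q*(x)} KL[q_phi(z|x) || p*(z)].
    return term_post - term_prior
```

The tractable surrogate for L_nelbo is then nell(...) + l_latent(...), maximised over α and minimised over (θ, φ).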
CycleGAN as a Special Case
Cycle-Consistency as Conditional Probability Maximization

For Gaussian likelihood and variational posterior,
  p_θ(x | z) = N(x | μ_θ(z), τ²I),   q_φ(z | x) = N(z | m_φ(x), t²I):

• Can instantiate ℓ_const^reverse(θ, φ) from L_nell(θ, φ) as the posterior q_φ(z | x) degenerates (as t → 0).
• Can instantiate ℓ_const^forward(θ, φ) from L_nelp(θ, φ) as the likelihood p_θ(x | z) degenerates (as τ → 0).

Cycle-consistency corresponds to maximizing conditional probabilities:
• ELL forces q_φ(z | x) to place mass on hidden representations that recover the data.
• ELP forces p_θ(x | z) to generate observations that recover the prior samples.
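As a worked step (a sketch under the Gaussian assumptions above, with constants absorbed), letting t → 0 so that q_φ(z | x) collapses onto m_φ(x):

  L_nell(θ, φ) = E_{q*(x) q_φ(z|x)}[‖x − μ_θ(z)‖₂² / (2τ²)] + const
               → (1 / (2τ²)) E_{q*(x)}[‖x − μ_θ(m_φ(x))‖₂²] + const,

which is ℓ_const^reverse(θ, φ) with ρ = 2, up to scale; the forward loss follows symmetrically from L_nelp(θ, φ) as τ → 0.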
Distribution Matching as Regularization

For an appropriate setting of f, and simplifying the mappings and discriminators:
• Can instantiate ℓ_gan^reverse(α; φ) from L_latent^f(α; φ).
• Can instantiate ℓ_gan^forward(β; θ) from L_observed^f(β; θ).

Approximately minimizes the intractable divergences:
• D_f[p*(z) ‖ q_φ(z | x)]: forces q_φ(z | x) to match the prior p*(z).
• D_f[q*(x) ‖ p_θ(x | z)]: forces p_θ(x | z) to match the data q*(x).

Summary
  L_nelbo(θ, φ)  ≥ max_α { L_nell(θ, φ) [ℓ_const^reverse(θ, φ)] + L_latent(α; φ) [ℓ_gan^reverse(α; φ)] },
  L_naplbo(θ, φ) ≥ max_β { L_nelp(θ, φ) [ℓ_const^forward(θ, φ)] + L_observed(β; θ) [ℓ_gan^forward(β; θ)] }.

(The bracketed terms indicate the corresponding CycleGAN losses.)
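A minimal sketch (an assumed composition, not the authors' implementation) of one alternating optimisation step over the two bounds; model.nell, model.nelp, model.l_latent and model.l_observed are assumed helpers implementing the terms above, opt_model updates (θ, φ), and opt_critics updates (α, β):

```python
import torch

def train_step(x_batch, z_batch, model, opt_model, opt_critics):
    # 1. Tighten both bounds: maximise L_latent over alpha and L_observed over beta.
    critic_loss = -(model.l_latent(x_batch, z_batch) + model.l_observed(x_batch, z_batch))
    opt_critics.zero_grad()
    critic_loss.backward()
    opt_critics.step()

    # 2. Minimise the tractable surrogates of the symmetric KL over (theta, phi):
    #    L_nelbo  ~ L_nell + L_latent,   L_naplbo ~ L_nelp + L_observed.
    #    (opt_model only updates theta and phi; alpha and beta are left untouched here.)
    loss = (model.nell(x_batch) + model.l_latent(x_batch, z_batch)
            + model.nelp(z_batch) + model.l_observed(x_batch, z_batch))
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return float(loss.detach())
```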
Conclusion
• Formulated implicit latent variable models, which introduce an implicit prior over latent variables.
  • Offers the utmost degree of flexibility in incorporating prior knowledge.
• Developed a new paradigm for variational inference that
  • directly approximates the exact joint distribution;
  • minimizes the symmetric KL divergence.
• Provided a theoretical treatment of the links between CycleGAN methods and variational Bayes.

Poster Session
To find out more, come visit us at our poster!
Poster #14, Session 4 (17:10-18:00, Saturday 14 July)
Questions?