Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference
Louis C. Tiao¹, Edwin V. Bonilla², Fabio Ramos¹
¹University of Sydney, ²University of New South Wales
July 22, 2018
Motivation: Unpaired Image-to-Image Translation

(Figure 1: examples of paired vs. unpaired image-to-image translation, e.g. Monet paintings ↔ photographs, zebras ↔ horses, summer ↔ winter, photograph → Monet/Van Gogh/Cezanne/Ukiyo-e. From Zhu et al. (2017).)
Cycle-Consistent Adversarial Learning (CycleGAN)
• Introduced by Kim et al. (2017); Zhu et al. (2017)

Distribution matching (GAN objectives)
• Forward and reverse mappings μ_θ : z ↦ x and m_φ : x ↦ z
• Discriminators D_α and D_β
• Yield realistic outputs in the other domain.

ℓ_gan^reverse(α; φ) = E_{p*(z)}[log D_α(z)] + E_{q*(x)}[log(1 − D_α(m_φ(x)))],
ℓ_gan^forward(β; θ) = E_{q*(x)}[log D_β(x)] + E_{p*(z)}[log(1 − D_β(μ_θ(z)))].

Cycle-consistency losses
• Encourage tighter correspondences: each mapping must be able to reconstruct its input from the other mapping's output, and vice versa. May alleviate mode collapse.

ℓ_const^reverse(θ, φ) = E_{q*(x)}[‖x − μ_θ(m_φ(x))‖_ρ^ρ],
ℓ_const^forward(θ, φ) = E_{p*(z)}[‖z − m_φ(μ_θ(z))‖_ρ^ρ].
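A minimal sketch (not the authors' code) of these two pairs of losses in PyTorch; mu_theta, m_phi, D_alpha and D_beta are assumed callables, and discriminator outputs are assumed to lie in (0, 1):

```python
import torch

def gan_losses(x, z, mu_theta, m_phi, D_alpha, D_beta):
    # Reverse GAN objective: D_alpha separates prior samples z from encodings m_phi(x).
    loss_gan_rev = (torch.log(D_alpha(z)).mean()
                    + torch.log(1 - D_alpha(m_phi(x))).mean())
    # Forward GAN objective: D_beta separates data x from generations mu_theta(z).
    loss_gan_fwd = (torch.log(D_beta(x)).mean()
                    + torch.log(1 - D_beta(mu_theta(z))).mean())
    return loss_gan_rev, loss_gan_fwd

def cycle_losses(x, z, mu_theta, m_phi, rho=1):
    # Reconstruct x through z-space and z through x-space; rho-norm penalty per example.
    loss_cyc_rev = (x - mu_theta(m_phi(x))).abs().pow(rho).flatten(1).sum(1).mean()
    loss_cyc_fwd = (z - m_phi(mu_theta(z))).abs().pow(rho).flatten(1).sum(1).mean()
    return loss_cyc_rev, loss_cyc_fwd
```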
Contributions
We cast the problem of learning inter-domain correspondences without paired data as approximate Bayesian inference in a latent variable model (LVM).
1. We introduce implicit latent variable models (ILVMs), in which the prior over latent variables is specified flexibly as an implicit distribution.
2. We develop a new variational inference (VI) algorithm based on minimizing the symmetric Kullback-Leibler (KL) divergence between a variational and the exact joint distribution.
3. We demonstrate that CycleGAN (Kim et al., 2017; Zhu et al., 2017) can be instantiated as a special case of our framework.
Implicit Latent Variable Models

Joint distribution
p_θ(x, z) = p_θ(x | z) p*(z)   (likelihood × prior)

• Likelihood p_θ(x_n | z_n) is prescribed (as usual).
• Prior p*(z) over latent variables is specified as an implicit distribution:
  • given only by a finite collection Z* = {z*_m}_{m=1}^M of its samples, z*_m ∼ p*(z);
  • offers the utmost degree of flexibility in the treatment of prior information.

(Figure: graphical model with latent z_n, observed x_n, parameters θ, plate over N.)
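A minimal sketch (an assumed arrangement, not the paper's implementation) of such a model: the likelihood is a prescribed Gaussian N(x | μ_θ(z), τ²I), while the prior exists only as a fixed tensor of samples Z_star; mu_theta is an assumed decoder network:

```python
import math
import torch

class ImplicitLVM:
    """Joint p_theta(x, z) = p_theta(x | z) p*(z): Gaussian likelihood, implicit prior."""

    def __init__(self, mu_theta, Z_star, tau=1.0):
        self.mu_theta = mu_theta  # assumed decoder network with parameters theta
        self.Z_star = Z_star      # (M, d_z) tensor of samples: all we know about p*(z)
        self.tau = tau

    def sample_prior(self, batch_size):
        # The implicit prior can only be queried through its samples z*_m ~ p*(z).
        idx = torch.randint(len(self.Z_star), (batch_size,))
        return self.Z_star[idx]

    def log_likelihood(self, x, z):
        # log N(x | mu_theta(z), tau^2 I) for flattened (batch, d_x) tensors.
        d = x.shape[-1]
        sq_err = ((x - self.mu_theta(z)) ** 2).sum(dim=-1)
        return (-0.5 * sq_err / self.tau ** 2
                - 0.5 * d * math.log(2 * math.pi) - d * math.log(self.tau))
```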
Implicit Latent Variable Models: Example

Unpaired image-to-image translation
• Prior distribution p*(z) specified by images Z* = {z*_m}_{m=1}^M from one domain.
• Empirical data distribution q*(x) specified by images X* = {x_n}_{n=1}^N from another domain.

(Figure: (a) samples from p*(z); (b) a sample from q*(x).)
Inference in Implicit Latent Variable Models

Having specified the generative model, our aims are to
• optimize θ by maximizing the marginal likelihood p_θ(x);
• infer hidden representations z by computing the posterior p_θ(z | x).

Both require the intractable p_θ(x); we must resort to approximate inference.

Classical variational inference
• Approximate the exact posterior p_θ(z | x) with a variational posterior q_φ(z | x):
  min_φ KL[q_φ(z | x) ‖ p_θ(z | x)]
• Reduces the inference problem to an optimization problem.
Symmetric Joint-Matching Variational Inference
Joint-Matching Variational Inference
• Consider instead directly approximating the exact joint with the variational joint
  q_φ(x, z) = q_φ(z | x) q*(x).
• The variational posterior q_φ(z | x) is also prescribed.

(Figure: graphical model with x_n, z_n, plate over N, and parameters φ, θ.)
Symmetric Joint-Matching Variational Inference

Minimize the symmetric KL divergence between joints,
  KL_symm[p_θ(x, z) ‖ q_φ(x, z)],
where
  KL_symm[p ‖ q] = KL[p ‖ q] (forward KL) + KL[q ‖ p] (reverse KL).

Why?
1. Because we can: KL_symm[p_θ(x, z) ‖ q_φ(x, z)] is tractable, whereas KL_symm[p_θ(z | x) ‖ q_φ(z | x)] is intractable.
2. Helps avoid under- or over-dispersed approximations (see paper for details).
Reverse KL Variational Objective
• Minimizing the reverse KL divergence between joints is equivalent to maximizing the usual evidence lower bound (ELBO):

  KL[q_φ(x, z) ‖ p_θ(x, z)] = E_{q_φ(x,z)}[log q_φ(x, z) − log p_θ(x, z)]
                            = E_{q_φ(x,z)}[log q_φ(z | x) − log p_θ(x, z)]   (= L_nelbo(θ, φ))   − H[q*(x)]   (constant)

• Recall the (negative) ELBO,
  L_nelbo(θ, φ) = E_{q*(x) q_φ(z|x)}[− log p_θ(x | z)]   (= L_nell(θ, φ))   + E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)

• The KL term is intractable, as the prior p*(z) is unavailable; we can only sample!
Forward KL Variational Objective
• Minimizing the forward KL divergence between joints:

  KL[p_θ(x, z) ‖ q_φ(x, z)] = E_{p_θ(x,z)}[log p_θ(x, z) − log q_φ(x, z)]
                            = E_{p_θ(x,z)}[log p_θ(x | z) − log q_φ(x, z)]   (= L_naplbo(θ, φ))   − H[p*(z)]   (constant)

• New variational objective, the aggregate posterior lower bound (APLBO):
  L_naplbo(θ, φ) = E_{p*(z) p_θ(x|z)}[− log q_φ(z | x)]   (= L_nelp(θ, φ))   + E_{p*(z)} KL[p_θ(x | z) ‖ q*(x)]   (intractable)

• The KL term is intractable, as the empirical data distribution q*(x) is unavailable; we can only sample!
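A minimal sketch (assumed, not from the paper) of Monte Carlo estimates of the two tractable reconstruction-style terms L_nell and L_nelp; m_phi and mu_theta are assumed encoder/decoder networks, log_p_x_given_z and log_q_z_given_x are assumed log-density helpers for the prescribed conditionals, and Gaussian reparameterisation with scales t and tau is assumed:

```python
import torch

def nell(x_batch, m_phi, log_p_x_given_z, t=1.0):
    # L_nell(theta, phi): x ~ q*(x) is the data minibatch, z ~ q_phi(z | x)
    # via the Gaussian reparameterisation z = m_phi(x) + t * eps.
    mean = m_phi(x_batch)
    z = mean + t * torch.randn_like(mean)
    return -log_p_x_given_z(x_batch, z).mean()

def nelp(z_batch, mu_theta, log_q_z_given_x, tau=1.0):
    # L_nelp(theta, phi): z ~ p*(z) is a minibatch of prior samples,
    # x ~ p_theta(x | z) via x = mu_theta(z) + tau * eps.
    mean = mu_theta(z_batch)
    x = mean + tau * torch.randn_like(mean)
    return -log_q_z_given_x(z_batch, x).mean()
```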
Density Ratio Estimation and f-Divergence Approximation

General f-divergence lower bound (Nguyen et al., 2010)
For a convex lower-semicontinuous function f : R_+ → R,

  E_{q*(x)} D_f[p*(z) ‖ q_φ(z | x)]   (intractable)   ≥   max_α L_latent^f(α; φ)   (tractable),

where

  L_latent^f(α; φ) = E_{q*(x) q_φ(z|x)}[f′(r_α(z; x))] − E_{q*(x) p*(z)}[f⋆(f′(r_α(z; x)))].

• r_α is a neural net with parameters α, with equality at r*_α(z; x) = q_φ(z | x) / p*(z).
• Turns divergence estimation into an optimization problem.
• Estimates the divergence using a lower bound that just requires samples!
KL Divergence Lower Bound

Example: KL divergence lower bound
For f(u) = u log u, we instantiate the KL lower bound

  E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)   ≥   max_α L_latent^kl(α; φ)   (tractable),

where

  L_latent^kl(α; φ) = E_{q*(x) q_φ(z|x)}[log r_α(z; x)] − E_{q*(x) p*(z)}[r_α(z; x) − 1].

Yields an estimate of the (negative) ELBO in which all terms are tractable:

  L_nelbo(θ, φ) = L_nell(θ, φ)   (tractable)   + E_{q*(x)} KL[q_φ(z | x) ‖ p*(z)]   (intractable)
                ≥ max_α L_nell(θ, φ) + L_latent^kl(α; φ)   (tractable).
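A minimal sketch (assumed) of a Monte Carlo estimate of L_latent^kl; log_r_alpha is an assumed unconstrained network whose exponential plays the role of the ratio r_α(z; x), and the encoder q_φ(z | x) is again taken to be Gaussian with scale t:

```python
import torch

def l_latent(x_batch, z_prior_batch, m_phi, log_r_alpha, t=1.0):
    # z ~ q_phi(z | x): reparameterised encoder sample (Gaussian, scale t assumed).
    mean = m_phi(x_batch)
    z_post = mean + t * torch.randn_like(mean)
    # E_{q*(x) q_phi(z|x)}[log r_alpha(z; x)]
    term_post = log_r_alpha(z_post, x_batch).mean()
    # E_{q*(x) p*(z)}[r_alpha(z; x) - 1]: the prior only ever provides samples.
    term_prior = (log_r_alpha(z_prior_batch, x_batch).exp() - 1.0).mean()
    # Maximised over alpha, this lower-bounds E_{q*(x)} KL[q_phi(z|x) || p*(z)].
    return term_post - term_prior
```

The tractable surrogate for L_nelbo is then nell(...) + l_latent(...), maximised over α and minimised over (θ, φ).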
CycleGAN as a Special Case
Cycle-Consistency as Conditional Probability Maximization

For Gaussian likelihood and variational posterior,
  p_θ(x | z) = N(x | μ_θ(z), τ²I),   q_φ(z | x) = N(z | m_φ(x), t²I):

• Can instantiate ℓ_const^reverse(θ, φ) from L_nell(θ, φ) as the posterior q_φ(z | x) degenerates (as t → 0).
• Can instantiate ℓ_const^forward(θ, φ) from L_nelp(θ, φ) as the likelihood p_θ(x | z) degenerates (as τ → 0).

Cycle-consistency corresponds to maximizing conditional probabilities:
• ELL forces q_φ(z | x) to place mass on hidden representations that recover the data.
• ELP forces p_θ(x | z) to generate observations that recover the prior samples.
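As a worked step (a sketch under the Gaussian assumptions above, with constants absorbed), letting t → 0 so that q_φ(z | x) collapses onto m_φ(x):

  L_nell(θ, φ) = E_{q*(x) q_φ(z|x)}[‖x − μ_θ(z)‖₂² / (2τ²)] + const
               → (1 / (2τ²)) E_{q*(x)}[‖x − μ_θ(m_φ(x))‖₂²] + const,

which is ℓ_const^reverse(θ, φ) with ρ = 2, up to scale; the forward loss follows symmetrically from L_nelp(θ, φ) as τ → 0.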
Distribution Matching as Regularization

For an appropriate setting of f, and simplifying the mappings and discriminators:
• Can instantiate ℓ_gan^reverse(α; φ) from L_latent^f(α; φ).
• Can instantiate ℓ_gan^forward(β; θ) from L_observed^f(β; θ).

Approximately minimizes the intractable divergences:
• D_f[p*(z) ‖ q_φ(z | x)]: forces q_φ(z | x) to match the prior p*(z).
• D_f[q*(x) ‖ p_θ(x | z)]: forces p_θ(x | z) to match the data q*(x).

Summary
  L_nelbo(θ, φ)  ≥ max_α { L_nell(θ, φ) [ℓ_const^reverse(θ, φ)] + L_latent(α; φ) [ℓ_gan^reverse(α; φ)] },
  L_naplbo(θ, φ) ≥ max_β { L_nelp(θ, φ) [ℓ_const^forward(θ, φ)] + L_observed(β; θ) [ℓ_gan^forward(β; θ)] }.

(The bracketed terms indicate the corresponding CycleGAN losses.)
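A minimal sketch (an assumed composition, not the authors' implementation) of one alternating optimisation step over the two bounds; model.nell, model.nelp, model.l_latent and model.l_observed are assumed helpers implementing the terms above, opt_model updates (θ, φ), and opt_critics updates (α, β):

```python
import torch

def train_step(x_batch, z_batch, model, opt_model, opt_critics):
    # 1. Tighten both bounds: maximise L_latent over alpha and L_observed over beta.
    critic_loss = -(model.l_latent(x_batch, z_batch) + model.l_observed(x_batch, z_batch))
    opt_critics.zero_grad()
    critic_loss.backward()
    opt_critics.step()

    # 2. Minimise the tractable surrogates of the symmetric KL over (theta, phi):
    #    L_nelbo  ~ L_nell + L_latent,   L_naplbo ~ L_nelp + L_observed.
    #    (opt_model only updates theta and phi; alpha and beta are left untouched here.)
    loss = (model.nell(x_batch) + model.l_latent(x_batch, z_batch)
            + model.nelp(z_batch) + model.l_observed(x_batch, z_batch))
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return float(loss.detach())
```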
Conclusion
• Formulated implicit latent variable models, which introduce an implicit prior over latent variables.
  • Offers the utmost degree of flexibility in incorporating prior knowledge.
• Developed a new paradigm for variational inference that
  • directly approximates the exact joint distribution;
  • minimizes the symmetric KL divergence.
• Provided a theoretical treatment of the links between CycleGAN methods and variational Bayes.

Poster Session
To find out more, come visit us at our poster!
Poster #14, Session 4 (17:10-18:00, Saturday 14 July)
Questions?