Disentangling Disentanglement in Variational Autoencoders (ICML 2019)



  1. Disentangling Disentanglement in Variational Autoencoders. ICML 2019, June 12, 2019. Departments of Statistics and Engineering Science, University of Oxford. Emile Mathieu⋆, Tom Rainforth⋆, N. Siddharth⋆, Yee Whye Teh.

  2. Variational Autoencoders. [Figure: a generative model mapping latents z_1, …, z_4 to observations x_1, …, x_5, and an inference model mapping back; individual latents z_l, z_m, z_n correspond to factors such as gender, beard, and makeup.]

  3. Disentanglement: factors are independent and meaningful. [Figure: the same generative and inference models, with latents z_l (gender), z_m (beard), z_n (makeup) shown as independent, individually meaningful factors.]

  4. Disentanglement = Independence. [Figure: generative and inference models whose factors z_l (shape), z_m (angle), z_n (scale) are independent.]

  5. Decomposition ∈ {Independence, Clustering, Sparsity, …}. [Figure: generative and inference models whose factors z_l (gender), z_m (beard), z_n (makeup) are correlated.]

  6. Decomposition: A Generalisation of Disentanglement. We characterise decomposition as the fulfilment of two factors: (a) an appropriate level of overlap between encodings in the latent space, and (b) a match between the marginal (aggregate) posterior q_φ(z) and a structured prior p(z) that encodes the required decomposition.

  7. Decomposition: An Analysis. Desired structure. [Figure: the target structure of the prior p(z).]

  8. Decomposition: An Analysis. Insufficient overlap. [Figure: relationships between p_D(x), q_φ(z|x), q_φ(z), p(z), p_θ(x|z), and p_θ(x) when encodings overlap too little.]

  9. Decomposition: An Analysis. Too much overlap. [Figure: the same distributions when encodings overlap too much.]

  10. Decomposition: An Analysis. Appropriate overlap. [Figure: the same distributions at an appropriate level of overlap.]

  11. Overlap: Deconstructing the β-VAE.

  L_β(x) = E_{q_φ(z|x)}[log p_θ(x|z)] − β · KL(q_φ(z|x) ‖ p(z))
         = L(x; π_{θ,β}, q_φ) + (β − 1) · H[q_φ(z|x)] + log F_β

  The first term is the ELBO with a β-annealed prior π_{θ,β}, the second is a maximum-entropy ("maxent") term, and log F_β is constant. Implications: the objective places no direct pressure on the latents to be independent! The β-VAE disentangles largely by controlling the level of overlap.
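The decomposition above can be checked numerically in closed form. The sketch below (illustrative, not from the paper's code) takes a 1-D Gaussian encoder q_φ(z|x) = N(μ, σ²) and prior p(z) = N(0, 1), for which the β-annealed prior π_β(z) ∝ p(z)^β is N(0, 1/β) with normaliser F_β = ∫ p(z)^β dz, and verifies that β · KL(q ‖ p) = KL(q ‖ π_β) − (β − 1) · H[q] − log F_β, which is exactly the rearrangement used on the slide.

```python
import numpy as np

def kl_gauss(mu, var, var_p):
    """KL( N(mu, var) || N(0, var_p) ) for scalar Gaussians, in closed form."""
    return 0.5 * ((mu**2 + var) / var_p - 1.0 + np.log(var_p / var))

def entropy_gauss(var):
    """Differential entropy of N(mu, var)."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def log_F(beta):
    """log of F_beta = integral of N(z; 0, 1)^beta dz."""
    return 0.5 * (1 - beta) * np.log(2 * np.pi) - 0.5 * np.log(beta)

mu, var, beta = 0.7, 0.3, 4.0

# Left: the beta-weighted KL term of the beta-VAE objective.
lhs = beta * kl_gauss(mu, var, 1.0)

# Right: KL to the annealed prior N(0, 1/beta), minus the (beta-1)-weighted
# entropy ("maxent") term, minus the constant log F_beta.
rhs = kl_gauss(mu, var, 1.0 / beta) - (beta - 1) * entropy_gauss(var) - log_F(beta)

assert np.isclose(lhs, rhs)
```

Since the reconstruction term is shared by both sides of the slide's identity, checking the KL rearrangement alone suffices.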

  12. Decomposition: Objective. Reconstruct observations; control the level of overlap; impose the desired structure:

  L_{α,β}(x) = E_{q_φ(z|x)}[log p_θ(x|z)] − β · KL(q_φ(z|x) ‖ p(z)) − α · D(q_φ(z), p(z))
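The slide leaves the divergence D(q_φ(z), p(z)) abstract; one common sample-based choice for matching an aggregate posterior to a prior is the maximum mean discrepancy (MMD). The sketch below is a minimal MMD estimator, assuming an RBF kernel and a fixed bandwidth; it is only one possible instantiation of D, not necessarily the paper's.

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x, y of shape (n, d),
    using an RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(0)
z_agg = rng.normal(0.0, 1.0, size=(500, 2))    # stand-in samples of q_phi(z)
z_prior = rng.normal(0.0, 1.0, size=(500, 2))  # samples from p(z) = N(0, I)
z_far = rng.normal(3.0, 1.0, size=(500, 2))    # a badly mismatched aggregate posterior

# A matched aggregate posterior yields a much smaller divergence estimate.
assert rbf_mmd2(z_agg, z_prior) < rbf_mmd2(z_far, z_prior)
```

Because D compares the aggregate posterior q_φ(z) to p(z) rather than each per-datapoint posterior, the α term can impose global structure (clusters, sparsity) without forcing every q_φ(z|x) towards the prior.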

  13. Decomposition: Generalising Disentanglement. Independence: p(z) = N(0, σ⋆). [Figure 1: β-VAE trained on 2D Shapes¹, computing disentanglement². ¹Matthey et al., dSprites: Disentanglement testing Sprites dataset. ²Kim and Mnih, "Disentangling by Factorising".]

  14. Decomposition: Generalising Disentanglement. Clustering: p(z) = Σ_k ρ_k · N(μ_k, σ_k). [Figure 2: density of the aggregate posterior q_φ(z) for different α, β on the pinwheel dataset³; one row varies β ∈ {0.01, 0.5, 1.0, 1.2} with α = 0, the other varies α ∈ {1, 3, 5, 8} with β = 0. ³http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab]
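The clustering prior above is an ordinary mixture of Gaussians, so sampling from it is just ancestral sampling: draw a component k with probability ρ_k, then draw z from N(μ_k, σ_k). A minimal 1-D sketch (the component means, scales, and weights below are made up for illustration):

```python
import numpy as np

def sample_mog_prior(rng, n, means, stds, weights):
    """Ancestral sampling from p(z) = sum_k rho_k * N(mu_k, sigma_k):
    pick k ~ Categorical(rho), then z ~ N(mu_k, sigma_k)."""
    ks = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[ks], stds[ks]), ks

rng = np.random.default_rng(0)
means = np.array([-4.0, 0.0, 4.0])     # illustrative cluster centres mu_k
stds = np.array([0.5, 0.5, 0.5])       # illustrative scales sigma_k
weights = np.array([0.2, 0.5, 0.3])    # mixture weights rho_k

z, ks = sample_mog_prior(rng, 100_000, means, stds, weights)

# Empirical component frequencies should approach the weights rho_k.
freq = np.bincount(ks, minlength=3) / len(ks)
assert np.allclose(freq, weights, atol=0.01)
```

Matching q_φ(z) to such a prior via the α term is what pulls the aggregate posterior into the clustered shapes seen in Figure 2.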

  15. Decomposition: Generalising Disentanglement. Sparsity: p(z) = ∏_d (1 − γ) · N(z_d; 0, 1) + γ · N(z_d; 0, σ₀²). [Figure 3: sparsity of learnt representations on the Fashion-MNIST⁴ dataset; average latent magnitude per latent dimension (0–45) for the Trouser, Dress, and Shirt classes. ⁴Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.]
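The sparsity prior is a per-dimension spike-and-slab mixture: with probability γ a coordinate comes from a narrow "spike" N(0, σ₀²) (σ₀ small), otherwise from the unit-variance "slab". The sketch below samples from it to show the effect; the particular values γ = 0.8 and σ₀ = 0.05 are illustrative choices, not the paper's settings.

```python
import numpy as np

def sample_sparse_prior(rng, n, d, gamma, sigma0):
    """Per-dimension spike-and-slab sampling from
    p(z) = prod_d (1 - gamma) * N(z_d; 0, 1) + gamma * N(z_d; 0, sigma0^2)."""
    in_spike = rng.random((n, d)) < gamma
    slab = rng.normal(0.0, 1.0, size=(n, d))      # broad component
    spike = rng.normal(0.0, sigma0, size=(n, d))  # narrow component near zero
    return np.where(in_spike, spike, slab)

rng = np.random.default_rng(0)
z = sample_sparse_prior(rng, 10_000, 50, gamma=0.8, sigma0=0.05)

# With gamma = 0.8 and a narrow spike, most coordinates are near zero,
# which is the sparsity pattern the prior encourages in q_phi(z).
assert (np.abs(z) < 0.15).mean() > 0.7
```

Driving q_φ(z) towards this prior makes most latent dimensions inactive for any given input, yielding the class-dependent sparsity patterns of Figure 3.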

  16. Decomposition: Generalising Disentanglement. Sparsity: p(z) = ∏_d (1 − γ) · N(z_d; 0, 1) + γ · N(z_d; 0, σ₀²). [Figure: latent-space traversals for "active" dimensions⁴, panels (a) d = 49, (b) d = 30, (c) d = 19, (d) d = 40; the traversals capture leg separation, dress width, shirt fit, and sleeve style.]

  17. Decomposition: Generalising Disentanglement. Sparsity: p(z) = ∏_d (1 − γ) · N(z_d; 0, 1) + γ · N(z_d; 0, σ₀²). [Figure: average normalised sparsity (higher is better) versus regularisation strength α ∈ [0, 1000], for γ ∈ {0, 0.8} and β ∈ {0.1, 1, 5}.⁴]

  18. Recap. We propose and develop:
  • Decomposition: a generalisation of disentanglement involving (a) the overlap of latent encodings, and (b) the match between q_φ(z) and p(z).
  • A theoretical analysis of the β-VAE objective, showing that it primarily contributes only to overlap.
  • An objective that incorporates both factors (a) and (b).
  • Experiments that showcase efficacy at different decompositions: independence, clustering, and sparsity.

  19. Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh. Code: iffsid/disentangling-disentanglement. Paper: arXiv:1812.02833. Come talk to us at our poster: #5.
