Learning Discrete and Continuous Factors of Data via Alternating Disentanglement

  1. Learning Discrete and Continuous Factors of Data via Alternating Disentanglement. Yeonwoo Jeong, Hyun Oh Song, Seoul National University. ICML 2019.

  2. Motivation ◮ Our goal is to disentangle the underlying explanatory factors of data without any supervision. [Figure: a sample image annotated with its generative factors: Shape = square, Position x = 0.3, Position y = 0.7, Rotation = 40°, Size = 0.5.]

  3. Motivation [Figure: two images whose factors are identical: square, position x = 0.3, position y = 0.7, rotation = 40°, size = 0.5.]

  4. Motivation [Figure: the same pair with only the shape changed: square vs. ellipse.]

  5. Motivation [Figure: only position x changed: 0.3 vs. 1.]

  6. Motivation [Figure: only rotation changed: 40° vs. 0°.]

  7. Motivation [Figure: only size changed: 0.5 vs. 1.]

  8. Motivation ◮ Most recent methods focus on learning only the continuous factors of variation.

  9. Motivation ◮ Most recent methods focus on learning only the continuous factors of variation. ◮ Learning discrete representations is known to be a challenging problem, and learning discrete and continuous representations together is more challenging still.

  10. Outline: Method, Experiments, Conclusion.

  11. Overview of our method [Figure: architecture overview. The encoder q_φ(z|x) maps x to continuous latent dimensions z_1, ..., z_i, ..., z_m; a min-cost-flow solver selects the discrete code d; the decoder p_θ(x|z, d) produces the reconstruction x̂; KL regularizer weights β_h and β_l are applied per latent dimension.]

  12. Overview of our method ◮ We propose an efficient procedure for implicitly penalizing the total correlation by controlling the information flow on each variable. ◮ We propose a method for jointly learning discrete and continuous latent variables in an alternating maximization framework.

  13. Limitation of the β-VAE framework ◮ β-VAE sets β > 1 to penalize TC(z) for disentangled representations. ◮ However, it also penalizes the mutual information I(x; z) between the data and the latent variables.
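For reference, this leak can be read off the standard decomposition of the averaged KL term (cf. Chen et al., 2018; this equation is background, not from the deck), assuming p(z) factorizes:

$$\mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = I(x; z) + \underbrace{D_{\mathrm{KL}}\Big(q(z) \,\Big\|\, \prod\nolimits_j q(z_j)\Big)}_{\mathrm{TC}(z)} + \sum_j D_{\mathrm{KL}}\big(q(z_j) \,\|\, p(z_j)\big),$$

so scaling the whole term by β > 1 suppresses I(x; z) along with TC(z).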

  14. Our method ◮ We aim to penalize TC(z) by sequentially penalizing the individual summands I(z_{1:i-1}; z_i): $$\mathrm{TC}(z) = \sum_{i=2}^{m} I(z_{1:i-1};\, z_i).$$
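This decomposition is the entropy chain rule applied to the definition of total correlation; a short derivation (standard, not shown in the slides):

$$\mathrm{TC}(z) = \sum_{i=1}^{m} H(z_i) - H(z_{1:m}) = \sum_{i=1}^{m} \big[H(z_i) - H(z_i \mid z_{1:i-1})\big] = \sum_{i=2}^{m} I(z_{1:i-1};\, z_i),$$

where the i = 1 term vanishes because H(z_1 | z_{1:0}) = H(z_1).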

  15. Our method ◮ We aim to penalize TC(z) by sequentially penalizing the individual summands I(z_{1:i-1}; z_i): $$\mathrm{TC}(z) = \sum_{i=2}^{m} I(z_{1:i-1};\, z_i).$$ ◮ We implicitly minimize each summand I(z_{1:i-1}; z_i) by sequentially maximizing the left-hand side I(x; z_{1:i}) for all i = 2, ..., m: $$I(x; z_{1:i}) = I(x; z_{1:i-1}) + I(x; z_i) - I(z_{1:i-1};\, z_i).$$ Since I(x; z_{1:i-1}) was already maximized at the previous step, increasing I(x; z_{1:i}) pushes I(x; z_i) up and I(z_{1:i-1}; z_i) down.
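The identity follows from the chain rule of mutual information under the assumption, satisfied by the usual dimension-wise factorized (e.g., diagonal-Gaussian) encoder, that I(z_{1:i-1}; z_i | x) = 0:

$$I(z_i;\, x, z_{1:i-1}) = I(z_i; z_{1:i-1}) + I(z_i; x \mid z_{1:i-1}) = I(z_i; x) + \underbrace{I(z_i; z_{1:i-1} \mid x)}_{=\,0},$$

so I(x; z_i | z_{1:i-1}) = I(x; z_i) − I(z_{1:i-1}; z_i), and substituting this into the chain rule I(x; z_{1:i}) = I(x; z_{1:i-1}) + I(x; z_i | z_{1:i-1}) gives the equation above.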

  16. Our method ◮ In practice, we maximize I(x; z_{1:i}) by minimizing the reconstruction term while penalizing z_{i+1:m} with a high β (:= β_h) and the remaining dimensions with a small β (:= β_l).

  17. Our method [Figure: the architecture diagram again, highlighting the β_h and β_l KL regularizers on the latent dimensions.] ◮ Every latent dimension starts heavily penalized with β_h. The penalty on each latent dimension is then relieved, one dimension at a time, to β_l in a cascading fashion, as in the sketch below.
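A minimal sketch of such a cascading per-dimension KL schedule (the values of beta_h, beta_l, and the relief interval are illustrative, not the paper's hyperparameters):

```python
import numpy as np

def kl_weights(step, m, beta_h=50.0, beta_l=1.0, relieve_every=5000):
    """Per-dimension KL weights: all m latent dimensions start at the heavy
    penalty beta_h; one more dimension is relieved to beta_l every
    relieve_every training steps, in a cascading fashion."""
    num_relieved = min(step // relieve_every, m)
    betas = np.full(m, beta_h)
    betas[:num_relieved] = beta_l  # already-relieved dimensions
    return betas
```

The weighted regularizer is then Σ_j betas[j] · D_KL(q_φ(z_j | x) ∥ p(z_j)), so only the relieved dimensions can carry substantial information about x.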


  21. Graphical model Figure: graphical-models view. Solid lines denote the generative process and dashed lines denote the inference process. x, z, and d denote the data, the continuous latent code, and the discrete latent code, respectively.

  22. Motivation of our method ◮ Unlike JointVAE, AAE with supervised discrete variables (AAE-S) can learn good continuous representations, because the burden of simultaneously modeling the continuous and discrete factors is relieved through supervision on the discrete factors.

  23. Motivation of our method ◮ Unlike JointVAE, AAE with supervised discrete variables (AAE-S) can learn good continuous representations, because the burden of simultaneously modeling the continuous and discrete factors is relieved through supervision on the discrete factors. ◮ Inspired by these findings, our idea is to alternate between finding the most likely discrete configuration of the variables given the continuous factors, and updating the parameters (φ, θ) given the discrete configurations.
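In outline, one training epoch of this alternation might look as follows (a sketch; encoder, decoder, solve_discrete, and elbo are hypothetical interfaces, not the authors' code):

```python
import torch

def train_epoch(encoder, decoder, solve_discrete, elbo, loader, opt):
    """Alternate between (1) choosing the most likely discrete codes with the
    network fixed and (2) a gradient step on the lower bound with the codes
    fixed."""
    for x in loader:
        z = encoder(x)                         # continuous codes from q_phi(z|x)
        with torch.no_grad():                  # step 1: fix (phi, theta) and
            d = solve_discrete(decoder, x, z)  # pick the best discrete codes
        opt.zero_grad()
        loss = -elbo(x, z, d)                  # step 2: fix d and maximize the
        loss.backward()                        # variational lower bound over
        opt.step()                             # (phi, theta)
```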

  24. Construct unary term ◮ The discrete latent variables are represented using one-hot encodings of each variable: d^(i) ∈ {e_1, ..., e_S}. [Figure: x^(1) is encoded to z^(1) and then decoded once with each candidate code e_1, ..., e_k, ..., e_S; every reconstruction x̂^(1) is scored by its reconstruction error.]

  29. Construct unary term ◮ u^(i)_θ denotes the vector of log-likelihoods log p_θ(x^(i) | z^(i), e_k) evaluated at each k ∈ [S].
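A sketch of how these unary terms could be computed, assuming a hypothetical decoder(z, d) → x̂ and a unit-variance Gaussian likelihood (so the log-likelihood is, up to an additive constant, a negative halved squared error):

```python
import torch

def unary_terms(decoder, x, z, S):
    """u[i, k] = log p_theta(x^(i) | z^(i), e_k): decode each example once
    per one-hot code e_k and score the reconstruction."""
    n = x.size(0)
    u = x.new_empty(n, S)
    eye = torch.eye(S, device=x.device)
    for k in range(S):
        e_k = eye[k].expand(n, S)          # the same code e_k for the batch
        x_hat = decoder(z, e_k)            # reconstruction under code e_k
        u[:, k] = -0.5 * ((x - x_hat) ** 2).flatten(1).sum(dim=1)
    return u
```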

  30. Alternating minimization scheme ◮ Our goal is to maximize the variational lower bound of the following objective:

$$\mathcal{L}(\theta, \phi) = I(x; [z, d]) - \beta\, \mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) - \lambda\, D_{\mathrm{KL}}\big(q(d) \,\|\, p(d)\big).$$

◮ After rearranging the terms, we arrive at the following optimization problem:

$$\max_{\theta, \phi}\; \underbrace{\max_{d^{(1)}, \ldots, d^{(n)}} \Bigg[ \sum_{i=1}^{n} u^{(i)\top}_{\theta} d^{(i)} - \lambda' \sum_{i \neq j} d^{(i)\top} d^{(j)} \Bigg]}_{:=\, \mathcal{L}_{\mathrm{LB}}(\theta, \phi)} \;-\; \beta \sum_{i=1}^{n} D_{\mathrm{KL}}\big(q_\phi(z \mid x^{(i)}) \,\|\, p(z)\big)$$

$$\text{subject to}\quad \|d^{(i)}\|_1 = 1,\; d^{(i)} \in \{0, 1\}^{S},\; \forall i.$$

  31. Finding the most likely discrete configuration [Figure: the batch x^(1), ..., x^(i), ..., x^(n), each example to be assigned a discrete code.] ◮ With the unary terms, we solve the inner maximization problem of L_LB(θ, φ) over the discrete variables [d^(1), ..., d^(n)] with a min-cost-flow solver.¹ ¹ Jeong, Y. and Song, H. O. "Efficient end-to-end learning for quantizable representations," ICML 2018.
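The paper's min-cost-flow formulation follows the reference above; as a rough stand-in, the inner problem with the pairwise term approximated as a hard balance constraint of at most ⌈n/S⌉ examples per code can be solved with the Hungarian algorithm on a tiled cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_discrete_codes(u):
    """Approximately solve max over [d^(1), ..., d^(n)] of sum_i u[i]^T d^(i)
    under a hard balance constraint (at most ceil(n/S) examples per code),
    standing in for the -lambda' pairwise penalty."""
    n, S = u.shape
    cap = -(-n // S)                   # ceil(n / S) slots per code
    cost = np.repeat(-u, cap, axis=1)  # maximize u  <=>  minimize -u
    rows, cols = linear_sum_assignment(cost)
    d = np.empty(n, dtype=int)
    d[rows] = cols // cap              # map each tiled slot back to its code
    return d                           # d[i] is the chosen code index in [S]
```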

