Learning Discrete and Continuous Factors of Data via Alternating Disentanglement
Yeonwoo Jeong, Hyun Oh Song
Seoul National University, ICML 2019
Motivation
[Figure: an image annotated with its factors of variation — Shape: square, Position x: 0.3, Position y: 0.7, Rotation: 40°, Size: 0.5]
◮ Our goal is to disentangle the underlying explanatory factors of data without any supervision.
Motivation
[Figure: pairs of factor configurations that differ in exactly one entry — the shape changes from square to ellipse, position x from 0.3 to 1, rotation from 40° to 0°, and size from 0.5 to 1 — while all remaining factors stay fixed.]
Motivation
◮ Most recent methods focus on learning only the continuous factors of variation.
◮ Learning discrete representations is known to be a challenging problem; learning discrete and continuous representations jointly is even more challenging.
Outline
Method · Experiments · Conclusion
Overview of our method
[Figure: model architecture — encoder q_φ(z|x), decoder p_θ(x|z, d), latent dimensions z_1, …, z_i, …, z_m, a min-cost-flow solver for the discrete codes, and KL regularizers weighted by β_h and β_l.]
Overview of our method
◮ We propose an efficient procedure for implicitly penalizing the total correlation by controlling the information flow on each variable.
◮ We propose a method for jointly learning discrete and continuous latent variables in an alternating maximization framework.
Limitation of the β-VAE framework
◮ β-VAE sets β > 1 to penalize TC(z) for disentangled representations.
◮ However, it also penalizes the mutual information I(x; z) between the data and the latent variables.
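This limitation can be seen from the standard decomposition of the averaged KL term (a known identity from the β-TC-VAE line of work, not stated on the slide), which shows that I(x; z) appears alongside TC(z) inside the penalized quantity:

```latex
\mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
  \;=\; I(x;z) \;+\; \mathrm{TC}(z) \;+\; \sum_{i=1}^{m} D_{\mathrm{KL}}\!\big(q(z_i)\,\|\,p(z_i)\big)
```

Scaling the whole left-hand side by β > 1 therefore cannot penalize TC(z) without also penalizing I(x; z).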
Our method
◮ We aim to penalize TC(z) by sequentially penalizing each individual summand I(z_{1:i−1}; z_i):

TC(z) = Σ_{i=2}^{m} I(z_{1:i−1}; z_i).

◮ We implicitly minimize each summand I(z_{1:i−1}; z_i) by sequentially maximizing the left-hand side I(x; z_{1:i}) for all i = 2, …, m:

I(x; z_{1:i}) = I(x; z_{1:i−1}) + I(x; z_i) − I(z_{1:i−1}; z_i).

◮ Pushing the left-hand side up while I(x; z_{1:i−1}) is held fixed drives I(x; z_i) up and the interaction term I(z_{1:i−1}; z_i) down.
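The chain-rule decomposition of TC(z) above can be checked numerically; a minimal NumPy sketch on a toy joint distribution over three binary variables (the distribution is illustrative, not from the paper):

```python
import numpy as np

# Numerical check of TC(z) = sum_{i=2}^m I(z_{1:i-1}; z_i)
# on a random joint distribution p(z1, z2, z3).
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                      # normalize to a joint distribution

def entropy(q):
    q = q.ravel()
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# Marginals p(z1), p(z2), p(z3)
marg = [p.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]

# Definition: TC(z) = sum_i H(z_i) - H(z)
tc = sum(entropy(m) for m in marg) - entropy(p)

# Chain-rule summands, using I(a; b) = H(a) + H(b) - H(a, b)
p12 = p.sum(axis=2)               # joint p(z1, z2)
i2 = entropy(marg[0]) + entropy(marg[1]) - entropy(p12)   # I(z1; z2)
i3 = entropy(p12) + entropy(marg[2]) - entropy(p)         # I(z_{1:2}; z3)

assert np.isclose(tc, i2 + i3)    # the two sides agree
```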
Our method
◮ In practice, we maximize I(x; z_{1:i}) by minimizing the reconstruction term while penalizing z_{i+1:m} with a high β (:= β_h) and the others with a small β (:= β_l).
Our method
[Figure: architecture with the min-cost-flow solver; β_h and β_l weight the KL regularizers on different latent dimensions.]
◮ Every latent dimension is initially penalized heavily with β_h. The penalty on each latent dimension is then relieved one at a time with β_l, in a cascading fashion.
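The cascading relief of the per-dimension penalties can be sketched as follows; the β values, the stage counter, and the diagonal-Gaussian KL helper are illustrative assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

def cascade_betas(stage, m, beta_h=20.0, beta_l=1.0):
    """Per-dimension KL weights after `stage` dimensions have been relieved:
    the first `stage` dimensions get the low weight, the rest the high one."""
    betas = np.full(m, beta_h)
    betas[:stage] = beta_l
    return betas

def weighted_kl(mu, logvar, betas):
    """beta-weighted KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    kl = 0.5 * (mu**2 + np.exp(logvar) - logvar - 1.0)  # per-dimension KL
    return float(np.sum(betas * kl))

print(cascade_betas(stage=2, m=5))  # [ 1.  1. 20. 20. 20.]
```

Each training stage relieves one more dimension, so by the last stage every dimension carries the small weight β_l.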
Graphical model
Figure: graphical-model view. Solid lines denote the generative process and dashed lines denote the inference process. x, z, and d denote the data, the continuous latent code, and the discrete latent code, respectively.
Motivation of our method
◮ Unlike JointVAE, an AAE with supervised discrete variables (AAE-S) can learn good continuous representations, because supervision on the discrete factors relieves the burden of simultaneously modeling the continuous and discrete factors.
◮ Inspired by these findings, our idea is to alternate between finding the most likely discrete configuration of the variables given the continuous factors, and updating the parameters (φ, θ) given the discrete configurations.
Construct unary term
[Figure: a sample x^(1) is decoded once per candidate one-hot code e_1, …, e_k, …, e_S, yielding reconstructions x̂^(1) and their reconstruction likelihoods.]
◮ The discrete latent variables are represented using one-hot encodings, d^(i) ∈ {e_1, …, e_S}.
◮ u^(i) denotes the vector of the likelihoods log p_θ(x^(i) | z^(i), e_k) evaluated at each k ∈ [S].
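A minimal sketch of this construction, with a hypothetical linear decoder standing in for the network (the decoder, its weights, and the unit-variance Gaussian likelihood are all assumptions for illustration):

```python
import numpy as np

S, D = 3, 4  # number of discrete categories, data dimension

def decoder(z, e):
    # Hypothetical linear decoder producing the mean of a unit-variance Gaussian;
    # the paper's decoder is a neural network.
    W = np.arange(S * D).reshape(S, D) / 10.0
    return z + e @ W

def unary_term(x, z):
    """u[k] = Gaussian log-likelihood of x under the decoder run with code e_k
    (up to an additive constant)."""
    u = np.empty(S)
    for k in range(S):
        e = np.eye(S)[k]                      # one-hot candidate code
        mean = decoder(z, e)
        u[k] = -0.5 * np.sum((x - mean) ** 2)
    return u

u = unary_term(np.ones(D), np.zeros(D))
best = int(np.argmax(u))  # most likely discrete code for this sample
```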
Alternating minimization scheme
◮ Our goal is to maximize the variational lower bound of the following objective:

L(θ, φ) = I(x; [z, d]) − β E_{x∼p(x)} D_KL(q_φ(z|x) ‖ p(z)) − λ D_KL(q(d) ‖ p(d))

◮ After rearranging the terms, we arrive at the following optimization problem:

maximize_{θ,φ}  maximize_{d^(1),…,d^(n)}  Σ_{i=1}^{n} u^(i)ᵀ d^(i) − λ′ Σ_{i≠j} d^(i)ᵀ d^(j) − β Σ_{i=1}^{n} D_KL(q_φ(z|x^(i)) ‖ p(z))

subject to  ‖d^(i)‖_1 = 1,  d^(i) ∈ {0, 1}^S,  ∀i,

where the inner maximization over the discrete codes defines L_LB(θ, φ).
Finding the most likely discrete configuration
[Figure: each sample x^(1), …, x^(i), …, x^(n) is scored against every one-hot code e_1, …, e_k, …, e_S via its reconstruction x̂.]
◮ With the unary terms, we solve the inner maximization problem of L_LB(θ, φ) over the discrete variables [d^(1), …, d^(n)].¹

¹ Jeong, Y. and Song, H. O. "Efficient end-to-end learning for quantizable representations." ICML 2018.
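The paper solves this coupled assignment exactly with a min-cost-flow solver. As a simplified stand-in, the coordinate-ascent heuristic below targets the same objective Σᵢ u^(i)ᵀd^(i) − λ′ Σ_{i≠j} d^(i)ᵀd^(j), and reduces to a per-sample argmax when λ′ = 0; the heuristic itself is our illustration, not the paper's solver:

```python
import numpy as np

def assign_codes(U, lam=0.0, sweeps=5):
    """U: (n, S) matrix of unary terms u^(i). Returns one code index per sample.
    Each sample picks the code maximizing its unary score minus a balance
    penalty proportional to how many other samples currently share that code
    (the pairwise term counts both orderings, hence the factor 2)."""
    n, S = U.shape
    d = np.argmax(U, axis=1)
    counts = np.bincount(d, minlength=S)
    for _ in range(sweeps):
        for i in range(n):
            counts[d[i]] -= 1                   # remove sample i from its code
            score = U[i] - 2.0 * lam * counts
            d[i] = int(np.argmax(score))
            counts[d[i]] += 1                   # reinsert at the new code
    return d

U = np.array([[3.0, 1.0], [2.9, 1.0], [0.5, 0.4]])
print(assign_codes(U, lam=0.0))  # [0 0 0]  — pure unary argmax
print(assign_codes(U, lam=2.0))  # [1 0 0]  — the balance term spreads the codes
```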