Learning Discrete and Continuous Factors of Data via Alternating Disentanglement
Yeonwoo Jeong, Hyun Oh Song
Seoul National University, ICML 2019
Motivation
[Figure: an image annotated with its factors of variation — Shape: square, Position x: 0.3, Position y: 0.7, Rotation: 40°, Size: 0.5]
◮ Our goal is to disentangle the underlying explanatory factors of data without any supervision.
Motivation
[Figure: pairs of factor configurations that differ in exactly one entry — the shape changes from square to ellipse, position x from 0.3 to 1, rotation from 40° to 0°, and size from 0.5 to 1 — while all remaining factors stay fixed.]
Motivation
◮ Most recent methods focus on learning only the continuous factors of variation.
◮ Learning discrete representations is known to be a challenging problem; learning discrete and continuous representations jointly is even more challenging.
Outline
Method · Experiments · Conclusion
Overview of our method
[Figure: model architecture — encoder q_φ(z|x), decoder p_θ(x|z, d), latent dimensions z_1, …, z_i, …, z_m, a min-cost-flow solver for the discrete codes, and KL regularizers weighted by β_h and β_l.]
Overview of our method
◮ We propose an efficient procedure for implicitly penalizing the total correlation by controlling the information flow on each variable.
◮ We propose a method for jointly learning discrete and continuous latent variables in an alternating maximization framework.
Limitation of the β-VAE framework
◮ β-VAE sets β > 1 to penalize TC(z) for disentangled representations.
◮ However, it also penalizes the mutual information I(x; z) between the data and the latent variables.
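This limitation can be seen from the standard decomposition of the averaged KL term (a known identity from the β-TC-VAE line of work, not stated on the slide), which shows that I(x; z) appears alongside TC(z) inside the penalized quantity:

```latex
\mathbb{E}_{x \sim p(x)}\, D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
  \;=\; I(x;z) \;+\; \mathrm{TC}(z) \;+\; \sum_{i=1}^{m} D_{\mathrm{KL}}\!\big(q(z_i)\,\|\,p(z_i)\big)
```

Scaling the whole left-hand side by β > 1 therefore cannot penalize TC(z) without also penalizing I(x; z).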
Our method
◮ We aim to penalize TC(z) by sequentially penalizing each individual summand I(z_{1:i−1}; z_i):

TC(z) = Σ_{i=2}^{m} I(z_{1:i−1}; z_i).

◮ We implicitly minimize each summand I(z_{1:i−1}; z_i) by sequentially maximizing the left-hand side I(x; z_{1:i}) for all i = 2, …, m:

I(x; z_{1:i}) = I(x; z_{1:i−1}) + I(x; z_i) − I(z_{1:i−1}; z_i).

◮ Pushing the left-hand side up while I(x; z_{1:i−1}) is held fixed drives I(x; z_i) up and the interaction term I(z_{1:i−1}; z_i) down.
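The chain-rule decomposition of TC(z) above can be checked numerically; a minimal NumPy sketch on a toy joint distribution over three binary variables (the distribution is illustrative, not from the paper):

```python
import numpy as np

# Numerical check of TC(z) = sum_{i=2}^m I(z_{1:i-1}; z_i)
# on a random joint distribution p(z1, z2, z3).
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                      # normalize to a joint distribution

def entropy(q):
    q = q.ravel()
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# Marginals p(z1), p(z2), p(z3)
marg = [p.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]

# Definition: TC(z) = sum_i H(z_i) - H(z)
tc = sum(entropy(m) for m in marg) - entropy(p)

# Chain-rule summands, using I(a; b) = H(a) + H(b) - H(a, b)
p12 = p.sum(axis=2)               # joint p(z1, z2)
i2 = entropy(marg[0]) + entropy(marg[1]) - entropy(p12)   # I(z1; z2)
i3 = entropy(p12) + entropy(marg[2]) - entropy(p)         # I(z_{1:2}; z3)

assert np.isclose(tc, i2 + i3)    # the two sides agree
```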
Our method
◮ In practice, we maximize I(x; z_{1:i}) by minimizing the reconstruction term while penalizing z_{i+1:m} with a high β (:= β_h) and the others with a small β (:= β_l).
Our method
[Figure: architecture with the min-cost-flow solver; β_h and β_l weight the KL regularizers on different latent dimensions.]
◮ Every latent dimension is initially penalized heavily with β_h. The penalty on each latent dimension is then relieved one at a time with β_l, in a cascading fashion.
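The cascading relief of the per-dimension penalties can be sketched as follows; the β values, the stage counter, and the diagonal-Gaussian KL helper are illustrative assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

def cascade_betas(stage, m, beta_h=20.0, beta_l=1.0):
    """Per-dimension KL weights after `stage` dimensions have been relieved:
    the first `stage` dimensions get the low weight, the rest the high one."""
    betas = np.full(m, beta_h)
    betas[:stage] = beta_l
    return betas

def weighted_kl(mu, logvar, betas):
    """beta-weighted KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    kl = 0.5 * (mu**2 + np.exp(logvar) - logvar - 1.0)  # per-dimension KL
    return float(np.sum(betas * kl))

print(cascade_betas(stage=2, m=5))  # [ 1.  1. 20. 20. 20.]
```

Each training stage relieves one more dimension, so by the last stage every dimension carries the small weight β_l.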
Graphical model
Figure: graphical-model view. Solid lines denote the generative process and dashed lines denote the inference process. x, z, and d denote the data, the continuous latent code, and the discrete latent code, respectively.
Motivation of our method
◮ Unlike JointVAE, an AAE with supervised discrete variables (AAE-S) can learn good continuous representations, because supervision on the discrete factors relieves the burden of simultaneously modeling the continuous and discrete factors.
◮ Inspired by these findings, our idea is to alternate between finding the most likely discrete configuration of the variables given the continuous factors, and updating the parameters (φ, θ) given the discrete configurations.
Construct unary term
[Figure: a sample x^(1) is decoded once per candidate one-hot code e_1, …, e_k, …, e_S, yielding reconstructions x̂^(1) and their reconstruction likelihoods.]
◮ The discrete latent variables are represented using one-hot encodings, d^(i) ∈ {e_1, …, e_S}.
◮ u^(i) denotes the vector of the likelihoods log p_θ(x^(i) | z^(i), e_k) evaluated at each k ∈ [S].
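A minimal sketch of this construction, with a hypothetical linear decoder standing in for the network (the decoder, its weights, and the unit-variance Gaussian likelihood are all assumptions for illustration):

```python
import numpy as np

S, D = 3, 4  # number of discrete categories, data dimension

def decoder(z, e):
    # Hypothetical linear decoder producing the mean of a unit-variance Gaussian;
    # the paper's decoder is a neural network.
    W = np.arange(S * D).reshape(S, D) / 10.0
    return z + e @ W

def unary_term(x, z):
    """u[k] = Gaussian log-likelihood of x under the decoder run with code e_k
    (up to an additive constant)."""
    u = np.empty(S)
    for k in range(S):
        e = np.eye(S)[k]                      # one-hot candidate code
        mean = decoder(z, e)
        u[k] = -0.5 * np.sum((x - mean) ** 2)
    return u

u = unary_term(np.ones(D), np.zeros(D))
best = int(np.argmax(u))  # most likely discrete code for this sample
```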
Alternating minimization scheme
◮ Our goal is to maximize the variational lower bound of the following objective:

L(θ, φ) = I(x; [z, d]) − β E_{x∼p(x)} D_KL(q_φ(z|x) ‖ p(z)) − λ D_KL(q(d) ‖ p(d))

◮ After rearranging the terms, we arrive at the following optimization problem:

maximize_{θ,φ}  maximize_{d^(1),…,d^(n)}  Σ_{i=1}^{n} u^(i)ᵀ d^(i) − λ′ Σ_{i≠j} d^(i)ᵀ d^(j) − β Σ_{i=1}^{n} D_KL(q_φ(z|x^(i)) ‖ p(z))

subject to  ‖d^(i)‖_1 = 1,  d^(i) ∈ {0, 1}^S,  ∀i,

where the inner maximization over the discrete codes defines L_LB(θ, φ).
Finding the most likely discrete configuration
[Figure: each sample x^(1), …, x^(i), …, x^(n) is scored against every one-hot code e_1, …, e_k, …, e_S via its reconstruction x̂.]
◮ With the unary terms, we solve the inner maximization problem of L_LB(θ, φ) over the discrete variables [d^(1), …, d^(n)].¹

¹ Jeong, Y. and Song, H. O. "Efficient end-to-end learning for quantizable representations." ICML 2018.
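The paper solves this coupled assignment exactly with a min-cost-flow solver. As a simplified stand-in, the coordinate-ascent heuristic below targets the same objective Σᵢ u^(i)ᵀd^(i) − λ′ Σ_{i≠j} d^(i)ᵀd^(j), and reduces to a per-sample argmax when λ′ = 0; the heuristic itself is our illustration, not the paper's solver:

```python
import numpy as np

def assign_codes(U, lam=0.0, sweeps=5):
    """U: (n, S) matrix of unary terms u^(i). Returns one code index per sample.
    Each sample picks the code maximizing its unary score minus a balance
    penalty proportional to how many other samples currently share that code
    (the pairwise term counts both orderings, hence the factor 2)."""
    n, S = U.shape
    d = np.argmax(U, axis=1)
    counts = np.bincount(d, minlength=S)
    for _ in range(sweeps):
        for i in range(n):
            counts[d[i]] -= 1                   # remove sample i from its code
            score = U[i] - 2.0 * lam * counts
            d[i] = int(np.argmax(score))
            counts[d[i]] += 1                   # reinsert at the new code
    return d

U = np.array([[3.0, 1.0], [2.9, 1.0], [0.5, 0.4]])
print(assign_codes(U, lam=0.0))  # [0 0 0]  — pure unary argmax
print(assign_codes(U, lam=2.0))  # [1 0 0]  — the balance term spreads the codes
```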