On Casting Importance Weighted Autoencoder to an EM Algorithm to Learn Deep Generative Models

D. Kim¹, J. Hwang², and Y. Kim¹
Speaker: Dongha Kim

¹ Department of Statistics, Seoul National University, South Korea
² SK Telecom, South Korea

XAIENCE 2019, November 07, 2019
Outline
1 Introduction
2 Proposed methods
   IWAE as EM algorithm
   IWEM
   miss-IWEM
3 Empirical analysis
4 Summary
5 References
Introduction: Deep generative model with latent variable
• X: observable variable
• Z: latent variable, Z ∼ p(z) (e.g., N(0, I))
• X | Z = z ∼ p(x | z; θ)
[Figure: graphical model Z → X]
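To make the two-stage sampling concrete, here is a minimal sketch of ancestral sampling from such a model. The linear-Gaussian decoder (W, b) is an illustrative assumption standing in for the deep network used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the generative process: z ~ p(z) = N(0, I),
# x | z ~ p(x | z; theta) = N(W z + b, I).  The linear-Gaussian
# decoder (W, b) is an illustrative stand-in for a deep network.
d_z, d_x = 2, 4
W = rng.standard_normal((d_x, d_z))       # "theta" of the toy decoder
b = rng.standard_normal(d_x)

z = rng.standard_normal(d_z)              # z ~ p(z)
x = W @ z + b + rng.standard_normal(d_x)  # x ~ p(x | z; theta)
print(x)
```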
Introduction: Deep generative model with latent variable
• The log-likelihood of the observable vector x:
  \[ \log p(x; \theta) = \log \int p(x \mid z; \theta)\, p(z)\, dz. \]
• The marginalization is intractable, so the MLE is hard to compute directly.
• An alternative approach: maximize a lower bound that is easy to compute.
  • VAE (Kingma and Welling, 2013; Rezende et al., 2014)
  • IWAE (Burda et al., 2015)
Introduction: Variational autoencoders (VAE)
• Employ a variational posterior distribution q(z | x; φ):
  \[ \mathcal{L}_{\text{VAE}}(x; \theta, \phi) := \mathbb{E}_{z \sim q}\left[ \log \frac{p(x, z; \theta)}{q(z \mid x; \phi)} \right]. \]
• In practice, we use a Monte Carlo estimate (see the sketch below):
  \[ \hat{\mathcal{L}}_{\text{VAE}}(x; \theta, \phi) := \frac{1}{L} \sum_{l=1}^{L} \log \frac{p(x, z_l; \theta)}{q(z_l \mid x; \phi)}, \quad z_1, \dots, z_L \sim q(z \mid x; \phi). \]
• Maximize \(\hat{\mathcal{L}}_{\text{VAE}}\) with respect to (θ, φ).
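A minimal sketch of the Monte Carlo ELBO, using a toy 1-D model assumed only for illustration (p(z) = N(0, 1), p(x|z; θ) = N(θz, 1), q(z|x; φ) = N(φx, 1); these are not the networks from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D model (an assumption for illustration): p(z) = N(0, 1),
# p(x|z; theta) = N(theta * z, 1), q(z|x; phi) = N(phi * x, 1).
def log_normal(v, mean):
    return -0.5 * (np.log(2 * np.pi) + (v - mean) ** 2)

def log_joint(x, z, theta):            # log p(x, z; theta)
    return log_normal(z, 0.0) + log_normal(x, theta * z)

def log_q(z, x, phi):                  # log q(z | x; phi)
    return log_normal(z, phi * x)

def elbo_mc(x, theta, phi, L=64):
    """Monte Carlo ELBO: (1/L) * sum_l log [ p(x, z_l) / q(z_l | x) ]."""
    z = phi * x + rng.standard_normal(L)   # z_l ~ q(z | x; phi)
    return np.mean(log_joint(x, z, theta) - log_q(z, x, phi))

print(elbo_mc(x=1.5, theta=0.8, phi=0.5))
```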
Introduction: Importance weighted autoencoders (IWAE)
• Use multiple samples from q(z | x; φ):
  \[ \mathcal{L}_{\text{IWAE}}(x; \theta, \phi) := \mathbb{E}_{z_1, \dots, z_K \sim q}\left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k; \theta)}{q(z_k \mid x; \phi)} \right]. \]
• A tighter lower bound than the VAE bound.
• Again use a Monte Carlo estimate:
  \[ \hat{\mathcal{L}}_{\text{IWAE}}(x; \theta, \phi) := \log \frac{1}{K} \sum_{k=1}^{K} \frac{p(x, z_k; \theta)}{q(z_k \mid x; \phi)}, \quad z_1, \dots, z_K \sim q(z \mid x; \phi). \]
• Maximize \(\hat{\mathcal{L}}_{\text{IWAE}}\) with respect to (θ, φ).
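The same toy model gives a minimal sketch of the IWAE estimator; the log of the weight average is computed in log space with logsumexp for numerical stability (a standard implementation detail, assumed here rather than taken from the slides):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Same toy 1-D model as in the ELBO sketch above.
log_normal = lambda v, m: -0.5 * (np.log(2 * np.pi) + (v - m) ** 2)
log_joint = lambda x, z, th: log_normal(z, 0.0) + log_normal(x, th * z)
log_q = lambda z, x, ph: log_normal(z, ph * x)

def iwae_mc(x, theta, phi, K=64):
    """log (1/K) sum_k p(x, z_k) / q(z_k | x), computed stably in log space."""
    z = phi * x + rng.standard_normal(K)           # z_k ~ q(z | x; phi)
    log_w = log_joint(x, z, theta) - log_q(z, x, phi)
    return logsumexp(log_w) - np.log(K)

print(iwae_mc(x=1.5, theta=0.8, phi=0.5))
```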
Introduction: Contents
• Interpret IWAE as an EM algorithm with importance sampling (IS).
• Improve IWAE by
  1 learning the proposal distribution carefully, and
  2 devising an annealing strategy.
  → IWEM (importance weighted EM algorithm)
• Generalize IWEM to missing-data problems.
  → miss-IWEM
IWAE as EM algorithm: EM algorithm
1 E-step
  • θ_c: the current estimate of θ.
  • Calculate the expected complete-data log-likelihood:
    \[ Q(\theta \mid \theta_c; x) := \mathbb{E}_{z \sim p(z \mid x; \theta_c)}[\log p(x, z; \theta)]. \]
2 M-step
  • Update the current estimate by maximizing Q(θ | θ_c; x).
IWAE as EM algorithm: EM algorithm with IS
1 E-step
  • Approximate Q by employing a proposal distribution q(z | x; φ):
    \[ \hat{Q}(\theta \mid \theta_c, \phi; x) := \sum_{k=1}^{K} \frac{w_k}{\sum_{k'=1}^{K} w_{k'}} \log p(x, z_k; \theta), \]
    where z_k ∼ q(z | x; φ) and w_k = p(x, z_k; θ_c) / q(z_k | x; φ) for k = 1, ..., K.
2 M-step
  • Update θ by maximizing \(\hat{Q}(\theta \mid \theta_c, \phi; x)\).
3 P-step (if necessary)
  • Update φ by encouraging q(z | x; φ) to be a good proposal distribution.
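A minimal sketch of the self-normalized IS estimate \(\hat{Q}\) on the same toy model: the weights are computed at the current estimate θ_c, while the complete log-likelihood inside the sum is evaluated at θ.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Same toy 1-D model as in the earlier sketches.
log_normal = lambda v, m: -0.5 * (np.log(2 * np.pi) + (v - m) ** 2)
log_joint = lambda x, z, th: log_normal(z, 0.0) + log_normal(x, th * z)
log_q = lambda z, x, ph: log_normal(z, ph * x)

def q_hat(theta, theta_c, phi, x, K=64):
    """Self-normalized IS estimate of Q(theta | theta_c): weights use
    theta_c; the complete log-likelihood in the sum uses theta."""
    z = phi * x + rng.standard_normal(K)           # z_k ~ q(z | x; phi)
    log_w = log_joint(x, z, theta_c) - log_q(z, x, phi)
    w_tilde = np.exp(log_w - logsumexp(log_w))     # w_k / sum_k' w_k'
    return np.sum(w_tilde * log_joint(x, z, theta))

print(q_hat(theta=0.9, theta_c=0.8, phi=0.5, x=1.5))
```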
IWAE as EM algorithm: IWAE = EM algorithm
Proposition 1. The following equality holds for any θ_c:
\[ \nabla_\theta \hat{\mathcal{L}}_{\text{IWAE}}(x; \theta, \phi) \big|_{\theta = \theta_c} = \nabla_\theta \hat{Q}(\theta \mid \theta_c, \phi; x) \big|_{\theta = \theta_c}. \]
⇒ IWAE = EM algorithm when a gradient-based optimization method is used.
• Updating φ in IWAE can be understood as a P-step:
\[ \max_\phi \hat{\mathcal{L}}_{\text{IWAE}}(x; \theta_c, \phi). \]
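Proposition 1 is easy to verify numerically on the toy model: with one shared set of samples z_1, ..., z_K, the gradient of \(\hat{\mathcal{L}}_{\text{IWAE}}\) at θ = θ_c matches the gradient of \(\hat{Q}\) when the weights are frozen at θ_c. A minimal PyTorch check (the toy Gaussian model is again an assumption):

```python
import math
import torch

torch.manual_seed(0)

# Toy model: p(z)=N(0,1), p(x|z;theta)=N(theta*z,1), q(z|x;phi)=N(phi*x,1).
def log_normal(v, mean):
    return -0.5 * (math.log(2 * math.pi) + (v - mean) ** 2)

x, phi, K = torch.tensor(1.5), 0.5, 8
z = phi * x + torch.randn(K)          # one shared set of samples z_1..z_K
theta_c = torch.tensor(0.8)
log_q = log_normal(z, phi * x)

def log_joint(theta):                 # log p(x, z_k; theta) for all k
    return log_normal(z, 0.0) + log_normal(x, theta * z)

# Gradient of the IWAE objective at theta = theta_c.
theta = theta_c.clone().requires_grad_(True)
l_iwae = torch.logsumexp(log_joint(theta) - log_q, 0) - math.log(K)
(g_iwae,) = torch.autograd.grad(l_iwae, theta)

# Gradient of Q_hat(theta | theta_c) at theta = theta_c (weights frozen).
theta = theta_c.clone().requires_grad_(True)
w = torch.softmax(log_joint(theta_c) - log_q, 0)   # normalized IS weights
(g_q,) = torch.autograd.grad(torch.sum(w * log_joint(theta)), theta)

print(g_iwae.item(), g_q.item())      # agree up to floating-point error
```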
IWAE as EM algorithm: IWAE = EM algorithm (cont.)
IWAE as an EM algorithm:
1 E-step: calculate \(\hat{Q}(\theta \mid \theta_c, \phi; x)\).
2 M-step: update θ by maximizing \(\hat{Q}(\theta \mid \theta_c, \phi; x)\).
3 P-step: update φ by maximizing \(\hat{\mathcal{L}}_{\text{IWAE}}(x; \theta_c, \phi)\).
IWEM: Optimal P-step
• Using \(\hat{Q}\) inevitably introduces variance due to IS.
• Small variance → stable learning procedure.
• The optimal proposal distribution (Owen, 2013):
  \[ q_{\text{opt}}(z) \propto |\log p(x, z; \theta_c)| \cdot p(x, z; \theta_c), \]
  whereas IWAE uses p(x, z; θ_c) itself.
• New P-step: replace p(x, z; θ_c) in the IWAE objective with q_opt(z):
  \[ \hat{\mathcal{L}}_{\text{opt}}(\theta_c, \phi; x) := \log \frac{1}{K} \sum_{k=1}^{K} \frac{q_{\text{opt}}(z_k)}{q(z_k \mid x; \phi)}. \]
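A minimal sketch of this P-step objective on the toy model. One point worth noting (my reading, not stated on the slide): q_opt is known only up to its normalizing constant, but that constant merely shifts the objective, so it does not change the maximizer over φ.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Same toy 1-D model as in the earlier sketches.
log_normal = lambda v, m: -0.5 * (np.log(2 * np.pi) + (v - m) ** 2)
log_joint = lambda x, z, th: log_normal(z, 0.0) + log_normal(x, th * z)
log_q = lambda z, x, ph: log_normal(z, ph * x)

def l_opt(phi, x, theta_c, K=64):
    """P-step objective with the (unnormalized) optimal proposal
    |log p(x, z)| * p(x, z) in place of p(x, z)."""
    z = phi * x + rng.standard_normal(K)          # z_k ~ q(z | x; phi)
    log_p = log_joint(x, z, theta_c)
    log_q_opt = np.log(np.abs(log_p)) + log_p     # log q_opt up to a constant
    return logsumexp(log_q_opt - log_q(z, x, phi)) - np.log(K)

print(l_opt(phi=0.5, x=1.5, theta_c=0.8))
```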
IWEM: Annealing strategy
• In general,
  \[ \mathrm{Var}\big[\hat{\mathcal{L}}_{\text{VAE}}(x; \theta, \phi)\big] \ll \mathrm{Var}\big[\hat{Q}(\theta \mid \theta, \phi; x)\big]. \]
• Using the VAE objective at early steps keeps the variance small.
• New E-step: take a convex combination with the VAE objective:
  \[ \hat{Q}_\alpha(\theta \mid \theta_c, \phi; x) := \alpha \cdot \hat{Q}(\theta \mid \theta_c, \phi; x) + (1 - \alpha) \cdot \hat{\mathcal{L}}_{\text{VAE}}(\theta, \phi; x). \]
• α ∈ [0, 1] is an annealing controller that
  • starts from zero and
  • increases incrementally up to one as the iterations proceed (see the sketch below).
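A minimal sketch of the annealed E-step objective. The linear schedule and its length are assumptions; the slides only say that α increases incrementally from 0 to 1.

```python
def q_alpha(q_hat_val: float, elbo_val: float, step: int,
            anneal_steps: int = 10_000) -> float:
    """Convex combination Q_alpha = alpha * Q_hat + (1 - alpha) * L_VAE.
    alpha follows a linear schedule from 0 to 1 over the first
    `anneal_steps` iterations (the schedule itself is an assumption)."""
    alpha = min(1.0, step / anneal_steps)
    return alpha * q_hat_val + (1 - alpha) * elbo_val

# Early in training the objective is dominated by the low-variance VAE
# bound; late in training it coincides with the IS-based Q_hat.
print(q_alpha(-87.0, -88.2, step=500))
print(q_alpha(-87.0, -88.2, step=20_000))
```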
IWEM: IWAE vs. IWEM

IWAE
1 E-step: calculate \(\hat{Q}(\theta \mid \theta_c, \phi; x)\).
2 M-step: update θ by maximizing \(\hat{Q}\).
3 P-step: update φ by maximizing \(\hat{\mathcal{L}}_{\text{IWAE}}(x; \theta_c, \phi)\).

IWEM
1 E-step: calculate \(\hat{Q}_\alpha(\theta \mid \theta_c, \phi; x)\).
2 M-step: update θ by maximizing \(\hat{Q}_\alpha\).
3 P-step: update φ by maximizing \(\hat{\mathcal{L}}_{\text{opt}}(\theta_c, \phi; x)\).
miss-IWEM: Missing data problem
• x = (x^(o), x^(m)), and we only observe x^(o).
• The log-likelihood is
  \[ \log p(x^{(o)}; \theta) = \log \int\!\!\int p(x^{(o)}, x^{(m)}, z; \theta)\, dz\, dx^{(m)}. \]
• We therefore need a proposal distribution over (x^(m), z).
miss-IWEM: Formulation of the proposal distribution
• We use the following proposal distribution:
  \[ q(x^{(m)}, z \mid x^{(o)}; \theta, \phi) := p(x^{(m)} \mid z; \theta) \cdot q(z \mid \breve{x}; \phi), \]
  where q(z | x; φ) is the same distribution as q in IWEM.
• \(\breve{x} = (x^{(o)}, \breve{x}^{(m)})\), with \(\breve{x}^{(m)}\) an imputed value of x^(m):
  • draw \(\breve{z}\) from q(z | (x^(o), 0); φ), and
  • draw \(\breve{x}^{(m)}\) from p(x^(m) | \(\breve{z}\); θ).
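A minimal sketch of sampling from this proposal. The two sampler stubs are hypothetical stand-ins for the learned networks; their names and forms are assumptions made only so the recipe is runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stubs standing in for the learned networks.
def sample_q(x_full):                   # z ~ q(z | x; phi)
    return x_full.mean() + rng.standard_normal()

def sample_p_x_given_z(z, size):        # x ~ p(x | z; theta)
    return z + rng.standard_normal(size)

def propose(x_obs, miss_idx, d):
    """Draw (x_m, z) from q(x_m, z | x_o) = p(x_m | z; theta) q(z | x_breve; phi)."""
    x0 = np.zeros(d)
    obs_idx = np.setdiff1d(np.arange(d), miss_idx)
    x0[obs_idx] = x_obs                                 # (x_o, 0)
    z_breve = sample_q(x0)                              # z_breve ~ q(z | (x_o, 0))
    x_breve = x0.copy()
    x_breve[miss_idx] = sample_p_x_given_z(z_breve, len(miss_idx))  # impute x_m
    z = sample_q(x_breve)                               # z ~ q(z | x_breve)
    x_m = sample_p_x_given_z(z, len(miss_idx))          # x_m ~ p(x_m | z)
    return x_m, z

print(propose(x_obs=np.array([0.3, -1.2]), miss_idx=np.array([1, 3]), d=4))
```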
miss-IWEM
miss-IWEM simply replaces q(z | x; φ) in IWEM with q(x^(m), z | x^(o); θ, φ).
1 E-step: calculate
  \[ \hat{Q}^m_\alpha(\theta \mid \theta_c, \phi; x^{(o)}) := \alpha \cdot \hat{Q}^m(\theta \mid \theta_c, \phi; x^{(o)}) + (1 - \alpha) \cdot \hat{\mathcal{L}}^m_{\text{VAE}}(\theta, \phi; x^{(o)}). \]
2 M-step: update θ by maximizing \(\hat{Q}^m_\alpha(\theta \mid \theta_c, \phi; x^{(o)})\).
3 P-step: update φ by maximizing \(\hat{\mathcal{L}}^m_{\text{opt}}(\theta_c, \phi; x^{(o)})\).
Empirical analysis: Experimental setup
• Model
  • p(z): N(0_40, I_40)
  • (p(x | z; θ), q(z | x; φ)): (MLP, MLP) or (DeConv, Conv)
• Optimization algorithm: Adam (Kingma and Ba, 2014)
• Performance measure: approximated test log-likelihood
• Datasets: static biMNIST, dynamic biMNIST, Omniglot, Caltech 101 Silhouettes
Empirical analysis: Complete data analysis (performance results)

MLP           VAE        IWAE       IWEM-woa¹   IWEM
sta. MNIST    -88.21     -87.68     -87.00      -87.11
dyn. MNIST    -85.31     -84.30     -84.10      -84.16
Omniglot      -108.46    -106.80    -106.50     -106.38
Caltech 101   -119.67    -118.06    -116.92     -116.54

CNN           VAE        IWAE       IWEM-woa¹   IWEM
sta. MNIST    -84.63     -83.54     -83.32      -83.77
dyn. MNIST    -84.08     -81.56     -81.07      -81.28
Omniglot      -101.63    -100.27    -100.15     -100.39
Caltech 101   -109.24    -106.94    -106.19     -106.05

¹ IWEM without the annealing strategy
Empirical analysis: Incomplete data analysis (generation of missing samples)
1 Divide an image into 9 equal patches (a 3×3 grid).
2 Generate an incomplete image by removing a predefined number of patches at random (see the sketch below).
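A minimal sketch of this corruption procedure for 28×28 images. Since 28 is not divisible by 3, the exact patch boundaries used here are an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_patches(img, n_drop):
    """Split an image into a 3x3 grid of (roughly) equal patches and zero
    out n_drop randomly chosen patches."""
    h, w = img.shape
    rows = np.array_split(np.arange(h), 3)
    cols = np.array_split(np.arange(w), 3)
    out, observed = img.copy(), np.ones_like(img, dtype=bool)
    for k in rng.choice(9, size=n_drop, replace=False):
        r, c = rows[k // 3], cols[k % 3]
        out[np.ix_(r, c)] = 0.0          # remove the patch
        observed[np.ix_(r, c)] = False   # record the missing region
    return out, observed

x_incomplete, mask = drop_patches(rng.random((28, 28)), n_drop=4)
print(mask.mean())                       # fraction of observed pixels
```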
Empirical analysis: Incomplete data analysis (performance results)
• Setting: static biMNIST + (MLP, MLP).
• As the missing rate increases, the margin over missIWAE widens.

# of cropped patches   missIWAE²   miss-IWEM-woa³   miss-IWEM
3                      -90.29      -89.79           -89.71
4                      -92.07      -90.97           -90.76
5                      -95.54      -93.33           -92.23
6                      -102.26     -97.66           -95.18

² Mattei and Frellsen (2018)
³ miss-IWEM without the annealing strategy