Expectation Maximization
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu
Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary
Introduction
- Last time we discussed mixture models
- Use of K-means and EM as a way to partition data
- More generally, estimation of latent variables such as class membership
- Today a few other perspectives on EM will be discussed
An alternative view
- Find the ML solution to a model with latent variables
- Observed variables: X
- Latent variables: Z
- Model parameters: θ
- The criterion function is
    \ln p(X \mid \theta) = \ln \sum_Z p(X, Z \mid \theta)
- Unfortunately the sum sits inside the ln expression
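To make the difficulty concrete, here is a minimal sketch (Python, with made-up numbers and a two-component Gaussian mixture standing in for a latent-variable model; none of these values come from the slides) that evaluates ln p(X | θ) and shows the sum over Z trapped inside the logarithm:

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D, two-component Gaussian mixture; all numbers are made up for illustration.
pi = np.array([0.4, 0.6])            # mixing coefficients
mu = np.array([-1.0, 2.0])           # component means
sigma = np.array([0.5, 1.0])         # component standard deviations
x = np.array([-0.8, 0.1, 1.9, 2.3])  # observed data X

# Incomplete-data log likelihood: the sum over the latent component index
# sits *inside* the logarithm, so it does not decouple into simple terms.
per_point = np.log((pi * norm.pdf(x[:, None], mu, sigma)).sum(axis=1))
print("ln p(X|theta) =", per_point.sum())
```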
An alternative view
- If Z were observed or known, the problem would be simpler
- If the complete set {X, Z} were known, estimation would be straightforward
- X alone is considered an incomplete dataset
- However, we can compute / estimate p(X | Z, θ)
- Iteratively we can update Z to be a good estimate of its distribution
- The estimate of Z can be used to update the model parameters
An alternative view
1. Choose an initial value of θ_old
2. E-step: compute p(Z | X, θ_old)
3. M-step: compute θ_new
     \theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}})
   where the expected complete-data log likelihood is
     Q(\theta, \theta^{\text{old}}) = \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)
4. Check for convergence; return to step 2 if not done
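A minimal sketch of this loop in Python; the e_step and m_step callables are hypothetical placeholders for the model-specific computations and are not defined on the slides:

```python
import numpy as np

def em(X, theta_init, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM skeleton following the four steps on the slide.

    e_step(X, theta) must return the posterior over the latent variables,
    p(Z | X, theta_old), together with ln p(X | theta); m_step(X, posterior)
    must return the theta that maximizes Q(theta, theta_old).  Both are
    problem specific and are placeholders here.
    """
    theta = theta_init
    prev_ll = -np.inf
    for _ in range(max_iter):
        posterior, log_lik = e_step(X, theta)   # E-step: p(Z | X, theta_old)
        theta = m_step(X, posterior)            # M-step: argmax_theta Q(theta, theta_old)
        if abs(log_lik - prev_ll) < tol:        # convergence check on ln p(X | theta)
            break
        prev_ll = log_lik
    return theta
```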
Mixtures of Bernoulli Distributions
- What if we had a mixture of discrete random variables?
- Consider a Bernoulli example
- x is here described by D binary variables x_i, controlled by the means μ_i, i.e.
    p(x \mid \mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}
- Then we have
    \mathrm{E}[x] = \mu
    \mathrm{cov}[x] = \mathrm{diag}\{ \mu_i (1 - \mu_i) \}
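As a small illustration (the μ values and the observation below are made up, not from the slide), the product form, mean, and covariance can be evaluated directly:

```python
import numpy as np

mu = np.array([0.2, 0.7, 0.9])            # mean of each binary variable (made up)
x = np.array([0, 1, 1])                   # one binary observation

p_x = np.prod(mu**x * (1 - mu)**(1 - x))  # p(x | mu) = prod_i mu_i^{x_i} (1 - mu_i)^{1 - x_i}
mean = mu                                 # E[x] = mu
cov = np.diag(mu * (1 - mu))              # cov[x] = diag{ mu_i (1 - mu_i) }
print(p_x, mean, cov, sep="\n")
```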
Mixtures of Bernoulli Distributions
- A mixture would then be
    p(x \mid \mu, \pi) = \sum_{k=1}^{K} \pi_k \, p(x \mid \mu_k)
- and
    \mathrm{E}[x] = \sum_{k=1}^{K} \pi_k \mu_k
    \mathrm{cov}[x] = \sum_{k=1}^{K} \pi_k \left\{ \Sigma_k + \mu_k \mu_k^{\mathrm{T}} \right\} - \mathrm{E}[x]\,\mathrm{E}[x]^{\mathrm{T}}
- Our objective function would be
    \ln p(X \mid \mu, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, p(x_n \mid \mu_k) \right\}
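A sketch of this objective for a Bernoulli mixture; the toy data X, μ, and π below are invented for illustration:

```python
import numpy as np

def bernoulli_mixture_log_likelihood(X, mu, pi):
    """ln p(X | mu, pi) = sum_n ln sum_k pi_k prod_i mu_ki^{x_ni} (1 - mu_ki)^{1 - x_ni}.

    X: (N, D) binary data, mu: (K, D) component means, pi: (K,) mixing weights.
    """
    # p(x_n | mu_k) for every data point n and component k -> shape (N, K)
    comp = np.prod(mu[None, :, :]**X[:, None, :] *
                   (1 - mu[None, :, :])**(1 - X[:, None, :]), axis=2)
    return np.log(comp @ pi).sum()

# Made-up toy values, purely for illustration.
X = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
mu = np.array([[0.8, 0.2, 0.9], [0.1, 0.5, 0.6]])
pi = np.array([0.5, 0.5])
print(bernoulli_mixture_log_likelihood(X, mu, pi))
```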
EM for Bernoulli Mixtures
- Introduce an unobserved latent variable z (1-of-K coded), so that
    p(x \mid z, \mu) = \prod_{k=1}^{K} p(x \mid \mu_k)^{z_k}
- and the distribution over the latent variable is
    p(z \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_k}
- The complete-data objective function is then
    \ln p(X, Z \mid \mu, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
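For completeness, a sketch that evaluates this complete-data log likelihood when the assignments z_nk are known (one-hot rows); in EM they are of course replaced by their expectations, as the next slide shows:

```python
import numpy as np

def complete_data_log_likelihood(X, Z, mu, pi):
    """ln p(X, Z | mu, pi) for known one-hot assignments Z of shape (N, K).

    Implements sum_n sum_k z_nk { ln pi_k + sum_i [ x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki) ] }.
    Assumes mu strictly inside (0, 1) so the logarithms are finite.
    """
    per_nk = np.log(pi)[None, :] + X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
    return np.sum(Z * per_nk)
```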
EM for Bernoulli Mixtures
- As before we can compute the responsibility γ
    \gamma(z_{nk}) = \mathrm{E}[z_{nk}] = \frac{\pi_k \, p(x_n \mid \mu_k)}{\sum_{j=1}^{K} \pi_j \, p(x_n \mid \mu_j)}
- From this we can derive a structure as seen earlier
    N_k = \sum_{n=1}^{N} \gamma(z_{nk})
    \mu_k = \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n
    \pi_k = \frac{N_k}{N}
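A minimal sketch of the corresponding E- and M-steps, assuming binary data X of shape (N, D), component means mu of shape (K, D), and weights pi of shape (K,):

```python
import numpy as np

def e_step(X, mu, pi):
    """Responsibilities gamma(z_nk) = pi_k p(x_n|mu_k) / sum_j pi_j p(x_n|mu_j)."""
    comp = np.prod(mu[None]**X[:, None] * (1 - mu[None])**(1 - X[:, None]), axis=2)  # (N, K)
    weighted = comp * pi[None, :]
    return weighted / weighted.sum(axis=1, keepdims=True)

def m_step(X, gamma):
    """N_k = sum_n gamma(z_nk);  mu_k = (1/N_k) sum_n gamma(z_nk) x_n;  pi_k = N_k / N."""
    Nk = gamma.sum(axis=0)                 # effective number of points per component
    mu = (gamma.T @ X) / Nk[:, None]       # responsibility-weighted means
    pi = Nk / X.shape[0]
    return mu, pi
```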
Small Bernoulli Mixture Example
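The slide itself shows only a figure; as a hedged stand-in, the following self-contained sketch samples synthetic binary data from two made-up Bernoulli components and alternates the E- and M-steps from the previous slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ground truth: two Bernoulli components over D = 5 binary variables.
true_mu = np.array([[0.9, 0.8, 0.1, 0.1, 0.2],
                    [0.1, 0.2, 0.9, 0.8, 0.9]])
labels = rng.integers(0, 2, size=200)
X = (rng.random((200, 5)) < true_mu[labels]).astype(float)  # x_ni ~ Bernoulli(mu_{k,i})

# Random initialization, then alternate the E- and M-steps.
mu = rng.uniform(0.3, 0.7, size=(2, 5))
pi = np.array([0.5, 0.5])
for _ in range(50):
    comp = np.prod(mu[None]**X[:, None] * (1 - mu[None])**(1 - X[:, None]), axis=2)
    gamma = comp * pi
    gamma /= gamma.sum(axis=1, keepdims=True)   # E-step: responsibilities
    Nk = gamma.sum(axis=0)
    mu = (gamma.T @ X) / Nk[:, None]            # M-step: component means
    pi = Nk / X.shape[0]                        #         mixing weights

print("estimated mixing weights:", pi)
print("estimated component means:\n", mu.round(2))
```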
Bayesian Linear Regression
- We have
    p(w \mid t) = \mathcal{N}(w \mid m_N, S_N)
  where
    m_N = S_N \left( S_0^{-1} m_0 + \beta \Phi^{\mathrm{T}} t \right)
    S_N^{-1} = S_0^{-1} + \beta \Phi^{\mathrm{T}} \Phi
- The complete-data log likelihood is then
    \ln p(t, w \mid \alpha, \beta) = \ln p(t \mid w, \beta) + \ln p(w \mid \alpha)
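A sketch of the posterior computation, assuming the common zero-mean isotropic prior m_0 = 0, S_0 = α⁻¹I (an assumption, consistent with the EM updates on the next slide):

```python
import numpy as np

def posterior_w(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for Bayesian linear regression.

    Assumes the zero-mean isotropic prior p(w | alpha) = N(0, alpha^{-1} I),
    i.e. m_0 = 0 and S_0 = alpha^{-1} I, so
        S_N^{-1} = alpha I + beta Phi^T Phi,   m_N = beta S_N Phi^T t.
    """
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N
```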
EM for Bayesian Linear Regression
- In the E-step, compute the posterior for w
- In the M-step, compute α and β given w
- We can derive (see book)
    \alpha = \frac{M}{m_N^{\mathrm{T}} m_N + \mathrm{Tr}(S_N)}
  and a similar expression for β
- For the effective number of well-determined parameters we likewise get
    \gamma = M - \alpha \, \mathrm{Tr}(S_N)
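A sketch of the resulting EM iteration for α and β; the β update written below is the standard analogous expression and is included here as an assumption, since the slide only states that a similar expression exists:

```python
import numpy as np

def em_evidence(Phi, t, alpha=1.0, beta=1.0, n_iter=50):
    """EM-style re-estimation of alpha and beta (a sketch; assumes the
    zero-mean isotropic prior used in the previous sketch)."""
    N, M = Phi.shape
    for _ in range(n_iter):
        # E-step: posterior over w for the current alpha, beta
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # M-step: alpha = M / (m_N^T m_N + Tr(S_N)), plus the analogous beta update
        alpha = M / (m_N @ m_N + np.trace(S_N))
        resid = t - Phi @ m_N
        beta = N / (resid @ resid + np.trace(Phi.T @ Phi @ S_N))
    return alpha, beta, m_N, S_N
```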
A general version of EM
- The general problem we are trying to address:
  - We have a set of observed variables X
  - We have a set of latent variables Z
  - We have a model parameter set θ
  - Goal: maximize p(X | θ)
- Assumption: it is hard to optimize p(X | θ) directly, but easier to optimize p(X, Z | θ)
- Let's assume we can define a distribution q(Z)
A general version of EM
- We are trying to optimize
    \ln p(X \mid \theta) = \ln p(X, Z \mid \theta) - \ln p(Z \mid X, \theta)
- We can rewrite this as
    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)
  where
    \mathcal{L}(q, \theta) = \sum_Z q(Z) \ln \left\{ \frac{p(X, Z \mid \theta)}{q(Z)} \right\}
    \mathrm{KL}(q \,\|\, p) = - \sum_Z q(Z) \ln \left\{ \frac{p(Z \mid X, \theta)}{q(Z)} \right\}
- So L(q, θ) is a lower bound on ln p(X | θ) built from the joint distribution, and KL is the Kullback-Leibler divergence between q(Z) and p(Z | X, θ)
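A tiny numerical check of this decomposition for a discrete latent variable; all probabilities below are made up for illustration:

```python
import numpy as np

# Toy check of ln p(X|theta) = L(q, theta) + KL(q || p) for a discrete latent Z.
joint = np.array([0.10, 0.25, 0.05])   # made-up p(X, Z=k | theta) for the observed X
p_x = joint.sum()                       # p(X | theta) = sum_Z p(X, Z | theta)
post = joint / p_x                      # p(Z | X, theta)

q = np.array([0.5, 0.3, 0.2])           # any distribution q(Z)

L = np.sum(q * np.log(joint / q))       # L(q, theta)  = sum_Z q ln{ p(X,Z|theta) / q }
KL = -np.sum(q * np.log(post / q))      # KL(q || p)   = -sum_Z q ln{ p(Z|X,theta) / q }

print(np.log(p_x), L + KL)              # the two numbers agree
```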
A general version of EM
- We can now formulate the general algorithm
- The E-step maximizes L(q, θ) with respect to q(Z) for fixed θ
- The M-step maximizes L(q, θ) with respect to θ for fixed q(Z)
Summary
- Expectation maximization is widely used in robotics and in estimation in general
- Basically iterative generation of a model and optimization of that model
- Particularly useful for estimation with mixture models: optimize the models and the mixture coefficients iteratively rather than in batch
- An important tool to have available for estimation and learning
A useful reference
M. J. Wainwright & M. Jordan, "Graphical Models, Exponential Families and Variational Inference," Foundations and Trends in Machine Learning, Vol. 1, Nos. 1-2, 2008.
http://www.nowpublishers.com/product.aspx?product=MAL