Mixture Models and EM
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu
Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary
Introduction
In many cases the uni-modal assumption of a single normal distribution is a major limitation, e.g., when handling multiple hypotheses or modelling multiple instances (people, ...).
Mixtures of Gaussians are a way to model richer distributions.
A mixture of Gaussians can be viewed as a model with latent variables.
Expectation Maximization (EM) is a general technique for finding maximum likelihood estimates in models with latent variables.
Mixture models are widely used for clustering data.
K-means is another clustering technique that has similarities to EM.
K-means Clustering
Consider clustering the data {x_1, x_2, ..., x_N} into K groups.
Assume for now that each data point lies in a D-dimensional Euclidean space.
Each cluster is represented by a "center" estimate μ_i.
Challenge: how to find an optimal assignment of data points to clusters?
Introduce an indicator variable r_ni ∈ {0, 1}, known as the 1-of-K coding.
K-means - Objective Function
We can then define an objective function / distortion measure

J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, \| x_n - \mu_i \|^2

This is the sum of squared distances from each point to its assigned "center".
Goal: find the assignments r_ni and the cluster centers μ_i that minimize J.
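The distortion measure can be evaluated directly. Below is a minimal sketch (not part of the slides), assuming data X of shape (N, D), centers mu of shape (K, D), and a hard 1-of-K assignment matrix r of shape (N, K).

```python
import numpy as np

def distortion(X, mu, r):
    # Squared Euclidean distance from every point to every center, shape (N, K).
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Only the assigned center contributes for each point.
    return float((r * sq_dist).sum())
```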
Iterative Algorithm
1. Choose initial values for μ_i
2. Minimize J with respect to r_ni
3. Minimize J with respect to μ_i
4. Repeat steps 2-3 until convergence
Algorithm Details
Consider the indicator

r_{ni} = \begin{cases} 1 & \text{if } i = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}

The extremum of J with respect to μ_i is then defined by

2 \sum_{n=1}^{N} r_{ni} ( x_n - \mu_i ) = 0

or

\mu_i = \frac{\sum_n r_{ni} \, x_n}{\sum_n r_{ni}}

So μ_i is the mean of the points assigned to the i-th cluster, hence the name k-means. A sketch of the resulting two-step iteration is shown below.
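The following batch K-means loop alternates the assignment and update steps above. The initialization by drawing K distinct data points, the iteration cap, and the convergence tolerance are assumptions added for illustration.

```python
import numpy as np

def kmeans(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centers
    for _ in range(n_iter):
        # Assignment step: r_ni = 1 for the nearest center.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        if np.linalg.norm(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu, labels
```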
Small Example
Figure: K-means on a two-dimensional dataset; panels (a)-(i) show successive assignment and center-update steps.
Objective Function
Figure: the distortion measure J plotted against iteration number (1-4); J decreases monotonically with each step.
Considerations
Iterating over all data points in every iteration can be a challenge.
"Smart" selection of candidate points for the cluster centers is important; even a uniformly random selection can be adequate.
Organizing the data in a graph/mesh structure can be essential for efficient access and handling of the data.
Iterative Updating
Sequential (online) updating can be organized as

\mu_i^{\text{new}} = \mu_i^{\text{old}} + \eta_t ( x_n - \mu_i^{\text{old}} )

where η_t is the learning rate, which typically decreases as more points are processed. A small sketch follows.
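A minimal sketch of one sequential update is given below; the decreasing schedule η_t = 1/(t+1) is an assumption, only the update rule itself comes from the slide.

```python
import numpy as np

def online_kmeans_step(x_n, mu, t):
    # Assign the new point to its nearest center.
    i = int(((mu - x_n) ** 2).sum(axis=1).argmin())
    eta = 1.0 / (t + 1)                      # decreasing learning rate
    mu[i] = mu[i] + eta * (x_n - mu[i])      # move the winning center toward x_n
    return mu
```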
Generalization of K-means
In general the Euclidean norm might not always be the best choice.
The generalized version of the objective / distortion function is

J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, D ( x_n , \mu_i )

Here D(·,·) is a dissimilarity measure, which can even provide robust rejection of outliers; see the sketch below.
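The sketch below illustrates the generalized objective with a pluggable dissimilarity; the L1 (city-block) default is just one example of a more outlier-robust choice and is not prescribed by the slides.

```python
import numpy as np

def generalized_distortion(X, mu, labels, dissim=None):
    # dissim(x, m) is any dissimilarity measure D(x, mu); default to L1 distance.
    if dissim is None:
        dissim = lambda x, m: np.abs(x - m).sum()
    return sum(dissim(x, mu[k]) for x, k in zip(X, labels))
```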
Example of Clustering - Image Compression
Figure: image compression by clustering pixel values; the original image is shown alongside the results for K = 2, K = 3, and K = 10.
Mixtures of Gaussians
Recall the original definition of a mixture

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )

Define an indicator variable z, a K-dimensional binary vector with elements z_k ∈ {0, 1}.
Only one of the dimensions has unit value: \sum_k z_k = 1.
Consider the joint p(x, z) and the conditional p(x | z).
We can then define p(z_k = 1) = π_k.
The Parameterization
We have for {π_k} that

0 \le \pi_k \le 1, \qquad \sum_k \pi_k = 1

p(z) can then be written as

p(z) = \prod_{k=1}^{K} \pi_k^{z_k}

Similarly p(x | z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), or

p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}( x \mid \mu_k , \Sigma_k )^{z_k}

\Rightarrow \quad p(x) = \sum_z p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )
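As a sketch, the mixture density can be evaluated directly from the parameters; the parameter shapes (pi of length K, mu of shape (K, D), full covariances Sigma of shape (K, D, D)) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, pi, mu, Sigma):
    # p(x) = sum_k pi_k * N(x | mu_k, Sigma_k) for a single point x of shape (D,)
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))
```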
Mixtures
So why all the extra machinery?
We can think of p(x) as the marginal of a joint distribution p(x, z), where z is a latent variable.
For later reference introduce p(z_k = 1 | x), also denoted γ(z_k):

\gamma( z_k ) = p( z_k = 1 \mid x ) = \frac{\pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x \mid \mu_j , \Sigma_j )}
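A sketch of computing the responsibilities for a whole dataset at once is shown below; vectorizing over the N points is a convenience, the normalization follows the formula above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mu, Sigma):
    # Unnormalized terms pi_k * N(x_n | mu_k, Sigma_k), shape (N, K).
    num = np.column_stack(
        [pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
         for k in range(len(pi))])
    # Normalize over components to obtain gamma(z_nk).
    return num / num.sum(axis=1, keepdims=True)
```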
Data Example
Figure: a two-dimensional data example on the unit square, shown in three panels (a)-(c).
Maximum Likelihood
Suppose we have a dataset X = {x_1, x_2, ..., x_N}.
How can we model it using a mixture model?

\ln p( X \mid \pi, \mu, \Sigma ) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k ) \right\}

Figure: the graphical model, with latent variables z_n and observations x_n inside a plate of size N, governed by the parameters π, μ, and Σ.
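The log-likelihood can be evaluated as sketched below; computing it through logsumexp is an assumption added for numerical stability and is not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, pi, mu, Sigma):
    # log(pi_k) + log N(x_n | mu_k, Sigma_k), shape (N, K).
    log_terms = np.column_stack(
        [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
         for k in range(len(pi))])
    # Sum over points of the log of the inner sum over components.
    return logsumexp(log_terms, axis=1).sum()
```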
EM for Gaussian Mixtures
Consider the extremum of ln p() with respect to μ_k:

\sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )} \, \Sigma_k^{-1} ( x_n - \mu_k ) = 0

\Rightarrow

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma( z_{nk} ) \, x_n

where

N_k = \sum_n \gamma( z_{nk} )
EM for Gaussian Mixtures
In a similar fashion we can compute the covariance

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma( z_{nk} ) ( x_n - \mu_k )( x_n - \mu_k )^T

If we maximize with respect to the mixing coefficients π_k, we need to optimize ln p while also respecting the constraint \sum_k \pi_k = 1.
Using a Lagrange multiplier we have

\ln p( X \mid \pi, \mu, \Sigma ) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)
EM for Gaussian Mixtures
Setting the derivative with respect to π_k to zero, we obtain

\sum_{n=1}^{N} \frac{\mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )} + \lambda = 0

which yields the intuitive solution

\pi_k = \frac{N_k}{N}
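Putting the three update equations together gives the M step. The sketch below assumes a responsibility matrix gamma of shape (N, K) has already been computed in the E step.

```python
import numpy as np

def m_step(X, gamma):
    N, D = X.shape
    N_k = gamma.sum(axis=0)                              # effective counts per component
    mu = (gamma.T @ X) / N_k[:, None]                    # new means
    Sigma = np.empty((len(N_k), D, D))
    for k in range(len(N_k)):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k]   # new covariances
    pi = N_k / N                                         # new mixing coefficients
    return pi, mu, Sigma
```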
EM for Gaussian Mixtures
Select initial values for π, μ, and Σ.
Perform an initial analysis (expectation step).
Re-estimate the parameter values (maximize the likelihood).
Iterate until convergence.
The Detailed Version
1. Initialize the parameters.
2. Evaluate the responsibilities (E step)

\gamma( z_{nk} ) = p( z_{nk} = 1 \mid x_n ) = \frac{\pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )}

3. Re-estimate the parameters μ_k^new, Σ_k^new, and π_k^new (M step).
4. Evaluate ln p( X | π, μ, Σ ) and check for convergence; if not converged, return to step 2.
A sketch of the full loop is shown below.
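A minimal end-to-end sketch of the loop follows; the random initialization, the small diagonal regularization of the covariances, and the convergence tolerance are assumptions added for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                          # uniform mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]      # random data points as means
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities and log-likelihood.
        log_terms = np.column_stack(
            [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
             for k in range(K)])
        ll = logsumexp(log_terms, axis=1).sum()
        gamma = np.exp(log_terms - logsumexp(log_terms, axis=1, keepdims=True))
        # M step: re-estimate pi, mu, Sigma.
        N_k = gamma.sum(axis=0)
        mu = (gamma.T @ X) / N_k[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k] + 1e-6 * np.eye(D)
        pi = N_k / N
        # Convergence check on the log-likelihood.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma
```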
Small Example
Figure: EM applied to a two-dimensional dataset; panels (a)-(f) show the fitted mixture after L = 1, 2, 5, and 20 iterations.