K-Means, Gaussian Mixture Models, and Expectation-Maximization
Greg Mori - CMPT 419/726
Bishop PRML Ch. 9
Learning Parameters of Probability Distributions
• We discussed probabilistic models at length
• In Assignment 3 you showed that, given fully observed training data, setting the parameters θ_i of probability distributions is straightforward
• However, in many settings not all variables are observed (labelled) in the training data: x_i = (x_i, h_i), where h_i is unobserved
  • e.g. Speech recognition: we have speech signals, but not phoneme labels
  • e.g. Object recognition: we have object labels (car, bicycle), but not part labels (wheel, door, seat)
• Unobserved variables are called latent variables
[figs from Fergus et al.]
Outline
• K-Means
• Gaussian Mixture Models
• Expectation-Maximization
Unsupervised Learning
• We will start with an unsupervised learning (clustering) problem:
• Given a dataset {x_1, ..., x_N}, each x_i ∈ R^D, partition the dataset into K clusters
• Intuitively, a cluster is a group of points that are close together and far from other points
Distortion Measure
• Formally, introduce prototypes (or cluster centers) µ_k ∈ R^D
• Use binary indicators r_nk: 1 if point n is in cluster k, 0 otherwise (1-of-K coding scheme again)
• Find {µ_k}, {r_nk} to minimize the distortion measure:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
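As a concrete illustration (not from the slides), the distortion measure can be computed directly with NumPy; the array names and shapes below are assumptions for this sketch.

```python
import numpy as np

def distortion(X, mu, r):
    """Compute J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X:  (N, D) data points
    mu: (K, D) cluster centers
    r:  (N, K) binary membership indicators
    """
    # Squared distances between every point and every center: shape (N, K)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())
```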
Minimizing Distortion Measure
• Minimizing J directly is hard:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• However, two things are easy:
  • If we know µ_k, minimizing J wrt r_nk
  • If we know r_nk, minimizing J wrt µ_k
• This suggests an iterative procedure:
  • Start with an initial guess for µ_k
  • Iterate two steps:
    • Minimize J wrt r_nk
    • Minimize J wrt µ_k
  • Rinse and repeat until convergence
Determining Membership Variables
• Step 1 in an iteration of K-means is to minimize the distortion measure J wrt the cluster membership variables r_nk:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• Terms for different data points x_n are independent; for each data point, set r_nk to minimize
  Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• Simply set r_nk = 1 for the cluster center µ_k with the smallest distance
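A minimal sketch of this assignment step in NumPy (the helper name and shapes are assumptions, not the slides' implementation): each point is assigned to its nearest center.

```python
import numpy as np

def assign_clusters(X, mu):
    """Step 1: set r_nk = 1 for the nearest center, 0 otherwise."""
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = sq_dists.argmin(axis=1)                               # index of closest center, (N,)
    r = np.zeros_like(sq_dists)
    r[np.arange(X.shape[0]), nearest] = 1.0
    return r
```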
Determining Cluster Centers
• Step 2: fix r_nk, minimize J wrt the cluster centers µ_k:
  J = Σ_{k=1}^K Σ_{n=1}^N r_nk ||x_n − µ_k||^2   (switch order of sums)
• So we can minimize wrt each µ_k separately
• Take the derivative, set it to zero:
  2 Σ_{n=1}^N r_nk (x_n − µ_k) = 0
  ⇔ µ_k = (Σ_n r_nk x_n) / (Σ_n r_nk)
  i.e. the mean of the data points x_n assigned to cluster k
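Again as an illustrative sketch (the helper name and the guard against empty clusters are assumptions, not from the slides), this update is essentially one line per center in NumPy:

```python
import numpy as np

def update_centers(X, r):
    """Step 2: mu_k = (sum_n r_nk x_n) / (sum_n r_nk)."""
    counts = r.sum(axis=0)                 # (K,) number of points per cluster
    counts = np.maximum(counts, 1e-12)     # avoid division by zero for empty clusters (assumption)
    return (r.T @ X) / counts[:, None]     # (K, D) cluster means
```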
K-means Algorithm
• Start with an initial guess for µ_k
• Iterate two steps:
  • Minimize J wrt r_nk: assign points to the nearest cluster center
  • Minimize J wrt µ_k: set each cluster center to the average of the points in its cluster
• Rinse and repeat until convergence
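Putting the two steps together, a minimal K-means loop might look like the following sketch (random initialization from the data points is an assumption; other schemes work too). It reuses the assign_clusters and update_centers helpers above.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the two minimization steps until assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(X.shape[0], size=K, replace=False)]  # initial guess for centers
    r = assign_clusters(X, mu)
    for _ in range(max_iters):
        mu = update_centers(X, r)        # minimize J wrt mu_k
        r_new = assign_clusters(X, mu)   # minimize J wrt r_nk
        if np.array_equal(r_new, r):     # no change in membership: stop
            break
        r = r_new
    return mu, r
```

For example, kmeans(X, 2) on a two-dimensional dataset reproduces the alternating assign/update behaviour illustrated on the following slides.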
K-means example
[figure: panels (a)-(i) show successive K-means iterations on a 2D dataset, alternating assignment and cluster-center updates]
Next step doesn't change membership – stop
K-means Convergence
• Repeat the steps until there is no change in cluster assignments
• At each step, the value of J either goes down or we stop
• There is a finite number of possible assignments of data points to clusters, so we are guaranteed to converge eventually
• Note it may be a local minimum of J rather than the global minimum to which we converge
K-means Example - Image Segmentation
[figure: original image and K-means segmentations for several values of K]
• K-means clustering on pixel colour values
• Pixels in a cluster are coloured by the cluster mean
• Represent each pixel (e.g. a 24-bit colour value) by a cluster number (e.g. 4 bits for K = 10): a compressed version
• This technique is known as vector quantization
  • Represent a vector (in this case an RGB colour in R^3) by a single discrete value
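As an illustrative sketch of this idea (the function name, parameter choices, and the kmeans helper above are assumptions, not the slides' implementation), colour quantization of an image could look like:

```python
import numpy as np

def quantize_image(img, K=10):
    """Replace each pixel's colour by the mean colour of its cluster.

    img: (H, W, 3) array of RGB values.
    Returns the quantized image and the per-pixel cluster indices.
    """
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)   # (H*W, 3) points in R^3
    mu, r = kmeans(pixels, K)                   # cluster the pixel colours
    labels = r.argmax(axis=1)                   # one discrete code per pixel
    quantized = mu[labels].reshape(H, W, 3)     # colour each pixel by its cluster mean
    return quantized, labels.reshape(H, W)
```

Storing the labels (about 4 bits per pixel for K = 10) plus the K cluster colours is the compressed representation.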
Outline
• K-Means
• Gaussian Mixture Models
• Expectation-Maximization
Hard Assignment vs. Soft Assignment
• In the K-means algorithm, a hard assignment of points to clusters is made
• However, for points near the decision boundary, this may not be such a good idea
• Instead, we could think about making a soft assignment of points to clusters
Gaussian Mixture Model
[figure: (a), (b) samples drawn from a mixture of three Gaussians]
• The Gaussian mixture model (or mixture of Gaussians, MoG) models the data as a combination of Gaussians
• The figure above shows a dataset generated by drawing samples from three different Gaussians
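For reference, the standard form of the mixture density (developed in detail later; included here as the usual definition) is a weighted combination of K Gaussian components:
  p(x) = Σ_{k=1}^K π_k N(x | µ_k, Σ_k),   with π_k ≥ 0 and Σ_{k=1}^K π_k = 1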
Generative Model
[figure: graphical model with latent variable z and observed variable x]
• The mixture of Gaussians is a generative model
• To generate a datapoint x_n, we first generate a value for a discrete variable z_n ∈ {1, ..., K}
• We then generate a value x_n ∼ N(x | µ_k, Σ_k) for the corresponding Gaussian component k = z_n
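A minimal sketch of this generative process in NumPy (the mixing coefficients pi, means mu, and covariances Sigma are assumed inputs, not values from the slides):

```python
import numpy as np

def sample_gmm(N, pi, mu, Sigma, seed=0):
    """Generate N points from a mixture of Gaussians.

    pi:    (K,) mixing coefficients, summing to 1
    mu:    (K, D) component means
    Sigma: (K, D, D) component covariances
    """
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=N, p=pi)  # draw component z_n for each point
    X = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])  # draw x_n | z_n
    return X, z
```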