K-Means, Gaussian Mixture Models, and Expectation-Maximization
Greg Mori - CMPT 419/726
Bishop PRML Ch. 9
Learning Parameters of Probability Distributions
• We discussed probabilistic models at length
• In Assignment 3 you showed that, given fully observed training data, setting the parameters θ_i of probability distributions is straightforward
• However, in many settings not all variables are observed (labelled) in the training data: x_i = (x_i, h_i), where h_i is unobserved
  • e.g. Speech recognition: we have speech signals, but not phoneme labels
  • e.g. Object recognition: we have object labels (car, bicycle), but not part labels (wheel, door, seat)
• Unobserved variables are called latent variables
[figs from Fergus et al.]
Outline
• K-Means
• Gaussian Mixture Models
• Expectation-Maximization
Unsupervised Learning
• We will start with an unsupervised learning (clustering) problem:
• Given a dataset {x_1, ..., x_N}, each x_i ∈ R^D, partition the dataset into K clusters
• Intuitively, a cluster is a group of points that are close together and far from other points
Distortion Measure
• Formally, introduce prototypes (or cluster centers) µ_k ∈ R^D
• Use binary indicators r_nk: 1 if point n is in cluster k, 0 otherwise (1-of-K coding scheme again)
• Find {µ_k}, {r_nk} to minimize the distortion measure:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
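As a concrete illustration (not from the slides), the distortion measure can be computed directly with NumPy; the array names and shapes below are assumptions for this sketch.

```python
import numpy as np

def distortion(X, mu, r):
    """Compute J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X:  (N, D) data points
    mu: (K, D) cluster centers
    r:  (N, K) binary membership indicators
    """
    # Squared distances between every point and every center: shape (N, K)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())
```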
Minimizing Distortion Measure
• Minimizing J directly is hard:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• However, two things are easy:
  • If we know µ_k, minimizing J wrt r_nk
  • If we know r_nk, minimizing J wrt µ_k
• This suggests an iterative procedure:
  • Start with an initial guess for µ_k
  • Iterate two steps:
    • Minimize J wrt r_nk
    • Minimize J wrt µ_k
  • Rinse and repeat until convergence
Determining Membership Variables
• Step 1 in an iteration of K-means is to minimize the distortion measure J wrt the cluster membership variables r_nk:
  J = Σ_{n=1}^N Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• Terms for different data points x_n are independent; for each data point, set r_nk to minimize
  Σ_{k=1}^K r_nk ||x_n − µ_k||^2
• Simply set r_nk = 1 for the cluster center µ_k with the smallest distance
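A minimal sketch of this assignment step in NumPy (the helper name and shapes are assumptions, not the slides' implementation): each point is assigned to its nearest center.

```python
import numpy as np

def assign_clusters(X, mu):
    """Step 1: set r_nk = 1 for the nearest center, 0 otherwise."""
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = sq_dists.argmin(axis=1)                               # index of closest center, (N,)
    r = np.zeros_like(sq_dists)
    r[np.arange(X.shape[0]), nearest] = 1.0
    return r
```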
Determining Cluster Centers
• Step 2: fix r_nk, minimize J wrt the cluster centers µ_k:
  J = Σ_{k=1}^K Σ_{n=1}^N r_nk ||x_n − µ_k||^2   (switch order of sums)
• So we can minimize wrt each µ_k separately
• Take the derivative, set it to zero:
  2 Σ_{n=1}^N r_nk (x_n − µ_k) = 0
  ⇔ µ_k = (Σ_n r_nk x_n) / (Σ_n r_nk)
  i.e. the mean of the data points x_n assigned to cluster k
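Again as an illustrative sketch (the helper name and the guard against empty clusters are assumptions, not from the slides), this update is essentially one line per center in NumPy:

```python
import numpy as np

def update_centers(X, r):
    """Step 2: mu_k = (sum_n r_nk x_n) / (sum_n r_nk)."""
    counts = r.sum(axis=0)                 # (K,) number of points per cluster
    counts = np.maximum(counts, 1e-12)     # avoid division by zero for empty clusters (assumption)
    return (r.T @ X) / counts[:, None]     # (K, D) cluster means
```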
K-means Algorithm
• Start with an initial guess for µ_k
• Iterate two steps:
  • Minimize J wrt r_nk: assign points to the nearest cluster center
  • Minimize J wrt µ_k: set each cluster center to the average of the points in its cluster
• Rinse and repeat until convergence
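Putting the two steps together, a minimal K-means loop might look like the following sketch (random initialization from the data points is an assumption; other schemes work too). It reuses the assign_clusters and update_centers helpers above.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the two minimization steps until assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(X.shape[0], size=K, replace=False)]  # initial guess for centers
    r = assign_clusters(X, mu)
    for _ in range(max_iters):
        mu = update_centers(X, r)        # minimize J wrt mu_k
        r_new = assign_clusters(X, mu)   # minimize J wrt r_nk
        if np.array_equal(r_new, r):     # no change in membership: stop
            break
        r = r_new
    return mu, r
```

For example, kmeans(X, 2) on a two-dimensional dataset reproduces the alternating assign/update behaviour illustrated on the following slides.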
K-means example
[figure: panels (a)-(i) show successive K-means iterations on a 2D dataset, alternating assignment and cluster-center updates]
Next step doesn't change membership – stop
K-means Convergence
• Repeat the steps until there is no change in cluster assignments
• At each step, the value of J either goes down or we stop
• There is a finite number of possible assignments of data points to clusters, so we are guaranteed to converge eventually
• Note it may be a local minimum of J rather than the global minimum to which we converge
K-means Example - Image Segmentation
[figure: original image and K-means segmentations for several values of K]
• K-means clustering on pixel colour values
• Pixels in a cluster are coloured by the cluster mean
• Represent each pixel (e.g. a 24-bit colour value) by a cluster number (e.g. 4 bits for K = 10): a compressed version
• This technique is known as vector quantization
  • Represent a vector (in this case an RGB colour in R^3) by a single discrete value
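As an illustrative sketch of this idea (the function name, parameter choices, and the kmeans helper above are assumptions, not the slides' implementation), colour quantization of an image could look like:

```python
import numpy as np

def quantize_image(img, K=10):
    """Replace each pixel's colour by the mean colour of its cluster.

    img: (H, W, 3) array of RGB values.
    Returns the quantized image and the per-pixel cluster indices.
    """
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)   # (H*W, 3) points in R^3
    mu, r = kmeans(pixels, K)                   # cluster the pixel colours
    labels = r.argmax(axis=1)                   # one discrete code per pixel
    quantized = mu[labels].reshape(H, W, 3)     # colour each pixel by its cluster mean
    return quantized, labels.reshape(H, W)
```

Storing the labels (about 4 bits per pixel for K = 10) plus the K cluster colours is the compressed representation.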
Outline
• K-Means
• Gaussian Mixture Models
• Expectation-Maximization
Hard Assignment vs. Soft Assignment
• In the K-means algorithm, a hard assignment of points to clusters is made
• However, for points near the decision boundary, this may not be such a good idea
• Instead, we could think about making a soft assignment of points to clusters
Gaussian Mixture Model
[figure: (a), (b) samples drawn from a mixture of three Gaussians]
• The Gaussian mixture model (or mixture of Gaussians, MoG) models the data as a combination of Gaussians
• The figure above shows a dataset generated by drawing samples from three different Gaussians
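For reference, the standard form of the mixture density (developed in detail later; included here as the usual definition) is a weighted combination of K Gaussian components:
  p(x) = Σ_{k=1}^K π_k N(x | µ_k, Σ_k),   with π_k ≥ 0 and Σ_{k=1}^K π_k = 1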
Generative Model
[figure: graphical model with latent variable z and observed variable x]
• The mixture of Gaussians is a generative model
• To generate a datapoint x_n, we first generate a value for a discrete variable z_n ∈ {1, ..., K}
• We then generate a value x_n ∼ N(x | µ_k, Σ_k) for the corresponding Gaussian component k = z_n
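A minimal sketch of this generative process in NumPy (the mixing coefficients pi, means mu, and covariances Sigma are assumed inputs, not values from the slides):

```python
import numpy as np

def sample_gmm(N, pi, mu, Sigma, seed=0):
    """Generate N points from a mixture of Gaussians.

    pi:    (K,) mixing coefficients, summing to 1
    mu:    (K, D) component means
    Sigma: (K, D, D) component covariances
    """
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=N, p=pi)  # draw component z_n for each point
    X = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])  # draw x_n | z_n
    return X, z
```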