Mixture Models and EM
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu
Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary
Introduction
In many cases the uni-modal assumption of a single normal distribution is a major limitation, e.g., when handling multiple hypotheses or modelling multiple instances (people, ...).
Mixtures of Gaussians are a way to model richer distributions.
A mixture of Gaussians can be viewed as a model with latent variables.
Expectation Maximization (EM) is a general technique for finding maximum likelihood estimates in models with latent variables.
Mixture models are widely used for clustering data.
K-means is another clustering technique that has similarities to EM.
K-means Clustering
Consider clustering the data {x_1, x_2, ..., x_N} into K groups.
Assume for now that each data point lies in a D-dimensional Euclidean space.
Each cluster is represented by a "center" estimate μ_i.
Challenge: how to find an optimal assignment of data points to clusters?
Introduce an indicator variable r_ni ∈ {0, 1}, known as the 1-of-K coding.
K-means - Objective Function
We can then define an objective function / distortion measure

J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, \| x_n - \mu_i \|^2

This is the sum of squared distances from each point to its assigned "center".
Goal: find the assignments r_ni and the cluster centers μ_i that minimize J.
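The distortion measure can be evaluated directly. Below is a minimal sketch (not part of the slides), assuming data X of shape (N, D), centers mu of shape (K, D), and a hard 1-of-K assignment matrix r of shape (N, K).

```python
import numpy as np

def distortion(X, mu, r):
    # Squared Euclidean distance from every point to every center, shape (N, K).
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Only the assigned center contributes for each point.
    return float((r * sq_dist).sum())
```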
Iterative Algorithm
1. Choose initial values for μ_i
2. Minimize J with respect to r_ni
3. Minimize J with respect to μ_i
4. Repeat steps 2-3 until convergence
Algorithm Details
Consider the indicator

r_{ni} = \begin{cases} 1 & \text{if } i = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}

The extremum of J with respect to μ_i is then defined by

2 \sum_{n=1}^{N} r_{ni} ( x_n - \mu_i ) = 0

or

\mu_i = \frac{\sum_n r_{ni} \, x_n}{\sum_n r_{ni}}

So μ_i is the mean of the points assigned to the i-th cluster, hence the name k-means. A sketch of the resulting two-step iteration is shown below.
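The following batch K-means loop alternates the assignment and update steps above. The initialization by drawing K distinct data points, the iteration cap, and the convergence tolerance are assumptions added for illustration.

```python
import numpy as np

def kmeans(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centers
    for _ in range(n_iter):
        # Assignment step: r_ni = 1 for the nearest center.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        if np.linalg.norm(new_mu - mu) < tol:
            mu = new_mu
            break
        mu = new_mu
    return mu, labels
```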
Small Example
Figure: K-means on a two-dimensional dataset; panels (a)-(i) show successive assignment and center-update steps.
Objective Function
Figure: the distortion measure J plotted against iteration number (1-4); J decreases monotonically with each step.
Considerations
Iterating over all data points in every iteration can be a challenge.
"Smart" selection of candidate points for the cluster centers is important; even a uniformly random selection can be adequate.
Organizing the data in a graph/mesh structure can be essential for efficient access and handling of the data.
Iterative Updating
Sequential (online) updating can be organized as

\mu_i^{\text{new}} = \mu_i^{\text{old}} + \eta_t ( x_n - \mu_i^{\text{old}} )

where η_t is the learning rate, which typically decreases as more points are processed. A small sketch follows.
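A minimal sketch of one sequential update is given below; the decreasing schedule η_t = 1/(t+1) is an assumption, only the update rule itself comes from the slide.

```python
import numpy as np

def online_kmeans_step(x_n, mu, t):
    # Assign the new point to its nearest center.
    i = int(((mu - x_n) ** 2).sum(axis=1).argmin())
    eta = 1.0 / (t + 1)                      # decreasing learning rate
    mu[i] = mu[i] + eta * (x_n - mu[i])      # move the winning center toward x_n
    return mu
```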
Generalization of K-means
In general the Euclidean norm might not always be the best choice.
The generalized version of the objective / distortion function is

J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, D ( x_n , \mu_i )

Here D(·,·) is a dissimilarity measure, which can even provide robust rejection of outliers; see the sketch below.
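The sketch below illustrates the generalized objective with a pluggable dissimilarity; the L1 (city-block) default is just one example of a more outlier-robust choice and is not prescribed by the slides.

```python
import numpy as np

def generalized_distortion(X, mu, labels, dissim=None):
    # dissim(x, m) is any dissimilarity measure D(x, mu); default to L1 distance.
    if dissim is None:
        dissim = lambda x, m: np.abs(x - m).sum()
    return sum(dissim(x, mu[k]) for x, k in zip(X, labels))
```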
Example of Clustering - Image Compression
Figure: image compression by clustering pixel values; the original image is shown alongside the results for K = 2, K = 3, and K = 10.
Mixtures of Gaussians
Recall the original definition of a mixture

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )

Define an indicator variable z, a K-dimensional binary vector with elements z_k ∈ {0, 1}.
Only one of the dimensions has unit value: \sum_k z_k = 1.
Consider the joint p(x, z) and the conditional p(x | z).
We can then define p(z_k = 1) = π_k.
The Parameterization
We have for {π_k} that

0 \le \pi_k \le 1, \qquad \sum_k \pi_k = 1

p(z) can then be written as

p(z) = \prod_{k=1}^{K} \pi_k^{z_k}

Similarly p(x | z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k), or

p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}( x \mid \mu_k , \Sigma_k )^{z_k}

\Rightarrow \quad p(x) = \sum_z p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )
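As a sketch, the mixture density can be evaluated directly from the parameters; the parameter shapes (pi of length K, mu of shape (K, D), full covariances Sigma of shape (K, D, D)) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, pi, mu, Sigma):
    # p(x) = sum_k pi_k * N(x | mu_k, Sigma_k) for a single point x of shape (D,)
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))
```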
Mixtures
So why all the extra machinery?
We can think of p(x) as the marginal of a joint distribution p(x, z), where z is a latent variable.
For later reference introduce p(z_k = 1 | x), also denoted γ(z_k):

\gamma( z_k ) = p( z_k = 1 \mid x ) = \frac{\pi_k \, \mathcal{N}( x \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x \mid \mu_j , \Sigma_j )}
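A sketch of computing the responsibilities for a whole dataset at once is shown below; vectorizing over the N points is a convenience, the normalization follows the formula above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mu, Sigma):
    # Unnormalized terms pi_k * N(x_n | mu_k, Sigma_k), shape (N, K).
    num = np.column_stack(
        [pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
         for k in range(len(pi))])
    # Normalize over components to obtain gamma(z_nk).
    return num / num.sum(axis=1, keepdims=True)
```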
Data Example
Figure: a two-dimensional data example on the unit square, shown in three panels (a)-(c).
Maximum Likelihood
Suppose we have a dataset X = {x_1, x_2, ..., x_N}.
How can we model it using a mixture model?

\ln p( X \mid \pi, \mu, \Sigma ) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k ) \right\}

Figure: the graphical model, with latent variables z_n and observations x_n inside a plate of size N, governed by the parameters π, μ, and Σ.
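The log-likelihood can be evaluated as sketched below; computing it through logsumexp is an assumption added for numerical stability and is not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, pi, mu, Sigma):
    # log(pi_k) + log N(x_n | mu_k, Sigma_k), shape (N, K).
    log_terms = np.column_stack(
        [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
         for k in range(len(pi))])
    # Sum over points of the log of the inner sum over components.
    return logsumexp(log_terms, axis=1).sum()
```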
EM for Gaussian Mixtures
Consider the extremum of ln p() with respect to μ_k:

\sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )} \, \Sigma_k^{-1} ( x_n - \mu_k ) = 0

\Rightarrow

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma( z_{nk} ) \, x_n

where

N_k = \sum_n \gamma( z_{nk} )
EM for Gaussian Mixtures
In a similar fashion we can compute the covariance

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma( z_{nk} ) ( x_n - \mu_k )( x_n - \mu_k )^T

If we maximize with respect to the mixing coefficients π_k, we need to optimize ln p while also respecting the constraint \sum_k \pi_k = 1.
Using a Lagrange multiplier we have

\ln p( X \mid \pi, \mu, \Sigma ) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)
EM for Gaussian Mixtures
Setting the derivative with respect to π_k to zero, we obtain

\sum_{n=1}^{N} \frac{\mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )} + \lambda = 0

which yields the intuitive solution

\pi_k = \frac{N_k}{N}
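Putting the three update equations together gives the M step. The sketch below assumes a responsibility matrix gamma of shape (N, K) has already been computed in the E step.

```python
import numpy as np

def m_step(X, gamma):
    N, D = X.shape
    N_k = gamma.sum(axis=0)                              # effective counts per component
    mu = (gamma.T @ X) / N_k[:, None]                    # new means
    Sigma = np.empty((len(N_k), D, D))
    for k in range(len(N_k)):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k]   # new covariances
    pi = N_k / N                                         # new mixing coefficients
    return pi, mu, Sigma
```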
EM for Gaussian Mixtures
Select initial values for π, μ, and Σ.
Perform an initial analysis (expectation step).
Re-estimate the parameter values (maximize the likelihood).
Iterate until convergence.
The Detailed Version
1. Initialize the parameters.
2. Evaluate the responsibilities (E step)

\gamma( z_{nk} ) = p( z_{nk} = 1 \mid x_n ) = \frac{\pi_k \, \mathcal{N}( x_n \mid \mu_k , \Sigma_k )}{\sum_j \pi_j \, \mathcal{N}( x_n \mid \mu_j , \Sigma_j )}

3. Re-estimate the parameters μ_k^new, Σ_k^new, and π_k^new (M step).
4. Evaluate ln p( X | π, μ, Σ ) and check for convergence; if not converged, return to step 2.
A sketch of the full loop is shown below.
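A minimal end-to-end sketch of the loop follows; the random initialization, the small diagonal regularization of the covariances, and the convergence tolerance are assumptions added for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                          # uniform mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]      # random data points as means
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities and log-likelihood.
        log_terms = np.column_stack(
            [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
             for k in range(K)])
        ll = logsumexp(log_terms, axis=1).sum()
        gamma = np.exp(log_terms - logsumexp(log_terms, axis=1, keepdims=True))
        # M step: re-estimate pi, mu, Sigma.
        N_k = gamma.sum(axis=0)
        mu = (gamma.T @ X) / N_k[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k] + 1e-6 * np.eye(D)
        pi = N_k / N
        # Convergence check on the log-likelihood.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma
```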
Small Example
Figure: EM applied to a two-dimensional dataset; panels (a)-(f) show the fitted mixture after L = 1, 2, 5, and 20 iterations.