CS 6316 Machine Learning
Clustering
Yangfeng Ji
Department of Computer Science
University of Virginia
Clustering
Clustering
Clustering is the task of grouping a set of objects such that similar objects end up in the same group and dissimilar objects are separated into different groups [Shalev-Shwartz and Ben-David, 2014, Page 307]
Motivation
A good clustering can help us understand the data [MacKay, 2003, Chap 20]
Motivation (II)
A good clustering has predictive power and can be useful for building better classifiers [MacKay, 2003, Chap 20]
Motivation (III)
Failures of a cluster model may highlight interesting properties of the data or of a single data point [MacKay, 2003, Chap 20]
Challenges
◮ Lack of ground truth, as in any other unsupervised learning task
◮ Definition of the similarity measurement
◮ When are two images similar?
◮ When are two documents similar?
[Shalev-Shwartz and Ben-David, 2014, Page 307]
K-Means Clustering
K-Means Clustering
◮ A data set $S = \{x_1, \dots, x_m\}$ with $x_i \in \mathbb{R}^d$
◮ Partition the data set into some number $K$ of clusters
◮ $K$ is a hyper-parameter given before learning
◮ Another example task of unsupervised learning
Objective Function
◮ Introduce $r_i \in [K]$ for each data point $x_i$, which is a deterministic variable
◮ The objective function of K-means clustering
$J(r, \mu) = \sum_{i=1}^{m} \sum_{k=1}^{K} \delta(r_i = k)\, \|x_i - \mu_k\|_2^2$  (1)
where $\{\mu_k\}_{k=1}^{K} \subset \mathbb{R}^d$. Each $\mu_k$ is called a prototype associated with the $k$-th cluster.
◮ Learning: minimize Equation 1
$\operatorname*{argmin}_{r, \mu} J(r, \mu)$  (2)
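Below is a minimal NumPy sketch of evaluating the objective in Equation 1; the array names X, r, and mu are illustrative assumptions, not part of the slides.

```python
import numpy as np

def kmeans_objective(X, r, mu):
    """Compute J(r, mu) = sum_i ||x_i - mu_{r_i}||_2^2 for hard assignments r."""
    # mu[r] selects the prototype assigned to each data point (Eq. 1)
    diffs = X - mu[r]
    return np.sum(diffs ** 2)
```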
Learning: Initialization
Randomly initialize $\{\mu_k\}_{k=1}^{K}$
Learning: Assignment Step
Given $\{\mu_k\}_{k=1}^{K}$, finding the value of $r_i$ for each $x_i$ is equivalent to assigning the data point to a cluster
$r_i \leftarrow \operatorname*{argmin}_{k'} \|x_i - \mu_{k'}\|_2^2$  (3)
Learning: Update Step
Given $\{r_i\}_{i=1}^{m}$, the algorithm updates $\mu_k$ as
$\mu_k = \frac{\sum_{i=1}^{m} \delta(r_i = k)\, x_i}{\sum_{i=1}^{m} \delta(r_i = k)}$  (4)
◮ The updated $\mu_k$ equals the mean of all data points assigned to the $k$-th cluster
Algorithm
With some randomly initialized $\{\mu_k\}_{k=1}^{K}$, iterate the following two steps until convergence
Assignment Step: assign $r_i$ for each $x_i$
$r_i \leftarrow \operatorname*{argmin}_{k'} \|x_i - \mu_{k'}\|_2^2$  (5)
Update Step: update $\mu_k$ with $\{r_i\}_{i=1}^{m}$
$\mu_k = \frac{\sum_{i=1}^{m} \delta(r_i = k)\, x_i}{\sum_{i=1}^{m} \delta(r_i = k)}$  (6)
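The following is a minimal NumPy sketch of the two-step iteration above; it is an illustrative implementation under simple assumptions (prototypes initialized from random data points), not the course's demo code.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the assignment step (Eq. 5) and update step (Eq. 6)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick K distinct data points as the initial prototypes
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest prototype in squared Euclidean distance
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (m, K)
        r = dists.argmin(axis=1)
        # Update step: each prototype becomes the mean of its assigned points
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: assignments no longer move the prototypes
            break
        mu = new_mu
    return r, mu
```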
Example (Cont.)
From GMMs to K-means
Gaussian Mixture Models
Consider a GMM with two components
$q(x, z) = q(z)\, q(x \mid z) = \left[\alpha \cdot \mathcal{N}(x; \mu_1, \Sigma_1)\right]^{\delta(z=1)} \cdot \left[(1 - \alpha) \cdot \mathcal{N}(x; \mu_2, \Sigma_2)\right]^{\delta(z=2)}$  (7)
And the marginal probability $q(x)$ is
$q(x) = q(z=1)\, q(x \mid z=1) + q(z=2)\, q(x \mid z=2) = \alpha \cdot \mathcal{N}(x; \mu_1, \Sigma_1) + (1 - \alpha) \cdot \mathcal{N}(x; \mu_2, \Sigma_2)$  (8)
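As a quick illustration of Equation 8, the snippet below evaluates the two-component mixture density with SciPy; the specific parameter values are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Evaluate the marginal density q(x) in Eq. 8 for a two-component GMM
# (alpha, the means, and the covariances here are illustrative values).
alpha = 0.4
mu1, Sigma1 = np.array([1.5, 0.0]), np.eye(2)
mu2, Sigma2 = np.array([-1.5, 0.0]), np.eye(2)

def q_x(x):
    return (alpha * multivariate_normal.pdf(x, mean=mu1, cov=Sigma1)
            + (1 - alpha) * multivariate_normal.pdf(x, mean=mu2, cov=Sigma2))

print(q_x(np.array([0.0, 0.0])))  # mixture density at the origin
```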
A Special Case
Consider the first component in this GMM with parameters $\mu_1$ and $\Sigma_1$
◮ Assume $\Sigma_1 = \epsilon I$, then
$|\Sigma_1| = \epsilon^d$  (9)
$(x - \mu_1)^{\mathsf{T}} \Sigma_1^{-1} (x - \mu_1) = \frac{1}{\epsilon} \|x - \mu_1\|_2^2$  (10)
◮ The Gaussian component can be simplified as
$q(x_i \mid z_i = 1) = \frac{1}{(2\pi)^{d/2} |\Sigma_1|^{1/2}} \exp\left(-\frac{1}{2}(x_i - \mu_1)^{\mathsf{T}} \Sigma_1^{-1} (x_i - \mu_1)\right) = \frac{1}{(2\pi\epsilon)^{d/2}} \exp\left(-\frac{1}{2\epsilon} \|x_i - \mu_1\|_2^2\right)$  (11)
◮ Similar results hold for the second component with $\Sigma_2 = \epsilon I$
A Special Case (II)
From the previous discussion, we know that, given $\theta$, $q(z_i \mid x_i)$ is computed as
$q(z_i = 1 \mid x_i) = \frac{\alpha \cdot \mathcal{N}(x_i; \mu_1, \Sigma_1)}{\alpha \cdot \mathcal{N}(x_i; \mu_1, \Sigma_1) + (1 - \alpha) \cdot \mathcal{N}(x_i; \mu_2, \Sigma_2)} = \frac{\alpha \exp\left(-\frac{1}{2\epsilon}\|x_i - \mu_1\|_2^2\right)}{\alpha \exp\left(-\frac{1}{2\epsilon}\|x_i - \mu_1\|_2^2\right) + (1 - \alpha) \exp\left(-\frac{1}{2\epsilon}\|x_i - \mu_2\|_2^2\right)}$
◮ When $\epsilon \to 0$
$q(z_i = 1 \mid x_i) \to \begin{cases} 1 & \|x_i - \mu_1\|_2 < \|x_i - \mu_2\|_2 \\ 0 & \|x_i - \mu_1\|_2 > \|x_i - \mu_2\|_2 \end{cases}$  (12)
◮ $r_i$ in K-means is a very special case of $z_i$ in a GMM
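A small numerical check of Equation 12, using illustrative one-dimensional values: as $\epsilon$ shrinks, the responsibility $q(z_i = 1 \mid x_i)$ hardens into the K-means assignment.

```python
import numpy as np

# Illustrative check of Eq. 12: for a point closer to mu_1, the responsibility
# q(z=1 | x) approaches a hard assignment as epsilon shrinks.
mu1, mu2, alpha, x = 1.5, -1.5, 0.5, 0.8   # x is closer to mu1 than to mu2

for eps in [10.0, 1.0, 0.1, 0.01]:
    w1 = alpha * np.exp(-0.5 / eps * (x - mu1) ** 2)
    w2 = (1 - alpha) * np.exp(-0.5 / eps * (x - mu2) ** 2)
    print(f"eps={eps:5.2f}  q(z=1|x)={w1 / (w1 + w2):.4f}")
# The printed responsibility climbs toward 1.0, matching the limit in Eq. 12.
```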
When Will K-means Fail?
Recall that K-means is an extreme case of a GMM with $\Sigma = \epsilon I$ and $\epsilon \to 0$
Parameters
$\mu_1 = [1.5, 0]^{\mathsf{T}}, \quad \mu_2 = [-1.5, 0]^{\mathsf{T}}, \quad \Sigma_1 = \Sigma_2 = \mathrm{diag}(0.1, 10.0)$  (13)
When Will K-means Fail? (II)
Recall that K-means is an extreme case of a GMM with $\Sigma = \epsilon I$ and $\epsilon \to 0$
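One way to reproduce this failure case (a sketch, not the course demo code) is to sample from the GMM in Equation 13 and run K-means on the result; the clusters are elongated along the second axis, which violates K-means' implicit isotropic assumption, so the nearest-prototype rule splits them incorrectly.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample 200 points from each component of the GMM in Eq. 13
rng = np.random.default_rng(0)
cov = np.diag([0.1, 10.0])
X = np.vstack([
    rng.multivariate_normal([1.5, 0.0], cov, size=200),
    rng.multivariate_normal([-1.5, 0.0], cov, size=200),
])
true_labels = np.array([0] * 200 + [1] * 200)

# K-means with K=2; compare its partition against the generating components
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
acc = max((pred == true_labels).mean(), (pred != true_labels).mean())
print(f"cluster/label agreement: {acc:.2f}")  # typically well below 1.0 here
```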
How About GMM?
With the following setup¹
◮ Randomly initialize the GMM parameters (instead of using K-means to initialize)
◮ Set covariance_type to tied
¹ Please refer to the demo code for more detail
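A sketch of this setup with scikit-learn's GaussianMixture (not the actual course demo code; the remaining parameter values are assumptions):

```python
from sklearn.mixture import GaussianMixture

# Random initialization instead of K-means, and a tied covariance matrix
# shared by both components, which can capture the elongated clusters above.
gmm = GaussianMixture(
    n_components=2,
    covariance_type="tied",   # one full covariance shared across components
    init_params="random",     # random initialization of the responsibilities
    n_init=5,
    random_state=0,
)
gmm_pred = gmm.fit_predict(X)  # X from the failure-case sketch above
```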
Spectral Clustering
Instead of computing distances between data points and prototypes, spectral clustering is based purely on pairwise similarities between data points, which can address problems like this [Shalev-Shwartz and Ben-David, 2014, Section 22.3]
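For concreteness, here is a minimal sketch of similarity-based clustering with scikit-learn's SpectralClustering; the affinity and neighborhood choices are illustrative assumptions, not prescribed by the slides.

```python
from sklearn.cluster import SpectralClustering

# Build a nearest-neighbor similarity graph over the data points and cluster
# with its spectral embedding, rather than with distances to prototypes.
sc = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",  # similarity graph from k nearest neighbors
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=0,
)
sc_pred = sc.fit_predict(X)  # X as in the earlier sketches
```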
References
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
MacKay, D. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.