Unsupervised learning: Clustering and Dimensionality Reduction
Marta Arias (marias@cs.upc.edu), Dept. CS, UPC, Fall 2018
Clustering: partition input examples into similar subsets
Clustering: main challenges
◮ How to measure similarity?
◮ How many clusters?
◮ How do we evaluate the clusters?
Algorithms we will cover
◮ K-means
◮ Hierarchical clustering
K-means clustering: intuition
◮ The input data are:
  ◮ m examples x_1, .., x_m, and
  ◮ K, the number of desired clusters
◮ Clusters are represented by cluster centers µ_1, .., µ_K
◮ Given centers µ_1, .., µ_K, each center defines a cluster: the subset of inputs x_i that are closer to it than to any other center
K-means clustering: intuition
The aim is to find
◮ cluster centers µ_1, .., µ_K and
◮ a cluster assignment z = (z_1, .., z_m), where z_i ∈ {1, .., K}
  ◮ z_i is the cluster assigned to example x_i
such that µ_1, .., µ_K and z minimize the cost function

  J(µ_1, .., µ_K, z) = ∑_i ‖x_i − µ_{z_i}‖²
K-means clustering
Cost function:

  J(µ_1, .., µ_K, z) = ∑_i ‖x_i − µ_{z_i}‖²

Pseudocode
◮ Pick initial centers µ_1, .., µ_K at random
◮ Repeat until convergence:
  ◮ Optimize z in J(µ_1, .., µ_K, z) keeping µ_1, .., µ_K fixed:
    set each z_i to the closest center, z_i = argmin_k ‖x_i − µ_k‖²
  ◮ Optimize µ_1, .., µ_K in J(µ_1, .., µ_K, z) keeping z fixed:
    for each k = 1, .., K, set µ_k = (1 / |{i : z_i = k}|) ∑_{i : z_i = k} x_i
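This pseudocode maps directly to a few lines of NumPy. The sketch below is illustrative, not code from the slides; the function name kmeans, the random initialization from the data points, and the convergence test are my own choices:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate assignment and center-update steps until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # Pick K initial centers at random among the input points
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: z_i = argmin_k ||x_i - mu_k||^2
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = dists.argmin(axis=1)
        # Update step: mu_k = mean of the points assigned to cluster k
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: centers stopped moving
            break
        mu = new_mu
    # Final assignment with respect to the last centers
    z = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return mu, z
```

Usage: mu, z = kmeans(X, K=3) returns the centers and the assignment vector z.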
K-Means illustrated
Limitations of K-means
K-means works well if...
◮ clusters are spherical
◮ clusters are well separated
◮ clusters have similar volumes
◮ clusters have similar numbers of points
... so improve on it with a more general model:
◮ a mixture of Gaussians, learned using Expectation Maximization (EM); see the sketch below
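As a sketch of that improvement, scikit-learn's GaussianMixture fits a mixture of Gaussians with EM. The two elongated toy blobs below are invented for illustration; they violate the spherical-cluster assumption that plain K-means relies on:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, overlapping blobs that plain K-means handles poorly
X = np.vstack([rng.normal([0, 0], [3.0, 0.5], size=(200, 2)),
               rng.normal([1, 3], [0.5, 3.0], size=(200, 2))])

# Full covariance matrices let each component stretch along its own axes
gm = GaussianMixture(n_components=2, covariance_type='full').fit(X)
labels = gm.predict(X)        # hard cluster assignments
probs = gm.predict_proba(X)   # soft (probabilistic) assignments
```

Unlike K-means, the mixture model also yields soft assignments: each point gets a probability of belonging to each component.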
Hierarchical clustering: the output is a dendrogram
Agglomerative hierarchical clustering (bottom-up)
Pseudocode
1. Start with one cluster per example
2. Repeat until all examples are in one cluster:
   ◮ merge the two closest clusters
(The next example is from D. Blei's course at Princeton.)
Example: a 2-D dataset (axes V1, V2), followed by 24 iterations of agglomerative clustering that merge the two closest clusters at each step. [Scatter plots omitted; figures from D. Blei, Clustering 02.]
Agglomerative hierarchical clustering (bottom-up)
Defining the distance between clusters (i.e., sets of points); see the scipy sketch below:
◮ Single linkage: d(X, Y) = min_{x ∈ X, y ∈ Y} d(x, y)
◮ Complete linkage: d(X, Y) = max_{x ∈ X, y ∈ Y} d(x, y)
◮ Group average: d(X, Y) = (1 / (|X| · |Y|)) ∑_{x ∈ X, y ∈ Y} d(x, y)
◮ Centroid distance: d(X, Y) = d((1/|X|) ∑_{x ∈ X} x, (1/|Y|) ∑_{y ∈ Y} y)
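These four criteria correspond to the method argument of scipy.cluster.hierarchy.linkage ('single', 'complete', 'average', 'centroid'). A minimal sketch, with a made-up toy dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.standard_normal((25, 2))  # small toy dataset

# Build the full merge tree; 'average' is the group-average criterion above
Z = linkage(X, method='average')

# Cut the dendrogram to obtain, e.g., 3 flat clusters
labels = fcluster(Z, t=3, criterion='maxclust')

# dendrogram(Z) draws the tree (requires matplotlib)
```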
Many, many, many other algorithms are available...
Clustering with scikit-learn: K-means, an example with the Iris dataset
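The slide's code does not survive the extraction; what follows is a plausible reconstruction using the standard scikit-learn API (the choice K = 3 matches the three Iris species):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # 150 samples, 4 features

# Iris has 3 species, so ask for K = 3 clusters
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # the learned centers mu_1, .., mu_3
print(km.labels_[:10])      # cluster assignment z_i per example
print(km.inertia_)          # the cost J at convergence
```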
Clustering with scikit-learn: hierarchical clustering, an example with the Iris dataset
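Likewise, this slide's code is lost; a minimal sketch with AgglomerativeClustering (the complete-linkage choice is an illustrative assumption):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

X = load_iris().data

# Bottom-up merging with complete linkage, cut at 3 clusters
agg = AgglomerativeClustering(n_clusters=3, linkage='complete').fit(X)
print(agg.labels_[:10])
```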
Dimensionality reduction I
The curse of dimensionality
◮ When dimensionality increases, data becomes increasingly sparse in the space that it occupies
◮ Definitions of density and distance between points (critical for many tasks!) become less meaningful
◮ Visualization and qualitative analysis become impossible
(See the numerical illustration below.)
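A small numerical illustration of the sparsity claim (the experimental setup is my own, not from the slides): as the dimension d grows, the nearest and farthest neighbors of a point become almost equally far away, so distance-based notions lose their discriminative power.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))               # 500 uniform points in [0,1]^d
    # Euclidean distances from the first point to all the others
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    # Relative contrast (max - min) / min shrinks as d grows
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")
```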