

  1. Unsupervised learning: Clustering and Dimensionality Reduction
     Marta Arias, marias@cs.upc.edu
     Dept. CS, UPC, Fall 2018

  2. Clustering
     Partition input examples into similar subsets

  3. Clustering
     Partition input examples into similar subsets

  4. Clustering
     Main challenges:
     ◮ How do we measure similarity?
     ◮ How many clusters?
     ◮ How do we evaluate the clusters?
     Algorithms we will cover:
     ◮ K-means
     ◮ Hierarchical clustering

  5. K-means clustering
     Intuition
     ◮ Input data are:
       ◮ m examples x_1, .., x_m, and
       ◮ K, the number of desired clusters
     ◮ Clusters are represented by cluster centers µ_1, .., µ_K
     ◮ Given centers µ_1, .., µ_K, each center defines a cluster: the subset of inputs x_i that are closer to it than to any other center

  6. K-means clustering
     Intuition
     The aim is to find
     ◮ cluster centers µ_1, .., µ_K, and
     ◮ a cluster assignment z = (z_1, .., z_m), where z_i ∈ {1, .., K}
       (z_i is the cluster assigned to example x_i)
     such that µ_1, .., µ_K, z minimize the cost function
     J(µ_1, .., µ_K, z) = Σ_i ‖x_i − µ_{z_i}‖²

  7. K-means clustering
     Cost function
     J(µ_1, .., µ_K, z) = Σ_i ‖x_i − µ_{z_i}‖²
     Pseudocode
     ◮ Pick initial centers µ_1, .., µ_K at random
     ◮ Repeat until convergence
       ◮ Optimize z in J(µ_1, .., µ_K, z) keeping µ_1, .., µ_K fixed:
         set each z_i to the closest center, z_i = arg min_k ‖x_i − µ_k‖²
       ◮ Optimize µ_1, .., µ_K in J(µ_1, .., µ_K, z) keeping z fixed:
         for each k = 1, .., K, set µ_k = (1 / |{i | z_i = k}|) · Σ_{i : z_i = k} x_i
     A runnable sketch of this loop follows below.
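
The slides give only pseudocode; here is a minimal NumPy sketch of it. This code is not from the original deck: the function name kmeans and the parameter choices are illustrative.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=None):
    """Lloyd's algorithm: local minimization of J(mu, z) = sum_i ||x_i - mu_{z_i}||^2."""
    rng = np.random.default_rng(seed)
    # Pick initial centers mu_1, .., mu_K at random among the data points
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    z = np.full(len(X), -1)
    for _ in range(n_iters):
        # Step 1: optimize z keeping mu fixed: assign each x_i to its closest center
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z_new = dists.argmin(axis=1)
        if np.array_equal(z_new, z):      # assignments stable, so J cannot decrease further
            break
        z = z_new
        # Step 2: optimize mu keeping z fixed: each center becomes its cluster's mean
        for k in range(K):
            if np.any(z == k):            # guard against empty clusters
                mu[k] = X[z == k].mean(axis=0)
    return mu, z
```

Usage would be, for example, mu, z = kmeans(X, K=3, seed=0) on an (m, d) array X. Note that each of the two steps can only decrease J, so the loop always terminates, but only at a local optimum.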

  8. K-means illustrated

  9. Limitations of K-means
     K-means works well if..
     ◮ clusters are spherical
     ◮ clusters are well separated
     ◮ clusters have similar volumes
     ◮ clusters have similar numbers of points
     .. so improve on it with a more general model:
     ◮ Mixture of Gaussians, learned using Expectation Maximization (EM); see the sketch below
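
A minimal scikit-learn sketch of the suggested fix, fitting a mixture of Gaussians with EM. This code is not from the slides, and the synthetic two-blob dataset is purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, overlapping blobs: a shape plain K-means tends to split badly
X = np.vstack([
    rng.normal([0, 0], [3.0, 0.5], size=(200, 2)),
    rng.normal([0, 3], [0.5, 3.0], size=(200, 2)),
])

# EM fits per-cluster means AND covariances, so clusters need not be
# spherical or of equal volume, and assignments are probabilistic
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)       # hard assignments
probs = gmm.predict_proba(X)      # soft (per-cluster probability) assignments
```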

  10. Hierarchical clustering
      Output is a dendrogram

  11. Agglomerative hierarchical clustering
      Bottom-up
      Pseudocode
      1. Start with one cluster per example
      2. Repeat until all examples are in one cluster:
         ◮ merge the two closest clusters
      (Next example from D. Blei's course at Princeton)

  12. Example: Data
      [Scatter plot of the example points, axes V1 vs V2; from D. Blei, Clustering 02.]

  13.–36. Example: iterations 001–024
      [One scatter plot per iteration, axes V1 vs V2, showing the clusters after each successive merge of the two closest clusters; from D. Blei, Clustering 02.]

  37. Agglomerative hierarchical clustering
      Bottom-up
      Pseudocode
      1. Start with one cluster per example
      2. Repeat until all examples are in one cluster:
         ◮ merge the two closest clusters
      Defining distance between clusters (i.e. sets of points):
      ◮ Single linkage: d(X, Y) = min_{x ∈ X, y ∈ Y} d(x, y)
      ◮ Complete linkage: d(X, Y) = max_{x ∈ X, y ∈ Y} d(x, y)
      ◮ Group average: d(X, Y) = (Σ_{x ∈ X, y ∈ Y} d(x, y)) / (|X| · |Y|)
      ◮ Centroid distance: d(X, Y) = d((1/|X|) Σ_{x ∈ X} x, (1/|Y|) Σ_{y ∈ Y} y)
      A SciPy sketch of these four linkage options follows below.
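
A short SciPy sketch, not from the slides, showing how the four criteria above map onto the method argument of scipy.cluster.hierarchy.linkage ("single", "complete", "average", "centroid"); the random data and the cut at 3 clusters are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                  # illustrative 2-D data

# Each method is one inter-cluster distance from the slide:
# single = min pairwise, complete = max pairwise,
# average = group average, centroid = distance between centroids
for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)             # full bottom-up merge history
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
    print(method, np.bincount(labels))        # cluster sizes per linkage

# dendrogram(Z) would draw the merge tree for the last method
```

Different linkages can produce very different trees on the same data: single linkage tends to "chain" clusters together, while complete linkage favors compact ones.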

  38. Many, many, many other algorithms available...

  39. Clustering with scikit-learn I
      K-means: an example with the Iris dataset

  40. Clustering with scikit-learn II
      K-means: an example with the Iris dataset (continued; a sketch of the likely code follows below)
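
The code on these two slides did not survive extraction; here is a minimal sketch of what a scikit-learn K-means example on Iris typically looks like. K = 3 and the other parameter choices are assumptions, not the slides' actual code.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, y = iris.data, iris.target                # 150 examples, 4 features

# K = 3, matching the three Iris species; n_init restarts guard against
# bad random initial centers (K-means only finds a local optimum of J)
km = KMeans(n_clusters=3, n_init=10, random_state=0)
z = km.fit_predict(X)                        # cluster assignments z_1, .., z_m

print("centers:", km.cluster_centers_)       # the learned mu_1, .., mu_K
print("cost J:", km.inertia_)                # sum of squared distances to centers
print("agreement with species:", adjusted_rand_score(y, z))
```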

  41. Clustering with scikit-learn I
      Hierarchical clustering: an example with the Iris dataset (a sketch of the likely code follows below)
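
This slide's code is likewise lost; here is a plausible minimal sketch using scikit-learn's AgglomerativeClustering plus a SciPy dendrogram. The average-linkage choice is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

X = load_iris().data

# Bottom-up clustering with group-average linkage, cut at 3 clusters
agg = AgglomerativeClustering(n_clusters=3, linkage="average")
labels = agg.fit_predict(X)

# The dendrogram (the full merge tree) is easiest to draw via SciPy
Z = linkage(X, method="average")
dendrogram(Z, no_labels=True)
plt.title("Agglomerative clustering of Iris (average linkage)")
plt.show()
```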

  42. Dimensionality reduction I
      The curse of dimensionality
      ◮ As dimensionality increases, data become increasingly sparse in the space they occupy
      ◮ Notions of density and of distance between points (critical for many tasks!) become less meaningful
      ◮ Visualization and qualitative analysis become impossible
      A small numerical illustration follows below.
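
A small NumPy experiment, not in the original slides, illustrating the second bullet: for uniform random points, the farthest neighbor is barely farther than the nearest one once the dimension d is large, so distance-based notions of similarity lose their contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(500, d))                 # 500 random points in [0,1]^d
    dists = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from point 0 to the rest
    # Relative contrast: how much farther is the farthest point than the
    # nearest one? This ratio shrinks toward 0 as d grows.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast={contrast:.3f}")
```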
