Lecture 21: Unsupervised Learning and Clustering Algorithms


  1. Lecture 21: Unsupervised Learning and Clustering Algorithms. Dr. Chengjiang Long, Computer Vision Researcher at Kitware Inc., Adjunct Professor at RPI. Email: longc3@rpi.edu

  2. Recap Previous Lecture

  3. Outline: • Introduce Unsupervised Learning and Clustering • K-means Algorithm • Hierarchical Clustering • Applications

  4. Outline: • Introduce Unsupervised Learning and Clustering • K-means Algorithm • Hierarchical Clustering • Applications

  5. Unsupervised learning and clustering. Unsupervised learning: all data is unlabeled and the algorithms learn the inherent structure from the input data. Supervised learning: all data is labeled and the algorithms learn to predict the output from the input data.

  6. Unsupervised learning and clustering. Goal: to model the underlying structure or distribution in the input data.

  7. What is clustering?

  8. What is clustering for? E.g. 1: group people of similar size together to make S, M, L T-shirts. E.g. 2: segment customers to do targeted marketing. E.g. 3: organize documents to produce a topic hierarchy.

  9. What is clustering for?

  10. Clustering evaluation. Clustering is hard to evaluate; in most applications, expert judgement is still key.

  11. Data Clustering: Formal Definition. Given a set of N unlabeled examples D = {x_1, x_2, ..., x_N} in a d-dimensional feature space, D is partitioned into a number of disjoint subsets D_j. A partition is denoted by the set of clusters {D_1, ..., D_K}, and the problem of data clustering is formulated as finding the partition that optimizes f(·), where f(·) is formulated according to a given criterion.
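A sketch of the definition above in formal notation (the partition symbol π(D, K) and the arg-min form are assumed notation, not taken from the slides; f may equally be maximized if it is a quality rather than a cost criterion):

```latex
\[
D = \{x_1, x_2, \dots, x_N\} \subset \mathbb{R}^{d}, \qquad
D = \bigcup_{j=1}^{K} D_j, \qquad D_j \cap D_{j'} = \emptyset \ \ (j \neq j').
\]
\[
\pi(D, K) = \{D_1, \dots, D_K\}, \qquad
\pi^{*} = \arg\min_{\pi} f\big(\pi(D, K)\big).
\]
```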

  12. Outline: • Introduce Unsupervised Learning and Clustering • K-means Algorithm • Hierarchical Clustering • Applications

  13. K-means

  14. K-means: an example

  15. K-means: an example

  16. K-means: an example (1st iteration)

  17. K-means: an example (1st iteration)

  18. K-means: an example (2nd iteration)

  19. K-means: an example (2nd iteration)

  20. K-means: an example (3rd iteration)

  21. K-means: an example (3rd iteration)

  22. K-means: an example. No changes: done.

  23. K-means. Iterate: • Assign/cluster each example to the closest center: iterate over each point, get its distance to each cluster center, and assign it to the closest center (hard clustering). • Recalculate centers as the mean of the points in a cluster. How do we do this?

  24. K-means. Iterate: • Assign/cluster each example to the closest center: iterate over each point, get its distance to each cluster center, and assign it to the closest center (hard clustering). • Recalculate centers as the mean of the points in a cluster. What distance measure should we use?

  25. K-means. Iterate: • Assign/cluster each example to the closest center. • Recalculate centers as the mean of the points in a cluster. What distance measure should we use? (One that is good for spatial data.)
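A minimal NumPy sketch of this iterate loop (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: X is an (N, d) array of points, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # k data points as initial seeds
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center (hard clustering).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, k) Euclidean distances
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of the points assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):                 # no change: done
            break
        centers = new_centers
    return centers, labels
```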

  26. Euclidean Distance. • dist(p, q) = sqrt( Σ_{k=1..n} (p_k − q_k)^2 ), where n is the number of dimensions (attributes) and p_k and q_k are, respectively, the k-th attributes (components) of data objects p and q. • Standardization is necessary if scales differ.

  27. Minkowski Distance. • Minkowski distance is a generalization of Euclidean distance: dist(p, q) = ( Σ_{k=1..n} |p_k − q_k|^r )^(1/r), where r is a parameter, n is the number of dimensions (attributes), and p_k and q_k are, respectively, the k-th attributes (components) of data objects p and q.

  28. Euclidean Distance

  29. More about Euclidean distance

  30. Manhattan Distance. • Manhattan distance represents distance measured along directions parallel to the x and y axes. • The Manhattan distance between two n-dimensional vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) is d_M(x, y) = |x_1 − y_1| + |x_2 − y_2| + ... + |x_n − y_n| = Σ_{i=1..n} |x_i − y_i|, where |x_i − y_i| is the absolute value of the difference between x_i and y_i.
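The three distance measures above in a few lines of NumPy (a sketch; note that r = 1 recovers Manhattan and r = 2 recovers Euclidean):

```python
import numpy as np

def euclidean(p, q):
    """dist(p, q) = sqrt(sum_k (p_k - q_k)^2); standardize the features first if scales differ."""
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def manhattan(p, q):
    """d_M(p, q) = sum_k |p_k - q_k|: distance along axis-parallel directions."""
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))

def minkowski(p, q, r):
    """(sum_k |p_k - q_k|^r)^(1/r): generalizes Manhattan (r=1) and Euclidean (r=2)."""
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)) ** r) ** (1.0 / r)
```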

  31. Minkowski Distance: Examples

  32. Minkowski Distance

  33. K-means. Iterate: • Assign/cluster each example to the closest center. • Recalculate centers as the mean of the points in a cluster. Where are the cluster centers?

  34. K-means. Iterate: • Assign/cluster each example to the closest center. • Recalculate centers as the mean of the points in a cluster. How do we calculate these?

  35. K-means. Iterate: • Assign/cluster each example to the closest center. • Recalculate centers as the mean of the points in a cluster.

  36. Pros and cons of K-means. Weaknesses: • The user needs to specify the value of K. • Applicable only when a mean is defined. • The algorithm is sensitive to the initial seeds. • The algorithm is sensitive to outliers. Outliers are data points that are very far away from other data points; they could be errors in the data recording or special data points with very different values.
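A common mitigation for seed sensitivity, sketched here with scikit-learn (the data X is a toy stand-in), is to run several random initializations and keep the best solution by within-cluster sum of squares:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)   # toy 2-D samples; replace with real data

# n_init=10 runs k-means from 10 different random seed sets and keeps the run
# with the lowest inertia (within-cluster sum of squared distances).
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centers = model.labels_, model.cluster_centers_
```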

  37. Failure case

  38. Sensitive to initial seeds

  39. Sensitive to outliers

  40. Application to visual object recognition: Bag of Words

  41. Application to visual object recognition: Bag of Words. Vector-quantize descriptors from a set of training images using k-means. Image representation: a normalized histogram of visual words.
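A sketch of that pipeline (assuming scikit-learn and NumPy; the vocabulary size, descriptor dimensionality, and function names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptors, n_words=1000, seed=0):
    """Vector-quantize local descriptors (e.g., an (M, 128) array of SIFT features
    pooled from the training images) into a visual vocabulary with k-means."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(train_descriptors)

def bow_histogram(image_descriptors, vocabulary):
    """Represent one image as a normalized histogram of visual-word counts."""
    words = vocabulary.predict(image_descriptors)                     # nearest visual word per descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()                                          # L1-normalized histogram
```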

  42. Application to visual object recognition: Bag of Words. The same visual word.

  43. Application to visual object recognition: Bag of Words

  44. Summary

  45. Outline: • Introduce Unsupervised Learning and Clustering • K-means Algorithm • Hierarchical Clustering • Applications

  46. Hierarchical Clustering. • Up to now, we considered "flat" clustering. • For some data, hierarchical clustering is more appropriate than "flat" clustering.

  47. Hierarchical Clustering: Biological Taxonomy

  48. Hierarchical Clustering: Dendrogram. The preferred way to represent a hierarchical clustering is a dendrogram: • a binary tree; • level k corresponds to the partitioning with n−k+1 clusters; • if k clusters are needed, take the clustering from level n−k+1; • if samples are in the same cluster at level k, they stay in the same cluster at higher levels; • the dendrogram typically shows the similarity of grouped clusters.
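A dendrogram can be drawn directly from the sequence of merges, for example with SciPy (a toy sketch; the choice of single linkage here is just for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.random.rand(20, 2)          # toy 2-D samples
Z = linkage(X, method='single')    # agglomerative clustering with single (nearest-neighbor) linkage
dendrogram(Z)                      # binary tree; merge heights show the similarity of grouped clusters
plt.show()
```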

  49. Hierarchical Clustering: Venn Diagram. • A Venn diagram can also be used to show a hierarchical clustering, but similarity is not represented quantitatively.

  50. Hierarchical Clustering. Algorithms for hierarchical clustering can be divided into two types: 1. Agglomerative (bottom-up) procedures: start with n singleton clusters and form the hierarchy by merging the most similar clusters. 2. Divisive (top-down) procedures: start with all samples in one cluster and form the hierarchy by splitting the "worst" clusters.

  51. Divisive Hierarchical Clustering. • Any "flat" algorithm that produces a fixed number of clusters can be used (e.g., set c = 2), as in the sketch below.
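A sketch of that idea in Python, using k-means with c = 2 as the flat algorithm and, as an assumption (the slides do not say how to pick the "worst" cluster), always splitting the largest remaining cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, target_clusters):
    """Top-down clustering: repeatedly split one cluster with a flat algorithm (k-means, c=2).
    Assumes target_clusters <= len(X). Returns a list of index arrays, one per cluster."""
    clusters = [np.arange(len(X))]                                   # start with all samples in one cluster
    while len(clusters) < target_clusters:
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))  # heuristic: split the largest cluster
        idx = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters
```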

  52. Agglomerative Hierarchical Clustering. • Initialize with each example in a singleton cluster. • While there is more than 1 cluster: 1. find the 2 nearest clusters; 2. merge them. • There are four common ways to measure cluster distance.

  53. Single Linkage or Nearest Neighbor. • Agglomerative clustering with minimum distance. • Generates a minimum spanning tree. • Encourages growth of elongated clusters. • Disadvantage: very sensitive to noise.
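A naive sketch of the agglomerative loop from the previous slide using the single-linkage (nearest-neighbor) cluster distance; the exhaustive pairwise scans are for clarity, not efficiency:

```python
import numpy as np

def single_linkage(X, target_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the two nearest clusters.
    Cluster distance = minimum pairwise point distance (single linkage)."""
    clusters = [[i] for i in range(len(X))]                # n singleton clusters
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-linkage distance between cluster a and cluster b
                d = min(np.linalg.norm(X[i] - X[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]                         # merge the two nearest clusters
        del clusters[b]
    return clusters
```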
