
BBM406 Fundamentals of Machine Learning, Lecture 21: Clustering

Photo by Unsplash user @foodiesfeed

BBM406 Fundamentals of Machine Learning
Lecture 21: Clustering, K-Means
Aykut Erdem // Hacettepe University // Fall 2019

Last time: Boosting. Idea: given a weak learner, run it multiple times on reweighted training data, then combine the resulting weak classifiers into a single strong classifier by a weighted vote.


  1. Clustering algorithms
     • Partitioning algorithms
       - Construct various partitions and then evaluate them by some criterion
       - Examples: K-means, Mixture of Gaussians, Spectral Clustering
     • Hierarchical algorithms
       - Create a hierarchical decomposition of the set of objects using some criterion
       - Bottom-up: agglomerative
       - Top-down: divisive
     (slide by Eric Xing)

  2. Desirable Properties of a Clustering Algorithm
     • Scalability (in terms of both time and space)
     • Ability to deal with different data types
     • Minimal requirements for domain knowledge to determine input parameters
     • Ability to deal with noisy data
     • Interpretability and usability
     • Optional: incorporation of user-specified constraints
     (slide by Andrew Moore)

  3. K-Means Clustering

  4. K-Means Clustering Benefits
     • Fast
     • Conceptually straightforward
     • Popular
     (slide by Tamara Broderick)

  5. K-Means: Preliminaries (slide by Tamara Broderick)

  6. K-Means: Preliminaries. Datum: a vector of continuous values. (slide by Tamara Broderick)

  7. K-Means: Preliminaries. Datum: a vector of continuous values. [Figure: a single data point plotted against axes Distance East and Distance North, at (1.5, 6.2).] (slide by Tamara Broderick)

  8. K-Means: Preliminaries. In vector notation, x_3 = (1.5, 6.2). The whole dataset is a table of N such rows:
     x_1 = (1.2, 5.9)
     x_2 = (4.3, 2.1)
     x_3 = (1.5, 6.2)
     ...
     x_N = (4.1, 2.3)
     (slide by Tamara Broderick)

  9. K-Means: Preliminaries. Nothing depends on the axes being distances, so rename them Feature 1 and Feature 2, and write x_{n,d} for the d-th feature of the n-th datum: x_3 = (x_{3,1}, x_{3,2}) = (1.5, 6.2), and in general x_n = (x_{n,1}, x_{n,2}) for n = 1, ..., N. (slide by Tamara Broderick)

  10. K-Means: Preliminaries. Datum: a vector of D continuous values, x_n = (x_{n,1}, ..., x_{n,D}); the running example has D = 2. (slide by Tamara Broderick)
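In code (a minimal sketch, assuming NumPy; the variable names are mine, not from the slides), such a dataset is naturally an N-by-D array, with X[n, d] playing the role of x_{n,d}:

    import numpy as np

    # Hypothetical toy dataset: N = 4 points, D = 2 features,
    # using the example values from the slides above.
    X = np.array([[1.2, 5.9],
                  [4.3, 2.1],
                  [1.5, 6.2],
                  [4.1, 2.3]])
    N, D = X.shape
    print(X[2])  # x_3 = (1.5, 6.2); NumPy indexes from 0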

  11. K-Means: Preliminaries. Dissimilarity: distance as the crow flies. [Figure: two data points x_3 and x_17 marked in the feature plane, with the straight-line distance between them.] (slide by Tamara Broderick)

  12. K-Means: Preliminaries. Dissimilarity: Euclidean distance. (slide by Tamara Broderick)

  13. K-Means: Preliminaries. Dissimilarity: squared Euclidean distance. In two dimensions,
     dis(x_3, x_17) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2
     (slide by Tamara Broderick)

  14. K-Means: Preliminaries. In D dimensions, summing over the features,
     dis(x_3, x_17) = \sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2
     (slide by Tamara Broderick)
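A sketch of this dissimilarity in NumPy (the function name dis follows the slides; the implementation and the value of x_17 are assumptions for illustration):

    import numpy as np

    def dis(x, y):
        # Squared Euclidean distance: sum over the D features of (x_d - y_d)^2.
        return float(np.sum((x - y) ** 2))

    x3  = np.array([1.5, 6.2])
    x17 = np.array([4.3, 2.1])   # hypothetical coordinates for x_17
    print(dis(x3, x17))          # (1.5-4.3)^2 + (6.2-2.1)^2 = 24.65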


  15. K-Means: Preliminaries. Cluster summary: K = number of clusters. (slide by Tamara Broderick)

  16. K-Means: Preliminaries. Cluster summary, continued:
     • K cluster centers μ_1, μ_2, ..., μ_K, each a vector of D features, e.g. μ_1 = (μ_{1,1}, μ_{1,2}).
     [Figure: three centers μ_1, μ_2, μ_3 marked in the feature plane among the data points.]
     (slide by Tamara Broderick)

  17. K-Means: Preliminaries. Cluster summary, continued:
     • Data assignments to clusters S_1, S_2, ..., S_K, where S_k = the set of points in cluster k.
     (slide by Tamara Broderick)
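One possible way to hold this cluster summary in code (a sketch; the names mu and z are my own): an array of K centers plus a vector giving each point's cluster index, from which each S_k can be read off.

    import numpy as np

    K, N, D = 3, 100, 2
    mu = np.zeros((K, D))          # K cluster centers; mu[k] = (mu_{k,1}, ..., mu_{k,D})
    z  = np.zeros(N, dtype=int)    # z[n] = k means x_n is in S_k
    S_0 = np.flatnonzero(z == 0)   # indices of the points in cluster 0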

  18. K-Means: Preliminaries. Dissimilarity (global):
     dis_global = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - μ_{k,d})^2
     That is: for each cluster, for each data point in the k-th cluster, for each feature, accumulate the squared difference between the point and its cluster center.
     (slide by Tamara Broderick)
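A sketch of dis_global with the same loop structure as the formula: over clusters, over the points in each cluster, and over features (the feature sum is handled by the vectorized subtraction):

    import numpy as np

    def dis_global(X, mu, z):
        # X: (N, D) data; mu: (K, D) centers; z: (N,) cluster index per point.
        total = 0.0
        for k in range(len(mu)):                     # for each cluster
            members = X[z == k]                      # points x_n in S_k
            total += np.sum((members - mu[k]) ** 2)  # sum over points and features
        return total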

  19. K-Means Algorithm
     • Initialize K cluster centers.
     • Repeat until convergence:
       ✦ Assign each data point to the cluster with the closest center.
       ✦ Assign each cluster center to be the mean of its cluster's data points.
     (slide by Tamara Broderick)

  20. K-Means Algorithm, made precise. Initialization and stopping rule (sketched in code below):
     • For k = 1, ..., K:
       ✦ Randomly draw n from 1, ..., N without replacement.
       ✦ μ_k ← x_n
     • Repeat until S_1, ..., S_K don't change (or, as an alternative stopping rule, no change in dis_global):
       ✦ Assign each data point to the cluster with the closest center.
       ✦ Assign each cluster center to be the mean of its cluster's data points.
     (slide by Tamara Broderick)
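The initialization above (Forgy-style: pick K distinct data points as starting centers) might look like this sketch:

    import numpy as np

    def init_centers(X, K, rng=np.random.default_rng(0)):
        # Draw K indices from 1, ..., N without replacement; copy those points as centers.
        idx = rng.choice(len(X), size=K, replace=False)
        return X[idx].copy()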

  21. K-Means Algorithm: the assignment step made precise.
     • Repeat until S_1, ..., S_K don't change:
       ✦ For n = 1, ..., N:
         ✤ Find the k with the smallest dis(x_n, μ_k).
         ✤ Put x_n ∈ S_k (and in no other S_j).
       ✦ Assign each cluster center to be the mean of its cluster's data points.
     (slide by Tamara Broderick)
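As a sketch, the assignment step can be vectorized: compute all N-by-K squared distances at once and take the argmin over k, so each x_n lands in exactly one S_k:

    import numpy as np

    def assign(X, mu):
        # d2[n, k] = dis(x_n, mu_k); z[n] = the k with the smallest dis(x_n, mu_k).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
        return d2.argmin(axis=1)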

  22. K-Means Algorithm: the update step made precise.
       ✦ For k = 1, ..., K:
         ✤ μ_k ← |S_k|^{-1} \sum_{n : n \in S_k} x_n
     (slide by Tamara Broderick)
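And the update step, μ_k ← |S_k|^{-1} \sum_{n \in S_k} x_n, as a sketch (one common convention, assumed here: an empty cluster keeps its old center):

    import numpy as np

    def update(X, z, mu):
        # Each center becomes the mean of its cluster's data points.
        new_mu = mu.copy()
        for k in range(len(mu)):
            members = X[z == k]
            if len(members) > 0:   # guard against empty clusters
                new_mu[k] = members.mean(axis=0)
        return new_mu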


  23. K-Means Algorithm, complete (an end-to-end code sketch follows):
     • For k = 1, ..., K:
       ✦ Randomly draw n from 1, ..., N without replacement.
       ✦ μ_k ← x_n
     • Repeat until S_1, ..., S_K don't change:
       ✦ For n = 1, ..., N:
         ✤ Find the k with the smallest dis(x_n, μ_k).
         ✤ Put x_n ∈ S_k (and in no other S_j).
       ✦ For k = 1, ..., K:
         ✤ μ_k ← |S_k|^{-1} \sum_{n : n \in S_k} x_n
     (slide by Tamara Broderick)
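Putting the steps together, a minimal end-to-end sketch of the algorithm on these slides (names are my own; stopping when the assignments S_1, ..., S_K no longer change):

    import numpy as np

    def kmeans(X, K, rng=np.random.default_rng(0)):
        # Initialize: K centers drawn from the data without replacement.
        mu = X[rng.choice(len(X), size=K, replace=False)].copy()
        z = np.full(len(X), -1)
        while True:
            # Assignment step: nearest center under squared Euclidean distance.
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            new_z = d2.argmin(axis=1)
            if np.array_equal(new_z, z):   # S_1, ..., S_K didn't change: done
                return mu, z
            z = new_z
            # Update step: each center becomes the mean of its cluster's points.
            for k in range(K):
                members = X[z == k]
                if len(members) > 0:       # empty cluster keeps its old center
                    mu[k] = members.mean(axis=0)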

  24. K-Means: Evaluation (slide by Tamara Broderick)

  25. K-Means: Evaluation
     • Will it terminate? Yes. Always. Neither the assignment step nor the update step can increase dis_global, and there are only finitely many ways to assign N points to K clusters, so no assignment can repeat and the loop must stop.
     (slide by Tamara Broderick)
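A quick sanity check using the kmeans sketch above on two hypothetical, well-separated blobs; it converges in a handful of passes:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),    # blob near (0, 0)
                   rng.normal(5.0, 0.5, size=(50, 2))])   # blob near (5, 5)
    mu, z = kmeans(X, K=2)
    print(mu)   # the two centers land near (0, 0) and (5, 5)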
