
Lecture 21: Clustering (K-Means). Aykut Erdem, December 2018, Hacettepe University.

Last time: Boosting. Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote.


  1. Clustering algorithms
• Partitioning algorithms: construct various partitions and then evaluate them by some criterion
  - K-means
  - Mixture of Gaussians
  - Spectral clustering
• Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion
  - Bottom-up (agglomerative)
  - Top-down (divisive)
(slide by Eric Xing)

  2. Desirable Properties of a Clustering Algorithm
• Scalability (in terms of both time and space)
• Ability to deal with different data types
• Minimal requirements for domain knowledge to determine input parameters
• Ability to deal with noisy data
• Interpretability and usability
• Optional: incorporation of user-specified constraints
(slide by Andrew Moore)

  3. K-Means Clustering

  4. K-Means Clustering: Benefits
• Fast
• Conceptually straightforward
• Popular
(slide by Tamara Broderick)

  5-13. K-Means: Preliminaries
Datum: a vector of D continuous values. In the running example the two features are Distance East and Distance North, so each datum lives in the plane; e.g. x_3 = (x_{3,1}, x_{3,2}) = (1.5, 6.2). The full data set is x_1, ..., x_N:

        Feature 1 (East)   Feature 2 (North)
  x_1        1.2                 5.9
  x_2        4.3                 2.1
  x_3        1.5                 6.2
  ...
  x_N        4.1                 2.3

  14-19. K-Means: Preliminaries
Dissimilarity: distance "as the crow flies", made precise as the squared Euclidean distance. In the two-feature example,

  dis(x_3, x_17) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2

and in general, summing over each feature d = 1, ..., D,

  dis(x_3, x_17) = sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2
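The squared Euclidean distance above can be sketched in a few lines. This is a minimal illustration, not from the slides; the coordinates used for x_17 are made up, since the slides plot it but never state its values.

```python
import numpy as np

def dis(x, y):
    """Squared Euclidean distance: sum over features d of (x_d - y_d)^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))

# The slides' x_3 = (1.5, 6.2); x_17's coordinates here are hypothetical.
x3 = np.array([1.5, 6.2])
x17 = np.array([4.3, 2.1])
d = dis(x3, x17)  # (1.5 - 4.3)^2 + (6.2 - 2.1)^2
```

Note that this is the *squared* distance: it keeps the same ordering of pairs as the true Euclidean distance while avoiding the square root, which is all k-means needs.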


  21-32. K-Means: Preliminaries
Cluster summary:
• K = number of clusters
• K cluster centers mu_1, mu_2, ..., mu_K, each a vector of D values like the data, e.g. mu_1 = (mu_{1,1}, mu_{1,2})
• Data assignments to clusters S_1, S_2, ..., S_K, where S_k = the set of points in cluster k

  33-38. K-Means: Preliminaries
Dissimilarity (global):

  dis_global = sum_{k=1}^{K} sum_{n: x_n in S_k} sum_{d=1}^{D} (x_{n,d} - mu_{k,d})^2

That is: for each cluster, for each data point in the kth cluster, for each feature, add up the squared difference between the point and its cluster center.
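The global dissimilarity above can be sketched directly from its triple sum. The array-based representation here (data as an (N, D) array, centers as (K, D), assignments as a length-N index array instead of explicit sets S_k) is my assumption, not from the slides.

```python
import numpy as np

def dis_global(X, mu, assignments):
    """Global dissimilarity: for each cluster, for each point in that
    cluster, for each feature, sum the squared point-to-center gap."""
    total = 0.0
    for n, x_n in enumerate(X):
        k = assignments[n]                  # x_n is in S_k
        total += np.sum((x_n - mu[k]) ** 2)
    return float(total)

# Tiny hand-checkable example (values are illustrative, not from the slides):
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
mu = np.array([[0.0, 1.0], [10.0, 0.0]])
z = [0, 0, 1]
total = dis_global(X, mu, z)  # 1 + 1 + 0
```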

  39-40. K-Means Algorithm
• Initialize K cluster centers
• Repeat until convergence:
  ✦ Assign each data point to the cluster with the closest center.
  ✦ Assign each cluster center to be the mean of its cluster's data points.

  41-47. K-Means Algorithm (initialization and stopping rule made explicit)
• For k = 1,…,K:
  ✦ Randomly draw n from 1,…,N without replacement
  ✦ mu_k ← x_n
• Repeat until S_1,…,S_K don't change (or: no change in dis_global):
  ✦ Assign each data point to the cluster with the closest center.
  ✦ Assign each cluster center to be the mean of its cluster's data points.

  48-61. K-Means Algorithm (both steps made explicit)
• For k = 1,…,K:
  ✦ Randomly draw n from 1,…,N without replacement
  ✦ mu_k ← x_n
• Repeat until S_1,…,S_K don't change:
  ✦ For n = 1,…,N:
    ✤ Find the k with smallest dis(x_n, mu_k)
    ✤ Put x_n in S_k (and no other S_j)
  ✦ For k = 1,…,K:
    ✤ mu_k ← |S_k|^{-1} sum_{n: x_n in S_k} x_n

  62-63. K-Means: Evaluation
• Will it terminate? Yes. Always. Neither the assignment step nor the update step can increase dis_global, and there are only finitely many ways to assign N points to K clusters, so the loop cannot cycle forever.
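The termination argument can be checked empirically by recording dis_global after every half-step of the loop and verifying it never increases. This is an illustrative sketch under the same array-based assumptions as before; the data here are randomly generated, not from the slides.

```python
import numpy as np

def kmeans_trace(X, K, seed=None, max_iter=100):
    """Run k-means, recording dis_global after each assignment step and
    after each update step, to check the objective never increases."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    trace, z = [], None
    for _ in range(max_iter):
        # Assignment step, then record dis_global.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_z = d2.argmin(axis=1)
        trace.append(d2[np.arange(len(X)), new_z].sum())
        if z is not None and np.array_equal(new_z, z):
            break  # assignments unchanged: converged
        z = new_z
        # Update step, then record dis_global again.
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        trace.append(d2[np.arange(len(X)), z].sum())
    return np.array(trace)

data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 2))
t = kmeans_trace(X, 3, seed=0)
```

Each assignment step minimizes dis_global over assignments with the centers fixed, and each mean update minimizes it over centers with the assignments fixed, so the recorded trace is non-increasing.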
