Clustering algorithms
• Partitioning algorithms
  - Construct various partitions and then evaluate them by some criterion
    • K-means
    • Mixture of Gaussians
    • Spectral Clustering
• Hierarchical algorithms
  - Create a hierarchical decomposition of the set of objects using some criterion
  - Bottom-up – agglomerative
  - Top-down – divisive
slide by Eric Xing 38
Desirable Properties of a Clustering Algorithm
• Scalability (in terms of both time and space)
• Ability to deal with different data types
• Minimal requirements for domain knowledge to determine input parameters
• Ability to deal with noisy data
• Interpretability and usability
• Optional
  - Incorporation of user-specified constraints
slide by Andrew Moore 39
K-Means Clustering 40
K-Means Clustering Benefits • Fast • Conceptually straightforward • Popular slide by Tamara Broderick 41
K-Means: Preliminaries
Datum: Vector of D continuous values, e.g. $x_3 = (x_{3,1}, x_{3,2}) = (1.5, 6.2)$

        Feature 1 (Distance East)   Feature 2 (Distance North)
x_1     1.2                         5.9
x_2     4.3                         2.1
x_3     1.5                         6.3
...     ...                         ...
x_N     4.1                         2.3

In general, row n of the table is $x_n = (x_{n,1}, x_{n,2})$.
[Figure: scatter plot of the data, Feature 1 (Distance East) on the horizontal axis and Feature 2 (Distance North) on the vertical axis, with $x_3$ marked.]
slide by Tamara Broderick 42-50
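As a small sketch of this notation in Python, the example table can be held as a list of N points with D features each. The variable names are illustrative and not from the slides, indices are 0-based (so $x_3$ is `X[2]`), and only the rows actually shown in the table are included.

```python
# Rows of the example table: (Feature 1 = distance east, Feature 2 = distance north).
# Only the rows shown on the slide are listed; the elided "..." rows are left out.
X = [
    (1.2, 5.9),  # x_1
    (4.3, 2.1),  # x_2
    (1.5, 6.3),  # x_3
    (4.1, 2.3),  # x_N (last row of the table)
]

N = len(X)     # number of data points listed here
D = len(X[0])  # number of features per datum

# x_{3,1} in slide notation is X[2][0] with 0-based indexing.
x3 = X[2]
print(N, D, x3[0], x3[1])
```

Storing each datum as a tuple keeps one point per row, mirroring the table layout.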
K-Means: Preliminaries
Dissimilarity: distance as the crow flies, i.e. Euclidean distance. The dissimilarity used here is the squared Euclidean distance. For two points $x_3$ and $x_{17}$ with two features:
$\mathrm{dis}(x_3, x_{17}) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2$
In general, summing over each feature $d = 1, \dots, D$:
$\mathrm{dis}(x_3, x_{17}) = \sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2$
[Figure: points $x_3$ and $x_{17}$ in the Feature 1 / Feature 2 plane.]
slide by Tamara Broderick 51-57
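The squared Euclidean dissimilarity can be sketched in a few lines of Python. The function name `dis` follows the slides; the two sample points are hypothetical and chosen only to make the arithmetic easy to check.

```python
def dis(x, y):
    """Squared Euclidean distance: sum over features d of (x_d - y_d)^2."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    return sum((xd - yd) ** 2 for xd, yd in zip(x, y))

# Two hypothetical 2-D points (not values from the slides).
a = (1.0, 2.0)
b = (4.0, 6.0)
print(dis(a, b))  # (1-4)^2 + (2-6)^2 = 9 + 16 = 25.0
```

Note that no square root is taken, so `dis` is the square of the crow-flies distance (here 25.0 rather than 5.0).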
K-Means: Preliminaries
Cluster summary
• K = number of clusters
• K cluster centers $\mu_1, \mu_2, \dots, \mu_K$, e.g. $\mu_1 = (\mu_{1,1}, \mu_{1,2})$
• Data assignments to clusters $S_1, S_2, \dots, S_K$, where $S_k$ = set of points in cluster k
[Figure: data in the Feature 1 / Feature 2 plane with cluster centers $\mu_1$, $\mu_2$, $\mu_3$ marked.]
slide by Tamara Broderick 58-69
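One possible way to hold a cluster summary in code is a list of centers plus one set of point indices per cluster. This is only a sketch: the toy data, the center values, and the helper list `labels` are all hypothetical and not taken from the slides.

```python
# Hypothetical toy data (not from the slides): N = 5 points, K = 2 clusters.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (10.0, 4.0)]
K = 2

# Cluster centers mu_1..mu_K, one D-dimensional point per cluster (0-based list).
mu = [(0.0, 1.0), (10.0, 2.0)]

# labels[n] = k records that x_n is assigned to cluster k.
labels = [0, 0, 1, 1, 1]

# S[k] is the set of indices n with x_n in cluster k.
S = [{n for n, lab in enumerate(labels) if lab == k} for k in range(K)]

# Every point lies in exactly one S_k: the sets are disjoint and cover 0..N-1.
assert sum(len(s) for s in S) == len(X)
assert set().union(*S) == set(range(len(X)))
print([sorted(s) for s in S])  # [[0, 1], [2, 3, 4]]
```

Keeping a flat `labels` list alongside the sets is a design convenience: the list makes "exactly one cluster per point" hold by construction.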
K-Means: Preliminaries
Dissimilarity (global)
$\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \sum_{n : x_n \in S_k} \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$
• Outer sum: for each cluster
• Middle sum: for each data point in the kth cluster
• Inner sum: for each feature
slide by Tamara Broderick 70-75
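The triple sum maps directly onto nested iteration. The sketch below assumes clusters are stored as sets of point indices and centers as a list of tuples; that representation and the toy values are assumptions made for illustration.

```python
def dis(x, y):
    """Squared Euclidean distance (the inner sum over features d)."""
    return sum((xd - yd) ** 2 for xd, yd in zip(x, y))

def dis_global(X, S, mu):
    """Sum over clusters k and over points n in S[k] of dis(X[n], mu[k])."""
    return sum(dis(X[n], mu[k]) for k in range(len(mu)) for n in S[k])

# Hypothetical toy example: two clusters of two points each.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
S = [{0, 1}, {2, 3}]             # S[k] = indices of the points in cluster k
mu = [(0.0, 1.0), (10.0, 1.0)]   # one center per cluster

print(dis_global(X, S, mu))  # each point is 1 unit from its center: 1+1+1+1 = 4.0
```

A smaller value means the points sit closer to their assigned centers, which is the quantity K-means tries to drive down.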
K-Means Algorithm
In outline:
• Initialize K cluster centers
• Repeat until convergence:
  ✦ Assign each data point to the cluster with the closest center.
  ✦ Assign each cluster center to be the mean of its cluster’s data points
In detail:
• For k = 1,…,K
  ✦ Randomly draw n from 1,…,N without replacement
  ✦ $\mu_k \leftarrow x_n$
• Repeat until $S_1, \dots, S_K$ don’t change (or no change in $\mathrm{dis}_{\mathrm{global}}$):
  ✦ For n = 1,…,N
    ✤ Find k with smallest $\mathrm{dis}(x_n, \mu_k)$
    ✤ Put $x_n \in S_k$ (and no other $S_j$)
  ✦ For k = 1,…,K
    ✤ $\mu_k \leftarrow |S_k|^{-1} \sum_{n : x_n \in S_k} x_n$
slide by Tamara Broderick 76-98
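The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `k_means`, the `seed` parameter, the toy data, and the choice to leave an empty cluster's center unchanged are all assumptions that the slides do not specify.

```python
import random

def dis(x, y):
    """Squared Euclidean distance between two points."""
    return sum((xd - yd) ** 2 for xd, yd in zip(x, y))

def k_means(X, K, seed=0):
    """Return (centers, labels) for data X (a list of tuples) and K clusters."""
    rng = random.Random(seed)
    # Initialization: K distinct data points, drawn without replacement.
    mu = [X[n] for n in rng.sample(range(len(X)), K)]
    labels = None
    while True:
        # Assignment step: each point goes to exactly one closest center
        # (ties go to the lowest cluster index).
        new_labels = [min(range(K), key=lambda k: dis(x, mu[k])) for x in X]
        if new_labels == labels:  # S_1..S_K did not change: stop
            return mu, labels
        labels = new_labels
        # Update step: each center becomes the mean of its cluster's points.
        for k in range(K):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # assumption: an empty cluster keeps its old center
                mu[k] = tuple(sum(c) / len(members) for c in zip(*members))

# Hypothetical toy data: two well-separated groups of three points.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = k_means(X, K=2)
print(sorted(centers))
print(labels)
```

Here `labels[n] = k` plays the role of "put $x_n \in S_k$ (and no other $S_j$)", since a single label per point cannot place it in two clusters.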
K-Means: Evaluation
• Will it terminate? Yes. Always.
slide by Tamara Broderick 99-100
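The slide states the answer without the reasoning. One standard argument, sketched here with iteration superscripts $(t)$ that are notation added for this note and not from the slides, is that neither step of an iteration can increase $\mathrm{dis}_{\mathrm{global}}$:

```latex
\mathrm{dis}_{\mathrm{global}}\bigl(S^{(t)}, \mu^{(t)}\bigr)
\;\ge\; \mathrm{dis}_{\mathrm{global}}\bigl(S^{(t+1)}, \mu^{(t)}\bigr)
\;\ge\; \mathrm{dis}_{\mathrm{global}}\bigl(S^{(t+1)}, \mu^{(t+1)}\bigr)
\;\ge\; 0
```

The first inequality holds because moving each point to its closest center cannot increase that point's term in the sum. The second holds because the mean of a cluster minimizes the sum of squared Euclidean distances to the cluster's points. Since there are at most $K^N$ ways to assign N points to K clusters and the objective never goes up, the assignments must stop changing after finitely many iterations, provided ties are broken consistently.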