  1. Clustering, K-Means, and K-Nearest Neighbors CMSC 678 UMBC Most slides courtesy Hamed Pirsiavash

  2. Recap from last time…

  3. Geometric Rationale of LDiscA & PCA Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes), ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis D has the lowest variance, and such that the covariance among each pair of the principal axes is zero (the principal axes are uncorrelated). Courtesy Antano Žilinsko

  4. L-Dimensional LDiscA 1. Compute the mean $\mu$, priors, and common covariance $\Sigma$: $\mu = \frac{1}{N}\sum_i x_i$, $\Sigma = \frac{1}{N}\sum_{i:\, y_i = k} (x_i - \mu)(x_i - \mu)^T$ 2. Sphere the data (zero mean, unit covariance) 3. Compute the (top L) eigenvectors from the sphered data via the eigendecomposition $X^* = V D_B V^T$ 4. Project the data
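
As a concrete reference, here is a minimal NumPy sketch of that 4-step recipe, read as reduced-rank LDiscA; the function and variable names (`ldisca_project`, `W_sphere`, etc.) are mine rather than the slides', and class priors are treated as uniform for simplicity:

```python
import numpy as np

def ldisca_project(X, y, L):
    """Sketch of the 4-step recipe: class means and common covariance,
    sphere, top-L eigenvectors, project. Assumes Sigma is full rank."""
    classes = np.unique(y)
    # step 1: class means and common (within-class) covariance
    mus = np.array([X[y == k].mean(axis=0) for k in classes])
    Xc = X - mus[np.searchsorted(classes, y)]   # center each point by its class mean
    Sigma = Xc.T @ Xc / len(X)
    # step 2: sphering transform Sigma^(-1/2) via eigendecomposition
    d, V = np.linalg.eigh(Sigma)
    W_sphere = V @ np.diag(d ** -0.5) @ V.T
    # step 3: top-L eigenvectors of the between-class covariance of sphered means
    M = (mus - mus.mean(axis=0)) @ W_sphere
    dB, VB = np.linalg.eigh(M.T @ M / len(classes))
    W = VB[:, ::-1][:, :L]                      # eigh returns ascending order, so reverse
    # step 4: project the centered, sphered data
    return (X - X.mean(axis=0)) @ W_sphere @ W
```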

  5. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  6. Clustering Basic idea: group together similar instances Example: 2D points

  7. Clustering Basic idea: group together similar instances. Example: 2D points. One option: small (squared) Euclidean distance. Clustering results are crucially dependent on the measure of similarity (or distance) between the points to be clustered

  8. Clustering algorithms Simple clustering: organize elements into k groups (k-means, mean shift, spectral clustering). Hierarchical clustering: organize elements into a hierarchy (bottom up: agglomerative; top down: divisive)

  9. Clustering examples: Image Segmentation image credit: Berkeley segmentation benchmark

  10. Clustering examples: News Feed Clustering news articles

  11. Clustering examples: Image Search Clustering queries

  12. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  13. Clustering using k-means Data: D-dimensional observations $(x_1, x_2, \ldots, x_n)$. Goal: partition the n observations into k (≤ n) sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squared distances: $\arg\min_S \sum_{i=1}^{k} \sum_{x \in S_i} \|x - \mu_i\|^2$, where $\mu_i$ is the cluster center (mean) of $S_i$

  14. Lloyd’s algorithm for k -means Initialize k centers by picking k points randomly among all the points Repeat till convergence (or max iterations) Assign each point to the nearest center (assignment step) Estimate the mean of each group (update step) https://www.csee.umbc.edu/courses/graduate/678/spring18/kmeans/
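
A toy NumPy sketch of Lloyd's two alternating steps (not the course demo at the URL above; names are mine):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iters=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # initialize k centers by picking k points at random among all the points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # assignment step: each point goes to its nearest center -- O(NKD)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # update step: each center becomes the mean of its group -- O(ND)
        new_centers = np.array([X[assign == j].mean(axis=0) if (assign == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):   # partitions stopped changing
            break
        centers = new_centers
    return centers, assign
```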

  15. Properties of Lloyd's algorithm Guaranteed to converge in a finite number of iterations: the objective decreases monotonically, it reaches a local minimum when the partitions stop changing, and there are only finitely many partitions, so the k-means algorithm must converge. Running time per iteration: assignment step O(NKD); computing cluster means O(ND). Issues with the algorithm: worst-case running time is super-polynomial in the input size; no guarantees about global optimality; optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09]

  16. k-means++ algorithm k-means++ is a way to pick good initial centers; the intuition is to spread out the k initial cluster centers. Initialization: 1. Choose one center uniformly at random among all the points. 2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen. 3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)². 4. Repeat steps 2 and 3 until k centers have been chosen. The algorithm then proceeds normally once the centers are initialized. [Arthur and Vassilvitskii '07]: the approximation quality is O(log k) in expectation. A sketch of this seeding is shown below.
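
A minimal sketch of the k-means++ seeding step (function name is mine):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ seeding: sample each new center with probability
    proportional to D(x)^2, spreading the initial centers out."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]    # step 1: first center uniformly at random
    for _ in range(k - 1):
        # step 2: D(x)^2 = squared distance to the nearest already-chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # step 3: sample the next center with probability proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)               # step 4 done: k centers chosen
```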

  17. k-means for image segmentation Grouping pixels based on intensity similarity (shown for K=2 and K=3); feature space: intensity value (1D)
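
For example, intensity-based segmentation can be sketched by reusing the `lloyd_kmeans` toy from above on a 1-D feature space (`img` here is a hypothetical H×W grayscale array, not data from the slides):

```python
def segment_by_intensity(img, k):
    # feature space: intensity value (1D), one row per pixel
    pixels = img.reshape(-1, 1).astype(float)
    centers, assign = lloyd_kmeans(pixels, k)
    # paint each pixel with its cluster's mean intensity
    return centers[assign].reshape(img.shape)
```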

  18. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  19. Clustering Evaluation (Classification: accuracy, recall, precision, F-score) Greedy mapping: one-to-one Optimistic mapping: many-to-one Rigorous/information theoretic: V-measure

  20. Clustering Evaluation: One-to-One Each modeled cluster can map to at most one gold tag type, and vice versa. Greedily select the mapping that maximizes accuracy
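
A small sketch of that greedy one-to-one scoring (names are mine):

```python
from collections import Counter

def greedy_one_to_one_accuracy(clusters, golds):
    """Repeatedly take the (cluster, gold tag) pair with the largest overlap,
    using each cluster and each tag at most once, then score accuracy."""
    overlap = Counter(zip(clusters, golds))
    used_k, used_c, correct = set(), set(), 0
    for (k, c), count in overlap.most_common():   # largest overlaps first
        if k not in used_k and c not in used_c:
            used_k.add(k)
            used_c.add(c)
            correct += count
    return correct / len(clusters)
```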

  21. Clustering Evaluation: Many (classes)-to-One (cluster) Each modeled cluster can map to at most one gold tag type, but multiple clusters can map to the same gold tag. For each cluster: select the majority tag
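
The many-to-one score is even simpler, since each cluster independently takes its majority tag (a sketch, names mine):

```python
from collections import Counter, defaultdict

def many_to_one_accuracy(clusters, golds):
    """Each cluster maps to its majority gold tag; several clusters
    may share a tag."""
    by_cluster = defaultdict(Counter)
    for k, c in zip(clusters, golds):
        by_cluster[k][c] += 1
    # count of the majority tag within each cluster
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(clusters)
```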

  22. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness. Entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$

  23. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness. Entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$; entropy(point mass) = 0, entropy(uniform) = log K

  24. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness (k ➔ cluster, c ➔ gold class). Homogeneity: how well does each gold class map to a single cluster? homogeneity = 1 if $H(K, C) = 0$, otherwise $1 - H(C|K)/H(C)$. "In order to satisfy our homogeneity criteria, a clustering must assign only those datapoints that are members of a single class to a single cluster. That is, the class distribution within each cluster should be skewed to a single class, that is, zero entropy." The relative entropy $H(C|K)$ is maximized when a cluster provides no new information on the class grouping → not very homogeneous

  25. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness (k ➔ cluster, c ➔ gold class). Completeness: how well does each learned cluster cover a single gold class? completeness = 1 if $H(K, C) = 0$, otherwise $1 - H(K|C)/H(K)$. "In order to satisfy the completeness criteria, a clustering must assign all of those datapoints that are members of a single class to a single cluster." The relative entropy $H(K|C)$ is maximized when each class is represented (relatively) uniformly across clusters → not very complete

  26. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness (k ➔ cluster, c ➔ gold class). Homogeneity: how well does each gold class map to a single cluster? homogeneity = 1 if $H(K, C) = 0$, otherwise $1 - H(C|K)/H(C)$. Completeness: how well does each learned cluster cover a single gold class? completeness = 1 if $H(K, C) = 0$, otherwise $1 - H(K|C)/H(K)$

  27. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness. Let $a_{ck}$ = # elements of class c in cluster k. Homogeneity (how well does each gold class map to a single cluster?): homogeneity = 1 if $H(K, C) = 0$, otherwise $1 - H(C|K)/H(C)$. Completeness (how well does each learned cluster cover a single gold class?): completeness = 1 if $H(K, C) = 0$, otherwise $1 - H(K|C)/H(K)$, where $H(C|K) = -\sum_k \sum_c \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{c'} a_{c'k}}$ and $H(K|C) = -\sum_c \sum_k \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{k'} a_{ck'}}$
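
Those definitions translate almost directly into code; a sketch computing all three scores from the count matrix (function name and the small `eps` smoothing are mine; the degenerate single-class/single-cluster cases are simply treated as perfect):

```python
import numpy as np

def v_measure(a, eps=1e-12):
    """Homogeneity, completeness, and V from a[c, k] = # elements of
    class c in cluster k (rows = classes, columns = clusters)."""
    a = np.asarray(a, dtype=float)
    N = a.sum()
    p_c, p_k = a.sum(axis=1) / N, a.sum(axis=0) / N   # class / cluster marginals
    H_C = -np.sum(p_c * np.log(p_c + eps))
    H_K = -np.sum(p_k * np.log(p_k + eps))
    p_ck = a / N
    # H(C|K) = -sum_k sum_c p(c,k) log p(c|k); H(K|C) analogously
    H_C_given_K = -np.sum(p_ck * np.log(p_ck / (p_k[None, :] + eps) + eps))
    H_K_given_C = -np.sum(p_ck * np.log(p_ck / (p_c[:, None] + eps) + eps))
    h = 1.0 if H_C < 1e-10 else 1 - H_C_given_K / H_C
    c = 1.0 if H_K < 1e-10 else 1 - H_K_given_C / H_K
    return h, c, 2 * h * c / (h + c)   # V = harmonic mean of h and c
```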

  28. Clustering Evaluation: V-Measure Rosenberg and Hirschberg (2007): harmonic mean of homogeneity and completeness. Worked example, applying $H(C|K)$ and $H(K|C)$ from the previous slide to the counts $a_{ck}$ (rows = classes, columns = clusters):

      a_ck   K=1  K=2  K=3
      c=1     3    1    1
      c=2     1    1    3
      c=3     1    3    1

      Homogeneity = 0.14, Completeness = 0.14, V-Measure = 0.14
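
Checking the worked example with the `v_measure` sketch above (the counts reproduce the slide's value; by the symmetry of this matrix, homogeneity and completeness coincide):

```python
# rows = classes, columns = clusters K=1..3
a = [[3, 1, 1],
     [1, 1, 3],
     [1, 3, 1]]
h, c, v = v_measure(a)
print(h, c, v)   # all three come out to about 0.135, i.e. the 0.14 on the slide
```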

  29. Outline Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor

  30. Clustering using density estimation One issue with k-means is that it is sometimes hard to pick k. The mean shift algorithm instead seeks modes (local maxima) of the density in the feature space, so it automatically determines the number of clusters. The density is estimated with a kernel density estimator; a small bandwidth h implies more modes (a bumpy distribution). A minimal KDE sketch follows.
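
A 1-D kernel density estimator with a Gaussian kernel, to make the bandwidth effect concrete (names are mine):

```python
import numpy as np

def gaussian_kde(x_grid, data, h):
    """f(x) = (1/(n*h)) * sum_i K((x - x_i)/h) with a Gaussian kernel K.
    Small h -> bumpy estimate with many modes; large h -> smooth, few modes."""
    u = (x_grid[:, None] - data[None, :]) / h        # pairwise scaled distances
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    return K.sum(axis=1) / (len(data) * h)
```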

  31. Mean shift algorithm For each point x_i: find m_i, the shifted version of x_i that has climbed to its local centroid (mode); return {m_i}

  32. Mean shift algorithm For each point x_i: set m_i = x_i; while not converged: shift m_i to the weighted average of its neighboring points; return {m_i}
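
A toy implementation of that loop with a Gaussian kernel (a sketch under the assumption of a fixed iteration budget rather than a convergence test; names are mine):

```python
import numpy as np

def mean_shift(X, h, iters=50):
    """Every point hill-climbs to a mode of the kernel density estimate;
    points that land on the same mode form a cluster."""
    M = X.astype(float).copy()        # m_i initialized to x_i
    for _ in range(iters):
        for i in range(len(M)):
            # kernel weights of all data points relative to the current m_i
            w = np.exp(-((X - M[i]) ** 2).sum(axis=1) / (2 * h ** 2))
            # shift m_i to the weighted average of its neighbors
            M[i] = w @ X / w.sum()
    return M   # group near-identical rows of M to read off the clusters
```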
