Clustering, K-Means, and K-Nearest Neighbors
CMSC 678 UMBC
Most slides courtesy Hamed Pirsiavash
Recap from last time: Geometric Rationale of LDiscA & PCA
Objective: to rigidly rotate the axes of the D-dimensional space to new positions (principal axes): axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis D has the lowest variance; the covariance among each pair of principal axes is zero (the principal axes are uncorrelated)
Courtesy Antanas Žilinskas
Project the data via $V$:

$\Sigma = \frac{1}{N}\sum_{i:\, z_i = k} (x_i - \mu)(x_i - \mu)^T$

$\mu = \frac{1}{N}\sum_i x_i$

$V^* = \mathrm{eigvec}(\Sigma)$
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
Basic idea: group together similar instances Example: 2D points
One option: small Euclidean distance (squared). Clustering results are crucially dependent on the measure of similarity (or distance) between the points to be clustered.
Simple clustering: organize elements into k groups
- K-means
- Mean shift
- Spectral clustering
Hierarchical clustering: organize elements into a hierarchy
- Bottom up: agglomerative
- Top down: divisive
image credit: Berkeley segmentation benchmark
Clustering news articles
Clustering queries
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
Data: D-dimensional observations (x1, x2, …, xn). Goal: partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squared distances to each cluster center:

$\arg\min_S \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2$
Initialize k centers by picking k points randomly among all the points.
Repeat till convergence (or max iterations):
- Assign each point to the nearest center (assignment step)
- Estimate the mean of each group (update step)
https://www.csee.umbc.edu/courses/graduate/678/spring18/kmeans/
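The two alternating steps can be sketched in plain Python (a minimal sketch with our own function and variable names; `points` is a list of coordinate tuples):

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate the assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k random points
    clusters = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # partitions unchanged -> converged
            break
        centers = new_centers
    return centers, clusters
```

On two well-separated blobs this recovers the blobs after a handful of iterations, matching the finite-convergence argument on the next slide.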
Guaranteed to converge in a finite number of iterations
A local minimum is reached if the partitions don't change; since there are finitely many partitions, the k-means algorithm must converge.
Running time per iteration:
- Assignment step: O(NKD)
- Computing cluster means: O(ND)
Issues with the algorithm:
- Worst-case running time is super-polynomial in the input size
- No guarantees about global optimality
Optimal clustering even for 2 clusters is NP-hard [Aloise et al., 09]
A way to pick good initial centers. Intuition: spread out the k initial cluster centers. The algorithm then proceeds normally.
[Arthur and Vassilvitskii '07] The approximation quality is O(log k) in expectation.
k-means++ algorithm for initialization:
1. Choose one center uniformly at random among all the points.
2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)².
4. Repeat steps 2 and 3 until k centers have been chosen.
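The seeding steps above can be sketched as follows (our own names; D(x)² is used directly as the sampling weight):

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: spread out the initial centers."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # step 1: uniform at random
    while len(centers) < k:
        # step 2: D(x)^2 = squared distance to the nearest chosen center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # step 3: sample the next center with probability proportional to D(x)^2
        centers.append(rng.choices(points, weights=d2)[0])
    return centers
```

Note that an already-chosen center has D(x)² = 0, so it can never be picked again; the k centers returned are distinct points.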
Grouping pixels based on intensity.
Feature space: intensity value (1D). K=2, K=3.
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
(Classification: accuracy, recall, precision, F-score)
Greedy mapping: one-to-one
Optimistic mapping: many-to-one
Rigorous/information-theoretic: V-measure
Each modeled cluster can map to at most one gold tag type, and vice versa. Greedily select the mapping to maximize accuracy.
Each modeled cluster can map to at most one gold tag type, but multiple clusters can map to the same gold tag. For each cluster: select the majority tag.
Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness

entropy: $H(Y) = -\sum_i p(y_i) \log p(y_i)$
entropy(point mass) = 0 entropy(uniform) = log K
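These two boundary facts are easy to check numerically (a small sketch; the helper name is our own):

```python
import math

def entropy(probs):
    """H(p) = -sum p log p, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

K = 4
point_mass = [1.0, 0.0, 0.0, 0.0]   # all mass on one outcome
uniform = [1.0 / K] * K             # maximally spread out
# entropy(point_mass) is 0; entropy(uniform) is log K
```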
Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness Homogeneity: how well does each gold class map to a single cluster?
$\text{homogeneity} = \begin{cases} 1 & \text{if } H(C, K) = 0 \\ 1 - \frac{H(C \mid K)}{H(C)} & \text{otherwise} \end{cases}$

The conditional entropy $H(C \mid K)$ is maximized when a cluster provides no new information on the class grouping → not very homogeneous.

k → cluster, c → gold class
"In order to satisfy our homogeneity criteria, a clustering must assign only those datapoints that are members of a single class to a single cluster. That is, the class distribution within each cluster should be skewed to a single class, that is, zero entropy."
Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness Completeness: how well does each learned cluster cover a single gold class?
$\text{completeness} = \begin{cases} 1 & \text{if } H(K, C) = 0 \\ 1 - \frac{H(K \mid C)}{H(K)} & \text{otherwise} \end{cases}$

The conditional entropy $H(K \mid C)$ is maximized when each class is represented (relatively) uniformly across clusters → not very complete.

k → cluster, c → gold class
"In order to satisfy the completeness criteria, a clustering must assign all of those datapoints that are members of a single class to a single cluster."
Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness
Homogeneity: how well does each gold class map to a single cluster?
Completeness: how well does each learned cluster cover a single gold class?

$n_{ck}$ = # elements of class c in cluster k

$H(C \mid K) = -\sum_{k \in K} \sum_{c \in C} \frac{n_{ck}}{N} \log \frac{n_{ck}}{\sum_{c'} n_{c'k}}$

$H(K \mid C) = -\sum_{c \in C} \sum_{k \in K} \frac{n_{ck}}{N} \log \frac{n_{ck}}{\sum_{k'} n_{ck'}}$
Rosenberg and Hirschberg (2008): harmonic mean of homogeneity and completeness
Homogeneity: how well does each gold class map to a single cluster?
Completeness: how well does each learned cluster cover a single gold class?

Worked example, counts $n_{ck}$ (rows: gold classes; columns: clusters):

          K=1  K=2  K=3
class 1:   3    1    1
class 2:   1    1    3
class 3:   1    3    1

Homogeneity = Completeness = V-measure = 0.14 (the table is symmetric, so all three coincide)
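The worked example can be checked directly from the formulas (a sketch with our own helper names; rows of `table` are gold classes, columns are clusters):

```python
import math

def entropy_counts(counts, N):
    """Entropy from raw counts; 0 log 0 is taken as 0."""
    return -sum(n / N * math.log(n / N) for n in counts if n > 0)

def v_measure(table):
    """table[c][k] = n_ck = # elements of gold class c in cluster k."""
    N = sum(sum(row) for row in table)
    n_c = [sum(row) for row in table]        # class totals
    n_k = [sum(col) for col in zip(*table)]  # cluster totals
    H_C, H_K = entropy_counts(n_c, N), entropy_counts(n_k, N)
    # Conditional entropies, matching the slide's formulas.
    H_C_given_K = -sum(n / N * math.log(n / n_k[k])
                       for row in table for k, n in enumerate(row) if n > 0)
    H_K_given_C = -sum(n / N * math.log(n / n_c[c])
                       for c, row in enumerate(table) for n in row if n > 0)
    h = 1.0 if H_C == 0 else 1 - H_C_given_K / H_C  # homogeneity
    c = 1.0 if H_K == 0 else 1 - H_K_given_C / H_K  # completeness
    return 2 * h * c / (h + c), h, c

# The slide's worked example: three gold classes, three clusters.
v, h, c = v_measure([[3, 1, 1], [1, 1, 3], [1, 3, 1]])
```

The degenerate case is guarded with H(C) = 0 (respectively H(K) = 0) here, which covers the slide's H(C, K) = 0 condition for this kind of table.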
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
One issue with k-means is that it is sometimes hard to pick k. The mean shift algorithm seeks modes, i.e., local maxima of density in the feature space. Mean shift automatically determines the number of clusters.
Kernel density estimator Small h implies more modes (bumpy distribution)
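The bandwidth effect can be seen in a one-dimensional Gaussian kernel density estimator (a minimal sketch; function name and data are our own):

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) / norm
               for xi in data) / len(data)

data = [0.0, 5.0]
# Small h: two separate modes, with a valley between the points.
# Large h: the two bumps merge into a single mode near the midpoint.
```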
For each point $x_i$:
    set $m_i = x_i$
    while not converged:
        compute the weighted average of neighboring points:
        $m_i = \frac{\sum_{x_j \in N(x_i)} x_j K(m_i, x_j)}{\sum_{x_j \in N(x_i)} K(m_i, x_j)}$
return $\{m_i\}$
self-clustering based on the kernel (similarity to other points)
Pros:
- Does not assume any particular shape for clusters
- Generic technique
- Finds multiple modes
- Parallelizable
Cons:
- Slow: O(DN²) per iteration
- Does not work well for high-dimensional features
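A one-dimensional sketch of the procedure with a Gaussian kernel (our own names; here every point is treated as a neighbor, with the kernel downweighting distant ones):

```python
import math

def mean_shift(points, h=1.0, iters=50):
    """1-D mean shift: each point iteratively climbs to a density mode."""
    modes = []
    for x in points:
        m = x
        for _ in range(iters):
            # weighted average of the points, weighted by similarity to m
            w = [math.exp(-0.5 * ((m - xj) / h) ** 2) for xj in points]
            m = sum(wi * xj for wi, xj in zip(w, points)) / sum(w)
        modes.append(m)
    return modes
```

Points that climb to (approximately) the same mode form one cluster, so the number of clusters falls out of the data rather than being fixed in advance.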
http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
[Shi & Malik β00; Ng, Jordan, Weiss NIPS β01]
Group points based on the links in a graph.
How do we create the graph?
- Weights on the edges are based on similarity between the points; a common choice is the Gaussian kernel
- One could create a fully connected graph, or a k-nearest-neighbor graph (each node is connected only to its k nearest neighbors)
Slide courtesy Alan Fern
Consider a partition of the graph into two parts A and B. Cut(A, B) is the weight of all edges that connect the two groups. An intuitive goal is to find a partition that minimizes the cut; min-cuts in graphs can be computed in polynomial time.
The weight of a cut is proportional to the number of edges in the cut; minimizing it tends to produce small, isolated components.
[Shi & Malik, 2000 PAMI]
We would like a balanced cut
Let W(i, j) denote the matrix of the edge weights. The degree of node i in the graph is $d(i) = \sum_j W(i, j)$. The volume of a set A is defined as $\mathrm{Vol}(A) = \sum_{i \in A} d(i)$.
Normalized cut measures the connectivity between the groups relative to the volume of each group: $\mathrm{NCut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{Cut}(A, B)}{\mathrm{Vol}(B)}$. Minimizing normalized cut is NP-hard even for planar graphs [Shi & Malik, 00].
minimized when Vol(A) = Vol(B) β a balanced cut
W: the similarity matrix
D: a diagonal matrix with D(i, i) = d(i), the degree of node i
y: a vector in $\{1, -b\}^N$ with y(i) = 1 ⟺ i ∈ A (the two values allow for a differing penalty on each group)
The matrix (D − W) is called the Laplacian of the graph.
Normalized cuts objective: $\min_y \frac{y^T (D - W) y}{y^T D y}$. Relax the integer constraint on y; this is the same as solving $(D - W) y = \lambda D y$ (a generalized eigenvalue problem). The first eigenvector is $y_1 = \mathbf{1}$, with corresponding eigenvalue 0. The eigenvector corresponding to the second smallest eigenvalue is the solution to the relaxed problem.
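The relaxed problem can be made concrete on a toy graph (a sketch assuming NumPy; the graph and all names are our own):

```python
import numpy as np

# Two triangles (nodes 0-2 and 3-5) joined by one weak edge; a balanced
# cut should separate them. W is the symmetric similarity (weight) matrix.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                  # the weak link between the groups

d = W.sum(axis=1)                        # node degrees
L = np.diag(d) - W                       # graph Laplacian D - W
# Solve (D - W) y = lambda * D y via the symmetrically normalized Laplacian.
D_inv_sqrt = np.diag(d ** -0.5)
vals, vecs = np.linalg.eigh(D_inv_sqrt @ L @ D_inv_sqrt)
y = D_inv_sqrt @ vecs[:, 1]              # eigenvector of 2nd-smallest eigenvalue
partition = y > 0                        # threshold to recover the two groups
```

Thresholding the second eigenvector at zero assigns the two triangles to opposite sides, cutting only the weak edge.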
Slides courtesy Subhransu Maji (UMASS), CMPSCI 689
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
Agglomerative: a βbottom upβ approach where elements start as individual clusters and clusters are merged as one moves up the hierarchy Divisive: a βtop downβ approach where elements start as a single cluster and clusters are split as one moves down the hierarchy
Agglomerative clustering: first merge very similar instances, incrementally building larger clusters out of smaller ones.
Algorithm:
- Maintain a set of clusters
- Initially, each instance is in its own cluster
- Repeat: pick the two "closest" clusters and merge them into a new cluster
- Stop when there's only one cluster left
Produces not one clustering, but a family of clusterings represented by a dendrogram
How should we define "closest" for clusters with multiple elements?
- Closest pair: single-link clustering
- Farthest pair: complete-link clustering
- Average of all pairs
Different choices create different clustering behaviors
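A minimal sketch of the merge loop (our own names; the `linkage` argument switches the behavior: `min` over cross-cluster pair distances gives single-link, `max` gives complete-link):

```python
import math

def agglomerative(points, linkage=min):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Returns the list of merges (the dendrogram), each as
    (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # cluster-to-cluster distance under the chosen linkage
                dist = linkage(math.dist(a, b)
                               for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

Cutting the returned merge list at any distance threshold yields one clustering from the family the dendrogram represents.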
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor
Will Alice like the movie? Alice and James are similar; James likes the movie → Alice must/might also like the movie.
Represent data as vectors of feature values; find the closest (Euclidean norm) points.
Training data is in the form of (attributes, label) pairs.
Fruit data:
- label: {apples, oranges, lemons}
- attributes: {width, height}
Test examples: (a, b) → lemon? (c, d) → apple?
Take majority vote among the k nearest neighbors
What is the effect of k?
Choice of features: we are assuming that all features are equally important. What happens if we scale one of the features by a factor of 100?
Choice of distance function: Euclidean, cosine similarity (angle), Gaussian, etc. Should the coordinates be independent?
Choice of k.
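A majority-vote kNN classifier in the fruit setting can be sketched as follows (our own names; the training measurements are hypothetical (width, height) pairs, not from the slide):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest (Euclidean) training examples.

    train is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical fruit measurements (width, height), in the spirit of the slide.
train = [((6.0, 6.0), "apple"), ((6.5, 6.2), "apple"),
         ((5.0, 8.0), "lemon"), ((4.8, 8.5), "lemon"),
         ((7.0, 7.0), "orange")]
```

Scaling one feature by 100 would change which neighbors are "nearest" here, which is exactly the feature-weighting issue noted above.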
To synthesize pixel x: find all the windows in the image that match the neighborhood of x, pick one matching window at random, and assign x to be the center pixel of that window.
An exact match might not be present, so find the best matches using Euclidean distance and randomly choose between them, preferring better matches with higher probability
input image → synthesized image
Slide from Alyosha Efros, ICCV 1999
βScene completion using millions of photographsβ, Hayes and Efros, TOG 2007
Nearest neighbors
Time taken by kNN for N points of D dimensions:
- time to compute distances: O(ND)
- time to find the k nearest neighbors:
  - O(kN): repeated minima
  - O(N log N): sorting
  - O(N + k log N): min-heap
  - O(N + k log k): fast median (selection)
Total time is dominated by the distance computation.
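The O(N + k log N) min-heap line maps directly onto `heapq`: heapify is O(N), and each of the k pops costs O(log N) (a sketch with our own names):

```python
import heapq
import math

def k_nearest(points, query, k):
    """Return the k nearest points as (distance, point) pairs."""
    dists = [(math.dist(p, query), p) for p in points]  # O(ND) distance pass
    heapq.heapify(dists)                                # O(N)
    return [heapq.heappop(dists) for _ in range(k)]     # O(k log N)
```

As the slide notes, the O(ND) distance pass dominates regardless of which selection strategy follows it.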
We can be faster if we are willing to sacrifice exactness
How many neighborhoods are there? With 10 bins per dimension: d = 2 → #bins = 10² (a 10×10 grid); d = 1000 → #bins = 10¹⁰⁰⁰. For comparison, atoms in the universe: ~10⁸⁰.
Clustering basics K-means: basic algorithm & extensions Cluster evaluation Non-parametric mode finding: density estimation Graph & spectral clustering Hierarchical clustering K-Nearest Neighbor