Chapter 7: Clustering (Unsupervised Data Organization)

7.1 Hierarchical Clustering
7.2 Flat Clustering
7.3 Embedding into Vector Space for Visualization
7.4 Applications

Clustering: unsupervised grouping (partitioning) of objects into classes (clusters) of similar objects
Clustering Example 1 (figure slide)

Clustering Example 2 (figure slide)
Clustering Search Results for Visualization and Navigation
http://www.grokker.com/
Example for Hierarchical Clustering (dendrogram figure, developed over three slides)
Clustering: Classification based on Unsupervised Learning

given: n m-dimensional data records $d_j \in D \subseteq dom(A_1) \times \ldots \times dom(A_m)$ with attributes $A_i$ (e.g. term frequency vectors $\subseteq \mathbb{N}_0 \times \ldots \times \mathbb{N}_0$), or n data points with pair-wise distances (similarities) in a metric space

wanted: k clusters $c_1, \ldots, c_k$ and an assignment $D \rightarrow \{c_1, \ldots, c_k\}$ such that
the average intra-cluster similarity
$$\frac{1}{k} \sum_{j=1}^{k} \frac{1}{|c_j|} \sum_{d \in c_j} sim(\vec{d}, \vec{c}_j)$$
is high and the average inter-cluster similarity
$$\frac{1}{k(k-1)} \sum_{i \neq j} sim(\vec{c}_i, \vec{c}_j)$$
is low, where the centroid of $c_j$ is
$$\vec{c}_j = \frac{1}{|c_j|} \sum_{d \in c_j} \vec{d}$$
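A minimal Python sketch (not part of the original slides) that evaluates the two objectives above with cosine similarity; NumPy is assumed to be available, and the function names are illustrative:

import numpy as np

def cosine(x, y):
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def avg_intra_similarity(clusters):
    """clusters: list of (n_j x m) numpy arrays, one per cluster."""
    total = 0.0
    for c in clusters:
        centroid = c.mean(axis=0)                     # centroid of the cluster
        total += sum(cosine(d, centroid) for d in c) / len(c)
    return total / len(clusters)                      # average over k clusters

def avg_inter_similarity(clusters):
    centroids = [c.mean(axis=0) for c in clusters]
    k = len(centroids)
    total = sum(cosine(centroids[i], centroids[j])
                for i in range(k) for j in range(k) if i != j)
    return total / (k * (k - 1))                      # average over ordered pairs

A good clustering should score high on the first function and low on the second.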
Desired Clustering Properties

A clustering function f_d maps a dataset D onto a partitioning Γ ⊆ 2^D of D, with pairwise disjoint members of Γ and $\bigcup_{S \in \Gamma} S = D$, based on a (metric or non-metric) distance function d: D × D → $\mathbb{R}_0^+$ which is symmetric and satisfies d(x,y) = 0 ⇔ x = y.

Axiom 1: Scale-Invariance
For any distance function d and any α > 0: $f_d(x) = f_{\alpha d}(x)$ for all x ∈ D

Axiom 2: Richness (Expressiveness)
For every possible partitioning Γ of D there is a distance function d such that f_d produces Γ.

Axiom 3: Consistency
d' is a Γ-transformation of d if for all x, y in the same S ∈ Γ: d'(x,y) ≤ d(x,y), and for all x, y in different S, S' ∈ Γ: d'(x,y) ≥ d(x,y). If f_d produces Γ, then f_{d'} produces Γ, too.

Impossibility Theorem (J. Kleinberg: NIPS 2002):
For each dataset D with |D| ≥ 2 there is no clustering function f that satisfies Axioms 1, 2, and 3 for every possible choice of d.
Hierarchical vs. Flat Clustering

Hierarchical Clustering:
• detailed and insightful
• hierarchy built in a natural manner from fairly simple algorithms
• relatively expensive
• no prevalent algorithm

Flat Clustering:
• data overview & coarse analysis
• level of detail depends on the choice of the number of clusters
• relatively efficient
• K-Means and EM are simple standard algorithms
7.1 Hierarchical Clustering: Agglomerative Bottom-up Clustering (HAC)

Principle:
• start with each d_i forming its own singleton cluster c_i
• in each iteration combine the two most similar clusters c_i, c_j into a new, single cluster

for i:=1 to n do c_i := {d_i} od;
C := {c_1, ..., c_n}; /* set of clusters */
while |C| > 1 do
  determine c_i, c_j ∈ C with maximal inter-cluster similarity;
  C := C − {c_i, c_j} ∪ {c_i ∪ c_j};
od;
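A direct Python transcription of the loop above (a sketch, not from the slides); cluster_sim is any inter-cluster similarity function, e.g. one of the metrics defined later in this section:

def hac(docs, cluster_sim):
    clusters = [frozenset([i]) for i in range(len(docs))]   # singleton clusters
    merges = []                        # records the dendrogram bottom-up
    while len(clusters) > 1:
        # find the pair with maximal inter-cluster similarity -- O(|C|^2)
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges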
Divisive Top-down Clustering

Principle:
• start with a single cluster that contains all data records
• in each iteration identify the least "coherent" cluster and divide it into two new clusters

c_1 := {d_1, ..., d_n};
C := {c_1}; /* set of clusters */
while there is a cluster c_i ∈ C with |c_i| > 1 do
  determine c_i with the lowest intra-cluster similarity;
  partition c_i into c_i1 and c_i2 (i.e. c_i = c_i1 ∪ c_i2 and c_i1 ∩ c_i2 = ∅)
  such that the inter-cluster similarity between c_i1 and c_i2 is minimized;
  C := C − {c_i} ∪ {c_i1, c_i2};
od;

For partitioning a cluster one can use another clustering method (e.g. a bottom-up method).
Alternative Similarity Metrics for Clusters

given: similarity on data records – sim: D × D → ℝ or [0,1]
define: similarity between clusters – sim: 2^D × 2^D → ℝ or [0,1]

Alternatives:
• Centroid method: sim(c,c') = sim(d̄, d̄') with centroid d̄ of c and centroid d̄' of c'
• Single-Link method: sim(c,c') = sim(d, d') with d ∈ c, d' ∈ c' such that d and d' have the highest similarity
• Complete-Link method: sim(c,c') = sim(d, d') with d ∈ c, d' ∈ c' such that d and d' have the lowest similarity
• Group-Average method: $sim(c,c') = \frac{1}{|c| \cdot |c'|} \sum_{d \in c,\, d' \in c'} sim(d,d')$

For hierarchical clustering the following axiom must hold:
max {sim(c,c'), sim(c,c'')} ≥ sim(c, c' ∪ c'') for all c, c', c'' ∈ 2^D
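These four metrics map directly onto SciPy's linkage methods; the sketch below assumes SciPy and NumPy are installed and uses distances rather than similarities (SciPy's convention):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 5)                    # 20 records, 5 attributes
Z_single   = linkage(X, method='single')     # nearest neighbor
Z_complete = linkage(X, method='complete')   # farthest neighbor
Z_average  = linkage(X, method='average')    # group average
Z_centroid = linkage(X, method='centroid')   # centroid method
labels = fcluster(Z_complete, t=4, criterion='maxclust')  # cut into 4 clusters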
Example for Bottom-up Clustering with Single-Link Metric (Nearest Neighbor)

run-time: O(n²) with space O(n²)

(figure: eight points a–h on a 2D grid and the resulting merge order)

emphasizes "local" cluster coherence (chaining effect)
→ tendency towards long, drawn-out clusters
Example for Bottom-up Clustering with Complete-Link Metric (Farthest Neighbor)

run-time: O(n² log n) with space O(n²)

(figure: the same eight points a–h, merged in a different order)

emphasizes "global" cluster coherence
→ tendency towards round clusters with small diameter
Relationship to Graph Algorithms

Single-Link clustering:
• corresponds to the construction of a maximum (minimum) spanning tree for the undirected, weighted graph G = (V,E) with V = D, E = D × D and edge weight sim(d,d') (dist(d,d')) for (d,d') ∈ E
• from the spanning tree the cluster hierarchy can be derived by recursively removing the lowest-similarity (longest-distance) edge

Single-Link clustering is related to the problem of finding maximal connected components in a graph that contains only those edges (d,d') for which sim(d,d') is above some threshold.

Complete-Link clustering is related to the problem of finding maximal cliques in a graph.
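A sketch of the MST view (assumed helper code, not from the slides): running Kruskal's algorithm on the distance-weighted edges performs the single-link merges in order of increasing distance, since each union step joins the two currently nearest clusters:

def single_link_via_mst(points, dist):
    parent = list(range(len(points)))          # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path compression
            x = parent[x]
        return x
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(len(points))
                   for j in range(i + 1, len(points)))
    merges = []
    for w, i, j in edges:                      # Kruskal: shortest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append((i, j, w))           # MST edge == single-link merge
    return merges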
Bottom-up Clustering with Group-Average Metric (1)

The merge step combines those clusters c_i and c_j for which the intra-cluster similarity of c := c_i ∪ c_j becomes maximal:
$$S(c) := \frac{1}{|c| \cdot (|c|-1)} \sum_{d, d' \in c,\, d \neq d'} sim(d,d')$$

A naive implementation has run-time O(n³): n−1 merge steps, each with O(n²) computations.
Bottom-up Clustering with Group-Average Metric (2)

Efficient implementation – with total run-time O(n²) – for cosine similarity with length-normalized vectors, i.e. using the scalar product for sim:

precompute the similarity of all document pairs, and compute for each cluster after every merge step
$$\vec{s}(c) := \sum_{d \in c} \vec{d}$$

Then:
$$S(c_i \cup c_j) = \frac{\left(\vec{s}(c_i) + \vec{s}(c_j)\right) \cdot \left(\vec{s}(c_i) + \vec{s}(c_j)\right) - \left(|c_i| + |c_j|\right)}{\left(|c_i| + |c_j|\right) \cdot \left(|c_i| + |c_j| - 1\right)}$$

Thus each merge step can be carried out in constant time.
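A sketch of this O(1) merge evaluation (NumPy assumed; all document vectors must be unit length). Because each d⃗·d⃗ = 1, the self-similarity of the n members contributes exactly n to s⃗·s⃗, which is why it is subtracted:

import numpy as np

def group_average_sim(s_i, n_i, s_j, n_j):
    """S(c_i ∪ c_j) from the cluster vector sums and sizes alone."""
    s = s_i + s_j                       # vector sum of the merged cluster
    n = n_i + n_j                       # size of the merged cluster
    return (float(s @ s) - n) / (n * (n - 1))

After a merge, the new cluster's bookkeeping is simply s = s_i + s_j and n = n_i + n_j, so the whole algorithm stays within O(n²) total.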
Cluster Quality Measures (1)

With regard to ground truth: known class labels L_1, …, L_g for data points d_1, …, d_n: L(d_i) = L_j ∈ {L_1, …, L_g}, and cluster assignment Γ(d_1), …, Γ(d_n) ∈ {c_1, …, c_k}

Cluster c_j has purity
$$purity(c_j) = \max_{\nu = 1..g} \left| \{ d \in c_j \mid L(d) = L_\nu \} \right| \,/\, |c_j|$$
The complete clustering has purity
$$\frac{1}{k} \sum_{j=1..k} purity(c_j)$$

Alternatives:
• Entropy within cluster c_j:
$$- \sum_{\nu = 1..g} \frac{|c_j \cap L_\nu|}{|c_j|} \log_2 \frac{|c_j \cap L_\nu|}{|c_j|}$$
• Mutual information between clusters and classes:
$$\sum_{c \in \{c_1,...,c_k\},\; L \in \{L_1,...,L_g\}} \frac{|c \cap L|}{n} \log_2 \frac{|c \cap L| \,/\, n}{(|c|/n) \cdot (|L|/n)}$$
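A sketch of the purity computation as defined above (labels may be any hashable values; function name is illustrative):

from collections import Counter

def purity(cluster_labels, true_labels):
    """Average purity over clusters for parallel label lists."""
    clusters = {}
    for c, l in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(l)
    per_cluster = [Counter(ls).most_common(1)[0][1] / len(ls)   # majority share
                   for ls in clusters.values()]
    return sum(per_cluster) / len(per_cluster)

# e.g. purity([0,0,0,1,1,1], ['a','a','b','b','b','b']) == (2/3 + 1) / 2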
Cluster Quality Measures (2)

Without any ground truth: ratio of intra-cluster to inter-cluster similarities
$$\left( \frac{1}{k} \sum_{j=1}^{k} \frac{1}{|c_j|} \sum_{d \in c_j} sim(\vec{d}, \vec{c}_j) \right) \Bigg/ \left( \frac{1}{k(k-1)} \sum_{i \neq j} sim(\vec{c}_i, \vec{c}_j) \right)$$

or other cluster validity measures of this kind (e.g. considering the variance of intra- and inter-cluster distances)
7.2 Flat Clustering: Simple Single-Pass Method

given: data records d_1, ..., d_n
wanted: (up to) k clusters C := {c_1, ..., c_k}

C := {{d_1}}; /* random choice for the first cluster */
for i:=2 to n do
  determine the cluster c_j ∈ C with the largest value of sim(d_i, c_j)
    (e.g. sim(d_i, c̄_j) with centroid c̄_j);
  if sim(d_i, c_j) ≥ threshold
    then assign d_i to cluster c_j
    else if |C| < k
      then C := C ∪ {{d_i}}; /* create new cluster */
      else assign d_i to cluster c_j
    fi
  fi
od
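A sketch of this single-pass method with cosine similarity against running centroids (not from the slides; docs is assumed to be a list of NumPy vectors, threshold and k are the pseudocode's parameters):

import numpy as np

def single_pass(docs, k, threshold):
    centroids, members = [docs[0].copy()], [[0]]    # first record seeds cluster 1
    for i in range(1, len(docs)):
        sims = [float(docs[i] @ c) / (np.linalg.norm(docs[i]) * np.linalg.norm(c))
                for c in centroids]
        j = int(np.argmax(sims))                    # most similar cluster
        if sims[j] >= threshold or len(centroids) >= k:
            members[j].append(i)                    # assign and update centroid
            centroids[j] = np.mean([docs[m] for m in members[j]], axis=0)
        else:
            centroids.append(docs[i].copy())        # create new cluster
            members.append([i])
    return members

Note that the result depends on the input order, which is the price paid for making only one pass over the data.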
K-Means Method for Flat Clustering (1)

Idea:
• determine k prototype vectors, one for each cluster
• assign each data record to the most similar prototype vector and compute new prototype vectors (e.g. by averaging over the vectors assigned to a prototype)
• iterate until clusters are sufficiently stable

randomly choose k prototype vectors c̄_1, ..., c̄_k;
while not yet sufficiently stable do
  for i:=1 to n do
    assign d_i to the cluster c_j whose prototype c̄_j is most similar
    (i.e. for which dist(d_i, c̄_j) is minimal)
  od;
  for j:=1 to k do c̄_j := (1/|c_j|) Σ_{d ∈ c_j} d⃗ od;
od;
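A minimal K-Means sketch matching the pseudocode, with Euclidean distance as the dissimilarity (NumPy assumed; a production version would add smarter seeding such as k-means++):

import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=k, replace=False)]  # random seeds
    for _ in range(n_iter):
        # assignment step: nearest prototype per record
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # update step: new prototype = mean of the assigned records
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else prototypes[j] for j in range(k)])
        if np.allclose(new, prototypes):
            break                      # sufficiently stable
        prototypes = new
    return assign, prototypes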