1. Hierarchical and Ensemble Clustering. Ke Chen. Reading: [7.8-7.10, EA], [25.5, KPM], [Fred & Jain, 2005]

2. Outline
• Introduction
• Cluster Distance Measures
• Agglomerative Algorithm
• Example and Demo
• Key Concepts in Hierarchical Clustering
• Clustering Ensemble via Evidence Accumulation
• Summary

3. Introduction
• Hierarchical Clustering Approach
  – A typical clustering analysis approach that partitions the data set sequentially
  – Constructs nested partitions layer by layer by grouping objects into a tree of clusters (without the need to know the number of clusters in advance)
  – Uses a (generalised) distance matrix as the clustering criterion
• Agglomerative vs. Divisive
  – Agglomerative: a bottom-up strategy. Initially each data object is its own (atomic) cluster; these atomic clusters are then merged into larger and larger clusters.
  – Divisive: a top-down strategy. Initially all objects are in one single cluster, which is then subdivided into smaller and smaller clusters.
• Clustering Ensemble
  – Uses multiple clustering results for robustness, overcoming the weaknesses of single clustering algorithms.

4. Introduction: Illustration
• Illustrative example: agglomerative vs. divisive clustering on the data set {a, b, c, d, e}.
  [Figure: agglomerative merging proceeds from step 0 (five singleton clusters) to step 4 (one cluster {a, b, c, d, e}); divisive splitting runs the same steps in reverse. The two design choices highlighted are the cluster distance measure and the termination condition.]

5. Cluster Distance Measures
• Single link: smallest distance (min) between an element in one cluster and an element in the other, i.e. d(C_i, C_j) = min{ d(x_ip, x_jq) }
• Complete link: largest distance (max) between an element in one cluster and an element in the other, i.e. d(C_i, C_j) = max{ d(x_ip, x_jq) }
• Average: average distance between elements in one cluster and elements in the other, i.e. d(C_i, C_j) = avg{ d(x_ip, x_jq) }
Note that d(C, C) = 0 for any cluster C.
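These three measures are easy to state in code. A minimal sketch (NumPy assumed; function names such as single_link are illustrative, not from the slides), where each cluster is a list of feature vectors:

```python
import numpy as np

def single_link(Ci, Cj):
    """Smallest pairwise distance between the two clusters (min)."""
    return min(np.linalg.norm(x - y) for x in Ci for y in Cj)

def complete_link(Ci, Cj):
    """Largest pairwise distance between the two clusters (max)."""
    return max(np.linalg.norm(x - y) for x in Ci for y in Cj)

def average_link(Ci, Cj):
    """Mean of all pairwise distances between the two clusters."""
    return np.mean([np.linalg.norm(x - y) for x in Ci for y in Cj])
```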

6. Cluster Distance Measures: Example
Given a data set of five objects characterised by a single continuous feature, assume there are two clusters, C1 = {a, b} and C2 = {c, d, e}, with feature values a = 1, b = 2, c = 4, d = 5, e = 6.

1. Calculate the distance matrix:

        a   b   c   d   e
    a   0   1   3   4   5
    b   1   0   2   3   4
    c   3   2   0   1   2
    d   4   3   1   0   1
    e   5   4   2   1   0

2. Calculate the three cluster distances between C1 and C2:

   Single link:
   dist(C1, C2) = min{ d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e) }
                = min{ 3, 4, 5, 2, 3, 4 } = 2

   Complete link:
   dist(C1, C2) = max{ d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e) }
                = max{ 3, 4, 5, 2, 3, 4 } = 5

   Average:
   dist(C1, C2) = ( d(a,c) + d(a,d) + d(a,e) + d(b,c) + d(b,d) + d(b,e) ) / 6
                = ( 3 + 4 + 5 + 2 + 3 + 4 ) / 6 = 21 / 6 = 3.5
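The numbers above can be checked directly. A minimal Python sketch using the feature values from the slide (variable names are illustrative):

```python
# 1-D feature values from the slide: a=1, b=2, c=4, d=5, e=6
C1 = [1, 2]        # cluster C1 = {a, b}
C2 = [4, 5, 6]     # cluster C2 = {c, d, e}

pairwise = [abs(x - y) for x in C1 for y in C2]   # {3, 4, 5, 2, 3, 4}
print(min(pairwise))                  # single link   -> 2
print(max(pairwise))                  # complete link -> 5
print(sum(pairwise) / len(pairwise))  # average       -> 3.5
```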

7. Agglomerative Algorithm
• The agglomerative algorithm is carried out in three steps:
  1) Convert all object features into a distance matrix.
  2) Set each object as a cluster (so with N objects we start with N clusters).
  3) Repeat until the number of clusters is one (or a known number of clusters):
     – merge the two closest clusters;
     – update the distance matrix.
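A compact sketch of these three steps, using the single-link measure in plain Python/NumPy (a deliberately naive loop for clarity, with illustrative names, not an efficient implementation):

```python
import numpy as np

def agglomerative_single_link(X, k=1):
    """X: (n, d) data matrix; merge until k clusters remain."""
    # Step 1: distance matrix from object features (Euclidean)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 2: every object starts in its own (atomic) cluster
    clusters = [[i] for i in range(len(X))]
    # Step 3: repeatedly merge the two closest clusters
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])   # merge cluster b into cluster a
        del clusters[b]                   # update the set of clusters
    return clusters
```

On the five 1-D points of slide 6 (X = np.array([[1.], [2.], [4.], [5.], [6.]]), k=2) this sketch returns the clusters {a, b} and {c, d, e}, consistent with the single-link result computed there.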

8. Example
• Problem: clustering analysis with the agglomerative algorithm.
  [Figure: a data matrix is converted into a distance matrix using Euclidean distance.]

9. Example
• Merge the two closest clusters (iteration 1)

10. Example
• Update the distance matrix (iteration 1)

11. Example
• Merge the two closest clusters (iteration 2)

12. Example
• Update the distance matrix (iteration 2)

13. Example
• Merge the two closest clusters and update the distance matrix (iteration 3)

14. Example
• Merge the two closest clusters and update the distance matrix (iteration 4)

15. Example
• Final result (termination condition met)
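For reference, the same merge-and-update procedure is available off the shelf. A hedged sketch using scipy.cluster.hierarchy (the slides' actual data matrix lives in the figures and is not reproduced in the text, so the points below are invented purely for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Illustrative data only; not the data set used in the slides' figures.
X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 4.0],
              [3.5, 4.5], [6.0, 1.0], [6.5, 1.5]])

Z = linkage(X, method='single', metric='euclidean')
print(Z)             # each row: the two clusters merged and the merge distance
# dendrogram(Z)      # uncomment to draw the tree (requires matplotlib)
```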

16. Key Concepts in Hierarchical Clustering
• Dendrogram tree representation (a sketch reconstructing this tree with SciPy follows the list):
  1. In the beginning we have 6 clusters: A, B, C, D, E and F.
  2. We merge clusters D and F into (D, F) at distance 0.50.
  3. We merge clusters A and B into (A, B) at distance 0.71.
  4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00.
  5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
  6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
  7. The last cluster contains all the objects, which concludes the computation.
  [Figure: the dendrogram, with lifetime (merge distance) on the vertical axis and the objects on the horizontal axis.]
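The merge sequence above fully determines the dendrogram. A small sketch that encodes it as a SciPy linkage matrix (row format: cluster i, cluster j, merge distance, size of the new cluster; SciPy and matplotlib assumed) and draws the tree:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt

# Objects A..F are indices 0..5; newly formed clusters get indices 6, 7, ...
Z = np.array([[3, 5, 0.50, 2],    # D + F            -> cluster 6
              [0, 1, 0.71, 2],    # A + B            -> cluster 7
              [4, 6, 1.00, 3],    # E + (D, F)       -> cluster 8
              [2, 8, 1.41, 4],    # C + ((D, F), E)  -> cluster 9
              [7, 9, 2.50, 6]])   # (A, B) + rest    -> all 6 objects

dendrogram(Z, labels=list("ABCDEF"))
plt.ylabel("lifetime (merge distance)")
plt.show()
```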

17. Key Concepts in Hierarchical Clustering
• Lifetime vs. K-cluster lifetime (a short sketch computing these values follows)
  – Lifetime: the distance between the point at which a cluster is created and the point at which it disappears (merges with other clusters during clustering).
    e.g. the lifetimes of A, B, C, D, E and F are 0.71, 0.71, 1.41, 0.50, 1.00 and 0.50 respectively; the lifetime of (A, B) is 2.50 - 0.71 = 1.79, and so on.
  – K-cluster lifetime: the distance from the point at which K clusters emerge to the point at which they vanish (due to the reduction to K-1 clusters).
    e.g. 5-cluster lifetime: 0.71 - 0.50 = 0.21
         4-cluster lifetime: 1.00 - 0.71 = 0.29
         3-cluster lifetime: 1.41 - 1.00 = 0.41
         2-cluster lifetime: 2.50 - 1.41 = 1.09
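A minimal sketch that computes these K-cluster lifetimes from the merge distances listed above and applies the maximum-lifetime heuristic from slide 19 (plain Python; names are illustrative):

```python
# Merge distances from the dendrogram, in order (6 objects -> 5 merges)
merge_dist = [0.50, 0.71, 1.00, 1.41, 2.50]

n = len(merge_dist) + 1                      # number of objects (6)
# After the i-th merge there are n - 1 - i clusters; that partition lives
# until the next merge, so its lifetime is the gap between merge distances.
lifetimes = {n - 1 - i: round(merge_dist[i + 1] - merge_dist[i], 2)
             for i in range(len(merge_dist) - 1)}
print(lifetimes)   # {5: 0.21, 4: 0.29, 3: 0.41, 2: 1.09}

best_k = max(lifetimes, key=lifetimes.get)
print(best_k)      # 2 (the largest K-cluster lifetime)
```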

18. Demo: agglomerative clustering demo

19. Relevant Issues
• How to determine the number of clusters
  – If the number of clusters is known, the termination condition is given!
  – The K-cluster lifetime is the range of threshold values on the dendrogram tree that leads to the identification of K clusters.
  – Heuristic rule: cut the dendrogram tree at the maximum K-cluster lifetime to find a "proper" K.
• Major weaknesses of agglomerative clustering methods
  – Can never undo what was done previously.
  – Sensitive to cluster distance measures and to noise/outliers.
  – Less efficient: O(n² log n), where n is the total number of objects.
• There are several variants that overcome these weaknesses
  – BIRCH: scalable to large data sets.
  – ROCK: clustering categorical data.
  – CHAMELEON: hierarchical clustering using dynamic modelling.

20. Clustering Ensemble
• Motivation
  – A single clustering algorithm may be affected by various factors:
      sensitive to initialisation and to noise/outliers, e.g. K-means is sensitive to its initial centroids;
      sensitive to the distance metric, yet a proper one is hard to find;
      hard to decide on a single best algorithm that can handle all types of cluster shapes and sizes.
  – An effective treatment: the clustering ensemble, which utilises the results obtained by multiple clustering analyses for robustness.

21. Clustering Ensemble
• Clustering ensemble via evidence accumulation (Fred & Jain, 2005)
  – A simple clustering ensemble algorithm that overcomes the main weaknesses of different clustering methods by exploiting their synergy via evidence accumulation.
• Algorithm summary (a sketch of the full pipeline follows below)
  – Initial clustering analysis: use either different clustering algorithms or a single clustering algorithm run under different conditions, leading to multiple partitions, e.g. K-means with various initial centroid settings and different K, or the agglomerative algorithm with different distance metrics forced to terminate with different numbers of clusters.
  – Convert the clustering results on the different partitions into binary "distance" matrices.
  – Evidence accumulation: form a collective "distance" matrix from all the binary "distance" matrices.
  – Apply a hierarchical clustering algorithm (with a proper cluster distance measure) to the collective "distance" matrix and use the maximum K-cluster lifetime to decide K.
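A hedged end-to-end sketch of this pipeline, assuming scikit-learn's KMeans and SciPy's hierarchical clustering are available. The function name, the data X and all parameter choices are illustrative rather than the paper's exact setup, and the collective matrix is normalised by the number of runs (summing, as on the following slides, differs only by a constant factor):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def evidence_accumulation(X, n_runs=20, k_range=(2, 10)):
    n = len(X)
    D = np.zeros((n, n))
    # 1) Multiple clustering analyses: K-means with random K and random init
    for _ in range(n_runs):
        k = np.random.randint(k_range[0], k_range[1] + 1)
        labels = KMeans(n_clusters=k, n_init=1).fit_predict(X)
        # 2) Binary "distance" matrix: 1 if two objects are in different clusters
        D += (labels[:, None] != labels[None, :]).astype(float)
    # 3) Evidence accumulation: collective "distance" matrix
    D /= n_runs
    # 4) Hierarchical clustering on the collective matrix; pick K by the
    #    maximum K-cluster lifetime (gap between consecutive merge distances)
    Z = linkage(squareform(D, checks=False), method='average')
    gaps = np.diff(Z[:, 2])
    k_best = n - 1 - int(np.argmax(gaps))
    return fcluster(Z, t=k_best, criterion='maxclust')
```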

22. Clustering Ensemble
• Example: convert a clustering result into a binary "distance" matrix.
  Partition 1 places A and B in cluster C1 and C and D in cluster C2; an entry is 1 if the two objects fall in different clusters and 0 otherwise:

            A  B  C  D
    D1 = A  0  0  1  1
         B  0  0  1  1
         C  1  1  0  0
         D  1  1  0  0

23. Clustering Ensemble
• Example: convert a clustering result into a binary "distance" matrix.
  Partition 2 places A and B in cluster C1, C in cluster C2 and D in cluster C3:

            A  B  C  D
    D2 = A  0  0  1  1
         B  0  0  1  1
         C  1  1  0  1
         D  1  1  1  0

24. Clustering Ensemble
• Evidence accumulation: form the collective "distance" matrix (a short sketch verifying this sum follows):

         0 0 1 1          0 0 1 1
    D1 = 0 0 1 1     D2 = 0 0 1 1
         1 1 0 0          1 1 0 1
         1 1 0 0          1 1 1 0

                      0 0 2 2
    D = D1 + D2   =   0 0 2 2
                      2 2 0 1
                      2 2 1 0
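A few lines of NumPy reproduce this accumulation from the two partitions' label vectors (labels read off slides 22-23; the helper name binary_distance is illustrative):

```python
import numpy as np

def binary_distance(labels):
    """1 where two objects fall in different clusters, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(int)

# Objects in order A, B, C, D
D1 = binary_distance([1, 1, 2, 2])      # partition 1: {A, B}, {C, D}
D2 = binary_distance([1, 1, 2, 3])      # partition 2: {A, B}, {C}, {D}
D = D1 + D2
print(D)
# [[0 0 2 2]
#  [0 0 2 2]
#  [2 2 0 1]
#  [2 2 1 0]]
```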
