Community Detection: A Simple Example
Joon Ho Park, Yumlembam Hemajit and Ki-Ho Lee
Project Motivation
• To understand the basics of community detection
• To apply the ideas behind traditional community detection methods to a known system
• To work out how proteins could be clustered, by analogy with this project
Quick Review of Community Detection
• Traditional Methods of Clustering
  – Graph partitioning
    • Dividing vertices into groups of predefined size
    • Minimizing cut size (# of edges running between clusters)
  – Hierarchical clustering
    • Including small clusters in larger clusters according to similarity
    • Agglomerative (bottom-up) or divisive (top-down) algorithms
  – Partitional clustering
    • Distance between vertices = dissimilarity between vertices
    • E.g., k-means clustering: minimizing the total intra-cluster distance
  – Spectral clustering
    • Clustering by eigenvectors of matrices (e.g., similarity matrix)
[Figure: the example network. Legend: MT = Mountain Top, VT = Valley Top, MH = Mountain Hub, VH = Valley Hub, MC&S = Mountain Condominium & Ski, VC&S = Valley Condominium & Ski. Other vertices: H, V, Z1, Z2, Z3, AT1, AT2, AP1, AP2, AP3.]
# of vertices: 16
# of edges: 27
Graph Partitioning
• Simplest conditions
  – Dividing into two groups of equal size
  – Minimal # of edges between the two groups
  – Maximal # of edges inside the modules
  – Kernighan-Lin algorithm
    • Maximizing Q
    • Q = (# of edges inside the modules) – (# of edges lying between them)
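The definition of Q above can be made concrete with a short sketch. The graph below is a hypothetical six-vertex example (two triangles joined by one edge), not the 16-vertex network from the slides, and `greedy_swaps` is a simplified swap heuristic in the spirit of Kernighan-Lin rather than the full algorithm: it repeatedly exchanges the one pair of vertices that raises Q the most and stops when no swap helps.

```python
from itertools import product

# Hypothetical graph (an assumption for illustration only):
# triangle 0-1-2 and triangle 3-4-5, joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def cut_size(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    return sum((u in group_a) != (v in group_a) for u, v in edges)


def quality(edges, group_a):
    """Q = (# of edges inside the modules) - (# of edges between them)."""
    cut = cut_size(edges, group_a)
    return (len(edges) - cut) - cut


def greedy_swaps(edges, group_a, group_b):
    """Swap one vertex pair at a time while a swap still increases Q.

    Group sizes never change, so an equal-size split stays equal-size.
    """
    group_a, group_b = set(group_a), set(group_b)
    while True:
        best_gain, best_pair = 0, None
        for a, b in product(sorted(group_a), sorted(group_b)):
            trial = (group_a - {a}) | {b}
            gain = quality(edges, trial) - quality(edges, group_a)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return group_a, group_b
        a, b = best_pair
        group_a = (group_a - {a}) | {b}
        group_b = (group_b - {b}) | {a}


if __name__ == "__main__":
    a, b = greedy_swaps(EDGES, {0, 1, 3}, {2, 4, 5})
    print(sorted(a), sorted(b), cut_size(EDGES, a), quality(EDGES, a))
```

Because the total number of edges is fixed, Q equals (total edges) minus twice the cut size, so maximizing Q and minimizing the cut size are the same task for a fixed graph.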
[Figures: successive bisections of the example network, one per slide, annotated as follows.]
• Cut size: 11, Q = 4
• Cut size: 9, Q = 9
• Cut size: 8, Q = 10
• Cut size: 6, Q = 15
• Cut size: 5, Q = 17 (four slides, each showing a bisection with this cut size)
Hierarchical Clustering
• Simplest conditions
  – Divisive algorithm
    • Clusters are iteratively split by removing edges connecting vertices with low similarity
  – Vertex similarity
    • Defined by the # of edge- (or vertex-) independent paths between two vertices
    • Independent paths do not share any edge (vertex).
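The similarity measure above can be sketched in code. By Menger's theorem, the number of edge-independent paths between two vertices equals the maximum flow between them when every edge has capacity 1, so the sketch below counts paths with a basic BFS augmenting-path max-flow. The graph is a hypothetical six-vertex example, not the network from the slides, and the function name is chosen here for illustration.

```python
from collections import deque

# Hypothetical graph (an assumption for illustration only):
# triangle 0-1-2 and triangle 3-4-5, joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def edge_independent_paths(edges, source, target):
    """Count edge-independent paths between source and target.

    Uses unit-capacity max flow: each undirected edge may carry one
    unit of flow, and each augmenting path found by BFS adds one path.
    """
    if source == target:
        raise ValueError("source and target must differ")
    capacity, neighbors = {}, {}
    for u, v in edges:
        capacity[(u, v)] = 1
        capacity[(v, u)] = 1
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    paths = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return paths
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        paths += 1


if __name__ == "__main__":
    print(edge_independent_paths(EDGES, 0, 1))  # inside one triangle
    print(edge_independent_paths(EDGES, 2, 3))  # across the joining edge
```

In this toy graph, vertices inside the same triangle are joined by two independent paths, while any pair on opposite sides of the joining edge has only one, which is the kind of low-similarity edge a divisive algorithm would remove first.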
[Figure: the example network with each of the 27 edges labeled by the similarity of its two end vertices; the labels take the values 2, 3 and 4.]
[Figures: four slides showing the example network with only a subset of edges still labeled, all with similarity 2 or 3.]
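One divisive step can be sketched as follows: drop every edge whose similarity is at or below a threshold, then read off the connected components as clusters. The weighted edge list is a hypothetical six-vertex example with made-up similarity values, not the values shown on the slides, and the threshold rule is one simple way to pick which edges to remove.

```python
# Hypothetical weighted graph (an assumption for illustration only):
# (u, v, similarity); two triangles with similarity 2 inside,
# joined by one edge with similarity 1.
VERTICES = [0, 1, 2, 3, 4, 5]
WEIGHTED_EDGES = [
    (0, 1, 2), (0, 2, 2), (1, 2, 2),
    (3, 4, 2), (3, 5, 2), (4, 5, 2),
    (2, 3, 1),
]


def components_after_removal(vertices, weighted_edges, threshold):
    """Remove edges with similarity <= threshold; return the clusters."""
    adjacency = {v: set() for v in vertices}
    for u, v, similarity in weighted_edges:
        if similarity > threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    for threshold in (0, 1, 2):
        print(threshold, components_after_removal(VERTICES, WEIGHTED_EDGES, threshold))
```

Raising the threshold step by step produces the nested splits of a divisive hierarchy: one cluster, then two, then single vertices in this toy case.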
[Figure: the example network with its vertices numbered 1 to 16.]

Cut size in terms of the Laplacian matrix L and the index vector s, where s_i = +1 for vertices in one group and s_i = -1 for vertices in the other:

\[
\text{cut size} = \frac{1}{4}\, s^{T} L s
\]

\[
L = \begin{bmatrix}
3 & -1 & 0 & 0 & \cdots & 0 \\
-1 & 3 & -1 & 0 & \cdots & 0 \\
0 & -1 & 4 & 0 & \cdots & 0 \\
0 & 0 & -1 & 3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2
\end{bmatrix},
\qquad
s = (\underbrace{1, \ldots, 1}_{8}, \underbrace{-1, \ldots, -1}_{8})^{T}
\]
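The identity cut size = (1/4) s^T L s can be checked numerically. The sketch below builds the Laplacian (vertex degrees on the diagonal, -1 for each edge off the diagonal) for a hypothetical six-vertex graph, not the network from the slides, and compares the quadratic form with a direct count of cut edges. It works because s^T L s equals the sum over edges of (s_u - s_v)^2, which is 4 for every cut edge and 0 otherwise.

```python
# Hypothetical graph (an assumption for illustration only):
# triangle 0-1-2 and triangle 3-4-5, joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def laplacian(n, edges):
    """Graph Laplacian: degree on the diagonal, -1 for each edge."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def cut_size_from_laplacian(L, s):
    """Evaluate (1/4) s^T L s for an index vector s with entries +1 or -1."""
    n = len(s)
    quadratic_form = sum(s[i] * L[i][j] * s[j] for i in range(n) for j in range(n))
    return quadratic_form // 4  # always a multiple of 4 for +/-1 entries


def cut_size_direct(edges, s):
    """Count edges whose endpoints carry opposite signs in s."""
    return sum(s[u] != s[v] for u, v in edges)


if __name__ == "__main__":
    L = laplacian(6, EDGES)
    for s in ([1, 1, 1, -1, -1, -1], [1, 1, -1, 1, -1, -1]):
        print(s, cut_size_from_laplacian(L, s), cut_size_direct(EDGES, s))
```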
[Figures: the similarity-labeled network shown again, followed by a partition of it annotated "Not good enough!"]
Conclusions
• Graph partitioning provides a basic idea for community detection
• The concept of similarity is used in hierarchical, partitional and spectral clustering
• We've realized that community detection can be used for clustering protein databases if the similarity is replaced by a score (TM-score, RMSD, etc.)