Community Detection: A Simple Example


  1. Community Detection: A Simple Example. Joon Ho Park, Yumlembam Hemajit and Ki-Ho Lee

  2. Project Motivation
     • To understand the basics of community detection
     • To apply the ideas of traditional community detection methods to a known system
     • To figure out how proteins can be clustered, by analogy with this project

  3. Quick Review of Community Detection
     • Traditional Methods of Clustering
       – Graph partitioning
         • Dividing vertices into groups of predefined size
         • Minimizing the cut size (# of edges running between clusters)
       – Hierarchical clustering
         • Including small clusters in larger clusters according to similarity
         • Agglomerative (bottom-up) or divisive (top-down) algorithms
       – Partitional clustering
         • Distance between vertices = dissimilarity between vertices
         • E.g., k-means clustering: minimizing the total intra-cluster distance
       – Spectral clustering
         • Clustering by eigenvectors of matrices (e.g., the similarity matrix)

  4. [Figure: map of the example system. Labels: Mountain Top, Valley Top, Mountain Hub, Valley Top, Mountain Condominium & Ski, Valley Condominium & Ski]

  5. [Figure: the example system drawn as a graph]
     # of vertices: 16
     # of edges: 27
     Vertex labels: MT, Z1, H, VT, AT1, Z2, AP1, MH, V, VH, AP2, AT2, Z3, AP3, MC&S, VC&S

  6. Graph Partitioning
     • Simplest conditions
       – Dividing into two groups of equal size
       – Minimal # of edges between the two groups
       – Maximal # of edges inside the modules
       – Kernighan-Lin algorithm
         • Maximizing Q
         • Q = (# of edges inside the modules) - (# of edges lying between them)
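The definition of Q above can be checked on a small case. The following is a minimal sketch, not the presenters' code: the toy graph (two triangles joined by one edge), the vertex names, and the function name `cut_size_and_q` are all hypothetical, since the edge list of the 16-vertex example is not given in the slides.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def cut_size_and_q(edges: Iterable[Edge], group: Dict[Hashable, int]) -> Tuple[int, int]:
    """Return (cut size, Q) for a split of the vertices into groups.

    cut size = # of edges whose endpoints lie in different groups
    Q        = (# of edges inside the groups) - (# of edges between them)
    """
    inside = 0
    between = 0
    for u, v in edges:
        if group[u] == group[v]:
            inside += 1
        else:
            between += 1
    return between, inside - between


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

cut, q = cut_size_and_q(edges, group)
print(cut, q)  # 1 5
```

Because every edge is either inside a group or between groups, Q = m - 2 × (cut size) for a graph with m edges, so maximizing Q and minimizing the cut size are the same task. With m = 27 and a cut size of 5 this gives Q = 17.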

  7. [Figure: the graph split into two groups] Cut size: 11, Q = 4

  8. [Figure: the graph split into two groups] Cut size: 9, Q = 9

  9. [Figure: the graph split into two groups] Cut size: 8, Q = 10

  10. [Figure: the graph split into two groups] Cut size: 6, Q = 15

  11. [Figure: the graph split into two groups] Cut size: 5, Q = 17

  12. [Figure: the graph split into two groups] Cut size: 5, Q = 17

  13. [Figure: the graph split into two groups] Cut size: 5, Q = 17

  14. [Figure: the graph split into two groups] Cut size: 5, Q = 17
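The cut size falls from 11 to 5 over slides 7 to 14, which is the kind of progress an iterative improvement scheme produces. The sketch below is a simplified greedy pair-swap in the spirit of Kernighan-Lin, not the full algorithm (it has no gain bookkeeping and no vertex locking) and not necessarily the procedure the presenters followed. The toy graph and the names `cut_size` and `greedy_swap_bisection` are hypothetical.

```python
from itertools import product


def cut_size(edges, side):
    """Count edges whose endpoints are on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def greedy_swap_bisection(edges, side):
    """Repeatedly apply the single swap (one vertex from each side)
    that lowers the cut size the most, until no swap helps.
    Swapping pairs keeps the two group sizes unchanged."""
    side = dict(side)  # work on a copy
    best = cut_size(edges, side)
    improved = True
    while improved:
        improved = False
        left = [v for v in side if side[v] == 0]
        right = [v for v in side if side[v] == 1]
        best_pair = None
        for a, b in product(left, right):
            side[a], side[b] = 1, 0          # try the swap
            candidate = cut_size(edges, side)
            side[a], side[b] = 0, 1          # undo it
            if candidate < best:
                best, best_pair = candidate, (a, b)
        if best_pair is not None:
            a, b = best_pair
            side[a], side[b] = 1, 0          # commit the best swap
            improved = True
    return side, best


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}  # deliberately poor split

final_side, final_cut = greedy_swap_bisection(edges, start)
print(cut_size(edges, start), "->", final_cut)  # 5 -> 1
```

A greedy scheme like this can stop in a local minimum; the full Kernighan-Lin algorithm reduces that risk by also accepting swaps that temporarily worsen the cut within a pass.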

  15. Hierarchical Clustering
      • Simplest conditions
        – Divisive algorithm
          • Clusters are iteratively split by removing edges connecting vertices with low similarity
        – Vertex similarity
          • Defined by the # of edge- (or vertex-) independent paths between two vertices
          • Independent paths do not share any edge (vertex)
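The number of edge-independent paths between two vertices can be computed as a maximum flow in which every edge has capacity 1 (Menger's theorem). The following is a minimal sketch using breadth-first augmenting paths; it covers the edge-independent case only, and the toy graph and the function name `edge_independent_paths` are hypothetical.

```python
from collections import defaultdict, deque


def edge_independent_paths(edges, s, t):
    """Number of paths from s to t that share no edge,
    found as a unit-capacity maximum flow."""
    if s == t:
        raise ValueError("s and t must be different vertices")
    cap = defaultdict(int)   # residual capacity of each directed arc
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1     # an undirected edge can carry one unit
        cap[(v, u)] += 1     # in either direction
        adj[u].add(v)
        adj[v].add(u)

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow      # no augmenting path is left
        # Push one unit of flow back along the path found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

print(edge_independent_paths(edges, "a", "b"))  # 2
print(edge_independent_paths(edges, "a", "f"))  # 1
```

Vertices in the same triangle are joined by two independent paths, while any pair on opposite sides of the single connecting edge has only one, so the connecting edge is the natural one to remove first.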

  16. [Figure: the graph with each edge labeled by its similarity value; labels range from 2 to 4]

  17. [Figure: the graph with some edges removed; the remaining edge labels are 2 and 3]

  18. [Figure: the graph with some edges removed; the remaining edge labels are 2 and 3]

  19. [Figure: the graph with some edges removed; the remaining edge labels are all 3]

  20. [Figure: the graph with some edges removed; the remaining edge labels are 2 and 3]
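One step of the divisive procedure can be sketched as follows: drop every edge whose similarity is below a threshold, then read the clusters off as the connected components of what remains. This is an illustration under stated assumptions, not the presenters' procedure; the similarity values, the threshold, and the function name `split_by_similarity` are hypothetical.

```python
def split_by_similarity(vertices, similarity, threshold):
    """Remove edges with similarity below `threshold` and return
    the connected components of the remaining graph as sets."""
    neighbors = {v: set() for v in vertices}
    for (u, v), score in similarity.items():
        if score >= threshold:      # keep only sufficiently similar pairs
            neighbors[u].add(v)
            neighbors[v].add(u)

    seen = set()
    groups = []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search collects one component.
        seen.add(start)
        group = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in neighbors[u]:
                if w not in seen:
                    seen.add(w)
                    group.add(w)
                    stack.append(w)
        groups.append(group)
    return groups


# Hypothetical edge similarities: 2 inside each triangle, 1 on the joining edge.
vertices = ["a", "b", "c", "d", "e", "f"]
similarity = {("a", "b"): 2, ("b", "c"): 2, ("a", "c"): 2,
              ("d", "e"): 2, ("e", "f"): 2, ("d", "f"): 2,
              ("c", "d"): 1}

groups = split_by_similarity(vertices, similarity, threshold=2)
print(sorted(sorted(g) for g in groups))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Raising the threshold step by step and recording the components at each level yields the hierarchy of nested clusters.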

  21. [Figure: the graph with vertices numbered 1 to 16]
      $$\text{cut size} = \frac{1}{4}\, s^{T} L s$$
      $$L = \begin{bmatrix} 3 & -1 & 0 & 0 & \cdots & 0 \\ -1 & 3 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 4 & 0 & \cdots & 0 \\ 0 & 0 & -1 & 3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 2 \end{bmatrix}$$
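The identity holds when L is read as the graph Laplacian (vertex degrees on the diagonal, -1 for each edge), which is an assumption consistent with the matrix shown. Each edge (i, j) contributes (s_i - s_j)^2 to s^T L s, which is 4 when the edge crosses the split and 0 otherwise. A minimal check in plain Python, with a hypothetical toy graph and hypothetical function names:

```python
def laplacian(n, edges):
    """Graph Laplacian as a list of rows: degree on the diagonal,
    -1 at (i, j) and (j, i) for every edge between i and j."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def cut_from_laplacian(L, s):
    """Evaluate (1/4) * s^T L s for an index vector s of +1 / -1 entries."""
    n = len(s)
    quadratic = sum(s[i] * L[i][j] * s[j] for i in range(n) for j in range(n))
    return quadratic / 4


# Hypothetical toy graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = laplacian(6, edges)

print(cut_from_laplacian(L, [1, 1, 1, -1, -1, -1]))  # 1.0
print(cut_from_laplacian(L, [1, 1, -1, 1, -1, -1]))  # 5.0
```

The first index vector separates the two triangles and cuts only the joining edge; the second swaps vertices 2 and 3 and cuts five edges.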

  22. [Figure: the graph with vertices numbered 1 to 16]
      $$\text{cut size} = \frac{1}{4}\, s^{T} L s$$
      $$s = (1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1)^{T}$$
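The slides do not state how the index vector s was chosen. One common choice in spectral bisection, assumed here, is to take the signs of the Laplacian eigenvector belonging to the second-smallest eigenvalue (the Fiedler vector). The sketch below approximates that vector by power iteration on a shifted Laplacian, using only the standard library; it assumes a connected graph, and the toy graph and the function name `fiedler_signs` are hypothetical.

```python
def fiedler_signs(n, edges, iterations=500):
    """Approximate the Fiedler vector of a connected graph by power
    iteration on (shift*I - L), projecting out the constant vector,
    and return its signs as a +1 / -1 index vector."""
    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        neighbors[i].append(j)
        neighbors[j].append(i)

    shift = 2 * max(degree)                  # upper bound on L's eigenvalues
    x = [i - (n - 1) / 2 for i in range(n)]  # start vector with zero mean
    for _ in range(iterations):
        # y = (shift*I - L) x, using (L x)_i = degree_i * x_i - sum of neighbors
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [value - mean for value in y]    # stay orthogonal to (1, ..., 1)
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    return [1 if value >= 0 else -1 for value in x]


# Hypothetical toy graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
s = fiedler_signs(6, edges)
print(s)
```

On this toy graph the signs separate the two triangles. In general the sign split is not guaranteed to give two groups of equal size, so a balanced bisection may instead take the half of the vertices with the largest vector entries.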

  23. 5

  24. [Figure: the graph with each edge labeled by its similarity value; labels range from 2 to 4]

  25. [Figure: the graph with some edges removed; the remaining edge labels are 2 and 3] Not good enough!

  26. Conclusions
      • Graph partitioning provides a basic idea for community detection
      • The concept of similarity carries over to hierarchical, partitional and spectral clustering
      • We have realized that community detection can be used to cluster protein databases if the similarity is replaced by a score (TM-score, RMSD, etc.)
