
Brendan Meeder Carnegie Mellon University Christos Faloutsos - PowerPoint PPT Presentation



  1. Leman Akoglu (Carnegie Mellon University), Hanghang Tong (IBM T. J. Watson), Brendan Meeder (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University)

  2. Given a graph with node attributes (features), for example social networks + user interests, phone call networks + customer demographics, or gene interaction networks + gene expression info: find cohesive clusters, bridges, and anomalies. A cohesive cluster has similar connectivity and attribute coherence. (Leman Akoglu (CMU), PICS: Parameter-free Identification of Cohesive Subgroups)

  3. [Figure: adjacency matrix A (people × people) and binary feature matrix F (people × features), with rows and columns arranged into groups.] Given adjacency matrix A and feature matrix F, find homogeneous blocks (clusters) in A and F. Requirements: parameter-free, scalable.

  4. Possible approaches: flat clustering; graph clustering; additional feature nodes (gives a heterogeneous graph); weighted edges by both connectivity and feature similarity (quadratic pairwise computations! choice of similarity function).

  5. Related work: flat clustering (e.g. k-means), [Kriegel+], [Leeuwen+]; METIS [Karypis and Kumar], [Flake+], [Girvan and Newman], [Andersen+]; spectral [Ng+], co-clustering [Dhillon+]; SA-cluster [Zhou+], spectral relational clustering [Long+]; CoPaM [Moser+], Gamer [Gunneman+]; Autopart and cross-associations [Chakrabarti+], GraphScope [Sun+], PaCK [He+].

  6. DETAILS. Two questions: 1. How many node- and attribute-clusters? 2. How to assign nodes and attributes to clusters? Main idea: employ Minimum Description Length and minimize L(M) + L(D|M), where L(M) is the encoding length of the clustering and L(D|M) is the encoding length of the blocks. Good clustering implies good compression.

  7. BACKGROUND. Given database D and a set of models for D, MDL selects the model M that minimizes L(M) + L(D|M), where L(M) is the length in bits of the description of model M and L(D|M) is the length in bits of the data encoded by M. Example (figure from Bishop, PR&ML): a polynomial of degree d = 1, $a_1 x + a_0$ plus deltas, vs. a polynomial of degree d = 9, $a_9 x^9 + \dots + a_1 x + a_0$ with deltas {}.

  8. DETAILS. L(M): model description cost. 1. n: #nodes, f: #attributes. 2. k: #node-clusters, l: #attribute-clusters. 3. $r_i$: size of node cluster i (and likewise the size of attribute cluster j). With $p_i = r_i / n$, the optimal number of bits per node in cluster i is $\log \frac{1}{p_i} = \log \frac{n}{r_i}$, so the node-cluster cost is $\sum_i r_i \log \frac{n}{r_i} = -n \sum_i \frac{r_i}{n} \log \frac{r_i}{n} = nH(P)$.
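The node-cluster cost nH(P) from the slide above can be checked numerically. The following is a minimal Python sketch, assuming base-2 logarithms (bits); the function name `assignment_cost_bits` is hypothetical and not taken from the PICS source code.

```python
import math


def assignment_cost_bits(cluster_sizes):
    """Bits needed to say which cluster each of the n items belongs to.

    Implements sum_i r_i * log2(n / r_i) = n * H(P), with p_i = r_i / n.
    Empty clusters contribute nothing.
    """
    n = sum(cluster_sizes)
    return sum(r * math.log2(n / r) for r in cluster_sizes if r > 0)


if __name__ == "__main__":
    # Two equal clusters of 4 nodes: 1 bit per node.
    print(assignment_cost_bits([4, 4]))
    # A single cluster needs no assignment information.
    print(assignment_cost_bits([8]))
    # Unbalanced clusters are cheaper to describe than balanced ones.
    print(assignment_cost_bits([6, 2]))
```

Under this cost, more clusters and more balanced clusters make the model description longer, which is what lets MDL trade model complexity against how well the blocks compress.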

  9. DETAILS. L(D|M): data description cost given the model. 1. For each block in A and F, #1s: [formula not recovered]. 2. Encoding cost of a block: [formula not recovered].
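To make the data cost concrete, here is a minimal Python sketch of one plausible block cost. It assumes each block is entropy-coded from its density of 1s, as in cross-association style MDL co-clustering; this is an assumption and not necessarily the exact formula on the slide. The names `block_cost_bits` and `data_cost_bits` are hypothetical.

```python
import math


def block_cost_bits(n_ones, n_zeros):
    """Assumed entropy-based cost of one block, in bits.

    Each 1 costs log2(total / n_ones) bits and each 0 costs
    log2(total / n_zeros) bits; a pure (all-0 or all-1) block costs 0.
    """
    total = n_ones + n_zeros
    cost = 0.0
    for count in (n_ones, n_zeros):
        if count > 0:
            cost += count * math.log2(total / count)
    return cost


def data_cost_bits(matrix, row_cluster, col_cluster, k, l):
    """Sum the block costs of a 0/1 matrix under a row/column clustering.

    row_cluster[i] in range(k) and col_cluster[j] in range(l) give the
    cluster of row i and column j.
    """
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ones[row_cluster[i]][col_cluster[j]] += value
            cells[row_cluster[i]][col_cluster[j]] += 1
    return sum(
        block_cost_bits(ones[a][b], cells[a][b] - ones[a][b])
        for a in range(k)
        for b in range(l)
    )


if __name__ == "__main__":
    # Toy adjacency matrix with two obvious groups of nodes.
    A = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    good = [0, 0, 1, 1]  # matches the block structure
    bad = [0, 1, 0, 1]   # mixes the two groups
    print(data_cost_bits(A, good, good, 2, 2))
    print(data_cost_bits(A, bad, bad, 2, 2))
```

With the homogeneous clustering every block is pure and costs nothing, while the mixed clustering leaves every block half full, which is the sense in which good clustering implies good compression.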

  10. DETAILS. L(M): model description cost. 1. n: #nodes, f: #attributes. 2. k: #node-clusters, l: #attribute-clusters. 3. size of node-cluster i, size of attribute-cluster j. L(D|M): data description cost given the model. 1. For each block in A and F, #1s. 2. Encoding cost of a block. Hardness: a similar problem (column re-ordering for minimum total run length) is shown to be NP-hard [Johnson+] (reduction from Hamiltonian Path).

  11. The algorithm is iterative and monotonic, so it will converge to a local optimum.

  12. [Figure-only slide.]

  13. Computational complexity. [Plot: time per iteration (s) vs. # non-zeros.]

  14. Datasets:

     | #  | Graph      | Description     | n    | f   | nnz |
     |----|------------|-----------------|------|-----|-----|
     | 1  | Phone call | users, titles   | 94   | 7   | 391 |
     | 2  | Device     | users, titles   | 94   | 7   | 5K  |
     | 3  | PolBooks   | books, incl.    | 92   | 2   | 840 |
     | 4  | PolBlogs   | blogs, incl.    | 1.5K | 2   | 20K |
     | 5  | Twitter    | users, h-tags   | 9.6K | 10K | 82K |
     | 6  | YouTube    | users, groups   | 77K  | 30K | 1M  |
     | 7  | YeastGene  | genes, articles | 844  | 17K | 64K |

  15. [Figure: books × book groups.] Liberal vs. conservative; “core and periphery”.

  16. [Figure: books × book groups; liberal vs. conservative, “core and periphery”.] Examples of “core” liberal and conservative books; examples of bridging ‘conservative’ books.

  17. [Figures: Phone calls (subjects × title) and Device scans (subjects × title). Cluster labels: call-center, casual, business, grad.]

  18. [Figures: yeast genes × yeast genes and yeast genes × articles; 844 genes, 17K articles. Labels: 1, 2, 3 and A1, A2, A3; survey.]

  19. [Figure: Twitter users × hashtags; 9.6K users, 10K hashtags. Cluster labels: casual, Italian bloggers, heavy-hitters.]

  20. [Figure: YouTube users × YouTube groups; 77K users, 30K groups. Labels: familiar strangers, anime lovers, bridges.]

  21. Novel clustering model: PICS finds groups of nodes in an attributed graph with (1) similar connectivity and (2) attribute homogeneity. It also groups the node attributes into attribute-clusters.
      Parameter-free nature: no user input is required, e.g. number of clusters or similarity functions/thresholds.
      Effectiveness: insightful clusters, bridges and outliers in diverse real-world datasets including YouTube and Twitter.
      Scalability: run time grows linearly with graph + attribute size.

  22. lakoglu@cs.cmu.edu, http://www.cs.cmu.edu/~lakoglu/ Source code: www.cs.cmu.edu/~lakoglu/#pics
