
Non-overlapping vs. overlapping communities (http://cs224w.stanford.edu)

CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu. Non-overlapping vs. overlapping communities.


  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. Non-overlapping vs. overlapping communities. 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. [Palla et al., ‘05] A node belongs to many social circles.

  4. [Palla et al., ‘05] Two nodes belong to the same community if they can be connected through adjacent k-cliques. A k-clique is a fully connected graph on k nodes (figure: a 4-clique). Adjacent k-cliques are k-cliques that overlap in k-1 nodes (figure: adjacent 3-cliques). A k-clique community is the set of nodes that can be reached through a sequence of adjacent k-cliques.

  5. [Palla et al., ‘05] Clique Percolation Method: Find maximal cliques (not k-cliques!). Build the clique overlap matrix: each clique is a node, and two cliques are connected if they overlap in at least k-1 nodes. Communities are the connected components of the clique overlap matrix. How to set k? Set k so that we get the “richest” community structure, that is, the one with the most widely distributed cluster sizes.

  6. [Palla et al., ‘05] Start with the graph and find its maximal cliques. Create the clique overlap matrix. Threshold the matrix at value k-1: if a_ij < k-1, set the entry to 0. Communities are the connected components of the thresholded matrix. Figure panels: (1) Graph, (2) Clique overlap matrix, (3) Thresholded matrix at k=4, (4) Communities (connected components).
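
The steps on slides 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `maximal_cliques` and `clique_percolation` are hypothetical, maximal cliques are enumerated with a plain Bron-Kerbosch recursion (a swapped-in technique, not the clique finder from the slides), and maximal cliques with fewer than k nodes are dropped, which corresponds to also thresholding the diagonal of the overlap matrix at k.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting, for brevity)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already processed
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_percolation(edges, k):
    """Return the k-clique communities of an undirected graph as node sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumption: a maximal clique with fewer than k nodes holds no k-clique,
    # so it is dropped before the overlap step.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]

    # Union-find over cliques plays the role of "connected components of the
    # thresholded clique overlap matrix".
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:  # overlap entry survives
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())
```

For two triangles that share a single node, for example edges (1,2), (2,3), (1,3), (3,4), (4,5), (3,5) with k=3, the overlap is 1 < k-1, so the sketch returns two communities and node 3 belongs to both. That is the overlapping behaviour the method is meant to capture.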

  7. [Palla et al., ‘07] Communities in a “tiny” part of a phone-call network of 4 million users [Barabasi-Palla, 2007].

  8. Each node is a community. Nodes are weighted by community size. Links are weighted by overlap size. Data: DIP “core” database of protein interactions (S. cerevisiae, yeast).

  9. There is no nice way to find maximal cliques: it is an NP-hard combinatorial problem. Simple algorithm: Start with the maximum clique size s. Choose a node u and extract the cliques of size s that u is a member of. Delete u and its edges. When the graph is empty, set s = s-1 and restart on the original graph.

  10. [Palla et al., ‘05] Finding cliques of size s around u: Keep 2 sets A and B, where each node in B links to all nodes in A. Set A grows by moving nodes from B to it. Start with A = {u}, B = {v : (u,v) ∈ E}. Recursively move each possible v ∈ B to A and prune B. If B runs out of nodes before A reaches size s, backtrack the recursion and try a different v.
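
The A/B recursion above can be written down roughly as follows. This is a hedged sketch: the function name `cliques_containing` and the adjacency-dict input format are assumptions made for illustration, and the sketch returns every clique of size s that contains u, without the bookkeeping a full implementation would need to skip cliques already found or to avoid revisiting the same node set in a different order.

```python
def cliques_containing(adj, u, s):
    """Return all cliques of size s that contain node u, as frozensets.

    adj maps each node to the set of its neighbours (no self-loops).
    """
    found = set()

    def grow(a, b):
        if len(a) == s:
            found.add(frozenset(a))
            return
        if len(a) + len(b) < s:
            return  # B runs out before A can reach size s: backtrack
        for v in b:
            # Pruning B to the neighbours of v keeps the invariant that
            # every node in B links to all nodes in A.
            grow(a | {v}, b & adj[v])

    grow({u}, set(adj[u]))
    return found
```

On a graph made of a 4-clique on nodes 1 to 4 plus a pendant node 5 attached to node 1, calling `cliques_containing(adj, 1, 3)` yields the three triangles through node 1, and the branch through node 5 is abandoned because B becomes empty.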

  11. Let’s rethink what we are doing… Given a network, we want to find clusters! We need to formalize the notion of a cluster and to design an algorithm that will find sets of nodes that are “good” clusters. More generally: how to think about clusters in large networks?

  12. What is a good cluster? Many edges internally, few pointing outside (figure labels: S, S’). Formally, this is measured by the conductance Φ(S), where A(S) denotes the volume of S. Small Φ(S) corresponds to good clusters.
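
The formula itself did not survive in the slide text, so the sketch below assumes the standard definition of conductance: Φ(S) = (number of edges leaving S) / min(A(S), A(V \ S)), where the volume A(S) is the sum of the degrees of the nodes in S. The function name `conductance` and the adjacency-dict input are illustrative choices, not taken from the slides.

```python
def conductance(adj, s):
    """Conductance of node set s in an undirected graph.

    adj maps each node to the set of its neighbours.
    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)).
    """
    s = set(s)
    # Edges with exactly one endpoint in S.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    # Volume = sum of degrees.
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom
```

For two triangles {1,2,3} and {4,5,6} joined by the single edge (3,4), the set S = {1,2,3} has one outgoing edge and volume 2+2+3 = 7, so under this definition Φ(S) = 1/7: many edges inside, few pointing outside.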

  13. [WWW ‘08] Define the network community profile (NCP) plot: plot the score of the best community of size k. Axes: community size, log k, versus log Φ(k). Figure example: Φ(5) = 0.25 for k=5 and Φ(7) = 0.18 for k=7.
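
Read as a formula, the NCP value at size k is the minimum conductance over all node sets of that size. The brute-force sketch below makes this concrete for toy graphs only; it enumerates every subset, so it is exponential and not how one would handle a large network. The names `conductance` and `ncp` are hypothetical, and the conductance definition (cut divided by the smaller of the two volumes) is an assumption, since the slide's formula is not in the text.

```python
from itertools import combinations


def conductance(adj, s):
    """Assumed definition: cut(S) / min(vol(S), vol(V \\ S))."""
    s = set(s)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    return cut / min(vol_s, vol_rest)


def ncp(adj, max_k):
    """Map each size k (1..max_k) to the best (lowest) conductance of any
    node set of that size. Requires max_k < number of nodes and no
    isolated nodes, so that both volumes are non-zero."""
    nodes = sorted(adj)
    return {
        k: min(conductance(adj, s) for s in combinations(nodes, k))
        for k in range(1, max_k + 1)
    }
```

On two triangles joined by one edge, the profile drops from 1.0 at k=1 to 0.5 at k=2 and to 1/7 at k=3, where a whole triangle becomes the best community. Plotting these values on log-log axes gives the NCP plot described on the slide.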

  14. [WWW ‘08] Meshes, grids, dense random graphs. Figure panels: d-dimensional meshes; California road network.

  15. [WWW ‘08] Collaborations between scientists in networks [Newman, 2005]. Axes: community size, log k, versus conductance, log Φ(k).

  16. [Internet Mathematics ‘09] Natural hypothesis about NCP: the NCP of real networks slopes downward, and the slope of the NCP corresponds to the dimensionality of the network. What about large networks? Examine more than 100 large networks.

  17. [Internet Mathematics ‘09] Typical example: General Relativity collaborations (n=4,158, m=13,422).

  18. [Internet Mathematics ‘09]

  19. [Internet Mathematics ‘09] Plot of Φ(k) (conductance) versus k (cluster size). Annotations: better and better communities; best community has ~100 nodes; communities get worse and worse.

  20. [Internet Mathematics ‘09] Each successive edge inside the community costs more cut-edges. Each node has twice as many children. NCP plot values: Φ = 1/3 ≈ 0.33, Φ = 2/4 = 0.5, Φ = 8/6 ≈ 1.3, Φ = 64/14 ≈ 4.6.

  21. [Internet Mathematics ‘09] Empirically we note that the best clusters (call them whiskers) are barely connected to the network. If we remove the whiskers, what does the NCP look like? Core-periphery structure.

  22. [Internet Mathematics ‘09] Nothing happens! Nestedness of the core-periphery structure.

  23. Nested core-periphery. Figure labels: denser and denser network core; small good communities.

  24. [Internet Mathematics ‘09] Practically constant! Each dot is a different network.

  25. [Internet Mathematics ‘09] Figure panels: LiveJournal, DBLP, Amazon, IMDB. Legend: Rewired Network, Ground truth.

  26. Some issues with community detection: There are many different formalizations of clustering objective functions. Objectives are NP-hard to optimize exactly. Methods can find clusters that are systematically “biased”. Methods can perform well or poorly on some kinds of graphs. Questions: How well do algorithms optimize objectives? What clusters do different methods find?
