Networks of tightly connected groups

  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University

  2. Networks of tightly connected groups. Network communities: sets of nodes with lots of connections inside and few to outside (the rest of the network). Also known as communities, clusters, groups, or modules. 11/3/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

  3. [Onnela et al. '07] Figure panels: edge strengths (call volume) in a real network; edge betweenness in a real network.

  4. [Girvan-Newman PNAS '02] Divisive hierarchical clustering based on edge betweenness: the number of shortest paths passing through the edge. Girvan-Newman algorithm:
     - Repeat until no edges are left:
       - Calculate betweenness of edges
       - Remove edges with highest betweenness
     - Connected components are communities
     - Gives a hierarchical decomposition of the network
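The definition of edge betweenness above can be written compactly as follows. This is a sketch in notation introduced here rather than taken from the slides: $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(e)$ is the number of those that pass through edge e. When a pair of nodes has several shortest paths, each path carries an equal fraction of one unit, which matches the fractional counting used later in the deck.

```latex
b(e) \;=\; \sum_{\{s,t\},\; s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}
```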

  5. [Newman-Girvan PhysRevE '03] Zachary's Karate club: hierarchical decomposition.

  6. [Newman-Girvan PhysRevE '03] Communities in physics collaborations.

  7. Breadth-first search starting from A: we want to compute the betweenness of paths starting at node A.

  8. Count the number of shortest paths from A to all other nodes of the network.

  9. Compute betweenness by working up the tree: if there are multiple paths, count them fractionally. Figure annotations: 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2; 1 path to K, split evenly. Repeat the BFS procedure for each node of the network and add the edge scores. Runtime (all-pairs shortest path): weighted graphs $O(N^3)$; unweighted graphs $O(N^2)$.
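The BFS procedure described above (count shortest paths going down, split credit fractionally going up, repeat from every node) and the Girvan-Newman removal loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's reference code: the graph is undirected, unweighted, and has no self-loops, and it is stored as a dict mapping each node to a set of neighbours. The function names `edge_betweenness`, `components`, and `girvan_newman_levels` are chosen here for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph (no self-loops).

    adj maps each node to a set of neighbours.
    Returns a dict keyed by frozenset({u, v}).
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # BFS from source: distances, shortest-path counts, visit order.
        dist, paths, order = {source: 0}, {source: 1}, []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        # Work back up the tree: each node hands 1 + (credit from below)
        # to its parents, split in proportion to their path counts.
        credit = {u: 1.0 for u in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * paths[u] / paths[v]
                    score[frozenset((u, v))] += share
                    credit[u] += share
    # Every unordered pair was counted once from each endpoint.
    return {e: s / 2.0 for e, s in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_levels(adj):
    """Yield the component list each time edge removal splits the graph."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n_comps = len(components(adj))
    while any(adj.values()):
        score = edge_betweenness(adj)
        top = max(score.values())
        for edge, s in score.items():
            if s >= top - 1e-9:  # remove all edges tied for the maximum
                u, v = tuple(edge)
                adj[u].discard(v)
                adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comps:
            n_comps = len(comps)
            yield comps
```

Each value yielded by `girvan_newman_levels` is one level of the hierarchical decomposition. Recomputing betweenness after every removal is what makes the method expensive on large graphs.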

  10. Define modularity to be Q = (number of edges within groups) - (expected number within groups). The actual number of edges between i and j is $A_{ij}$. The expected number of edges between i and j is $k_i k_j / (2m)$, where m is the number of edges and $k_i$ is the degree of node i.

  11. Q = (number of edges within groups) - (expected number within groups). Then: $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$, where m is the number of edges; $A_{ij}$ is 1 if (i,j) is an edge, else 0; $k_i$ is the degree of node i; $c_i$ is the group id of node i; and $\delta(a,b)$ is 1 if a=b, else 0. Modularity lies in the range [-1,1]. It is positive if the number of edges within groups exceeds the expected number. 0.3<Q<0.7 means significant community structure.
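As a small worked example of the modularity definition, the following sketch evaluates Q directly from the double sum. It assumes the common normalization $Q = \frac{1}{2m}\sum_{i,j}(A_{ij} - k_i k_j / 2m)\,\delta(c_i, c_j)$, an undirected graph without self-loops stored as a dict of neighbour sets, and a hypothetical `community` dict mapping each node to a group id.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    degree = {u: len(vs) for u, vs in adj.items()}
    two_m = sum(degree.values())  # each edge contributes to two degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles joined by a single edge (2-3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(adj, split))  # 5/14, about 0.357
```

Putting every node in one group gives Q = 0, since the observed and expected edge counts then coincide.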

  12. Modularity is useful for selecting the number of clusters. Why not optimize modularity directly?

  13. Consider splitting the graph into two communities. Modularity Q is: $Q = \frac{1}{2m} \sum_{i,j \text{ in same group}} \left[ A_{ij} - \frac{k_i k_j}{2m} \right]$. Or we can write it in matrix form as $Q = \frac{1}{4m} s^T B s$, where s is the vector of group memberships, $s_i \in \{+1, -1\}$, and B is the modularity matrix, $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$. Note: each row (column) of B sums to 0.
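The step from the sum over same-group pairs to the matrix form can be filled in as follows. This is a sketch of the standard argument, assuming the normalization $\frac{1}{2m}$ for the same-group sum; it uses the fact that $\delta(c_i,c_j)$ can be written with the $\pm 1$ membership vector, and that the rows of B sum to zero.

```latex
\begin{aligned}
Q &= \frac{1}{2m}\sum_{i,j} B_{ij}\,\delta(c_i,c_j),
   \qquad B_{ij} = A_{ij} - \frac{k_i k_j}{2m},
   \qquad \delta(c_i,c_j) = \frac{s_i s_j + 1}{2} \\
  &= \frac{1}{4m}\sum_{i,j} B_{ij}\, s_i s_j \;+\; \frac{1}{4m}\sum_{i,j} B_{ij} \\
  &= \frac{1}{4m}\, s^T B s,
   \qquad \text{since } \sum_j B_{ij} = k_i - \frac{k_i}{2m}\sum_j k_j = k_i - k_i = 0 .
\end{aligned}
```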

  14. Task: find $s \in \{-1,+1\}^n$ that maximizes Q. Rewrite Q in terms of the eigenvalues $\beta_i$ and orthonormal eigenvectors $u_i$ of B: $Q = \frac{1}{4m} s^T \left[ \sum_{i=1}^{n} \beta_i u_i u_i^T \right] s = \frac{1}{4m} \sum_{i=1}^{n} s^T u_i \, \beta_i \, u_i^T s = \frac{1}{4m} \sum_{i=1}^{n} (s^T u_i)^2 \beta_i$. To maximize Q, the easiest way is to make s parallel to $u_1$: this assigns all weight in the sum to $\beta_1$, the largest eigenvalue (all other $s^T u_i$ terms are zero because of orthonormality). Unfortunately, the elements of s must be $\pm 1$. In general, finding the optimal s is NP-hard.

  15. $Q = \frac{1}{4m} \sum_{i=1}^{n} (s^T u_i)^2 \beta_i$. Heuristic: try to maximize only the $\beta_1$ term. Similar in spirit to the spectral partitioning algorithm (we will explore it next time). Continue the bisection hierarchically.

  16. Fast Modularity Optimization Algorithm:
     - Find the leading eigenvector $u_1$ of the modularity matrix B
     - Divide the nodes by the signs of the elements of $u_1$
     - Repeat hierarchically until:
       - If a proposed split does not cause modularity to increase, declare the community indivisible and do not split it
       - If all communities are indivisible, stop

     How to find $u_1$? Power method: iterative multiplication and normalization. Start with a random v and repeat until convergence: $v_{k+1} = \frac{B v_k}{\lVert B v_k \rVert}$
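A single bisection step of this procedure can be sketched in plain Python as below. Two things here are assumptions that go beyond the slide. First, the plain power method converges to the eigenvalue of largest magnitude, which for B may be negative, so this sketch iterates on $B + cI$ with c at least the largest absolute row sum; that shift keeps the eigenvectors and makes the largest eigenvalue dominant. Second, the function names and the fixed random seed are illustrative. The hierarchical repetition and the "does modularity increase" check are not included.

```python
import math
import random


def modularity_matrix(adj, nodes):
    """B_ij = A_ij - k_i k_j / 2m for an undirected graph without self-loops."""
    degree = [len(adj[u]) for u in nodes]
    two_m = sum(degree)
    return [[(1.0 if v in adj[u] else 0.0) - degree[i] * degree[j] / two_m
             for j, v in enumerate(nodes)]
            for i, u in enumerate(nodes)]


def leading_eigenvector(B, iters=1000, tol=1e-10, seed=0):
    """Power method on B + c*I (the shift is an addition, see the lead-in)."""
    n = len(B)
    c = max(sum(abs(x) for x in row) for row in B)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # Multiply by the shifted matrix, then normalize.
        w = [sum(B[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def spectral_bisection(adj):
    """Split the nodes by the signs of the leading eigenvector of B."""
    nodes = sorted(adj)
    u1 = leading_eigenvector(modularity_matrix(adj, nodes))
    positive = {x for x, val in zip(nodes, u1) if val >= 0}
    negative = {x for x, val in zip(nodes, u1) if val < 0}
    return positive, negative
```

On two triangles joined by one edge, `spectral_bisection` separates the two triangles. Which side comes back as "positive" depends on the random start, since an eigenvector is only defined up to sign.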

  17. Also, can combine with other methods:
     - Randomly divide the nodes into two groups
     - Move the node that, if moved, will increase Q the most
     - Repeat for all nodes, with each node only moved once
     - Once complete, find the intermediate state with the highest Q
     - Start from this state and repeat until Q stops increasing
     - Good results for "fine-tuning" the spectral method

     CNM Algorithm (Clauset-Newman-Moore '04):
     - (1) Place each of the n vertices in its own community
     - (2) Calculate $\Delta Q$ for all possible community pairs
     - (3) Merge the pair with the largest increase in Q
     - Repeat (2) and (3) until one community remains
     - Cut the dendrogram where Q is maximum
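The CNM merging steps (1) to (3) can be illustrated with a deliberately naive sketch. It rescans all community pairs at every step instead of using the efficient data structures of the actual CNM algorithm, so it shows the merging order and the dendrogram cut, not the running time. It assumes an undirected graph without self-loops stored as a dict of neighbour sets, and uses the identity $\Delta Q = E_{ab}/m - D_a D_b / (2m^2)$ for merging communities a and b, where $E_{ab}$ is the number of edges between them and $D_a$, $D_b$ are their total degrees (notation introduced here).

```python
from itertools import combinations


def greedy_modularity(adj):
    """Naive agglomerative modularity maximization.

    Returns (best_partition, best_q), where best_partition is the
    dendrogram level with the highest modularity.
    """
    m = sum(len(vs) for vs in adj.values()) / 2.0
    comms = {i: {u} for i, u in enumerate(adj)}        # (1) one node each
    deg = {i: len(adj[u]) for i, u in enumerate(adj)}  # total degree per community

    def delta(pair):
        # Change in Q if the two communities in `pair` were merged.
        a, b = pair
        between = sum(1 for u in comms[a] for v in adj[u] if v in comms[b])
        return between / m - deg[a] * deg[b] / (2.0 * m * m)

    # Q of the all-singletons partition: no internal edges yet.
    q = -sum((d / (2.0 * m)) ** 2 for d in deg.values())
    best_q, best = q, [set(c) for c in comms.values()]

    while len(comms) > 1:
        a, b = max(combinations(comms, 2), key=delta)  # (2) scan all pairs
        q += delta((a, b))
        comms[a] |= comms.pop(b)                       # (3) merge b into a
        deg[a] += deg.pop(b)
        if q > best_q + 1e-12:                         # track the best cut
            best_q, best = q, [set(c) for c in comms.values()]
    return best, best_q
```

For two triangles joined by a single edge, the best cut found this way is the two triangles, with Q = 5/14.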

  18. Fast modularity. GN = Girvan-Newman, $O(n^3)$; CNM = greedy merging, $O(n \log^2 n)$; DA = extremal optimization, $O(n^2 \log^2 n)$. Issues with modularity: may not find communities with fewer than $\sqrt{m}$ links; NP-hard to optimize exactly [Brandes et al. '07].

  20. [Kumar et al. '99] Searching for small communities in a Web graph. (1) The signature of a community/discussion in the context of a Web graph: a dense 2-layer graph. Intuition: a bunch of people all talking about the same things.

  21. (2) A more well-defined problem: enumerate complete bipartite subgraphs $K_{s,t}$, where $K_{s,t}$ = s nodes where each links to the same t other nodes.
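The enumeration problem can be illustrated with a brute-force sketch. It simply tries every set of s source nodes and intersects their out-links; it is not the pruning-based procedure a Web-scale graph would need. The names `out_links`, `fans`, and `centers` are chosen here for illustration, the graph is assumed to be directed and stored as a dict mapping each node to the set of nodes it links to, and s is assumed to be at least 1.

```python
from itertools import combinations


def bipartite_cores(out_links, s, t):
    """Yield (fans, centers) for every complete bipartite subgraph K_{s,t}.

    fans: s nodes that all link to every node in centers (t nodes).
    """
    # Only nodes with at least t out-links can be one of the s fans.
    candidates = sorted(u for u, vs in out_links.items() if len(vs) >= t)
    for fans in combinations(candidates, s):
        # Nodes that every fan links to.
        common = set.intersection(*(set(out_links[u]) for u in fans))
        common -= set(fans)  # "t other nodes": keep the two sides disjoint
        for centers in combinations(sorted(common), t):
            yield set(fans), set(centers)


out_links = {
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"y"},
    "x": set(), "y": set(), "z": set(),
}
for fans, centers in bipartite_cores(out_links, 2, 2):
    print(sorted(fans), "->", sorted(centers))  # ['a', 'b'] -> ['x', 'y']
```

The cost grows combinatorially with s and t, which is why this only works as an illustration on small graphs.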
