CS224W: Analysis of Networks. Jure Leskovec, Stanford University. Lecture slides, 11/30/17. http://cs224w.stanford.edu


  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. Figure: a network and its adjacency matrix (rows and columns are nodes).

  3. Non-overlapping vs. overlapping communities

  4. [Palla et al., '05] A node can belong to many social "circles".

  5. Figure labels: High school, Company, Stanford (Basketball), Stanford (Squash).

  6. [Figure only]

  7. [Palla et al., '05] Two nodes belong to the same community if they can be connected through adjacent k-cliques. k-clique: a fully connected graph on k nodes (figure: a 3-clique). Adjacent k-cliques: two k-cliques that overlap in k-1 nodes (figure: adjacent vs. non-adjacent 3-cliques). k-clique community: the set of nodes that can be reached through a sequence of adjacent k-cliques (figure: two overlapping 3-clique communities).

  8. [Palla et al., '05] Two nodes belong to the same community if they can be connected through adjacent k-cliques. Figure (k=4): a 4-clique, adjacent 4-cliques, non-adjacent 4-cliques, and the resulting communities for k=4.

  9. Clique Percolation Method (figure example: k=3, nodes A, B, C, D, showing cliques and the resulting communities). Step 1: find the maximal cliques; a clique is maximal if no superset of it is a clique. Step 2: build the clique overlap super-graph, in which each clique is a super-node and two cliques are connected if they overlap in at least k-1 nodes. Step 3: the communities are the connected components of the clique overlap matrix. How to set k? Set k so that we get the "richest" community structure (most widely distributed cluster sizes).

  10. Start with the graph. Find the maximal cliques. Create the clique overlap matrix $A$: rows and columns are maximal cliques, and each entry is the number of nodes the two cliques have in common. Threshold the matrix at value k-1: if $a_{ij} < k-1$, set the entry to 0. The communities are the connected components of the thresholded matrix. Figure panels: (1) graph; (2) clique overlap matrix; (3) matrix thresholded at 3; (4) communities (connected components).
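The matrix-based procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `clique_percolation`, the toy clique list, and the choice to drop maximal cliques with fewer than k nodes (which the slides do not state explicitly) are all assumptions made for the example.

```python
def clique_percolation(maximal_cliques, k):
    """Return k-clique communities from a list of maximal cliques (sketch)."""
    # Assumption: cliques smaller than k cannot contain a k-clique, so drop them.
    cliques = [frozenset(c) for c in maximal_cliques if len(c) >= k]
    n = len(cliques)

    # Clique overlap matrix: entry (i, j) = number of nodes shared by cliques i and j.
    overlap = [[len(cliques[i] & cliques[j]) for j in range(n)] for i in range(n)]

    # Threshold at k-1: keep a link only if two distinct cliques share >= k-1 nodes.
    linked = [[i != j and overlap[i][j] >= k - 1 for j in range(n)] for i in range(n)]

    # Communities = connected components of the thresholded matrix.
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, nodes = [start], set()
        while stack:
            i = stack.pop()
            nodes |= cliques[i]          # a community is the union of its cliques
            for j in range(n):
                if linked[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(nodes)
    return communities


# Hypothetical input: three maximal cliques of a small graph.
toy_cliques = [{"a", "b", "c"}, {"b", "c", "d"}, {"d", "e", "f"}]
communities = clique_percolation(toy_cliques, k=3)
```

In this toy input the first two cliques share two nodes and merge, while the third shares only node d with them, so d ends up in both communities: the overlap that the method is designed to capture.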

  11. [Palla et al., '07] Communities in a "tiny" part of a phone call network of 4 million users.

  12. [Farkas et al., '07] [Figure only]

  13. There is no nice way to find maximal cliques; it is a hard combinatorial problem. Maximal clique: a clique that cannot be extended. For example, {a, b, c} is a clique but not a maximal clique, while {a, b, c, d} is a maximal clique. Algorithm sketch: start with a seed node; expand the clique around the seed; once the clique cannot be expanded further, we have found a maximal clique. Note: this method will generate the same clique multiple times.

  14. Start with a seed vertex a. Goal: find the maximal clique Q that a belongs to. Observation: if some x belongs to Q, then it is a neighbor of a. Why? If a, x ∈ Q but the edge (a, x) does not exist, Q is not a clique! Recursive algorithm: Q is the current clique; R is the set of candidate vertices to expand the clique to. Example: start with a and expand around it (Γ(u) is the neighbor set of u). Steps of the recursive algorithm: Q = {a}, R = {b, c, d}; Q = {a, b}, R = {b, c, d} ∩ Γ(b) = {c, d}; Q = {a, b, c}, R = {c, d} ∩ Γ(c) = {}; backtrack; Q = {a, b, d}, R = {c} ∩ Γ(d) = {}.

  15. Same setup as the previous slide (seed vertex a; Q is the current clique, R the candidate vertices; Γ(u) is the neighbor set of u), with the example shown as: Q = {a}, R = {b, c, d}; Q = {a, b}, R = {b, c, d} ∩ Γ(b) = {c, d}; Q = {a, b, c}, R = {d} ∩ Γ(c) = {}; backtrack; Q = {a, b, d}, R = {c} ∩ Γ(d) = {}.

  16. Q is the current clique; R is the set of candidate vertices (Γ(p) is the neighbor set of p).
      Expand(R, Q):
        while R ≠ {}:
          p = vertex in R
          Q_p = Q ∪ {p}
          R_p = R ∩ Γ(p)
          if R_p ≠ {}: Expand(R_p, Q_p)
          else: output Q_p
          R = R - {p}

  17. Trace of Expand, starting from Expand(V, {}) with R = {a, …, f}, Q = {}:
      p = b: Q_p = {b}, R_p = {a, c, d}; call Expand(R_p, Q_p) with R = {a, c, d}, Q = {b}
        p = a: Q_p = {b, a}, R_p = {d}; call Expand(R_p, Q_p) with R = {d}, Q = {b, a}
          p = d: Q_p = {b, a, d}, R_p = {}: output {b, a, d}
        p = c: Q_p = {b, c}, R_p = {d}; call Expand(R_p, Q_p) with R = {d}, Q = {b, c}
          p = d: Q_p = {b, c, d}, R_p = {}: output {b, c, d}
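The Expand(R, Q) recursion can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: the four-node graph is hypothetical (chosen so that b is adjacent to a, c, d, with edges a-d and c-d, as in the trace), vertices are picked in alphabetical order for determinism, and `keep_maximal` is a helper added for the example. A literal transcription also emits cliques that are subsets of larger ones (for instance after a vertex has been removed from R), so the helper filters the raw output.

```python
def expand(candidates, clique, adj, out):
    """Transcription of Expand(R, Q): candidates = R, clique = Q."""
    candidates = set(candidates)             # local copy, so callers are unaffected
    while candidates:
        p = min(candidates)                  # assumption: pick vertices alphabetically
        clique_p = clique | {p}              # Q_p = Q ∪ {p}
        candidates_p = candidates & adj[p]   # R_p = R ∩ Γ(p)
        if candidates_p:
            expand(candidates_p, clique_p, adj, out)
        else:
            out.append(frozenset(clique_p))  # cannot be extended from R_p
        candidates.remove(p)                 # R = R - {p}


def keep_maximal(cliques):
    """Helper (not in the slides): drop duplicates and proper subsets."""
    unique = set(cliques)
    return [c for c in unique if not any(c < other for other in unique)]


# Hypothetical graph consistent with the trace: Γ(b) = {a, c, d}, edges a-d and c-d.
adj = {
    "a": {"b", "d"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"a", "b", "c"},
}
raw = []
expand(set(adj), frozenset(), adj, raw)      # Start: Expand(V, {})
maximal = keep_maximal(raw)
```

On this graph the raw output contains both {a, b, d} and {b, c, d}, plus smaller sets such as {a, d}; after filtering only the two maximal cliques remain.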

  18. How do we prevent maximal cliques from being generated multiple times? Only output cliques that are lexicographically minimum, e.g. {a, b, c} < {b, a, c}. Even better: only expand to nodes that are higher in the lexicographical order.
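One way to realise the "only expand to higher nodes" idea is sketched below. The function name, the toy graph, and in particular the maximality test (tracking the vertices adjacent to every member of the current clique) are assumptions of this example rather than something the slides specify.

```python
def maximal_cliques_ordered(adj):
    """Enumerate each maximal clique once, growing only toward larger labels."""
    found = []

    def grow(clique, candidates, common):
        # common: vertices (of any label) adjacent to every member of `clique`.
        # If it is empty, nothing can extend the clique, so it is maximal.
        if not common:
            found.append(tuple(clique))
        for v in sorted(candidates):
            higher = {u for u in candidates & adj[v] if u > v}  # expand upward only
            grow(clique + [v], higher, common & adj[v])

    for v in sorted(adj):
        grow([v], {u for u in adj[v] if u > v}, set(adj[v]))
    return found


# Hypothetical graph: b is adjacent to a, c, d; plus edges a-d and c-d.
adj = {
    "a": {"b", "d"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"a", "b", "c"},
}
cliques = maximal_cliques_ordered(adj)
```

Because every clique is only ever built in increasing label order, it is reached along exactly one path of the recursion, so no post-hoc de-duplication is needed.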

  19. How should we think about the large-scale organization of clusters in networks? Finding: community structure.

  20. How should we think about the large-scale organization of clusters in networks? Finding: core-periphery structure (figure: nested core-periphery).

  21. How do we reconcile these two views (and still do community detection)? Community structure vs. core-periphery.

  22. How community-like is a set of nodes? A good cluster S has many edges internally and few edges pointing outside. A good metric is conductance: $\phi(S) = \frac{|\{(i, j) \in E : i \in S,\ j \notin S\}|}{\sum_{s \in S} d_s}$, where $d_s$ is the degree of node $s$. Small conductance corresponds to good clusters. Note: we are assuming $|S| < |V|/2$.
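As a concrete check of the conductance definition, here is a small Python sketch. The adjacency-dictionary representation and the seven-node toy graph (two triangles joined by one edge, plus a pendant node) are assumptions made for illustration.

```python
def conductance(adj, S):
    """Edges leaving S divided by the sum of degrees of the nodes in S."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges (i, j), i in S, j not in S
    volume = sum(len(adj[u]) for u in S)                   # sum of degrees d_s, s in S
    return cut / volume


# Hypothetical undirected graph: triangles {0,1,2} and {3,4,5}, edge 2-3, pendant node 6 on 5.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4, 6},
    6: {5},
}
phi_triangle = conductance(adj, {0, 1, 2})  # 1 cut edge, degrees 2 + 2 + 3 = 7
phi_pair = conductance(adj, {0, 1})         # 2 cut edges, degrees 2 + 2 = 4
```

The triangle scores 1/7 while the pair scores 1/2, matching the intuition that the triangle, with only one edge pointing outside, is the better cluster. Both sets satisfy the |S| < |V|/2 assumption.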

  23. [WWW '08] Define the network community profile (NCP) plot: plot the score of the best community of size k (note |S| < |V|/2). Figure: example communities for k=5, k=7, k=10; axes: log Φ(k) vs. community size, log k.

  24. Run your favorite clustering method(s) (the figure shows Spectral, Graclus, Metis). Each dot represents a cluster. For each size k, find the "best" cluster (minimum Φ(k)). Axes: cluster score, log Φ(k), vs. cluster size, log k.
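The "best cluster per size" step can be sketched as follows, assuming the candidate clusters have already been produced by some clustering method. The function names, the toy graph, and the candidate list are hypothetical; a real NCP would use many more candidates and plot the result on log-log axes.

```python
def conductance(adj, S):
    """Edges leaving S divided by the sum of degrees of the nodes in S."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    return cut / sum(len(adj[u]) for u in S)


def community_profile(adj, clusters):
    """Map each cluster size k to the lowest conductance among clusters of that size."""
    best = {}
    for S in clusters:
        k, phi = len(S), conductance(adj, S)
        if k not in best or phi < best[k]:
            best[k] = phi
    return dict(sorted(best.items()))


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, edge 2-3, pendant node 6 on 5.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4, 6}, 6: {5},
}
# Hypothetical candidate clusters, standing in for the output of a clustering method.
candidates = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {5, 6}]
profile = community_profile(adj, candidates)
```

Here both size-3 candidates are scored, and the triangle {0, 1, 2} wins with conductance 1/7 over {1, 2, 3} at 1/2, so the profile records 1/7 at k=3.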

  25. [WWW '08] Meshes, grids, dense random graphs. Figure panels: California road network; d-dimensional meshes.

  26. [WWW '08] Collaborations between scientists in networks [Newman, 2005]. Dips in the conductance graph correspond to the "good" clusters we can visually detect. Axes: conductance, log Φ(k), vs. community size, log k.

  27. [Internet Mathematics '09] Natural hypothesis about the NCP: the NCP of real networks slopes downward, and the slope of the NCP corresponds to the "dimensionality" of the network. What about large networks?

  28. [Internet Mathematics '09] Typical example: General Relativity collaborations (n=4,158, m=13,422).

  29. [Internet Mathematics '09] Figure legend: rewired graph vs. real graph.

  30. Figure annotations: better and better clusters; best cluster has ~100 nodes; clusters get worse and worse. Axes: Φ(k) (score) vs. k (cluster size).
