CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

[Figure: a network and its adjacency matrix (nodes by nodes)] 11/30/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

• Non-overlapping vs. overlapping communities

• A node can belong to many social “circles” [Palla et al., ‘05]

[Figure: overlapping social circles labeled High school, Company, Stanford (Basketball), Stanford (Squash)]

• Two nodes belong to the same community if they can be connected through adjacent k-cliques [Palla et al., ‘05]:
  - k-clique: fully connected graph on k nodes
  - Adjacent k-cliques: k-cliques that overlap in k-1 nodes
• k-clique community: set of nodes that can be reached through a sequence of adjacent k-cliques
[Figure: a 3-clique; adjacent and non-adjacent 3-cliques; two overlapping 3-clique communities]
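The two definitions above can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dict mapping each node to its set of neighbors; the function names `is_clique` and `are_adjacent` and the example graph are illustrative, not part of the lecture.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def are_adjacent(c1, c2, k):
    """True if c1 and c2 are k-cliques (by size) that share exactly k-1 nodes."""
    c1, c2 = set(c1), set(c2)
    return len(c1) == k and len(c2) == k and len(c1 & c2) == k - 1

# Hypothetical example: two triangles sharing the edge b-c, plus a triangle c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"b", "c", "e"},
    "e": {"c", "d"},
}
```

Here {a, b, c} and {b, c, d} share two nodes, so they are adjacent 3-cliques, while {a, b, c} and {c, d, e} share only node c and are not.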

• Two nodes belong to the same community if they can be connected through adjacent k-cliques [Palla et al., ‘05]
[Figure: a 4-clique; adjacent and non-adjacent 4-cliques; communities for k=4]

• Clique Percolation Method:
  - Find maximal cliques
    Def: a clique is maximal if no superset is a clique
  - Clique overlap super-graph:
    - Each clique is a super-node
    - Connect two cliques if they overlap in at least k-1 nodes
  - Communities: connected components of the clique overlap matrix
• How to set k?
  - Set k so that we get the “richest” (most widely distributed cluster sizes) community structure
[Figure: example with k=3 on nodes A, B, C, D, showing the cliques and the resulting communities]

• Start with graph
• Find maximal cliques
• Create clique overlap matrix A
  - Rows/cols are maximal cliques; each entry is the number of nodes the two cliques have in common
• Threshold the matrix at value k-1
  - If a_ij < k-1, set it to 0
• Communities are the connected components of the thresholded matrix
[Figure: (1) Graph; (2) Clique overlap matrix; (3) Thresholded matrix at 3; (4) Communities (connected components)]
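The matrix steps above can be sketched in plain Python. This is an illustrative sketch, not the lecture's code: it takes the maximal cliques as given input, the function name `clique_percolation` and the example cliques are hypothetical, and it assumes that maximal cliques with fewer than k nodes are dropped first, since they contain no k-clique.

```python
def clique_percolation(max_cliques, k):
    """Return k-clique communities given the maximal cliques of a graph."""
    # Assumption: cliques with fewer than k nodes contain no k-clique, so drop them.
    cliques = [frozenset(c) for c in max_cliques if len(c) >= k]
    n = len(cliques)
    # Clique overlap matrix: entry (i, j) is the number of shared nodes.
    overlap = [[len(cliques[i] & cliques[j]) for j in range(n)] for i in range(n)]
    # Threshold at k-1: entries below k-1 become 0, all others become 1.
    linked = [[1 if overlap[i][j] >= k - 1 else 0 for j in range(n)] for i in range(n)]
    # Connected components of the thresholded matrix (depth-first search).
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            for j in range(n):
                if linked[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(members)
    return communities

# Hypothetical maximal cliques of a small example graph.
example = [{"a", "b", "c"}, {"b", "c", "d"}, {"d", "e", "f"}, {"f", "g"}]
communities = clique_percolation(example, k=3)
```

In this example node d ends up in both communities, {a, b, c, d} and {d, e, f}, which is the overlapping behavior the method is meant to allow.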

[Figure: Communities in a “tiny” part of a phone call network of 4 million users] [Palla et al., ‘07]

[Farkas et al., ‘07]

• No nice way, hard combinatorial problem
• Maximal clique: clique that can’t be extended
  - {a, b, c} is a clique but not a maximal clique
  - {a, b, c, d} is a maximal clique
• Algorithm sketch:
  - Start with a seed node
  - Expand the clique around the seed
  - Once the clique cannot be further expanded, we have found a maximal clique
  - Note: this method will generate the same clique multiple times

• Start with a seed vertex a
• Goal: find the max clique Q that a belongs to
  - Observation: if some x belongs to Q, then it is a neighbor of a
  - Why? If a, x ∈ Q but edge (a, x) does not exist, Q is not a clique!
• Recursive algorithm:
  - Q … current clique
  - R … candidate vertices to expand the clique to
• Example: start with a and expand around it
  Steps of the recursive algorithm (Γ(u) … neighbor set of u):
  Q = {a}       R = {b,c,d}
  Q = {a,b}     R = {b,c,d} ∩ Γ(b) = {c,d}
  Q = {a,b,c}   R = {c,d} ∩ Γ(c) = {}
  backtrack
  Q = {a,b,d}   R = {c} ∩ Γ(d) = {}

• Start with a seed vertex a
• Goal: find the max clique Q that a belongs to
  - Observation: if some x belongs to Q, then it is a neighbor of a
  - Why? If a, x ∈ Q but edge (a, x) does not exist, Q is not a clique!
• Recursive algorithm:
  - Q … current clique
  - R … candidate vertices to expand the clique to
• Example: start with a and expand around it
  Steps of the recursive algorithm (Γ(u) … neighbor set of u):
  Q = {a}       R = {b,c,d}
  Q = {a,b}     R = {b,c,d} ∩ Γ(b) = {c,d}
  Q = {a,b,c}   R = {d} ∩ Γ(c) = {}
  backtrack
  Q = {a,b,d}   R = {c} ∩ Γ(d) = {}

  - Q … current clique
  - R … candidate vertices
• Expand(R, Q):
    while R ≠ {}:
      p = vertex in R
      Q_p = Q ∪ {p}
      R_p = R ∩ Γ(p)
      if R_p ≠ {}: Expand(R_p, Q_p)
      else: output Q_p
      R = R – {p}

Example run of Expand (Q … current clique, R … candidate vertices):
Start: Expand(V, {})
  R = {a,…f}, Q = {}
  p = {b}: Q_p = {b}, R_p = {a,c,d}
  Expand(R_p, Q_p): R = {a,c,d}, Q = {b}
    p = {a}: Q_p = {b,a}, R_p = {d}
    Expand(R_p, Q_p): R = {d}, Q = {b,a}
      p = {d}: Q_p = {b,a,d}, R_p = {} : output {b,a,d}
    p = {c}: Q_p = {b,c}, R_p = {d}
    Expand(R_p, Q_p): R = {d}, Q = {b,c}
      p = {d}: Q_p = {b,c,d}, R_p = {} : output {b,c,d}
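The Expand(R, Q) recursion can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the graph is a dict of neighbor sets, the vertex p is always the smallest remaining candidate, and the four-node example graph is hypothetical (it only reproduces the two cliques output in the run above). Traced by hand on this graph, the recursion as written also emits smaller cliques such as {a, d}, so the sketch adds a maximality filter, `is_maximal`, which is an addition and not part of the slide's pseudocode.

```python
def expand(R, Q, adj, out):
    """Follow the Expand(R, Q) pseudocode: R = candidates, Q = current clique."""
    R = set(R)
    while R:
        p = min(R)                # assumption: pick the smallest candidate
        Qp = Q | {p}              # Q_p = Q ∪ {p}
        Rp = R & adj[p]           # R_p = R ∩ Γ(p)
        if Rp:
            expand(Rp, Qp, adj, out)
        else:
            out.append(frozenset(Qp))
        R.discard(p)              # R = R – {p}

def is_maximal(clique, adj):
    """A clique is maximal if no outside vertex is adjacent to all its members."""
    return not any(clique <= adj[v] for v in adj if v not in clique)

# Hypothetical graph with edges a-b, a-d, b-c, b-d, c-d.
adj = {
    "a": {"b", "d"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"a", "b", "c"},
}
found = []
expand(set(adj), frozenset(), adj, found)
maximal = {c for c in found if is_maximal(c, adj)}
```

After the filter, `maximal` holds the two maximal cliques {a, b, d} and {b, c, d}.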

• How to prevent maximal cliques from being generated multiple times?
  - Only output cliques that are lexicographically minimum
    - {a, b, c} < {b, a, c}
  - Even better: only expand to the nodes higher in the lexicographical order
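A small experiment illustrates why ordered expansion removes duplicates. The sketch below is illustrative only: it enumerates every clique (not just maximal ones) as the sequence in which its nodes were added, once with unrestricted expansion and once expanding only to higher-ordered nodes. The names `grow` and `all_sequences` and the triangle graph are hypothetical.

```python
def grow(adj, clique, cands, ordered):
    """Yield each clique reachable by extending `clique` with nodes from `cands`."""
    yield tuple(clique)
    for v in sorted(cands):
        nxt = cands & adj[v]
        if ordered:
            # Only expand to nodes higher in the lexicographical order.
            nxt = {u for u in nxt if u > v}
        yield from grow(adj, clique + [v], nxt, ordered)

def all_sequences(adj, ordered):
    """Grow cliques from every seed node and collect the node sequences."""
    out = []
    for s in sorted(adj):
        cands = {u for u in adj[s] if u > s} if ordered else set(adj[s])
        out.extend(grow(adj, [s], cands, ordered))
    return out

# Hypothetical triangle graph on a, b, c.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
unordered = all_sequences(triangle, ordered=False)
ordered = all_sequences(triangle, ordered=True)
```

On the triangle, unrestricted expansion visits 15 sequences, including ('a', 'b', 'c') and ('b', 'a', 'c') for the same clique, while ordered expansion visits each of the 7 cliques exactly once.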

• How should we think about large scale organization of clusters in networks?
  - Finding: Community structure

• How should we think about large scale organization of clusters in networks?
  - Finding: Core-periphery structure
[Figure: Nested core-periphery]

• How do we reconcile these two views? (and still do community detection)
[Figure: Community structure vs. Core-periphery]

• How community-like is a set of nodes?
• A good cluster S has
  - Many edges internally
  - Few edges pointing outside
• What’s a good metric: conductance
  $\phi(S) = \frac{|\{(i,j) \in E;\ i \in S,\ j \notin S\}|}{\sum_{s \in S} d_s}$
• Small conductance corresponds to good clusters
Note: we are assuming |S| < |V|/2; d_s is the degree of node s
[Figure: a cluster S and a second set S’]
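The conductance formula translates directly into a few lines of Python. This is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets and a set S that touches at least one edge (otherwise the denominator is zero); the function name and the example graph are illustrative.

```python
def conductance(adj, S):
    """Edges leaving S divided by the sum of degrees of the nodes in S."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)   # |{(i,j) in E; i in S, j not in S}|
    volume = sum(len(adj[u]) for u in S)                    # sum of d_s over s in S
    return cut / volume  # assumes volume > 0

# Hypothetical graph: triangles a-b-c and d-e-f joined by the single edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
phi_triangle = conductance(adj, {"a", "b", "c"})   # 1 cut edge, degree sum 7
phi_pair = conductance(adj, {"a", "b"})            # 2 cut edges, degree sum 4
```

The triangle scores 1/7 and the pair {a, b} scores 1/2, so the triangle is the more community-like set under this metric.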

• Define: Network community profile (NCP) plot [WWW ‘08]
  - Plot the score of the best community of size k (Note |S| < |V|/2)
[Figure: example communities of size k=5, k=7, k=10; axes: log Φ(k) vs. community size, log k]

• Run the favorite clustering method(s)
• Each dot represents a cluster
• For each size k find the “best” cluster (min Φ(k))
[Figure: scatter plot of clusters found by Spectral, Graclus, Metis; axes: cluster score, log Φ(k) vs. cluster size, log k]
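The procedure above (score every candidate cluster, then keep the best score per size) can be sketched as follows. This is an illustrative sketch: the candidate clusters are supplied by hand rather than produced by Spectral, Graclus, or Metis, conductance is used as the score, and the names `ncp` and `candidates` are hypothetical.

```python
def conductance(adj, S):
    """Edges leaving S divided by the sum of degrees of the nodes in S."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    volume = sum(len(adj[u]) for u in S)
    return cut / volume  # assumes S touches at least one edge

def ncp(adj, clusters):
    """Map each cluster size k to the lowest conductance among clusters of that size."""
    best = {}
    for S in clusters:
        k, score = len(S), conductance(adj, S)
        if k not in best or score < best[k]:
            best[k] = score
    return dict(sorted(best.items()))

# Hypothetical graph: triangles a-b-c and d-e-f joined by the single edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
# Hand-picked candidate clusters standing in for the output of clustering methods.
candidates = [{"a"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"d", "e", "f"}]
profile = ncp(adj, candidates)
```

Plotting the values of `profile` against its keys on log-log axes would give the NCP plot for this toy graph; the minimum at size 3 corresponds to the two triangles.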

• Meshes, grids, dense random graphs [WWW ‘08]
[Figure: d-dimensional meshes; California road network]

• Collaborations between scientists in networks [Newman, 2005] [WWW ‘08]
[Figure: axes: conductance, log Φ(k) vs. community size, log k]
Dips in the conductance graph correspond to the "good" clusters we can visually detect

Natural hypothesis about NCP [Internet Mathematics ‘09]:
• NCP of real networks slopes downward
• Slope of the NCP corresponds to the “dimensionality” of the network
What about large networks?

Typical example: General Relativity collaborations (n=4,158, m=13,422) [Internet Mathematics ‘09]

[Figure legend: Rewired graph; Real graph] [Internet Mathematics ‘09]

[Figure: Φ(k) (score) vs. k (cluster size). Annotations: better and better clusters; best cluster has ~100 nodes; clusters get worse and worse]
