Personalized PageRank based Community Detection Code


  1. Personalized PageRank based Community Detection Code. bit.ly/dgleich-codes. David F. Gleich, Purdue University. Joint work with C. Seshadhri, Joyce Jiyoung Whang, and Inderjit S. Dhillon; supported by NSF CAREER 1149756-CCF.

  2. Today’s talk: (1) Personalized PageRank based community detection; (2) Conductance, Egonets, and Network Community Profiles; (3) Egonet seeding; (4) Improved seeding. David Gleich · Purdue MLG2013

  3. A community is a set of vertices that is denser inside than out.

  4. [Figure: 250-node GEOP network in 2 dimensions]

  5. [Figure: 250-node GEOP network in 2 dimensions]

  6. We can find communities using Personalized PageRank (PPR) [Andersen et al. 2006]. PPR is a Markov chain on nodes: (1) with probability α, follow a random edge; (2) with probability 1 − α, restart at a seed. Also known as the random surfer or random walk with restart. It has a unique stationary distribution.
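As a minimal sketch of the chain described on this slide, the stationary distribution can be approximated by repeatedly applying the "follow an edge or restart" step. The function name `ppr_power_iteration`, the iteration count, and the six-vertex example graph are assumptions made for illustration; the graph is stored as a dictionary of sets and every vertex is assumed to have at least one neighbor.

```python
def ppr_power_iteration(G, seeds, alpha=0.99, iters=3000):
    """Approximate the PPR stationary distribution on a dict-of-sets graph G."""
    seeds = set(seeds)
    # Restart distribution: uniform over the seed vertices.
    s = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in G}
    x = dict(s)
    for _ in range(iters):
        # With probability 1 - alpha, restart at a seed.
        nxt = {v: (1.0 - alpha) * s[v] for v in G}
        # With probability alpha, follow a uniformly random edge out of v.
        for v in G:
            share = alpha * x[v] / len(G[v])
            for u in G[v]:
                nxt[u] += share
        x = nxt
    return x

# Hypothetical example: two triangles joined by the single edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = ppr_power_iteration(G, seeds=[0])
total = sum(x.values())
```

Seeding at vertex 0 leaves more probability on the seed's triangle than on the mirrored vertices of the other triangle. This dense iteration touches the whole graph on every step, so it is only a reference point for small examples, not a local method.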

  7. Personalized PageRank community detection: (1) Given a seed, approximate the stationary distribution. (2) Extract the community. Both are local operations.
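The slide does not spell out the extraction step. A common choice, and the one used by Andersen et al. 2006, is a sweep cut: sort the vertices by PPR value divided by degree and keep the prefix with the smallest conductance. The sketch below assumes a dict-of-sets graph; the name `sweep_cut` and the vector `x` are hypothetical illustration values, not output from the talk's code.

```python
def sweep_cut(G, x):
    """Return (best_set, best_conductance) over prefixes sorted by x[v]/deg(v)."""
    total_vol = sum(len(G[v]) for v in G)
    order = sorted(x, key=lambda v: x[v] / len(G[v]), reverse=True)
    S, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order:
        inside = sum(1 for u in G[v] if u in S)
        vol += len(G[v])
        # Edges from v into S stop being cut edges; the rest become cut edges.
        cut += len(G[v]) - 2 * inside
        S.add(v)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # S now holds all of the volume; no cut is left to score
        phi = cut / denom
        if phi < best_phi:
            best_phi, best_set = phi, set(S)
    return best_set, best_phi

# Hypothetical example: two triangles joined by the edge 2-3,
# with a made-up PPR-like vector concentrated on the first triangle.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
x = {0: 0.40, 1: 0.30, 2: 0.25, 3: 0.03, 4: 0.01, 5: 0.01}
best_set, best_phi = sweep_cut(G, x)
```

Only vertices with a nonzero entry in `x` are scanned, which is what keeps the extraction local when `x` comes from an approximate, sparse PPR vector.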

  8. Demo!

  9. Conductance communities. Conductance is one of the most important community scores [Schaeffer07]. The conductance of a set of vertices is the ratio of edges leaving to total edges: $\phi(S) = \dfrac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$, that is, (edges leaving the set) / (total edges in the set). Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11. Equivalently, it’s the probability that a random edge leaves the set. Small conductance ⇔ good community.
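The definition on this slide translates directly into a few lines of Python. This is a sketch under stated assumptions: the graph is an undirected dictionary of sets, `S` is a nonempty proper subset of the vertices, and the function name and example graph are made up for illustration. The last line just redoes the slide's arithmetic.

```python
def conductance(G, S):
    """phi(S) = cut(S) / min(vol(S), vol(complement of S)) on a dict-of-sets graph."""
    S = set(S)
    # cut(S): number of edges with exactly one endpoint in S.
    cut = sum(1 for v in S for u in G[v] if u not in S)
    # vol(S): sum of the degrees of the vertices in S.
    vol_S = sum(len(G[v]) for v in S)
    vol_rest = sum(len(G[v]) for v in G) - vol_S
    return cut / min(vol_S, vol_rest)

# Hypothetical example: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi_triangle = conductance(G, {0, 1, 2})  # cut = 1, volume 7 on each side
phi_single = conductance(G, {0})          # both edges of vertex 0 leave the set

# The numbers from the slide: cut(S) = 7, vol(S) = 33, vol of the complement = 11.
phi_slide = 7 / min(33, 11)
```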

  10. Andersen-Chung-Lang personalized PageRank community theorem. Informally: suppose the seeds are in a set of good conductance; then the personalized PageRank method will find a set with conductance that’s nearly as good. … also, it’s really fast. [Andersen et al. 2006]

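  11. Python code (G is the graph as a dictionary of sets):

          import collections

          # G is the graph as a dictionary of sets: vertex -> set of neighbors.
          # seed is a list of seed vertices; both must be defined before this runs.
          alpha = 0.99
          tol = 1e-4
          x = {}  # approximate PPR vector, stored as a dictionary
          r = {}  # residual, stored as a dictionary
          Q = collections.deque()  # queue of vertices with a large residual
          for s in seed:
              r[s] = 1.0 / len(seed)
              Q.append(s)
          while len(Q) > 0:
              v = Q.popleft()  # v has r[v] >= tol*deg(v)
              if v not in x: x[v] = 0.
              x[v] += (1 - alpha) * r[v]
              mass = alpha * r[v] / (2 * len(G[v]))
              for u in G[v]:  # for neighbors of v
                  if u not in r: r[u] = 0.
                  if r[u] < len(G[u]) * tol and \
                     r[u] + mass >= len(G[u]) * tol:
                      Q.append(u)  # add u to the queue once its residual is large
                  r[u] = r[u] + mass
              r[v] = mass * len(G[v])  # v keeps alpha*r[v]/2 as residual
              if r[v] >= len(G[v]) * tol:
                  Q.append(v)  # requeue v if still large (step not shown on the slide)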

  12. Demo 2!

  13. Problem 1, which seeds?

  14. Problem 2, not fast enough.

  15. Gleich-Seshadhri, KDD 2012: Neighborhoods are good communities

  16. Gleich-Seshadhri, KDD 2012. Egonets and conductance: neighborhoods (egonets) are good (low vertex-neighborhood conductance) communities … in graphs that look like social and information networks.

  17. Vertex neighborhoods or Egonets: the induced subgraph on the set consisting of a vertex and its neighbors. Prior research on egonets of social networks from the “structural holes” perspective [Burt95, Kleinberg08]. Used for anomaly detection [Akoglu10], community seeds [Huang11, Schaeffer11], overlapping communities [Schaeffer07, Rees10].

  18. Simple version of theorem: If global clustering coefficient = 1, then the graph is a disjoint union of cliques. Vertex neighborhoods are optimal communities!

  19. Theorem. Condition: let graph G have clustering coefficient κ and have vertex degrees bounded by a power-law function with exponent γ less than 3. [Figure: log probability vs. log degree, with the degree distribution between $\alpha_1 n / d^{\gamma}$ and $\alpha_2 n / d^{\gamma}$.] Then there exists a vertex neighborhood with conductance $\le 4(1-\kappa)/(3-2\kappa)$.
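To make the quantities in the theorem concrete, the sketch below computes a global clustering coefficient as the fraction of wedges (paths u-v-w centered at v) that are closed by an edge u-w, which is the usual definition and is assumed here, and then evaluates the expression from the slide. The function names and example graphs are hypothetical. The code does not check the power-law degree condition, so it only evaluates the formula; it does not certify that the theorem applies.

```python
from itertools import combinations

def global_clustering(G):
    """Fraction of wedges in a dict-of-sets graph that are closed into triangles."""
    wedges = closed = 0
    for v in G:
        # Every unordered pair of neighbors of v forms a wedge centered at v.
        for u, w in combinations(G[v], 2):
            wedges += 1
            if w in G[u]:
                closed += 1
    return closed / wedges

def neighborhood_bound(kappa):
    """The conductance bound from the slide: 4(1 - kappa) / (3 - 2 kappa)."""
    return 4 * (1 - kappa) / (3 - 2 * kappa)

# Hypothetical examples: a single triangle, and two triangles joined by an edge.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

kappa_clique = global_clustering(triangle)   # every wedge is closed
kappa = global_clustering(two_triangles)     # 6 of the 10 wedges are closed
bound = neighborhood_bound(kappa)
```

With every wedge closed the bound drops to 0, matching the clique case of the simple version of the theorem.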

  20. Confession: The theory is weak. The bound $\phi(S) \le 4(1-\kappa)/(3-2\kappa)$ is useless unless κ ≥ 1/2.

          Graph               Verts     Edges       κ      C̄
          Collaboration networks, κ ~ [0.1 – 0.5]
          ca-AstroPh          17903     196972      0.318  0.633
          email-Enron         33696     180811      0.085  0.509
          cond-mat-2005       36458     171735      0.243  0.657
          arxiv               86376     517563      0.560  0.678
          dblp                226413    716460      0.383  0.635
          hollywood-2009      1069126   56306653    0.310  0.766
          Social networks, κ ~ [0.05 – 0.1]
          fb-Penn94           41536     1362220     0.098  0.212
          fb-A-oneyear        1138557   4404989     0.038  0.060
          fb-A                3097165   23667394    0.048  0.097
          soc-LiveJournal1    4843953   42845684    0.118  0.274
          Tech. networks, κ ~ [0.005 – 0.05]
          oregon2-010526      11461     32730       0.037  0.352
          p2p-Gnutella25      22663     54693       0.005  0.005
          as-22july06         22963     48436       0.011  0.230
          itdk0304            190914    607610      0.061  0.158
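The 1/2 threshold follows from the fact that conductance never exceeds 1, so the bound only says something when its right-hand side is below 1. A short worked derivation, using only the expression on the slide and writing κ for the clustering coefficient in that expression:

```latex
\[
\frac{4(1-\kappa)}{3-2\kappa} < 1
\iff 4 - 4\kappa < 3 - 2\kappa
\iff \kappa > \tfrac{1}{2},
\qquad \text{using } 3 - 2\kappa > 0 \text{ for } 0 \le \kappa \le 1 .
\]
```

At κ = 1/2 the bound equals exactly 1, and even at an illustrative κ = 0.6 it is only 8/9 ≈ 0.89.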

  21. We view this theory as “intuition for the truth”

  22. Empirical Evaluation using Network Community Profiles. [Figure: network community profile of fb-A-oneyear, log-log axes; x-axis: community size, 10^0 to 10^5, with max deg marked; y-axis: minimum conductance for any community of the given size, 10^-4 to 10^0.] Approximate canonical shape found by Leskovec, Lang, Dasgupta, and Mahoney. Holds for a variety of approximations to conductance.

  23. Empirical Evaluation using Network Community Profiles. Facebook data from Wilson et al. 2009: fb-A-oneyear, 1.1M verts, 4M edges. [Figure: egonet community profile, log-log axes; x-axis: community size (degree + 1), with max deg marked; y-axis: minimum conductance for any neighborhood of the given size.] The “egonet community profile” shows the same shape, 3 secs to compute. The Fiedler community computed from the normalized Laplacian is a neighborhood!

  24. Not just one graph. [Figures: egonet community profiles, log-log axes, for arXiv – 86k verts, 500k edges, and soc-LiveJournal – 5M verts, 42M edges.] 15 more graphs available: www.cs.purdue.edu/~dgleich/codes/neighborhoods

  25. Filling in the Network Community Profile. fb-A-oneyear: Facebook Sample - 1.1M verts, 4M edges. [Figure: egonet community profile, log-log axes; x-axis: community size (degree + 1); y-axis: minimum conductance for any neighborhood of the given size.] We are missing a region of the NCP when we just look at neighborhoods.

  26. Filling in the Network Community Profile. fb-A-oneyear: Facebook Sample - 1.1M verts, 4M edges. [Figure: network community profile, log-log axes; x-axis: community size; y-axis: minimum conductance for any community of the given size.] This region fills when using the PPR method (like now!). 7807 seconds.

  27. Am I a good seed? Locally Minimal Communities: “My conductance is the best locally.” $\phi(N(v)) \le \phi(N(w))$ for all w adjacent to v. In Zachary’s Karate Club network, there are four locally minimal communities: the two leaders and two peripheral nodes.
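The locally minimal test on this slide can be sketched as follows. The code assumes a dict-of-sets graph and takes N(v) to be v together with its neighbors (the egonet). The function names, the small example graph, and the convention of scoring an egonet that covers the whole graph as infinity are all assumptions made for illustration.

```python
def conductance(G, S):
    """cut(S) / min(vol(S), vol(complement)); infinity if either side is empty."""
    cut = sum(1 for v in S for u in G[v] if u not in S)
    vol_S = sum(len(G[v]) for v in S)
    vol_rest = sum(len(G[v]) for v in G) - vol_S
    denom = min(vol_S, vol_rest)
    # Convention chosen for this sketch: a set covering everything is never "best".
    return cut / denom if denom > 0 else float("inf")

def egonet(G, v):
    """The vertex v together with all of its neighbors."""
    return {v} | set(G[v])

def locally_minimal_seeds(G):
    """Vertices whose egonet conductance is <= that of every neighbor's egonet."""
    phi = {v: conductance(G, egonet(G, v)) for v in G}
    seeds = [v for v in G if all(phi[v] <= phi[w] for w in G[v])]
    return seeds, phi

# Hypothetical example: two triangles joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
seeds, phi = locally_minimal_seeds(G)
```

In this example the two bridge vertices, 2 and 3, are ruled out because their egonets cut two edges, while the four triangle-only vertices tie at conductance 1/7 and all pass the test.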

  28. Locally minimal communities capture extremal neighborhoods. fb-A-oneyear: Facebook Sample - 1.1M verts, 4M edges. [Figure: egonet community profile with locally minimal communities marked, log-log axes; x-axis: community size.] Red dots are the conductance and size of a locally minimal community; usually about 1% of # of vertices. The red circles – the best local mins – find the extremes in the egonet community profile.

  29. Filling in the NCP: growing locally minimal comm. [Figure: three community profiles for fb-A-oneyear, log-log axes; x-axis: community size.] Original egonet NCP: 3 seconds. PPR growing only locally min communities, seeded from entire egonet: 283 seconds. Full NCP: 7807 seconds.

  30. But there’s a small problem. Most people want to cover a network with communities! We just looked at the best.
