Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning


  1. Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning. Jesmin Jahan Tithi ♥, Andrzej Stasiak *, Sriram Aananthakrishnan *, Fabrizio Petrini ♥. ♥ Parallel Computing Labs, Intel; * Data Center Group, Intel.

  2. What is community?

  3. What is Community?
     - Sets of vertices that have dense intra-connections but sparse inter-connections
     - Uncover hidden structures inside a graph in the form of coherent modules of vertices
     - Strongly correlated with functional and structural properties
     - [Figures: a community; Protein-Protein Interaction Network; World Wide Web. Image source: Google Image]

  4. What is community detection?

  5. What is Community Detection?
     - Algorithms to identify communities in a network
     - Applications: network analysis to retrieve information or patterns of the network
     - [Figures: http://senseable.mit.edu; "Virality Prediction and Community Structure in Social Networks"; Nodus Labs, "Against Putin" Facebook protest group visualization, December 2011, /community_detection/]

  6. How do we measure the quality of the detected communities?

  7. A Measure of Solution Quality
     - Modularity: a measure of the interconnectedness of the communities
     - Modularity, $Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{(\Sigma_{tot}^{c})^{2}}{4m^{2}} \right]$; max value of $Q$ = 1
     - $\Sigma_{in}^{c} = \sum w_{i,j}$, for all $i, j \in c$
     - $\Sigma_{tot}^{c} = \sum w_{i,j}$, for all $i \in c$ or $j \in c$
     - $m = \sum_{(i,j)} w_{i,j}$
     - $|Q| \in (0, 1]$, and the higher the better
     - A community detection algorithm identifies communities in a way that maximizes modularity
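The modularity definition above can be checked on a toy graph. The following is a minimal sketch, not the authors' code. It assumes an undirected graph without self-loops, stored as a symmetric adjacency map. It reads $\Sigma_{in}^{c}$ as a sum over ordered pairs (each internal edge counted twice) and $\Sigma_{tot}^{c}$ as the total weighted degree of the vertices in $c$. The names `build_graph` and `modularity` are illustrative.

```python
from collections import defaultdict


def build_graph(edges):
    """Symmetric adjacency map {u: {v: weight}} for an undirected graph."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def modularity(adj, community):
    """Q = sum over c of [ sigma_in / 2m - (sigma_tot / 2m)^2 ]."""
    # Every undirected edge appears twice in the symmetric map, so this is 2m.
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    sigma_in = defaultdict(float)   # weight inside each community (both directions)
    sigma_tot = defaultdict(float)  # total weighted degree of each community
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            sigma_tot[community[u]] += w
            if community[u] == community[v]:
                sigma_in[community[u]] += w
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2 for c in sigma_tot)


# Two triangles joined by a single edge (2, 3); all weights are 1.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
adj = build_graph(edges)
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_blob = {v: "a" for v in adj}
print(modularity(adj, two_triangles))  # 2 * (6/14 - (7/14)**2) = 5/14
print(modularity(adj, one_blob))       # 14/14 - (14/14)**2 = 0
```

Splitting the graph along the bridge edge scores higher than lumping everything together, which matches the "dense within, sparse between" intuition.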

  8. How do we maximize modularity?

  9. A Recipe of Modularity Optimization
     - Modularity: a measure of the interconnectedness of the communities
     - Modularity, $Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \left( \frac{\Sigma_{tot}^{c}}{2m} \right)^{2} \right]$; max value of $Q$ = 1
     - Large values of $Q$ correlate with high $\Sigma_{in}^{c}$ and low $\Sigma_{tot}^{c}$: communities that are dense within their structure and weakly coupled to each other
     - To get a high $\Sigma_{in}^{c}$, the highest possible number of edges should fall within each community

  10. A Recipe of Modularity Optimization
      - Modularity: a measure of the interconnectedness of the communities
      - Modularity, $Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \left( \frac{\Sigma_{tot}^{c}}{2m} \right)^{2} \right]$; max value of $Q$ = 1
      - Large values of $Q$ correlate with high $\Sigma_{in}^{c}$ and low $\Sigma_{tot}^{c}$: communities that are dense within their structure and weakly coupled to each other
      - To decrease $\Sigma_{tot}^{c}$, divide the network into several communities with small total degrees

  11. NP-hardness of Modularity Optimization
      - Modularity: a measure of the interconnectedness of the communities
      - Modularity, $Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \left( \frac{\Sigma_{tot}^{c}}{2m} \right)^{2} \right]$; max value of $Q$ = 1
      - Challenge: finding communities with optimal modularity is NP-hard

  12. Louvain: maximizes modularity following a greedy algorithm. V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast unfolding of communities in large networks," J. Stat. Mech. (2008) P10008, p. 12.

  13. Louvain: Algorithm Steps
      - Outer loop: traverse the graph in several passes to incrementally build communities

  14. Louvain: Algorithm Steps
      - Outer loop: traverse the graph in several passes to incrementally build communities
      - Phase 1: Modularity Optimization (inner loop)

  15. Louvain: Algorithm Steps
      - Outer loop: traverse the graph in several passes to incrementally build communities
      - Phase 2: Community Aggregation and Graph Reconstruction

  16. Louvain: Algorithm Steps
      - Outer loop: traverse the graph in several passes to incrementally build communities
      - Phase 1: Modularity Optimization (inner loop) [cost expression garbled in extraction]
      - Phase 2: Community Aggregation and Graph Reconstruction [cost expression garbled in extraction]

  17. A key data structure to decide pull or push

  18. Hash map NCW: ⟨community_id, sum of edge weights⟩
      - A hash map with ⟨key = neighboring community, val = sum of edge weights to that community⟩
      - Vertex 1 is a neighbor of 2 and 3 (members of community 1) => sum of edge weights = 2
      - Vertex 1 is a neighbor of 7 (member of community 7) => sum of edge weights = 1
      - $NCW_{1} = [\langle c = 1, w_{1 \to 1} = 2 \rangle, \langle c = 7, w_{1 \to 7} = 1 \rangle]$
      - [Figure: example graph with communities c=1, c=4 and c=7]
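The NCW map for vertex 1 in the example above can be reproduced with a few lines. This is a minimal sketch that assumes unit edge weights, no self-loops, and only the part of the example graph that touches vertex 1. `build_ncw` is an illustrative name, not one taken from the slides.

```python
from collections import defaultdict


def build_ncw(v, adj, community):
    """Map each neighboring community of v to the summed weight of edges into it."""
    ncw = defaultdict(float)
    for u, w in adj[v].items():
        ncw[community[u]] += w
    return dict(ncw)


# Neighborhood of vertex 1 from the slide: neighbors 2, 3 and 7, unit weights.
adj = {1: {2: 1.0, 3: 1.0, 7: 1.0}}
# Communities of the neighbors as stated on the slide.
community = {2: 1, 3: 1, 7: 7}

print(build_ncw(1, adj, community))  # {1: 2.0, 7: 1.0}
```

A hash map fits here because the number of distinct neighboring communities is usually much smaller than the vertex degree and shrinks as communities merge.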

  20. Louvain Pseudocode: repeat if there is a change in community membership

  21. Louvain Pseudocode: initialize each vertex in its own community; compute the initial modularity

  22. Louvain Pseudocode: Phase 1 (inner loop) starts

  23. Louvain Pseudocode: for each vertex, build NCW by pulling community info from its neighbors

  24. Louvain Pseudocode: find the best community to move into by iterating through all entries of NCW

  25. Louvain Pseudocode: move to the best community and update the community info

  26. Louvain Pseudocode: once done for all vertices, compute the new modularity and repeat if modularity increased by a threshold
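The Phase 1 steps above can be sketched as a small sequential program. The slides describe a parallel algorithm and do not spell out the gain formula. This sketch is sequential and assumes the standard Louvain gain from Blondel et al.: with the vertex removed from its community, joining community $c$ is scored as $w_{v \to c} - \Sigma_{tot}^{c} k_v / 2m$. It also assumes an undirected graph with at least one edge and no self-loops. All function names and the default `threshold` are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Symmetric adjacency map {u: {v: weight}} for an undirected graph."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def modularity(adj, community):
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    sigma_in, sigma_tot = defaultdict(float), defaultdict(float)
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            sigma_tot[community[u]] += w
            if community[u] == community[v]:
                sigma_in[community[u]] += w
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2 for c in sigma_tot)


def pull_phase1(adj, threshold=1e-6):
    """Sequential sketch of pull-based Phase 1; returns (community, modularity)."""
    community = {v: v for v in adj}            # each vertex in its own community
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(degree.values())
    sigma_tot = defaultdict(float, degree)     # total degree per community
    q = modularity(adj, community)             # initial modularity
    while True:
        for v in adj:
            # Pull: rebuild NCW from the neighbors' current communities.
            ncw = defaultdict(float)
            for u, w in adj[v].items():
                ncw[community[u]] += w
            old = community[v]
            sigma_tot[old] -= degree[v]        # score with v taken out
            best = old
            best_gain = ncw.get(old, 0.0) - sigma_tot[old] * degree[v] / two_m
            for c, w_vc in ncw.items():        # scan all NCW entries
                gain = w_vc - sigma_tot[c] * degree[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[v] = best                # move and update community info
            sigma_tot[best] += degree[v]
        new_q = modularity(adj, community)
        if new_q - q < threshold:              # stop once modularity stabilizes
            return community, new_q
        q = new_q


edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
community, q = pull_phase1(build_graph(edges))
print(community, q)
```

On this toy graph (two triangles joined by one edge) the sweep settles on one community per triangle. Note that NCW is rebuilt for every vertex in every sweep, whether or not anything near that vertex changed.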

  27. Louvain Pseudocode: when modularity stabilizes, create a new graph by merging all vertices in the same community into one
      - [Figure: example graph with communities c=1, c=4 and c=7 before and after merging]
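The merge step can be illustrated as follows. This is a minimal sketch under one assumed convention: the graph is a symmetric adjacency map, and edges inside a community become a self-loop on the merged vertex whose stored weight counts both directions, so weighted degrees are preserved. The name `aggregate` is illustrative.

```python
from collections import defaultdict


def aggregate(adj, community):
    """Collapse each community into one super-vertex and sum the edge weights."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new_adj[community[u]]      # also creates rows for isolated vertices
        for v, w in nbrs.items():
            row[community[v]] += w       # internal edges accumulate on a self-loop
    return {c: dict(row) for c, row in new_adj.items()}


# Two triangles joined by the edge (2, 3), written as a symmetric map.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(aggregate(adj, community))
# {0: {0: 6.0, 1: 1.0}, 1: {0: 1.0, 1: 6.0}}
```

Each triangle becomes a single vertex with a self-loop of weight 6 (three internal edges, counted in both directions), and the bridge becomes one edge of weight 1 between the two merged vertices.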

  28. We call the standard Louvain algorithm a Pull-based Louvain algorithm: to build NCW at each iteration, it pulls the latest info from neighbors

  29. Unnecessary work in Louvain

  30. Observations

  31. Number of vertex moves drops significantly after the first few iterations of Phase 1
      - [Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1]
      - For a particular outer loop, the number of vertices that change communities drops drastically after the first few inner loop iterations (e.g., 5)

  32. Number of vertex moves drops significantly after the first few iterations of Phase 1
      - [Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1]
      - The number of vertices that change communities in the later inner loop iterations is minimal

  33. Implications
      - [Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1]
      - It is wasteful to scan all neighbors to compute NCW if there is no change in the neighborhood
      - It is wasteful to iterate over all vertices in each iteration of Phase 1 when most vertices do not move

  34. Pruning Unnecessary Work in Louvain
      - Prune vertices that are unlikely to move
      - Prune unnecessary neighborhood exploration
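One simple way to picture vertex pruning is an active set. The slides do not state the pruning rule at this point, so the rule below is purely an assumption for illustration: a vertex is revisited in the next inner-loop iteration only if one of its neighbors moved in the current one. `next_active_set` is a hypothetical helper.

```python
def next_active_set(adj, moved):
    """Assumed rule: only neighbors of vertices that just moved are revisited."""
    active = set()
    for v in moved:
        active.update(adj[v])   # keys of adj[v] are the neighbors of v
    return active


# Path graph 0 - 1 - 2 - 3 with unit weights.
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
print(next_active_set(adj, moved={1}))    # {0, 2}
print(next_active_set(adj, moved=set()))  # set(), nothing left to revisit
```

Under this rule the work per iteration tracks the number of moves, which the observations above show falls off quickly after the first few iterations.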

  35. Push-based Louvain Algorithm: a vertex does not pull; rather, its neighbors actively push any changes

  36. Push-based Louvain: the push-based algorithm starts with an initialized NCW, assuming each vertex is in its own community

  37. Push-based Louvain: during Phase 1, it never recreates NCW

  38. Push-based Louvain: if there is a change in community membership

  39. Push-based Louvain: update NCW for the vertex itself, and push updates to all its neighbors
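The push step can be sketched as an incremental update of the per-vertex NCW maps. This is a minimal sequential sketch that assumes an undirected graph without self-loops, so the moving vertex's own map needs no change. The slides also mention updating NCW for the vertex itself, which this sketch does not model. `init_ncw` and `push_move` are illustrative names.

```python
def init_ncw(adj):
    """Every vertex starts in its own community, so community id == vertex id."""
    return {v: dict(nbrs) for v, nbrs in adj.items()}


def push_move(v, new, adj, community, ncw):
    """Move v to community `new` and push the change into its neighbors' NCW maps."""
    old = community[v]
    if new == old:
        return
    community[v] = new
    for u, w in adj[v].items():
        row = ncw[u]
        remaining = row.get(old, 0.0) - w
        if remaining > 1e-12:
            row[old] = remaining          # u still has other edges into `old`
        else:
            row.pop(old, None)            # `old` is no longer a neighbor of u
        row[new] = row.get(new, 0.0) + w  # u now sees v's weight in `new`


# Star around vertex 1 with neighbors 2, 3 and 7, unit weights.
adj = {1: {2: 1.0, 3: 1.0, 7: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}, 7: {1: 1.0}}
community = {v: v for v in adj}
ncw = init_ncw(adj)

push_move(2, 3, adj, community, ncw)   # vertex 2 joins the community of vertex 3
print(ncw[1])                          # {3: 2.0, 7: 1.0}
```

Only the edges of the vertex that moved are touched, so vertices whose neighborhoods are stable cost nothing, in contrast to rebuilding NCW for every vertex in every iteration.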

  40. Pros and Cons of Pull and Push

  41. Pull – Cons: performs redundant memory reads by scanning all vertices and their neighbors to rebuild NCW for each inner loop, even when a vertex's neighborhood has not changed
      - [Figure: vertices moved vs. inner loop iterations for pokec, outer loops 0 and 1; the low-movement tail is annotated "Unnecessary neighborhood scan"]

  42. Push – Pros: scans through all neighbors of a vertex to update NCW only when that vertex changes its community
      - [Figure: vertices moved vs. inner loop iterations for pokec, outer loops 0 and 1; the low-movement tail is annotated "Avoids exploring edges unnecessarily"]
