Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning
Jesmin Jahan Tithi ♥, Andrzej Stasiak *, Sriram Aananthakrishnan *, Fabrizio Petrini ♥
♥ Parallel Computing Labs, Intel; * Data Center Group, Intel.
What is a Community?
Sets of vertices that have dense intra-connections but sparse inter-connections.
Communities expose hidden structure inside a graph in the form of coherent modules of vertices.
They are strongly correlated to functional and structural properties.
[Figures: a community in a protein-protein interaction network; the World Wide Web. Image source: Google Image]
What is Community Detection?
Algorithms to identify communities in a network.
Applications: network analysis to retrieve information or patterns of the network.
[Figures: "Virality Prediction and Community Structure in Social Networks"; Nodus Labs, Against Putin Facebook protest group visualization, December 2011. Sources: http://senseable.mit.edu, /community_detection/]
How do we measure the quality of the detected communities?
A Measure of Solution Quality
Modularity: a measure of the interconnectedness of the communities.
$$Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{(\Sigma_{tot}^{c})^{2}}{4m^{2}} \right]$$
where
$\Sigma_{in}^{c} = \sum w_{u,v}$ for all $u, v \in c$,
$\Sigma_{tot}^{c} = \sum w_{u,v}$ for all $u \in c$ or $v \in c$,
$m = \sum_{(u,v)} w_{u,v}$.
Max value of Q = 1. |Q| ∈ (0, 1], and the higher the better.
A community detection algorithm identifies communities in a way that maximizes modularity.
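To make the formula concrete, here is a minimal sketch that evaluates Q for a small undirected weighted graph. The function name `modularity` and the edge-list layout are illustrative assumptions, not taken from the slides. It also assumes one common convention: each intra-community edge contributes its weight twice to Σ_in (once per endpoint), Σ_tot is the total weighted degree of the community's members, and m is the total edge weight.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition (hypothetical helper, not from the slides).

    edges:     list of (u, v, w) undirected edges, each listed once
    community: dict mapping vertex -> community id
    """
    m = sum(w for _, _, w in edges)   # total edge weight
    sigma_in = defaultdict(float)     # intra-community weight, counted from both endpoints
    sigma_tot = defaultdict(float)    # total weighted degree of each community's members
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        sigma_tot[cu] += w
        sigma_tot[cv] += w
        if cu == cv:
            sigma_in[cu] += 2 * w
    return sum(sigma_in[c] / (2 * m) - sigma_tot[c] ** 2 / (4 * m * m)
               for c in sigma_tot)

# Two triangles joined by a single edge (2, 3), all weights 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_block = {v: 0 for v in range(6)}

q_split = modularity(edges, two_triangles)   # 5/14
q_single = modularity(edges, one_block)      # 0
```

Splitting the graph along the bridge edge scores higher than lumping everything into one community, which matches the intuition that dense groups with few links between them are rewarded.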
How do we maximize modularity?
A Recipe of Modularity Optimization
Modularity: a measure of the interconnectedness of the communities.
$$Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{(\Sigma_{tot}^{c})^{2}}{4m^{2}} \right]$$ (max value of Q = 1)
Large values of $Q$ correlate with high $\Sigma_{in}^{c}$ and low $\Sigma_{tot}^{c}$: communities that are dense within their structure and weakly coupled among each other.
To get high $\Sigma_{in}^{c}$, the highest possible number of edges should fall in each community.
To decrease $\Sigma_{tot}^{c}$, divide the network into several communities with small total degrees.
NP-hardness of Modularity Optimization
Challenge: finding communities with optimal modularity is NP-hard.
Louvain
Maximizes modularity following a greedy algorithm.
V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast unfolding of communities in large networks," J. Stat. Mech. (2008) P10008, p. 12, 2008.
Louvain: Algorithm Steps
Outer loop: traverse the graph in several passes to incrementally build communities.
Phase 1: Modularity optimization (inner loop). [Cost expression garbled in source.]
Phase 2: Community aggregation and graph reconstruction. [Cost expression garbled in source.]
A key data structure to decide pull or push
Hash map NCW: ⟨community_id, sum of edge weights⟩
A hash map with ⟨key = neighboring community, val = sum of edge weights to that community⟩.
[Figure: example graph with communities c=1, c=4, and c=7.]
Vertex 1 is a neighbor of vertices 2 and 3 (members of community 1) => sum of edge weights = 2.
Vertex 1 is a neighbor of vertex 7 (member of community 7) => sum of edge weights = 1.
$NCW_{1} = [\langle c=1, w_{1 \to 1}=2 \rangle, \langle c=7, w_{1 \to 7}=1 \rangle]$
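The example above can be reproduced with a few lines of code. This is a minimal sketch: `build_ncw`, the adjacency-list layout, and the choice to place vertex 1 itself in community 1 are assumptions made for illustration. Only the neighbors named on the slide (2, 3, and 7, with unit edge weights) are included.

```python
from collections import defaultdict

def build_ncw(v, adj, community):
    """Sum the edge weights from v to each neighboring community.

    adj:       dict vertex -> list of (neighbor, weight)
    community: dict vertex -> community id
    """
    ncw = defaultdict(float)
    for u, w in adj[v]:
        ncw[community[u]] += w   # key = neighbor's community, val = accumulated weight
    return dict(ncw)

# Vertex 1 and its neighbors as described on the slide, unit weights assumed.
adj = {1: [(2, 1.0), (3, 1.0), (7, 1.0)]}
community = {1: 1, 2: 1, 3: 1, 7: 7}

ncw_1 = build_ncw(1, adj, community)   # {1: 2.0, 7: 1.0}
```

A hash map fits here because the number of distinct neighboring communities is usually much smaller than the number of neighbors, and lookups by community id are needed when evaluating candidate moves.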
Louvain Pseudocode
Outer loop: repeat while there is a change in community membership.
Initialize each vertex in its own community and compute the initial modularity.
Phase 1 (inner loop) starts.
For each vertex, build NCW by pulling community info from its neighbors.
Find the best community to move into by iterating through all entries of NCW.
Move the vertex to the best community and update the community info.
Once done for all vertices, compute the new modularity and repeat if the modularity increased by at least a threshold.
When modularity stabilizes, create a new graph by merging all vertices in the same community into one.
[Figure: example graph with communities c=1, c=4, and c=7 and the merged graph.]
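The merge step can be sketched as follows. The function name `aggregate` and the edge-list representation are hypothetical. The sketch also assumes that intra-community edges collapse into a self-loop whose weight counts each original edge once, so the total edge weight of the graph is preserved.

```python
from collections import defaultdict

def aggregate(edges, community):
    """Collapse each community into a single vertex.

    edges:     list of (u, v, w) undirected edges, each listed once
    community: dict vertex -> community id (ids must be mutually comparable)
    Returns a list of (cu, cv, w) edges between communities; cu == cv is a self-loop.
    """
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        key = (cu, cv) if cu <= cv else (cv, cu)   # canonical order for undirected edges
        merged[key] += w
    return [(a, b, w) for (a, b), w in merged.items()]

# Two triangles joined by the edge (2, 3), merged into two super-vertices.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

coarse = sorted(aggregate(edges, community))
# [(0, 0, 3.0), (0, 1, 1.0), (1, 1, 3.0)]
```

The coarse graph is then fed back into Phase 1 as the input of the next outer-loop pass.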
We call the standard Louvain algorithm a pull-based Louvain algorithm: to build NCW at each iteration, it pulls the latest info from neighbors.
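A sequential sketch of the pull-based inner loop may help show where the repeated neighborhood scans come from. Several things here are assumptions rather than the authors' implementation: the name `pull_phase1`, the stopping rule (stop when no vertex moves, instead of a modularity threshold), and the move score, which is the standard Louvain gain up to a constant factor (weight to the candidate community minus Σ_tot of that community times the vertex degree over 2m).

```python
from collections import defaultdict

def pull_phase1(adj, community, max_iters=100):
    """Sequential pull-based Phase 1 sketch.

    adj:       dict vertex -> list of (neighbor, weight), symmetric
    community: dict vertex -> community id (initial assignment)
    Returns a new dict with the final assignment.
    """
    community = dict(community)   # work on a copy
    degree = {v: sum(w for _, w in nbrs) for v, nbrs in adj.items()}
    two_m = sum(degree.values())
    sigma_tot = defaultdict(float)
    for v, c in community.items():
        sigma_tot[c] += degree[v]

    for _ in range(max_iters):
        moved = 0
        for v in adj:
            # Pull: rebuild NCW from scratch by scanning every neighbor of v.
            ncw = defaultdict(float)
            for u, w in adj[v]:
                if u != v:        # self-loops are ignored in this sketch
                    ncw[community[u]] += w

            old = community[v]
            sigma_tot[old] -= degree[v]   # evaluate moves with v taken out
            best = old
            best_gain = ncw.get(old, 0.0) - sigma_tot[old] * degree[v] / two_m
            for c, w_vc in ncw.items():
                gain = w_vc - sigma_tot[c] * degree[v] / two_m
                if gain > best_gain:      # strict improvement only
                    best, best_gain = c, gain

            sigma_tot[best] += degree[v]
            if best != old:
                community[v] = best
                moved += 1
        if moved == 0:
            break
    return community

# Two triangles joined by the edge (2, 3); every vertex starts alone.
adj = {0: [(1, 1.0), (2, 1.0)],
       1: [(0, 1.0), (2, 1.0)],
       2: [(0, 1.0), (1, 1.0), (3, 1.0)],
       3: [(2, 1.0), (4, 1.0), (5, 1.0)],
       4: [(3, 1.0), (5, 1.0)],
       5: [(3, 1.0), (4, 1.0)]}
result = pull_phase1(adj, {v: v for v in adj})
```

Note that the NCW rebuild runs for every vertex in every iteration, regardless of whether any neighbor changed community in the previous one.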
Unnecessary work in Louvain
Observations
Number of vertex moves drops significantly after the first few iterations of Phase 1
[Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1.]
For a particular outer loop, the number of vertices that change communities drops drastically after the first few inner loop iterations (e.g., 5).
The number of vertices that change communities in the later inner loop iterations is minimal.
Implications
It is wasteful to scan all neighbors to compute NCW if there is no change in the neighborhood.
It is wasteful to iterate over all vertices in each iteration of Phase 1 when vertices do not move.
Pruning Unnecessary Work in Louvain
Prune vertices that are unlikely to move.
Prune unnecessary neighborhood exploration.
Push-based Louvain Algorithm
A vertex does not pull; rather, neighbors actively push any changes.
The push-based algorithm starts with an initialized NCW, assuming each vertex is in its own community.
During Phase 1, it never recreates NCW.
If there is a change in community membership, it updates NCW for the vertex itself and pushes updates to all its neighbors.
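The push-style bookkeeping can be sketched as below. The helper names `init_ncw`, `push_move`, and `rebuild_ncw` are hypothetical, the code is sequential, and it covers only the NCW maintenance, not the move selection. In a parallel setting the neighbor updates would need atomic or otherwise synchronized writes, which this sketch does not attempt.

```python
from collections import defaultdict

def init_ncw(adj):
    """Initial NCW with every vertex in its own community (community id == vertex id)."""
    ncw = {v: defaultdict(float) for v in adj}
    for v, nbrs in adj.items():
        for u, w in nbrs:
            ncw[v][u] += w
    return ncw

def push_move(v, new, adj, community, ncw):
    """Move v to community `new` and push the change into each neighbor's NCW."""
    old = community[v]
    if new == old:
        return
    community[v] = new
    for u, w in adj[v]:           # edges are scanned only because v actually moved
        ncw[u][old] -= w
        if ncw[u][old] <= 1e-12:  # drop empty entries (tolerance for float weights)
            del ncw[u][old]
        ncw[u][new] += w

def rebuild_ncw(adj, community):
    """Pull-style rebuild from scratch, used here only to check the pushed state."""
    out = {}
    for v, nbrs in adj.items():
        acc = defaultdict(float)
        for u, w in nbrs:
            acc[community[u]] += w
        out[v] = dict(acc)
    return out

# Two triangles joined by the edge (2, 3).
adj = {0: [(1, 1.0), (2, 1.0)],
       1: [(0, 1.0), (2, 1.0)],
       2: [(0, 1.0), (1, 1.0), (3, 1.0)],
       3: [(2, 1.0), (4, 1.0), (5, 1.0)],
       4: [(3, 1.0), (5, 1.0)],
       5: [(3, 1.0), (4, 1.0)]}
community = {v: v for v in adj}
ncw = init_ncw(adj)

push_move(0, 1, adj, community, ncw)   # vertex 0 joins community 1
push_move(2, 1, adj, community, ncw)   # vertex 2 joins community 1
pushed = {v: dict(entries) for v, entries in ncw.items()}
```

After the two moves, the incrementally maintained maps match what a full pull-style rebuild would produce, while vertices 4 and 5, whose neighborhoods did not change, were never touched.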
Pros and Cons of Pull and Push
Pull – Cons
Performs redundant memory reads by scanning all vertices and their neighbors to rebuild NCW in each inner loop iteration, even when the vertex's neighborhood has not changed.
[Figure: pokec, vertices moved vs. inner loop iterations for outer loops 0 and 1; annotation: unnecessary neighborhood scan.]
Push – Pros
Scans through all neighbors of a vertex to update NCW only when that vertex changes its community.
[Figure: pokec, vertices moved vs. inner loop iterations for outer loops 0 and 1; annotation: avoids exploring edges unnecessarily.]