The More the Merrier: Efficient Multi-Source Graph Traversal
Manuel Then *, Moritz Kaufmann *, Fernando Chirigati †, Tuan-Anh Hoang-Vu †, Kien Pham †, Huy T. Vo †, Alfons Kemper *, Thomas Neumann *
* Technische Universität München, † New York University

Outline
• Motivation
• Challenges
• Goals
• Multi-Source BFS
• Evaluation
• Summary
2015-09-01

Motivation
• Graph traversal vital part of graph analytics
  - BFS, DFS, neighbor traversals, random walks, ...
• Often multiple BFS traversals necessary to compute results
  - Closeness centrality, shortest paths, ...
• Real-world graphs often are small-world networks
  - Social networks, web graphs, communication networks
• Subject of this talk: efficiently run multiple BFSs on real-world graphs

Challenges
• Random data access intrinsic to graph traversal algorithms
  - bad cache behavior, frequent CPU stalls
• Single-bit accesses waste memory bandwidth
  - e.g., for BFS seen bitmaps
• Independent BFS runs redundantly visit vertices multiple times

Challenge - Redundant visits
Example: BFSs in a simple graph
[Figure: BFS 1 and BFS 2, each shown for Initial, Iteration 1, and Iteration 2]

Challenge - Redundant visits (cont.)
[Figure: Redundant vertex visits for 512 BFSs on LDBC 1M social network graph]
• After a few iterations, many redundant visits in small-world networks
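To make the notion of a redundant visit concrete, the following minimal sketch runs independent single-source BFSs and counts, per iteration, how many vertex visits could have been shared because another BFS reaches the same vertex in the same iteration. The graph, the function names `bfs_levels` and `redundant_visits`, and this exact way of counting are assumptions for illustration; they are not taken from the talk.

```python
from collections import Counter, deque

def bfs_levels(adj, source):
    """Return {vertex: BFS level} for one single-source BFS."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in level:
                level[n] = level[v] + 1
                queue.append(n)
    return level

def redundant_visits(adj, sources):
    """Per iteration, count vertex visits beyond the first one by any BFS."""
    visits = Counter()  # (iteration, vertex) -> number of BFSs visiting it
    for s in sources:
        for v, lvl in bfs_levels(adj, s).items():
            visits[(lvl, v)] += 1
    redundant = Counter()  # iteration -> visits that could have been shared
    for (lvl, _), count in visits.items():
        redundant[lvl] += count - 1
    return dict(redundant)

# Hypothetical 6-vertex undirected graph (not the one shown on the slides).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
print(redundant_visits(adj, sources=[0, 1]))  # {0: 0, 1: 1, 2: 1, 3: 0}
```

With only two BFSs on a tiny graph the overlap is small; the slide's point is that with hundreds of BFSs on a small-world graph, most visits after the first few iterations fall into this shareable category.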

Goals
• Leverage knowledge that multiple BFS traversals are run
• Optimize data access patterns
  - embrace memory accesses instead of trying to hide them
  - CPUs always fetch full cache lines; use all of them
• Avoid redundant computation and vertex visits
  - touch vertex information as rarely as possible

Multi-Source BFS
• Concurrently run many independent BFS traversals on the same graph
  - 100s of BFSs on a single CPU core
• Store the state of the concurrent BFSs as 3 bitsets per vertex: visit, seen, next
• Represent BFS traversal as SIMD bit operations on these bitsets
• Fully utilize cache line-sized memory accesses of modern CPUs
• Efficiently share traversals whenever possible
  - neighbors traversed only once for all concurrent BFSs
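The bullets above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Python integers stand in for fixed-width SIMD registers, bit i of every bitset belongs to the BFS started at `sources[i]`, and the function name `ms_bfs`, the adjacency-list layout, and the returned per-BFS distance dictionaries are assumptions made for the example.

```python
def ms_bfs(adj, sources):
    """Run len(sources) BFSs at once; return dist[i][v] = level of v in BFS i."""
    n = len(adj)
    seen = [0] * n    # bit i set: BFS i has already discovered this vertex
    visit = [0] * n   # bit i set: BFS i explores this vertex in this iteration
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
    dist = [{s: 0} for s in sources]
    level = 0
    while any(visit):
        level += 1
        visit_next = [0] * n  # the "next" bitsets
        for v in range(n):
            if visit[v] == 0:
                continue
            for nb in adj[v]:
                # BFSs that are at v now and have not seen nb yet.
                new = visit[v] & ~seen[nb]
                if new:
                    visit_next[nb] |= new
                    seen[nb] |= new
                    # Per-BFS work: record the level for every set bit.
                    i = 0
                    while new:
                        if new & 1:
                            dist[i][nb] = level
                        new >>= 1
                        i += 1
        visit = visit_next
    return dist

# Hypothetical 6-vertex undirected graph as adjacency lists.
adj = [[1, 2], [0, 2, 3], [0, 1, 4], [1, 5], [2, 5], [3, 4]]
print(ms_bfs(adj, sources=[0, 1])[0])  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
```

Each neighbor list is scanned once per iteration for all BFSs whose bit is set at the vertex, which is where the sharing comes from; a real implementation would keep the bitsets in registers as wide as the hardware allows.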

Multi-Source BFS - Example
[Figure: example shown for Initial, Iteration 1, and Iteration 2]

Multi-Source BFS - Further Improvements
• Aggregated neighbor processing
  - reduce number of random writes
• Batching heuristics for maximum sharing
• Direction-optimizing
• Prefetching
... see paper
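One plausible reading of the "aggregated neighbor processing" bullet is a two-phase iteration: first OR the visit bits of every active vertex into its neighbors' next bitsets, then make a single sequential pass that filters out already-seen BFSs. Under that reading, only the next bitsets are written at random positions, while seen is touched in vertex order. The sketch below follows this assumption; the name `ms_bfs_aggregated` and the return format are hypothetical, and the paper should be consulted for the authors' actual formulation.

```python
def ms_bfs_aggregated(adj, sources):
    """Two-phase multi-source BFS; returns dist[i][v] = level of v in BFS i."""
    n = len(adj)
    seen = [0] * n
    visit = [0] * n
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
    dist = [{s: 0} for s in sources]
    level = 0
    while any(visit):
        level += 1
        visit_next = [0] * n
        # Phase 1: aggregate candidate visits; the only random writes
        # go to visit_next.
        for v in range(n):
            if visit[v]:
                for nb in adj[v]:
                    visit_next[nb] |= visit[v]
        # Phase 2: sequential pass that drops BFSs which saw the vertex before.
        for v in range(n):
            if visit_next[v] == 0:
                continue
            visit_next[v] &= ~seen[v]
            seen[v] |= visit_next[v]
            new, i = visit_next[v], 0
            while new:
                if new & 1:
                    dist[i][v] = level
                new >>= 1
                i += 1
        visit = visit_next
    return dist

# Hypothetical 6-vertex undirected graph as adjacency lists.
adj = [[1, 2], [0, 2, 3], [0, 1, 4], [1, 5], [2, 5], [3, 4]]
print(ms_bfs_aggregated(adj, sources=[0, 1])[1])
```

The result is the same set of BFS levels as the one-phase formulation; only the memory access pattern differs.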

Evaluation - The More the Merrier

Evaluation
• MS-BFS-based closeness centrality. 4x Intel Xeon E7-4870v2, 1 TB
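As a rough illustration of what an MS-BFS-based closeness centrality computation might look like, the sketch below accumulates, for each concurrent BFS, the sum of distances to all vertices it reaches. It assumes one common definition, closeness = (number of other vertices reached) / (sum of their distances); the slides do not state which variant was benchmarked, and the name `closeness_ms_bfs` is hypothetical.

```python
def closeness_ms_bfs(adj, sources):
    """Closeness of each source vertex, computed by one multi-source BFS."""
    n, k = len(adj), len(sources)
    seen = [0] * n
    visit = [0] * n
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
    total = [0] * k    # sum of distances from sources[i]
    reached = [0] * k  # vertices other than sources[i] reached by BFS i
    level = 0
    while any(visit):
        level += 1
        visit_next = [0] * n
        for v in range(n):
            if visit[v] == 0:
                continue
            for nb in adj[v]:
                new = visit[v] & ~seen[nb]
                if new:
                    visit_next[nb] |= new
                    seen[nb] |= new
                    for i in range(k):
                        if (new >> i) & 1:
                            total[i] += level
                            reached[i] += 1
        visit = visit_next
    # Assumed definition: reached / sum of distances (0.0 for isolated sources).
    return [reached[i] / total[i] if total[i] else 0.0 for i in range(k)]

# Hypothetical 6-vertex undirected graph as adjacency lists.
adj = [[1, 2], [0, 2, 3], [0, 1, 4], [1, 5], [2, 5], [3, 4]]
print(closeness_ms_bfs(adj, sources=[0, 1]))
```

For vertex 0 the distances to the other five vertices sum to 9, and for vertex 1 they sum to 7, so the two values are 5/9 and 5/7.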

Summary
• Making graph traversals aware of each other can lead to a substantial performance increase
• Multi-Source BFS (MS-BFS) runs multiple independent BFSs ...
  - ... on the same graph ...
  - ... concurrently on a single CPU core ...
  - ... and shares their traversals.
• MS-BFS shows a 10-100x speedup over existing single-source BFSs