The More the Merrier: Efficient Multi-Source Graph Traversal


  1. The More the Merrier: Efficient Multi-Source Graph Traversal
     Manuel Then*, Moritz Kaufmann*, Fernando Chirigati†, Tuan-Anh Hoang-Vu†, Kien Pham†, Huy T. Vo†, Alfons Kemper*, Thomas Neumann*
     * Technische Universität München, † New York University

  2. Outline
     • Motivation
     • Challenges
     • Goals
     • Multi-Source BFS
     • Evaluation
     • Summary
     2015-09-01, The More the Merrier: Efficient Multi-Source Graph Traversal

  6. Motivation
     • Graph traversal is a vital part of graph analytics
       - BFS, DFS, neighbor traversals, random walks, ...
     • Multiple BFS traversals are often necessary to compute results
       - Closeness centrality, shortest paths, ...
     • Real-world graphs are often small-world networks
       - Social networks, web graphs, communication networks
     • Subject of this talk: efficiently running multiple BFSs on real-world graphs

  9. Challenges
     • Random data access is intrinsic to graph traversal algorithms
       - Bad cache behavior, frequent CPU stalls
     • Single-bit accesses waste memory bandwidth
       - e.g. for BFS seen bitmaps
     • Independent BFS runs redundantly visit vertices multiple times

  15. Challenge - Redundant visits
      Example: BFSs in a simple graph
      [Figure: BFS 1 and BFS 2 shown at Initial, Iteration 1, and Iteration 2]

  16. Challenge - Redundant visits (cont.)
      [Figure: Redundant vertex visits for 512 BFSs on the LDBC 1M social network graph]
      • After a few iterations, many visits are redundant in small-world networks
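One way to make the notion of a redundant visit concrete is sketched below in Python. It assumes a definition that is not stated on the slides: a visit counts as shareable when more than one BFS reaches the same vertex in the same iteration. The names `bfs_levels` and `shareable_visits` are hypothetical, and the graph is assumed to be an adjacency list indexed by vertex id.

```python
from collections import Counter, deque


def bfs_levels(adj, source):
    """Plain single-source BFS; returns {vertex: iteration in which it is reached}."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in level:
                level[w] = level[v] + 1
                queue.append(w)
    return level


def shareable_visits(adj, sources):
    """Return (total visits, visits that duplicate another BFS's visit).

    Assumption: two visits are duplicates when different BFSs reach the
    same vertex in the same iteration.
    """
    hits = Counter()
    for s in sources:
        for v, lvl in bfs_levels(adj, s).items():
            hits[(v, lvl)] += 1
    total = sum(hits.values())
    shareable = sum(count - 1 for count in hits.values())
    return total, shareable


# Star graph: vertex 0 is the hub, vertices 1..3 are leaves.
star = [[1, 2, 3], [0], [0], [0]]
total, shareable = shareable_visits(star, sources=[1, 2])
```

In this toy graph both BFSs reach the hub in iteration 1 and leaf 3 in iteration 2, so two of the eight visits repeat work another BFS is doing at the same time.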

  19. Goals
      • Leverage the knowledge that multiple BFS traversals are run
      • Optimize data access patterns
        - Embrace memory accesses instead of trying to hide them
        - CPUs always fetch full cache lines, so use all of them
      • Avoid redundant computation and vertex visits
        - Touch vertex information as rarely as possible

  23. Multi-Source BFS
      • Concurrently run many independent BFS traversals on the same graph
        - 100s of BFSs on a single CPU core
      • Store the state of the concurrent BFSs as 3 bitsets per vertex: visit, seen, next
      • Represent BFS traversal as SIMD bit operations on these bitsets
      • Fully utilize the cache line-sized memory accesses of modern CPUs
      • Efficiently share traversals whenever possible
        - Neighbors are traversed only once for all concurrent BFSs
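The bitset idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Python integers stand in for the fixed-width SIMD registers, bit `i` of each bitset belongs to the BFS started at `sources[i]`, and the function name `ms_bfs` and the returned distance table are hypothetical choices made for the example.

```python
def ms_bfs(adj, sources):
    """Run one BFS per source concurrently over an adjacency list.

    Returns dist, where dist[i][v] is the hop distance from sources[i]
    to v, or None if v is unreachable from that source.
    """
    n = len(adj)
    seen = [0] * n    # bit i set: BFS i has already discovered this vertex
    visit = [0] * n   # bit i set: BFS i explores this vertex in this iteration
    dist = [[None] * n for _ in sources]
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(visit):
        level += 1
        nxt = [0] * n  # the "next" bitsets: frontier of the following iteration
        for v in range(n):
            if visit[v] == 0:
                continue
            for w in adj[v]:
                # BFSs that are at v now and have not discovered w yet
                new = visit[v] & ~seen[w]
                if new:
                    nxt[w] |= new
                    seen[w] |= new
                    # Per-BFS work: record the distance for every new bit
                    i, bits = 0, new
                    while bits:
                        if bits & 1:
                            dist[i][w] = level
                        bits >>= 1
                        i += 1
        visit = nxt
    return dist


# Path graph 0 - 1 - 2 - 3, with one BFS from each end.
path = [[1], [0, 2], [1, 3], [2]]
distances = ms_bfs(path, sources=[0, 3])
```

Each neighbor list is scanned once per iteration for all BFSs whose bit is set in `visit[v]`, which is where the sharing comes from; a single `&` and `|` updates every concurrent BFS at once.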

  30. Multi-Source BFS - Example
      [Figure: example traversal shown at Initial, Iteration 1, and Iteration 2]

  31. Multi-Source BFS - Further Improvements
      • Aggregated neighbor processing
        - Reduces the number of random writes
      • Batching heuristics for maximum sharing
      • Direction-optimizing
      • Prefetching
      ... see paper
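Aggregated neighbor processing is only named on the slide, so the following Python sketch is one plausible reading rather than the paper's code. It assumes the iteration is split into two passes: the first ORs each frontier bitset into the `next` bitsets of its neighbors, and the second walks the vertices in order to filter out already-seen bits. The name `ms_bfs_aggregated` and the distance table are hypothetical.

```python
def ms_bfs_aggregated(adj, sources):
    """Two-pass multi-source BFS; returns dist[i][v] (None if unreachable)."""
    n = len(adj)
    seen = [0] * n
    visit = [0] * n
    dist = [[None] * n for _ in sources]
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(visit):
        level += 1
        nxt = [0] * n
        # Pass 1: the only scattered writes go to nxt; seen is not touched.
        for v in range(n):
            if visit[v]:
                for w in adj[v]:
                    nxt[w] |= visit[v]
        # Pass 2: sequential sweep that drops bits of BFSs that were here before.
        for v in range(n):
            if nxt[v] == 0:
                continue
            nxt[v] &= ~seen[v]
            seen[v] |= nxt[v]
            i, bits = 0, nxt[v]
            while bits:
                if bits & 1:
                    dist[i][v] = level
                bits >>= 1
                i += 1
        visit = nxt
    return dist


# Path graph 0 - 1 - 2 - 3, with one BFS from each end.
path = [[1], [0, 2], [1, 3], [2]]
distances = ms_bfs_aggregated(path, sources=[0, 3])
```

In this sketch the neighbor loop reads nothing from `seen` and writes only to `nxt`, and each vertex's `seen` entry is updated once per iteration in index order instead of once per incoming edge.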

  32. Evaluation - The More the Merrier

  33. Evaluation
      • MS-BFS-based closeness centrality. 4x Intel Xeon E7-4870v2, 1TB
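To show how a closeness centrality computation can sit on top of a multi-source BFS, here is a small Python sketch. The slides do not say which closeness definition the evaluation uses; this example assumes closeness = (number of other vertices reached) / (sum of hop distances to them), and the function name `closeness_ms_bfs` is hypothetical.

```python
def closeness_ms_bfs(adj, sources):
    """Closeness of each source, computed with one shared multi-source BFS.

    Assumed definition: reached vertices (excluding the source itself)
    divided by the sum of their hop distances; 0.0 if nothing is reachable.
    """
    n = len(adj)
    seen = [0] * n
    visit = [0] * n
    dist_sum = [0] * len(sources)
    reached = [0] * len(sources)
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i

    level = 0
    while any(visit):
        level += 1
        nxt = [0] * n
        for v in range(n):
            if visit[v] == 0:
                continue
            for w in adj[v]:
                new = visit[v] & ~seen[w]
                if new:
                    nxt[w] |= new
                    seen[w] |= new
                    # Every new bit is one (source, vertex) pair at this distance
                    i, bits = 0, new
                    while bits:
                        if bits & 1:
                            dist_sum[i] += level
                            reached[i] += 1
                        bits >>= 1
                        i += 1
        visit = nxt
    return [r / d if d else 0.0 for r, d in zip(reached, dist_sum)]


# Path graph 0 - 1 - 2 - 3: the inner vertex 1 is closer to everything than 0.
path = [[1], [0, 2], [1, 3], [2]]
scores = closeness_ms_bfs(path, sources=[0, 1])
```

Only two counters per source are kept, so no per-vertex distance table is needed; the distances are consumed as soon as a bit is first set.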

  35. Summary
      • Making graph traversals aware of each other can lead to a substantial performance increase
      • Multi-Source BFS (MS-BFS) runs multiple independent BFSs ...
        - ... on the same graph ...
        - ... concurrently on a single CPU ...
        - ... and shares their traversals.
      • MS-BFS shows a 10-100x speedup over existing single-source BFSs
